Manage training data

To view and modify your personal training corpus, open the training data management dialog by selecting Manage training data in the Simon main window or in the training section of any opened scenario.

Modifying samples

To listen to or re-record a sample, select it in the list and choose Open Sample.

In this dialog you can also change the sample's group after it has been recorded.

If you remove the opened sample and do not re-record it, Simon will offer to remove it from the corpus.

Clear training data

After you confirm the action in a dialog, this removes all of the user's personal training data.

Importing training samples

Using the import training data field, you can import training samples gathered with previous Simon versions or through manual training.

Note

This feature is highly specialized. Use it with caution and make sure that you know exactly what you are doing before you continue.

You can either provide a separate prompts file or let Simon extract the transcriptions from the filenames.

When using prompts-based transcriptions, your prompts file (UTF-8) must contain lines of the form [filename] [content]. Filenames are given without file extension, and the content must be uppercase. For example, the line demo_2007_03_20 DEMO imports the file demo_2007_03_20.wav containing the spoken word Demo.
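The prompts file format described above can be produced with a short script. The following is a minimal sketch, not part of Simon: the function name write_prompts_file and the dictionary layout are assumptions made for illustration. It strips the file extension, uppercases the content and writes UTF-8.

```python
from pathlib import Path


def write_prompts_file(samples, prompts_path):
    """Write one '[filename] [content]' line per sample, encoded as UTF-8.

    `samples` maps an audio file name to the text spoken in that recording.
    This helper is hypothetical; it only mirrors the format described above.
    """
    lines = []
    for audio_name, spoken_text in samples.items():
        stem = Path(audio_name).stem  # drop the file extension
        lines.append(f"{stem} {spoken_text.upper()}")  # content must be uppercase
    Path(prompts_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

Calling write_prompts_file({"demo_2007_03_20.wav": "Demo"}, "prompts") writes the single line demo_2007_03_20 DEMO.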

Because prompts files do not contain file extensions, Simon will try wav, mp3, ogg and flac (in that order). As soon as one of them matches, no other extension is tested and only that first file is imported (in contrast to file-based transcription, where all files would be imported).
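The lookup order can be illustrated with a small sketch. This is not Simon's actual code: resolve_sample is a hypothetical helper that only mimics the described behaviour of trying each extension in turn and stopping at the first file that exists.

```python
from pathlib import Path

# Order taken from the text above: wav, mp3, ogg, flac.
EXTENSIONS = ("wav", "mp3", "ogg", "flac")


def resolve_sample(folder, stem):
    """Return the first existing '<stem>.<ext>' in `folder`, or None."""
    for ext in EXTENSIONS:
        candidate = Path(folder) / f"{stem}.{ext}"
        if candidate.is_file():
            return candidate  # first match wins; later extensions are not tried
    return None
```

If a folder contains both demo.mp3 and demo.flac, this sketch returns demo.mp3 and never looks at demo.flac.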

When using file-based transcriptions, a file called this_is_a_test.wav must contain This is a test and nothing else. Numbers and special characters (., -, ...) in the filename are stripped.

Files recorded by Simon 0.2 follow this naming scheme, so you can safely import them using the file name extraction method. Files generated by earlier Simon versions should not be imported this way; use the prompts-based import for them instead.
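To check in advance what a filename would be transcribed as, the rule can be approximated as follows. This is a sketch under assumptions: transcription_from_filename is a hypothetical helper, it treats underscores as word separators, and it leaves letter case unchanged because the text above does not say how Simon handles case.

```python
import re
from pathlib import Path


def transcription_from_filename(filename):
    """Approximate the transcription implied by a sample's file name."""
    stem = Path(filename).stem                # drop the file extension
    words = stem.replace("_", " ")            # assumption: underscores separate words
    words = re.sub(r"[^\w\s]|\d", "", words)  # strip special characters and digits
    return " ".join(words.split())            # collapse leftover whitespace
```

With this sketch, transcription_from_filename("this_is_a_test.wav") returns "this is a test", and a name such as demo_2007_03_20.wav reduces to "demo".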

Imported files and their transcriptions are then added to the training corpus.

To import a folder containing training samples, select the folder to import and, depending on your import type, also the prompts file.

The folder will be scanned recursively. This means that the given folder and all its subfolders will be searched for .wav, .flac, .mp3 and .ogg files. All files found will be imported.
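To preview which files such a recursive scan would pick up, a sketch like the following can help. It is an illustration only: find_samples is a hypothetical helper, and matching the extension case-insensitively is an assumption that Simon may not share.

```python
from pathlib import Path

# Extensions named in the text above.
AUDIO_SUFFIXES = {".wav", ".flac", ".mp3", ".ogg"}


def find_samples(folder):
    """Return all audio files in `folder` and its subfolders, sorted by path."""
    return sorted(
        path
        for path in Path(folder).rglob("*")  # walks the folder recursively
        # Assumption: extensions are compared case-insensitively.
        if path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
    )
```

Running find_samples on the folder before importing gives a list to review, which is useful because every file found will be imported.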

When importing the sound files, all configured post-processing filters are applied.

If you import anything other than WAV files, you are responsible for decoding them during the import process (for example through post-processing filters); otherwise the model creation will fail.