Recordings

If you are using user-generated or adapted models, Simon builds its acoustic model from transcribed samples of the user's voice. Because of this, the quality of the recorded samples is of vital importance for recognition performance.

Volume

It is important that you check your microphone volume before recording any samples.

Simon Calibration

The current version of Simon includes a simple way of ensuring that your volume is configured correctly.

By default, the volume calibration is displayed before any recording starts in Simon.

To calibrate, simply read the displayed text aloud.

The calibration monitors the current volume and tells you to either raise or lower it. Simon does not change the volume itself; you have to adjust it manually in your system's audio mixer.

During calibration, try to talk normally. Don't yell, but don't be overly quiet either. Keep in mind that you should generally use the same volume setting for all your training and for the recognition, too. You might unconsciously speak a little louder when you are upset or at another time of the day, so raise your voice slightly during calibration to anticipate this. It is much better to have slightly quieter samples than to start clipping.
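The raise-or-lower decision described above can be illustrated with a minimal sketch. This is not Simon's implementation: the function name `calibration_hint` and the thresholds `min_peak` and `max_peak` are hypothetical, and the samples are assumed to be normalized to the range -1.0 to 1.0.

```python
def calibration_hint(samples, min_peak=0.3, max_peak=0.9):
    """Return "raise", "lower" or "ok" for a block of recorded samples.

    samples  -- floats normalized to [-1.0, 1.0] (assumption)
    min_peak -- hypothetical lower bound, not Simon's default
    max_peak -- hypothetical upper bound, kept below 1.0 to leave
                headroom before clipping
    """
    # The peak is the largest absolute amplitude in the block.
    peak = max((abs(s) for s in samples), default=0.0)
    if peak < min_peak:
        return "raise"
    if peak > max_peak:
        return "lower"
    return "ok"


if __name__ == "__main__":
    print(calibration_hint([0.02, -0.10, 0.05]))  # quiet block
    print(calibration_hint([0.40, -0.60, 0.55]))  # comfortable level
    print(calibration_hint([0.95, -1.00, 0.97]))  # close to clipping
```

Keeping the upper bound below full scale reflects the advice above: a slightly quieter sample is preferable to one that clips.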

In the Simon settings, both the text displayed and the levels considered correct can be changed. If you leave the text empty, the default text will be displayed. In the options you can also deactivate the calibration completely. See the training section for more details.

Audacity Calibration

Alternatively, you can use an audio editing tool such as the free Audacity to monitor the recording volume.

Too quiet:

Too loud:

Perfect volume:

Silence

To help Simon with the automatic segmentation, it is recommended to leave about one or two seconds of silence in the recording before and after reading the prompted text.
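As a rough way to check an existing recording for these margins, the following sketch measures how much quiet audio precedes and follows the spoken part. It is an illustration only: the function name `silence_margins`, the amplitude `threshold` and the 16-bit integer sample format are assumptions, and it says nothing about how Simon's own segmentation works.

```python
def silence_margins(samples, sample_rate=16000, threshold=500):
    """Return (leading_seconds, trailing_seconds) of quiet audio.

    samples     -- signed 16-bit integer amplitudes (assumption)
    sample_rate -- samples per second
    threshold   -- hypothetical amplitude below which audio counts as silence
    """
    # Indices of all samples that are loud enough to count as speech.
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        # Nothing exceeded the threshold: the whole recording is quiet.
        total = len(samples) / sample_rate
        return total, total
    leading = loud[0] / sample_rate
    trailing = (len(samples) - 1 - loud[-1]) / sample_rate
    return leading, trailing


if __name__ == "__main__":
    # One second of silence, half a second of "speech", two seconds of silence.
    recording = [0] * 16000 + [1000] * 8000 + [0] * 32000
    print(silence_margins(recording))
```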

Current Simon versions include a graphical notice on when to speak during recording. The message will tell the user to wait for about half a second:

... before telling the user to speak:

This method of visual feedback proved especially valuable when recording with people who cannot read the prompted text themselves and therefore need someone to tell them what to say. The colorful visual cue tells them when to start repeating what the facilitator said, without the need for unreliable hand gestures.

Content

Generally, we recommend recording roughly the same sentences that Simon should recognize later.

(Obviously, this does not apply to massive sample acquisitions, where other properties such as phonetic balance are more important.)

Avoid recordings like "One One One" made to quickly ramp up the recognition rate. Such recordings often decrease recognition performance because the pronunciation differs greatly from saying the word in isolation.

Microphone

For Simon to work well, a high-quality microphone is recommended.

However, even relatively cheap headsets (around 30 euros) achieve very good results, far better than internal microphones.

For maximum compatibility, we recommend USB headsets: they usually support the necessary sample rate of 16 kHz, are very well supported by both Microsoft Windows and GNU/Linux, and normally don't require special, proprietary drivers to operate.
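To verify that a test recording made with a given microphone really uses a 16 kHz sample rate, the file header can be inspected. The sketch below assumes the recording is an uncompressed WAV file and uses Python's standard `wave` module; the function name and the file name in the usage example are hypothetical.

```python
import wave


def has_expected_sample_rate(path, expected_hz=16000):
    """Return True if the WAV file at `path` was recorded at `expected_hz`."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == expected_hz


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python check_rate.py test_recording.wav
    for name in sys.argv[1:]:
        ok = has_expected_sample_rate(name)
        print(name, "16 kHz" if ok else "different sample rate")
```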

Sample Quality Assurance

Simon will check each recording against certain criteria to ensure that the recorded samples are not erroneous or of poor quality.

If Simon detects a problematic sample, it will warn the user to re-record the sample.

Currently, Simon checks the following criteria:

  • Sample peak volume

    If the volume is too high and the microphone starts to clip (see Clipping on Wikipedia), Simon will display a warning message urging the user to lower the microphone volume.

  • Signal to noise ratio (SNR)

    Simon will automatically determine the signal to noise ratio of each recording. If the ratio is below a configurable threshold, a warning message will be displayed.

    The default value of 2300 % means that for Simon to accept a sample as correctly recorded the peak volume has to be 23 times louder than the noise baseline (lowest average over 50 ms).

    A low ratio is often the result of a very low quality microphone, high levels of ambient noise, or a low microphone gain combined with a microphone boost option in the system mixer.

SNR warning message triggered by an empty sample. This information dialog is displayed when clicking on the More information button on the recording widget.
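The two checks listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not Simon's actual code: the function names are hypothetical, samples are assumed to be signed 16-bit integers, clipping is assumed to mean reaching full scale, and the noise baseline is computed over non-overlapping 50 ms windows.

```python
def peak_volume(samples):
    """Largest absolute amplitude in the recording (0 if it is empty)."""
    return max((abs(s) for s in samples), default=0)


def is_clipping(samples, full_scale=32767):
    """Assumption: a sample that reaches 16-bit full scale counts as clipped."""
    return peak_volume(samples) >= full_scale


def noise_baseline(samples, sample_rate=16000, window_ms=50):
    """Lowest average absolute amplitude over any 50 ms window.

    Assumption: windows do not overlap, and a trailing partial window
    is ignored.
    """
    window = max(1, sample_rate * window_ms // 1000)
    averages = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        averages.append(sum(abs(s) for s in chunk) / window)
    return min(averages) if averages else 0.0


def snr_percent(samples, sample_rate=16000, window_ms=50):
    """Peak volume relative to the noise baseline, in percent."""
    peak = peak_volume(samples)
    baseline = noise_baseline(samples, sample_rate, window_ms)
    if baseline == 0:
        # Assumption: an empty or all-zero sample has no usable signal.
        return float("inf") if peak > 0 else 0.0
    return 100.0 * peak / baseline


def passes_snr(samples, threshold_percent=2300, sample_rate=16000):
    """True if the peak is at least 23 times the baseline (default 2300 %)."""
    return snr_percent(samples, sample_rate) >= threshold_percent


if __name__ == "__main__":
    # 50 ms of background noise followed by 50 ms of speech at 16 kHz.
    recording = [100] * 800 + [3000] * 800
    print(snr_percent(recording), passes_snr(recording), is_clipping(recording))
```

With this sketch, an empty sample yields a ratio of 0 % and therefore fails the check, which matches the warning shown for an empty sample.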