To build models from the samples collected with SSCd, you first have to extract them from the database.
Because SSCd is designed for large-scale sample acquisition, this process is not particularly end-user friendly. The documentation below is mainly intended for technically skilled users.
Most of the scripts below require the GNU tools (e.g. GNU sed), which are usually installed by default on GNU/Linux.
You can use the following query (adjust it depending on exactly which samples you need):
```sql
use ssc;
select s.Path, s.Prompt
from Sample s
inner join User u on s.UserId = u.UserId
inner join UserInInstitution uii on u.UserId = uii.UserId
inner join SampleType st on s.TypeId = st.SampleTypeId
inner join Microphone m on m.MicrophoneId = s.MicrophoneId
WHERE st.ExactlyRepeated = 1
  and uii.InstitutionId = 3
  and (m.MicrophoneId = 1);
```
This query lists all samples from institution 3 that were recorded with microphone 1 and belong to a sample type whose prompts were repeated exactly (`st.ExactlyRepeated = 1`).
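If you need the same query for several institutions or microphones, a small shell function can fill in the two IDs instead of editing the SQL by hand. This is a minimal sketch: the function name `build_sample_query` is hypothetical, and it only parameterizes the institution and microphone IDs used above. Because the IDs are pasted directly into the SQL text, it rejects anything that is not a plain number.

```shell
# Hypothetical helper: print the sample query for a given institution ID ($1)
# and microphone ID ($2).
build_sample_query() {
  # Both IDs end up verbatim in the SQL, so accept digits only.
  for id in "$1" "$2"; do
    case $id in
      '' | *[!0-9]*) echo "build_sample_query: IDs must be numeric" >&2; return 1 ;;
    esac
  done
  cat <<EOF
use ssc;
select s.Path, s.Prompt
from Sample s
inner join User u on s.UserId = u.UserId
inner join UserInInstitution uii on u.UserId = uii.UserId
inner join SampleType st on s.TypeId = st.SampleTypeId
inner join Microphone m on m.MicrophoneId = s.MicrophoneId
WHERE st.ExactlyRepeated = 1
  and uii.InstitutionId = $1
  and (m.MicrophoneId = $2);
EOF
}
```

You could then pipe the result into the MySQL command-line client in batch mode, for example `build_sample_query 3 1 | mysql --batch -u your_user -p > samples.txt` (`your_user` and `samples.txt` are placeholders). Batch mode writes tab-separated rows preceded by a header line of column names.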
You can then, for example, use the following script to convert the saved query output into a prompts file (the file passed as the first argument is overwritten):
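```shell
#!/bin/bash
# Convert the tab-separated query output in "$1" into a Simon prompts file (in place).
tmp=$(mktemp)
sed '1d' "$1" > "$tmp"                # drop the column-header line
sed -e 's/\\\\/\//g' \
    -e 's/.*Samples\///g' \
    -e 's/\.wav\t/ /' "$tmp" > "$1"   # "\\" -> "/", strip path up to "Samples/", ".wav<TAB>" -> " "
rm "$tmp"
```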
This prompts file can then be imported into Simon.
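To make the conversion easier to follow, here is the same sequence of sed substitutions packaged as a stdin-to-stdout filter and applied to one made-up row of query output. The function name `to_prompts`, the Windows-style path and the prompt text are all hypothetical; the path is written with doubled backslashes because that is how batch-mode output escapes a backslash. Note that `\t` in the last expression is a GNU sed extension.

```shell
# Hypothetical stdin-to-stdout variant of the prompts script above.
to_prompts() {
  sed -e '1d' \
      -e 's/\\\\/\//g' \
      -e 's/.*Samples\///' \
      -e 's/\.wav\t/ /'
}

# Header line plus one sample row: C:\\ssc\\Samples\\3\\12\\1.wav<TAB>HELLO WORLD
printf 'Path\tPrompt\nC:\\\\ssc\\\\Samples\\\\3\\\\12\\\\1.wav\tHELLO WORLD\n' | to_prompts
# prints: 3/12/1 HELLO WORLD
```

Each line of the result is the sample's path relative to the `Samples` directory (without the `.wav` extension), a space, and the prompt.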
To build a suitable dictionary for compiling the model, you may also want to list all sentences contained in the prompts file. You can do this with the following script:
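```shell
#!/bin/bash
# List every distinct sentence in the prompts file "$1": strip the leading
# file name (everything up to the first space), then sort and de-duplicate.
sed -e 's/^[^ ]* //' "$1" | sort -u
```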
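Since the dictionary ultimately has to cover individual words rather than whole sentences, a per-word list can also be handy. The following is a sketch assuming that the file name is the first space-delimited field of each line and that words are separated by spaces; the function name `prompt_words` is hypothetical.

```shell
# Hypothetical helper: list each distinct word used in the prompts file "$1".
prompt_words() {
  sed -e 's/^[^ ]* //' "$1" |   # drop the leading file name
    tr -s ' ' '\n' |            # one word per line (runs of spaces collapse)
    grep -v '^$' |              # discard empty lines
    sort -u
}
```

For a prompts file containing the lines `3/12/1 HELLO WORLD` and `3/12/2 HELLO AGAIN`, `prompt_words` prints `AGAIN`, `HELLO` and `WORLD`, one per line.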