Extracting collected samples

To build models from the samples collected with SSCd, you first have to extract them from the database.

Because SSCd is designed for large-scale sample acquisition, this step is not particularly user-friendly. The instructions below are aimed mainly at technically experienced users.

Most of the scripts below require the GNU tools, in particular GNU sed; these are usually installed by default on GNU/Linux.

You can use the following query to select the samples; adjust its conditions depending on exactly which samples you need:

      USE ssc;

      SELECT s.Path, s.Prompt
        FROM Sample s
        INNER JOIN User u ON s.UserId = u.UserId
        INNER JOIN UserInInstitution uii ON u.UserId = uii.UserId
        INNER JOIN SampleType st ON s.TypeId = st.SampleTypeId
        INNER JOIN Microphone m ON m.MicrophoneId = s.MicrophoneId
       WHERE st.ExactlyRepeated = 1
         AND uii.InstitutionId = 3
         AND m.MicrophoneId = 1;

This query lists the path and prompt of every sample from institution 3 that was recorded with microphone 1 and whose sample type has the ExactlyRepeated flag set.
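Because the institution and microphone IDs usually have to be changed, it can help to generate the query from a small shell function instead of editing it by hand. The sketch below is only an illustration: `sample_query` is a hypothetical helper that is not part of SSCd, and it assumes the table and column names used in the query above.

```shell
#!/bin/sh
# Hypothetical helper (not part of SSCd): prints the sample query for a given
# institution ID ($1) and microphone ID ($2).
sample_query() {
  # Accept only numeric IDs, because they are pasted into the SQL text.
  for id in "$1" "$2"; do
    case $id in
      ''|*[!0-9]*)
        echo "usage: sample_query INSTITUTION_ID MICROPHONE_ID" >&2
        return 1 ;;
    esac
  done
  cat <<EOF
USE ssc;
SELECT s.Path, s.Prompt
  FROM Sample s
  INNER JOIN User u ON s.UserId = u.UserId
  INNER JOIN UserInInstitution uii ON u.UserId = uii.UserId
  INNER JOIN SampleType st ON s.TypeId = st.SampleTypeId
  INNER JOIN Microphone m ON m.MicrophoneId = s.MicrophoneId
 WHERE st.ExactlyRepeated = 1
   AND uii.InstitutionId = $1
   AND m.MicrophoneId = $2;
EOF
}
```

For example, `sample_query 3 1 | mysql -u USER -p > samples.txt` (with USER replaced by your database account) should run the same query as above. Since the mysql client then runs non-interactively, it writes the result as tab-separated rows preceded by a header row.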

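Save the query result to a text file, for example by running the query with the mysql command-line client in batch mode, which writes one tab-separated row per sample after a header row. The following script turns such a file into a prompts file; it takes the file name as its only argument and overwrites the file in place: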

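      tmp=$(mktemp)
      # Drop the header row, turn the escaped Windows separators ("\\") into "/",
      # strip the path up to "Samples/" and replace ".wav<TAB>" with a space
      sed '1d' "$1" > "$tmp"
      sed -e 's/\\\\/\//g' -e 's/.*Samples\///g' -e 's/\.wav\t/ /' "$tmp" > "$1"
      rm "$tmp"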

The resulting prompts file can then be imported into Simon.
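To see what the conversion does, the sketch below applies the same sed expressions as the prompts script to a few made-up rows in the format the mysql client produces in batch mode. `to_prompts` is a hypothetical name, the function reads standard input instead of rewriting a file, and the Windows paths are invented for illustration; real paths depend on where your SSCd installation stores its samples.

```shell
#!/bin/sh
# Hypothetical pipe-friendly variant of the prompts script: reads mysql batch
# output on stdin and writes "<path> <sentence>" lines to stdout.
# Requires GNU sed, which understands "\t" in a regular expression.
to_prompts() {
  sed -e '1d' -e 's/\\\\/\//g' -e 's/.*Samples\///g' -e 's/\.wav\t/ /'
}

# Made-up query output: a header row, then one tab-separated row per sample.
# In batch mode mysql prints every backslash of a stored path as "\\".
printf '%s\t%s\n' \
  'Path' 'Prompt' \
  'C:\\SSC\\Samples\\12\\345.wav' 'HELLO WORLD' \
  'C:\\SSC\\Samples\\12\\346.wav' 'GOOD MORNING' | to_prompts
# prints:
# 12/345 HELLO WORLD
# 12/346 GOOD MORNING
```

Because `.*Samples/` is greedy, everything up to the last occurrence of `Samples/` on a line is removed.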

To build the appropriate dictionary for compiling the model, you may also want a list of all distinct sentences contained in the prompts file. The following script prints them:

      # Drop the sample path (everything up to the first space), print each sentence once
      sed -e 's/^[^ ]* //' "$1" | sort -u
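The sketch below shows the sentence list on a few made-up prompts lines and, as an assumption about what you might need when looking up pronunciations for the dictionary, also derives a list of distinct words. Both `sentences` and `words` are hypothetical helper names; they assume that the sample path is everything before the first space on each line.

```shell
#!/bin/sh
# Hypothetical helper: drop the sample path and print each sentence once.
sentences() {
  sed -e 's/^[^ ]* //' | sort -u
}

# Hypothetical helper: split the sentences into one word per line, each word once.
words() {
  sentences | tr -s ' ' '\n' | sort -u
}

# Made-up prompts file content
printf '%s\n' '12/345 HELLO WORLD' '12/346 GOOD MORNING' '13/001 HELLO WORLD' | sentences
# prints:
# GOOD MORNING
# HELLO WORLD

printf '%s\n' '12/345 HELLO WORLD' '12/346 GOOD MORNING' '13/001 HELLO WORLD' | words
# prints:
# GOOD
# HELLO
# MORNING
# WORLD
```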