Commands

When Simon is active and recognizes something, the recognition result is given to the loaded command plug-ins (in order) for processing.



The command system can be compared with a group of factory workers. Each one of them knows how to perform one task (e.g. Karl knows how to start a program and Joe knows how to open a folder, etc.). Whenever Simon recognizes something it is given to Karl who then checks if this instruction is meant for him. If he doesn't know what to do with it, it is handed over to Joe and so on. If none of the loaded plugins know how to process the input it is ignored. The order in which the recognition result is given to the individual commands (people) is configurable in the command options (Commands > Manage plugins).
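
The hand-over described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Simon's actual implementation; the names Plugin, start_program, open_folder and dispatch are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical plugin type: a worker name plus a handler that returns True
# when it knew what to do with the recognition result.
Plugin = Tuple[str, Callable[[str], bool]]

def start_program(result: str) -> bool:
    # "Karl" only knows how to start programs.
    return result in {"Firefox", "Kopete"}

def open_folder(result: str) -> bool:
    # "Joe" only knows how to open folders.
    return result in {"Home folder"}

def dispatch(plugins: List[Plugin], result: str) -> Optional[str]:
    """Offer the result to each plugin in order and stop at the first taker."""
    for name, handler in plugins:
        if handler(result):
            return name
    return None  # nobody knew what to do, so the result is ignored

plugins = [("Karl", start_program), ("Joe", open_folder)]
print(dispatch(plugins, "Home folder"))
print(dispatch(plugins, "Good morning"))
```

Reordering the plugins list corresponds to changing the plugin order in the command options.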



Each plugin can be associated with a trigger. Using triggers, the responsibility of each plugin can easily be divided.

Using the factory workers abstraction from above it could be compared to stating the name of who you mean to process your request. So instead of Open my home folder you say Joe, open my home folder and Joe (the plugin responsible for opening folders) will instantly know that the request is meant for him.

In practice you could have commands like the executable command Firefox to open the popular browser and the place command Google to open the web search engine. If you assign the trigger Start to the executable plugin and the trigger Open to the place command you would have to say Start Firefox (instead of just Firefox if you don't use a trigger for the executable plugin) and Open Google to open the search engine (instead of just Google).

Triggers are not required, and you can easily use Simon without defining any plugin triggers (although many plugins come with a default trigger of Computer, which you would have to remove). But even if you use just one trigger for all your commands (like Computer, so that you say Computer, Firefox and Computer, Google), it has the advantage of greatly limiting the number of false positives.
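
As a rough sketch of how a plugin trigger narrows down what a plugin accepts, the following Python snippet models the Start/Open example. The data layout (PLUGINS) and the function name match_with_trigger are assumptions made for illustration and do not reflect Simon's internals.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: (plugin trigger, {command name: what the command does}).
PLUGINS: List[Tuple[str, Dict[str, str]]] = [
    ("Start", {"Firefox": "launch the Firefox executable"}),
    ("Open", {"Google": "open the web search engine"}),
]

def match_with_trigger(result: str) -> Optional[str]:
    """Return the action of the first command whose full phrase matches."""
    for trigger, commands in PLUGINS:
        if trigger:
            prefix = trigger + " "
            if not result.startswith(prefix):
                continue  # the request was not addressed to this plugin
            rest = result[len(prefix):]
        else:
            rest = result  # a plugin without a trigger sees the whole result
        if rest in commands:
            return commands[rest]
    return None

print(match_with_trigger("Start Firefox"))
print(match_with_trigger("Firefox"))
```

With the triggers set, the bare word Firefox no longer matches anything, which is the effect that reduces false positives.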

Simon's command dialog displays the complete phrase associated with a command in the upper right corner of the command configuration.

You can load multiple instances of one plugin even in one scenario. Each instance can of course also have a different plugin trigger.

Each Command has a name (which will trigger its invocation), an icon and more fields depending on the type of the plugin (see below).

Some command plugins might provide a configuration of the plugin itself (not the commands it contains). These configuration pages will be plugged directly into the action configuration dialog (below the General menu item) when you load the associated plugin.

Plugins that provide a graphical user interface (for example the input number command plugin) can be configured through their Voice commands. You can, for example, change the associated word that will trigger the button, but also change the displayed icon, etc. If you remove all voice interface commands from a graphical element, the element will be hidden automatically.

Voice interface commands are added just like normal commands through the command configuration.



To add a new interface command to a function, just select the action you want to associate with a command, click Create from Action template and adapt the resulting command to your needs.

Some plugins (for example the desktop grid or the calculator) might also provide a menu item in the Actions menu.



Scenarios can optionally define one command that will immediately be run when the scenario is initialized. If you require more than one command to run automatically, consider the use of a composite command.



Command triggers can contain placeholders in the form of "%<index>", referring to any one word, or "%%<index>" describing one or more left out words. For example the recognition result "Next window" will be matched by the triggers "Next %1", "Next %%1" and "%%1" but not by the triggers "%1", "Next window %1", "%%1 Next window".
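
The matching rules for placeholders can be illustrated by translating a trigger into a regular expression. This is a minimal sketch under the assumption that words are separated by single spaces; the helper names trigger_to_regex and matches are hypothetical and Simon may implement the matching differently.

```python
import re

def trigger_to_regex(trigger: str):
    """Translate a trigger with %N / %%N placeholders into a compiled regex."""
    parts = []
    for token in trigger.split():
        if re.fullmatch(r"%%\d+", token):
            parts.append(r"\S+(?: \S+)*")  # %%N: one or more words
        elif re.fullmatch(r"%\d+", token):
            parts.append(r"\S+")           # %N: exactly one word
        else:
            parts.append(re.escape(token)) # literal word
    return re.compile(" ".join(parts))

def matches(trigger: str, result: str) -> bool:
    # The whole recognition result has to be covered by the trigger.
    return trigger_to_regex(trigger).fullmatch(result) is not None

for trigger in ["Next %1", "Next %%1", "%%1", "%1", "Next window %1", "%%1 Next window"]:
    print(trigger, matches(trigger, "Next window"))
```

The first three triggers report a match for "Next window" and the last three do not, in line with the example above.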

Executable Commands

Executable commands are associated with an executable file (Program) which is started when the command is invoked.



Arguments to the commands are supported. If either path to the executable or the parameters contain spaces they must be wrapped in quotes.

Given the executable file C:\Program Files\Mozilla Firefox\firefox.exe and the local HTML file C:\test file.html, the correct line for the Executable would be: "C:\Program Files\Mozilla Firefox\firefox.exe" "C:\test file.html".
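
If you generate such lines with a script, the quoting rule can be applied mechanically. The following sketch only wraps parts containing spaces in double quotes; the helper names quote_if_needed and executable_line are hypothetical, and parts that themselves contain quote characters are not handled.

```python
def quote_if_needed(part: str) -> str:
    """Wrap a path or parameter in double quotes if it contains a space."""
    return f'"{part}"' if " " in part else part

def executable_line(executable: str, *arguments: str) -> str:
    """Build the text for the Executable field from a program and its arguments."""
    return " ".join(quote_if_needed(p) for p in (executable, *arguments))

line = executable_line(
    r"C:\Program Files\Mozilla Firefox\firefox.exe",
    r"C:\test file.html",
)
print(line)
```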

The working folder defines where the process should be launched from. Given the working folder C:\folder, the command "C:\Program Files\Mozilla Firefox\firefox.exe" file.html would cause Firefox to search for the file C:\folder\file.html.

The working folder usually does not need to be set and can be left blank most of the time.
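
The effect of the working folder on relative arguments can be pictured with Python's pathlib. This is merely a sketch of how a relative file name resolves against a folder; the function name resolve_argument is hypothetical and the actual lookup is done by the launched program and the operating system.

```python
from pathlib import PureWindowsPath

def resolve_argument(working_folder: str, argument: str) -> str:
    """Show which file a path argument refers to for a given working folder."""
    # Joining with an absolute path keeps the absolute path unchanged.
    return str(PureWindowsPath(working_folder) / argument)

print(resolve_argument(r"C:\folder", "file.html"))
print(resolve_argument(r"C:\folder", r"C:\test file.html"))
```

Absolute paths are unaffected, which is why the working folder can usually be left blank.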

Importing Programs

For even easier configuration Simon provides an import dialog which allows you to select programs directly from the KDE menu.

Note

This option is not available on Microsoft Windows.



The dialog will list all programs that have an entry in your KDE menu in their respective category.

Sub-Categories are not supported and are thus listed on the same level as top-level categories.

Just select the program you wish to start with Simon and press Ok. The correct values for the executable and the working folder as well as an appropriate command name and description will automatically be filled out for you.

Place Commands

With place commands you can allow Simon to open any given URL. Because Simon just hands the address over to the platform's URL handler, special protocols like remote:/ (on Linux®/KDE) or even KDE's Web-Shortcuts are supported.

Instead of folders, files can also be set as the command's URL. When the command is invoked, the file will be opened with the application that is associated with it.



To associate a specific URL with the command you can manually enter it in the URL field (select Manual first) or import it with the import place wizard.

Importing Places

The import place dialog allows you to easily create the correct URL for the command.

To add a local folder, select Local Place and choose the folder or file with the file selector.



To add a remote URL (HTTP, FTP, etc.) choose Remote URL.



Please note that for URLs with authentication information the password will be stored in clear text.

Shortcut Commands

Using shortcut commands the user can associate commands with key-combinations.

The command will simulate keyboard input to trigger shortcuts like Ctrl+C or Alt+F4.

The plugin can press, release or press and release the configured key combination.



To select the shortcut you wish to simulate just toggle the shortcut button and press the key combination on your keyboard.

Simon will capture the shortcut and associate it with the command.

Due to technical limitations there are several shortcuts on Microsoft Windows that cannot be captured by Simon (this includes e.g. Ctrl+Alt+Del and Alt+F4). These special shortcuts can be selected from a list below the aforementioned shortcut button.

Note

This selection box is not visible in the screenshot above as the list is only displayed in the Microsoft Windows version of Simon.

Text-Macro Commands

Using text-macro commands, the user can associate text with a command. When the command is invoked, the associated text will be written by simulating keystrokes.



List Commands

The list command is designed to combine multiple commands (all types of commands are supported) into one list. The user can then select the n-th entry by saying the associated number (1-9).

This is very useful to limit the amount of training required and provides the possibility to keep the vocabulary to a minimum.



List commands are especially useful when using commands with difficult triggers or commands that can be grouped under a general theme. A typical example would be a command Startmenu to present a list of programs to launch. That way the specific executable commands can still retain very descriptive names (like OpenOffice.org Writer 3.1) without the user having to include these words in his vocabulary and consider them in the grammar just to trigger them.

Commands of different types can of course be mixed.

List Command Display

When invoked, the command will display the list centered on the screen. The list will automatically expand to accommodate its items.



The user can invoke the commands contained in the list by simply saying their associated number (In this example: One to launch Mozilla Firefox).

While a list command is active (displayed), all input that is not directed at the list itself (other commands, etc.) will be rejected. The process can be canceled by pressing the Cancel button or by saying Cancel.

If there are more than 9 items Simon will add Next and Back options to the list (Zero will be associated with Back and Nine with Next).
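
The numbering scheme can be sketched as follows. The text above only states that Zero and Nine become Back and Next for long lists; that each page then holds eight entries (One to Eight) and that both navigation options appear on every page are assumptions of this sketch, as is the function name list_page.

```python
from typing import Dict, List

NUMBER_WORDS = ["Zero", "One", "Two", "Three", "Four",
                "Five", "Six", "Seven", "Eight", "Nine"]

def list_page(items: List[str], page: int = 0) -> Dict[str, str]:
    """Map the spoken numbers to the entries shown on one page of the list."""
    if len(items) <= 9:
        # Short lists fit on a single page: One..Nine select the entries.
        return {NUMBER_WORDS[i + 1]: item for i, item in enumerate(items)}
    per_page = 8  # assumption: One..Eight hold entries, Zero/Nine navigate
    chunk = items[page * per_page:(page + 1) * per_page]
    options = {NUMBER_WORDS[i + 1]: item for i, item in enumerate(chunk)}
    options["Zero"] = "Back"
    options["Nine"] = "Next"
    return options

programs = [f"Program {n}" for n in range(1, 11)]
print(list_page(programs, page=0))
print(list_page(programs, page=1))
```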



Configuring list elements

By default the list command uses the following trigger words. To use list commands to their full potential, make sure that your language and acoustic model contain and allow for the following sentences:

  • Zero

  • One

  • Two

  • Three

  • Four

  • Five

  • Six

  • Seven

  • Eight

  • Nine

  • Cancel

Of course you can also configure these words in your Simon configuration:

  • Commands > Manage plugins > General > Lists for the scenario wide list configuration.

  • Settings > Configure Simon... > Actions > Lists for the global configuration. When creating a new scenario, the scenario configuration will be initialized with a copy of this list configuration.

List commands are internally also used by other plugins like for example the desktop grid. The configuration of the triggers also affects their displayed lists.

Composite Commands

Composite commands allow the user to group multiple commands into a sequence.

When invoked the commands will be executed in order. Delays between commands can be inserted.

Composite commands can also work as "transparent wrappers" by selecting Pass recognition result through to other commands. In that case, the recognition result will be treated as "unprocessed" even if the composite command was executed.

For example, suppose you have a command to turn on the light in one scenario. In addition to turning on the light, you now want to add some kind of reporting to the activity by invoking a script through a program plugin. You could then set up a reporting scenario that contains a transparent composite command with the same trigger as the command to turn on the light and make sure that this scenario is set before the original one in the scenario list. You can then activate and deactivate the reporting simply by loading and unloading this scenario.



Using the composite command the user can compose complex macros. The screenshot above - for example - does the following:

  • Start Kopete (Executable Command)

  • Wait 2000ms for Kopete to be started

  • Type Mathias (Text-Macro Command) which will select Mathias in my contact list

  • Press Enter (Shortcut Command)

  • Wait 1000ms for the chat window to appear

  • Write Hi! (Text-Macro Command); the text associated with this command contains a newline at the end so that the message will be sent.

  • Press Alt+F4 (Shortcut Command) to close the chat window

  • Press Alt+F4 (Shortcut Command) to close the Kopete main window
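
The sequence above can be modeled as a list of steps that are either commands or delays. The following Python sketch is an illustration only: the commands are stand-ins that write to a log instead of controlling Kopete, and the names Step and run_composite are hypothetical.

```python
import time
from typing import Callable, List, Union

# A step is either a command (any callable) or a delay in milliseconds.
Step = Union[Callable[[], None], int]

def run_composite(steps: List[Step],
                  sleep: Callable[[float], None] = time.sleep) -> None:
    """Run the steps in order, pausing wherever a delay is listed."""
    for step in steps:
        if isinstance(step, int):
            sleep(step / 1000.0)
        else:
            step()

log: List[str] = []
macro: List[Step] = [
    lambda: log.append("start Kopete"),
    2000,
    lambda: log.append("type Mathias"),
    lambda: log.append("press Enter"),
    1000,
    lambda: log.append("type Hi! and newline"),
    lambda: log.append("press Alt+F4"),
    lambda: log.append("press Alt+F4"),
]
# A fake sleep keeps the demonstration instant; leave it out to really wait.
run_composite(macro, sleep=lambda seconds: log.append(f"wait {seconds:g} s"))
print(log)
```

Fixed delays like these are a guess at how long the program needs, so on a slow machine the waits may have to be increased.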

Desktop grid

The desktop grid allows the user to control his mouse with his voice.



The desktop grid divides the screen into nine parts which are numbered from 1-9. Saying one of these numbers will again divide the selected field into 9 fields again numbered from 1-9, etc. This is repeated 3 times. After the fourth time the desktop grid will be closed and Simon will click in the middle of the selected area.
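
The repeated subdivision boils down to simple arithmetic. The sketch below computes the click position for a sequence of spoken numbers; it assumes that the fields are numbered row by row starting at the top left (1 is top left, 9 is bottom right), which the text above does not state, and the function name grid_click_point is hypothetical.

```python
from typing import Iterable, Tuple

def grid_click_point(width: float, height: float,
                     selections: Iterable[int]) -> Tuple[float, float]:
    """Return the center of the area left after applying each selection."""
    left, top = 0.0, 0.0
    w, h = float(width), float(height)
    for number in selections:
        if not 1 <= number <= 9:
            raise ValueError("fields are numbered from 1 to 9")
        row, col = divmod(number - 1, 3)  # assumed row-major numbering
        w /= 3
        h /= 3
        left += col * w
        top += row * h
    return (left + w / 2, top + h / 2)

print(grid_click_point(900, 900, [5]))
print(grid_click_point(900, 900, [1, 1]))
```

Each selection shrinks the area to a ninth of its previous size, so a handful of numbers is enough to address a small target on a large screen.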

The exact click action is configurable but defaults to asking the user. Therefore you will be presented with a list of possible click modes. When selecting Drag and Drop, the desktop grid will be displayed again to select the drop point.



While the desktop grid is active (displayed), all input that is not directed at the desktop grid itself (other commands, etc.) will be rejected. Say Cancel at any time to abort the process.

The desktop grid plugin registers a configuration screen right in the command configuration when it is loaded.



The trigger that invokes the desktop grid is of course completely configurable. Moreover the user can use real or fake transparency. If your graphical environment allows for compositing effects (desktop effects) then you can safely use real transparency which will make the desktop grid transparent. If your platform does not support compositing, Simon will simulate transparency by taking a screenshot of the screen before displaying the desktop grid and displaying that picture behind the desktop grid.

If the desktop grid is configured to use real transparency and the system does not support compositing it will display a solid gray background.

However, nearly all up-to-date systems will support compositing (real transparency).

This includes:

  • Microsoft Windows 2000 or higher (XP, Vista, 7)

  • GNU/Linux using a composite manager like Compiz, KWin4, xcompmgr, etc.

By default the desktop grid uses numbers to select the individual fields. To use the desktop grid, make sure that your language and acoustic model contain and allow for the following sentences:

  • One

  • Two

  • Three

  • Four

  • Five

  • Six

  • Seven

  • Eight

  • Nine

  • Cancel

To configure these triggers, just configure the commands associated with the plugin.



Input Number

Using the input-number plugin the user can input large numbers easily.

Using the Dictation or the Text-Macro plugin one could associate the numbers with their digits and use that as input method. However, to input larger numbers there are two ways that both have significant disadvantages:

  • Adding the words eleven, twelve, etc.

    While this seems like the most elegant solution as it would enable the user to say fivehundredseventytwo we can easily see that it would be quite a problem to add all these words - let alone train them. What about twothousandninehundredtwo? Where to stop?

  • Spell out the number using the individual digits

    While this is not as elegant as stating the complete number it is much more practical.

    However, many applications (like the great mouseless browsing Firefox add-on) rely on the user to input large numbers without too much time passing between the individual keystrokes (mouseless browsing, for example, will wait exactly 500ms by default before it considers the input of the number complete). So if you want to enter 52 you would say Five (pause) Two. Because of the needed pause, the application (like the mouseless browsing add-on) would consider the input complete after Five.

The input number plugin - when triggered - presents a calculator-like interface for inputting a number. The input can be corrected by saying Back. It features a decimal point accessible by saying Comma. When saying Ok the number will be typed out. As all the voice input and the correction are handled by the plugin itself, the application that finally receives the input will only see a couple of milliseconds between the individual digits.
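
The behavior described above can be sketched as a small loop that collects the number before anything is typed. This is an illustration under assumptions: the function name input_number is hypothetical, and the decimal separator that Simon actually types for Comma is not stated in the text, so a dot is used here.

```python
from typing import Iterable, Optional

DIGITS = {"Zero": "0", "One": "1", "Two": "2", "Three": "3", "Four": "4",
          "Five": "5", "Six": "6", "Seven": "7", "Eight": "8", "Nine": "9"}

def input_number(words: Iterable[str]) -> Optional[str]:
    """Collect spoken digits and return the number to type, or None on Cancel."""
    buffer = ""
    for word in words:
        if word in DIGITS:
            buffer += DIGITS[word]
        elif word == "Comma":
            if "." not in buffer:  # assumption: "." as separator, at most once
                buffer += "."
        elif word == "Back":
            buffer = buffer[:-1]   # correct the last entry
        elif word == "Ok":
            return buffer          # only now would the number be typed out
        elif word == "Cancel":
            return None
        # any other input is rejected while the plugin is active
    return None

print(input_number(["Five", "Three", "Back", "Two", "Ok"]))
```

Because the whole string is available before anything is typed, the pauses between spoken digits never reach the target application.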



While the input number plugin is active (the user currently inputs a number), all input that is not directed at the input number plugin (other commands, etc.) will be rejected. Say Cancel at any time to abort the process.

As no command instances can be created for this plugin, it is not listed in the New Command dialog. However, the input number plugin registers a configuration screen right in the command configuration when it is loaded.



The trigger defines the word or phrase that will display the interface.

By default the input number plugin uses numbers to select the individual digits and a couple of control words. To use the input number plugin, make sure that your language and acoustic model contain and allow for the following sentences:

  • Zero

  • One

  • Two

  • Three

  • Four

  • Five

  • Six

  • Seven

  • Eight

  • Nine

  • Back

  • Comma

  • Ok

  • Cancel

To configure these triggers, just configure the commands associated with the plugin.



Dictation

The dictation plugin writes the recognition result it gets using simulated keystrokes.

Assuming you didn't define a trigger for the dictation plugin it will accept all recognition results and just write them out. The written input will be considered as processed input and thus not be relayed to other plugins. This means that if you loaded the dictation plugin and defined no trigger for it, all plugins below it in the Selected Plug-Ins list in the command configuration will never receive any input.

As no command instances can be created for this plugin, it is not listed in the New Command dialog.

The dictation plugin can be configured to append text after each recognition result, for example to add a space after each recognized word.



Artificial Intelligence

The Artificial Intelligence is a just-for-fun plugin that emulates a human conversation.

Using the text to speech system, the computer can talk with the user.

The plugin uses AIMLs for the actual intelligence. Most AIML sets should be supported. The popular A. L. I. C. E. bot and a German version work and are shipped with the plugin.



The plugin registers a configuration screen in the command configuration menu where you can choose which AIML set to load.

Simon will look for AIML sets in the following folder:

  • GNU/Linux: `kde4-config --prefix`/share/apps/ai/aimls/

  • Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\aimls\

To add a new set just create a new folder with a descriptive name and copy the .aiml files into it.

To adjust your bot's personality have a look at the bot.xml and vars.xml files in the following folder:

  • GNU/Linux: `kde4-config --prefix`/share/apps/ai/util/

  • Microsoft Windows: [installation folder (C:\Program Files\simon 0.2\ by default)]\share\apps\ai\util\

As no command instances can be created for this plugin, it is not listed in the New Command dialog.

It is recommended to not use any trigger for this plugin to provide a more natural feel for the conversation.

Calculator

The calculator plugin is a simple, voice controlled calculator.



The calculator extends the Input Number plugin by providing additional features.

When loading the plugin, a configuration screen is added to the plugin configuration.



There you can also configure the control mode of the calculator. Setting the mode to something other than Full calculator will hide options from the displayed widget.



However, the hidden controls will, in contrast to simply removing all associated commands from the functions, still react to the configured voice commands.

When selecting Ok, the calculator will by default ask you what to do with the generated result. You can for example output the calculation, the result, both, etc. Besides always selecting this from the displayed list after selecting the Ok button, this can also be set in the configuration options.



Filter

Using the filter plugin, you can intercept recognition results before they are passed on to further command plugins. Using this plugin you can for example disable the recognition by voice.

The filter command plugin registers a configuration screen in the command configuration where you can change what results should be filtered.



The pattern is a regular expression that will be evaluated each time the plugin receives a recognition result for processing.

The plugin also registers voice interface commands for activating and deactivating the filter.

In total, the filter therefore has three states:

  • Inactive

    The default state. All recognition results will be passed through.

  • Half-active (if Two stage activation is selected)

    • If the next command is the "Deactivate filter" command, the filter will enter the "Inactive" state.

    • If, however, the next result is something else and Relay results in stage one of two stage activation is selected, this result will be passed on to other plugins. The filter will reset to "Active" afterwards.

  • Active

    When activated, the filter will eat all results that match the configured pattern. By default this means every result that Simon recognizes will be accepted by the filter and therefore not relayed to any of the plugins following the filter plugin.

    If Two stage activation is enabled and the filter plugin receives the command to directly enter the "Inactive" state, this command is ignored. In other words: If two stage activation is enabled, the filter can only be disabled by going through the intermediate stage.
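
The three states can be summarized as a small state machine. The sketch below is one possible reading of the description, not the plugin's actual code: the command names "Activate filter" and "Filter stage one" are hypothetical (only "Deactivate filter" appears in the text), as are the class and parameter names, and the default pattern ".*" is assumed to stand for "match everything".

```python
import re

ACTIVATE = "Activate filter"      # hypothetical command name
STAGE_ONE = "Filter stage one"    # hypothetical command name
DEACTIVATE = "Deactivate filter"  # named in the description above

class Filter:
    def __init__(self, pattern: str = ".*", two_stage: bool = False,
                 relay_in_stage_one: bool = False) -> None:
        self.pattern = re.compile(pattern)
        self.two_stage = two_stage
        self.relay_in_stage_one = relay_in_stage_one
        self.state = "inactive"

    def _matches(self, result: str) -> bool:
        return self.pattern.fullmatch(result) is not None

    def process(self, result: str) -> bool:
        """Return True if the result is swallowed, False if it is passed on."""
        if self.state == "inactive":
            if result == ACTIVATE:
                self.state = "active"
                return True
            return False
        if self.state == "half-active":
            if result == DEACTIVATE:
                self.state = "inactive"
                return True
            self.state = "active"  # anything else resets the filter
            return False if self.relay_in_stage_one else self._matches(result)
        # Active state from here on.
        if self.two_stage and result == STAGE_ONE:
            self.state = "half-active"
            return True
        if not self.two_stage and result == DEACTIVATE:
            self.state = "inactive"
            return True
        # With two stage activation a direct "Deactivate filter" ends up here
        # and is treated like any other result, so the filter stays active.
        return self._matches(result)

f = Filter(two_stage=True, relay_in_stage_one=True)
for spoken in [ACTIVATE, "Firefox", DEACTIVATE, STAGE_ONE, DEACTIVATE, "Firefox"]:
    swallowed = f.process(spoken)
    print(spoken, swallowed, f.state)
```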

Pronunciation Training

The pronunciation training, when combined with a good static base model, can be a powerful tool to improve your pronunciation of a new language.



Essentially, the plugin will prompt you to say specific words. The recognition will then recognize your pronunciation of the word and compare it to your speech model which should be a base model of native speakers for this to work correctly. Then Simon will display the recognition rate (how similar your version was to the stored base model).

The closer to the native speaker, the higher the score.

The plugin adds an entry to your Commands menu to launch the pronunciation training dialog.

The training itself consists of multiple pages. Each page contains one word fetched from your active vocabulary. They are identified by a category which needs to be selected in the command configuration before starting the training.



Keyboard

The keyboard plugin displays a virtual, voice controlled keyboard.



The keyboard consists of multiple tabs, each possibly containing many keys. The entirety of tabs and keys are collected in sets.

You can select sets in the configuration but also create new ones from scratch in the keyboard command configuration.



Keys are usually mapped to single characters but can also hold long texts and even shortcuts. Because of this, keyboard sets can contain special keys like a select all key or a Password key (typing your password).

Next to the tabs that hold the keys of your set, the keyboard may also show special keys like Ctrl, Shift, etc. Those keys are provided as voice interface commands and are displayed regardless of what tab of the set is currently active.

As with all voice triggers, removing the associated command hides the buttons as well.

Moreover, the keyboard provides a numpad that can be shown by selecting the appropriate option in the keyboard configuration.



Next to the number keys and the delete key for the number input field (Number backspace), the numpad provides two options on what to do with the entered number.

When selecting Write number, the entered number will be written out using simulated key presses. Selecting Select number tries to find a key or tab in the currently active set that has this number as a trigger. This way you can control a complete keyboard just using numbers.



The keys on the num pad are configurable voice interface commands.

Dialog

The dialog plugin enables users to engage in a scripted dialog with Simon.

Dialog design

Simon treats dialogs as a succession of different states. Each state can have a text and several associated options.



Dialogs can have more than one text variant - one of which will be randomly picked when the dialog is displayed. This can help to make dialogs feel more natural by providing several alternative formulations.

The texts can use bound values and template options.



Dialog options encapsulate the logic of the conversation. They are the active components of the dialog.



Similar to commands, dialog options have a name (trigger) that, when recognized while the dialog is active and in the option's parent state, will cause this option to activate. Alternatively, options can also be configured to trigger automatically after a set time period. This time is relative to when the state is entered.

Dialog options, when shown through the graphical output module can show an arbitrary text (that will most likely be equivalent to the trigger but doesn't have to be) and, optionally, an icon. If the text-to-speech output module is used, the text (not the trigger) will be read aloud unless this is disabled by selecting the Silent option.

Every state can also optionally have an avatar that will be displayed when using the graphical output module.



Dialog: Bound values

The text of dialog states can contain variables - so called "bound values" - that will be filled in during runtime.

For example, the dialog text "This is a $variable$" would replace "$variable$" with the result of a bound value called "variable".



There are four types of bound values:

  • Static



    Static bound values will always be resolved to the same text. They are useful to provide configuration options to be filled in to personalize the dialog (e.g., the name of the user).

  • QtScript



    QtScript bound values resolve to the result of the entered QtScript code.

  • Command arguments



    If the dialog trigger command (the Simon command that initiates the dialog) uses placeholders, they can be accessed through command argument bound values. The Argument number refers to the index of the placeholder you want to access.

    For example, if your dialog is started with the command "Call %1", and "name" is a command argument bound value, then launching the dialog by recognizing "Call Peter", will turn the dialog text "Are you sure you want to call $name$?" into "Are you sure you want to call Peter?".

  • Plasma data engine



    This type of bound value can readily access a wide array of high-level information through plasma data engines.
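
The substitution of bound values can be pictured as a simple search and replace on the dialog text. The following sketch is an assumption-laden illustration: the function name fill_bound_values is hypothetical, static values are modeled as strings, computed values (such as script results) as Python callables, and unknown names are left untouched, which may differ from what Simon does.

```python
import re
from typing import Callable, Dict, Union

BoundValue = Union[str, Callable[[], str]]

def fill_bound_values(text: str, values: Dict[str, BoundValue]) -> str:
    """Replace every $name$ in the text with the resolved bound value."""
    def resolve(match: "re.Match[str]") -> str:
        value = values.get(match.group(1))
        if value is None:
            return match.group(0)  # unknown names stay as they are
        return value() if callable(value) else value
    return re.sub(r"\$(\w+)\$", resolve, text)

values: Dict[str, BoundValue] = {
    "name": "Peter",            # e.g. taken from the placeholder in "Call %1"
    "greeting": lambda: "Hi",   # stand-in for a value computed at runtime
}
print(fill_bound_values("Are you sure you want to call $name$?", values))
```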

Template options

Dialog texts can further be parametrized through template options.



These boolean values choose between different or optional text snippets.

For example, the template option "formal" above would change the dialog text "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}" to "Would you please be quiet" or "Would you please shut up" depending on whether the template option is set to true or false. The else-path can be omitted if it is not required (e.g. "Would you {{formal}}please {{endformal}}be quiet").
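
The template syntax can be emulated with a regular expression, which may help when checking dialog texts outside of Simon. This is a minimal sketch: the function name apply_template_options is hypothetical, nested blocks of the same option are not handled, and options that are not set are treated as false by assumption.

```python
import re
from typing import Dict

# {{name}}text{{elsename}}alternative{{endname}}, with the else part optional.
TEMPLATE = re.compile(
    r"\{\{(\w+)\}\}(.*?)(?:\{\{else\1\}\}(.*?))?\{\{end\1\}\}",
    re.DOTALL,
)

def apply_template_options(text: str, options: Dict[str, bool]) -> str:
    """Keep the first or the else snippet of each block, depending on the option."""
    def choose(match: "re.Match[str]") -> str:
        name, if_text = match.group(1), match.group(2)
        else_text = match.group(3) or ""
        return if_text if options.get(name, False) else else_text
    return TEMPLATE.sub(choose, text)

text = "Would you please {{formal}}be quiet{{elseformal}}shut up{{endformal}}"
print(apply_template_options(text, {"formal": True}))
print(apply_template_options(text, {"formal": False}))
```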

Avatars

Every state can potentially show a different avatar.

These images can range from the picture of a (simulated) speaker to an image of something topically appropriate.



To use an avatar, first add it here and later define where to use it in the dialog design section.

Output

Dialogs can be displayed graphically, use text-to-speech or combine both approaches.



The Separator to options will be spoken between the dialog text and the current state's options (if there are any). If there are no options to this state or all are configured to be silent, this will not be said. The option to listen to the whole announcement again is triggered when saying one of the configured Repeat on trigger. Additionally, the text-to-speech output can optionally be configured to repeat the listing of the available options (including the configured separator) when the user says a command that does not match any of the available dialog options.

Akonadi

The Akonadi plugin allows Simon to plug into KDE's PIM infrastructure.



The plugin fulfills two major purposes:

  • Execute Simon commands at scheduled times

    The Akonadi plugin can monitor a specific collection (calendar) and react to entries whose summary starts with a specific prefix. By default, this prefix is "[simon-command]", meaning that events of the form "[simon-command] <plugin name>//<command name>" will trigger the appropriate Simon command at the "start time" of the event.

    The names of the plugins and commands are equivalent to the ones shown in the command dialog and do not necessarily need to reference commands in the same scenario as the Akonadi plugin instance.

  • Show reminders for events in the given calendar

    If configured to do so, the Akonadi plugin can show reminders for calendar events with a set alarm flag. These reminders will be shown through the Simon dialog engine.
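
The event summary format for scheduled commands can be parsed as shown below. This is a sketch for illustration: the function name parse_event_summary is hypothetical, the plugin and command names in the example are made up, and how Simon treats extra whitespace is an assumption.

```python
from typing import Optional, Tuple

DEFAULT_PREFIX = "[simon-command]"

def parse_event_summary(summary: str,
                        prefix: str = DEFAULT_PREFIX) -> Optional[Tuple[str, str]]:
    """Return (plugin name, command name) or None if the event is not a command."""
    if not summary.startswith(prefix):
        return None
    rest = summary[len(prefix):].strip()
    plugin, separator, command = rest.partition("//")
    plugin, command = plugin.strip(), command.strip()
    if not separator or not plugin or not command:
        return None
    return plugin, command

print(parse_event_summary("[simon-command] Executable//Firefox"))
print(parse_event_summary("Team meeting"))
```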

D-Bus

With the D-Bus command, Simon can call exported methods in 3rd party applications directly.

The screenshot below, for example, calls the "Pause" method of the MPRIS interface of the Tomahawk music playing software.
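
For comparison, a similar call can be issued from a script with the dbus-send command line tool. The snippet below only assembles the argument list and does not run it. The service name, object path and interface are assumptions based on the MPRIS 2 specification and on Tomahawk's usual bus name; the values in Simon's command configuration may differ, and dbus_send_args is a hypothetical helper.

```python
from typing import List

def dbus_send_args(service: str, path: str, interface: str, method: str) -> List[str]:
    """Build the argument list for a dbus-send method call on the session bus."""
    return [
        "dbus-send",
        "--session",
        "--type=method_call",
        f"--dest={service}",
        path,
        f"{interface}.{method}",
    ]

args = dbus_send_args(
    "org.mpris.MediaPlayer2.tomahawk",  # assumed service name
    "/org/mpris/MediaPlayer2",          # object path defined by MPRIS 2
    "org.mpris.MediaPlayer2.Player",    # player interface defined by MPRIS 2
    "Pause",
)
print(" ".join(args))
```

On a system with a running player, the list could be passed to subprocess.run(args, check=True) to perform the call.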



JSON

Similar to the D-Bus command plugin, the JSON plugin also allows Simon to contact 3rd party applications to directly invoke functionality (instead of simulating user activity).