LATEST VERSION 2.18 (08.03.2016)


URBI module for speech synthesis based on the Microsoft Speech SDK. The Microsoft Speech Platform consists of a Software Development Kit (SDK), a Runtime, and Runtime Languages (language packs that enable speech recognition or text-to-speech conversion for a specific language).

The Speech Platform is based on a COM interface. It includes the Microsoft.Speech.VoiceXml namespace to support authoring speech applications in the industry-standard VoiceXML markup language, and its Runtime includes a VoiceXML runtime.

This module provides viseme synchronisation. Viseme events, which are raised by the Microsoft Speech SDK during synthesis, can be used to generate mouth animations.

More about Microsoft speech synthesis:

Microsoft Speech Platform LINK

Microsoft Speech Platform SDK LINK

Microsoft Speech Platform Runtime LINK

Microsoft Speech Platform Languages LINK

More about Microsoft Speech Platform Grammars LINK

Differences between MSP and SAPI LINK

XML Tags

SAPI text-to-speech (TTS) extensible markup language (XML) tags fall into the following categories:

  • Voice state control
  • Direct item insertion
  • Voice context control
  • Voice selection
  • Custom Pronunciation


For example, the Volume tag, one of the voice state control tags, controls the volume of a voice. It has one required attribute, Level, whose value should be an integer between zero and one hundred.

<volume level="50">
This text should be spoken at volume level fifty.
<volume level="100">
This text should be spoken at volume level one hundred.
</volume>
</volume>

You can find more examples here LINK.
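As a rough illustration of the Volume tag described above, the following Python sketch builds a marked-up string that could then be passed to a speech call. The helper name `volume_markup` is hypothetical and not part of the module; rejecting out-of-range levels is a choice made for this sketch, since the behaviour of SAPI for such values is not described here.

```python
from xml.sax.saxutils import escape


def volume_markup(text: str, level: int) -> str:
    """Wrap text in a SAPI-style <volume> tag (hypothetical helper).

    The level must be an integer between 0 and 100, as required by the
    Level attribute; anything else raises ValueError.
    """
    if not isinstance(level, int) or not 0 <= level <= 100:
        raise ValueError("level must be an integer between 0 and 100")
    # Escape &, < and > so the spoken text cannot break the markup.
    return f'<volume level="{level}">{escape(text)}</volume>'


if __name__ == "__main__":
    print(volume_markup("Quiet & calm.", 30) + volume_markup("Loud!", 100))
```

Escaping the text keeps characters such as `&` from being read as markup, which matters when the spoken text comes from user input.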


SPVISEMES lists the SAPI Viseme set. This set is based on the Disney 13 Visemes. Examples given are for the SAPI English Phoneme set.

Viseme name and English examples

SP_VISEME_0, // silence
SP_VISEME_1, // ae, ax, ah
SP_VISEME_2, // aa
SP_VISEME_3, // ao
SP_VISEME_4, // ey, eh, uh
SP_VISEME_5, // er
SP_VISEME_6, // y, iy, ih, ix
SP_VISEME_7, // w, uw
SP_VISEME_8, // ow
SP_VISEME_9, // aw
SP_VISEME_10, // oy
SP_VISEME_11, // ay
SP_VISEME_12, // h
SP_VISEME_13, // r
SP_VISEME_14, // l
SP_VISEME_15, // s, z
SP_VISEME_16, // sh, ch, jh, zh
SP_VISEME_17, // th, dh
SP_VISEME_18, // f, v
SP_VISEME_19, // d, t, n
SP_VISEME_20, // k, g, ng
SP_VISEME_21 // p, b, m
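
The viseme list above can be treated as a lookup table. The Python sketch below copies the phoneme examples from that list and builds a reverse index from phoneme to viseme ID. The names `VISEME_PHONEMES` and `viseme_for` are hypothetical, and mapping unknown phonemes to viseme 0 (silence) is an assumption made for this example only.

```python
# Phoneme examples per SAPI viseme ID, copied from the list above.
VISEME_PHONEMES = {
    0: [],  # silence
    1: ["ae", "ax", "ah"],
    2: ["aa"],
    3: ["ao"],
    4: ["ey", "eh", "uh"],
    5: ["er"],
    6: ["y", "iy", "ih", "ix"],
    7: ["w", "uw"],
    8: ["ow"],
    9: ["aw"],
    10: ["oy"],
    11: ["ay"],
    12: ["h"],
    13: ["r"],
    14: ["l"],
    15: ["s", "z"],
    16: ["sh", "ch", "jh", "zh"],
    17: ["th", "dh"],
    18: ["f", "v"],
    19: ["d", "t", "n"],
    20: ["k", "g", "ng"],
    21: ["p", "b", "m"],
}

# Reverse index: phoneme -> viseme ID.
PHONEME_TO_VISEME = {
    phoneme: viseme_id
    for viseme_id, phonemes in VISEME_PHONEMES.items()
    for phoneme in phonemes
}


def viseme_for(phoneme: str) -> int:
    """Return the viseme ID for a phoneme; unknown phonemes map to 0 (assumption)."""
    return PHONEME_TO_VISEME.get(phoneme.lower(), 0)


if __name__ == "__main__":
    for p in ["sh", "m", "iy"]:
        print(p, "->", viseme_for(p))
```

A table like this is only needed when working from phonemes; when the module reports viseme IDs directly, the forward dictionary alone is enough to label them.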

Figure: visemes example

Module functions

; - initialize TTS,
USpeech.Speak("Hello world"); - start speech synthesis,
USpeech.AvailableVoices; - return the list of all voices available in the system,
USpeech.voiceNo; - select one of the available voices,
USpeech.volume; - set the speech volume, 0..100,
USpeech.rate; - set the speech rate, -10..10,
USpeech.pitch; - set the speech pitch, -10..10,
USpeech.visemeTrig; - flag set when a new viseme occurs,
USpeech.visemeId; - ID of the current viseme,
USpeech.nextVisemeId; - ID of the next viseme,
USpeech.visemeTime; - execution time of the current viseme [ms],
USpeech.isSpeaking; - speaking flag: 1 during speech synthesis, 0 when finished.

Urbiscript example

// Drive the robot's mouth from each viseme event
speech.&visemeTrig.notifyChange(closure() { robot.setMouth(speech.visemeId, speech.visemeTime); });
// or just log each viseme
speech.&visemeTrig.notifyChange(closure() { echo("viseme no. " + speech.visemeId + " exec. time " + speech.visemeTime); });
speech.Speak("Hello world");
speech.Speak("Hello. My name is robot, robot FLASH.");
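
To show how the values delivered by the viseme callback might feed a mouth animation, here is a minimal Python sketch that turns a sequence of (viseme ID, time) pairs into keyframes with start offsets. It assumes that visemeTime is the duration of the current viseme in milliseconds; the function name `viseme_keyframes` and the sample values are made up for illustration.

```python
from typing import Iterable, List, Tuple


def viseme_keyframes(events: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Convert (viseme_id, duration_ms) pairs into (start_ms, viseme_id) keyframes.

    Assumes each duration is how long the viseme is held, so the next
    viseme starts when the previous one ends.
    """
    keyframes = []
    start_ms = 0
    for viseme_id, duration_ms in events:
        keyframes.append((start_ms, viseme_id))
        start_ms += duration_ms
    return keyframes


if __name__ == "__main__":
    # Made-up sample events, not recorded from the module.
    sample = [(12, 60), (4, 90), (14, 70), (8, 120), (0, 50)]
    for start, viseme_id in viseme_keyframes(sample):
        print(f"{start:4d} ms: viseme {viseme_id}")
```

Precomputing start offsets like this suits offline animation; in a live setup the callback shown in the Urbiscript example can set the mouth shape directly as each event arrives.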


URBI module LINK

Microsoft Speech Platform Runtime v11.0 LINK



EMYS and FLASH are Open Source and distributed according to the GPL v2.0 © Rev. 1.0, 04.04.2018

FLASH Documentation