

VoiceXML 3.0


The last working draft update from W3C on VoiceXML 3.0 was 4 December 2009.

A summary of the changes and updates has been added at the end of this article.


VoiceXML 3.0 is still in the First Public Working Draft cycle. Various markup languages and technologies can be incorporated into it to deliver the flexibility and scalability that VoiceXML 3.0 is expected to offer. This article looks at ways the new technology might be incorporated and implemented.



It is very much a work in progress, as stated by W3C:

"This document is very much a work in progress. Many sections are incomplete, only stubbed out, or missing entirely. To get early feedback, the group focused on defining enough functionality, modules, and profiles to demonstrate the general framework. To complete the specification, the group expects to introduce additional functionality (for example speaker identification and verification, external eventing) and describe the existing functionality at the level of detail given for the Prompt and Field modules. We explicitly request feedback on the framework, particularly any concerns about its implementability or suitability for expected applications. By early 2010 the group expects to have all functionality defined and all the profiles defined in detail."

However, it is helpful to start looking at VoiceXML 3.0 now, to understand the working group's intentions and so gain insight into the direction voice browsers will take.


Much of the functionality the W3C envisions for VoiceXML 3.0 is currently provided by vendor-specific custom elements. Once an equivalent element or function exists in VoiceXML 3.0, a vendor can either replace its custom element with the standard one or keep the custom element in place.

A major driving force behind VoiceXML 3.0 was the separation of the Presentation View, the Data Model and the Flow Controller.

As per W3C:

"The Data Flow Presentation (DFP) Framework is an instance of the Model-View-Controller paradigm, where computation and control flow are kept distinct from application data and from the way in which the application communicates with the outside world. This partitioning of an application allows for any one layer to be replaced independently of the other two. In addition, it is possible to simultaneously make use of more than one Data (Model) language, Flow (Controller), and/or Presentation (View) language."
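To make the Model-View-Controller split more concrete, here is a minimal Python sketch. It assumes nothing about actual VoiceXML 3.0 syntax: the class names `DataModel`, `FlowController`, `TextPresentation` and `SsmlPresentation` are hypothetical. The point is only that the flow layer reaches the data and presentation layers through narrow interfaces, so one presentation layer can be swapped for another without touching the other two.

```python
from typing import Protocol


class Presentation(Protocol):
    """View: how the application communicates with the caller."""

    def render(self, prompt: str) -> str: ...


class DataModel:
    """Model: application data, kept apart from flow and presentation."""

    def __init__(self) -> None:
        self._vars: dict[str, str] = {}

    def set(self, name: str, value: str) -> None:
        self._vars[name] = value

    def get(self, name: str) -> str:
        return self._vars[name]


class TextPresentation:
    def render(self, prompt: str) -> str:
        return prompt


class SsmlPresentation:
    def render(self, prompt: str) -> str:
        # Simplified SSML-style wrapper; a real SSML document also needs
        # the version and xmlns attributes on <speak>.
        return f"<speak>{prompt}</speak>"


class FlowController:
    """Controller: decides what to say next, knows nothing about output format."""

    def __init__(self, data: DataModel, view: Presentation) -> None:
        self.data = data
        self.view = view

    def greet(self) -> str:
        name = self.data.get("caller_name")
        return self.view.render(f"Hello {name}, how can I help?")


data = DataModel()
data.set("caller_name", "Alice")

# Same data layer and flow layer, two interchangeable presentation layers.
print(FlowController(data, TextPresentation()).greet())
print(FlowController(data, SsmlPresentation()).greet())
```

Only the presentation object changes between the two calls; the data model and the controller are reused unchanged, which is the independence the DFP quote describes.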


The basic DFP Framework


Components which may be included in a VoiceXML 3.0 implementation are:


  • State Chart XML (SCXML): State Machine Notation for Control Abstraction (SCXML 1.0)
  • Voice Browser Call Control: CCXML Version 1.0 (CCXML 1.0)
  • Speech Recognition Grammar Specification Version 1.0 XML (for Data Interface)

This schematic shows one possible implementation of VoiceXML 3.0, but a few points should be taken into consideration. The first is the added complexity of managing an application that combines all of these technologies. Managing variables across them may also prove to be a challenge. Nevertheless, it serves as an example of how the different technologies can be combined.


A schematic representation of a possible VoiceXML 3.0 Implementation

A few noteworthy points are:

  • Currently no provision is made for built-in voice biometrics in VoiceXML 3.0
  • CCXML will have to be enhanced to accommodate SCXML
  • Prompt modules will also be enhanced for SSML
  • VoiceXML 3.0 will be completely backwards compatible with VoiceXML 2.1
  • Asynchronous data interactions will be possible
  • Synchronous data reception will be able to specify a wait time
  • VoiceXML 3.0 will be a superset of VoiceXML 2.1
  • Full HTTP support
  • Optional HTTPS support
  • DFP Framework support
  • Multiple parallel states are possible, for instance back-end data collection running alongside background music and a promotional clip
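The asynchronous, synchronous-with-wait-time and parallel-state points can be pictured with a small Python `asyncio` sketch. This is not VoiceXML 3.0 syntax and not how any voice browser is known to implement it: the coroutine names (`fetch_account_balance`, `play_background_music`, `play_promo_clip`) and the delays are hypothetical stand-ins. A synchronous-style fetch is awaited with an upper bound on the wait time, and the same fetch can instead run concurrently with background music and a promotional clip.

```python
import asyncio


async def fetch_account_balance(delay: float) -> str:
    """Hypothetical back-end lookup; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return "balance=42.00"


async def play_background_music(duration: float) -> str:
    await asyncio.sleep(duration)  # stand-in for streaming audio
    return "music done"


async def play_promo_clip(duration: float) -> str:
    await asyncio.sleep(duration)  # stand-in for a promotional clip
    return "promo done"


async def synchronous_fetch(delay: float, wait_time: float) -> str:
    """Wait for the fetch, but give up after `wait_time` seconds."""
    try:
        return await asyncio.wait_for(fetch_account_balance(delay), timeout=wait_time)
    except asyncio.TimeoutError:
        return "timeout"


async def parallel_states() -> list[str]:
    """Run data collection, background music and a promo clip concurrently."""
    return await asyncio.gather(
        fetch_account_balance(0.05),
        play_background_music(0.1),
        play_promo_clip(0.08),
    )


print(asyncio.run(synchronous_fetch(delay=0.01, wait_time=0.5)))  # balance=42.00
print(asyncio.run(synchronous_fetch(delay=0.5, wait_time=0.01)))  # timeout
print(asyncio.run(parallel_states()))  # ['balance=42.00', 'music done', 'promo done']
```

`asyncio.gather` returns results in argument order, so the three "parallel states" finish independently while their results are still collected together.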

For more information, please visit:


Changes on 4 December 2009 to the working draft:


There are still significant gaps in the VXML 3.0 working draft. The updates made on 4 December include the following.


Firstly, VXML 3.0 has been updated to incorporate voice biometrics, with three types of function:

  • Speaker Verification
  • Speaker Identification
  • Speaker Enrollment
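The three functions differ mainly in what they compare against. Below is a minimal Python sketch, assuming voiceprints are plain feature vectors compared by cosine similarity; real speaker-recognition engines use far richer models, and the names `VoiceprintStore`, `enroll`, `verify`, `identify` and the 0.9 threshold are hypothetical, not part of the draft. Enrollment stores a voiceprint, verification is a 1:1 check against a claimed identity, and identification is a 1:N search over all enrolled speakers.

```python
import math
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class VoiceprintStore:
    """Hypothetical in-memory store of enrolled voiceprints."""

    def __init__(self, threshold: float = 0.9) -> None:
        self.threshold = threshold
        self.prints: dict[str, list[float]] = {}

    def enroll(self, speaker: str, features: list[float]) -> None:
        """Speaker enrollment: store a reference voiceprint for `speaker`."""
        self.prints[speaker] = features

    def verify(self, claimed: str, features: list[float]) -> bool:
        """Speaker verification (1:1): does the voice match the claimed identity?"""
        ref = self.prints.get(claimed)
        return ref is not None and cosine(ref, features) >= self.threshold

    def identify(self, features: list[float]) -> Optional[str]:
        """Speaker identification (1:N): best-matching enrolled speaker, or None."""
        best: Optional[str] = None
        best_score = self.threshold
        for speaker, ref in self.prints.items():
            score = cosine(ref, features)
            if score >= best_score:
                best, best_score = speaker, score
        return best


store = VoiceprintStore()
store.enroll("alice", [1.0, 0.0, 0.2])
store.enroll("bob", [0.0, 1.0, 0.1])

sample = [0.9, 0.1, 0.2]
print(store.verify("alice", sample))  # True
print(store.verify("bob", sample))    # False
print(store.identify(sample))         # alice
```

Verification needs the caller to assert an identity first, while identification does not; that is why identification scales with the number of enrolled speakers.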



For biometrics, the match evaluation can be divided into three groupings:

  • Good Enough
  • More Data Needed
  • Abort
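One way to picture the three groupings is as a decision over a match score and the number of attempts made so far. This Python sketch is an assumption rather than the draft's definition: the `accept` and `reject` thresholds and the `max_attempts` limit are hypothetical tuning parameters.

```python
from enum import Enum


class MatchResult(Enum):
    GOOD_ENOUGH = "good enough"
    MORE_DATA_NEEDED = "more data needed"
    ABORT = "abort"


def evaluate(score: float, attempts: int, accept: float = 0.9,
             reject: float = 0.3, max_attempts: int = 3) -> MatchResult:
    """Map a match score in [0, 1] to one of the three groupings."""
    if score >= accept:
        return MatchResult.GOOD_ENOUGH
    if score < reject or attempts >= max_attempts:
        # Clearly a mismatch, or no attempts left to collect more audio.
        return MatchResult.ABORT
    # Ambiguous score: ask the caller to speak again.
    return MatchResult.MORE_DATA_NEEDED


print(evaluate(0.95, attempts=1))  # MatchResult.GOOD_ENOUGH
print(evaluate(0.6, attempts=1))   # MatchResult.MORE_DATA_NEEDED
print(evaluate(0.6, attempts=3))   # MatchResult.ABORT
print(evaluate(0.1, attempts=1))   # MatchResult.ABORT
```

The attempt limit keeps "More Data Needed" from looping forever: an ambiguous score eventually becomes "Abort".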



Last Updated on Tuesday, 05 January 2010 07:33