Speech recognition solutions identify the words spoken in an audio signal. We often use speech recognition to replace or augment traditional interactive voice response (IVR) systems driven by dual-tone multi-frequency (DTMF) keypresses, and knowing when to use speech and when to use the keypad is vital for a good caller experience. Natural language processing (NLP) then extracts meaning from the recognised utterances, which allows systems that understand natural speech to be built.
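
To illustrate speech augmenting DTMF rather than replacing it, here is a minimal sketch of a menu that accepts either a keypress or a recognised utterance and resolves both to the same intent. It assumes a hypothetical three-option menu, and it uses simple keyword spotting as a crude stand-in for a real NLP component. The intent names, keypad digits and keywords are all illustrative.

```python
from typing import Optional

# Hypothetical menu: each intent is reachable by a DTMF key or by keywords
# spotted in the recogniser's transcript. All values are illustrative only.
MENU = {
    "billing": {"dtmf": "1", "keywords": {"bill", "billing", "invoice", "payment"}},
    "support": {"dtmf": "2", "keywords": {"help", "support", "broken", "fault"}},
    "sales": {"dtmf": "3", "keywords": {"buy", "sales", "upgrade", "order"}},
}


def route(dtmf: Optional[str] = None, transcript: Optional[str] = None) -> Optional[str]:
    """Map a keypress or a recognised utterance to an intent; None means reprompt."""
    if dtmf is not None:
        for intent, spec in MENU.items():
            if spec["dtmf"] == dtmf:
                return intent
        return None
    if transcript is not None:
        # Replace punctuation with spaces, then spot keywords (stand-in for NLP).
        cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in transcript.lower())
        words = set(cleaned.split())
        for intent, spec in MENU.items():
            if words & spec["keywords"]:  # first intent with any keyword wins
                return intent
    return None


print(route(dtmf="2"))  # → support
print(route(transcript="I've got a question about my bill."))  # → billing
```

Because both input modes converge on the same intents, a dialogue built this way can fall back to the keypad when recognition fails or the caller is in a noisy environment.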

Speech recognition (SR) is the translation of spoken words into text. It is also known as “automatic speech recognition” (ASR), “computer speech recognition”, or just “speech to text” (STT).

Some SR systems use “training” (also called “enrolment”), in which an individual speaker reads text or isolated vocabulary into the system. The system analyses the person’s specific voice and uses it to fine-tune the recognition of that person’s speech, resulting in increased accuracy. Systems that use training are called “speaker dependent”; systems that do not are called “speaker independent”.
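
Recognition accuracy, including any gain from enrolment, is commonly reported as word error rate (WER): the number of word substitutions, deletions and insertions needed to turn the recogniser's output into a reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard word-level edit-distance calculation; the transcripts in the usage line are invented for illustration and are not measured results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference,
    computed with a word-level Levenshtein (edit-distance) alignment."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Hypothetical recogniser output with one substituted word.
print(word_error_rate("call home now", "call phone now"))  # → 0.3333333333333333
```

Comparing WER for the same speaker before and after enrolment, on the same reference audio, is one way to check whether a speaker-dependent system is actually paying off.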

Speech recognition applications include voice user interfaces such as voice dialling (e.g. “Call home”), call routing (e.g. “I would like to make a collect call”), appliance control, search (e.g. finding a podcast in which particular words were spoken), simple data entry (e.g. entering a credit card number), preparation of structured documents (e.g. a radiology report), speech-to-text processing (e.g. word processors or emails), and control of aircraft systems (usually termed Direct Voice Input).
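
Simple data entry such as a credit card number usually needs a post-processing step: the recogniser returns digit words, which must be turned into a digit string and sanity-checked before the caller is asked to confirm. Below is a minimal sketch of that step. The word list (including "oh" for zero and "double"/"triple" for repeated digits) is an assumption about how callers speak, and the check uses the Luhn (mod 10) checksum that payment card numbers carry, so a misrecognised digit usually triggers a reprompt rather than a failed payment.

```python
# Assumed spoken forms for digits; a real IVR would constrain these with a grammar.
DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}
REPEATS = {"double": 2, "triple": 3}


def words_to_digits(transcript: str) -> str:
    """Convert a transcript such as 'four double five one' into '4551'."""
    digits = []
    repeat = 1
    for word in transcript.lower().split():
        if word in REPEATS:
            repeat = REPEATS[word]  # applies to the next digit word only
        elif word in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[word] * repeat)
            repeat = 1
        else:
            raise ValueError(f"unexpected word in digit string: {word!r}")
    return "".join(digits)


def luhn_valid(number: str) -> bool:
    """Luhn (mod 10) checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(number)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return len(number) > 1 and total % 10 == 0


number = words_to_digits("four triple one triple one triple one triple one triple one")
print(number, luhn_valid(number))  # → 4111111111111111 True
```

The example uses 4111111111111111, a widely published test card number, rather than real card data.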

  • Speech-recognition-enabled IVRs and call centres.
  • Hot-word detection.
  • Automatic transcription of large quantities of audio files.
  • Design of speech grammars.
  • Training of speech recognisers for under-served languages.
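
Speech grammars restrict a recogniser to the phrases a prompt expects, which typically improves accuracy for directed dialogues such as voice dialling. Production grammars are usually written in the W3C Speech Recognition Grammar Specification (SRGS); the Python dictionary format below is a hypothetical stand-in, not SRGS syntax. This minimal sketch expands a small, non-recursive grammar into every phrase it accepts and checks an utterance against that set. The rule names and vocabulary are illustrative only, and full expansion is only practical for small grammars (a recursive rule would never terminate).

```python
from itertools import product
from typing import Dict, List, Set

# Hypothetical voice-dialling grammar. Each rule lists alternatives; each
# alternative is a token sequence. Tokens in <angle brackets> name other rules.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<command>": [["<verb>", "<contact>"], ["<verb>", "<contact>", "please"]],
    "<verb>": [["call"], ["dial"], ["phone"]],
    "<contact>": [["home"], ["the", "office"], ["<person>"]],
    "<person>": [["alice"], ["bob"]],
}


def expand(symbol: str, grammar: Dict[str, List[List[str]]]) -> Set[str]:
    """Return every phrase a (non-recursive) rule can produce."""
    if symbol not in grammar:
        return {symbol}  # a terminal word
    phrases: Set[str] = set()
    for alternative in grammar[symbol]:
        parts = [expand(token, grammar) for token in alternative]
        for combo in product(*parts):
            phrases.add(" ".join(combo))
    return phrases


ACCEPTED = expand("<command>", GRAMMAR)


def in_grammar(utterance: str) -> bool:
    """True if the normalised utterance is one of the grammar's phrases."""
    return " ".join(utterance.lower().split()) in ACCEPTED


print(in_grammar("Call the office"))  # → True
print(in_grammar("email bob"))  # → False
```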