Getting Started
This page provides an overview of the features and functionalities in AutoTranscribe. After AutoTranscribe is integrated into your applications, you can use all of the configured features.

Transcription Outputs
AutoTranscribe returns transcriptions as a sequence of utterances with start and end timestamps in response to an audio stream from a single speaker. As the agent and customer speak, ASAPP’s automated speech recognition (ASR) model transcribes their audio streams and returns completed utterances based on the natural pauses from each speaker. The expected latency between when ASAPP receives audio for a completed utterance and provides a transcription of that same utterance is 200-600ms. Perceived latency will also be influenced by any network delay sending audio to ASAPP and receiving transcription messages in return.
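For orientation, here is a minimal sketch of consuming that utterance stream over a WebSocket, using the third-party Python `websockets` library. The URL and message fields (`start`, `end`, `text`) are placeholders for illustration; the actual endpoint, authentication, and message schema are defined in the AutoTranscribe API reference.

```python
import asyncio
import json

import websockets  # third-party library: pip install websockets


async def consume_transcriptions(url: str) -> None:
    # Placeholder URL; the real endpoint and auth headers come from the
    # AutoTranscribe API reference.
    async with websockets.connect(url) as ws:
        async for message in ws:
            utterance = json.loads(message)
            # Assumed illustrative fields: start/end timestamps and the
            # completed utterance text for one speaker.
            print(utterance.get("start"), utterance.get("end"), utterance.get("text"))


asyncio.run(consume_transcriptions("wss://example.invalid/autotranscribe"))
```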

Redaction
AutoTranscribe can immediately redact audio for sensitive information, returning utterances with the sensitive information masked by hash marks. ASAPP applies default redaction policies to prevent exposure of sensitive combinations of numerical digits. To configure redaction rules for your implementation, consult your ASAPP account contact. Visit the Data Redaction section to learn more.
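As a purely illustrative example (which digits are masked and how many hash marks appear depend on the redaction policies configured for your implementation), a redacted utterance might look like this:

```python
# Illustrative only: masking behavior depends on your configured redaction policies.
original = "sure, my card number is 4111 1111 1111 1111"
redacted = "sure, my card number is #### #### #### ####"
```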
Customization
Transcriptions
ASAPP customizes transcription models for each implementation of AutoTranscribe to ensure domain-specific context and terminology are well incorporated prior to launch. Consult your ASAPP account contact if the required historical call audio files are not available ahead of implementing AutoTranscribe.

| Option | Description | Requirements |
| --- | --- | --- |
| Baseline | ASAPP’s general-purpose transcription capability, trained with no audio from relevant historical calls | None |
| Customized | A custom-trained transcription model that incorporates domain-specific terminology likely to be encountered during implementation | For English custom models, a minimum of 100 hours of representative historical call audio between customers and agents; for Spanish custom models, a minimum of 200 hours |
When supplying recorded audio to ASAPP for AutoTranscribe model training prior to implementation, send uncompressed .WAV media files with speaker-separated channels. Recordings for training and real-time streams should use the same sample rate (8000 samples/sec) and audio encoding (16-bit PCM).
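As a sanity check before sending training audio, the expected format can be verified with Python's standard `wave` module. This sketch assumes speaker separation means two channels (agent and customer) and uses an illustrative file name.

```python
import wave

# Verify a recording matches the expected format: speaker-separated channels,
# 8000 samples/sec, and 16-bit PCM (2 bytes per sample).
with wave.open("call_recording.wav", "rb") as recording:
    assert recording.getnchannels() == 2, "expected speaker-separated (2-channel) audio"
    assert recording.getframerate() == 8000, "expected a sample rate of 8000 samples/sec"
    assert recording.getsampwidth() == 2, "expected 16-bit PCM samples"
```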
Vocabulary
In addition to training on historical transcripts, AutoTranscribe accepts explicitly defined custom vocabulary for terms that are specific to your implementation. AutoTranscribe also boosts detection for these terms by accepting what each term may ordinarily sound like, so that it can be recognized and output with the correct spelling. Common examples of custom vocabulary include:
- Branded products, services and offers
- Commonly used acronyms or abbreviations
- Important corporate addresses
Session-specific custom vocabulary is only available for AutoTranscribe implementations via the WebSocket API. For Media Gateway implementations, transcription models can also be trained with custom vocabulary through an alternative mechanism. Reach out to your ASAPP account team for more information.
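For illustration only (this is not the exact API schema; see the AutoTranscribe API reference for the real payload), a session-specific custom vocabulary might pair each term with the spellings it ordinarily sounds like:

```python
# Hypothetical entries: each branded or domain-specific term is paired with
# "sounds like" spellings so it is recognized and output with the correct spelling.
custom_vocabulary = [
    {"term": "StreamMax Plus", "soundsLike": ["stream max plus"]},
    {"term": "APR", "soundsLike": ["a p r"]},
    {"term": "45 Liberty Blvd", "soundsLike": ["forty five liberty boulevard"]},
]
```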
Use Cases
For Live Agent Assistance
Challenge
Organizations are exploring technologies to assist agents in real time by surfacing customer-specific offers, troubleshooting process flows, topical knowledge articles, relevant customer profile attributes and more. Agents have access to most (if not all) of this content already, but a great assistive technology makes content actionable by finding the right time to bring the right item to the forefront. To do this well, these technologies need to know both what’s been said and what is being said in the moment, with very low latency. Many of these technologies face agent adoption and click-through challenges for two reported reasons:
- Recommended content often doesn’t fit the conversation, which may mean the underlying transcription isn’t an accurate representation of the real conversation
- Recommended content doesn’t arrive soon enough for agents to use it, which may mean the latency between the audio and the output transcript is too high
Read more here about why small increases in transcription accuracy matter.
For Insights and Compliance
Challenge
For many organizations, the lack of accuracy and coverage of speech-to-text technologies prevents them from effectively employing transcripts for insights, quality management and compliance use cases. Transcripts that fall short of accurately representing conversations compromise the usability of insights and leave too much room for ambiguity for quality managers and compliance teams. Transcription technologies that aren’t accurate enough for many use cases also tend to be employed for only a minority share of total call volume, because the outputs aren’t useful enough to pay for full coverage. As a result, quality and compliance teams must rely on audio recordings, since most calls don’t get transcribed.
Using AutoTranscribe
AutoTranscribe is specifically designed to maximize domain-specific accuracy for call center conversations. It is trained on past conversations before being deployed and continues to improve early in the implementation as it encounters conversations at scale. For non-real-time use cases, AutoTranscribe also supports processing batches of call audio at an interval that suits the use case. Teams can query AutoTranscribe outputs in time-stamped utterance tables for data science and targeted compliance use cases, or load customer and agent utterances into quality management systems for managers to review in messaging-style user interfaces.
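As a sketch of the data-science path described above, assuming utterances have been exported to a file with hypothetical columns `conversation_id`, `speaker`, `start_ms`, `end_ms`, and `text` (the real export schema comes from your ASAPP integration), a targeted compliance check might look like this:

```python
import pandas as pd

# Hypothetical export of time-stamped utterances; column names are assumptions.
utterances = pd.read_csv("utterances.csv")

# Example compliance query: agent utterances containing a required disclosure.
disclosures = utterances[
    (utterances["speaker"] == "agent")
    & utterances["text"].str.contains("call may be recorded", case=False, na=False)
]
print(disclosures[["conversation_id", "start_ms", "text"]])
```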
AI Services That Enhance AutoTranscribe