UniMRCP Plugin for ASAPP
ASAPP offers a speech recognition plugin for the UniMRCP Server (UMS).
Speech-related clients use Media Resource Control Protocol (MRCP) to control media service resources including:
- Text-to-Speech (TTS)
- Automatic Speech Recognizers (ASR)
MRCP relies on other protocols to connect clients with speech processing servers and to manage the sessions between them. MRCP itself defines the messages that control the media service resources and the messages that report their status.
Once established, the MRCP protocol exchange operates over the control session, allowing your organization to control the media processing resources on the speech resource server.
This plugin connects your IVR platform to the AutoTranscribe WebSocket, giving your organization a fast way to integrate your IVR application with GenerativeAgent.
By using the ASAPP UniMRCP Plugin, GenerativeAgent receives text transcripts from your IVR. This way, your organization takes voice media off your IVR and into the ASAPP Cloud.
Before you Begin
Before you start integrating with GenerativeAgent, you need to:
- Get your API Key Id and Secret

  For authentication, the UniMRCP server connects to AutoTranscribe using standard WebSocket authentication. The ASAPP UniMRCP Plugin does not handle authentication itself; authentication happens on your IVR's side of the call, and your API credentials are supplied through the configuration document. User identification or verification must be handled according to your IVR's own policies and flows.

- Ensure your API key has been configured to access the GenerativeAgent APIs and the AutoTranscribe WebSocket. Reach out to your ASAPP team if you are unsure about this.

- Use ASAPP's ASR

  Make sure your IVR application uses the ASAPP ASR so AutoTranscribe can receive the audio and send transcripts to GenerativeAgent.

- Configure Tasks and Functions

  Even when using the plugin, you still need to save customer info and messages. GenerativeAgent can save that data by sending it into its Chat Core, but your organization can also save the messages, either by calling the API or by saving the information from each event handler.

  Your IVR application controls when to call /analyze so that GenerativeAgent analyzes the transcripts and replies. The recommended configuration is to call /analyze every time an utterance or transcript is returned.

  Another approach is to call LLMBot only when a complete thought or question has been provided. Some organizations find it works well to buffer transcripts until the customer's thought is complete and then call /analyze, as in the sketch after this list.
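As a minimal sketch of that buffering approach, assuming a hypothetical `analyze(conversation_id, text)` helper that wraps the POST /analyze request (such as the one shown later in Step 3) and an assumed silence threshold:

```python
# Minimal sketch of the buffering approach: collect transcripts per call and
# only invoke /analyze once the caller appears to have finished a thought.
# `analyze` is a hypothetical helper that wraps the POST /analyze request.
import time

END_OF_THOUGHT_SILENCE_SECS = 1.5  # assumed tuning value; adjust for your IVR


class TranscriptBuffer:
    def __init__(self, conversation_id: str, analyze):
        self.conversation_id = conversation_id
        self.analyze = analyze
        self.parts: list[str] = []
        self.last_update = 0.0

    def add_transcript(self, text: str) -> None:
        """Called by your IVR application each time the plugin returns a transcript."""
        self.parts.append(text)
        self.last_update = time.monotonic()

    def flush_if_complete(self) -> None:
        """Call periodically; sends the buffered text once the caller pauses."""
        if self.parts and time.monotonic() - self.last_update > END_OF_THOUGHT_SILENCE_SECS:
            self.analyze(self.conversation_id, " ".join(self.parts))
            self.parts.clear()
```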
Implementation steps:
1. Listen and Handle GenerativeAgent Events
2. Set up the UniMRCP ASAPP Plugin
3. Manage the Transcripts and send them to GenerativeAgent
Step 1: Listen and Handle GenerativeAgent Events
GenerativeAgent sends events during any conversation. All events for all conversations being evaluated by GenerativeAgent are sent through a single Server-Sent Events (SSE) stream.
You need to listen and handle these events to enable GenerativeAgent to interact with your users.
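As an illustration, a minimal SSE listener might look like the following sketch. The stream URL, authentication header names, and event payload shape shown here are placeholders, not the actual ASAPP API; consult your ASAPP API reference for the real values.

```python
import json
import requests  # assumption: plain HTTP streaming client; a dedicated SSE library also works

# Placeholder values: consult your ASAPP API reference for the actual
# stream URL and authentication header names for your environment.
SSE_URL = "https://api.example.asapp.com/generativeagent/v1/events"   # hypothetical
HEADERS = {
    "api-key-id": "<your-api-key-id>",          # hypothetical header name
    "api-key-secret": "<your-api-key-secret>",  # hypothetical header name
}


def handle_event(event: dict) -> None:
    # Route each event to your own IVR logic (e.g., play a reply, transfer the call).
    print("GenerativeAgent event:", event)


def listen_for_events() -> None:
    """Read the single SSE stream and dispatch each GenerativeAgent event."""
    with requests.get(SSE_URL, headers=HEADERS, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines(decode_unicode=True):
            # SSE data lines are prefixed with "data:"; ignore keep-alives and comments.
            if not line or not line.startswith("data:"):
                continue
            event = json.loads(line[len("data:"):].strip())
            handle_event(event)


if __name__ == "__main__":
    listen_for_events()
```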
Step 2: Set up the UniMRCP ASAPP Plugin
On your UniMRCP server, you need to install and configure the ASAPP UniMRCP Plugin.
Install the ASAPP UniMRCP Plugin
Go to ASAPP's UniMRCP Plugin Public Documentation for installation instructions and usage.
Use the Recommended Plugin Configuration
Fields & Parameters
After you install the ASAPP UniMRCP Plugin, you need to configure the request fields so the prompts are sent correctly and GenerativeAgent receives the most information available.
Having the recommended configuration will ensure GenerativeAgent analyzes each prompt correctly.
Here are the details for the fields with the recommended configuration:
StartStream Request Fields
Field | Description | Default | Supported Values
---|---|---|---
sender.role (required) | A participant role, usually the customer or an agent for human participants. | n/a | "agent", "customer"
sender.externalId (required) | Participant ID from the external system; it should be the same for all interactions of the same individual. | n/a | "BL2341334"
language | IETF language tag. | en-US | "en-US"
smartFormatting | Request for post-processing: Inverse Text Normalization (convert spoken form to written form, e.g., 'twenty two' -> '22'), auto punctuation, and capitalization. | true | true, false. Recommended: true; interpreting transcripts will be more natural and predictable.
detailedToken | Has no impact on UniMRCP. | false | true, false. Recommended: false; the IVR application does not utilize the word-level details.
audioRecordingAllowed | false: ASAPP will not record the audio. true: ASAPP may record and store the audio for this conversation. | false | true, false. Recommended: true; allowing audio recording improves transcript accuracy over time.
redactionOutput | If detailedToken is true along with value 'redacted' or 'redacted_and_unredacted', the request will be rejected. If no redaction rules are configured by the client, requests with 'redacted' or 'redacted_and_unredacted' will be rejected. If smartFormatting is false, requests with value 'redacted' or 'redacted_and_unredacted' will be rejected. | redacted | "redacted", "unredacted", "redacted_and_unredacted". Recommended: unredacted; the IVR application works better with full information available.
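For reference, the recommended values from the table above can be summarized as follows. This is only an illustrative summary expressed as a Python dict; the actual message framing and field nesting are handled by the ASAPP UniMRCP Plugin and the AutoTranscribe WebSocket, so treat this as a checklist rather than a literal request body.

```python
# Illustrative only: recommended StartStream field values from the table above.
recommended_start_stream = {
    "sender": {
        "role": "customer",           # "agent" or "customer"
        "externalId": "BL2341334",    # stable ID for this participant across interactions
    },
    "language": "en-US",              # IETF language tag
    "smartFormatting": True,          # recommended: more natural, predictable transcripts
    "detailedToken": False,           # recommended: IVR does not use word-level details
    "audioRecordingAllowed": True,    # recommended: improves transcript accuracy over time
    "redactionOutput": "unredacted",  # recommended: full information available to the IVR
}
```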
Transcript Message Response Fields
All responses go to the MRCP server, so the only visible return is a VXML return of the field.
Field | Subfield | Description | Format | Example Syntax
---|---|---|---|---
utterance | text | The written text of the utterance. While an utterance can have multiple alternatives (e.g., 'me two' vs. 'me too'), ASAPP provides only the most probable alternative, based on model prediction confidence. | array | "Hi, my ID is 123."
If detailedToken in the startStream request is set to true, additional fields are provided within the utterance array for each token:
Field | Subfield | Description | Format | Example Syntax
---|---|---|---|---
token | content | Text or punctuation | string | "is", "?"
 | start | Start time (milliseconds) of the token relative to the start of the audio input | integer | 170
 | end | End time (milliseconds) audio boundary of the token relative to the start of the audio input; there may be silence after that, so it does not necessarily match the startMs of the next token | integer | 200
 | punctuationAfter | Optional, punctuation attached after the content | string | "."
 | punctuationBefore | Optional, punctuation attached in front of the content | string | '"'
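For illustration, a transcript message with detailedToken enabled might be shaped roughly like the following. This is a hypothetical example constructed only from the field tables above, not a captured payload; key names such as "tokens" are assumptions.

```python
# Hypothetical transcript message built from the field tables above
# (not a captured payload); shown with detailedToken enabled.
example_transcript = {
    "utterance": [
        {
            "text": "Hi, my ID is 123.",
            "tokens": [  # assumed key name for the per-token detail
                {"content": "Hi", "start": 170, "end": 300, "punctuationAfter": ","},
                {"content": "my", "start": 340, "end": 410},
                {"content": "ID", "start": 430, "end": 520},
                {"content": "is", "start": 540, "end": 590},
                {"content": "123", "start": 620, "end": 790, "punctuationAfter": "."},
            ],
        }
    ]
}
```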
Step 3: Manage Transcripts
You need to both pass the conversation transcripts to ASAPP and request GenerativeAgent to analyze the conversation.
Create a Conversation
You need to create a conversation with GenerativeAgent for each IVR call.
A conversation represents a thread of messages between an end user and one or more agents. GenerativeAgent evaluates and responds within a given conversation.
Create a conversation, providing your IDs for the conversation and customer:
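A minimal sketch of this request is shown below. The base URL, endpoint path, header names, and body field names here are placeholders, not the documented ASAPP API; check your ASAPP API reference for the exact values.

```python
import requests  # assumption: any HTTP client works

# Placeholder URL and header names: consult your ASAPP API reference
# for the exact conversations endpoint and authentication headers.
BASE_URL = "https://api.example.asapp.com"          # hypothetical
HEADERS = {
    "api-key-id": "<your-api-key-id>",              # hypothetical header name
    "api-key-secret": "<your-api-key-secret>",      # hypothetical header name
    "Content-Type": "application/json",
}


def create_conversation(conversation_id: str, customer_id: str) -> str:
    """Create a conversation for an IVR call and return its id."""
    body = {
        "externalId": conversation_id,                # your ID for this IVR call (assumed field name)
        "customer": {"externalId": customer_id},      # your ID for this caller (assumed field name)
    }
    response = requests.post(
        f"{BASE_URL}/conversation/v1/conversations",  # hypothetical path
        headers=HEADERS,
        json=body,
    )
    response.raise_for_status()  # a successful creation returns 200
    return response.json()["id"]
```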
A successfully created conversation returns a status code of 200 and the conversation’s id.
Gather transcripts and analyze conversations with GenerativeAgent
After you receive the conversation transcripts from the UniMRCP Plugin, you must call /analyze and other endpoints so GenerativeAgent evaluates the conversation and sends a reply.
You can decide when to call GenerativeAgent; a common strategy is to call it immediately after a transcript is returned from the MRCP client.
Additionally, GenerativeAgent will make API calls to your organization depending on the Tasks and Functions configured for the agent.
Once you have the SSE stream connected and are receiving messages, you need to engage GenerativeAgent with a given conversation. All messages are sent through REST outside of the SSE channels.
To have GenerativeAgent analyze a conversation, make a POST request to /analyze:
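A minimal sketch, reusing the BASE_URL and HEADERS placeholders from the conversation example above; the /analyze path and body field names are assumptions to be confirmed against your ASAPP API reference.

```python
# Minimal sketch: ask GenerativeAgent to analyze a conversation.
def analyze_conversation(conversation_id: str) -> dict:
    body = {"conversationId": conversation_id}  # assumed field name
    response = requests.post(
        f"{BASE_URL}/generativeagent/v1/analyze",  # hypothetical path
        headers=HEADERS,
        json=body,
    )
    response.raise_for_status()  # a successful call returns 200 and the conversation id
    return response.json()
```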
GenerativeAgent evaluates the transcript at that moment in time to determine a response; it is not aware of any additional transcript messages sent while it is processing.
A successful response returns a 200 and the conversation Id.
GenerativeAgent’s response is communicated via the events.
Analyze with Message
You have the option to send a message when calling analyze.
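For example, reusing the placeholders from the earlier sketches, the request might carry the message alongside the conversation id; the body field names shown here are assumptions.

```python
# Hypothetical sketch: include a message when requesting analysis.
# Field names ("message", "text", "sender") are assumptions; confirm them
# against your ASAPP API reference.
def analyze_with_message(conversation_id: str, text: str) -> dict:
    body = {
        "conversationId": conversation_id,
        "message": {"text": text, "sender": {"role": "customer"}},  # assumed shape
    }
    response = requests.post(
        f"{BASE_URL}/generativeagent/v1/analyze",  # hypothetical path
        headers=HEADERS,
        json=body,
    )
    response.raise_for_status()
    return response.json()
```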
A successful response returns a 200 status code, the id of the conversation, and the message that was created.
Next Steps
With your system integrated with GenerativeAgent, sending messages and engaging it for analysis, you are ready to use GenerativeAgent.
You may find these other pages helpful in using GenerativeAgent: