ASAPP offers a plugin for speech recognition for the UniMRCP Server (UMS).

Speech-related clients use Media Resource Control Protocol (MRCP) to control media service resources including:

  • Text-to-Speech (TTS)
  • Automatic Speech Recognizers (ASR)

MRCP relies on other protocols to connect clients with speech processing servers and to manage the sessions between them. MRCP itself defines the messages that control the media service resources and the messages that report their status.

Once established, the MRCP protocol exchange operates over the control session, allowing your organization to control the media processing resources on the speech resource server.

This plugin connects your IVR platform to the AutoTranscribe WebSocket. It gives your organization a quick way to integrate your IVR application with GenerativeAgent.

With the ASAPP UniMRCP Plugin, GenerativeAgent receives text transcripts from your IVR. This way, your organization moves voice media off your IVR and into the ASAPP Cloud.

Before you Begin

Before you start integrating with GenerativeAgent, you need to:

  • Get your API Key Id and Secret

    For authentication, the UniMRCP server connects to AutoTranscribe using standard WebSocket authentication. The ASAPP UniMRCP Plugin does not handle authentication itself; authentication happens on your IVR’s side of the call. Your API credentials are read from the configuration document. User identification or verification must be handled by your IVR’s policies and flows.

  • Ensure your API key has been configured to access GenerativeAgent APIs and the AutoTranscribe WebSocket. Reach out to your ASAPP team if you are unsure about this.

  • Use ASAPP’s ASR

    Make sure your IVR application uses the ASAPP ASR so AutoTranscribe can receive it and send transcripts to GenerativeAgent.

  • Configure Tasks and Functions.

    Even when you use the Plugin, you still need to save customer information and messages. GenerativeAgent can save that data by sending it into its Chat Core, but your organization can also save the messages, either by calling the API or by saving the information from each event handler.

    Your IVR application is in control of when to call /analyze so the GenerativeAgent analyzes the transcripts and replies. The recommended configuration is to call /analyze every time an utterance or transcript is returned.

    Another approach is to call /analyze only when the customer has provided a complete thought or question. Some organizations may prefer to buffer transcripts and call /analyze once the customer’s thought is complete.
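The buffering approach described in the last bullet can be sketched as follows. This is a minimal illustration, not part of the plugin: the `TranscriptBuffer` class, the `max_utterances` cap, and the rule "a thought is complete when the utterance ends with sentence-final punctuation" are all assumptions. The punctuation rule only makes sense when smartFormatting is enabled, since that is what adds punctuation to transcripts.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed heuristic: sentence-final punctuation marks a complete thought.
SENTENCE_END = (".", "?", "!")


@dataclass
class TranscriptBuffer:
    """Hypothetical helper that collects utterances for one conversation."""

    max_utterances: int = 3  # assumed cap so the caller never waits forever
    _parts: List[str] = field(default_factory=list, init=False)

    def add(self, utterance: str) -> Optional[str]:
        """Store an utterance; return the joined text when it is time to call /analyze."""
        text = utterance.strip()
        if not text:
            return None  # ignore empty transcripts
        self._parts.append(text)
        if text.endswith(SENTENCE_END) or len(self._parts) >= self.max_utterances:
            message = " ".join(self._parts)
            self._parts.clear()
            return message
        return None  # keep buffering
```

Your IVR application would call `add()` for each returned transcript and call /analyze only when `add()` returns text. Calling /analyze on every utterance remains the recommended configuration; this sketch only covers the alternative.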

Implementation steps:

  1. Step 1: Listen and Handle GenerativeAgent Events
  2. Step 2: Set Up the UniMRCP ASAPP Plugin
  3. Step 3: Manage the Transcripts and send them to GenerativeAgent

Step 1: Listen and Handle GenerativeAgent Events

GenerativeAgent sends events during any conversation. All events for all conversations being evaluated by GenerativeAgent are sent through a single Server-Sent Events (SSE) stream.

You need to listen and handle these events to enable GenerativeAgent to interact with your users.
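As a rough illustration of what listening to the stream involves, the sketch below parses the standard SSE wire format (`event:`, `id:`, and `data:` lines, with a blank line ending each event). The function names are hypothetical, the stream URL and authentication are not shown, and treating each event's data as JSON is an assumption about the payload, not something this page specifies.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event from an iterable of text lines."""
    event: Dict[str, str] = {}
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                event["data"] = "\n".join(data)
                yield event
            event, data = {}, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space is not part of the value
        if name == "data":
            data.append(value)
        elif name in ("event", "id"):
            event[name] = value


def decode_events(lines: Iterable[str]) -> Iterator[Any]:
    """Decode each event's data as JSON (assumed payload encoding)."""
    for event in iter_sse_events(lines):
        yield json.loads(event["data"])
```

Because all conversations share one stream, a handler built on top of this would typically route each decoded payload to the IVR call it belongs to.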

Step 2: Set Up the UniMRCP ASAPP Plugin

On your UniMRCP server, you need to install and configure the ASAPP UniMRCP Plugin.

Install the ASAPP UniMRCP Plugin

Go to ASAPP’s UniMRCP Plugin Public Documentation to install the plugin and see its usage.

Use the Recommended Plugin Configuration

Fields & Parameters

After you install the UniMRCP ASAPP Plugin, you need to configure the request fields so that prompts are sent in the best way and GenerativeAgent gets the most information available.

Having the recommended configuration will ensure GenerativeAgent analyzes each prompt correctly.

Here are the details for the fields with the recommended configuration:

StartStream Request Fields

  • sender.role (required)

    Description: A participant role, usually the customer or an agent for human participants.
    Default: n/a
    Supported values: “agent”, “customer”

  • sender.externalId (required)

    Description: Participant ID from the external system. It should be the same for all interactions of the same individual.
    Default: n/a
    Supported values: “BL2341334”

  • language

    Description: IETF language tag.
    Default: en-US
    Supported values: “en-US”

  • smartFormatting

    Description: Request for post-processing:
      - Inverse Text Normalization (convert spoken form to written form), e.g., ‘twenty two -> 22’
      - Auto punctuation and capitalization
    Default: true
    Supported values: true, false
    Recommended: true. Interpreting transcripts will be more natural and predictable.

  • detailedToken

    Description: Has no impact on UniMRCP.
    Default: false
    Supported values: true, false
    Recommended: false. The IVR application does not use the word-level details.

  • audioRecordingAllowed

    Description:
      - false: ASAPP will not record the audio.
      - true: ASAPP may record and store the audio for this conversation.
    Default: false
    Supported values: true, false
    Recommended: true. Allowing audio recording improves transcript accuracy over time.

  • redactionOutput

    Description:
      - If detailedToken is true and the value is ‘redacted’ or ‘redacted_and_unredacted’, the request will be rejected.
      - If the client has configured no redaction rules and the value is ‘redacted’ or ‘redacted_and_unredacted’, the request will be rejected.
      - If smartFormatting is false, requests with the value ‘redacted’ or ‘redacted_and_unredacted’ will be rejected.
    Default: redacted
    Supported values: “redacted”, “unredacted”, “redacted_and_unredacted”
    Recommended: unredacted. The IVR application works better with full information available.
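The recommended values and the redactionOutput rejection rules above can be captured in a small pre-flight check. This is a sketch under assumptions: the function and constant names are hypothetical, nesting `role` and `externalId` under a `sender` object is inferred from the field list, and the plugin's actual configuration file format is not shown on this page.

```python
from typing import Any, Dict, List

# Recommended values taken from the field descriptions above.
# The sender values are placeholders; the nested layout is an assumption.
RECOMMENDED_START_STREAM: Dict[str, Any] = {
    "sender": {"role": "customer", "externalId": "BL2341334"},
    "language": "en-US",
    "smartFormatting": True,
    "detailedToken": False,
    "audioRecordingAllowed": True,
    "redactionOutput": "unredacted",
}

REDACTING = {"redacted", "redacted_and_unredacted"}


def validate_start_stream(
    fields: Dict[str, Any], redaction_rules_configured: bool = False
) -> List[str]:
    """Return the reasons a request would be rejected, following the rules above."""
    problems: List[str] = []
    sender = fields.get("sender", {})
    for key in ("role", "externalId"):
        if not sender.get(key):
            problems.append(f"sender.{key} is required")
    redaction = fields.get("redactionOutput", "redacted")  # documented default
    if redaction in REDACTING:
        if fields.get("detailedToken", False):
            problems.append("detailedToken must be false when redaction is requested")
        if not fields.get("smartFormatting", True):
            problems.append("smartFormatting must be true when redaction is requested")
        if not redaction_rules_configured:
            problems.append("no redaction rules are configured")
    return problems
```

Note that the documented default for redactionOutput is `redacted`, so a configuration that omits the field is still subject to the three rejection rules.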

Transcript Message Response Fields

All responses go to the MRCP server, so the only visible return is a VXML return of the field.

  • Field: utterance
    Subfield: text
    Description: The written text of the utterance. While an utterance can have multiple alternatives (e.g., ‘me two’ vs. ‘me too’), ASAPP provides only the most probable alternative, based on model prediction confidence.
    Format: array
    Example syntax: “Hi, my ID is 123.”

If the detailedToken in startStream request is set to true, additional fields are provided within the utterance array for each token:

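  • Field: token
    Subfield: content
    Description: Text or punctuation.
    Format: string
    Example syntax: “is”, “?”

  • Field: token
    Subfield: start
    Description: Start time (in milliseconds) of the token relative to the start of the audio input.
    Format: integer
    Example syntax: 170

  • Field: token
    Subfield: end
    Description: End time (in milliseconds) of the token’s audio boundary relative to the start of the audio input. There may be silence after it, so it does not necessarily match the start time of the next token.
    Format: integer
    Example syntax: 200

  • Field: token
    Subfield: punctuationAfter
    Description: Optional. Punctuation attached after the content.
    Format: string
    Example syntax: ‘.’

  • Field: token
    Subfield: punctuationBefore
    Description: Optional. Punctuation attached in front of the content.
    Format: string
    Example syntax: ‘“’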

Step 3: Manage Transcripts

You need to both pass the conversation transcripts to ASAPP and request that GenerativeAgent analyze the conversation.

Create a Conversation

You need to create the conversation with GenerativeAgent for each IVR call.

A conversation represents a thread of messages between an end user and one or more agents. GenerativeAgent evaluates and responds in a given conversation.

Create a conversation, providing your IDs for the conversation and the customer:

curl -X POST 'https://api.sandbox.asapp.com/conversation/v1/conversations' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{ 
  "externalId": "1",
  "customer": {   
    "externalId": "[Your id for the customer]",
    "name": "customer name" 
  },
  "timestamp": "2024-01-23T11:42:42Z"
}'

A successfully created conversation returns a status code of 200 and the conversation’s id.

{"id":"01HNE48VMKNZ0B0SG3CEFV24WM"}
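For IVR applications that make this call from code rather than from the command line, the same request could be built with Python's standard library roughly as follows. The URL, header names, and body fields come from the curl example above; the function names and the 10-second timeout are hypothetical choices for this sketch.

```python
import json
import urllib.request
from datetime import datetime, timezone

BASE_URL = "https://api.sandbox.asapp.com"  # sandbox host from the curl example


def build_create_conversation_request(
    api_id: str,
    api_secret: str,
    conversation_external_id: str,
    customer_external_id: str,
    customer_name: str,
) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a conversation."""
    body = {
        "externalId": conversation_external_id,
        "customer": {"externalId": customer_external_id, "name": customer_name},
        # UTC timestamp in the same format as the curl example.
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return urllib.request.Request(
        f"{BASE_URL}/conversation/v1/conversations",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "asapp-api-id": api_id,
            "asapp-api-secret": api_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_conversation(request: urllib.request.Request) -> str:
    """Send the request and return the new conversation's id."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)["id"]
```

Keeping request construction separate from sending makes it possible to inspect or log the payload for each IVR call before any network traffic occurs.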

Gather transcripts and analyze conversations with GenerativeAgent

After you receive the conversation transcripts from the UniMRCP Plugin, you must call /analyze and other endpoints so that GenerativeAgent evaluates the conversation and sends a reply.

You decide when to call GenerativeAgent. A common strategy is to call it immediately after a transcript is returned from the MRCP client.

Additionally, GenerativeAgent makes API calls to your organization depending on the Tasks and Functions that were configured for the agent.

Once you have the SSE stream connected and are receiving messages, you need to engage GenerativeAgent with a given conversation. All messages are sent through REST outside of the SSE channels.

To have GenerativeAgent analyze a conversation, make a POST request to /analyze:

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM"
}'

GenerativeAgent evaluates the transcript as it stands at that moment in time to determine a response. GenerativeAgent is not aware of any additional transcript messages that are sent while it is processing.

A successful response returns a 200 status code and the conversation ID.

{
  "conversationId":"01HNE48VMKNZ0B0SG3CEFV24WM"
}

GenerativeAgent’s response is communicated via the events.

Analyze with Message

You have the option to send a message when calling analyze.

curl -X POST 'https://api.sandbox.asapp.com/generativeagent/v1/analyze' \
--header 'asapp-api-id: <API KEY ID>' \
--header 'asapp-api-secret: <API TOKEN>' \
--header 'Content-Type: application/json' \
--data '{
    "conversationId": "01HNE48VMKNZ0B0SG3CEFV24WM",
    "message": {
        "text": "hello, can I see my bill?",
        "sender": {
            "externalId": "321",
            "role": "customer"
        },
        "timestamp": "2024-01-23T11:50:50Z"
    }
}'

A successful response returns a 200 status code, the ID of the conversation, and the ID of the message that was created.

{
  "conversationId":"01HNE48VMKNZ0B0SG3CEFV24WM",
  "messageId":"01HNE6ZEAC94ENQT1VF2EPZE4Y"
}
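Both forms of the /analyze call, with and without a message, can be produced by one helper. The sketch below mirrors the two curl examples; the function name, parameter names, and the rule that a sender ID is mandatory whenever text is supplied are assumptions made for illustration.

```python
import json
import urllib.request
from datetime import datetime, timezone
from typing import Any, Dict, Optional

ANALYZE_URL = "https://api.sandbox.asapp.com/generativeagent/v1/analyze"


def build_analyze_request(
    api_id: str,
    api_secret: str,
    conversation_id: str,
    text: Optional[str] = None,
    customer_external_id: Optional[str] = None,
    timestamp: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) an /analyze request, optionally carrying a message."""
    body: Dict[str, Any] = {"conversationId": conversation_id}
    if text is not None:
        if not customer_external_id:
            raise ValueError("customer_external_id is required when text is sent")
        body["message"] = {
            "text": text,
            "sender": {"externalId": customer_external_id, "role": "customer"},
            # Default to the current UTC time in the format used by the curl example.
            "timestamp": timestamp
            or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    return urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "asapp-api-id": api_id,
            "asapp-api-secret": api_secret,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it. GenerativeAgent's reply still arrives through the event stream, not in the HTTP response.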

Next Steps

With your system integrated with GenerativeAgent, sending messages, and engaging GenerativeAgent, you are ready to use GenerativeAgent.

You may find these other pages helpful in using GenerativeAgent: