How it works

- Create SSE Stream: The Event Handler (which may exist on the IVR or be a dedicated service) creates a Server-Sent Events (SSE) stream with GenerativeAgent.
- Audio Stream: The IVR sends the audio stream from the end user to AutoTranscribe.
- Create Conversation: The IVR creates a conversation and adds messages to the Conversation Data.
- Request Analysis: The IVR requests GenerativeAgent to analyze the conversation.
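The first item in the flow above relies on a Server-Sent Events stream. As a rough illustration of what consuming such a stream involves, the sketch below parses SSE text lines into events using only the generic SSE wire format. The event name and JSON payload in the usage note are hypothetical; the actual GenerativeAgent event schema is not described in this section.

```python
from typing import Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per SSE event from an iterable of decoded text lines."""
    event_type, data_lines = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            # Comment line, commonly used as a keep-alive; ignore it.
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)
```

For example, feeding it the lines `event: reply`, `data: {"text": "hi"}`, and a blank line yields a single event with `event` set to `reply`. In practice the lines would come from the open HTTP response of the SSE endpoint.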
Benefits of using WebSocket to stream events
- Persistent connection between your voice system and the GenerativeAgent server
- API streaming for audio, call signaling, and returned transcripts
- Real-time data exchange for quick responses and efficient handling of user queries
- Bi-directional communication for smooth and responsive interaction
Before you Begin
Before you start integrating with GenerativeAgent, you need to:
- Get your API Key Id and Secret.
- Ensure your API key has been configured to access AutoTranscribe and GenerativeAgent APIs. Reach out to your ASAPP team if you are unsure.
- Configure Tasks and Functions.
Implementation Steps
- Create AutoTranscribe Streaming URL
- Listen and Handle GenerativeAgent Events
- Open a Connection
- Start an Audio Stream
- Send the Audio Stream
- Analyze the conversation with GenerativeAgent
- Stop the Audio Stream
Step 1: Create AutoTranscribe Streaming URL
First, create a streaming URL that will be used for the WebSocket connection to AutoTranscribe.
Step 2: Listen and Handle GenerativeAgent Events
GenerativeAgent sends events for all conversations through a single Server-Sent Events (SSE) stream. Listen for and handle these events to enable GenerativeAgent to interact with your users.
Step 3: Open a Connection
Create the WebSocket connection using the access URL:
wss://<internal-voice-gateway-ingress>?token=<short_lived_access_token>
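As a minimal sketch of assembling that access URL, the helper below fills in the two placeholders and percent-encodes the token so that characters such as `+` or `/` survive the query string. The function name, the example host, and the token value are hypothetical; use the values returned when you create the streaming URL in Step 1.

```python
from urllib.parse import urlencode


def build_stream_url(gateway_host: str, access_token: str) -> str:
    """Return the wss:// access URL for a gateway host and short-lived token.

    Both arguments stand in for the <internal-voice-gateway-ingress> and
    <short_lived_access_token> placeholders; they are assumptions, not
    values defined by this guide.
    """
    query = urlencode({"token": access_token})  # percent-encodes unsafe characters
    return f"wss://{gateway_host}?{query}"
```

The resulting string can be passed to whichever WebSocket client your voice system already uses.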
Step 4: Start an Audio Stream
Start streaming audio into the AutoTranscribe WebSocket using this message sequence:
Your Stream Request | ASAPP Response |
---|---|
startStream message | startResponse message |
Stream audio - audio-in | transcript message |
finishStream message | finalResponse message |
Format WebSocket protocol request messages as text (UTF-8 encoded string data); only the audio stream should be in binary format. All response messages will be formatted as text.
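The text-versus-binary rule can be sketched as two small helpers: one serializes a control message to a UTF-8 text payload, the other reports which frame type a payload should use. Treat the JSON shape as an assumption: this section does not show the message schema, so the `message` key used here to carry the message type is hypothetical.

```python
import json
from typing import Union


def encode_control(message_type: str, **fields) -> str:
    """Serialize a control message (e.g. startStream) as a text payload.

    The "message" key is an assumed discriminator field, not a confirmed schema.
    """
    return json.dumps({"message": message_type, **fields})


def frame_kind(payload: Union[str, bytes, bytearray]) -> str:
    """Return "binary" for raw audio bytes and "text" for everything else."""
    return "binary" if isinstance(payload, (bytes, bytearray)) else "text"
```

With these helpers, `frame_kind(encode_control("startStream"))` is `"text"` and `frame_kind(b"\x00\x01")` is `"binary"`, matching the rule that only the audio stream is sent as binary.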
Send a startStream message to open the stream. ASAPP replies with a startResponse message.
Step 5: Send the audio stream
Stream audio as binary data:
ws.send(<binary_blob>)
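One way to produce those binary blobs is to slice the captured audio into fixed-size chunks and send each chunk as its own binary message. The sketch below assumes a chunk size of 3200 bytes, which corresponds to 100 ms of 16-bit mono PCM at 16 kHz; this section does not state the required audio format or chunk size, so adjust both to your configuration.

```python
from typing import Iterator


def iter_audio_chunks(audio: bytes, chunk_size: int = 3200) -> Iterator[bytes]:
    """Yield consecutive slices of `audio`, each at most `chunk_size` bytes.

    The default chunk size is an assumption (100 ms of 16 kHz, 16-bit mono PCM).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for offset in range(0, len(audio), chunk_size):
        yield audio[offset:offset + chunk_size]
```

Each yielded chunk would then be passed to `ws.send(...)` on the open WebSocket, where `ws` is whatever client object your voice system uses.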
You’ll receive transcript messages in response.
Step 6: Analyze conversations with GenerativeAgent
Call the /analyze endpoint to evaluate the conversation.
The request can carry taskName and inputVariables attributes.
You can also simulate Tasks and Input Variables in the Previewer.
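The call to /analyze might be assembled as in the sketch below, which builds the HTTP request without sending it. Only the /analyze path and the taskName and inputVariables attributes come from this guide. The base URL, the `conversationId` field, and the `asapp-api-id` / `asapp-api-secret` header names are assumptions made for illustration; confirm them against the API reference before use.

```python
import json
import urllib.request
from typing import Optional


def build_analyze_request(
    base_url: str,
    api_id: str,
    api_secret: str,
    conversation_id: str,
    task_name: Optional[str] = None,
    input_variables: Optional[dict] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request to the /analyze endpoint."""
    body = {"conversationId": conversation_id}  # assumed field name
    if task_name is not None:
        body["taskName"] = task_name
    if input_variables is not None:
        body["inputVariables"] = input_variables
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/analyze",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "asapp-api-id": api_id,          # assumed header name
            "asapp-api-secret": api_secret,  # assumed header name
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Keeping construction separate from sending makes the payload easy to inspect and test.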
Step 7: Stop the Audio Stream
Send a finishStream message to end the stream. ASAPP replies with a finalResponse message.
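The request and response pairing from the Step 4 table can be tracked with a small state machine that rejects out-of-order requests, such as sending audio before startStream or after finishStream. This is an illustrative sketch, not part of the ASAPP API: the class and the `"audio"` label for binary frames are hypothetical, and the mapping names the kind of response to expect rather than promising one transcript per audio chunk.

```python
# Response type expected for each request type, taken from the Step 4 table.
EXPECTED_RESPONSE = {
    "startStream": "startResponse",
    "audio": "transcript",
    "finishStream": "finalResponse",
}

# Requests that are valid in each session state.
ALLOWED = {
    "idle": {"startStream"},
    "streaming": {"audio", "finishStream"},
    "finished": set(),
}


class StreamSession:
    """Track where a streaming session is in the message sequence."""

    def __init__(self) -> None:
        self.state = "idle"

    def on_request(self, kind: str) -> str:
        """Record an outgoing request and return the response type to expect."""
        if kind not in ALLOWED[self.state]:
            raise ValueError(f"{kind!r} is not valid in state {self.state!r}")
        if kind == "startStream":
            self.state = "streaming"
        elif kind == "finishStream":
            self.state = "finished"
        return EXPECTED_RESPONSE[kind]
```

Calling `on_request` before each send gives an early, local error for sequencing mistakes instead of relying on the server to reject them.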