Monday, December 22, 2025

Bi-directional streaming for real-time agent interactions now available in Amazon Bedrock AgentCore Runtime


Building natural voice conversations with AI agents requires complex infrastructure and a lot of code from engineering teams. Text-based agent interactions follow a turn-based pattern: a user sends a complete request, waits for the agent to process it, and receives a full response before continuing. Bi-directional streaming removes this constraint by establishing a persistent connection that carries data in both directions simultaneously.

Amazon Bedrock AgentCore Runtime supports bi-directional streaming for real-time, two-way communication between users and AI agents. With this capability, agents can listen to user input while generating responses, creating a more natural conversational flow. This is particularly well-suited for multimodal interactions, such as voice and vision agent conversations. The agent can begin responding while still receiving user input, handle mid-conversation interruptions, and adjust its responses based on real-time feedback.

A bi-directional voice chat agent can conduct spoken conversations with the fluidity of human dialogue, so that users can interrupt, clarify, or change topics naturally. These agents process streaming audio input and output simultaneously while maintaining conversational state. Building this infrastructure requires managing persistent low-latency connections, handling concurrent audio streams, preserving context across exchanges, and scaling to many concurrent conversations. Implementing these capabilities from scratch demands months of engineering effort and specialized real-time systems expertise. Amazon Bedrock AgentCore Runtime addresses these challenges by providing a secure, serverless, and purpose-built hosting environment for deploying and running AI agents, without requiring developers to build and maintain complex streaming infrastructure themselves.

In this post, you will learn about bi-directional streaming on AgentCore Runtime and the prerequisites for creating a WebSocket implementation. You will also learn how to use Strands Agents to implement a bi-directional streaming solution for voice agents.

AgentCore Runtime bi-directional streaming

Bi-directional streaming uses the WebSocket protocol. WebSocket provides full-duplex communication over a single TCP connection, establishing a persistent channel where data flows continuously in both directions. This protocol has broad client support across browsers, mobile applications, and server environments, making it accessible for diverse implementation scenarios.

When a connection is established, the agent can receive user input as a stream while simultaneously sending response chunks back to the user. AgentCore Runtime manages the underlying infrastructure that handles connections and message ordering and maintains conversational state across the bi-directional exchange. This removes the need for developers to build custom streaming infrastructure or manage the complexities of concurrent data flows.

Voice conversations differ from text-based interactions in their expectation of natural flow. When speaking with a voice agent, users expect the same conversational dynamics they experience with humans: the ability to interrupt when they need to correct themselves, to interject a clarification mid-response, or to redirect the conversation without awkward pauses.

With bi-directional streaming, voice agents can process incoming audio while generating responses, detecting interruptions, and adjusting behavior in real time. The agent maintains conversational context throughout these interactions, preserving the thread of dialogue even as the conversation shifts direction. This capability helps transform voice agents from turn-based systems into responsive conversational partners.
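The interruption behavior described above can be sketched with plain asyncio. This is a minimal illustration, not AgentCore or Strands code: the in-memory queues stand in for the two directions of a WebSocket, and the names stream_response, converse, and demo are hypothetical. It shows one way an agent could keep sending response chunks while listening, and stop as soon as user input arrives.

```python
import asyncio


async def stream_response(chunks, outbox, delay=0.05):
    """Send response chunks one at a time, as a streaming agent would."""
    for chunk in chunks:
        await outbox.put(chunk)
        await asyncio.sleep(delay)  # stand-in for generation time per chunk


async def converse(inbox, outbox, chunks):
    """Stream a response while listening; stop early if the user interrupts."""
    speaking = asyncio.create_task(stream_response(chunks, outbox))
    listening = asyncio.create_task(inbox.get())
    done, _ = await asyncio.wait(
        {speaking, listening}, return_when=asyncio.FIRST_COMPLETED
    )
    if listening in done:
        # User input arrived mid-response: cancel the remaining output.
        speaking.cancel()
        await asyncio.gather(speaking, return_exceptions=True)
        return ("interrupted", listening.result())
    # The response finished without interruption; stop listening.
    listening.cancel()
    await asyncio.gather(listening, return_exceptions=True)
    return ("completed", None)


async def demo():
    """Simulate a user who starts talking partway through a long response."""
    inbox, outbox = asyncio.Queue(), asyncio.Queue()

    async def user():
        await asyncio.sleep(0.12)
        await inbox.put("wait, stop")

    user_task = asyncio.create_task(user())
    chunks = [f"chunk-{i}" for i in range(10)]
    status, utterance = await converse(inbox, outbox, chunks)
    await user_task
    sent = []
    while not outbox.empty():
        sent.append(outbox.get_nowait())
    return status, utterance, sent
```

Running asyncio.run(demo()) returns the status, the interrupting utterance, and the chunks that were sent before the interruption. A real voice agent would apply the same pattern to audio frames and would typically also flush any audio already buffered on the client.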

Beyond voice conversations, bi-directional streaming enables several interaction patterns. Interactive debugging sessions allow developers to guide agents through problem-solving in real time, providing feedback as the agent explores solutions. Collaborative agents can work alongside users on shared tasks, receiving continuous input as the work progresses rather than waiting for complete instructions. Multi-modal agents can process streaming video or sensor data while simultaneously providing analysis and recommendations. Async long-running agent operations can process tasks over minutes or hours while streaming incremental results to clients.

WebSocket implementation

To create a WebSocket implementation in AgentCore Runtime, you should follow a few patterns. First, your containers must implement WebSocket endpoints on port 8080 at the /ws path, which aligns with standard WebSocket server practices. This WebSocket endpoint enables a single agent container to serve both the traditional InvokeAgentRuntime API and the new InvokeAgentRuntimeWithWebsocketStream API. Additionally, you must provide a /ping endpoint for health checks.
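The container contract above (a WebSocket endpoint at /ws plus a /ping health check) can be sketched as a bare ASGI application. This is an illustrative sketch under stated assumptions, not the official contract: the post does not prescribe a web framework, the JSON body returned by /ping is an assumption, and the echo behavior is a placeholder for real agent logic. Raw ASGI is used only to keep the example free of third-party dependencies.

```python
import json


async def app(scope, receive, send):
    """Minimal ASGI app exposing GET /ping and a WebSocket echo at /ws."""
    if scope["type"] == "http" and scope["path"] == "/ping":
        # Health check. The response body shape is an assumption.
        body = json.dumps({"status": "Healthy"}).encode()
        await send({
            "type": "http.response.start",
            "status": 200,
            "headers": [(b"content-type", b"application/json")],
        })
        await send({"type": "http.response.body", "body": body})
    elif scope["type"] == "websocket" and scope["path"] == "/ws":
        await receive()  # first event is "websocket.connect"
        await send({"type": "websocket.accept"})
        while True:
            event = await receive()
            if event["type"] == "websocket.disconnect":
                break
            # Echo text frames back; a real agent would stream model output.
            if event.get("text") is not None:
                await send({"type": "websocket.send", "text": event["text"]})
    elif scope["type"] == "http":
        # Any other HTTP path is not part of this sketch.
        await send({"type": "http.response.start", "status": 404, "headers": []})
        await send({"type": "http.response.body", "body": b""})
    elif scope["type"] == "websocket":
        # Reject WebSocket connections on any other path.
        await receive()
        await send({"type": "websocket.close", "code": 1008})
```

An ASGI server such as uvicorn could serve this app on port 8080 inside the container. In practice, a framework like FastAPI would usually replace the manual event handling shown here.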

Bi-directional streaming using WebSockets on AgentCore Runtime supports applications that use a WebSocket library for their language. The client must connect to the service endpoint with a WebSocket protocol connection:

wss://bedrock-agentcore..amazonaws.com/runtimes//ws

You also need to use one of the supported authentication methods (SigV4 headers, SigV4 pre-signed URL, or OAuth 2.0) and make sure that the agent application implements the WebSocket service contract as specified in the HTTP protocol contract.

Strands bi-directional agent: Simplified voice agent development

Amazon Nova Sonic unifies speech understanding and generation in a single model, delivering human-like conversational AI with low latency, leading accuracy, and strong price performance. Its integrated architecture provides expressive speech generation and real-time transcription, dynamically adapting responses based on the prosody, pace, and timbre of the input speech.

With bi-directional streaming now also available in AgentCore Runtime, you have several ways to host a voice agent. One is the direct implementation, where you need to manage WebSocket connections, parse protocol events, handle audio chunks, and orchestrate async tasks. Another is the Strands bi-directional agent implementation, which abstracts this complexity and handles these steps for you.

Example implementation

In this post, you should refer to the Amazon Bedrock AgentCore bi-directional code, which implements bi-directional communication with Amazon Bedrock AgentCore. The repository has two implementations: one that uses a native Amazon Nova Sonic Python implementation deployed directly to AgentCore Runtime, and a high-level framework implementation that uses the Strands bi-directional agent for simplified real-time audio conversations.

The following diagram shows the native Amazon Nova Sonic Python WebSocket server deployed directly to AgentCore. It provides full control over the Nova Sonic protocol with direct event handling for complete visibility into session management, audio streaming, and response generation.

The Strands bi-directional agent framework for real-time audio conversations with Amazon Nova Sonic provides a high-level abstraction that simplifies bi-directional streaming, automated session management, and tool integration. The code snippet below is an example of this simplification.

# Assumption: the decorator and type hint below follow FastAPI's API;
# the original snippet does not show how app is created.
from fastapi import FastAPI, WebSocket
from strands.experimental.bidi.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands_tools import calculator

app = FastAPI()

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket, model_name: str):
    # Define a Nova Sonic BidiModel
    model = BidiNovaSonicModel(
        region="us-east-1",
        model_id="amazon.nova-sonic-v1:0",
        provider_config={
            "audio": {
                "input_sample_rate": 16000,
                "output_sample_rate": 24000,
                "voice": "matthew",
            }
        }
    )
    # Create a Strands Agent with tools and system prompt
    agent = BidiAgent(
        model=model,
        tools=[calculator],
        system_prompt="You are a helpful assistant with access to a calculator tool.",
    )

    # Start the streaming conversation. receive_and_convert is an input
    # helper that is not shown in this snippet.
    await agent.run(inputs=[receive_and_convert], outputs=[websocket.send_json])

This implementation demonstrates the simplicity of Strands: instantiate a model, create an agent with tools and a system prompt, and run it with input/output streams. The framework handles protocol complexity internally.
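To give a sense of the audio handling the framework takes care of, the following sketch works through the chunk-size arithmetic for the sample rates configured in the snippet above (16000 Hz input, 24000 Hz output). It assumes 16-bit mono PCM audio and a 100 ms chunk duration, neither of which is stated in this post, and the "audio" JSON key and function names are hypothetical.

```python
import base64

BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def chunk_bytes(sample_rate_hz, chunk_ms):
    """Bytes of 16-bit mono PCM audio in a chunk of the given duration."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * BYTES_PER_SAMPLE


def split_pcm(pcm, sample_rate_hz, chunk_ms=100):
    """Split raw PCM bytes into fixed-duration chunks (the last may be shorter)."""
    size = chunk_bytes(sample_rate_hz, chunk_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]


def to_json_payload(chunk):
    """Base64-encode a chunk so it can travel inside a JSON text frame."""
    return {"audio": base64.b64encode(chunk).decode("ascii")}


# One second of silence at the 16 kHz input rate is 32000 bytes.
one_second = bytes(16000 * BYTES_PER_SAMPLE)
chunks = split_pcm(one_second, 16000)
payloads = [to_json_payload(c) for c in chunks]
```

Under these assumptions, a 100 ms input chunk is 3200 bytes and a 100 ms output chunk is 4800 bytes, so one second of input audio splits into 10 chunks. Smaller chunks reduce latency at the cost of more messages per second.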

The following is the agent declaration section in the code:

agent = BidiAgent(
    model=model,
    tools=[calculator, weather_api, database_query],
    system_prompt="You are a helpful assistant..."
)

Tools are passed directly to the agent's constructor, and Strands handles function calling orchestration automatically. In summary, a native WebSocket implementation of the same functionality requires roughly 150 lines of code, whereas the Strands implementation reduces this to roughly 20 lines focused on business logic. Developers can focus on defining agent behavior, integrating tools, and crafting system prompts rather than managing WebSocket connections, parsing events, handling audio chunks, or orchestrating async tasks. This makes bi-directional streaming accessible to developers without specialized real-time systems expertise while maintaining full access to the audio conversation capabilities of Nova Sonic. The Strands bi-directional feature is currently supported only in the Python SDK.

If you are looking for flexibility in the implementation of your voice agent, the native Amazon Nova Sonic implementation can help you. It can also be important in cases where you have several different patterns of communication between agent and model. With the native Amazon Nova Sonic implementation, you control every step of the process.

The framework approach can provide better control of dependencies, because these are managed by the SDK, and provides consistency across systems. The same Strands bi-directional agent code structure works with Nova Sonic, OpenAI Realtime API, and Google Gemini Live; developers simply swap the model implementation while keeping the rest of their code unchanged.

Conclusion

The bi-directional streaming capability of Amazon Bedrock AgentCore Runtime transforms how developers can build conversational AI agents. By providing WebSocket-based real-time communication infrastructure, AgentCore removes the months of engineering effort required to implement streaming systems from scratch. The runtime enables developers to deploy multiple types of voice agents, from native protocol implementations using Amazon Nova Sonic to high-level frameworks like the Strands bi-directional agent, within the same secure, serverless environment.


About the authors

Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.

Phelipe Fabres is a Senior Specialist Solutions Architect for Generative AI at AWS for Startups. He specializes in AI/ML with a focus on agentic systems and the full process of training and inference. He has more than 10 years of experience in software development, from monolithic to event-driven architectures, and holds a Ph.D. in Graph Theory.

Evandro Franco is a Sr. Data Scientist at Amazon Web Services. He is part of the Global GTM team that helps AWS customers overcome business challenges related to AI/ML on top of AWS, mainly on Amazon Bedrock AgentCore and Strands Agents. He has more than 18 years of experience working with technology, from software development, infrastructure, and serverless to machine learning. In his free time, Evandro enjoys playing with his son, mainly building fun things with Lego bricks.
