Technical · March 24, 2026 · 5 min read

Scaling WebRTC for Thousands of Concurrent Voice Agents

WebRTC is one of the lowest-latency transports available for voice agents, but scaling it requires careful architecture. Here's how to manage media servers, signaling, and state.

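WebRTC is the gold standard for browser-based voice agents and can deliver sub-50ms transport latency on a good network path. A single WebRTC connection is easy to set up, but scaling to thousands of concurrent sessions requires dedicated infrastructure: media servers to carry the audio, a signaling layer to set up each session, and shared state that records which server owns which session. You can't just run all of that on a single Node.js server.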
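One way to picture the "shared state" piece is a small router that decides which media server each new session lands on. The sketch below is only an illustration under stated assumptions: `MediaNode`, `SessionRouter`, the node names, and the per-node capacity numbers are all hypothetical, and a real deployment would keep this table in a shared store rather than in process memory.

```python
from dataclasses import dataclass, field


@dataclass
class MediaNode:
    """A hypothetical media server with an assumed session limit."""
    name: str
    capacity: int  # assumed max concurrent sessions for this node
    sessions: set = field(default_factory=set)

    @property
    def free_slots(self) -> int:
        return self.capacity - len(self.sessions)


class SessionRouter:
    """Tracks which node owns each session (in memory, for illustration)."""

    def __init__(self, nodes):
        self.nodes = {n.name: n for n in nodes}
        self.placement = {}  # session_id -> node name

    def assign(self, session_id: str) -> str:
        # Sticky placement: a reconnecting session returns to its node.
        if session_id in self.placement:
            return self.placement[session_id]
        candidates = [n for n in self.nodes.values() if n.free_slots > 0]
        if not candidates:
            raise RuntimeError("no media node has spare capacity")
        # Least-loaded placement: pick the node with the most free slots.
        node = max(candidates, key=lambda n: n.free_slots)
        node.sessions.add(session_id)
        self.placement[session_id] = node.name
        return node.name

    def release(self, session_id: str) -> None:
        # Free the slot when the call ends; unknown sessions are ignored.
        name = self.placement.pop(session_id, None)
        if name is not None:
            self.nodes[name].sessions.discard(session_id)


if __name__ == "__main__":
    router = SessionRouter([MediaNode("sfu-a", 2), MediaNode("sfu-b", 1)])
    for sid in ("call-1", "call-2", "call-3"):
        print(sid, "->", router.assign(sid))
```

The signaling layer would call `assign` before answering an offer and `release` when the call ends, so the number of media nodes can grow without clients needing to know which one they are on.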

SFU vs MCU Architecture

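For AI agents, Selective Forwarding Units (SFUs) are typically preferred over Multipoint Control Units (MCUs). An MCU decodes and mixes every stream on the media server, which costs CPU for each session. An SFU forwards the encoded packets without decoding or mixing them: it routes the user's audio to the ASR service and routes the TTS audio back to the user, minimizing processing overhead on the media server itself.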
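The forwarding behavior can be sketched as a toy model. This is not a real WebRTC stack: `AudioPacket`, `ForwardingUnit`, and the sink callbacks are hypothetical names, and the payload bytes stand in for encoded audio frames. The point it illustrates is that the forwarding unit hands packets to subscribers untouched, in both directions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AudioPacket:
    session_id: str
    seq: int
    payload: bytes  # encoded audio; the forwarding unit never decodes it


Sink = Callable[[AudioPacket], None]


class ForwardingUnit:
    """Toy SFU: routes packets per session without decoding or mixing."""

    def __init__(self) -> None:
        self._uplink: Dict[str, List[Sink]] = {}    # user audio -> e.g. ASR
        self._downlink: Dict[str, List[Sink]] = {}  # agent audio -> user

    def subscribe_uplink(self, session_id: str, sink: Sink) -> None:
        self._uplink.setdefault(session_id, []).append(sink)

    def subscribe_downlink(self, session_id: str, sink: Sink) -> None:
        self._downlink.setdefault(session_id, []).append(sink)

    def on_user_audio(self, packet: AudioPacket) -> None:
        # Forward the same packet object to every uplink subscriber.
        for sink in self._uplink.get(packet.session_id, []):
            sink(packet)

    def on_agent_audio(self, packet: AudioPacket) -> None:
        # Forward synthesized (TTS) audio back toward the user.
        for sink in self._downlink.get(packet.session_id, []):
            sink(packet)


if __name__ == "__main__":
    asr_inbox: List[AudioPacket] = []
    user_inbox: List[AudioPacket] = []
    sfu = ForwardingUnit()
    sfu.subscribe_uplink("call-1", asr_inbox.append)
    sfu.subscribe_downlink("call-1", user_inbox.append)
    sfu.on_user_audio(AudioPacket("call-1", 1, b"user-frame"))
    sfu.on_agent_audio(AudioPacket("call-1", 1, b"tts-frame"))
    print(len(asr_inbox), len(user_inbox))
```

Because the per-packet work here is a dictionary lookup and a function call, the cost per session stays small. An MCU would add a decode, mix, and re-encode step inside the equivalent of `on_user_audio`.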
