Opus Audio Latency in WebRTC Conferences

This article examines the typical latency introduced by the Opus audio codec during a standard WebRTC video conference. It breaks down the specific components of this delay, including packetization and algorithmic overhead, and explains how these factors influence the overall real-time communication experience.

During a standard WebRTC video conference, the Opus codec typically adds about 25 ms to 26.5 ms of latency to the audio pipeline: 20 ms to fill one frame plus 5 ms to 6.5 ms of encoder look-ahead. This delay is a small fraction of the end-to-end budget for natural conversation, which is one reason Opus is the standard choice for interactive, real-time communication.

To understand where this latency comes from, it is helpful to break it down into two primary components:

1. Packetization Delay (Frame Size)

By default, WebRTC configures the Opus codec to use 20 ms frames. The sender collects 20 milliseconds of raw audio before compressing it and sending it over the network as a single packet. Opus supports frame sizes from 2.5 ms to 60 ms, but 20 ms is the usual default because it balances latency against network efficiency. Every packet carries a fixed header cost, so larger frames reduce the packet count and header overhead at the price of added latency, while smaller frames cut latency but multiply the packet rate: 2.5 ms frames need 400 packets per second, compared with 50 at 20 ms.
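The trade-off above can be made concrete with a little arithmetic. The following is a minimal sketch, assuming one Opus frame per packet and a 40-byte header made up of IPv4 (20 bytes), UDP (8 bytes), and RTP (12 bytes). The constant and function names are illustrative only, and a real WebRTC session would carry more per packet (SRTP authentication tag, RTP header extensions, possibly IPv6).

```python
# Assumed per-packet header cost: IPv4 (20) + UDP (8) + RTP (12) bytes.
# SRTP tags, RTP header extensions, and IPv6 would all add to this figure.
HEADER_BYTES = 40

# Frame durations Opus can encode, in milliseconds.
OPUS_FRAME_SIZES_MS = (2.5, 5, 10, 20, 40, 60)


def packets_per_second(frame_ms: float) -> float:
    """One packet per frame, so the rate is the inverse of the frame duration."""
    return 1000.0 / frame_ms


def header_overhead_kbps(frame_ms: float, header_bytes: int = HEADER_BYTES) -> float:
    """Bitrate spent on packet headers alone, in kilobits per second."""
    return packets_per_second(frame_ms) * header_bytes * 8 / 1000.0


if __name__ == "__main__":
    for frame_ms in OPUS_FRAME_SIZES_MS:
        print(f"{frame_ms:>4} ms  {packets_per_second(frame_ms):6.1f} pkt/s  "
              f"{header_overhead_kbps(frame_ms):6.1f} kbit/s of headers")
```

Under these assumptions, headers cost 16 kbit/s at 20 ms frames and 128 kbit/s at 2.5 ms frames, which can exceed the bitrate of the compressed speech itself.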

2. Algorithmic Delay

In addition to the frame size, the Opus codec requires a small amount of “look-ahead” time to analyze the audio signal and perform efficient compression. This algorithmic delay is typically 5 ms to 6.5 ms.

Total Codec Latency

Adding the 20 ms packetization delay to the 5 ms to 6.5 ms algorithmic delay gives a total codec latency of approximately 25 ms to 26.5 ms before the audio packet even leaves the sender’s device.
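The same sum can be tabulated for every Opus frame size. This is a small sketch that uses only the figures quoted in this article (frame duration plus a look-ahead of 5 ms to 6.5 ms); the function names are illustrative. It does not model network transit, jitter buffering, or playout delay.

```python
from __future__ import annotations

# Look-ahead bounds quoted in this article, in milliseconds.
LOOKAHEAD_MIN_MS = 5.0
LOOKAHEAD_MAX_MS = 6.5


def codec_latency_ms(frame_ms: float, lookahead_ms: float) -> float:
    """Sender-side codec delay: buffer one full frame, then add encoder look-ahead."""
    return frame_ms + lookahead_ms


def codec_latency_range_ms(frame_ms: float) -> tuple[float, float]:
    """Lowest and highest codec delay for a frame size, given the look-ahead bounds."""
    return (codec_latency_ms(frame_ms, LOOKAHEAD_MIN_MS),
            codec_latency_ms(frame_ms, LOOKAHEAD_MAX_MS))


if __name__ == "__main__":
    for frame_ms in (2.5, 5, 10, 20, 40, 60):
        low, high = codec_latency_range_ms(frame_ms)
        print(f"{frame_ms:>4} ms frame -> {low:.1f} to {high:.1f} ms")
```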

Ultra-Low Latency Configurations

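If a WebRTC application requires the lowest possible latency (such as for remote music collaboration), developers can configure Opus to use a 2.5 ms or 5 ms frame size. With a look-ahead of 5 ms to 6.5 ms, this brings the codec-induced latency down to roughly 7.5 ms to 9 ms for 2.5 ms frames, or 10 ms to 11.5 ms for 5 ms frames. Support for such short frames varies between WebRTC implementations. For standard business video conferencing, the default 20 ms frame size remains the preferred choice because it keeps packet rates and header overhead low, which helps audio stay stable over varying network conditions.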
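One common way to request a different packet duration is the SDP `a=ptime` attribute in the audio media section. The sketch below is a hypothetical helper (the name `set_audio_ptime` is not from any library) that a signaling server might apply to a session description before forwarding it. It only rewrites text: whether the remote encoder honors the requested value, and which minimum it accepts, depends on the WebRTC implementation.

```python
def set_audio_ptime(sdp: str, ptime_ms: int) -> str:
    """Return a copy of `sdp` whose audio sections request `ptime_ms` per packet.

    Any existing a=ptime line in an audio section is dropped, and the new
    value is appended at the end of that section so attribute lines stay
    after the m=/c=/b= lines. Other media sections are left untouched.
    """
    lines = [ln for ln in sdp.replace("\r\n", "\n").split("\n") if ln]
    out = []
    in_audio = False
    for line in lines:
        if line.startswith("m="):
            if in_audio:
                out.append(f"a=ptime:{ptime_ms}")  # close the previous audio section
            in_audio = line.startswith("m=audio ")
        elif in_audio and line.startswith("a=ptime:"):
            continue  # drop the old value; the new one is appended at section end
        out.append(line)
    if in_audio:
        out.append(f"a=ptime:{ptime_ms}")  # audio was the last section
    return "\r\n".join(out) + "\r\n"


if __name__ == "__main__":
    # Abbreviated, illustrative SDP; a real description has many more lines.
    example = (
        "v=0\r\n"
        "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
        "a=rtpmap:111 opus/48000/2\r\n"
        "a=ptime:20\r\n"
        "m=video 9 UDP/TLS/RTP/SAVPF 96\r\n"
        "a=rtpmap:96 VP8/90000\r\n"
    )
    print(set_audio_ptime(example, 10))
```

The value is an integer number of milliseconds, so this sketch takes an `int`. It is worth measuring the actual packet interval after negotiation instead of assuming the request took effect.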