Lowest Algorithmic Delay of Opus Audio Codec

This article explores the absolute minimum algorithmic delay achievable using the Opus audio format. It breaks down the technical components that contribute to this latency, including frame sizes and look-ahead times, and discusses the practical trade-offs involved in configuring Opus for ultra-low latency communication.

The lowest algorithmic delay achievable with the Opus audio format is 5.0 milliseconds (ms).

Algorithmic delay is the inherent delay introduced by the audio codec itself, independent of network transmission times, buffering, or hardware digital-to-analog conversion. In Opus, this delay is determined by the sum of the frame size (the duration of audio compressed in a single packet) and the codec’s look-ahead time (the extra audio the encoder must analyze to compress the current frame).

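To achieve the minimum 5.0 ms algorithmic delay, Opus must be configured with its smallest supported frame size of 2.5 ms and run in its restricted low-delay mode (OPUS_APPLICATION_RESTRICTED_LOWDELAY in the libopus reference implementation). In this configuration only the CELT (Constrained Energy Lapped Transform) layer of the codec is used, and the look-ahead is just the 2.5 ms overlap of the CELT overlap-add window. Adding the 2.5 ms frame size to the 2.5 ms look-ahead gives the minimum total algorithmic delay of 5.0 ms. In the default application modes the look-ahead is 6.5 ms, so a 2.5 ms frame yields 9.0 ms rather than 5.0 ms.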
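The sum described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a call into any Opus library: the function name `algorithmic_delay_ms` and the `restricted_low_delay` flag are hypothetical, the 2.5 ms look-ahead is the CELT figure quoted above, and the 6.5 ms look-ahead is assumed to be the default that applies outside the low-delay configuration (the same figure used in the 20 ms VoIP example in this article).

```python
# Look-ahead values in milliseconds (taken from the figures in this article).
CELT_ONLY_LOOKAHEAD_MS = 2.5   # CELT overlap only (low-delay configuration)
DEFAULT_LOOKAHEAD_MS = 6.5     # assumed default look-ahead otherwise

# Standard Opus frame durations in milliseconds.
OPUS_FRAME_SIZES_MS = (2.5, 5.0, 10.0, 20.0, 40.0, 60.0)


def algorithmic_delay_ms(frame_ms, restricted_low_delay=False):
    """Return frame size plus look-ahead, in milliseconds."""
    if frame_ms not in OPUS_FRAME_SIZES_MS:
        raise ValueError(f"unsupported Opus frame size: {frame_ms} ms")
    if restricted_low_delay:
        lookahead_ms = CELT_ONLY_LOOKAHEAD_MS
    else:
        lookahead_ms = DEFAULT_LOOKAHEAD_MS
    return frame_ms + lookahead_ms


if __name__ == "__main__":
    # Print the delay for every frame size in both configurations.
    for frame_ms in OPUS_FRAME_SIZES_MS:
        low = algorithmic_delay_ms(frame_ms, restricted_low_delay=True)
        default = algorithmic_delay_ms(frame_ms)
        print(f"{frame_ms} ms frame: {low} ms low-delay, {default} ms default")
```

With these assumptions, `algorithmic_delay_ms(2.5, restricted_low_delay=True)` evaluates to 5.0, the minimum discussed in this article.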

While a 5.0 ms delay is highly beneficial for real-time applications like live musical performances or competitive gaming, it comes with specific trade-offs:

Increased Overhead: Sending audio in 2.5 ms increments means transmitting 400 packets per second. Every packet carries its own IP/UDP/RTP headers, so with a typical 40 bytes of IPv4/UDP/RTP headers the header traffic alone is 128 kbit/s, compared with 16 kbit/s at the 50 packets per second of 20 ms frames.
Reduced Compression Efficiency: Shorter frame sizes give the encoder less temporal redundancy to exploit, resulting in slightly lower audio quality at a given bitrate compared to longer frame sizes (such as 10 ms or 20 ms).
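
To put a number on the packet overhead, the following Python sketch computes the packet rate and the header-only bandwidth for a given frame size. It assumes one Opus frame per packet and minimal headers with no compression or extensions (20 bytes IPv4, 8 bytes UDP, 12 bytes RTP); real deployments using IPv6, SRTP, or header extensions will differ. The function names are illustrative only.

```python
# Assumed per-packet header sizes in bytes (no options or extensions).
IPV4_HEADER_BYTES = 20
UDP_HEADER_BYTES = 8
RTP_HEADER_BYTES = 12
HEADER_BYTES = IPV4_HEADER_BYTES + UDP_HEADER_BYTES + RTP_HEADER_BYTES


def packets_per_second(frame_ms):
    """Packet rate when each packet carries exactly one frame."""
    return 1000.0 / frame_ms


def header_overhead_kbps(frame_ms, header_bytes=HEADER_BYTES):
    """Bandwidth consumed by headers alone, in kbit/s."""
    return packets_per_second(frame_ms) * header_bytes * 8 / 1000.0


if __name__ == "__main__":
    # Compare the smallest frame size with larger, more common ones.
    for frame_ms in (2.5, 10.0, 20.0):
        rate = packets_per_second(frame_ms)
        overhead = header_overhead_kbps(frame_ms)
        print(f"{frame_ms} ms frames: {rate} packets/s, {overhead} kbit/s of headers")
```

Under these assumptions, 2.5 ms frames spend 128 kbit/s on headers before any audio is sent, which can exceed the audio bitrate itself, while 20 ms frames spend 16 kbit/s.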

For most standard Voice over IP (VoIP) applications, Opus is typically configured with a 20 ms frame size. This yields an algorithmic delay of 26.5 ms (20 ms frame size + 6.5 ms default look-ahead) and offers a better balance between latency, bandwidth, and audio quality. When absolute real-time performance is required, however, Opus can be scaled down to its 5.0 ms algorithmic limit.