Structure of an Opus Audio Format Packet

This article provides a technical overview of the structure of an Opus audio packet. It breaks down the codec's internal framing, detailing the Table of Contents (TOC) byte, configuration numbers, frame count codes, and how individual audio frames are packed within a single payload.

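An Opus packet is self-describing: a decoder can determine the packet's configuration from the packet itself, without external metadata. The standard framing defined in RFC 6716 is not self-delimiting, however. It relies on the transport layer (for example RTP or Ogg) to supply the total packet length, and a self-delimiting variant is specified separately in Appendix B of that RFC. Every Opus packet consists of a mandatory Table of Contents (TOC) byte, optional frame length indicators, and one or more audio frames containing the compressed payload.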

1. The Table of Contents (TOC) Byte

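The first byte of any Opus packet is the TOC byte. It determines how the rest of the packet must be parsed. The TOC byte is split into three distinct fields, listed here from the most significant bits to the least:

* Configuration number (5 bits): Selects the operating mode, audio bandwidth, and frame duration.
* Stereo flag (1 bit): 0 for mono, 1 for stereo.
* Frame Count Code (2 bits): Indicates how many frames the packet contains and how their boundaries are signaled.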
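As a rough illustration, the TOC byte can be unpacked with a few bit operations. The sketch below assumes the layout given in RFC 6716: a 5-bit configuration number in the top bits, a 1-bit stereo flag, and a 2-bit frame count code in the low bits. The names `Toc` and `parse_toc` are hypothetical and are not part of any Opus library.

```python
from typing import NamedTuple


class Toc(NamedTuple):
    config: int            # 5-bit configuration number (0..31)
    stereo: bool           # 1-bit channel flag: False = mono, True = stereo
    frame_count_code: int  # 2-bit frame count code (0..3)


def parse_toc(toc_byte: int) -> Toc:
    """Split a TOC byte into its three fields (hypothetical helper)."""
    if not 0 <= toc_byte <= 0xFF:
        raise ValueError("TOC must be a single byte")
    config = toc_byte >> 3              # top five bits
    stereo = bool((toc_byte >> 2) & 1)  # next bit
    code = toc_byte & 0x03              # low two bits
    return Toc(config, stereo, code)


# Example: 0xFC is binary 11111 1 00
print(parse_toc(0xFC))
```

For 0xFC the fields are configuration 31, stereo, and frame count code 0, meaning a single frame.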

2. Frame Length Indicators

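Depending on the Frame Count Code defined in the TOC byte, the packet may contain optional bytes that specify the length of the audio frames before the payload begins. Code 0 (one frame) and code 1 (two frames of equal size) need no length bytes. Code 2 (two frames of different sizes) carries the length of the first frame. Code 3 (an arbitrary number of frames) adds a frame count byte and, when the frames vary in size, a length for every frame except the last. Each length is encoded in one or two bytes.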
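The length encoding itself can be sketched briefly. This example assumes the one-or-two-byte scheme from RFC 6716: a first byte of 0 to 251 is the frame length in bytes (0 marks a frame with no data), while a first byte of 252 to 255 means a second byte follows and the length is 4 times the second byte plus the first. The function name `read_frame_length` is hypothetical.

```python
from typing import Tuple


def read_frame_length(data: bytes, offset: int) -> Tuple[int, int]:
    """Return (frame_length, bytes_consumed) for the length field at data[offset]."""
    if offset >= len(data):
        raise ValueError("truncated packet: missing length byte")
    first = data[offset]
    if first < 252:
        # Values 0..251 encode the length directly in one byte.
        return first, 1
    if offset + 1 >= len(data):
        raise ValueError("truncated packet: missing second length byte")
    second = data[offset + 1]
    # Values 252..255 signal a two-byte length.
    return 4 * second + first, 2


print(read_frame_length(bytes([100]), 0))
print(read_frame_length(bytes([255, 255]), 0))
```

Under this scheme the largest encodable frame length is 4 * 255 + 255 = 1275 bytes.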

3. The Audio Payload

The remainder of the Opus packet is the actual audio payload, containing the compressed audio data.

Depending on the configuration determined by the TOC byte, the payload is processed by one of three internal modes:

* SILK Mode: Optimized for speech preservation, typically operating at lower sample rates and bitrates.
* CELT Mode: Optimized for high-fidelity music and ultra-low latency, operating across the full frequency spectrum.
* Hybrid Mode: Uses SILK for lower frequencies (up to 8 kHz) and CELT for higher frequencies (above 8 kHz) within the same frame to maximize efficiency.
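To show how the mode follows from the TOC byte, here is a minimal sketch that maps the 5-bit configuration number to a mode, audio bandwidth, and frame duration. It assumes the configuration table in RFC 6716 (0 to 11 SILK, 12 to 15 Hybrid, 16 to 31 CELT). The function name `describe_config` and the short bandwidth labels (NB, MB, WB, SWB, FB for narrowband through fullband) are illustrative choices.

```python
from typing import Tuple, Union


def describe_config(config: int) -> Tuple[str, str, Union[int, float]]:
    """Map a configuration number to (mode, bandwidth, frame duration in ms)."""
    if not 0 <= config <= 31:
        raise ValueError("configuration number must be in 0..31")
    if config < 12:
        # SILK-only: three bandwidths, four frame durations each.
        mode = "SILK"
        bandwidth = ("NB", "MB", "WB")[config // 4]
        frame_ms = (10, 20, 40, 60)[config % 4]
    elif config < 16:
        # Hybrid: two bandwidths, two frame durations each.
        mode = "Hybrid"
        bandwidth = ("SWB", "FB")[(config - 12) // 2]
        frame_ms = (10, 20)[config % 2]
    else:
        # CELT-only: four bandwidths, four frame durations each.
        mode = "CELT"
        bandwidth = ("NB", "WB", "SWB", "FB")[(config - 16) // 4]
        frame_ms = (2.5, 5, 10, 20)[config % 4]
    return mode, bandwidth, frame_ms


for number in (0, 13, 31):
    print(number, describe_config(number))
```

Because every frame in a packet shares one TOC byte, all frames in that packet use the same mode, bandwidth, and duration.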

The decoder reads the frames sequentially, using the structural boundaries defined by the TOC and length bytes to parse and reconstruct the audio signal.
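Putting the pieces together, the following sketch splits a packet into its compressed frames for frame count codes 0, 1, and 2. It assumes the caller already knows the total packet length from the transport layer, as the standard RFC 6716 framing requires. Code 3 (arbitrary frame count, with optional padding) is deliberately left out to keep the example short, and `split_frames` is a hypothetical name rather than a libopus function.

```python
from typing import List


def split_frames(packet: bytes) -> List[bytes]:
    """Split an Opus packet with frame count code 0, 1, or 2 into its frames."""
    if len(packet) < 1:
        raise ValueError("an Opus packet must contain at least the TOC byte")
    code = packet[0] & 0x03  # low two bits of the TOC byte
    body = packet[1:]

    if code == 0:
        # One frame: everything after the TOC byte.
        return [body]

    if code == 1:
        # Two frames of equal size: the body must split evenly.
        if len(body) % 2 != 0:
            raise ValueError("code 1 packet body must have even length")
        half = len(body) // 2
        return [body[:half], body[half:]]

    if code == 2:
        # Two frames of different sizes: a one- or two-byte length
        # gives the size of the first frame; the second takes the rest.
        if not body:
            raise ValueError("code 2 packet is missing its length byte")
        first_len = body[0]
        used = 1
        if first_len >= 252:
            if len(body) < 2:
                raise ValueError("code 2 packet is missing its second length byte")
            first_len += 4 * body[1]
            used = 2
        if used + first_len > len(body):
            raise ValueError("first frame length exceeds packet size")
        return [body[used:used + first_len], body[used + first_len:]]

    raise NotImplementedError("frame count code 3 is not handled in this sketch")


frames = split_frames(bytes([0x02, 2, 0xAA, 0xBB, 0xCC]))
print([frame.hex() for frame in frames])
```

The returned byte strings are still compressed; turning them into audio samples is the job of an actual Opus decoder.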