Integrating Opus Audio into the Ogg Container
This article explains how the high-quality, low-latency Opus audio codec is encapsulated within the Ogg container format. It explores the technical specifications of this integration, including how Opus audio packets are mapped to Ogg pages, the structure of the mandatory identification and comment headers, and how playback systems interpret the combined stream.
The Encapsulation Standard (RFC 7845)
The integration of the Opus audio format into the Ogg container is standardized by the Internet Engineering Task Force (IETF) in RFC 7845. This specification defines how Opus audio data is structured, packetized, and multiplexed within an Ogg bitstream.
In an Ogg stream, the Opus data is treated as a “logical bitstream.” This stream consists of sequential Ogg pages, which contain packets of data. For Opus audio, the integration requires a specific sequence of packets, beginning with two mandatory header packets, followed by the actual audio data packets.
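The page/packet relationship can be made concrete with a minimal Python sketch that reads one Ogg page and recovers its packets. It assumes the generic Ogg page layout from RFC 3533: a 27-byte little-endian header that begins with "OggS", then a segment table of lacing values, then the packet data. The names `OggPage`, `parse_page`, and `split_packets` are hypothetical. The sketch skips CRC verification and does not join a packet that began on an earlier page.

```python
import struct
from dataclasses import dataclass

# Fixed 27-byte Ogg page header, little-endian, no padding:
# capture "OggS", version, header type, granule position (signed 64-bit),
# serial number, page sequence number, CRC, number of segments.
OGG_HEADER = struct.Struct("<4sBBqIIIB")


@dataclass
class OggPage:
    header_type: int       # flags: 0x01 continued, 0x02 beginning, 0x04 end of stream
    granule_position: int  # -1 means no packet completes on this page
    serial: int            # identifies the logical bitstream this page belongs to
    sequence: int          # page counter within that logical bitstream
    segments: list         # lacing values from the segment table
    body: bytes            # concatenated packet data


def parse_page(data: bytes, offset: int = 0):
    """Parse one page starting at `offset`; return (page, offset_of_next_page)."""
    (capture, version, header_type, granule, serial,
     sequence, _crc, nsegs) = OGG_HEADER.unpack_from(data, offset)
    if capture != b"OggS" or version != 0:
        raise ValueError("not an Ogg page")
    table_start = offset + OGG_HEADER.size
    segments = list(data[table_start:table_start + nsegs])
    body_start = table_start + nsegs
    body_end = body_start + sum(segments)
    page = OggPage(header_type, granule, serial, sequence,
                   segments, data[body_start:body_end])
    return page, body_end


def split_packets(page: OggPage):
    """Split a page body into packets using its lacing values.

    A lacing value below 255 ends a packet. Bytes left over after a trailing
    run of 255s belong to a packet that continues on the next page.
    Returns (complete_packets, trailing_partial_bytes).
    """
    packets, current, pos = [], bytearray(), 0
    for lace in page.segments:
        current += page.body[pos:pos + lace]
        pos += lace
        if lace < 255:
            packets.append(bytes(current))
            current = bytearray()
    return packets, bytes(current)
```

A reader would call `parse_page` repeatedly, keep only pages whose `serial` matches the Opus stream, and prepend any partial bytes to the first packet of the next page when that page has the continuation flag set.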
1. The Identification Header (OpusHead)
The very first packet in an Ogg Opus stream must be the Identification Header, often referred to as OpusHead. It must sit alone on the first page of the logical stream, the page that carries the beginning-of-stream flag, and the decoder cannot be initialized without it. It contains the following configuration fields, with all multi-byte values stored little-endian:
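- Magic Signature: The packet starts with the 8-octet ASCII string "OpusHead", which identifies the stream.
- Version: An 8-bit field giving the version of the encapsulation specification. It must be 1 for streams written to RFC 7845; decoders should treat any value whose upper four bits are zero (0 through 15) as compatible.
- Channel Count: An 8-bit unsigned integer giving the number of output channels (e.g., 1 for mono, 2 for stereo). It must not be zero, so the maximum is 255.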
- Pre-skip: A 16-bit integer representing the number of samples (at 48 kHz) to discard from the beginning of the decoded stream. This is necessary to account for the encoder’s startup delay.
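- Input Sample Rate: A 32-bit unsigned integer recording the sample rate, in Hz, of the original input audio. It is informational only and is not the playback rate: players should decode at 48 kHz when the output hardware supports it, and all timing in the stream is expressed in 48 kHz samples regardless of this value.
- Output Gain: A signed 16-bit value in Q7.8 fixed-point format (8 fractional bits) giving a gain in dB to apply to the decoded output; 0 means no adjustment.
- Channel Mapping Family: An 8-bit value indicating how the coded channels map to output channels. Family 0 covers mono and stereo, family 1 covers layouts of up to 8 channels in Vorbis channel order, and family 255 leaves the channel meanings unspecified; later specifications (such as RFC 8486 for ambisonics) define further families, including one that carries a demixing matrix. For any family other than 0, the header continues with a channel mapping table.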
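The fixed part of OpusHead can be decoded with Python's `struct` module. This is a sketch that assumes the fields appear in the order listed above, packed without padding and stored little-endian, for 19 bytes in total. `OpusHead`, `parse_opus_head`, and `gain_factor` are hypothetical names, and the channel mapping table that follows for families other than 0 is not parsed. The gain conversion treats Output Gain as 20·log10 of the linear scale factor, stored in Q7.8.

```python
import struct
from dataclasses import dataclass

# magic, version, channel count, pre-skip, input sample rate,
# output gain (signed), mapping family; little-endian, 19 bytes.
OPUS_HEAD = struct.Struct("<8sBBHIhB")


@dataclass
class OpusHead:
    version: int
    channel_count: int
    pre_skip: int           # 48 kHz samples to drop from the start of decoder output
    input_sample_rate: int  # informational only, not the playback rate
    output_gain_q78: int    # gain in dB, Q7.8 fixed point
    mapping_family: int

    def gain_factor(self) -> float:
        """Linear scale factor for decoded samples: 10 ** (dB / 20), dB = Q7.8 / 256."""
        return 10.0 ** (self.output_gain_q78 / (20.0 * 256))


def parse_opus_head(packet: bytes) -> OpusHead:
    if len(packet) < OPUS_HEAD.size:
        raise ValueError("OpusHead packet too short")
    magic, version, channels, pre_skip, rate, gain, family = OPUS_HEAD.unpack_from(packet)
    if magic != b"OpusHead":
        raise ValueError("missing OpusHead signature")
    if version >> 4 != 0:
        # A nonzero upper nibble signals an incompatible revision.
        raise ValueError(f"unsupported OpusHead version {version}")
    if channels == 0:
        raise ValueError("channel count must be nonzero")
    return OpusHead(version, channels, pre_skip, rate, gain, family)
```

For example, a header whose Output Gain field holds 5120 (20 dB in Q7.8) yields a `gain_factor()` of 10.0, while 0 leaves samples unchanged.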
2. The Metadata Header (OpusTags)
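The second packet in the stream must be the Comment Header, known as OpusTags. It stores metadata using the Vorbis comment format. It may span several pages, but it must end the page on which it completes, so audio data always begins on a fresh page. It includes: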
- Magic Signature: The packet starts with the 8-octet ASCII string "OpusTags".
- Vendor String: A UTF-8 string, preceded by its length as a 32-bit little-endian integer, identifying the encoder software used (e.g., “libopus 1.3”).
- User Comments: A 32-bit little-endian count followed by that many length-prefixed UTF-8 strings of the form FIELD=value, carrying metadata such as TITLE, ARTIST, ALBUM, and DATE.
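A minimal sketch of reading OpusTags, assuming the Vorbis comment layout described above (32-bit little-endian lengths and counts, UTF-8 text). `parse_opus_tags` is a hypothetical name. Field names are upper-cased because Vorbis comment field names are case-insensitive, and entries without an "=" are skipped rather than rejected.

```python
import struct


def parse_opus_tags(packet: bytes):
    """Return (vendor, [(FIELD, value), ...]) from an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("missing OpusTags signature")
    pos = 8

    def read_string() -> str:
        # Each string is a 32-bit little-endian length followed by UTF-8 bytes.
        nonlocal pos
        (length,) = struct.unpack_from("<I", packet, pos)
        pos += 4
        if pos + length > len(packet):
            raise ValueError("string length runs past end of packet")
        text = packet[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = []
    for _ in range(count):
        field, sep, value = read_string().partition("=")
        if sep:  # skip malformed entries that lack '='
            comments.append((field.upper(), value))
    return vendor, comments
```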
3. Audio Data Packets and Framing
Following the OpusHead and OpusTags packets, all subsequent packets in the logical Ogg stream contain raw, compressed Opus audio data.
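Each Opus audio packet is stored as exactly one Ogg packet. A page's segment table records every packet as a run of lacing values, and a page holds at most 255 lacing values and 65,025 bytes of packet data, for a maximum page size of 65,307 bytes including its 27-byte header. Because Opus packets are typically only a few dozen to a few hundred bytes, many of them are usually grouped onto a single Ogg page so that the per-page header overhead is shared.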
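The page-size limits follow from Ogg's lacing scheme: each packet is written to the segment table as a run of 255s terminated by a value below 255. The sketch below, with hypothetical helpers `lacing_values` and `group_packets`, computes those values and greedily groups whole packets onto pages. It is a simplification: real muxers can split a packet across pages using the continuation flag, and may also cap how much audio a page holds to limit latency, neither of which is modeled here.

```python
MAX_SEGMENTS = 255
# 27-byte header + full segment table + 255 segments of 255 bytes each.
MAX_PAGE_SIZE = 27 + MAX_SEGMENTS + MAX_SEGMENTS * 255


def lacing_values(packet_len: int) -> list:
    """Segment-table entries for one packet: a run of 255s, then a value < 255.

    A packet whose length is an exact multiple of 255 ends with a 0 entry,
    which is how a reader knows the packet stops there.
    """
    return [255] * (packet_len // 255) + [packet_len % 255]


def group_packets(packet_lengths):
    """Greedily place whole packets on pages of at most 255 lacing values.

    Returns a list of pages, each a list of the packet lengths it holds.
    """
    pages, current, used = [], [], 0
    for n in packet_lengths:
        needed = len(lacing_values(n))
        if needed > MAX_SEGMENTS:
            raise ValueError("packet too large for this simplified grouper")
        if used + needed > MAX_SEGMENTS:
            pages.append(current)
            current, used = [], 0
        current.append(n)
        used += needed
    if current:
        pages.append(current)
    return pages
```

With packets under 255 bytes, each packet costs a single lacing byte, so up to 255 of them can share one page header.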
Granule Position and Timing
For synchronization and seeking, the Ogg container relies on a 64-bit field in each page header called the granule position (granulepos).
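In an Ogg Opus stream, the granule position is measured in PCM samples at a fixed rate of 48,000 samples per second, regardless of the original input sample rate or the rate at which the decoder actually outputs audio. The granule position of a page is the total number of samples that decoding yields up to and including the last packet completed on that page, counted before the pre-skip is discarded; the playback time at the end of the page is therefore (granulepos - pre-skip) / 48,000 seconds. A page on which no packet completes carries a granule position of -1. Decoders use this value to calculate elapsed time, perform precise seeking, and synchronize audio with video tracks in multiplexed files.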
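Converting between granule positions and playback time needs only the pre-skip value from OpusHead. A minimal sketch with hypothetical helpers `playback_seconds` and `seek_granule`, assuming the 48 kHz convention described above. An actual seek would still have to search for the page containing the target granule position, and RFC 7845 recommends starting decoding at least 80 ms before the target so the decoder output has converged.

```python
SAMPLE_RATE = 48_000  # Ogg Opus granule positions always count 48 kHz samples


def playback_seconds(granule_position: int, pre_skip: int) -> float:
    """Playback time at the end of a page, after removing the pre-skip."""
    if granule_position < 0:
        raise ValueError("no packet completes on this page (granule position -1)")
    # Clamp to zero: positions inside the pre-skip region map to the start.
    return max(granule_position - pre_skip, 0) / SAMPLE_RATE


def seek_granule(seconds: float, pre_skip: int) -> int:
    """Granule position corresponding to a target playback time."""
    return round(seconds * SAMPLE_RATE) + pre_skip
```

For a stream with a pre-skip of 312, a page with granule position 48,312 ends exactly one second into playback.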