Integrating Opus Audio into the Ogg Container

This article explains how the high-quality, low-latency Opus audio codec is encapsulated within the Ogg container format. It explores the technical specifications of this integration, including how Opus audio packets are mapped to Ogg pages, the structure of the mandatory identification and comment headers, and how playback systems interpret the combined stream.

The Encapsulation Standard (RFC 7845)

The integration of the Opus audio format into the Ogg container is standardized by the Internet Engineering Task Force (IETF) in RFC 7845. This specification defines how Opus audio data is structured, packetized, and multiplexed within an Ogg bitstream.

In an Ogg stream, the Opus data is treated as a “logical bitstream.” This stream consists of sequential Ogg pages, which contain packets of data. For Opus audio, the integration requires a specific sequence of packets, beginning with two mandatory header packets, followed by the actual audio data packets.
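To make the page structure concrete, here is a minimal sketch of reading one Ogg page header. It assumes the layout defined by the Ogg specification (RFC 3533): a fixed 27-byte header followed by a segment table. The function name parse_page_header and the returned dictionary keys are illustrative choices, not part of any library, and the sketch does not verify the page checksum.

```python
import struct

def parse_page_header(data: bytes, offset: int = 0) -> dict:
    """Parse the 27-byte Ogg page header and its segment table (no CRC check)."""
    if len(data) - offset < 27:
        raise ValueError("truncated page header")
    # Fixed header: capture pattern, version, flags, granule position,
    # stream serial number, page sequence number, CRC, segment count.
    (capture, version, header_type, granule_position,
     serial, sequence, _crc, n_segments) = struct.unpack_from("<4sBBqIIIB", data, offset)
    if capture != b"OggS":
        raise ValueError("missing OggS capture pattern")
    table_start = offset + 27
    segment_table = data[table_start:table_start + n_segments]
    if len(segment_table) != n_segments:
        raise ValueError("truncated segment table")
    return {
        "version": version,
        "continued": bool(header_type & 0x01),  # first packet continues from the previous page
        "bos": bool(header_type & 0x02),        # beginning of logical stream
        "eos": bool(header_type & 0x04),        # end of logical stream
        "granule_position": granule_position,
        "serial": serial,
        "sequence": sequence,
        "header_size": 27 + n_segments,
        "payload_size": sum(segment_table),     # bytes of packet data on this page
    }
```

The serial number is what distinguishes one logical bitstream from another when several are multiplexed in the same file.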

1. The Identification Header (OpusHead)

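The very first packet in an Ogg Opus stream must be the Identification Header, which begins with the 8-byte magic signature “OpusHead”. The decoder needs this packet to initialize playback. It carries the essential configuration parameters, with multi-byte fields stored in little-endian order: a version number, the output channel count, the pre-skip (the number of samples, at 48 kHz, to discard from the start of the decoded output), the original input sample rate (informational only), the output gain in dB as a signed Q7.8 fixed-point value, and the channel mapping family, followed by a channel mapping table when the family is nonzero.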
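As an illustration, the fixed part of the Identification Header can be decoded in a few lines. This is a sketch based on the field layout in RFC 7845, section 5.1 (signature, version, channel count, pre-skip, input sample rate, output gain, mapping family). The function name parse_opus_head and the dictionary keys are hypothetical, and the optional channel mapping table is deliberately not parsed.

```python
import struct

def parse_opus_head(packet: bytes) -> dict:
    """Parse the fixed 19-byte portion of an OpusHead packet."""
    if len(packet) < 19 or packet[:8] != b"OpusHead":
        raise ValueError("not an OpusHead packet")
    # All multi-byte fields are little-endian; output gain is signed.
    version, channels, pre_skip, input_rate, gain_q8, mapping_family = (
        struct.unpack_from("<BBHIhB", packet, 8)
    )
    # The upper four bits hold the major version; only major version 0 is handled here.
    if version & 0xF0:
        raise ValueError("unsupported OpusHead major version")
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,                # samples at 48 kHz to drop at start
        "input_sample_rate": input_rate,     # informational only
        "output_gain_db": gain_q8 / 256.0,   # Q7.8 fixed point to dB
        "mapping_family": mapping_family,
    }

# Usage with a hand-built stereo header (pre-skip of 312 samples, no gain).
example = b"OpusHead" + struct.pack("<BBHIhB", 1, 2, 312, 48000, 0, 0)
print(parse_opus_head(example)["pre_skip"])  # 312
```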

2. The Metadata Header (OpusTags)

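The second packet in the stream must be the Comment Header, which begins with the 8-byte magic signature “OpusTags”. This packet stores metadata using the Vorbis comment format. It includes a length-prefixed vendor string identifying the encoder, a count of user comments, and the comments themselves, each stored as a length-prefixed UTF-8 string of the form NAME=value (for example, a TITLE or ARTIST field).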
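The Vorbis comment layout is simple enough to read by hand. The sketch below assumes the structure described in RFC 7845, section 5.2: the “OpusTags” signature, a 32-bit little-endian vendor string length and the vendor string, a 32-bit comment count, then one length-prefixed UTF-8 string per comment. The names parse_opus_tags and _read_string are hypothetical helpers, and any padding after the last comment is ignored.

```python
import struct

def _read_string(packet: bytes, pos: int) -> tuple[str, int]:
    """Read a 32-bit little-endian length followed by that many UTF-8 bytes."""
    (length,) = struct.unpack_from("<I", packet, pos)
    start, end = pos + 4, pos + 4 + length
    if end > len(packet):
        raise ValueError("string runs past end of packet")
    return packet[start:end].decode("utf-8"), end

def parse_opus_tags(packet: bytes) -> tuple[str, list[str]]:
    """Return (vendor, comments) from an OpusTags packet."""
    if packet[:8] != b"OpusTags":
        raise ValueError("not an OpusTags packet")
    vendor, pos = _read_string(packet, 8)
    (count,) = struct.unpack_from("<I", packet, pos)
    pos += 4
    comments = []
    for _ in range(count):
        text, pos = _read_string(packet, pos)
        comments.append(text)  # each entry has the form NAME=value
    return vendor, comments
```

Field names are case-insensitive in the Vorbis comment convention, so a caller that looks up tags would normally split each entry on the first “=” and compare the name without regard to case.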

3. Audio Data Packets and Framing

Following the OpusHead and OpusTags packets, all subsequent packets in the logical Ogg stream contain raw, compressed Opus audio data.

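Opus audio packets are mapped directly into Ogg packets: each Ogg packet carries exactly one Opus packet, which may itself contain one or more Opus frames. A single Ogg page can hold up to 255 segments of at most 255 bytes each, or roughly 65 KB of payload. Because Opus packets are typically only a few dozen to a few hundred bytes, multiple packets are usually grouped together into a single Ogg page to minimize header overhead.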
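The grouping follows from how Ogg records packet boundaries. Each packet is split into 255-byte segments, and the page header lists one “lacing value” per segment; a value below 255 marks the end of a packet. The sketch below shows that arithmetic and a simple greedy grouping of whole packets into pages. The function names are hypothetical, and the grouping policy is an assumption for illustration: it does not split a packet across pages, and real muxers often close pages earlier to bound latency.

```python
MAX_SEGMENTS_PER_PAGE = 255

def lacing_values(packet_len: int) -> list[int]:
    """Segment sizes for one packet: runs of 255, then a final value below 255."""
    return [255] * (packet_len // 255) + [packet_len % 255]

def group_packets(packet_lens: list[int]) -> list[list[int]]:
    """Greedily assign whole packets to pages of at most 255 segments each."""
    pages, current, used = [], [], 0
    for n in packet_lens:
        need = len(lacing_values(n))
        if need > MAX_SEGMENTS_PER_PAGE:
            # Such a packet would have to continue onto the next page;
            # that case is outside the scope of this sketch.
            raise ValueError("packet too large for a single page")
        if used + need > MAX_SEGMENTS_PER_PAGE:
            pages.append(current)
            current, used = [], 0
        current.append(n)
        used += need
    if current:
        pages.append(current)
    return pages

print(lacing_values(600))                   # [255, 255, 90]
print(len(group_packets([120] * 300)))      # 2
```

Note that a packet whose length is an exact multiple of 255 still ends with a lacing value of 0, which is how the reader knows the packet is complete.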

Granule Position and Timing

For synchronization and seeking, the Ogg container relies on a field in the page header called the granule position (granulepos).

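In an Ogg Opus stream, the granule position counts PCM samples at a fixed rate of 48,000 samples per second, regardless of the original input sample rate. The granule position of an Ogg page is the cumulative number of samples that can be decoded from all packets completed on that page, including the pre-skip samples declared in OpusHead. For a stream that starts at zero, the playback time at the end of a page is therefore the granule position minus the pre-skip, divided by 48,000. Decoders use this value to calculate elapsed time, perform precise seeking, and synchronize audio with video tracks in multiplexed files.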
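As a worked example of the timing arithmetic, the sketch below converts a page's granule position into a playback time. It assumes a stream whose timeline starts at zero and takes the pre-skip value from the OpusHead packet, following the rule in RFC 7845 that the PCM sample position is the granule position minus the pre-skip. The function name playback_seconds is illustrative.

```python
OPUS_GRANULE_RATE = 48_000  # granule positions always count 48 kHz samples

def playback_seconds(granule_position: int, pre_skip: int) -> float:
    """Playback time at the end of a page, in seconds."""
    samples = granule_position - pre_skip
    # Pages that end inside the pre-skip region map to time zero.
    return max(samples, 0) / OPUS_GRANULE_RATE

# A page ending at granule position 48,312 in a stream with a pre-skip of 312
# ends exactly one second into playback.
print(playback_seconds(48_312, 312))  # 1.0
```

A 20 ms Opus packet always advances the granule position by 960, whatever sample rate the original recording used.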