How does libaom parse the AV1 sequence header OBU?
The AOMedia Video 1 (AV1) bitstream is structured into Open Bitstream
Units (OBUs), with the Sequence Header OBU serving as the foundational
block that defines global coding parameters such as profile, level,
tier, and color configuration. In libaom, the reference
software implementation for AV1, parsing this header is
primarily handled by the internal function
read_sequence_header_obu in av1/decoder/obu.c. This article explores the
code path, data structures, and bitstream reading mechanisms
libaom employs to decode the sequence header and initialize
the decoder state.
The Entry Point: Recognizing the OBU Type
Before parsing the sequence header itself, libaom
processes the OBU header, which is present at the start of every OBU.
The decoder extracts obu_type, a 4-bit field that follows the
forbidden bit in the first byte, with a shift and a mask. When
obu_type matches OBU_SEQUENCE_HEADER (value
1), the decoder routes the payload to the dedicated sequence
header parsing path. At this stage, the driver wraps the payload in a
struct aom_read_bit_buffer, which reads uncompressed header bits
sequentially, most significant bit first. The aom_reader type, by
contrast, is the arithmetic decoder used for entropy-coded tile data
and plays no part in header parsing.
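As a rough illustration of that routing step, the sketch below decodes the first OBU header byte by hand. The struct, enum, and function names are hypothetical and are not libaom's own; only the field layout (forbidden bit, 4-bit obu_type, extension flag, has-size flag, reserved bit) is taken from the AV1 bitstream syntax.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; value 1 is OBU_SEQUENCE_HEADER. */
enum { SKETCH_OBU_SEQUENCE_HEADER = 1 };

typedef struct {
  int forbidden_bit;   /* must be 0 in a valid stream */
  int type;            /* obu_type, 4 bits */
  int has_extension;   /* obu_extension_flag */
  int has_size_field;  /* obu_has_size_field */
} ObuHeaderSketch;

/* Split the first OBU header byte into its fields (MSB first):
 * forbidden(1) | obu_type(4) | extension(1) | has_size(1) | reserved(1). */
static ObuHeaderSketch parse_obu_header_byte(uint8_t b) {
  ObuHeaderSketch h;
  h.forbidden_bit = (b >> 7) & 1;
  h.type = (b >> 3) & 0xF;
  h.has_extension = (b >> 2) & 1;
  h.has_size_field = (b >> 1) & 1;
  return h;
}
```

For example, a first byte of 0x0A decodes to obu_type 1 (OBU_SEQUENCE_HEADER) with obu_has_size_field set and no extension byte.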
Core Architecture and read_sequence_header_obu
Most of the work is done by read_sequence_header_obu, a static
function in av1/decoder/obu.c, together with helpers in
av1/decoder/decodeframe.c such as av1_read_sequence_header.
These functions map the raw bitstream syntax elements onto the
internal C structure SequenceHeader.
The parsing process follows a strict sequential order defined by the AV1 specification:
- Profile and still-picture flags: The first bits carry seq_profile (3 bits), still_picture (1 bit), and reduced_still_picture_header (1 bit). All three are read unconditionally; a set reduced_still_picture_header is only valid when still_picture is also set.
- Decoder Operating Points: When the reduced header is not in use, the decoder reads operating_points_cnt_minus_1 (5 bits) to determine how many operating points are present, then loops to parse operating_point_idc, seq_level_idx, and (for levels above 7) seq_tier for each one. With the reduced header, a single seq_level_idx is read instead.
- Frame Geometry Parameters: The function reads frame_width_bits_minus_1 and frame_height_bits_minus_1 (4 bits each), then max_frame_width_minus_1 and max_frame_height_minus_1 using the number of bits just signaled. These values bound the frame dimensions and therefore the memory needed for reference frames.
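The ordering in the list above can be sketched as a small standalone parser. This is a minimal sketch under stated assumptions, not libaom's code: BitReader, read_bits, SeqHeaderSketch, and parse_seq_header_prefix are hypothetical names, only the first operating point is kept, and streams that set timing_info_present_flag are rejected rather than parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal MSB-first bit reader (a stand-in, not libaom's own type). */
typedef struct {
  const uint8_t *data;
  size_t size_bits;
  size_t pos; /* index of the next bit to read */
  int error;  /* set when a read runs past the end */
} BitReader;

static uint32_t read_bits(BitReader *br, int n) {
  assert(n >= 0 && n <= 32);
  uint32_t v = 0;
  for (int i = 0; i < n; i++) {
    if (br->pos >= br->size_bits) { br->error = 1; return 0; }
    v = (v << 1) | ((br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1u);
    br->pos++;
  }
  return v;
}

typedef struct {
  int seq_profile, still_picture, reduced_still_picture_header;
  int seq_level_idx0, seq_tier0; /* first operating point only */
  int max_frame_width, max_frame_height;
} SeqHeaderSketch;

/* Returns 0 on success, -1 on error or on input this sketch does not handle. */
static int parse_seq_header_prefix(BitReader *br, SeqHeaderSketch *sh) {
  sh->seq_profile = (int)read_bits(br, 3);
  sh->still_picture = (int)read_bits(br, 1);
  sh->reduced_still_picture_header = (int)read_bits(br, 1);
  /* A reduced header is only legal for still pictures. */
  if (sh->reduced_still_picture_header && !sh->still_picture) return -1;
  sh->seq_level_idx0 = 0;
  sh->seq_tier0 = 0;
  if (sh->reduced_still_picture_header) {
    sh->seq_level_idx0 = (int)read_bits(br, 5);
  } else {
    if (read_bits(br, 1)) return -1; /* timing_info_present_flag: not handled */
    int delay_present = (int)read_bits(br, 1);
    int cnt = (int)read_bits(br, 5) + 1; /* operating_points_cnt_minus_1 + 1 */
    for (int i = 0; i < cnt; i++) {
      (void)read_bits(br, 12); /* operating_point_idc */
      int level = (int)read_bits(br, 5);
      int tier = level > 7 ? (int)read_bits(br, 1) : 0;
      /* Optional per-point initial display delay: flag, then 4 bits. */
      if (delay_present && read_bits(br, 1)) (void)read_bits(br, 4);
      if (i == 0) { sh->seq_level_idx0 = level; sh->seq_tier0 = tier; }
    }
  }
  int w_bits = (int)read_bits(br, 4) + 1; /* frame_width_bits_minus_1 + 1 */
  int h_bits = (int)read_bits(br, 4) + 1; /* frame_height_bits_minus_1 + 1 */
  sh->max_frame_width = (int)read_bits(br, w_bits) + 1;
  sh->max_frame_height = (int)read_bits(br, h_bits) + 1;
  return br->error ? -1 : 0;
}
```

The sketch stops after the frame dimensions, so the frame-ID fields and the tool flags that follow in the real header are left unread. Collecting read errors in the reader and checking them once at the end keeps the parsing code linear.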
Color Config and Tool Enablement Flags
The remaining fields fall into two groups. A helper function,
av1_read_color_config in libaom, parses the chroma and luma
parameters: bit depth (8, 10, or 12 bits), color primaries, transfer
characteristics, matrix coefficients, chroma subsampling, and chroma
sample position. In bitstream order this block sits near the end of the
header, after the tool flags described next.
Ahead of the color configuration, libaom parses a run of
mostly single-bit flags that globally enable or disable coding tools
for the entire video sequence. Several are conditional: the
inter-prediction flags are read only when reduced_still_picture_header
is 0, and enable_jnt_comp and enable_ref_frame_mvs only when
enable_order_hint is set. These include:
- enable_filter_intra and enable_intra_edge_filter
- enable_interintra_compound and enable_masked_compound
- enable_dual_filter and enable_order_hint
- enable_jnt_comp (Joint Compound) and enable_ref_frame_mvs
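How these flags depend on one another can be sketched as below. This is an illustrative reading of the AV1 syntax rather than libaom's implementation: BitReader, read_bit, ToolFlagsSketch, and read_tool_flags are hypothetical names, and the sketch starts at use_128x128_superblock and stops after enable_ref_frame_mvs, skipping the fields around that run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal MSB-first bit reader (a stand-in, not libaom's own type). */
typedef struct {
  const uint8_t *data;
  size_t size_bits;
  size_t pos; /* index of the next bit to read */
  int error;  /* set when a read runs past the end */
} BitReader;

static int read_bit(BitReader *br) {
  if (br->pos >= br->size_bits) { br->error = 1; return 0; }
  int bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
  br->pos++;
  return bit;
}

typedef struct {
  int use_128x128_superblock;
  int enable_filter_intra, enable_intra_edge_filter;
  int enable_interintra_compound, enable_masked_compound;
  int enable_warped_motion, enable_dual_filter;
  int enable_order_hint, enable_jnt_comp, enable_ref_frame_mvs;
} ToolFlagsSketch;

static void read_tool_flags(BitReader *br, int reduced_still_picture_header,
                            ToolFlagsSketch *t) {
  assert(br != NULL && t != NULL);
  memset(t, 0, sizeof(*t)); /* flags that are not read are inferred as 0 */
  t->use_128x128_superblock = read_bit(br);
  t->enable_filter_intra = read_bit(br);
  t->enable_intra_edge_filter = read_bit(br);
  /* Still pictures with the reduced header carry no inter-prediction flags. */
  if (reduced_still_picture_header) return;
  t->enable_interintra_compound = read_bit(br);
  t->enable_masked_compound = read_bit(br);
  t->enable_warped_motion = read_bit(br);
  t->enable_dual_filter = read_bit(br);
  t->enable_order_hint = read_bit(br);
  /* These two tools depend on order hints, so they are only signaled
   * when enable_order_hint is set. */
  if (t->enable_order_hint) {
    t->enable_jnt_comp = read_bit(br);
    t->enable_ref_frame_mvs = read_bit(br);
  }
}
```

Zeroing the struct first means every flag the stream does not carry ends up disabled, which matches the inference rule the syntax applies to unread flags.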
By recording these flags, libaom knows which
syntax elements the decoder should expect, or skip entirely, when it
parses subsequent frame headers and tile groups.
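For the color configuration, the way bit depth and default chroma subsampling follow from the profile can be sketched with two small pure functions. The function names are hypothetical, and the sketch deliberately ignores the monochrome and sRGB/identity-matrix special cases that the full color_config syntax also handles.

```c
#include <assert.h>

/* Bit depth from seq_profile and the high_bitdepth / twelve_bit flags.
 * twelve_bit is only present in the stream for profile 2 with
 * high_bitdepth set; pass 0 otherwise. */
static int derive_bit_depth(int seq_profile, int high_bitdepth,
                            int twelve_bit) {
  if (seq_profile == 2 && high_bitdepth) return twelve_bit ? 12 : 10;
  return high_bitdepth ? 10 : 8;
}

/* Chroma subsampling implied by the profile for ordinary color content.
 * Returns 0 and fills ss_x/ss_y when the profile fixes the layout.
 * Returns 1, leaving ss_x/ss_y untouched, when the stream signals the
 * layout explicitly (profile 2 at 12 bits). */
static int profile_subsampling(int seq_profile, int bit_depth,
                               int *ss_x, int *ss_y) {
  if (seq_profile == 0) { *ss_x = 1; *ss_y = 1; return 0; } /* 4:2:0 */
  if (seq_profile == 1) { *ss_x = 0; *ss_y = 0; return 0; } /* 4:4:4 */
  if (bit_depth == 12) return 1; /* signaled explicitly in the stream */
  *ss_x = 1; *ss_y = 0;          /* 4:2:2 */
  return 0;
}
```

Keeping these as pure functions separates the derivation rules from bit reading, which makes them easy to check against the specification tables.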
Finalizing the Sequence State
The parsing function concludes by checking the trailing bits, a
single 1 bit followed by zero bits up to the byte boundary, to confirm
that the header ended where expected. Once
read_sequence_header_obu returns successfully, the
populated SequenceHeader structure is cached inside the
main decoder context (AV1Decoder). This data
acts as the baseline for the decoder until a new Sequence
Header OBU is encountered in the stream, so that all subsequent
frame buffers and decoding loops respect the declared profile, level,
and maximum frame dimensions.
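The trailing-bits check can be sketched as follows, assuming the caller tracks the bit position at which header parsing stopped. The function name is hypothetical; the check follows the syntax rule that a single 1 bit is followed by zeros, and it verifies the pattern only up to the next byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the bits from bit_pos to the end of the current byte are
 * a single 1 followed only by 0s, and 0 otherwise (including when
 * bit_pos lies past the end of the buffer). */
static int trailing_bits_ok(const uint8_t *data, size_t size_bytes,
                            size_t bit_pos) {
  size_t byte = bit_pos >> 3;
  if (byte >= size_bytes) return 0;
  int bits_left = 8 - (int)(bit_pos & 7); /* 1..8 bits remain in the byte */
  uint8_t mask = (uint8_t)((1u << bits_left) - 1u);
  uint8_t tail = (uint8_t)(data[byte] & mask);
  /* Expected pattern: the highest remaining bit set, the rest clear. */
  return tail == (uint8_t)(1u << (bits_left - 1));
}
```

If parsing stops exactly on a byte boundary, the check expects a full 0x80 byte, since the trailing 1 bit is always present.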