What Internal Data Structures Does libaom Use for Video Frames?

The libaom library, the open-source reference encoder and decoder for the AV1 video codec developed by the Alliance for Open Media, relies on a sophisticated hierarchy of internal data structures to manage, manipulate, and optimize video frame data during compression and decompression. Understanding these structures is crucial for developers looking to optimize video processing pipelines, contribute to the codec, or integrate AV1 encoding into their software. This article explores the primary structures within the libaom source code, including aom_image_t, YV12_BUFFER_CONFIG, and the macroblock/coding block representations that drive the encoder’s decision-making process.

The External and API Layer: aom_image_t

At the public API layer, libaom interacts with external applications using the aom_image_t structure. This structure is defined in aom/aom_image.h and serves as the primary wrapper for passing raw input frames into the encoder or receiving decoded frames from the decoder.

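Key fields within aom_image_t include fmt (the pixel format, such as AOM_IMG_FMT_I420), w and h (the stored width and height), d_w and d_h (the displayed width and height), bit_depth, x_chroma_shift and y_chroma_shift (the chroma subsampling factors), and the planes and stride arrays, which hold a base pointer and a row stride in bytes for each of the Y, U, and V planes.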
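The plane-and-stride layout that a structure like aom_image_t describes can be sketched with a simplified stand-in. The sketch_image type and its helper functions below are hypothetical and are not part of libaom. They assume 8-bit 4:2:0 data and an arbitrary 16-byte row alignment, and only illustrate how chroma dimensions follow from the subsampling shifts and how a sample is addressed through a stride.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a planar 8-bit image; NOT the real aom_image_t. */
typedef struct {
  unsigned int w, h;            /* luma width and height in pixels */
  unsigned int x_chroma_shift;  /* 1 if chroma is halved horizontally */
  unsigned int y_chroma_shift;  /* 1 if chroma is halved vertically */
  uint8_t *planes[3];           /* base pointers for Y, U, V */
  int stride[3];                /* bytes between consecutive rows */
} sketch_image;

/* Width of a plane: chroma planes round up after subsampling. */
static unsigned int sketch_plane_width(const sketch_image *img, int plane) {
  if (plane == 0) return img->w;
  return (img->w + img->x_chroma_shift) >> img->x_chroma_shift;
}

/* Height of a plane, with the same rounding rule. */
static unsigned int sketch_plane_height(const sketch_image *img, int plane) {
  if (plane == 0) return img->h;
  return (img->h + img->y_chroma_shift) >> img->y_chroma_shift;
}

/* Releases all planes; safe to call on partially allocated images. */
static void sketch_image_free(sketch_image *img) {
  for (int p = 0; p < 3; ++p) {
    free(img->planes[p]);
    img->planes[p] = NULL;
  }
}

/* Allocates a zeroed 4:2:0 image. Returns 0 on success, -1 on failure. */
static int sketch_image_alloc_420(sketch_image *img, unsigned int w,
                                  unsigned int h) {
  assert(w > 0 && h > 0);
  img->w = w;
  img->h = h;
  img->x_chroma_shift = 1;
  img->y_chroma_shift = 1;
  for (int p = 0; p < 3; ++p) img->planes[p] = NULL;
  for (int p = 0; p < 3; ++p) {
    unsigned int pw = sketch_plane_width(img, p);
    unsigned int ph = sketch_plane_height(img, p);
    /* Assumed alignment: pad each row up to a multiple of 16 bytes. */
    img->stride[p] = (int)((pw + 15u) & ~15u);
    img->planes[p] = calloc((size_t)img->stride[p] * ph, 1);
    if (img->planes[p] == NULL) {
      sketch_image_free(img);
      return -1;
    }
  }
  return 0;
}

/* Address of the sample at (x, y) in the given plane. */
static uint8_t *sketch_pixel(const sketch_image *img, int plane,
                             unsigned int x, unsigned int y) {
  assert(x < sketch_plane_width(img, plane));
  assert(y < sketch_plane_height(img, plane));
  return img->planes[plane] + (size_t)y * (size_t)img->stride[plane] + x;
}
```

The point of the sketch is that the stride, not the width, determines where the next row starts, so code that walks a frame should always step by the stride.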

The Internal Frame Buffer: YV12_BUFFER_CONFIG

While aom_image_t is the interface for the outside world, libaom’s internal core operates heavily on YV12_BUFFER_CONFIG. This structure, defined in aom_scale/yv12config.h, manages the actual allocated memory buffers used for reference frames, motion estimation, and filtering.

Unlike standard image containers, YV12_BUFFER_CONFIG includes extensive padding around the actual frame boundaries. This padding, often referred to as “border pixels,” is filled by replicating the edge pixels outward, which allows motion compensation to fetch pixels outside the frame boundaries without reading out of bounds. For each of the luma (Y) and chroma (U/V) components, the structure tracks the cropped (visible) width and height, the aligned width and height used for allocation, the row stride, and a pointer to the first visible pixel, along with the border size and the underlying memory allocation.
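The border idea can be illustrated with a minimal sketch. The sketch_plane type and its functions are hypothetical and do not reproduce libaom's allocation or extension routines. They assume a single 8-bit plane with the same border on all four sides, and show how the origin pointer is offset into the allocation and how edge pixels are replicated into the padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical single-plane buffer with a border; NOT YV12_BUFFER_CONFIG. */
typedef struct {
  int width, height; /* visible size in pixels */
  int border;        /* padding in pixels on every side */
  int stride;        /* bytes per padded row: width + 2 * border */
  uint8_t *alloc;    /* start of the whole allocation */
  uint8_t *origin;   /* address of visible pixel (0, 0) */
} sketch_plane;

/* Allocates a zeroed padded plane. Returns 0 on success, -1 on failure. */
static int sketch_plane_alloc(sketch_plane *p, int width, int height,
                              int border) {
  assert(width > 0 && height > 0 && border >= 0);
  p->width = width;
  p->height = height;
  p->border = border;
  p->stride = width + 2 * border;
  p->alloc = calloc((size_t)p->stride * (size_t)(height + 2 * border), 1);
  if (p->alloc == NULL) return -1;
  /* Skip `border` rows and `border` columns to reach the visible area. */
  p->origin = p->alloc + (size_t)border * (size_t)p->stride + border;
  return 0;
}

static void sketch_plane_free(sketch_plane *p) {
  free(p->alloc);
  p->alloc = NULL;
  p->origin = NULL;
}

/* Replicates edge pixels into the border on all four sides. */
static void sketch_extend_borders(sketch_plane *p) {
  /* Left and right: repeat the first and last pixel of each visible row. */
  for (int y = 0; y < p->height; ++y) {
    uint8_t *row = p->origin + (ptrdiff_t)y * p->stride;
    memset(row - p->border, row[0], (size_t)p->border);
    memset(row + p->width, row[p->width - 1], (size_t)p->border);
  }
  /* Top and bottom: copy the full padded first and last rows outward. */
  uint8_t *top = p->origin - p->border;
  uint8_t *bottom = top + (ptrdiff_t)(p->height - 1) * p->stride;
  for (int i = 1; i <= p->border; ++i) {
    memcpy(top - (ptrdiff_t)i * p->stride, top, (size_t)p->stride);
    memcpy(bottom + (ptrdiff_t)i * p->stride, bottom, (size_t)p->stride);
  }
}

/* Reads a pixel; coordinates may reach into the border. */
static uint8_t sketch_plane_at(const sketch_plane *p, int x, int y) {
  assert(x >= -p->border && x < p->width + p->border);
  assert(y >= -p->border && y < p->height + p->border);
  return p->origin[(ptrdiff_t)y * p->stride + x];
}
```

After sketch_extend_borders runs, a read at a slightly negative coordinate returns the nearest edge pixel, which is the behavior a motion search relies on when a candidate block overlaps the frame boundary.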

Frame Processing and Decision Structures: AV1_COMP

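For the encoder specifically, the top-level state is maintained in a large structure called AV1_COMP (defined in av1/encoder/encoder.h), which holds the encoder configuration, the rate-control state, and the codec state shared with the decoder in AV1_COMMON. Within this context, several structures manage how a frame is broken down for processing, notably MACROBLOCK, which carries the encoder's per-block working state, MACROBLOCKD, the block descriptor shared with the decoder, and MB_MODE_INFO, which records the prediction mode, reference frames, and motion vectors chosen for each coding block.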

Buffer Management: BufferPool and RefCntBuffer

Because AV1 uses complex prediction structures where frames can reference multiple past and future frames, libaom utilizes an internal frame buffer pool. The BufferPool structure manages an array of RefCntBuffer elements.

Each RefCntBuffer wraps a YV12_BUFFER_CONFIG along with a reference count. When a frame is designated as a reference frame (e.g., a Golden or AltRef frame), its reference count is incremented. The buffer is released back into the pool for reuse only when the count drops to zero, that is, once the encoder or decoder no longer requires it for temporal prediction. This avoids repeated allocation and deallocation of frame memory during encoding or playback.
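The reference-counting scheme can be sketched in a few lines. The sketch_pool and sketch_ref_buf types below are hypothetical simplifications and not libaom's BufferPool or RefCntBuffer. They assume a fixed pool of four slots, a plain integer standing in for the frame payload, and no locking, which a real multi-threaded codec would need.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_POOL_SIZE 4 /* assumed pool size for illustration */

/* Hypothetical reference-counted slot; NOT the real RefCntBuffer. */
typedef struct {
  int ref_count; /* 0 means the slot is free for reuse */
  int frame_id;  /* stand-in for the frame buffer payload */
} sketch_ref_buf;

/* Hypothetical fixed-size pool; NOT the real BufferPool. */
typedef struct {
  sketch_ref_buf bufs[SKETCH_POOL_SIZE];
} sketch_pool;

/* Marks every slot as free. */
static void sketch_pool_init(sketch_pool *pool) {
  for (int i = 0; i < SKETCH_POOL_SIZE; ++i) {
    pool->bufs[i].ref_count = 0;
    pool->bufs[i].frame_id = -1;
  }
}

/* Claims the first free slot with one reference, or returns NULL if full. */
static sketch_ref_buf *sketch_pool_acquire(sketch_pool *pool, int frame_id) {
  for (int i = 0; i < SKETCH_POOL_SIZE; ++i) {
    if (pool->bufs[i].ref_count == 0) {
      pool->bufs[i].ref_count = 1;
      pool->bufs[i].frame_id = frame_id;
      return &pool->bufs[i];
    }
  }
  return NULL;
}

/* Adds a holder, e.g. when the frame is kept as a reference frame. */
static void sketch_buf_ref(sketch_ref_buf *buf) {
  assert(buf->ref_count > 0);
  ++buf->ref_count;
}

/* Drops a holder; the slot becomes reusable when the count reaches zero. */
static void sketch_buf_unref(sketch_ref_buf *buf) {
  assert(buf->ref_count > 0);
  --buf->ref_count;
}

/* Counts the slots that are currently free. */
static int sketch_pool_free_count(const sketch_pool *pool) {
  int n = 0;
  for (int i = 0; i < SKETCH_POOL_SIZE; ++i) {
    if (pool->bufs[i].ref_count == 0) ++n;
  }
  return n;
}
```

In this sketch a slot is never freed or reallocated. It simply becomes eligible for reuse once its last holder calls sketch_buf_unref, which mirrors the recycling behavior described above.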