How Does libaom Handle 8-bit vs 12-bit Video?
This article provides a technical overview of how libaom, the reference encoder and decoder library for the AV1 video coding format, processes different color bit depths, comparing standard 8-bit video with higher-precision 10-bit and 12-bit video. It covers the internal pipeline adjustments, data types, and performance trade-offs involved in raising color precision.
Unified Internal Pipeline and Data Types
The libaom encoder handles multiple bit depths through a single processing pipeline whose sample type and arithmetic depend on the configured depth. While 8-bit video represents each color sample with values from 0 to 255, 12-bit video expands the range to 0 to 4095 (10-bit video sits in between, at 0 to 1023), allowing significantly finer gradations and reducing visible color banding.
To accommodate this wide range of values without maintaining a separate codebase for each bit depth, libaom uses two sample storage types. For standard 8-bit encoding, it can keep pixel buffers in 8-bit unsigned integers (uint8_t). When it is compiled and configured for high-bit-depth (HBD) support, as needed for 10-bit or 12-bit content, the encoder stores samples in 16-bit unsigned integers (uint16_t) instead. A 12-bit sample occupies only 12 of those 16 bits, so the container holds every pixel value, while intermediate results that can outgrow 16 bits, such as sums and transform coefficients, are carried in wider integer types.
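The storage choice described above can be sketched in a few lines of C. This is a minimal illustration, not libaom code: the helper names bytes_per_sample, max_sample_value, and read_sample are hypothetical, and the sketch assumes that any depth above 8 bits uses a 16-bit container.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libaom function): bytes used to store one
 * sample, assuming depths above 8 bits live in a 16-bit container. */
static size_t bytes_per_sample(int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  return bit_depth > 8 ? sizeof(uint16_t) : sizeof(uint8_t);
}

/* Largest code value representable at the given bit depth. */
static unsigned max_sample_value(int bit_depth) {
  return (1u << bit_depth) - 1u;
}

/* Hypothetical helper: read sample i from a plane whose element type
 * depends on the bit depth. The caller must pass a buffer of the
 * matching type (uint8_t for 8-bit, uint16_t otherwise). */
static unsigned read_sample(const void *plane, size_t i, int bit_depth) {
  if (bit_depth > 8) {
    return ((const uint16_t *)plane)[i];
  }
  return ((const uint8_t *)plane)[i];
}
```

With these helpers, max_sample_value(8) is 255 and max_sample_value(12) is 4095, matching the ranges given earlier, and the same calling code can walk either kind of buffer by passing the bit depth along.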
High-Bit-Depth Dynamic Scaling
When libaom is configured in high-bit-depth mode, it selects the precision of its algorithms from the config->g_bit_depth parameter, the bit depth set in the encoder configuration. The encoder uses this setting to scale the value ranges, rounding, and clamping limits of its internal arithmetic.
For instance, quantization step sizes, transform coefficient ranges, and motion compensation routines must scale with the bit depth to avoid clipping or discarding the extra bits. In a 12-bit workflow, predictions and residuals are calculated over the full 12-bit range. The encoder carries the bit depth through the loop filtering and spatial-temporal filtering stages, so that precision is maintained until the final bitstream is written in AV1-compliant form.
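A small C sketch can make the bit-depth dependence of residual and reconstruction arithmetic concrete. It is an illustration under stated assumptions rather than libaom's implementation: the function names are hypothetical, samples are assumed to sit in uint16_t buffers, and residuals are kept in int32_t so that a difference as large as plus or minus 4095 fits without wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clamp a reconstructed value to the legal range
 * [0, 2^bit_depth - 1] for the given bit depth. */
static uint16_t clamp_to_bit_depth(int32_t value, int bit_depth) {
  const int32_t max_value = (1 << bit_depth) - 1;
  if (value < 0) return 0;
  if (value > max_value) return (uint16_t)max_value;
  return (uint16_t)value;
}

/* Residual = source - prediction, stored in a signed, wider type. */
static void compute_residual(const uint16_t *src, const uint16_t *pred,
                             int32_t *residual, int n) {
  for (int i = 0; i < n; ++i) {
    residual[i] = (int32_t)src[i] - (int32_t)pred[i];
  }
}

/* Reconstruction = prediction + residual, clamped to the bit depth. */
static void reconstruct(const uint16_t *pred, const int32_t *residual,
                        uint16_t *recon, int n, int bit_depth) {
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  for (int i = 0; i < n; ++i) {
    recon[i] = clamp_to_bit_depth((int32_t)pred[i] + residual[i], bit_depth);
  }
}
```

The only bit-depth-specific step here is the clamp limit. Passing 8 instead of 12 would clip a valid 12-bit value such as 2048 down to 255, which is the kind of fidelity loss the encoder avoids by tracking the depth through every stage.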
Performance and SIMD Optimization
Processing 12-bit video costs more computation than processing 8-bit video. Because each sample moves from an 8-bit to a 16-bit container, pixel data doubles in size in memory, so memory bandwidth requirements increase and fewer pixels fit in each cache line.
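The memory cost is easy to estimate. The sketch below computes the raw size of one 4:2:0 frame at a given bit depth. The function name frame_bytes_420 is hypothetical, and the calculation deliberately ignores the stride padding, borders, and alignment that a real encoder adds to its frame buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: raw byte size of one 4:2:0 frame, where each of
 * the two chroma planes has half the luma width and half the height.
 * Assumes 1 byte per sample at 8 bits and 2 bytes per sample above. */
static size_t frame_bytes_420(size_t width, size_t height, int bit_depth) {
  assert(width % 2 == 0 && height % 2 == 0);
  assert(bit_depth == 8 || bit_depth == 10 || bit_depth == 12);
  const size_t sample_bytes = bit_depth > 8 ? sizeof(uint16_t) : sizeof(uint8_t);
  const size_t luma_samples = width * height;
  const size_t chroma_samples = 2 * (width / 2) * (height / 2);
  return (luma_samples + chroma_samples) * sample_bytes;
}
```

For a 1920x1080 frame this gives 3,110,400 bytes at 8 bits and 6,220,800 bytes at 10 or 12 bits. The 10-bit and 12-bit figures are identical because both use the same 16-bit container.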
To mitigate this performance penalty, libaom uses SIMD (Single Instruction, Multiple Data) optimizations that are enabled conditionally, depending on the build and the target CPU. The codebase contains specialized routines written for vector instruction sets (such as AVX2, AVX-512, and Arm Neon) and tailored specifically to high-bit-depth operations. When processing 12-bit video, the encoder invokes SIMD functions that operate on 16-bit lanes, handling several high-precision pixels per instruction. A vector register holds half as many 16-bit samples as 8-bit samples, so these routines narrow the speed gap between 8-bit and 12-bit encoding rather than close it.
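To show what such a kernel computes, here is a scalar C sketch of a sum of absolute differences (SAD) over 16-bit samples, together with the lane arithmetic that explains the remaining speed gap. SAD is chosen here only as a typical block-matching metric. The function names are hypothetical, and this is a plain reference loop of the kind a SIMD routine would replace, not libaom's own code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many samples fit in one vector register. */
static unsigned lanes_per_register(unsigned register_bits,
                                   unsigned sample_bits) {
  assert(sample_bits == 8 || sample_bits == 16);
  return register_bits / sample_bits;
}

/* Scalar reference: sum of absolute differences between two rows of
 * 16-bit samples. Differences are formed in int32_t so that values up
 * to 4095 (12-bit) subtract without wrapping; the running sum is
 * unsigned 32-bit. A SIMD version would process several lanes of this
 * loop per instruction. */
static uint32_t sad_highbd_scalar(const uint16_t *a, const uint16_t *b,
                                  int n) {
  uint32_t sum = 0;
  for (int i = 0; i < n; ++i) {
    const int32_t diff = (int32_t)a[i] - (int32_t)b[i];
    sum += (uint32_t)(diff < 0 ? -diff : diff);
  }
  return sum;
}
```

A 256-bit register covers 32 samples of 8-bit video per instruction but only 16 samples of high-bit-depth video, which is why vectorization helps both paths while leaving the 16-bit path with roughly half the per-instruction throughput.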