What AV1 Transform Types Does Libaom Support?
The AV1 video coding format uses a flexible set of transform blocks to compress residual data by converting the spatial-domain prediction residual into frequency coefficients. During encoding, the reference encoder, libaom, evaluates multiple transform types across a range of block sizes. This article gives an overview of the transform types implemented in libaom, including Discrete Cosine Transforms (DCT), Asymmetric Discrete Sine Transforms (ADST), and Identity transforms, and explains how the encoder chooses among them using rate-distortion optimization (RDO).
Supported Primary Transform Types
Unlike older video codecs that relied almost exclusively on the standard DCT, AV1 combines several 1D transform kernels that can be selected independently for the horizontal and vertical directions of a block. Libaom supports four such kernels:
- DCT (Discrete Cosine Transform): The traditional, standard transform type optimized for highly correlated data and smooth gradients.
- ADST (Asymmetric Discrete Sine Transform): Suited to residuals that are small next to a predicted boundary and grow with distance from it, which is common in intra-predicted blocks.
- FlipADST (Flipped ADST): The ADST with its basis functions mirrored, suited to residuals whose energy is concentrated at the opposite edge of the block.
- Identity (IDTX when applied in both directions): Skips the frequency conversion and passes the residual, apart from a scaling factor, on to quantization. This is effective for sharp edges, text, and other synthetic or screen content.
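By combining these 1D transforms vertically and horizontally, libaom can evaluate up to 16 distinct 2D transform types (such as DCT_DCT, ADST_DCT, DCT_ADST, IDTX, and V_DCT), depending on the block size and prediction mode. In the two-part names the first kernel is the vertical one; the V_ and H_ prefixes mark types that apply a kernel in one direction and the identity in the other.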
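The way four 1D kernels yield 16 2D types can be sketched in a few lines. This is an illustrative enumeration, not libaom code: the helper tx_type_name and the KERNELS tuple are hypothetical, and the naming rule assumes the convention that the first kernel in a name is the vertical one and that V_/H_ types pair a kernel with the identity.

```python
from itertools import product

# The four 1D kernels described above (labels chosen for this sketch).
KERNELS = ("DCT", "ADST", "FLIPADST", "IDENTITY")

def tx_type_name(vertical, horizontal):
    """Build a 2D transform type name from its vertical and horizontal kernels."""
    if vertical == "IDENTITY" and horizontal == "IDENTITY":
        return "IDTX"                      # identity in both directions
    if horizontal == "IDENTITY":
        return "V_" + vertical             # kernel applied vertically only
    if vertical == "IDENTITY":
        return "H_" + horizontal           # kernel applied horizontally only
    return vertical + "_" + horizontal     # e.g. ADST_DCT: ADST vertical, DCT horizontal

# Map each 2D type name to its (vertical, horizontal) kernel pair.
TX_TYPES = {tx_type_name(v, h): (v, h) for v, h in product(KERNELS, repeat=2)}

if __name__ == "__main__":
    for name, (v, h) in TX_TYPES.items():
        print(f"{name:18s} vertical={v:9s} horizontal={h}")
```

Nine of the resulting types combine two trigonometric kernels, six pair one kernel with the identity, and one is the pure identity, which accounts for all 16.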
Block Sizes and Transform Unit Hierarchy
Libaom implements these transforms across a range of square and rectangular transform sizes, from 4x4 up to 64x64 samples, with rectangular sizes using 1:2 and 1:4 aspect ratios.
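- Transforms up to 16x16 support the richest candidate sets, drawing on DCT, ADST, FlipADST, and Identity combinations. The full set of 16 types is available only to smaller inter-predicted blocks; intra-predicted blocks choose from a smaller subset.
- Transforms whose longer side is 32 samples are limited to DCT_DCT, plus IDTX for inter-predicted blocks.
- Transforms with a 64-sample side are restricted to DCT_DCT to reduce complexity, as transforms this large rarely benefit from the sine-based variants.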
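As a rough illustration of the size hierarchy, the sketch below enumerates the transform sizes and flags the ones limited to DCT_DCT. It assumes that sides are powers of two from 4 to 64 with aspect ratios of at most 4:1, and that the DCT_DCT restriction applies to any size with a 64-sample side; the function names are hypothetical and not taken from libaom.

```python
SIDES = (4, 8, 16, 32, 64)

def transform_sizes():
    """List (width, height) pairs with aspect ratio 1:1, 1:2, or 1:4."""
    return [(w, h) for w in SIDES for h in SIDES
            if max(w, h) <= 4 * min(w, h)]

def is_dct_only(width, height):
    """Assumption: any transform with a 64-sample side allows only DCT_DCT."""
    return max(width, height) == 64

if __name__ == "__main__":
    for w, h in transform_sizes():
        note = "DCT_DCT only" if is_dct_only(w, h) else "multiple candidate types"
        print(f"{w}x{h}: {note}")
```

Under these assumptions the enumeration yields 19 sizes, five of which have a 64-sample side.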
How Libaom Evaluates Transforms
Evaluating all 16 transform combinations for every single block would drastically slow down encoding. To balance compression efficiency and encoding speed, libaom utilizes several optimization strategies during the evaluation phase:
- Rate-Distortion Optimization (RDO): For the most promising candidates, libaom measures or estimates the bit cost (rate, R) and the reconstruction error (distortion, D), then picks the transform with the lowest combined cost J = D + λR, where λ is a Lagrange multiplier tied to the quantizer.
- Pruning and Fast Transform Search: Based on the user’s selected speed preset (--cpu-used), libaom skips unlikely transform combinations. For example, if the spatial residual variance is extremely high or low, it may immediately prune ADST variations or skip the Identity transform evaluation entirely.
- Prediction Mode Dependency: The encoder restricts the transform candidate list based on whether a block is Inter-predicted or Intra-predicted. Intra blocks often favor ADST variants that match the prediction angle, while Inter blocks heavily favor DCT or Identity transforms.
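The selection process described in this list can be illustrated with a toy 1D model. This is a minimal sketch under stated assumptions, not libaom's search: it uses floating-point orthonormal bases (a DST-VII standing in for ADST), a uniform quantizer, a made-up rate proxy, and the cost J = D + λR. The candidates argument plays the role of a pruned candidate list; every function name and parameter here is hypothetical.

```python
import math

def dct2_basis(n):
    """Orthonormal DCT-II basis, one row per frequency."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def dst7_basis(n):
    """Orthonormal DST-VII basis, used here as a stand-in for ADST."""
    scale = 2.0 / math.sqrt(2 * n + 1)
    return [[scale * math.sin(math.pi * (2 * k + 1) * (i + 1) / (2 * n + 1))
             for i in range(n)] for k in range(n)]

def identity_basis(n):
    """Identity 'transform': coefficients equal the residual samples."""
    return [[1.0 if i == k else 0.0 for i in range(n)] for k in range(n)]

BASES = {"DCT": dct2_basis, "ADST": dst7_basis, "IDTX": identity_basis}

def rd_cost(residual, basis, qstep, lam):
    """Return J = D + lam * R for one candidate transform."""
    coeffs = [sum(b * r for b, r in zip(row, residual)) for row in basis]
    levels = [round(c / qstep) for c in coeffs]
    # The bases are orthonormal, so the squared quantization error of the
    # coefficients equals the squared error of the reconstructed samples.
    distortion = sum((c - l * qstep) ** 2 for c, l in zip(coeffs, levels))
    # Toy rate proxy (an assumption): one unit per nonzero level plus a
    # term that grows with its magnitude.
    rate = sum(1.0 + math.log2(1 + abs(l)) for l in levels if l != 0)
    return distortion + lam * rate

def choose_transform(residual, qstep=4.0, lam=1.0,
                     candidates=("DCT", "ADST", "IDTX")):
    """Pick the lowest-cost candidate; 'candidates' models a pruned list."""
    n = len(residual)
    costs = {name: rd_cost(residual, BASES[name](n), qstep, lam)
             for name in candidates}
    return min(costs, key=costs.get), costs

if __name__ == "__main__":
    flat = [12] * 8                      # smooth residual
    spike = [0, 0, 0, 40, 0, 0, 0, 0]    # single sharp edge
    print(choose_transform(flat)[0])
    print(choose_transform(spike)[0])
    # A faster preset could be modeled by passing fewer candidates.
    print(choose_transform(spike, candidates=("DCT", "ADST"))[0])
```

With these toy settings the flat residual selects DCT and the spike selects IDTX, which mirrors the intuition that smooth content compacts well under the DCT while isolated edges are cheaper to code without a frequency transform.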