How Does libaom Calculate SSIM Internally?

This article gives a technical overview of how libaom (the reference software codec for the AV1 video format) calculates Structural Similarity (SSIM) internally during its rate control process. It covers the downsampling techniques, the patch-based luminance and contrast math, and the ways these metrics feed back into frame-level and block-level quantization decisions to improve visual quality per bit.


The Role of SSIM in AV1 Rate Control

In video encoding, rate control algorithms must decide how many bits to allocate to each frame or block. While traditional encoders rely heavily on Mean Squared Error (MSE), libaom integrates SSIM (Structural Similarity Index) and MS-SSIM (Multi-Scale SSIM) to better align bit distribution with human visual perception.

When the encoder operates in a tune-for-SSIM mode (e.g., --tune=ssim), the rate control module dynamically adjusts the quantization parameter (QP) based on the structural distortion it predicts or measures.


Step-by-Step Internal Calculation

libaom's internal SSIM calculation follows a pipeline designed to keep the cost of a perceptual metric low when it is evaluated inside the encoding loop.

1. Downsampling and Windowing

Standard SSIM uses a Gaussian window to weight local pixel statistics. To approximate this at lower cost, libaom computes its statistics over small, uniformly weighted pixel blocks (typically \(8 \times 8\) or \(16 \times 16\) patches). For Multi-Scale SSIM (MS-SSIM), the encoder repeatedly downsamples both the reference and distorted frames with a low-pass 2x2 average filter and recalculates the metrics at each coarser scale.
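The windowing and 2x2 averaging described above can be illustrated with a minimal pure-Python sketch. The function names (downsample_2x2, iter_windows, build_pyramid), the rounding rule, and the default window size and step are assumptions made for illustration; they are not libaom functions and do not reproduce its exact filter or window layout.

```python
def downsample_2x2(frame):
    """Halve each dimension by averaging non-overlapping 2x2 pixel groups."""
    rows, cols = len(frame) // 2, len(frame[0]) // 2
    return [
        [
            # Rounded integer mean of the four pixels in this 2x2 group.
            (frame[2 * r][2 * c] + frame[2 * r][2 * c + 1]
             + frame[2 * r + 1][2 * c] + frame[2 * r + 1][2 * c + 1] + 2) // 4
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def iter_windows(frame, size=8, step=8):
    """Yield (row, col, patch) for every full size x size window."""
    height, width = len(frame), len(frame[0])
    for r in range(0, height - size + 1, step):
        for c in range(0, width - size + 1, step):
            yield r, c, [row[c:c + size] for row in frame[r:r + size]]


def build_pyramid(frame, levels=3, min_size=8):
    """Return [full-res, half-res, ...], stopping before a level gets too small."""
    pyramid = [frame]
    for _ in range(levels - 1):
        # Stop if the next level could no longer hold one full window.
        if min(len(pyramid[-1]), len(pyramid[-1][0])) // 2 < min_size:
            break
        pyramid.append(downsample_2x2(pyramid[-1]))
    return pyramid
```

Passing a step smaller than the window size makes the windows overlap, which lets a window straddle block boundaries. The same pyramid would be built for both the source and the reconstructed frame so that windows at each scale line up.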

2. Local Statistical Accumulation

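For each local window, libaom accumulates a small set of sums from which the means, variances, and covariance can be derived. With \(x\) as the original source patch and \(y\) as the reconstructed (distorted) patch, each containing \(N\) pixels, the encoder accumulates:

\[\sum_{i=1}^{N} x_i, \quad \sum_{i=1}^{N} y_i, \quad \sum_{i=1}^{N} x_i^2, \quad \sum_{i=1}^{N} y_i^2, \quad \sum_{i=1}^{N} x_i y_i\]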
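As a rough illustration of this accumulation step, the sketch below gathers the pixel count together with the sums of \(x\), \(y\), \(x^2\), \(y^2\), and \(xy\) in a single pass over two equally sized patches. The function name accumulate_sums and the list-of-rows patch layout are hypothetical choices for this example, not libaom's API.

```python
def accumulate_sums(x, y):
    """Return (n, sum_x, sum_y, sum_xx, sum_yy, sum_xy) for two equal-size patches.

    x is the source patch and y the reconstructed patch, each a list of rows
    of integer pixel values.
    """
    n = sum_x = sum_y = sum_xx = sum_yy = sum_xy = 0
    for row_x, row_y in zip(x, y):
        for px, py in zip(row_x, row_y):
            n += 1
            sum_x += px          # first-order sum of the source
            sum_y += py          # first-order sum of the reconstruction
            sum_xx += px * px    # second-order sum of the source
            sum_yy += py * py    # second-order sum of the reconstruction
            sum_xy += px * py    # cross term used for the covariance
    return n, sum_x, sum_y, sum_xx, sum_yy, sum_xy
```

Because every quantity is a plain integer sum, the work can be done without floating point and maps naturally onto vectorized integer instructions.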

3. Applying the SSIM Formula

Using these accumulated sums, libaom evaluates the core SSIM formula with fixed-point arithmetic or optimized SIMD code paths (AVX2/NEON) to speed up execution. The calculation implements the standard SSIM comparison, in which the luminance term forms one factor and the contrast and structure terms are merged into a second:

\[\text{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}\]

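Where \(\mu_x\) and \(\mu_y\) are the mean pixel values of the two patches, \(\sigma_x^2\) and \(\sigma_y^2\) are their variances, \(\sigma_{xy}\) is their covariance, and \(C_1 = (K_1 L)^2\) and \(C_2 = (K_2 L)^2\) are small stabilizing constants, with \(L\) the dynamic range of the pixel values (255 for 8-bit video) and, conventionally, \(K_1 = 0.01\) and \(K_2 = 0.03\).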
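To make the formula concrete, here is a minimal floating-point sketch that derives the means, variances, and covariance from per-window sums and then evaluates the SSIM expression. It uses the conventional constants \(K_1 = 0.01\) and \(K_2 = 0.03\) and population (divide-by-\(N\)) statistics. The function name ssim_from_sums and its parameters are hypothetical, and libaom's own code may scale the terms differently to stay in integer arithmetic.

```python
def ssim_from_sums(n, sum_x, sum_y, sum_xx, sum_yy, sum_xy,
                   peak=255, k1=0.01, k2=0.03):
    """Evaluate SSIM for one window from its accumulated sums."""
    # Stabilizing constants C1 = (K1 * L)^2 and C2 = (K2 * L)^2.
    c1 = (k1 * peak) ** 2
    c2 = (k2 * peak) ** 2

    # Means of the source (x) and reconstructed (y) patches.
    mu_x = sum_x / n
    mu_y = sum_y / n

    # Population variances and covariance: E[ab] - E[a]E[b].
    var_x = sum_xx / n - mu_x * mu_x
    var_y = sum_yy / n - mu_y * mu_y
    cov_xy = sum_xy / n - mu_x * mu_y

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

For two identical patches the numerator and denominator coincide, so the score is 1. Any difference in mean, variance, or structure pulls the score below 1.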


Integration into the Rate Control Loop

Once the SSIM values are calculated for local regions, libaom uses this information to guide its rate control decisions in two primary ways: