How does libaom implement AV1 film grain synthesis?
Libaom, the reference codec library (encoder and decoder) for the AV1 video format, implements film grain synthesis by separating film grain from the underlying video content at the encoder side and transmitting model parameters instead of the raw noise. The encoder-side process consists of a denoising step followed by film grain parameter estimation using an autoregressive model. The resulting parameters are written into the uncompressed header of the frame header, which is carried in an AV1 Open Bitstream Unit (OBU). At the decoder side, the video is reconstructed in its clean form, and the film grain is synthetically re-applied using a deterministic pseudo-random number generator and an autoregressive filter that shapes the spatial frequency content of the noise. This workflow preserves the look of the original film grain while substantially reducing the bitrate compared with coding the grain directly.
Denoising and Parameter Estimation at the Encoder
The libaom implementation starts by analyzing the incoming source frames. Film grain is high-frequency and essentially uncorrelated from frame to frame, so block-based prediction cannot model it and coding it directly costs many bits. Libaom therefore removes the noise before the frame enters the normal intra- and inter-prediction loops.
- Denoising Pipeline: When a user passes the flag --denoise-noise-level=XX to aomenc, a temporal filtering or spatial denoising pass is applied to the frame. This creates a clean, easily compressible baseline frame.
- Mathematical Modeling: The encoder calculates the difference between the original noisy frame and the newly denoised frame. Libaom analyzes this residual noise to estimate film grain parameters based on localized spatial frequencies and signal intensity.
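The residual analysis described above can be illustrated with a minimal sketch. It is not libaom's noise-model code: the function names, the fixed number of intensity bins, and the use of a plain standard deviation are assumptions made for illustration. It only shows the idea of subtracting the denoised frame from the source and measuring how noise strength varies with brightness.

```python
from statistics import pstdev


def noise_residual(original, denoised):
    """Per-pixel difference between the source frame and its denoised version."""
    return [
        [o - d for o, d in zip(orow, drow)]
        for orow, drow in zip(original, denoised)
    ]


def noise_strength_by_intensity(denoised, residual, num_bins=4, max_value=255):
    """Group residual samples by the brightness of the denoised pixel.

    Returns one standard deviation per intensity bin (None for empty bins).
    Hypothetical helper: a real estimator would also fit AR coefficients.
    """
    bins = [[] for _ in range(num_bins)]
    for drow, rrow in zip(denoised, residual):
        for d, r in zip(drow, rrow):
            idx = min(d * num_bins // (max_value + 1), num_bins - 1)
            bins[idx].append(r)
    return [pstdev(b) if b else None for b in bins]
```

Per-bin strengths of this kind are the sort of measurement that could then be fitted to a piece-wise linear intensity-to-strength curve.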
Parameter Signaling in the AV1 Bitstream
Rather than coding the grain as picture data, libaom packs the estimated grain properties into metadata that complies with the AV1 specification. These fields are written into the frame header as the film_grain_params syntax elements. The primary parameters exported by libaom’s encoder include:
- Scaling Points: A piece-wise linear function containing up to 14 points for luma (\(Y\)) and 10 points for chroma (\(Cb, Cr\)). This dictates how grain intensity changes relative to the underlying brightness of the pixel component.
- Autoregressive (AR) Coefficients: Quantized coefficients that define the spatial correlation and frequency patterns of the grain. Libaom supports AR filter lags from 0 to 3 to simulate varying grain sizes.
- Random Seed: A 16-bit integer used to seed the decoder’s pseudo-random number generator, so that every conforming decoder generates the same grain pattern for a given frame.
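To make the role of the scaling points concrete, the following sketch expands a short list of (intensity, scaling) pairs into a 256-entry table by piece-wise linear interpolation. It is a simplified illustration for 8-bit values, not a bit-exact copy of the fixed-point interpolation in the AV1 specification; the function name `build_scaling_lut` is hypothetical, and the points are assumed to be sorted by strictly increasing intensity.

```python
def build_scaling_lut(points, size=256):
    """Expand (intensity, scaling) points into a per-intensity lookup table.

    Intensities below the first point reuse the first scaling value, and
    intensities at or above the last point reuse the last one.
    """
    if not points:
        return [0] * size  # no points signaled: no grain for this component

    lut = [0] * size
    first_x, first_y = points[0]
    last_x, last_y = points[-1]

    for x in range(first_x):
        lut[x] = first_y

    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        for x in range(x0, x1):
            # Integer interpolation, rounding halves up.
            lut[x] = y0 + ((x - x0) * dy * 2 + dx) // (2 * dx)

    for x in range(last_x, size):
        lut[x] = last_y
    return lut
```

For example, `build_scaling_lut([(64, 10), (192, 50)])` keeps the scaling at 10 for dark pixels, ramps it up through the mid-tones, and holds it at 50 for bright pixels.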
Grain Synthesis at the Decoder
When a decoder reads an AV1 bitstream produced by libaom, it first
handles the standard decoding pipeline to output a clean video frame. If
the apply_grain flag is active in the frame header, the
decoder starts the synthesis engine using the parsed metadata.
[Parsed Bitstream] ──> [Reconstruct Clean Frame] ──┐
                                                   ├──> [Combine Samples] ──> [Final Video Output]
[Grain Parameters] ──> [Generate Grain Templates] ─┘
The hardware or software decoder runs a deterministic 16-bit
Linear-Feedback Shift Register (LFSR) seeded from
random_seed. Libaom’s synthesis code first generates a luma grain
template, nominally \(64 \times 64\) (the AV1 specification stores it
as a \(73 \times 82\) array that includes the margin needed by the AR
filter and the block offsets), and two chroma templates whose size
depends on chroma subsampling, nominally \(32 \times 32\) for 4:2:0.
The decoder then walks the image in raster order in \(32 \times 32\)
luma blocks, using pseudo-random offsets to pick a different region of
the template for each block. The noise samples are scaled via Look-Up
Tables (LUTs) built from the scaling points, added to the clean
reconstructed pixels, and clipped to the valid range for the bit depth
(or to the restricted video range when that is signaled) to produce the
output frame.
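The decoder-side steps can be sketched as follows. The LFSR update follows the random-number process described in the AV1 specification; everything else is a simplified illustration rather than libaom's code. In particular, the specification maps each 11-bit random number through a fixed Gaussian table, which this sketch replaces with a plain re-centering, and the function names, default shifts, and 8-bit grain range are assumptions.

```python
def lfsr_next(state, bits):
    """Advance the 16-bit LFSR one step; return (new_state, top `bits` bits)."""
    bit = (state ^ (state >> 1) ^ (state >> 3) ^ (state >> 12)) & 1
    state = (state >> 1) | (bit << 15)
    return state, (state >> (16 - bits)) & ((1 << bits) - 1)


def white_noise_template(seed, height, width):
    """Fill a template with zero-centred pseudo-random values.

    Stand-in for the specification's Gaussian lookup table (assumption).
    """
    state, rows = seed, []
    for _ in range(height):
        row = []
        for _ in range(width):
            state, value = lfsr_next(state, 11)
            row.append((value - 1024) >> 4)  # roughly -64..63
        rows.append(row)
    return rows


def ar_filter(grain, coeffs, lag, shift=7, lo=-128, hi=127):
    """Shape the noise in place, in raster order.

    Each sample receives a weighted sum of already-filtered neighbours
    above and to the left; there are 2 * lag * (lag + 1) coefficients.
    """
    height, width = len(grain), len(grain[0])
    for y in range(lag, height):
        for x in range(lag, width - lag):
            total, k = 0, 0
            for dy in range(-lag, 1):
                for dx in range(-lag, lag + 1):
                    if dy == 0 and dx == 0:
                        break  # stop at the current sample
                    total += coeffs[k] * grain[y + dy][x + dx]
                    k += 1
            shaped = grain[y][x] + ((total + (1 << (shift - 1))) >> shift)
            grain[y][x] = max(lo, min(hi, shaped))
    return grain


def apply_grain_sample(pixel, noise, scaling_lut, scaling_shift=8, bit_depth=8):
    """Scale one noise sample by the LUT entry for the pixel, add, and clip."""
    rounding = 1 << (scaling_shift - 1)
    scaled = (scaling_lut[pixel] * noise + rounding) >> scaling_shift
    return max(0, min((1 << bit_depth) - 1, pixel + scaled))
```

Because the generator is seeded only from the signaled seed, two calls to `white_noise_template` with the same seed produce identical templates, which is what lets independent decoders reproduce the same grain.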