How Does libaom Estimate Global Motion for AV1?
This article gives an overview of Global Motion Estimation (GME) in libaom, the Alliance for Open Media’s reference encoder for the AV1 video codec. It walks through the core pipeline: feature detection, keypoint matching, robust model estimation, parameter refinement, and the final rate-distortion check. Understanding this workflow shows how AV1 saves bits by describing camera movements such as panning, tilting, and zooming once per frame rather than repeating them in every block’s motion vector.
Feature Detection and Keypoint Extraction
The GME process in libaom begins by analyzing the current (source) frame and a reference frame to locate distinct points of interest. In its feature-based path, the encoder uses the FAST (Features from Accelerated Segment Test) corner detector to find pixels whose neighborhood changes sharply in several directions. Corners are chosen as keypoints because they can be localized precisely and tend to remain identifiable when the frame shifts or the lighting changes moderately, unlike points on flat regions or straight edges.
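As a rough illustration of the segment test behind FAST, the sketch below flags a pixel as a corner when at least 9 contiguous pixels on a 16-pixel circle of radius 3 are all brighter or all darker than the center by more than a threshold. It is a simplified, unoptimized version written for this article, not libaom's detector; the function names, the threshold of 20, and the list-of-rows image format are assumptions.

```python
# Offsets (dx, dy) of the 16-pixel circle of radius 3, in clockwise order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]


def is_fast_corner(img, x, y, threshold=20, arc_len=9):
    """Return True if pixel (x, y) passes the FAST-style segment test."""
    center = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    for sign in (1, -1):  # +1 looks for a brighter arc, -1 for a darker arc
        flags = [sign * (p - center) > threshold for p in ring]
        run = 0
        # Walk the ring twice so arcs that wrap past index 0 are counted.
        for flag in flags + flags:
            run = run + 1 if flag else 0
            if run >= arc_len:
                return True
    return False


def detect_corners(img, threshold=20):
    """Scan every pixel that has a full circle inside the image."""
    height, width = len(img), len(img[0])
    return [(x, y)
            for y in range(3, height - 3)
            for x in range(3, width - 3)
            if is_fast_corner(img, x, y, threshold)]
```

On a flat image or a straight vertical edge this returns no corners, while the corner of a bright square on a dark background is reported, which is the property that makes such points useful as keypoints.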
Feature Matching and Correspondence
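Once keypoints are extracted from both the source and reference frames, libaom establishes correspondences between them. In the feature-based path, each corner in the source frame is compared against nearby corners in the reference frame by correlating the small pixel patches around them, and the best-scoring pair is kept if the correlation is high enough. Optical-flow approaches (for example, Lucas-Kanade-style tracking) are an alternative way to obtain the same kind of data. Either way, this step yields a sparse set of motion vectors describing how specific points have shifted between the two frames.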
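The matching step can be sketched with normalized cross-correlation (NCC), which scores two patches by how well their brightness patterns agree regardless of overall gain or offset. The code below is an illustrative sketch only: the helper names, the 3x3 patch size, the search radius `max_dist`, and the acceptance score `min_score` are assumptions, not values taken from libaom.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [v - mean_a for v in a]
    db = [v - mean_b for v in b]
    denom = math.sqrt(sum(v * v for v in da) * sum(v * v for v in db))
    if denom == 0:
        return 0.0  # flat patch: correlation is undefined, treat as no match
    return sum(x * y for x, y in zip(da, db)) / denom


def patch(img, x, y, r=1):
    """Flatten the (2r+1) x (2r+1) patch centered on (x, y)."""
    return [img[j][i]
            for j in range(y - r, y + r + 1)
            for i in range(x - r, x + r + 1)]


def match_corners(src_img, src_pts, ref_img, ref_pts, max_dist=8, min_score=0.75):
    """Pair each source corner with its best-correlated nearby reference corner."""
    matches = []
    for sx, sy in src_pts:
        best, best_score = None, min_score
        for rx, ry in ref_pts:
            if abs(rx - sx) > max_dist or abs(ry - sy) > max_dist:
                continue  # too far away to be a plausible match
            score = ncc(patch(src_img, sx, sy), patch(ref_img, rx, ry))
            if score > best_score:
                best, best_score = (rx, ry), score
        if best is not None:
            matches.append(((sx, sy), best))
    return matches
```

Each returned pair `((sx, sy), (rx, ry))` is one sparse motion vector. Corners must lie at least one pixel inside the image so that the patch lookup stays in bounds.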
Robust Model Estimation with RANSAC
Local motion vectors often contain outliers, such as points on moving objects that do not follow the background camera movement. To cope with them, libaom uses the RANSAC (Random Sample Consensus) algorithm, which repeatedly picks a small random subset of the matched keypoints, fits a candidate global motion model to that subset, and checks how well the remaining matches agree with it. The search considers several transformation models defined by the AV1 specification:
- Translation: Simple horizontal and vertical shifting (2 parameters).
- Affine: Handles rotation, scaling, and shearing alongside translation (6 parameters).
- Rotzoom: Simplifies the affine model to focus strictly on rotation and scaling (4 parameters).
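To make the parameter counts concrete, the sketch below expresses all three models as one 2x2 matrix plus a translation and applies them to a point. The floating-point tuple layout `(a, b, c, d, tx, ty)` and the function names are assumptions chosen for readability; the AV1 bitstream stores these parameters as fixed-point integers in its own order.

```python
import math


def translation(tx, ty):
    """2 free parameters: the matrix part is fixed to identity."""
    return (1.0, 0.0, 0.0, 1.0, tx, ty)


def rotzoom(scale, angle, tx, ty):
    """4 free parameters: the matrix is constrained to [[a, -c], [c, a]]."""
    a = scale * math.cos(angle)
    c = scale * math.sin(angle)
    return (a, -c, c, a, tx, ty)


def affine(a, b, c, d, tx, ty):
    """6 free parameters: all four matrix entries are independent."""
    return (a, b, c, d, tx, ty)


def project(model, x, y):
    """Map a point through x' = a*x + b*y + tx, y' = c*x + d*y + ty."""
    a, b, c, d, tx, ty = model
    return (a * x + b * y + tx, c * x + d * y + ty)
```

For example, `project(translation(3, -2), 10, 10)` moves the point to `(13.0, 8.0)`, and an affine model with `b = 0.5` shears the point `(2, 4)` to `(4.0, 4)`, which a rotzoom model cannot express.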
The candidate with the most “inliers” (matches whose motion agrees with the hypothesized global motion to within a given error threshold) is chosen as the primary candidate.
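The sample, fit, and count loop can be sketched for the rotzoom case as follows. Points are represented as Python complex numbers so that rotation, scaling, and translation become `w = m*z + t`, which two correspondences determine exactly. This is a generic RANSAC sketch under those assumptions; the iteration count, tolerance, and function names are illustrative, and libaom's own implementation differs in its details.

```python
import random


def fit_rotzoom(p, q):
    """Fit w = m*z + t (complex m and t) from two correspondences (z, w)."""
    (z1, w1), (z2, w2) = p, q
    if z1 == z2:
        return None  # degenerate sample: both source points coincide
    m = (w2 - w1) / (z2 - z1)
    return m, w1 - m * z1


def ransac_rotzoom(matches, iters=200, tol=1.0, seed=0):
    """Return the model with the most inliers and the inliers themselves."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    best_model, best_inliers = None, []
    for _ in range(iters):
        model = fit_rotzoom(*rng.sample(matches, 2))
        if model is None:
            continue
        m, t = model
        # A match is an inlier if the model predicts its target within tol pixels.
        inliers = [(z, w) for z, w in matches if abs(m * z + t - w) <= tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

Matches on independently moving objects rarely agree with a model fitted to background points, so they fall outside the tolerance and do not influence the winning hypothesis.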
Parameter Refinement and Error Evaluation
After RANSAC identifies the best global motion model, libaom refines the parameters, because a model fitted to a sparse set of keypoints is not necessarily the one that predicts the whole frame best. The encoder warps the reference frame with slightly adjusted parameters, typically through an iterative search that tries small changes with progressively finer step sizes, and keeps the adjustments that reduce the Sum of Absolute Differences (SAD) or a similar error measure between the motion-compensated reference frame and the source frame.
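One simple way to realize such a refinement is a coordinate search: nudge one parameter at a time, keep the change if the error drops, and halve the step after each pass. The sketch below shows that idea with a pluggable error function; it is an assumption-laden illustration (names, step size, and round count are made up here), not a copy of libaom's refinement routine.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))


def refine_params(params, error_fn, step=1.0, rounds=6):
    """Coordinate search: try +/- step on each parameter, halving step per round."""
    best = list(params)
    best_err = error_fn(best)
    for _ in range(rounds):
        for i in range(len(best)):
            for delta in (-step, step):
                trial = list(best)
                trial[i] += delta
                err = error_fn(trial)
                if err < best_err:  # keep only strict improvements
                    best, best_err = trial, err
        step /= 2
    return best, best_err
```

In an encoder, `error_fn` would warp the reference frame with the trial parameters and return `sad(warped, source)`; here it is left as a callable so the search logic can be read on its own.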
Rate-Distortion Optimization (RDO) Decision
The final stage in the libaom GME pipeline is a rate-distortion check. Global motion parameters are signaled in the frame header, one set per reference frame, so they add a small bitrate overhead. The encoder weighs that bit cost against the reduction in prediction error that the warped reference gives over the unwarped one. If the model lowers the overall rate-distortion cost, the parameters are written into the AV1 bitstream, and blocks anywhere in the frame can then use global-motion prediction modes instead of coding their own motion vectors. Otherwise the encoder falls back to the identity model for that reference frame.
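The trade-off can be written as the usual Lagrangian cost, distortion plus lambda times rate. The sketch below is a textbook formulation with hypothetical function names and inputs; libaom's actual acceptance test uses its own heuristics and thresholds.

```python
def rd_cost(distortion, rate_bits, lam):
    """Lagrangian rate-distortion cost: D + lambda * R."""
    return distortion + lam * rate_bits


def choose_global_motion(dist_with_gm, bits_for_params, dist_without_gm, lam):
    """Return True if signaling the model lowers the combined cost."""
    cost_with = rd_cost(dist_with_gm, bits_for_params, lam)
    cost_without = rd_cost(dist_without_gm, 0, lam)  # identity model costs no extra bits here
    return cost_with < cost_without
```

With `lam = 10.0` and 40 bits of parameters, a drop in distortion from 1500 to 1000 justifies the model (1400 < 1500), whereas a drop from 1200 to 1000 does not (1400 > 1200).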