VP9 Two-Pass Encoding Log File Statistics
During a libvpx-vp9 two-pass video encoding process, the first pass analyzes the input video and writes performance and complexity metrics to a stats log file. This article details the specific statistical information stored in this log file, explaining how the encoder uses these metrics per frame to optimize bitrate distribution and visual quality during the second pass.
In the libvpx library, the first-pass statistics are structured on a per-frame basis using a specific C struct (historically defined as FIRSTPASS_STATS in the libvpx source code). The log file is a binary stream of these structures containing the following statistical metrics for every frame of the video:
1. Frame Identification and Timing
- Frame Number: The sequential index of the frame within the video stream.
- Duration: The display duration of the frame, which helps the encoder calculate the exact frame rate and temporal weighting.
- Weight: A calculated value representing the relative importance of the frame, often correlated with its duration and impact on overall video quality.
2. Spatial and Temporal Complexity (Error Metrics)
These values assess how difficult a frame is to compress by measuring prediction errors:
- Intra Error: The sum of squared errors when predicting the frame using only spatial (intra-frame) compression. A high intra error indicates high spatial detail, sharp textures, or complex patterns.
- Coded Error: The motion-compensated prediction error relative to the previous frame. A low coded error means the frame is highly predictable from its reference (low or easily tracked motion).
- Second-Reference Coded Error (sr_coded_error): The prediction error measured against the second reference frame (the golden frame during the first pass) instead of the immediately preceding frame. Comparing it with the coded error shows whether an older reference predicts the frame better than the last frame does.
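The relationship between these error metrics can be illustrated on a single toy block. This is a minimal sketch, not the libvpx implementation: the `sse` helper, the pixel values, and the rule that the coded error takes the smaller of the two candidates are assumptions made for illustration.

```python
def sse(block, prediction):
    """Sum of squared differences between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(block, prediction))

# Toy 2x2 block (flattened) with two hypothetical candidate predictions.
block = [10, 12, 11, 13]
intra_pred = [11, 11, 11, 11]  # e.g. a flat spatial prediction
inter_pred = [10, 12, 11, 12]  # e.g. a matching block from a reference frame

intra_error = sse(block, intra_pred)
inter_error = sse(block, inter_pred)

# Assumption of this sketch: the coded error is the better of the two.
coded_error = min(intra_error, inter_error)
```

Here the temporal prediction is far closer to the source block than the spatial one, so the block would count as cheap to code from its reference.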
3. Motion Vector (MV) Statistics
The encoder tracks how objects move across frames to determine motion complexity:
- Average Motion Vectors (MVr, MVc): The average raw values for row (vertical) and column (horizontal) motion vectors.
- Absolute Motion Vectors (mvr_abs, mvc_abs): The average absolute magnitude of row and column motion, representing the overall speed of movement.
- Motion Vector Variance (MVrv, MVcv): Measures the consistency of motion. Low variance indicates uniform panning, while high variance indicates chaotic, multi-directional motion.
- Motion In/Out Count (mv_in_out_count): An estimation of pixels or objects entering or leaving the frame boundaries, which is useful for detecting pans, zooms, and scene transitions.
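The mean, absolute mean, and variance described above can be sketched for a list of per-block motion vectors. The function name `mv_stats` and the plain population-variance formula are assumptions for illustration; the normalization libvpx actually applies may differ.

```python
def mv_stats(vectors):
    """Summarize (row, col) motion vectors with mean, mean |value|, and variance."""
    n = len(vectors)
    if n == 0:
        return {"MVr": 0.0, "MVc": 0.0, "mvr_abs": 0.0,
                "mvc_abs": 0.0, "MVrv": 0.0, "MVcv": 0.0}
    rows = [r for r, _ in vectors]
    cols = [c for _, c in vectors]
    mean_r = sum(rows) / n
    mean_c = sum(cols) / n
    return {
        "MVr": mean_r,
        "MVc": mean_c,
        "mvr_abs": sum(abs(r) for r in rows) / n,
        "mvc_abs": sum(abs(c) for c in cols) / n,
        # Population variance: mean squared deviation from the mean.
        "MVrv": sum((r - mean_r) ** 2 for r in rows) / n,
        "MVcv": sum((c - mean_c) ** 2 for c in cols) / n,
    }

# A uniform horizontal pan: every block moves 4 columns to the right.
pan = mv_stats([(0, 4), (0, 4), (0, 4), (0, 4)])

# Chaotic motion: vectors cancel out on average but vary widely.
chaos = mv_stats([(2, -4), (-2, 4), (2, 4), (-2, -4)])
```

The pan yields a non-zero average with zero variance, while the chaotic case yields a zero average with large absolute magnitude and variance, which is the distinction the second pass relies on.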
4. Block Type and Coding Decisions
The log records the percentage distribution of different block types predicted during the fast first-pass analysis:
- Percentage Inter-Coded (pcnt_inter): The fraction of the frame’s blocks that are best predicted using motion estimation from other frames.
- Percentage Motion (pcnt_motion): The fraction of blocks that have non-zero motion vectors.
- Percentage Second Reference (pcnt_second_ref): The fraction of blocks that are better predicted from a secondary reference frame (such as the golden frame) than from the previous frame.
- Percentage Neutral (pcnt_neutral): The fraction of blocks where the difference between spatial (intra) and temporal (inter) prediction error is negligible, indicating areas like static flat backgrounds.
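As a rough illustration of how such fractions could be derived, the sketch below classifies a handful of blocks from their intra error, inter error, and motion vector. The dictionary keys, the `neutral_tolerance` value, and the classification rules are hypothetical simplifications rather than the thresholds libvpx uses, and the second-reference fraction is omitted for brevity.

```python
def block_fractions(blocks, neutral_tolerance=5):
    """Return the fraction of blocks falling into each first-pass category."""
    total = len(blocks)
    if total == 0:
        return {"pcnt_inter": 0.0, "pcnt_motion": 0.0, "pcnt_neutral": 0.0}

    inter = motion = neutral = 0
    for b in blocks:
        is_inter = b["inter"] < b["intra"]  # temporal prediction wins
        if is_inter:
            inter += 1
            if b["mv"] != (0, 0):
                motion += 1
        # Neutral: intra and inter errors are nearly identical.
        if abs(b["intra"] - b["inter"]) <= neutral_tolerance:
            neutral += 1

    return {
        "pcnt_inter": inter / total,
        "pcnt_motion": motion / total,
        "pcnt_neutral": neutral / total,
    }

blocks = [
    {"intra": 100, "inter": 20, "mv": (0, 3)},  # inter-coded, moving
    {"intra": 100, "inter": 30, "mv": (0, 0)},  # inter-coded, static
    {"intra": 50, "inter": 50, "mv": (0, 0)},   # neutral
    {"intra": 40, "inter": 90, "mv": (1, 1)},   # intra-coded
]
fractions = block_fractions(blocks)
```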
5. Multi-Layer and Scalability Metrics
- Spatial Layer ID: In multi-layer or scalable video coding (SVC) scenarios, this identifier tracks which spatial resolution layer the statistics belong to, ensuring proper bit allocation across layers.
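Taken together, the fields in sections 1 to 5 form one fixed-size record per frame, so the log can be read by slicing the byte stream into equal chunks. The sketch below demonstrates that idea with a deliberately reduced, hypothetical layout of four doubles. The real FIRSTPASS_STATS struct has many more fields, and its exact layout depends on the libvpx version and platform, so this code should not be pointed at an actual stats file as written.

```python
import struct

# Hypothetical record layout for illustration only: four doubles in
# native byte order. This is NOT the actual FIRSTPASS_STATS layout.
RECORD = struct.Struct("=4d")
FIELDS = ("frame", "weight", "intra_error", "coded_error")

def parse_records(blob):
    """Split a byte string into a list of per-frame dictionaries."""
    if len(blob) % RECORD.size != 0:
        raise ValueError("stream length is not a multiple of the record size")
    return [dict(zip(FIELDS, values)) for values in RECORD.iter_unpack(blob)]

# Build a tiny synthetic stream of two records and read it back.
blob = RECORD.pack(0.0, 1.0, 5000.0, 5000.0) + RECORD.pack(1.0, 1.0, 4800.0, 900.0)
records = parse_records(blob)
```

Checking that the stream length is a whole multiple of the record size is a cheap way to catch a mismatch between the assumed layout and the file being read.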
How the Second Pass Uses This Data
The second pass reads this accumulated log file to map out the entire video’s complexity profile. By analyzing these metrics, the encoder can anticipate scene cuts (indicated by sudden spikes in coded error), identify static scenes that require very few bits, allocate higher bitrates to complex action sequences, and strategically place keyframes (I-frames) and golden frames to maximize compression efficiency.
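Two of these ideas, spotting a scene cut from a spike in coded error and giving more bits to more complex frames, can be sketched in a few lines. This is a simplified illustration and not the libvpx rate-control algorithm: the function names, the `ratio` threshold, and the strictly proportional allocation are all assumptions.

```python
def find_scene_cuts(coded_errors, ratio=3.0):
    """Flag frames whose coded error jumps by at least `ratio` over the previous frame."""
    cuts = []
    for i in range(1, len(coded_errors)):
        prev = coded_errors[i - 1]
        if prev > 0 and coded_errors[i] / prev >= ratio:
            cuts.append(i)
    return cuts

def allocate_bits(coded_errors, total_bits):
    """Split a bit budget across frames in proportion to their coded error."""
    if not coded_errors:
        return []
    total_error = sum(coded_errors)
    if total_error == 0:
        # All frames are equally trivial: share the budget evenly.
        return [total_bits / len(coded_errors)] * len(coded_errors)
    return [total_bits * e / total_error for e in coded_errors]

# Synthetic per-frame coded errors with one abrupt spike at frame 2.
coded_errors = [100.0, 100.0, 600.0, 100.0, 100.0]
cuts = find_scene_cuts(coded_errors)
budget = allocate_bits(coded_errors, 10000)
```

In this synthetic sequence the spike at frame 2 is flagged as a likely cut and receives the largest share of the budget, while the four low-error frames split the remainder evenly.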