Extracting Motion Vectors from libvpx VP9

Extracting and analyzing motion vectors from a VP9 stream encoded with libvpx-vp9 is useful for tasks such as compression analysis, computer vision, and motion-compensated frame interpolation. This guide covers three ways to obtain the vectors (FFmpeg on the command line, Python bindings, and the libvpx source itself), then explains how to interpret the resulting data.

Method 1: Exporting Motion Vectors via FFmpeg

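FFmpeg can export motion vectors while decoding when the export_mvs flag is set (-flags2 +export_mvs). The flag asks the decoder to attach motion vector information to each video frame as side data. Support is decoder-specific: the flag has long been honored by FFmpeg's MPEG-family and H.264 decoders, so confirm that the VP9 decoder in your build actually attaches this side data before relying on it.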

To visualize the motion vectors directly on the video frames, use the codecview filter:

ffmpeg -flags2 +export_mvs -i input_vp9.webm -vf codecview=mv=pf+bf+bb output_visualization.mp4

In this command, the mv option selects which vectors codecview draws:
pf: forward-predicted motion vectors of P-frames.
bf: forward-predicted motion vectors of B-frames.
bb: backward-predicted motion vectors of B-frames.
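When the visualization has to be produced for many files or for different vector subsets, it can help to build the argument list in a script. The following is a minimal sketch only: build_codecview_command is a hypothetical helper name, and it assembles the same ffmpeg arguments as the command above without running them.

```python
import shlex

# The three vector types accepted by codecview's mv option
VALID_MV_TYPES = ("pf", "bf", "bb")


def build_codecview_command(src, dst, mv_types=VALID_MV_TYPES):
    """Return the ffmpeg argument list for a motion vector overlay."""
    if not mv_types:
        raise ValueError("select at least one of pf, bf, bb")
    unknown = [t for t in mv_types if t not in VALID_MV_TYPES]
    if unknown:
        raise ValueError(f"unknown mv types: {unknown}")
    return [
        "ffmpeg",
        "-flags2", "+export_mvs",      # input option: must precede -i
        "-i", src,
        "-vf", "codecview=mv=" + "+".join(mv_types),
        dst,
    ]


if __name__ == "__main__":
    cmd = build_codecview_command("input_vp9.webm", "output_visualization.mp4")
    # Print the command; pass cmd to subprocess.run to execute it
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell quoting problems with file names that contain spaces.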

Method 2: Extracting Numerical Data Programmatically

For analysis, numerical coordinate data is more useful than a visualization. You can extract it in Python with PyAV (the av package, Pythonic bindings for the FFmpeg libraries). Converting the side data to an array also requires NumPy.

First, install the library:

pip install av

Then, run the following script to extract the frame index, block dimensions, and spatial coordinates of each motion vector:

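import av

container = av.open('input_vp9.webm')
stream = container.streams.video[0]
# Ask the decoder to attach motion vectors to each frame as side data.
# Set this before decoding starts.
stream.codec_context.options = {'flags2': '+export_mvs'}

# enumerate() supplies the frame number; recent PyAV releases no longer provide frame.index
for frame_number, frame in enumerate(container.decode(stream)):
    # Keyframes, and decoders without export_mvs support, carry no vectors
    side_data = frame.side_data.get('MOTION_VECTORS')
    if side_data is None:
        continue
    # Structured NumPy array, one record per vector, with the fields:
    # source, w, h, src_x, src_y, dst_x, dst_y, flags, motion_x, motion_y, motion_scale
    # (source < 0: reference frame in the past; source > 0: in the future)
    mvs = side_data.to_ndarray()
    for mv in mvs:
        print(f"Frame: {frame_number} | Block: {mv['w']}x{mv['h']} | "
              f"Src: ({mv['src_x']}, {mv['src_y']}) -> Dst: ({mv['dst_x']}, {mv['dst_y']})")

container.close()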
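Printing is fine for a quick check, but analysis is usually easier on a flat table. The sketch below shows one way to flatten per-frame vectors into CSV rows. The helper names vectors_to_rows and write_csv are hypothetical; the only assumption about the input is that each record can be indexed by field name (w, h, src_x, ...), which holds for both plain dictionaries and NumPy structured records.

```python
import csv
import io

FIELDS = ("frame", "w", "h", "src_x", "src_y", "dst_x", "dst_y")


def vectors_to_rows(frame_number, vectors):
    """Flatten one frame's vectors into dictionaries of plain ints."""
    rows = []
    for mv in vectors:
        row = {"frame": frame_number}
        for name in FIELDS[1:]:
            row[name] = int(mv[name])  # int() also unwraps NumPy scalars
        rows.append(row)
    return rows


def write_csv(rows, handle):
    """Write the rows, with a header line, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Synthetic record standing in for one decoded motion vector
    sample = [{"w": 16, "h": 16, "src_x": 40, "src_y": 24,
               "dst_x": 44, "dst_y": 24}]
    buffer = io.StringIO()
    write_csv(vectors_to_rows(1, sample), buffer)
    print(buffer.getvalue())
```

In the extraction loop, the call would be vectors_to_rows(frame_number, mvs), with the rows accumulated across frames before writing.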

Method 3: Low-Level Extraction using the libvpx API

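If you are developing in C/C++, be aware that the public libvpx decoder API does not expose motion vectors, so a wrapper around the stock library is not enough. You need to instrument the decoder source itself and then drive it with vpxdec or your own decode loop.

Initialize the decoder with vpx_codec_dec_init, passing the interface returned by vpx_codec_vp9_dx().
Run the usual decode loop with vpx_codec_decode and vpx_codec_get_frame. The VP9 control IDs available through vpx_codec_control (for example VP9_GET_REFERENCE or VP9D_GET_FRAME_SIZE) return frame-level information only, and the decoded vpx_image_t holds nothing but pixel planes.
Add logging inside the decoder at the point where inter-block mode information is read (vp9/decoder/vp9_decodemv.c in the libvpx source tree; verify against your checkout). Each inter-predicted block stores up to two vectors, one per reference frame, as row (\(Y\)) and column (\(X\)) displacements in 1/8-pixel units.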
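Once a patched decoder logs its vectors, the dump still has to be converted into pixel units. The sketch below assumes a purely hypothetical log format of five whitespace-separated integers per line (frame, mi_row, mi_col, mv_row, mv_col), which is whatever your own logging statement would emit, not something libvpx produces. It relies on two VP9 conventions: mode-info positions count 8x8-pixel units, and vector components are stored in 1/8-pixel units.

```python
from typing import NamedTuple

MI_SIZE_PX = 8        # one mode-info unit covers 8x8 luma pixels
MV_UNITS_PER_PX = 8   # vector components are in 1/8-pixel units


class BlockVector(NamedTuple):
    frame: int
    x_px: int      # block position, in pixels
    y_px: int
    dx_px: float   # displacement, in pixels
    dy_px: float


def parse_dump_line(line):
    """Parse one line of the assumed 'frame mi_row mi_col mv_row mv_col' format."""
    frame, mi_row, mi_col, mv_row, mv_col = (int(tok) for tok in line.split())
    return BlockVector(
        frame=frame,
        x_px=mi_col * MI_SIZE_PX,
        y_px=mi_row * MI_SIZE_PX,
        dx_px=mv_col / MV_UNITS_PER_PX,   # column component is horizontal
        dy_px=mv_row / MV_UNITS_PER_PX,   # row component is vertical
    )


if __name__ == "__main__":
    print(parse_dump_line("12 3 5 -16 4"))
```

Note the row/column order: libvpx stores the vertical component first, so swapping the two is an easy mistake when mapping to \(X\) and \(Y\).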

Analyzing the Extracted Data

Once extracted, the motion vector dataset consists of several parameters per block. Analyze these parameters using the following guidelines:

Block Size (Width x Height): VP9 partitions each 64x64 superblock recursively, down to blocks as small as 4x4 pixels. Large blocks with zero or near-zero motion vectors typically indicate static or low-detail regions such as backgrounds. Small blocks with non-zero vectors usually mark finely detailed moving objects or object boundaries.
Displacement Vector (\(\Delta X\), \(\Delta Y\)): Calculate the displacement by subtracting the source coordinates from the destination coordinates: \[\Delta X = \text{dst\_x} - \text{src\_x}\] \[\Delta Y = \text{dst\_y} - \text{src\_y}\] These coordinates are whole pixels; FFmpeg's side data also carries motion_x, motion_y, and motion_scale for sub-pixel precision. A consistent direction and magnitude across neighboring blocks indicates global camera motion such as panning or tilting, while zooming produces vectors that point toward or away from a common center. High-variance, localized vectors indicate independent object movement.
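Temporal Consistency: Track vector counts and magnitudes across successive frames. Sudden jumps in magnitude usually correspond to rapid action or fast camera movement. A scene cut tends to look different: the encoder finds no usable reference, codes most blocks as intra (or inserts a keyframe), and the number of exported vectors drops sharply.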
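The guidelines above can be sketched in a few small functions. This is an illustration under stated assumptions, not a tuned detector: the function names, the 2-pixel threshold, the 5-frame window, and the factor of 3 are all hypothetical choices, and global motion is approximated by the per-frame median displacement, which only models pure translation (pan or tilt), not zoom. Records are assumed to be indexable by field name, as with dictionaries or NumPy structured records.

```python
import math
import statistics


def displacements(vectors):
    """Return (dx, dy) = (dst - src) for every vector in a frame."""
    return [(mv["dst_x"] - mv["src_x"], mv["dst_y"] - mv["src_y"])
            for mv in vectors]


def global_motion(disps):
    """Median displacement: a robust estimate of a pan or tilt."""
    if not disps:
        return (0.0, 0.0)
    return (statistics.median(dx for dx, _ in disps),
            statistics.median(dy for _, dy in disps))


def local_motion_blocks(vectors, threshold_px=2.0):
    """Blocks whose vector deviates from the global motion estimate."""
    disps = displacements(vectors)
    gx, gy = global_motion(disps)
    return [mv for mv, (dx, dy) in zip(vectors, disps)
            if math.hypot(dx - gx, dy - gy) > threshold_px]


def mean_magnitude(vectors):
    """Average vector length in a frame (0.0 if it has no vectors)."""
    disps = displacements(vectors)
    if not disps:
        return 0.0
    return statistics.fmean(math.hypot(dx, dy) for dx, dy in disps)


def spike_frames(magnitudes, window=5, factor=3.0):
    """Indices whose magnitude exceeds factor x the median of the previous window."""
    spikes = []
    for i in range(window, len(magnitudes)):
        baseline = statistics.median(magnitudes[i - window:i])
        if magnitudes[i] > factor * max(baseline, 1e-6):
            spikes.append(i)
    return spikes


if __name__ == "__main__":
    # Synthetic frame: three blocks follow a 4-pixel pan, one moves on its own
    frame = [
        {"src_x": 0, "src_y": 0, "dst_x": 4, "dst_y": 0},
        {"src_x": 16, "src_y": 0, "dst_x": 20, "dst_y": 0},
        {"src_x": 32, "src_y": 0, "dst_x": 36, "dst_y": 0},
        {"src_x": 48, "src_y": 16, "dst_x": 58, "dst_y": 10},
    ]
    print(global_motion(displacements(frame)))
    print(local_motion_blocks(frame))
    print(spike_frames([1.0, 1.1, 0.9, 1.0, 1.2, 6.0, 1.0]))
```

The median is used instead of the mean so that a few independently moving blocks do not distort the global estimate; fitting an affine model would be the next step if zoom or rotation has to be handled.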