MPEG-4 Motion Estimation and Compensation Explained

This article explains how the MPEG-4 video coding standard uses motion estimation and compensation to achieve high compression efficiency. By estimating how image content moves between frames (estimation) and encoding only what that prediction fails to capture (compensation), MPEG-4 greatly reduces file size while maintaining visual quality. Below, we break down the core mechanisms of this process, including block matching, motion vectors, residual coding, and the specific frame types involved.

The Goal: Reducing Temporal Redundancy

In any video sequence, consecutive frames are highly similar. If a person walks across a static background, most of the pixels representing the background remain unchanged from one frame to the next. MPEG-4 exploits this “temporal redundancy” so it does not have to compress and save every frame as a completely new image. Instead, it describes frames based on how they have changed relative to previous or future frames.

Step 1: Motion Estimation (Block Matching)

Motion estimation is the process of determining how image content has moved between frames. In practice this is done block by block: the encoder divides each video frame into a grid of small squares called macroblocks (typically 16x16 pixels) and estimates the motion of each macroblock separately.
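To make the grid concrete, here is a minimal Python sketch that lists the top-left corner of every macroblock in a frame. The function name macroblock_origins is hypothetical, and the sketch assumes the frame dimensions are exact multiples of the block size.

```python
def macroblock_origins(width, height, size=16):
    """Yield the (x, y) top-left corner of each macroblock in raster order.

    Assumption: width and height are exact multiples of `size`.
    """
    for y in range(0, height, size):
        for x in range(0, width, size):
            yield (x, y)


# A 176x144 frame splits into 11 columns and 9 rows of 16x16 macroblocks.
origins = list(macroblock_origins(176, 144))
print(len(origins))   # 99
print(origins[:3])    # [(0, 0), (16, 0), (32, 0)]
```

Each origin identifies one block that the encoder will try to match against the reference frame.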

  1. Reference Frame: To encode a new frame (the target frame), the encoder looks at a previously encoded frame (the reference frame).
  2. Block Matching: For each macroblock in the target frame, the encoder searches a limited area of the reference frame (the search window), usually centered on the macroblock's own position, for the block that looks most similar.
  3. Evaluation Metrics: The encoder scores each candidate with a matching cost, such as the Sum of Absolute Differences (SAD) or the Mean Absolute Difference (MAD). The candidate with the lowest cost is the closest match.
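The two metrics named above can be sketched in a few lines of Python. The function names sad and mad are illustrative, and blocks are assumed to be equally sized lists of rows holding integer pixel values.

```python
def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def mad(block_a, block_b):
    """Mean Absolute Difference: SAD divided by the number of pixels."""
    pixel_count = len(block_a) * len(block_a[0])
    return sad(block_a, block_b) / pixel_count


target_block = [[10, 20], [30, 40]]
candidate_block = [[12, 18], [30, 44]]
print(sad(target_block, candidate_block))  # 8
print(mad(target_block, candidate_block))  # 2.0
```

For a fixed block size, MAD is just SAD scaled by a constant, so both metrics rank candidates identically. SAD avoids the division.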

Once the best matching block is found, the encoder calculates the spatial shift between the block's position in the target frame and the position of the matching block in the reference frame. This displacement is stored as a two-dimensional offset (a horizontal and a vertical component) called a motion vector.
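The search described above can be sketched as an exhaustive ("full search") block matcher. This is a minimal illustration, not the method of any particular encoder: the helper names, the small 4x4 block size, the search range of 2 pixels, and the convention that the vector points from the target block to its match in the reference frame are all assumptions made for the example.

```python
def get_block(frame, x, y, size):
    """Extract the size x size block whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(block_a, block_b):
    """Sum of Absolute Differences between two equally sized blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def find_motion_vector(target, reference, bx, by, size=4, search_range=2):
    """Try every offset in the search window and keep the lowest-SAD match."""
    height, width = len(reference), len(reference[0])
    current = get_block(target, bx, by, size)
    best_mv = (0, 0)
    best_cost = sad(current, get_block(reference, bx, by, size))
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + size > width or ry + size > height:
                continue
            cost = sad(current, get_block(reference, rx, ry, size))
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


# Synthetic 8x8 frames: the target is the reference shifted right by 1 pixel.
reference = [[x * x + 10 * y for x in range(8)] for y in range(8)]
target = [[row[x - 1] if x > 0 else 0 for x in range(8)] for row in reference]

mv, cost = find_motion_vector(target, reference, bx=2, by=2)
print(mv, cost)  # (-1, 0) 0
```

Because the content moved one pixel to the right, the best match for the target block lies one pixel to the left in the reference frame, hence the vector (-1, 0). Full search is the simplest strategy. Practical encoders often use faster search patterns that test fewer candidates.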

Step 2: Motion Compensation

Once the motion vectors are established, the encoder performs motion compensation to construct a predicted frame.

Instead of saving the entire pixel data of the current macroblock, the encoder uses the motion vector to copy the matching block from the reference frame and treats that copy as a prediction. However, because the match is rarely exact, there is usually a slight difference between the predicted block and the actual block.

To account for this, the encoder subtracts the predicted block from the actual target block. The result of this subtraction is the prediction error or residual. The encoder then compresses and saves only two things:

  * The motion vectors (which require very little data).
  * The residual error (which contains far less detail and is much easier to compress than a full image).
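The subtraction itself is simple, as the following Python sketch shows. The function name residual_block and the sample pixel values are made up for illustration, and the later compression of the residual is left out.

```python
def residual_block(actual, predicted):
    """Per-pixel difference: actual block minus motion-compensated prediction."""
    return [
        [a - p for a, p in zip(actual_row, predicted_row)]
        for actual_row, predicted_row in zip(actual, predicted)
    ]


actual = [[52, 55], [61, 59]]      # block from the target frame
predicted = [[50, 55], [60, 62]]   # block copied from the reference frame
print(residual_block(actual, predicted))  # [[2, 0], [1, -3]]
```

Note that residual values are signed and, when the match is good, cluster around zero, which is what makes them cheap to compress.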

During playback, the decoder reads the motion vectors, grabs the matching blocks from the reference frame, shifts them accordingly, and adds the decoded residual error to reconstruct the original frame.

Frame Types in MPEG-4

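To manage motion estimation and compensation efficiently, MPEG-4 organizes video frames into three primary types:

  1. I-frames (Intra-coded): Encoded on their own, without reference to any other frame. They serve as starting points for prediction.
  2. P-frames (Predicted): Encoded using motion-compensated prediction from a previous frame.
  3. B-frames (Bidirectional): Encoded using motion-compensated prediction from a previous frame, a future frame, or both.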

Advanced MPEG-4 Features

MPEG-4 introduced several advanced tools to improve the accuracy of motion estimation and compensation: