VP9 Temporal Scalability in libvpx

This article explains how temporal scalability functions within the libvpx-vp9 encoder to dynamically scale video framerates. You will learn about the hierarchical structure of temporal layers, how frame reference dependencies prevent decoding errors when frames are discarded, and how developers configure libvpx to implement this technology for adaptive video streaming.

Understanding Temporal Layers

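Temporal scalability works by dividing a single video stream into multiple hierarchical layers: a base layer and one or more enhancement layers. Each layer contributes a fraction of the target framerate. For example, in a 30 frames per second (fps) video with three temporal layers:

  - Layer 0 (base layer) carries every fourth frame and plays at 7.5 fps on its own.
  - Layer 1 adds another 7.5 fps, so Layers 0 and 1 together play at 15 fps.
  - Layer 2 adds the remaining 15 fps, so all three layers together play at the full 30 fps.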

A decoder can subscribe to only Layer 0, Layers 0 and 1, or all three layers, depending on available CPU and network bandwidth.
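
The layer arithmetic above can be sketched in a few lines of C. This is an illustrative sketch, not libvpx code: the function names are hypothetical, and it assumes the common dyadic three-layer pattern in which frames cycle through layers 0, 2, 1, 2.

```c
#include <assert.h>

/* Temporal layer of a frame, assuming a three-layer 0-2-1-2 pattern that
 * repeats every four frames. frame_index counts from 0. */
int temporal_layer_id(unsigned frame_index) {
    static const int pattern[4] = {0, 2, 1, 2};
    return pattern[frame_index % 4];
}

/* Framerate seen by a decoder that keeps layers 0..max_layer of a stream
 * encoded at full_fps with num_layers dyadic temporal layers. */
double decoded_fps(double full_fps, int num_layers, int max_layer) {
    assert(num_layers >= 1 && max_layer >= 0 && max_layer < num_layers);
    /* Each layer dropped from the top halves the framerate. */
    return full_fps / (double)(1 << (num_layers - 1 - max_layer));
}
```

With a 30 fps stream and three layers, decoded_fps yields 7.5, 15, and 30 fps for a decoder that keeps up to Layer 0, Layer 1, and Layer 2 respectively.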

Reference Frame Restrictions

The mechanism that allows a receiver to drop frames dynamically without corrupting the video stream is strict reference frame management. In standard video encoding, frames rely on past or future frames for compression (inter-frame prediction). With temporal scalability, libvpx enforces rules on which frames can reference each other:

  1. Lower layers never reference higher layers. A frame in Layer 0 can only use previous frames in Layer 0 as references.
  2. Enhancement layers can only reference their own layer or lower layers. A frame in Layer 2 can reference frames in Layer 1 or Layer 0, but never vice versa.

Because of these rules, if a network router or client drops all packets belonging to Layer 2, the remaining Layer 0 and Layer 1 packets can still be fully decoded. The video will continue to play smoothly at 15 fps instead of 30 fps, without any blocky artifacts or decoding failures.
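
The two rules can be expressed as a small check. The sketch below is a simplified model rather than the encoder's internal logic: FrameInfo is a hypothetical record, and each frame is assumed to predict from at most one earlier frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical frame record: its temporal layer and the index of the single
 * frame it predicts from (-1 for a keyframe). */
typedef struct {
    int layer;
    int ref_index;
} FrameInfo;

/* Rule check: a frame may reference only its own layer or a lower one. */
bool reference_allowed(int frame_layer, int ref_layer) {
    return ref_layer <= frame_layer;
}

/* Returns true if, after dropping every frame above max_layer, each
 * surviving frame still has its reference available. */
bool decodable_after_drop(const FrameInfo *frames, size_t count, int max_layer) {
    for (size_t i = 0; i < count; ++i) {
        if (frames[i].layer > max_layer) continue;  /* frame is dropped */
        int ref = frames[i].ref_index;
        if (ref < 0) continue;                      /* keyframe, no reference */
        assert((size_t)ref < i);                    /* references point backwards */
        if (frames[ref].layer > max_layer) return false;
    }
    return true;
}
```

A stream in which every reference satisfies reference_allowed passes decodable_after_drop for any max_layer, while a Layer 1 frame that predicts from a Layer 2 frame fails as soon as Layer 2 is dropped.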

Dynamic Framerate Dropping in Practice

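In live video scenarios such as WebRTC, dynamic framerate dropping is managed by a Selective Forwarding Unit (SFU) or by the client application. Either one forwards or decodes only the temporal layers that the receiver's network bandwidth and CPU can sustain, and discards the packets that belong to higher layers.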
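
A forwarding decision of this kind might look like the following sketch. The function names and the bandwidth-based policy are assumptions made for illustration, not part of libvpx or any particular SFU. The sketch also assumes the forwarder already knows each packet's temporal layer, for example from the temporal ID carried in the VP9 RTP payload descriptor.

```c
#include <assert.h>
#include <stdbool.h>

/* Picks the highest temporal layer whose cumulative bitrate fits the
 * receiver's estimated bandwidth. cumulative_kbps[i] is the bitrate of
 * layers 0..i combined. The base layer is always selected. */
int select_max_layer(const unsigned *cumulative_kbps, int num_layers,
                     unsigned available_kbps) {
    assert(num_layers >= 1);
    int chosen = 0;
    for (int i = 1; i < num_layers; ++i) {
        if (cumulative_kbps[i] > available_kbps) break;
        chosen = i;
    }
    return chosen;
}

/* A packet is forwarded only if its layer is at or below the chosen one. */
bool should_forward(int packet_layer, int max_layer) {
    return packet_layer <= max_layer;
}
```

Because lower layers never reference higher ones, the forwarder can change max_layer from one packet to the next without asking the encoder for a new keyframe when it lowers the layer.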

Libvpx Configuration

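To enable this behavior in the libvpx-vp9 encoder, developers set the temporal scalability fields of the vpx_codec_enc_cfg_t structure:

  - ts_number_layers: the number of temporal layers.
  - ts_target_bitrate: the target bitrate for each layer.
  - ts_rate_decimator: the framerate divisor of each layer relative to the full framerate.
  - ts_periodicity: the length, in frames, of the repeating layer pattern.
  - ts_layer_id: the layer assigned to each frame position within that pattern.
  - temporal_layering_mode: the predefined layering scheme to use.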
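
As a rough illustration, a three-layer setup could be filled in as shown below. To keep the sketch compilable without libvpx, it uses a stand-in struct (TemporalCfg, a hypothetical name) whose field names mirror the temporal scalability fields of vpx_codec_enc_cfg_t. The 60/80/100 percent bitrate split is an arbitrary assumption, and the bitrates are treated here as cumulative per layer.

```c
#include <assert.h>
#include <string.h>

#define TS_MAX_LAYERS 5        /* stand-in for VPX_TS_MAX_LAYERS */
#define TS_MAX_PERIODICITY 16  /* stand-in for VPX_TS_MAX_PERIODICITY */

/* Stand-in for the temporal scalability fields of vpx_codec_enc_cfg_t. */
typedef struct {
    unsigned int ts_number_layers;
    unsigned int ts_target_bitrate[TS_MAX_LAYERS];
    unsigned int ts_rate_decimator[TS_MAX_LAYERS];
    unsigned int ts_periodicity;
    unsigned int ts_layer_id[TS_MAX_PERIODICITY];
} TemporalCfg;

/* Fills cfg for three dyadic layers in a 0-2-1-2 pattern. */
void configure_three_layers(TemporalCfg *cfg, unsigned int total_kbps) {
    static const unsigned int pattern[4] = {0, 2, 1, 2};
    assert(cfg != NULL);
    memset(cfg, 0, sizeof(*cfg));

    cfg->ts_number_layers = 3;

    /* Framerate divisors: 1/4, 1/2, and full rate. */
    cfg->ts_rate_decimator[0] = 4;
    cfg->ts_rate_decimator[1] = 2;
    cfg->ts_rate_decimator[2] = 1;

    /* Layer assignment repeats every four frames. */
    cfg->ts_periodicity = 4;
    memcpy(cfg->ts_layer_id, pattern, sizeof(pattern));

    /* Illustrative cumulative bitrate split (assumption). */
    cfg->ts_target_bitrate[0] = total_kbps * 60 / 100;
    cfg->ts_target_bitrate[1] = total_kbps * 80 / 100;
    cfg->ts_target_bitrate[2] = total_kbps;
}
```

In a real encoder the same assignments would be made on a vpx_codec_enc_cfg_t instance before the encoder is initialized.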

By using these controls, libvpx allows applications to react quickly to fluctuating network conditions by dropping temporal layers, without the costly CPU overhead of re-encoding the video stream.