Implement VP9 Spatial Scalability with libvpx-vp9

This article provides a practical guide on implementing spatial scalability (SVC) using the libvpx-vp9 encoder in FFmpeg. You will learn the core concepts of VP9 spatial layers, the key configuration parameters required by the encoder, and a step-by-step command-line example to generate a scalable video bitstream.

Understanding VP9 Spatial Scalability

Spatial scalability allows you to encode a single video stream containing multiple resolution layers (such as 360p, 720p, and 1080p). Clients can decode only the layers they need or can support based on their network bandwidth and hardware capabilities.

In VP9, spatial scalability is typically implemented alongside temporal scalability. To implement spatial scalability using libvpx-vp9, you must define the number of spatial layers, the number of temporal layers, and assign target bitrates to each layer combination.

Key FFmpeg Parameters for libvpx-vp9 SVC

To configure spatial scalability, you must pass specific flags to the libvpx-vp9 encoder:

Step-by-Step FFmpeg Implementation

Below is a standard FFmpeg command to encode a 1080p input video into 3 spatial layers (typically downscaled by a factor of 2 per layer: 270p, 540p, and 1080p) using a single temporal layer.

ffmpeg -i input.mp4 \
  -c:v libvpx-vp9 \
  -s 1920x1080 \
  -b:v 3500k \
  -minrate 3500k \
  -maxrate 3500k \
  -bufsize 7000k \
  -g 150 \
  -keyint_min 150 \
  -ss_number_layers 3 \
  -ts_number_layers 1 \
  -layer_target_bitrate 400,1200,3500 \
  -speed 7 \
  -quality realtime \
  -error-resilient 1 \
  output_svc.webm

Explanation of the Parameters

  1. Rate Control: VP9 SVC works best with Constant Bitrate (CBR) mode. Setting -b:v, -minrate, and -maxrate to the same value (e.g., 3500k) forces CBR.
  2. GOP Size (-g and -keyint_min): Keeps keyframe intervals consistent across all layers. For 30fps video, 150 ensures a keyframe every 5 seconds.
  3. Layer Bitrate Allocation: The -layer_target_bitrate 400,1200,3500 argument allocates:
    • Layer 0 (270p): 400 kbps
    • Layer 1 (540p): 1200 kbps (cumulative, including Layer 0)
    • Layer 2 (1080p): 3500 kbps (cumulative, including lower layers)
  4. Error Resilience (-error-resilient 1): Enables error resilience features, which are mandatory for SVC decoding to allow smooth frame drops and layer switching without stream corruption.