Row-Based Multi-Threading in libvpx-vp9
This article explains row-based multi-threading within the
libvpx-vp9 video codec library. It covers how this parallel
processing technique improves CPU utilization and encoding speeds
without sacrificing compression efficiency, contrasting it with
traditional tile-based threading.
In video encoding, parallel processing is critical for achieving acceptable encoding speeds, especially at high resolutions like 4K and 8K. Historically, VP9 achieved parallelization primarily through “tiles.” Tile columns split a video frame into side-by-side vertical strips that can be encoded or decoded independently by separate CPU threads. However, because spatial prediction cannot cross these tile boundaries, using too many tiles degrades compression efficiency, resulting in a higher bitrate for the same visual quality.
Row-based multi-threading (enabled with -row-mt 1 in FFmpeg's
libvpx-vp9 encoder, or --row-mt=1 in vpxenc) addresses this
limitation by parallelizing processing at the superblock-row level.
Instead of relying only on independent vertical columns, the encoder
processes rows of superblocks (the 64x64-pixel top-level coding units
in VP9) in a staggered wavefront pattern.
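
As a minimal sketch of how the feature might be switched on through FFmpeg, the helper below assembles an argument list for a libvpx-vp9 encode. The options -c:v libvpx-vp9, -row-mt, -tile-columns, and -threads are FFmpeg options; the function name, file names, and default values are hypothetical placeholders, not recommended settings.

```python
def build_ffmpeg_vp9_args(src, dst, threads=8, log2_tile_cols=2, row_mt=True):
    """Return an FFmpeg argument list for a libvpx-vp9 encode.

    `src`, `dst`, and the defaults are illustrative placeholders.
    """
    return [
        "ffmpeg",
        "-i", src,
        "-c:v", "libvpx-vp9",
        "-row-mt", "1" if row_mt else "0",      # row-based multi-threading
        "-tile-columns", str(log2_tile_cols),   # log2 of the tile-column count
        "-threads", str(threads),               # worker threads to request
        dst,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_ffmpeg_vp9_args("input.mp4", "output.webm")))
```

Note that -tile-columns takes a base-2 logarithm, so a value of 2 requests four tile columns.
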
Under this architecture, a thread assigned to process Row \(N\) can begin working as soon as the thread
processing Row \(N-1\) is a few blocks
ahead. This offset ensures that all necessary spatial prediction
dependencies—such as top, top-left, and top-right neighbor blocks—are
already processed and available. Because these dependencies are
preserved across row boundaries, row-based multi-threading allows
libvpx-vp9 to distribute the encoding workload across
multiple CPU cores without breaking the spatial correlation of the video
frame.
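
The wavefront dependency can be illustrated with a toy model. The sketch below is not libvpx code: it assumes every superblock takes one time step, one thread is available per row, and a row must trail the row above it by `lag` superblocks (a value of 2 is assumed here so that the top-right neighbor is always finished). The function names and the `lag` parameter are hypothetical.

```python
def wavefront_finish_times(sb_rows, sb_cols, lag=2):
    """Return finish[r][c], the time step at which superblock (r, c) completes.

    A block may start once its left neighbor is done and the row above has
    completed `lag` more blocks than the current row (clamped at the row end).
    """
    finish = [[0] * sb_cols for _ in range(sb_rows)]
    for r in range(sb_rows):
        for c in range(sb_cols):
            left = finish[r][c - 1] if c > 0 else 0
            if r > 0:
                needed = min(c + lag - 1, sb_cols - 1)  # column required above
                above = finish[r - 1][needed]
            else:
                above = 0
            finish[r][c] = max(left, above) + 1
    return finish


def makespan(sb_rows, sb_cols, lag=2):
    """Total time steps until the last superblock of the frame is done."""
    return wavefront_finish_times(sb_rows, sb_cols, lag)[-1][-1]


if __name__ == "__main__":
    rows, cols = 17, 30  # 1920x1080 rounded up to whole 64x64 superblocks
    print(f"serial: {rows * cols} steps, wavefront: {makespan(rows, cols)} steps")
```

Under these assumptions a 1080p frame finishes in 62 steps instead of 510, because each row starts only two steps after the one above it. Real encoding times per superblock vary widely, so this shows the shape of the schedule, not a speedup prediction.
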
The primary benefit of row-based multi-threading is a substantial
speedup on multi-core processors with little or no loss in compression
efficiency. Additionally, row-based multi-threading can be combined
with tile-based multi-threading. This hybrid approach allows an encoder
to scale to high thread counts (such as 16, 32, or 64) by parallelizing
both across tiles and across the rows within those tiles, which makes
libvpx-vp9 practical on modern multi-core server
architectures.
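
To see why combining the two techniques helps, the rough estimate below compares idealized step counts for a frame. It is a sketch under stated assumptions, not a measurement: each superblock costs one time step, enough threads exist for every row of every tile, tiles are split evenly, and each row trails the row above by an assumed `lag` of 2 superblocks. All function and parameter names are hypothetical.

```python
import math

SB = 64  # VP9 superblock size in pixels


def wavefront_steps(sb_rows, sb_cols, lag=2):
    """Steps to finish a region when each row starts `lag` steps after the one above.

    Valid for regions at least `lag` superblocks wide.
    """
    return sb_cols + lag * (sb_rows - 1)


def compare(width, height, tile_cols, lag=2):
    """Idealized step counts for four threading strategies."""
    sb_cols = math.ceil(width / SB)
    sb_rows = math.ceil(height / SB)
    tile_w = math.ceil(sb_cols / tile_cols)  # widest tile, in superblocks
    return {
        "serial": sb_rows * sb_cols,
        "tiles_only": sb_rows * tile_w,  # one thread per tile column
        "rows_only": wavefront_steps(sb_rows, sb_cols, lag),
        "tiles_and_rows": wavefront_steps(sb_rows, tile_w, lag),
    }


if __name__ == "__main__":
    for name, steps in compare(3840, 2160, tile_cols=8).items():
        print(f"{name:>15}: {steps} steps")
```

For a 3840x2160 frame with eight tile columns, this model gives 2040 steps serially, 272 with tiles alone, 126 with rows alone, and 74 with both. The combined case wins because narrower tiles shorten each row, so the wavefront fills and drains faster.
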