How Does Row-Based Multi-Threading Work in Libaom?

Row-based multi-threading (RBMT) in the libaom encoder, the reference software for the AV1 video codec, is a parallel processing technique that accelerates video encoding by distributing the rows of superblocks (large pixel blocks) within a frame across multiple CPU threads. Instead of waiting for an entire frame to finish encoding before starting the next, or splitting a frame into independent, quality-degrading tiles, RBMT allows threads to work on different rows of the same frame simultaneously. This article breaks down the mechanics of row-based multi-threading, the dependency management required to make it work, and how it improves encoding speed without sacrificing compression efficiency.

The Challenge of Parallel Video Encoding

Video encoding is inherently serial because of spatial prediction. To compress a specific block of pixels (a superblock in AV1), the encoder looks at previously processed neighboring blocks (typically above and to the left) to predict the current block’s data.

Traditionally, this creates a strict bottleneck: Thread 2 cannot start encoding a new row of blocks until Thread 1 has completely finished the row above it.

The Mechanics of Row-Based Multi-Threading

Row-based multi-threading overcomes this bottleneck by utilizing a “wavefront” processing pattern. Instead of waiting for a full row to complete, a thread processing a lower row can begin as soon as the thread above it has cleared a specific horizontal offset.

In AV1, the encoder processes data in large blocks called superblocks (either \(64\times64\) or \(128\times128\) pixels). Because spatial prediction relies on the top, top-right, and left neighbors, a thread encoding Row \(N\) only needs to wait until the thread encoding Row \(N-1\) has finished the superblocks directly above and above-right of its current position. It does not need the rest of Row \(N-1\) to be complete.

The Wavefront Progress Pattern
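
The schedule implied by this dependency rule can be sketched with a small simulation. This is an illustrative model, not libaom code: it assumes one thread per superblock row, one time unit per superblock, and a hypothetical `lag` parameter giving how many columns the row above must stay ahead (1 for the top-right neighbor). The function name is likewise made up for this example.

```python
def wavefront_finish_times(rows, cols, lag=1):
    """Return finish[r][c]: the time step at which superblock (r, c) completes.

    Simplifying assumptions: one thread per row, one time unit per superblock.
    """
    finish = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # The left neighbor in the same row must be finished first.
            left_done = finish[r][c - 1] if c > 0 else 0
            if r > 0:
                # The row above must be finished through column c + lag,
                # clamped at the last column of the frame.
                above_done = finish[r - 1][min(c + lag, cols - 1)]
            else:
                above_done = 0  # the top row has no upper neighbors
            finish[r][c] = 1 + max(left_done, above_done)
    return finish


if __name__ == "__main__":
    # Each printed row is one superblock row; each number is a finish time.
    for row in wavefront_finish_times(rows=4, cols=8):
        print(" ".join(f"{t:2d}" for t in row))
```

Under these assumptions each row starts two time units after the row above it, so the finish times form a diagonal front moving across the frame. For a frame of 4 rows by 8 columns the last superblock completes at time step 14, compared with 32 steps when the same superblocks are processed one after another.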

Sync Mechanisms and Dependency Tracking

To prevent data corruption or invalid predictions, libaom uses synchronization primitives (mutexes and condition variables) to track the progress of each row.

Each row has an associated progress counter that tracks how many superblocks in that row have been successfully encoded. Before a thread processes a superblock at column \(X\) on row \(Y\) (counting columns from zero), it checks the progress counter of row \(Y-1\). If the counter is at least \(X+2\), meaning the upper thread has finished columns \(0\) through \(X+1\) and therefore the top-right neighbor, the current thread proceeds. If not, it blocks and waits for a signal from the upper thread.
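
The check-and-wait protocol described above can be sketched as follows. libaom itself is written in C; this Python version is only a minimal model of the idea, and the names `RowSync`, `wait_for_above`, `mark_done`, `encode_frame`, and the `lag` parameter are hypothetical rather than taken from the libaom source. With `lag=1` the wait condition is "counter of the row above is at least \(X+2\)", clamped at the end of the row.

```python
import threading


class RowSync:
    """Per-row progress counters, each guarded by a condition variable."""

    def __init__(self, rows, cols, lag=1):
        self.cols = cols
        self.lag = lag
        self.done = [0] * rows  # number of superblocks finished in each row
        self.conds = [threading.Condition() for _ in range(rows)]

    def wait_for_above(self, r, c):
        """Block until row r-1 has finished enough superblocks for (r, c)."""
        if r == 0:
            return  # the top row has no upper neighbors
        needed = min(c + 1 + self.lag, self.cols)  # a count, clamped at row end
        with self.conds[r - 1]:
            self.conds[r - 1].wait_for(lambda: self.done[r - 1] >= needed)

    def mark_done(self, r, c):
        """Record that (r, c) is finished and wake threads waiting on row r."""
        with self.conds[r]:
            self.done[r] = c + 1
            self.conds[r].notify_all()


def encode_frame(rows, cols, lag=1):
    """Run one thread per row; return the order in which superblocks completed."""
    sync = RowSync(rows, cols, lag)
    order, order_lock = [], threading.Lock()

    def encode_row(r):
        for c in range(cols):
            sync.wait_for_above(r, c)
            with order_lock:
                order.append((r, c))  # stand-in for the real encoding work
            sync.mark_done(r, c)

    threads = [threading.Thread(target=encode_row, args=(r,)) for r in range(rows)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Calling `encode_frame(4, 6)` returns a list of `(row, column)` pairs whose exact interleaving varies from run to run, but every superblock always appears after its left neighbor and after the top-right neighbor in the row above. Using a predicate-based wait means a thread rechecks the counter each time it wakes, so a wake-up that arrives before the required column is done does not let it proceed early.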

Why RBMT is Preferred Over Tile-Based Threading

AV1 also supports “Tiles,” which split a frame into a grid of completely independent regions that can be encoded by separate threads with zero communication. While tiles offer excellent CPU utilization, they come with a distinct disadvantage: