How Does Row-Based Multi-Threading Work in Libaom?
Row-based multi-threading (RBMT) in the libaom encoder, the reference software for the AV1 video codec, is a parallel processing technique that accelerates video encoding by distributing the rows of superblocks within a frame across multiple CPU threads. Instead of waiting for an entire frame to finish encoding before starting the next, or splitting a frame into independent tiles that reduce compression efficiency, RBMT allows threads to work on different superblock rows of the same frame simultaneously. This article breaks down the mechanics of row-based multi-threading, the dependency management required to make it work, and how it improves encoding speed with little or no loss in compression efficiency.
The Challenge of Parallel Video Encoding
Video encoding is inherently serial because of spatial prediction. To compress a specific block of pixels (a superblock in AV1), the encoder looks at previously processed neighboring blocks (typically above and to the left) to predict the current block’s data.
In a naive row-parallel scheme, this creates a strict bottleneck: Thread 2 cannot start encoding a row of superblocks until Thread 1 has completely finished the row above it.
The Mechanics of Row-Based Multi-Threading
Row-based multi-threading overcomes this bottleneck by using a “wavefront” processing pattern. Instead of waiting for a full row to complete, a thread processing a lower row can begin as soon as the thread above it is a fixed number of superblocks ahead.
In AV1, the encoder processes data in large blocks called superblocks (either \(64\times64\) or \(128\times128\) pixels). Because spatial prediction relies on the top, top-right, and left neighbors, a thread encoding Row \(N\) only needs to wait until the thread encoding Row \(N-1\) has finished the superblocks directly above and above-right of its current position.
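The top/left/top-right rule can be written as a small helper that lists which already-coded superblocks a given superblock may read. This is a simplified model at superblock granularity that mirrors the rule above; `SbPos` and `sb_neighbors` are hypothetical names, not part of libaom. It makes visible that a superblock never reads further than one column to its right in the row above, which is what allows neighboring rows to overlap in time.

```c
#include <assert.h>

/* Position of a superblock, zero-indexed as (row, column). */
typedef struct {
  int row;
  int col;
} SbPos;

/* Hypothetical helper (simplified model): writes into out[] the
 * already-coded neighbors that prediction of superblock (row, col) may
 * read, namely left, top, and top-right, skipping any that fall outside a
 * frame `cols` superblocks wide. Returns how many were written (0 to 3). */
static int sb_neighbors(int row, int col, int cols, SbPos out[3]) {
  assert(row >= 0 && col >= 0 && col < cols);
  int n = 0;
  if (col > 0) out[n++] = (SbPos){row, col - 1};                       /* left */
  if (row > 0) out[n++] = (SbPos){row - 1, col};                       /* top */
  if (row > 0 && col + 1 < cols) out[n++] = (SbPos){row - 1, col + 1}; /* top-right */
  return n;
}
```

For example, superblock (1, 0) depends only on (0, 0) and (0, 1), while a superblock in the rightmost column has no top-right neighbor at all.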
The Wavefront Progress Pattern
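- Thread 1 begins encoding Superblock (0,0), the first superblock of Row 1 (coordinates are zero-indexed as (row, column)).
- Once Thread 1 has finished Superblocks (0,0) and (0,1), the top and top-right dependencies of Superblock (1,0), the first superblock of Row 2, are satisfied.
- Thread 2 can now safely begin encoding Superblock (1,0) while Thread 1 moves on to Superblock (0,2).
- This creates a diagonal cascade (or wavefront) of activity across the CPU cores, with each row trailing the row above it by two superblocks.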
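Why the activity forms a diagonal can be checked with a small schedule calculation. The sketch assumes an idealized model that is not taken from libaom: every superblock takes one time step, there are as many threads as rows, and a superblock depends only on its left, top, and top-right neighbors. The names `wavefront_schedule` and `wavefront_makespan` are hypothetical.

```c
#include <assert.h>

#define MAX_SB_ROWS 32
#define MAX_SB_COLS 64

/* Idealized model (an assumption): one time step per superblock, unlimited
 * threads, and superblock (r, c) depends only on its left, top, and
 * top-right neighbors. start[r][c] receives the earliest step at which
 * (r, c) can begin. */
static void wavefront_schedule(int rows, int cols,
                               int start[MAX_SB_ROWS][MAX_SB_COLS]) {
  assert(rows > 0 && rows <= MAX_SB_ROWS && cols > 0 && cols <= MAX_SB_COLS);
  for (int r = 0; r < rows; r++) {
    for (int c = 0; c < cols; c++) {
      int t = 0;
      if (c > 0 && start[r][c - 1] + 1 > t) /* left neighbor done */
        t = start[r][c - 1] + 1;
      if (r > 0 && start[r - 1][c] + 1 > t) /* top neighbor done */
        t = start[r - 1][c] + 1;
      if (r > 0 && c + 1 < cols && start[r - 1][c + 1] + 1 > t) /* top-right done */
        t = start[r - 1][c + 1] + 1;
      start[r][c] = t;
    }
  }
}

/* Steps until the whole frame is finished: the bottom-right superblock
 * starts last and takes one step. */
static int wavefront_makespan(int rows, int cols) {
  int start[MAX_SB_ROWS][MAX_SB_COLS];
  wavefront_schedule(rows, cols, start);
  return start[rows - 1][cols - 1] + 1;
}
```

Under these assumptions, superblock (r, c) starts at step c + 2r for frames at least two superblocks wide, so each row trails the one above by two superblocks. A frame R rows by C columns then finishes in C + 2(R - 1) steps rather than R × C; for instance, `wavefront_makespan(4, 8)` returns 14 instead of 32. Real speedups are smaller, since thread counts are finite and superblocks differ in cost.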
Sync Mechanisms and Dependency Tracking
To prevent data corruption or invalid predictions, libaom uses synchronization primitives (mutexes and condition variables) to track the progress of each superblock row.
Each row has an associated progress counter that records how many of its superblocks have been encoded. Before a thread processes the superblock at column \(X\) of row \(Y\), it checks the progress counter of row \(Y-1\). If at least \(X+2\) superblocks of that row are complete (columns \(0\) through \(X+1\), which include the top and top-right neighbors), the thread proceeds; if not, it blocks and waits for a signal from the thread on the row above. The left neighbor needs no check, because each row is encoded from left to right by a single thread.
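The wait-and-signal protocol can be sketched with POSIX threads, assuming one mutex and condition variable per superblock row and a fixed assignment of rows to threads (row r goes to thread r mod nthreads). `RowSync`, `row_sync_wait`, `row_sync_signal`, and `run_wavefront` are hypothetical names for illustration; libaom's actual implementation may organize its data and distribute rows differently. The `seq` array records the order in which superblocks finish, so the dependency rule can be checked after a run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_SB_ROWS 32
#define MAX_SB_COLS 64
#define MAX_THREADS 8

/* Hypothetical sync state (not libaom's structs): one progress counter,
 * mutex, and condition variable per superblock row. */
typedef struct {
  int rows, cols;
  int done[MAX_SB_ROWS]; /* superblocks finished in each row */
  pthread_mutex_t mutex[MAX_SB_ROWS];
  pthread_cond_t cond[MAX_SB_ROWS];
  atomic_int next_seq;               /* global completion counter */
  int seq[MAX_SB_ROWS][MAX_SB_COLS]; /* order in which each SB finished */
} RowSync;

/* Block until row r-1 has finished columns 0..c+1 (clamped at the right
 * edge), i.e. the top and top-right neighbors of superblock (r, c). */
static void row_sync_wait(RowSync *s, int r, int c) {
  if (r == 0) return; /* the first row has nothing above it */
  int needed = (c + 2 < s->cols) ? c + 2 : s->cols;
  pthread_mutex_lock(&s->mutex[r - 1]);
  while (s->done[r - 1] < needed) /* loop: wakeups may be spurious */
    pthread_cond_wait(&s->cond[r - 1], &s->mutex[r - 1]);
  pthread_mutex_unlock(&s->mutex[r - 1]);
}

/* Publish that superblock (r, c) is finished and wake the row below. */
static void row_sync_signal(RowSync *s, int r, int c) {
  pthread_mutex_lock(&s->mutex[r]);
  s->done[r] = c + 1;
  pthread_cond_signal(&s->cond[r]); /* at most one thread waits per row */
  pthread_mutex_unlock(&s->mutex[r]);
}

typedef struct {
  RowSync *sync;
  int first_row, stride;
} Worker;

/* Each worker owns rows first_row, first_row + stride, ... and encodes
 * each one left to right, so the left neighbor is always ready. */
static void *worker_main(void *arg) {
  Worker *w = arg;
  RowSync *s = w->sync;
  for (int r = w->first_row; r < s->rows; r += w->stride) {
    for (int c = 0; c < s->cols; c++) {
      row_sync_wait(s, r, c);
      /* A real encoder would code superblock (r, c) here. */
      s->seq[r][c] = atomic_fetch_add(&s->next_seq, 1);
      row_sync_signal(s, r, c);
    }
  }
  return NULL;
}

/* Encode a rows x cols superblock grid with nthreads workers. */
static void run_wavefront(RowSync *s, int rows, int cols, int nthreads) {
  assert(rows > 0 && rows <= MAX_SB_ROWS && cols > 0 && cols <= MAX_SB_COLS);
  assert(nthreads > 0 && nthreads <= MAX_THREADS);
  s->rows = rows;
  s->cols = cols;
  atomic_init(&s->next_seq, 0);
  for (int r = 0; r < rows; r++) {
    s->done[r] = 0;
    pthread_mutex_init(&s->mutex[r], NULL);
    pthread_cond_init(&s->cond[r], NULL);
  }
  pthread_t tid[MAX_THREADS];
  Worker w[MAX_THREADS];
  for (int t = 0; t < nthreads; t++) {
    w[t] = (Worker){s, t, nthreads};
    if (pthread_create(&tid[t], NULL, worker_main, &w[t]) != 0) abort();
  }
  for (int t = 0; t < nthreads; t++) pthread_join(tid[t], NULL);
  for (int r = 0; r < rows; r++) {
    pthread_mutex_destroy(&s->mutex[r]);
    pthread_cond_destroy(&s->cond[r]);
  }
}
```

For example, `run_wavefront(&s, 6, 10, 3)` encodes a 6×10 grid with three threads; afterwards every `s.seq[r][c]` is larger than the entries for its top and top-right neighbors. The `while` loop around `pthread_cond_wait` is needed because condition variables can wake spuriously. Because every dependency points to a lower row and each thread handles its rows in increasing order, the lowest unfinished row can always make progress, so this scheme cannot deadlock.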
Why RBMT is Preferred Over Tile-Based Threading
AV1 also supports “Tiles,” which split a frame into a grid of completely independent regions that can be encoded by separate threads with zero communication. While tiles offer excellent CPU utilization, they come with a distinct disadvantage:
- Tile Threading: Breaks spatial prediction across tile boundaries, which harms compression efficiency and can introduce visible boundaries or artifacts at high compression ratios.
- Row-Based Multi-Threading: Allows threads to maintain spatial prediction dependencies across the entire frame. This achieves significant speedups on multi-core processors while preserving the compression efficiency that tiles give up.