How VP9 Tile Rows and Columns Partition Frames

In the libvpx-vp9 video encoder, partitioning a frame into tile rows and tile columns is a key mechanism for enabling parallel encoding and decoding. By dividing a video frame into a grid of rectangular regions called tiles, VP9 allows multi-threaded processors to work on different parts of the frame simultaneously. This article explains how tile rows and columns function together, how they are configured, and their impact on encoding speed and compression efficiency.

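VP9 partitions frames into a two-dimensional grid using horizontal boundaries (which separate tile rows) and vertical boundaries (which separate tile columns). Unlike slice-based partitioning in older video codecs, which typically cuts the frame into horizontal bands, VP9’s grid lets the number of parallel work units scale with the frame width. Each intersection of a tile row and a tile column forms a distinct, rectangular “tile”. The two directions are not equivalent, however: tile columns are coded independently of one another and can be processed in parallel, whereas tile rows keep dependencies across their boundaries and therefore contribute much less to parallelism.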
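As a rough illustration of how such a grid is laid out, the sketch below computes the pixel span of each tile along one axis. It assumes that tile boundaries are aligned to 64x64 superblocks and that the superblocks are split as evenly as possible among a power-of-two number of tiles. The function name `tile_spans` and its parameters are hypothetical and are not part of libvpx.

```python
def tile_spans(frame_size_px, num_tiles, sb_size=64):
    """Return (start_px, end_px) for each tile along one axis.

    Assumption: boundaries fall on superblock edges and the superblocks
    are divided as evenly as possible among the tiles.
    """
    # Number of superblocks needed to cover the frame along this axis.
    sb_count = (frame_size_px + sb_size - 1) // sb_size
    spans = []
    for i in range(num_tiles):
        start_sb = (i * sb_count) // num_tiles
        end_sb = ((i + 1) * sb_count) // num_tiles
        # Clamp to the frame edge: the last superblock may be partial.
        start = min(start_sb * sb_size, frame_size_px)
        end = min(end_sb * sb_size, frame_size_px)
        spans.append((start, end))
    return spans


if __name__ == "__main__":
    # Four tile columns and two tile rows on a 1920x1080 frame.
    print(tile_spans(1920, 4))
    print(tile_spans(1080, 2))
```

Because the split is done in whole superblocks, tiles are generally not all the same size; the spans simply cover the frame without gaps or overlap.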

To configure this partitioning in libvpx, developers use the --tile-columns and --tile-rows parameters. Both parameters are defined in log2 units. For example:

Setting --tile-columns=2 creates \(2^2\) (four) vertical columns.
Setting --tile-rows=1 creates \(2^1\) (two) horizontal rows.

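Combined, these settings partition the frame into a grid of eight individual tiles (\(4 \text{ columns} \times 2 \text{ rows}\)), provided the frame is wide enough. The maximum log2 value is 6 for tile columns (64 columns) and 2 for tile rows (4 rows). The number of tile columns is further limited by the frame width: each tile column must be at least 256 pixels wide, so a 1920-pixel-wide frame supports at most four tile columns, and a larger requested value is reduced to fit.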
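The relationship between the log2 settings and the resulting grid can be sketched as follows. This is a minimal sketch, not libvpx code: the helper names are hypothetical, and it assumes a 64-pixel superblock, a minimum tile width of 4 superblocks (256 pixels), a maximum tile width of 64 superblocks (4096 pixels), and that an out-of-range column request is clamped into the allowed range rather than rejected.

```python
MIN_TILE_WIDTH_SB = 4    # assumed minimum tile width: 4 * 64 = 256 px
MAX_TILE_WIDTH_SB = 64   # assumed maximum tile width: 64 * 64 = 4096 px
MAX_TILE_ROWS_LOG2 = 2   # at most 2^2 = 4 tile rows


def log2_tile_col_range(frame_width_px):
    """Return (min_log2, max_log2) tile-column settings for this width."""
    sb_cols = (frame_width_px + 63) // 64
    # Very wide frames need a minimum number of columns to respect
    # the maximum tile width.
    min_log2 = 0
    while (MAX_TILE_WIDTH_SB << min_log2) < sb_cols:
        min_log2 += 1
    # Narrow frames cap the number of columns via the minimum tile width.
    max_log2 = 1
    while (sb_cols >> max_log2) >= MIN_TILE_WIDTH_SB:
        max_log2 += 1
    return min_log2, max_log2 - 1


def tile_grid(frame_width_px, tile_columns_log2, tile_rows_log2):
    """Return (columns, rows) after clamping the requested log2 values."""
    lo, hi = log2_tile_col_range(frame_width_px)
    cols_log2 = max(lo, min(tile_columns_log2, hi))
    rows_log2 = min(tile_rows_log2, MAX_TILE_ROWS_LOG2)
    return 1 << cols_log2, 1 << rows_log2


if __name__ == "__main__":
    cols, rows = tile_grid(1920, 2, 1)
    print(cols, rows, cols * rows)   # grid size and total tile count
    print(tile_grid(640, 2, 1))      # narrow frame: fewer columns than requested
```

The second call shows why a setting that works at 1080p does not necessarily produce the same grid at a lower resolution.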

The primary purpose of partitioning frames this way is to facilitate multi-threading. During encoding or decoding, a separate CPU thread can be assigned to each tile column, so the useful number of tile-based threads is bounded by the number of tile columns. Because the tile columns are processed in parallel, processing latency drops significantly, which is critical for real-time communication and high-resolution video playback (such as 4K or 8K).

However, this parallel processing capability comes with a trade-off in compression efficiency. To allow independent processing, VP9 restricts spatial prediction across vertical tile boundaries: blocks in one tile column cannot use their neighbors in the adjacent tile column for intra prediction or as coding context, and the arithmetic coder is restarted for each tile. This limitation reduces the encoder’s ability to eliminate spatial redundancy, resulting in a slightly higher file size or lower visual quality at a given bitrate compared to a single-tile configuration.

To balance speed and quality, the number of tiles, and of tile columns in particular, should be matched to the number of available CPU threads and the resolution of the video. While more tiles enable higher parallelism, over-partitioning a low-resolution video will unnecessarily degrade compression efficiency without yielding significant performance gains.
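One way to apply this guidance is a simple heuristic that requests no more tile columns than there are threads and no more than the frame width allows. The sketch below is an illustrative rule of thumb, not a libvpx recommendation: the function `suggest_tile_columns_log2` is hypothetical, and the width limits (tile columns between 256 and 4096 pixels wide, measured in 64-pixel superblocks) are stated as assumptions.

```python
def suggest_tile_columns_log2(frame_width_px, threads):
    """Suggest a log2 tile-column setting for a given width and thread count.

    Heuristic (an assumption, not an official rule): use the largest
    power-of-two column count that does not exceed the thread count,
    then clamp it to what the frame width permits.
    """
    sb_cols = (frame_width_px + 63) // 64

    # Lower bound: tile columns may be at most 64 superblocks wide.
    min_log2 = 0
    while (64 << min_log2) < sb_cols:
        min_log2 += 1

    # Upper bound: tile columns must be at least 4 superblocks wide.
    max_log2 = 1
    while (sb_cols >> max_log2) >= 4:
        max_log2 += 1
    max_log2 -= 1

    # floor(log2(threads)), treating non-positive counts as one thread.
    want_log2 = max(threads, 1).bit_length() - 1
    return max(min_log2, min(want_log2, max_log2))


if __name__ == "__main__":
    for width, threads in [(640, 16), (1920, 8), (3840, 4)]:
        print(width, threads, suggest_tile_columns_log2(width, threads))
```

The returned value is in the same log2 units as --tile-columns. Whether the resulting speed-up justifies the compression loss still depends on the content and should be measured.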