Why is libvpx-vp9 slower than x264?

Video encoders must constantly trade compression efficiency against processing speed. This article explains why the libvpx-vp9 encoder is generally slower at encoding frames than x264 by examining differences in architecture, algorithmic complexity, and software maturity.

1. Generational and Algorithmic Complexity

The fundamental reason for the speed difference is the generational gap between the two video formats. x264 is an implementation of the H.264 (AVC) standard, which was designed in the early 2000s. libvpx-vp9 implements VP9, a newer format finalized in 2013 that was designed to compete with H.265 (HEVC) and to deliver substantially higher compression efficiency than H.264.

To achieve a roughly 30% to 50% bitrate reduction over H.264 at the same visual quality, VP9 gives the encoder far more coding choices to evaluate:

* Larger Block Sizes: H.264 uses a fixed macroblock size of 16x16 pixels. VP9 uses "superblocks" of up to 64x64 pixels, which can be recursively partitioned down to 4x4 blocks. The number of possible partition layouts grows explosively with block size, so even with aggressive pruning the encoder has to evaluate many more candidates to find an efficient layout.
* More Prediction Modes: VP9 defines 10 intra prediction modes that are available across its range of block sizes, whereas H.264 offers 9 modes for 4x4 and 8x8 blocks and only 4 for 16x16 blocks. Every additional mode and block size combination the encoder tests adds to the encoding time.
* Advanced Transform Sizes: VP9 supports discrete cosine transforms (DCT) at 4x4, 8x8, 16x16, and 32x32, plus asymmetric discrete sine transforms (ADST) at sizes up to 16x16, compared to H.264's simpler 4x4 and 8x8 integer DCT-like transforms.
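To give a feel for how quickly the partition search space grows, the following minimal sketch counts the distinct partition layouts of a square block. It assumes a simplified model: each VP9 block is either left whole, split horizontally, split vertically, or split into four quadrants that are partitioned recursively, bottoming out at 8x8 (which may be coded as 8x8, 8x4, 4x8, or four 4x4 blocks). The function names are illustrative and are not part of libvpx or x264, and real encoders never enumerate these layouts exhaustively.

```python
def vp9_partition_layouts(block_size: int) -> int:
    """Count partition layouts of a square VP9 block (simplified model)."""
    if block_size not in (8, 16, 32, 64):
        raise ValueError("block_size must be 8, 16, 32, or 64")
    if block_size == 8:
        # 8x8, two 8x4, two 4x8, or four 4x4
        return 4
    # Three non-recursive choices (none, horizontal, vertical), plus a split
    # in which each of the four quadrants is partitioned independently.
    return 3 + vp9_partition_layouts(block_size // 2) ** 4


def h264_macroblock_layouts() -> int:
    """Count inter partition layouts of one H.264 16x16 macroblock."""
    # 16x16, 16x8, 8x16, or four 8x8 sub-macroblocks, each of which is
    # independently coded as 8x8, 8x4, 4x8, or 4x4.
    return 3 + 4 ** 4


if __name__ == "__main__":
    print("H.264 16x16 macroblock:", h264_macroblock_layouts())
    for size in (16, 32, 64):
        print(f"VP9 {size}x{size} block:", vp9_partition_layouts(size))
```

Under this model an H.264 macroblock and a VP9 16x16 block both have 259 layouts, while a 64x64 superblock has on the order of 10^38. This is why the quality of an encoder's pruning heuristics matters so much for speed.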

2. Encoder Optimization and Maturity

x264 is widely regarded as one of the most heavily optimized video encoders in existence. Over nearly two decades, developers have hand-written large amounts of assembly code (targeting SIMD instruction sets such as SSE2, AVX, AVX2, and AVX-512) to optimize almost every hot path in the encoder. Its heuristics for deciding which coding options to skip are highly refined, allowing it to reject unpromising candidates early instead of evaluating them in full.

While libvpx has received significant optimization over the years, it has not reached the same level of micro-optimization. The search algorithms in libvpx-vp9 are inherently heavier because the format offers more options, and its decision heuristics are more conservative: they spend more CPU cycles in pursuit of better compression rather than cutting corners for speed.

3. Threading and Parallelization Models

Video encoders rely heavily on parallel processing to make use of modern multi-core processors. x264 has a very efficient frame-level multi-threading model, with slice-based threading available as an alternative for low-latency use. It can look ahead at future frames and distribute the workload across dozens of CPU threads with very little idle time.

In contrast, libvpx-vp9 relies primarily on tile-based and row-based multi-threading. VP9 allows a frame to be divided into vertical tile columns that can be encoded independently of one another, and row-based threading adds further parallelism across superblock rows. While this allows parallel processing, prediction and entropy-coding context cannot cross tile boundaries, so raising the number of tiles too far degrades compression efficiency. As a result, libvpx-vp9 cannot use high-core-count CPUs as efficiently as x264 without sacrificing video quality or compression ratio.
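Tile parallelism is also bounded by frame width. The sketch below estimates the maximum number of tile columns for a given width, assuming that tile columns come in powers of two and that each tile must be at least four 64-pixel superblocks (256 pixels) wide. The constants and the function name are illustrative assumptions for this example, not a libvpx API.

```python
SUPERBLOCK_SIZE = 64    # VP9 superblock width in pixels
MIN_TILE_WIDTH_SB = 4   # assumed minimum tile width: 4 superblocks (256 px)


def max_tile_columns(frame_width: int) -> int:
    """Estimate the largest power-of-two tile column count for a frame width."""
    if frame_width <= 0:
        raise ValueError("frame_width must be positive")
    # Number of superblock columns, rounding up for partial superblocks.
    sb_cols = -(-frame_width // SUPERBLOCK_SIZE)
    log2_cols = 0
    # Keep doubling the column count while each tile stays wide enough.
    while (sb_cols >> (log2_cols + 1)) >= MIN_TILE_WIDTH_SB:
        log2_cols += 1
    return 1 << log2_cols


if __name__ == "__main__":
    for width in (640, 1280, 1920, 3840):
        print(width, "->", max_tile_columns(width), "tile columns")
```

Under these assumptions a 1920-pixel-wide frame allows at most 4 tile columns and a 3840-pixel-wide frame at most 8, so tile threading alone cannot keep a CPU with dozens of cores busy.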

4. Design Philosophy and Target Use Case

Ultimately, the two encoders were built for different eras and use cases:

* x264 was shaped by the needs of real-time broadcasting, low-latency streaming, and playback on consumer-grade hardware. Speed and low resource usage were primary design requirements.
* libvpx-vp9 was championed by Google to reduce bandwidth costs for massive video distribution platforms like YouTube. For a platform serving billions of videos daily, spending even ten times more CPU time on a one-time upload encode is highly cost-effective if it saves 30% on the cost of delivering that video to millions of viewers.
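The economics behind that trade-off can be sketched as a simple break-even calculation. All figures below are hypothetical placeholders chosen only to illustrate the arithmetic; they are not real encoding or bandwidth prices, and the function name is invented for this example.

```python
def break_even_views(extra_encode_cost: float,
                     delivery_cost_per_view: float,
                     bitrate_saving: float) -> float:
    """Return how many views are needed before a slower encode pays for itself.

    extra_encode_cost      -- additional one-time cost of the slower encode
    delivery_cost_per_view -- bandwidth cost of one view with the faster encoder
    bitrate_saving         -- fraction of delivery cost saved (e.g. 0.30 for 30%)
    """
    if not 0 < bitrate_saving <= 1:
        raise ValueError("bitrate_saving must be in (0, 1]")
    if delivery_cost_per_view <= 0:
        raise ValueError("delivery_cost_per_view must be positive")
    saving_per_view = delivery_cost_per_view * bitrate_saving
    return extra_encode_cost / saving_per_view


if __name__ == "__main__":
    # Hypothetical numbers: the fast encode costs 0.01 units and the slow one
    # costs ten times as much, so the extra one-time cost is 0.09 units.
    views = break_even_views(extra_encode_cost=0.09,
                             delivery_cost_per_view=0.001,
                             bitrate_saving=0.30)
    print(f"Break-even after about {views:.0f} views")
```

With these made-up inputs the slower encode pays for itself after about 300 views, which is negligible for a video watched millions of times but may never be reached for a live stream or a rarely viewed file.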