Why is libvpx-vp9 slower than x264?
Video encoders constantly trade compression efficiency against
processing speed. This article explains why the libvpx-vp9 encoder is
generally slower at encoding frames than x264 by examining their
architectural differences, algorithmic complexity, and software
maturity.
1. Generational and Algorithmic Complexity
The fundamental reason for the speed difference lies in the
generational gap between the two video formats. x264 is an
implementation of the H.264 (AVC) standard, which was designed in the
early 2000s. libvpx-vp9 implements VP9, a newer format finalized in
2013 and designed to compete with H.265 (HEVC) by delivering much
higher compression efficiency.
To achieve a 30% to 50% bitrate reduction over H.264 at the same visual quality, VP9 gives the encoder a much larger search space:
* Larger Block Sizes: H.264 uses a maximum macroblock size of 16x16 pixels. VP9 uses “superblocks” of up to 64x64 pixels, which can be recursively partitioned down to 4x4 blocks. The number of possible partition layouts grows steeply with block size, so finding the most efficient layout takes far more rate-distortion evaluations, even with aggressive pruning.
* More Prediction Modes: VP9 offers 10 intra prediction modes at every block size, whereas H.264 offers 9 for 4x4 and 8x8 luma blocks and only 4 for 16x16 blocks. Testing more candidates across more block sizes increases encoding time.
* Advanced Transform Sizes: VP9 supports the discrete cosine transform (DCT) at 4x4, 8x8, 16x16, and 32x32, plus the asymmetric discrete sine transform (ADST) at sizes up to 16x16, compared to H.264’s simpler 4x4 and 8x8 integer transforms.
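The growth in partition layouts can be made concrete with a small count. The sketch below is a simplified model, not code from either encoder: it assumes every square block can be kept whole, split into two horizontal halves, split into two vertical halves, or split into four quadrants that are then partitioned recursively, with 8x8 as the smallest block that can still be split. The name `count_layouts` is illustrative.

```python
def count_layouts(size: int, min_split: int = 8) -> int:
    """Count partition layouts of a size x size block in a simplified model."""
    if size < min_split:
        return 1  # too small to split further: exactly one layout
    # Three non-recursive choices: whole, two horizontal halves, two vertical halves.
    # One recursive choice: four quadrants, each partitioned independently.
    return 3 + count_layouts(size // 2, min_split) ** 4


if __name__ == "__main__":
    for size in (16, 32, 64):
        print(f"{size}x{size}: {count_layouts(size):.3e} layouts")
```

Under these assumptions a 16x16 block (the H.264 macroblock size) has 259 layouts, while a 64x64 superblock has on the order of 10^38. Real encoders never enumerate this space; they prune it heavily, but a larger space still leaves more candidates to evaluate.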
2. Encoder Optimization and Maturity
x264 is widely regarded as one of the most highly
optimized pieces of software in the world. Over nearly two decades,
developers have hand-written large amounts of SIMD assembly code
(targeting instruction sets up to AVX2 and AVX-512) to optimize almost
every hot path in the encoder. Its early-termination heuristics for
deciding which encoding paths to skip are highly refined, allowing it
to discard unpromising candidates after only a cheap check.
While libvpx has received significant optimization over
the years, it has not reached the same level of micro-optimization. The
search in libvpx-vp9 is inherently heavier because the format offers
more choices, and its decision heuristics are more conservative: they
spend more CPU cycles in pursuit of better compression rather than
cutting corners for speed.
3. Threading and Parallelization Models
Encoders rely heavily on parallel processing to take advantage of
modern multi-core processors. x264 features a very efficient
frame-level and slice-level multi-threading model. It can look
ahead at future frames and distribute the workload across dozens of CPU
threads with very little idle time.
In contrast, libvpx-vp9 relies primarily on tile-based
and row-based multi-threading. VP9 can divide a single frame into
vertical columns called “tiles” that are coded independently of one
another. This allows parallel processing, but each tile column must be
at least 256 pixels wide, which caps the number of tiles, and because
prediction cannot cross tile boundaries, adding tiles degrades
compression efficiency. As a result, libvpx-vp9 cannot utilize
high-core-count CPUs as efficiently as x264 without
sacrificing video quality or compression ratios.
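How quickly the tile limit is reached can be estimated from the frame width. The sketch below assumes the VP9 rule that tile columns come in power-of-two counts and that each column must span at least four 64x64 superblocks (256 pixels). The function and constant names are illustrative rather than taken from libvpx.

```python
SUPERBLOCK = 64          # VP9 superblock width in pixels
MIN_TILE_WIDTH_SB = 4    # assumed minimum tile width, in superblocks (256 px)


def max_tile_columns(frame_width: int) -> int:
    """Largest power-of-two tile-column count allowed for a frame width."""
    sb_cols = (frame_width + SUPERBLOCK - 1) // SUPERBLOCK  # round up
    log2_cols = 0
    # Keep doubling while every column would still meet the minimum width.
    while (sb_cols >> (log2_cols + 1)) >= MIN_TILE_WIDTH_SB:
        log2_cols += 1
    return 1 << log2_cols


if __name__ == "__main__":
    for width in (640, 1280, 1920, 3840):
        print(width, max_tile_columns(width))
```

Under this rule a 1920-pixel-wide frame tops out at 4 tile columns and a 3840-pixel-wide frame at 8, far fewer than the thread count of a large server CPU. Row-based multi-threading adds parallelism inside each tile, but the tile count itself stays bounded by the frame width.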
4. Design Philosophy and Target Use Case
Ultimately, the two encoders were built for different eras and use cases:
* x264 was designed for real-time broadcasting, low-latency streaming, and consumer-grade hardware playback. Speed and low resource usage were primary design requirements.
* libvpx-vp9 was championed by Google to reduce bandwidth costs for massive video distribution platforms like YouTube. For a platform serving billions of videos daily, spending ten times more CPU power during a one-time upload encode is highly cost-effective if it saves 30% on data delivery costs to millions of viewers.
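The cost argument for libvpx-vp9 can be expressed as a break-even calculation. Every number in the sketch below is a hypothetical placeholder chosen for illustration, not a measured cost, and `break_even_views` is an illustrative name.

```python
def break_even_views(extra_encode_cost: float,
                     delivery_cost_per_view: float,
                     bitrate_saving: float) -> float:
    """Views needed before delivery savings repay the extra encoding cost."""
    saving_per_view = delivery_cost_per_view * bitrate_saving
    return extra_encode_cost / saving_per_view


if __name__ == "__main__":
    # Hypothetical inputs: the slower encode costs 0.09 more (in any currency),
    # each view costs 0.001 to deliver, and the newer codec saves 30% of that.
    print(break_even_views(0.09, 0.001, 0.30))
```

With these made-up inputs the slower encode pays for itself after about 300 views. A video watched millions of times recovers the cost many times over, while a rarely watched one may never do so.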