Is dav1d Faster Than libaom for AV1 Decoding?

When comparing software decoders for the AV1 video format, dav1d significantly outperforms libaom in decoding speed and resource utilization. libaom is the official reference implementation, oriented primarily toward encoding research, whereas dav1d was built from scratch by the VideoLAN and FFmpeg communities with a focus on fast, lightweight playback. This article compares the two decoders on single-threaded performance, multi-threaded scaling, and resource consumption.

Architectural Focus: Reference vs. Production

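The fundamental difference between libaom and dav1d lies in their design objectives. libaom is the reference implementation, so its priorities are encoding research and serving as the baseline for the format. dav1d is a decoder written for production playback, where decoding speed and a small footprint matter most.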

Single-Threaded Speed and Assembly Optimizations

In single-threaded benchmarks, dav1d consistently outperforms libaom by a wide margin. Depending on the CPU architecture (such as x86_64 or ARM) and the specific video profile, dav1d is typically 1.5x to over 3x faster than libaom.
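One way to check a figure like this on your own hardware is to time each decoder on the same file and divide the wall-clock times. The sketch below is a minimal harness rather than a rigorous benchmark: the helper names `time_command` and `speedup` are made up for illustration, and the decoder command lines are left for you to supply, since the right flags depend on the installed versions.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions of cmd."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard decoder output so terminal I/O does not skew the timing.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds, candidate_seconds):
    """How many times faster the candidate is than the baseline."""
    if candidate_seconds <= 0:
        raise ValueError("candidate time must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Self-check with a trivial command; swap in real decoder invocations.
    elapsed = time_command([sys.executable, "-c", "pass"], runs=1)
    print(f"trivial command took {elapsed:.3f} s")
```

For example, if libaom needs 12.0 seconds and dav1d needs 4.0 seconds for the same clip, `speedup(12.0, 4.0)` returns 3.0. Taking the best of several runs reduces noise from disk caching and background load, but it does not replace a controlled benchmark.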

This efficiency gain is largely due to aggressive hand-written assembly optimizations (using AVX2, SSSE3, and ARM NEON). While libaom includes some SIMD vectorization, dav1d covers nearly the entire decoding pipeline with hand-tuned assembly, reducing the number of clock cycles needed to decode each frame.

Multi-Threaded Scaling and Playback Fluidity

Video decoding becomes significantly more demanding at higher resolutions like 1080p and 4K. To cope, dav1d uses a threading model that applies frame-level and tile-level parallelism simultaneously.
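The idea of combining both kinds of parallelism can be pictured as one worker pool that accepts tiles from several in-flight frames at once. The following is a toy model only, not dav1d's actual scheduler: `decode_tile` is a hypothetical stand-in, and the sketch ignores the reference dependencies between frames that a real decoder must honor.

```python
from concurrent.futures import ThreadPoolExecutor


def decode_tile(frame_idx, tile_idx):
    """Hypothetical stand-in for real tile decoding; returns a label."""
    return f"frame{frame_idx}:tile{tile_idx}"


def decode_frames(num_frames, tiles_per_frame, workers=4):
    """Submit every (frame, tile) pair to one shared pool.

    Tiles of the same frame can run in parallel (tile-level), and tiles
    of different frames are in flight together (frame-level).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            (f, t): pool.submit(decode_tile, f, t)
            for f in range(num_frames)
            for t in range(tiles_per_frame)
        }
        # Collect results in frame order; result() blocks until that tile is done.
        return [
            [futures[(f, t)].result() for t in range(tiles_per_frame)]
            for f in range(num_frames)
        ]
```

Because all work units share one pool, idle workers can pick up tiles from the next frame instead of waiting for the current frame to finish. Python threads will not show a real CPU speedup here; the sketch only illustrates how the work is divided.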

This scalability allows lower-end desktop processors and mobile chipsets to stream high-definition AV1 video smoothly without dropping frames, something libaom struggles to match when decoding in software in real time.
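Whether playback is smooth comes down to simple arithmetic: the average decode time per frame has to fit within the frame interval of the video. A minimal sketch of that check, together with a thread-scaling efficiency figure, follows. The function names and the sample numbers are illustrative assumptions, not measurements of either decoder.

```python
def frame_budget_ms(video_fps):
    """Time available to decode one frame, in milliseconds."""
    return 1000.0 / video_fps


def is_realtime(decode_fps, video_fps):
    """True if the decoder keeps up with the video's frame rate on average."""
    return decode_fps >= video_fps


def scaling_efficiency(fps_single, fps_multi, threads):
    """Fraction of ideal linear speedup achieved with `threads` threads."""
    return (fps_multi / fps_single) / threads
```

A 60 fps stream leaves roughly 16.7 ms per frame, so a decoder averaging 45 fps falls behind while one averaging 75 fps has headroom. Likewise, going from 20 fps on one thread to 60 fps on four threads is a 3x speedup, or an efficiency of 0.75. These are averages only; individual frames that take much longer than the budget can still cause drops.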

High Bit-Depth Content

For 10-bit and 12-bit color depths, commonly used for HDR (High Dynamic Range) video, dav1d also holds the lead on modern hardware. Early versions of dav1d lacked comprehensive assembly for high bit-depth content, which let libaom occasionally compete in 10-bit decoding. Subsequent cross-platform assembly work has since allowed dav1d to substantially outpace libaom here as well, particularly on ARM64 mobile platforms, where power efficiency and battery life are crucial.