How to Enable libaom Assembly Optimizations?
Enabling assembly optimizations when compiling libaom, the
reference encoder and decoder library for the AV1 video
format, is crucial for achieving acceptable encoding and decoding speeds.
By default, the build system (CMake) attempts to auto-detect your CPU
architecture and enable relevant SIMD (Single Instruction, Multiple
Data) optimizations like AVX2, AVX512, or NEON. However, depending on
your target platform, cross-compilation needs, or toolchain limitations,
you may need to explicitly configure these optimizations using specific
CMake flags.
Understanding libaom Assembly Flags
The libaom build system uses CMake variables to control
which hardware-specific assembly optimizations are compiled. If you are
building on the same machine where the code will run (native
compilation), CMake usually handles this automatically. For fine-grained
control or troubleshooting, you can explicitly toggle specific
instruction sets.
Core Architecture Flags
On x86/x86_64, libaom's hand-written assembly is built with an
external assembler, so the first flag to check is ENABLE_NASM. It
selects which assembler the build uses rather than switching
optimizations on or off as a whole.
-DENABLE_NASM=ON: Tells the build to use NASM (Netwide Assembler) rather than YASM for x86/x86_64 assembly. If no usable assembler is found in your PATH, configuration typically stops with an error; passing -DAOM_TARGET_CPU=generic instead builds the plain C code paths without assembly.
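Because the assembler is an external tool, it can help to check for it before running CMake. The following is a minimal shell sketch and not part of libaom: the function name find_x86_assembler is hypothetical, and it only reports which of nasm or yasm is found first on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not part of libaom): print the path of the first
# x86 assembler found on PATH, preferring nasm over yasm.
find_x86_assembler() {
  for tool in nasm yasm; do
    if command -v "$tool" >/dev/null 2>&1; then
      command -v "$tool"
      return 0
    fi
  done
  echo "no x86 assembler (nasm or yasm) found on PATH" >&2
  return 1
}

# Example usage: report the assembler before configuring the build.
if asm_path=$(find_x86_assembler 2>/dev/null); then
  echo "assembler: $asm_path"
else
  echo "install nasm before configuring libaom"
fi
```

If the reported path ends in yasm and you want NASM specifically, install NASM and pass -DENABLE_NASM=ON when configuring.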
Toggling Specific Instruction Sets
If you want to force-enable or force-disable specific SIMD sets (for instance, if you are targeting an older processor or testing performance), you can pass explicit boolean flags to CMake. Each flag accepts ON or OFF; the table lists the ON form:
| CMake Flag | Target Architecture | Description |
|---|---|---|
| -DENABLE_MMX=ON | x86 / x86_64 | MultiMedia eXtensions |
| -DENABLE_SSE2=ON | x86 / x86_64 | Streaming SIMD Extensions 2 |
| -DENABLE_SSE3=ON | x86 / x86_64 | Streaming SIMD Extensions 3 |
| -DENABLE_SSSE3=ON | x86 / x86_64 | Supplemental Streaming SIMD Extensions 3 |
| -DENABLE_SSE4_1=ON | x86 / x86_64 | Streaming SIMD Extensions 4.1 |
| -DENABLE_AVX=ON | x86 / x86_64 | Advanced Vector Extensions |
| -DENABLE_AVX2=ON | x86 / x86_64 | Advanced Vector Extensions 2 |
| -DENABLE_AVX512=ON | x86 / x86_64 | Advanced Vector Extensions 512 |
| -DENABLE_NEON=ON | ARM / AArch64 | ARM NEON Technology |
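A common use of these flags is switching off newer instruction sets when the target machine is older than the build host. The sketch below assumes the flag names listed in the table are valid for your libaom version; the helper aom_disable_simd is hypothetical and only prints CMake arguments, it does not run CMake.

```shell
#!/bin/sh
# Hypothetical helper: turn instruction-set names into -DENABLE_<SET>=OFF
# arguments, using the flag names from the table (assumed to exist in
# your libaom version).
aom_disable_simd() {
  args=""
  for set_name in "$@"; do
    args="$args -DENABLE_${set_name}=OFF"
  done
  # Strip the single leading space before printing.
  echo "${args# }"
}

# Example: arguments for a CPU without AVX2 or AVX512.
aom_disable_simd AVX2 AVX512

# Possible use (path/to/libaom is a placeholder):
#   cmake path/to/libaom $(aom_disable_simd AVX2 AVX512)
```

Printing the arguments first makes it easy to review exactly what will be passed to CMake before starting a long build.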
Step-by-Step Compilation Example
To compile libaom with full assembly optimizations on a
standard Linux or macOS environment, follow these terminal commands:
1. Install Prerequisites
Ensure you have a recent version of CMake and NASM installed on your system.
# On Ubuntu/Debian
sudo apt-get install cmake nasm
# On macOS via Homebrew
brew install cmake nasm
2. Configure the Build
Create a build directory and run CMake while explicitly enabling assembly optimizations.
mkdir -p aom_build && cd aom_build
cmake path/to/libaom -DENABLE_NASM=ON -DCMAKE_BUILD_TYPE=Release -DENABLE_AVX2=ON
3. Compile the Library
Run the build process utilizing multiple CPU cores to speed up compilation.
# On Linux
make -j$(nproc)
# On macOS, where nproc is not available by default
make -j$(sysctl -n hw.ncpu)
Troubleshooting Optimization Issues
If the resulting encoder or decoder runs much slower than expected, one
likely cause is that libaom was built with its unoptimized generic C
implementation.
Missing Assembler Warning
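If CMake reports that neither NASM nor YASM could be found, the x86
assembly cannot be built. Depending on the libaom version, configuration
either stops with an error or continues without those optimizations.
Check the CMake configuration output to confirm that an assembler was
detected, and that NASM is the one in use if you passed -DENABLE_NASM=ON.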
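Beyond reading the configuration output, the values CMake settled on can be read back from the cache file in the build directory. This is a minimal sketch, assuming the ENABLE_* options are recorded in CMakeCache.txt in the usual NAME:TYPE=VALUE form; the helper name list_enable_flags is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list ENABLE_* entries recorded in a CMake cache.
# CMakeCache.txt stores entries as NAME:TYPE=VALUE.
list_enable_flags() {
  cache_file="${1:-CMakeCache.txt}"
  if [ ! -f "$cache_file" ]; then
    echo "no cache file at $cache_file; run cmake first" >&2
    return 1
  fi
  # Print NAME=VALUE, dropping the :TYPE part.
  sed -n 's/^\(ENABLE_[A-Za-z0-9_]*\):[A-Z]*=\(.*\)$/\1=\2/p' "$cache_file"
}

# Example usage from inside the build directory:
#   list_enable_flags               # reads ./CMakeCache.txt
#   list_enable_flags | grep NASM   # show only the assembler choice
```

An entry that reads OFF when you expected ON is a quick hint that a flag was overridden or never reached CMake.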
Cross-Compilation Constraints
When cross-compiling (e.g., compiling for an ARM-based Raspberry Pi
from an x86_64 Ubuntu host), you must specify the target architecture
using a CMake toolchain file via -DCMAKE_TOOLCHAIN_FILE. With the
target set to ARM, the build compiles the ARM code paths instead of
looking for x86 NASM modules, and -DENABLE_NEON=ON keeps the NEON
optimizations enabled for that target.
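A cross-compilation configure step might look like the sketch below. It prints the command instead of running it, so it can be reviewed first. AOM_SRC is a placeholder, and the default toolchain file path is an assumption about the layout of the libaom source tree that should be checked against your checkout; the function name print_cross_configure is hypothetical.

```shell
#!/bin/sh
# Sketch: print (not run) a cross-compilation configure command.
# AOM_SRC is a placeholder; the toolchain file path is an assumption about
# the libaom source tree and should be verified in your checkout.
AOM_SRC="${AOM_SRC:-path/to/libaom}"
TOOLCHAIN="${TOOLCHAIN:-$AOM_SRC/build/cmake/toolchains/arm64-linux-gcc.cmake}"

print_cross_configure() {
  echo "cmake $AOM_SRC -DCMAKE_TOOLCHAIN_FILE=$TOOLCHAIN -DENABLE_NEON=ON -DCMAKE_BUILD_TYPE=Release"
}

# Show the command that would be run from inside the build directory.
print_cross_configure
```

ENABLE_NASM is left out on purpose here, since it only concerns x86/x86_64 targets.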