How Does libaom Memory Usage Change at High CPU Presets?
Understanding the memory consumption of the libaom AV1 reference
encoder is critical for deploying efficient video encoding pipelines in
cloud or resource-constrained environments. libaom is notorious for its
large RAM footprint and slow encoding at low -cpu-used presets (such as
0 to 3; the option is spelled -cpu-used in ffmpeg and --cpu-used in
aomenc), but moving to high -cpu-used presets (typically 5 through 8)
substantially shifts its operational profile. At these faster levels,
the encoder disables or prunes its most expensive searches, limiting
partition depth and the number of reference frames it evaluates, which
reduces the working memory each thread needs. This article covers the
memory characteristics of libaom under these fast presets, analyzing
how threading, resolution, and tile configuration affect memory
requirements when compression speed is prioritized.
Algorithmic Simplifications and Reduced Context Storage
The core reason memory consumption drops as you increase the
-cpu-used value from slower quality-focused presets to
faster speed-focused presets is that memory-heavy analysis tools are
switched off. At low settings, libaom keeps large amounts of
intermediate search state in RAM while it tests combinations of block
partitioning, motion vector prediction, and reference frame
selection.
At high presets, specifically -cpu-used 5 through -cpu-used 8, the
reference encoder skips most of these intensive calculations. Block
partition depth is restricted early in the decision tree, so fewer
recursive candidate structures have to be held in memory. The encoder
also prunes its reference frame search, reducing the number of
reference frames each block is tested against and the per-candidate
working data that must stay resident in RAM at the same time.
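A practical way to check how much the footprint drops between presets is to measure the peak resident memory of each encoder run. The following is a minimal sketch, assuming a Unix-like system and Python 3.9 or later; the helper name `peak_rss_kib` is made up for illustration, and the aomenc command in the comment uses a hypothetical input file.

```python
import os
import subprocess
import sys


def peak_rss_kib(cmd):
    """Run cmd to completion; return (exit_code, peak RSS in KiB) for that child."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    # Record the exit code so the Popen object does not try to wait again.
    proc.returncode = os.waitstatus_to_exitcode(status)
    rss = usage.ru_maxrss
    if sys.platform == "darwin":
        rss //= 1024  # macOS reports bytes, Linux reports KiB
    return proc.returncode, rss


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere. For a real comparison,
    # substitute something like (hypothetical file names):
    #   ["aomenc", "--good", "--cpu-used=6", "-o", "out.ivf", "input.y4m"]
    code, rss = peak_rss_kib([sys.executable, "-c", "pass"])
    print(f"exit={code} peak_rss={rss} KiB")
```

Treat the numbers as approximate: peak RSS depends on the input clip, the libaom build, and the thread count, so compare presets on the same source with the same threading options.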
The Interaction of Resolution, Tiles, and Threads
Even at high speed presets, libaom memory usage scales primarily with the video resolution and the degree of parallelization. Because AV1 uses larger superblocks (\(128 \times 128\) or \(64 \times 64\) pixels) than older codecs, the baseline context size per frame remains comparatively high.
- Resolution Scaling: Encoding 4K video at a high -cpu-used preset requires significantly more scratchpad memory than encoding 1080p or 720p footage, simply due to the pixel buffer allocations required for frame reconstruction and in-loop filtering (such as CDEF and Loop Restoration).
- Threading and Tiling: Enabling multithreading with -row-mt 1, or explicitly setting multiple tiles via --tile-columns and --tile-rows, increases memory consumption roughly in proportion to the thread and tile counts. Each independent tile column requires its own context state and row buffers. However, because each thread performs shallow searches at high -cpu-used presets, the per-thread memory cost is much lower than it is at slower settings.
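The resolution effect in the list above can be approximated with simple arithmetic. The sketch below computes the raw size of one 4:2:0 frame and multiplies it by the eight reference frame slots that AV1 defines. It is only a lower-bound estimate under stated assumptions: real encoder allocations add border padding, alignment, lookahead frames, and filter scratch buffers, none of which are modeled here, and the function names are illustrative.

```python
AV1_REF_FRAME_SLOTS = 8  # number of reference frame slots defined by AV1


def yuv420_frame_bytes(width, height, bit_depth=8):
    """Raw size of one 4:2:0 frame, without padding or alignment."""
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    luma = width * height
    # Two chroma planes, each subsampled by 2 horizontally and vertically.
    chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    return (luma + chroma) * bytes_per_sample


def reference_pool_mib(width, height, bit_depth=8):
    """Rough floor for the memory held by a full set of reference frames."""
    total = yuv420_frame_bytes(width, height, bit_depth) * AV1_REF_FRAME_SLOTS
    return total / (1024 * 1024)


if __name__ == "__main__":
    for name, (w, h) in {
        "720p": (1280, 720),
        "1080p": (1920, 1080),
        "4K": (3840, 2160),
    }.items():
        print(f"{name}: {reference_pool_mib(w, h):.1f} MiB for 8-bit references")
```

A 4K frame holds four times as many samples as a 1080p frame, so every per-frame buffer grows by the same factor regardless of the preset.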
Real-Time vs. Good Quality Modes at High Speed Presets
libaom behaves differently depending on whether it is running in its
default good-quality mode (--usage=good) or in real-time mode
(--usage=realtime). This distinction strongly influences
memory behavior at the highest preset numbers.
| Preset Range | Mode Setting | Memory Characteristics |
|---|---|---|
| Presets 5 to 6 | --usage=good | Moderate memory reduction. Good balance of standard AV1 features with optimized, shallower buffer pipelines. |
| Presets 7 to 8 | --usage=good | Automatically maps back to preset 6 in standard encoding pipelines, keeping memory consumption static. |
| Presets 7 to 9 | --usage=realtime | Maximum memory reduction. Drastically flattens the encoding pipeline, disables heavy in-loop filtering contexts, and minimizes heap allocations to guarantee low, predictable memory overhead fit for live streaming. |
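The mode and preset combinations in the table can be turned into a small command builder. This is a sketch rather than a recommended configuration: it uses aomenc's `--good` and `--rt` mode switches along with `--cpu-used`, `--threads`, `--tile-columns`, and `--row-mt`, and it clamps good-quality presets to 6 to mirror the table. The function name and file names are hypothetical, and the accepted preset ranges can differ between libaom versions, so check `aomenc --help` for your build.

```python
def build_aomenc_args(src, dst, mode, cpu_used,
                      threads=1, tile_columns_log2=0, row_mt=True):
    """Build an aomenc argument list for a speed-oriented encode."""
    if mode not in ("good", "rt"):
        raise ValueError("mode must be 'good' or 'rt'")
    if cpu_used < 0:
        raise ValueError("cpu_used must be non-negative")
    if mode == "good":
        # Per the table above, presets above 6 behave like 6 in good mode.
        cpu_used = min(cpu_used, 6)
    return [
        "aomenc",
        f"--{mode}",
        f"--cpu-used={cpu_used}",
        f"--threads={threads}",
        # aomenc takes the tile column count as a log2 value (2 means 4 columns).
        f"--tile-columns={tile_columns_log2}",
        f"--row-mt={1 if row_mt else 0}",
        "-o", dst,
        src,
    ]


if __name__ == "__main__":
    # Hypothetical file names, printed only; nothing is executed here.
    print(" ".join(build_aomenc_args("input.y4m", "out.ivf", "good", 8)))
    print(" ".join(build_aomenc_args("input.y4m", "out.ivf", "rt", 9,
                                     threads=4, tile_columns_log2=1)))
```

Building the argument list in one place makes it easier to benchmark memory use across presets with identical threading and tiling options.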
Summary for Pipeline Optimization
When designing deployment architectures, assuming libaom will always
consume multi-gigabyte blocks of RAM is a misconception tied to
archival-tier presets. By choosing higher presets such as
-cpu-used 6 or switching to the real-time mode, engineers
can drop memory allocation sizes down to manageable fractions of their
original peaks. This enables safer parallel execution of multiple
concurrent ffmpeg or standalone aomenc processes on modern, multi-core
cloud instances without triggering out-of-memory (OOM) faults.
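Once a peak memory figure has been measured for one encode, the number of processes that can safely run side by side follows from simple budgeting. The sketch below is illustrative only: the function name, the 1 GiB operating system reserve, the 25% safety margin, and the 600 MiB per-job figure are all assumptions rather than measured libaom values.

```python
def max_concurrent_jobs(total_ram_mib, peak_per_job_mib,
                        reserve_mib=1024, safety=1.25):
    """Estimate how many encoder processes fit in RAM without risking OOM.

    total_ram_mib    -- memory available on the instance
    peak_per_job_mib -- measured peak RSS of one encode at the chosen preset
    reserve_mib      -- memory left for the OS and other services (assumed)
    safety           -- multiplier covering run-to-run variation (assumed)
    """
    usable = total_ram_mib - reserve_mib
    if usable <= 0 or peak_per_job_mib <= 0:
        return 0
    return int(usable // (peak_per_job_mib * safety))


if __name__ == "__main__":
    # Hypothetical example: a 16 GiB instance and a 600 MiB peak per encode.
    print(max_concurrent_jobs(16 * 1024, 600))
```

The per-job peak should be re-measured whenever the resolution, thread count, or preset changes, since all three shift the result.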