CUDA Hardware-Accelerated Scaling in FFmpeg on Linux
Using NVIDIA CUDA for hardware-accelerated scaling in FFmpeg on Linux
offloads video resizing from the CPU to the GPU, which can substantially
increase throughput and reduce CPU load. This guide gives an overview of
the CUDA-backed filters scale_cuda and scale_npp. It covers the
prerequisites, the syntax for end-to-end GPU processing, and how to run
hardware-accelerated scaling on a Linux system.
Prerequisites and System Requirements
To use CUDA scaling, your Linux system needs the appropriate NVIDIA hardware and software stack. Distribution packages often omit some hardware acceleration features for licensing reasons (scale_npp in particular depends on NVIDIA's non-free libnpp library), so you may need to build FFmpeg yourself.
- NVIDIA Driver: A proprietary NVIDIA driver installed and functioning.
- CUDA Toolkit: The CUDA development toolkit matching your driver version.
- nv-codec-headers: NVIDIA’s hardware acceleration headers, required to compile FFmpeg with NVENC/NVDEC support.
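- FFmpeg Compilation: FFmpeg must be configured and compiled with the flags --enable-cuda, --enable-cuvid, --enable-nvenc, and --enable-libnpp (if using NPP-based scaling). Note that --enable-libnpp also requires --enable-nonfree, and that building scale_cuda requires a CUDA compiler (--enable-cuda-nvcc, or --enable-cuda-llvm with clang).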
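Before building anything, it can be worth checking whether the ffmpeg binary already on your PATH exposes the components this guide relies on. The following is a minimal sketch, not an official tool: has_feature, report, and check_cuda_build are hypothetical helper names, and the check assumes that the listings printed by ffmpeg -hwaccels, -filters, and -encoders contain each component's name as a separate word.

```shell
#!/bin/sh
# Hypothetical helpers (not part of FFmpeg) for inspecting an ffmpeg build.

# has_feature LISTING NAME
# Succeeds if NAME appears as a whole word somewhere in LISTING.
has_feature() {
    printf '%s\n' "$1" | grep -qw -- "$2"
}

# report LABEL LISTING NAME
# Prints one "ok" or "MISSING" line for a single component.
report() {
    if has_feature "$2" "$3"; then
        echo "ok       $1 $3"
    else
        echo "MISSING  $1 $3"
    fi
}

# check_cuda_build
# Queries the ffmpeg binary on PATH and reports on each component.
check_cuda_build() {
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "ffmpeg not found on PATH"
        return 1
    fi
    hwaccels=$(ffmpeg -hide_banner -hwaccels 2>/dev/null)
    filters=$(ffmpeg -hide_banner -filters 2>/dev/null)
    encoders=$(ffmpeg -hide_banner -encoders 2>/dev/null)

    report hwaccel "$hwaccels" cuda
    report filter  "$filters"  scale_cuda
    report filter  "$filters"  scale_npp
    report filter  "$filters"  hwupload_cuda
    report encoder "$encoders" h264_nvenc
}

check_cuda_build || true
```

A MISSING line for scale_npp alone usually just means the build was configured without libnpp; the scale_cuda examples in this guide do not need it.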
Understanding the CUDA Scaling Pipeline
To get the maximum performance benefit, the video decoding, scaling, and encoding should all happen on the GPU. Copying frames back and forth between system memory (RAM) and graphics memory (VRAM) adds transfer overhead that can cancel out much of the gain from hardware acceleration.
An efficient pipeline uses nvdec to decode the video
directly into VRAM, applies the CUDA scale filter, and passes the scaled
frames directly to nvenc for encoding.
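One rough way to spot RAM/VRAM round-trips is to look for explicit transfer filters in a filter chain. The sketch below counts occurrences of FFmpeg's hwupload, hwupload_cuda, and hwdownload filters. count_transfers is a hypothetical helper, and it assumes a simple linear, comma-separated chain with no commas inside filter arguments.

```shell
#!/bin/sh
# count_transfers CHAIN
# Hypothetical helper: prints how many filters in a simple comma-separated
# chain copy frames between system memory and GPU memory.
count_transfers() {
    printf '%s\n' "$1" \
        | tr ',' '\n' \
        | sed 's/^ *//; s/=.*//; s/ *$//' \
        | grep -c -E '^(hwupload|hwupload_cuda|hwdownload)$' || true
}

# Full GPU chain: no transfers inside the filter graph.
count_transfers "scale_cuda=1920:1080"                                  # prints 0
# CPU decode, then a single upload before scaling on the GPU.
count_transfers "hwupload_cuda,scale_cuda=1280:720"                     # prints 1
# Round-trip: download, scale on the CPU, upload again.
count_transfers "hwdownload,format=nv12,scale=1280:720,hwupload_cuda"   # prints 2
```

A count of 0 together with -hwaccel_output_format cuda on the input side corresponds to the efficient pipeline described above; a count of 2 or more suggests frames are leaving and re-entering VRAM.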
Command Examples for CUDA Scaling
There are two primary filters used for CUDA-based scaling in FFmpeg:
scale_cuda (implemented inside FFmpeg with its own CUDA kernels) and
scale_npp (which uses the NVIDIA Performance Primitives library). The
scale_cuda filter is generally preferred for its simplicity and
efficiency when the whole pipeline stays on the GPU.
Example 1: Full Hardware Pipeline with scale_cuda
This command decodes an input video on the GPU, resizes it to 1920x1080 using CUDA, and encodes it with the NVENC H.264 encoder (h264_nvenc) without the video data ever leaving the graphics card.
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080" -c:v h264_nvenc output.mp4
- -hwaccel cuda: Tells FFmpeg to use CUDA hardware acceleration for decoding.
- -hwaccel_output_format cuda: Keeps the decoded video frames in VRAM rather than copying them back to the CPU.
- -vf "scale_cuda=1920:1080": Applies the hardware-accelerated CUDA scaling filter.
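A fixed 1920:1080 target distorts sources that are not 16:9. One option is to compute the target width from the source dimensions before building the filter string. The sketch below does this with integer shell arithmetic. scaled_width and the sample dimensions are hypothetical, and the width is rounded down to an even number on the assumption that the encoder's 4:2:0 pixel format needs even dimensions.

```shell
#!/bin/sh
# scaled_width SRC_W SRC_H TARGET_H
# Hypothetical helper: prints the width that keeps the source aspect ratio
# at TARGET_H, rounded down to an even number.
scaled_width() {
    w=$(( $1 * $3 / $2 ))
    echo $(( w - w % 2 ))
}

# Sample source dimensions (assumed values). In practice they could be read
# with something like:
#   ffprobe -v error -select_streams v:0 \
#           -show_entries stream=width,height -of csv=p=0 input.mp4
src_w=1920
src_h=804
target_h=720

target_w=$(scaled_width "$src_w" "$src_h" "$target_h")
echo "scale_cuda=${target_w}:${target_h}"   # prints scale_cuda=1718:720
```

The printed string can be passed to -vf in place of the hard-coded "scale_cuda=1920:1080".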
Example 2: Hybrid Pipeline (CPU Decode to GPU Scale)
If your input format is not supported by NVIDIA’s hardware decoder, you must decode it via the CPU, upload the frames to the GPU for scaling, and then encode.
ffmpeg -i input_software.mp4 -vf "hwupload_cuda,scale_cuda=1280:720" -c:v h264_nvenc output.mp4
- hwupload_cuda: Pushes the software-decoded frames from system RAM into NVIDIA VRAM so the scale_cuda filter can process them.
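If you process mixed inputs, the choice between the two examples can be scripted. The sketch below prints (but does not run) either the full hardware command or the hybrid command depending on the input codec. pipeline_for, scale_command, and the NVDEC_CODECS list are hypothetical; the list in particular is an illustrative assumption, because the codecs NVDEC can decode depend on the GPU generation. File names without spaces are assumed.

```shell
#!/bin/sh
# Codecs assumed to be decodable by NVDEC on the target GPU. This list is an
# illustrative assumption; adjust it to what your GPU actually supports.
NVDEC_CODECS="h264 hevc vp9"

# pipeline_for CODEC
# Hypothetical helper: prints "gpu" if CODEC is in NVDEC_CODECS, else "hybrid".
pipeline_for() {
    for c in $NVDEC_CODECS; do
        if [ "$c" = "$1" ]; then
            echo gpu
            return 0
        fi
    done
    echo hybrid
}

# scale_command CODEC INPUT OUTPUT WIDTH HEIGHT
# Prints the ffmpeg command line for the chosen pipeline without running it.
scale_command() {
    if [ "$(pipeline_for "$1")" = "gpu" ]; then
        # Decode, scale, and encode on the GPU (Example 1).
        echo "ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i $2 -vf scale_cuda=$4:$5 -c:v h264_nvenc $3"
    else
        # Decode on the CPU, upload, then scale and encode on the GPU (Example 2).
        echo "ffmpeg -i $2 -vf hwupload_cuda,scale_cuda=$4:$5 -c:v h264_nvenc $3"
    fi
}

scale_command h264 input.mp4 output.mp4 1280 720
scale_command prores input.mov output.mp4 1280 720
```

Printing the command first makes it easy to review before piping it to sh or pasting it into a terminal.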
Performance and Advanced Tuning
When scaling with CUDA, you can also specify the interpolation
algorithm to balance quality and performance. The scale_cuda filter
supports values such as bilinear, bicubic, and lanczos.
To change the algorithm, add the interp_algo option to the scale_cuda
filter arguments:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080:interp_algo=lanczos" -c:v h264_nvenc output.mp4
By keeping the entire lifecycle of each video frame on the GPU, CUDA scaling enables high-throughput video processing that suits Linux-based media servers and automation pipelines.
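To judge the quality and speed trade-off between interpolation algorithms on your own material, you can encode the same input once per algorithm and compare the results. The sketch below prints one command per algorithm, using the scale_cuda option name interp_algo. interp_commands and the out_<algorithm>.mp4 naming are hypothetical, and file names without spaces are assumed.

```shell
#!/bin/sh
# interp_commands INPUT WIDTH HEIGHT
# Hypothetical helper: prints one ffmpeg command per interpolation algorithm,
# each writing to its own output file, so the results can be compared.
interp_commands() {
    for algo in bilinear bicubic lanczos; do
        echo "ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i $1 -vf scale_cuda=$2:$3:interp_algo=${algo} -c:v h264_nvenc out_${algo}.mp4"
    done
}

interp_commands input.mp4 1920 1080
```

Piping the output to sh runs the three encodes in sequence; prefixing each command with time is one simple way to compare their speed.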