CUDA Hardware-Accelerated Scaling in FFmpeg on Linux

Using NVIDIA CUDA for hardware-accelerated scaling in FFmpeg on Linux lets you move demanding video resizing work from the CPU to the GPU. This can substantially increase throughput and reduce CPU load. This guide gives a straightforward overview of how to use the CUDA-backed filters scale_cuda and scale_npp in your FFmpeg commands. You will learn the prerequisites, the syntax for an end-to-end GPU pipeline, and how to run hardware-accelerated scaling on a Linux system.

Prerequisites and System Requirements

To use CUDA scaling, your Linux system needs NVIDIA hardware and the matching software stack. Distribution FFmpeg packages often include NVDEC and NVENC support. However, they cannot ship scale_npp, because it requires a non-free build that cannot be redistributed, and some may also lack scale_cuda. As a result, you may need to build FFmpeg yourself.

NVIDIA Driver: A proprietary NVIDIA driver installed and functioning.
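CUDA Toolkit: Required for --enable-cuda-nvcc and for the NPP libraries used by scale_npp (NVDEC and NVENC alone do not need it). Install a toolkit release that your driver version supports.
nv-codec-headers: The ffnvcodec headers, maintained by the FFmpeg project, that are required to compile FFmpeg with CUDA, NVDEC, and NVENC support. Use a headers release that your installed driver supports.
FFmpeg Compilation: The --enable-cuda, --enable-nvdec, --enable-cuvid, and --enable-nvenc components are autodetected when nv-codec-headers is installed, and you can also pass these flags explicitly. scale_cuda additionally needs its CUDA kernels compiled with --enable-cuda-llvm (clang) or --enable-cuda-nvcc (the CUDA Toolkit compiler, which requires --enable-nonfree). scale_npp needs --enable-libnpp, which also requires --enable-nonfree.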
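Before running the commands below, it helps to confirm that the ffmpeg binary on your PATH actually exposes these components. The following is a minimal Bash sketch, not an official tool. check_ffmpeg_cuda is a hypothetical helper name. It only searches the real -hwaccels, -filters, and -encoders listings for the cuda hwaccel, the scale_cuda and hwupload_cuda filters, and the h264_nvenc encoder, and it reports scale_npp as optional.

```shell
#!/usr/bin/env bash
# check_ffmpeg_cuda is a hypothetical helper (not part of FFmpeg). It greps
# ffmpeg's own capability listings and returns non-zero if a required
# CUDA component is missing. scale_npp is treated as optional.
check_ffmpeg_cuda() {
  local missing=0 f

  # The "cuda" hwaccel is what -hwaccel cuda selects.
  if ! ffmpeg -hide_banner -hwaccels | grep -qw 'cuda'; then
    echo "missing: cuda hwaccel"
    missing=1
  fi

  # Filters required by the full-GPU and hybrid examples.
  for f in scale_cuda hwupload_cuda; do
    if ! ffmpeg -hide_banner -filters | grep -qw "$f"; then
      echo "missing: $f filter"
      missing=1
    fi
  done

  # scale_npp is only present in builds configured with --enable-libnpp.
  if ! ffmpeg -hide_banner -filters | grep -qw 'scale_npp'; then
    echo "note: scale_npp not available (optional)"
  fi

  # NVENC H.264 encoder used for the output.
  if ! ffmpeg -hide_banner -encoders | grep -qw 'h264_nvenc'; then
    echo "missing: h264_nvenc encoder"
    missing=1
  fi

  return "$missing"
}
```

A typical call is check_ffmpeg_cuda && echo "CUDA scaling components found". To see which options your build's filter accepts, run ffmpeg -h filter=scale_cuda.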

Understanding the CUDA Scaling Pipeline

To get the full performance benefit, decoding, scaling, and encoding should all happen on the GPU. Every copy of a frame between system memory (RAM) and graphics memory (VRAM) crosses the PCIe bus. These transfers can become a bottleneck that erodes much of the speed advantage of hardware acceleration.

An efficient pipeline uses NVDEC to decode the video directly into VRAM, applies a CUDA scaling filter to the frames while they are still in VRAM, and passes the scaled frames straight to NVENC for encoding.
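To measure the cost of the RAM round trip on your own hardware, you can run the same job two ways and time both. This is a sketch under assumptions: gpu_scale_1080p and its mode names are hypothetical, while the ffmpeg options are the standard ones used in this guide. In the roundtrip mode, omitting -hwaccel_output_format cuda makes FFmpeg copy each NVDEC-decoded frame back to system memory, so hwupload_cuda has to upload it again before scale_cuda runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gpu_scale_1080p INPUT OUTPUT [resident|roundtrip]
# "resident"  keeps decoded frames in VRAM for the whole chain.
# "roundtrip" omits -hwaccel_output_format, so NVDEC output is copied to
#             system RAM and hwupload_cuda must push it back before scaling.
gpu_scale_1080p() {
  local in="$1" out="$2" mode="${3:-resident}"
  case "$mode" in
    resident)
      ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$in" \
        -vf "scale_cuda=1920:1080" -c:v h264_nvenc "$out"
      ;;
    roundtrip)
      ffmpeg -hwaccel cuda -i "$in" \
        -vf "hwupload_cuda,scale_cuda=1920:1080" -c:v h264_nvenc "$out"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 2
      ;;
  esac
}
```

Comparing time gpu_scale_1080p input.mp4 resident.mp4 with time gpu_scale_1080p input.mp4 roundtrip.mp4 roundtrip shows the transfer overhead directly. The size of the gap depends on resolution, GPU, and PCIe bandwidth.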

Command Examples for CUDA Scaling

There are two primary filters for CUDA-based scaling in FFmpeg: scale_cuda, which runs FFmpeg's own CUDA kernels, and scale_npp, which uses the NVIDIA Performance Primitives (NPP) library. scale_cuda is usually the simpler choice for pure GPU pipelines because it does not depend on NPP or require --enable-libnpp.
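For comparison, a full-GPU job using scale_npp could be wrapped as shown below. This sketch assumes a build configured with --enable-libnpp and --enable-nonfree. npp_scale is a hypothetical wrapper name. interp_algo is scale_npp's option for selecting the NPP interpolation mode, such as cubic or lanczos. The cubic fallback is simply this sketch's choice.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: npp_scale INPUT OUTPUT WIDTH HEIGHT [ALGO]
# Requires an FFmpeg build with --enable-libnpp (and therefore --enable-nonfree).
npp_scale() {
  local in="$1" out="$2" w="$3" h="$4"
  local algo="${5:-cubic}"   # ALGO defaults to cubic in this sketch
  # Decode with NVDEC into VRAM, resize with NPP, encode with NVENC.
  ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$in" \
    -vf "scale_npp=${w}:${h}:interp_algo=${algo}" \
    -c:v h264_nvenc "$out"
}
```

For example, npp_scale input.mp4 output.mp4 1920 1080 lanczos produces the same frame size as the scale_cuda example that follows.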

Example 1: Full Hardware Pipeline with scale_cuda

This command decodes an input video on the GPU, resizes it to 1920x1080 with CUDA, and encodes it with NVIDIA's NVENC H.264 encoder (h264_nvenc). The decoded frames never leave the graphics card.

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080" -c:v h264_nvenc output.mp4

-hwaccel cuda: Tells FFmpeg to decode with NVDEC through the CUDA hardware acceleration interface.
-hwaccel_output_format cuda: Keeps the decoded frames in VRAM instead of copying them back to system memory.
-vf "scale_cuda=1920:1080": Applies the hardware-accelerated CUDA scaling filter.
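Example 1 forces a 16:9 frame size. If your sources have other aspect ratios, or if you want to avoid re-encoding the audio, a variant along these lines may help. It is a sketch: scale_to_height is a hypothetical name. The -2 width assumes that your build's scale_cuda evaluates sizes the same way the software scale filter does, which is the case in recent FFmpeg builds. Run ffmpeg -h filter=scale_cuda to confirm what your build accepts. -c:a copy passes the audio stream through unchanged instead of re-encoding it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scale_to_height INPUT OUTPUT HEIGHT
# Width -2 (assumed supported by your scale_cuda build) keeps the aspect ratio
# and rounds the computed width to an even number.
scale_to_height() {
  local in="$1" out="$2" h="$3"
  ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$in" \
    -vf "scale_cuda=-2:${h}" \
    -c:v h264_nvenc -c:a copy "$out"
}
```

Stream copy only works if the output container can hold the source audio codec. Otherwise, drop -c:a copy and let FFmpeg re-encode the audio.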

Example 2: Hybrid Pipeline (CPU Decode to GPU Scale)

If your input format is not supported by NVIDIA’s hardware decoder, you must decode it via the CPU, upload the frames to the GPU for scaling, and then encode.

ffmpeg -i input_software.mp4 -vf "hwupload_cuda,scale_cuda=1280:720" -c:v h264_nvenc output.mp4

hwupload_cuda: Pushes the software-decoded frames from system RAM into NVIDIA VRAM so the scale_cuda filter can process them.
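Many sources that must be decoded on the CPU, such as ProRes, use 10-bit 4:2:2 pixel formats. One way to control what reaches the GPU is to convert explicitly with the standard format filter before the upload. This is a sketch under assumptions rather than a required step. hybrid_scale is a hypothetical helper name. nv12 is chosen here because it is a common 8-bit 4:2:0 layout that hwupload_cuda, scale_cuda, and h264_nvenc accept.

```shell
#!/usr/bin/env bash
# Hypothetical helper: hybrid_scale INPUT OUTPUT WIDTH HEIGHT
# format=nv12 converts on the CPU to 8-bit 4:2:0 before hwupload_cuda,
# so the pixel format reaching scale_cuda and h264_nvenc is explicit.
hybrid_scale() {
  local in="$1" out="$2" w="$3" h="$4"
  ffmpeg -i "$in" \
    -vf "format=nv12,hwupload_cuda,scale_cuda=${w}:${h}" \
    -c:v h264_nvenc "$out"
}
```

With this helper, hybrid_scale input_software.mp4 output.mp4 1280 720 mirrors the command above with the format conversion made explicit. Note that converting to nv12 discards any 10-bit precision in the source.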

Performance and Advanced Tuning

When scaling with CUDA, you can also choose the interpolation algorithm to balance quality against speed. In recent FFmpeg releases, the scale_cuda filter accepts nearest, bilinear, bicubic, and lanczos through its interp_algo option.

To change the algorithm, set interp_algo in the filter options:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080:interp_algo=lanczos" -c:v h264_nvenc output.mp4
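Because the best choice depends on the content and the scaling ratio, it can be worth rendering the same clip with each algorithm and comparing quality and speed. This sketch assumes your build's scale_cuda exposes the interp_algo option with the values nearest, bilinear, bicubic, and lanczos. compare_interp and the scaled_<algo>.mp4 file names are hypothetical. The -y flag is included only so that repeated runs overwrite earlier outputs without prompting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare_interp INPUT WIDTH HEIGHT
# Writes one file per interp_algo value (scaled_nearest.mp4, ...) so the
# results can be compared by eye or with a quality metric.
compare_interp() {
  local in="$1" w="$2" h="$3" algo
  for algo in nearest bilinear bicubic lanczos; do
    # -y overwrites outputs from a previous comparison run.
    ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i "$in" \
      -vf "scale_cuda=${w}:${h}:interp_algo=${algo}" \
      -c:v h264_nvenc "scaled_${algo}.mp4" || return 1
  done
}
```

To see how long each algorithm takes, add FFmpeg's -benchmark option to the command or prefix the call with time.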

By keeping every frame on the GPU from decode to encode, CUDA scaling enables high-throughput video processing that is well suited to Linux-based media servers and automation pipelines.