CUDA Hardware-Accelerated Scaling in FFmpeg on Linux

Using NVIDIA CUDA for hardware-accelerated scaling in FFmpeg on Linux lets you offload video resizing from the CPU to the GPU, which can substantially increase throughput and reduce CPU load. This guide gives an overview of the CUDA-backed filters scale_cuda and scale_npp: the prerequisites, the syntax for end-to-end GPU processing, and how to run hardware-accelerated scaling in a Linux environment.

Prerequisites and System Requirements

To use CUDA scaling, your Linux system needs an NVIDIA GPU with NVDEC and NVENC support, the proprietary NVIDIA driver, and an FFmpeg build with CUDA support enabled. Because distribution packages often omit some hardware acceleration features due to licensing (scale_npp, for example, depends on NVIDIA's non-free NPP library), you may need a custom-built version of FFmpeg.
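
Before building anything, it can help to see what the installed binary already offers. The following is a minimal sketch, assuming ffmpeg is on your PATH. The helper name check_component is made up for this guide; -hwaccels, -filters, and -encoders are standard FFmpeg listing options.

```shell
#!/bin/sh
# Sketch: report which CUDA-related components the local ffmpeg build exposes.
# check_component is a hypothetical helper name, not part of FFmpeg.

check_component() {
    # $1 = ffmpeg listing option (-hwaccels, -filters, or -encoders)
    # $2 = component name to look for in that listing
    if ! command -v ffmpeg >/dev/null 2>&1; then
        echo "$2: ffmpeg not found"
        return 0
    fi
    # Capture the listing first, then search it for the name as a whole word.
    listing=$(ffmpeg -hide_banner "$1" 2>/dev/null) || listing=""
    if printf '%s\n' "$listing" | grep -qw -- "$2"; then
        echo "$2: available"
    else
        echo "$2: missing"
    fi
}

check_component -hwaccels cuda
check_component -filters  scale_cuda
check_component -filters  scale_npp
check_component -encoders h264_nvenc
```

If scale_cuda or h264_nvenc is reported as missing, the build most likely lacks CUDA support and a custom build is the next step.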

Understanding the CUDA Scaling Pipeline

To get the maximum performance benefit, decoding, scaling, and encoding should all happen on the GPU. Copying frames back and forth between system memory (RAM) and graphics memory (VRAM) creates a bottleneck that can cancel out much of the gain from hardware acceleration.

An efficient pipeline uses NVDEC to decode the video directly into VRAM, applies the CUDA scale filter there, and passes the scaled frames straight to NVENC for encoding.
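
To make the difference concrete, here is a sketch of the same 720p transcode written two ways. The function names round_trip and gpu_resident are hypothetical, chosen only for illustration; the ffmpeg options themselves are the ones used elsewhere in this guide, plus the software scale filter.

```shell
#!/bin/sh
# Sketch: one transcode, two data paths. Function names are hypothetical.
# Usage: gpu_resident input.mp4 output.mp4

# Round trip: NVDEC decodes, but without -hwaccel_output_format cuda the
# frames are copied back to system memory, resized on the CPU by the
# software "scale" filter, and uploaded again for NVENC.
round_trip() {
    ffmpeg -hwaccel cuda -i "$1" \
        -vf "scale=1280:720" \
        -c:v h264_nvenc "$2"
}

# GPU resident: decoded frames stay in VRAM as CUDA frames, scale_cuda
# resizes them there, and h264_nvenc reads them without a download/upload.
gpu_resident() {
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$1" \
        -vf "scale_cuda=1280:720" \
        -c:v h264_nvenc "$2"
}
```

Running both on the same clip and comparing wall-clock time and CPU usage is a simple way to see how much the extra copies cost on your hardware.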

Command Examples for CUDA Scaling

There are two primary filters for CUDA-based scaling in FFmpeg: scale_cuda, which is implemented as CUDA kernels inside FFmpeg itself, and scale_npp, which uses the NVIDIA Performance Primitives (NPP) library. The scale_cuda filter is generally preferred for fully GPU-resident pipelines because it is simpler to use and does not require the NPP library.
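
The examples in this guide use scale_cuda. For builds that do include NPP support, a scale_npp variant might look like the sketch below. The function name npp_720p is hypothetical, and the choice of interp_algo=super (supersampling, usually suggested for downscaling) is an assumption about what suits your material rather than a recommendation from this guide.

```shell
#!/bin/sh
# Sketch: full GPU pipeline using scale_npp instead of scale_cuda.
# Requires an ffmpeg build with libnpp enabled; npp_720p is a hypothetical name.
# Usage: npp_720p input.mp4 output.mp4

npp_720p() {
    ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i "$1" \
        -vf "scale_npp=1280:720:interp_algo=super" \
        -c:v h264_nvenc "$2"
}
```

If the build lacks libnpp, ffmpeg reports that the scale_npp filter is unknown, in which case scale_cuda is the option to use.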

Example 1: Full Hardware Pipeline with scale_cuda

This command decodes the input video on the GPU, resizes it to 1920x1080 with scale_cuda, and encodes it with the NVENC H.264 encoder (h264_nvenc), so the video frames never leave the graphics card.

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080" -c:v h264_nvenc output.mp4

Example 2: Hybrid Pipeline (CPU Decode to GPU Scale)

If your input codec or profile is not supported by NVIDIA’s hardware decoder, you must decode it on the CPU, upload the frames to the GPU for scaling, and then encode.

ffmpeg -i input_software.mp4 -vf "hwupload_cuda,scale_cuda=1280:720" -c:v h264_nvenc output.mp4
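
Sources that need CPU decoding often arrive in pixel formats such as 10-bit 4:2:2. FFmpeg will normally insert a conversion on its own, but one option, assuming 8-bit 4:2:0 output is what you want, is to pin the format explicitly before the upload so the result does not depend on automatic negotiation. The function name hybrid_720p below is hypothetical; format, hwupload_cuda, and scale_cuda are existing FFmpeg filters.

```shell
#!/bin/sh
# Sketch: CPU decode, explicit conversion to NV12, then GPU scale and encode.
# hybrid_720p is a hypothetical name. Usage: hybrid_720p input.mov output.mp4

hybrid_720p() {
    # format=nv12   : convert on the CPU to a layout the GPU upload accepts
    # hwupload_cuda : copy each frame from system memory into VRAM
    # scale_cuda    : resize on the GPU before handing frames to NVENC
    ffmpeg -i "$1" \
        -vf "format=nv12,hwupload_cuda,scale_cuda=1280:720" \
        -c:v h264_nvenc "$2"
}
```

The conversion still runs on the CPU, so this variant trades a little CPU time for a predictable input to the GPU stages.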

Performance and Advanced Tuning

When scaling with CUDA, you can choose the interpolation algorithm to trade quality against speed. The scale_cuda filter exposes this through its interp_algo option, which accepts nearest, bilinear, bicubic, and lanczos.

To change the algorithm, add interp_algo to the scale_cuda options:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf "scale_cuda=1920:1080:interp_algo=lanczos" -c:v h264_nvenc output.mp4
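
One way to judge the trade-off on your own footage is to render a short sample with each algorithm and compare the results side by side. This is a sketch only: the function name compare_interp and the sample_<algo>.mp4 output names are made up here, the option name interp_algo follows the FFmpeg documentation for scale_cuda, and older FFmpeg releases may not offer every algorithm.

```shell
#!/bin/sh
# Sketch: encode the first 10 seconds of a clip once per interpolation
# algorithm. compare_interp is a hypothetical name.
# Usage: compare_interp input.mp4

compare_interp() {
    for algo in bilinear bicubic lanczos; do
        # -y overwrites earlier samples; -t 10 limits the output to 10 seconds
        ffmpeg -y -hwaccel cuda -hwaccel_output_format cuda -i "$1" \
            -vf "scale_cuda=1920:1080:interp_algo=$algo" \
            -c:v h264_nvenc -t 10 "sample_$algo.mp4"
    done
}
```

Prefixing the call with the shell's time keyword gives a rough idea of the combined runtime, although on most GPUs the encoder rather than the scaler tends to dominate.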

By keeping video frames in GPU memory from decode through encode, CUDA scaling enables high-throughput video processing that is well suited to Linux-based media servers and automation pipelines.