Extract Closed Captions from TS Files with FFmpeg

Extracting closed captions from a broadcast MPEG-2 Transport Stream (TS) file on Linux is straightforward with FFmpeg. This article shows how to identify caption streams within a TS file and gives the FFmpeg commands that extract them into widely supported subtitle formats such as SRT or WebVTT. These commands write out only the caption text, so no video or audio re-encoding is required.

Step 1: Identify the Closed Caption Stream

Before extracting the captions, you need to inspect the TS file to locate the correct stream index. Broadcast TS files often contain multiple video, audio, and subtitle tracks. You can use the ffprobe tool (which comes bundled with FFmpeg) to analyze the file.

Run the following command in your Linux terminal:

ffprobe -v error -show_entries stream=index,codec_name,codec_type -of default=noprint_wrappers=1 input.ts

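Look through the output for a stream whose codec_type is subtitle. In European broadcasts (DVB), captions usually appear as separate dvb_subtitle or dvb_teletext streams. In North American broadcasts (ATSC), closed captions are usually embedded within the video stream as EIA-608 or CEA-708 data, so they do not show up as a subtitle stream in this listing. Running ffprobe without -v error prints a stream summary in which such a video stream is marked "Closed Captions". Note the stream specifier of any subtitle stream you find (e.g., 0:s:0 for the first subtitle stream, or 0:2 for the stream with index 2).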
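As a rough aid for reading longer listings, the key=value output of the ffprobe command above can be filtered down to subtitle streams only. The sketch below assumes the index, codec_name and codec_type entries requested above, with index printed first for each stream; the function name list_subtitle_streams is made up for this example.

```shell
# Reads ffprobe "key=value" stream entries on stdin and prints one
# "index codec_type codec_name" line per subtitle stream.
# Assumption: each stream's entries begin with its "index=" line.
list_subtitle_streams() {
  awk -F= '
    function emit() {
      # Print the stream collected so far if it is a subtitle stream.
      if (idx != "" && type == "subtitle") print idx, type, name
      idx = ""; type = ""; name = ""
    }
    $1 == "index"      { emit(); idx = $2 }
    $1 == "codec_name" { name = $2 }
    $1 == "codec_type" { type = $2 }
    END { emit() }
  '
}
```

Usage: pipe the ffprobe output into it, for example ffprobe -v error -show_entries stream=index,codec_name,codec_type -of default=noprint_wrappers=1 input.ts | list_subtitle_streams. An empty result does not prove the file has no captions, because captions embedded in the video stream are not listed as subtitle streams.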

Step 2: Extract Captions to SubRip (SRT) Format

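Once you know the file contains captions, you can use FFmpeg to extract them. The SubRip (.srt) format is a good default because it is plain text and supported by most media players. This works for text-based captions such as EIA-608 and teletext (teletext decoding requires an FFmpeg build with libzvbi); dvb_subtitle streams are bitmap images, which FFmpeg cannot convert to a text format such as SRT without a separate OCR step.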

To extract captions from a separate subtitle stream without altering the original video or audio, use the following command:

ffmpeg -i input.ts -map 0:s:0 output.srt

If the captions are embedded as EIA-608 data inside the video stream, there is no subtitle stream to map directly. Instead, open the file through the lavfi input device with the movie filter. Its subcc output exposes the embedded captions as a subtitle stream, which FFmpeg decodes with the cc_dec decoder:

ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map 0:s:0 output.srt

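Note: If your file has multiple subtitle tracks and the default mapping does not select the one you want, replace -map 0:s:0 in the plain -i input.ts command with the specific stream index you found during the identification step, such as -map 0:3. This does not apply to the lavfi command, whose input contains only the outputs named in the movie filter rather than the original stream indices.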
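The choice between the two forms can be sketched as a small helper that prints the command to run instead of running it. This is only an illustration: the name caption_cmd and its "embedded" keyword are invented here, subcc is the lavfi output that carries closed captions, and file names containing spaces or characters that are special to the shell or to filtergraph syntax are not handled.

```shell
# Prints (does not run) an ffmpeg caption-extraction command.
#   $1: "embedded" for captions carried inside the video stream,
#       or an absolute stream index (e.g. 3) for a separate subtitle stream
#   $2: input file   $3: output file (.srt or .vtt)
# Assumption: file names need no quoting or escaping.
caption_cmd() {
  mode=$1 src=$2 dst=$3
  case $mode in
    embedded)
      # Embedded EIA-608: read through the lavfi movie filter's subcc output.
      printf 'ffmpeg -f lavfi -i "movie=%s[out0+subcc]" -map 0:s:0 %s\n' "$src" "$dst" ;;
    *[!0-9]*|'')
      echo 'caption_cmd: mode must be "embedded" or a stream index' >&2
      return 1 ;;
    *)
      # Separate subtitle stream: map it by its absolute index.
      printf 'ffmpeg -i %s -map 0:%s %s\n' "$src" "$mode" "$dst" ;;
  esac
}
```

For example, caption_cmd 3 input.ts output.srt prints ffmpeg -i input.ts -map 0:3 output.srt, which can be reviewed before it is run.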

Step 3: Extract Captions to WebVTT Format

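If you are preparing video content for the web, WebVTT (.vtt) is the caption format used by the HTML5 track element. FFmpeg selects the WebVTT writer from the .vtt file extension, so only the output file name changes.

To extract a separate subtitle stream and convert it to WebVTT, use this command (for captions embedded in the video stream, use the lavfi input form from Step 2 with output.vtt as the output file):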

ffmpeg -i input.ts -map 0:s:0 output.vtt
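A quick sanity check after extraction is to count the cues in the output file. The sketch below relies on the fact that every cue timing line in both SRT and WebVTT contains the arrow "-->"; the function name count_cues is made up for this example, and a stray "-->" inside SRT caption text would inflate the count.

```shell
# Prints the number of cue timing lines (lines containing "-->")
# in an SRT or WebVTT file given as $1.
count_cues() {
  [ -r "$1" ] || { echo "count_cues: cannot read $1" >&2; return 2; }
  # grep -c exits non-zero when the count is 0, so mask that status.
  grep -c -e '-->' "$1" || true
}
```

Running count_cues output.vtt and getting 0 suggests that the mapped stream carried no decodable caption text, which is worth checking before publishing the file.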

Troubleshooting Common Issues