Extract Closed Captions from TS Files with FFmpeg
Extracting closed captions from a broadcast MPEG-2 Transport Stream (TS) file on Linux is straightforward with the FFmpeg multimedia framework. This article shows how to identify caption streams within a TS file and gives the FFmpeg commands needed to extract them into widely compatible subtitle formats such as SRT or WebVTT. These command-line techniques isolate the caption text without re-encoding the video or audio.
Step 1: Identify the Closed Caption Stream
Before extracting the captions, you need to inspect the TS file to
locate the correct stream index. Broadcast TS files often contain
multiple video, audio, and subtitle tracks. You can use the
ffprobe tool (which comes bundled with FFmpeg) to analyze
the file.
Run the following command in your Linux terminal:
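ffprobe -v error -show_entries stream=index,codec_name,codec_type -of default=noprint_wrappers=1 input.ts
Look through the output for a stream where the
codec_type is subtitle. In North American broadcasts
(ATSC), closed captions are usually embedded directly within the video
stream as EIA-608 or CEA-708 data, in which case no separate subtitle
stream is listed and plain ffprobe input.ts shows "Closed Captions" on
the video stream line. Less commonly they are
listed as a separate eia_608 subtitle stream. In European
broadcasts (DVB), they may appear as dvb_subtitle or
dvb_teletext. Note that dvb_subtitle streams are bitmap images rather
than text, so they cannot be converted directly to SRT or WebVTT.
Note the stream specifier (e.g., 0:s:0 for the first subtitle stream,
or 0:2 for the stream with index 2).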
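The two cases described here (a separate subtitle stream versus captions embedded in the video stream) can be told apart with a small filter over the ffprobe output. The following is a minimal sketch, not a definitive tool: the function name classify_captions is hypothetical, and it assumes a reasonably recent FFmpeg build whose ffprobe reports a closed_captions field for video streams and prints index and codec_name before codec_type.

```shell
#!/bin/sh
# Hypothetical helper: read ffprobe "key=value" lines on stdin and report
# where captions might be found. Assumes index and codec_name are printed
# before codec_type for each stream, and that ffprobe emits
# closed_captions=1 for video streams that carry embedded captions.
classify_captions() {
  awk -F= '
    $1 == "index"      { idx = $2 }
    $1 == "codec_name" { name = $2 }
    $1 == "codec_type" && $2 == "subtitle" {
      print "subtitle stream 0:" idx " (" name ")"
    }
    $1 == "closed_captions" && $2 == "1" {
      print "embedded captions in video stream 0:" idx
    }
  '
}

# Intended usage (requires ffprobe and a real input.ts):
#   ffprobe -v error \
#     -show_entries stream=index,codec_name,codec_type,closed_captions \
#     -of default=noprint_wrappers=1 input.ts | classify_captions
```

If the filter prints nothing, the file probably carries no soft captions that FFmpeg can see.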
Step 2: Extract Captions to SubRip (SRT) Format
Once you know the file contains captions, you can use FFmpeg to
extract them. The SubRip (.srt) format is a good default because it is
lightweight and supported by nearly all media players.
To extract the captions from a separate subtitle stream without altering the original video or audio, use the following command:
ffmpeg -i input.ts -map 0:s:0 output.srt
Note: If your file has multiple subtitle tracks and the default mapping doesn’t work, replace
-map 0:s:0 with the specific stream index you found during the identification step, such as -map 0:3.
If the captions are embedded as EIA-608 data inside the video stream,
FFmpeg can decode them using the cc_dec
decoder when the file is read through the lavfi movie source with the subcc output. Run this specialized command:
ffmpeg -f lavfi -i "movie=input.ts[out0+subcc]" -map 0:s:0 output.srt
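The choice between the plain extraction and the lavfi-based extraction can be sketched as a small helper that prints the command for review instead of running it. This is an illustration under stated assumptions: the function name caption_cmd and its mode names are hypothetical, and file names are assumed to contain no spaces or filter-special characters, which would need extra escaping inside the movie= argument.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the FFmpeg command for a caption source.
#   mode "stream"   -> captions are a separate subtitle stream
#   mode "embedded" -> EIA-608 captions carried inside the video stream
# Assumes file names without spaces or filter-special characters (: , [ ] ').
caption_cmd() {
  mode=$1
  input=$2
  output=$3
  case "$mode" in
    stream)
      printf 'ffmpeg -i %s -map 0:s:0 %s\n' "$input" "$output"
      ;;
    embedded)
      printf 'ffmpeg -f lavfi -i "movie=%s[out0+subcc]" -map 0:s:0 %s\n' "$input" "$output"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}
```

For example, caption_cmd embedded input.ts output.vtt prints the lavfi form with a WebVTT target, since FFmpeg selects the subtitle format from the output file extension.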
Step 3: Extract Captions to WebVTT Format
If you are preparing video content for the web, the WebVTT
(.vtt) format is the modern standard for HTML5 video
players. FFmpeg handles this conversion just as easily.
To extract and convert the broadcast captions into WebVTT, use this command:
ffmpeg -i input.ts -map 0:s:0 output.vtt
Troubleshooting Common Issues
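- No Subtitles Found: If FFmpeg reports that the stream map matches no streams, the captions may be embedded in the video stream as EIA-608/CEA-708 data rather than carried as a separate subtitle stream; try the lavfi-based command from Step 2. If that also yields nothing, the captions might be hardcoded into the video track (burned-in text) rather than saved as soft data. Hardcoded captions cannot be extracted using FFmpeg alone and require Optical Character Recognition (OCR) tools.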
- Permission Denied: Ensure your Linux user has read permission for the input TS file and write permission for the destination directory where the SRT or VTT file will be saved.
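The permission checks described above can be run before invoking FFmpeg. The following is a minimal sketch; the function name check_caption_paths and its return codes are hypothetical choices made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: check file permissions before running FFmpeg.
# Returns 0 when the input file is readable and the directory that will
# hold the output file is writable; 1 or 2 otherwise.
check_caption_paths() {
  input=$1
  output=$2
  outdir=$(dirname "$output")

  if [ ! -r "$input" ]; then
    echo "cannot read input: $input" >&2
    return 1
  fi
  if [ ! -w "$outdir" ]; then
    echo "cannot write to directory: $outdir" >&2
    return 2
  fi
  return 0
}
```

A typical call would be check_caption_paths input.ts output.srt, followed by the extraction command only if it succeeds.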