How EBML Structure Works in MKV Files
This article explains how the Extensible Binary Meta Language (EBML) structure functions within Matroska (MKV) files. It covers the fundamental components of EBML, its structural hierarchy, and how this binary, XML-like format enables the MKV container to flexibly store and organize various video, audio, and subtitle streams.
What is EBML?
Extensible Binary Meta Language (EBML) is a generalized binary
container format designed to store data in a structured, hierarchical
manner. Often described as a binary equivalent to XML, EBML uses
tag-like elements to organize data. Instead of using text tags like
<video>, EBML uses binary identifiers (IDs) to define
data types and structures. The Matroska format (MKV) is the most
prominent implementation of EBML.
The Anatomy of an EBML Element
Every piece of data within an MKV file is wrapped in an EBML element. Each element consists of three distinct parts:
- Element ID: A variable-length binary identifier that specifies what the element is (e.g., track information, audio data, or metadata).
- Data Size: A Variable-Size Integer (VINT) that indicates the length of the upcoming data payload in bytes.
- Data Payload: The actual content, which can be an integer, a float, a text string, raw binary data (like a video frame), or container elements that hold other sub-elements.
Because IDs and sizes are stored as variable-length integers, small elements carry very little overhead. The number of leading zero bits in the first byte tells the reader how many bytes the integer occupies, so a payload of up to 126 bytes needs only a one-byte size descriptor instead of a fixed 4-byte or 8-byte integer.
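The ID, size, and payload layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a full parser: the names `read_vint` and `read_element` are made up for this example, and the sketch assumes the element has a known size and fits entirely in memory. The sample bytes encode a DocType element (ID 0x4282) holding the string "matroska".

```python
def read_vint(data: bytes, pos: int = 0, keep_marker: bool = False):
    """Decode one variable-size integer starting at data[pos].

    Returns (value, length_in_bytes). The leading zero bits of the first
    byte give the length; the first set bit is the length marker.
    Element IDs keep the marker bit, data sizes drop it.
    """
    first = data[pos]
    if first == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    length, mask = 1, 0x80
    while not first & mask:
        length += 1
        mask >>= 1
    if pos + length > len(data):
        raise ValueError("truncated VINT")
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, length


def read_element(data: bytes, pos: int = 0):
    """Split one element into (element_id, payload, next_pos)."""
    element_id, id_len = read_vint(data, pos, keep_marker=True)
    size, size_len = read_vint(data, pos + id_len)
    start = pos + id_len + size_len
    return element_id, data[start : start + size], start + size


# 0x4282 = DocType ID, 0x88 = one-byte size meaning 8, then 8 payload bytes.
raw = bytes.fromhex("4282886d6174726f736b61")
element_id, payload, next_pos = read_element(raw)
print(hex(element_id), payload, next_pos)  # 0x4282 b'matroska' 11
```

Note that the size byte 0x88 costs a single byte: the top bit is the length marker and the remaining seven bits hold the value 8.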
The Hierarchical Structure of an MKV File
At its core, an MKV file is a single, nested EBML document. A typical MKV file has two top-level elements:
- EBML Header: Located at the very beginning of the file, this header declares that the file is an EBML document, specifies the EBML version, and defines the “DocType” as “matroska” (or “webm” for WebM files).
- Segment Element: This is the main wrapper for the entire file’s media content and metadata. Almost all subsequent elements reside inside this segment.
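To make the role of the EBML Header concrete, the sketch below reads the DocType and version fields from a header held in memory. The element IDs are the ones EBML assigns to these fields; the helper names (`make_element`, `parse_ebml_header`) and the sample bytes are assumptions made for this illustration, and the sample header is built by hand rather than taken from a real file.

```python
EBML_HEADER_ID = 0x1A45DFA3
HEADER_FIELDS = {
    0x4286: "EBMLVersion",
    0x4282: "DocType",
    0x4287: "DocTypeVersion",
    0x4285: "DocTypeReadVersion",
}


def read_vint(data, pos, keep_marker=False):
    """Decode one variable-size integer; returns (value, length)."""
    first = data[pos]
    if first == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    length, mask = 1, 0x80
    while not first & mask:
        length += 1
        mask >>= 1
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, length


def parse_ebml_header(data):
    """Return the recognized header fields as a dict."""
    header_id, n = read_vint(data, 0, keep_marker=True)
    if header_id != EBML_HEADER_ID:
        raise ValueError("not an EBML document")
    pos = n
    size, n = read_vint(data, pos)
    pos += n
    end = pos + size
    fields = {}
    while pos < end:
        child_id, n = read_vint(data, pos, keep_marker=True)
        pos += n
        child_size, n = read_vint(data, pos)
        pos += n
        payload = data[pos : pos + child_size]
        pos += child_size
        name = HEADER_FIELDS.get(child_id)
        if name == "DocType":
            fields[name] = payload.decode("ascii")
        elif name is not None:
            fields[name] = int.from_bytes(payload, "big")
    return fields


def make_element(id_hex, payload):
    """Test helper (hypothetical): encode an element with a one-byte size."""
    assert len(payload) < 127
    return bytes.fromhex(id_hex) + bytes([0x80 | len(payload)]) + payload


sample = make_element(
    "1A45DFA3",
    make_element("4286", b"\x01")
    + make_element("4282", b"matroska")
    + make_element("4287", b"\x04")
    + make_element("4285", b"\x02"),
)
header = parse_ebml_header(sample)
print(header["DocType"])  # matroska
```

A player would typically perform this check first and refuse the file if the DocType is neither "matroska" nor "webm".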
Within the Segment Element, the data is organized into level-1 sub-elements:
- SeekHead: An index that lists the byte positions of other major elements (like Tracks or Cues) within the file, allowing media players to jump directly to them without scanning the entire file.
- Info: Contains global information about the file, such as the duration, writing application, and multiplexing library.
- Tracks: Lists all the individual streams available in the container, including video codecs, audio languages, and subtitle formats.
- Cluster: Contains the actual multimedia data. Inside a Cluster, media frames are packaged into “SimpleBlocks” or “Blocks” (the latter wrapped in BlockGroups). Each block carries a timestamp (timecode) relative to the Cluster’s own timestamp, which media players use to decode and synchronize audio and video.
- Cues: A seek index containing specific timestamps and their corresponding byte positions in the file. This enables fast and precise seeking when a user skips to a different part of the video.
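The level-1 layout listed above can be explored by walking the Segment and reading only each child's ID and size. The following is a rough sketch under a few assumptions: the Segment has a known size (live streams may use an unknown size, which this code does not handle), the data fits in memory, and the sample Segment is synthetic, with dummy zero-filled payloads standing in for real content. The function name `list_level1` is invented for the example.

```python
SEGMENT_ID = 0x18538067
LEVEL1_NAMES = {
    0x114D9B74: "SeekHead",
    0x1549A966: "Info",
    0x1654AE6B: "Tracks",
    0x1F43B675: "Cluster",
    0x1C53BB6B: "Cues",
}


def read_vint(data, pos, keep_marker=False):
    """Decode one variable-size integer; returns (value, length)."""
    first = data[pos]
    if first == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    length, mask = 1, 0x80
    while not first & mask:
        length += 1
        mask >>= 1
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, length


def list_level1(data, segment_pos=0):
    """Return (name, offset, payload_size) for each child of the Segment."""
    segment_id, n = read_vint(data, segment_pos, keep_marker=True)
    if segment_id != SEGMENT_ID:
        raise ValueError("no Segment at this position")
    pos = segment_pos + n
    segment_size, n = read_vint(data, pos)
    pos += n
    end = pos + segment_size
    found = []
    while pos < end:
        offset = pos
        child_id, n = read_vint(data, pos, keep_marker=True)
        pos += n
        child_size, n = read_vint(data, pos)
        pos += n
        found.append((LEVEL1_NAMES.get(child_id, hex(child_id)), offset, child_size))
        pos += child_size  # jump over the payload without parsing it
    return found


def make_element(id_hex, payload):
    """Test helper (hypothetical): encode an element with a one-byte size."""
    assert len(payload) < 127
    return bytes.fromhex(id_hex) + bytes([0x80 | len(payload)]) + payload


# Synthetic Segment: real payloads replaced by zero bytes of arbitrary length.
segment = make_element(
    "18538067",
    make_element("114D9B74", bytes(3))
    + make_element("1549A966", bytes(5))
    + make_element("1654AE6B", bytes(4))
    + make_element("1F43B675", bytes(10))
    + make_element("1C53BB6B", bytes(2)),
)
for name, offset, size in list_level1(segment):
    print(name, offset, size)
```

A real file usually contains many Cluster elements, so a linear walk like this can be slow on large files; that is the cost the SeekHead and Cues indexes are there to avoid.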
Why EBML Benefits MKV
Because every EBML element describes itself with an ID and a size, EBML provides MKV with two major advantages: extensibility and compatibility with older players. If a new kind of element, such as a new metadata field, is introduced, a new EBML ID is defined for it. (A new codec usually needs no new ID at all, because codecs are identified by a codec ID string stored in an existing Tracks element.) Older media players that do not recognize the new ID can simply read the “Data Size” descriptor, skip over that specific payload, and continue playing the rest of the file.
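The skip-what-you-do-not-know behavior can be sketched as follows. This is an illustrative reader, not any real player's implementation: `parse_known` is a made-up name, and the ID 0x4ABC is a hypothetical "future" element chosen only for the example and assumed to be unassigned.

```python
def read_vint(data, pos, keep_marker=False):
    """Decode one variable-size integer; returns (value, length)."""
    first = data[pos]
    if first == 0:
        raise ValueError("widths above 8 bytes are not handled in this sketch")
    length, mask = 1, 0x80
    while not first & mask:
        length += 1
        mask >>= 1
    value = first if keep_marker else first & (mask - 1)
    for byte in data[pos + 1 : pos + length]:
        value = (value << 8) | byte
    return value, length


def parse_known(data, known):
    """Collect payloads of known IDs; skip everything else by its size."""
    pos = 0
    values, skipped = {}, []
    while pos < len(data):
        element_id, n = read_vint(data, pos, keep_marker=True)
        pos += n
        size, n = read_vint(data, pos)
        pos += n
        if element_id in known:
            values[known[element_id]] = data[pos : pos + size]
        else:
            skipped.append(element_id)  # unknown ID: payload is never inspected
        pos += size
    return values, skipped


def make_element(id_hex, payload):
    """Test helper (hypothetical): encode an element with a one-byte size."""
    assert len(payload) < 127
    return bytes.fromhex(id_hex) + bytes([0x80 | len(payload)]) + payload


# An "old" reader that only knows two IDs.
KNOWN = {0x4282: "DocType", 0x4287: "DocTypeVersion"}

stream = (
    make_element("4282", b"matroska")
    + make_element("4ABC", b"future")  # hypothetical element from a newer writer
    + make_element("4287", b"\x04")
)
values, skipped = parse_known(stream, KNOWN)
print(values)
print([hex(i) for i in skipped])  # ['0x4abc']
```

The reader recovers both fields it understands even though an unfamiliar element sits between them, because the size field alone is enough to find where the next element starts.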