How MPEG-4 Multiplexes Multiple Media Objects
This article explains how the MPEG-4 standard multiplexes diverse media objects (such as audio, video, text, and 3D graphics) into a single data stream. It covers the MPEG-4 Systems layer, the Sync Layer (SL), and the FlexMux tool, and describes how these components work together to deliver interactive multimedia.
The MPEG-4 Systems Layer
Unlike the older MPEG-1 and MPEG-2 standards, which carry content as already-composed audio and video streams, MPEG-4 treats multimedia content as a collection of independent “media objects.” To combine these distinct objects into a single stream, MPEG-4 relies on the MPEG-4 Systems layer (defined in the ISO/IEC 14496-1 specification).
The Systems layer uses a layered architecture to packetize, synchronize, and multiplex separate data streams, known as Elementary Streams (ES), into a single multiplexed stream for delivery.
The Multiplexing Mechanism: Step-by-Step
The multiplexing process in MPEG-4 occurs across three distinct layers: the Sync Layer, the FlexMux Layer, and the TransMux Layer.
1. The Sync Layer (SL)
Each individual media object is encoded into one or more Elementary Streams. The first step in the multiplexing process occurs at the Sync Layer (SL).
* The SL packetizes the raw Elementary Streams into SL packets.
* It carries synchronization metadata in the SL packet header, including time stamps (Object Clock References and Composition Time Stamps) and sequence numbers.
* This lets the receiver align different media objects (for example, a video stream and its corresponding audio track) on a common timeline during playback and detect lost packets.
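The packetization step described above can be sketched in a few lines of Python. This is a minimal illustration, not the normative SL syntax: the `SLPacket` class, its field names, the `packetize` function, and the tiny `max_payload` are all assumptions made for the example, and a real SL header is configurable and bit-packed rather than a plain record.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class SLPacket:
    """Toy stand-in for an SL packet (field names are illustrative)."""
    es_id: int                      # which Elementary Stream this belongs to
    sequence_number: int            # lets the receiver detect lost packets
    au_start: bool                  # True on the first packet of an access unit
    composition_ts: Optional[int]   # time stamp, carried only when au_start is True
    payload: bytes


def packetize(
    es_id: int,
    access_units: Iterable[Tuple[int, bytes]],
    max_payload: int = 4,
    seq_modulo: int = 256,
) -> Iterator[SLPacket]:
    """Split (composition_ts, data) access units into SL-like packets."""
    seq = 0
    for cts, data in access_units:
        for offset in range(0, len(data), max_payload):
            first = offset == 0
            yield SLPacket(
                es_id=es_id,
                sequence_number=seq,
                au_start=first,
                composition_ts=cts if first else None,
                payload=data[offset:offset + max_payload],
            )
            seq = (seq + 1) % seq_modulo  # sequence numbers wrap around


# Usage: two access units of one stream, split into packets of at most 3 bytes.
packets = list(packetize(1, [(0, b"abcdefgh"), (3000, b"xy")], max_payload=3))
for p in packets:
    print(p)
```

Carrying the time stamp only on the first packet of each access unit is a design choice of this sketch; it keeps the per-packet overhead low while still giving the receiver one time reference per unit of media.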
2. The FlexMux Layer (Flexible Multiplexing)
Once the data is packetized by the Sync Layer, it can pass to FlexMux, MPEG-4's own (optional) multiplexing tool. FlexMux interleaves multiple SL-packetized streams into one stream. This is especially useful for low-bitrate streams (like text or animation parameters), where per-packet header overhead would otherwise be large relative to the payload.
FlexMux operates in two distinct modes:
* Simple Mode: This mode maps one SL packet directly to one FlexMux packet. It is straightforward, but every small payload pays for its own FlexMux header.
* MuxCode Mode: This mode multiplexes multiple SL packets from different streams into a single FlexMux packet. It uses a look-up table (the MuxCode table) known to both sender and receiver to identify which bytes belong to which stream, so several payloads share one header.
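The difference between the two modes can be made concrete with a small sketch. It assumes a toy packet layout of one index byte and one length byte followed by the payload, and it treats index values from 238 upward as references into the MuxCode table; these details, along with the function names `simple_pack`, `muxcode_pack`, and `unpack`, are assumptions for illustration and are not a bit-exact rendering of ISO/IEC 14496-1.

```python
from typing import Dict, List, Tuple

MUXCODE_BASE = 238  # toy convention: index >= 238 selects a MuxCode table entry

# MuxCode table: entry number -> ordered slots of (channel, byte_count).
# In this sketch each channel appears at most once per entry.
MuxCodeTable = Dict[int, List[Tuple[int, int]]]


def simple_pack(channel: int, sl_packet: bytes) -> bytes:
    """Simple Mode: one SL packet becomes one FlexMux packet."""
    if not 0 <= channel < MUXCODE_BASE:
        raise ValueError("channel out of range for simple mode")
    if len(sl_packet) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([channel, len(sl_packet)]) + sl_packet


def muxcode_pack(muxcode: int, table: MuxCodeTable,
                 sl_by_channel: Dict[int, bytes]) -> bytes:
    """MuxCode Mode: several SL packets share one FlexMux header."""
    if not 0 <= MUXCODE_BASE + muxcode <= 255:
        raise ValueError("muxcode does not fit in the index byte")
    payload = bytearray()
    for channel, size in table[muxcode]:
        chunk = sl_by_channel[channel]
        if len(chunk) != size:
            raise ValueError(f"channel {channel}: expected {size} bytes")
        payload += chunk
    if len(payload) > 255:
        raise ValueError("payload too long for a one-byte length field")
    return bytes([MUXCODE_BASE + muxcode, len(payload)]) + bytes(payload)


def unpack(packet: bytes, table: MuxCodeTable) -> List[Tuple[int, bytes]]:
    """Return (channel, sl_packet) pairs recovered from one FlexMux packet."""
    index, length = packet[0], packet[1]
    payload = packet[2:2 + length]
    if index < MUXCODE_BASE:
        return [(index, payload)]           # simple mode: index is the channel
    out, pos = [], 0
    for channel, size in table[index - MUXCODE_BASE]:
        out.append((channel, payload[pos:pos + size]))
        pos += size
    return out


# Usage: three 4-byte SL packets from channels 1, 2, and 3.
table = {0: [(1, 4), (2, 4), (3, 4)]}
sl = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
simple = b"".join(simple_pack(ch, sl[ch]) for ch in (1, 2, 3))
muxed = muxcode_pack(0, table, sl)
print(len(simple), len(muxed))  # 18 14
```

With three 4-byte payloads, Simple Mode spends 6 of 18 bytes on headers while MuxCode Mode spends 2 of 14, which is the overhead saving the text describes. The cost is that the receiver must already hold the same table to split the payload correctly.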
3. The TransMux Layer (Transport Multiplexing)
The FlexMux stream is then passed to the TransMux layer. MPEG-4 does not define its own transport protocol for this layer. Instead, it defines the Delivery Multimedia Integration Framework (DMIF, specified in ISO/IEC 14496-6), which gives the application a uniform interface to whatever delivery technology sits underneath.
DMIF allows MPEG-4 content to be delivered over existing transport technologies, such as:
* RTP/UDP/IP (for internet streaming)
* MPEG-2 Transport Stream (for broadcast)
* Local storage files (such as the .mp4 file format)
Scene Description (BIFS)
While the FlexMux tool merges the data streams, the receiver still needs to know how to compose the decoded objects into a scene. MPEG-4 solves this by carrying a dedicated scene description stream called BIFS (Binary Format for Scenes) alongside the media streams. BIFS acts as the blueprint, instructing the player where and when to place each media object in the visual and auditory space.
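The "where and when" role of the scene description can be illustrated with a deliberately simplified model. Real BIFS is a binary-coded scene graph and looks nothing like this at the bitstream level; the `SceneNode` class, its fields, and the `active_nodes` helper are hypothetical names used only to show the idea of a player consulting a blueprint before rendering.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SceneNode:
    """Toy placement record for one media object (not real BIFS syntax)."""
    object_id: int   # hypothetical identifier linking to a media object
    x: int           # horizontal position in the scene
    y: int           # vertical position in the scene
    start: float     # time in seconds at which the object appears
    end: float       # time in seconds at which the object disappears


def active_nodes(scene: List[SceneNode], t: float) -> List[SceneNode]:
    """Return the nodes the player should present at time t."""
    return [n for n in scene if n.start <= t < n.end]


# Usage: a background video plus a text overlay shown for a few seconds.
scene = [
    SceneNode(object_id=1, x=0, y=0, start=0.0, end=10.0),
    SceneNode(object_id=2, x=100, y=50, start=5.0, end=8.0),
]
for node in active_nodes(scene, 6.0):
    print(node.object_id, (node.x, node.y))
```

Keeping placement data separate from the media payloads is what allows a player to move, hide, or swap an object without re-encoding the others.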