MPEG-4 Structured Audio and Synthetic Sound

MPEG-4 Structured Audio (MP4-SA) is a standard, defined as part of MPEG-4 Audio (ISO/IEC 14496-3), for transmitting and rendering audio as mathematical descriptions and algorithmic instructions rather than pre-recorded waveforms. This article explains how the technology uses synthetic sound generation and MIDI-like playback to deliver high-quality audio at very low bitrates. We will examine its core components, including the Structured Audio Orchestra Language (SAOL) and the Structured Audio Score Language (SASL), and detail how they combine to create a standardized, interactive audio synthesis engine.

Understanding the Concept of Structured Audio

Traditional audio formats, like MP3 or AAC, compress recorded sound waves. MPEG-4 Structured Audio takes a completely different approach by transmitting the “recipe” for the sound rather than the sound itself. It treats audio as a combination of musical instruments and a musical score.

By sending the digital signal processing (DSP) instructions to recreate the instruments along with the control data to play them, MPEG-4 Structured Audio can reproduce complex soundscapes, sound effects, and music at a fraction of the bandwidth required by traditional audio formats.
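The bandwidth argument above can be made concrete with some rough arithmetic. The following Python sketch compares the bitrate of uncompressed CD-style PCM with that of a hypothetical stream of score events. The event rate (20 events per second) and event size (16 bytes) are assumptions chosen purely for illustration; they are not figures from the standard, and the one-time cost of transmitting the instrument definitions is ignored. The comparison is against raw PCM, not against MP3 or AAC.

```python
# Illustrative arithmetic only: the event rate and event size are assumptions.
SAMPLE_RATE_HZ = 44_100   # CD-style sampling rate
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second needed to send raw PCM samples."""
    return sample_rate_hz * bits_per_sample * channels


def score_bitrate(events_per_second, bytes_per_event):
    """Bits per second needed to send a stream of score events."""
    return events_per_second * bytes_per_event * 8


pcm_bps = pcm_bitrate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
score_bps = score_bitrate(events_per_second=20, bytes_per_event=16)
ratio = pcm_bps / score_bps

print(f"PCM:   {pcm_bps} bit/s")
print(f"Score: {score_bps} bit/s")
print(f"Ratio: {ratio:.2f}x")
```

Under these assumed numbers the score stream is several hundred times smaller than the raw waveform. The real saving depends on how dense the score is and how large the instrument definitions are.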

Synthetic Sound Generation via SAOL

The foundation of synthetic sound generation in MPEG-4 Structured Audio is the Structured Audio Orchestra Language (SAOL). SAOL is a full-featured, standardized DSP programming language used to define virtual instruments, effects, and sound generators. Because the instrument definitions travel with the content, the decoder does not depend on a fixed, built-in set of sounds.
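To illustrate what "defining an instrument" means, here is a minimal Python sketch of an instrument as a function that computes samples from parameters. It is not SAOL syntax and does not reflect how a conforming decoder is implemented. The function name decaying_sine, the 8 kHz sample rate, and the linear decay envelope are all assumptions made for this example.

```python
import math

SAMPLE_RATE = 8000  # assumed rate, kept low to keep the example small


def decaying_sine(freq_hz, dur_s, amp=0.5, sample_rate=SAMPLE_RATE):
    """Hypothetical instrument: a sine tone with a linear decay envelope."""
    n = int(dur_s * sample_rate)
    samples = []
    for i in range(n):
        t = i / sample_rate
        envelope = 1.0 - i / n  # falls linearly from 1.0 toward 0.0
        samples.append(amp * envelope * math.sin(2 * math.pi * freq_hz * t))
    return samples


# One "note": 440 Hz for a quarter of a second.
note = decaying_sine(440.0, 0.25)
print(len(note), "samples generated")
```

The point of the sketch is that the sound exists only as an algorithm plus a few parameters until the receiver runs it.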

MIDI-like Playback via SASL and MIDI

While SAOL defines the instruments (the “orchestra”), the playback instructions (the “score”) are supplied separately, either in the Structured Audio Score Language (SASL) or as standard MIDI streams. The score specifies which instrument plays, when, and with which parameters.
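The orchestra and score split described above can be sketched as a small Python renderer. This is a loose analogy only: the tuple layout of the events, the names ORCHESTRA, SCORE, sine_instr, and render, and the 8 kHz sample rate are assumptions for illustration, not SASL syntax or the behavior of an MPEG-4 decoder.

```python
import math

SAMPLE_RATE = 8000  # assumed rate for the example


def sine_instr(freq_hz, dur_s, amp):
    """Hypothetical instrument: a plain sine tone."""
    n = int(dur_s * SAMPLE_RATE)
    return [amp * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]


# The "orchestra": instrument names mapped to synthesis functions.
ORCHESTRA = {"tone": sine_instr}

# The "score": (start_s, instrument, duration_s, freq_hz, amplitude).
SCORE = [
    (0.0, "tone", 0.5, 440.0, 0.3),
    (0.25, "tone", 0.5, 660.0, 0.3),
]


def render(score, orchestra):
    """Mix every score event into a single output buffer."""
    total_s = max(event[0] + event[2] for event in score)
    out = [0.0] * int(round(total_s * SAMPLE_RATE))
    for start, name, dur, freq, amp in score:
        offset = int(round(start * SAMPLE_RATE))
        for i, sample in enumerate(orchestra[name](freq, dur, amp)):
            if offset + i < len(out):
                out[offset + i] += sample  # overlapping notes are summed
    return out


audio = render(SCORE, ORCHESTRA)
print(len(audio), "samples rendered")
```

Changing the score alone changes the performance, and swapping an entry in the orchestra changes the timbre, without touching the other half.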

Key Advantages of MPEG-4 Structured Audio

By merging synthetic sound generation with structured scores, MPEG-4 Structured Audio offers several distinct advantages over traditional waveform audio: