How the Opus Codec Switches Between SILK and CELT

The Opus audio codec is highly versatile due to its ability to seamlessly transition between two distinct compression technologies: SILK, which is optimized for voice, and CELT, which is optimized for music and general audio. This article explains how Opus manages the overlapping frequencies and potential audio artifacts that occur when switching between these two modes, specifically detailing its use of hybrid frequency splitting and time-domain transition windows.

The Hybrid Mode and the 8 kHz Band Split

When Opus operates in “Hybrid” mode, it runs both SILK and CELT simultaneously to encode different parts of the audio spectrum. SILK processes the lower frequencies (up to 8 kHz), which contain the bulk of human speech, while CELT processes the higher frequencies (above 8 kHz).

The two layers meet at the 8 kHz crossover. Opus does not run a separate analysis/synthesis filter bank at this boundary; the complementary split falls out of how the two layers are configured. SILK encodes a copy of the input resampled to 16 kHz, so the resampler acts as the low-pass side of the split. CELT runs on the full-rate signal but does not code its MDCT bands below 8 kHz, which acts as the high-pass side. The decoder resamples the SILK output to the output rate and adds it to the CELT output. Because each layer contributes very little energy on the other side of the crossover, the two bands combine in the sum without audible phase cancellation.
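The idea of two bands that add back up to the original signal can be illustrated with a toy complementary pair. The sketch below is not the filtering Opus performs: the 5-tap low-pass, the `DELAY` constant, and the helper names `fir`, `split_bands`, and `merge_bands` are all assumptions made for illustration. The high band is defined as the delayed input minus the low band, so the sum of the two bands is a delayed copy of the input by construction.

```python
# Toy complementary two-band split (illustrative only, not Opus's filters).

# Hypothetical symmetric low-pass; taps sum to 1, group delay is 2 samples.
LOW_TAPS = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]
DELAY = (len(LOW_TAPS) - 1) // 2


def fir(signal, taps):
    """Causal FIR filter; samples before the start are treated as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, tap in enumerate(taps):
            if n - k >= 0:
                acc += tap * signal[n - k]
        out.append(acc)
    return out


def split_bands(signal):
    """Return (low, high) where high = delayed input - low."""
    low = fir(signal, LOW_TAPS)
    delayed = [0.0] * DELAY + list(signal[: len(signal) - DELAY])
    high = [d - l for d, l in zip(delayed, low)]
    return low, high


def merge_bands(low, high):
    """Summing the two bands yields the input delayed by DELAY samples."""
    return [l + h for l, h in zip(low, high)]
```

With this construction a constant (DC) input ends up entirely in the low band and an alternating +1/-1 input ends up entirely in the high band once the filter has warmed up, while any input is recovered exactly (apart from the 2-sample delay) when the bands are summed.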

Frame-by-Frame Mode Switching

When Opus switches entirely from one mode to another between frames, such as moving from a SILK-only voice frame to a CELT-only music frame, it faces a different challenge. CELT is a transform codec built on the Modified Discrete Cosine Transform (MDCT), which relies on overlapping windows between consecutive frames to cancel time-domain aliasing. A textbook MDCT overlaps frames by 50%; CELT uses a short, fixed overlap of 2.5 ms, but the dependency on the neighboring frame is the same. SILK, on the other hand, is a time-domain linear predictive codec that does not use overlapping MDCT windows.
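The dependence of an MDCT frame on its neighbor can be seen in a small numerical sketch. It uses a textbook MDCT with a sine window and full 50% overlap, chosen for simplicity; it is not CELT's actual window or transform code, and the function names, the frame size `N`, and the test signal are assumptions made for illustration. Two overlapping frames are transformed and inverted; the overlapped region is then compared against the original both with and without the earlier frame.

```python
import math


def mdct(block):
    """Textbook MDCT: 2N (already windowed) samples -> N coefficients."""
    n_half = len(block) // 2
    return [
        sum(
            block[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(2 * n_half)
        )
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N aliased samples (2/N scaling)."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half)
        * sum(
            coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)
        )
        for n in range(2 * n_half)
    ]


def sine_window(length):
    """Sine window; satisfies w[n]^2 + w[n + length/2]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def windowed_roundtrip(block, window):
    """Analysis window -> MDCT -> IMDCT -> synthesis window."""
    windowed = [w * x for w, x in zip(window, block)]
    aliased = imdct(mdct(windowed))
    return [w * y for w, y in zip(window, aliased)]


N = 8  # hop size; each frame spans 2N samples
signal = [math.sin(0.3 * n) + 0.5 * math.cos(0.11 * n) for n in range(3 * N)]
window = sine_window(2 * N)

frame0 = windowed_roundtrip(signal[0 : 2 * N], window)
frame1 = windowed_roundtrip(signal[N : 3 * N], window)
target = signal[N : 2 * N]

# With the previous frame: overlap-add cancels the aliasing.
overlap_added = [frame0[N + n] + frame1[n] for n in range(N)]
tdac_error = max(abs(a - b) for a, b in zip(overlap_added, target))

# Without the previous frame: the aliasing and the window fade remain.
lone_error = max(abs(a - b) for a, b in zip(frame1[:N], target))
```

Here `tdac_error` is at the level of floating-point rounding, while `lone_error` is a large fraction of the signal amplitude. That gap is what a decoder faces on the first MDCT frame after a non-MDCT frame.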

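If a decoder simply stopped SILK and started CELT, the first CELT frame would have no previous frame to overlap with, so its time-domain aliasing would go uncancelled and the output would contain audible clicks. Opus resolves this transition using two primary mechanisms: a short (5 ms) redundant CELT frame that the encoder can include in the packet at the switch, which gives the decoder the overlap data it would otherwise lack, and a brief windowed cross-fade in the time domain between the outgoing and incoming signals.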
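A time-domain cross-fade of the kind used at such a boundary can be sketched as follows. This is a minimal illustration, not the Opus decoder's code: the function name `crossfade`, the sin² gain curve, and the idea of passing in two equal-length sample lists are assumptions. The fade-in and fade-out gains sum to 1 at every sample, so a signal that is identical in both decoders passes through unchanged.

```python
import math


def crossfade(outgoing, incoming):
    """Blend the tail of the old decoder's output into the new decoder's.

    Uses a sin^2 fade-in gain; the fade-out gain is its complement (cos^2),
    so the two gains always sum to 1.
    """
    if len(outgoing) != len(incoming):
        raise ValueError("both segments must have the same length")
    length = len(outgoing)
    mixed = []
    for i in range(length):
        gain_in = math.sin(0.5 * math.pi * (i + 0.5) / length) ** 2
        mixed.append((1.0 - gain_in) * outgoing[i] + gain_in * incoming[i])
    return mixed
```

For example, `crossfade(old_tail, new_head)` would be applied to the few milliseconds where both decoders have produced samples, with the result replacing that stretch of the output. A smooth gain curve is used rather than a hard cut so that any mismatch between the two signals is spread over the fade instead of appearing as a step.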

By splitting the spectrum at 8 kHz between the two layers in hybrid operation, and by smoothing frame-level mode changes with transition windows and cross-fades, Opus keeps the audio stream continuous and largely free of switching artifacts, even when it changes encoding technology often.