How the Opus Codec Switches Between SILK and CELT
The Opus audio codec is highly versatile because it combines two distinct compression technologies: SILK, a linear-prediction codec optimized for voice, and CELT, a transform codec optimized for music and general audio. This article explains how Opus manages the frequency boundary between the two when they run together, and the audio artifacts that could arise when it switches between them. It covers the hybrid-mode band split and the time-domain techniques used at mode transitions.
Hybrid Mode and the 8 kHz Crossover
When Opus operates in “Hybrid” mode, it runs both SILK and CELT on the same frame to encode different parts of the audio spectrum. SILK processes the lower frequencies (up to 8 kHz), which contain the bulk of human speech, while CELT processes the higher frequencies (above 8 kHz). Hybrid mode applies only to super-wideband and fullband audio, since narrower bandwidths have nothing above 8 kHz for CELT to code.
Opus does not split the signal with a dedicated two-band filter bank. In the encoder, SILK works on a copy of the input resampled to 16 kHz, which confines it to content below 8 kHz. CELT transforms the full-band signal but does not code the MDCT bands below 8 kHz. In the decoder, the SILK output is resampled to the output rate and added to the CELT output in the time domain. The two layers are delay-aligned so that their outputs line up, and because each layer contributes little energy in the other's band, summing them causes little interaction around the crossover.
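As a concrete view of the 8 kHz split, the sketch below lists which CELT bands are coded in Hybrid mode. It is illustrative only: the band edges are the ones tabulated in the Opus specification (RFC 6716), while the names `CELT_BAND_EDGES_HZ`, `HYBRID_START_BAND`, `END_BAND`, and `hybrid_celt_bands` are invented for this example.

```python
# CELT band edges in Hz (21 bands, 22 edges), as tabulated in RFC 6716.
CELT_BAND_EDGES_HZ = [
    0, 200, 400, 600, 800, 1000, 1200, 1400, 1600, 2000, 2400, 2800,
    3200, 4000, 4800, 5600, 6800, 8000, 9600, 12000, 15600, 20000,
]

# In Hybrid mode CELT skips the first 17 bands; SILK covers that range.
HYBRID_START_BAND = 17

# Number of bands in use for each audio bandwidth that Hybrid mode supports.
END_BAND = {"SWB": 19, "FB": 21}


def hybrid_celt_bands(bandwidth):
    """Return (low_hz, high_hz) for every band CELT codes in Hybrid mode."""
    end = END_BAND[bandwidth]
    return [
        (CELT_BAND_EDGES_HZ[b], CELT_BAND_EDGES_HZ[b + 1])
        for b in range(HYBRID_START_BAND, end)
    ]


if __name__ == "__main__":
    for bw in ("SWB", "FB"):
        print(bw, hybrid_celt_bands(bw))
```

The first band returned always starts at 8000 Hz, which is the frequency where SILK's coverage ends.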
Frame-by-Frame Mode Switching
When Opus switches entirely from one mode to another between frames, such as moving from a SILK-only voice frame to a CELT-only music frame, it faces a different challenge. CELT is a transform codec built on the Modified Discrete Cosine Transform (MDCT), in which each frame's window overlaps its neighbors so that time-domain aliasing cancels when the decoder adds consecutive frames together. CELT does not use the 50% overlap of a classic MDCT codec. It keeps the overlap short and fixed at 2.5 ms (120 samples at 48 kHz) regardless of frame size. SILK, on the other hand, is a time-domain linear predictive codec that does not use overlapping MDCT windows.
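Aliasing cancellation in an MDCT depends on the window being power-complementary: across the overlap, the squared rising edge plus the squared falling edge must equal 1. The sketch below checks that property numerically. It assumes the Vorbis-style window formula documented for CELT, a 48 kHz sample rate, and a 2.5 ms overlap; the function names are hypothetical.

```python
import math

OVERLAP = 120  # 2.5 ms at 48 kHz (assumed sample rate)


def celt_overlap_window(n_overlap=OVERLAP):
    """Rising edge of a Vorbis-style power-complementary window."""
    return [
        math.sin(0.5 * math.pi * math.sin(0.5 * math.pi * (i + 0.5) / n_overlap) ** 2)
        for i in range(n_overlap)
    ]


def max_power_complementary_error(window):
    """Largest deviation of w[i]^2 + w[L-1-i]^2 from 1 across the overlap."""
    n = len(window)
    return max(abs(window[i] ** 2 + window[n - 1 - i] ** 2 - 1.0) for i in range(n))


if __name__ == "__main__":
    w = celt_overlap_window()
    print("overlap samples:", len(w))
    print("max error:", max_power_complementary_error(w))
```

The error stays at floating-point rounding level, so the condition holds with only 120 overlapping samples. The frame needs a neighboring CELT frame for the cancellation to happen at all, which is exactly what is missing at a mode switch.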
If a decoder simply stopped SILK and started CELT, the first CELT frame would have no preceding CELT frame to overlap with. The aliasing in its overlap region would go uncancelled, and the boundary could produce an audible click. The same problem arises in the opposite direction. Opus resolves this using two mechanisms that work together:
- Redundant CELT Frames: The encoder can embed an extra 5 ms CELT frame in the SILK-only or Hybrid frame adjacent to the switch. When switching from SILK to CELT, the redundant frame covers the last 5 ms of the final SILK frame. When switching from CELT to SILK, it covers the first 5 ms of the first SILK frame. Either way, the decoder has CELT audio available on both sides of the boundary.
- Time-Domain Cross-Fading: The decoder cross-fades between the SILK output and the redundant CELT output over 2.5 ms, weighting the two signals with CELT's power-complementary MDCT window. This smooths out any phase or amplitude discrepancies between the LPC-synthesized speech and the MDCT-synthesized audio. When a transition carries no redundant frame, the specification recommends that the decoder use a concealment technique to bridge the discontinuity.
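The cross-fade described in the second bullet can be sketched as a weighted sum of the outgoing and incoming signals, with weights that always add up to 1. This is a minimal illustration and not the reference decoder's code. It assumes mono floating-point samples, a 2.5 ms fade at 48 kHz, and weights taken from the square of a Vorbis-style window; `fade_weights` and `cross_fade` are hypothetical names.

```python
import math


def fade_weights(n):
    """Squared power-complementary window: rises smoothly from near 0 to near 1."""
    return [
        math.sin(0.5 * math.pi * math.sin(0.5 * math.pi * (i + 0.5) / n) ** 2) ** 2
        for i in range(n)
    ]


def cross_fade(outgoing, incoming):
    """Blend two equal-length sample lists, fading from outgoing to incoming."""
    if len(outgoing) != len(incoming):
        raise ValueError("both signals must cover the same samples")
    weights = fade_weights(len(outgoing))
    return [
        (1.0 - w) * old + w * new
        for w, old, new in zip(weights, outgoing, incoming)
    ]


if __name__ == "__main__":
    n = 120  # 2.5 ms at 48 kHz (assumed)
    silk_tail = [1.0] * n  # stand-in for the end of the SILK output
    celt_head = [0.5] * n  # stand-in for the overlapping CELT output
    faded = cross_fade(silk_tail, celt_head)
    print(faded[0], faded[n // 2], faded[-1])
```

Because the two weights sum to 1 at every sample, a signal that is identical on both sides passes through unchanged. Any mismatch between the two decoders is spread across the fade, so it does not land on a single sample boundary.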
By dividing the spectrum cleanly between the two codecs in hybrid operation and handling frame-level transitions with short overlaps and cross-fades, Opus keeps the audio stream continuous and largely free of audible artifacts, even when it switches coding technologies often.