Realtek Audio Solution

Overview

Audio Solution is a comprehensive technical guidance framework developed by Realtek for the audio domain. It covers three main areas: audio signal processing components, system-level solutions, and practical application implementations. Its core functional modules are Audio Route, Audio Effect, Audio Stream, Notification, and Voice Activity Detection (VAD). The solution is used in products such as TWS earphones, smart helmets, voice recorders, soundbars, speakers, and smart glasses.

Audio Signal Processing Components

Codec (Encoder-Decoder)

Realtek's Codec solutions are built around two core product lines: the ALC series (for PC/consumer electronics) and the RTL series (integrated within Bluetooth Audio SoCs). Together, these lines address audio demands ranging from entry-level to flagship applications. Their five core strengths are high integration, low power consumption, exceptional audio fidelity, rich algorithm support, and broad compatibility. The products are widely adopted in PC motherboards and smart audio devices (such as TWS earbuds, smart glasses, and soundbars) worldwide, and Realtek maintains a long-standing leading position in market share within this sector.

DSP (Digital Signal Processor)

In Bluetooth audio devices such as headphones, speakers, and TWS earbuds, sound quality and feature performance are primarily determined by the built-in Digital Signal Processor (DSP). The DSP acts as the "brain" of the audio system and implements core functions including Active Noise Cancellation (ANC), dynamic equalizer (EQ), Dynamic Range Control (DRC), audio enhancement, voice call enhancement, spatial audio, and Bluetooth audio codec support. By processing, optimizing, and enhancing digital audio signals in real time, it compensates for the inherent limitations of Bluetooth transmission and of the hardware to achieve professional-grade audio performance.

Realtek's DSP solutions are characterized by core advantages such as high integration, low power consumption, dual-HiFi architecture, intelligent algorithms, and full-scenario compatibility. They are widely integrated into audio Codecs, Bluetooth Audio SoCs, and main control chips for smart devices, covering a diverse range of audio products including PCs, smart glasses, TWS earbuds, and soundbars. With its unique strengths in hardware-software co-optimization and scenario-specific algorithm libraries, this solution has become a mainstream choice in the consumer electronics audio processing domain.

The Realtek RTL8763D series chip is a highly integrated audio platform designed for wired and wireless audio applications featuring Bluetooth Enhanced Data Rate (EDR) and Bluetooth Low Energy (BLE) connectivity. The hardware architecture of this platform includes the following functional modules:

Main Control Core: Cortex-M4F Microcontroller (MCU)
Enhanced Dual Tensilica HiFi DSP Cores: Integrates both a HiFi Mini and a HiFi 4 DSP core, clocked at up to 320 MHz. The HiFi Mini core has 400 KB of dedicated RAM, and the HiFi 4 core has 384 KB of dedicated RAM.
Digital Audio Interfaces Accessible to the DSP: Supports 4‑channel I2S/TDM interfaces, compatible with both internal and external audio codecs.
Integrated Audio Codec: Includes a 2‑channel DAC, a 4‑channel ADC, and 6‑channel Pulse Density Modulation (PDM) interfaces (for connecting digital microphones).
MCU‑Controlled Multiplexer (Mux): Enables selection between analog microphones and digital microphones.

Audio Solution

This chapter introduces the audio solution provided by Realtek. This solution is a dedicated software framework for the audio domain, centered around standardized audio driver abstraction interfaces, virtualized audio stream routing mechanisms, and a set of high-level modular functional components. These components include audio routing, audio effects processing, audio stream control, notification prompts, voice activity detection, and more.

The diagram "Audio Subsystem Architecture" clearly illustrates the component composition of this solution. The entire architecture is divided into two layers by a solid black line: the bottom layer is the Audio Hardware Abstraction Layer, responsible for interacting with specific hardware; the top layer is the Audio Framework, which is built upon the Hardware Abstraction Layer and designed as a platform-independent software component. The Audio Framework can be further subdivided into audio paths, the audio core, and various high-level functional modules.

Audio Route

Audio Route refers to the static configuration of physical data paths, with its core function being to configure the Gateway and logical IO parameters for a specific audio routing path. This module is organized by Audio Category and includes Gateway Configuration, Endpoint Configuration, Logical IO Configuration, and Physical IO Configuration. The diagram below illustrates a simplified Audio Route path between the Codec and the Digital Signal Processor (DSP).
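As an illustration of what a static route configuration organized by Audio Category might look like, here is a minimal conceptual sketch in Python. It is not the Realtek SDK API: the RouteEntry type, the lookup_route function, the category names, and every gateway and IO value are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RouteEntry:
    """One static path entry. All field values used below are illustrative."""
    gateway: str      # gateway between Codec and DSP, e.g. an I2S link
    endpoint: str     # endpoint at which the path terminates
    logical_io: str   # logical IO name used by the framework
    physical_io: str  # physical peripheral backing the logical IO


# Static table keyed by Audio Category (category names are hypothetical).
ROUTE_TABLE: Dict[str, List[RouteEntry]] = {
    "playback": [RouteEntry("i2s0", "speaker", "playback_primary_out", "dac_ch0")],
    "record": [RouteEntry("i2s0", "mic", "record_primary_in", "adc_ch0")],
}


def lookup_route(category: str) -> List[RouteEntry]:
    """Return the statically configured path entries for a category."""
    try:
        return ROUTE_TABLE[category]
    except KeyError:
        raise ValueError(f"no route configured for category {category!r}") from None
```

Because the table is static, a lookup for a category that was never configured fails immediately instead of building a path at runtime.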

Audio Category

Audio Category is used to classify all streams transmitted between the Host and the Controller. Audio streams that share the same control methods, usage scenarios, and functionalities are grouped under the same category. The primary categories include:

Audio Stream

The Audio Stream Component provides the application layer with a suite of abstract, efficient, and flexible functions for controlling and processing audio data. It is primarily divided into the following three categories:

Audio Track: Handles playback, voice communication, and recording streams.
Audio Line: Controls various types of loopback streams.
Audio Pipe: Manages codec conversion for different streams.

By configuring the underlying hardware stream routing paths via Audio Route, developers can leverage the APIs offered by these high-level audio stream models to efficiently handle complex audio scenarios.

Audio Track

Audio Track provides a dedicated high-level API for handling playback streams, voice communication streams, and recording streams. Specifically:

Playback Stream refers to music or multimedia audio.
Voice Communication Stream includes all forms of bidirectional voice transmission, for example over VoIP (Voice over IP) or cellular network calls.
Recording Stream is used for speech recognition or data capture.
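The three stream types differ mainly in the direction in which data flows. The following conceptual Python model makes that difference explicit. The StreamType and AudioTrack names and their methods are assumptions made for this sketch and do not reproduce the actual Audio Track API.

```python
from enum import Enum


class StreamType(Enum):
    PLAYBACK = "playback"  # music or multimedia audio
    VOICE = "voice"        # bidirectional voice communication
    RECORD = "record"      # capture for speech recognition or data capture


class AudioTrack:
    """Toy track that enforces the data direction implied by its stream type."""

    def __init__(self, stream_type):
        self.stream_type = stream_type
        self.started = False
        self.to_output = []   # frames the application wrote toward the output
        self.from_input = []  # captured frames waiting for the application

    def start(self):
        self.started = True

    def stop(self):
        self.started = False

    def write(self, frame):
        """Application to output; not valid for a recording-only track."""
        if self.stream_type is StreamType.RECORD:
            raise ValueError("recording track has no output direction")
        if not self.started:
            raise RuntimeError("track is not started")
        self.to_output.append(frame)
        return len(frame)

    def read(self):
        """Input to application; not valid for a playback-only track."""
        if self.stream_type is StreamType.PLAYBACK:
            raise ValueError("playback track has no input direction")
        if not self.started:
            raise RuntimeError("track is not started")
        return self.from_input.pop(0) if self.from_input else None
```

In this model only a voice track accepts both write and read, which mirrors the bidirectional nature of a voice communication stream.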

The diagram "Audio Track Overview" illustrates the overall architecture of the audio track across different modules:

Audio Line

Audio Line operates across different modules in a manner similar to Audio Track. Each Audio Line instance acquires a dedicated stream from a local input peripheral and transmits it to a local output peripheral. Local input peripherals may include a built-in microphone, an external microphone, an auxiliary input (AUX-In), or a digital audio input (SPDIF-In). Local output peripherals may include a built-in speaker, an external speaker, an auxiliary output (AUX-Out), or a digital audio output (SPDIF-Out). Audio Line supports flexible combinations of input and output peripherals.
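The flexible pairing of input and output peripherals can be pictured with a small conceptual model. The Python sketch below is not the SDK interface: LineInput, LineOutput, AudioLine, and all_combinations are hypothetical names, and the loopback method passes frames through unchanged as a stand-in for the real data path.

```python
from enum import Enum
from itertools import product


class LineInput(Enum):
    BUILTIN_MIC = "built-in microphone"
    EXTERNAL_MIC = "external microphone"
    AUX_IN = "AUX-In"
    SPDIF_IN = "SPDIF-In"


class LineOutput(Enum):
    BUILTIN_SPEAKER = "built-in speaker"
    EXTERNAL_SPEAKER = "external speaker"
    AUX_OUT = "AUX-Out"
    SPDIF_OUT = "SPDIF-Out"


class AudioLine:
    """Toy loopback stream from one local input to one local output."""

    def __init__(self, source, sink):
        self.source = source
        self.sink = sink
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def loopback(self, frame):
        """Return the frame as delivered to the sink, or None while stopped."""
        return frame if self.running else None


def all_combinations():
    """Every input/output pairing listed in the text (4 inputs x 4 outputs)."""
    return [AudioLine(i, o) for i, o in product(LineInput, LineOutput)]
```

With the four inputs and four outputs named in the text, this model yields 16 possible pairings. Whether a given product supports all of them depends on its hardware.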

Audio Pipe

The diagram below briefly illustrates how Audio Pipe operates across different modules. The application layer feeds a data stream that requires codec format conversion into a Source Endpoint. The Digital Signal Processor (DSP) processes and converts the stream, and the result is returned to the application layer through a Sink Endpoint. In this way, the model converts an input data stream from one codec format into the desired output format. Audio Pipe supports conversion between different codec types as well as conversion of specific codec attributes within the same type, all configurable as needed by the application. Furthermore, Audio Pipe supports cascaded processing: the output stream from the Sink Endpoint of one Audio Pipe can directly serve as the input stream for the Source Endpoint of the next pipe.

Audio Pipe supports the following codec types: PCM, CVSD, mSBC, SBC, AAC, OPUS, FLAC, MP3, LC3, LDAC, LHDC, G729, LC3plus.
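Cascading can be sketched with a conceptual model in which each pipe checks its source format and tags its output with the sink format. This Python sketch is only an illustration: Frame, AudioPipe, and run_cascade are hypothetical names, and the conversion step is a placeholder that relabels the frame without performing any real transcoding.

```python
from dataclasses import dataclass

# Codec types taken from the list above.
SUPPORTED_CODECS = frozenset({
    "PCM", "CVSD", "mSBC", "SBC", "AAC", "OPUS", "FLAC",
    "MP3", "LC3", "LDAC", "LHDC", "G729", "LC3plus",
})


@dataclass(frozen=True)
class Frame:
    codec: str
    payload: bytes


class AudioPipe:
    """Toy pipe: Source Endpoint format in, Sink Endpoint format out."""

    def __init__(self, source_codec, sink_codec):
        for codec in (source_codec, sink_codec):
            if codec not in SUPPORTED_CODECS:
                raise ValueError(f"unsupported codec type: {codec}")
        self.source_codec = source_codec
        self.sink_codec = sink_codec

    def process(self, frame):
        if frame.codec != self.source_codec:
            raise ValueError(
                f"pipe expects {self.source_codec}, got {frame.codec}")
        # Placeholder for the DSP conversion: only the format tag changes here.
        return Frame(self.sink_codec, frame.payload)


def run_cascade(pipes, frame):
    """Feed the sink output of each pipe into the source of the next one."""
    for pipe in pipes:
        frame = pipe.process(frame)
    return frame
```

A cascade is valid only when the sink format of each pipe matches the source format of the pipe after it. In this model a mismatch raises an error at the first incompatible stage.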

Notification

Notification tones are short, urgent audio messages directed to the user. The Audio Subsystem currently supports the following three types of notification tones:

Ringtone: Generated by an FM (Frequency Modulation) synthesizer.
Voice Prompt: Pre-recorded voice interaction data.
Text-to-Speech (TTS): Speech generated by a speech synthesizer.

Both Ringtone and Voice Prompt support three modes: audible mode, mute mode, and volume-fixed mode.

Audible Mode: This is the default mode, allowing playback, stopping, volume adjustment, and muting/unmuting.
Mute Mode: In this mode, audio playback is disabled, and volume adjustment is unavailable.
Volume-Fixed Mode: In this mode, the volume is locked and cannot be adjusted.
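The three modes can be read as a small permission table over the play and volume operations. The following Python model illustrates that reading. NotificationMode and Notifier are hypothetical names, the default volume level is arbitrary, and the actual notification API may behave differently in detail.

```python
from enum import Enum


class NotificationMode(Enum):
    AUDIBLE = "audible"            # default: play, stop, volume, mute/unmute
    MUTE = "mute"                  # playback disabled, volume not adjustable
    VOLUME_FIXED = "volume-fixed"  # playback allowed, volume locked


class Notifier:
    """Toy model of a Ringtone or Voice Prompt player."""

    def __init__(self, volume=8, mode=NotificationMode.AUDIBLE):
        self.volume = volume
        self.mode = mode
        self.played = []  # names of notifications that were actually played

    def play(self, name):
        """Return True if the notification was played."""
        if self.mode is NotificationMode.MUTE:
            return False
        self.played.append(name)
        return True

    def set_volume(self, level):
        """Return True if the volume changed; only audible mode allows it."""
        if self.mode is not NotificationMode.AUDIBLE:
            return False
        self.volume = level
        return True
```

In this model a rejected request returns False and leaves the state unchanged. Whether the real interface reports such a request as an error is not specified here.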

VAD

VAD (Voice Activity Detection) is a core algorithm in audio signal processing designed to automatically distinguish between speech signals and non-speech signals (such as silence and ambient noise). This algorithm is widely used in Bluetooth audio devices (e.g., earphones, speakers), call systems, and voice assistant applications. Within a Bluetooth audio system, VAD operates as a lightweight algorithm module on the DSP. It is typically enabled only during idle mode and A2DP audio playback mode. In voice/HFP call mode and Line-in input mode, VAD is usually not required, as the former is already focused on call voice and the latter handles external audio input. The module analyzes the audio stream captured by the microphone in real time and outputs a detection signal to the MCU that indicates whether speech is present.
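The mode-dependent enabling described above amounts to a simple policy check. A minimal sketch in Python follows. SystemMode and vad_should_run are hypothetical names introduced for illustration, and the word "typically" in the text means a real product may apply a different policy.

```python
from enum import Enum


class SystemMode(Enum):
    IDLE = "idle"
    A2DP_PLAYBACK = "a2dp_playback"
    HFP_CALL = "hfp_call"
    LINE_IN = "line_in"


# Modes in which VAD is typically enabled, according to the text above.
VAD_ENABLED_MODES = frozenset({SystemMode.IDLE, SystemMode.A2DP_PLAYBACK})


def vad_should_run(mode):
    """Return True if VAD would typically be enabled in the given mode."""
    return mode in VAD_ENABLED_MODES
```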

VAD is not an independent module but is deeply integrated into the Bluetooth audio subsystem's chain of "microphone capture → ADC sampling → DSP processing → MCU control." Its architecture is closely tied to the hardware design of Bluetooth Audio SoCs (such as Realtek's RTL8763). The overall architecture is illustrated in the diagram below. VAD can be categorized into two types: Software VAD and Hardware VAD, which differ in their implementation approaches.
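To make the speech/non-speech decision concrete, the sketch below uses a plain frame-energy threshold with a short hangover. This is a generic textbook technique chosen for illustration, not the algorithm used by Realtek's Software or Hardware VAD. The function names, the threshold, and the hangover length are all assumptions.

```python
import math


def frame_rms(samples):
    """Root-mean-square level of one frame of integer PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_speech(frames, threshold=500.0, hangover=2):
    """Return one presence flag per frame.

    A frame counts as speech when its RMS level reaches the threshold.
    After a speech frame, the flag stays set for `hangover` further
    frames so that short pauses between words are not cut off.
    """
    flags = []
    remaining = 0
    for frame in frames:
        if frame_rms(frame) >= threshold:
            remaining = hangover
            flags.append(True)
        elif remaining > 0:
            remaining -= 1
            flags.append(True)
        else:
            flags.append(False)
    return flags
```

In the processing chain described above, each flag would correspond to the detection signal that the DSP reports to the MCU.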

Audio Effect

The diagram below illustrates the binding relationship between Audio Effect and Audio Track: The application can enable, disable, or clear specified effects via the API of the Audio Effect submodule, while simultaneously starting, stopping, or restarting the corresponding data stream via the API of the Stream submodules. To apply a specific audio effect to a data stream, the application must actively invoke the Stream submodule API to establish the binding relationship between them. The underlying Audio Path module will then pass the bound effect information to the Digital Signal Processor (DSP) for execution at the appropriate time.

The Audio Subsystem supports the dynamic binding of effects to data streams. Any audio effect can be associated with any type of data stream, and binding/unbinding can be managed flexibly at runtime, thereby providing the application with greater freedom in audio control. This design, based on the abstract Audio Effect model, decouples the data stream from specific effects, facilitating independent expansion of the subsystem along both the data stream and effect dimensions.
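The decoupling between effects and data streams can be sketched as two independent object types connected only by an explicit binding step. The Python model below is conceptual. Effect, Stream, attach, detach, and active_effects are hypothetical names and do not reproduce the Audio Effect or Stream submodule APIs.

```python
class Effect:
    """Abstract effect: it knows nothing about the stream it is bound to."""

    def __init__(self, name):
        self.name = name
        self.enabled = False

    def enable(self):
        self.enabled = True

    def disable(self):
        self.enabled = False


class Stream:
    """Abstract data stream holding references to its bound effects."""

    def __init__(self, kind):
        self.kind = kind
        self.effects = []

    def attach(self, effect):
        """Bind an effect at runtime; binding the same effect twice is a no-op."""
        if effect not in self.effects:
            self.effects.append(effect)

    def detach(self, effect):
        """Unbind an effect; unbinding an effect that is not bound is a no-op."""
        if effect in self.effects:
            self.effects.remove(effect)

    def active_effects(self):
        """Names of bound effects that are currently enabled."""
        return [e.name for e in self.effects if e.enabled]
```

Because the stream stores only references, a new effect type or a new stream type can be added without changing the other side, which is the independent expansion the paragraph describes.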

Built-In Effect

Built-in effects include Equalizer (EQ), Noise Reduction Enhancement (NREC), Wide Dynamic Range Compression (WDRC), Sidetone, and Beamforming. The interaction flow for these effects is largely consistent and can be controlled via a comprehensive set of lifecycle APIs. The application first calls the relevant functions to create and enable an audio effect, then associates it with the target playback data stream via an interface. During data stream playback, the application can dynamically update the effect parameters, temporarily disable the effect by calling the corresponding function at any time, or ultimately release the effect resources.
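The lifecycle described above (create, enable, update parameters, disable, release) can be modeled as a small state machine. The Python sketch below is illustrative only: BuiltinEffect, its state names, and the gain_db parameter are hypothetical, and the real lifecycle APIs are specific to each effect.

```python
class BuiltinEffect:
    """Toy lifecycle model: created -> enabled <-> disabled -> released."""

    def __init__(self, kind, params):
        self.kind = kind
        self.params = dict(params)
        self.state = "created"

    def _require_alive(self):
        if self.state == "released":
            raise RuntimeError(f"{self.kind} effect has been released")

    def enable(self):
        self._require_alive()
        self.state = "enabled"

    def disable(self):
        """Temporarily switch the effect off while keeping its parameters."""
        self._require_alive()
        self.state = "disabled"

    def update(self, **params):
        """Change parameters while the stream keeps playing."""
        self._require_alive()
        self.params.update(params)

    def release(self):
        """Free the effect; any later call on it is an error."""
        self.params.clear()
        self.state = "released"
```

Disabling keeps the parameters so the effect can be re-enabled later with the same settings, whereas releasing ends the lifecycle.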

Vendor Specific Effect

The interaction flow between Vendor Specific Effects (VSE) and the application differs from that of built-in effects. When integrating Vendor Specific Effects, the Audio Subsystem primarily serves as a transport layer that transparently relays information between the application and the vendor's custom algorithm library. The format of this information is defined by the algorithm vendor.
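The transport-layer role can be illustrated with a channel that forwards opaque bytes in both directions without interpreting them. In this Python sketch, VendorEffectChannel and the example vendor handler are hypothetical. The payload layout used in the usage note is invented for the example and belongs to the imaginary vendor library, not to the Audio Subsystem.

```python
from typing import Callable


class VendorEffectChannel:
    """Toy transport: relays opaque payloads and never parses them."""

    def __init__(self, vendor_handler: Callable[[bytes], bytes]):
        self._vendor_handler = vendor_handler
        self.relayed = 0  # number of messages passed through

    def send(self, payload: bytes) -> bytes:
        """Forward a request to the vendor library and return its reply as is."""
        self.relayed += 1
        return self._vendor_handler(bytes(payload))


def example_vendor_handler(payload: bytes) -> bytes:
    """Stand-in for a vendor algorithm library with a made-up format:
    byte 0 is a command id, and the reply echoes it followed by 0x00 (ok)."""
    return bytes([payload[0], 0x00]) if payload else b""
```

For example, VendorEffectChannel(example_vendor_handler).send(b"\x10\x7f") returns the two bytes 0x10 0x00 produced by the handler. The channel itself contains no knowledge of what either message means.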

Application Products

Realtek's audio solution is built on System-on-Chip (SoC) designs optimized for audio scenarios, combined with advanced audio processing technologies and high-performance Digital Signal Processors (DSPs). This enables smooth handling of high-quality audio while supporting features such as Active Noise Cancellation (ANC) and echo cancellation, delivering an immersive auditory experience for users. Furthermore, this series of SoCs is compatible with multiple audio format decoding and mainstream audio streaming protocols, helping to expand product applicability and market coverage.

Realtek's Bluetooth SoCs have been widely adopted across various audio devices. Representative products include smart voice recorders, smart helmets, soundbars, smart speakers, Bluetooth hearing aids, smart glasses, and smart charging cases. Their performance is demonstrated in the following areas:

High-Fidelity Recording & Playback – Provides clear and accurate sound capture and reproduction for smart voice recorders and hearing aids.
Immersive Audio Experience – Delivers powerful and refined sound quality in smart helmets, soundbars, and smart speakers.
Low Power Consumption & Extended Battery Life – Supports all-day battery life for smart glasses and smart charging cases, enhancing portability and practicality.
Cutting-Edge Connectivity Technology – Supports Auracast™ broadcast audio technology, enabling multi-device audio sharing and synchronization, and fostering a new wireless audio ecosystem.

Leveraging these strengths, Realtek's Bluetooth SoC audio solution not only meets the core audio requirements of various smart devices but also establishes high audio quality, low power consumption, and intelligent connectivity as key competitive advantages, providing reliable support for the ongoing evolution of consumer electronics audio experiences.

TWS

Realtek TWS Earbuds Audio Solution features high integration, low power consumption, comprehensive scenario noise cancellation, and high cost-effectiveness, providing complete technical support for TWS earbud products across different market segments.

High-Resolution Audio & Professional Sound Effects: Built-in high-performance DAC/ADC (24-bit/192kHz), natively supports SBC, AAC, and mSBC, with optional licensable extension for LDAC. The 24-bit DSP supports 10-band EQ, 3D surround sound, and game audio acceleration, combined with RCV technology to optimize call clarity.
All-Scenario Noise Cancellation: Employs a hybrid feedforward + feedback ANC architecture with noise reduction depth up to 40dB and power consumption below 1mA, supporting three-level noise cancellation modes. ENC is paired with a dual-microphone array and RCV 4.0 algorithm, and is compatible with third-party algorithms and cloud integration.
Low-Latency Connectivity & Seamless Interaction: Supports Bluetooth 5.3 with BLE dual-mode, offering a transmission range exceeding 15 meters and seamless master-slave switching between earbuds. Game mode latency is below 80ms (some models achieve 30-50ms), with end-to-end latency under 20ms.

Voice Recorder

Realtek Voice Recorder Audio Solution focuses on high-fidelity recording as its core, integrating professional-grade noise reduction algorithms, intelligent voice control, low-power design, and flexible connectivity expansion capabilities. It caters to full-scenario recording needs from entry-level to professional applications. Whether for lecture notes, business meetings, or professional interviews, this solution provides corresponding chip platforms and technical support to assist manufacturers in rapidly developing differentiated recording products.

High-Fidelity Recording Technology: Supports recording up to 24-bit/48kHz (some models up to 32-bit/384kHz) PCM. ADC Signal-to-Noise Ratio (SNR) ≥ 80dB (up to 112dB for the ALC series). Combined with AGC (Automatic Gain Control) and compatibility with WAV, MP3, OPUS, and other formats, it balances audio quality with storage efficiency.
Professional-Grade Noise Reduction & Audio Processing: Features a built-in independent DSP, supporting dual/three-microphone noise reduction (ENC) and Acoustic Echo Cancellation (AEC). Technologies like TSE (Target Speaker Enhancement), low-cut filters (200Hz/500Hz), and Wind Noise Suppression (WNS) effectively enhance recording clarity.
Intelligent Voice Control & Convenient Operation: Supports VAD (Voice Activity Detection) for voice-activated recording, with a pickup sensitivity range adjustable from 5 to 15 meters. One-touch recording and automatic saving of data on power-off help optimize storage management and battery life, making the solution suitable for diverse scenarios such as meetings and outdoor use.
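To put the trade-off between audio quality and storage in numbers, the raw data rate of uncompressed PCM is bit depth × sample rate × channel count. The short Python calculation below assumes mono capture and ignores container overhead. The function names are hypothetical and the results describe raw PCM only, not any measured product behavior.

```python
def pcm_bytes_per_second(bit_depth, sample_rate_hz, channels=1):
    """Raw PCM data rate in bytes per second (no container overhead)."""
    return bit_depth * sample_rate_hz * channels // 8


def pcm_bytes_per_hour(bit_depth, sample_rate_hz, channels=1):
    """Raw PCM storage needed for one hour of recording, in bytes."""
    return pcm_bytes_per_second(bit_depth, sample_rate_hz, channels) * 3600
```

Under these assumptions, one hour of mono 24-bit/48kHz PCM takes 518,400,000 bytes (about 518 MB), and one hour of mono 32-bit/384kHz PCM takes 5,529,600,000 bytes (about 5.5 GB). This is why compressed formats such as MP3 or OPUS matter for long recordings.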

Helmet

Realtek Smart Helmet Audio Solution is deeply optimized for riding scenarios, comprehensively adapting to the communication and entertainment needs of motorcycle and e-bike riders in terms of connection stability, professional noise cancellation, and low-power design. Centered on professional audio processing, efficient noise reduction algorithms, stable Bluetooth connectivity, and a low-power architecture, this solution provides clear, safe, and convenient audio experiences for smart helmet products across different market segments and assists manufacturers in accelerating time-to-market.

Riding-Scenario Audio Optimization: Features dual-channel stereo output, equipped with riding-specific WNS (Wind Noise Suppression) and 300Hz-3kHz voice enhancement. Supports adaptive volume adjustment tailored for helmet use, ensuring clear navigation and call quality.
Professional-Grade Noise Cancellation & Call Quality Assurance: Built-in independent DSP supports ENC dual-microphone noise reduction and AEC echo cancellation. Provides an External AMIC interface for connecting external microphones and ensures stable call connections during high-speed riding through optimized Bluetooth protocols.
Flexible Connectivity & Extensibility: Bluetooth 5.0 dual-mode connectivity, equipped with Aux In auxiliary input and I2S/Line-in digital audio input. Supports peripheral expansion via interfaces like GPIO and I2C to meet diverse requirements.

Soundbar

Realtek Soundbar Audio Solution builds upon high-fidelity audio, integrating immersive surround sound technology, professional audio processing, stable low-latency connectivity, and flexible expansion capabilities. It caters to a full range of Soundbar applications, from entry-level to professional-grade. Whether for home theater systems, TV audio enhancement, or gaming and entertainment scenarios, this solution provides suitable chip platforms and technical support to assist manufacturers in rapidly launching differentiated products.

High-Fidelity Audio Processing: Supports processing up to 24-bit/96kHz (professional models up to 32-bit/384kHz) PCM. ADC Signal-to-Noise Ratio (SNR) ≥ 80dB (up to 112dB for the ALC series). Features built-in DRC (Dynamic Range Control), compatibility with mainstream audio formats, and support for high-definition codec decoding in select models.
Immersive Surround Sound: Supports up to 7.1.2 channel output, optimizes vocal frequencies, and utilizes algorithms to achieve virtual 3D vertical surround sound, enhancing the immersive experience.
Professional-Grade Audio Processing & Noise Reduction: Built-in independent DSP supports multi-microphone ENC noise reduction, AEC echo cancellation, and WNS wind noise suppression. Features customizable EQ and multiple scene presets. Supports wireless subwoofer connection and bass management.
Stable Connectivity & Low Latency: Bluetooth dual-mode connectivity with latency ≤ 40ms. Supports Auracast broadcast audio and simultaneous multi-device connection. Equipped with a rich set of physical interfaces including HDMI ARC, ensuring compatibility with multiple devices.

Speaker

Realtek Speaker Audio Solution is built upon high-fidelity sound quality, integrating intelligent audio processing, stable low-latency connectivity, flexible expansion, and multi-room audio capabilities. It comprehensively covers speaker application scenarios from entry-level to professional-grade. Whether for everyday music playback, home theater systems, or smart home integration, this solution provides matching chip platforms and technical support to assist manufacturers in efficiently launching differentiated speaker products.

High-Fidelity Sound & Audio Processing: Supports processing up to 24-bit/96kHz (professional models up to 32-bit/384kHz) PCM. DAC Signal-to-Noise Ratio (SNR) ≥ 90dB (up to 112dB for the ALC series). Features built-in DRC (Dynamic Range Control), compatibility with mainstream audio formats (some support HD decoding), and includes customizable EQ and multiple scene presets.
Intelligent Audio Processing & Noise Reduction: Built-in independent DSP supports multi-microphone ENC noise reduction, AEC echo cancellation, TSE (Target Speaker Enhancement), and WNS wind noise suppression. Equipped with a bass management system, supports wireless subwoofer connection, and mitigates phase interference.
Stable Connectivity & Low Latency: Bluetooth dual-mode connectivity with latency ≤ 40ms. Supports Auracast broadcast and simultaneous dual-device connection. Offers rich interfaces including USB and HDMI ARC for compatibility with multiple devices.
Flexible Expansion & Multi-Room Audio: Supports Realparty multi-room audio (primary/secondary speakers are switchable) with synchronization accuracy ≤ 50 microseconds. Supports 2.4GHz wireless subwoofer and Wi-Fi expansion (high-end models), and is compatible with mainstream voice assistants.

Glasses

Realtek Smart Glasses Audio Solution is built upon an ultra-thin integrated design, combining open acoustic optimization, AI-enhanced voice processing, LE Audio low-latency connectivity, and ultra-low power consumption for extended battery life. This solution addresses the core requirements of smart glasses in terms of portability, audio quality, and interactive experience. Whether for entry-level voice-enabled glasses or high-end AR glasses, it provides corresponding chip platforms and technical support to assist manufacturers in rapidly launching differentiated products.

Open Acoustic & Bone Conduction Optimization: Directional sound transmission algorithms enable private listening with low sound leakage (maintaining low leakage even at +30% volume). Supports dual-mode bone conduction/air conduction switching with sound quality compensation to resolve muffled audio. Equipped with six-channel 3D sound field across both arms, supporting 24-bit/96kHz PCM and multi-format decoding.
AI-Enhanced Voice Processing: Features AI noise reduction algorithms trained on over 15 million scenarios, capable of suppressing more than 20 types of environmental interference, improving Signal-to-Noise Ratio (SNR) by over 40dB. Combined with TSE (Target Speaker Enhancement) technology, voice clarity in noisy environments is increased by 50%. Speech-to-text latency ≤ 20ms, adapting to real-time AI applications like translation.
LE Audio & Low-Latency Connectivity: Supports Auracast broadcast audio, capable of connecting to 8+ devices with CIS/BIS dual-ear synchronization ≤ 50 microseconds. Audio latency ≤ 30ms, utilizing Bluetooth 5.3/5.4 dual-mode connectivity with a range of up to 10 meters. Interference resistance improved by 25%, ensuring stable and smooth connections.
Smart Interaction & Multi-Scenario Adaptation: Compatible with mainstream voice assistants and supports offline command recognition. Provides dual-control via touch and voice for convenient operation. Features AI scene recognition for automatic audio parameter adjustment. Supports simultaneous connection to two devices with fast switching.
