GMF AI Audio is an artificial intelligence audio processing module that provides convenient, easy-to-use intelligent audio processing algorithms for the GMF framework, such as voice wake-up, command word recognition, and echo cancellation. It currently offers the following modules based on esp-sr:
esp_gmf_afe_manager: Audio front end (AFE) manager
esp_gmf_aec: Echo Cancellation
esp_gmf_wn: A standalone wake word detection module that can be used independently
esp_gmf_afe: An easy-to-use interface based on the audio front end (AFE) from esp-sr, providing functions such as voice wake-up, command word recognition, and speech detection
| Name | Tag | Function | Method | Input Channel Number | Output Channel Number | Model Partition Dependency | Input Frame Length | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| esp_gmf_afe | ai_afe | Audio front-end processing: wake word detection, command word recognition, voice enhancement, echo cancellation, noise suppression, automatic gain control | start_vcmd_det | 1-4 | 1 | Yes | 256 samples | Currently supports up to 2 microphone channels + 1 speaker reference signal; remaining channel selection is marked as N; requires following the voice command detection procedure |
| esp_gmf_aec | ai_aec | Echo cancellation: eliminates echo interference in audio, improves voice quality | None | 1-4 | 1 | No | 256 samples | Input channels can be set to multiple microphones; uses the first microphone channel and the reference channel for calculation; must include a reference signal |
| esp_gmf_wn | ai_wn | Independent wake word detection: lightweight wake word detection, independent of AFE, low resource consumption | None | 1-4 | 1 | Yes | 256 samples | Supports up to 3 microphone channels; the microphone channel count in the input format must match the working mode |
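The channel limits in the Notes column can be checked before a pipeline is configured. The sketch below assumes the input layout is described as a string of M (microphone), R (reference), and N (unused) characters, as the "marked as N" note suggests. The type, the function, and the limit values are hypothetical readings of the table, not part of the gmf_ai_audio API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-element limits, read from the Notes column. */
typedef struct {
    int  max_mic;       /* maximum number of 'M' channels */
    int  max_ref;       /* maximum number of 'R' channels (assumed to be 1) */
    bool ref_required;  /* true if a reference channel must be present */
} channel_limits_t;

/* Counts the channel kinds in a layout string and checks them against the
 * limits. Rejects unknown characters, layouts outside 1-4 channels, and
 * layouts without a microphone. */
bool channel_layout_is_valid(const char *layout, const channel_limits_t *lim)
{
    int mic = 0, ref = 0, total = 0;

    if (layout == NULL || lim == NULL) {
        return false;
    }
    for (const char *p = layout; *p != '\0'; ++p) {
        switch (*p) {
        case 'M': mic++; break;
        case 'R': ref++; break;
        case 'N': break;          /* unused channel, counted only in total */
        default:  return false;   /* unknown channel marker */
        }
        total++;
    }
    if (total < 1 || total > 4) {
        return false;             /* the table lists 1-4 input channels */
    }
    if (mic < 1 || mic > lim->max_mic || ref > lim->max_ref) {
        return false;
    }
    return !(lim->ref_required && ref == 0);
}

/* Assumed limits: AFE allows 2 mics + 1 reference; AEC needs a reference. */
static const channel_limits_t AFE_LIMITS = { 2, 1, false };
static const channel_limits_t AEC_LIMITS = { 3, 1, true };
```

With these assumed limits, a layout such as "MMNR" passes for the AFE element, while "MM" is rejected for the AEC element because it carries no reference channel.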
AFE Manager esp_gmf_afe_manager
Features
Manages the data path and task scheduling of the Audio Front-End (AFE)
Supports dynamic enabling/disabling of features (e.g., wake-up, AEC, VAD)
Provides multi-task coordination to ensure real-time audio stream processing
Key Characteristics
Task Management: Creates independent feed_task and fetch_task for audio data input and result processing
Feature Control: Dynamically toggle algorithm modules via esp_afe_manager_enable_features
Event-Driven: Supports suspend/resume operations (esp_afe_manager_suspend) for low-power scenarios
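Dynamic feature control can be pictured as a bitmask that the manager consults for every frame. The following is a standalone illustration only: the enum values, the struct, and both functions are hypothetical and do not reproduce the signatures of esp_afe_manager_enable_features or esp_afe_manager_suspend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature flags, one bit per algorithm module. */
typedef enum {
    FEATURE_WAKEUP = 1u << 0,
    FEATURE_AEC    = 1u << 1,
    FEATURE_VAD    = 1u << 2,
} feature_t;

/* Hypothetical manager state: enabled features plus a suspend flag. */
typedef struct {
    uint32_t enabled;
    bool     suspended;
} manager_state_t;

/* Sets or clears one feature bit without touching the others. */
void manager_set_feature(manager_state_t *m, feature_t f, bool enable)
{
    if (enable) {
        m->enabled |= (uint32_t)f;
    } else {
        m->enabled &= ~(uint32_t)f;
    }
}

/* A feature runs only if its bit is set and the manager is not suspended. */
bool manager_feature_active(const manager_state_t *m, feature_t f)
{
    return !m->suspended && (m->enabled & (uint32_t)f) != 0;
}
```

In a design with separate feed and fetch tasks, state like this would be shared between tasks, so a real implementation would need a mutex or atomic access around it.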
System Diagram
[Figure: AFE manager system diagram (Mermaid)]
Echo Cancellation esp_gmf_aec
Features
Eliminates echo interference in audio to improve voice quality
Supports multi-channel input and single-channel output
Time in milliseconds to trigger wake-up end event if no voice activity is detected after enabling wake-up
10000 (ms)
wakeup_end: Time in milliseconds to trigger the wake-up end event after silence is detected following voice activity (2000 ms)
vcmd_timeout: Timeout for command word detection. After the timeout, the interface must be called again to start a new detection round (5760 ms)
delay_samples: Output data delay that compensates for VAD detection lag. The specified length, in samples, should cover at least the duration of afe_config_t.vad_min_speech_ms configured during AFE manager initialization (2048 samples)
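Because delay_samples is given in samples while vad_min_speech_ms is given in milliseconds, comparing them requires a sample rate. The sketch below assumes 16 kHz input, which is an assumption and not stated in this document; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SAMPLE_RATE_HZ 16000u  /* assumption: 16 kHz audio input */

/* Converts a duration in milliseconds to samples, rounding up. */
uint32_t ms_to_samples(uint32_t ms)
{
    return (uint32_t)(((uint64_t)ms * SAMPLE_RATE_HZ + 999u) / 1000u);
}

/* Converts a sample count to milliseconds, rounding down. */
uint32_t samples_to_ms(uint32_t samples)
{
    return (uint32_t)(((uint64_t)samples * 1000u) / SAMPLE_RATE_HZ);
}

/* True if the configured delay covers the minimum speech duration. */
bool delay_covers_min_speech(uint32_t delay_samples, uint32_t vad_min_speech_ms)
{
    return delay_samples >= ms_to_samples(vad_min_speech_ms);
}
```

Under that assumption, 2048 samples correspond to 128 ms, and one 256-sample input frame corresponds to 16 ms.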
Wake Word and VAD State Machine
The following illustrates the state transitions when features are enabled. The / character marks the user event that a transition triggers.
Enable Wake Word Detection, Disable VAD
This scenario focuses on detecting wake words. The wakeup_time setting configures how long after wake-up the end event is triggered.
To use this scenario, modify the configuration in the wwe example.
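The wake-word-only scenario can be sketched as a two-state machine driven once per audio frame. This is a simplified model under stated assumptions: the state names, the EVT_WAKEUP_END label, and wake_fsm_step are hypothetical, and a wake word detected while already awake is ignored here, which may differ from the component's behavior.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { STATE_IDLE, STATE_AWAKE } wake_state_t;
typedef enum { EVT_NONE, EVT_WAKEUP_START, EVT_WAKEUP_END } wake_event_t;

typedef struct {
    wake_state_t state;
    uint32_t     wakeup_time_ms;   /* period after which the end event fires */
    uint32_t     awake_since_ms;   /* timestamp of the last wake-up */
} wake_fsm_t;

/* Advances the machine by one frame and returns the user event, if any. */
wake_event_t wake_fsm_step(wake_fsm_t *fsm, uint32_t now_ms, bool wake_word_detected)
{
    switch (fsm->state) {
    case STATE_IDLE:
        if (wake_word_detected) {
            fsm->state = STATE_AWAKE;
            fsm->awake_since_ms = now_ms;
            return EVT_WAKEUP_START;
        }
        break;
    case STATE_AWAKE:
        /* With VAD disabled, only the wakeup_time period ends the session. */
        if (now_ms - fsm->awake_since_ms >= fsm->wakeup_time_ms) {
            fsm->state = STATE_IDLE;
            return EVT_WAKEUP_END;
        }
        break;
    }
    return EVT_NONE;
}
```

Passing the current time in as a parameter keeps the sketch independent of any particular tick source.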
Users need to decide when to start command word detection. A typical use case is to enable detection after the state machine pushes a WAKEUP_START event, then determine the next operation in the callback function based on the detected command word index
Command word detection is independent of the wake word state machine
Command word detection supports continuous detection until timeout
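The command detection behavior described above (started explicitly, continuous until timeout, restarted by another call) can be modeled as a small session object. All names here are hypothetical. The sketch also assumes that a detected command does not extend the timeout, which this document does not specify.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one command word detection round. */
typedef struct {
    bool     active;
    uint32_t started_ms;
    uint32_t timeout_ms;
} vcmd_session_t;

/* Starts (or restarts) a detection round, e.g. after a WAKEUP_START event. */
void vcmd_start(vcmd_session_t *s, uint32_t now_ms, uint32_t timeout_ms)
{
    s->active = true;
    s->started_ms = now_ms;
    s->timeout_ms = timeout_ms;
}

/* Returns the command index if the round is still running, or -1 if the
 * round is inactive or has timed out. The round stays active after a hit,
 * so several commands can be detected before the timeout. */
int vcmd_on_result(vcmd_session_t *s, uint32_t now_ms, int command_index)
{
    if (!s->active) {
        return -1;
    }
    if (now_ms - s->started_ms >= s->timeout_ms) {
        s->active = false;   /* a new vcmd_start call is needed */
        return -1;
    }
    return command_index;
}
```

A callback would typically switch on the returned index to choose the next operation.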
Usage
For example code, refer to the examples folder for the Wake Word Detection and Audio Echo Cancellation demos.
You can also create and compile a project using the following commands, taking the aec_rec project as an example. Before starting, make sure you have a working ESP-IDF environment.
1. Create the Example Project
Create the aec_rec example project based on the gmf_ai_audio component (using version v0.7.0 as an example; update the version as needed):