GMF AI Audio

中文版

GMF AI Audio is an artificial intelligence audio processing module that provides users with convenient and easy-to-use intelligent audio processing algorithms at the GMF framework, such as voice wake-up, command word recognition, and echo cancellation. Currently, it offers the following modules based on esp-sr:

esp_gmf_afe_manager: audio front end(afe) manager
esp_gmf_aec: Echo Cancellation
esp_gmf_wn: A standalone wake word detection module that can be used independently
esp_gmf_afe: An easy-to-use interface based on the audio front end (afe) from esp-sr, providing functionalities such as voice wake-up, command word recognition, and speech detection

Name	Tag	Function	Method	Input Channel Number	Output Channel Number	Model Partition Dependency	Input Frame Length	Notes
esp_gmf_afe	ai_afe	Audio front-end processing: Wake word detection, command word recognition, voice enhancement, echo cancellation, noise suppression, automatic gain control	`start_vcmd_det`	1-4	1	Yes	256(sample)	Currently supports up to 2 microphone channels + 1 speaker reference signal, remaining channel selection marked as N, requires following voice command detection procedure
esp_gmf_aec	ai_aec	Echo cancellation: Eliminates echo interference in audio, improves voice quality	None	1-4	1	No	256(sample)	Input channels can be set to multiple microphones, uses first microphone channel and reference channel for calculation, must include reference signal
esp_gmf_wn	ai_wn	Independent wake word detection: Lightweight wake word detection, independent of AFE, low resource consumption	None	1-4	1	Yes	256(sample)	Supports up to 3 microphone channels, microphone channel count in input format must match working mode

AFE Manager `esp_gmf_afe_manager`

Features

Manages the data path and task scheduling of the Audio Front-End (AFE)
Supports dynamic enabling/disabling of features (e.g., wake-up, AEC, VAD)
Provides multi-task coordination to ensure real-time audio stream processing

Key Characteristics

Task Management: Creates independent feed_task and fetch_task for audio data input and result processing
Feature Control: Dynamically toggle algorithm modules via esp_afe_manager_enable_features
Event-Driven: Supports suspend/resume operations (esp_afe_manager_suspend) for low-power scenarios

System Diagram

Mermaid

Echo Cancellation `esp_gmf_aec`

Features

Eliminates echo interference in audio to improve voice quality
Supports multi-channel input and single-channel output

Technical Specifications

Supported Hardware: ESP32, ESP32-S3, and ESP32-P4
Input Format
- Sampling Rate: 8 kHz and 16 kHz
- Bit Width: 16-bit PCM
Channel Configuration: String identifiers (e.g., MMNR)
- M: Microphone Channel
- R: Reference Signal Channel
- N: Invalid Signal
Output Format: 16-bit single-channel PCM

Wake Word Detection `esp_gmf_wn`

Features

Runs independently without AFE dependency, low resource usage
Supports multi-channel input and single-channel output
Notifies wake word detection results via callback function

Technical Specifications

Supported Hardware: ESP32, ESP32-S3, ESP32-C3, ESP32-C5 and ESP32-P4
Input Format
- Sampling Rate: 16 kHz
- Bit Width: 16-bit PCM
Channel Configuration: String identifiers (e.g., MMNR)
- M: Microphone Channel
- R: Reference Signal Channel
- N: Invalid Signal
Output Format: 16-bit single-channel PCM

Speech Recognition `esp_gmf_afe`

Features

Voice Wake-Up (Wake Word Detection)
Command Word Recognition
Voice Activity Detection (VAD) and State Management

Key Characteristics

Supported Hardware: ESP32, ESP32-S3, and ESP32-P4
Input and Output: Same as described in esp_gmf_aec
Unified Event Interface: Notifies events (wake-up start/end, VAD state changes, command word triggers) via callback functions
State Machine Management: Automatically maintains transitions between wake-up, speech, and silence states

System Diagram

Mermaid

graph TD
  subgraph Audio Preprocessing
    RAW[Audio Data]
  end

  subgraph GMF_AFE
      AFE_MGR[AFE Manager
esp_gmf_afe_manager]
      INPUT_RB[Input Buffer]
      OUTPUT_RB[Output Buffer]
      STATE_MACHINE[Wake-Up/VAD State Machine]
      MN_DET[Command Word Detection]
      DELAY[Delay Mechanism]
  end

  subgraph Application Layer
    EVENT_CB[Event Callback]
    OUT_PORT[Data Output]
  end

  RAW -->|Processed Audio Data| INPUT_RB
  INPUT_RB -->|Data Callback| AFE_MGR
  AFE_MGR -->|Result Callback
Data| OUTPUT_RB
  AFE_MGR -->|Result Callback
Wake-Up/VAD Events| STATE_MACHINE
  AFE_MGR -->|Result Callback
Data| MN_DET
  OUTPUT_RB --> DELAY --> OUT_PORT
  STATE_MACHINE -->|Event Notification| EVENT_CB
  MN_DET -->|Event Notification| EVENT_CB

`esp_gmf_afe_cfg_t` Advanced Configuration Parameters

Parameter	Description	Default Value
wakeup_time	Time in milliseconds to trigger wake-up end event if no voice activity is detected after enabling wake-up	10000 (ms)
wakeup_end	Time in milliseconds to trigger wake-up end event after detecting silence following voice activity	2000 (ms)
vcmd_timeout	Timeout for command word detection. After timeout, the interface must be called again to start a new detection round	5760 (ms)
delay_samples	Output data delay to compensate for VAD detection lag. The specified data length (in samples) should be greater than or equal to `afe_config_t.vad_min_speech_ms` configured during AFE manager initialization	2048 (samples)

Wake Word and VAD State Machine

The following illustrates state transitions when features are enabled. The / character indicates the triggered user event

Enable Wake Word Detection, Disable VAD

This scenario focuses on detecting wake words and can be configured (wakeup_time) to trigger an end event after a certain period

Modify the configuration in the example wwe to use this scenario

#define WAKENET_ENABLE (true)
#define VAD_ENABLE     (false)

Mermaid

stateDiagram-v2
    [*] --> IDLE

    IDLE --> WAKEUP: Wake Word / (WAKEUP_START)

    WAKEUP --> IDLE: Timer (wakeup_time) Timeout / (WAKEUP_END)

Enable VAD, Disable Wake Word Detection

This scenario focuses on detecting voice activity and triggers corresponding events at the start and end of speech

Modify the configuration in the example wwe to use this scenario

#define WAKENET_ENABLE (false)
#define VAD_ENABLE     (true)

Mermaid

stateDiagram-v2
    [*] --> IDLE

    IDLE --> SPEECHING: Voice Detected / (VAD_START)

    SPEECHING --> IDLE: Noise or Silence Detected / (VAD_END)

Enable Both Wake Word Detection and VAD

This scenario combines wake word detection with voice activity detection to avoid frequent voice activity events outside the wake word interval

Modify the configuration in the example wwe to use this scenario

#define WAKENET_ENABLE (true)
#define VAD_ENABLE     (true)

Mermaid

stateDiagram-v2
    [*] --> IDLE

    IDLE --> WAKEUP: Wake Word / (WAKEUP_START)

    WAKEUP --> SPEECHING: Voice Detected / (VAD_START)
    WAKEUP --> IDLE: Timer (wakeup_time) Timeout / (WAKEUP_END)

    SPEECHING --> WAIT_FOR_SLEEP: Noise or Silence Detected / (VAD_END)

    WAIT_FOR_SLEEP --> SPEECHING: Voice Detected / (VAD_START)
    WAIT_FOR_SLEEP --> IDLE: Timer (wakeup_end) Timeout / (WAKEUP_END)

Command Word Detection

Users need to decide when to start command word detection. A typical use case is to enable detection after the state machine pushes a (WAKEUP_START) event and determine the next operation based on the detected command word index in the callback function

Command word detection is independent of the wake word state machine
Command word detection supports continuous detection until timeout

Usage

For example code, refer to the examples folder for the Wake Word Detection and Audio Echo Cancellation demos.

You can also create and compile a project using the following commands, taking the aec_rec project as an example. Before starting, make sure you have a working ESP-IDF environment.

1. Create the Example Project

Create the aec_rec example project based on the gmf_ai_audio component (using version v0.7.0 as an example; update the version as needed):

Shell

idf.py create-project-from-example "espressif/gmf_ai_audio=0.7.0:aec_rec"

2. Build and Flash the Project Using an ESP32-S3 Board

Shell

cd aec_rec
idf.py set-target esp32s3
idf.py -p YOUR_PORT flash monitor

jason-mao/gmf_ai_audio

readme

GMF AI Audio

AFE Manager `esp_gmf_afe_manager`

Features

Key Characteristics

System Diagram

Echo Cancellation `esp_gmf_aec`

Features

Technical Specifications

Wake Word Detection `esp_gmf_wn`

Features

Technical Specifications

Speech Recognition `esp_gmf_afe`

Features

Key Characteristics

System Diagram

`esp_gmf_afe_cfg_t` Advanced Configuration Parameters

Wake Word and VAD State Machine

Command Word Detection

Usage

1. Create the Example Project

2. Build and Flash the Project Using an ESP32-S3 Board

Links

Supports all targets

License: Custom

Tags

Stats

Badge

jason-mao/gmf_ai_audio

readme

GMF AI Audio

AFE Manager esp_gmf_afe_manager

Features

Key Characteristics

System Diagram

Echo Cancellation esp_gmf_aec

Features

Technical Specifications

Wake Word Detection esp_gmf_wn

Features

Technical Specifications

Speech Recognition esp_gmf_afe

Features

Key Characteristics

System Diagram

esp_gmf_afe_cfg_t Advanced Configuration Parameters

Wake Word and VAD State Machine

Command Word Detection

Usage

1. Create the Example Project

2. Build and Flash the Project Using an ESP32-S3 Board

Links

Supports all targets

License: Custom

Tags

Stats

Badge

AFE Manager `esp_gmf_afe_manager`

Echo Cancellation `esp_gmf_aec`

Wake Word Detection `esp_gmf_wn`

Speech Recognition `esp_gmf_afe`

`esp_gmf_afe_cfg_t` Advanced Configuration Parameters