Espressif audio encoder and decoder


# ESP_AUDIO_CODEC

Espressif Audio Codec (ESP_AUDIO_CODEC) is the official audio encoding and decoding module developed by Espressif Systems for its SoCs.

The ESP Audio Encoder provides a common encoder interface through which multiple encoders can be registered, such as AAC, AMR-NB, AMR-WB, ADPCM, G711A, G711U, PCM, OPUS, and ALAC. Users can create one or more encoder instances through this interface, and these instances can encode simultaneously. Alternatively, users can call a specific encoder's API directly to reduce call depth.

The ESP Audio Decoder provides a common decoder interface through which multiple decoders can be registered, such as AAC, MP3, AMR-NB, AMR-WB, ADPCM, G711A, G711U, VORBIS, OPUS, and ALAC. Users can create one or more decoder instances through this interface, and these instances can decode simultaneously. Alternatively, users can call a specific decoder's API directly to reduce call depth. The ESP Audio Decoder processes only complete audio frames: its input must be aligned to frame boundaries.

To simplify locating and parsing audio frames, the ESP Audio Simple Decoder is provided. It uses a parser to collect and structure audio frames from the input and then decodes them with the ESP Audio Decoder, so users can feed it data of any length. Supported audio containers include AAC, MP3, WAV, FLAC, AMR-NB, AMR-WB, and M4A.
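
The parser's job is to find frame boundaries in an arbitrary byte stream. As an illustration of what that involves for raw AAC, the following minimal sketch reads the fixed part of an ADTS header (ISO/IEC 13818-7) to obtain the sample rate, channel configuration, and frame length. The names `adts_info_t` and `adts_parse_header` are hypothetical and are not part of the ESP Audio Codec API; the library's own parser may work differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: fields extracted from one ADTS header. */
typedef struct {
    int sample_rate;   /* Hz, decoded from sampling_frequency_index */
    int channels;      /* channel_configuration (1 = mono, 2 = dual) */
    size_t frame_len;  /* total frame length in bytes, header included */
} adts_info_t;

static const int adts_sample_rates[13] = {
    96000, 88200, 64000, 48000, 44100, 32000,
    24000, 22050, 16000, 12000, 11025, 8000, 7350,
};

/* Parse the 7-byte fixed ADTS header at buf.
 * Returns 0 on success, -1 if the bytes are not a usable ADTS header. */
int adts_parse_header(const uint8_t *buf, size_t len, adts_info_t *info)
{
    if (len < 7) {
        return -1;                        /* need the whole fixed header */
    }
    /* 12-bit syncword 0xFFF, and the 2-bit layer field must be 0 */
    if (buf[0] != 0xFF || (buf[1] & 0xF6) != 0xF0) {
        return -1;
    }
    int sr_index = (buf[2] >> 2) & 0x0F;
    if (sr_index >= 13) {
        return -1;                        /* reserved or escape index */
    }
    info->sample_rate = adts_sample_rates[sr_index];
    info->channels = ((buf[2] & 0x01) << 2) | ((buf[3] >> 6) & 0x03);
    /* 13-bit aac_frame_length spans bytes 3..5 */
    info->frame_len = ((size_t)(buf[3] & 0x03) << 11)
                    | ((size_t)buf[4] << 3)
                    | ((size_t)(buf[5] >> 5) & 0x07);
    if (info->frame_len < 7) {
        return -1;                        /* shorter than its own header */
    }
    return 0;
}
```

Once `frame_len` is known, a parser can pass exactly that many bytes to a frame-based decoder and continue with the next header.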

The licenses of the third-party copyrights are recorded in [Copyrights and Licenses](http://docs.espressif.com/projects/esp-adf/en/latest/COPYRIGHT.html).

# Highlights

- **User-Friendly Interface**  
  The ESP Audio Codec module features a user-friendly interface designed for ease of use.
  
- **Lightweight with High Performance**  
  The module is optimized for high performance while maintaining a lightweight footprint and minimal memory usage.
  
- **Dual-Level Decoder API**  
  The module offers a dual-level API for decoders, catering to different use scenarios and application requirements.
  
- **High Customization through Registration**  
  Through the registration API, users can easily add customized decoders, encoders, or simple decoders, and can also override the default decoder or encoder without changing application code.
  
# Features

The ESP Audio Codec supports the following features:   

## Encoder   

* Encoding to AAC, AMR-NB, AMR-WB, ADPCM, G711A, G711U, PCM, OPUS, ALAC, etc.
* Operating all encoders through a common API; see [esp_audio_enc.h](include/encoder/esp_audio_enc.h)
* Customized encoders through `esp_audio_enc_register`, including overriding a default encoder
* Registering all supported encoders at once through `esp_audio_enc_register_default`, with the set managed through menuconfig
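
The register-and-override pattern described above can be illustrated with a table of function pointers keyed by codec type. This is a hypothetical sketch of the general technique only: `codec_type_t`, `codec_ops_t`, `codec_register`, `codec_encode`, and the other names are invented for illustration and do not reflect the actual signatures of `esp_audio_enc_register` or `esp_audio_enc_register_default`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical codec identifiers; the real library defines its own. */
typedef enum { CODEC_PCM = 0, CODEC_G711A, CODEC_G711U, CODEC_MAX } codec_type_t;

/* Hypothetical operation table: one encode callback per codec. */
typedef struct {
    const char *name;
    /* Encode in_len bytes into out; return bytes written or -1 on error. */
    int (*encode)(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap);
} codec_ops_t;

static const codec_ops_t *registry[CODEC_MAX];

/* Registering a type that is already present replaces the old entry,
 * which is how an application could override a default codec. */
int codec_register(codec_type_t type, const codec_ops_t *ops)
{
    if ((unsigned)type >= (unsigned)CODEC_MAX || ops == NULL) {
        return -1;
    }
    registry[type] = ops;
    return 0;
}

/* Common-API path: look up the codec, then dispatch through the table. */
int codec_encode(codec_type_t type, const uint8_t *in, size_t in_len,
                 uint8_t *out, size_t out_cap)
{
    if ((unsigned)type >= (unsigned)CODEC_MAX || registry[type] == NULL) {
        return -1;                         /* codec not registered */
    }
    return registry[type]->encode(in, in_len, out, out_cap);
}

/* Trivial pass-through "PCM encoder" used as the default entry. */
static int pcm_copy(const uint8_t *in, size_t in_len, uint8_t *out, size_t out_cap)
{
    if (in_len > out_cap) {
        return -1;                         /* output buffer too small */
    }
    for (size_t i = 0; i < in_len; i++) {
        out[i] = in[i];
    }
    return (int)in_len;
}

static const codec_ops_t pcm_default = { "pcm", pcm_copy };

/* Register every built-in codec in one call. */
void codec_register_defaults(void)
{
    codec_register(CODEC_PCM, &pcm_default);
}
```

In this sketch an override is a second `codec_register(CODEC_PCM, &my_ops)` call made after the defaults, and callers of `codec_encode` do not change. Calling `pcm_copy` directly skips the lookup, which mirrors the shorter call path of a codec-specific API.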

Details of the supported encoders are as follows:

**AAC**
- AAC Low Complexity profile (AAC-LC) encoding
- Encoding sample rates (Hz): 96000, 88200, 64000, 48000, 44100, 32000, 24000, 22050, 16000, 12000, 11025, 8000
- Encoding channels: mono, dual
- Encoding bits per sample: 16
- Constant bitrate encoding from 12 kbps to 160 kbps
- Optional ADTS header output
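
AAC-LC codes audio in frames of 1024 samples per channel, so the size of the PCM buffer handed to the encoder and the average output size follow directly from the configuration. A minimal sketch of that arithmetic, assuming 16-bit interleaved input as listed above; the helper names are hypothetical, and the library may report its own required input size.

```c
#include <stdint.h>

#define AAC_LC_SAMPLES_PER_FRAME 1024u  /* per channel, fixed by AAC-LC */
#define ADTS_HEADER_BYTES 7u            /* ADTS header without CRC */

/* PCM bytes needed for one AAC-LC frame (16-bit interleaved samples). */
uint32_t aac_pcm_bytes_per_frame(uint32_t channels)
{
    return AAC_LC_SAMPLES_PER_FRAME * channels * 2u;
}

/* Average raw AAC payload per frame at a constant bitrate, rounded down.
 * Add ADTS_HEADER_BYTES per frame when ADTS headers are written. */
uint32_t aac_avg_payload_bytes(uint32_t bitrate_bps, uint32_t sample_rate)
{
    return (uint32_t)(((uint64_t)bitrate_bps * AAC_LC_SAMPLES_PER_FRAME)
                      / (8u * (uint64_t)sample_rate));
}
```

For example, dual-channel input needs 4096 bytes per frame, and at 48000 Hz one frame spans 1024 / 48000 ≈ 21.3 ms.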

**AMR**
- Narrow band (NB) and wide band (WB) encoding
- AMR-NB encoding at a sample rate of 8 kHz
- AMR-WB encoding at a sample rate of 16 kHz
- Encoding channels: mono
- Encoding bits per sample: 16
- AMR-NB encoding bitrates (kbps): 4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2
- AMR-WB encoding bitrates (kbps): 6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85
- Discontinuous transmission (DTX)
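
Both AMR variants use 20 ms frames (160 samples at 8 kHz for AMR-NB, 320 samples at 16 kHz for AMR-WB), so each bitrate maps to a fixed number of speech bits per frame. The sketch below computes the PCM input size and the per-frame size in the RFC 4867 storage format (one frame-header byte followed by the speech bits padded to a whole byte). Whether this encoder emits exactly that framing is an assumption, and the helper names are hypothetical.

```c
#include <stdint.h>

#define AMR_FRAME_MS 20u

/* PCM input bytes for one 20 ms frame of 16-bit mono audio
 * (8 kHz -> 320 bytes, 16 kHz -> 640 bytes). */
uint32_t amr_pcm_bytes_per_frame(uint32_t sample_rate)
{
    return sample_rate * AMR_FRAME_MS / 1000u * 2u;
}

/* Bytes for one frame in the RFC 4867 storage format: one header byte
 * plus the speech bits of a 20 ms frame rounded up to whole bytes. */
uint32_t amr_storage_frame_bytes(uint32_t bitrate_bps)
{
    uint32_t speech_bits = bitrate_bps * AMR_FRAME_MS / 1000u;
    return 1u + (speech_bits + 7u) / 8u;
}
```

For example, AMR-NB at 12.2 kbps carries 244 speech bits per frame, which this layout stores as 32 bytes.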

**ADPCM**
- Encoding sample rates (Hz): all
- Encoding channels: mono, dual
- Encoding bits per sample: 16
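
IMA ADPCM stores each 16-bit sample as a 4-bit code, roughly a 4:1 size reduction. When the output is packed into WAV-style blocks (the common IMA ADPCM block layout, in which each channel's block starts with a 4-byte header holding one uncompressed sample), the number of samples per block follows from the block size. This sketch assumes that layout; the encoder's own framing may differ, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Samples per channel in one IMA ADPCM block (WAV block layout):
 * a 4-byte header per channel carries the first sample, and every
 * remaining byte carries two 4-bit codes. Returns 0 for invalid input. */
uint32_t ima_adpcm_samples_per_block(uint32_t block_align, uint32_t channels)
{
    if (channels == 0 || block_align < 4u * channels) {
        return 0;
    }
    return (block_align - 4u * channels) * 2u / channels + 1u;
}
```

For mono 256-byte blocks this gives 505 samples, i.e. 1010 bytes of 16-bit PCM per 256 encoded bytes.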

**G711**
- A-law and μ-law (U-law) encoding
- Encoding sample rates (Hz): all
- Encoding channels: all
- Encoding bits per sample: 16
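
G.711 is a sample-by-sample companding scheme: each 16-bit sample maps to one 8-bit code independently, which is why any sample rate and channel count can be encoded. For reference, the following is a sketch of the classic μ-law conversion in its widely used public-domain formulation (bias 0x84, clipping at 32635); it illustrates the standard algorithm and is not taken from this library's implementation.

```c
#include <stdint.h>

#define ULAW_BIAS 0x84   /* 132: gives every segment an implicit leading 1 */
#define ULAW_CLIP 32635  /* largest magnitude that still fits after biasing */

/* Compress one 16-bit linear PCM sample to an 8-bit mu-law code. */
uint8_t linear_to_ulaw(int16_t pcm)
{
    int sign = (pcm < 0) ? 0x80 : 0x00;
    int magnitude = (pcm < 0) ? -(int)pcm : pcm;

    if (magnitude > ULAW_CLIP) {
        magnitude = ULAW_CLIP;
    }
    magnitude += ULAW_BIAS;

    /* Segment (exponent) = position of the highest set bit above bit 7. */
    int exponent = 7;
    for (int mask = 0x4000; (magnitude & mask) == 0 && exponent > 0; mask >>= 1) {
        exponent--;
    }
    int mantissa = (magnitude >> (exponent + 3)) & 0x0F;

    /* mu-law codes are stored bit-inverted. */
    return (uint8_t)~(sign | (exponent << 4) | mantissa);
}

/* Expand an 8-bit mu-law code back to 16-bit linear PCM. */
int16_t ulaw_to_linear(uint8_t code)
{
    code = (uint8_t)~code;
    int sign = code & 0x80;
    int exponent = (code >> 4) & 0x07;
    int mantissa = code & 0x0F;
    int magnitude = (((mantissa << 3) + ULAW_BIAS) << exponent) - ULAW_BIAS;
    return (int16_t)(sign ? -magnitude : magnitude);
}
```

A-law (G711A) follows the same segment-and-mantissa idea but uses no bias and inverts alternate bits (XOR 0x55) instead of all bits.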

**OPUS**
- Encoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000
- Encoding channels: mono, dual
- Encoding bits per sample: 16
- Constant bitrate encoding from 20 kbps to 510 kbps
- Encoding frame durations (ms): 2.5, 5, 10, 20, 40, 60
- Application modes for VoIP and music
- Adjustable encoding complexity, from 0 to 10
- In-band forward error correction (FEC)
- Discontinuous transmission (DTX)
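
The frame duration and sample rate together fix how much PCM the Opus encoder consumes per frame, and at a constant bitrate they also fix the nominal packet size. A minimal sketch of that arithmetic, assuming 16-bit interleaved input and the rates and durations listed above; durations are given in tenths of a millisecond so that 2.5 ms stays an integer. All function names are hypothetical.

```c
#include <stdint.h>

/* 1 if the sample rate is one of the listed Opus encoding rates. */
int opus_rate_supported(uint32_t sample_rate)
{
    switch (sample_rate) {
    case 8000: case 12000: case 16000: case 24000: case 48000:
        return 1;
    default:
        return 0;
    }
}

/* 1 for 2.5, 5, 10, 20, 40 or 60 ms (given in 0.1 ms units). */
int opus_duration_supported(uint32_t duration_tenth_ms)
{
    switch (duration_tenth_ms) {
    case 25: case 50: case 100: case 200: case 400: case 600:
        return 1;
    default:
        return 0;
    }
}

/* Samples per channel in one frame, e.g. 48000 Hz x 20 ms -> 960. */
uint32_t opus_samples_per_frame(uint32_t sample_rate, uint32_t duration_tenth_ms)
{
    return sample_rate * duration_tenth_ms / 10000u;
}

/* PCM bytes consumed per frame (16-bit interleaved samples). */
uint32_t opus_pcm_bytes_per_frame(uint32_t sample_rate, uint32_t channels,
                                  uint32_t duration_tenth_ms)
{
    return opus_samples_per_frame(sample_rate, duration_tenth_ms) * channels * 2u;
}

/* Nominal CBR packet size: bits per second x seconds / 8. */
uint32_t opus_cbr_packet_bytes(uint32_t bitrate_bps, uint32_t duration_tenth_ms)
{
    return (uint32_t)((uint64_t)bitrate_bps * duration_tenth_ms / 80000u);
}
```

For example, 32 kbps CBR with 20 ms frames yields 80-byte packets.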

**ALAC**
- Encoding sample rates (Hz): 8000, 12000, 16000, 24000, 48000
- Encoding channels: mono, dual
- Encoding bits per sample: 16
  
## Decoder   

* Decoding of AAC, MP3, AMR-NB, AMR-WB, ADPCM, G711A, G711U, PCM, OPUS, VORBIS, ALAC, etc.
* Operating all decoders through a common API; see [esp_audio_dec.h](include/decoder/esp_audio_dec.h)
* Customized decoders through `esp_audio_dec_register`, including overriding a default decoder
* Registering all supported decoders at once through `esp_audio_dec_register_default`, with the set managed through menuconfig

Details of the supported decoders are as follows:

| Codec  | Notes                                               |
| --     | --                                                  |
| AAC    | AAC, AAC-Plus (mono/dual channel only)              |
| MP3    |                                                     |
| AMRNB  | 8 kHz sample rate only                              |
| AMRWB  | 16 kHz sample rate only                             |
| G711A  |                                                     |
| G711U  |                                                     |
| ADPCM  | IMA-ADPCM, mono channel only                        |
| FLAC   |                                                     |
| OPUS   | Also supports the self-delimited format             |
| VORBIS | The user must provide the common header information |
| ALAC   | The user must provide the magic cookie information  |

## Simple Decoder   

* Audio frame finding and decoding
* A common parser framework; users can add customized parsers following the parser rules
* Customized simple decoders to handle new file formats
* Customized parser and decoder pairs, e.g. the default parser with a customized decoder
* Streaming decoding only; seeking is not supported
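
Because the simple decoder accepts input of any length, its parser must cope with a frame that is split across two reads. The sketch below shows the idea for MPEG-1 Layer III: it scans for a frame header, computes the frame length as 144 × bitrate / sample_rate + padding, and reports either a complete frame or that more data is needed. It is illustrative only; `mp3_frame_length`, `mp3_find_frame`, and the return convention are hypothetical, and MPEG-2/2.5 headers (which use different tables) are simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* MPEG-1 Layer III tables; bitrate index 0 = free format, 15 = invalid. */
static const int mp3_bitrate_kbps[16] = {
    0, 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256, 320, 0
};
static const int mp3_sample_rate[3] = { 44100, 48000, 32000 };

/* Frame length from a 4-byte header, or 0 if it is not an
 * MPEG-1 Layer III header that this sketch understands. */
size_t mp3_frame_length(const uint8_t *h)
{
    if (h[0] != 0xFF || (h[1] & 0xE0) != 0xE0) {
        return 0;                                  /* no 11-bit frame sync */
    }
    if (((h[1] >> 3) & 0x03) != 0x03 || ((h[1] >> 1) & 0x03) != 0x01) {
        return 0;                                  /* not MPEG-1 Layer III */
    }
    int kbps = mp3_bitrate_kbps[(h[2] >> 4) & 0x0F];
    int sr_index = (h[2] >> 2) & 0x03;
    if (kbps == 0 || sr_index == 3) {
        return 0;                                  /* free format or reserved */
    }
    int padding = (h[2] >> 1) & 0x01;
    return (size_t)(144 * kbps * 1000 / mp3_sample_rate[sr_index] + padding);
}

/* Look for one complete frame in buf[0..len).
 * Returns 1 and sets *offset / *frame_len when a whole frame is present.
 * Returns 0 when more input is needed; keep the bytes from *offset on. */
int mp3_find_frame(const uint8_t *buf, size_t len, size_t *offset, size_t *frame_len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        size_t flen = mp3_frame_length(buf + i);
        if (flen == 0) {
            continue;                              /* keep scanning for sync */
        }
        *offset = i;
        *frame_len = flen;
        return (i + flen <= len) ? 1 : 0;          /* 0: frame split across reads */
    }
    *offset = (len >= 3) ? len - 3 : 0;            /* a header may straddle the end */
    *frame_len = 0;
    return 0;
}
```

A production parser would usually also confirm that another valid header follows the frame before trusting a sync match, since 0xFF bytes can occur inside audio data.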

The supported audio formats are listed below:

| File Format | Notes                                                 |
| --          | --                                                    |
| AAC         |                                                       |
| MP3         |                                                       |
| AMRNB       |                                                       |
| AMRWB       |                                                       |
| FLAC        |                                                       |
| ADPCM       | IMA-ADPCM only                                        |
| WAV         | G711A, G711U, PCM, ADPCM                              |
| M4A         | MP3, AAC, ALAC <br> `mdat` must come after `moov`     |
| TS          | MP3 only                                              |
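
For WAV input, the audio codec is identified by the format tag in the RIFF `fmt ` chunk. A minimal sketch of locating that chunk and reading the format tag, channel count, sample rate, and bit depth from an in-memory header, assuming the standard little-endian RIFF/WAVE layout. `wav_fmt_t` and `wav_parse_fmt` are hypothetical names; the tag values (1 PCM, 6 A-law, 7 μ-law, 0x11 IMA ADPCM) are the standard WAVE format codes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WAV_FMT_PCM       0x0001
#define WAV_FMT_ALAW      0x0006   /* G.711 A-law */
#define WAV_FMT_MULAW     0x0007   /* G.711 mu-law */
#define WAV_FMT_IMA_ADPCM 0x0011   /* IMA/DVI ADPCM */

typedef struct {
    uint16_t format_tag;
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
} wav_fmt_t;

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Walk the chunks after the 12-byte "RIFF....WAVE" preamble until the
 * "fmt " chunk is found. Returns 0 on success, -1 if absent or truncated. */
int wav_parse_fmt(const uint8_t *buf, size_t len, wav_fmt_t *fmt)
{
    if (len < 12 || memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) {
        return -1;
    }
    size_t pos = 12;
    while (pos + 8 <= len) {
        uint32_t size = rd32(buf + pos + 4);
        if (memcmp(buf + pos, "fmt ", 4) == 0) {
            if (size < 16 || pos + 8 + 16 > len) {
                return -1;                       /* fmt chunk too short */
            }
            const uint8_t *f = buf + pos + 8;
            fmt->format_tag = rd16(f);
            fmt->channels = rd16(f + 2);
            fmt->sample_rate = rd32(f + 4);
            fmt->bits_per_sample = rd16(f + 14);
            return 0;
        }
        if (size > len - pos - 8) {
            return -1;                           /* chunk runs past the buffer */
        }
        pos += 8 + (size_t)size + (size & 1u);   /* chunks are padded to even size */
    }
    return -1;
}
```

In this sketch, a tag of `WAV_FMT_MULAW` would route the stream to a G711U decoder and `WAV_FMT_IMA_ADPCM` to an ADPCM decoder.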

# Performance

The following results were obtained on an ESP32-S3R8 using internal RAM.

## Encoder 

**AAC**     
| Sample Rate (Hz)    | Memory (KB) | CPU loading (%)|
|       --            |  --         |     --         |  
|       8000          |  52         |    3.5         | 
|       11025         |  52         |    4.9         | 
|       12000         |  52         |    5.6         | 
|       16000         |  52         |    6.0         | 
|       22050         |  52         |    8.1         | 
|       24000         |  52         |    8.2         | 
|       32000         |  52         |    12.1        | 
|       44100         |  52         |    15.7        | 
|       48000         |  52         |    16.4        | 
|       64000         |  52         |    20.2        | 
|       88200         |  52         |    25.9        | 
|       96000         |  52         |    27.7        |      

Note: The CPU loading values in the table are for a mono channel; the CPU loading for dual channel is approximately 1.6 times that of mono.

**AMR**     
| Type    | Memory (KB)  | CPU loading (%) |
|   --    |  --          |     --          |  
|  AMR-NB |  3.4         |    24.8         | 
|  AMR-WB |  5.8         |    57.6         |     

Note:

1) The CPU loading in the table is an average value.
2) The CPU loading of AMR depends on the bitrate: the higher the bitrate, the higher the CPU loading.

**ADPCM**     
| Channel | Memory (B)    | CPU loading (%) |
|   --    |  --           |     --          |  
|  mono   |  120          |    < 2          | 
|  dual   |  120          |    < 4          | 

**G711**     
| Type    | Memory (B)    | CPU loading (%) |
|   --    |  --           |     --          |  
|  G711-A |  40           |    < 4          | 
|  G711-U |  40           |    < 4          | 

Note: The CPU loading in the table is for mono; the CPU loading for dual channel is about 2 times that of mono.

**OPUS**
| Sample Rate (Hz)     | Memory (KB) | CPU loading (%) |
|       --             |  --         |     --          |  
|       8000           |  43         |    15.9         | 
|       12000          |  43         |    16.7         | 
|       16000          |  43         |    16.8         | 
|       24000          |  43         |    17.8         | 
|       48000          |  43         |    19.9         | 

Note:

1) The data in the table was measured with a mono channel, complexity 1, VoIP application mode, and a 20 ms frame duration.
2) Dual-channel encoding uses about 13 KB more memory than mono.
3) The CPU loading for dual channel is about 1.6 times that of mono.
4) The complexity setting directly affects CPU loading, from 0 (lowest) to 10 (highest).

# ESP_AUDIO_CODEC Release and SoC Compatibility

The following table shows ESP_AUDIO_CODEC support for Espressif SoCs. "&#10004;" means supported, and "&#10006;" means not supported.

|Chip         |         v1.0.0     |
|:-----------:|:------------------:|
|ESP32        |       &#10004;     |
|ESP32-S2     |       &#10004;     |
|ESP32-C3     |       &#10004;     |
|ESP32-C6     |       &#10004;     |
|ESP32-S3     |       &#10004;     |
|ESP32-P4     |       &#10004;     |

# Usage

## Encoder Usage
For sample usage, refer to [audio_encoder_test.c](test_apps/audio_codec_test/main/audio_encoder_test.c).

## Decoder Usage
For sample usage, refer to [audio_decoder_test.c](test_apps/audio_codec_test/main/audio_decoder_test.c).

## Simple Decoder Usage
For sample usage, refer to [simple_decoder_test.c](test_apps/audio_codec_test/main/simple_decoder_test.c).

Supports all targets

License: Custom

To add this component to your project, run:

`idf.py add-dependency "jason-mao/esp_audio_codec^0.0.5"`
