Working with ESP32 Audio Sampling

Ivan has 18+ years' experience, ranging from back-end and blockchain architecture to DBA ops, kernel development, and embedded software.

The ESP32 is a next-generation, WiFi- and Bluetooth-enabled microcontroller. It is Shanghai-based Espressif’s successor to the very popular (and, for the hobbyist audience, revolutionary) ESP8266 microcontroller.

A behemoth among microcontrollers, the ESP32’s specs include everything but the kitchen sink. It is a system-on-a-chip (SoC) product and practically requires an operating system to make use of all its features.

This ESP32 tutorial will explain and solve a particular problem: sampling the analog-to-digital converter (ADC) from a timer interrupt. We will use the Arduino IDE. Even though it is one of the worst IDEs out there in terms of features, it is at least easy to set up and use for ESP32 development, and it has the largest collection of libraries for a variety of common hardware modules. However, we will also use many native ESP-IDF APIs instead of Arduino ones, for performance reasons.

ESP32 Audio: Timers and Interrupts

The ESP32 contains four hardware timers, divided into two groups. All timers are identical, with 16-bit prescalers and 64-bit counters. The prescale value divides the timer's input clock, an internal 80 MHz signal, so that the counter advances only on every Nth tick. The minimum prescale value is 2, which means interrupts can officially fire at 40 MHz at most. At that highest timer resolution, the handler code would have to execute in at most 6 clock cycles (240 MHz core / 40 MHz), so practical interrupt rates are much lower. Timers have several associated properties:

  • divider—the frequency prescale value
  • counter_en—whether the timer’s associated 64-bit counter is enabled (usually true)
  • counter_dir—whether the counter is incremented or decremented
  • alarm_en—whether the “alarm”, i.e. the counter’s action, is enabled
  • auto_reload—whether the counter is reset when the alarm is triggered

Some of the important distinct timer modes are:

  • The timer is disabled. The hardware is not ticking at all.
  • The timer is enabled, but the alarm is disabled. The timer hardware is ticking, it is optionally incrementing or decrementing the internal counter, but nothing else is happening.
  • The timer is enabled and its alarm is also enabled. Like before, but this time some action is performed when the timer counter reaches a particular, configured value: The counter is reset and/or an interrupt is generated.

Timers’ counters can be read by arbitrary code, but in most cases, we are interested in doing something periodically, and this means we will configure the timer hardware to generate an interrupt, and we will write code to handle it.
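The arithmetic behind the prescaler and alarm values can be checked with a few lines of C++. This is only a sketch of the relationships described above, assuming an 80 MHz timer clock, a 240 MHz core, and an auto-reloading alarm that fires once every alarm-value ticks. The constant and function names are made up for illustration and are not part of any ESP32 API.

```cpp
#include <cstdint>

// Clock rates as stated in the text (assumed, not read from hardware).
constexpr uint32_t kTimerBaseHz = 80000000;   // clock feeding the timers
constexpr uint32_t kCpuHz       = 240000000;  // CPU core clock

// Rate at which the timer counter ticks for a given prescale value.
constexpr uint32_t timerTickHz(uint32_t divider) {
  return kTimerBaseHz / divider;
}

// Interrupts per second when the alarm fires every `alarmTicks` counter ticks
// and the counter is reloaded automatically.
constexpr double interruptRateHz(uint32_t divider, uint64_t alarmTicks) {
  return static_cast<double>(timerTickHz(divider)) /
         static_cast<double>(alarmTicks);
}

// Upper bound on the CPU cycles available to the handler between interrupts.
constexpr double cyclesPerInterrupt(uint32_t divider, uint64_t alarmTicks) {
  return kCpuHz / interruptRateHz(divider, alarmTicks);
}
```

With a prescale value of 80, timerTickHz(80) is 1 MHz, and an alarm value of 45 gives interruptRateHz(80, 45) of roughly 22.2 kHz, leaving about 10,800 CPU cycles per interrupt. At the minimum prescale value of 2 with an alarm on every tick, the budget shrinks to the 6 cycles mentioned earlier.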

An interrupt handler function must finish before the next interrupt is generated, which gives us a hard upper limit on how complex the function can get. Generally, an interrupt handler should do the least amount of work it can.

To achieve anything remotely complex, it should instead set a flag which is checked by non-interrupt code. Any kind of I/O more complex than reading or setting a single pin to a single value is often better offloaded to a separate handler.
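In its simplest form, the flag pattern might look like the sketch below. It is a host-compilable illustration rather than ESP32 code: all names are hypothetical, the "interrupt handler" is an ordinary function, and on real hardware it would also carry the IRAM_ATTR specifier used in the timer code later in this article.

```cpp
#include <cstdint>

// Shared between the interrupt handler and the main loop, hence volatile.
volatile bool workPending = false;
volatile uint32_t missedEvents = 0;

// Stand-in for the interrupt handler: it only records that something happened.
void onTimerFlagOnly() {
  if (workPending) {
    // The main loop has not consumed the previous event yet.
    missedEvents = missedEvents + 1;
  }
  workPending = true;
}

// Called repeatedly from non-interrupt code (e.g. Arduino's loop()).
// Returns true if there was work to do.
bool pollForWork() {
  if (!workPending) {
    return false;
  }
  workPending = false;
  // The complex, slow processing would go here, outside interrupt context.
  return true;
}
```

Polling a flag keeps the handler trivially short, but the main loop has to check it often enough, and events are lost if it does not. A task notification lets the scheduler do the waiting instead.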

In the ESP-IDF environment, the FreeRTOS function vTaskNotifyGiveFromISR() can be used to notify a task that the interrupt handler (also called the Interrupt Service Routine, or ISR) has something for it to do. The code looks like this:

portMUX_TYPE DRAM_ATTR timerMux = portMUX_INITIALIZER_UNLOCKED; 
TaskHandle_t complexHandlerTask;
hw_timer_t * adcTimer = NULL; // our timer

void complexHandler(void *param) {
  while (true) {
    // Sleep until the ISR notifies us, or time out after 1 second
    uint32_t tcount = ulTaskNotifyTake(pdFALSE, pdMS_TO_TICKS(1000));
    if (tcount > 0) {
      // Notified by the ISR (not a timeout): do something complex and CPU-intensive
    }
  }
}

void IRAM_ATTR onTimer() {
  // A critical section (spinlock) protects the handler from reentry (which shouldn't happen, but just in case)
  portENTER_CRITICAL_ISR(&timerMux);

  // Do something, e.g. read a pin.
  
  if (some_condition) { 
    // Notify complexHandlerTask that there is work to do.
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    vTaskNotifyGiveFromISR(complexHandlerTask, &xHigherPriorityTaskWoken);
    if (xHigherPriorityTaskWoken) {
      portYIELD_FROM_ISR();
    }
  }
  portEXIT_CRITICAL_ISR(&timerMux);
}

void setup() {
  xTaskCreate(complexHandler, "Handler Task", 8192, NULL, 1, &complexHandlerTask);
  adcTimer = timerBegin(3, 80, true); // 80 MHz / 80 = 1 MHz hardware clock for easy figuring
  timerAttachInterrupt(adcTimer, &onTimer, true); // Attaches the handler function to the timer 
  timerAlarmWrite(adcTimer, 45, true); // Interrupts when counter == 45, i.e. about 22,222 times a second
  timerAlarmEnable(adcTimer);
}

Note: Functions used in the code throughout this article are documented with the ESP-IDF API and at the ESP32 Arduino core GitHub project.

CPU Caches and the Harvard Architecture

A very important thing to notice is the IRAM_ATTR clause in the definition of the onTimer() interrupt handler. The reason for this is that the CPU cores can only execute instructions (and access data) from the embedded RAM, not from the flash storage where the program code and data are normally stored. To get around this, a part of the total 520 KiB of RAM is dedicated as IRAM, a 128 KiB cache used to transparently load code from flash storage. The ESP32 uses separate buses for code and data (“Harvard architecture”) so they are very much handled separately, and that extends to memory properties: IRAM is special, and can only be accessed at 32-bit address boundaries.

In fact, ESP32 memory is very non-uniform. Different regions of it are dedicated to different purposes: the largest contiguous region is around 160 KiB in size, and all the “normal” memory accessible by user programs totals only around 316 KiB.

Loading data from flash storage is slow and can require SPI bus access, so any code which relies on speed must take care to fit into the IRAM cache, whose usable portion is often much smaller (less than 100 KiB) since a part of it is used by the operating system. Notably, the system will generate an exception if interrupt handler code is not loaded into the cache when an interrupt occurs. It would be both very slow and a logistical nightmare to load something from flash storage just as an interrupt happens. The IRAM_ATTR specifier on the onTimer() handler tells the compiler and linker to mark this code as special: it will be statically placed in IRAM and never swapped out.

However, the IRAM_ATTR specifier applies only to the function it is specified on. Any functions called from that function are not affected and need the specifier themselves.
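As a rough illustration of that rule, the sketch below marks both a handler and the helper it calls. The function names and the gain table are hypothetical. IRAM_ATTR and DRAM_ATTR are provided by the ESP32 SDK; the empty fallback definitions exist only so the sketch also compiles on a desktop compiler. Placing the constant table in RAM with DRAM_ATTR follows the same reasoning as for code, on the assumption that constant data may otherwise end up in flash.

```cpp
#include <cstdint>

// On the ESP32 these macros come from the SDK. The empty fallbacks below
// are only for building this sketch on a host machine.
#ifndef IRAM_ATTR
#define IRAM_ATTR
#endif
#ifndef DRAM_ATTR
#define DRAM_ATTR
#endif

// Hypothetical lookup table read from the handler; kept in RAM, not flash.
DRAM_ATTR static const uint8_t kGainTable[4] = {1, 2, 4, 8};

// Helper called from the handler: it needs its own IRAM_ATTR, because the
// specifier on the handler does not extend to the functions it calls.
static uint16_t IRAM_ATTR scaleSample(uint16_t raw, uint8_t gainIndex) {
  return static_cast<uint16_t>(raw * kGainTable[gainIndex & 3]);
}

volatile uint16_t lastSample = 0;

// Stand-in for a timer handler; the constant 100 replaces a real reading.
void IRAM_ATTR onTimerSketch() {
  lastSample = scaleSample(100, 2);
}
```

The practical consequence is that every function on the handler's call path has to be audited, including library functions whose source you did not write.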

Sampling ESP32 Audio Data from a Timer Interrupt

The usual way audio signals are sampled from an interrupt involves maintaining a memory buffer of samples, filling it in with sampled data, and then notifying a handler task that data is available.

The ESP-IDF documents the adc1_get_raw() function which measures data on a particular ADC channel on the first ADC peripheral (the second one is used by WiFi). However, using it in the timer handler code results in an unstable program, because it is a complex function which calls a non-trivial number of other IDF functions—in particular the ones which deal with locks—and neither adc1_get_raw() nor the functions it calls are marked with IRAM_ATTR. The interrupt handler will crash as soon as a large enough piece of code gets executed that would cause the ADC functions to be swapped out of IRAM—and this may be the WiFi-TCP/IP-HTTP stack, or the SPIFFS file system library, or anything else.

Note: Some IDF functions are specially crafted (and marked with IRAM_ATTR) so that they can be called from interrupt handlers. The vTaskNotifyGiveFromISR() function from the example above is one such function.

The most IDF-friendly way to get around this is for the interrupt handler to notify a task when an ADC sample needs to be taken, and have this task do the sampling and buffer management, with possibly another task being used for data analysis (or compression or transmission or whatever the case may be). Unfortunately, this is extremely inefficient. Both the handler side (which notifies a task that there’s work to be done) and the task side (which picks up the work) involve interactions with the operating system and thousands of instructions being executed. This approach, while theoretically correct, can bog down the CPU so much that it leaves little spare CPU power for other tasks.

Digging through IDF Source Code

Sampling data from an ADC is usually a simple task, so the next strategy is to see how the IDF does it, and replicate it in our code directly, without calling the provided API. The adc1_get_raw() function is implemented in the rtc_module.c file of the IDF, and of the eight or so things it does, only one is actually sampling the ADC, which is done by a call to adc_convert(). Luckily, adc_convert() is a simple function which samples the ADC by manipulating peripheral hardware registers via a global structure named SENS.

Adapting this code so it works in our program (and to mimic the behavior of adc1_get_raw()) is easy. It looks like this:

int IRAM_ATTR local_adc1_read(int channel) {
    uint16_t adc_value;
    SENS.sar_meas_start1.sar1_en_pad = (1 << channel); // only one channel is selected
    while (SENS.sar_slave_addr1.meas_status != 0);
    SENS.sar_meas_start1.meas1_start_sar = 0;
    SENS.sar_meas_start1.meas1_start_sar = 1;
    while (SENS.sar_meas_start1.meas1_done_sar == 0);
    adc_value = SENS.sar_meas_start1.meas1_data_sar;
    return adc_value;
}

The next step is to include the relevant headers so the SENS variable becomes available:

#include <soc/sens_reg.h>
#include <soc/sens_struct.h>

Finally, since adc1_get_raw() performs some configuration steps before sampling the ADC, it should be called directly once, just after the ADC is set up and before the timer is started. That way the relevant configuration is already in place when the interrupt handler begins calling our custom function.

The downside of this approach is that it doesn’t play nice with other IDF functions. As soon as some other peripheral, driver, or a random piece of code is called which resets the ADC configuration, our custom function will no longer work correctly. At least WiFi, PWM, I2C, and SPI do not influence the ADC configuration. In case something does influence it, a call to adc1_get_raw() will configure ADC appropriately again.

ESP32 Audio Sampling: The Final Code

With the local_adc1_read() function in place, our timer handler code looks like this:

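#include <driver/adc.h> // for ADC1_CHANNEL_0

#define ADC_SAMPLES_COUNT 1000
int16_t abuf[ADC_SAMPLES_COUNT]; // sample buffer filled by the ISR
int16_t abufPos = 0;             // next write position in abuf
TaskHandle_t adcTaskHandle;      // handle of the task notified when the buffer is full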

void IRAM_ATTR onTimer() {
  portENTER_CRITICAL_ISR(&timerMux);

  abuf[abufPos++] = local_adc1_read(ADC1_CHANNEL_0);
  
  if (abufPos >= ADC_SAMPLES_COUNT) { 
    abufPos = 0;

    // Notify adcTask that the buffer is full.
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    vTaskNotifyGiveFromISR(adcTaskHandle, &xHigherPriorityTaskWoken);
    if (xHigherPriorityTaskWoken) {
      portYIELD_FROM_ISR();
    }
  }
  portEXIT_CRITICAL_ISR(&timerMux);
}

Here, adcTaskHandle is the handle of the FreeRTOS task that would be implemented to process the buffer, following the structure of the complexHandler function in the first code snippet. It would make a local copy of the audio buffer, and could then process it at its leisure. For example, it might run an FFT algorithm on the buffer, or it could compress it and transmit it over WiFi.
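One possible refinement, not part of the original code, is to double-buffer the samples so that the handler never writes into the half the task is still reading. The sketch below is a hypothetical, host-compilable illustration of that idea; the PingPongBuffer type and its members are invented for this example. On the ESP32, push() would be called from onTimer() and would need IRAM_ATTR, and a true return value would be the point at which the task is notified.

```cpp
#include <cstddef>
#include <cstdint>

constexpr size_t kSamplesPerBuffer = 1000;  // mirrors ADC_SAMPLES_COUNT

struct PingPongBuffer {
  int16_t data[2][kSamplesPerBuffer] = {};
  volatile size_t writeIndex = 0;  // which half (0 or 1) is being filled
  volatile size_t pos = 0;         // next write position within that half

  // Called once per sample from the handler. Returns true when the current
  // half has just been filled and the halves have been swapped.
  bool push(int16_t sample) {
    data[writeIndex][pos] = sample;
    size_t next = pos + 1;
    if (next < kSamplesPerBuffer) {
      pos = next;
      return false;
    }
    pos = 0;
    writeIndex = 1 - writeIndex;  // start filling the other half
    return true;
  }

  // Called from the task after a notification: the half that was completed
  // most recently, which the handler is not writing to right now.
  const int16_t *completed() const {
    return data[1 - writeIndex];
  }
};
```

With the timer settings from the first snippet (a 1 MHz tick and an alarm value of 45), each 1,000-sample half takes about 45 ms to fill. That is the time the task has to finish with the completed half before the handler starts overwriting it.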

Paradoxically, using the Arduino API instead of ESP-IDF API (i.e. analogRead() instead of adc1_get_raw()) would work because the Arduino functions are marked with IRAM_ATTR. However, they are much slower than the ESP-IDF ones since they provide a higher level of abstraction. Speaking of performance, our custom ADC read function is about twice as fast as the ESP-IDF one.

ESP32 Projects: To OS or Not to OS

What we did here—reimplementing an API of the operating system to get around some problems which wouldn’t even be there if we didn’t use an operating system—is a good illustration of the pros and cons of using an operating system in the first place.

Smaller microcontrollers are programmed directly, sometimes in assembly language, and developers have complete control over every aspect of the program’s execution, down to every single CPU instruction and the state of every peripheral on the chip. This can naturally become tedious as the program gets larger and as it uses more and more hardware. A complex microcontroller such as the ESP32, with a large set of peripherals, two CPU cores, and a complex, non-uniform memory layout, would be challenging and laborious to program from scratch.

While every operating system places some limits and requirements on the code which uses its services, the benefits are usually worth it: faster and simpler development. However, sometimes we can, and in the embedded space often should, work around the operating system.

Understanding the basics

What is the ESP32 used for?

ESP32 is a microcontroller with WiFi and Bluetooth used to create IoT products. It is a powerful device with a dual-core CPU and a large set of features including hardware cryptographic offloading, 520 KiB RAM, and a 12-bit ADC. It is used in complex products where its feature set makes development more effective.

What devices use Espressif?

Espressif is a company whose most well-known products are the ESP8266 and ESP32 WiFi-capable microcontrollers. Such products are used in IoT devices which can benefit from their large feature set.

What is a microcontroller timer interrupt?

A timer is a microcontroller peripheral (internal module), paired with an internal or external clock signal, which increments or decrements a value on every clock tick. Timers can generate an interrupt after a certain number has been counted, which causes a piece of code to be executed.

What is ESP programming?

ESP32 is a WiFi and Bluetooth enabled microcontroller used to create IoT products. It is a powerful device with a dual-core CPU and a large set of features. Writing ESP32 firmware usually relies on the vendor-provided development framework named ESP-IDF, which is implemented on top of FreeRTOS.

How does a system-on-a-chip work?

System-on-a-chip is an approach to building hardware that places many (or all) of the components used to construct a working computer in a single chip package. Exact specifications vary and may or may not include system memory, graphics processors, I/O controllers, network controllers, flash memory, and others.