Audio

Introduction

The Audio API Library provides APIs to record and play audio in an Application.

This API can be used in various use-cases, such as:

  • playing a sound

  • playing music streamed over IP or Bluetooth

  • making a call

  • synthesizing speech (text-to-speech)

  • recognizing speech (speech-to-text)

  • using a voice assistant such as Alexa or ChatGPT

Usage

The Audio API Library is provided as a Foundation Library.

To use the Audio API Library, add the following line to the project build file:

implementation("ej.api:audio:1.0.0")

Building or running an Application which uses the Audio API Library requires the VEE Port to provide the Audio Pack.

APIs

This section explains how audio concepts are represented in the Audio API.

Audio Format

When opening an audio stream, it is necessary to provide the format of the audio data. In the Audio API, the format is specified by the AudioFormat class.

The main property of an audio format is the encoding used to represent the data. Depending on the encoding, a format may be described by additional parameters, such as the sample rate or the number of channels.

Although many audio encoding standards exist, the Audio API only provides a single AudioFormat implementation, PcmAudioFormat, which uses the PCM encoding. The Application can define additional audio formats using other encodings, but these encodings must be supported by the VEE Port to be used by the Application.

For example, the following snippet defines the “PCM 16kHz mono 16-bit little-endian signed” audio format:

AudioFormat FORMAT = new PcmAudioFormat(16_000, 1, 16, false, true); // 16 kHz, mono, 16 bits per sample, little-endian, signed

Audio Recording

An audio recording stream can be opened by creating an AudioRecord instance. When creating an audio record, the format and the size of the native audio buffer must be provided. Since creating an AudioRecord instance allocates native resources, it should be closed with the AudioRecord.close() method in order to free these resources.

While the audio record is open, the native implementation records audio data continuously from the input device and writes it in the buffer. The AudioRecord.readBuffer() method can be used to retrieve and remove a chunk of audio data from the buffer. This method blocks until the requested size has been read or until the audio record is closed. If the audio data is not read fast enough by the application, the native implementation will discard the oldest audio data from the buffer.

For example, the following snippet records audio with an audio record and writes the audio data to a file:

try (OutputStream outputStream = new FileOutputStream("record.raw")) {
    try (AudioRecord audioRecord = new AudioRecord(FORMAT, 1600)) {
        byte[] buffer = new byte[480];
        while (true) {
            int bytesRead = audioRecord.readBuffer(buffer, 0, buffer.length); // read from audio record
            outputStream.write(buffer, 0, bytesRead); // write to file
        }
    }
}

Note

In order to avoid discontinuities in the recorded data, it is recommended to have a dedicated thread reading the buffer of the audio record. This should not be done in the UI thread as reading is a blocking operation that would prevent the UI thread from performing other tasks.
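A minimal sketch of this pattern is shown below, reusing the FORMAT constant and the buffer sizes from the previous snippet; the thread name and the error handling are illustrative and should be adapted to the Application:

Thread recordingThread = new Thread(new Runnable() {
    @Override
    public void run() {
        try (OutputStream outputStream = new FileOutputStream("record.raw");
                AudioRecord audioRecord = new AudioRecord(FORMAT, 1600)) {
            byte[] buffer = new byte[480];
            while (true) {
                int bytesRead = audioRecord.readBuffer(buffer, 0, buffer.length); // blocking read from audio record
                outputStream.write(buffer, 0, bytesRead); // write to file
            }
        } catch (Exception e) {
            // handle the error as appropriate for the Application
        }
    }
}, "audio-recording"); // illustrative thread name
recordingThread.start();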

Audio Playback

An audio playback stream can be opened by creating an AudioTrack instance. When creating an audio track, the format and the size of the native audio buffer must be provided. Since creating an AudioTrack instance allocates native resources, it should be closed with the AudioTrack.close() method in order to free these resources.

While the audio track is open, the native implementation reads audio data continuously from the buffer and plays it on the output device. The AudioTrack.writeBuffer() method can be used to write a chunk of audio data in the buffer. This method blocks until the requested size has been written or until this audio track is closed. If audio data is not written fast enough by the application, the output device may play undesired silences. The AudioTrack.waitForBufferFlushed() method can be used to wait until all the audio data written in the buffer has been played back. The volume of the playback can be configured by calling AudioTrack.setVolume().

For example, the following snippet reads audio data from a resource and plays the audio with an audio track:

try (InputStream inputStream = MyClass.class.getResourceAsStream("/track.raw")) {
    try (AudioTrack audioTrack = new AudioTrack(FORMAT, 1600)) {
        byte[] buffer = new byte[480];
        while (true) {
            int bytesRead = inputStream.read(buffer, 0, buffer.length); // read from resource
            if (bytesRead == -1) { // EOF
                break;
            }
            audioTrack.writeBuffer(buffer, 0, bytesRead); // write to audio track
        }
        audioTrack.waitForBufferFlushed(); // play remaining audio data before closing
    }
}

Note

In order to avoid discontinuities in the audio playback, it is recommended to have a dedicated thread writing the buffer of the audio track. This should not be done in the UI thread as writing is a blocking operation that would prevent the UI thread from performing other tasks.
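The same pattern applies to playback. The sketch below moves the playback loop from the previous snippet into a dedicated thread; the thread name and the error handling are illustrative and should be adapted to the Application:

Thread playbackThread = new Thread(new Runnable() {
    @Override
    public void run() {
        try (InputStream inputStream = MyClass.class.getResourceAsStream("/track.raw");
                AudioTrack audioTrack = new AudioTrack(FORMAT, 1600)) {
            byte[] buffer = new byte[480];
            while (true) {
                int bytesRead = inputStream.read(buffer, 0, buffer.length); // read from resource
                if (bytesRead == -1) { // EOF
                    break;
                }
                audioTrack.writeBuffer(buffer, 0, bytesRead); // blocking write to audio track
            }
            audioTrack.waitForBufferFlushed(); // play remaining audio data before closing
        } catch (Exception e) {
            // handle the error as appropriate for the Application
        }
    }
}, "audio-playback"); // illustrative thread name
playbackThread.start();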

Classes Summary

Main classes:

  • AudioRecord: opens an audio recording stream on the input device

  • AudioTrack: opens an audio playback stream on the output device

Stateless and immutable classes:

  • AudioFormat: describes the format of audio data

  • PcmAudioFormat: describes an audio format using the PCM encoding

Configuration

The Audio Pack can be configured by defining the following Application Options (an example follows the list):

  • audio.heap.size: defines the size of the Audio heap, in which the native buffers of the audio streams are allocated.

  • s3.audio.input.device: defines the name of the Audio input device to use when running the Application on Simulator.

  • s3.audio.output.device: defines the name of the Audio output device to use when running the Application on Simulator.
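For example, these options can be set in an Application Options properties file of the project. The sketch below uses illustrative values: the heap size and the device names must be adapted to the Application and to the Simulator host.

# Size of the Audio heap, in bytes (illustrative value)
audio.heap.size=32768

# Audio devices used when running on Simulator (illustrative, host-specific names)
s3.audio.input.device=default
s3.audio.output.device=default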

Examples

MicroEJ provides two examples which show how to use the Audio API: one example for audio recording and one for audio playback.

These examples can be found on GitHub. Please refer to the README of each example for more information.