MicroAI
Introduction
The MicroAI Library provides APIs to interact with trained Machine Learning models, especially to run inferences.
Usage
The MicroAI Library is provided as a Foundation Library.
To use the MicroAI Library, add the following line to the project build file:
implementation("ej.api:microai:2.1.0")
Building or running an Application that uses the MicroAI Library requires an SDK6 VEE Port that provides the MicroAI Pack.
Machine Learning Model Format
The MicroAI API is designed to be framework-agnostic, meaning it does not rely on a specific Machine Learning framework like TensorFlow or ONNX.
The Machine Learning framework is integrated at the VEE Port level, as a C/C++ library.
The Application is responsible for loading the model file. Therefore, before developing an Application with MicroAI, check which model file formats are supported by your target VEE Port.
MicroEJ Simulator
If you need to use the MicroEJ Simulator, you must use a model in TensorFlow Lite for Microcontrollers (TFLM) format. Other model formats are not compatible with the MicroEJ Simulator and cannot be executed within it.
TensorFlow Lite for Microcontrollers supports a limited subset of TensorFlow operations, which restricts the model architectures that can be run. The list of supported operators corresponds to the content of the all_ops_resolver.cc file.
APIs
MLInferenceEngine
The first action when working with MicroAI is to load the trained Machine Learning model using the MLInferenceEngine class.
There are two ways to load a model:
From an application resource, with the MLInferenceEngine(String modelPath, int inferenceMemoryPoolSize) constructor.
From an InputStream, with the MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize) constructor.
The MLInferenceEngine constructor will:
Map the model into a native data structure.
Build an interpreter to run the model with.
Allocate memory for the model’s tensors.
Using an Input Stream Model
When using MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize), the model is loaded inside the MicroAI heap. The size of the MicroAI heap is defined by the microai.heap.size Application Option described in the Configuration section below.
Note that the call to MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize) blocks until the model is completely retrieved and loaded.
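For example, the following minimal sketch loads the model from an InputStream opened on an application resource. The resource name and the MyApp class are placeholders, and MEMORY_POOL_SIZE is the same kind of constant as in the examples below:
try (InputStream is = MyApp.class.getResourceAsStream("/model.tflite"); // Open a stream on the model file (any InputStream source works).
        MLInferenceEngine mlInferenceEngine = new MLInferenceEngine(is, MEMORY_POOL_SIZE)) { // Blocks until the model is fully copied into the MicroAI heap.
    // Use mlInferenceEngine as in the examples below: get tensors, fill input data, run inferences.
}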
Using an Inference Memory Pool
When using TensorFlow Lite, tensors are allocated dynamically in the system heap.
However, when using TensorFlow Lite for Microcontrollers, you must configure an inferenceMemoryPoolSize, also called the Arena Size, in which all the input, output, and intermediate tensors are allocated. This helps achieve deterministic memory usage.
To determine the minimal value that can be set, run the Application on the Simulator with a value large enough for the MLInferenceEngine initialization to succeed. At that point, the following log is printed and helps fine-tune the inferenceMemoryPoolSize value:
[microai mock] MicroInterpreter uses 1112 bytes, use this value to optimize the Arena Size
Note: this log output is specific to the backend used.
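As a sketch of this sizing workflow (the values below are illustrative, not recommended settings):
// 1. Start with a generously sized memory pool so that tensor allocation succeeds.
private static final int MEMORY_POOL_SIZE = 32 * 1024; // Illustrative first guess.
// 2. Run the Application on the Simulator and read the reported usage in the log above.
// 3. Reduce MEMORY_POOL_SIZE to the reported value (optionally plus a small safety margin) and run again to confirm.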
Code Example
Once initialized, the MLInferenceEngine gives access to the input and output tensors of the trained model and can run inferences on it.
For example, the following snippet loads a trained model from the application resources and runs an inference on it:
try (MLInferenceEngine mlInferenceEngine = new MLInferenceEngine("/model.tflite", MEMORY_POOL_SIZE)) { // Initialize the inference engine.
    InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
    /*
     * Fill the input tensor
     */
    mlInferenceEngine.run(); // Run an inference on the trained model.
    OutputTensor outputTensor = mlInferenceEngine.getOutputTensor(0); // Get the output tensor of the trained model.
    /*
     * Process output data
     */
}
Tensor
Tensor parameters can be retrieved from the Tensor class.
It provides useful information such as the data type, the number of dimensions, the number of elements, the size in bytes, and the quantization parameters.
There are two kinds of tensors:
InputTensor: Offers services to load input data inside MicroAI input tensors before running an inference. Tensor input data must be one of the types supported by MicroAI (see Tensor.DataType).
OutputTensor: Offers services to retrieve output data from MicroAI output tensors after running an inference. Tensor output data must be one of the types supported by MicroAI (see Tensor.DataType).
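For instance, the following sketch prints some of this information for the first input tensor of an already initialized engine, using only the services shown on this page:
InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
Tensor.QuantizationParameters quantizationParameters = inputTensor.getQuantizationParams(); // Get its quantization parameters.
System.out.println("Input tensor: " + inputTensor.getNumberElements() + " elements, scale=" // Print the number of elements
        + quantizationParameters.getScale() + ", zero point=" + quantizationParameters.getZeroPoint()); // and the quantization parameters.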
Classes Summary
Main classes:
MLInferenceEngine: Loads a model, gets its tensors, and runs inferences on it.
Tensor: Retrieves tensor information.
InputTensor: Loads input data before running an inference.
OutputTensor: Retrieves output data after running an inference.
Stateless and immutable classes:
Tensor.DataType: Enumerates MicroAI data types.
Tensor.QuantizationParameters: Represents the quantization parameters of a tensor.
Configuration
The MicroAI Pack can be configured by defining the following Application Options:
microai.heap.size
: defines the size of the MicroAI heap, in which the InputStream models are allocated.
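For example, to set a 64 KB MicroAI heap (illustrative value), add the following line to the Application Options, typically in the configuration/common.properties file of an SDK6 Application project:
microai.heap.size=65536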
Example
For example, the following snippet runs an inference on a model that takes one quantized element as input and outputs one float value:
try (MLInferenceEngine mlInferenceEngine = new MLInferenceEngine("/model.tflite", MEMORY_POOL_SIZE)) { // Initialize the inference engine.
    InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
    byte[] inputData = new byte[inputTensor.getNumberElements()]; // Create an array that fits the size of the input tensor.
    // Fill inputData with a quantized value.
    float realValue = 10f;
    Tensor.QuantizationParameters quantizationParameters = inputTensor.getQuantizationParams(); // Get the quantization parameters.
    inputData[0] = (byte) (realValue / quantizationParameters.getScale() + quantizationParameters.getZeroPoint()); // Quantize the input value.
    inputTensor.setInputData(inputData); // Load the input data inside the MicroAI input tensor.
    mlInferenceEngine.run(); // Run an inference on the trained model.
    OutputTensor outputTensor = mlInferenceEngine.getOutputTensor(0); // Get the output tensor of the trained model.
    float[] outputData = new float[outputTensor.getNumberElements()]; // Create an array that fits the size of the output tensor.
    // Retrieve and print the inference result.
    outputTensor.getOutputData(outputData); // Retrieve the output data from the MicroAI output tensor.
    System.out.println("Inference result with " + realValue + " input is " + outputData[0]);
}