MicroAI
Introduction
The MicroAI Library provides APIs to interact with trained Machine Learning models, especially to run inferences.
Usage
The MicroAI Library is provided as a Foundation Library.
To use the MicroAI Library, add the following line to the project build file:
implementation("ej.api:microai:2.1.0")
Building or running an Application that uses the MicroAI Library requires an SDK6 VEE Port that provides the MicroAI Pack.
Machine Learning Model Format
The MicroAI API is designed to be framework-agnostic, meaning it does not rely on a specific Machine Learning framework like TensorFlow or ONNX.
The Machine Learning framework is integrated at the VEE Port level, as a C/C++ library.
The Application is responsible for loading the model file. Therefore, before developing an Application with MicroAI, check which model file formats are supported by your target VEE Port.
MicroEJ Simulator
If you need to use the MicroEJ Simulator, you must use a model in TensorFlow Lite for Microcontrollers (TFLM) format. Other model formats are not compatible with the MicroEJ Simulator and cannot be executed within it.
TensorFlow Lite for Microcontrollers supports a limited subset of TensorFlow operations, which restricts the model architectures that can be run. The list of supported operators corresponds to the content of the all_ops_resolver.cc file.
APIs
MLInferenceEngine
The first action when working with MicroAI is to load the trained Machine Learning model using the MLInferenceEngine class.
There are two ways to load a model:
From an application resource, with the MLInferenceEngine(String modelPath, int inferenceMemoryPoolSize) constructor.
From an InputStream, with the MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize) constructor.
The MLInferenceEngine constructor will:
Map the model into a native data structure.
Build an interpreter to run the model with.
Allocate memory for the model’s tensors.
Using an Input Stream Model
When using MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize), the model is loaded inside the MicroAI heap. The size of the MicroAI heap is defined by the microai.heap.size Application Option described in the Configuration section below.
Note that the call to MLInferenceEngine(InputStream is, int inferenceMemoryPoolSize) blocks until the model is completely retrieved and loaded.
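For example, the following minimal sketch loads the model from an InputStream opened on an application resource. The resource name and the MyApp class are placeholders, and MEMORY_POOL_SIZE is the same kind of constant as in the examples below:
try (InputStream is = MyApp.class.getResourceAsStream("/model.tflite"); // Open a stream on the model file (any InputStream source works).
        MLInferenceEngine mlInferenceEngine = new MLInferenceEngine(is, MEMORY_POOL_SIZE)) { // Blocks until the model is fully copied into the MicroAI heap.
    // Use mlInferenceEngine as in the examples below: get tensors, fill input data, run inferences.
}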
Using an Inference Memory Pool
When using TensorFlow Lite, tensors are allocated dynamically in the system heap.
However, when using TensorFlow Lite for Microcontrollers, you must configure an inferenceMemoryPoolSize, also called the Arena Size, in which all the input, output, and intermediate tensors are allocated. This helps achieve deterministic memory usage.
To determine the minimal value that can be set, run the Application on the Simulator with a value large enough for the MLInferenceEngine initialization to succeed. At that point, the following log is printed and helps fine-tune the inferenceMemoryPoolSize value:
[microai mock] MicroInterpreter uses 1112 bytes, use this value to optimize the Arena Size
Note: this log output is specific to the backend used.
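As a sketch of this sizing workflow (the values below are illustrative, not recommended settings):
// 1. Start with a generously sized memory pool so that tensor allocation succeeds.
private static final int MEMORY_POOL_SIZE = 32 * 1024; // Illustrative first guess.
// 2. Run the Application on the Simulator and read the reported usage in the log above.
// 3. Reduce MEMORY_POOL_SIZE to the reported value (optionally plus a small safety margin) and run again to confirm.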
Code Example
Once initialized, the MLInferenceEngine gives access to the input and output tensors of the trained model and can run inferences on it.
For example, the following snippet loads a trained model from the application resources and runs an inference on it:
try (MLInferenceEngine mlInferenceEngine = new MLInferenceEngine("/model.tflite", MEMORY_POOL_SIZE)) { // Initialize the inference engine.
    InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
    /*
     * Fill the input tensor
     */
    mlInferenceEngine.run(); // Run an inference on the trained model.
    OutputTensor outputTensor = mlInferenceEngine.getOutputTensor(0); // Get the output tensor of the trained model.
    /*
     * Process output data
     */
}
Tensor
Tensor parameters can be retrieved from the Tensor class.
It provides useful information such as the data type, the number of dimensions, the number of elements, the size in bytes, and the quantization parameters.
There are two kinds of tensors:
InputTensor: Offers services to load input data inside MicroAI input tensors before running an inference. Tensor input data must be one of the types supported by MicroAI (see Tensor.DataType).
OutputTensor: Offers services to retrieve output data from MicroAI output tensors after running an inference. Tensor output data must be one of the types supported by MicroAI (see Tensor.DataType).
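For instance, the following sketch prints some of this information for the first input tensor of an already initialized engine, using only the services shown on this page:
InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
Tensor.QuantizationParameters quantizationParameters = inputTensor.getQuantizationParams(); // Get its quantization parameters.
System.out.println("Input tensor: " + inputTensor.getNumberElements() + " elements, scale=" // Print the number of elements
        + quantizationParameters.getScale() + ", zero point=" + quantizationParameters.getZeroPoint()); // and the quantization parameters.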
Classes Summary
Main classes:
MLInferenceEngine: Loads a model, gets its tensors, and runs inferences on it.
Tensor: Retrieves tensor information.
InputTensor: Loads input data before running an inference.
OutputTensor: Retrieves output data after running an inference.
Stateless and immutable classes:
Tensor.DataType: Enumerates MicroAI data types.
Tensor.QuantizationParameters: Represents the quantization parameters of a tensor.
Configuration
The MicroAI Pack can be configured by defining the following Application Options:
microai.heap.size
: defines the size of the MicroAI heap, in which the InputStream models are allocated.
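For example, to set a 64 KB MicroAI heap (illustrative value), add the following line to the Application Options, typically in the configuration/common.properties file of an SDK6 Application project:
microai.heap.size=65536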
Example
For example, the following snippet runs an inference on a model that takes one quantized element as input and outputs one float value:
try (MLInferenceEngine mlInferenceEngine = new MLInferenceEngine("/model.tflite", MEMORY_POOL_SIZE)) { // Initialize the inference engine.
    InputTensor inputTensor = mlInferenceEngine.getInputTensor(0); // Get the input tensor of the trained model.
    byte[] inputData = new byte[inputTensor.getNumberElements()]; // Create an array that fits the size of the input tensor.
    // Fill inputData with a quantized value.
    float realValue = 10f;
    Tensor.QuantizationParameters quantizationParameters = inputTensor.getQuantizationParams(); // Get the quantization parameters.
    inputData[0] = (byte) (realValue / quantizationParameters.getScale() + quantizationParameters.getZeroPoint()); // Quantize the input value.
    inputTensor.setInputData(inputData); // Load the input data inside the MicroAI input tensor.
    mlInferenceEngine.run(); // Run an inference on the trained model.
    OutputTensor outputTensor = mlInferenceEngine.getOutputTensor(0); // Get the output tensor of the trained model.
    float[] outputData = new float[outputTensor.getNumberElements()]; // Create an array that fits the size of the output tensor.
    // Retrieve and print the inference result.
    outputTensor.getOutputData(outputData); // Retrieve the output data from the MicroAI output tensor.
    System.out.println("Inference result with " + realValue + " input is " + outputData[0]);
}