Native Language Support (NLS)

Introduction

Native Language Support (NLS) helps an application support internationalization. It provides a way to manage messages and translate them into different languages. Each internationalized message is referenced by a key, which the application code uses instead of the message itself.

Principle

NLS is distributed as an add-on library containing a single Java interface: NLS.

In addition to that, the binary-nls library provides a factory for implementations of this interface: it uses an add-on processor which processes, offboard, the Localization Source Files into one BON resource buffer file for compactness.

During the clinit phase, this resource file is opened and the list of locales is parsed. After that, the resource remains open for the rest of the Application execution and is used directly to retrieve message translations for the supported locales.

Localization Source Files

Messages must be defined in localization source files, located in the Classpath of the application (i.e. in the src/main/resources folder).

Localization source files can be either PO files or Android String resources.

Here is an example of a PO file:

msgid "Label1"
msgstr "My label 1"

msgid "Label2"
msgstr "My label 2"

And here is an example of an Android String resource:

<resources>
   <string name="Label1">My label 1</string>
   <string name="Label2">My label 2</string>
</resources>

Hint

The Android String resources string arrays feature is also supported.

NLS List Files

Localization source files are declared in Classpath *.nls.list files (and also in *.externresources.list files for an external resource, see Application Resources and Loading Translations as an External Resource).

digraph D {

    internalNLS [shape=diamond, label="internal?"]
    NLSList [shape=box, label="*.nls.list"]
    NLSExt [shape=box, label="*.nls.list +\l*.externresources.list"]
    subgraph cluster_NLS {
        label ="NLS"
        internalNLS -> NLSList [label="yes"]
        internalNLS -> NLSExt [label="no=external"]
    }
}

The file format is a standard Java properties file. Each line is the Fully Qualified Name (FQN) of a Java interface that will be generated and used in the application. Example:

com.mycompany.myapp.Labels
com.mycompany.myapp.Messages

Usage

The binary-nls module must be added to the Application project build file:

implementation("com.microej.library.runtime:binary-nls:3.1.0")

This module includes an Add-On Processor which parses the localization source files. For each interface declared in the NLS list files, all the localization source files whose names start with the interface name are used to generate:

  • a Java interface with the given FQN, containing a field for each message of the localization source files

  • a NLS binary file containing the translations

So, in the example, the generated interface com.mycompany.myapp.Labels will gather all the translations from files named Labels* and located in any package of the Classpath. The names of the localization source files should be suffixed by their locale (for example Labels_en_US.po).

The generation is triggered when the application is built and after any change to a localization source file or a *.nls.list file. This keeps the Java interfaces up to date with the translations, so they can be used immediately.

Besides the message fields, the generated interface declares an NLS instance which is automatically created in the clinit of the interface.

Once the generation is done, the application can use the Java interfaces to get internationalized messages, for example:

String label = Labels.NLS.getMessage(Labels.Label1);

Locale

To select which of the available languages the application uses, call the setCurrentLocale(locale) method; the locale can be changed at any time. If no locale has been set when a message is requested, the translation for the first available locale in alphabetical order is used. You can also choose this default locale yourself by adding a com.microej.binarynls.defaultLocale property, followed by a locale name, in a .properties.list file.
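The selection rule described above can be sketched in plain Java. This is only a model of the documented behavior, not the binary-nls implementation: the class LocaleSelection, the method resolve and its parameters are hypothetical names introduced for illustration.

```java
import java.util.Arrays;

/** Hypothetical model of the locale selection rule (not the binary-nls code). */
final class LocaleSelection {

    /**
     * Returns the locale to use: the one set explicitly if any, else the
     * configured default locale if any, else the first available locale
     * in alphabetical order.
     */
    static String resolve(String[] availableLocales, String currentLocale, String defaultLocale) {
        if (currentLocale != null) {
            return currentLocale; // models a previous call to setCurrentLocale(locale)
        }
        if (defaultLocale != null) {
            return defaultLocale; // models the com.microej.binarynls.defaultLocale property
        }
        String[] sorted = availableLocales.clone();
        Arrays.sort(sorted);
        return sorted[0];
    }

    public static void main(String[] args) {
        String[] available = { "fr_FR", "en_US", "es_ES" };
        System.out.println(resolve(available, null, null));    // prints en_US
        System.out.println(resolve(available, null, "fr_FR")); // prints fr_FR
        System.out.println(resolve(available, "es_ES", "fr_FR")); // prints es_ES
    }
}
```

In an application, the same effect is obtained by calling setCurrentLocale(locale) on the NLS instance of a generated interface before requesting messages.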

Plural Forms

Version 4.0.0 of the NLS module and version 3.0.0 of the binary-nls module introduce support for GNU gettext’s plural form feature in PO files.

Warning

This feature concerns only PO files, not the quantity strings of Android String resources.

This allows the use of Plural-Forms header entries and of several msgstr entries per msgid (referred to as plural forms), as specified by gettext. You can then retrieve the correct message in a locale for a given count of things by using the ej.nls.NLS.getMessage() methods that take this count value as an argument.

If a message for a given msgid has a msgid_plural and plural forms in a PO file for an interface declared in an NLS list file, it must also have plural forms in all other PO files for this interface.

Note

Please note that one significant difference with gettext’s implementation is that the expression described in the plural field of the Plural-Forms header must be a valid Java expression returning an int, as opposed to a C expression. A usual case in which this makes a difference is for expressions that rely on boolean values being evaluated as zero or one in C, such as in:

"Plural-Forms: nplurals=2; plural=n != 1;\n"

This expression will not work with our implementation as Java does not interpret booleans as integers. An easy way to convert this expression would be:

"Plural-Forms: nplurals=2; plural=n != 1 ? 1 : 0;\n"

Also note that the validity of these provided expressions is not entirely checked. Providing an expression that is not valid Java or that would return an invalid plural form index would cause errors at runtime or even in the Java files generated by the Add-On Processor.
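To make the role of the plural expression concrete, here is a minimal sketch of how a Java expression such as n != 1 ? 1 : 0 selects one of the plural forms. The class PluralFormSketch and its methods are hypothetical and only model the behavior; they are not the code generated by the Add-On Processor.

```java
/** Hypothetical illustration of plural form selection (not generated code). */
final class PluralFormSketch {

    /** Java version of the gettext expression "plural=n != 1" with nplurals=2. */
    static int pluralIndex(int n) {
        return n != 1 ? 1 : 0;
    }

    /** Picks the plural form matching the count; forms[i] models msgstr[i]. */
    static String select(String[] forms, int n) {
        int index = pluralIndex(n);
        if (index < 0 || index >= forms.length) {
            // An expression returning an out-of-range index is an error.
            throw new IllegalArgumentException("Invalid plural form index: " + index);
        }
        return forms[index];
    }

    public static void main(String[] args) {
        String[] forms = { "file", "files" }; // msgstr[0], msgstr[1]
        System.out.println(select(forms, 1)); // prints file
        System.out.println(select(forms, 0)); // prints files
        System.out.println(select(forms, 5)); // prints files
    }
}
```

The ternary operator is what makes the expression return an int in Java, where the comparison alone would return a boolean.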

Missing Translations

By default, if a translation is missing for a given msgid in the PO file of a given language, the ej.nls.NLS.getMessage() method returns the msgid itself when the locale is set to this language. For an Android String resource (XML), the name attribute of the missing string element is returned. If returning this identifier is not suitable, you can set a fallback locale for an interface: the language in which a message is returned when it is not available in the current language.

Starting with version 2.5.0 of the binary-nls module, you can set this fallback locale by specifying a locale name in a .nls.list file, after the name of the interface you want this locale to be the fallback for, separated by a colon :. For example, with the following .nls.list file, if a translation is missing in a language for a message in the Labels and Messages PO/XML files, the message will be translated to en_US instead of just returning its msgid/name.

# Missing translations for Labels and Messages will fall back to en_US
com.mycompany.myapp.Labels:en_US
com.mycompany.myapp.Messages:en_US

As such, you can specify a different fallback locale for each interface in a .nls.list file. For example, with the following .nls.list file, the messages in Labels have no fallback language and return only the msgid/name if a translation is missing, while missing translations default to en_US for the messages in Messages, and to ja_JP for the messages in Content:

# Missing translations for Labels will fall back to their msgid/name
com.mycompany.myapp.Labels

# Missing translations for Messages will fall back to en_US
com.mycompany.myapp.Messages:en_US

# Missing translations for Content will fall back to ja_JP
com.mycompany.myapp.Content:ja_JP
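As an illustration of this line format, the sketch below splits a .nls.list line into an interface name and an optional fallback locale. The class NlsListEntry is a hypothetical helper written for this example; it is not part of the binary-nls module, whose actual parsing code is not shown in this document.

```java
/** Hypothetical parser for one line of a .nls.list file (illustration only). */
final class NlsListEntry {
    final String interfaceName;
    final String fallbackLocale; // null when no fallback locale is declared

    private NlsListEntry(String interfaceName, String fallbackLocale) {
        this.interfaceName = interfaceName;
        this.fallbackLocale = fallbackLocale;
    }

    /** Returns the parsed entry, or null for blank lines and '#' comments. */
    static NlsListEntry parse(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || trimmed.startsWith("#")) {
            return null;
        }
        int colon = trimmed.indexOf(':');
        if (colon < 0) {
            return new NlsListEntry(trimmed, null);
        }
        return new NlsListEntry(trimmed.substring(0, colon).trim(),
                trimmed.substring(colon + 1).trim());
    }

    public static void main(String[] args) {
        NlsListEntry entry = parse("com.mycompany.myapp.Content:ja_JP");
        System.out.println(entry.interfaceName);  // prints com.mycompany.myapp.Content
        System.out.println(entry.fallbackLocale); // prints ja_JP
    }
}
```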

In the case of a message with plural forms in PO files, this works much the same way, using the messages and forms in the fallback locale if available. If no fallback locale is specified or if the requested message is not specified in it, then the msgid will be used for a count value of 1, and the msgid_plural will be used for any other value, as gettext would function.
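The lookup order for a simple (non-plural) message can be summarized by the following sketch: current locale first, then the fallback locale if one is declared, then the msgid itself. The class FallbackLookupSketch and its in-memory maps are assumptions made for illustration; the real module reads translations from the binary resource.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of the missing-translation rule (not the binary-nls code). */
final class FallbackLookupSketch {
    // locale -> (msgid -> msgstr)
    private final Map<String, Map<String, String>> translations = new HashMap<>();
    private final String fallbackLocale; // null when no fallback is declared

    FallbackLookupSketch(String fallbackLocale) {
        this.fallbackLocale = fallbackLocale;
    }

    void put(String locale, String msgid, String msgstr) {
        translations.computeIfAbsent(locale, k -> new HashMap<>()).put(msgid, msgstr);
    }

    /** Current locale, then fallback locale, then the msgid itself. */
    String getMessage(String locale, String msgid) {
        String message = lookup(locale, msgid);
        if (message == null && fallbackLocale != null) {
            message = lookup(fallbackLocale, msgid);
        }
        return message != null ? message : msgid;
    }

    private String lookup(String locale, String msgid) {
        Map<String, String> messages = translations.get(locale);
        return messages == null ? null : messages.get(msgid);
    }

    public static void main(String[] args) {
        FallbackLookupSketch nls = new FallbackLookupSketch("en_US");
        nls.put("en_US", "Label1", "My label 1");
        nls.put("fr_FR", "Label2", "Mon label 2");
        System.out.println(nls.getMessage("fr_FR", "Label2")); // prints Mon label 2
        System.out.println(nls.getMessage("fr_FR", "Label1")); // prints My label 1
        System.out.println(nls.getMessage("fr_FR", "Label3")); // prints Label3
    }
}
```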

Converter

Problem

Translated messages can be used directly for the following purposes:

When displaying certain languages, such as Arabic, string analysis is necessary for character substitution and right-to-left (RTL) reading direction. Console encoding is required for proper display using EDC.

System.out.print("العربية");
'D91(J)

To render such a message correctly with MicroVG, the complex layout must be used. This means that the font must contain substitution tables that the rendering engine can read and apply. If these conditions are not met, the rendering may be incorrect. Note also that using a complex font has a cost in flash storage (larger TTF file and additional complex layout algorithms) and at run time (time required to apply the substitution tables).

../_images/microvg_not_converted_simple.png

It is not possible to render such a message with MicroUI: the Graphics Engine does not offer substitution table reading or bidirectional string management. The rendering is systematically wrong:

../_images/microui_not_converted.png

Solution

Since version 3.1.0, the binary-nls module features offboard translation conversion: the generated strings can be substituted and rearranged before being embedded in the executable.

This conversion enables MicroUI’s Graphics Engine to render complex strings correctly.

Warning

This offboard conversion only concerns PO files.

../_images/microui_not_converted.png

Hint

This also avoids embedding substitution tables and the complex layout management when the message is rendered with MicroVG.

Principle

Offboard conversion applies only to translated strings. All other fields, such as message identifiers and display names, are not converted because they are not intended to be rendered.

msgid "Arabic" // not converted
msgstr "العربية" // converted

Offboard conversion is not applied systematically, so it must be requested explicitly in the PO file. To do so, add Language-Converter: name_of_converter\n to the PO file’s header, where name_of_converter is the name of the converter to apply (see below for the list of available converters).

msgid ""
msgstr ""
"Language: ar_AR\n"
"Language-Team: العربية\n"
"Language-Converter: Arabic\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Arabic"
msgstr "العربية"

List of Converters

Bidi

This converter performs the bidirectional reordering of text, which is necessary to render Arabic or Hebrew text correctly. These languages are mixed-directional: numbers are ordered from left to right while most other text is ordered from right to left.

  • Example of PO file:

msgid ""
msgstr ""
"Language: bidi\n"
"Language-Team: Bidirectional\n"
"Language-Converter: Bidi\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello"
msgstr "‮Hello‬"
  • Result:

    • Unicodes before conversion: U+006f U+006c U+006c U+0065 U+0048

    • After reordering: U+0048 U+0065 U+006c U+006c U+006f

Arabic

This converter is dedicated to the Arabic language, which involves text-based shaping and bidirectional reordering of text. Text-based shaping is the process of replacing certain character code points in the text with others depending on the context: each Arabic letter is replaced by the presentation form (isolated, initial, medial or final) that matches its position in the word.

  • Example of PO file:

msgid ""
msgstr ""
"Language: ar_AR\n"
"Language-Team: العربية\n"
"Language-Converter: Arabic\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Arabic"
msgstr "العربية"
  • Result:

    • Unicodes before conversion: U+0627 U+0644 U+0639 U+0631 U+0628 U+064a U+0629

    • After text shaping: U+fe8d U+fedf U+fecc U+feae U+fe91 U+fef4 U+fe94

    • After reordering: U+fe94 U+fef4 U+fe91 U+feae U+fecc U+fedf U+fe8d

Hebrew

This converter is dedicated to the Hebrew language, which involves text-based shaping and bidirectional reordering of text. Not all point-letter combinations match a substituted Unicode character. The following table lists the supported combinations. For all other combinations (Niqqud), the point and the letter are rendered independently.

Point | Representation | Unicode | Letter | Representation | Unicode | Substitution (Unicode)
Sheva | ◌ְ | U+05B0 | | | |
Hataf Segol | ◌ֱ | U+05B1 | | | |
Hataf Patah | ◌ֲ | U+05B2 | | | |
Hataf Qamats | ◌ֳ | U+05B3 | | | |
Hiriq | ◌ִ | U+05B4 | Yod | י | U+05D9 | U+FB1D
Tsere | ◌ֵ | U+05B5 | | | |
Segol | ◌ֶ | U+05B6 | | | |
Patah | ◌ַ | U+05B7 | Alef | א | U+05D0 | U+FB2E
Qamats | ◌ָ | U+05B8 | Alef | א | U+05D0 | U+FB2F
Holam | ◌ֹ | U+05B9 | Vav | ו | U+05D5 | U+FB4B
Holam Haser (for Vav U+05D5) | ◌ֺ | U+05BA | | | |
Qubuts | ◌ֻ | U+05BB | | | |
Mapiq | ◌ּ | U+05BC | Alef | א | U+05D0 | U+FB30
Dagesh | ◌ּ | U+05BC | Bet | ב | U+05D1 | U+FB31
Dagesh | ◌ּ | U+05BC | Gimel | ג | U+05D2 | U+FB32
Dagesh | ◌ּ | U+05BC | Dalet | ד | U+05D3 | U+FB33
Mapiq | ◌ּ | U+05BC | He | ה | U+05D4 | U+FB34
Dagesh | ◌ּ | U+05BC | Vav | ו | U+05D5 | U+FB35
Dagesh | ◌ּ | U+05BC | Zayin | ז | U+05D6 | U+FB36
Dagesh | ◌ּ | U+05BC | Tet | ט | U+05D8 | U+FB38
Dagesh | ◌ּ | U+05BC | Yod | י | U+05D9 | U+FB39
Dagesh | ◌ּ | U+05BC | Final Kaf | ך | U+05DA | U+FB3A
Dagesh | ◌ּ | U+05BC | Kaf | כ | U+05DB | U+FB3B
Dagesh | ◌ּ | U+05BC | Lamed | ל | U+05DC | U+FB3C
Dagesh | ◌ּ | U+05BC | Mem | מ | U+05DE | U+FB3E
Dagesh | ◌ּ | U+05BC | Nun | נ | U+05E0 | U+FB40
Dagesh | ◌ּ | U+05BC | Samekh | ס | U+05E1 | U+FB41
Dagesh | ◌ּ | U+05BC | Final Pe | ף | U+05E3 | U+FB43
Dagesh | ◌ּ | U+05BC | Pe | פ | U+05E4 | U+FB44
Dagesh | ◌ּ | U+05BC | Tsadi | צ | U+05E6 | U+FB46
Dagesh | ◌ּ | U+05BC | Qof | ק | U+05E7 | U+FB47
Dagesh | ◌ּ | U+05BC | Resh | ר | U+05E8 | U+FB48
Dagesh | ◌ּ | U+05BC | Shin | ש | U+05E9 | U+FB49
Dagesh | ◌ּ | U+05BC | Tav | ת | U+05EA | U+FB4A
Meteg | ◌ֽ | U+05BD | | | |
Maqaf | ־ | U+05BE | | | |
Rafe | ◌ֿ | U+05BF | Bet | ב | U+05D1 | U+FB4C
Rafe | ◌ֿ | U+05BF | Kaf | כ | U+05DB | U+FB4D
Rafe | ◌ֿ | U+05BF | Pe | פ | U+05E4 | U+FB4E
Paseq | ׀ | U+05C0 | | | |
Shin Dot | ◌ׁ | U+05C1 | Shin | ש | U+05E9 | U+FB2A
Shin Dot | ◌ׁ | U+05C1 | Shin with Dagesh | | U+FB49 | U+FB2C
Sin Dot | ◌ׂ | U+05C2 | Sin | ש | U+05E9 | U+FB2B
Sin Dot | ◌ׂ | U+05C2 | Shin with Dagesh | | U+FB49 | U+FB2D
Sof Pasuq | ׃ | U+05C3 | | | |
Upper Dot | ◌ׄ | U+05C4 | | | |
Lower Dot | ◌ׅ | U+05C5 | | | |

  • Example of PO file:

msgid ""
msgstr ""
"Language: he\n"
"Language-Team: Hebrew\n"
"Language-Converter: Hebrew\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Man"
msgstr "אּישׁ"
  • Result:

    • Unicodes before conversion: U+05D0 U+05BC U+05D9 U+05E9 U+05C1

    • After text shaping: U+FB30 U+05D9 U+FB2A

    • After reordering: U+FB2A U+05D9 U+FB30
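The two steps of this example can be reproduced with the sketch below. It is a simplified model, not the converter shipped with binary-nls: the class HebrewConversionSketch is hypothetical, its substitution map holds only the two table entries needed here, and reordering is reduced to reversing a string that is assumed to be a single right-to-left run.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the Hebrew conversion steps. */
final class HebrewConversionSketch {

    // Two entries taken from the table above; key = (letter << 16) | point.
    private static final Map<Integer, Character> SUBSTITUTIONS = new HashMap<>();
    static {
        SUBSTITUTIONS.put(key((char) 0x05D0, (char) 0x05BC), (char) 0xFB30); // Alef + Mapiq
        SUBSTITUTIONS.put(key((char) 0x05E9, (char) 0x05C1), (char) 0xFB2A); // Shin + Shin Dot
    }

    private static int key(char letter, char point) {
        return (letter << 16) | point;
    }

    /** Replaces each supported letter+point pair with its substitution character. */
    static String shape(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            Character substitution = (i + 1 < text.length())
                    ? SUBSTITUTIONS.get(key(c, text.charAt(i + 1))) : null;
            if (substitution != null) {
                out.append(substitution.charValue());
                i += 2; // the letter and its point were both consumed
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    /** Simplification: a pure right-to-left run is reordered by reversing it. */
    static String reorder(String shaped) {
        return new StringBuilder(shaped).reverse().toString();
    }

    static String toCodePoints(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", (int) text.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String shaped = shape("\u05D0\u05BC\u05D9\u05E9\u05C1");
        System.out.println(toCodePoints(shaped));          // prints U+FB30 U+05D9 U+FB2A
        System.out.println(toCodePoints(reorder(shaped))); // prints U+FB2A U+05D9 U+FB30
    }
}
```

A real bidirectional algorithm also handles numbers and embedded left-to-right text, which this reversal does not.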

Limitations

Conversion is a feature dedicated to graphic display (MicroUI or MicroVG). A message converted and displayed with EDC may be shown incorrectly, especially regarding visual orientation.

System.out.print("العربية");
العربية

Messages are usually displayed using a single type of output, either EDC or UI. When the text is printed with EDC, it is rendered correctly without any pre-conversion: the terminal on the PC, which actually prints the text, performs the necessary reordering, substitutions, etc. To render the text properly on the UI display, the PO file must explicitly specify a converter (see above). However, when a pre-converted text is printed with EDC, the application needs to add the character U+202D before the message to force the message orientation, and U+202C after it to restore the previous orientation.

System.out.print("العربية");
ﺔﻴﺑﺮﻌﻟﺍ

Warning

This tip works on the Simulator but may not work with the MicroVG complex layout manager.
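The wrapping described in this section can be written as a small helper. The class DirectionOverrideSketch and the method forceLeftToRight are hypothetical names; only the two code points U+202D and U+202C come from the text above.

```java
/** Hypothetical helper that wraps a pre-converted message for console output. */
final class DirectionOverrideSketch {

    private static final char LEFT_TO_RIGHT_OVERRIDE = '\u202D';     // forces the orientation
    private static final char POP_DIRECTIONAL_FORMATTING = '\u202C'; // restores the previous one

    static String forceLeftToRight(String preConverted) {
        return LEFT_TO_RIGHT_OVERRIDE + preConverted + POP_DIRECTIONAL_FORMATTING;
    }

    public static void main(String[] args) {
        // Pre-converted (shaped and reordered) Arabic message from the example above.
        String converted = "\uFE94\uFEF4\uFE91\uFEAE\uFECC\uFEDF\uFE8D";
        System.out.print(forceLeftToRight(converted));
    }
}
```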

Resource Generation

If the classpath of the Application contains .po/.xml files and .nls.list files, the binary-nls Add-On Processor will generate the following source files for each NLS interface:

  • a .resourcebuffer

  • a .resourcebuffer.list which references the .resourcebuffer

  • a .resources.list which references the resource (this resource does not exist yet but it will be generated later)

When building the Application or running it on Simulator, the Resource Buffer Generator is first executed. Based on the .resourcebuffer and the .resourcebuffer.list, it will generate a resource.

Since the generated resource is referenced by the .resources.list generated by the binary-nls Add-On Processor, the SOAR embeds the resource in the Application binary, unless the resource is also referenced by an .externresources.list, in which case the SOAR outputs it in the External Resources Folder instead.

This resource is loaded as soon as the BinaryNLS instance is created, in the clinit of the generated NLS interface (see Principle).

Fallback on Default Resource

When using a resource referenced as External Resource (.externresources.list), the application is not guaranteed to access it at startup (external memory failure, corruption, …).

The application can be configured to fall back on a default resource embedded in the Application binary. This resource can be a “lighter” version of the one loaded using the External Resources Loader (e.g. only embed the English language).

Usage

The procedure below assumes that the application already has localization source files named HelloWorldMessages*.po that are referenced as External Resource.

The following steps explain how to set up the fallback on a default resource embedding the en_US locale only:

  • Create a new localization source file in the src/main/resources folder (e.g. HelloWorldMessagesDefault_en_US.po). This file should contain the same translations as HelloWorldMessages_en_US.po,

  • Declare it in the *.nls.list file (e.g. com.microej.example.nls.generated.HelloWorldMessagesDefault),

  • Create a new class that implements the NLS interface (e.g. DefaultNLS),

  • Implement every method by delegating to HelloWorldMessagesDefault:

public class DefaultNLS implements NLS {

        @Override
        public String[] getAvailableLocales() {
                return HelloWorldMessagesDefault.NLS.getAvailableLocales();
        }

        @Override
        public String getDisplayName(String locale) {
                return HelloWorldMessagesDefault.NLS.getDisplayName(locale);
        }
        ...
  • Set the DefaultNLS class as the default NLS implementation:

    • Create a *.properties.list file in the src/main/resources folder (if not already created),

    • Add the following property in this file: com.microej.binarynls.defaultImplementation=[FULLY QUALIFIED NAME TO DEFAULT IMPLEMENTATION CLASS] (e.g. com.microej.binarynls.defaultImplementation=com.microej.example.nls.DefaultNLS).

  • Declare DefaultNLS as a Required type:

    • Create a *.types.list file in the src/main/resources folder (if not already created),

    • Add the fully qualified name of the class (e.g. com.microej.example.nls.DefaultNLS).

To guarantee the proper application operation, the default translations (HelloWorldMessagesDefault) must be consistent with the translations embedded in External Memory (HelloWorldMessages). In other words, they must contain the exact same set of messages.

  • Add the following code in the Main class to perform the consistency check at startup:

static {
        if (HelloWorldMessagesDefault.KeysCRC32 != HelloWorldMessages.KeysCRC32) {
                throw new RuntimeException(
                                "CRC check fail between default and fallback translations. Make sure PO files are aligned.");
        }
}

Warning

This implementation only checks the consistency of the msgid values; it does not check the content of the msgstr values. PO files should be checked carefully to avoid deviation between translations.
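To see why a checksum over the keys detects a missing or renamed msgid but not a changed msgstr, consider the sketch below. It is purely illustrative: how the Add-On Processor actually computes KeysCRC32 is not described in this document, and the class KeysChecksumSketch, the sorting step and the newline separator are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical checksum over message keys (not the actual KeysCRC32 algorithm). */
final class KeysChecksumSketch {

    /** Computes a CRC32 over the sorted msgids; msgstr values are never read. */
    static long keysCrc32(String... msgids) {
        String[] sorted = msgids.clone();
        Arrays.sort(sorted); // assumption: declaration order must not matter
        CRC32 crc = new CRC32();
        for (String msgid : sorted) {
            crc.update(msgid.getBytes(StandardCharsets.UTF_8));
            crc.update('\n'); // separator so that {"ab", "c"} and {"a", "bc"} differ
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        long external = keysCrc32("Hello", "Bye");
        long sameKeys = keysCrc32("Bye", "Hello");
        long renamedKey = keysCrc32("Hello", "Byf");
        System.out.println(external == sameKeys);   // prints true
        System.out.println(external == renamedKey); // prints false
    }
}
```

Since the translated text never enters the computation, two files with identical keys and different translations produce the same value, which is why the PO files still need a manual review.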

The logs below show the expected behavior, both when the resource can be loaded from External Memory and when it cannot:

MicroEJ START
Available locales:
- en_US
- es_FR
- fr_FR
Saying:
English (US) (en_US)
- Hello, World
- What's up?
Español (es_FR)
- Hola, Mundo
- ¿ Qué tal ?
Français (fr_FR)
- Bonjour, Le Monde
- Ça va ?
MicroEJ END (exit code = 0)

Limitations

The latest BinaryNLS implementation does not support the following operations, even when the resource is external (see External resource loader):

  • dynamically adding a new locale

  • dynamically modifying message translations

For any addition / modification, the Application must be restarted and, typically, the full resource buffer must be updated (not only the part of the added/modified locale).

Also, there is no API to close the resource buffer. If it is external, the Application must be stopped to close this resource, before it can potentially be modified depending on the external resource loader.

NLS External Loader Tool

The NLS External Loader tool makes it possible to update the PO files of an application executed on a Virtual Device without rebuilding it. PO files can be dropped in a given location in the Virtual Device folders to dynamically replace the language strings packaged in the application.

This is typically useful when testing or translating an application, to get quick feedback when changing the PO files. Once the PO files are updated, a simple restart of the Virtual Device shows the result immediately.

Installation

To enable the NLS External Loader in the Virtual Device, add the following dependency to the Firmware project:

microejTool("com.microej.tool:nls-po-external-loader:3.0.0")

Then rebuild the Firmware project to produce the Virtual Device.

Usage

Once the project is built:

  • unzip the Virtual Device and create a folder named translations in the root folder.

  • copy all the PO files from the project into the translations folder. All PO files found in this folder are processed, no matter their folder level.

  • start the Virtual Device with the launcher. The following logs should be printed if the NLS External Loader has been executed and has found the PO files:

    externalPoLoaderInit:init:
    
    externalPoLoaderInit:loadPo:
       [mkdir] Created dir: <PATH>\tmp\microejlaunch1307817858\resourcebuffer
    [po-to-nls] *.nls files found in <PATH>\output\<FIRMWARE>\resourceBuffer :
    [po-to-nls]   - com.mycompany.Messages1
    [po-to-nls]   - com.mycompany.Messages2
    [po-to-nls] Loading *.po files for NLS interface com.mycompany.Messages1
    [po-to-nls]   => loaded locales : fr_FR,de_DE,ja_JP,en_US
    [po-to-nls] Loading *.po files for NLS interface com.mycompany.Messages2
    [po-to-nls]   => loaded locales : fr_FR,de_DE,ja_JP,en_US
    
  • update the language strings in the PO files of the Virtual Device (the files in the translations/ folder).

  • restart the Virtual Device and check the changes.

It is important to know the following rules about the NLS External Loader:

  • the external PO files names must match with the default PO files names of the application to be processed.

  • when PO files with a given name are loaded, the default translations for these PO files are replaced, there is no merge. It means that:

    • if messages are missing in the new PO files, they are no longer available to the application, which will very likely crash.

    • if languages are missing (the application has 3 PO files for English, French and Spanish, and only PO files for English and French are available in the translations folder), the messages of the missing languages are no longer available to the application, which will very likely crash.

    • if new messages are added in the PO files, it has no impact, they are ignored by the application.

  • external PO files are loaded at Virtual Device startup, so any change requires a restart of the Virtual Device to be taken into account.

Troubleshooting

java.io.IOException: NLS-PO:S=4

The following error occurs when at least one PO file is missing for a language:

[parallel2] NLS-PO:I=6
[parallel2] Exception in thread "main" java.io.IOException: NLS-PO:S=4 323463627 -1948548092
[parallel2]     at java.lang.Throwable.fillInStackTrace(Throwable.java:79)
[parallel2]     at java.lang.Throwable.<init>(Throwable.java:30)
[parallel2]     at java.lang.Exception.<init>(Exception.java:10)
[parallel2]     at java.io.IOException.<init>(IOException.java:16)
[parallel2]     at com.microej.nls.BinaryNLS.loadBinFile(BinaryNLS.java:310)
[parallel2]     at com.microej.nls.BinaryNLS.<init>(BinaryNLS.java:157)
[parallel2]     at com.microej.nls.BinaryNLS.newBinaryNLS(BinaryNLS.java:118)

Make sure that all PO files are copied in the translations folder.

Crowdin

Crowdin is a cloud-based localization platform for managing multilingual content. The NLS External Loader can fetch translations directly from Crowdin to make the translation process even easier. Translators can then contribute and validate their translations in Crowdin and apply them automatically in the Virtual Device.

A new dependency must be added to Firmware project dependencies to enable this integration:

microejTool("com.microej.tool:nls-po-crowdin:1.0.0")

Once the module has been built, edit the file platform/tools/crowdin/crowdin.properties to configure the Crowdin connection:

  • set crowdin.token to the Crowdin API token. A token can be generated in Crowdin under Settings > API by clicking New Token.

  • set crowdin.projectsIds to the id of the Crowdin project. The project id can be found in the Details section on a project page. Multiple projects can be set by separating their id with a comma (for example crowdin.projectsIds=12,586,874).

When the configuration is done, fetch the Crowdin translations by executing the crowdin.bat or crowdin.sh script located in the platform/tools/crowdin/ folder. The PO files retrieved from Crowdin are automatically copied into the translations folder, so the new translations are applied at the next Virtual Device restart.