Identify Language of Text on Android Using Google’s ML Kit

Identify the language of a text with the power of mobile machine learning

Language detection is now a common feature (especially with machine learning), and apps that use it reach users all over the world who speak many different languages. Language identification can help you understand your users’ languages and personalize your app based on them.

Language detection is a technique for automatically identifying the language of a given text, be it English, Chinese, or one of many others. We can use machine learning for this kind of identification, and we can even do it inside mobile apps!

In this article, we’ll do just that, with the help of ML Kit’s Language Identification On-Device API from Google.

Possible Real-World Use Cases

Before we start building, here are a few potential use cases for on-device language ID:

  • Next-word prediction: This task involves predicting the next word in a string given the words typed so far. Where do we see these predictions on mobile? In messaging apps on both Android and iOS, the keyboard offers a series of suggestions for the next word as we type (similar in spirit to “smart reply”). Once we identify the language of the text typed so far, we can predict the next word in that language.
  • Spell checker: Language ID can also improve spell-check systems. Once we know the language of a user’s query, we can apply a spell checker for that specific language, which should improve the spell checker’s speed and efficiency.
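To make the spell-checker use case concrete, here is a minimal sketch in plain Java of choosing a dictionary by language code. Everything in it is hypothetical and for illustration only: the class `LanguageAwareSpellChecker`, its methods, and the tiny word lists are not part of ML Kit, and a real spell checker would do far more than a set lookup.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: picks a per-language word list based on a language code. */
final class LanguageAwareSpellChecker {

    // Maps a language code such as "en" to the set of known words for that language.
    private final Map<String, Set<String>> dictionaries = new HashMap<>();

    /** Registers (or replaces) the word list for one language code. */
    void addDictionary(String languageCode, String... words) {
        dictionaries.put(languageCode, new HashSet<>(Arrays.asList(words)));
    }

    /**
     * Returns true only if a dictionary exists for the language and contains the word.
     * The word is lower-cased before the lookup, so dictionaries should store lower-case entries.
     */
    boolean isSpelledCorrectly(String languageCode, String word) {
        Set<String> dictionary = dictionaries.get(languageCode);
        return dictionary != null && dictionary.contains(word.toLowerCase(Locale.ROOT));
    }
}
```

Because only the dictionary for the detected language is consulted, the lookup stays small even when the app ships word lists for many languages.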

What is ML Kit?

ML Kit is a cross-platform mobile SDK (Android and iOS) developed by Google that brings Google’s on-device machine learning to mobile apps.

All of ML Kit’s APIs run on-device, which enables real-time use cases and means the functionality is also available offline.

To use the standalone ML Kit on-device SDK, we don’t need to create a Firebase project or add a google-services.json file.

If you are already using Firebase Machine Learning, you can check this link to see how to migrate.

What You’ll Build

In this article, we’re going to build an Android app that uses ML Kit’s Language Identification on-device API to identify the language of input text and show the language name that corresponds to the returned language code (en, ar, hi, etc.).

For the purposes of this demo project, I’ve implemented only three languages:

  • English (en)
  • Hindi (hi)
  • Arabic (ar)
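One way to turn a returned language code into a display name for these three languages is a simple lookup. The sketch below is plain Java and is an assumption about how the demo could do it, not the original project’s code; the class name `LanguageNames`, the method `displayName`, and the fallback texts are hypothetical.

```java
/** Hypothetical helper that maps the language codes used in this demo to display names. */
final class LanguageNames {

    private LanguageNames() {
        // Utility class; not meant to be instantiated.
    }

    /** Returns a readable name for a language code, with a fallback for anything else. */
    static String displayName(String languageCode) {
        switch (languageCode) {
            case "en":
                return "English";
            case "hi":
                return "Hindi";
            case "ar":
                return "Arabic";
            case "und":
                // "und" is the code returned when no language could be determined.
                return "Undetermined";
            default:
                // Any language this demo does not handle.
                return "Unsupported (" + languageCode + ")";
        }
    }
}
```

For example, `LanguageNames.displayName("hi")` returns `"Hindi"`, while a code the demo does not handle, such as `"fr"`, falls through to the `"Unsupported (fr)"` label.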

ML Kit can identify text in more than 100 languages, including Chinese, Russian, Hindi, English, and Greek, and it recognizes some of them (such as Hindi and Russian) in both native and romanized script. Here’s the complete list of supported languages.

By the end of this tutorial, you should see something similar to the screenshot below:

Step 1: Add Dependency

First things first, we need to add the ML Kit language-id dependency to our Android project in the app/build.gradle file. To use the language identification feature, this is the dependency we need:
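As a sketch, the dependency line in app/build.gradle could look like the following. The version number shown here is an assumption and may not match the one used when this article was written, so check the ML Kit release notes for the current version.

```groovy
dependencies {
    // ML Kit on-device language identification (bundled model).
    // The version below is an assumption; use the latest one from the ML Kit release notes.
    implementation 'com.google.mlkit:language-id:17.0.4'
}
```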

Sync the Project

After successfully adding the dependency, just sync the project as shown in the screenshot below:

Step 2: Create a Language Identifier Instance and Show the Result in a Label

To actually identify the language of a piece of text, first, we need to create an instance of LanguageIdentifier. Then, we need to pass the input text to the identifyLanguage() method.

If the request succeeds, the result is passed to the success listener as a BCP-47 language code (for example, en). If no language meets the confidence threshold (0.5 by default), the result is und (undetermined). To get several candidate languages along with their confidence values, ML Kit also provides the identifyPossibleLanguages() method.

We also add an OnFailureListener. If the language identification model fails, we’ll be able to show the user an error.

Let’s jump to the code to see how this looks in practice:
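The steps described in this section could be wired together roughly as follows. This is a minimal sketch in Java rather than the original project’s code: the class name `LanguageLabeler`, the method `identifyAndShow`, and the label texts are hypothetical, and the lambdas assume Java 8 language support is enabled in the project. The ML Kit calls themselves (`LanguageIdentification.getClient()`, `identifyLanguage()`, and the `UNDETERMINED_LANGUAGE_TAG` constant) come from the language-id library.

```java
import android.widget.TextView;

import com.google.mlkit.nl.languageid.LanguageIdentification;
import com.google.mlkit.nl.languageid.LanguageIdentifier;

/** Hypothetical helper: identifies the language of a string and shows the result in a label. */
final class LanguageLabeler {

    // Client with default options.
    private final LanguageIdentifier identifier = LanguageIdentification.getClient();

    /** Starts an asynchronous identification and updates the label when it finishes. */
    void identifyAndShow(String inputText, TextView resultLabel) {
        identifier.identifyLanguage(inputText)
                .addOnSuccessListener(languageCode -> {
                    // "und" means no language met the confidence threshold.
                    if (LanguageIdentifier.UNDETERMINED_LANGUAGE_TAG.equals(languageCode)) {
                        resultLabel.setText("Could not identify the language");
                    } else {
                        resultLabel.setText("Language code: " + languageCode);
                    }
                })
                .addOnFailureListener(e ->
                        // The model could not be loaded or the request failed.
                        resultLabel.setText("Language identification failed: " + e.getMessage()));
    }
}
```

From an activity, this could be called with the text of an input field and the `TextView` that displays the result, for example `labeler.identifyAndShow(input.getText().toString(), resultLabel)`, where `labeler`, `input`, and `resultLabel` are placeholders for your own fields.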

You can also change the confidence threshold by passing a LanguageIdentificationOptions object to getClient(), using the code below:

We’ve successfully created a LanguageIdentifier instance and implemented the identifyLanguage() method, which allows us to pass in strings and make predictions. Let’s now jump to the result to see how it actually works.
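A minimal sketch of a client with a custom confidence threshold might look like this. The factory class `LanguageIdentifierFactory` and its method `withThreshold` are hypothetical names chosen for illustration, and any threshold you pass in is your own choice rather than a recommended value.

```java
import com.google.mlkit.nl.languageid.LanguageIdentification;
import com.google.mlkit.nl.languageid.LanguageIdentificationOptions;
import com.google.mlkit.nl.languageid.LanguageIdentifier;

/** Hypothetical factory that builds a LanguageIdentifier with a custom confidence threshold. */
final class LanguageIdentifierFactory {

    private LanguageIdentifierFactory() {
        // Utility class; not meant to be instantiated.
    }

    /** Results below the given confidence (0.0 to 1.0) are reported as "und". */
    static LanguageIdentifier withThreshold(float threshold) {
        LanguageIdentificationOptions options = new LanguageIdentificationOptions.Builder()
                .setConfidenceThreshold(threshold)
                .build();
        return LanguageIdentification.getClient(options);
    }
}
```

A lower threshold, such as `LanguageIdentifierFactory.withThreshold(0.34f)`, returns a language code more often but with less certainty, while a higher one returns `und` more often.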

Result

Let’s build and run the application to see our demo language identification app, built with ML Kit, in practice:

Conclusion

This article showed how you can easily identify languages using the ML Kit Language Identification on-device API. To do this, we learned how to create a LanguageIdentifier object and how to use that object to identify the language of a string.

To make this work, we passed user input strings to the identifyLanguage() method. After getting the language code back from the API, we showed the corresponding language to the user in a TextView.

If you want to explore language identification in more detail, check out the official docs:

I hope this article was helpful. If you think something is missing, have questions, or would like to offer any thoughts or suggestions, go ahead and leave a comment below. I’d appreciate the feedback.
