4 Techniques You Must Know for Natural Language Processing on iOS

iOS’s Natural Language framework allows us to analyze language and to perform language-specific tasks like script identification, tokenization, lemmatization, part-of-speech tagging, and named entity recognition.

In this introductory tutorial, we will explore the framework’s capabilities by looking at 4 common and essential techniques:

🔹 Tokenization

🔹 Language Identification

🔹 Part-of-speech Tagging

🔹 Identifying People, Places, and Organizations in Text

1. Tokenization

Before we can actually perform natural language processing on a text, we need to apply some pre-processing to make the data more understandable for computers.

Usually, this means splitting the text into words and removing any punctuation marks.

Apple provides NLTokenizer to enumerate the words, so there’s no need to manually parse spaces between words.

Also, some languages like Chinese and Japanese don’t use spaces to delimit words. Luckily, NLTokenizer handles these cases for you. For all supported languages, NLTokenizer can find the semantic units in a given text.

The sample below shows how to use NLTokenizer to enumerate the words in a sentence. NLTokenizer takes a unit parameter, which is of type NLTokenUnit.

This specifies the unit into which the input text is split. It has four cases: word, sentence, paragraph, and document. We can run the sample code on Create ML to check the results easily.
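As a minimal sketch of the tokenizer in use, the code below collects tokens at word and sentence granularity. The sample sentence and the `tokens(in:unit:)` helper are illustrative assumptions, not part of the framework:

```swift
import NaturalLanguage

// Hypothetical helper: collects every token of the given unit as a String.
func tokens(in text: String, unit: NLTokenUnit) -> [String] {
    let tokenizer = NLTokenizer(unit: unit)
    tokenizer.string = text
    var result: [String] = []
    tokenizer.enumerateTokens(in: text.startIndex..<text.endIndex) { tokenRange, _ in
        result.append(String(text[tokenRange]))
        return true // keep enumerating until the end of the range
    }
    return result
}

// Illustrative input; any text works here.
let sample = "Natural language processing is fun. It runs on device."
let words = tokens(in: sample, unit: .word)
let sentences = tokens(in: sample, unit: .sentence)

print(words)
print(sentences.count)
```

Passing `.paragraph` or `.document` instead changes only the granularity of the returned ranges; the calling code stays the same.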

The tokenizer then enumerates the words of the sentence one by one, in order.

2. Language Identification

We can detect the language of a given text by using the NLLanguageRecognizer class. It supports 57 languages.

We can use it as shown below:

import NaturalLanguage
let recognizer = NLLanguageRecognizer()
recognizer.processString("oduncu")
let lang = recognizer.dominantLanguage
let hypotheses = recognizer.languageHypotheses(withMaximum: 2)
// Convenience method: NLLanguageRecognizer.dominantLanguage(for: "oduncu")

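dominantLanguage returns the language that the recognizer considers most likely for the processed text.

To see the other candidate languages and their probabilities, we use the languageHypotheses(withMaximum:) method, which returns a dictionary that maps each candidate language to its probability.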
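To make the difference between the two calls concrete, here is a small sketch that ranks the hypotheses by probability. The French sample sentence and the variable names are our own choices for illustration:

```swift
import NaturalLanguage

let recognizer = NLLanguageRecognizer()
// Illustrative input; longer text generally gives more reliable results.
recognizer.processString("Bonjour tout le monde, comment allez-vous aujourd'hui ?")

// The single most likely language (nil if nothing could be determined).
let dominant = recognizer.dominantLanguage

// Up to three candidates, sorted from most to least probable.
let ranked = recognizer.languageHypotheses(withMaximum: 3)
    .sorted { $0.value > $1.value }

for (language, probability) in ranked {
    print("\(language.rawValue): \(probability)")
}

// Clear the processed text before reusing the recognizer for other input.
recognizer.reset()
```

The probabilities are the recognizer’s own estimates, so short or ambiguous inputs (such as a single word) may produce several candidates with similar scores.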

3. Part-of-speech Tagging

To understand language better, we need to identify the words and their functions in a given sentence. Part-of-speech tagging allows us to classify nouns, verbs, adjectives, and other parts of speech in a string. For this purpose, Apple provides NLTagger, a linguistic tagger that analyzes natural language text.

The code sample below shows how to detect the tags of the words by using NLTagger. Lexical class is a scheme that classifies tokens according to class: part of speech, type of punctuation, or whitespace. We use this scheme and print each word’s type:

import NaturalLanguage

let text = "The ripe taste of cheese improves with age."
let tagger = NLTagger(tagSchemes: [.lexicalClass])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
    if let tag = tag {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}

As you can see below, it successfully determines the types of words:

When using NLTagger, depending on what you want to detect, you can specify one or more tag schemes (NLTagScheme) as a parameter. For example, the tokenType scheme classifies words, punctuation, and spaces; and the lexicalClass scheme classifies word types, punctuation types, and spaces.

While enumerating the tags, you can skip specific token types (e.g., punctuation) by setting the options parameter. In the code shown above, we skip punctuation and spaces by setting options to [.omitPunctuation, .omitWhitespace].
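The effect of the options is easiest to see with the simpler tokenType scheme. The following sketch tags the same string twice, once with no options and once omitting punctuation and whitespace; the sample string and variable names are assumptions made for illustration:

```swift
import NaturalLanguage

let sample = "Hello, world!"
let tagger = NLTagger(tagSchemes: [.tokenType])
tagger.string = sample
let fullRange = sample.startIndex..<sample.endIndex

// No options: words, punctuation, and whitespace are all reported.
let allTokens = tagger.tags(in: fullRange, unit: .word, scheme: .tokenType, options: [])

// Omitting punctuation and whitespace leaves only the words.
let wordsOnly = tagger.tags(in: fullRange, unit: .word, scheme: .tokenType,
                            options: [.omitPunctuation, .omitWhitespace])

for (tag, range) in allTokens {
    print("'\(sample[range])': \(tag?.rawValue ?? "none")")
}
print(wordsOnly.map { String(sample[$0.1]) })
```

Here `tags(in:unit:scheme:options:)` is used instead of `enumerateTags` only because it returns the results as an array, which is convenient for comparing the two runs.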

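NLTagger can detect lexical classes such as noun, verb, adjective, adverb, pronoun, determiner, particle, preposition, number, conjunction, interjection, classifier, and idiom, along with several punctuation and whitespace classes.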

4. Identifying People, Places, and Organizations

NLTagger also makes it very easy to detect people’s names, places, and organization names in a given text.

Finding this type of data in text-based apps opens new ways to deliver information to users. For example, you can create an app that automatically summarizes a text (a blog post, news article, etc.) by showing how many times people, places, and organizations are referred to in it.

Let’s see how we can detect these names in a sample sentence:

import NaturalLanguage

let text = "Prime Minister Boris Johnson has urged the EU to re-open the withdrawal deal reached with Theresa May, and to make key changes that would allow it to be passed by Parliament."
let tagger = NLTagger(tagSchemes: [.nameType])
tagger.string = text
let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
// Only these three tags are relevant when filtering results of the nameType scheme.
let tags: [NLTag] = [.personalName, .placeName, .organizationName]
tagger.enumerateTags(in: text.startIndex..<text.endIndex, unit: .word, scheme: NLTagScheme.nameType, options: options) { tag, tokenRange in
    if let tag = tag, tags.contains(tag) {
        print("\(text[tokenRange]): \(tag.rawValue)")
    }
    return true
}

Here we use NLTagger again, but this time we set another option called joinNames, which joins multi-word names (such as a first name and a surname) into a single token. To filter personal names, places, and organizations, we create an NLTag array.

The tags of the words that NLTagger can find are shown below:

As you can see above, we can deduce specific knowledge from text using iOS’s Natural Language framework.
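The summarizing idea mentioned earlier, counting how often each name is referred to, could be sketched roughly as follows. The `nameCounts(in:)` function is a hypothetical helper written for this example, and which names are actually found depends on the tagger’s on-device model:

```swift
import NaturalLanguage

// Hypothetical helper: counts how many times each detected person,
// place, or organization name occurs in the text.
func nameCounts(in text: String) -> [String: Int] {
    let tagger = NLTagger(tagSchemes: [.nameType])
    tagger.string = text
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]
    let nameTags: [NLTag] = [.personalName, .placeName, .organizationName]

    var counts: [String: Int] = [:]
    tagger.enumerateTags(in: text.startIndex..<text.endIndex,
                         unit: .word,
                         scheme: .nameType,
                         options: options) { tag, tokenRange in
        if let tag = tag, nameTags.contains(tag) {
            counts[String(text[tokenRange]), default: 0] += 1
        }
        return true
    }
    return counts
}

// Illustrative input for the sketch.
let article = "Theresa May met Boris Johnson in London. Boris Johnson later spoke to Parliament."
let counts = nameCounts(in: article)
for (name, count) in counts.sorted(by: { $0.value > $1.value }) {
    print("\(name): \(count)")
}
```

Counting by the exact matched string is a deliberate simplification: "Johnson" and "Boris Johnson" would be counted as different names, so a real app would likely need some normalization on top.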

Recap

We learned 4 powerful techniques to process text in our apps:

🔹 Tokenization for supported languages.

🔹 Detecting the language of a given text.

🔹 Performing lexical analysis by detecting nouns, adjectives, verbs, etc.

🔹 Finding the mentioned people, places, and organizations in a given text.

It’s all on-device, so user data stays private, and it can work without an internet connection. We can analyze the text and highlight person names, places, and organizations.

If you want to learn how to train custom text classification models with Create ML, you can check my previous blog post.

Thanks for reading!

If you liked this story, you can follow me on Medium and Twitter. If you have any questions or an app idea to discuss, don’t hesitate to contact me via e-mail.
