TensorFlow Lite Text Classification Models with Model Maker

Generate TF Lite models from custom data using Model Maker

In this article, let's look at how you can use TensorFlow Lite Model Maker to create a custom text classification model. At the time of writing, Model Maker supports image classification, question answering, and text classification models. It uses transfer learning, starting from a pre-trained model, to shorten the time required to build TF Lite models.

Getting started

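The first step is to install TensorFlow Lite Model Maker. It is published on PyPI as tflite-model-maker, so in a notebook you can run !pip install tflite-model-maker.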

Let's use the IMDB movie reviews dataset, which contains 50K reviews. Download it and read it in.

!wget --no-check-certificate \
    https://namespacelabs.com/storage/IMDBDataset.csv \
    -O /content/IMDBDataset.csv

import pandas as pd
# Read the file from the same path it was downloaded to
df = pd.read_csv('/content/IMDBDataset.csv')

You can now split this data into a training, validation, and test set. The next step is to save them as CSV files because the function we’ll use later requires CSV files.

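# 30K rows for training, 10K for validation, 10K for testing
train = df[0:30000]
val = df[30000:40000]
test = df[40000:50000]

# index=False must be a boolean; the string "False" is truthy and would write the index
train.to_csv("train.csv", index=False)
val.to_csv("val.csv", index=False)
test.to_csv("test.csv", index=False)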
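The slices above take rows in file order. If the CSV happens to be grouped by label, a shuffled split is a common precaution. The following is a minimal sketch using only the standard library; the helper name split_indices and the seed value are illustrative assumptions, not part of Model Maker or pandas.

```python
import random

def split_indices(n_rows, n_train, n_val, seed=42):
    """Return disjoint, shuffled index lists for train, validation, and test."""
    indices = list(range(n_rows))
    # A seeded generator makes the shuffle reproducible across runs
    random.Random(seed).shuffle(indices)
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx

# Same 30K / 10K / 10K proportions as the article's split
train_idx, val_idx, test_idx = split_indices(50_000, 30_000, 10_000)
print(len(train_idx), len(val_idx), len(test_idx))
```

With pandas, the resulting lists could then be passed to df.iloc, for example df.iloc[train_idx], before saving each part to CSV.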

Create the model

Before you can define the model, you have to select a text classification model architecture. Here are the available options:

  • The average word embedding model, which is small and fast and still gives reasonable accuracy. It is defined as average_word_vec.
  • The MobileBERT classifier that’s defined using mobilebert_classifier.
  • The standard BERT model defined using bert_classifier.
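To give some intuition for the first option, here is a toy sketch of the idea behind averaging word embeddings: each known token is mapped to a vector, and the vectors are averaged into one fixed-size representation of the text. The embedding values and the function name below are made up for illustration; this is not Model Maker's internal implementation.

```python
def average_word_vector(tokens, embeddings, dim):
    """Average the embedding vectors of known tokens; return zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical 2-dimensional embedding table
toy_embeddings = {
    "great": [1.0, 0.0],
    "movie": [0.0, 1.0],
    "boring": [-1.0, 0.0],
}

sentence_vec = average_word_vector("great movie".split(), toy_embeddings, dim=2)
print(sentence_vec)  # [0.5, 0.5]
```

Because the result has the same size regardless of the text length, a small classifier can sit on top of it, which is part of why this kind of model tends to be compact.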

Let’s try the mobilebert_classifier here. The first step is to define that model spec.

from tflite_model_maker import model_spec
mb_spec = model_spec.get('mobilebert_classifier')

The next step is to use TextClassifierDataLoader to read in the files and generate the datasets. The model spec is passed in when loading the data so that the loader can apply the preprocessing that the chosen architecture expects. Since this is text data, it has to be converted into a numerical representation.

from tflite_model_maker import TextClassifierDataLoader
train_data = TextClassifierDataLoader.from_csv(
      filename='train.csv',
      text_column='review',
      label_column='sentiment',
      model_spec=mb_spec,
      is_training=True)
test_data = TextClassifierDataLoader.from_csv(
      filename='test.csv',
      text_column='review',
      label_column='sentiment',
      model_spec=mb_spec,
      is_training=False)
val_data = TextClassifierDataLoader.from_csv(
      filename='val.csv',
      text_column='review',
      label_column='sentiment',
      model_spec=mb_spec,
      is_training=False)
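The data loader handles the conversion from text to numbers for you. As a rough illustration of what such preprocessing involves, the sketch below maps tokens to integer ids and pads or truncates every example to a fixed length. The whitespace tokenizer, the special ids, and the function names are simplifying assumptions; the tokenization Model Maker actually applies depends on the model spec.

```python
PAD_ID, UNK_ID = 0, 1  # reserved ids for padding and unknown tokens

def build_vocab(texts):
    """Assign an integer id to every token seen in the training texts."""
    vocab = {"<PAD>": PAD_ID, "<UNK>": UNK_ID}
    for text in texts:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def encode(text, vocab, seq_len):
    """Convert text to a list of exactly seq_len token ids."""
    ids = [vocab.get(tok, UNK_ID) for tok in text.lower().split()]
    ids = ids[:seq_len]                            # truncate long inputs
    return ids + [PAD_ID] * (seq_len - len(ids))   # pad short inputs

vocab = build_vocab(["a great movie", "a boring movie"])
encoded = encode("a great film", vocab, seq_len=5)
print(encoded)  # [2, 3, 1, 0, 0]
```

The word "film" never appears in the toy training texts, so it maps to the unknown id, and the two trailing zeros are padding.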

Next, create the text classifier using this model spec. Calling the create function retrains the model on the IMDB dataset.

from tflite_model_maker import text_classifier
model = text_classifier.create(
      train_data,
      model_spec=mb_spec,
      validation_data=val_data,
      epochs=3)

When the training process is over, you will have a model that you can export. You can evaluate it before exporting it.

loss, acc = model.evaluate(test_data)
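To make the two returned numbers more concrete, here is a small sketch of how accuracy and a cross-entropy loss can be computed for a binary sentiment task. The labels and probabilities are invented, and the exact loss function Model Maker uses internally may differ from this binary form.

```python
import math

def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded prediction matches the label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)

def binary_cross_entropy(labels, probs, eps=1e-7):
    """Mean negative log-likelihood of the true labels."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1, 0]          # 1 = positive review, 0 = negative review
probs = [0.9, 0.2, 0.4, 0.1]   # hypothetical predicted probability of "positive"

acc_value = accuracy(labels, probs)
loss_value = binary_cross_entropy(labels, probs)
print(acc_value)  # 0.75
print(loss_value)
```

Accuracy only counts whether each prediction lands on the right side of the threshold, while the loss also penalizes low confidence in correct answers, so the two can move somewhat independently.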

After that, you can export the model to your current working directory by using the export function.

model.export(".")

Quantizing the model

Optionally, you can quantize the model to reduce its size and make it run faster. Here is how you would apply the dynamic range quantization.

import tensorflow as tf
from tflite_model_maker import configs

config = configs.QuantizationConfig.create_dynamic_range_quantization(
      optimizations=[tf.lite.Optimize.OPTIMIZE_FOR_LATENCY])
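Dynamic range quantization stores the model weights as 8-bit integers instead of 32-bit floats. The sketch below illustrates the general idea with symmetric per-tensor quantization in plain Python. It is a conceptual example under simplifying assumptions, not the algorithm the TF Lite converter actually runs, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float values from the integers and the scale."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.003, 1.0]   # made-up weight values
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The rounding error per weight is bounded by half a quantization step
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(max_error <= scale / 2 + 1e-12)
```

Each weight now needs one byte instead of four, at the cost of a small rounding error, which is why the accuracy of the quantized model is worth checking.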

The next step is to export the model with the new quantization configuration.

model.export(export_dir=".", quantization_config=config)

Since quantization may affect the model’s accuracy, it’s prudent to evaluate the quantized model before moving it to production.

accuracy = model.evaluate_tflite("model.tflite", test_data)
print('TFLite model accuracy: ', accuracy)

Customizing the model

You can customize your model, although the available options depend on the pre-trained model that you select. For example, some of the parameters that you can change for MobileBERT are:

  • seq_len — the length of the sequence that will be passed to the model.
  • trainable — this determines if the pre-trained layers will be trained again.
  • model_dir — location of model checkpoint files.
  • dropout_rate — the dropout rate.
  • learning_rate — the learning rate of the Adam optimizer.

You can also adjust the parameters of the average word embedding model. For example, you can change the wordvec_dim and seq_len. This is done by creating a new model spec.

# Use a new name so the imported model_spec module is not overwritten
new_model_spec = model_spec.AverageWordVecModelSpec(wordvec_dim=64)

After that, you will have to generate the training data with the new model specifications.

train_data = TextClassifierDataLoader.from_csv(
      filename='train.csv',
      text_column='review',
      label_column='sentiment',
      model_spec=new_model_spec,
      is_training=True)

Then train the model again just like we have done previously.

Conclusion

This article shows that you can quickly use TF Lite Model Maker to create a custom TF Lite text classification model. We have seen how to switch between different text classification architectures, how to quantize the exported model, and how to tweak the pre-trained models. Armed with this information, you should be able to create text classification models using TensorFlow Lite Model Maker.

Fritz
