Building Text Detection apps for iOS and Android using React Native

We live in a world that’s still transitioning from the inventions of the Renaissance to the inventions of the modern technological era. A couple of examples: we still exchange business cards during meetings but store the numbers on our cell phones; we still receive paper bills for our electricity, gas, and other utilities but pay them through mobile accounts. The list goes on. With the growing advances in machine learning, many steps in these processes can be eliminated right away.

Let’s consider the case of business cards and storing information from them on your phone. We can write a simple app that detects all the text present on a business card and then stores the relevant info in your contact list with ease.

Bingo! Let’s code!

This app will be based on two major packages: react-native-camera and react-native-text-detector.

Create a React Native App

The first thing you’ll need is to create a React Native app. Maybe that’s an existing app, or maybe it’s a new one. We’ll start with a new one. From the command line:
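Something along these lines should work (the app name is just a placeholder, and depending on your React Native CLI version you may need to invoke it via npx):

  react-native init TextDetectionDemo
  cd TextDetectionDemo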

Add both packages to your app from the command line:
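Roughly like this (exact linking steps may vary; newer React Native versions autolink, and the iOS side also needs its native dependencies set up as described in each package’s README):

  npm install react-native-text-detector react-native-camera --save
  react-native link react-native-text-detector
  react-native link react-native-camera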

Since we’ll be using images captured via the camera for this demo app, we have to run this on-device rather than on a simulator. Follow the instructions to run on Android and iOS.

After completing the integration of both packages, we can move on to the JavaScript side and make use of these libraries. Here is a sample method that takes the URI of an image containing text and supplies it to RNTextDetector for processing.

  /**
   * processImage
   *
   * Responsible for getting image from react native camera and
   * starting image processing.
   *
   * @param {string} uri              Path for the image to be processed
   * @param {object} imageProperties  Other properties of image to be processed
   * @memberof App
   * @author Zain Sajjad
   */
  processImage = async (uri, imageProperties) => {
    const visionResp = await RNTextDetector.detectFromUri(uri);
    if (!(visionResp && visionResp.length > 0)) {
      throw "UNMATCHED";
    }
    this.setState({
      visionResp: this.mapVisionRespToScreen(visionResp, imageProperties)
    });
  };
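For context, here’s a rough sketch of how processImage might be wired up to the camera. The option names follow RNCamera’s takePictureAsync API; the rest (the ref name, the quality value) is purely illustrative:

  /**
   * takePicture (illustrative sketch)
   *
   * Captures a photo with react-native-camera and forwards its URI,
   * plus the captured image's dimensions, to processImage.
   */
  takePicture = async () => {
    if (!this.camera) return;
    // takePictureAsync resolves with an object containing uri, width, and height
    const data = await this.camera.takePictureAsync({ quality: 0.8 });
    try {
      await this.processImage(data.uri, {
        width: data.width,
        height: data.height
      });
    } catch (e) {
      // processImage throws when no text is detected in the image
      console.warn(e);
    }
  };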

RNTextDetector responds with all the text present in the image, along with its position (top, left) and dimensions (width, height) relative to the dimensions of the image. To display this data on the user’s screen, we have to map the vision library’s response onto the screen’s dimensions. Here’s the method that does the trick.


  /**
   * mapVisionRespToScreen
   *
   * Converts RNTextDetector's response into a representable form for
   * the device's screen, in accordance with the dimensions of the image
   * used for processing.
   *
   * @param {array}  visionResp       Response from RNTextDetector
   * @param {object} imageProperties  Other properties of image to be processed
   * @memberof App
   */
  mapVisionRespToScreen = (visionResp, imageProperties) => {
    const IMAGE_TO_SCREEN_Y = screenHeight / imageProperties.height;
    const IMAGE_TO_SCREEN_X = screenWidth / imageProperties.width;

    return visionResp.map(item => {
      return {
        ...item,
        position: {
          width: item.bounding.width * IMAGE_TO_SCREEN_X,
          left: item.bounding.left * IMAGE_TO_SCREEN_X,
          height: item.bounding.height * IMAGE_TO_SCREEN_Y,
          top: item.bounding.top * IMAGE_TO_SCREEN_Y
        }
      };
    });
  };
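To put the mapped positions to use, you can render an absolutely positioned overlay for each detected block. Here’s a minimal sketch, assuming visionResp is held in state and View, Text, and Dimensions are imported from react-native (the styling is purely illustrative):

  // screenWidth and screenHeight used above come from the device window:
  // const { width: screenWidth, height: screenHeight } = Dimensions.get("window");

  render() {
    return (
      <View style={{ flex: 1 }}>
        {/* camera preview or captured image goes here */}
        {this.state.visionResp.map((item, index) => (
          <View
            key={index}
            style={{
              position: "absolute",
              borderWidth: 1,
              borderColor: "#2aca75",
              ...item.position
            }}
          >
            <Text style={{ fontSize: 10 }}>{item.text}</Text>
          </View>
        ))}
      </View>
    );
  }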

Results

Since most of the magic happens inside RNTextDetector, getting to results is quick and easy. Here’s the sample app detecting my very own business card:

A little about the magic

We used react-native-text-detector for detecting text in our images. It uses Tesseract OCR on iOS and Firebase ML Kit’s Text Recognizer on Android to do this magic. Let’s dive a little deeper into how this works under the hood. For the reasoning behind using different libraries for each platform, please check out this document.

iOS:

For iOS, the package combines two libraries to achieve the desired results: Apple’s Vision framework (which sits on top of Core ML) and Tesseract OCR. When the RN thread supplies the URI of an image for detection, RNTextDetector uses Vision’s VNImageRequestHandler to locate all the text in the image. This returns the detected text regions as CGRect values.

It then iterates through all the regions detected earlier and supplies each CGRect to Tesseract to recognize the text inside that box. The output is collected into an array and returned to the JS thread.

Android:

Google recently announced ML Kit for Firebase, which offers many awesome modules to help developers build amazing apps. One of these is the ML Kit Vision text recognizer. RNTextDetector creates a FirebaseVisionImage instance, supplies it to the Firebase detector for processing, and then reshapes the result to match the iOS output, so React Native developers get the same format on both platforms.
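On both platforms, what ultimately reaches the JS thread is an array of detected text blocks. Judging from the fields used in the methods above, each entry looks roughly like this (the values are made up, and fields other than text and bounding aren’t guaranteed):

  [
    {
      text: "Zain Sajjad",
      bounding: { top: 120, left: 48, width: 260, height: 32 }
    }
    // ...one entry per detected block of text
  ]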

What’s Next

Since we now have all the text extracted from the image, along with its location on the screen, we can prompt users to select the person’s name, contact number, company, job title, and other info by tapping, and then use react-native-contacts to store them on the device.
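As a rough illustration, saving the selected fields with react-native-contacts could look like this. addContact and the contact fields shown come from that library, but the exact call style (callback vs. promise) depends on the version you install, and the selected object is just a hypothetical shape for whatever the user tapped:

  import Contacts from "react-native-contacts";

  // selected: the values the user tapped on the detection overlay (hypothetical shape)
  const saveAsContact = selected => {
    const newContact = {
      givenName: selected.name,
      company: selected.company,
      jobTitle: selected.jobTitle,
      phoneNumbers: [{ label: "work", number: selected.phone }]
    };
    Contacts.addContact(newContact, err => {
      if (err) {
        console.warn("Could not save contact", err);
      }
    });
  };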

What’s suitable for your project?

When building any OCR-related app, you have to decide which of the available libraries best suits your use case. We did a small analysis of how different OCR SDKs perform on iOS and Android devices; have a look for some interesting findings.

Here is the project’s code. Feel free to contact me if you have any questions related to this app or any projects related to mobile machine learning or react-native.

Enjoyed it? Click the 👏 to say “thanks!” and help others find this article. You can follow me on twitter @zsajjad93 or write me an email.

Discuss this post on Hacker News.
