Turning the Mobile Camera into a Real-Time Object Detector with Flutter and TensorFlow Lite

Integrating Mobile Camera for Real-Time Object Detection with Flutter, TensorFlow Lite, and COCO SSD MobileNet

In the previous articles of this series on developing Flutter applications with TensorFlow Lite, we looked at how to build Digit Recognizer with Flutter and TensorFlow Lite, Image Classification with Flutter and TensorFlow Lite, and Object Detection with Flutter and TensorFlow Lite.

In the fourth article of this series, we'll keep working with TensorFlow Lite, this time focusing on real-time object detection by integrating the mobile camera. The application we're going to build will be able to recognize objects from the live feed provided by the camera. After a month of researching and reading blog posts, I was able to come up with a solution for integrating the mobile camera as a real-time image provider and running object detection on its frames. In this article, I'll show you how to do the same.

Application and Use Cases

TensorFlow Lite gives us pre-trained and optimized models for identifying hundreds of classes of objects, including people, activities, animals, plants, and places. Using the COCO SSD MobileNet v1 model and the camera plugin for Flutter, we'll be able to develop a real-time object detector application.

Required Packages

  • TensorFlow Lite
  • Camera Plugin

Using this plugin, we can display a live camera preview as a widget. With the CameraImage class, we can capture the live feed frame by frame and feed these frames as images into the object detector.

  • COCO SSD MobileNet v1 Model

The SSD MobileNet model is a single shot multibox detection (SSD) network intended to perform object detection. This model can detect up to 10 objects in a frame. It is trained to recognize 80 classes of objects.

Download the required model from the above link and copy the extracted files to the assets folder of the Flutter project.

Flutter Application

Now that we have put the model inside the project folder, we need to install the required packages. Then we can develop our Flutter application to turn the mobile camera into an object detector.
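The tflite and camera packages, together with the model files, are declared in the project's pubspec.yaml. A minimal sketch of the relevant entries might look like this (version constraints are omitted here; pick ones appropriate for your Flutter version):

dependencies:
  flutter:
    sdk: flutter
  camera:   # live camera preview and per-frame image stream
  tflite:   # TensorFlow Lite bindings for on-device inference

flutter:
  assets:
    - assets/ssd_mobilenet.tflite
    - assets/ssd_mobilenet.txt

On Android, the tflite package's setup instructions also ask you to add an aaptOptions { noCompress 'tflite' } block to android/app/build.gradle so that the model file isn't compressed during packaging.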

This project is divided into three classes.

  • BndBox — Contains the bounding boxes of the detected objects, with their names and confidence levels as percentages.
  • Camera — Contains the implementation of the camera plugin and runs object detection on the live frames using TFLite.
  • Home — Loads the model and passes data to the BndBox and Camera classes.

Now I'll walk through the required code for each of these classes. First, we need to load the model, since inference runs entirely on-device; the loaded model is then used to detect objects in each frame. The Home class below does this and passes the resulting data to the BndBox and Camera classes.

import 'package:flutter/material.dart';
import 'package:camera/camera.dart';
import 'package:tflite/tflite.dart';
import 'dart:math' as math;

import 'camera.dart';
import 'bndbox.dart';
import 'models.dart';

class HomePage extends StatefulWidget {
  final List<CameraDescription> cameras;

  HomePage(this.cameras);

  @override
  _HomePageState createState() => new _HomePageState();
}

class _HomePageState extends State<HomePage> {
  List<dynamic> _recognitions;
  int _imageHeight = 0;
  int _imageWidth = 0;
  String _model = "";

  @override
  void initState() {
    super.initState();
  }

  // Load the SSD MobileNet model and its label file from the assets folder.
  loadModel() async {
    String res = await Tflite.loadModel(
        model: "assets/ssd_mobilenet.tflite",
        labels: "assets/ssd_mobilenet.txt");
    print(res);
  }

  // Called when the user taps the model button: store the choice and load the model.
  onSelect(model) {
    setState(() {
      _model = model;
    });
    loadModel();
  }

  // Callback passed to the Camera widget; stores the latest detections
  // together with the dimensions of the frame they were computed on.
  setRecognitions(recognitions, imageHeight, imageWidth) {
    setState(() {
      _recognitions = recognitions;
      _imageHeight = imageHeight;
      _imageWidth = imageWidth;
    });
  }

  @override
  Widget build(BuildContext context) {
    Size screen = MediaQuery.of(context).size;
    return Scaffold(
      body: _model == ""
          ? Center(
              child: Column(
                mainAxisAlignment: MainAxisAlignment.center,
                children: <Widget>[
                  RaisedButton(
                    child: const Text(ssd),
                    onPressed: () => onSelect(ssd),
                  ),
                ],
              ),
            )
          : Stack(
              children: [
                Camera(
                  widget.cameras,
                  _model,
                  setRecognitions,
                ),
                BndBox(
                    _recognitions == null ? [] : _recognitions,
                    math.max(_imageHeight, _imageWidth),
                    math.min(_imageHeight, _imageWidth),
                    screen.height,
                    screen.width,
                    _model),
              ],
            ),
    );
  }
}
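
The HomePage widget expects the list of available cameras as a constructor argument. The entry point isn't shown here, so below is a minimal sketch of what a main.dart wiring everything together could look like, assuming the class above lives in home.dart:

import 'package:flutter/material.dart';
import 'package:camera/camera.dart';

import 'home.dart';

List<CameraDescription> cameras;

Future<void> main() async {
  // Needed because we call availableCameras() before runApp().
  WidgetsFlutterBinding.ensureInitialized();
  try {
    // Ask the camera plugin which cameras the device exposes.
    cameras = await availableCameras();
  } on CameraException catch (e) {
    print('Error: ${e.code}\n${e.description}');
  }
  runApp(MaterialApp(home: HomePage(cameras)));
}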

Next, the model name, the list of cameras, and the setRecognitions callback (which stores the list of results along with the image height and width) are passed to the Camera class. The Camera class uses this data to run the live feed frame by frame and detect the objects present in it. First, it checks whether a camera is available; if it is (and we have permission to use it), the image stream is passed through Tflite.detectObjectOnFrame() to detect objects using the model. This class also defines the dimensions of the camera view area; in this example, the camera preview is scaled to fill the screen.

import 'package:flutter/material.dart';
import 'package:camera/camera.dart';
import 'package:tflite/tflite.dart';
import 'dart:math' as math;

typedef void Callback(List<dynamic> list, int h, int w);

class Camera extends StatefulWidget {
  final List<CameraDescription> cameras;
  final Callback setRecognitions;
  final String model;

  Camera(this.cameras, this.model, this.setRecognitions);

  @override
  _CameraState createState() => new _CameraState();
}

class _CameraState extends State<Camera> {
  CameraController controller;
  bool isDetecting = false;

  @override
  void initState() {
    super.initState();

    // Make sure at least one camera is available before creating a controller.
    if (widget.cameras == null || widget.cameras.length < 1) {
      print('No camera found');
    } else {
      controller = new CameraController(
        widget.cameras[0],
        ResolutionPreset.high,
      );
      controller.initialize().then((_) {
        if (!mounted) {
          return;
        }
        setState(() {});

        controller.startImageStream((CameraImage img) {
          // Skip frames that arrive while a detection is still in progress.
          if (!isDetecting) {
            isDetecting = true;

            int startTime = new DateTime.now().millisecondsSinceEpoch;
            Tflite.detectObjectOnFrame(
              // The camera delivers each frame as a list of image planes.
              bytesList: img.planes.map((plane) {
                return plane.bytes;
              }).toList(),
              model: "SSDMobileNet",
              imageHeight: img.height,
              imageWidth: img.width,
              imageMean: 127.5,
              imageStd: 127.5,
              numResultsPerClass: 1,
              threshold: 0.4,
            ).then((recognitions) {
              int endTime = new DateTime.now().millisecondsSinceEpoch;
              print("Detection took ${endTime - startTime} ms");

              // Hand the results back to the Home widget for rendering.
              widget.setRecognitions(recognitions, img.height, img.width);

              isDetecting = false;
            });
          }
        });
      });
    }
  }

  @override
  void dispose() {
    controller?.dispose();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    if (controller == null || !controller.value.isInitialized) {
      return Container();
    }

    // Scale the preview so it fills the screen while preserving its aspect ratio.
    var tmp = MediaQuery.of(context).size;
    var screenH = math.max(tmp.height, tmp.width);
    var screenW = math.min(tmp.height, tmp.width);
    tmp = controller.value.previewSize;
    var previewH = math.max(tmp.height, tmp.width);
    var previewW = math.min(tmp.height, tmp.width);
    var screenRatio = screenH / screenW;
    var previewRatio = previewH / previewW;

    return OverflowBox(
      maxHeight:
          screenRatio > previewRatio ? screenH : screenW / previewW * previewH,
      maxWidth:
          screenRatio > previewRatio ? screenH / previewH * previewW : screenW,
      child: CameraPreview(controller),
    );
  }
}

From the Home class, we passed data to both the Camera class and the BndBox class. Having covered the Camera class, we can move on to the BndBox class, which performs the coordinate calculations needed to wrap each detected object in a box. The box is labeled with the object's name and its confidence level as a percentage.
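
Before looking at the code, it helps to know the shape of the data BndBox receives. Each recognition returned by Tflite.detectObjectOnFrame() is a map whose rect coordinates are normalized relative to the frame (values between 0 and 1). An illustrative example, with made-up values:

// Illustrative shape of a single recognition from the tflite plugin:
{
  "detectedClass": "person",
  "confidenceInClass": 0.83,
  "rect": {"x": 0.24, "y": 0.18, "w": 0.32, "h": 0.61}
}

BndBox scales these relative values up to the on-screen preview, which is why it needs both the preview and screen dimensions.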

import 'package:flutter/material.dart';
import 'dart:math' as math;

class BndBox extends StatelessWidget {
  final List<dynamic> results;
  final int previewH;
  final int previewW;
  final double screenH;
  final double screenW;
  final String model;

  BndBox(this.results, this.previewH, this.previewW, this.screenH, this.screenW,
      this.model);

  @override
  Widget build(BuildContext context) {
    List<Widget> _renderBoxes() {
      return results.map((re) {
        var _x = re["rect"]["x"];
        var _w = re["rect"]["w"];
        var _y = re["rect"]["y"];
        var _h = re["rect"]["h"];
        var scaleW, scaleH, x, y, w, h;

        // The preview is scaled to fill the screen, so part of it is cropped.
        // Convert the normalized rect into screen coordinates, compensating
        // for the cropped margin on whichever axis overflows.
        if (screenH / screenW > previewH / previewW) {
          scaleW = screenH / previewH * previewW;
          scaleH = screenH;
          var difW = (scaleW - screenW) / scaleW;
          x = (_x - difW / 2) * scaleW;
          w = _w * scaleW;
          if (_x < difW / 2) w -= (difW / 2 - _x) * scaleW;
          y = _y * scaleH;
          h = _h * scaleH;
        } else {
          scaleH = screenW / previewW * previewH;
          scaleW = screenW;
          var difH = (scaleH - screenH) / scaleH;
          x = _x * scaleW;
          w = _w * scaleW;
          y = (_y - difH / 2) * scaleH;
          h = _h * scaleH;
          if (_y < difH / 2) h -= (difH / 2 - _y) * scaleH;
        }

        return Positioned(
          left: math.max(0, x),
          top: math.max(0, y),
          width: w,
          height: h,
          child: Container(
            padding: EdgeInsets.only(top: 5.0, left: 5.0),
            decoration: BoxDecoration(
              border: Border.all(
                color: Color.fromRGBO(37, 213, 253, 1.0),
                width: 3.0,
              ),
            ),
            child: Text(
              "${re["detectedClass"]} ${(re["confidenceInClass"] * 100).toStringAsFixed(0)}%",
              style: TextStyle(
                color: Color.fromRGBO(37, 213, 253, 1.0),
                fontSize: 14.0,
                fontWeight: FontWeight.bold,
              ),
            ),
          ),
        );
      }).toList();
    }

    return Stack(
      children: _renderBoxes(),
    );
  }
}

Result

Now that we’ve implemented the Flutter application code, let’s look at the output of the application when it’s up and running:

Source Code:

Conclusion

You can use other models that are compatible with TensorFlow Lite and develop your own real-time object detection mobile application using the above code. Different models come with their own trade-offs in speed and accuracy, so pick the one that best fits the goal you're trying to achieve.
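
For instance, the tflite plugin also supports Tiny YOLOv2-style models. The sketch below shows roughly how the loading and detection calls could change for such a model; the asset file names and parameter values are assumptions for illustration and depend on the model you export:

// Hypothetical swap to a Tiny YOLOv2 model; asset names are placeholders.
await Tflite.loadModel(
  model: "assets/yolov2_tiny.tflite",
  labels: "assets/yolov2_tiny.txt",
);

// Inside the same startImageStream callback as before:
var recognitions = await Tflite.detectObjectOnFrame(
  bytesList: img.planes.map((plane) => plane.bytes).toList(),
  model: "YOLO",        // switches the plugin to its YOLO post-processing
  imageHeight: img.height,
  imageWidth: img.width,
  imageMean: 0.0,
  imageStd: 255.0,      // YOLO inputs are typically normalized to 0..1
  threshold: 0.3,
  numResultsPerClass: 1,
);

The rest of the BndBox rendering code stays the same, since the recognitions keep the same map structure.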

That’s all for this article, but in the next article, I’ll try to integrate real-time pose detection using the mobile camera.

