Real-Time Face Detection on Android with ML Kit

Google introduced a new product in the Firebase Suite earlier this year, Firebase’s Machine Learning Kit. ML Kit brings Google’s machine learning expertise to both Android and iOS apps in a powerful way. In this post I will dive into how we can make use of it in order to build a real-time face detector for an Android app.

Face detection is one of the vision-focused features Firebase’s ML Kit offers (or more correctly, facilitates).

It’s a feature that can be useful in many applications: tagging photos, embellishing selfies, adding emojis and effects to a camera feed, taking pictures only when everyone is smiling with their eyes open, etc. The possibilities are endless.

Despite this, implementing a face detector in an Android app still takes effort and a lot of head scratching.

One would need to understand how the API works, what sort of information it provides, how to process this information and make use of it given the device’s orientation, the camera source, and which camera is in use (front or back).

Given a camera source, ideally we’d be able to write something like this and be done with it.
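Something along these lines, assuming a camera view that exposes a frame-processor callback (a sketch of the goal, not a specific API):

cameraView.addFrameProcessor { frame ->
    // Detect faces in the frame and draw their bounds on an overlay
    faceDetector.process(frame)
}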

The main components are the camera, the frame, the face detector, and the overlay. Before going through each of them, let's assume our layout contains a camera view and an overlay view on which we'll draw the boxes around the detected faces.

<FrameLayout
    ...>

    <!-- Any other views -->

    <CameraView
        ... />

    <husaynhakeem.io.facedetector.FaceBoundsOverlay
        ... />

    <!-- Any other views -->

</FrameLayout>

Camera

Regardless of which camera API we use, what matters is that it offers a way to process its frames. This way, we'll be able to process each incoming frame, detect the faces in it, and identify them to the user (by drawing boxes around them on the overlay, for example).

Frame

A frame is the information given by the camera to the face detector. It should contain everything the face detector needs in order to detect faces. This needed information is defined below:

data class Frame(
        val data: ByteArray?,
        val rotation: Int,
        val size: Size,
        val format: Int,
        val isCameraFacingBack: Boolean)

data class Size(val width: Int, val height: Int)

  • data: Byte array representing what the camera is displaying.
  • rotation: Represents the orientation of the device.
  • size: Contains the width and height of the camera preview.
  • format: The encoding format of the camera's preview images.
  • isCameraFacingBack: Indicates whether the front or back camera is in use.
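For illustration, here's a hypothetical frame with made-up values: a 640x480 NV21 preview from the back camera, rotated 90 degrees as is typical in portrait orientation.

import android.graphics.ImageFormat

// Illustrative values only. NV21 uses 1.5 bytes per pixel, hence the
// buffer size of width * height * 3 / 2.
val frame = Frame(
        data = ByteArray(640 * 480 * 3 / 2),
        rotation = 90,
        size = Size(width = 640, height = 480),
        format = ImageFormat.NV21,
        isCameraFacingBack = true)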

Face Detector

The face detector is the essential component here—it’s the one that takes in a frame, processes it, and then outputs the results to the user.

The face detector will thus use an instance of FirebaseVisionFaceDetector to process the incoming frames from the camera. It should also be aware of the camera’s orientation and the side it’s facing (front or back). Finally, it should know the overlay view on which the results will be rendered. So basically, the skeleton of the FaceDetector class looks like this:

class FaceDetector(private val faceBoundsOverlay: FaceBoundsOverlay) {

    private val faceBoundsOverlayHandler = FaceBoundsOverlayHandler()
    private val firebaseFaceDetectorWrapper = FirebaseFaceDetectorWrapper()

    fun process(frame: Frame) {
        updateOverlayAttributes(frame)
        detectFacesIn(frame)
    }

    private fun updateOverlayAttributes(frame: Frame) {
        faceBoundsOverlayHandler.updateOverlayAttributes(...)
    }

    private fun detectFacesIn(frame: Frame) {
        firebaseFaceDetectorWrapper.process(
                image = convertFrameToImage(frame),
                onSuccess = {
                    faceBoundsOverlay.updateFaces( /* Faces */)
                },
                onError = { /* Display error message */ })
    }
}
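The two collaborators aren't shown above, so here's a minimal sketch of what they might look like, based on the firebase-ml-vision API with default detector options (the library's actual code may differ). The convertFrameToImage helper builds a FirebaseVisionImage from the frame's raw bytes plus metadata describing them:

import com.google.firebase.ml.vision.FirebaseVision
import com.google.firebase.ml.vision.common.FirebaseVisionImage
import com.google.firebase.ml.vision.common.FirebaseVisionImageMetadata
import com.google.firebase.ml.vision.face.FirebaseVisionFace

// Thin wrapper around ML Kit's face detector (sketch, default options)
class FirebaseFaceDetectorWrapper {

    private val detector = FirebaseVision.getInstance().visionFaceDetector

    fun process(
            image: FirebaseVisionImage,
            onSuccess: (List<FirebaseVisionFace>) -> Unit,
            onError: (Exception) -> Unit) {
        detector.detectInImage(image)
                .addOnSuccessListener { onSuccess(it) }
                .addOnFailureListener { onError(it) }
    }
}

// Possible implementation of convertFrameToImage: wrap the frame's bytes
// with metadata describing their size, format, and rotation
private fun convertFrameToImage(frame: Frame): FirebaseVisionImage =
        FirebaseVisionImage.fromByteArray(
                frame.data!!,
                FirebaseVisionImageMetadata.Builder()
                        .setWidth(frame.size.width)
                        .setHeight(frame.size.height)
                        .setFormat(FirebaseVisionImageMetadata.IMAGE_FORMAT_NV21)
                        .setRotation(frame.rotation / 90) // degrees to ROTATION_* constant
                        .build())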

Overlay

The overlay is the view that sits on top of the camera view. It renders boxes (or bounds) around detected faces. It must be aware of the device’s orientation, the camera’s facing (front or back), and the camera view’s dimensions (width and height).

This information helps determine how to draw the bounds around the detected face, how to scale the bounds, and whether or not to mirror them.

class FaceBoundsOverlay @JvmOverloads constructor(
        ctx: Context,
        attrs: AttributeSet? = null,
        defStyleAttr: Int = 0) : View(ctx, attrs, defStyleAttr) {

    private val facesBounds: MutableList<FaceBounds> = mutableListOf()
    private val boundsPaint = Paint().apply {
        style = Paint.Style.STROKE
        strokeWidth = 4f
        color = Color.WHITE
    }

    fun updateFaces(bounds: List<FaceBounds>) {
        facesBounds.clear()
        facesBounds.addAll(bounds)
        invalidate()
    }

    override fun onDraw(canvas: Canvas) {
        super.onDraw(canvas)
        facesBounds.forEach {
            // Simplified here: use the detected box's own center. The full
            // computation also scales the coordinates to the view's size and
            // mirrors them when the front camera is in use.
            val centerX = it.box.exactCenterX()
            val centerY = it.box.exactCenterY()
            drawBounds(it.box, canvas, centerX, centerY)
        }
    }

    private fun drawBounds(box: Rect, canvas: Canvas, centerX: Float, centerY: Float) {
        // Position the box's edges symmetrically around its center
        val left = centerX - box.width() / 2f
        val top = centerY - box.height() / 2f
        val right = centerX + box.width() / 2f
        val bottom = centerY + box.height() / 2f
        canvas.drawRect(left, top, right, bottom, boundsPaint)
    }
}
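As a concrete example of why the overlay needs the camera's facing: the front camera's preview is mirrored, so each box's x coordinate has to be flipped across the view's vertical center line before drawing. A sketch, where shouldMirror is a hypothetical flag derived from the overlay attributes:

// Hypothetical helper inside FaceBoundsOverlay: flip the x coordinate
// across the view's vertical center when the preview is mirrored
// (typically when the front camera is in use)
private fun mirrorIfNeeded(centerX: Float, shouldMirror: Boolean): Float =
        if (shouldMirror) width - centerX else centerX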

The diagram below shows the components explained above and how they interact with each other, from the moment the camera provides a frame to when the results are displayed to the user.

Building a real-time face detection app in 3 steps

Using the face detection library below (which contains the code explained above), building a real-time face detection app becomes quite easy.

N.B.: For this example, I chose the following camera library.

Step 1. Add a FaceBoundsOverlay on top of the camera view.

<FrameLayout
    ...>

    <!-- Any other views -->

    <CameraView
        ... />

    <husaynhakeem.io.facedetector.FaceBoundsOverlay
        ... />

    <!-- Any other views -->

</FrameLayout>

Step 2. Define a FaceDetector instance and connect it to the camera.

private val faceDetector: FaceDetector by lazy {
    FaceDetector(facesBoundsOverlay)
}

cameraView.addFrameProcessor {
    faceDetector.process(Frame(
            data = it.data,
            rotation = it.rotation,
            size = Size(it.size.width, it.size.height),
            format = it.format,
            // Frame also needs the camera's facing; the exact call depends on
            // the camera library in use.
            isCameraFacingBack = cameraView.facing == Facing.BACK))
}

Step 3. Set up Firebase in the project.
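This means registering the app with a Firebase project, adding the google-services.json file, and pulling in the Google services Gradle plugin and the ML Kit vision dependency. A sketch of the Gradle side (version numbers are illustrative; check the Firebase docs for current ones):

// Project-level build.gradle (illustrative version number)
buildscript {
    dependencies {
        classpath 'com.google.gms:google-services:4.0.1'
    }
}

// App-level build.gradle (illustrative version number)
apply plugin: 'com.google.gms.google-services'

dependencies {
    implementation 'com.google.firebase:firebase-ml-vision:17.0.0'
}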

Conclusion

Face detection is a powerful feature, and with Firebase's ML Kit, Google is making it more accessible and allowing developers to build more advanced features on top of it, such as face recognition, which goes beyond merely detecting that a face is present and actually attempts to identify whose face it is.

ML Kit is still in its early days, but what I’m most excited about is a new face contour feature that’s planned to be added to its bundle — another quite powerful feature that’ll be able to detect more than 100 points around the face and quickly process them.

This could potentially be of use in applications using augmented reality objects or virtual stickers (such as Snapchat). Paired with the face detector, it’ll be interesting to see all the new applications they’ll create.

For more on Java, Kotlin and Android, follow me to get notified when I write new posts, or let’s connect on Github and Twitter!

Discuss this post on Hacker News.
