Introduction to hand detection in the browser with Handtrack.js and TensorFlow

Hand detection is a fundamental sub-task of object detection that has historically been hard to implement. But there have been many technological advancements in the field of machine learning and AI in recent years.

One of the core technologies widely used is TensorFlow, an end-to-end open-source platform for machine learning. It has a wide-ranging, flexible ecosystem of tools, libraries, and community resources. With the power of TensorFlow technology, researchers and developers can easily develop and deploy ML-powered applications.

In this tutorial, we’re going to make use of this technology in a JavaScript environment, building a machine learning model that allows users to detect hands. With COVID-19’s continued spread around the globe, this kind of model could potentially be useful in helping users practice more sanitary habits (i.e. not regularly touching one’s face).

This solution could be especially useful for those who are constantly in front of their workstations or laptops: programmers, software developers, digital writers, and anyone else who spends most of the day in front of a computer screen. They are bound to subconsciously touch computer surfaces and then their faces.

The specific solution we’ll be implementing here is to trigger an alarm or notification as soon as a hand appears in your computer’s webcam. Usually, a webcam is pointed directly towards our faces, so we can use this to create a system that detects hands in a given video frame and triggers said notification.

Here, we’re going to make use of a webcam and TensorFlow.js to build this system, which will work directly in the browser, and in real-time.

So let’s get started!

Prerequisites for the system

The requirements in order to get started with our project are as follows:

  • Node.js
  • Parcel.js
  • Handtrack.js
  • A chosen warning sound/audio sample
  • A computer with a webcam

Handtrack.js

Handtrack.js is a library for prototyping real-time hand detection (using bounding boxes) directly in the browser, which makes it a natural fit for this project. Behind the scenes, the library uses a trained convolutional neural network (CNN) that provides bounding box predictions for the location of hands in an image. The network (SSDLite with a MobileNetV2 backbone) is trained using the TensorFlow Object Detection API.

Initializing the project with Parcel.js

Parcel.js is a minimal-configuration bundler that is easy to use and can get your project up and running quickly. It is fast and supports common web technologies out of the box.

Core Features (from the repo)

  • Fast bundle times, with multi-core compilation and a filesystem cache for quicker rebuilds.
  • Supports HTML, CSS, JavaScript, etc. out of the box. No need to install extra plugins.
  • Offers automatic module transformation, making use of Babel, PostCSS, and PostHTML.
  • Little to no configuration. We can easily use import statements to import modules.
  • Facilitates hot module replacement.
  • Also supports simple and clean error logging with syntax highlighting.

Getting Started

Before starting, we need to install Node and Yarn (or npm) and create a package.json for our project. For that, we need to run the following command in our project directory:
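
The exact command isn’t shown here, but with Yarn this step is typically:

    yarn init -y

(npm init -y does the same thing if you prefer npm.) Passing -y accepts the default answers and generates the package.json file.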

Then, using Yarn package installer, we can install parcel into our app:
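
A sketch of that install step. When this style of setup was common, Parcel v1 was published as parcel-bundler; newer releases ship simply as parcel:

    yarn add parcel-bundler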

From there, we just need to point Parcel at one of our entry files. For example, if we’re building a website, that would be an index.html file:
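
A minimal entry file along those lines (adapted from the Parcel docs; the ./src/index.js path is an assumption that matches the rest of this tutorial):

    <html>
      <body>
        <script src="./src/index.js"></script>
      </body>
    </html>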

Then, we can run the following command to serve the project on a local development server:
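
With Parcel installed locally, the dev server is started by pointing the CLI at the entry file (a sketch, assuming the parcel-bundler setup above):

    yarn parcel index.html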

After running the command, we should get a URL that looks something like: http://localhost:1234/

We can open this URL to view the project in a browser.

Including Assets

Next, we need to include an audio file in index.html and create a video element in order to access the webcam. Lastly, we need to add an index.js script file to our index.html file that will contain all the JavaScript code. After these additions, our index.html file should look like the code snippet below:
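
The original snippet isn’t reproduced here, but markup matching that description might look like the following (the element ids and the warning.mp3 filename are assumptions used throughout the rest of this tutorial):

    <html>
      <head>
        <title>Hand Detection</title>
      </head>
      <body>
        <!-- video element that will display the webcam stream -->
        <video id="video" autoplay></video>
        <!-- warning sound to play when a hand is detected -->
        <audio id="audio" src="./warning.mp3"></audio>
        <script src="./src/index.js"></script>
      </body>
    </html>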

The next step is to add the handtrack.js library to our project. To do that, we need to run the following command from the project directory:
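
The library is published on npm as handtrackjs, so with Yarn:

    yarn add handtrackjs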

Then, we need to open our src/index.js file and import the handtrack plugin, as shown below:
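
Following the import style from the handtrack.js README:

    import * as handTrack from 'handtrackjs';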

Now, we need to initialize the handtrack library with a set of default parameters stored in a constant named modelParams. The modelParams constant holds an object with the handtrack plugin’s configuration.
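
These parameters follow the configuration object documented in the handtrack.js README; the exact values here are illustrative defaults:

    const modelParams = {
      flipHorizontal: true,  // flip the frame horizontally, e.g. for webcam video
      maxNumBoxes: 20,       // maximum number of bounding boxes to detect
      iouThreshold: 0.5,     // IoU threshold for non-max suppression
      scoreThreshold: 0.6,   // confidence threshold for predictions
    };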

Using the load method provided by the handTrack module, we load the model with these parameters and assign the result to a variable called model. The implementation is provided in the code snippet below:
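
A sketch of that load step, using the promise-based API from the handtrack.js README:

    let model;

    // load the hand detection model with our configuration;
    // the resolved value is the detector we'll call detect() on later
    handTrack.load(modelParams).then(lmodel => {
      model = lmodel;
    });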

Our initialization of a handTrack instance is now complete. We’re now going to detect hands in the webcam stream and fetch the data from the plugin.

Fetching the Webcam Stream Data

The process of fetching the webcam stream data is simple. All we have to do is make use of the browser API MediaDevices.getUserMedia().

First, we need to get the video and audio elements using the querySelector method, as shown in the code snippet below:
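
Assuming the element ids from the index.html sketch above:

    const video = document.querySelector('#video');
    const audio = document.querySelector('#audio');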

Then, we integrate the handTrack object with the video source.
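
handtrack.js provides a startVideo helper for exactly this: it requests webcam permission and pipes the stream into our video element. A sketch:

    handTrack.startVideo(video).then(status => {
      // status indicates whether the webcam was started successfully
      console.log('video started', status);
    });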

As a reminder, the process is to detect hands with the handtrack model by passing the video element to its detect function. This gives us back the prediction data.
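
A sketch of that detection function (the runDetection name is our own):

    function runDetection() {
      // detect() resolves with an array of bounding-box predictions
      // for the current video frame
      model.detect(video).then(predictions => {
        console.log('Predictions:', predictions);
      });
    }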

Next, we need to run this function every second in order to keep the predictions up to date. We kick this loop off alongside a getUserMedia call.
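
One way to wire that up, using MediaDevices.getUserMedia() as described above and a one-second interval:

    navigator.mediaDevices.getUserMedia({ video: true })
      .then(stream => {
        video.srcObject = stream;         // attach the webcam stream
        setInterval(runDetection, 1000);  // run detection every second
      })
      .catch(err => console.error('Webcam access failed:', err));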

As a result, the predictions array will have a length of zero if no hand appears on the screen. If a hand does appear, the array’s length will be greater than zero, as shown in the console result below:
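
The original console output isn’t reproduced here, but based on the prediction format documented in the handtrack.js README, a single detected hand looks roughly like this (values are illustrative):

    [
      {
        bbox: [277, 103, 215, 194],  // [x, y, width, height] in pixels
        class: 'hand',
        score: 0.97                  // confidence score
      }
    ]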

Using a simple condition based on this length, we can trigger the sound whenever a hand appears on the webcam screen, as shown below:
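
A minimal version of that condition, added inside runDetection (audio is the element we selected earlier):

    function runDetection() {
      model.detect(video).then(predictions => {
        if (predictions.length > 0) {
          audio.play();  // a hand is in frame: play the warning sound
        }
      });
    }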

Hence, we have successfully completed our simple hand detection app.

Conclusion

In this post, we used the power of TensorFlow in a web JavaScript environment to detect hands through a webcam. We learned how to detect hand movement with Handtrack.js. The aim of this project was to detect a hand before it touches the face, with the webcam sending visual data to the system. Using Handtrack.js and TensorFlow, the system detects a hand and notifies the user. This project is just a starting point for what we can do with machine learning technologies like TensorFlow, and there are many other technologies you can use to make it even better.

The full source code is available in this GitHub repo.

