Real-Time Human Pose Estimation with TensorFlow.js

Implementing a TFJS PoseNet model, ready for use in the browser

What is PoseNet?

PoseNet is a deep learning TensorFlow model that allows you to estimate and track human poses (known as “pose estimation”) by detecting body parts such as elbows, hips, wrists, knees, and ankles.

It uses the joints of these body parts to determine body posture. Many industries now use this kind of technology to improve work efficiency, and it also powers augmented reality experiences, animation and gaming, and robotics. Human-like robots, virtual gaming experiences, motion tracking, and body movement interpretation can all be built on top of high-end pose estimation models like PoseNet.

What you’ll learn

  • How to set up a PoseNet deep learning model.
  • How to set up a webcam feed.
  • How to detect different poses in the browser using a webcam and the PoseNet model.

Setting up dependencies

First, we need to install the dependencies for our project: the posenet model, tfjs (TensorFlow.js), and react-webcam. We can install them with either npm or yarn from our project terminal:
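
For example, using npm (the yarn equivalent is yarn add):

```bash
npm install @tensorflow/tfjs @tensorflow-models/posenet react-webcam
```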

  • @tensorflow/tfjs: The core TensorFlow.js package.
  • @tensorflow-models/posenet: This package delivers the pre-trained PoseNet model.
  • react-webcam: This library provides a component for accessing the webcam in a React project.

Now, we need to import all the installed dependencies into our App.js file as directed in the code snippet below:
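
A sketch of those imports, assuming a function component that uses the useRef and useEffect hooks (both used later in this tutorial); the tfjs import gives the model a backend to run on, even though tf isn't referenced directly:

```jsx
import React, { useRef, useEffect } from "react";
import * as tf from "@tensorflow/tfjs"; // registers the TFJS backend
import * as posenet from "@tensorflow-models/posenet";
import Webcam from "react-webcam";
```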

With this, we have completed setting up our dependencies for this project.

Setting up the webcam and canvas

Next, we’re going to set up our webcam and a canvas to view the webcam stream in the web display. For that, we’re going to make use of the webcam component that we installed and imported earlier. First, we need to create reference variables using the useRef hook, as shown in the code snippet below:
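
A minimal sketch, assuming the ref names webcamRef and canvasRef used throughout the rest of this tutorial:

```jsx
// References to the webcam video element and the drawing canvas
const webcamRef = useRef(null);
const canvasRef = useRef(null);
```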

Next, we need to initialize the webcam component in our render method. This lets us display the webcam stream with a canvas overlaid on top of it, passing the refs as props. The implementation is provided in the snippet below:
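
A sketch of that render output; the exact style values (a centered 640×480 overlay, with the canvas stacked on top of the webcam feed) are illustrative:

```jsx
return (
  <div className="App">
    <Webcam
      ref={webcamRef}
      style={{
        position: "absolute",
        left: 0,
        right: 0,
        margin: "auto",
        zIndex: 9,
        width: 640,
        height: 480,
      }}
    />
    <canvas
      ref={canvasRef}
      style={{
        position: "absolute",
        left: 0,
        right: 0,
        margin: "auto",
        zIndex: 9,
        width: 640,
        height: 480,
      }}
    />
  </div>
);
```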

Here, we have provided the style properties to the Webcam as well as canvas components.

Detecting the webcam

Next, we need to create a function that grabs the video properties and handles the video adjustments. The overall coding implementation of the function, called detectWebcamFeed, is provided in the code snippet below:
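
A sketch of detectWebcamFeed: it waits until the webcam stream is ready (readyState === 4), then reads the intrinsic video dimensions and applies them to the video element. The posenet_model parameter is supplied in the next step:

```jsx
const detectWebcamFeed = async (posenet_model) => {
  if (
    typeof webcamRef.current !== "undefined" &&
    webcamRef.current !== null &&
    webcamRef.current.video.readyState === 4
  ) {
    // Grab the video properties from the webcam feed
    const video = webcamRef.current.video;
    const videoWidth = video.videoWidth;
    const videoHeight = video.videoHeight;
    // Match the video element's size to the actual stream size
    video.width = videoWidth;
    video.height = videoHeight;
    // Pose estimation and drawing are added in the next steps
  }
};
```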

Loading the PoseNet model

In this step, we’re going to load the pre-trained PoseNet Model that we installed and imported earlier. For that, we need to create a function called runPosenet and call the load method from the posenet module inside the function.

Then, we need to use the setInterval method to run detectWebcamFeed every 100 milliseconds in order to detect poses. The overall implementation is provided in the snippet below:
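
A sketch of runPosenet; the inputResolution option is an assumption chosen to match the 640×480 webcam feed:

```jsx
const runPosenet = async () => {
  // Load the pre-trained PoseNet model
  const posenet_model = await posenet.load({
    inputResolution: { width: 640, height: 480 },
  });
  // Run pose detection on the webcam feed every 100 ms
  setInterval(() => {
    detectWebcamFeed(posenet_model);
  }, 100);
};
```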

Now, in the detectWebcamFeed function, we receive posenet_model as a parameter, which we can use to start estimating poses from the webcam feed data:
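
Inside the ready check of detectWebcamFeed, a single-person pose can be estimated with the model's estimateSinglePose method:

```jsx
// Estimate the pose of the person visible in the video feed
const pose = await posenet_model.estimateSinglePose(video);
console.log(pose); // keypoints with positions and confidence scores
```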

Lastly, we need to call the runPosenet function on the app’s initial load:
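
One way to do this is with a useEffect hook that runs once on mount (calling runPosenet() directly in the component body also works):

```jsx
useEffect(() => {
  runPosenet();
}, []);
```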

Drawing utilities from TensorFlow

In this step, we are going to start drawing the pose estimation keypoints on our canvas in order to demonstrate that our model actually works as it should. For the drawing functions, we need to grab the code from the posenet repo and paste it into a new file named utilities.js in our project.

Now, back in our App.js file, we need to import some functions from the utilities.js file, as shown in the code snippet below:
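
Assuming the helpers copied into utilities.js include the drawKeypoints and drawSkeleton functions from the PoseNet demo code, the import looks like this:

```jsx
import { drawKeypoints, drawSkeleton } from "./utilities";
```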

Draw functions

Here, we’re going to implement a function called drawResult that shows the pose estimation results on the canvas. For that, we need to get the data from the webcam feed, passing it to the drawKeypoints and drawSkeleton functions, as shown in the code snippet below:
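
A sketch of drawResult; the 0.6 confidence threshold passed to the drawing helpers is an illustrative value:

```jsx
const drawResult = (pose, video, videoWidth, videoHeight, canvas) => {
  const ctx = canvas.current.getContext("2d");
  // Size the canvas to match the video feed
  canvas.current.width = videoWidth;
  canvas.current.height = videoHeight;
  // Draw the detected keypoints and the skeleton connecting them
  drawKeypoints(pose.keypoints, 0.6, ctx);
  drawSkeleton(pose.keypoints, 0.6, ctx);
};
```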

Lastly, we need to add a call to the drawResult function inside the detectWebcamFeed function, passing the pose estimation result along with the video, video height, video width, and canvasRef:
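
Putting it together, the end of the ready check in detectWebcamFeed might look like this:

```jsx
const pose = await posenet_model.estimateSinglePose(video);
drawResult(pose, video, videoWidth, videoHeight, canvasRef);
```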

And that’s it! We have successfully implemented a real-time human pose estimation model ready for the browser using the TensorFlow.js model and a webcam feed in our React project.

Final result

You can play with a demo of the final result on CodeSandbox.

Recap

Human pose estimation techniques are widely used in the AI, robotics, and gaming industries. Several pose estimation TensorFlow models are available that deliver high-quality results.

In this tutorial, we learned about one of them in depth—PoseNet. We learned how to set up the PoseNet model along with TensorFlow.js in a React project. We also got detailed, stepwise guidance on how to draw the detected result on a canvas using the data from the webcam feed.

The aim was to demonstrate the basic use cases of the PoseNet model for real-time human pose estimation using a webcam feed as the data source. Now, the challenge is to create an advanced webcam filter with detection capabilities that rival tools like the Snapchat camera.
