How to Create a Neural Network that Detects People Wearing Masks


Is it possible to detect people wearing masks through a webcam? The short answer: YES.

In short, we decided to build a proof of concept. Our goal was to check whether it's possible to identify people wearing masks on the streets using nothing but webcams. It's not a commercial project, but we were curious whether the whole thing was viable.

After just two weeks we could give a definite YES. That's how long it took us to build the neural network. This blog post is a quick overview of our experiment.

Technical Stack 

In the final version, we used the latest versions of TensorFlow 2 Nightly, OpenCV 2, Keras and YOLOv3.

More specifically, we used OpenCV to process images and draw bounding boxes ('squares') around detected masks. YOLOv3 is 'the brain' behind our entire neural network; it runs on TensorFlow Nightly with its built-in Keras. We used this stack specifically for training the models.

  1. NodeJS

  • opencv4nodejs

  • elementtree

  • keras-js

  2. Python

  • Modified latest YOLOv3

  • Latest Python 3.6+

  • opencv-python

  • tensorflow 2.0.0-beta1 / nightly

Stage 1. How does this work?

We decided to start simple: first we wanted to see whether we could recognize masks in images, and then move on to video footage.

Currently, we have two apps. The first one is written in Node.js and is used for labelling: it helps us build the dataset and its annotations.

First, we need to mark the precise location of each mask (or any other object, for that matter). For labelling objects we used Labelbox, a website that is convenient for many purposes. It generates a file with the data we need: mask locations, image dimensions, file names, time spent and more. It also shows how long it took to label each image. Later on, we feed this file into one of our programs.

We also wrote code that collects and parses all of this Labelbox data. The data is then split into separate files in the format the neural network requires (image height, width and depth). You can add extra parameters if you need more precise labelling, but for simple objects like a medical mask, a plain rectangle is enough. As a result, we get the annotations needed for the next step.
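As a rough illustration of this conversion step, the Python sketch below turns one labelled image into a Pascal VOC-style XML annotation using the standard library's xml.etree.ElementTree (a Python counterpart of the elementtree package in our Node.js stack). The input record layout (filename, width, height, and boxes with left/top/width/height in pixels) is a hypothetical simplification, not Labelbox's actual export schema, and the VOC layout is an assumption about what the training code expects.

```python
import xml.etree.ElementTree as ET


def record_to_voc(record: dict, label: str = "mask", depth: int = 3) -> str:
    """Convert one hypothetical labelled-image record into a Pascal VOC XML string.

    `record` is assumed to look like:
    {"filename": "img_001.jpg", "width": 640, "height": 480,
     "boxes": [{"left": 10, "top": 20, "width": 50, "height": 40}, ...]}
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = record["filename"]

    # Image dimensions (width, height, depth) expected by the training code.
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(record["width"])
    ET.SubElement(size, "height").text = str(record["height"])
    ET.SubElement(size, "depth").text = str(depth)

    # One <object> per rectangle; VOC stores corner coordinates, not width/height.
    for box in record["boxes"]:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bndbox = ET.SubElement(obj, "bndbox")
        ET.SubElement(bndbox, "xmin").text = str(box["left"])
        ET.SubElement(bndbox, "ymin").text = str(box["top"])
        ET.SubElement(bndbox, "xmax").text = str(box["left"] + box["width"])
        ET.SubElement(bndbox, "ymax").text = str(box["top"] + box["height"])

    return ET.tostring(root, encoding="unicode")
```

Writing one such XML file per image keeps each annotation next to its image name, which makes it easy to spot-check labels by hand.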

This program also generates anchors based on the same data. Anchors are preset box shapes (widths and heights) that the network scales to fit the mask it detects. In the end, we get the final dataset: images plus annotations. These are the input for the second program.
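Anchor generation for YOLO-family models is commonly done by clustering the widths and heights of all annotated boxes with k-means, using 1 - IoU as the distance so that large and small boxes are treated fairly. The sketch below shows that general technique in plain Python; it is not necessarily the exact code our program uses, and the function names and defaults are our own.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float]  # (width, height)


def iou_wh(a: Box, b: Box) -> float:
    """IoU of two boxes that share the same center, compared by width/height only."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    union = a[0] * a[1] + b[0] * b[1] - inter
    return inter / union


def kmeans_anchors(boxes: List[Box], k: int, iterations: int = 100, seed: int = 0) -> List[Box]:
    """Cluster box sizes into k anchors using 1 - IoU as the distance."""
    if len(boxes) < k:
        raise ValueError("need at least k boxes")
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)

    for _ in range(iterations):
        # Assign each box to the centroid it overlaps most (smallest 1 - IoU).
        clusters: List[List[Box]] = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)

        # Move each centroid to the mean width/height of its cluster.
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep an empty cluster's centroid where it was
                new_centroids.append(centroids[i])
                continue
            w = sum(b[0] for b in members) / len(members)
            h = sum(b[1] for b in members) / len(members)
            new_centroids.append((w, h))

        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids

    # Sort from small to large, the usual YOLOv3 ordering
    # (the smallest anchors go to the finest detection scale).
    return sorted(centroids, key=lambda wh: wh[0] * wh[1])
```

YOLOv3 normally uses nine anchors (three per detection scale), so in practice you would call kmeans_anchors(all_box_sizes, k=9).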

Stage 2. Processing Images 

The second app is written in Python. It includes YOLOv3, 'the brains' behind the whole thing. This app trains our neural network and collects the files it needs. To start training, it requires the standard model file (it comes pre-installed).

With this app, we then create our own model for recognizing objects in images. It's similar to the default model, just tailored to our needs.

Roughly speaking, we need to specify the path where the model is saved, plus the paths to the images and annotations. We set the maximum input size to 288px to speed up processing, although you can choose a larger value. num.epoch sets the number of training epochs (full passes over the dataset). Training for 30 epochs with this input size (288) took us 12 hours.
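The settings described above can be captured in a small config object. In this sketch, the field names (model_path, image_dir, annotation_dir, max_input_size, num_epochs) are hypothetical stand-ins for the real config keys. One check worth adding: YOLOv3's coarsest output grid has a stride of 32, so the input size should be a multiple of 32; 288 gives grids of 9x9, 18x18 and 36x36 at its three detection scales.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple

# YOLOv3 predicts at three scales, downsampling the input by these strides.
YOLO_STRIDES = (32, 16, 8)


@dataclass
class TrainingConfig:
    model_path: str            # where the trained model is saved (hypothetical key)
    image_dir: str             # folder with training images
    annotation_dir: str        # folder with XML annotations
    max_input_size: int = 288  # smaller input trains faster
    num_epochs: int = 30       # full passes over the dataset

    def __post_init__(self) -> None:
        if self.max_input_size % 32 != 0:
            raise ValueError("max_input_size must be a multiple of 32 for YOLOv3")
        if self.num_epochs < 1:
            raise ValueError("num_epochs must be at least 1")

    def grid_sizes(self) -> Tuple[int, ...]:
        """Output grid size (cells per side) at each detection scale."""
        return tuple(self.max_input_size // s for s in YOLO_STRIDES)

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)


config = TrainingConfig("models/mask_yolov3.h5", "dataset/images", "dataset/annotations")
print(config.grid_sizes())  # (9, 18, 36)
```

With the figures from our run, 30 epochs in 12 hours works out to roughly 24 minutes per epoch, a handy baseline when deciding whether a larger input size is worth the extra training time.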

Video Recognition

For video we wrote a separate script. It's fairly complex, since there are many aspects to consider, but it follows the same principle we used for images and is also based on YOLOv3. We also set up OpenCV to load the video and extract frames at a given fps.
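Frame sampling with OpenCV can look roughly like the sketch below: it reads a video with cv2.VideoCapture and yields only every n-th frame, so that about target_fps frames per second reach the detector. The function names, the 2 fps default and the 25 fps fallback (for files whose container reports 0 fps) are assumptions for illustration.

```python
from typing import Any, Iterator, Tuple


def frame_step(source_fps: float, target_fps: float) -> int:
    """How many frames to advance so that roughly `target_fps` frames/s are kept."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("fps values must be positive")
    return max(1, round(source_fps / target_fps))


def sample_frames(video_path: str, target_fps: float = 2.0) -> Iterator[Tuple[int, Any]]:
    """Yield (frame_index, frame) pairs from a video, keeping every n-th frame."""
    import cv2  # opencv-python; imported lazily so frame_step works without it

    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"cannot open video: {video_path}")
    # Some containers report 0 fps; fall back to a typical 25 fps in that case.
    source_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = frame_step(source_fps, target_fps)

    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:  # end of file or read error
                break
            if index % step == 0:
                yield index, frame
            index += 1
    finally:
        cap.release()
```

Skipping frames this way trades temporal resolution for speed: people rarely put on or take off a mask within half a second, so a few frames per second is usually enough.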

To run it, put your video file in the app's folder and enter the command that launches processing. The program then processes the video frame by frame, and you can check its progress at any time.
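A frame-by-frame loop with progress reporting might look like the sketch below. It takes (frame_index, frame) pairs, so it works whether or not frames were skipped. The detector is passed in as a callback: detect is a hypothetical placeholder for the trained YOLOv3 model, assumed to return one label such as "mask" or "no_mask" per detected person. The loop tallies the labels and reports progress every report_every processed frames.

```python
from collections import Counter
from typing import Any, Callable, Iterable, List, Optional, Tuple


def process_frames(
    frames: Iterable[Tuple[int, Any]],
    total_frames: int,
    detect: Callable[[Any], List[str]],
    report_every: int = 100,
    on_progress: Optional[Callable[[float], None]] = None,
) -> Counter:
    """Run `detect` on each (frame_index, frame) pair and tally the labels it returns.

    `detect` is a hypothetical stand-in for the trained model: it takes one
    frame and returns a label such as "mask" or "no_mask" per detected person.
    """
    counts: Counter = Counter()
    for processed, (index, frame) in enumerate(frames, start=1):
        counts.update(detect(frame))
        if on_progress is not None and processed % report_every == 0:
            # Use the frame's position in the source video, so progress stays
            # correct even when only some frames are processed.
            on_progress(100.0 * (index + 1) / total_frames)
    return counts


# Example: print progress every 100 processed frames.
# counts = process_frames(frames, total_frames, detect, on_progress=lambda p: print(f"{p:.1f}%"))
```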

How could this work with online webcams?

Webcam administrators usually record short videos lasting 5 to 10 minutes. In theory, these videos could be sent to a server automatically and processed by software like ours, which would detect whether people are wearing masks.

This software could be useful in factories, for example to make sure employees are wearing masks. On the streets, however, it's still too sci-fi and laborious to put into practice. And nobody wants '1984' outside their windows.

Bottom Line

In the end, we prepared a white paper showing how to build an image-recognition neural network. It can be trained to recognize practically anything, not just masks.

Go ahead and play with it, and let us know if you have any questions.


Serhii Kalachnikov
Backend Developer