PC Webcam Object detection with TensorFlow2 GPU Training

Taiwei Yin

Friday, March 19th, 2021


During a recent hackathon, I was tasked with using my PC webcam to scan and detect networking ports. I first searched for open-source projects but couldn't find a similar one. Fortunately, TensorFlow is one of the most popular open-source machine learning frameworks for building custom models, and TensorFlow2 supports Nvidia GPUs, which shortens training time on large datasets. With it, I was able to train a custom model and detect electronic device ports from a PC webcam, as shown in Figure 1 below.

Figure 1: My application detected an ethernet port


Prepare Dataset

The first step in successful model training is preparing and labeling the dataset. My dataset contains many images of networking devices taken from different angles, against different backgrounds, and under different lighting conditions. Because object detection is a supervised machine learning method, I had to label each image and save the annotations to an XML file in order to train my custom model.

Many GUI image annotation tools are available to help with labeling, and you can choose one based on your preference and platform. I used Microsoft's VoTT tool (https://github.com/microsoft/VoTT) and saved the annotations in Pascal VOC XML format, as shown in Figure 2 below.

Figure 2: VoTT image annotation tool
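
To make the annotation format concrete, here is a minimal sketch of reading one Pascal VOC XML file with Python's standard library. The sample annotation, the file name router_01.jpg, the label ethernet_port, and the helper parse_voc are all made up for illustration; a real VoTT export may carry extra fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Pascal VOC layout (values are invented).
SAMPLE_XML = """<annotation>
  <filename>router_01.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>ethernet_port</name>
    <bndbox>
      <xmin>160</xmin><ymin>120</ymin><xmax>320</xmax><ymax>240</ymax>
    </bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return the image size and labeled boxes from one annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        box = {"label": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # Go through float() first in case a tool writes "160.0".
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "boxes": boxes,
    }


annotation = parse_voc(SAMPLE_XML)
print(annotation["filename"], annotation["boxes"])
```

Each labeled object becomes one dictionary with its class name and pixel coordinates, which is the information the training pipeline needs from every image.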


Model Training Without GPU

The next step was to convert my dataset to TensorFlow's binary record format (TFRecord) so that the TensorFlow Python scripts could process it. I followed TensorFlow2's tutorial, converted my dataset, and started training on my PC. On my PC's i7 CPU, training was very slow: it took days to run the 5000 steps needed to train my model. Re-training or fine-tuning the model on a regular PC would be just as time-consuming.
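
As a rough illustration of what the conversion involves, the sketch below turns pixel-coordinate boxes into the normalized fields that object detection records typically store. It is not the tutorial's script: the helper to_feature_dict, the label map, and the sample values are hypothetical, the key names follow the convention I believe the TensorFlow Object Detection API examples use, and the final step of wrapping these values in a tf.train.Example with the encoded image bytes is left out so the snippet runs without TensorFlow installed.

```python
def to_feature_dict(filename, width, height, boxes, label_map):
    """Build a plain dict of per-image fields with normalized boxes.

    boxes: list of dicts with label, xmin, ymin, xmax, ymax in pixels.
    label_map: hypothetical mapping from class name to integer id.
    """
    feature = {
        "image/filename": filename,
        "image/width": width,
        "image/height": height,
        "image/object/bbox/xmin": [],
        "image/object/bbox/ymin": [],
        "image/object/bbox/xmax": [],
        "image/object/bbox/ymax": [],
        "image/object/class/text": [],
        "image/object/class/label": [],
    }
    for box in boxes:
        # Coordinates are stored as fractions of the image size (0.0 to 1.0).
        feature["image/object/bbox/xmin"].append(box["xmin"] / width)
        feature["image/object/bbox/ymin"].append(box["ymin"] / height)
        feature["image/object/bbox/xmax"].append(box["xmax"] / width)
        feature["image/object/bbox/ymax"].append(box["ymax"] / height)
        feature["image/object/class/text"].append(box["label"])
        feature["image/object/class/label"].append(label_map[box["label"]])
    return feature


# Invented example: one ethernet port box in a 640x480 image.
example_boxes = [
    {"label": "ethernet_port", "xmin": 160, "ymin": 120, "xmax": 320, "ymax": 240}
]
feature = to_feature_dict(
    "router_01.jpg", 640, 480, example_boxes, {"ethernet_port": 1}
)
print(feature["image/object/bbox/xmin"], feature["image/object/bbox/ymax"])
```

Normalizing the coordinates keeps the labels valid even if the images are resized during training.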

Model Training With GPU

So I moved my model training to Google Colab. Colab provides a Python virtual machine in the cloud, and I can allocate an Nvidia GPU to it to speed up training. After setting up my Google Colab account and configuring a GPU runtime, I was able to run 5000 steps and train my model in one hour. This shows how well GPU hardware suits highly parallel computation jobs such as model training.
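
Before starting a long training run, it can help to confirm that the runtime actually sees a GPU. The following is a small sketch of such a check; the helper name available_gpus is my own, and it returns an empty list when TensorFlow is not installed so that it can run anywhere.

```python
def available_gpus():
    """Return the names of GPUs visible to TensorFlow, or [] if none."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = available_gpus()
if gpus:
    print("Training can use:", gpus)
else:
    print("No GPU visible; training would fall back to the CPU.")
```

On a Colab notebook with a GPU runtime selected, the list should contain at least one device; an empty list usually means the runtime type still needs to be changed.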

Because Google Colab is a free service that shares GPU resources among users, jobs are subject to a 12-hour time limit. Make sure you save your data when your jobs finish, or you will lose your progress when the Colab session times out or sits idle for too long.
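
One simple way to protect your progress is to copy the training output to storage that outlives the session. The sketch below is an assumption about how that could look, not the exact steps I ran: backup_checkpoints is a hypothetical helper, and the demo uses temporary directories in place of a real training folder and a persistent location such as a mounted cloud drive.

```python
import shutil
import tempfile
from pathlib import Path


def backup_checkpoints(train_dir, backup_dir):
    """Copy every file in train_dir to backup_dir; return the names copied."""
    src = Path(train_dir)
    dst = Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file():
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied


# Demo: temporary directories stand in for the real paths.
with tempfile.TemporaryDirectory() as tmp:
    train_dir = Path(tmp) / "training"
    train_dir.mkdir()
    (train_dir / "ckpt-1.index").write_text("demo")
    backup_dir = Path(tmp) / "backup"
    copied = backup_checkpoints(train_dir, backup_dir)
    restored = (backup_dir / "ckpt-1.index").read_text()
    print(copied)
```

Running a copy like this at regular intervals, rather than only at the end, limits how much work is lost if the session disconnects mid-run.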

Putting It All Together

Finally, I downloaded the trained model to my PC and tested it. The test was a success: my application can recognize ethernet ports through a webcam, as shown in Figure 3 below.

Figure 3: Router’s RJ45 ethernet ports correctly identified
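
Drawing boxes like these on a webcam frame requires turning the model's raw output into pixel coordinates. The sketch below shows that step only, under the assumption that the model returns boxes as normalized [ymin, xmin, ymax, xmax] values with one score and class id per box, which is the convention I understand TensorFlow object detection models to follow. The helper select_detections, the 0.5 score threshold, and the sample values are all illustrative.

```python
def select_detections(boxes, scores, classes, frame_width, frame_height,
                      min_score=0.5):
    """Keep confident detections and convert their boxes to pixels."""
    results = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            # Skip low-confidence boxes so they are not drawn on the frame.
            continue
        results.append({
            "class_id": int(class_id),
            "score": float(score),
            "left": int(xmin * frame_width),
            "top": int(ymin * frame_height),
            "right": int(xmax * frame_width),
            "bottom": int(ymax * frame_height),
        })
    return results


# Invented model output for a 640x480 webcam frame.
detections = select_detections(
    boxes=[[0.25, 0.5, 0.5, 0.75], [0.1, 0.1, 0.2, 0.2]],
    scores=[0.9, 0.3],
    classes=[1, 1],
    frame_width=640,
    frame_height=480,
)
print(detections)
```

The resulting pixel rectangles can then be passed to whatever drawing routine the webcam application uses.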


Many modern applications use Augmented Reality and Machine Learning techniques to detect different objects and provide a better user experience. To achieve this, developers create custom models and spend a lot of time training and testing them. This article shares my experience of setting up a TensorFlow2 GPU development environment, preparing the dataset, and running model training. The right tools speed up the process, and I hope this helps developers working on similar tasks.