Team Waldo

Blog post 2: Getting Stuck In

The main aims of this past week were to improve the machine learning network for sign language recognition and, in parallel, to set up the Jetson Nano Developer Kit [1] (as seen in the picture below), with the eventual aim of deploying the network on the Jetson.



On the machine learning side, the focus was to improve upon the previous week’s DNN. We also researched alternative approaches such as Pose Estimation [2] and Bounding Box [3] methods to mask out the subject. The main problem is that if the pose estimation or bounding box stage is inaccurate, any gesture detection built on its output will likely be inaccurate as well. A simple test using tinyYOLOv3 to keep only the bounding box of the detected person achieved just 61% validation accuracy on our 5+1 classes of actions from the JESTER dataset. We also implemented a ConvLSTM [4] model, which performed decently, reaching 76% validation accuracy on the same 5+1 classes. The downside was that this model was extremely large and complex, yet it was still unable to outperform the CNN+LSTM model that we improved this week.

Drawing on previous deep learning experience, we decided to normalise the feature vector output of MobileNetV2. The network was trained on 5 actions plus 1 ‘no gesture’ class (6 classes in total). Without normalisation it achieved a validation accuracy of 82%, whereas with normalisation it reached 87%, so this simple but crucial modification improved accuracy by 5 percentage points. We postulate that normalising the feature vector makes it easier for the LSTM to learn the temporal features, because it no longer has to account for the varying L2 norms of the input feature vectors.
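To give a concrete picture, the sketch below shows the general shape of such a CNN+LSTM pipeline with per-frame L2 normalisation. It assumes a Keras/TensorFlow implementation, and the frame count, input resolution and layer sizes are illustrative placeholders rather than our exact configuration.

# Minimal sketch (not our exact network): per-frame MobileNetV2 features,
# L2-normalised, then an LSTM over the frame sequence.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_FRAMES = 16        # assumed number of frames sampled per clip
NUM_CLASSES = 6        # 5 gestures + 1 'no gesture'

# Frozen MobileNetV2 backbone used as a per-frame feature extractor.
backbone = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, pooling='avg', weights='imagenet')
backbone.trainable = False

clip_in = layers.Input(shape=(NUM_FRAMES, 224, 224, 3))
features = layers.TimeDistributed(backbone)(clip_in)   # one feature vector per frame
# The key modification: L2-normalise each frame's feature vector so the LSTM
# does not have to cope with feature vectors of varying magnitude.
normalised = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=-1))(features)
x = layers.LSTM(128)(normalised)
out = layers.Dense(NUM_CLASSES, activation='softmax')(x)

model = Model(clip_in, out)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])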


In preparation for using our own dataset with the network, the training videos recorded over the past week had to be preprocessed. This was done by first splitting each video into frames at a fixed frame rate, then removing the frames in which the volunteer performs no action. Those idle frames were in turn used as the training set for the sixth class, ‘no gesture’.
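As a rough illustration of this step, the snippet below extracts frames from a video at a fixed rate using OpenCV. The file paths and target frame rate are placeholder assumptions, not our actual settings.

import cv2
import os

def extract_frames(video_path, out_dir, target_fps=12):
    """Save every n-th frame so the output approximates target_fps."""
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    source_fps = cap.get(cv2.CAP_PROP_FPS) or target_fps
    step = max(int(round(source_fps / target_fps)), 1)

    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            cv2.imwrite(os.path.join(out_dir, f"frame_{saved:05d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# e.g. extract_frames('volunteer_01_hello.mp4', 'frames/hello')
# Frames with no gesture are then moved into the 'no gesture' class folder.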


In addition, the Jetson Nano Developer Kit was successfully set up this week, and the packages required to deploy the network were installed. Initially, the Jetson Nano was powered via the micro USB port. The main problem encountered was that whenever external peripherals, such as a Raspberry Pi Camera, were connected, the Jetson simply shut down, most likely because of insufficient power. It soon became clear that we needed to look at alternative ways to power the Jetson. One option was to power it through the barrel jack connector [4] instead of the micro USB port. This was done using a mains power adapter, which seemed to solve the problem. However, powering Waldo this way means it would not be portable, as it would have to remain connected to the mains, which is certainly not ideal. One potential solution is to use a sufficiently powerful portable power source, which we will be trying out this coming week with a 5 V, 3 A rated power bank.


Further to this, the text-to-speech module has also been completed using IBM Watson’s Python API, so that once the sign recognition portion is ready for deployment, the two modules can be integrated smoothly on the Jetson.
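For illustration, a typical use of the Watson Text to Speech Python SDK looks something like the snippet below. The API key, service URL, voice and output file are placeholders, not our actual configuration.

from ibm_watson import TextToSpeechV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholder credentials and endpoint: substitute your own service values.
authenticator = IAMAuthenticator('YOUR_API_KEY')
tts = TextToSpeechV1(authenticator=authenticator)
tts.set_service_url('https://api.eu-gb.text-to-speech.watson.cloud.ibm.com')

def speak(text, out_path='speech.wav'):
    # Synthesise the recognised phrase to a WAV file for playback on the Jetson.
    audio = tts.synthesize(text, voice='en-GB_KateV3Voice',
                           accept='audio/wav').get_result().content
    with open(out_path, 'wb') as f:
        f.write(audio)

speak('Hello, my name is Waldo')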


Ultimately, the target for our network is a minimum of 90% accuracy, which we believe is achievable for 6 classes. The focus will then be to continue improving the network and to begin deploying the model onto the Jetson to assess its real-time performance at the edge.

