Team Waldo

Blog post 3: The Tough Get Going

This past week was a challenging one, to say the least: not only was the workload heavy and time limited, but two of our group members also fell ill. However, it was still important that we stuck together as a group and kept on track with our tasks. The main focus remained improving the machine learning network, alongside implementing the model on the Jetson Nano Development Kit.


Over the week, the team worked hard on improving the machine learning algorithm, and the countless hours spent on research finally paid off: we made respectable progress after successfully implementing one of the many papers and algorithms we came across and tried. This algorithm, termed C3D + LSTM [1], turned out to be a huge success, taking our validation accuracy up to almost 97% on 5 + 1 action classes (the extra class being "no action"), from our previous best of 87% using the CNN + LSTM model we implemented last week. The approach is quite novel, with the paper only published in March 2019.

The basic idea is that instead of extracting spatial features from individual frames using 2D convolutions, as in CNN + LSTM, C3D + LSTM uses 3D convolutions to extract spatio-temporal features from a clip of consecutive frames. These features are then passed into the LSTM, which models how they evolve over time before deciding which class of action the video contains. This trick improved the accuracy of the model because temporal features are now learned explicitly, allowing the LSTM to better decipher their evolution. This is crucial in our use case: hand signs have relatively similar spatial features, so what the network really needs in order to make better predictions are the learned spatio-temporal features of the input video. This explains why the C3D + LSTM algorithm lifted the accuracy to a whole new level.
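To give a feel for the architecture, here is a minimal sketch of the C3D + LSTM idea in PyTorch. The layer sizes, clip shapes and names are placeholders for illustration, not the exact configuration from the paper or from our model: a small 3D-convolutional backbone summarises each short clip into a spatio-temporal feature vector, and an LSTM models how those features evolve across consecutive clips before classifying the action.

```python
import torch
import torch.nn as nn

class C3DLSTM(nn.Module):
    """Sketch of a C3D + LSTM action classifier (hypothetical layer sizes)."""

    def __init__(self, num_classes=6, feature_dim=256, lstm_hidden=128):
        super().__init__()
        # Small 3D-convolutional backbone: extracts one spatio-temporal
        # feature vector per short clip of frames.
        self.c3d = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool3d((2, 2, 2)),
            nn.AdaptiveAvgPool3d(1),   # collapse each clip to a 64-d vector
        )
        self.project = nn.Linear(64, feature_dim)
        # LSTM models how the clip features evolve over time.
        self.lstm = nn.LSTM(feature_dim, lstm_hidden, batch_first=True)
        self.classifier = nn.Linear(lstm_hidden, num_classes)

    def forward(self, clips):
        # clips: (batch, num_clips, channels, frames_per_clip, height, width)
        b, n = clips.shape[:2]
        feats = self.c3d(clips.flatten(0, 1)).flatten(1)    # (b * n, 64)
        feats = self.project(feats).view(b, n, -1)           # (b, n, feature_dim)
        out, _ = self.lstm(feats)
        # Classify from the final LSTM state: one of the 5 + 1 action classes.
        return self.classifier(out[:, -1])
```

For instance, a batch of shape (2, 8, 3, 16, 112, 112), i.e. 8 clips of 16 frames each, would produce logits of shape (2, 6).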


We also began processing our data to make it more suitable for training the neural network. As our training data still lacks diversity, we had to adjust its composition by removing some videos, so that the model does not overfit to videos of the three group members who were involved in creating the dataset. Currently, 33% of the videos in our training data are diverse (i.e. not from the three group members), a percentage we aim to increase in the coming week by collecting more data from members of the public. We did this by creating different datasets (training and validation sets) with different amounts of diversity, and in the coming week we aim to quantify how the amount of diversity in the training or validation set affects the accuracy of the model.
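As a rough illustration of how the composition can be varied, the sketch below builds a training set with a target fraction of diverse (non-team) videos by downsampling our own recordings. The function and variable names are hypothetical and do not reflect our actual data pipeline.

```python
import random

def make_split(team_videos, public_videos, diverse_fraction, seed=0):
    """Build a dataset with a chosen fraction of 'diverse' (non-team) videos.

    team_videos / public_videos are lists of video paths; diverse_fraction is
    the target share of public videos in the final set (illustrative only).
    """
    rng = random.Random(seed)
    n_public = len(public_videos)
    # Keep all public videos and downsample team videos to hit the target ratio.
    n_team = int(n_public * (1 - diverse_fraction) / diverse_fraction)
    team_sample = rng.sample(team_videos, min(n_team, len(team_videos)))
    dataset = team_sample + list(public_videos)
    rng.shuffle(dataset)
    return dataset

# e.g. a training set that is roughly 33% diverse, as described above
# train_set = make_split(team_videos, public_videos, diverse_fraction=0.33)
```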


Additionally, the General Purpose Input Output (GPIO) interface was successfully set up on the Jetson this week, with the aim of integrating external peripherals such as buttons and various ambient sensors. This was done by following an online guide on the Python Package Index (PyPI) website [2]. The next step will be to implement the buttons such that when the user presses one, Waldo says a comforting preset phrase. Ambient sensors, such as light-intensity sensors and microphones, will then be incorporated, and the data they record will be stored as a backlog for analysis by medical professionals.
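A minimal sketch of the planned button handling, using the Jetson.GPIO package covered by the PyPI guide, is shown below. The pin number and the callback body are assumptions for illustration; the real version will trigger Waldo's speech output rather than printing a message.

```python
import time
import Jetson.GPIO as GPIO

BUTTON_PIN = 18  # hypothetical board pin; the actual wiring may differ

def on_button_press(channel):
    # Placeholder for the planned behaviour: have Waldo speak a preset phrase.
    print("Button pressed - play comforting phrase")

GPIO.setmode(GPIO.BOARD)            # use physical board pin numbering
GPIO.setup(BUTTON_PIN, GPIO.IN)     # button wired as a digital input
GPIO.add_event_detect(BUTTON_PIN, GPIO.FALLING,
                      callback=on_button_press, bouncetime=200)

try:
    while True:
        time.sleep(1)               # main loop stays free for other work
finally:
    GPIO.cleanup()
```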


Finally, we also completed the leaflet this week, one of our deliverables for this project. The task was to design and produce a 2pp A4 leaflet that introduces Waldo as a product and explains our engineering solution to an interested but non-expert audience, such as potential investors. The leaflet was a challenge, as we had to balance the need to provide pertinent information with the limited space afforded by its two sides.


Waldo Leaflet



