Deep Learning and Human Activity Recognition
A review of “Deep Learning for Sensor-based Activity Recognition: A Survey,” published December 2017.
Being able to perform human activity recognition (HAR) accurately and in real time could play a helpful role in a number of health-related concerns, including diagnosing certain diseases based on changes in mobility, monitoring sleep or respiration, and tracking a person’s overall physical activity level or fall risk. Accurate and timely HAR could also have an impact on high-level physical skills assessments or improve comfort in a smart home environment.
For a brief background, sensor-based activity recognition is achieved through three different types of sensors: body-worn sensors (smartphones, wristbands, glasses, helmets), object sensors (e.g., a sensor attached to a cup, or RFID tags), and ambient sensors (radar, sound, pressure, temperature). Hybrid combinations of these sensor types are also possible.
Smartphones are a common sensor platform, as they already contain accelerometers (and typically gyroscopes). These sensors produce data with both temporal and spatial elements, relaying acceleration and angular velocity over time.
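To make that concrete for myself, here is a minimal sketch of how such a stream is typically segmented into fixed-length windows before being handed to a model. The 50 Hz sampling rate, 128-sample window, and 50% overlap are illustrative assumptions on my part, not values from the survey.

```python
import numpy as np

def sliding_windows(signal, window_size=128, step=64):
    """Segment a (timesteps, channels) sensor stream into overlapping windows.

    `signal` is assumed to hold raw accelerometer + gyroscope readings,
    e.g. shape (N, 6) for 3-axis acceleration and 3-axis angular velocity.
    """
    windows = []
    for start in range(0, len(signal) - window_size + 1, step):
        windows.append(signal[start:start + window_size])
    return np.stack(windows)  # shape: (num_windows, window_size, channels)

# Example: 10 seconds of synthetic 50 Hz readings from 6 channels
stream = np.random.randn(500, 6)
X = sliding_windows(stream)
print(X.shape)  # (6, 128, 6) with these settings
```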
Conventional pattern recognition for human activity has been a slower process: it depends on a controlled environment and the assessor’s domain knowledge, and it is mostly limited to shallow, hand-crafted features. It also requires labeled ground truth, with the accuracy of predictions calculated via a loss function. Sensors are capable of communicating a large amount of information, but this hand-engineered pipeline creates a bottleneck, so the full picture of what the sensors can communicate is lost. How can we take fuller advantage of this data source?
Deep learning has already achieved improved performance in visual object recognition, natural language processing, and logic reasoning. For HAR, deep learning is able to perform feature extraction and model building simultaneously, and it learns much more high-level and meaningful features by training an end-to-end neural network. Deep learning models are also better able to take advantage of unlabeled data.
There are many types of deep learning models covered in the survey:
Deep Neural Networks (DNN) are capable of learning from large amounts of data, and they can serve as the dense (fully connected) layers of other deep models. When the HAR data is multi-dimensional and the activities are more complex, additional hidden layers can help the model train well.
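As a point of reference for the models below, a plain DNN classifier over a flattened sensor window could be sketched like this in PyTorch (the layer sizes, the 768-value input, and the 6 activity classes are arbitrary placeholders, not numbers from the survey):

```python
import torch
import torch.nn as nn

# A minimal DNN: fully connected layers over a flattened 128 x 6 sensor window.
dnn = nn.Sequential(
    nn.Linear(768, 256), nn.ReLU(),  # add more hidden layers for more complex activities
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 6),               # one output per activity class
)

logits = dnn(torch.randn(32, 768))   # a batch of 32 flattened windows
print(logits.shape)                  # torch.Size([32, 6])
```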
Convolutional Neural Networks (CNN, or ConvNets) are the most widely used for HAR. They benefit from local dependency (nearby signal readings are likely to be correlated) and scale invariance (important for handling different paces or frequencies of movement). Because CNNs are essentially operating on multidimensional temporal readings rather than images, they require data-driven or model-driven input adaptations before being applied. CNNs can utilize pooling as well, which helps reduce overfitting and can speed up the training process, and weight-sharing also speeds up training on a new task.
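Here is a rough sketch of the data-driven (1-D) adaptation, written in PyTorch for concreteness: each sensor axis becomes an input channel and the convolution slides along time. The filter sizes, window length, and class count are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class HARConvNet(nn.Module):
    """1-D CNN over fixed-length sensor windows (channels = sensor axes)."""
    def __init__(self, num_channels=6, num_classes=6, window_size=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(num_channels, 64, kernel_size=5, padding=2),  # weights shared across time
            nn.ReLU(),
            nn.MaxPool1d(2),  # pooling shrinks the temporal dimension and helps reduce overfitting
            nn.Conv1d(64, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.classifier = nn.Linear(64 * (window_size // 4), num_classes)

    def forward(self, x):
        # x: (batch, channels, timesteps), i.e. each sensor axis treated as a channel
        h = self.features(x)
        return self.classifier(h.flatten(start_dim=1))

model = HARConvNet()
logits = model(torch.randn(32, 6, 128))  # a batch of 32 windows
print(logits.shape)                      # torch.Size([32, 6])
```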
Recurrent neural networks (RNN) are used less often for HAR, but they can still achieve good performance, even in resource-constrained environments.
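A minimal LSTM classifier along the same lines might look like the sketch below; recurrent models process one sensor reading per time step and keep a relatively small parameter count, which is part of why they can be attractive in constrained settings. Again, the sizes are my own illustrative choices.

```python
import torch
import torch.nn as nn

class HARLSTM(nn.Module):
    """A small LSTM classifier over raw sensor windows."""
    def __init__(self, num_channels=6, hidden_size=64, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(num_channels, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, timesteps, channels) -- one sensor reading per time step
        _, (h_n, _) = self.lstm(x)
        return self.classifier(h_n[-1])  # classify from the final hidden state

model = HARLSTM()
print(model(torch.randn(32, 128, 6)).shape)  # torch.Size([32, 6])
```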
Restricted Boltzmann machines (RBM), or stacks of RBMs forming a deep belief network (DBN), have been used with multi-modal sensors, with an RBM constructed for each sensor modality.
Stacked autoencoders (SAE) are able to learn more advanced feature representations via an unsupervised learning scheme. After several rounds of unsupervised training, the learned feature layers are stacked and combined with labels to form a classifier. All of this can make the SAE a powerful tool for feature extraction.
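A simplified sketch of that idea: pretrain each layer to reconstruct its own input on unlabeled windows, then stack the pretrained encoders and fine-tune a classifier on top once labels are available. The layer sizes (768 → 256 → 64) and the 6 classes are illustrative assumptions, not values from the survey.

```python
import torch
import torch.nn as nn

def pretrain_layer(encoder, decoder, data, epochs=10):
    """Greedy layer-wise pretraining: one encoder/decoder pair learns to reconstruct its input."""
    autoencoder = nn.Sequential(encoder, nn.ReLU(), decoder)
    opt = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)
    for _ in range(epochs):
        loss = nn.functional.mse_loss(autoencoder(data), data)
        opt.zero_grad(); loss.backward(); opt.step()
    return encoder(data).relu().detach()  # features passed to the next layer

unlabeled = torch.randn(1024, 768)          # flattened unlabeled sensor windows (128 steps x 6 axes)
enc1, dec1 = nn.Linear(768, 256), nn.Linear(256, 768)
enc2, dec2 = nn.Linear(256, 64), nn.Linear(64, 256)

h1 = pretrain_layer(enc1, dec1, unlabeled)  # first layer learns from raw windows
h2 = pretrain_layer(enc2, dec2, h1)         # second layer learns from first-layer features

# Stack the pretrained encoders, add a classifier head, then fine-tune with labeled data.
classifier = nn.Sequential(enc1, nn.ReLU(), enc2, nn.ReLU(), nn.Linear(64, 6))
```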
And of course, these can be combined into hybrid models. One combination (CNN + RNN) was highlighted, as the CNN is able to capture the spatial relationships in HAR signals while the RNN can utilize the temporal relationships.
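A rough sketch of that hybrid idea, again just my own illustrative composition rather than an architecture specified in the survey: a convolutional front-end picks up local patterns in the signal, and an LSTM then models their order over time.

```python
import torch
import torch.nn as nn

class ConvLSTMHAR(nn.Module):
    """Convolutional front-end for local (spatial) patterns, LSTM back-end for temporal order."""
    def __init__(self, num_channels=6, num_classes=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(num_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.lstm = nn.LSTM(64, 128, batch_first=True)
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):                # x: (batch, channels, timesteps)
        h = self.conv(x)                 # (batch, 64, timesteps / 2)
        h = h.transpose(1, 2)            # (batch, timesteps / 2, 64) for the LSTM
        _, (h_n, _) = self.lstm(h)
        return self.classifier(h_n[-1])  # classify from the final hidden state

model = ConvLSTMHAR()
print(model(torch.randn(8, 6, 128)).shape)  # torch.Size([8, 6])
```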
To be honest, even though the more detailed discussions of deep learning were over my head, the reason I was interested in this article is that I immediately considered its applications to my former career as a physical therapist.
The article referenced using sensors and deep learning to evaluate movement disorders like Parkinson’s disease; specifically, using inertial sensors attached to shoes to quantify “freezing” episodes during walking. Parkinson’s disease is mostly diagnosed based on clinical presentation, and if there were a reasonable way to standardize that clinical diagnosis based on gait pattern, it would be worth pursuing. It may also be beneficial to have a more formal way to track changes in the disease process over time, along with the effectiveness of different medical and/or therapeutic interventions. Reading this article, I was struck by the fact that it may not be the gathering of data that is the challenge, but the processing of that complex data in a timely and meaningful way.
Another application of HAR that seemed like it could be meaningful and cost-effective is real-time activity and fall-prevention monitoring. Having worked in assisted-living environments, I know that the resident-to-staff ratio is always unbalanced; if there were a way to provide enhanced 24-hour monitoring (without violating residents’ privacy), fall-related injuries could be greatly reduced. As an added benefit, simply being able to get feedback on how active an individual resident is over a certain time period would be a great addition to their overall health picture and to gauging the effectiveness of a variety of medical interventions.
That line of thinking made me wonder if that kind of monitoring system could also realistically be used in the home, for elderly individuals living alone who have family members concerned for their welfare. I’ve known many family members of patients who had Nest cameras installed to monitor their Mom or Dad, and they called regularly to check on them as well. What if, through the use of a personal body sensor as well as a small selection of ambient sensors, family members could have real-time awareness of activity levels or potential fall events? The cost of transitioning into assisted-living is already considerable; could an investment in specific sensor hardware, along with a deep learning program to synthesize all that data, make added time in the home a real and safe alternative?
In summary, there are many other potential applications of this technology highlighted in the article (some of which I mentioned at the beginning), but the possibilities above struck the most personal chord for me. The authors provide a thorough summary of sensor deployment and model selection, both of which depend on the nature of the activity you are trying to capture. Body-worn sensors are the most common and can recognize many types of daily activities, while object and ambient sensors are better at recognizing activities tied to context and environment. For model selection, CNNs are better at inferring long-term repetitive activities, and RNNs are better at recognizing short activities that have a natural temporal order.
Future challenges for deep learning-based HAR involve enhancing the computational capabilities of mobile devices, crowdsourcing for more accurate unsupervised activity recognition, and creating lighter-weight, flexible deep learning models that can recognize more complex activities.