Welcome to our channel! Here is a brief introduction to our research:
Activity Recognition with Sensor Networks
Region-based Activity Recognition Using Conditional GAN
We present a method for activity recognition that first estimates the activity performer's location and then uses that estimate,
together with the input data, for recognition. Existing approaches directly take video frames or entire videos for feature
extraction and recognition, and treat the classifier as a black box. Our method first locates the activities in each input video
frame by generating an activity mask using a conditional generative adversarial network (cGAN). The generated mask is appended to
the color channels of the input images and fed into a VGG-LSTM network for activity recognition. To test our system, we produced
two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma
resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to
state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks
facilitate the learning of features that represent the activity itself rather than incidental surrounding information.
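A minimal sketch of the mask-appending idea (not the released implementation; the PyTorch/torchvision framing, layer widths, and pooling choice are assumptions): the generated mask is concatenated as a fourth channel to each RGB frame, a VGG backbone extracts per-frame features, and an LSTM produces per-frame activity predictions.

```python
# Sketch only: 4-channel (RGB + cGAN mask) frames into a VGG-LSTM classifier.
# Layer sizes and the adaptive-pooling step are assumptions, not the paper's exact model.
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MaskedVGGLSTM(nn.Module):
    def __init__(self, num_classes, hidden=256):
        super().__init__()
        backbone = vgg16(weights=None)
        # Widen the first conv layer from 3 to 4 input channels (RGB + mask).
        backbone.features[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)
        self.cnn = backbone.features          # per-frame spatial features
        self.pool = nn.AdaptiveAvgPool2d(1)   # (512, 1, 1) per frame
        self.lstm = nn.LSTM(512, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, frames, masks):
        # frames: (B, T, 3, H, W), masks: (B, T, 1, H, W) from the cGAN
        x = torch.cat([frames, masks], dim=2)                     # (B, T, 4, H, W)
        b, t = x.shape[:2]
        feats = self.pool(self.cnn(x.flatten(0, 1))).flatten(1)   # (B*T, 512)
        out, _ = self.lstm(feats.view(b, t, -1))
        return self.fc(out)                                       # per-frame logits

logits = MaskedVGGLSTM(num_classes=10)(torch.rand(2, 4, 3, 224, 224),
                                       torch.rand(2, 4, 1, 224, 224))
```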
Activity Recognition for Medical Teamwork Based on Passive RFID
We describe a novel and practical activity recognition system for dynamic and complex medical settings using only passive
RFID technology. Our activity recognition approach is based on the use of objects that are specific to a given activity.
The object-use status is detected from the RFID data, and the activities are predicted from the use statuses of the different
objects. We tagged 10 objects in a trauma room of an emergency department and recorded RFID data during 10 actual trauma
resuscitations. More than 20,000 seconds of data were collected and used for analysis. The system achieved 96% overall
accuracy with a 0.74 F-score for detecting the use of 10 common resuscitation objects, and 95% accuracy with a 0.30 F-score
for recognizing 10 medical activities.
Paper.
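A minimal sketch of the two-step cascade described above (not the deployed implementation; the feature shapes and the choice of logistic-regression classifiers are assumptions): per-object use detection from RFID signal features, followed by activity prediction from the resulting object-use vector.

```python
# Sketch of the cascade: object-use detection, then activity prediction from use statuses.
import numpy as np
from sklearn.linear_model import LogisticRegression

NUM_OBJECTS = 10  # tagged resuscitation objects

def train_cascade(rssi_features, use_labels, activity_labels):
    # rssi_features: (N, NUM_OBJECTS, F) per-second RFID features per object (assumed shape)
    # use_labels:    (N, NUM_OBJECTS) 0/1 object-use ground truth
    # activity_labels: (N,) activity class per second
    use_models = []
    for i in range(NUM_OBJECTS):
        m = LogisticRegression(max_iter=1000)
        m.fit(rssi_features[:, i, :], use_labels[:, i])   # step 1: object-use detectors
        use_models.append(m)
    use_vec = np.stack([m.predict_proba(rssi_features[:, i, :])[:, 1]
                        for i, m in enumerate(use_models)], axis=1)
    activity_model = LogisticRegression(max_iter=1000)
    activity_model.fit(use_vec, activity_labels)          # step 2: activity from use vector
    return use_models, activity_model
```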
Deep Neural Network for RFID-Based Activity Recognition
We propose a Deep Neural Network (DNN) structure for RFID-based activity recognition. RFID data collected from several
reader antennas with overlapping coverage have potential spatiotemporal relationships that can be used for object tracking.
We augmented the standard fully-connected DNN structure with additional pooling layers to extract the most representative
features. For model training and testing, we used RFID data from 12 tagged objects collected during 25 actual trauma
resuscitations. Our results showed 76% recognition micro-accuracy for 7 resuscitation activities and 85% average
micro-accuracy for 5 resuscitation phases, which is similar to existing systems that, however, require the user to wear
an RFID antenna.
Paper.
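A rough sketch of a fully-connected structure augmented with a pooling layer in this spirit (the antenna count, layer widths, and PyTorch framing are assumptions; the 12 tags and 7 activities come from the setup above):

```python
# Sketch: shared fully-connected layers per antenna, then pooling over antennas
# to keep the most representative activations, then a classifier.
import torch
import torch.nn as nn

class PooledDNN(nn.Module):
    def __init__(self, num_tags=12, num_antennas=8, num_classes=7):
        super().__init__()
        self.per_antenna = nn.Sequential(      # shared FC layers applied per antenna
            nn.Linear(num_tags, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU())
        self.classifier = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, num_classes))

    def forward(self, x):
        # x: (batch, num_antennas, num_tags) RFID features (assumed layout)
        h = self.per_antenna(x)                # (batch, num_antennas, 64)
        h = h.max(dim=1).values                # pooling over antennas
        return self.classifier(h)

logits = PooledDNN()(torch.rand(4, 8, 12))
```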
Deep Learning for RFID-Based Activity Recognition
We present a system for activity recognition from passive RFID data using a deep convolutional neural network. Instead of
selecting features and using a cascade structure that first detects object use from RFID data and then predicts the activity,
we feed the RFID data directly into a deep convolutional neural network for activity recognition. Because our
system treats activity recognition as a multi-class classification problem, it is scalable to applications with a large
number of activity classes. We tested our system using RFID data collected in a trauma room, including 14 hours of RFID
data from 16 actual trauma resuscitations. Our system outperformed existing systems developed for activity recognition and
achieved performance on process-phase detection similar to systems that require wearable sensors or manually generated
input. We also analyzed the strengths and limitations of our current deep learning architecture for activity recognition
from RFID data.
Paper.
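A minimal sketch of feeding a window of raw RFID readings directly into a convolutional network with a softmax output (the tag count, window length, and layer sizes are assumptions, not the evaluated architecture):

```python
# Sketch: a window of RFID readings (tags x time steps) treated as a 2D input to a
# small CNN; activity recognition is posed as multi-class classification.
import torch
import torch.nn as nn

class RFIDConvNet(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))        # softmax is applied inside the loss

    def forward(self, x):                      # x: (batch, 1, num_tags, window)
        return self.net(x)

logits = RFIDConvNet(num_classes=10)(torch.rand(4, 1, 14, 60))
```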
Concurrent Activity Recognition with Multimodal CNN-LSTM Structure
We introduce a system that recognizes concurrent activities from real-world data captured by multiple sensors of
different types. The recognition is achieved in two steps. First, we extract spatial and temporal features from
the multimodal data. We feed each data type into a convolutional neural network that extracts spatial features,
followed by a long short-term memory network that extracts temporal information from the sensory data. Second, the extracted
features are fused for decision making: we achieve concurrent activity recognition with a single classifier
that outputs a binary vector whose elements indicate whether the
corresponding activity types are currently in progress. We tested our system with three datasets from
different domains recorded using different sensors and achieved performance comparable to existing
systems designed specifically for those domains. Our system is the first to address concurrent activity recognition
with multi-sensor data using a single model that is scalable, simple to train, and easy to deploy.
Preprint Paper.
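The single-classifier design above amounts to multi-label classification; a minimal sketch (the fused-feature width and the number of activities are assumptions):

```python
# Sketch: one classifier produces a binary vector over activities; training uses a
# sigmoid/BCE objective so several activities can be "in progress" at once.
import torch
import torch.nn as nn

NUM_ACTIVITIES = 8
head = nn.Linear(256, NUM_ACTIVITIES)        # operates on fused CNN-LSTM features

fused = torch.rand(4, 256)                   # fused multimodal features (step 1, assumed size)
target = torch.randint(0, 2, (4, NUM_ACTIVITIES)).float()   # concurrent activity labels

loss = nn.BCEWithLogitsLoss()(head(fused), target)          # multi-label training loss
active = torch.sigmoid(head(fused)) > 0.5                   # activities currently in progress
```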
Process Progress Estimation and Phase Detection
Process Progress Detection
Process modeling and understanding are fundamental for advanced human-computer interfaces and automation systems. Recent
research has focused on activity recognition, but little work has addressed process progress detection from sensor data. We
introduce a real-time, sensor-based system for modeling, recognizing, and estimating the completeness of a process. We
implemented a multimodal CNN-LSTM structure to extract spatio-temporal features from different sensor data types. We
used a novel deep regression structure for overall completeness estimation. By combining process completeness estimation with
a Gaussian mixture model, our system can predict the process phase using the estimated completeness. We also introduced the
rectified hyperbolic tangent (rtanh) activation function and a conditional loss to aid the training process. Using the
completeness estimate and performance-speed calculations, we also implemented a real-time estimator of the remaining time.
We tested the proposed system using data obtained from a medical process (trauma resuscitation) and a sports event
(swim competition). Our system outperformed previous implementations for phase prediction during trauma resuscitation and
achieved over 85% process-phase detection accuracy with less than 10% error on each dataset.
Preprint Paper.
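A hedged sketch of the completeness-to-phase step: rtanh is assumed here to be max(0, tanh(x)), which keeps the completeness estimate in [0, 1), and a Gaussian mixture fit over completeness values stands in for the phase model (toy data and component count; not the trained system):

```python
# Sketch: bounded completeness output via a rectified tanh (assumed definition),
# with a Gaussian mixture mapping completeness values to process phases.
import torch
import numpy as np
from sklearn.mixture import GaussianMixture

def rtanh(x):
    return torch.clamp(torch.tanh(x), min=0.0)    # completeness stays in [0, 1)

# Fit a GMM over observed completeness values (placeholder data here).
completeness_samples = np.random.rand(500, 1)
gmm = GaussianMixture(n_components=5).fit(completeness_samples)

est = rtanh(torch.tensor([0.3])).item()           # regressor output -> completeness
phase = gmm.predict([[est]])[0]                    # mixture component ~ process phase
```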
Online Process Phase Detection Using Multimodal Deep Learning
We present a multimodal deep-learning structure that automatically predicts phases of the trauma resuscitation process in
real time. The system first pre-processes the audio and video streams captured by a Kinect’s built-in microphone array and
depth sensor. A multimodal deep learning structure then extracts video and audio features, which are later combined through
a “slow fusion” model. The final decision is then made from the combined features through a modified softmax classification
layer. The model was trained on 20 trauma resuscitation cases (>13 hours), and was tested on 5 other cases.
Paper.
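A simplified, single-stage stand-in for the fusion step described above (layer widths, the number of phases, and the PyTorch framing are assumptions; the actual slow-fusion model combines the audio and video streams over several stages):

```python
# Sketch: per-modality reduction of depth-video and audio features, then a fused
# classifier producing phase logits (softmax applied in the loss).
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, video_dim=256, audio_dim=128, num_phases=6):
        super().__init__()
        self.video_fc = nn.Sequential(nn.Linear(video_dim, 128), nn.ReLU())
        self.audio_fc = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU())
        self.fuse = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                  nn.Linear(128, num_phases))

    def forward(self, video_feat, audio_feat):
        v, a = self.video_fc(video_feat), self.audio_fc(audio_feat)
        return self.fuse(torch.cat([v, a], dim=1))    # phase logits

logits = FusionHead()(torch.rand(2, 256), torch.rand(2, 128))
```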
People Tracking and Activity Localization
Privacy Preserving Dynamic Room Layout Mapping
We present a novel and efficient room layout mapping strategy that does not reveal people’s identities. The system uses only a
Kinect depth sensor instead of RGB cameras or a high-resolution depth sensor, so users’ facial details are neither
captured nor recognizable. The system recognizes and localizes 3D objects in an indoor environment, including
furniture and equipment, and generates a 2D map of the room layout. We evaluated this system in two challenging real-world
application scenarios: a laboratory room with four people present and a trauma room with up to 10 people during actual trauma
resuscitations.
Paper.
3D Activity Tracking
We present a deep learning framework for fast 3D activity localization and tracking in a dynamic and crowded environment. We
focused on recognizing activities in a real setting rather than in static images of activities staged in a controlled environment.
Our training approach reverses the traditional activity localization method, which first estimates the activity’s possible
location and then predicts its occurrence. Instead, we first trained a deep convolutional neural network for activity
recognition using depth video and RFID data as input, and then used the network’s activation maps to locate the recognized
activity in 3D space. We evaluated the system with a medical activity dataset, achieving accurate activity localization with
decimeter resolution.
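A toy sketch of the activation-map idea (sizes and the exact map construction are assumptions): the recognition network's last convolutional features, weighted by the classifier weights for the predicted class, give a map whose peak marks where the recognized activity appears.

```python
# Sketch: class-activation-map style localization from a trained recognition network.
import torch
import torch.nn as nn

conv = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
                     nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
classifier = nn.Linear(64, 7)                     # 7 activity classes (assumed)

depth_frame = torch.rand(1, 1, 120, 160)          # depth input (toy size)
fmap = conv(depth_frame)                          # (1, 64, 120, 160)
logits = classifier(fmap.mean(dim=(2, 3)))        # global average pooling
cls = logits.argmax(dim=1)                        # recognized activity

cam = (classifier.weight[cls].view(1, 64, 1, 1) * fmap).sum(dim=1)   # activation map
y, x = divmod(cam.flatten(1).argmax(dim=1).item(), cam.shape[-1])    # peak = activity location
```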
Online People Tracking and Identification with RFID and Kinect
We introduce a novel, accurate, and practical system for real-time people tracking and identification. We use a Kinect V2
sensor for tracking, which generates a body skeleton for up to six people in its view. We perform identification using both
Kinect and passive RFID by first measuring the velocity vectors of each person's skeleton and of their RFID tag, using the positions
of the RFID reader antennas as reference points and then finding the best match between skeletons and tags. We introduce a
method for synchronizing Kinect data, which is captured regularly, with irregular or missing RFID data readouts. Our
experiments show centimeter-level people tracking resolution with 80% average identification accuracy for up to six people in
indoor environments, which meets the needs of many applications. Our system preserves user privacy and works under different
lighting conditions.
Preprint Paper.
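A simplified sketch of the skeleton-to-tag matching step (the cosine-distance criterion and the Hungarian solver are assumptions, not the deployed method): velocity vectors are computed for each tracked skeleton and each RFID tag, and skeletons are assigned to tags by the best overall match.

```python
# Sketch: match Kinect skeletons to RFID tags by comparing their velocity vectors.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_skeletons_to_tags(skeleton_vel, tag_vel):
    # skeleton_vel: (S, 2) planar velocity of each Kinect skeleton
    # tag_vel:      (T, 2) velocity of each RFID tag, estimated relative to
    #               the reader antenna positions
    def unit(v):
        return v / (np.linalg.norm(v, axis=1, keepdims=True) + 1e-9)
    cost = 1.0 - unit(skeleton_vel) @ unit(tag_vel).T    # cosine distance
    rows, cols = linear_sum_assignment(cost)             # best global matching
    return dict(zip(rows, cols))                         # skeleton index -> tag index

print(match_skeletons_to_tags(np.array([[0.4, 0.1], [-0.2, 0.3]]),
                              np.array([[-0.1, 0.35], [0.5, 0.05]])))
```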