Research


Description of Our Research
Welcome to our channel! Here is a brief introduction to our research:

Activity Recognition with Sensor Networks



Region-based Activity Recognition Using Conditional GAN

We present a method for activity recognition that first estimates the activity performer's location and then uses this location together with the input data for recognition. Existing approaches directly take video frames or entire videos for feature extraction and recognition, and treat the classifier as a black box. Our method first locates the activity in each input video frame by generating an activity mask using a conditional generative adversarial network (cGAN). The generated mask is appended to the color channels of the input images and fed into a VGG-LSTM network for activity recognition. To test our system, we produced two datasets with manually created masks, one containing Olympic sports activities and the other containing trauma resuscitation activities. Our system makes an activity prediction for each video frame and achieves performance comparable to state-of-the-art systems while simultaneously outlining the location of the activity. We show how the generated masks facilitate the learning of features that are representative of the activity rather than of accidental surrounding information.
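A minimal sketch of the mask-appending idea, assuming a PyTorch implementation; the backbone modification, tensor shapes, and class count below are illustrative, not the published configuration:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16

class MaskedFrameRecognizer(nn.Module):
    """Appends a 1-channel activity mask to each RGB frame, then runs a
    VGG feature extractor per frame and an LSTM over the frame sequence."""
    def __init__(self, num_classes=10, hidden=256):
        super().__init__()
        backbone = vgg16(weights=None)  # untrained VGG-16 for illustration
        # Accept 4 input channels (RGB + generated mask) instead of 3.
        backbone.features[0] = nn.Conv2d(4, 64, kernel_size=3, padding=1)
        self.cnn = nn.Sequential(backbone.features, nn.AdaptiveAvgPool2d(1))
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, frames, masks):
        # frames: (B, T, 3, H, W); masks: (B, T, 1, H, W) from the cGAN generator
        x = torch.cat([frames, masks], dim=2)            # (B, T, 4, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).flatten(1)     # (B*T, 512)
        seq, _ = self.lstm(feats.view(b, t, -1))         # (B, T, hidden)
        return self.head(seq)                            # per-frame logits (B, T, C)

# Example: 2 clips of 8 frames at 112x112.
model = MaskedFrameRecognizer()
logits = model(torch.rand(2, 8, 3, 112, 112), torch.rand(2, 8, 1, 112, 112))
print(logits.shape)  # torch.Size([2, 8, 10])
```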

Activity Recognition for Medical Teamwork Based on Passive RFID

We describe a novel and practical activity recognition system for dynamic and complex medical settings using only passive RFID technology. Our activity recognition approach is based on the use of objects that are specific to a given activity. The object-use status is detected from RFID data, and the activities are predicted from the use statuses of different objects. We tagged 10 objects in a trauma room of an emergency department and recorded RFID data for 10 actual trauma resuscitations. More than 20,000 seconds of data were collected and used for analysis. The system achieved 96% overall accuracy with a 0.74 F-score for detecting the use of 10 common resuscitation objects, and 95% accuracy with a 0.30 F-score for recognition of 10 medical activities. Paper.
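A simplified sketch of the two-stage idea (object-use status first, activity second), assuming per-tag RSSI feature windows and scikit-learn classifiers; the features and models shown are illustrative, not the published pipeline:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

N_OBJECTS = 10  # tagged resuscitation objects

def rssi_window_features(window):
    """window: (seconds, N_OBJECTS) RSSI readings; simple per-object statistics."""
    return np.concatenate([window.mean(axis=0), window.std(axis=0),
                           np.ptp(window, axis=0)])

# Stage 1: one binary "in use / not in use" classifier per tagged object.
use_models = [RandomForestClassifier(n_estimators=50) for _ in range(N_OBJECTS)]
# Stage 2: activity classifier that sees only the object-use statuses.
activity_model = RandomForestClassifier(n_estimators=100)

def train(windows, use_labels, activity_labels):
    """use_labels: (n_windows, N_OBJECTS) binary; activity_labels: (n_windows,)."""
    X = np.array([rssi_window_features(w) for w in windows])
    for i, m in enumerate(use_models):
        m.fit(X, use_labels[:, i])
    use_status = np.column_stack([m.predict(X) for m in use_models])
    activity_model.fit(use_status, activity_labels)

def predict_activity(window):
    x = rssi_window_features(window).reshape(1, -1)
    use_status = np.array([[m.predict(x)[0] for m in use_models]])
    return activity_model.predict(use_status)[0]
```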

Deep Neural Network for RFID-Based Activity Recognition

We propose a Deep Neural Network (DNN) structure for RFID-based activity recognition. RFID data collected from several reader antennas with overlapping coverage have potential spatiotemporal relationships that can be used for object tracking. We augmented the standard fully-connected DNN structure with additional pooling layers to extract the most representative features. For model training and testing, we used RFID data from 12 tagged objects collected during 25 actual trauma resuscitations. Our results showed 76% recognition micro-accuracy for 7 resuscitation activities and 85% average micro-accuracy for 5 resuscitation phases, which is similar to existing systems that, however, require the user to wear an RFID antenna. Paper.
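One possible reading of "fully-connected DNN with added pooling layers" in code; the antenna count, layer sizes, and placement of the pooling across antennas are assumptions, not the published architecture:

```python
import torch
import torch.nn as nn

class RFIDPoolingDNN(nn.Module):
    """Fully-connected layers on per-antenna RFID features, with max-pooling
    across antennas with overlapping coverage to keep the strongest responses."""
    def __init__(self, n_antennas=6, n_tags=12, n_classes=7):
        super().__init__()
        self.per_antenna = nn.Sequential(      # weights shared across antennas
            nn.Linear(n_tags, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU())
        self.pool = nn.AdaptiveMaxPool1d(1)    # pool over the antenna dimension
        self.classifier = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, n_classes))

    def forward(self, x):
        # x: (batch, n_antennas, n_tags) RSSI readings for one time step
        h = self.per_antenna(x)                          # (batch, n_antennas, 64)
        h = self.pool(h.transpose(1, 2)).squeeze(-1)     # (batch, 64)
        return self.classifier(h)

model = RFIDPoolingDNN()
print(model(torch.rand(4, 6, 12)).shape)   # torch.Size([4, 7])
```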

Deep Learning for RFID-Based Activity Recognition

We present a system for activity recognition from passive RFID data using a deep convolutional neural network. We feed the RFID data directly into a deep convolutional neural network for activity recognition, instead of selecting features and using a cascade structure that first detects object use from RFID data and then predicts the activity. Because our system treats activity recognition as a multi-class classification problem, it is scalable to applications with a large number of activity classes. We tested our system using RFID data collected in a trauma room, including 14 hours of RFID data from 16 actual trauma resuscitations. Our system outperformed existing systems developed for activity recognition and achieved process-phase detection performance similar to systems that require wearable sensors or manually generated input. We also analyzed the strengths and limitations of our current deep learning architecture for activity recognition from RFID data. Paper.
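A sketch of feeding a window of raw RFID readings directly into a convolutional network; the tensor layout, window length, and filter sizes here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RFIDConvNet(nn.Module):
    """Treats a time window of RSSI values (tags x time) as a 1-channel image
    and classifies the activity directly, with no hand-picked features."""
    def __init__(self, n_tags=12, window=20, n_classes=10):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.fc = nn.Linear(32, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_tags, window) raw RSSI window
        return self.fc(self.conv(x).flatten(1))

model = RFIDConvNet()
print(model(torch.rand(8, 1, 12, 20)).shape)   # torch.Size([8, 10])
```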

Concurrent Activity Recognition with Multimodal CNN-LSTM Structure

We introduce a system that recognizes concurrent activities from real-world data captured by multiple sensors of different types. The recognition is achieved in two steps. First, we extract spatial and temporal features from the multimodal data: we feed each data type into a convolutional neural network that extracts spatial features, followed by a long short-term memory (LSTM) network that extracts temporal information from the sensory data. Second, the extracted features are fused for decision making, and concurrent activity recognition is performed by a single classifier that outputs a binary vector whose elements indicate whether the corresponding activity types are currently in progress. We tested our system on three datasets from different domains recorded using different sensors and achieved performance comparable to existing systems designed specifically for those domains. Our system is the first to address concurrent activity recognition from multisensory data using a single model, which is scalable, simple to train, and easy to deploy. Preprint Paper.
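A sketch of the single-classifier, multi-label idea: per-modality CNN + LSTM branches, fused features, and a sigmoid output vector whose elements mark which activities are in progress. The branch sizes and modality shapes are placeholders, not the published configuration:

```python
import torch
import torch.nn as nn

class ModalityBranch(nn.Module):
    """CNN for spatial features followed by an LSTM for temporal features."""
    def __init__(self, in_channels, feat=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.lstm = nn.LSTM(16, feat, batch_first=True)

    def forward(self, x):          # x: (B, T, C, H, W)
        b, t = x.shape[:2]
        f = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(f)
        return h[-1]               # (B, feat)

class ConcurrentActivityNet(nn.Module):
    def __init__(self, modal_channels=(3, 1), n_activities=12):
        super().__init__()
        self.branches = nn.ModuleList(ModalityBranch(c) for c in modal_channels)
        self.head = nn.Linear(64 * len(modal_channels), n_activities)

    def forward(self, inputs):
        fused = torch.cat([b(x) for b, x in zip(self.branches, inputs)], dim=1)
        return torch.sigmoid(self.head(fused))   # each element: activity in progress?

model = ConcurrentActivityNet()
video = torch.rand(2, 8, 3, 32, 32)   # e.g. RGB frames
depth = torch.rand(2, 8, 1, 32, 32)   # e.g. depth frames
print(model([video, depth]).shape)    # torch.Size([2, 12])
```

Because the output is a binary vector rather than a single class, such a model would be trained with a binary cross-entropy loss (e.g. torch.nn.BCELoss) over all activity slots at once.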


Process Progress Estimation and Phase Detection



Process Progress Detection

Process modeling and understanding are fundamental for advanced human-computer interfaces and automation systems. Recent research has focused on activity recognition, but little work has addressed process progress detection from sensor data. We introduce a real-time, sensor-based system for modeling, recognizing, and estimating the completeness of a process. We implemented a multimodal CNN-LSTM structure to extract spatio-temporal features from different sensory data types, and a novel deep regression structure for overall completeness estimation. By combining the completeness estimate with a Gaussian mixture model, our system predicts the current process phase. We also introduced the rectified hyperbolic tangent (rtanh) activation function and a conditional loss to aid training. Using the completeness estimate and performance speed calculations, we also implemented a real-time estimator of the time remaining. We tested the proposed system using data from a medical process (trauma resuscitation) and a sporting event (swimming competition). Our system outperformed previous implementations for phase prediction during trauma resuscitation and achieved over 85% process-phase detection accuracy with less than 10% error on each dataset. Preprint Paper.
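A sketch of the completeness-regression step. The rtanh form shown (clamping tanh to non-negative values so the estimate stays in [0, 1)) is one plausible reading, and the phase lookup below is a deliberately simplified stand-in for the Gaussian mixture model:

```python
import torch
import torch.nn as nn

class RTanh(nn.Module):
    """Rectified hyperbolic tangent: max(0, tanh(x)), assumed here to keep
    the completeness estimate within [0, 1)."""
    def forward(self, x):
        return torch.clamp(torch.tanh(x), min=0.0)

class CompletenessRegressor(nn.Module):
    """Regresses overall process completeness from fused CNN-LSTM features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1), RTanh())

    def forward(self, features):       # features: (B, feat_dim)
        return self.net(features)      # (B, 1) completeness in [0, 1)

def phase_from_completeness(completeness, phase_intervals):
    """Simplified stand-in for the GMM step: map completeness to the phase
    whose learned completeness interval contains it."""
    for phase, (lo, hi) in enumerate(phase_intervals):
        if lo <= completeness < hi:
            return phase
    return len(phase_intervals) - 1

reg = CompletenessRegressor()
c = reg(torch.rand(1, 128)).item()
print(phase_from_completeness(c, [(0.0, 0.3), (0.3, 0.7), (0.7, 1.0)]))
```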

Online Process Phase Detection Using Multimodal Deep Learning

We present a multimodal deep-learning structure that automatically predicts phases of the trauma resuscitation process in real time. The system first pre-processes the audio and video streams captured by a Kinect’s built-in microphone array and depth sensor. A multimodal deep learning structure then extracts video and audio features, which are later combined through a “slow fusion” model. The final decision is made from the combined features by a modified softmax classification layer. The model was trained on 20 trauma resuscitation cases (>13 hours) and tested on 5 other cases. Paper.
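A sketch of the "slow fusion" idea, where audio and video features are processed separately for several layers and merged only before classification; the feature dimensions, layer sizes, and phase count are illustrative:

```python
import torch
import torch.nn as nn

class SlowFusionPhaseNet(nn.Module):
    """Separate video and audio feature stacks merged late ("slow fusion"),
    followed by a softmax over resuscitation phases."""
    def __init__(self, video_dim=512, audio_dim=128, n_phases=6):
        super().__init__()
        self.video = nn.Sequential(nn.Linear(video_dim, 256), nn.ReLU(),
                                   nn.Linear(256, 128), nn.ReLU())
        self.audio = nn.Sequential(nn.Linear(audio_dim, 128), nn.ReLU(),
                                   nn.Linear(128, 128), nn.ReLU())
        self.fusion = nn.Sequential(nn.Linear(256, 128), nn.ReLU(),
                                    nn.Linear(128, n_phases))

    def forward(self, video_feat, audio_feat):
        fused = torch.cat([self.video(video_feat), self.audio(audio_feat)], dim=1)
        return torch.softmax(self.fusion(fused), dim=1)   # phase probabilities

model = SlowFusionPhaseNet()
print(model(torch.rand(2, 512), torch.rand(2, 128)).shape)  # torch.Size([2, 6])
```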



People Tracking and Activity Localization



Privacy Preserving Dynamic Room Layout Mapping

We present a novel and efficient room layout mapping strategy that does not reveal people’s identities. The system uses only a Kinect depth sensor, instead of RGB cameras or a high-resolution depth sensor, so users’ facial details are neither captured nor recognized. The system recognizes and localizes 3D objects in an indoor environment, including furniture and equipment, and generates a 2D map of the room layout. We evaluated the system in two challenging real-world scenarios: a laboratory room with four people present and a trauma room with up to 10 people during actual trauma resuscitations. Paper.
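A sketch of turning a single depth frame into a top-down 2D occupancy map without ever touching RGB data; the camera intrinsics, height filter, and grid size below are placeholder values:

```python
import numpy as np

def depth_to_floor_map(depth, fx=365.0, fy=365.0, cx=256.0, cy=212.0,
                       cell=0.05, grid=(200, 200)):
    """Project a depth image (meters) to 3D points, drop the height axis,
    and accumulate the points into a 2D occupancy grid of the room."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx                 # lateral offset from the optical axis
    y = (v - cy) * z / fy                 # vertical offset from the optical axis
    valid = (z > 0.5) & (z < 8.0) & (np.abs(y) < 2.0)   # plausible indoor points
    gx = np.clip((x[valid] / cell + grid[1] // 2).astype(int), 0, grid[1] - 1)
    gz = np.clip((z[valid] / cell).astype(int), 0, grid[0] - 1)
    occupancy = np.zeros(grid, dtype=np.int32)
    np.add.at(occupancy, (gz, gx), 1)
    return occupancy                      # point counts per floor cell

depth_frame = np.random.uniform(0.5, 6.0, size=(424, 512))  # Kinect-like resolution
print(depth_to_floor_map(depth_frame).shape)                # (200, 200)
```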

3D Activity Tracking

We present a deep learning framework for fast 3D activity localization and tracking in a dynamic and crowded environment. We focused on recognizing activities in a real setting, rather than static images of activities staged in a controlled environment. Our training approach reverses the traditional activity localization method, which first estimates the activity’s possible location and then predicts its occurrence. Instead, we first trained a deep convolutional neural network for activity recognition using depth video and RFID data as input, and then used the network’s activation maps to locate the recognized activity in 3D space. We evaluated the system with a medical activity dataset, achieving accurate activity localization with decimeter resolution.
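A sketch of the "recognize first, then localize from activation maps" order, using a class-activation-map style readout over a depth-video ConvNet; the backbone and the way activations would be lifted to 3D using the depth values are simplified assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthActivityNet(nn.Module):
    """Conv backbone + global average pooling; the last conv activations,
    weighted by the classifier, give a spatial map of the predicted activity."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.fc = nn.Linear(32, n_classes)

    def forward(self, depth):                  # depth: (B, 1, H, W)
        feat = self.backbone(depth)            # (B, 32, H/2, W/2)
        logits = self.fc(feat.mean(dim=(2, 3)))
        return logits, feat

def activity_activation_map(model, depth):
    """Class activation map for the top-scoring activity, upsampled to the
    depth frame; combined with depth it points to the activity's 3D location."""
    logits, feat = model(depth)
    cls = logits.argmax(dim=1)
    weights = model.fc.weight[cls]                       # (B, 32)
    cam = torch.einsum('bc,bchw->bhw', weights, feat)    # (B, H/2, W/2)
    cam = F.interpolate(cam.unsqueeze(1), size=depth.shape[-2:],
                        mode='bilinear', align_corners=False)
    return cls, cam.squeeze(1)

model = DepthActivityNet()
cls, cam = activity_activation_map(model, torch.rand(1, 1, 64, 64))
print(cls.shape, cam.shape)   # torch.Size([1]) torch.Size([1, 64, 64])
```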

Online People Tracking and Identification with RFID and Kinect

We introduce a novel, accurate, and practical system for real-time people tracking and identification. We use a Kinect V2 sensor for tracking, which generates a body skeleton for up to six people in view. We perform identification using both the Kinect and passive RFID: we first measure the velocity vector of each person's skeleton and of their RFID tag, using the positions of the RFID reader antennas as reference points, and then find the best match between skeletons and tags. We introduce a method for synchronizing the regularly captured Kinect data with irregular or missing RFID readouts. Our experiments show centimeter-level people tracking resolution with 80% average identification accuracy for up to six people in indoor environments, which meets the needs of many applications. Our system preserves user privacy and works under different lighting conditions. Preprint Paper.
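A sketch of the matching step: compute a velocity vector per tracked skeleton and per RFID tag, then solve a best-match assignment. It assumes tag positions have already been estimated from the antenna readings, which simplifies the RFID side considerably; the cost function and sample rate are illustrative:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def velocity(track, dt):
    """track: (T, 2) planar positions sampled every dt seconds."""
    return (track[-1] - track[0]) / ((len(track) - 1) * dt)

def match_skeletons_to_tags(skeleton_tracks, tag_tracks, dt=1 / 30):
    """Assign each Kinect skeleton to the RFID tag whose velocity vector it
    matches best (Hungarian algorithm on velocity differences)."""
    v_skel = np.array([velocity(t, dt) for t in skeleton_tracks])
    v_tag = np.array([velocity(t, dt) for t in tag_tracks])
    cost = np.linalg.norm(v_skel[:, None, :] - v_tag[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))   # skeleton index -> tag index

# Two skeletons and two tags moving in different directions.
skel_a = np.cumsum(np.tile([0.02, 0.0], (30, 1)), axis=0)
skel_b = np.cumsum(np.tile([-0.02, 0.01], (30, 1)), axis=0)
tag_a = skel_a + np.random.normal(0, 0.01, skel_a.shape)   # noisy tag positions
tag_b = skel_b + np.random.normal(0, 0.01, skel_b.shape)
print(match_skeletons_to_tags([skel_a, skel_b], [tag_a, tag_b]))  # {0: 0, 1: 1}
```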