A Robust Approach for Action Recognition Based on Spatio-Temporal Features in RGB-D Sequences
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2016, Vol 7, Issue 5
Abstract
Recognizing human actions is an attractive research topic in computer vision because it plays an important role in applications such as human-computer interaction, intelligent surveillance, human action retrieval systems, health care, smart homes, and robotics. The availability of the low-cost Microsoft Kinect sensor, which captures real-time high-resolution RGB and depth information, has opened an opportunity to significantly increase the capabilities of many automated vision-based recognition tasks. In this paper, we propose a new framework for action recognition in RGB-D video. We extract spatio-temporal features from RGB-D data that capture visual, shape, and motion information. Moreover, a segmentation technique is applied to represent the temporal structure of an action. First, we use STIP to detect interest points in both the RGB and depth channels. Second, we apply the HOG3D descriptor to the RGB channel and the 3DS-HONV descriptor to the depth channel. In addition, we extract HOF2.5D from the fused RGB and depth data to capture human motion. Third, we divide the video into segments and apply GMM to create feature vectors for each segment, so that each segment is represented by three feature vectors (HOG3D, 3DS-HONV, and HOF2.5D). Next, max pooling is applied across segments to create a final vector for each descriptor. We then concatenate these pooled vectors into a single vector for action representation. Lastly, we use an SVM for the classification step. We evaluated the proposed method on three benchmark datasets to demonstrate its generalizability. The experimental results show that it recognizes actions more accurately than previous work: we obtain overall accuracies of 93.5%, 99.16%, and 89.38% on the UTKinect-Action, 3D Action Pairs, and MSR-Daily Activity 3D datasets, respectively. These results show that our method is feasible and outperforms state-of-the-art methods on these datasets.
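The pooling and concatenation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the equal-length segment split, the descriptor order, and the toy numbers are all assumptions made for illustration. It takes the per-segment feature vectors as given (STIP detection, descriptor extraction, and GMM encoding are omitted) and stops before the SVM step.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def split_into_segments(n_frames: int, n_segments: int) -> List[range]:
    """Split frame indices into contiguous, near-equal segments.

    Assumption: the abstract does not say how segments are chosen,
    so an even split is used here purely for illustration.
    """
    if n_segments <= 0 or n_frames < n_segments:
        raise ValueError("need 1 <= n_segments <= n_frames")
    bounds = [i * n_frames // n_segments for i in range(n_segments + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(n_segments)]


def max_pool(segment_vectors: Sequence[Vector]) -> Vector:
    """Element-wise maximum over the per-segment vectors of one descriptor."""
    if not segment_vectors:
        raise ValueError("at least one segment vector is required")
    length = len(segment_vectors[0])
    if any(len(v) != length for v in segment_vectors):
        raise ValueError("all segment vectors must have the same length")
    return [max(column) for column in zip(*segment_vectors)]


def build_action_vector(
    per_descriptor: Dict[str, Sequence[Vector]],
    order: Sequence[str],
) -> Vector:
    """Max-pool each descriptor over segments, then concatenate in a fixed order."""
    final: Vector = []
    for name in order:
        final.extend(max_pool(per_descriptor[name]))
    return final


# Toy per-segment vectors (hypothetical values, two segments per descriptor).
toy_features = {
    "HOG3D": [[0.1, 0.7], [0.4, 0.2]],
    "3DS-HONV": [[0.3], [0.9]],
    "HOF2.5D": [[0.5, 0.5, 0.0], [0.2, 0.6, 0.1]],
}
descriptor_order = ["HOG3D", "3DS-HONV", "HOF2.5D"]

print(split_into_segments(10, 3))
print(build_action_vector(toy_features, descriptor_order))
# prints [0.4, 0.7, 0.9, 0.5, 0.6, 0.1]
```

The resulting vector has a fixed length regardless of how many segments the video was split into, which is what allows a single classifier such as an SVM to be trained on videos of different durations.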
Authors and Affiliations
Ly Ngoc, Vo Viet, Tran Son, Pham Hoang