Monitoring Human Activity
A project of the
Artificial Intelligence, Robotics and Vision Laboratory
University of Minnesota, Department of Computer Science and Engineering
Monitoring Crowded Scenes
Monitoring crowded urban environments is a vital goal for many of today's vision systems. Knowing the size of crowds and tracking their motion has many applications. For example, at traffic intersections, intelligent walk-signal systems could be designed based on the number of people waiting to cross. Likewise, knowing how many people walk through a crowded area, e.g., outside a school or the premises of a public event, can be helpful for urban planning, general safety, and crowd control. We accurately estimate the number of people in a scene without restricting ourselves to isolated individuals; dense groups of people moving together are handled as well. The system runs in real time and places no constraints on camera placement or on the number of people in a group.
P. Kilambi, O. Masoud, N. Papanikolopoulos, "Crowd Analysis at Mass Transit Sites", IEEE International Conference on Intelligent Transportation
Systems, pp. 753-758, Seattle, WA, Sep. 2006.
View-Dependent Human Motion Recognition
In this project, we attempt to classify human motion into one of several classes. The methods we developed use motion features directly rather than trying to reconstruct 2D or 3D models of the human body. We then use Principal Component Analysis for training and classification. In the experiments, we use a data set consisting of 232 video sequences (29 people, each performing 8 different actions).
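As a rough illustration of this kind of pipeline, the sketch below projects motion-feature vectors onto their principal components and assigns a test sample to the class with the nearest projected class mean. The feature dimensionality, the synthetic data, and the nearest-class-mean rule are illustrative assumptions, not the lab's actual implementation:

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on training feature vectors (rows of X) via SVD."""
    mean = X.mean(axis=0)
    # SVD of the centered data yields the principal directions.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def pca_project(X, mean, components):
    """Project feature vectors into the PCA subspace."""
    return (X - mean) @ components.T

def classify_nearest_class_mean(z, class_means):
    """Assign a projected feature to the class with the closest mean."""
    return min(class_means, key=lambda c: np.linalg.norm(z - class_means[c]))

# Toy example: two motion classes with synthetic 10-D "motion features".
rng = np.random.default_rng(0)
walk = rng.normal(0.0, 1.0, (20, 10))   # hypothetical "walk" features
run = rng.normal(5.0, 1.0, (20, 10))    # hypothetical "run" features
X = np.vstack([walk, run])

mean, comps = pca_fit(X, n_components=3)
class_means = {
    "walk": pca_project(walk, mean, comps).mean(axis=0),
    "run": pca_project(run, mean, comps).mean(axis=0),
}
probe = pca_project(rng.normal(5.0, 1.0, (1, 10)), mean, comps)[0]
print(classify_nearest_class_mean(probe, class_means))
```

In practice the features would be motion descriptors extracted from video rather than Gaussian noise, but the train-project-classify structure is the same.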
Online Motion Recognition
In this project, we use a motion recognition strategy that represents a video clip as a set
of filtered images, each of which encodes a short period of motion history. Given a set of
video clips whose motion types are known, a filtered-image classifier is built using support vector machines.
In offline classification, the label of a test video clip is obtained by applying majority
voting over its filtered images. In online classification, the most probable type of action at
each instant is determined by applying majority voting over the most recent filtered
images, which fall within a sliding window. The effectiveness of this strategy was demonstrated on real
datasets in which the video clips were recorded with a fixed camera whose optical axis is
perpendicular to the person's trajectory.
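The online voting step can be sketched directly: given per-filtered-image labels produced by the classifier (the predictions below are hypothetical stand-ins for SVM outputs), the most probable action at each instant is the majority label within a sliding window of the most recent images:

```python
from collections import Counter, deque

def online_majority_vote(frame_labels, window_size):
    """Yield the most probable action at each instant by majority vote
    over the most recent labels in a sliding window."""
    window = deque(maxlen=window_size)  # keeps only the newest labels
    for label in frame_labels:
        window.append(label)
        yield Counter(window).most_common(1)[0][0]

# Hypothetical per-filtered-image predictions from the classifier.
preds = ["walk", "walk", "run", "walk", "run", "run", "run", "walk", "run"]
print(list(online_majority_vote(preds, window_size=3)))
```

Note that the online label lags behind abrupt action changes by roughly half the window size; the window length trades responsiveness against robustness to single misclassified images.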
View-Independent Human Motion Recognition
Here we study the use of image-based rendering to generate optimal inputs to an entire class of view-dependent human motion recognition systems. Orthogonal input views can be created automatically using image-based rendering, which constructs the proper view from a combination of non-orthogonal views taken from several cameras. This allows such systems to robustly recognize motion captured from any angle, making their real-world application more viable.
R. Bodor, B. Jackson, O. Masoud, N.P. Papanikolopoulos, "Image-Based Reconstruction for View-Independent Human Motion Recognition", IEEE International Conference on Intelligent Robots and Systems IROS2003, pp. 1548-1553, Las Vegas, Oct. 2003.
R. Bodor, B. Jackson, O. Masoud, N.P. Papanikolopoulos, "Image-Based Reconstruction for View-Independent Human Motion Recognition," Technical Report, Artificial Intelligence, Robotics and Vision Laboratory, Dept. of Computer Science and Engineering, University of Minnesota, Mar. 2003.
Monitoring Bus Stops
G. Gasser, N. Bird, O. Masoud, N.P. Papanikolopoulos, "Human Activities Monitoring at Bus Stops", IEEE International Conference on Robotics and Automation ICRA2004, Apr. 2004.
General Activity Recognition
The protection of critical transportation assets and infrastructure is an increasingly important concern. Transportation assets such as bridges, overpasses, dams, and tunnels are vulnerable to attacks. In addition, facilities such as chemical storage sites, office complexes, and laboratories can become targets. Many of these facilities lie in areas of high pedestrian traffic, which makes them accessible to attack while making their monitoring difficult. In this research, we developed components of an automated "smart video" system to track pedestrians and detect situations where people may be in peril, as well as suspicious motion or activities at or near critical transportation assets. The software tracks individual pedestrians as they pass through the camera's field of view and uses vision algorithms to classify the motion and activities of each pedestrian. Tracking is accomplished by estimating a position and velocity path characteristic for each pedestrian using a Kalman filter. With this information, the system can bring an incident to the attention of human security personnel. In future applications, this system could alert authorities if a pedestrian displays suspicious behavior such as entering a "secure area," running or moving erratically, loitering or moving against traffic, or dropping a bag or other item.
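The position-and-velocity path characteristic above can be sketched as a constant-velocity Kalman filter over 2-D image positions. The state layout, noise levels, and frame interval below are illustrative assumptions, not the parameters of the actual system:

```python
import numpy as np

def kalman_track(measurements, dt=1.0, q=1e-2, r=1.0):
    """Constant-velocity Kalman filter over 2-D position measurements.
    State is [x, y, vx, vy]; returns one filtered state per measurement.
    q and r are assumed process/measurement noise levels."""
    F = np.array([[1, 0, dt, 0],     # state transition: x += vx*dt, etc.
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    H = np.array([[1, 0, 0, 0],      # we observe position only
                  [0, 1, 0, 0]], dtype=float)
    Q = q * np.eye(4)                # process noise covariance
    R = r * np.eye(2)                # measurement noise covariance
    x = np.array([measurements[0][0], measurements[0][1], 0.0, 0.0])
    P = np.eye(4)                    # initial state uncertainty
    states = []
    for z in measurements:
        # Predict step: propagate state and covariance forward one frame.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct with the new position measurement.
        y = np.asarray(z, dtype=float) - H @ x
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(4) - K @ H) @ P
        states.append(x.copy())
    return np.array(states)

# A pedestrian moving diagonally at 1 px/frame per axis (noise-free here).
track = kalman_track([(t, t) for t in range(20)])
```

The filtered velocity components converge toward the pedestrian's true velocity, which is what allows classifications such as "running" or "moving against traffic" to be made from the track.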