This paper presents a new framework for multi-subject event inference in surveillance video, where measurements produced by low-level vision analytics are usually noisy, incomplete or incorrect. Our goal is to infer the composite events undertaken by each subject from noisy observations. To achieve this, we consider the temporal characteristics of event relations and propose a method to correctly associate the detected events with individual subjects. The Dempster-Shafer (DS) theory of belief functions is used to infer events of interest from the results of our vision analytics and to measure conflicts occurring during event association. Our system is evaluated on a number of videos, at different levels of complexity, that show passenger behaviours on a public transport platform, namely buses. The experimental results demonstrate that, by reasoning with spatio-temporal correlations, the proposed method achieves satisfactory performance when associating atomic events and recognising composite events involving multiple subjects in dynamic environments.



-          System outline

The main purpose of video surveillance is to provide situational awareness of a specific place over a period of time. In this context, therefore, an event is an observation (or collection of observations) that has semantic meaning. An event can be simple or complex depending on the level of relevant semantic information provided. To distinguish these two different concepts, we call the former an atomic event and the latter a composite event. An atomic event can be directly detected using video analytics and/or sensors. Atomic events can then be aggregated to generate composite events which are more semantically meaningful.
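As a rough illustration of how atomic events might be aggregated into a composite event, the sketch below checks whether a temporally ordered pattern of atomic event names occurs among one subject's detections. The class `AtomicEvent`, the function `matches_composite`, and all event labels are hypothetical names chosen for this example; the paper's own event network models are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicEvent:
    name: str      # hypothetical label, e.g. "enter_door"
    subject: int   # hypothetical subject identifier
    start: float   # start time in seconds
    end: float     # end time in seconds


def matches_composite(events, pattern):
    """Return True if the names in `pattern` occur, in that order,
    among `events` once the events are sorted by start time."""
    ordered = iter(sorted(events, key=lambda e: e.start))
    # Each `any` consumes the shared iterator up to the first match,
    # so later names can only match later events (subsequence test).
    return all(any(e.name == name for e in ordered) for name in pattern)


# Illustrative detections for a single subject (assumed values).
detections = [
    AtomicEvent("move_door_to_gangway", 1, 3.0, 6.0),
    AtomicEvent("enter_door", 1, 0.0, 2.0),
    AtomicEvent("sit_down", 1, 7.0, 8.0),
]
boarding = ["enter_door", "move_door_to_gangway", "sit_down"]

is_boarding = matches_composite(detections, boarding)
is_reversed = matches_composite(detections, ["sit_down", "enter_door"])
```

Here `is_boarding` is `True` and `is_reversed` is `False`, because the second pattern violates the temporal order of the detections. This sketch ignores uncertainty entirely; in the framework described here, each atomic event would also carry a degree of belief.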


Our system is composed of two main stages, shown in Figure 1, and integrates computer vision techniques with knowledge representation and reasoning mechanisms. In the first stage, human subjects are detected and video analytics are then generated to provide low-level semantic components such as "a female face has been detected" and "a person has moved from the door towards the gangway". The second stage is designed to recognise significant events based on a semantic hierarchy obtained from domain knowledge.

Figure 1. System of intelligent event management for video surveillance


-          Event inference

This work focuses on event inference processing at the upper level of the system. At this level, the events of interest are recognised, with varying degrees of belief, based on the information derived at the lower level.

Knowledge is the main driver behind the proposed event inference approach. Our knowledge base contains frameworks for representing uncertain events, spatio-temporal relations and event network models, which facilitate atomic event detection, event association and composite event recognition (Figure 2). Event inference starts by deriving atomic events from the outputs of the computer vision analysis modules. Once atomic events are detected, event association assigns each atomic event to the correct subject. Composite event recognition is then performed on the detected atomic events associated with a single subject. The final outputs of the process are the subjects together with the composite events they have undertaken.

Figure 2. Event inference components
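To make the role of DS theory more concrete, the following is a minimal sketch of Dempster's rule of combination, which fuses two mass functions and yields a conflict value as a by-product. It is the textbook rule, not necessarily the exact formulation used in the paper; the function name `combine`, the frame {"female", "male"} and all mass values are assumptions made for illustration.

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule of combination.

    Each mass function maps a frozenset of hypotheses (a focal element)
    to its mass. Returns the normalised combined mass function and the
    conflict K, i.e. the total mass assigned to empty intersections.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b
    norm = 1.0 - conflict
    if norm <= 0.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    return {focal: mass / norm for focal, mass in combined.items()}, conflict


# Hypothetical frame of discernment and evidence from two sources.
FRAME = frozenset({"female", "male"})
face_classifier = {frozenset({"female"}): 0.7, FRAME: 0.3}
second_source = {frozenset({"male"}): 0.4, FRAME: 0.6}

fused, k = combine(face_classifier, second_source)
```

With these assumed inputs the conflict `k` is approximately 0.28, and the fused masses are approximately 0.583 for {"female"}, 0.167 for {"male"} and 0.25 for the whole frame. A high conflict value between two pieces of evidence can be read as a hint that they do not belong to the same subject, which is the intuition behind using conflict during event association.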



-          Datasets

Using the environmental settings in Figure 3, we captured eight sequences of varying complexity, covering different numbers of passengers on board, various passenger behaviour patterns, and scene captures ranging from simple to difficult. The properties of the eight sequences are summarised in Table 1.

Figure 3. Experimental environment: (a) route with six designated stops (the red curve highlights the route; the black circles mark the six bus stops); (b) bus saloon; (c) seat layout (numbered seats are used in the experiments)
Table 1. Properties of the eight bus sequences
Table 2. Association results for the evidential reasoning system
Table 3. Recognition results for rule-based approach, Bayesian approach, and our evidential reasoning approach



Evidential Event Inference in Transport Video Surveillance

X. Hong, Y. Huang, W. Ma, S. Varadarajan, P. Miller, W. Liu, M. J. Santofimia, J. Martinez del Rincon, H. Zhou

Computer Vision and Image Understanding, 2016.


title = "Evidential event inference in transport video surveillance",
author = "X. Hong and Y. Huang and W. Ma and S. Varadarajan and P. Miller and W. Liu and M. J. Santofimia and J. Martinez del Rincon and H. Zhou",
year = "2016",
volume = "144",
pages = "276--297",
journal = "Computer Vision and Image Understanding",