Michael Fleischman, Philip DeCamp, Deb Roy
Oct. 26, 2006
Scalable approaches to video event recognition are limited by an inability to automatically generate representations of events that encode abstract temporal structure. This paper presents a method in which temporal information is captured by representing events using a lexicon of hierarchical patterns of movement that are mined from large corpora of unannotated video data. These patterns are then used as features for a discriminative model of event recognition that exploits tree kernels in a Support Vector Machine. Evaluations show the method learns informative patterns on a 1450-hour video corpus of natural human activities recorded in the home.
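The paper's discriminative model applies tree kernels in an SVM, so similarity between two hierarchical movement patterns is measured by counting their shared subtree fragments. As a hedged sketch (not the authors' implementation), the following minimal convolution tree kernel in the style of Collins and Duffy illustrates the idea; the tuple-based tree encoding and function names are assumptions for the example.

```python
# Minimal subtree (convolution) kernel sketch: counts subtree fragments
# shared by two trees. Trees are nested tuples: (label, child1, child2, ...).
# This is an illustrative stand-in, not the paper's actual feature code.

def _delta(n1, n2):
    # Number of common subtree fragments rooted at this pair of nodes.
    if n1[0] != n2[0]:
        return 0
    kids1, kids2 = n1[1:], n2[1:]
    if not kids1 and not kids2:
        return 1                      # matching leaves
    if [c[0] for c in kids1] != [c[0] for c in kids2]:
        return 0                      # productions differ
    prod = 1
    for c1, c2 in zip(kids1, kids2):
        prod *= 1 + _delta(c1, c2)
    return prod

def _nodes(t):
    # Yield every node in the tree (preorder).
    yield t
    for child in t[1:]:
        yield from _nodes(child)

def tree_kernel(t1, t2):
    # Sum the delta function over all node pairs.
    return sum(_delta(a, b) for a in _nodes(t1) for b in _nodes(t2))
```

Such a kernel can be plugged into an SVM as a precomputed Gram matrix, letting the classifier compare events by their abstract temporal structure rather than by flat feature vectors.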