DeCamp, P. "HeadLock: Wide-Range Head Pose Estimation for Low Resolution Video"
Feb. 1, 2008
This thesis focuses on data mining technologies to extract head pose information from low resolution video recordings. Head pose, as an approximation of gaze direction, is a key indicator of human behavior and interaction. Extracting head pose information from video recordings is a labor-intensive endeavor that severely limits the feasibility of using large video corpora for tasks that require analysis of human behavior. HeadLock is a novel head pose annotation and tracking tool. Pose annotation is formulated as a semiautomatic process in which a human annotator is aided by computationally generated head pose estimates, significantly reducing the human effort required to accurately annotate video recordings.
HeadLock has been designed to perform head pose tracking on video from overhead, wide-angle cameras. The head pose estimation system used by HeadLock can perform pose estimation to arbitrary precision on images that reveal only the top or back of a head. This system takes a 3D model-based approach in which heads are modeled as 3D surfaces covered with localized features. The set of features used can be reliably extracted from both hair and skin regions at any resolution, providing better performance for images that may contain small facial regions and no discernible facial features.
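The model-based idea in the paragraph above can be illustrated with a toy sketch. This is not HeadLock's algorithm; it only shows the general pattern of hypothesizing a pose, projecting model features into the image, and keeping the pose that best matches the observations. It assumes an orthographic camera looking straight down, a yaw-only pose, known correspondences between model and image features, and a brute-force search. All function names and the sample points are hypothetical.

```python
import math

def rotate_yaw(point, yaw):
    """Rotate a 3D point about the vertical (z) axis by yaw radians."""
    x, y, z = point
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y, s * x + c * y, z)

def project_overhead(point):
    """Assumed orthographic camera looking straight down: drop the z coordinate."""
    x, y, _z = point
    return (x, y)

def reprojection_error(model_points, observed, yaw):
    """Sum of squared distances between projected model features and observations."""
    total = 0.0
    for p, (u, v) in zip(model_points, observed):
        px, py = project_overhead(rotate_yaw(p, yaw))
        total += (px - u) ** 2 + (py - v) ** 2
    return total

def estimate_yaw(model_points, observed, steps=360):
    """Brute-force search over evenly spaced yaw hypotheses in [0, 2*pi)."""
    candidates = [2 * math.pi * k / steps for k in range(steps)]
    return min(candidates, key=lambda yaw: reprojection_error(model_points, observed, yaw))

# Hypothetical feature locations on a head surface (arbitrary units).
MODEL_POINTS = [(1.0, 0.0, 0.2), (0.0, 0.5, 0.8), (-0.3, -0.4, 0.85)]

if __name__ == "__main__":
    true_yaw = math.radians(40)
    observed = [project_overhead(rotate_yaw(p, true_yaw)) for p in MODEL_POINTS]
    print(math.degrees(estimate_yaw(MODEL_POINTS, observed)))
```

In this sketch the precision of the estimate is bounded by the `steps` parameter; a real system would also need to extract features from the image and handle features hidden from the camera, both of which are omitted here.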
HeadLock is evaluated on video recorded for the Human Speechome Project (HSP), a research initiative to study human language development by analyzing longitudinal audio-video recordings of a developing child. Results indicate that HeadLock may enable annotation of head pose at ten times the speed of a manual approach. In addition to head tracking, this thesis describes the data collection and data management systems that have been developed for HSP, providing a comprehensive example of how very large corpora of video recordings may be used to research human development, health and behavior.