HERMES concentrates on how to extract descriptions of human behaviour from videos in a restricted discourse domain, transform these descriptions into written text, and synthesise a dynamic scene from a textual description. Example discourse domains are pedestrians crossing inner-city roads and pedestrians approaching or waiting at bus or tram stops. These domains allow a coherent evaluation of human movements and facial expressions to be explored across a wide variation of scale. This general approach lends itself to various cognitive surveillance scenarios at varying degrees of resolution, from wide-field-of-view multiple-agent scenes through to more specific inferences of emotional state that could be elicited from high-resolution imagery of faces. HERMES aims to consider how cooperating pan-tilt-zoom sensors can enhance the process of cognition via controlled responses to uncertain or ambiguous interpretations. The system will be exposed to video recordings from different parts of Europe in order to prevent over-adaptation to local habits and, in addition, to learn systematically occurring differences between pedestrian habits in different countries. The system's explanatory and argumentative capabilities are expected to ease the assessment of its strengths and weaknesses.

Within the HERMES project, CVMT will primarily work on extracting body information from sequences, for example arm gestures, head pose, and body pose.

HERMES is a six-partner EU/IST/STREP project (IST-027110).