Ksusha Buzko
March 21st, 2025 – 11:30am-12:00pm, EC4-2101A
This work explores action recognition in two contrasting camera settings: fast-paced sports broadcasts (far-field) and presenter-focused videos (near-field). It shows how pose-based methods address the challenges of scale, occlusion, and subtle body motion. First, HAIKYU: Hockey Action Identification and Keypose Understanding introduces a pipeline for accurately classifying player actions in far-field broadcast hockey footage. A bounding-box normalization technique mitigates the effects of camera pan and zoom, while an expanded 15-keypoint configuration that includes hockey-stick endpoints improves recognition of subtle stick movements. Next, Generative Video Editing: From Unconfident to Confident turns to near-field scenarios, detecting unconfident micro-gestures and transforming them into confident body language. A new dataset of 38 micro-gestures, such as folding arms and crossing fingers, and the CONFIDANT generative model demonstrate how small-scale cues like fidgeting hands can be systematically identified and edited to project confidence. Together, the two projects advance robust pose-based strategies for human action recognition across camera distances, with applications in sports analytics and generative video editing.
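
The abstract does not spell out the normalization itself; as a rough sketch, one standard way to factor out pan and zoom is to express each keypoint relative to the player's detected bounding box and divide by the box size (the function name, bbox format, and use of NumPy below are illustrative assumptions, not the paper's implementation):

    import numpy as np

    def normalize_keypoints(keypoints: np.ndarray, bbox: tuple) -> np.ndarray:
        """Map absolute (x, y) keypoints into a player's bounding box.

        keypoints: array of shape (K, 2) in image coordinates.
        bbox: (x_min, y_min, x_max, y_max) of the detected player.

        Subtracting the box origin removes absolute image position
        (camera pan); dividing by the box size removes apparent player
        scale (camera zoom), leaving only the pose itself.
        """
        x_min, y_min, x_max, y_max = bbox
        origin = np.array([x_min, y_min], dtype=float)
        size = np.array([x_max - x_min, y_max - y_min], dtype=float)
        return (keypoints - origin) / size

After this step, keypoints for the same pose land in roughly the same unit square regardless of where the broadcast camera is pointing or how far it has zoomed in.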
