A Perceptual Prediction Framework for Self-Supervised Event Detection and Segmentation in Streaming Videos

Speaker: Dr. Sudeep Sarkar, Chair of Computer Science and Engineering and co-Director of the USF Institute for Artificial Intelligence (AI+X).

Events are central to the content of human experience. From the constant sensory stream, the brain segments ongoing activity, extracts and represents its salient aspects, and stores them in memory for later comparison, retrieval, and re-storage. This talk focuses on the first of these problems: event segmentation from video streams. Can we temporally segment an activity into its constituent sub-events? Can we spatially localize each event within the image frame?

These tasks have typically been tackled with supervised learning, which requires large amounts of training data and extensive manual annotation. The question we ask is: can we accomplish these tasks without manual labels? Human perception experiments suggest that we can, without any high-level supervision.

I will share our experience with a set of minimal, self-supervised, predictive learning models that draw inspiration from cognitive psychology and from recent brain models in neuroscience. This approach can be used for both temporal segmentation and spatial localization of events in video.
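To make the idea concrete, here is a minimal sketch of one common instantiation of perceptual prediction for boundary detection: a model is trained, without labels, to predict the next frame's features, and frames where the prediction error spikes are flagged as event boundaries. This is an illustrative sketch, not the talk's exact architecture; the FramePredictor module, the feature dimensions, the random stand-in features, and the two-standard-deviation threshold are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class FramePredictor(nn.Module):
    """Illustrative predictor: estimates the next frame's feature vector
    from the sequence of past frame features."""
    def __init__(self, feat_dim=128, hidden_dim=256):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, feat_dim)

    def forward(self, feats):
        # feats: (batch, time, feat_dim); output at step t predicts frame t+1
        h, _ = self.rnn(feats)
        return self.head(h)

def segment_by_prediction_error(feats, model, threshold_sigmas=2.0):
    """Flag event boundaries where self-supervised prediction error spikes.

    feats: (1, T, D) tensor of per-frame features.
    Returns the indices of frames whose prediction error exceeds a
    mean + k*std threshold (an assumed, illustrative gating rule).
    """
    with torch.no_grad():
        pred = model(feats[:, :-1])                                  # predict frames 1..T-1
        err = ((pred - feats[:, 1:]) ** 2).mean(dim=-1).squeeze(0)   # (T-1,) per-frame error
    thresh = err.mean() + threshold_sigmas * err.std()
    return (err > thresh).nonzero(as_tuple=True)[0] + 1              # shift to frame indices

# Self-supervised training: minimize next-frame prediction error.
feat_dim, T = 128, 500
feats = torch.randn(1, T, feat_dim)  # stand-in for per-frame visual features
model = FramePredictor(feat_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):
    pred = model(feats[:, :-1])
    loss = ((pred - feats[:, 1:]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(segment_by_prediction_error(feats, model))
```

In practice, the per-frame features would come from a pretrained or jointly learned visual encoder rather than random tensors, and the threshold would typically be computed from running statistics so the detector can operate on streaming video.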

We will see results on traditional activity datasets such as the Breakfast Actions, 50 Salads, and INRIA Instructional Videos datasets, as well as on ten days of continuous video footage of a bird's nest. The proposed approach outperforms weakly supervised and other unsupervised learning approaches by up to 24% and performs competitively with fully supervised methods.