Date of Completion

Spring 5-1-2020

Thesis Advisor(s)

Joseph Johnson, Zhijie Jerry Shi

Honors Major

Computer Science and Engineering


Advances in computing resources and artificial intelligence have brought computer vision applications considerable traction over the past decade. One such application is video understanding and content analysis, the main goals of the annual YouTube-8M Video Understanding Challenge. The newest challenge aims to localize events to specific video segments in addition to discerning the main topics of the video. This paper presents a broad overview of the techniques, data, and top-performing algorithms presented at the International Conference on Computer Vision (ICCV) last year. Ensemble methods and candidate generation with VLAD model representations appear to be particularly popular and high-performing. This line of research can be extended in many ways, including tighter time and budget constraints as well as content-specific tasks. Implications extend beyond user content searches to optimizing content retrieval and analysis.