Po-Yuan Mao

Cue representation learning for interpretable video models

Project Summary

Video models have broad potential to enable spatio-temporal intelligence. However, their black-box nature poses a major barrier to deployment in high-stakes real-world scenarios such as medicine and autonomous driving.

My research project focuses on developing an interpretable representation for video understanding models. By carefully designing the representation space and combining it with a vision-language model (VLM), I aim to enable the model to explain its decisions in a human-interpretable way before acting on them.

This approach can benefit both developers and end-users by increasing transparency, facilitating error analysis, and supporting regulatory compliance, contributing to safer, more trustworthy AI systems across applications.

Research Interests

Explainable AI (XAI), Video Understanding, Generative Models

Background

Full-time Research Assistant, Institute of Information Science, Academia Sinica, Taiwan

Student Research Intern, Sony Group R&D, Japan

MSc in Information Science, Kyushu University, Japan

BSc in Mechanical Engineering, National Chung Hsing University, Taiwan

Supervisors

Dr Davide Moltisanti

Dr Vinay Namboodiri