Video models have broad potential to enable spatio-temporal intelligence. However, their black-box nature poses a major barrier to deployment in high-stakes real-world domains such as medicine and autonomous driving.
My research project focuses on developing an interpretable representation for video understanding models. By carefully designing the representation space and coupling it with a vision-language model (VLM), I aim to enable the model to explain its reasoning in human-interpretable terms before committing to a decision.
Such interpretability benefits both developers and end users by increasing transparency, facilitating error analysis, and supporting regulatory compliance, thereby contributing to safer, more trustworthy AI systems across applications.
Explainable AI (XAI), Video Understanding, Generative Models
Full-time Research Assistant, Institute of Information Science, Academia Sinica, Taiwan
Student Research Intern, Sony Group R&D, Japan
MSc in Information Science, Kyushu University, Japan
BSc in Mechanical Engineering, National Chung Hsing University, Taiwan
Dr Davide Moltisanti
Dr Vinay Namboodiri