Although deep learning (DL) has been applied successfully to a wide range of domains in recent years, it still suffers from being a black-box model: its inner workings are impenetrable to users, with only the inputs and outputs being readily observable. This opacity causes numerous problems, including difficulty in identifying which input features drive predictions, in evaluating how the model will perform when introduced to new domains and elements, and in explaining any reasoning processes used by the model.
Explainable artificial intelligence (XAI) seeks to overcome transparency problems such as these in AI systems through a range of approaches. In this project, we will explore one approach specific to deep learning: analysing the weights of DL models to examine the relationship between inputs and outputs. Taking a Bayesian approach, the aim is to generate desired model outputs, including uncertainty quantification, by adjusting model inputs using joint multivariate distributions and Bayesian inference.
By analysing the distribution of model outputs as the inputs vary, we hope to map the relationship between specific input features and the components of the model, and how they affect the output. In doing this we aim to deepen the understanding of deep learning models from a Bayesian statistics perspective, and also to develop software prototypes that can be used both for analysing input variables in deep learning models and for generating simulations of potential outcomes.
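The idea of placing a joint multivariate distribution over the inputs and studying the induced distribution of outputs can be sketched in a few lines. The following is purely illustrative, not the project's actual code: the network weights, the input distribution, and all parameter values are hypothetical, and a simple Monte Carlo pass stands in for full Bayesian inference.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fixed weights of a tiny trained network (2 inputs -> 3 hidden -> 1 output).
W1 = np.array([[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]])
b1 = np.array([0.0, 0.1, -0.1])
W2 = np.array([[1.0, -0.5, 0.7]])
b2 = np.array([0.2])

def forward(x):
    """Deterministic forward pass through the fixed network."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

# Assumed joint multivariate normal over the two input features,
# with a correlation term in the covariance.
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 0.5]])

# Sample inputs from the joint distribution and push each through the model.
samples = rng.multivariate_normal(mean, cov, size=10_000)
outputs = np.array([forward(x) for x in samples]).ravel()

# Summarise the induced output distribution: a point estimate plus
# a simple uncertainty quantification via the spread of outputs.
print(f"output mean ~ {outputs.mean():.3f}, output std ~ {outputs.std():.3f}")
```

Varying the input mean or covariance and re-running the sampling step then shows how specific input features, and their correlations, shift the output distribution, which is the kind of input–output mapping the project aims to study more rigorously.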
AI Safety & Alignment.
BSc in Applied Mathematics from Cardiff University.
MSc in Applied Mathematics from University of Bath.
Four years working in actuarial science between my bachelor's and master's degrees.
Eight months working as an AI researcher after my master's, focusing on Bayesian approaches to game playing and on different methods for training reinforcement learning agents.
Dr Xi Chen