Jack McKinlay

CAVA for Value Alignment: A Neuro-Symbolic Framework from Text to Decisions

Project Summary

Value alignment, the problem of ensuring that AI systems make decisions that reflect human values, has emerged as a critical challenge for AI deployment. Many forms of text carry information about values, and policy documents in particular offer a rich source of value guidance. Large language models (LLMs) provide powerful tools for analysing such text at scale. However, values are abstract, contextual, and subjective. Given the need to audit AI decision-making for safety and trust, we must understand how AI systems interpret and apply values. Neural systems are opaque and often unpredictable, which prevents reliable auditing and behavioural guarantees. This creates a fundamental tension: we need the scalability of neural systems to process diverse policy text, but also the interpretability and predictability of symbolic systems to enable verification and guarantee outcomes.

My research addresses this tension through a neuro-symbolic approach. In my PhD I have developed a framework for modelling Contextual Argumentation for Value-based Assessment (CAVA). I have implemented this framework as CAVA Bodega, the first complete pipeline for translating policy text into justified, value-aligned decisions. CAVA Bodega integrates two novel frameworks: CAVA Reasoning, an argumentation framework for symbolic value-based reasoning across dynamic contexts and multiple stakeholders; and CAVA Press, a neural system for extracting value concepts from policy text. My research is grounded in a comprehensive survey of the value alignment literature, which provides a thematic structure for the field and identifies key sub-problems.

My contributions include multiple papers, open-source software, and public datasets. Through my PhD I demonstrate the viability of neuro-symbolic approaches to value alignment, provide practical tools for building value-aware systems, and identify multiple directions for future research in value-aligned AI.

Research Interests

AI Safety & Alignment

AI Ethics

Explainable AI

Background

BSc in Applied Mathematics from Cardiff University.

MSc in Applied Mathematics from University of Bath.

Four years working in actuarial science between my bachelor's and master's degrees.

Eight months working as an AI researcher after my master's, focusing on Bayesian approaches to game playing and on methods for training reinforcement learning agents.

Supervisors

Dr Marina De Vos

Dr Janina Hoffmann

Dr Andreas Theodorou