Most policies are written by humans for humans, but some environments are too complex or change too quickly for manual policy design to remain effective. The central aim of my research is to establish methods for algorithmic policy engineering that are constrained to act in the interests of humans while retaining the benefits of autonomous policy iteration. The key questions guiding this research are:
- How can policies be safely constructed with statistical machine learning methods?
- How can these policies be accurately validated and communicated between humans and software agents?
- To what degree should human preferences be inferred, and how can AI systems be developed to safely infer human preferences?
AI alignment.
Policy engineering.
Cooperative inverse reinforcement learning.
Iterated distillation and amplification.
AI governance.
MSci in Natural Sciences specialising in Physics, University of York
Medical Microwave Imaging Research, University of York & Sylatech
Dr Julian Padget