Djordje Bozic

Hierarchical and Intrinsically Motivated Reinforcement Learning

Project Summary

Reinforcement learning is a machine learning approach where an agent interacts with the environment at each time step by choosing an action and receiving feedback in the form of rewards and penalties. The resulting behaviour, a mapping from situations to actions, is called the policy.
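The interaction loop described above can be sketched in a few lines. The following is a minimal illustration, not part of the project: a toy corridor environment (states 0 to 4, with a reward at state 4) and tabular Q-learning, one standard way to learn a policy from such feedback. All names and parameters here are illustrative.

```python
import random

class ChainEnv:
    """Toy corridor: states 0..n-1; reaching the last state gives reward 1."""
    def __init__(self, n=5):
        self.n = n
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        # action 1 moves right, action 0 moves left, clipped to the corridor
        self.state = max(0, min(self.n - 1, self.state + (1 if action == 1 else -1)))
        done = self.state == self.n - 1
        return self.state, (1.0 if done else 0.0), done

def train(env, episodes=200, alpha=0.5, gamma=0.9, eps=0.1):
    """Tabular Q-learning; the policy emerges by acting greedily on the values."""
    q = {(s, a): 0.0 for s in range(env.n) for a in (0, 1)}
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection: mostly exploit, sometimes explore
            a = random.choice((0, 1)) if random.random() < eps \
                else max((0, 1), key=lambda a: q[(s, a)])
            s2, r, done = env.step(a)
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, 0)], q[(s2, 1)]) - q[(s, a)])
            s = s2
    return q

random.seed(0)
q = train(ChainEnv())
policy = [max((0, 1), key=lambda a: q[(s, a)]) for s in range(5)]
print(policy)  # greedy action per state; "right" (1) should dominate before the goal
```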

In hierarchical reinforcement learning the policy is formed in two steps. First, actions are grouped into sub-policies, each aimed at solving one particular aspect of the problem. Then, the final policy is composed of these sub-policies instead of primitive actions. This hierarchical structure allows agents to learn and make decisions at different levels of abstraction, improving learning efficiency on complex tasks.
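The two-level structure above can be sketched in the options style: each sub-policy pairs a rule over primitive actions with a termination condition, and the high-level policy selects among sub-policies rather than actions. This is an illustrative sketch on a toy corridor of states 0 to 4; the sub-policy names are invented for the example.

```python
def step(state, action):
    """Toy corridor dynamics: action -1 moves left, +1 moves right."""
    return max(0, min(4, state + action))

# A sub-policy pairs a primitive-action rule with a termination condition.
OPTIONS = {
    "seek_goal": (lambda s: +1, lambda s: s == 4),  # run right until the goal
    "retreat":   (lambda s: -1, lambda s: s == 0),  # run left until the start
}

def run_option(state, name):
    """Execute one sub-policy until its termination condition fires."""
    act, terminated = OPTIONS[name]
    trajectory = [state]
    while not terminated(state):
        state = step(state, act(state))
        trajectory.append(state)
    return trajectory

def hierarchical_policy(state):
    """The final policy chooses among sub-policies, not primitive actions."""
    return "seek_goal" if state < 4 else "retreat"

print(run_option(1, "seek_goal"))  # [1, 2, 3, 4]
```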

Intrinsically motivated reinforcement learning focuses on internal sources of motivation. Instead of receiving rewards and penalties from the environment, agents set their own goals, and reward themselves for achieving them. This promotes exploration and behaviour acquisition independent of external rewards.
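One common way to realise such a self-generated reward, used here purely as an illustration, is a count-based novelty bonus: the agent rewards itself more for states it has rarely visited, with no reference to any external reward signal.

```python
from collections import defaultdict
import math

class NoveltyBonus:
    """Count-based intrinsic reward: rarely visited states are more rewarding."""
    def __init__(self):
        self.counts = defaultdict(int)

    def reward(self, state):
        self.counts[state] += 1
        # Bonus decays as 1/sqrt(visits), so novelty fades with familiarity.
        return 1.0 / math.sqrt(self.counts[state])

bonus = NoveltyBonus()
print(bonus.reward("s0"))  # 1.0   (first visit: maximally novel)
print(bonus.reward("s0"))  # ~0.707 (novelty decays on repeat visits)
print(bonus.reward("s1"))  # 1.0   (an unseen state is novel again)
```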

Research Interests

While hierarchical reinforcement learning does enable solving complex tasks, acquiring useful hierarchies remains an open research question. Discovering which aspects of the problem warrant sub-policies, and grouping these sub-policies into a meaningful final behaviour, are both difficult. Most often, the number of sub-behaviours and the structure of the hierarchy are fixed beforehand rather than discovered through interaction with the environment. Moreover, we should be able to discover new behaviours not only by chaining sub-policies, but also by interpolating between them. My research focuses on understanding and solving these problems.

Intrinsically motivated reinforcement learning can leverage various behavioural and biological principles found in humans and animals to provide agents with an environment-independent reward structure. This new reward structure can, for example, guide the agent to form a meaningful policy hierarchy, or to explore more effectively. However, it also changes the original task the agent is meant to solve. To what extent this affects the agent, and how it can be exploited to benefit the agent while avoiding potential drawbacks, is one of the research questions I aim to explore.
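The point that an intrinsic reward changes the task can be made concrete with a small sketch. A common (illustrative) formulation mixes the two signals with a weight, here called beta; whenever beta is non-zero, the agent is optimising a different objective from the original one.

```python
def mixed_reward(extrinsic, intrinsic, beta=0.1):
    """The signal the agent actually optimises once an intrinsic bonus is added.

    beta is an illustrative trade-off weight, not a quantity from the text.
    """
    return extrinsic + beta * intrinsic

# With beta > 0, a purely novel state can outweigh a small external reward,
# so the learned behaviour may differ from the original task's optimum:
print(mixed_reward(0.0, 1.0, beta=0.2))  # 0.2 for a novel but unrewarded state
print(mixed_reward(0.1, 0.0, beta=0.2))  # 0.1 for a small external reward
```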

Background

Education

BSc Computer Science and Engineering, School of Electrical Engineering, University of Belgrade

MSc Applied Mathematics, School of Electrical Engineering, University of Belgrade

Industry

Spent four years working on computer vision problems in the retail and security sectors, including image recognition, object and action recognition, tracking, pose estimation, and face liveness detection.

Supervisors

Prof Özgür Şimşek

Prof Maria Battarra
