Working as an annotator has changed how I think about AI
As a social scientist far more acquainted with qualitative methods than with quantitative methods, I found getting to grips with artificial intelligence (AI) throughout the first year of my PhD quite challenging. I’ve taken courses on social statistics in the past but, in my experience, these tend to focus more on the use of software packages like SPSS or Stata than on the underlying mathematics.
Since joining ART-AI I’ve learned the fundamental principles of computer programming, how to code in different languages, how to build basic AI algorithms, and some of the statistical techniques that underpin many areas of AI – including different types of machine learning (ML). ML is a branch of AI that enables computer systems to learn from examples and experiences. ML systems can carry out complex processes by learning from these data (rather than relying entirely on pre-programmed rules).
One of the most common exercises for ML beginners is a Naïve Bayes spam email classifier. Naïve Bayes is a simple family of ML algorithms that estimate the probability that an item belongs to a given class based on the features it contains, under the ‘naïve’ assumption that those features are independent of one another. Using a training dataset of emails already labelled as either ‘spam’ or ‘ham’, you can implement an algorithm derived from Bayes’ theorem to predict whether a new email is spam or not (based on the presence of certain features, e.g. whether it contains the phrase ‘bank transfer’).
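The exercise described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the training emails are made up, the names `TRAIN`, `train` and `predict` are my own, and I have assumed simple whitespace tokenisation with add-one (Laplace) smoothing so that unseen words do not produce zero probabilities.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set of (email text, label) pairs. Not real data.
TRAIN = [
    ("urgent bank transfer required now", "spam"),
    ("claim your prize with a bank transfer", "spam"),
    ("win a free prize now", "spam"),
    ("meeting notes attached for tomorrow", "ham"),
    ("lunch tomorrow with the research group", "ham"),
    ("draft chapter attached for your comments", "ham"),
]


def train(examples):
    """Count emails per class and word occurrences per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    # The vocabulary is every word seen in any class.
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    return class_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest (log) posterior score."""
    class_counts, word_counts, vocab = model
    total_emails = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # Start from the log prior: how common this class is overall.
        score = math.log(class_counts[label] / total_emails)
        words_in_class = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothed likelihood of the word given the class.
            likelihood = (word_counts[label][word] + 1) / (words_in_class + len(vocab))
            score += math.log(likelihood)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAIN)
print(predict("bank transfer prize", model))
print(predict("meeting tomorrow attached", model))
```

Working in log probabilities avoids multiplying many small numbers together, which is a common design choice in this kind of classifier.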
Exercises like this can be a useful way to learn but they tend to be very simplified and detached from ‘real world’ problems and applications – particularly the kind social scientists might be interested in. To gain a more detailed understanding of how ML techniques can be applied in interdisciplinary research, I decided to work part time as a research annotator at the Alan Turing Institute.
What the work involved
My role as an annotator was to read and categorise text taken from news and social media for two projects relating to the detection and analysis of (harmful) online content. These annotations were then used as training data in an ML system – one far more sophisticated than the email classifier I outlined above. The first of these projects aimed to build an automatic classifier for abusive political news content as part of a larger ongoing research programme entitled The (mis)informed citizen. The overall purpose of this programme is to build more robust and scalable tools for measuring the extent and impact of false, misleading or low-quality news. The second project I worked on created a classifier to detect anti-East Asian prejudice on social media. A detailed discussion of the methodology is available here.
In both projects, I worked with a codebook containing a taxonomy developed using existing expert knowledge and iterated through use. Any annotator will have their own understanding of, for example, ‘abuse’. For the purpose of these projects, however, I was construing textual data in light of codebooks rather than my own everyday interpretation of terms. Before annotation began, a significant amount of qualitative research work had gone into developing frameworks that included various categories and sub-categories. A team of paid annotators was then trained and employed to apply these codebooks when annotating textual data.
Sometimes pairs of annotators working on the same content disagreed on how a particular piece of text should be categorised. For the first project, we had weekly calls with our annotation partner in which we would each explain our rationale for marking up an entry in a particular way, and then together reach an agreement. If the disagreement could not be resolved, we could seek clarification from the researchers. This enabled the researchers to see how the codebook might be refined or clarified, and to understand instances likely to be ‘edge cases’ where categorisation was not clear-cut.
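To make the idea of paired annotation concrete, here is a small hypothetical sketch of how disagreements between two annotators might be flagged for discussion. It is not the tooling used on these projects: the item IDs, the category names and the functions `find_disagreements` and `agreement_rate` are all assumptions made for illustration.

```python
# Hypothetical labels from two annotators for the same five items.
annotator_a = {
    "item1": "abusive",
    "item2": "not_abusive",
    "item3": "abusive",
    "item4": "not_abusive",
    "item5": "abusive",
}
annotator_b = {
    "item1": "abusive",
    "item2": "abusive",
    "item3": "abusive",
    "item4": "not_abusive",
    "item5": "not_abusive",
}


def find_disagreements(a, b):
    """Return the IDs of items both annotators labelled, but differently."""
    shared = set(a) & set(b)
    return sorted(item for item in shared if a[item] != b[item])


def agreement_rate(a, b):
    """Return the share of jointly labelled items given the same label."""
    shared = set(a) & set(b)
    if not shared:
        raise ValueError("The annotators have no items in common.")
    agreed = sum(1 for item in shared if a[item] == b[item])
    return agreed / len(shared)


print(find_disagreements(annotator_a, annotator_b))
print(agreement_rate(annotator_a, annotator_b))
```

The flagged items are the ones a pair of annotators would talk through, and any that remain unresolved would go to the researchers.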
What I learned
Although I don’t employ computational methods in my current research, gaining insight into this field has been very useful. As the scholar Hanna Wallach has argued, computational social science is not simply computer science + social data. In other words, good interdisciplinary work is not just additive (“add ML and stir”) but involves the creation of new ways of doing research. Working on these projects has illuminated how some of these methodologies are being developed and their potential implications for other areas of social science.
Before working as an annotator, I was unaware of how heavily AI technologies rely on human labour: not only that of software engineers, researchers and analysts, but also that of people (like annotators) working behind the scenes. I was fortunate to have decent wages and working conditions, and to have my work recognised and valued. But many AI workers have very different experiences. The anthropologist Mary L. Gray and computer scientist Siddharth Suri have coined the term ‘ghost work’ to describe the conditions “that devalue or hide the human labor powering AI and AI-enabled human-in-a-loop services.” Working as an annotator made me think more about this disparate and precarious workforce. Without them, many of the applications we use every day simply wouldn’t exist. Their continued invisibility contributes to the mystification of AI and the idea that there is ‘magic’ in the software. This in turn makes it more difficult to hold anyone accountable for its consequences.
When we talk about transparency in AI we normally mean the intelligibility and explainability of algorithms and their outputs. But I would argue transparency requires us to look not just for what is obscured but for what is invisible, and to consider the roles of both human and non-human elements of these assemblages. When interacting with or studying any AI technology or research output I now ask myself: What kind of work goes into building and maintaining this? Who is doing that work? Is it hidden? Is it secure? Is it fair? If not, who can we hold responsible? These are all questions anyone concerned with the ethics and politics of AI should be seeking answers to.