Human annotations can help index digital resources and improve search and recommendation systems. However, human annotators may carry their biases and stereotypes into the labels they create when annotating digital content, and these biases are then reflected in machine learning models trained on such data. The result is a reinforcement loop in which the search and recommendation systems people use daily push them to consume stereotypical content. To break this loop, researchers have examined how models are affected by training on diverse data that better represents a diverse population.

In this work, we look at how human annotators annotate digital content that differs from the content popular on the Web and social media. We present the results of a controlled user study in which participants are asked to annotate digital content from various socio-economic levels. We test for the presence of social stereotypes and investigate the diversity of the provided annotations, especially since some abstract labels may reveal information about annotators' emotional states. We observe different types of annotations for content from different socio-economic levels. Furthermore, we find regional and income-level biases in annotation sentiment.


Shaoyang Fan is a Ph.D. candidate at ITEE. He obtained his B.A. degree in Mathematics and Economics from the University of Kansas and earned his M.S. degree in Econometrics from the University of Manchester. Since 2021, he has been working as a People Data Science Researcher at Aurecon in Australia. His research interests include information retrieval, crowdsourcing, human computation, and computational social science.


Assoc. Prof. Gianluca Demartini

This session will be conducted in hybrid mode.
UQ St Lucia Campus venue: 46-442 or via Zoom: https://uqz.zoom.us/j/89362232168

About Data Science Seminar

This seminar series will be run as weekly sessions and is hosted by ITEE Data Science.


46-446 or via Zoom