Artificial intelligence (AI) systems are increasingly called on to apply normative judgments to real-world facts on behalf of human decision-makers. Already today, AI systems perform content moderation, offer sentencing recommendations, and assess the creditworthiness of prospective debtors. Before our very eyes, the issue of aligning AI’s behaviour with human values has leapt from the pages of science fiction and into our lives.
It’s no wonder then that concern for the calibration of machine behaviour to human norms is pervasive.
How can we train machines to make normative decisions? By “normative,” we mean decisions about what we should or shouldn’t do. Machines already make factual decisions, but to be effective and fair decision-makers, they must also make normative decisions much as human beings do.
The literature on how best to design AI agents to make decisions compatible with those of humans remains sparse. But in a pathbreaking new publication, Aparna Balagopalan (MIT) and co-authors David Madras (University of Toronto), David H. Yang (University of Toronto), Dylan Hadfield-Menell (MIT), SRI Director and Chair Gillian Hadfield (University of Toronto), and SRI Faculty Affiliate Marzyeh Ghassemi (MIT) present empirical evidence on the relationship between the methods used to label the data that trains machine learning (ML) models and the performance of those models when applying norms.
Their paper, “Judging facts, judging norms: Training machine learning models to judge humans requires a modified approach to labeling data,” was published on May 10th in Science Advances. Its results challenge conventional wisdom on human-computer interaction and reducing bias in AI.
Why labelling data with value judgments instead of facts can yield better results
Much of the scholarship in this area presumes that calibrating AI behaviour to human conventions requires value-neutral, observational data from which AI can best reason toward sound normative conclusions. The new research presented by Balagopalan and her co-authors suggests that labels explicitly reflecting value judgments, rather than the facts used to reach those judgments, might yield ML models that assess rule adherence and rule violation in a manner that we humans would deem acceptable.
To reach this conclusion, the authors conducted experiments to see how individuals behaved when asked to provide factual assessments as opposed to when asked to judge whether a rule had been followed.
For example, one group of participants was asked to label dogs that exhibited certain characteristics (namely, those that were “large sized … not well groomed … [or] aggressive”). Meanwhile, another group was asked whether the dogs shown to them were fit to be in an apartment, rather than to assess the presence or absence of specific features.
The first group was asked to make a factual assessment, and the second, a normative one.
Participants were more likely to descriptively label objects as possessing the factual features associated with norm violation than to normatively label those same objects as violating the norm in question. To put it another way: human participants in the experiments were more likely to recognize (and label) a factual feature than to recognize the violation of an explicit rule predicated on that feature.
Training machine learning models on these two categories of labelled data, normative and factual, led to significant discrepancies in model performance. ML models over-predicted rule violation when trained on data labelled with factual features compared to when they were trained on data labelled to reflect rule violation directly.
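To make that comparison concrete, here is a minimal illustrative sketch in Python: one classifier is trained on descriptive (feature-based) labels and another on normative (judgment-based) labels, and the two are compared on how often they flag a violation. The simulated data, the label-generation rules, and the scikit-learn setup are our own simplified assumptions, not the authors’ experimental pipeline.

```python
# Illustrative sketch only (not the authors' pipeline): compare a model trained on
# descriptive labels with one trained on normative labels for the same items.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: three binary features per dog (large, not well groomed, aggressive).
X = rng.integers(0, 2, size=(2000, 3))

# Hypothetical labels: descriptive annotators flag any salient feature, while normative
# annotators judge the apartment rule violated less readily, mimicking the reported gap.
y_descriptive = (X.sum(axis=1) >= 1).astype(int)
y_normative = ((X.sum(axis=1) >= 2) & (rng.random(2000) > 0.2)).astype(int)

X_train, X_test, yd_train, _, yn_train, _ = train_test_split(
    X, y_descriptive, y_normative, test_size=0.3, random_state=0
)

model_descriptive = LogisticRegression().fit(X_train, yd_train)
model_normative = LogisticRegression().fit(X_train, yn_train)

# The model trained on descriptive labels predicts violations far more often.
print("violation rate, descriptive-label model:", model_descriptive.predict(X_test).mean())
print("violation rate, normative-label model:", model_normative.predict(X_test).mean())
```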
Implicit in this outcome is a watershed conclusion: labelling data with the factual features associated with norm violation, rather than labelling the norm violation itself, produces labels that overstate how often the norm is violated. Reasoning about norms is qualitatively different from reasoning about facts.
This important difference was then tested against potential confounding factors. First, the authors confirmed that the meaningful differences between the “descriptive” and “normative” labels applied by research participants persisted for objects subject to considerable disagreement in labelling (i.e., controversial edge cases). This provides evidence that AI taught to reason about rule adherence from factual data will deviate considerably from human expectations precisely in the contentious contexts where we most hope to rely on it. The problem identified here is of great practical concern.
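As a rough illustration of this check (our own sketch with simulated annotations, not the authors’ analysis), one could restrict the comparison to the items annotators disagreed about most and see whether the gap between descriptive and normative labels persists:

```python
# Simulated annotations, purely for illustration: ten annotators label 500 items under
# each framing. We keep only high-disagreement items and compare violation rates there.
import numpy as np

rng = np.random.default_rng(1)
n_items, n_annotators = 500, 10

descriptive_votes = rng.random((n_items, n_annotators)) < 0.6  # hypothetical labelling rates
normative_votes = rng.random((n_items, n_annotators)) < 0.4

# Disagreement measured on the descriptive framing: 0 = unanimous, 1 = evenly split.
vote_share = descriptive_votes.mean(axis=1)
disagreement = 1 - 2 * np.abs(vote_share - 0.5)
contested = disagreement > 0.5  # the controversial "edge cases"

print("descriptive violation rate on contested items:", descriptive_votes[contested].mean())
print("normative violation rate on contested items:", normative_votes[contested].mean())
```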
Second, the authors assessed whether the gap might stem from an information deficit: perhaps those performing the “descriptive” labelling exercise had less information about the setting in which the factual feature was being assessed than did the “normative” cohort, and would apply labels more consistently with that group if given additional contextual cues. However, even when participants in the “descriptive” group were provided with further context regarding the setting in which the facts were being assessed, profound differences in outcome remained across the “descriptive” and “normative” conditions.
Empirical research in law and psychology has previously explored how the framing of legal questions, including factual and normative questions, influences decision-making. Such research shows that the formulation of a question can significantly affect the responses received. These prior findings support the conclusion that using factual data to teach AI about norms will produce AI agents that apply norms more harshly than humans do.
The importance of building fair and equitable ML systems
Efforts to build AI that is fair, equitable, and aligned with human value judgments should structure their data collection practices accordingly. This means resisting the impulse to train ML models to perform adjudicative tasks using observational data. Doing so can be tempting, both because such data is often readily available and because the value-neutral labelling of data appeals to intuitive notions of objective and non-discriminatory judgment.
Ensuring that the data used to train decision-making ML algorithms mirrors the results of human judgment — rather than simple factual observation — is no small feat. Proposed approaches include ensuring that the training data used to reproduce human assessments is collected in an appropriate context. To this end, the authors recommend that the creators of trained models and of datasets supplement those products with clear descriptions of the approaches used to tag the data — taking special care to establish whether the tags relate to facts perceived or judgments applied.
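What might such documentation look like in practice? Below is a minimal, purely illustrative sketch of a dataset description that records whether its labels capture perceived facts or applied judgments; the field names and values are hypothetical and are not prescribed by the authors.

```python
# Hypothetical dataset documentation (field names are illustrative only): making the
# labelling approach explicit tells downstream users whether a model trained on this
# data will reproduce factual perception or normative judgment.
dataset_description = {
    "name": "apartment-dog-suitability",  # hypothetical dataset
    "label_type": "normative",            # "descriptive" (facts) vs. "normative" (judgments)
    "labelling_question": "Is this dog fit to be in an apartment?",
    "rule_shown_to_annotators": "No large, poorly groomed, or aggressive dogs.",
    "annotator_instructions": "Judge whether the rule is violated; do not merely note features.",
}

print(dataset_description["label_type"])
```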
Contemporary lawmaking efforts, including the European Union’s flagship General Data Protection Regulation and ongoing Canadian efforts to create a comprehensive law regulating private-sector data use and artificial intelligence, incorporate legal requirements directed at de-biasing AI. These regulatory developments reflect public demands for demonstrably fair AI. The research by Balagopalan and her co-authors provides empirical evidence to guide research and development efforts in this nascent field.
The principal lesson to be drawn is that curating impartial datasets about the world will not be enough to produce AI that is sensitive to social norms. Rather, what is required is careful consideration of the particular kind of judgment one wishes to reproduce, and the development of datasets and methodologies tailored to that purpose.