'We, the Data': Wendy H. Wong on human rights in the age of datafication

October 4, 2023 by Schwartz Reisman Institute for Technology & Society

Data are an integral part of our lives, argues Wendy H. Wong, to the extent that they have become an essential part of who we are. In her new book, We, The Data: Human Rights in the Digital Age (MIT Press, 2023), Wong offers a rallying call to extend human rights beyond our physical selves, arguing that we need to reboot our understandings of rights to match the challenges of datafication.

Wong is a professor and principal’s research chair at the University of British Columbia Okanagan’s Department of Economics, Philosophy, and Political Science, and a faculty affiliate at the University of Toronto’s Schwartz Reisman Institute for Technology and Society (SRI). She studies global governance, with a focus on emerging technologies such as AI and big data. She is the author of two other award-winning books, Internal Affairs: How the Structure of NGOs Transforms Human Rights and The Authority Trap: Strategic Choices of International NGOs.

Wong will participate in an SRI book launch for We, The Data at U of T’s Munk School of Global Affairs & Public Policy on October 20, 2023. Registration for the event is free and open to the public. SRI provided initial support for the book, enabling Wong to develop its core ideas and initial drafting while serving as one of the institute’s inaugural research leads. In advance of We, The Data’s upcoming launch, SRI interviewed Wong about the book’s key takeaways.

The following interview has been condensed for clarity and length.

What inspired you to write We, The Data?

Specifically, it was a conversation I had with a former advisor about what artificial intelligence (AI) was doing to human life. But more generally, when I came back from maternity leave, I started to think about the intersection of human rights and sharing personal information online. At the time, I found it troubling and wasn't sure how to pinpoint it. I realized I needed to know more about AI and what people were thinking about it, so I started reading more, and it scared me. That's the honest answer.

The way technologists talk about AI is very different from how I think about it — the social and political implications are often left out completely. I work on global governance issues and saw this as an opportunity to explore. It started me down a long path where I had to learn a lot of things quickly, which is where SRI came in so nicely in the formative days of this project.

What is datafication, and how is it different today than in the past?

Datafication is the idea that all our daily activities, thoughts, and behaviours — anything we use a digital interface for — are being turned into digital data. This is very different from 20 years ago, when we didn't have smartphones and Google hadn’t started tracking the sort of behavioural surplus that Shoshana Zuboff describes. This explosion of data happened around the mid-2000s, and that’s what enabled many aspects of AI development.

One of the first things we need to consider when we think about AI, regardless of the definition, is that AI needs three things: computing power, algorithmic innovation, and data. Not only are data essential to the AI process, they also reveal all kinds of human behaviours, and that is what’s most troubling from a human rights view.

What do you mean when you say that data are “sticky” and why is this so important?

Wendy H. Wong’s We, The Data: Human Rights in the Digital Age explores the impact of data on human rights.

The reason I started calling data sticky is that it conveys this idea of gum on your shoe. It’s not easy to get rid of — we’re not going to get out of this world of datafication. We may not even want to, frankly, because data and AI have great potential to improve lives. My trouble is with the framing. I’m looking for what we can do to make data a little less sticky.

Data are sticky for four reasons. First, they’re about mundane behaviours — things you can’t avoid: walking, taking transit, using any search engine. Second, they are effectively forever, because we don’t know where they are going — we know corporations collect data, but they also sell and share data. They pool the data. And that’s the third point about stickiness: data are linked. Once collected, they don’t just stay in some neat little dataset. That’s why they’re so useful: they’re easily transferable and copyable. But the final reason, which is maybe the stickiest of all, is that data are co-created: any piece of data is a combination of a person acting and a source wanting to record that behaviour. If you’re missing either piece, you don’t get the data — somebody has to be interested in recording some kind of behaviour, and someone has to actually do that behaviour. This co-creation creates a lot of problems for the way we apply human rights, because there’s no clear owner, even if there is a clear harm or effect.

Why are human rights so important for understanding the impact of data?

A lot of people use the term “human rights” in thinking about AI without reflecting on what they’re for. Typically, people say: it’s about privacy; it’s about freedom of expression. Those are two very important rights, but they are not the only human rights. In fact, there are dozens and dozens of human rights at the international level. In this book, I didn’t want to go through and explore different existing rights and explain how data affect them, because how they are affected may change as AI advances. That wasn’t really the point.

Instead, I wanted us to have human rights tools that could be more enduring. There are four big values that motivated the 1948 Universal Declaration of Human Rights, which is the touchstone in international frameworks. The writers of that document wanted to protect four values: autonomy, which was originally called liberty; human dignity; equality among humans; and community, originally called brotherhood. No single right may reflect all of them, but taken together, these four values are the foundation of all the human rights that exist in the world. I want to get back to these foundational values to think about what parts are being changed by datafication — sometimes for the better, sometimes for the worse.

This is how I arrived at the book’s structure, which walks through some of the things we all encounter in our lives. Claims about data rights — how does data stickiness affect the way we think about property rights, and why does it matter? The role of Big Tech in enforcing human rights, because international law applies to states, and only very rarely to corporations and other non-state actors. I think about facial recognition and its implications for dignity, autonomy, equality, and community. I talk about privacy in that context, too, and why it’s kind of strange to talk about your face as a private piece of you. I have a chapter on what happens to data when we die, because that is a key consideration for all of us going forward — there’s so much out there about our activities that will remain when we’re no longer physically here. I end the book with a call to expand our idea of the right to education to include data literacy, because it is such a fundamental part of what it means to experience human life now. We should include that as part of what every human is entitled to benefit from and to claim as part of their experience of life.

There is an interesting tension in the book between the ways we think about rights as individuals versus as a collective. What challenges does this create for governance?

The book is trying to resist the tendency for all of us to think we’re in this alone. This is why we are all data stakeholders; we’re not just subjects. We have common experiences to pull on, and common grievances to express and figure out solutions for. Because datafication and AI have largely come from corporate entities and Western societies that think about individuals as consumers, we have this market-driven, individualistic language. But what a lot of people miss in this conversation — and why I resist calling it “your” data or “mine” — is that data are also collective.

It’s easy to think about how data affect us individually, but data are about groups of people like us, and, in fact, that’s their usefulness. If a company was just collecting data on you as an individual, it wouldn’t be that useful — instead, they’re collecting data on millions or billions of people, and they’re able to sort through all those data to target specific groups. That is the real power of data and AI systems, in terms of prediction. 

However, it's hard to think about these implications because the collectivities made through algorithms are not chosen by us, nor are we typically aware of where we fall within them. We just don't know about them, and yet they affect the ways that we, as participants in political and social life, can interact. If we don’t know what groups we’ve been put into, or if we don’t have any ties to those groups, how can we act politically? That’s the whole basis of collective action. In my previous research on global civil society, I looked at NGOs and social movements. They all have a common cause. If we are missing that idea of collectivity and common identity, it’s hard for us as a species, at least at this point in history, to act against the status quo.

What initiatives are most important for helping the public come to see themselves as data stakeholders?

Data literacy is probably the most important, large-scale, long-term solution. Data literacy does not mean educating people on how to use a computer or a smartphone, but on why those devices do what they do. Why are the algorithms giving us the information they do? How is our relationship to the world a process of data creation? Data literacy enables people to see the number of choices made along the way before we get some sort of output from ChatGPT. There have been many, many decisions about what kinds of data to use, how to program the algorithms, and what kinds of assumptions are built into AI models to spit out a humanlike output. I don’t think everyone needs to be a data scientist or AI researcher — what we need is to understand these systems at a basic level. If you demystify the process, that’s a huge step.

Another step is for governments to enact better policies. The US is trying to create AI policy right now, and they’re asking AI companies to define what AI is. I suppose that’s one way, except these companies have a real stake in defining AI in a certain way. So why not ask other people who don’t have financial incentives to create mystery around AI technology? It is very complicated and sophisticated, and I’m not saying that these algorithmic innovations are not amazing in and of themselves. But they’re also the result of a lot of brute compute. We need to demystify that bit and foreground this idea of data being important to this magical AI process, because there’s a disconnect right now. I’m disappointed in the way some conversations are going, where AI and algorithms are divorced from the data that power them. We’re not talking about the fact that if we regulate data, we’re effectively regulating algorithms. We don’t need accountability in the same way from these black boxes. What we need is social and political constraints on how data are collected and used.

Have you observed any shifting trends around data literacy over the course of writing the book that felt challenging or validating to your arguments?

I've been really heartened by the way the conversation has changed. When I first started writing, it took more work to find critical pieces coming out of journalism. Kashmir Hill’s work on Clearview AI, MegaFace, and other facial recognition datasets was really useful. You still see some technological triumphalism among journalists, and people are really excited about new developments, but I do think there’s also a more skeptical strain that has arisen. 

Part of what I hope the book can do is shift the conversation away from privacy. If we limit the conversation to privacy, we’re going to get a “So what?” response, because many people ultimately believe that if they’re not doing anything wrong, who cares? Or, if companies are going to funnel me products that I care about, why does it matter if I lose some privacy? I just don’t think that’s the right way to think. It’s not about privacy, it’s about autonomy and dignity. It’s about how we ought to be treated: not as means for profit. We are human beings, living social, cultural, and political lives, in addition to our economic lives.

Every development that has come about as I’ve been writing this book has really told me that focusing on data was the right way to go. I’m not a computer scientist or a computer engineer, and I’m never going to understand the technology as well as those people do. But it’s not about that. It’s about how we collect data about people, and how we use data about people. Why are the ways we’re doing so right now wrong from a human rights perspective? How are we eroding each other's autonomy? How are we treating each other as products as opposed to people?
