Episode 16: The Power of Linguistics: Unpacking Natural Language Processing Ethics with Emily M. Bender
This transcript of Emily Bender_mixdown.mp3 was generated automatically by Sonix and may contain errors.
Welcome to Radical A.I., a podcast about radical ideas, radical people and radical stories at the intersection of ethics and artificial intelligence. We are your hosts, Dylan and Jess.
In this episode, we interview Dr. Emily Bender, who researches linguistics, computational linguistics, and ethical issues in natural language processing. Emily is currently a professor in the Department of Linguistics and an adjunct professor in the Department of Computer Science and Engineering at the University of Washington. She is also the faculty director of the CLMS program and the director of the Computational Linguistics Laboratory.
Some of the topics that we cover in our interview with Emily include: What are the societal impacts and ethics of natural language processing, or NLP? How can language be a form of power? How can we effectively teach ethics in the NLP classroom? And finally, how can we promote healthy interdisciplinary collaboration in the development of NLP products?
We want to thank Emily for coming on for this interview. Emily is another one of those folks who has been supportive of this project since, you know, the first week that we've been around. I don't know if anyone has engaged with our tweets with the ferocity that Emily has.
And we are so, so grateful for her mentorship and for her support as this project has continued to develop. So it was such a treat for us to finally be able to sit down with Emily on Zoom and interview her.
And we are so excited to share this interview with Dr. Emily Bender with all of you.
We're here today with Dr. Emily Bender. Emily, welcome to the show.
Thank you. I am super excited to get to be here. It seems like I've been looking forward to it for a long time, with all your wonderful episodes.
So, as our listeners may or may not know if they follow our Twitter, you are someone who has been such a big supporter of the show since we launched. And so while we're on the line with you, I just want to make sure that we give a shout out to you for your support of this project so far. Yeah, absolutely. And as we begin this conversation, I'm wondering if you can start us off by telling us why you do what you do. What is your motivation in the work that you do, both as a researcher, but then also as a person?
That is such a wonderful question. And I think I want to answer it by sort of telling you a little bit of my story. I'm a linguist, all right. I didn't know what linguistics was until I got to college. I always loved languages, and I actually got to be an exchange student in high school and went to France. And I found that no matter how boring the situation was that I was in, if I could listen to how people were talking, then I was fine. And I sat through some really, really boring conversations, driving around with adults, talking about stuff I didn't care about. But the way they spoke was really interesting to me. And so I discovered linguistics, fortunately, in college. And all of my training is in linguistics.
So, you know, undergraduate at UC Berkeley in linguistics, graduate PhD in linguistics at Stanford, working on syntax and sociolinguistics. And then I did not manage to get the job I was looking for as a syntactician or a sociolinguist. And I did get a job as a computational linguist, feeling quite a bit of an outsider to that process. I got hired by the University of Washington in 2003 in a temporary position and then tenure track in 2004 to establish and run our professional master's program in computational linguistics. And I was like, OK, I think I can do this, but this isn't really how I think of myself. Right. And at the time, you know, computational linguistics in the early 2000s was all about statistical learning and stuff. And my research was, on the one hand, sociolinguistics and syntax, and we could talk about that a bit if you want, but also grammar engineering, which is building descriptions of languages on computers. And yes, I did manage to publish that in the main natural language processing venues, but it was definitely peripheral. So I thought, is there room for me in this field? And what's happened over the last 17 years, I guess, is that I've gone from thinking, do I belong here, to, how do I make sure there's enough space for linguistics here? Because this field needs linguistics. And so that's been sort of the last three or four years, sort of understanding that and coming around to questions of societal impact of NLP. What I've noticed is that that sort of bigger picture of looking at how the machine learning fits into the scientific questions we're asking has been a very useful perspective for thinking about how the machine learning fits into the social things that we're facing.
For folks that don't necessarily know what linguistics is or what a linguist does, could you break that down for our audience just a little bit? And especially, why is it meaningful for you to do that work? It sounds like you really fell in love with that work at some point. And just, why are you passionate about it?
So linguistics is sometimes described as the science of language. It's a little bit contested whether we want to embrace sort of science as a moniker for what we do, but it is the scholarly study of language. So how do languages work? So, their sound systems, the way words are built up, the way words combine to make sentences, the way the meanings of words combine to make the meaning of sentences, the way people use sentence meanings to achieve communicative ends. So that's all the structural part. But you can also look at how languages change over time. You can look at how variation in language correlates with factors in the structure of societies; that's sociolinguistics. And you can look at how language is processed in the brain. So all of that is linguistics. And for me, it was really just what I was most interested in. And I was just thrilled to encounter it in my second semester of undergrad. And honestly, I was hooked from day one of that class. Like, this is it. It took me the rest of that term to commit myself to a major in something that I saw as impractical, but I did, because it was just too great. So hopefully, does that make sense to non-linguists as to what linguistics is?
Definitely. And in terms of computational linguistics, which is one form of linguistics, I would love if you could unpack computational linguistics and what that means. And then also, if you could share a little bit of your story of what it was like to start that program at UW, because you mentioned that you felt a little bit like a fish out of water there. And so I'm sure that there were a lot of emotions and just a lot of things that probably happened that framed where you are today.
Yeah, absolutely. So computational linguistics is basically doing linguistics with computers. And that, from a linguistic point of view, is about using computer models to better understand what's going on with the language. All right. So in my case, grammar engineering is taking syntax, which is about writing down the rules of a language, how you make sentences, and writing those down in a way that a computer can work with. So you can then crunch through a lot of text and make sure, okay, is it actually working across all these sentences? Computational linguistics is sometimes considered synonymous with natural language processing, or NLP, and sometimes as contrasted with it. And so you can say, well, it's called computational linguistics if you're in a linguistics department and it's called NLP if you're in a computer science department. Not that those are the only academic homes for this field, but those are sort of the two that first come to mind. Other times people say, look, if what you're interested in is understanding how language works using computers, you're doing computational linguistics; if you're interested in building technology for practical ends that has computers dealing with language, you're doing NLP. So there are sort of two different ways of looking at it. And starting the program was really interesting. And in many ways, I think our program is successful largely because I thought of myself as an outsider. And I sort of said, OK, it's my job here to create a professional master's program that's going to prepare people to go work in this industry.
It's important to me that this is accessible to people from both a linguistics background and a computer science background, and other backgrounds as well. And that was important to me as a linguist. I didn't want to sort of say, now you have to be a computer scientist to do this. It turns out it's been incredibly important for the program, because we have these cohorts of students who are wonderfully diverse in terms of intellectual training, who learn from each other and get so much more out of the program than if everyone was coming in with the same background. So that is wonderful. And so, me being a linguist, starting this program, I said, OK, I want linguists to be here. And that led to the program being more successful because it's more interdisciplinary, which I think is good. Another part of it was I decentered my own research interests, out of really that sense of feeling I didn't belong. Like, we have a class in the program on grammar engineering. It's an elective, right. It's the people who are really interested in that who will take that class, and we have loads of fun, but it's not a required course, because that's not where the mainstream of the field is. And so I approached the problem of curriculum design by saying, all right, it's hopeless for me to try to figure out what the main sort of tasks or ways of thinking about things are and design classes around that.
I don't want to do that because that's going to change too fast and I can't keep up with it. I don't want to design the program around maybe the classes that I want to teach and the classes that a couple of colleagues want to teach, because that's not going to be good preparation for the students. Also, we had the good fortune that, the way the program was supported here at UW, we got to design the curriculum first and then hire people into it. So that made it a better program. And so what I did was say, OK, well, what are the things that people do in computational linguistics right now? Let me break those things down into subtasks and then group the subtasks into classes. And then the program had an advisory board from the start, which was people from all around campus and also in industry. And I said, OK, feedback, please. These are the bundles I've put together; what looks like it belongs in the core curriculum, what looks like it might be electives, and what's missing, what should I add? And that's led to a very robust curriculum that has changed a bit over the years. But that sort of foundational setup has stayed with us.
So, as you know, I come from a religious studies background and a moral philosophy background.
And one of the questions that I have to grapple with is whether religious studies is impractical, which I think was the word you used earlier about linguistics.
And I'm curious about your thoughts on the humanities in these conversations, because right now you're at this intersection of NLP, which is, you know, this intersection between the humanities and linguistics and computer science. Do you still believe that linguistics is as impractical as you might have once thought when you were in college, or have you turned the page?
I have absolutely turned the page. And it's now a situation where it is harder to get people to recognize the value of it. Right. I mean, there are actually lots of different interesting ways that linguists get employed out in the world. But it's not sort of the easy ticket into a well-paying career that, say, computer science would be. But it is extremely valuable. So I think that it is practical and important. And I think that a lot of the conversation around, I'm going to keep saying societal impacts of NLP and avoid saying ethics, but it'd be fun to talk about why. So these conversations around the societal impacts of machine learning and AI very much rely on being able to understand the world that the technology is fitting into. And I think that religious studies, like linguistics, like sociology, like science and technology studies, like anthropology, like psychology, is really, really important in those discussions. And they just can't be had without that kind of expertise. And the expertise doesn't have to be one individual who's trained in, you know, AI and ML and something else. It can be good collaborative conversations.
Does religious studies count in your list?
Am I doing something valuable? I mean, why validate me? Yes.
I think anything that looks into how humans build and navigate our world, and how that world nurtures us or could nurture us better, definitely fits into these conversations. And I just wanna sort of flag that I only have the faintest idea of what religious studies is, and so I've answered that question from a place of non-expertise. But I hereby validate you.
Going back to NLP and linguistics: what is the difference between societal impacts and ethics? And why are you using that wording?
OK. So we've been calling this area within NLP "ethics and NLP" for a while. And we should also talk about how I got into that area.
But I'm going to say, to answer your question here, what I've noticed is that if we talk about it as ethics, that tends to point the discussion in a certain direction, and that tends to go towards discussions of how do we balance the needs of different groups of people in society. And, you know, let's look at the various philosophical traditions and how they answer that question. And maybe also let's talk about creating a code of ethics. And I mean, all of these things are really valuable conversations, but I think they miss some important points, and they can also lead people down some rabbit holes. So one of the important points that they miss is that this notion of balancing the needs of different people in society seems to presume, and I am not a philosopher, not particularly well-versed in philosophy, so I'm sure that people have engaged with this, but at the level that I've hit it, it seems to presuppose that those people are on a level playing field. And that is so not where we are, especially in the U.S. and especially in the kinds of adverse impacts that we're talking about here. It's not, okay, how do we balance the needs of these equally supported, equally validated groups of people? It's, how do we deal with the fact that those who already have power and prestige and privilege can make these decisions that harm other people? And so I think it's more valuable, especially from an education point of view, to train students in being able to think about what are the possible negative impacts on society of this technology.
And then, what can we do to mitigate those, along a variety of strategies, rather than saying, how do we think about how to make the right decision? So rabbit holes are things like the stupid trolley problem, which wants to suck all of the air out of all discussions. That seems to have simmered down, but for a while, like the first time I taught the ethics and NLP class, I had to basically just shut that down: we're not talking about that one. All right. Because it turns out it's not a very good model for the kinds of issues that we're actually facing with this technology, number one. And number two, I think everyone always gets drawn to it because it doesn't challenge anyone's privilege. It's just this abstract question. So the trolley problem is one rabbit hole. Another rabbit hole is people saying, well, we can't possibly come up with something that everybody agrees with, so this is hopeless; we need one overarching ethical framework, that's not possible in this international context, so never mind. All right. Another one is this idea that it's all about sort of being a morally correct individual. And, you know, why are you messing with those sorts of judgments of people when I want to just work on, you know, these mathematical models and how to optimize them, how to train them efficiently and all of that? That's what's interesting to me. Don't judge my morality. Right. So these are reasons that I think the word ethics sometimes derails things.
I'm wondering, besides the rabbit holes that we can get into, what you would say, again for folks like myself who are still learning about NLP, some of the big questions are that you ask in your research. What are some of the questions that folks should know about when they start thinking about NLP, especially the ethical dilemmas that you might encounter?
So I think these are two very different things. The big questions in NLP, sort of outside of societal impacts, are things like, you know, how do we build algorithms that can actually understand what people are saying? Right. And that goes hand in hand with, how do we evaluate the outputs of the algorithms so we can tell whether or not they're understanding what people are saying? You could also say, how do we build a system that can learn from appropriate data for any language, as opposed to building systems that are well designed to learn from English? And these are systems for things like speech recognition, so going from the sound wave to a written form, or question answering, where someone poses a question in whatever language they're using and the system, based on its machine reading of the web, comes back with an answer. And also very specific applications, like, we've got handwritten notes or dictated notes from a doctor after a medical visit: should we be flagging this patient for a particular kind of urgent therapy, and so on? So these are the kinds of things that get worked on in NLP. The questions around societal impact are things like, OK, what if this doesn't work well for different groups of people because of variation in language? Right.
Different people speak differently. It is a fact of the world that there is language variation in every speech community. It is a fact of the world that in most, if not all, speech communities, one variety of the language gets anointed the standard. And that's purely a question of who's in power and not about that variety being any better. Right. And so if you're building technology that's focusing on these standard varieties of a few languages, what happens when people who don't speak those varieties are excluded from being able to use the technology, from the technology working well for them, for example, the auto transcription of their podcasts? Right. And, you know, things like this. Number two, when we train, and this comes up in any big data enterprise, right, when we train machine learning systems on lots and lots of data, where the collections of data are far too big to go through and verify by hand that they are reasonable, what kind of societal biases is that training into the machine? And then what happens when we deploy it in the world, rather than just using it as a picture of what we trained on? We actually try to make it, you know, affect the world. Is it having the effect that we want it to have?
So these are the kinds of questions, and these kinds of questions are topics that I'm assuming are brought up in your class, especially the one that's called the ethics of NLP, which maybe we should call the societal impacts of NLP to stay with the lingo. Yeah. Could you tell us a little bit about what you teach in that class and just really what it's all about? Sure.
So there are actually two main ways that I'm putting these issues into our curriculum. One is this elective class, which is standalone, and the other is trying to sprinkle it out through the other classes, because it's useful to have this class that focuses on it for a quarter, but you don't really want to keep these societal impact questions in a box off to the side where only the students who are interested come study it. So it also needs to be throughout. But in the class, we are doing things like asking, OK, what is ethics? So in the class I start with what is ethics, and we read from the philosophical literature. And I was very intentional about avoiding centering the European philosophical tradition. I didn't want to set it up so that there's, like, you know, the Greek philosophers and their intellectual descendants, and then everyone else, because that seemed like an inappropriate way to go about this.
And so what I did was I basically collected a whole bunch of resources and I said to the students, everyone reads two of these, and at least one of them has to reflect a very different social address than where you're coming from. So any individual student needed to find an author or point of view that was very different from their own social address. And then I did, too. Throughout this class, I use a strategy of divide and share, where there are far too many readings, and so we all go into different readings with the same questions, and then we discuss those questions on the basis of what we've read. And it's been extremely fruitful, with one hidden gem that I did not plan for. So, in course evaluations at the end of the year, the first time I taught this, one of the students pointed out that that setup made it so much easier for them to ask questions, because they weren't presumed to have read everything. And so it was OK to ask questions that came from a place of not having read it, because they weren't expected to have read it. And the discussions were so much better for that. So I was really excited about that unintended consequence.
I'm wondering if you can give some examples of places where we may see NLP being used that we may not notice, because it's so ubiquitous in our culture right now.
So I'm especially thinking in terms of higher ed and how we might be seeing it used there. Basically, I'm looking for case studies about NLP.
Yeah. So NLP gets used in automated essay grading. It gets used in plagiarism detection, so things like Turnitin.com. So there are some places where you might not see it. I think the automated essay grading tends to happen with, like, standardized tests. You would also use it and not know it whenever you get ads that are based on keywords. Right. That's an application of NLP.
You are also using it, without maybe really knowing it, when you just do a search, because the search is not just flat-out keyword matching between what you put in and what's there: there's a whole bunch else that goes in there, including the PageRank algorithm, but also a bunch of keyword expansion, so using words similar to your words. You probably did notice it in the auto-completion of searches; that's a kind of NLP. There are a bunch of companies promoting software to do automated employment interview screening, where I think it's listening in on two people talking to each other and then making judgments about the person, or maybe it's actually just calling in to talk to a robot. Oh, and then one more, on social media: frequently Facebook does it this way, it will translate things for you automatically, and you have to kind of squint to see that it was translated sometimes.
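To make the keyword expansion idea concrete, here is a minimal sketch, not from the episode, of how a search system might broaden a query using similarity from pre-trained word vectors; the gensim API calls are real, but the vector file path and the example query are placeholders.

```python
# Toy sketch of keyword expansion: match not just the user's terms but words
# whose embeddings are similar to them. Assumes a pre-trained word2vec-format
# file; "vectors.bin" is a placeholder path.
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

def expand_query(terms, topn=3):
    """Return the original terms plus their nearest neighbors in embedding space."""
    expanded = set(terms)
    for term in terms:
        if term in vectors:
            expanded.update(word for word, _ in vectors.most_similar(term, topn=topn))
    return expanded

print(expand_query(["cheap", "laptop"]))
# Depending on the vectors, this might add words like "inexpensive" or "notebook",
# which is why a search can return pages that never contain your exact terms.
```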
So in your class, when you're bringing up case studies like this, which I'm assuming is later on in the semester, how do you bridge that gap, starting off with the philosophy literature and getting to something like auto-completion? What's the pipeline between those two?
So, what I did the first time I taught this class... well, I want to tell you the story of how I came to teach the class, which leads into this. So we have an advisory board for our master's program, and one of the people on that board is Leslie Carmichael. She's at Microsoft. She's got a PhD in linguistics from UW, and she's working over at Microsoft on many things, initially speech recognition and then things like privacy issues. And she sort of just nudged me and said, you know, your program ought to have something about this in the curriculum.
I said, oh, that's a really good idea. Who can I get to come teach that for us? And I sort of, you know, looked around. We have Ryan Calo in the law school at UW, who does lots of really interesting stuff around technology and policy, and he seemed like an interesting person to bring over, but I couldn't get his attention at that time. And I couldn't find anybody else to come do this. Well, I guess it's me. And so I rolled up my sleeves, and this was late 2016, and I basically just collected every single thing I could find that had to do with societal impacts of NLP, so things going wrong. And I was particularly interested in focusing in on the points where it's different for language than for other areas of machine learning.
So that was one of my questions: are there particular things we need to worry about here beyond what's happening in the broader fairness, accountability, and transparency in machine learning conversation that had been going on for a little while?
And so mostly what I was collecting was disaster stories. Right. Things that went wrong.
So the Tay chatbot, and Latanya Sweeney's work on the way that if you put in an African-American-sounding name on a search on Google in the US, you get back this "has so-and-so been arrested" ad, versus if you put in a white American-sounding name, you get background information on so-and-so. And this is a 2013 article, I think in Communications of the ACM, where she very carefully documents the phenomenon and the issues that it leads to, and then, because her methodology doesn't let her do this, doesn't say why it happens, but there are some hints there as to what might be going on. So these kinds of things: what could go wrong. And I tried to sort of group them together thematically. And then each week we went in with some questions, and we sort of divided these up and read them and came back.
But the transition from ethics to these what-could-go-wrong things was not smooth at all. OK, you know, if we're talking about ethics and NLP, we don't need to reinvent the wheel here. There have been people thinking about ethics for millennia. Let's go read some of that, and then, OK, now let's read about some things that have gone wrong. And I think it helps to read the ethics stuff, but I'm not sure. I did two weeks of it the first two times I taught this class.
The next time I might make that smaller, because maybe I just haven't learned how to do it, how to have that inform the discussion of the rest of it. But these are the questions. So, I don't want to leave people just with, it's all terrible, burn it all down. Right. So we also look into applications where people are trying to use NLP for social good and think about that, and also frameworks for how we do better. And through the course of teaching that class the first time, I connected with Batya Friedman, my colleague in the iSchool at UW, and learned about the 25-year research program of value sensitive design, which is a wonderful set of methodologies for how to do better. And so I've been incorporating that into the class.
One of the research questions, I guess, of this project, of the Radical AI podcast, is how we take some of these conversations that we have in the classroom and bring them out into the world. And I'm wondering if you have thoughts on how we take these really just vital conversations about linguistics and NLP, especially taking into account privilege and power, systems of power and systems of oppression, and how we bridge that out into the products that are being made and being produced out in the world.
Yeah. So I think there are two things in the world: there are discussions in the world and there are products in the world. And for the products in the world, I think the goal is to, through education, provide enough training for people that when they are asked to work on these products, they know what questions to ask and at what point to ask those questions, which is often really, really early in the design process. Right. Who is this going to affect? And to use some terminology from value sensitive design, who are the direct and indirect stakeholders? So the direct stakeholders are the people who actually touch the technology, and the indirect stakeholders are other people who are affected. And so in the Latanya Sweeney example, when someone else searches for her name and up pops this thing saying "has Latanya Sweeney been arrested," she's the indirect stakeholder there. She didn't do the search, but she's the one accruing the negative impacts of people seeing this over and over again with her name. So, training people to know how to ask those questions early on, and connecting people in conversations so that, when they feel like the actual answer is no, we shouldn't do this, because sometimes that needs to be the answer, they have networks that they can connect with. Because that's a really difficult thing to do on your own. To be the one person saying no is really, really hard. But if you have a network of people that you've connected to through these conversations, who you can bounce ideas off of, who can support you.
And so we see, you know, all the wonderful tech organizing that's happening, and that involves those networks. So I think that that's a thing that work in the classroom can do. And that sort of bleeds over into these questions of conversations. So I think social media has been extremely powerful for this. I get a lot out of Twitter. I know there are a lot of downsides to social media, but I get an awful lot out of Twitter, and a lot of it is around these discussions and connecting with people and sort of hearing more points of view, but also being able to get the word out. And then also, beyond social media, there's the old media, right, and all of the reporting on technology, and being able to read that critically and then share those critical reading skills with the people around us is really important. That's another kind of out in the world. Right. And then one more out-in-the-world thing is regulation. Right. So we all have multiple roles vis-a-vis technology. We can be developers, we can be users, we can be indirect stakeholders, we can be members of the public advocating for good policy, and we can be policymakers, depending on where we end up. And I think that the more we have an educated public around how the technology works and what the possible downsides are, the better situated we'll be towards getting sensible policy.
And you mentioned earlier Tay, which was an example of a very bad mishap with NLP being pushed into the world and then being immediately shut down, which it definitely should have been. Do you have any examples of technology that did go through the course of these conversations, whether with indirect stakeholders or with an organization, or just got some sort of feedback that allowed the people who created it to actually iterate and fix some of the ethical problems, as opposed to just shutting it down and scrapping the project altogether?
Ooh, I am sure there are some out there. Well, OK, so an ongoing version of that is the Google image search results.
So there's sort of this ongoing dialogue where people will say, hey, this is really harmful. Safiya Noble, who I think you either have had on or will have on soon, has done excellent work documenting that. And the sort of initial reaction out of Google, and this is me speaking as an outsider, seemed to be, well, we're just reflecting back what's in the world. Right. Which was simply not true. And it's not true on a couple of levels. So what they're reflecting back is what's in their training data, and their training data is scraped from images on the Web that were labeled in a certain way. And so they are reflecting back the view of the world encapsulated in those training data images and that training set, along with lots and lots of other decisions. Right. It's not just a flat this-is-what-comes-in, that's-what-goes-out kind of algorithm. And so over time, as people have documented all of the various problems with the Google image search and the very awkward, unfortunate pairings of search strings and images that it can turn up, they've been tweaking that. So I think that's an example where the criticism was hard fought. And it's sort of this unfortunate imbalance where it's really easy to create and deploy the technology, not to minimize the work of the engineers at Google, but it's actually relatively easy to do that, and relatively hard to do the painstaking work of documenting the harm that it can do. But you can see the back and forth in that story.
I think one of the myths that at least I was socialized to believe in is that language is objective in some way. And I know now, from especially researching the Bible in translation through the years, that there are very specific decisions that are made in terms of data sets and translation and language in general.
And this is kind of a leading question, but I'm wondering if you could put a finer point on that. Like, what are the politics of language, or how should we understand the politics of natural language processing?
OK. So this touches on so many fun and interesting things. So, language is absolutely not neutral. Right. The word choices we make and the metaphors we use to describe things absolutely frame how they're looked at. I mean, George Lakoff does a lot of this, looking at political discourse. And so you can look at, you know, sort of gun violence prevention versus gun control versus gun rights as different names for the same issue, and think about sort of what that highlights and how it frames things. You know, it makes a huge difference. Right. You can look at the politics of gender in language and all of the questions around gender pronouns. And there are a couple of different aspects of that. There was this actually historically artificial insistence on using "he" for an unknown, singular referent in English. The use of "they" in that context, where we just don't know anything about the person, predates the use of "he" in that context. This was artificially imposed on English a couple of centuries back, and then it became the correct way of doing it, and there's been a bunch of pushback around that. And then, of course, more recently, there are the questions of, well, what about "they" for a specific person who's non-binary, and how do we make room for that in the discourse? And, you know, if I am referring to someone with their correct pronouns, right, is that a political act? Well, yeah, kind of. But it's also just flat out, you know, an act of decency. Right. Just like you would refer to someone by their correct name, you refer to someone by their correct pronouns. So, yeah, language is not neutral. Language is powerful. Language is power, and power uses language. And this goes around and around and around. Right.
And when you throw natural language processing into that mix, if you do it without any sensibility to how power and language work together, then you have yet another example of effectively automating bias, and that automation step goes from dataset to model.
Right. So, do you think you could speak a little bit about the power of not just language, but language as it exists in a dataset that a natural language model is being trained on? And basically how that process works, and at which steps in the process we, as linguists or natural language processing engineers, should be asking these important societal impact questions?
Yeah, absolutely. So one really big part of it is, from my linguist's point of view, when we're looking at a sample of language, I think linguists are better placed to understand how that represents the language it was sampled from. Computer scientists have this bad habit of not even naming the language that they're working on; English gets called "natural language." And it's like, well, no, it is a natural language. And when we are building technology, we're making claims about what it can generalize to. So in a machine learning setup, you have training data and you have test data.
And these tend to be relatively similar to each other, though there is work that's looking at training on one domain and testing on some other kinds of data, but it's generally fairly close. And so if your training data doesn't reflect the use case, then it's not going to work very well. And there are lots of ways in which you can end up encoding bias with that. Basically, well, we've trained it on the speech of people from one segment of society, and now we've deployed it where everybody has to use it, and guess what, it doesn't work equally well. And the test data determines how far you can generalize your claims. Right. So if you say, I've built a system that can do machine reading, well, that's an overclaim, right? I built a system that can do machine reading of algebra problems written in English, or, you know, whatever the sort of specific thing is. And computer science also loves to go big, right? It's really interesting to compare, like, the CS literature to the biomedical literature and the cultures there around how you talk about what you've done. So one of the proposals that I've made, together with Batya Friedman, my colleague from the iSchool who works on value sensitive design, is something called data statements. And this fits into a range of proposals that all came out in 2018, so Timnit Gebru et al.'s datasheets and Meg Mitchell et al.'s model cards, and a couple of others, which are all about documenting what went into it and how you chose it, so that if someone picks up this dataset, or picks up a model trained on the dataset, they can reason about how it fits into their use case.
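For readers who want to see what that kind of documentation might look like in practice, here is a minimal sketch of a data statement captured as a structured record; the field names loosely paraphrase the categories in Bender and Friedman's proposal, and the example values are invented for illustration.

```python
# A loose sketch of a data statement as a structured record. Field names
# paraphrase the kinds of information Bender & Friedman (2018) call for;
# the filled-in values below are invented, not a real dataset's documentation.
from dataclasses import dataclass

@dataclass
class DataStatement:
    curation_rationale: str       # why these texts were selected
    language_variety: str         # e.g. a BCP-47 tag plus a prose description
    speaker_demographics: str     # who produced the language
    annotator_demographics: str   # who labeled it
    speech_situation: str         # time, place, modality, formality
    text_characteristics: str     # genre and topic
    provenance: str               # where the data came from

example = DataStatement(
    curation_rationale="Restaurant reviews paired with star ratings for sentiment research.",
    language_variety="en-US, informal written English",
    speaker_demographics="Self-selected review-site users; demographics not collected.",
    annotator_demographics="No human annotators; labels derived from star ratings.",
    speech_situation="Asynchronous, public, written reviews.",
    text_characteristics="Short opinionated prose about restaurants.",
    provenance="Collected from a public review site under its terms of use.",
)
```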
What a wonderful example about "natural language processing." And I feel silly, because the number of times I've used that term without ever asking natural language for whom, or to whom, is ridiculous. It is just a bias that we carry and we just assume, right, that it is what it says, that it is what's on the label. And I'm wondering what we do with bias. I guess what I'm wondering, which is not necessarily a fair question, but there seem to be a few different strategies. One, which computer science seems to have gone with the last several years, at least until recently, is: we have to eliminate bias. Like, if we eliminate bias in NLP systems, then we're gonna be golden. And I have some thoughts on that; I would be curious about yours. But then there also seems to be this other strategy, which is to name what our bias is and then, I guess, take it as it is and see what we can do with it. And there may be others as well. But my question to you is, what do we do with bias in NLP?
That's a great question. I think you've summed it up really well, especially around the bias in what are called word embeddings, which are a way of representing what a word means based on all of the other words that it co-occurs with in text. And there's a problem there, because the way people talk about the world is neither the way the world actually is nor the way we want the world to be. And if we learn word meanings based on this, then we end up with these very biased representations. If we forget about that bias and then use these representations to do further things down the line, then we're going to pull the world more in the direction of the way we don't want it to be. Right. And there's a wonderful, very clear example of this that comes out of work by Robyn Speer on sentiment analysis for restaurant reviews. So she's looking at Yelp reviews, which is sort of a naturally occurring dataset where you have the input, which is the text, and the output, which is the number of stars. And a component in the system is these word embeddings based on general web text in English, which I like to refer to sometimes as general web garbage. All right.
So when she's training up her system, looking at the restaurant reviews, she's basically saying, OK, I'm training a system to predict the number of stars based on this text, but instead of using the words as they are, I am going to use the words as represented by these word embeddings, so the words as they're used in this much larger collection of text. And I think she also then seeds it with a sentiment lexicon, so it knows that terrific means good and meh means so-so and disgusting means bad. And what she discovers, after she's built the system, is that the system systematically under-predicts the stars for Mexican restaurants. Right. So what's going on there? What's going on there is that in the general web garbage, there is a whole bunch of the very toxic discourse in the U.S. around immigration, and immigration from Mexico and through Mexico, such that the word Mexican in this word embedding representation looks like other negative words. And so if the Yelp reviewer called the restaurant Mexican, then they clearly have said something negative about it, as far as the system is concerned. So Speer goes on to improve the word embeddings and find a way to pull out that bias.
And you can sort of see that that's true because the sentiment analysis then starts working better. So it is possible to mitigate bias in word embeddings. It is not possible to completely remove it. And there is a great paper by Gonen and Goldberg that says, even if you think you removed it by looking at your sort of typical measures of bias, you may not have; you might just not be able to see it with that measure, because you are optimizing effectively on that measure to remove it. And that was a 2019 paper. So what do we do? I think it's a combination of things. Documenting it is super important; we have to know that it's there. And then we have to say, OK, what's our use case for this technology? How is that bias going to play out in the world? How does that affect what we can reasonably do with the technology? How do the fixes help? Are they enough for our use case? And so it's always about sort of stepping back and saying, how is this going to be deployed in the world, and who's going to be affected?
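As a rough illustration of the kind of experiment Speer describes, here is a minimal sketch, not her actual code, that trains a linear model on averaged word embeddings and then compares scores for two reviews that differ only in the cuisine word; the embedding file path and the tiny training set are placeholders.

```python
# Sketch of the setup Speer describes: predict star ratings from averaged word
# embeddings, then compare reviews that differ only in the cuisine word.
# "vectors.bin" and the three training reviews are placeholders for illustration.
import numpy as np
from gensim.models import KeyedVectors
from sklearn.linear_model import Ridge

vectors = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

def review_vector(text):
    """Average the embeddings of the in-vocabulary words in a review."""
    words = [w for w in text.lower().split() if w in vectors]
    return np.mean([vectors[w] for w in words], axis=0)

# In the real experiment this is a large set of Yelp reviews with star ratings.
train = [("the food was terrific", 5), ("service was meh", 3), ("disgusting never again", 1)]
X = np.stack([review_vector(text) for text, _ in train])
y = np.array([stars for _, stars in train])
model = Ridge().fit(X, y)

for text in ["great italian restaurant", "great mexican restaurant"]:
    print(text, model.predict(review_vector(text).reshape(1, -1))[0])
# With embeddings learned from general web text, the second review can score
# systematically lower even though the wording about the food is identical.
```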
Yeah, and a lot of these questions are really interdisciplinary. Right. Because I know in my computer science classes, when I learned natural language processing, I didn't learn about looking into the bias of my dataset or looking into the bigger implications of the deployment of my model; I just learned how to create it. So I'm wondering, in these big, complex systems that are deployed, where a lot of interdisciplinary conversations either are or should be happening, what is a healthy relationship between the domain expert, the computer scientist, and the linguist, and how can they interact in an effective way to really help not create biased systems, or at least mitigate the bias as much as possible?
So, what is a healthy relationship? It's easier for me to say what's an unhealthy relationship; I spend a lot of time critiquing that. But it's always good to sort of imagine, like, you know, what would be the good thing. So I think, first of all, as with any relationship, the fundamental of a healthy relationship is respect. And so in an interdisciplinary collaboration, if the different parties start with respect for the other person's knowledge, and also self-respect, each person has to come in and say, I've got something valuable to contribute here and I value what you're contributing, then I think you're off to a really good start for a healthy relationship and a healthy collaboration. And the flip side of that, and I'm going to turn negative, is that right now, I think largely because of where the power lies in industry and in academia, it is very easy for the people who do machine learning to say, OK, we're going to define the problems, we're going to define the success criteria, and then we're gonna show that we've solved the problem. And the people who are the domain experts, and sometimes if the problem is a language problem, then the linguists are the domain experts, but if the problem is, like, you know, a biomedical NLP problem, then it is doctors who are the domain experts. And so when the machine learning people say, OK, I've solved this problem, the domain experts go, no, you haven't.
Or, what problem? Or, how does that relate to anything I'm interested in? That's where it falls apart. But it is difficult to get heard as an outsider, as a non-machine-learning person, in the machine learning context. And I think that somehow I've managed to do that in some ways, like I've managed to sort of worm myself into the community enough that sometimes people listen, or at least they know I'm talking. But it's taken a lot of persistence, which I probably wouldn't have done if I hadn't been set on the task of running this master's program. Like, okay, well, I'd better pay attention to this community now, because that's the community that my students are going to need to belong to, and so I need to have a sort of broader view of what's going on. And so I sort of had motivation to do it. And I had the interesting position of coming in with sort of relative seniority. Right. I mean, I started the master's program as an assistant professor, but at least I was a faculty member. Right. And so I came in as an outsider, but an outsider with my own sort of security within my own job context, and I think that made it a little bit easier than it would have been otherwise.
So, as we are on the Radical AI podcast, we like to ask about radicality, as you know from listening to our show. And so we're curious what that word radical means for you in this context, and then also whether you situate your work within that radical context, and to what degree.
So, as you know, I had a long time to think about this question, because I've heard you asking lots of people, and I've loved everyone's answers, I have to say. And I think for me, radical means looking at the world as it is and saying, this could be different. And that's the flip side of understanding the world as what we've made of it, what we collectively make of it. And that is just, I think, obviously true for social things. So legal systems, economic systems, all these things are things that humans have created. But it's also very much true about the physical world. Right. I mean, obviously the laws of physics are the laws of physics, but what's going on with our climate is because of our collective actions. Right. And how well we manage the current pandemic in any given locality is directly the result of our collective actions. And so I think for me, an idea or a person is radical if they can look at the status quo and say, hold on, that's not set in stone, that's not given, that can be changed. And sort of thinking through that definition, it occurred to me that that's a value-neutral statement of it. Right. People can look at this and say, well, there's a market I can disrupt, and I can make a lot of money for myself by disrupting that market.
And I would put a negative value on that way of looking at it. Or you can say, I see this system, and I see people are suffering in this way, and it can be fixed, it can be changed. And thinking it through further, it seems to me that our current existing systems, both sort of regulatory environments and systems of power, make it easier for an individual or small group of people to make the "I'm going to disrupt a market and make a lot of money" kind of change, as opposed to the "I'm going to improve the lives of people" kind of change, unfortunately. And I also think that to be a successful radical, with sort of positive value, requires not just seeing the change, but also understanding the system as a whole, so that the net result is positive. Because you can come in and say, all right, this is bad and people are suffering, so I'm just going to blow it all up and then it'll be better. And if you don't understand the ways in which it is good and functioning, and how those would fit into the new solution, then I think that radical can lead to negative outcomes as well.
How do you think that linguistics can be a positive disruptive force in the world, especially in technology, but also in general?
Yeah. So I think there are two main contributions that I see along those lines for linguistics. One is that linguists focus on the language and sort of how language works at a whole bunch of different levels. And oftentimes people who are working with language technology are sort of just trying to get past the language: there's information that they want, and it's encoded in language, and the language is kind of in the way. And so bringing that understanding of how language actually works can lead to technology that is just more successful in its own right, and also more inclusive, because one of the key things that we know about how language works is sociolinguistic variation, the way language varies. And so those are sort of the two: understanding how language works, and especially understanding how language works in terms of sociolinguistic variation.
So as we reach the end of this interview, something that we usually like to do with our guests is to ask for a piece of advice that's really relevant to some of the work that you're doing. And what I'm wondering today is if you might be able to offer some advice to both sides of the equation here. So, the linguists and the domain experts who aren't linguists, who might be getting pushed out of the conversation when they shouldn't be, and the computer scientists or machine learning experts who might be unintentionally pushing some people out of the conversation. What advice do you have for better collaboration and just better communication, so that we can work towards building better technologies?
So it sounds like I'm giving relationship advice here. Yeah.
So I think, to the people who are domain experts, I would say, have some patience around the way the discourse comes out of machine learning and computer science, because there is a lot of powerful technology there, and by being someone with domain expertise, you're in a position to direct that power in positive directions. And so engaging is worthwhile. And it's OK to laugh when someone claims they've solved your whole field, and, you know, roll your eyes and stuff, but don't take that as a reason to walk away. Like, engaging is useful and powerful, and the domain experts have a lot to contribute. And to people who are coming at it from the machine learning point of view, OK, there are two things I want to be sure to say to machine learning people. The first one is, if your primary research interest is how to get machines to learn things, then it's really important to work with domain experts so that you can actually show that your machine has learned something. If you're working on toy tasks, or tasks that aren't grounded in the world, then you don't have that validation for your algorithm. That's the first one. And the second one is coming back around to the societal impact question: if you're building something, and especially if you're building something where you motivated it in your grant proposals or in the introductions to your papers by talking about how it's going to help the world in X, Y, Z way, you're talking about something that's going to go into the world and affect people's lives. And so that means that you have a responsibility to engage in the dialogue around that as well. Now, you can't predict everything that could possibly go wrong. You can't ensure that you're only making things that are positive. But you do have a responsibility to be part of the conversation.
As we wrap up, I just want to take a second to sit with what you said about this being relationship advice, because I think it's super profound for us. I mean, that's what it's all about, as far as I'm concerned: not just the relationships; like, linguistics is, from where I'm sitting, the study of relationships, and how we collaborate is also the study of relationships. And one of the ways I think we can all do this better is by caring for our relationships a little bit better.
So anytime you want to give relationship advice, I think there will be a place out there for you to do it.
Dr. Bender, thank you so much for joining us today. And, of course, thank you so much for your support and ongoing support of the Radical AI podcast. It's been a pleasure.
Yeah. Well, thank you for this wonderful space you're building. I'm really excited to get to be a part of it and keep it up.
We want to thank Dr. Emily Bender again for joining us today for this great conversation. And as always, it is now time to do our immediate reactions and our initial debrief. So one of my immediate reactions from this conversation, probably because this is one of the last things that we were talking with Emily about, is what she was saying about a healthy relationship between interdisciplinary collaborators on machine learning products. And this is something that I think about quite a bit, because I situate myself as a technologist on an interdisciplinary team when it comes to machine learning. And I know that oftentimes the people who are computer scientists or the machine learning experts might not have a sense of respect for the people who are experts in these social science domains and the ethics domains. And so hearing Emily talk about that urgency for respect and the need for respect within the groups that we are collaborating with really stuck with me quite a bit, because I think it's so true. If we are going to create ethical products, we're going to need people from many disciplines working together. And the only way for us to work together effectively is for us to respect ourselves and to know that we have something to contribute, and also to respect those that we're working with and to know that they are bringing something important and necessary to the table.
Oftentimes on the show with our guests, we reflect on the concepts of power, how power is wielded and how it's embedded into these technological systems. And for me, one of the things that I research, and also that I'm passionate about, is how power is embedded in the language that we use. So whether that's categories, and I know, Jess, I talk your ear off a lot about, like, how do things fit in certain categories versus other categories. It's probably annoying to a certain degree, but that's because I am really passionate about, you know, how can we be intentional about how we're talking about the reality that we live in. Because I believe that how we talk about something, and the categories that we utilize to either fence something in or to broaden how we talk about the world around us, has real implications and real power behind it. And we saw that in what Emily was talking about quite a bit in natural language processing: that how we use language, how some words are tethered to other words and how concepts are tethered to other concepts in dramatic ways, and how we're teaching machines to do that, has real ethical implications. Right. Language is not this objective thing that's out there.
It's something that we are constantly creating and co-creating. And every time we use words, like even when I'm talking right now, I'm bringing with each one of those words certain value statements. Every word that I'm using means that I'm not using another word. Even just, like, how I talk about your name. Right. Like, I could call you Jessie, I could call you Jess, I could call you, like, Jay, I could call you whatever. And each one of those has a very particular set of either connotations or value statements behind them. And it's the same way with whatever you want to call me. Right. If you call me, like, Reverend Dre because I'm an ordained minister, that's very different than if you use my first name, Dylan. And that's like a benign example, but I think it's an example that really gets to the heart of this power differential of how we not only talk about people and categorize people, but also how we categorize things, and then how we embed those categories into data. Which I guess is all just to say that language matters, and that's what I'm really taking away from this conversation.
Yeah, language is definitely really interesting when we're talking about the relationship with power, because Emily said it herself: she was saying, you know, language is power, which is something that we see a lot on the show. Like you mentioned, you know, data is power, algorithms are power, the coder's decisions are power. But the interesting thing about language is that power uses language, so language is almost like a power enabler as well. So it's like this meta idea about power and how we are influenced by the way that we use language, and we are also using language as a force to represent power. It's almost like this feedback loop. Really, really interesting.
Yeah, I've actually been thinking about this a lot. So we use a transcription service for these episodes. If you go on, you know, radicalai.org, we have a transcript. Sometimes the transcript is fine, sometimes it's good, and oftentimes we have to do, like, some basic editing on it because it says something blatantly offensive, because it's just taking the words in whatever the sound file is and transcribing them into English. But what's really interesting to me in playing around with that transcription service is that there is such a difference in accuracy when it is someone from a European country who is white and who grew up with English as their first language versus anyone else, like even someone who's speaking with an English accent or whose English is not their first language. There is such a dramatic drop-off of accuracy in that, which shows me that there is, like, a certain model that these transcription services have been trained off of. And in that, there's so much being said that's a critical commentary on how race, gender, accents, etc. are embedded and then embodied in these systems.
I guess like you would think that that would be a benign algorithm. But really like it's about communication, too.
And that's almost it's kind of startling to me.
Yeah. And that's just an explicit example of how we can literally see, right in front of us, some of the consequences of this algorithmic bias when it comes to natural language processing. But then Emily was also talking about some of the unintended consequences of the things that we don't see, like the word embeddings, you know, and these huge, potentially really harmful value statements and value judgments that are being made with this technology that we actually don't even recognize or understand or have an ability to audit in the way that we can with a transcript that's sitting right in front of our eyes.
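One way to turn that observation into an audit, sketched here under the assumption that you have hand-corrected reference transcripts and the service's output for speakers you have grouped yourself, is to compute word error rate separately per group rather than as one overall number; jiwer's wer function is a real library call, while the grouping labels are whatever the auditor defines.

```python
# Minimal sketch of a disaggregated transcription audit: compute word error rate
# per speaker group instead of one aggregate score. Requires hand-corrected
# reference transcripts; the group labels are defined by whoever runs the audit.
from collections import defaultdict
from jiwer import wer

def wer_by_group(samples):
    """samples: iterable of (reference_text, hypothesis_text, group) tuples."""
    refs, hyps = defaultdict(list), defaultdict(list)
    for reference, hypothesis, group in samples:
        refs[group].append(reference)
        hyps[group].append(hypothesis)
    return {group: wer(refs[group], hyps[group]) for group in refs}

# A consistently higher error rate for one group, say speakers for whom English
# is not a first language, is exactly the kind of disparity described above.
```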
And there's so much that we could say about this, as usual.
But we'll talk much more about this in our upcoming minisode in a few weeks. So for now, for more information on today's show, please visit the episode page at radicalai.org.
And if you enjoyed this episode, we invite you to subscribe, rate, and review the show on iTunes or on your favorite podcast platform. Join our conversation on Twitter at @radicalaipod. And as always, just go for it.
That's because they do it every time. Because, as you say, a better shot of the Arctic straight. Well, it's my favorite line to say is. Thank you. I appreciate it. Because you say it so well.
Yeah. Birkitt.