University Data Science Programs Turn to Ethics and the Humanities

Data scientist Mark Madsen was programming and crunching data long before buzzwords like artificial intelligence and machine learning were common. So when the field really started expanding around 2010, Madsen, who works near Portland, Ore., began receiving requests from local colleges and universities asking for tips about crafting their data-science curricula.

Madsen and the institutions that reached out agreed that courses in mathematics, statistics and computing were crucial. But he says the conversations often fell off once he suggested adding subjects like history of science, philosophy or communication.

“Very few of them had any interest in interdisciplinary programs,” is how he remembers it. Madsen, who advises software companies these days, is still concerned by that attitude, and about how he sees it impacting industry practices—and society more broadly. “Formulating a product, you better know about ethics and understand legal frameworks.”

Madsen recently tweeted his frustration, spurring a conversation about ethics in data science curricula: what’s missing, and where programs are already working to fill the gap.

A half dozen universities asked me for help designing a data science curriculum at the time it was heating up. When I would suggest adding subjects like critical thinking, philosophy, history of science, communication, visual design, or anthropology, they would stop talking to me
— Mark Madsen (@markmadsen) December 26, 2018

These days a growing number of people are concerned with bringing more talk of ethics into technology. One question is whether that will bring change to data-science curricula.

Following major data breaches and privacy scandals at tech companies like Facebook, universities including Stanford, the University of Texas and Harvard have all added ethics courses to computer science degree programs to address tech’s “ethical dark side,” the New York Times has reported.

But when it comes to data science, which often deals with gathering large data sets and making predictions about human behavior, some schools are taking a different route. A handful of programs are emerging from the ground up with interdisciplinary approaches embedded.

One example is the Social Data Science Program at the University of Oxford, which brings the social sciences and computational sciences together. Housed in the university's Internet Institute, the 10-month master’s program is currently in its first year with a cohort of 26 students.

“The goal is to get students to think critically and connect data science with social-science theory,” says program director Scott Hale.

The program has the courses one would expect of a data science program: programming, statistics and research methods. It also includes courses on subjects like philosophy and ethics of information, sociological analysis and online social networks, and data science for government and politics. “Research ethics is baked in throughout the curriculum,” Hale explains.

Even in statistics courses, “the emphasis is on explanation and really understand[ing] what factors drive causes,” says Hale. That’s also key to dealing with data, he says, especially when the data will “intervene in a social system,” whether that’s an online social network, healthcare, employment or anything else.

Similar programs are cropping up in the U.S. The University of California at Berkeley announced plans in November 2018 for a new Division of Data Sciences, with the goal of connecting colleges and disciplines across campus around data literacy.

The new division has been quietly in the works for several years, and in 2011 former chancellor Nicholas Dirks pushed to make it a campus priority. The division contains research units, undergraduate and graduate-level degree programs, and data science consulting and tutoring.

In the undergraduate degree program, students take courses on statistics, probability and techniques for data science. There are also “connector” courses, such as Data Science and the Mind and Children in the Developing World, that tie the data science curriculum to other corners of campus.

The program also includes a set of “human contexts and ethics” courses that tackle moral questions about data, urban data analytics, and even environmental and health development, all of which place the data science curriculum in contexts that students might deal with when analyzing data in the real world.

“You don't just throw algorithms at data. You need to look at it, understand how it was collected, and ask yourself: ‘How can I be responsible with the data and the people from which it came?’” says Cathryn Carson, a UC Berkeley historian with a background in physics who steered the committee tasked with designing the school’s data-science curriculum.

The new division goes a step further than adding an ethics course to an existing program. “Computer science has been trying to catch up with the ethical implications of what they are already doing,” Carson says. “Data science has this built in from the start, and you’re not trying to retrofit something to insert ethics—it's making it a part of the design principle.”

Whose Ethics?

As more colleges and universities consider incorporating humanities courses into technical degree programs, some are asking what kind of ethics should be taught.

“What probably isn't being dealt with enough is the enormous ethical challenges that are part of our new technological world,” Dirks said in an interview with EdSurge. “I see this everywhere from reading about the way machine learning will reproduce stereotypes of different kinds, race and gender, and how everyone assumes a level of neutrality when it's a machine.”

But the Berkeley program isn’t built to instill a particular philosophy in students. “This is not about ethics per se, but about connecting data science across the curriculum,” Dirks said.

That might please practitioners like Madsen, who wants to see a more holistic approach to teaching and using data, but thinks the term “ethics” can be a bit vague.

“Most [data scientists] believe that they are making ethical decisions,” he says. “But what they missed is that their tool or program lives in a larger system.”