Data Science: The Good, The Bad, and the Ugly

Waheed Bajwa (Electrical and Computer Engineering) and
Anand Sarwate (Electrical and Computer Engineering)

Data science is a hot field, and the term “machine learning” has moved into popular culture. “Artificial intelligence” is no longer the subject of sci-fi movies alone: we regularly interact with “smart systems” powered by sophisticated learning and inference algorithms. There is no question that these systems have greatly improved the efficiency of services and quality of life. On the flip side, the decisions made by machines reflect the biases (implicit or explicit) of their designers. The last ten years have truly been a “Decade of Discovery” in terms of advances in data collection and processing. But how can we navigate the potential and the pitfalls in the decades to come? To gain a critical perspective on these topics, people need to learn the basic concepts of “data science,” much as we understand basic concepts of biology, chemistry, and physics. In this course, students will learn what goes into these algorithms, how they work, and how design decisions are reflected in their outputs. We will emphasize the fundamental questions that drive statistics, data science, and machine learning. By the end of the course, students should be able to use different perspectives (statistical, computational, social) to describe specific data-driven systems. Students will learn to do this through case studies of machine learning and inference algorithms and will be exposed to topics of contemporary research, such as interpretability, fairness, bias, and privacy. Some examples will be drawn from the recent book Weapons of Math Destruction by Cathy O’Neil, which illustrates some of the dangers of machine learning. Others will come from more recent news and emerging work in Critical Data Studies.