Ashley Smith
Public Affairs Coordinator

New Audio Available for Media Use: Ethics in Data Collection and Management

BALTIMORE, MD, April 28, 2022 – New audio is available for media use featuring Rachel Cummings, an assistant professor of Industrial Engineering and Operations Research at Columbia University’s School of Engineering whose research focuses on data privacy. This content is made available by INFORMS, the largest association for the decision and data sciences. All sound should be attributed to Rachel Cummings. What follows are four questions and her responses, which were provided on April 27, 2022.


Question 1: With all of the ways the major digital platforms have of collecting and monetizing data, are the systems and processes currently in place enough to ensure that users’ private data is protected in an ethical manner?

Time Cue: 0:27, Soundbite Duration: 0:41

“I want to start with the positives. Every major platform out there, and hopefully all of the smaller ones, are going to have excellent security systems in place. This includes things like encrypted data storage on servers and secure communication channels. Companies are pretty good at this by now. Then there is data privacy, which is more nuanced and is very much an active research area. This is about making sure that statistics computed on the data of many users, or perhaps models learned on user data, don’t leak information about individuals. Some companies have active teams working on these problems and implementing solutions, but some do not.”


Question 2: What are the major areas of concern when it comes to managing data ethically?

Time Cue: 1:25, Soundbite Duration: 0:48

“The first one, again, is data security: things like encryption, which is basically making sure people who are not supposed to see your data can’t see your data. Another important one is going to be data privacy. This is about making sure that aggregate statistics and models (such as word prediction, recommendation engines, personalized ads and so on) that are learned from user data don’t leak information about the individual. Much of my work is on this topic and on the notion of differential privacy. A related concern is content moderation on platforms and the line between free speech and misinformation. While this is not data management per se, after the recent announcement of Elon Musk buying Twitter, I expect we’ll hear a lot more on this topic in the coming months.”


Question 3: What codes or systems need to be in place to ensure that private data is handled and processed ethically?

Time Cue: 2:19, Soundbite Duration: 0:47

“First of all, we have privacy laws. However, these are often insufficient to ensure ethical treatment of data. For example, we have seen cases where legally anonymized databases were shared and later re-identified, meaning that the data wasn’t actually private in the first place. And this means we have to look beyond just existing privacy laws. A comprehensive system to ensure ethical data privacy should include concerns like the privacy needs and expectations of users who are providing data, the accuracy and ease-of-use requirements of other stakeholders, and the context of data use, including the type of data and the purpose of the analysis.”


Question 4: How should ethical standards for the management of data be monitored and enforced?

Time Cue: 3:14, Soundbite Duration: 0:47

“This is a tricky one, in part because privacy regulations are not one-size-fits-all, and laws are purposely vague to accommodate this. For example, you’d probably expect different privacy protections if we’re talking about, say, your health information versus advertising data. This makes it difficult to integrate somewhat vague legal requirements, such as ‘individuals shall not be identifiable,’ with technical tools like machine learning or differential privacy. Ideally, research will be done at the interface of privacy and law, but in the meantime, companies should have internal standards boards to decide how ethical standards can be met in their specific use case with their specific type of data being used.”

