Researchers Lift the Curtain Behind the “Black Box” of Data Broker Records: New Study Reveals Key Strengths and Weaknesses of Data Records


Ashley Smith
Public Affairs Coordinator

Key Takeaways: 

  • Simply using age and gender data points may not be enough to create accurate digital profiles.
  • Basic demographic data from data brokers, used on its own, is probably less accurate than when it is combined with other data.


CATONSVILLE, MD, November 4, 2019 – It’s no longer news that our data is for sale. Data brokers often use online browsing records to create digital consumer profiles that are then sold to marketers as pre-defined audiences for targeted advertising. 

It is often assumed that the tools used to analyze and categorize customer data are so sophisticated that marketers can reliably fine-tune messaging and targeting. But new research from the INFORMS journal Marketing Science (Editor’s note: The source of this research is INFORMS) has revealed that the process for creating those digital profiles may not be as reliable as many may assume.

The study, to be published in the November edition of the INFORMS journal Marketing Science, is titled “Frontiers: How Effective Is Third-Party Consumer Profiling? Evidence from Field Studies.” It is authored by Nico Neumann of Melbourne Business School, Catherine Tucker of MIT and the National Bureau of Economic Research, and Timothy Whitfield of Burst SMS in Australia.

The researchers examined two basic demographic attributes (age and gender) and three distinct Internet user interest areas (sports, travel and fitness). They analyzed data from more than 19 different data brokers, which resulted in more than 90 validated digital audiences of Internet users. They also conducted three distinct field tests.

“In general, the process which underlies the creation of user profiles and segments for targeting is a ‘black box,’ which creates challenges for understanding the reliability and the accuracy of digital profiles,” said Tucker. “Furthermore, advertisers have little chance of assessing how accurate the profiles they are buying are.”

“In our first field test, we ran an online campaign in much the same way as an advertiser would run a campaign and assessed whether the ad was seen by the requested demographic segment,” said Tim Whitfield.  “In our second field test, we narrowed our focus and looked directly at whether data brokers are able to accurately determine the age and gender of a specific pair of eyeballs.  And in our third field test, we extended our data quality assessment from demographics to audience-interest segments.”

“In our first field test, we found that our ad was shown to the right demographic segment 59 percent of the time,” said Neumann. “In our second field test, we found that data brokers basically were able to identify gender about the same as random chance. The third field test revealed that the accuracy of interest-based audiences is higher (72.8-87.4 percent on average). However, this greater classification percentage seemed rather linked to the fact that the tested attributes occur very often in the population – for example there are many people who like sports in Australia and the US, so identifying someone who is interested in sports is not that hard.”

“The relative improvement of using audience data versus randomly picking people is still overall disappointing across all our tests,” added Neumann.

The three studies combined illustrate that it is important to consider the costs and benefits of using audience data for ad targeting. Because audience data leads to large extra expense, it may not provide a useful business case for every situation relative to untargeted advertising. For example, the average extra costs for display ad targeting based on purchased audience data are around 151%. However, in a best-case scenario the relative improvement in finding the right customer was only 123% (when comparing audience targeting versus random people selection).
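One way to read those two figures side by side is to compare the cost of reaching one person who is actually in the intended audience. The short Python sketch below is only an illustration, not the researchers' method. It assumes that "151% extra cost" means a targeted impression costs 2.51 times an untargeted one, and that a "123% relative improvement" means the share of on-target impressions is 2.23 times the untargeted share. The baseline cost and baseline hit rate are hypothetical placeholders and do not come from the study.

```python
def cost_per_on_target_impression(cost_per_impression, hit_rate):
    """Cost of reaching one person who is actually in the intended audience."""
    if not 0 < hit_rate <= 1:
        raise ValueError("hit_rate must be in (0, 1]")
    return cost_per_impression / hit_rate


# Hypothetical baseline values, chosen only for illustration.
BASE_COST = 1.0        # cost of one untargeted impression (arbitrary unit)
BASE_HIT_RATE = 0.20   # share of untargeted impressions that reach the right person

# Figures quoted in the release, under the interpretation stated above.
EXTRA_COST = 1.51            # 151% extra cost for purchased audience data
RELATIVE_IMPROVEMENT = 1.23  # 123% best-case relative improvement

untargeted = cost_per_on_target_impression(BASE_COST, BASE_HIT_RATE)
targeted = cost_per_on_target_impression(
    BASE_COST * (1 + EXTRA_COST),
    BASE_HIT_RATE * (1 + RELATIVE_IMPROVEMENT),
)

# The ratio does not depend on the hypothetical baseline values.
ratio = targeted / untargeted
print(f"Targeted vs. untargeted cost per on-target impression: {ratio:.3f}x")
```

Under these assumptions, targeting costs roughly 13 percent more per on-target impression than untargeted advertising, even in the best case. Targeting would break even only if the improvement multiplier matched the cost multiplier.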

Still, the business case depends on the individual organization’s expertise and technology costs, the selected data brokers and the media used. In particular, more expensive media (e.g. video advertising) is much more likely to result in positive benefit-cost trade-offs for the use of audience information purchased from data brokers.


Link to full study.


About INFORMS and Marketing Science 

Marketing Science is a premier peer-reviewed scholarly marketing journal focused on research using quantitative approaches to study all aspects of the interface between consumers and firms. It is published by INFORMS, the leading international association for operations research and analytics professionals. More information is available at or @informs.


# # #




Tim O’Brien