Machine Learning and Ethnography: A Marriage Made in Heaven

Tatiana Gherman

School of Business and Economics, Loughborough University, UK

Simply stated, the essence of operations research is the creation of models to support better decision-making. Although ‘modelling’ is regarded as the key term here, it is essential that we do not prioritize modelling at the expense of the ‘better decision-making’ element. At the end of the day, modelling that does not unlock value to improve decision-making is, in practical terms, useless.

Nowadays, in the context of exponentially growing data, models are increasingly automated using methods of machine learning, whose applications have an enormous ‘appetite’ for data. It is not too bold to say that in recent years, machine learning and predictive analytics together have been revolutionizing our society by transforming the growing volumes of data into predictions that support the decision-making process (Lee, Shin, & Realff, 2018). While it might be true that applying machine learning techniques to the decision-making process can translate into a competitive advantage, a lot can go wrong along the way, especially when dealing with emergent human dynamics in the data, which can lead to inaccurate predictions.

When we speak about machine learning algorithms, we generally imagine a lot of ‘crunching’ of data points; but it is not simply a matter of brute computational force. From a purely cognitive computing perspective, the development of artificial models using machine learning techniques resembles human learning; in other words, it is a way of teaching computers how to perform complex tasks. The question is, can we teach algorithms to learn better? In machine learning, “a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” (Mitchell, 1997, p. 2). Machine learning needs large datasets to learn, which implies that rather than relying on statistically relevant samples, as much data as possible is collected and analysed (Butterworth, 2018). In other words, “machine learning aims to build programs that develop their own analytic or descriptive approaches to a body of data, rather than employing ready-made solutions such as rule-based deduction or the regressions of more traditional statistics. They do so through repeated trials, following each of which error is identified and fed back into the system, and adjusting the approach for each subsequent trial” (Lowrie, 2017, p. 4).
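Mitchell’s definition can be illustrated with a minimal, hypothetical sketch (plain Python, invented toy task): here the task T is predicting y from x, the experience E is a set of noisy training examples of the rule y = 2x, and the performance measure P is mean squared error on held-out data, which tends to improve as the learner sees more examples.

```python
import random

random.seed(0)

def make_data(n):
    """Experience E: n noisy examples of the underlying rule y = 2x."""
    return [(x, 2.0 * x + random.gauss(0.0, 0.5))
            for x in (random.random() for _ in range(n))]

def fit_slope(data):
    """Learn a one-parameter model y ~ w*x by least squares."""
    return sum(x * y for x, y in data) / sum(x * x for x, _ in data)

def mse(w, data):
    """Performance measure P: mean squared error on the task T."""
    return sum((y - w * x) ** 2 for x, y in data) / len(data)

held_out = make_data(1000)
w_small = fit_slope(make_data(5))    # little experience
w_large = fit_slope(make_data(500))  # much more experience

print(f"P with   5 examples: {mse(w_small, held_out):.3f}")
print(f"P with 500 examples: {mse(w_large, held_out):.3f}")
```

On a typical run the larger-experience model’s held-out error settles close to the irreducible noise variance (0.25 here), while the small-sample model’s error fluctuates more widely: the precise sense in which performance P improves with experience E.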

In order for machine learning algorithms to learn automatically from existing data, a model is trained on part of the data to reduce prediction error and is then evaluated on data it has not seen. But to what extent exactly can prediction errors be diminished? A quite common problem has to do with ‘overfitting’, which occurs when a machine learning algorithm tries too hard to hit every data point exactly, adapting itself to the noise in the data rather than the underlying signal. It is rather obvious that understanding the data, so as to uncover the underlying causes of its fluctuations, is essential; this is all the more relevant if, as mentioned before, we deal with emergent human dynamics. While computational techniques are continuously being developed to address these aspects, few research efforts actually consider the potential of a different kind of approach, one that is par excellence able to provide deep insights into human behaviour: ethnography.
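Overfitting can be made concrete with a toy ‘memorizing’ model (plain Python, invented example: the observations are pure noise, so a model that reproduces the training data exactly achieves zero training error yet generalizes worse than the simplest possible predictor):

```python
import random

random.seed(1)

def noisy_sample(n):
    """The true signal is constant 0; every observation is pure noise."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]

train = noisy_sample(200)
test = noisy_sample(200)

def mse(preds, actual):
    return sum((p - y) ** 2 for p, y in zip(preds, actual)) / len(actual)

# 'Overfit' model: memorizes every training point exactly.
train_err_overfit = mse(train, train)   # 0.0 by construction
test_err_overfit = mse(train, test)     # memorized noise vs. fresh noise

# 'Simple' model: predicts the training mean everywhere.
mean = sum(train) / len(train)
test_err_simple = mse([mean] * len(test), test)

print(f"overfit model: train {train_err_overfit:.2f}, test {test_err_overfit:.2f}")
print(f"simple model:  test {test_err_simple:.2f}")
```

The memorizer’s zero training error is illusory: on fresh data its error is roughly double that of the trivial mean model, because it has fitted the noise rather than any underlying signal, which is exactly what overfitting means.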

The aim of ethnography is to provide a detailed description of the phenomena under study. It involves systematic research and analysis, grounded in evidence, and it can yield insights that lead to new hypotheses or to revisions of existing theory and understandings of social life. Ethnography can thus offer a richer understanding of the data and of the social context from which it comes.

Generally speaking, machine learning and ethnography are conceptualized as polar ends of a research spectrum; nonetheless, there is more common ground than is obvious at first glance – in many ways, they share a purpose. As noted above, machine learning algorithms can go wrong, and they often do. And sometimes the reason resides not in technical details, but in the fact that not enough effort has been dedicated to understanding the social context from which the data comes. In order to develop machine learning applications that work better for society, we must be able to understand what society looks like from inside a particular context and to articulate particular stances. Together, machine learning and ethnography can provide a more comprehensive picture of data, and can generate more societal value than either approach on its own (Charles & Gherman, 2018). As of today, mixed-methods research combining machine learning and ethnographic approaches is still rather scarce; but, as the discussion about the greater good in machine learning heats up, this type of work is set to grow in scale. There is scope for expanding the common ground between machine learning and ethnography.

The future of machine learning is not just about crunching more data points; it is about asking deeper and more insightful questions. International Data Corporation (IDC) predicts that the digital data created worldwide will grow from 4.4 zettabytes in 2013 to 44 zettabytes by 2020 and 180 zettabytes by 2025, yet there is still a lot of unexplored potential. As Rattenbury and Nafus (2018) elegantly stated in a recent interview with regard to the common ground between data science/machine learning and ethnography, “[…] there’s a lot of potential in collaborating to illuminate the systems that create data. Part of that potential […] will be realized by leveraging the different epistemological assumptions behind our respective approaches. For example, there is unquestionable value in using statistical models as a lens to interpret and forecast sociocultural trends—both business value and value to growing knowledge more generally. But that value is entirely dependent on the quality of the alignment between the statistical model and the sociocultural system(s) it is built for. When there are misalignments and blind spots, the door is opened to validity issues and negative social consequences, such as those coming to light in the debates about fairness in machine learning. There are real disconnects between how data-intensive systems currently work, and what benefits societies.”


 1. Butterworth, M. (2018). The ICO and artificial intelligence: The role of fairness in the GDPR framework. Computer Law & Security Review, 34, 257-268.
 2. Charles, V., & Gherman, T. (2018). Big Data Analytics and Ethnography: Together for the Greater Good. In A. Emrouznejad & V. Charles (Eds.), Big Data for the Greater Good (pp. 19-34). Studies in Big Data, 42, Springer International Publishing.
 3. Lee, J. H., Shin, J., & Realff, M. J. (2018). Machine learning: Overview of the recent progresses and implications for the process systems engineering field. Computers and Chemical Engineering, 114, 111-121.
 4. Lowrie, I. (2017). Algorithmic rationality: Epistemology and efficiency in the data sciences. Big Data & Society, January-June, 1-13.
 5. Mitchell, T. M. (1997). Machine Learning. Boston, MA: McGraw-Hill.
 6. Rattenbury, T., & Nafus, D. (2018). Data Science and Ethnography: What’s Our Common Ground, and Why Does It Matter?