Game Theory and Machine Learning

Shreya Gupta

ˆby Shreya Gupta
Ph.D. Candidate in Operations Research and Industrial Engineering, University of Texas at Austin

With the onset of the New Year, it makes sense to review what has been accomplished in the past year; here, I briefly review three papers on machine learning and game theory. Many machine learning methodologies have been explored by either casting them into a game-theoretic framework or by using game theory on top of the existing machine learning framework. Popular examples include Generative Adversarial Nets (GANs) (Goodfellow et al., 2014) which correspond to a minimax two-player game between the generator and discriminator networks, hard margin support vector machine which can be modeled as a two-player zero-sum game (Aiolli et al., 2008), linear regression being modeled as a non-cooperative game (Ioannidis and Loiseau, 2013), and Adaboost (Freund and Schapire, 1997) which uses game-theory for online learning. The variety of algorithms linking game theory and machine learning is vast and filled with fast moving trends (e.g, GANs); this review takes the road less travelled by first highlighting two new and interesting algorithms with diverse applications and finishing with a discussion on employing game theory for fairness.

First consider a Preference Learning (PL) algorithm, a machine learning methodology that involves learning preferences for previously unseen items (Polato and Aiolli, 2018). Predicting preferences is done by creating a rank ordering of pairs of preferences for every pattern. Aiolli and Polato highlight how a PL problem can be converted to “a two player zero-sum game where the row player P (the nature) picks a distribution over the whole set of training preferences (i.e., rows) aiming at minimizing the expected margin. Simultaneously, the opponent player Q (the learner) picks a distribution over the set of preference-feature pairs (i.e. columns) aiming at maximizing the expected margin (payoff).” An approximation method that iteratively samples a subset of columns from a large game matrix is also proposed for the optimal strategy. In today’s age of online shopping, online entertainment, online…everything, applications of preference learning is essential for such companies as Amazon, Netflix and Spotify to thrive.

Figure 1: Spotify and Netflix trying to learn my preferences.

Lexical Link Analysis (LLA) is a data-driven text analysis proposed by Zhao and Zhou (2018) as a game-theoretic framework for identifying high-value information. LLA treats words as nodes and bi-grams, or word pairs, as the link between these nodes. LLA has three categories of high-value information and value metrics which are combined as the total value of information derived from the text: authoritative or popular themes, emerging themes and anomalous themes. While in social networks the popular themes are of most interest, the authors point out that emerging and anomalous themes are of higher interest in LLA due to their correlation with innovation. Deriving emerging and anomalous information from text is advantageous since these themes can develop into popular information in the future. Thus, high-value information is information that has potential to grow; this is where game theory comes in. The authors suggest a two-player framework where the information provider is one player and the rest of the world, representing the second player, responds to the information generated by the first player. In this game-theoretic approach, in addition to Nash equilibrium, another factor also important for identifying high-value information in a multi-layer network of players is that the whole system has to be Pareto efficient; i.e. the system cannot make a player better off without making another player worse off. Any application that relies heavily on text data such as user comments and descriptions in online marketplaces and hospitality services, doctor and nurse reports in electronic medical records, etc, could utilize LLA to identify high-value information. This high-value information could be discovering emerging topics of discussion in social media, proactively alleviating issues as they emerge in hospitality services, predicting product demands for out of stock or non-existent items in online marketplace inventories, and many more!

Figure 2: Publicly available social media content, that is made available by the informed consent of users, is a great resource for analyzing emerging trends.

Finally, I would like to discuss the relationship between fairness and game theory. First, what is fairness? The NIPS 2017 tutorial on fairness by Solon Bacrocas and Moritz Hardt (found at ) describes fairness as “understanding and mitigating discrimination based on sensitive characteristics, such as, gender, race, religion, physical ability, and sexual orientation.” The tutorial warns that building algorithms without understanding social context, having less data for minority groups, or using a biased sample for modeling can lead to unintentional discrimination being learned by machine learning algorithms. While you can learn more about fairness from this tutorial, I want to focus your attention on the important question in Patel (2018): how do we measure and ensure an algorithm is fair? Patel (2018) helps to answer this question by exploring if fairness in algorithms should be 1) treated as a cost in the algorithm or 2) measured by an unfair prediction on the party for whom the prediction was made. Patel encourages thinking beyond the traditional goal of reduction of predictive errors in order to reduce biases embedded in data and ensure algorithmic fairness while maintaining high accuracy. To ensure algorithmic fairness, interventions can be introduced at different points in the pipeline (Friedler et al.). Pre-processing approaches assume the biases are in the training data whereas post-processing approaches work to improve interpretability and transparency so as to avoid an unfair impact on any group. Patel focuses on post-processing methods and points out that the concept of fairness in economics, revolving around the division of resources and ensuring Pareto-optimality (no one is worse off because of the decision (Corbett-Davies and Goel, 2018; Liu et al., 2018), can be extended to machine learning. To do so, the author utilizes the Nash Welfare Product (Kaneko and Nakamura, 1979, Caragiannis et al., 2016; Venkatasubramanian and Luo; Hu and Chen, 2018), a popular concept in welfare economics which attempts to combine the utility functions of every member of the society (the subjects for whom the predictions are being made) and the utility of the institution, making the predictions into a “joint product and seeks to push all towards an equilibria” (Patel, 2018; Kaneko and Nakamura, 1979). The author justifies that Nash Welfare is seen as a halfway solution between utilitarian models (“measures the welfare of a society by the sum of the individuals’ utilities” (Stark et al., 2014)) and Rawlsian welfare models (“measures the welfare of a society by the wellbeing of the worst-off individual (the maximin criterion)” (RAWLS, 2009)). This “halfway” model helps achieve both minimizing loss in prediction accuracy and maximizing utility of the individuals and groups who are subject to the prediction.

Figure 3: “The common FICO threshold of 620 corresponds to a non-default rate of 82 %. Rescaling the x axis to represent the within-group thresholds (right), Pr[Ŷ= 1|Y= 1,A] is the fraction of the area under the curve that is shaded. This means black non-defaulters are much less likely to qualify for loans than white or Asian ones, so a race blind score threshold violates our fairness definitions.'' Hardt et al. (2016).

In conclusion, the applications of machine learning and game theory continue to expand as fields such as digital entertainment, online shopping, and service providers like hospitals seek out innovative strategies to extract customer preferences and predict new and changing trends; all while trying to ensure fairness. As the expectations of such industries evolve, it is crucial we look beyond the popular methods of the field in search for the best, most fair solutions and draw philosophically from other fields. Fairness is a concept which directly benefits from this line of thinking; the idea in Patel (2018) of drawing from economics is an innovative attempt at combining ideas and concepts from socio-economic sciences with that of machine learning while Bacrocas and Hardt encourage a similar appproach to modeling research and development of the aforementioned tutorial. Finally, in bringing to you these ideas and papers, I might have missed out on other algorithms which also deserve to be highlighted. If there are new algorithms and modeling approaches out there that you think deserve attention, please feel free to reach out to me on LinkedIn.


 Aiolli, F., Da San Martino, G., and Sperduti, A. (2008). A kernel method for the optimization of the margin distribution. In Kůrková, V., Neruda, R., and Koutník, J., editors, Artificial Neural Networks - ICANN 2008, pages 305–314, Berlin, Heidelberg. Springer Berlin Heidelberg.

 Caragiannis, I., Kurokawa, D., Moulin, H., Procaccia, A. D., Shah, N., and Wang, J. (2016). The unreasonable fairness of maximum nash welfare. In Proceedings of the 2016 ACM Conference on Economics and Computation, EC ’16, pages 305–322, New York, NY, USA. ACM.

 Corbett-Davies, S. and Goel, S. (2018). The measure and mismeasure of fairness: A critical review of fair machine learning. CoRR, abs/1808.00023.

 Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119 – 139.

 Friedler, S. A., Scheidegger, C., Venkatasubramanian, S., Choudhary, S., Hamilton, E. P., and Roth, D. A comparative study of fairness-enhancing interventions in machine learning. arXiv preprint.

 Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., and Bengio, Y. (2014). Generative adversarial nets. In Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D., and Weinberger, K. Q., editors, Advances in Neural Information Processing Systems 27, pages 2672–2680. Curran Associates, Inc.

 Hardt, M., Price, E., and Srebro, N. (2016). Equality of opportunity in supervised learning. CoRR, abs/1610.02413.

 Hu, L. and Chen, Y. (2018). Welfare and distributional impacts of fair classification. CoRR, abs/1807.01134.

 Ioannidis, S. and Loiseau, P. (2013). Linear regression as a non-cooperative game. In Chen, Y. and Immorlica, N., editors, Web and Internet Economics, pages 277–290, Berlin, Heidelberg. Springer Berlin Heidelberg.

 Kaneko, M. and Nakamura, K. (1979). The nash social welfare function. Econometrica, 47(2):423–435.

 Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. (2018). Delayed impact of fair machine learning. CoRR, abs/1803.04383.

 Patel, A. (2018). Fairness for whom? critically reframing fairness with nash welfare product. CoRR, abs/1810.08540.

 Polato, M. and Aiolli, F. (2018). Interpretable preference learning: a game theoretic framework for large margin on-line feature and rule learning. Association for the Advancement of Artificial Intelligence.

 RAWLS, J. (2009). A Theory of Justice. Harvard University Press.

 Stark, O., Jakubek, M., and Falniowski, F. (2014). Reconciling the rawlsian and the utilitarian approaches to the maximization of social welfare. Economics Letters, 122(3):439 – 444.

 Venkatasubramanian, V. and Luo, Y. How much income inequality is fair? nash bargaining solution and its connection to entropy. arXiv preprint.

 Zhao, Y. and Zhou, C. C. (2018). A game theoretic lexical link analysis for discovering high-value information from big data. In 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pages 621–625.