Mixed Integer Programming and Machine Learning

by Rahul Swamy
Ph.D. candidate in Industrial Engineering, University of Illinois at Urbana-Champaign

In today’s world, advancements in computing power, augmented by the availability of data, have ushered in an unprecedented ability to transform data into useful insights. Machine learning (ML) has earned its place as a quintessential tool in any data scientist’s toolbox. Mixed Integer Programming (MIP) has provided a long-standing framework for solving large NP-hard problems to provable optimality and has revolutionized industries such as logistics and transportation. Even though ML and MIP share a common trait - using data to influence decision making - they have for the most part been studied by different research communities. Squarely at the interface between computer science and operations research, interesting problems have emerged from studying ML and MIP together.

The idea behind ML has existed since the 1950s but found prominence much later with the merging of statistical tools, continuous optimization, and big data. With the boom in Artificial Intelligence, present-day ML has largely transformed into variations of Deep Neural Networks (DNNs) and has found success in applications such as personal assistants, product recommendations, and spam filtering. For MIP, the classical branch-and-bound technique has seen remarkable speed-ups over the last half-century, with theoretical breakthroughs in cutting-plane theory, decomposition techniques, and column generation, in tandem with efficient implementations in commercial solvers. What were once considered “unsolvable” problems can now be solved in a matter of seconds. For example, the branch-and-cut-based solver Concorde has solved TSP instances with as many as 85,900 cities, as reported in Applegate et al. (2009). At first glance, ML and MIP may seem to have different objectives: ML predicts while MIP solves. However, there are fundamental problems in both fields where cross-disciplinary research can benefit each other.

Continuous optimization forms the crux of (supervised) ML, with the training phase using a fraction of the data to learn (optimize) model parameters. However, discrete optimization and its solution strategies have been relatively underexplored in the ML literature. Misquoting a visionary who once shot for the moon: “Ask not what ML can do for MIP - ask what MIP can do for ML.” In that spirit, some of the MIP-for-ML works are highlighted here.
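Before turning to those works, it helps to see what “training as continuous optimization” looks like in miniature. The sketch below fits a one-variable logistic-regression classifier by plain gradient descent on the smooth log-loss; the data and learning rate are made up for illustration.

```python
import math
import random

random.seed(0)
# Toy 1-D data set (hypothetical): the label is 1 exactly when the feature is positive.
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1.0 if x > 0 else 0.0 for x in xs]

def loss(w, b):
    """Average log-loss of the logistic model p = sigmoid(w*x + b)."""
    total = 0.0
    for x, t in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        total += -(t * math.log(p + 1e-12) + (1 - t) * math.log(1 - p + 1e-12))
    return total / len(xs)

w, b, lr = 0.0, 0.0, 1.0
initial = loss(w, b)
for _ in range(500):  # gradient descent: the "training" is continuous optimization
    gw = gb = 0.0
    for x, t in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        gw += (p - t) * x
        gb += (p - t)
    w -= lr * gw / len(xs)
    b -= lr * gb / len(xs)
final = loss(w, b)
```

Every quantity being optimized here is continuous; the discrete questions - which features to select, which examples to trust, how to encode hard combinatorial structure - are where MIP enters.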

Support vector machines (SVMs) are a popular classification technique in ML. Early work by Bennett and Demiriz (1999) posed the semi-supervised SVM as an integer program and solved it using CPLEX. For the classic SVM problem with ramp-loss minimization, Belotti et al. (2016) present a reformulation to tackle the big-M constraints. Üney and Türkay (2006) provide an MIP approach for a general multi-class classification problem that uses hyper-boxes to partition the data. Word alignment is a key component of language translation tools; Lacoste-Julien et al. (2006) solve this problem by framing it as a Quadratic Assignment Problem. DNNs have gained popularity in recent years. As a step towards modeling DNNs with MIP, Fischetti and Jo (2018) present an MILP model with applications to feature visualization and adversarial ML. The latter is a natural setting, since it needs optimal solutions that “fool” the DNN.
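To give a flavor of such models, a single ReLU unit $y = \max(0, w^\top x + b)$ in a trained network can be linearized with a standard big-M trick and a binary indicator $z$. This is a sketch of the modeling idea rather than Fischetti and Jo's exact formulation; $M$ is any valid bound on $|w^\top x + b|$:

```latex
y \ge w^\top x + b, \qquad y \ge 0, \qquad
y \le w^\top x + b + M(1 - z), \qquad y \le Mz, \qquad z \in \{0, 1\}
```

When $z = 1$ the constraints force $y = w^\top x + b$; when $z = 0$ they force $y = 0$, so $z$ encodes which piece of the ReLU is active. Stacking such constraints layer by layer yields an MILP over the network’s inputs, which can then be optimized, for instance, to find an input that fools the classifier.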

On the other hand, if one indeed asked what ML can do for MIP, there are answers to that as well. The conventional branch-and-bound procedure involves a host of parameters that are typically tuned ad hoc to specific classes of problems. These parameters decide what the next step should be at a given node in the branch-and-bound tree. Recently, several papers have explored how ML can tune the parameters that decide different aspects of branching and bounding: Kruber et al. (2017) on whether a Dantzig-Wolfe decomposition should be used, Khalil et al. (2016) on which variable to branch on, Marcos Alvarez et al. (2016) on learning approximations of strong branching, and Khalil et al. (2017) on whether a primal heuristic should be run. The idea is to predict whether making a particular decision at a node will improve the overall run-time of the algorithm. The results from these works provide promising directions not only for generic MIP solvers, but also for specific classes of problems.
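Schematically, such a learned policy reduces to a classifier that the solver queries at each node. The sketch below uses made-up node features and labels and a simple perceptron standing in for the far richer models in the papers above: offline runs label each node with whether a decision (here, running a primal heuristic) paid off, and the trained policy then makes that call online.

```python
import random

random.seed(1)
# Hypothetical offline training data: each node is described by two features
# (relative tree depth, relative optimality gap) and labeled 1 if running the
# primal heuristic there paid off. The labeling rule is made up so that the
# two classes are linearly separable.
nodes = [(random.random(), random.random()) for _ in range(300)]
labels = [1 if gap > depth + 0.2 else 0 for depth, gap in nodes]

# Train a perceptron to imitate the offline labels.
w = [0.0, 0.0]
b = 0.0
for _ in range(50):
    for (depth, gap), t in zip(nodes, labels):
        pred = 1 if w[0] * depth + w[1] * gap + b > 0 else 0
        err = t - pred
        w[0] += 0.1 * err * depth
        w[1] += 0.1 * err * gap
        b += 0.1 * err

def run_heuristic(depth, gap):
    """Inside branch and bound, the solver would consult the learned policy here."""
    return w[0] * depth + w[1] * gap + b > 0

# How well the policy reproduces the offline decisions.
accuracy = sum(
    (1 if run_heuristic(d, g) else 0) == t
    for (d, g), t in zip(nodes, labels)
) / len(nodes)
```

The hard part in practice is not the classifier but the data: defining node features that transfer across instances and labels that truly reflect run-time impact, which is exactly what the works cited above grapple with.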

At his recent plenary talk at EWGLA 2018, Prof. Andrea Lodi of Polytechnique Montreal offered an interesting perspective: ML and optimization are “two sides of the same coin”. There is a dual nature between an algorithm that solves an optimization problem and the parameters that make it efficient, and there is a need for frameworks that integrate ML and optimization. While this article is nowhere close to a literature review, its goal is to touch upon some of the opportunities at the intersection of ML and MIP. There are many more open challenges that should be of interest to both communities, and the future will tell.


Alejandro Marcos Alvarez, Louis Wehenkel, and Quentin Louveaux. "Online learning for strong branching approximation in branch-and-bound." (2016).  
David L. Applegate, Robert E. Bixby, Vašek Chvátal, William Cook, Daniel G. Espinoza, Marcos Goycoolea and Keld Helsgaun, "Certification of an optimal TSP tour through 85,900 cities." Operations Research Letters 37.1 (2009): 11-15.  
Pietro Belotti, Pierre Bonami, Matteo Fischetti, Andrea Lodi, Michele Monaci, Amaya Nogales-Gómez and Domenico Salvagnin, "On handling indicator constraints in mixed integer programming." Computational Optimization and Applications 65.3 (2016): 545-566.  
Kristin P. Bennett, and Ayhan Demiriz. "Semi-supervised support vector machines." Advances in Neural Information processing systems. 1999.  
Matteo Fischetti and Jason Jo, "Deep neural networks and mixed integer linear optimization." Constraints 23.3 (2018): 296-309.  
Elias B. Khalil, Pierre Le Bodic, Le Song, George Nemhauser and Bistra Dilkina, "Learning to branch in mixed integer programming." AAAI. 2016.  
Elias B. Khalil, Bistra Dilkina, George Nemhauser, Shabbir Ahmed and Yufen Shao, "Learning to run heuristics in tree search." Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, Melbourne, Australia. 2017.  
Markus Kruber, Marco E. Lübbecke, and Axel Parmentier. "Learning when to use a decomposition." International Conference on AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems. Springer, Cham, 2017.  
Simon Lacoste-Julien, Ben Taskar, Dan Klein and Michael I. Jordan, "Word alignment via quadratic assignment." Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics. 2006.  
Fadime Üney and Metin Türkay. "A mixed-integer programming approach to multi-class data classification problem." European journal of operational research 173.3 (2006): 910-920.