Volume 25 (2013)

Book Reviews

Summer 2013; 25 (3)

This issue contains two reviews of books that approach the general topic of algorithms from different perspectives. David Bader, professor at Georgia Tech’s College of Computing, reviews A Guide to Experimental Algorithmics, by Catherine McGeoch. This book, aimed at software developers, describes the practice of performance analysis and tuning of software. Topics include strategies for test-driven development, experimental designs, and statistical analysis techniques. Lance Fortnow’s The Golden Ticket: P, NP, and the Search for the Impossible is a book for general readers about complexity theory. The reviewer is Mike Trick, senior associate dean of Carnegie Mellon’s Tepper School of Business and a widely read blogger on operations research. The book attempts to introduce the nonspecialist reader to the field of algorithmic complexity and the problem of whether P=NP. To the extent that it is successful, it should help improve the general public’s understanding of the work that computational operations researchers do.

 
A Guide to Experimental Algorithmics, by Catherine C. McGeoch, Cambridge University Press, 2012.  See http://www.cambridge.org/.
Reviewed by:  David A. Bader, Professor, School of Computational Science and Engineering, Georgia Institute of Technology, bader@cc.gatech.edu.

The world is filled with real-world challenges such as predicting the track of severe storms, distributing medicine efficiently in the Third World, quickly rerouting our automobile’s navigation system to avoid a traffic jam, managing one’s stock portfolio, and protecting ourselves against cyber-attacks. The role of the computer scientist is to design algorithms, i.e., step-by-step procedures to solve these problems, and to ensure that the algorithms run fast for the user. Over the last few decades, the computer science theory community has in fact designed amazing algorithms that are provably efficient and darn good approximations when the absolute best answer is intractable to find. On the other hand, computer programmers often work in an ad hoc fashion, producing many lines of code using layers of abstraction to hide the complexity. As the programmer moves closer to the application, the measure of success is how quickly she can produce a bug-free program that solves the problem, and rarely does one have the luxury to evaluate alternate software designs. So what is missing? What bridge spans the growing gap between the theoreticians’ elegant algorithms and the application programmer’s code? How does the computer scientist evaluate algorithmic trade-offs and engineer her algorithms for performance on particular use cases?

In the new book A Guide to Experimental Algorithmics, Catherine C. McGeoch, the Beitzel Professor of Technology and Society in the Department of Computer Science at Amherst College, provides the first deep dive into the science of algorithms. Just as we study physical phenomena using the scientific method of theory and experiment, and then put the results into practice, Dr. McGeoch provides the tools one needs to make rigorous choices when empirically evaluating and engineering algorithms. Many realities get in the way of clean theoretical performance, which is often given in worst-case “big-Oh” notation and parameterized by the size of the problem input. For instance, our memory subsystems have deep hierarchies: accessing an element in a cache close to the CPU is fast (a few clock cycles), main memory is a bit further away (hundreds of clock cycles), and going out to disk may be tens of thousands of clock cycles away. The hardware-software co-design of computer architectures has increased the complexity of our systems in order to hide the performance gap between super-fast CPUs and memory that has not kept pace. Caching, instruction-level parallelism, superscalar issue, prefetching from memory, branch prediction, thread-level parallelism, multicore processors, and so on are all wonderful mechanisms that improve the performance of some codes but may have disastrous effects on other implementations. Dr. McGeoch provides an overview of the memory hierarchy and I/O effects, and introduces multicore computing. She then illustrates algorithm engineering with clear examples demonstrating how one may choose among algorithms and their implementations, and she discusses the essentials of measuring performance, tuning algorithms and code, the experimental toolbox, and analysis techniques.
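
To make the memory-hierarchy point concrete, here is a minimal Python sketch of my own (not from the book) that sums the same number of array elements contiguously and then with a large stride; the strided version defeats spatial locality, so the "same amount of work" typically runs noticeably slower. The array size and stride are arbitrary choices for illustration.

    # Illustrative only: actual timings depend on the machine's cache sizes.
    import time
    import numpy as np

    N = 50_000_000
    STRIDE = 16                      # 16 * 8-byte floats per touched cache region
    data = np.ones(N, dtype=np.float64)

    def timed_sum(view):
        start = time.perf_counter()
        total = view.sum()
        return total, time.perf_counter() - start

    _, t_contig = timed_sum(data[: N // STRIDE])   # N/STRIDE consecutive elements
    _, t_strided = timed_sum(data[::STRIDE])       # same count, spread across the array

    print(f"contiguous: {t_contig:.4f}s   strided: {t_strided:.4f}s")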

No one is more qualified than Dr. McGeoch to discuss this subject. She cofounded the DIMACS Implementation Challenges with David S. Johnson in 1990, has published extensively in this area, cofounded (with Michael Goodrich) the Annual Workshop on Algorithm Engineering and Experimentation (ALENEX), sponsored by SIAM, and served as editor-in-chief of the ACM Journal of Experimental Algorithmics from 2003 to 2008. This book concisely brings together her lectures, tutorials, and rich experience in this area into a well-organized collection. In addition, a companion website that Dr. McGeoch maintains, AlgLab, contains downloadable files, programs, and tools for use in readers’ projects.

The table of contents of this book is as follows:

  1. Introduction
  2. A plan of attack
  3. What to measure
  4. Tuning algorithms, tuning code
  5. The toolbox
  6. Creating analysis-friendly data
  7. Data analysis

In Chapter 1, the introduction makes the case for performing experiments on algorithms and illustrates the point with a catalog of real examples where algorithm engineering yields stunning speedups in the running times of implementations of theoretical algorithms. Key concepts are introduced, specifically the differences between terms like “algorithm” and “program.” In this chapter, Dr. McGeoch also presents the algorithm design hierarchy, six levels of broad strategies from the literature for improving algorithm performance, ranging from hardware and platform selection, system software, code tuning, and implementation and algorithm tuning, to algorithm and data structure design and system structure. She illustrates this point with interesting case studies, from cracking the RSA-129 encryption challenge to my own computational biology effort of finding evolutionary histories, where multiple levels of the algorithm design hierarchy were employed in a multiplicative fashion to provide incredible speedups on these nearly intractable problems. Dr. McGeoch concludes the chapter with the experimental process one should follow: a cycle of planning and executing experiments on algorithmic trade-offs to systematically improve performance.

Chapter 2 describes the plan of attack when faced with a new challenge. The basics of experimental design are given, with definitions of the types of parameters (the knobs and dials one can adjust when constructing an experiment), including the selection of input classes for testing the algorithm. For instance, the inputs may include stress-test inputs to reveal bugs and artifacts, worst- and best-case instances, random instances, structured random instances, real instances, hybrid instances, and even public test beds, when available. Each category has its merits and drawbacks, and the book discusses in detail how to select the right classes for a given problem. Given so many variables, it then provides techniques for reducing the design space and evaluating the design points. Projects at the end of the chapter help the reader gain more experience with these topics.

Now that one has a plan of attack and has chosen the design points to test, how does one measure performance? Chapter 3 provides an extensive guide to timing, including instrumenting code, deciding what to measure in an experiment, and the tools and utilities that are available. At the end of the chapter, Dr. McGeoch illustrates these techniques using the NP-hard problem of one-dimensional bin packing.
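
As a flavor of the kind of instrumentation Chapter 3 describes, here is a minimal sketch of my own (not taken from the book) that times a simple first-fit heuristic for one-dimensional bin packing with a high-resolution clock and reports a solution-quality measure alongside the running time; the instance generator and item sizes are arbitrary.

    import math
    import random
    import time

    def first_fit(sizes, capacity=1.0):
        """Place each item into the first open bin with enough remaining room."""
        bins = []                          # remaining capacity of each open bin
        for s in sizes:
            for i, remaining in enumerate(bins):
                if s <= remaining:
                    bins[i] -= s
                    break
            else:
                bins.append(capacity - s)  # no bin fit, so open a new one
        return len(bins)

    random.seed(1)
    sizes = [random.uniform(0.0, 0.8) for _ in range(5_000)]

    start = time.perf_counter()
    used = first_fit(sizes)
    elapsed = time.perf_counter() - start

    lower_bound = math.ceil(sum(sizes))    # trivial bound, since capacity = 1.0
    print(f"bins used: {used}, lower bound: {lower_bound}, time: {elapsed:.3f}s")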

Chapter 4 turns the seemingly ad hoc topic of tuning algorithms and code into a rigorous methodology for improving performance. A number of useful techniques are given for reducing the number of instructions issued, such as finessing a calculation, memoization, aborting loops early, filtering, and customizing data structures. Dr. McGeoch illustrates each of these points with case studies and code snippets that show how the techniques can be applied in general. The chapter is a wealth of tuning advice that gives a programmer concrete areas on which to focus in a methodical fashion.
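
For illustration, here is a small sketch of my own (not from the book) of two of the tuning ideas listed above, memoization and aborting a loop early; the edit-distance and budget examples are hypothetical stand-ins for the book's case studies.

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def edit_distance(a, b):
        """Memoized recursive edit distance: each (suffix, suffix) pair is
        computed once instead of exponentially many times."""
        if not a:
            return len(b)
        if not b:
            return len(a)
        if a[0] == b[0]:
            return edit_distance(a[1:], b[1:])
        return 1 + min(edit_distance(a[1:], b),      # delete
                       edit_distance(a, b[1:]),      # insert
                       edit_distance(a[1:], b[1:]))  # substitute

    def within_budget(costs, budget):
        """Early loop abort: stop summing as soon as the budget is exceeded."""
        total = 0
        for c in costs:
            total += c
            if total > budget:
                return False                         # no need to scan the rest
        return True

    print(edit_distance("experiment", "experience"))    # 3
    print(within_budget([5, 7, 2, 900, 1], budget=20))  # False, aborts at 900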

Now we are ready to roll up our sleeves and work on a real program. Chapter 5 gives the basics for the design and development of testing harnesses and toolboxes, from building the test program to generating random inputs. As with many of the chapters in this book, the case studies and specifics are immediately useful in a wide range of algorithmic problems.

Chapter 6 helps the reader design experiments that yield more insightful results from the measured outputs, including running times and solution qualities. Dr. McGeoch catalogs variance reduction techniques (VRTs), on the theory that reducing variance in the measured outcomes yields better views of average-case costs. Some of the VRTs include using common random numbers in multiple experiments; exploiting positive correlation to adjust an outcome by moving it closer to its expectation; and conditional expectation (sometimes called conditional Monte Carlo), whereby the experiment is conceptually split into two phases, generating a random state and then sampling the cost of that state.
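
As one concrete illustration of the common-random-numbers idea (my own sketch, not from the book), the snippet below compares two bin-packing heuristics on exactly the same randomly generated instances by reusing the seeds, so that instance-to-instance noise largely cancels in the paired differences.

    import random
    import statistics

    def first_fit(sizes, capacity=1.0):
        bins = []                            # remaining capacity of each open bin
        for s in sizes:
            for i, rem in enumerate(bins):
                if s <= rem:
                    bins[i] -= s
                    break
            else:
                bins.append(capacity - s)
        return len(bins)

    def first_fit_decreasing(sizes, capacity=1.0):
        return first_fit(sorted(sizes, reverse=True), capacity)

    def instance(seed, n=1000):
        rng = random.Random(seed)            # common random numbers: same seed,
        return [rng.uniform(0.0, 0.8) for _ in range(n)]   # same instance

    diffs = [first_fit(instance(s)) - first_fit_decreasing(instance(s))
             for s in range(25)]
    print("mean extra bins used by first fit:", statistics.mean(diffs))
    print("stdev of the paired differences:", statistics.pstdev(diffs))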

In the final chapter, four main areas of data analysis are covered: descriptive statistics, exploratory data analysis, graphical data analysis, and methods of inferential statistics. The chapter contains a detailed guide to the categories of data and how best to analyze univariate and bivariate data sets. Especially useful are the techniques for transforming data for more insightful analysis and the use of estimation and confidence intervals. The chapter is rich with examples that illustrate each data analysis method and show how these methods can be applied to the study of algorithms and, further, to building performance models.
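
For example, a basic confidence-interval calculation of the kind discussed in this chapter might look like the following sketch (my own, not from the book); the timing data are simulated rather than measured, and 2.093 is the tabulated t value for 19 degrees of freedom at the 95% level.

    import math
    import random
    import statistics

    random.seed(7)
    # Pretend these are 20 measured running times (in seconds) of one algorithm.
    times = [0.95 + random.gauss(0.0, 0.05) for _ in range(20)]

    n = len(times)
    mean = statistics.mean(times)
    sd = statistics.stdev(times)            # sample standard deviation
    t_crit = 2.093                          # t(0.975, df = 19)
    half_width = t_crit * sd / math.sqrt(n)

    print(f"mean = {mean:.3f}s, "
          f"95% CI = [{mean - half_width:.3f}, {mean + half_width:.3f}]")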

Overall, this is a very valuable book for every computer scientist’s and programmer’s bookshelf. It is useful for students and practitioners alike and is accessible at all levels, from serving as a supplement to an undergraduate data structures and algorithms course to serving as the main text in a senior undergraduate or graduate course on the design of real-world algorithms. Even seasoned programmers will benefit from learning the experimental methods given in this book and may gain new insights into analyzing the performance of their algorithmic implementations. As computer architectures continue to increase in complexity with multicore and many-core processors, novel memory subsystems, new accelerators such as the Intel Xeon Phi and NVIDIA’s graphics processing units (GPUs), and data-intensive computing systems for Big Data problems, the book will become even more valuable for every computer scientist and programmer.

The Golden Ticket: P, NP, and the Search for the Impossible, by Lance Fortnow, Princeton University Press, 2013.  See http://press.princeton.edu/.
Reviewed by:  Michael Trick, Senior Associate Dean, Tepper School of Business, Carnegie Mellon University, trick@cmu.edu.

Most mature fields of study have a few key concepts that have leaked over to the general population. Everyone knows E = mc² from physics, or that hydrogen and oxygen combine to form water in chemistry, or evolution in biology (with varying levels of belief). While the knowledge may be superficial, it serves as shorthand for the field: people know something.

Operations research, particularly in its overlap with computer science, is much younger than those fields, so we do not yet have that touchstone of knowledge. And what would that concept be? Presumably not polyhedral dimension, cutting plane algorithms, convergence rates or other topics, however fascinating we may find them.

One key concept that is important for the general public, and unites many of us in operations research, is the concept of algorithmic efficiency, particularly as exemplified by the P=NP issue. Although the formal definition is quite complicated, involving concepts like nondeterministic algorithms, the fundamental idea is simple: if you have a problem where you can check a solution quickly, can you also find a solution quickly? This concept is at the heart of much of what we do, and the lack of understanding of this concept is a great divide between the general population and our field. When I tell people that I schedule sports leagues (using integer and constraint programming methods), the normal response is, “So you tell the computer what to do,” which is somewhat deflating. When trying to solve NP-hard problems, “telling the computer what to do” only scratches the surface of what needs to be done.
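
To make the "check quickly versus find quickly" contrast concrete, here is a small sketch of my own (not drawn from the book) using the NP-complete partition problem: verifying a proposed split takes linear time, while the obvious way to find one examines exponentially many subsets.

    from itertools import combinations

    def check_partition(numbers, subset):
        """Polynomial-time check of a proposed certificate."""
        return sum(subset) * 2 == sum(numbers)

    def find_partition(numbers):
        """Brute-force search over up to 2^n subsets."""
        total = sum(numbers)
        if total % 2:
            return None
        for r in range(len(numbers) + 1):
            for subset in combinations(numbers, r):
                if sum(subset) * 2 == total:
                    return list(subset)
        return None

    nums = [3, 1, 4, 2, 2]
    cert = find_partition(nums)                 # slow in general: exponential search
    print(cert, check_partition(nums, cert))    # fast to check: prints [4, 2] True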

Lance Fortnow, professor and chair of the School of Computer Science at the Georgia Institute of Technology, attempts to bridge the gap between our field and the rest of the world with The Golden Ticket: P, NP, and the Search for the Impossible. This is a book aimed not at specialists but at a reader who wants to understand what the fuss is about in “telling the computer what to do.” Why does the P versus NP problem matter?

The title of the book comes from the story of Willy Wonka and his chocolate factory. Wonka has hidden a small number of “Golden Tickets” among a large number of candy bars, and those who find a ticket receive a lifetime supply of candy. The “Golden Ticket” can be seen in two ways: in optimization, optimal solutions are like the Golden Ticket in that they are hidden among a huge number of non-golden items; and solving the P versus NP issue would certainly be a golden ticket for any researcher!

The initial chapter of the book introduces the concepts of P and NP in completely nontechnical terms: “NP is the collection of problems that have a solution that we want to find. P consists of problems to which we can find a solution quickly.” Some examples are given: the traveling salesman problem and the partition problem as NP-hard problems; shortest path as an exemplar for P. Fortnow then gets to the crux of the matter:

          Suppose we could simply describe a task and immediately have a program that provided that functionality. … Suppose whatever we can recognize we can find. We can if P=NP.

I think many outside our field believe that is how computers work. Chapter 2 goes into detail about what the world would be like if that really were the case. Fortnow describes a fanciful future where, after an initial proof of P=NP, successive improvements in algorithms result in the “Urbana algorithm”: a practical computer code that takes an NP problem and spits out a polynomial-time solution. Clever bootstrapping allows the Urbana algorithm to work even on itself, creating better and better versions of itself. The world is changed. DNA analysis allows for optimal protein design to cure cancer; baseball games rely on exact weather prediction from a year before (mentioning my sports scheduling company in passing: we would be out of business in a world of P=NP because no one would need us); 3D rendering allows people to view games from any angle; food tastes better; and so on. While you can argue about the exact effects (no matter how fast your algorithm, you still need data, making year-in-advance weather prediction a little optimistic even if P=NP), it is clear that the world would be a much different place. And not all of it is good. The gap between P and NP is what allows encryption, so other methods would be needed in a world of P=NP. And the number of workers displaced by the Urbana algorithm would lead to all sorts of disruptions.

Chapter 3 goes into much more detail about how algorithms work by looking at problems on a “frenemy graph”: a graph of friendships (and enmities) in a world where every pair of people is either friends or enemies. Through this graph, and related graphs, Fortnow introduces standard problems such as clique, Hamiltonian path, map coloring, and max-cut. He clearly gets across the idea that solutions to these problems are easy to check but seem far from straightforward to find, providing a more nuanced characterization of the divide between P and NP than that given in Chapter 1. It is only here that the phrase “nondeterministic polynomial” appears, hopefully at a point where readers will no longer be scared away. He also discusses polynomial-time algorithms: shortest paths, matching, Eulerian paths, and minimum cut. This is an extremely effective way of getting across the frustration of P and NP: why can’t the first set of problems be solved as quickly as the second set? There seems to be no obvious reason why clique, say, is hard whereas shortest path is easy. Curiosity about that difference has launched thousands of careers in our field.
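
As a small illustration (again my own, not from the book) of why clique captures this frustration: checking that a proposed group of k people are mutual friends takes only O(k^2) pair lookups, while the obvious way to find such a group examines every size-k subset of the frenemy graph. The tiny graph below is hypothetical.

    from itertools import combinations

    friends = {                      # edges of a hypothetical frenemy graph
        ("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol"),
        ("Carol", "Dave"), ("Dave", "Eve"),
    }

    def are_friends(a, b):
        return (a, b) in friends or (b, a) in friends

    def is_clique(group):
        """Fast to verify: every pair in the group must be friends."""
        return all(are_friends(a, b) for a, b in combinations(group, 2))

    def find_clique(people, k):
        """Brute force: try every size-k subset of people."""
        for group in combinations(people, k):
            if is_clique(group):
                return group
        return None

    people = ["Alice", "Bob", "Carol", "Dave", "Eve"]
    print(find_clique(people, 3))                 # ('Alice', 'Bob', 'Carol')
    print(is_clique(("Alice", "Bob", "Dave")))    # False: Alice and Dave are not friends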

In Chapter 4, we get to the celebrated work of Steve Cook, who, on May 4, 1971, in Shaker Heights, Ohio, presented the result that satisfiability is, in a well-defined sense, as difficult as any problem in NP. If you could solve satisfiability quickly, every other problem in NP could also be solved quickly. Richard Karp then showed that many other problems have the property that if you solved them quickly, satisfiability could be solved quickly. And the cottage industry of NP-completeness proofs was born. Fortnow outlines some of the more appealing NP-complete problems (Sudoku, the game “Minesweeper,” and so on) and moves on to problems that are not NP-complete, or not known to be: graph isomorphism, factoring, and linear programming, the operations researchers’ favorite (but in an unsatisfying way) problem in P.

But these issues did not begin with Cook. Most of us know that in the early 1960s, Jack Edmonds, in “Paths, Trees, and Flowers,” discussed the polynomial/exponential divide in algorithms and pointed out the need for a formalism for this issue. Many of us are likely less familiar with Russian work from that era that discussed “perebor,” or brute-force search: when must all possibilities be tried to find the best solution? In Chapter 5, Fortnow discusses Western and Eastern approaches, covering work by Turing, Edmonds, and Cobham in the West and by Yablonsky, Kolmogorov, and Levin in the East.

But what do we do about NP-complete problems? Chapter 6 covers a number of approaches. For an operations researcher, it is somewhat disappointing to have the richness of our approaches to hard problems reduced to a page (“something called the cutting plane method”). And for Fortnow to state “Don’t expect … to find optimal traveling salesman tours on 20,000 cities any time in the near future” does seem to ignore the work of Bill Cook, his colleagues, and others, who have found optimal tours for instances with up to 85,000 cities (albeit with structured instances and a somewhat unusual amount of computing power). Cook’s new book, In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation, aimed at the same audience as Golden Ticket, provides much more insight into how people exactly solve NP-hard problems of practical size in a reasonable amount of time.

Heuristics and approximation approaches receive more space in Chapter 6, with examples drawn from the frenemy graph problems of Chapter 3.

Chapter 7 discusses issues in proving whether P is equal to NP. This begins with “undecidability,” something that looks like it might be even harder to work with than the polynomial/exponential divide. Fortnow discusses the relationship between circuit complexity and the P versus NP problem. Though it looks close, that approach has not panned out, and Fortnow clearly outlines the difficulties in this direction. Again, it would have been interesting to look at the more “operations research” approaches to the P versus NP issue that often involved ever more complicated linear programming formulations for hard problems, but the technical demands of outlining these “results” are beyond the scope of the book.  Fortnow ends the chapter by noting the lack of a path toward resolving the problem: at this point it is hard to know where to go!

The final two chapters are somewhat more specialized, covering cryptography and quantum computing, respectively. The quantum chapter in particular is interesting for its clear description of quantum computing and its effect on complexity classes. Having a workable quantum computer would not, as far as is known, allow the solution of NP-complete problems in polynomial time: the best known speedup only reduces the running time of a search algorithm to roughly its square root. It would, however, affect problems like factoring, which ties the final two chapters together.

In the conclusions, Fortnow brings up some issues for the future, highlighting parallel computing and big data.

This book is a short (160 pages or so, excluding footnotes and index) and extremely readable introduction to the P versus NP issue. The book is accessible and useful for practically anyone, from smart high school students to specialists. Although I wish there were a bit more emphasis on how we can currently solve practical instances of NP-hard problems to optimality, perhaps the interest sparked by this book will be the “Golden Ticket” for further accessible work in this area. And perhaps P=NP will start to become as famous as E = mc².

Winter 2013; 25 (1)

In this issue, we review three books. The first uses the paradigm of the traveling salesman problem to bring the limits of computation to life for algorithm design and analysis. The reviewer, Gábor Pataki, received his Ph.D. in Algorithms, Combinatorics, and Optimization from Carnegie Mellon University in 1996. He joined the faculty of the Department of Statistics and Operations Research, University of North Carolina at Chapel Hill, in 2000. His research is in integer and convex programming with recent papers on the complexity of the classical branch-and-bound algorithm, the closedness of the linear image of a closed convex cone, and semidefinite programs that behave badly from the standpoint of duality.

In Pursuit of the Traveling Salesman: Mathematics at the Limits of Computation, by W. J. Cook, Princeton University Press, 2011. See http://press.princeton.edu/.
Reviewed by: Gábor Pataki, Department of Statistics and Operations Research, University of North Carolina Chapel Hill, gabor@unc.edu.

Among his publications, Gábor wrote "Teaching Integer Programming Formulations Using the Traveling Salesman Problem" [SIAM Review 45(1), 2003], directly applicable to the essence of this book. His review is both informative and insightful. He concludes, "the next time a student asks me 'Why are you guys so crazy about research?', I will just answer, 'Read this book.'"

The second book deals with e-commerce. The reviewer, Steve Kimbrough, is Professor of Operations and Information Management at The Wharton School, University of Pennsylvania. He is a recognized expert in many areas centered on concepts and methods for knowledge-based decision support, including e-commerce. As early as 1997, Steve was invited to be the keynote speaker at the SOBU Symposium on Electronic Commerce at Tilburg University. His work with decision support systems is extensive, including heading the Coast Guard's Knowledge-Based Support Systems project for ten years, as well as several other related sponsored research projects. Within e-commerce, he has a long, ongoing research stream applying formal logic to business messaging with the aim of developing a formal language for business communication.

Interactive Decision Aids in E-Commerce, by Jella Pfeiffer, Physica-Verlag, 2012. See http://www.springer.com.
Reviewed by: Steven O. Kimbrough, University of Pennsylvania, kimbrough@wharton.upenn.edu.

His teaching and research continue to include this growing area, making him eminently qualified to review this book. Steve points out, "The book focuses on understanding complex, multi-attribute choice tasks in the online consumer purchasing context." He concludes, "I believe that this book has its greatest value as a resource and point of departure for those with an interest in doing research in this worthy area."

The third book deals with finance models and uses Matlab for its exercises. The reviewer is James Morris, Emeritus Professor of Finance in the School of Business at the University of Colorado Denver. After receiving his Ph.D. in Finance from the University of California, Berkeley, Jim joined the faculty at The Wharton School, followed by a range of visiting positions in the U.S. and Europe. He joined UCD in 1982, where he continued to teach financial modeling and publish in a variety of journals. During 2006-09, Jim was Director of the UCD M.S. Finance Program, and he offers us the perspective of a teacher, researcher, and curriculum designer.

Monte Carlo Simulation with Applications to Finance, by Hui Wang, Chapman & Hall, 2012. See http://www.crcpress.com/.
Reviewed by: James Morris, University of Colorado Denver, James.Morris@ucdenver.edu.

He notes that there could be a problem with finance majors in a business school who have insufficient mathematical background and with mathematics majors who have insufficient knowledge of options, but he concludes, "In summary, Monte Carlo Simulation with Applications to Finance is well done."


Citation information: Greenberg HJ, ed. (2013) Book reviews. INFORMS J. Comput. 25(1), doi:10.1287/ijoc.2013.1.br, http://www.informs.org/Pubs/IJOC/Book-Reviews/Volume-25-2013.