June 1998
Nonlinear Programming
Despite some drawbacks, the "all of the above" of optimization models fills an important role
By Stephen G. Nash
Nonlinear programming is the "all of the above" of optimization models. Its name, after all, only indicates that the model is "not linear" — hardly a confining category. Nonlinear programming is certainly a useful tool, since so many aspects of our world do not behave linearly. Doubling the dosage of a drug need not double its effectiveness. Assigning twice as many employees to a project does not guarantee that the project will be completed twice as fast.
Nonlinear models arise often in science and engineering. They have been used, for example, to determine how to clean up pollution in underground water systems. The model determines where to pump out contaminated water, and where to pump in clean water, so as to reduce the contamination to acceptable levels in either the least time or at the least cost. The constraints model the flow of water through the underground system, and also specify limits on how much water can be pumped in and out.
Another application is in the design of trusses. Here the model determines which support beams to include in the truss, and how thick they should be, so as to maximize the stiffness of the truss. The constraints in the model specify the locations and magnitudes of the loads that must be supported by the truss.
Although less common, nonlinear models can also arise in business applications, for example, in the management of investment portfolios. In this setting, the goal might be to determine a mix of investments so as to maximize return while minimizing risk. The nonlinearity in the model comes from taking risk into account.
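A mean-variance portfolio model of this kind (in the spirit of Markowitz's work, cited in the references) can be sketched in a few lines. The numbers below (expected returns, covariance matrix, risk-aversion weight) are invented for illustration; with only the budget constraint that the weights sum to one, the first-order optimality conditions reduce to a linear system:

```python
import numpy as np

# Minimal mean-variance sketch (illustrative data, not from the article):
# minimize  lam * w'Sigma w - mu'w   subject to   sum(w) = 1.
mu = np.array([0.08, 0.10, 0.12])          # expected returns (assumed)
Sigma = np.array([[0.10, 0.02, 0.01],      # covariance of returns (assumed)
                  [0.02, 0.08, 0.03],
                  [0.01, 0.03, 0.12]])
lam = 2.0                                  # risk-aversion weight (assumed)

# First-order (KKT) conditions form a linear system in (w, nu):
#   [ 2*lam*Sigma   1 ] [ w  ]   [ mu ]
#   [ 1'            0 ] [ nu ] = [ 1  ]
n = len(mu)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = 2.0 * lam * Sigma
A[:n, n] = 1.0
A[n, :n] = 1.0
b = np.concatenate([mu, [1.0]])
w = np.linalg.solve(A, b)[:n]              # optimal portfolio weights
```

Adding inequality constraints (no short sales, say) would make the problem a genuine quadratic program requiring an iterative solver; the point here is only that the risk term is what makes the model nonlinear.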
Modeling of traffic also leads to nonlinearities, representing such notions as congestion and travel times. And gasoline-blending models use nonlinear functions to represent volatility and octane-quality measures.
Despite many successes, nonlinear programming has never been as popular as linear programming, even though nonlinear models are capable of much greater realism and subtlety. Why is this? There are a variety of reasons, coming from a variety of sources. Some of these are gradually disappearing, since they are based on transitory limitations in computer hardware and software. Others will never completely disappear, since they correspond to inherent mathematical limitations of the techniques.
Let's look at the easy reasons first.
Modeling Languages
It has never been difficult to input a linear program. Once the coefficients in the objective and the constraints are specified, the problem can be handed to software. For decades there have been well-established input formats for linear programming, and software for linear programming has adapted to these formats.
Nonlinear programming is different. Nonlinear models come in many different forms, and there has never been a standard for how to specify the model. It might be represented as a subroutine that evaluates the model functions. (This is essential in cases where the functions are only defined in terms of other calculations.) But for simpler models, ones where it is easy to write down the formulas for the model, this is a cumbersome barrier to using the software. If a user starts with a linear model but wants to add a nonlinear term to it, then the input format for the entire model might have to be rewritten for the nonlinear software.
This obstacle to the use of nonlinear programming is disappearing as modeling languages (such as AMPL) become more sophisticated. By using these languages, it is much easier to express nonlinear models, and linear and nonlinear models can be expressed in a single format. Such languages are linked to the major nonlinear programming packages, and so (even as solvers change) the model can remain in a fixed input format.
Software for nonlinear programming often requires the derivatives of the functions in the model. (I'll explain why later.) So, in addition to specifying the functions, the user must derive and program the derivatives. This is tedious, and it is easy to make mistakes that can cause the software to fail.

Modeling languages also provide assistance here, either by computing approximations to the required derivatives, or by deriving and programming the derivative formulas as the model is processed.
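The approximation route is straightforward to sketch. A forward-difference gradient of the kind a modeling language might supply automatically looks like this (a bare-bones illustration, not any particular package's implementation):

```python
import math

def fd_gradient(f, x, h=1e-6):
    """Forward-difference approximation to the gradient of f at x.

    Each component of the gradient is estimated by perturbing one
    variable by a small step h and differencing the function values.
    """
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xh = list(x)
        xh[i] += h
        grad.append((f(xh) - fx) / h)
    return grad

# Example: f(x, y) = x^2 + sin(y); the true gradient at (1, 0) is (2, 1).
f = lambda v: v[0] ** 2 + math.sin(v[1])
g = fd_gradient(f, [1.0, 0.0])
```

The step h trades truncation error against rounding error, which is one reason such approximations are convenient but not ideal.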
If a modeling language is not being used, the tedium of derivative calculations can be alleviated through the use of software for "automatic differentiation" (such as ADIFOR). Such software analyzes the formulas for the nonlinear functions and generates software that will evaluate the derivatives. This is not the same as "symbolic differentiation" — the technique used in packages such as Mathematica. Automatic differentiation can be applied even in cases where the nonlinear functions are only defined in terms of other software. The model functions need not be expressed in "closed form."
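The idea behind forward-mode automatic differentiation can be illustrated with "dual numbers": each value carries its derivative along with it, and every arithmetic operation propagates both by the chain rule. The toy class below is a sketch of the principle, not of how a tool such as ADIFOR is actually implemented:

```python
class Dual:
    """Toy forward-mode automatic differentiation via dual numbers.

    A value is stored together with its derivative, and each operation
    updates both by the chain rule, so the derivative is exact (to
    machine precision) rather than a finite-difference estimate.
    """
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with derivative 1 and read off the result.
    return f(Dual(x, 1.0)).dot

# d/dx (x*x + 3*x) at x = 2 is 2*2 + 3 = 7, computed exactly.
d = derivative(lambda x: x * x + 3 * x, 2.0)
```

Because the rule is applied operation by operation, it works even when the function is defined by a program rather than a closed-form formula.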
These two resources (modeling languages and automatic differentiation) remove much of the tedium associated with specifying a nonlinear model. They can greatly simplify the task of preparing a model for the optimization software.
Hardware And Software Issues
An intermediate range of obstacles is associated with computer hardware and optimization software. In the past it was impractical to solve many optimization problems, either because the software could not reliably find a solution, or because the time required to solve the problems was too great.
We all know that computers are a lot faster. In the past, just evaluating the model functions at a single point might have been so time-consuming (if, say, the model required a simulation involving a difficult system of differential equations) that no thought was given to optimizing the model parameters. Today, speedier computers may have altered that balance.
In tandem, optimization software has become more robust and efficient, particularly for the solution of large problems. A much greater collection of optimization problems can now be solved efficiently by "off-the-shelf" software, where in the past it might have been necessary to try various packages, or to experiment with different parameter settings or starting guesses, to optimize the model successfully. It is still not possible to guarantee success the first time, every time, but a great many more problems, particularly large problems, can now be routinely and reliably solved.
This all sounds so promising, suggesting that if we just wait a few years, then perhaps nonlinear programming might become as routine a modeling tool as linear programming. Well, it won't. And there are good, solid reasons why it won't.
Hard To Solve
Nonlinear programs can be very hard to solve. Every integer program can be written as a nonlinear program (for example, an integer-valued variable x can be represented using the nonlinear constraint sin(πx) = 0). Any system of differential equations can be represented as a nonlinear program. Nonlinear programming includes all of optimal control. Interior-point methods even interpret linear programs as nonlinear programs. To repeat, nonlinear programming is the "all of the above" of optimization models. It is unreasonable to expect "black-box" optimization software to be able to solve all nonlinear programming problems. If a problem includes nonlinear constraints, then (unless an initial feasible point is provided) there is no guaranteed way to find a feasible point, or even to determine whether one exists. If all the constraints are linear, however, determining a feasible point is no more difficult than in linear programming.
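The integrality trick is easy to see numerically. The constraint residual sin(πx) vanishes exactly at the integers and nowhere else, so enforcing it smuggles an integer program into a smooth-looking nonlinear one:

```python
import math

# The constraint sin(pi * x) = 0 holds precisely when x is an integer,
# which is how an integer-valued variable can be encoded in a nonlinear
# program -- and one reason general nonlinear programming is so hard.
def integrality_residual(x):
    return math.sin(math.pi * x)

r_int = integrality_residual(3.0)     # integer point: residual is ~0
r_frac = integrality_residual(3.5)    # fractional point: |residual| is 1
```

A solver that handled such constraints reliably would, in effect, be solving integer programs, so no general-purpose guarantee should be expected.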
If a method finds a "solution," then it may only be a local optimum, or in some cases only a stationary point. This is all that can reasonably be expected, since tools for analyzing the "global" behavior of general nonlinear functions do not exist.
In some situations, this is not a serious limitation. A good initial guess of the solution may be available. It may be known in advance that there is only one stationary point, and that it corresponds to the global optimum. Or it may be that a particular local optimum is what is desired, or at least is adequate. (If it is essential to find a global optimum, then it may be necessary to resort to probabilistic techniques such as simulated annealing; see the references at the end of this article for further information.)
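To give the flavor of such probabilistic techniques, here is a bare-bones simulated annealing loop for a one-dimensional multimodal function. Every parameter here (cooling schedule, step size, iteration count) is illustrative rather than tuned, and no single run is guaranteed to find the global optimum:

```python
import math, random

def simulated_annealing(f, x0, steps=5000, t0=1.0, seed=0):
    """Bare-bones simulated annealing sketch for a 1-D function.

    Improvements are always accepted; uphill moves are accepted with a
    probability that shrinks as the "temperature" t cools, which lets
    the search escape local minima early on.
    """
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for k in range(steps):
        t = t0 * (1.0 - k / steps) + 1e-9        # cooling schedule
        y = x + rng.gauss(0.0, 0.5)              # random trial move
        fy = f(y)
        if fy < fx or rng.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# A multimodal test function: many local minima, global minimum of 0 at x = 0.
f = lambda x: x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))
x_best, f_best = simulated_annealing(f, x0=4.5)
```

A deterministic descent method started at the same point would simply slide into the nearest local minimum.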
Under certain conditions (for example, if the problem is "convex"), much stronger guarantees are possible. But it can be difficult to determine if a model satisfies these conditions, and so they are not always of practical value.
Some of these obstacles will never be overcome, but there is a danger of becoming obsessed by them. In truth, a great many problems can be solved using today's computers and software, and without a great deal of strain on the part of the modeler. There is always the risk of encountering a difficult problem, one that will overwhelm current capabilities, but this is no reason to shun nonlinear models.
Software Survey
Some of the available software packages for nonlinear programming are described in the accompanying survey. This survey was restricted to "full feature" nonlinear programming packages, meaning packages that accept a full range of models (nonlinear objective function, and nonlinear equality and inequality constraints). This omits many worthwhile pieces of software that only handle more restricted models. (Also, not all software providers responded to the survey, leading to further omissions.)
Because the design and development of nonlinear optimization software can be difficult, it is common for the latest techniques to appear first in more specialized packages. The most successful of these techniques gradually spread to the general packages. Many ideas are first explored with unconstrained problems, which have an objective function but no constraints. For example, the only widely available nonlinear optimization software for parallel computers is restricted to unconstrained problems.
Special-purpose software may be necessary when solving especially difficult optimization problems. Added information about a model can often be exploited to improve the performance of optimization algorithms. It can be difficult, however, to match the capabilities of special-purpose software with the idiosyncrasies of a special-purpose model. For example, a piece of optimization software might require that all the constraints be linear, or that the feasible region have a nonempty interior, whereas the modeler might be hunting for software for a reservoir model. Categories of models and categories of software do not always match up.
Those needing more sophisticated (and perhaps more specialized) optimization tools can obtain assistance over the Internet at NEOS, the Network-Enabled Optimization System maintained by the Optimization Technology Center of Northwestern University and the Argonne National Laboratory. Optimization models can be submitted electronically to specific solvers, and the results are then transmitted back to the user. The NEOS site also contains guides to available software, technical information on algorithms, and other resources.
How Algorithms Work
The behavior of nonlinear programming software can be better understood if the basis for the methods is explained. The (firstorder) optimality conditions for a nonlinear program can be written as a system of nonlinear equations. The vast majority of algorithms apply some variant of Newton's method to this system of equations. That is, the nonlinear functions are approximated by linear functions, the solution of the linearized equations is used to get a new estimate of the optimum, and the whole process is repeated. The coefficients of the linearized system are derived from the derivatives of the nonlinear function; that is, they are based on a Taylor series approximation to the nonlinear system. What does all this imply? First, that derivative calculations are essential to the operation of the algorithms. And second, that only local solutions can be found (the coefficients of the linearized system depend on the derivatives at the current point, and the approximation is only of value "near" the current point).
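The linearize-solve-repeat loop is short enough to write down. In one dimension the optimality condition is f'(x) = 0, and each Newton step solves the linearized equation f'(x) + f''(x)·step = 0. The test problem below is invented for illustration: minimize f(x) = exp(x) - 2x, whose minimizer is ln 2:

```python
import math

def newton(grad, hess, x, iters=20, tol=1e-12):
    """Newton's method applied to the optimality condition grad(x) = 0."""
    for _ in range(iters):
        g = grad(x)
        if abs(g) < tol:
            break
        # Solve the linearized equation  g + hess(x) * step = 0.
        x = x - g / hess(x)
    return x

# Minimize f(x) = exp(x) - 2x:  f'(x) = exp(x) - 2,  f''(x) = exp(x).
x_star = newton(lambda x: math.exp(x) - 2.0,
                lambda x: math.exp(x),
                x=0.0)
```

Both derivatives appear explicitly in the iteration, which is why the algorithms need them, and the Taylor approximation behind the step is only trustworthy near the current point, which is why only local solutions can be promised.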
If the current point is "close" to the optimum, then Newton's method can be expected to work well. However, the system of nonlinear equations only defines a stationary point, and so auxiliary techniques are required to guarantee progress towards an optimal point. These auxiliary techniques are essential to the performance of the algorithms at points "far" from the solution. These auxiliary techniques use the same derivative information that Newton's method uses, so they, too, construct (perhaps implicitly) a "local" approximation to the nonlinear functions, and this again places fundamental limits on what the algorithm can achieve. (The quotation marks here disguise various technicalities.)
Performance Guarantees
To get better guarantees of performance, the algorithm must have "global" knowledge about the nonlinear functions: for example, the functions must be convex, or bounds on their derivatives must be provided. It is rarely possible to verify whether such conditions are satisfied. If only local information is available (i.e., the values of the nonlinear functions and their derivatives), then only local guarantees are possible.
Not all nonlinear models are differentiable. What can be done in this case? Certain algorithms do not use derivative values, and so can be applied to nondifferentiable functions. However, to guarantee convergence, even to a stationary point, these methods must often assume that the problem is differentiable, even if the derivative values are not used. If the function is not differentiable (and especially if the function is not continuous), the value of the function at one point gives little or no information about the values of the function at nearby points. In such cases, weaker guarantees of convergence must be accepted.
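One simple example of a method that uses only function values is "compass" (coordinate) search, sketched below on a nondifferentiable test function invented for illustration. No derivatives appear anywhere, so the method can at least be run on such problems, even though its convergence guarantees are correspondingly weak:

```python
def compass_search(f, x, step=1.0, tol=1e-6, max_iter=10000):
    """Derivative-free compass search sketch.

    Probe each coordinate direction with the current step; move to any
    better point found, and halve the step when no neighbor improves.
    """
    fx = f(x)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for d in (step, -step):
                y = list(x)
                y[i] += d
                fy = f(y)
                if fy < fx:
                    x, fx, improved = y, fy, True
                    break
            if improved:
                break
        if not improved:
            step *= 0.5            # no better neighbor: refine the mesh
            if step < tol:
                break
    return x, fx

# A nondifferentiable function with kinks along both axes; minimum at (1, -2).
f = lambda v: abs(v[0] - 1.0) + abs(v[1] + 2.0)
x_min, f_min = compass_search(f, [0.0, 0.0])
```

On a discontinuous function even this can fail badly, since a function value at one point then says nothing about values nearby.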
All these cautionary statements may leave a bleak impression of nonlinear programming. You should not despair. Nonlinear programming includes such a broad range of models that warnings are to be expected.
A great deal of effort and ingenuity has been expended to develop Newton's method into a powerful and practical software tool. This has been especially true for large problems. Modern methods can approximate the performance of Newton's method even though they calculate and store only a small fraction of the derivative information required by Newton's method. Other major improvements have come from a better understanding of the role of the dual variables (Lagrange multipliers), and of how to use dual information to accelerate an algorithm. These advances are the basis of the improved performance and reliability of nonlinear optimization algorithms, and they provide ample justification for the increased use of nonlinear models in operations research.
These successes, together with the development of modeling languages and automatic differentiation, and of course with the improvements in computer hardware, have made nonlinear programming a valuable tool for many scientists and engineers.
References
1. Philip E. Gill, Walter Murray, and Margaret H. Wright, "Practical Optimization," Academic Press (New York), 1981.
2. David G. Luenberger, "Linear and Nonlinear Programming," Addison-Wesley (Reading, Mass.), 1984.
3. Jorge J. Moré and Stephen J. Wright, "Optimization Software Guide," SIAM (Philadelphia), 1993.
4. Stephen G. Nash and Ariela Sofer, "Linear and Nonlinear Programming," McGraw-Hill (New York), 1996.
5. Andreas Griewank and George F. Corliss (editors), "Automatic Differentiation of Algorithms: Theory, Implementation, and Application," SIAM (Philadelphia), 1991.
6. Christodoulos A. Floudas and Panos M. Pardalos (editors), "Recent Advances in Global Optimization," Princeton University Press (Princeton), 1992.
7. George P. Karatzas and George F. Pinder, "Groundwater Quality Management Using a 3-D Numerical Simulator and a Cutting Plane Optimization Method," Computational Methods in Water Resources, X (1994), pp. 841-848.
8. Martin Philip Bendsøe, Aharon Ben-Tal, and Jochem Zowe, "Optimization Methods for Truss Geometry and Topology Design," Structural Optimization, 7 (1994), pp. 141-159.
9. Harry M. Markowitz, "Mean-Variance Analysis in Portfolio Choice and Capital Markets," Blackwell (New York), 1987.
Stephen G. Nash is a professor in the Operations Research and Engineering Department at George Mason University in Fairfax, Va. He is also a computer scientist in the Information Technology Laboratory at the National Institute of Standards and Technology in Gaithersburg, Md.