
STRUCTURED MODELING: SURVEY AND FUTURE RESEARCH DIRECTIONS


Arthur M. Geoffrion

May 20, 1996 Revised June 1, 1999

Abstract

This is an updated version of an article by the same name that appeared in ORSA CSTS Newsletter, 15:1 (Spring 1994). It surveys recent research and research opportunities in structured modeling. After first reviewing the basics of structured modeling, we discuss improved modeling languages, approaches to model integration, extensions designed for simulation modeling, and three topics falling under implementation strategies and technologies, namely host software, language-directed editors, and language-based customization of specialized application packages. In addition to describing completed work and work in progress in these areas, we highlight numerous attractive research opportunities.

IMPORTANT NOTE:

The references in the body of this article are hyperlinks to the on-line, companion paper "An Informal Annotated Bibliography on Structured Modeling" (also published in ITORMS). Once you click on a reference, you will go to that other paper and there will be no on-screen link back to the present paper except on the title page; normally you must use your browser's "Back" function to return to the present paper.

Acknowledgments

I am indebted to many people for valuable comments, most especially to Chris Jones, Melanie Lenard, Waleed Muhanna, Laurel Neustadter, Richard Ramirez, and Fernando Vicuña.

Full Article

1. REVIEW OF STRUCTURED MODELING


In preparation for what follows, this section presents a brief review of the fundamentals of structured modeling.

Structured modeling was developed as a comprehensive response to perceived shortcomings of modeling systems available in the 1980s. It is a systematic way of thinking about models and their implementations, based on the idea that every model can be viewed as a collection of distinct elements, each of which has a definition that is either primitive or based on the definition of other elements in the model. Elements are categorized into five types (so-called primitive entity, compound entity, attribute, function, and test), grouped by similarity into any number of classes called genera, and organized hierarchically as a rooted tree of modules so as to reflect the model's high-level structure. It is natural to diagram the definitional dependencies among elements as arcs in a directed acyclic graph. Moreover, this dependency graph can be computationally active because every function and test element has an associated mathematical expression for computing its value.
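To fix ideas, here is a minimal, hypothetical sketch in Python (not part of any published structured modeling implementation) of a model as a collection of typed elements whose definitional dependencies form a computationally active acyclic graph; it anticipates the feedmix example of Figures 1 and 2 below.

from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Element:
    name: str
    genus: str                        # the class of similar elements this element belongs to
    genus_type: str                   # one of "pe", "ce", "a", "f", "t" (SML also marks variable attributes /va/)
    calls: List["Element"] = field(default_factory=list)   # definitional dependencies (arcs of the DAG)
    value: Optional[float] = None     # supplied for attributes; computed for functions and tests
    rule: Optional[Callable] = None   # the mathematical expression of a function or test element

    def evaluate(self):
        # The dependency graph is computationally active: evaluate depth-first.
        if self.rule is not None:
            self.value = self.rule([e.evaluate() for e in self.calls])
        return self.value

# A tiny element-level fragment of the feedmix model shown below.
p        = Element("P", "NUTR", "pe")
std      = Element("std", "MATERIAL", "pe")
a_p_std  = Element("ANALYSIS(P,std)", "ANALYSIS", "a", calls=[p, std], value=4.0)
q_std    = Element("Q(std)", "Q", "a", calls=[std], value=2.0)
term     = Element("NLEVEL(P) term", "NLEVEL", "f", calls=[a_p_std, q_std],
                   rule=lambda v: v[0] * v[1])
print(term.evaluate())                # 8.0, one term of the P nutrition level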

Using a model for any specific purpose involves subjective intentions. Structured modeling makes a sharp distinction between the resulting user-defined "problems" or "tasks" associated with a model, and the relatively objective model per se. A typical problem or task has to do with ad hoc query, drawing inferences, evaluating model behavior with specified inputs, determining a constrained solution, or optimization, and requires applying a computerized model manipulation tool ("solver"). For certain recurring kinds of problems and tasks, these tools are highly developed and readily available for incorporation into a structured modeling software system.

The theoretical foundation of structured modeling is formalized in Geoffrion [1989a], which presents a rigorous semantic framework that deliberately avoids committing to a representational formalism. The framework is "semantic" because it casts every model as a system of definitions styled to capture semantic content. Ordinary mathematics, in contrast, typically leaves more of the meaning implicit. Twenty-eight definitions and eight propositions establish the notion of model structure at three levels of detail (so-called elemental, generic, and modular structure), the essential distinction between model class and model instance, certain related concepts and constructs, and basic theoretical properties. This framework has points in common with certain ideas found in the computer science literature on knowledge representation, programming language design, and semantic data modeling, but is designed specifically for modeling as practiced in MS/OR and related fields (Sec. 4 of Geoffrion [1987]).

An executable model definition language called SML (Structured Modeling Language) fully supports structured modeling's semantic framework (Geoffrion [1992a]). Other languages for structured modeling also exist, as noted later. SML can be viewed in terms of four upwardly compatible levels of increasing expressive power. The first level encompasses simple definitional systems and directed graph models. The second level covers more complex extensions of these, spreadsheet models, numeric formulas, and propositional calculus models. The third level encompasses mathematical programming and predicate calculus models with simple indexing over sets and Cartesian products. Finally, the fourth level covers sparse versions of the above plus relational and semantic database models.

The following figures from Geoffrion [1987] show an SML schema (third level) specifying the general structure of the classical feedmix model, and sample SML elemental detail tables specifying model elements. The latter, together with the schema, yield a specific feedmix model instance.

&NUT_DATA NUTRIENT DATA

NUTRi /pe/ There is a list of NUTRIENTS.

MIN (NUTRi) /a/ : Real+ For each NUTRIENT there is a MINIMUM DAILY REQUIREMENT (units per day per animal).

&MATERIALS MATERIALS DATA

MATERIALm /pe/ There is a list of MATERIALS that can be used for feed.

UCOST (MATERIALm) /a/ Each MATERIAL has a UNIT COST ($ per pound of material).

ANALYSIS (NUTRi, MATERIALm) /a/ : Real+ For each NUTRIENT-MATERIAL combination, there is an ANALYSIS (units of nutrient per pound of material).

Q (MATERIALm) /va/ : Real+ The QUANTITY (pounds per day per animal) of each MATERIAL is to be chosen.

NLEVEL (ANALYSISi., Q) /f/ ; @SUMm (ANALYSISim * Qm) Once the QUANTITIES are chosen, there is a NUTRITION LEVEL (units per day per animal) for each NUTRIENT calculable from the ANALYSIS.

T:NLEVEL (NLEVELi, MINi) /t/ ; NLEVELi >= MINi For each NUTRIENT there is a NUTRITION TEST to determine whether the NUTRITION LEVEL is at least as large as the MINIMUM DAILY REQUIREMENT.

TOTCOST (UCOST, Q) /f/ ; @SUMm (UCOSTm * Qm) There is a TOTAL COST (dollars per day per animal) associated with the chosen QUANTITIES.

Figure 1. SML Schema for the Classical Feedmix Model (underlined text is replaced by italics for compatibility with HTML)

NUTR
NUTR   INTERP          MIN
P      Protein          16
C      Calcium           4

MATERIAL
MATERIAL   INTERP          UCOST
std        Standard Feed    1.20
add        Additive         3.00

ANALYSIS
NUTR   MATERIAL   ANALYSIS
P      std            4.00
P      add           14.00
C      std            2.00
C      add            1.00

Q
MATERIAL      Q
std        2.00
add        0.50

NLEVEL
NUTR   NLEVEL   T:NLEVEL
P       15.00   FALSE
C        4.50   TRUE

TOTCOST
TOTCOST
   3.90

Figure 2. Sample Elemental Detail Tables for the Feedmix Schema

Space does not permit a proper description of SML's syntax, but a few hints are as follows. Schemas are organized as a tree of paragraphs whose leaves are the genera and whose interior nodes are the modules. The boldfaced part of each paragraph is the formal definition of the genus or module, as the case may be, and the rest consists of documentary comments about the formal part which are informal except for conventions about the use of underlining and upper case. The formal definition of a genus paragraph begins with the name of the genus, a parenthetical statement of definitional dependencies (if any), a slash-delimited statement of genus type, a colon-announced statement of data type if an attribute genus, and a semicolon-announced mathematical expression called a generic rule if a function or test genus. The formal definition of a module paragraph consists only of its name. Note that a schema is always specified independently of any problem or task that might be posed on it. A common problem associated with the above schema is to find values for all Q elements such that all T:NLEVEL elements evaluate to true and the value of the TOTCOST element is minimal.
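To make that problem statement concrete, the following sketch feeds the instance data of Figure 2 to an off-the-shelf linear programming routine (SciPy's linprog). This is purely illustrative: in a structured modeling system the schema remains solver-independent, and an interface of this kind would be generated from the schema and detail tables rather than written by hand.

from scipy.optimize import linprog

ucost    = {"std": 1.20, "add": 3.00}                      # UCOST
minreq   = {"P": 16.0, "C": 4.0}                           # MIN
analysis = {("P", "std"): 4.0, ("P", "add"): 14.0,         # ANALYSIS
            ("C", "std"): 2.0, ("C", "add"): 1.0}
mats, nuts = ["std", "add"], ["P", "C"]

c    = [ucost[m] for m in mats]                            # minimize TOTCOST
A_ub = [[-analysis[(i, m)] for m in mats] for i in nuts]   # T:NLEVEL, i.e. NLEVELi >= MINi
b_ub = [-minreq[i] for i in nuts]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * len(mats))
print(dict(zip(mats, res.x)), res.fun)                     # optimal QUANTITY values and TOTCOST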

The structure and sequence of the elemental detail tables are determined procedurally from the schema. Each table is named, has column names that usually coincide with genus names, and has a row for each element of the corresponding genus.
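As an illustration of how the function and test genera can be evaluated directly from such tables, the following sketch reproduces the NLEVEL, T:NLEVEL, and TOTCOST values of Figure 2; plain dictionaries stand in for the relational tables.

analysis = {("P", "std"): 4.0, ("P", "add"): 14.0, ("C", "std"): 2.0, ("C", "add"): 1.0}
q        = {"std": 2.00, "add": 0.50}
minreq   = {"P": 16.0, "C": 4.0}
ucost    = {"std": 1.20, "add": 3.00}

# NLEVEL (ANALYSISi., Q) /f/ ; @SUMm (ANALYSISim * Qm)
nlevel = {i: sum(analysis[(i, m)] * q[m] for m in q) for i in minreq}

# T:NLEVEL (NLEVELi, MINi) /t/ ; NLEVELi >= MINi
t_nlevel = {i: nlevel[i] >= minreq[i] for i in minreq}

# TOTCOST (UCOST, Q) /f/ ; @SUMm (UCOSTm * Qm)
totcost = sum(ucost[m] * q[m] for m in q)

print(nlevel, t_nlevel, totcost)      # matches the NLEVEL, T:NLEVEL, and TOTCOST tables of Figure 2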

The next figure, also from Geoffrion [1987], shows the so-called genus graph associated with the above schema. It represents definitional dependencies at the level of genera.

Owing to the design of the underlying semantic framework and of SML itself, SML-based modeling systems can have certain features often lacking in modeling systems of more conventional design, including:

  • error-checking, of the formal specification of general model structure, that is exhaustive with respect to the underlying semantic framework;
  • detailed semantic connections among model parts, a feature that facilitates maintaining, enhancing, and integrating models, and that enables automatic generation of several kinds of model reference documents useful for communication, debugging, model maintenance and evolution, and other essential activities;
  • the ability, owing to the generality of structured modeling's view of models as definitional systems, for a single modeling system to accommodate a wide variety of modeling paradigms, which leads to easier model integration and many of the benefits of standardization;
  • browsable definitional dependency graphs at three levels of abstraction, constructs useful for visualizing and communicating the general structure of any model (e.g., Figure 3);
  • the use of hierarchical organization as an approach to managing model complexity, and also as a visual device for model navigation;
  • automatic generation of relational data table designs for model instance data, a feature that facilitates exploiting relational database tools for data management;
  • partial consistency checking of SML's informal sublanguage for documenting the formal model specification, and also partial consistency and completeness checking of formal specifications by reference to this documentation;
  • complete independence between the general structure of a class of models and instantiating data, a feature that promotes the reuse of each of these, conciseness, efficient communication, and dimensional flexibility;
  • complete independence between models and solvers, a feature that promotes using multiple solvers with a single model, multiple models with a single solver, and conceptual clarity.

A research prototype called FW/SM exhibiting all these features is described by Geoffrion [1991] and Neustadter et al. [1992]. It was built on top of a DOS-based integrated personal productivity package that offers services of value to many phases of the modeling life-cycle (viz., business graphics, file management, LAN support, macros, outlining, a programming language, spreadsheets, tabular database, telecommunications, and word processing).

Some additional features of FW/SM are:

  • a menu-driven interface that lies dormant when not in use and thus does not interfere with normal use of the host software package;
  • the use of outlining, boldface, and underlining to simplify and enhance the user interface with native SML (which uses only ASCII characters);
  • automatic evaluation (with warm restart) of generic rules, a feature that facilitates debugging, "What If" studies, and other kinds of model and results analysis;
  • a tree-oriented editor for navigating, editing, and controlling the display of schema trees;
  • a fully automatic interface to a menu-driven commercial database system, a feature that facilitates queries and report generation for data and results;
  • a fully automatic interface for linear and integer programming based on MPS-format problem files;
  • a control table interface for generalized network flow optimization (no problem file generator need be written);
  • a fully automatic interface to a commercial Prolog system, a feature that facilitates logic-based inferencing with respect to the most fundamental aspects of any model's general structure.

Several other research prototypes for structured modeling with different emphases are referenced in Geoffrion [1991], including: graph-based modeling, hybrid information/mathematical modeling systems, model management with a SQL database server in a networked environment, optimization-based applications, statistical analysis, and syntax-directed model editing. An ample foundation has been laid for the development of production prototypes.

The following sections review and comment on recent research and open research opportunities relating to structured modeling. These are divided into four main categories: improved modeling languages, approaches to model integration, extensions designed for simulation modeling, and three topics falling under implementation strategies and technologies.

2. MODELING LANGUAGES

SML may be the most thoroughly developed language for structured modeling, but it is not the only one, and it can still be improved. We first consider possible improvements, and then some fundamentally different alternatives.

Some of the features of level 4 SML are excessively complex. For example, Neustadter [1992] points out that SML's index set specification mechanisms need to be simplified. She proposes a fix involving changing SML's index set statement and generic calling sequence sublanguages in such a way that work is transferred from the former to the latter. Issues of expressive power and compatibility with SM foundations remain to be worked out.

A dissertation by Lin [1993] takes another approach to simplification by studying a structured modeling language, subsequently renamed SFL, that dispenses with indices altogether. The advantages claimed are enhanced readability, enhanced writability, and easier processing by parallel computers. SFL models can be stored in relational DBMSs, with SFL statements translatable into SQL supplemented by C as necessary via embedded SQL. Lin demonstrated the feasibility of doing this by building a compiler for use with INGRES. She gives a detailed argument that SFL supports the principles of structured modeling even better than SML does. SFL is a subset of the SM/DB language (Ramirez and Lin [1993]), which is used in the Iowa State University Model Management System; see Ramirez [1993] for an annotated bibliography on this collaborative project.

SML could benefit from certain additions as well as simplifications. One desirable addition would be the incorporation of typing and units of measurement along the lines pioneered by Bradley and Clemence [1987]. They introduce typing and units into a simple SML-like modeling language in a rigorous, elegant, and active way. In return for doing some extra specification work, the model designer receives powerful consistency checks and useful services for automatic units conversion and scale factoring. Adding these features to SML would be especially valuable for building complex models and for model integration.
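As a toy illustration of what such "active" typing and units could provide (the representation below is my own, not Bradley and Clemence's), quantities can carry their units so that dimensional errors surface immediately when elements are combined:

class Qty:
    # A value tagged with a dict of exponents per base unit, e.g. {"dollar": 1, "lb": -1}.
    def __init__(self, value, units):
        self.value = value
        self.units = {u: e for u, e in units.items() if e != 0}

    def __add__(self, other):                     # addition demands identical units
        if self.units != other.units:
            raise ValueError(f"unit mismatch: {self.units} vs {other.units}")
        return Qty(self.value + other.value, self.units)

    def __mul__(self, other):                     # multiplication combines exponents
        combined = dict(self.units)
        for u, e in other.units.items():
            combined[u] = combined.get(u, 0) + e
        return Qty(self.value * other.value, combined)

unit_cost = Qty(1.20, {"dollar": 1, "lb": -1})    # UCOST: $ per pound of material
quantity  = Qty(2.00, {"lb": 1, "day": -1})       # Q: pounds per day per animal
spending  = unit_cost * quantity                  # dollars per day per animal: consistent
try:
    unit_cost + quantity                          # dimensionally meaningless
except ValueError as err:
    print(err)                                    # the inconsistency is caught at once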

Another useful addition to SML is vector-valued generic rules, which have been implemented in the OR/SM system by Wright et al. [1997].

Turning now to more radical departures from SML, Jones [1992] has shown that a structured modeling language can be fundamentally graphical, that is, truly centered on attributed graphs rather than merely complemented by them. Jones, whose dissertation originated the use of graph grammars as a modeling framework, applied his prior work on Graph-Based Modeling Systems (GBMSs) to structured modeling. In particular, he explained how a syntax-directed editing environment for structured modeling can be implemented as a GBMS, and he embodied these ideas in a prototype called Networks/SM.

A different approach to a graphical language for structured modeling is proposed by Chari and Sen [1997], who draw on Jones' GBMS work for formalization. The basic construct of their language is the model graph for representing a model class. Model graphs incorporate a multi-level architecture that can represent models at varying levels of detail, allowing a completely graphical approach to representation and manipulation. At one level of detail, a model graph is very similar to an SML genus graph. At a higher level of detail, functions and tests are expressed using elementary functional and logical operators. These operator nodes enable the generic rules and functions of SML to be diagrammed rather than remain internal to function and test nodes. Model graphs can be defined recursively using module nodes, allowing a forest-like structure for models. This approach allows models to be built graphically either top-down or bottom-up.

Model graphs have three distinct types of edges to visually distinguish relationships between nodes. Some edges exhibit index inheritance, some value inheritance, and some simply show that two nodes are related (i.e., no index or value inheritance takes place). To produce a model instance, a model graph can be instantiated by using data stored in a relational database to produce SML-like elemental detail tables that are 1:1 with model graph nodes. An implementation has been completed: see Chari and Sen [1998].
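The sketch below is one rough reading of this construct (the names and layout are mine, not Chari and Sen's notation): nodes carry a kind, module nodes nest recursively, and each edge is tagged with the kind of relationship it depicts.

from dataclasses import dataclass, field
from typing import List

INDEX_INHERITANCE, VALUE_INHERITANCE, ASSOCIATION = "index", "value", "assoc"

@dataclass
class Node:
    name: str
    kind: str                                     # e.g. "pe", "a", "f", "t", "operator", "module"
    children: List["Node"] = field(default_factory=list)   # module nodes nest recursively

@dataclass
class Edge:
    src: Node
    dst: Node
    relation: str                                 # one of the three edge kinds above

nutr   = Node("NUTR", "pe")
minreq = Node("MIN", "a")
data   = Node("NUT_DATA", "module", children=[nutr, minreq])
edges  = [Edge(nutr, minreq, INDEX_INHERITANCE)]  # MIN is indexed by NUTR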

Still another approach to a graphical language for structured modeling is that of Hamacher [1995]. He proposes a variant of the standard genus graph that adds semantic data modeling features from the Entity Relationship Model extended to include specialization, generalization, and aggregation. This is done in a SML-compatible way, so that a standard SML schema can be produced from the proposed diagram. A research prototype called IGOR (Integrated Graphics for Operations Research) has been built using a graphical object-oriented programming language (GraphTalk) under Windows 3.1.

A third style for a structured modeling language -- the first two being hierarchical text-plus-tables and graphical -- is object-oriented. Gagliardi and Spera [1995a, 1997] spell out in detail one possible view of the relationship between structured modeling and the object-oriented approach. They then detail the consistency of their object-oriented model definition language, BLOOMS, with each of the core concepts of structured modeling. BLOOMS, which was influenced by Eiffel, is used by the model management system they are developing. For different approaches to object-oriented model representation based on structured modeling, see LeClaire, Sharda and Suh [1991], Lenard [1993a], Dolk and Ackroyd [1995], and Ma, Tian and Zhou [1998].

These three do not exhaust the possible styles for defining structured models. For example, Chari and Krishnan [1993] have proposed a language called LSM based on logic. One of the nice features of LSM is that, by treating elements and structures as objects, it can predicate information about models as well as express the models themselves (e.g., information concerning model history, assumptions, dimensional units, revision, integration, and validation). See Hua and Kimbrough [1997] for related work. The paper by Ma, Tian and Zhou cited above also uses logic (an action logic from artificial intelligence).

The ideal language for defining structured models yet remains to be designed. I believe that all of the principal language styles are worthy of continued attention, and that language evolution would be aided greatly by experimental studies of language learnability and usability in realistic contexts.

So far we have considered only languages for defining structured models (sometimes called modeling-in-the-small). It is also necessary to have a language, or some other means, for manipulating structured models according to the purposes of users (sometimes called modeling-in-the-large). FW/SM takes a menu selection approach to this, with accompanying dialog as seems appropriate for each kind of manipulation. An alternative is to design a model manipulation language for users. Thoughtful studies of such languages in the structured modeling context include those by Dolk [1988b], Dolk and Kottemann [1993], Kottemann and Dolk [1992], Lin [1993], Muhanna [1993] and Tsai [1991].

In my view, model manipulation languages should be founded on a rigorous semantic framework. If that framework can be cast as an extension of structured modeling's, then the most natural approach to the design of a model manipulation language may be to cast it as an extension of one of the languages for defining structured models.

3. MODEL INTEGRATION

Model integration is becoming ever more important in practice as a result of several industry trends. Some of these were identified in Geoffrion and Powers [1995] in the special context of logistics, but hold more generally in other functional contexts as well: trends toward consolidating previously separate management functions, toward multi-company cooperative arrangements up and down the supply chain, and toward finding new uses for expensive assets. These trends lead to models that are more comprehensive in scope than before, often combining more than one operational, tactical, or strategic purpose.

3.1 What Is Model Integration?

Model integration means different things to different people, so it is appropriate to draw some distinctions here. One such distinction is between what might be called "deep" integration and "functional" integration. (Others have made this same distinction using different terms; for example, Dolk and Kottemann [1993] call it definitional vs. procedural integration.)

Deep integration produces a single new model that combines two or more given models, subject to the important qualification that the new model must be represented in the same definitional formalism as the given models (or in one of the definitional formalisms used by the given models, if they happen to use more than one). Of course, the new model must be well formed if the given models are: the expression of the new model must be formally correct within the definitional formalism used. More than that, the new model should also be as semantically faithful a rendering as possible of the modeler's intentions.

Functional integration, in contrast, does not yield a new model in the same definitional formalism. It leaves the given models as they were and superimposes a computational agenda for coordinating calculations over them, typically directing certain models' outputs to other models' inputs while specifying the order of computations (portions of which may be left to automatic resolution at run-time). That agenda, which of course must be defined formally, serves as the (only) definition of the functional integration. Usually it is expressed in a model interconnection language. (Dolk and Kottemann [1993] use the term "model integration control language", and Muhanna and Pick [1988] propose a subsequently implemented "model description language" of this sort for composite models.)
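A minimal sketch of such an agenda follows. The two toy component models are hypothetical stand-ins, in the spirit of the demand-forecasting-plus-transportation example discussed below, and the agenda is simply a script that routes one model's output into the other's input; the given models themselves are untouched.

def forecast_model(history):                      # given model 1: average demand by region
    return {region: sum(obs) / len(obs) for region, obs in history.items()}

def transportation_model(demand, unit_cost):      # given model 2: consumes demand as input
    # Stand-in for a real solver call: ship each region's demand from its cheapest source.
    plan = {}
    for region, qty in demand.items():
        source = min(unit_cost, key=lambda s: unit_cost[s][region])
        plan[(source, region)] = qty
    return plan

def agenda(history, unit_cost):                   # the functional integration itself
    demand = forecast_model(history)              # step 1 ...
    return transportation_model(demand, unit_cost)   # ... feeds step 2

plan = agenda({"east": [10, 12, 14], "west": [7, 9]},
              {"plant1": {"east": 2.0, "west": 5.0},
               "plant2": {"east": 4.0, "west": 3.0}})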

Four detailed examples of deep integration can be found in Geoffrion [1992c], namely a four-submodel corporate model originally due to Blanning, demand forecasting plus transportation, EOQ plus transportation, and two simple transportation models that unite to form a two-echelon transshipment model. In all cases, the given and final integrated models are defined in SML. Examples of functional integration are given by Dolk and Kottemann [1993], Kottemann and Dolk [1992], and Muhanna [1993]; they address some of the above models, and a collection of models that together form a product mix model.

Not only is it important to make the deep vs. functional integration distinction, but there are also important distinctions to be made depending on just what is being integrated. As explained at length in Geoffrion [1989b], there is a natural four-level hierarchy of model abstraction: model instance, model class, modeling paradigm, and modeling tradition. For instance, a numerical example of the classical transportation model can be viewed as a model instance within the model class "Hitchcock-Koopmans transportation model" within the "network flow" modeling paradigm within the MS/OR modeling tradition. As explained in the paper, other modeling traditions containing pertinent modeling paradigms include database management and artificial intelligence.

This hierarchy suggests 10 possible types of integration, most of which have been studied in the literature (here "two" really means "two or more"):

  1. Join two modeling traditions
  2. Join two modeling paradigms from a single modeling tradition
  3. Join two modeling paradigms from different modeling traditions
  4. Join two model classes from a single modeling paradigm (from a single modeling tradition)
  5. Join two model classes from different modeling paradigms from a single modeling tradition
  6. Join two model classes from different modeling paradigms from different modeling traditions
  7. Join two model instances from the same model class (from a single modeling paradigm and tradition)
  8. Join two model instances from different model classes from a single modeling paradigm (and tradition)
  9. Join two model instances from different model classes from different modeling paradigms from a single modeling tradition
  10. Join two model instances from different model classes from different modeling paradigms from different modeling traditions.

Many examples are given in Geoffrion [1989b]. Some comments concerning these different kinds of integration are in order:

Types 1-3, which involve hybrids of modeling traditions and paradigms, are poorly defined and ought not to be considered until the other types of integration are better understood.

Evidently 4 < 5 < 6 and 7 < 8 < 9 < 10, where "<" means "is easier to study". It seems prudent to let this partial order guide the allocation of research effort.

Because functional integration does not need to reconcile different model classes or model instances as deep integration does, functional integration is better able to cope with the relatively greater difficulty of the integration types that come late in the partial orders just mentioned. Consequently, functional integration is likely to be more practical than deep integration in situations that call for the more difficult types of integration (Dolk and Kottemann [1993]).

Types 8, 9, and 10 may not be very meaningful because the resulting model instance may not have an obvious model class, a deprivation that would rob it of much of its usefulness because model classes, rather than model instances, are the essential objects of interest in most model-based work (see pp. 63-64 of Geoffrion [1992a]). So it is prudent to replace the study of types 8, 9, and 10 with the study of types 4, 5, and 6 respectively.

It follows that model integration, in its present immature stage, ought to focus on types 4, 5, and 6 in that order, and also type 7. All four of these types are discussed in Geoffrion [1989b]. Moreover, type 6 may be hopeless to tackle until there is sufficient progress on types 4 and 5. And type 7, which corresponds to what is sometimes called "consolidation" or "aggregation", seems a good deal easier to cope with than any other type. Consequently, in what follows, attention is confined to types 4 and 5. We remark that the corporate and two-echelon transshipment examples of Geoffrion [1992c] are of type 4, while the other two examples are of type 5.

Now that we have a better idea of the meaning of model integration -- deep and functional integration of types 4 and 5, correct semantically as well as syntactically -- we can survey what has been done within the structured modeling context.

3.2 Deep Integration

My general appraisal of how structured modeling's semantic framework (without necessarily committing to SML as the modeling language) does or could support deep integration appears in Geoffrion [1989b]. For example, it seems clear that structured modeling's unusually broad scope of application mitigates the difficulties encountered when progressing from integration type 4 to 5 to 6, or from type 7 to 8 to 9 to 10. Having a lingua franca simplifies model integration, just as having a common spoken language simplifies coordination in a multi-lingual group, a point echoed by Dolk and Kottemann [1993].

Geoffrion [1992c] proposes a procedure for manual deep integration of types 4-6 when the given model classes are expressed in SML. That paper illustrates the procedure in complete detail for the four examples mentioned earlier, and discusses prospects for automation. The conclusion is reached that, while portions of the procedure can be automated or assisted, deep integration in general is too challenging to automate completely any time soon. Even formal correctness seems difficult to automate, given the intricate nature of SML (a consequence of SML's semantic richness). Since semantic fidelity is the most challenging aspect of deep integration, automating deep integration for SML is correspondingly challenging, and the paper expands on this point in some detail.

Further comments are in order on the difficulty of preserving formal correctness when editing SML in the course of deep integration. Here formal correctness refers not only to SML's context-free syntax, but also to its context-sensitive semantics as expressed in its so-called schema properties.

Tsai [1988, 1998] obtains partial results on the precise ways in which certain important kinds of schema edits can disrupt the formal correctness of an SML schema. Considering the amount of effort needed to produce those partial results, the prospect of extending them to the point of completeness is daunting. Yet, surely it is necessary to understand precisely the potential vulnerabilities of a schema to the kinds of edits needed for deep integration. Or is it?

A partial way out may be to rely on a language-directed editor that knows all of SML's rules -- context-sensitive schema properties as well as context-free syntax. Then when a schema edit is made, possible damage would be checked automatically and the modeler would be prompted to make any necessary repairs. It turns out that such an editor has already been built by Vicuña [1990]. His dissertation uses attribute grammar equations to completely formalize all the rules of SML, and demonstrates the feasibility of its approach by implementing a complete language-directed editor for SML.

Another possible way out is to study the effect of schema edits without assuming SML or any other particular language in support of structured modeling's semantic framework. Then there are no language rules at all to worry about, although there is still the integrity of the schema to worry about according to the formal definitions of Geoffrion [1989a]. This is the approach of Gagliardi and Spera [1995b, 1995c]. They formalize some of the mechanics of type 4-6 deep integration in terms of the formal structured modeling framework, using their own object-oriented language BLOOMS rather than SML when necessary to present examples taken from Geoffrion [1992c].

Preserving the modeler's semantic intent during deep integration beyond what is defined formally for the given models is more difficult because this intent cannot be characterized by any application-independent set of rules. This is where the most severe difficulties lie for all definitional formalisms.

One aspect of semantic intent is captured by typing and units of measurement. If the definitional formalism does not incorporate these, editing a model can easily produce all sorts of dimensional inconsistencies. But if they are incorporated, then consistency can be enforced as explained by Bradley and Clemence [1988]. SML does not incorporate typing and units of measurement, but clearly it could be modified to do so. This would enable one aspect of semantic intent to be formalized and therefore to become supportable by software.

If explicit typing and units of measurement can render one aspect of semantic intent software-supportable, can the same be done for other aspects? Identifying such opportunities and finding ways to exploit them is an important research topic.

3.3 Functional Integration

We turn now from deep to functional integration, where the given models are largely left intact and are assembled into larger structures by means of a model interconnection language. The interconnection language proposed by Muhanna and Pick [1988], called MDL (see also Muhanna [1993]), has already been mentioned.

Bradley and Clemence [1988] sketch a simple, formal language useful for functional integration of types 4-6, although it can be viewed also as a kind of macro language for deep integration. Of course, it provides for the use of typing and units of measurement to check the consistency of the integrated model. The key "library unit" concept is quite close to the concept of a "module" in structured modeling.

Kottemann and Dolk [1992] focus on the situation where multiple given models are to be used computationally in a coordinated way while preserving their identity. They sketch a model interconnection language for such situations based on the idea of communicating sequential processes. The approach has a strongly procedural, rather than declarative, character and can be viewed as naturally from the viewpoint of model manipulation as from that of model integration.

Dolk and Kottemann [1993] discuss functional integration, including a tantalizing glimpse of a model interconnection language based on SML. They also discuss deep integration and the virtues of the object-oriented paradigm for implementing model integration, including a brief description of "Communicating Structured Models" based on communicating sequential processes.

The final point of that paper may offer the most profound research challenge associated with functional integration. The authors point out that extant model interconnection languages all have been engineered without the benefit of an explicit, rigorous semantic framework, just as nearly all modeling languages have been. (SML and other languages for structured modeling are exceptional in this regard; see pp. 67-68 of Geoffrion [1992a].) Can such a framework be devised for model interconnection languages? If so, it should permit designing languages for functional integration that are superior to present ones.

Structured modeling's semantic framework (Geoffrion [1989a]) provides a plausible point of departure.

4. SIMULATION

Structured modeling was not designed for simulation, but that has not been a deterrent: more than a dozen papers have been written in this area. Some are evaluations of SML's suitability for simulation applications, while others are attempts to improve it for such purposes.

The consensus of the evaluative papers is that SML needs to become more expressive if it is to be useful for simulation, especially for discrete event simulation (DES). For example, Derrick [1988] examines 13 conceptual frameworks applicable to DES using a traffic intersection model as a common frame of reference. He concludes that although structured modeling (as embodied by SML) accommodates the static structure of a simulation model, it does not accommodate dynamic structure well. The same conclusion emerged for four other frameworks: the Entity-Relationship, Entity-Attribute-Set, object-oriented, and process graph method approaches.

My own evaluation of SML (Geoffrion [1989d]) takes an example-based look at three of the main concepts characteristic of simulation: (a) random variables and stochastic processes, (b) dynamic behavior rules describing the behavior of a system over time, and (c) the notion of an experimental plan addressing system behavior over time or over repeated trials. The examples are: Normal random variables and Poisson arrival processes for (a), the D/D/1 FCFS queueing system for (b), and Monte Carlo simulation of a simple structural mechanics problem, of the classical newsboy problem, and of critical path length for (c). In each case, the main SML modeling options are detailed.

My main conclusion regarding (a) and (b) is that much can be done within SML to deal with them, but that the complexity of the resulting models and/or the burden on the solver and its companion programs can easily become excessive if SML is not extended (e.g., to include random-valued attributes). The conclusion regarding (c) is that much could be done with a good model manipulation language designed to work in concert with SML or an extended version thereof. The paper advocates the design and implementation of such a language with certain capabilities, a project that I still believe would be very worthwhile.

Now we come to the efforts of others to improve SML's applicability to simulation. This always involves modifying structured modeling's semantic framework and SML, most obviously by allowing selected attribute elements to be random-valued (generated by known probability distributions). The first serious discussion of this extension appears in Maturana [1987], and there is work yet to be done to make it fully rigorous. One likely way to do so, proposed informally by my colleague John Mamer, would be to incorporate into the core concepts of structured modeling's semantic framework a sample space with a probability measure. This would yield statistical dependence of attribute and function element values as a consequence of the measure properties of the sample space and the structure of the model. A number of technical points need to be settled in connection with this setup, such as the assumptions (if any) needed to justify computing moments of element values by, in effect, repeatedly instantiating and evaluating a structured model that is "A-partially specified" (in the jargon of Geoffrion [1989a]). Then one would need to study how to perform such calculations efficiently in the simulation context.
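The following sketch conveys the flavor of that repeated instantiate-and-evaluate computation, using the classical newsboy model mentioned above; the demand distribution and the parameters are purely illustrative.

import random, statistics

def newsboy_profit(order_qty, demand, price=2.5, cost=1.0):
    # The model's function element: profit for a chosen order quantity and a realized demand.
    return price * min(order_qty, demand) - cost * order_qty

def monte_carlo(order_qty, trials=10_000, seed=0):
    # Repeatedly instantiate the random-valued attribute (demand) and evaluate the model.
    rng = random.Random(seed)
    samples = [newsboy_profit(order_qty, rng.randint(50, 150)) for _ in range(trials)]
    return statistics.mean(samples), statistics.stdev(samples)

mean_profit, sd_profit = monte_carlo(order_qty=100)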

Two adaptations of structured modeling to DES are particularly noteworthy. See also Ma, Tian and Zhou [1998], which adds a logical formalism to SM for the purpose of describing dynamics.

Lenard [1992] sketches three DES-motivated extensions of SML, actually new kinds of elements: random attributes (see above), actions (which describe state transitions), and transactions (used to describe complex events in terms of a sequence of previously defined actions and transactions). Lenard [1993b] describes a prototype model management system based on these ideas that was implemented in a database environment (ORACLE 6.0). The database schema is not model-specific as in SML, but rather is fixed for all models. In addition, major restrictions are placed on SML. The implementation (developed under contract to the U.S. Coast Guard) makes extensive use of the ORACLE tools SQL*Forms and SQL*Menu; in particular, most user interaction is through forms selected from a set of pop-down menus. Among the system's features is the ability to convert extended structured models to SIMSCRIPT II.5 code. The code generation is done entirely in SQL*Plus (ORACLE's version of the standard query language, SQL).

The approach taken by Pollatschek [1995] is quite different. It is elegant, powerful, and potentially applicable to domains other than DES. There is one extension of SML beyond random attributes, namely the addition of units of measurement (discussed in Section 2), and SML is not otherwise restricted. The crux of the paper, however, lies not in these extensions, although they enable it, but rather in the idea of beginning the SML schema of every simulation model with a standard module (the same for all DES models) containing primitive and compound entities that essentially define a new simulation worldview proposed by the author.

This worldview, which of course must be obeyed by the balance of the model, contains semantics beyond the representational power of SML. These semantics can be used to build a processor capable of reading any such schema and producing simulation code, in a standard target simulation language like Simscript, that gives the dynamic behavior intended by the modeler. The translation process makes extensive use of the definitional dependency structure that is inherent in SML. The resulting code can then be fed to the target language processor, which serves as the solver.

The approach is a powerful one because the schema processor can work with richer semantics than are formally expressed in the schema itself. The extra richness comes by prior agreement of all concerned to accept the new simulation worldview, which regards simulation as involving model entity "associations" that exist in time and that follow one another according to certain rules. This worldview seems straightforward and natural, but needs to face the test of application in a variety of situations. The issue of generality and a pilot implementation are on the agenda for future work.

Pollatschek's approach abounds with intriguing issues for investigation, including its adaptation to domains other than DES.

5. IMPLEMENTATION STRATEGIES AND TECHNOLOGIES

As mentioned earlier, a comprehensive foundation for structured modeling has been laid on which future production prototypes can be built. Toward that end, it is now appropriate for research emphasis to be placed on implementation strategies and technologies. We discuss three topics in this vein: host software, semantic formalization and language-directed editors, and language-based customization of specialized application packages.

5.1 Host Software

There are two basic ways to build new modeling environment implementations: program them from the ground up, incorporating ready-made components where possible and appropriate, or build them on top of existing software that already provides much functionality of value to users. Because of the very wide variety of functionalities required by modeling environment users over the typical modeling life-cycle (see, e.g., Geoffrion [1989b], [1989c]), the second approach seems much more practical than the first. Hence we consider now some of the possibilities for host software: integrated software suites, relational database systems, extensible database systems, and object-oriented database systems.

FW/SM's host was Framework IV, a DOS-based integrated personal productivity package outstanding in its day but now dated. At the time of this writing, an implementation in the spirit of FW/SM would likely choose as its host an integrated software suite such as Microsoft Office Professional, which offers spreadsheet, word processing, presentation graphics, SQL database access and management, electronic mail services, and more under Windows. Such suites offer good user interface consistency and strong interprocess communication capabilities through dynamic data exchange, object linking and embedding, and other standards set forth in Microsoft's Windows Open Services Architecture. Excellent development tools are available, as are many compatible programs by vendors active in the Windows market.

No comprehensive implementation hosted by an integrated software suite exists today, although the GESCOPP project made progress in this direction.

Another attractive host is a good relational database management system (RDBMS). There are three major structured modeling implementations of this sort (see also the proposal by Dolk [1988a]). One, Lenard's ORACLE-based system for simulation, already has been mentioned.

The second is DAMS (Decision and Algebraic Management System), built on top of Quadbase/SQL and the POET object-oriented DBMS; see Ramirez [1995]. DAMS is accessed through sublanguages collectively called SM/DB; one based on structured modeling provides model definition and manipulation, and SQL serves as the data sublanguage. Access can be interactive or from a programming language such as C++. "Mappings" allow multiple models to share tables containing instance data.

The third is an elaborate system catering to statistical databases called OR/SM (ORACLE/Structured Modeling); see Wright et al. [1997]. The subject of a doctoral dissertation (Worobetz [1992]), OR/SM was implemented in C, with ORACLE (including SQL*Forms, SQL*Menu, and SQL*Plus) as the host system. Models are composed in Microsoft Word using SML extended to include vector-valued generic rules, with parts of level 4 SML omitted and with minor modifications to exploit SQL's potential for efficient evaluation and retrieval. The schema and data are input through user interaction within a completely menu-driven system. Statistical data analysis, also menu-driven, is carried out via a largely transparent PC-SAS interface, and a SAS/OR interface handles optimization models. OR/SM is extensively documented (Kang et al. [1997]), and has been used successfully by many students.

See also the RDBMS-based implementation of Heavey and Brown [1997], which incorporates two SML extensions and was applied to queueing network models in manufacturing.

An intriguing advantage of building a structured modeling environment on top of a RDBMS is the possibility of using its query engine to carry out the evaluation operation, that is, for evaluating function and test element values. This operation, in the extended sense that includes computing explicit index sets whose populations are given by formula, is a crucial operation in computer-based modeling environments having algebraic executable modeling languages (EMLs). It is needed to answer even the simplest user requests for basic computations on models, and also for such purposes as loader/editors that understand model structure, model debuggers, solver interfaces, functional style query languages, and report writers.
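A small sketch of the general idea follows, using the feedmix detail tables with SQLite standing in for a full RDBMS host (the table layout is simplified relative to Figure 2): NLEVEL and TOTCOST become join-and-aggregate queries.

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE analysis (nutr TEXT, material TEXT, analysis REAL);
    CREATE TABLE q        (material TEXT, q REAL);
    CREATE TABLE ucost    (material TEXT, ucost REAL);
""")
db.executemany("INSERT INTO analysis VALUES (?,?,?)",
               [("P", "std", 4.0), ("P", "add", 14.0), ("C", "std", 2.0), ("C", "add", 1.0)])
db.executemany("INSERT INTO q VALUES (?,?)", [("std", 2.0), ("add", 0.5)])
db.executemany("INSERT INTO ucost VALUES (?,?)", [("std", 1.2), ("add", 3.0)])

# NLEVEL: @SUMm (ANALYSISim * Qm), one row per NUTRIENT
nlevel = db.execute("""
    SELECT a.nutr, SUM(a.analysis * q.q)
    FROM analysis a JOIN q ON a.material = q.material
    GROUP BY a.nutr
""").fetchall()

# TOTCOST: @SUMm (UCOSTm * Qm)
totcost = db.execute(
    "SELECT SUM(u.ucost * q.q) FROM ucost u JOIN q ON u.material = q.material"
).fetchone()[0]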

Although every structured modeling implementation that chooses a RDBMS host makes extensive use of the query engine, none yet does so in a way that fully exploits the potential efficiency for evaluation inherent in the elaborate query optimization implemented by such engines. Neustadter [1994] points the way to changing that, not only for structured modeling environments, but also for modeling environments that choose algebraic EMLs other than SML. She defines a generic data model that is representative of algebraic EMLs, defines a compatible extended relational data model, and views EML expression evaluation as queries over data tables associated with this extended relational model. From this viewpoint she develops and proves correct an expression translation algorithm from the first data model to the second that eventually should enable evaluation to be carried out efficiently for large models through incremental and optimally organized computations much as in RDBMS query optimization. She checks the framework in detail for applicability to two contemporary EMLs, namely AMPL and SML. See also Lin [1993] for a related approach.

Useful as they are, relational DBMSs nevertheless have their limitations as modeling environment hosts. These limitations have been studied by Desai [1992] for extensible database systems and by Huh [1992] for object-oriented database systems. Each examines in detail the potential of these new types of database systems as hosts for a modeling environment that supports algebraic modeling languages such as SML.

Desai formalizes the essence of what it means for a database system to be "extensible" and develops a suitable data model within this formalization consisting of data structures, operators, and structural constraints. She gives attention to data access strategies for good performance and to version management, and reports on a partial implementation using the experimental system EXODUS. She also gives a comparison of extensible versus object-oriented database systems as platforms for modeling environments.

Huh's concern is with object-oriented modeling environments that simultaneously support multiple algebraic modeling languages. In particular, AMPL, GAMS, and SML are considered in detail. In addition to models in the traditional sense, defined in an object-oriented version of the input-output style of the systems approach, Huh also postulates "functional models" that encapsulate solver functionality. He has completed a prototype implementation using C++ and ObjectStore on a Sun-4. Huh [1993], Huh and Chung [1995] and Huh, Kim and Chung [1998] describe related ObjectStore prototypes incorporating SM.

See also the work of Davis, Srinivasan and Sundaram [1995], [1998] on implementing structured modeling using the object-relational DBMS Illustra, the commercial version of POSTGRES. The work of Ruark [1998], also object-oriented, is promising in a different way.

Thus, it is clear that a variety of attractive host software platforms are available to implementors of new structured modeling environments. Much could be done to adapt and exploit their potential.

5.2 Semantic Formalization and Language-Directed Editors

Vicuña [1990] wrote a dissertation on semantic formalization. He observed that most existing modeling languages for mathematical programming have semantics that are only incompletely formalized. This situation -- which he studied in detail for AMPL, GAMS, and LINGO -- inhibits efforts to achieve a high level of automation for diagnosing errors and generating major components of a computer-based modeling environment (e.g., language-directed editors, type inference tools, and immediate expression evaluators). He demonstrated that attribute grammars furnish a suitable declarative formalism with which to specify the semantics of SML, and a similar development should be possible for most other mathematical modeling languages. As mentioned earlier in our discussion of model integration, Vicuña demonstrated the feasibility of his approach with a fully operational language-directed editor for SML on a UNIX workstation. Moreover, he addressed both the automatic detection of missing language constructs and the immediate evaluation of numerical and logical expressions.
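A full attribute-grammar treatment is well beyond a short example, but the flavor of checking a context-sensitive schema property can be conveyed with a toy validator. The property checked below, that a genus may call only genera declared earlier in the schema, paraphrases SML's requirement that definitional dependencies run only backward; the code is my own illustration, not Vicuña's.

def check_definitional_order(schema):
    # schema: ordered list of (genus_name, list_of_called_genera) pairs.
    declared, problems = set(), []
    for genus, calls in schema:
        for callee in calls:
            if callee not in declared:
                problems.append(f"{genus} calls {callee}, which is not declared earlier")
        declared.add(genus)
    return problems

feedmix = [("NUTR", []), ("MIN", ["NUTR"]), ("MATERIAL", []), ("UCOST", ["MATERIAL"]),
           ("ANALYSIS", ["NUTR", "MATERIAL"]), ("Q", ["MATERIAL"]),
           ("NLEVEL", ["ANALYSIS", "Q"]), ("T:NLEVEL", ["NLEVEL", "MIN"]),
           ("TOTCOST", ["UCOST", "Q"])]
assert check_definitional_order(feedmix) == []    # the unedited feedmix schema passes

A language-directed editor would run such checks (and many others) after every edit and prompt the modeler to repair any damage, as discussed in Section 3.2.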

Language-directed editors are possible not only for text-based languages, but for other types as well. For example, see Jones [1992], Chari and Sen [1997], [1998], Hamacher [1995], and Yeo [1997] in the context of graphical languages for SM. Such editors deserve further development, as they are one of the keys to broader acceptance of modeling tools.

5.3 Language-Based Customization of Specialized Application Packages

Almost every real application of optimization requires a model that is unique in some ways. Either a compatible special-purpose optimizer is used, or a general-purpose optimizer is used -- possibly in a manner that exploits special model structure. The skilled practitioner usually can see many other similar applications in the target organization, and in other organizations, that could be solved similarly. But doing so typically would require customizing the software to some degree, and this can be very time-consuming. The point of this topic is to suggest how customization can be accomplished much more quickly (Geoffrion [1992b]).

The starting point is to recognize that the type of customization required to move from one application of a given kind to another of the same kind is quite different from the type of customization required to move to a different kind of application. The difference is that the first type of customization involves only what might be called the model's "surface" structure, whereas the second involves what might be called "core" structure. A model's core structure is whatever aspects of its mathematical structure truly impact a particular optimizer's ability to solve it; everything else is surface structure.

This distinction can be clarified by a familiar example, namely the classical transportation model. In its usual simple form, this model consists almost entirely of core structure. But if embellished with new model features that use additional data to calculate the capacities of the origins, the demands of the destinations, and the magnitudes of the unit shipment costs (perhaps as weighted averages of different freight rates), then all of the new features would be surface structure.
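The sketch below renders this division of labor schematically. The core interface is a stub standing in for a hand-crafted, efficient solver interface that presumes only the classical transportation structure; the surface-structure functions (weighted-average freight rates and computed capacities) are hypothetical embellishments of the kind just described and can be revised freely without touching the core.

def core_transportation_interface(supply, demand, unit_cost):
    # Hand-crafted once for the core structure; reused unchanged across customized variants.
    ...                                            # builds the optimizer's input and invokes it

def surface_unit_cost(freight_rates, weights):
    # Unit shipment costs as weighted averages of several freight rate schedules.
    lanes = next(iter(freight_rates.values()))
    return {lane: sum(w * freight_rates[r][lane] for r, w in weights.items()) for lane in lanes}

def surface_capacity(base_capacity, downtime_fraction):
    # Origin capacities computed from additional data.
    return {o: cap * (1 - downtime_fraction[o]) for o, cap in base_capacity.items()}

# Customization path: regenerate the surface inputs, reuse the same core interface, e.g.
# plan = core_transportation_interface(surface_capacity(caps, down), demand,
#                                      surface_unit_cost(rates, weights))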

It turns out that surface structure can be customized relatively easily with modeling language-based tools, while this is much less true for core structure modification at the present state of the art. Consequently, I propose a new way to build specialized application packages when execution efficiency is an issue and customized variants are of interest.

In particular, I propose that the interface to the optimizer should be hand crafted for efficiency in such a way that it presumes knowledge of core structure but not knowledge of surface structure. I further propose using SML (or some other model definition language for structured modeling) to describe surface structure, so that SML-based tools can be used to provide many of the functionalities needed for the application at hand. Many such tools are a part of the UCLA prototype FW/SM, as described in Geoffrion [1991].

If this implementation strategy works as intended, then it should be much easier than usual to improve a given application over time as opportunities to improve surface structure become evident (as they always do) and as changes are mandated by external events. It should also be much easier than usual to create new but similar (i.e., same core structure) applications, because the labor-intensive part of the implementation need not be redone.

In both cases, the advantage of this strategy is that it helps to fulfill this dictum of a good implementation: The most volatile aspects of a model should be the easiest to change. [In order of diminishing volatility, the components of a model may be taken as (1) data values, (2) index set contents, (3) surface structure, and (4) core structure. Proper use of database technology can render (1) and (2) easy to change, and the point of the proposed strategy is to make (3) easy to change also.] Moreover, one of the larger difficulties with most of today's modeling languages for mathematical programming will be neatly skirted, namely their inability to automate properly the model/solver interface when special model structure needs to be exploited for efficiency's sake. Finally, one may hope that this strategy will improve the application economics of the many highly specialized algorithms that populate the journals, to the extent that some will become viable in practice for the first time.

Appendix A of Geoffrion [1992b] further clarifies the distinction between core and surface structure, and gives additional details on this implementation strategy. There are many ways in which research and development efforts could be based on this strategy. One such is described by Gazmuri, Maturana, and coworkers in Chile (see [1993] and the more recent references cited therein), whose large project is trying out these ideas together with others in the context of real production and distribution applications in manufacturing. There is also a similar but smaller sponsored project involving some of the same people.

Our discussion of this topic has been limited to models where the key solver is an optimizer, but the strategy suggested here should be just as applicable to other kinds of solvers.

6. CONCLUSION

The brief overview of structured modeling given in Section 1 presents but a superficial glimpse of the foundations reported at length in Geoffrion [1989a], [1991], [1992a].

The remaining sections review and comment on four main categories of structured modeling research topics, including open opportunities. Section 2 covers possible improvements in SML and alternative languages for defining structured models. Section 3 deals with model integration, after distinguishing and discussing several kinds of integration. Section 4 covers extensions and adaptations for discrete event simulation. Section 5 includes three topical areas relating to implementation strategies and technologies: the leading candidates for host software upon which to erect future structured modeling environments, language-directed editors, and an implementation strategy that may improve the ease with which existing applications can be maintained and improved, and with which closely related applications might be produced quickly once the first one has been implemented.

This article covers fewer than a third of the references given in the latest annotated bibliography on structured modeling (Geoffrion [1999]). Readers will find there many additional citations, details, and important topics not discussed here.

In closing, we note that the existing body of work on structured modeling has prepared the way not only for additional basic research, applied research, and prototype development, but also for real applications. Some studies already have been done to assess the applicability of structured modeling in actual companies, including an integrated oil company in Brazil (Hamacher [1991]), a tire manufacturer in Korea (Park, Kang and Kim [1992]), the largest Korean steel company, an industrial gases company, and a telephone company. In addition, the GESCOPP project involves applications at two Chilean manufacturing firms: a candy manufacturer (Maturana and Eterovic [1995]) and the largest appliance manufacturer (Maturana, Gazmuri, and Villena [1999]).

The time is ripe for pilot applications to become the guiding force behind university-based studies.