a
 

Proceedings of the Information Systems Foundations Workshop

Ontology, Semiotics and Practice 1999


 

An Information Theory Formalisation and the BWW Ontology.

 

C N G (Kit). Dampney and M S J Johnson

Department of Computing, Macquarie University

E-Mail: cdampney@ics.mq.edu.au;   mike@ics.mq.edu.au

Abstract

The purpose of this paper is to outline a formalisation for information system representation.  It codifies a slightly amended BWW ontology and explains the inter-relations between a number of Unified Modelling Language (UML) object oriented methods.  The formalisation is based on the formalism of category theory and the very general idea that objects and the way they relate to each other is the basis for our understanding of reality.

Category Theory is an abstract mathematics aimed at universality.  It is used as a foundation for mathematics.  Evidence from its application suggests it codifies the innate abstractions used by us to construct understanding of our reality.  We find and confirm a strong alignment of category theory with ontology that seeks the essential underlying constructs of nature.

The formalisation unifies the semantics of the statics, dynamics and composition of information constructs.  It captures similar constructs to BWW ontology for information systems representation.  The formalisation explains the inter-relations between various methods applied within the Unified Modelling language (UML) widely used for systems analysis and design.

Existence dependency between objects in reality is identified as an essential primitive construct for information.  A new way of composing objects is presented.  It is based on category theory fibration that preserves consistency of existence dependence.

Key words

formalisation, category theory, ontology, BWW ontology, information models, information structures, information systems, existence, dependency, consistent dependency, UML (Unified Modelling Language), representation

INFORMATION CONSTRUCTS

Information constructs have been developed in various approaches such as the entity relationship (ER) and object oriented (OO) methods to model information

Information constructs and the BWW Ontology

An information model is deemed (BWW)-ontologically sound and complete if its constructs correspond to the BWW constructs (Weber 1997; Wand and Weber, 1995; Colomb and Weber, 1998) as represented by the BWW representational model. 

The phrase "as represented by" reveals something deeper.  Two of the BWW "models", representation and state-tracking, described by Weber (1977) are not information models, but ways of comparing information modelling approacheswith the BWW ontology.   Weber's third model, the (good) compositional model, highlights the important, but usually neglected, composition construct for information models.

Weber thus identifies three aspects of information systems that are also the subject of this paper.  They are:

·        The static aspect of information represented by static information models concerning the conceptual structure of reality and data models represented in database sschemas

·        The dynamic aspect of information represneted in dynamic information models, often state-transition models, and

·        The compositional construct that we represent as a mapping between static information models.

Weber and Wand based their work on Bunge’s ontology concerning on the nature of things in general.  In essence the BWW ontology seeks to identify the basic constructs we use to understand nature.

A specific instance of a reality is revealed to us as the Real world extant and is therefore based on the nature of reality.   Figure 1 presents an informal schema idealising the mapping from the nature of reality to the domain of information constructs.  These are developed in BWW-ontology, in UML and in the formalisation in this paper.

 

Figure 1.  A schema of the nature of reality formalised by information constructs from various approaches.

The Unified Modelling Language (UML).

The Unified Modelling Language (UML) (e.g. Rumbaugh et al 1999) has gained substantial acceptance.  It is an object oriented approach and expressed its constructs in a graphical notation. These constructs are used to develop models for various aspects of information  represented by objects and relationships.  These aspects include in particular:

1) a static model (the class diagram),

2) a dynamic model  (the state model) , and

3) a package diagram specifying a composition of the system.

Other diagrams identify other elements of information and relate aspects to each other, such as:

4) a use case diagram denoting the actions,

5) a collaboration diagram specifying control within the system for each action type, and

6) an activity diagram relating the static models, controls and actions.

Statics, dynamics, and composition are all recognised as essential aspects or construct in the BWW ontology, the UML, and in the formalisation presented in this paper.

Illustration of the application of category theory formalism to INFORMATION

There is a close alignment between the aims of category theory and BWW ontology - they both seek to unify the means of describing, respectively, an abstract and a natural reality.  In this paper we indicate the use of Category Theory in formalising information and use a simple airline reservation system as a working example.

We consider the different formalised aspects of information and the composition construct identified earlier.  In terms of our formalisation they are:

•   the information objects and their relationships (associations),

•   an action-flow diagram specifying how actions flow through state machines specifying the behaviour of individual instances of objects (Dampney, 1998) , and

•   a mapping showing composition / decomposition of objects.

These derive from the core constructs of our formalised information - the object and its dependencies which we now describe.

A simple underlying construct - dependencies.

In information models used in a large sector of industry and commerce, relationships are not given attributes.  This is as defined in UML and as usually practised in entity relationship modelling. Relationships can always be normalised, possibly by adding additional (intersection) objects, so that all relationships are mandatory many-to-one ro mandatory one-to-one.  That is all are normalised to functions.  These functions represent existence dependency.  For example in

B  ––>  A

the existence of the object B depends on the object A.  This is inferred from the definition of a function and the real existence of A and B - an object instance of B cannot exist until the object instance of A on which it depends exists.

We further observe that we can, without loss of generality, constrain dependencies between object instances so they do not change over time once they come into existence.  No loss of generality occurs because we can employ the artifice that a non-fixed dependency B ––> A can be normalised to fixed dependencies by adding an intersection object AB and creating new instances with fixed dependencies in place of the changing dependency.  That is:

B ––> A, is normalised to   B <––< AB ––> A

This allows these now fixed many-to-one relationships to be interpreted as existence dependencies.  They represent the fact that the existence (of an instance) of an entity depends on the existence of the entities it is dependent on.

As the graph of dependencies must be acyclic, the information model can be rearranged to show dependencies similarly aligned (vertically downwards is chosen) so that the dependencies within the model can be more easily understood and interpreted.  It also becomes much simpler to recognise consistent and inconsistent dependencies.

The dependency construct is basic because all relationships of any degree and multiplicity between objects can be normalised into fixed dependencies.   Some would argue that eliminating relationships as objects in their own right looses semantic expressiveness.  Whatever might be the loss, it is compensated by enabling a much more coherent, and thus semantically rich theory of information.

The object dependency diagram does not specify attributes.  In a section below, we examine the BWW ontology and the composition construct.  This shows that the representation of attributes and properties of an object is an issue that can be separated from representing objects and dependencies.  Space permits no more than an indication of what is involved.

Consistent dependency.

Empirical analysis of the structure of relationships within a number of large entity relationship models developed by various practitioners show that these models satisfy consistency according to a rule, we refer to as the "Consistent Dependency Principle".

The details of this principle and its codification are given in [Dampney et al 1993, Dampney and Johnson, 1995].  It can be briefly stated as:

The Consistent Dependency principle
Dependencies are consistent providing role is consistent (and vice-versa).

The consistent dependency principle is intuitively appealing.  It is driven by the idea that the existence of a thing (entity) depends on the existence of something else (another entity).  Where there are multiple paths of dependencies from one dependent entity to a target entity, then the composed dependencies along all paths are the same (equal) unless the target entity is playing different roles.

The consistent dependency principle is thus used to check the semantics of an information model and audit it for compliance.  As exemplified in figure 2 dependencies are expressed as arrows in the formalised static model of information.

Category theory formalism is natural for information models

Category theory is based on graphs with nodes of objects and edges of arrows.  This has consequences which give category theory substantial intuitive appeal for formalising information. The appendix, the axioms of category theory, illustrates that equations in category theory can be expressed using commuting diagrams.

In the list following it is useful to remember that arrows can usually be thought of as functions and that such relationships are endemic in informal information models such as the entity relationship model or the object class diagram.  We use the term existence dependency or just dependency to describe the semantics of these functions.

The consequences include:

•   Implicit in the informal directed graphs used to model information is that arrows compose to form new arrows and the order of composition does not matter. This is the axiom of associativity in Category Theory.

•   Objects in category theory denote entity types in ER models, or object classes in UML, or things and events in BWW ontology (Dampney, 1998). The arrows denote dependencies between objects

•   It has become empirically evident that where an object has more than one dependency path to another object, then the dependencies are equal if the role of the other object is the same for both paths  - see the reservation systems example below.  This is important because the concept of consistent dependency (CD) equates with the formalism of commuting diagram (CD), a property widely exploited in category theory.

•   A model is defined as a homomorphic mapping (functor) from the category of graphs to the category of sets.  Homomorphic means that objects connected by an arrow in the theory map to sets connected by relations so that the compositional structure is the same in both.  The relations must satisfy the categorical axioms of associativity and identity (see Appendix).  A common relation satisfying these axioms are .

• Category theory does not require the concept of elementhood.  This is a major advantage in formalisation for information models.  The internal structure of an object and internal structure between elements can be deferred for later analysis and dealt with separately.

•  Constructs in category theory such as commuting diagram, products, pullbacks and other limits, and the terminal object turn out to be a natural formalisation of constructs we use in informal information models.

•  Category theory extends to multiple dimensions which for information enables dynamics to be expressed within the same formalism (Dampney, 1998) - an important synthesis of statics and dynamics.  [See also Dampney and Johnson, 1993].

A model in category theory.

The reader should carefully note a subtle point in the examples that follow.  Most diagrams represent expressions of information theory in a category using the category theory notation of objects and arrows.  But this same representation also represents a set-theoretic model of the theory (providing the theory is representable in sets - a point we do not need to consider here). Thus the same diagram stands in for:

•   the formal information theory describing the particular reality,

•   the model, mapping from theory to a representation structure, and

•   the representation of the model.

However, not all of the theory can be expressed within just one category because there are various kinds of relationships between the various kinds of objects we recognise in reality.  We think of this as different aspects of reality.  For example, to formalise relationships as arrows in category theory, the relationships must compose governed by semantics that can be expressed by the associativity axiom of category theory.  Different kinds of relationships and objects require different categories to express them.  However, a deeper application of category theory emerges.  We can map between categories using functors which themselves satisfy the category theory axioms over categories.

This gives us a formal reason for having several diagrams that represent models of different aspects of information.  In UML several models of expressions are employed according to the aspects being considered.

In this paper separate diagrams will be used for the static and dynamic aspects of information.  Composition will be shown as a mapping between diagrams

A simple Airline Reservation system.

An example of a static model of information is given in figure 2.  The dependencies in this model are re-oriented in figure 3 to show the dependencies more clearly.

The dependencies to airport are inconsistent because airport is playing two roles (departure airport and arrival airport).  The terminal object is specified by the property that every object must have a unique arrow to it.  The terminal object can be interpreted as representing global properties of the system such as the airline company that has the reservation system.

In the model the concept of reservation has been simplified by requiring that all reservations be part of a group reservation, even if a group reservation is for a group of just one.

Figure 2.  Normalised Information model of a flight seat reservation system annotated to convey meaning easily

Figure 3. Formalised dependency derived from figure 2.  The X denotes inconsistent dependencies. and "1" the terminal object.  This figure represents both theory and model.

The dynamic aspect of formalised information.

As shown in Dampney (1998), the object dependency diagram describing the statics is a starting point for specifying the dynamic behaviour of the system.

Consider the dependencies in the flight seat reservation system, and specifically the action allocate_seat  of allocating a seat to a passenger.  This action depends on all the objects within the particular instance seat_alloc's dependency lattice (figure 4).  Note that this dependency lattice is between object instances, not object types. Thus seat_alloc, by virtue of consistent dependency, depends on just one instance of each and all the objects as shown.

Figure 4.  Dependency lattices for (a)  a seat allocation seat_alloc, and (b) a reservation.

For a seat allocation to be valid, that is for seat_alloc to come into existence, each of the objects under seat_alloc in the dependency lattice must exist and be in a state that permits the seat allocation action to be carried out.  This is formalised as an action flow of allocate_seat originating at the terminal object and flowing up through the entire dependency lattice, layer-by-layer, flowing through each object by an allowed transition or being stopped, leading to seat_alloc coming into existence or not accordingly.

Figure 5.  Informal representation of the components of the communicating multi-channel finite state machines for actions on reservations.

For example if the group reservation had not been paid, and state transitions were specified so that payment was required before a seat could be allocated, then allocate_seat could not flow through group_res.  A similar description applies for reservation. 

Note we form a closed lattice of object instances by distinguishing distinct roles of objects to accommodate inconsistent dependencies, such as customer and passenger persons, and departure and arrival airports.  The lattice of object instances of figure 4 so formed can be mapped, as a consequence of the consistent dependency principle, to the lattice of object (types) and dependencies of figure 3.

The dynamic model is now formalised.  As figure 5 shows, it is a lattice of communicating multi-channel finite state machines.  It derives from Katis et al (1998) who formalised in the formalism of 2-categories Milner's calculus of communicating systems (Milner, 1980, 1990).  In 2-categories, action types are formalised as objects and state transitions as arrows.  The domain and codomain of arrows are objects, some of which are formed as a product of other objects. 

The technical detail of formalising the lattice of communicating finite state machines within the formalism of a  2-category is sophisticated but elegant.

Current computer mechanisms do not directly support data flow, so control must be part of the design.  The design as specified by the UML collaboration diagram for reservation (figure 6) in fact simulates the effect of an action flow.

Figure 6.  Collaboration diagram for the action reserve reservation.

Composition and decomposition of formalised models of information .

Figure 7 modifies figure 3 by leaving out the detail of the terminal object.  It adds in objects to support flights with multiple segments.  A basic flight segment (flight_seg) is a leg of a flight, that is the component of the flight between adjacent airports of the flight path.

The flight_seg* object is an expression of all contiguous flight path combinations of flight segments, including basic flight segments and segments that include other segments.

Figure 7.  Dependency of the Flight Seat Reservation system modified from figure 6 to support multi-segment flights as described in the text.

Figure 8 further extends figure 7 to define how flight segments relate to other flight segments. The flight_seg* object is an expression of all contiguous flight path combinations of flight segments, including basic flight segments and segments that include other segments.  The overall structure of flight_seg* and the flight_seg*lattice_gen happens to represent a lattice structure.  As a consequence, the components of seg*_seat, seg*_seat_alloc, group_res and reservation also form a lattice structure.  The additional rounded boxes define object clusters.  Technically these clusters are (sub-)categories. 

Each flight now depends on a list of airports with roles of (departure, transit, arrival).  LIST (airport) can be generated from airport and also forms a lattice of component lists.

Figure 8.  Figure 7 enhanced as in the text. The rounded boxes denote fibred object clusters.  The crosses denote inconsistent dependency.

Overall figure 8 has consistent dependency except where indicated.  There are some interesting technical details, that qualify this last statement in category theory.  The composed dependencies to flight, flight_seat, flight_group_res, flight_res actually do have consistent dependency, and the consistent dependency to list(airport) is to within the lattice of lists.

Figure 13(a) composes the clusters in figure 12 into new (abstracted) objects.

Figure 9. Functor dependencies between composed objects from the object clusters of figure 8 - (a) the functors, and (b) detail of the group_res to flight functor showing mappings from one instance complex of group_res.

The mapping from figure 8 to figure 9 satisfies the properties of a fibration functor in Category theory.  .From this it can be deduced using category theory that there are also functors between the sub-categories of figure 8.  An indication of the structure of these functors is given in  figure 9(b) by mappings between clustered object instances rearranged into internally connected molecular instances within the inter-related object clusters.  Specifically figure 9(b) shows the mapping of a molecular instance of the composed object GROUP_RES to the molecular instance defining the “1,2" instance of flight_seg* associated via flight_seg component to instances “1” and “2” of flight_seg*  of a flight.

This construct introduces an important new dimension to information modelling.  At the conceptual level, concepts elicited are not necessarily atomic.  What at one level is an object dependency, may be an abstraction of much more detailed simpler dependencies.   The mappings that form each of the simpler dependencies can be rearranged to define a functor.  Functors and functions satisfy the same category theory formalism, and conceptually there is no essential difference.  That is a composed object appears like an object.

Attributes compose with their objects.

The approach to abstraction by composition, means that things and properties as defined in the BWW ontology can be dealt with separately by taking advantage of the property of currying (Curry 1978).

Currying is expressed in category theory by saying that for any arrow  f : A x B ––> C, where A x B is a product of objects, we can define uniquely another arrow
 f' : A ––> [B––>C], where [B––>C] is itself an object.

This enables us to re-cast postulate 2.2* in Weber (1997) concerning representing properties as attributes.  Using a to represent an attribute function, an object E to represent a thing in the BWW ontology and the objects {Vi} to represent a family of (value) sets, the definition of attribute

            a:  E x V1 x  V2 x ... Vn––> {True, False}

can be replaced by       a':  E ––> [V1 x  V2 x ... Vn––>{True, False}],

so         a':  E ––> A     where A is [V1 x  V2 x ... Vn––>{True, False}],

It is useful to recognise here that we do not interpret an arrow like E ––> A  as a dependency.  This is because the object E represents something in reality and does not depend for its existence on the existence of the values that define its attributes.  Alternatively we may think of the values as always existing so that existence dependency to a value is a given.

Figure 10.  Attributes abstracted into their objects.

Thus both the thing (E) and the attributes A' representing the thing's properties in Figure 10(a), can be abstracted (b) to the objects and dependencies (c) via fibration.  The arrows typified by a' map into the identity arrow of the abstracted objects Xi.  Johnson et al (1997) examines the statics and dynamics of queries, attributes and updates in some detail.

Conclusion

The formalisation presented here awaits further experiment to test the theory with the hope of discovering more new principles governing information systems representation.

Weber's monograph (1977) puts forward a semi-formal description relating information systems representation in sets to Bunge's ontology.  There is useful philosophy and the beginning of experimentation to elicit theory and discover principles, but it is only a beginning. 

The category theory mathematics that is indicated in this paper is not easy, but it does lead to a formalisation that appears consistent with the constructs proposed in the BWW ontology and UML.

APPENDIX - THE BASIC AXIOMS OF CATEGORY THEORY

Category theory has become widely accepted as an alternative to set theory for expressing the foundation axioms of all mathematics.  It is being applied increasingly to computing (e.g. Barr and Wells, 1990), and we are finding important information system application by virtue of the natural commuting structures, consistent dependencies, evident in information systems. 

Category theory provides a formal basis for the concepts of transitivity and identity.   It informs us that transitivity and identity can be axiomatised by:

•associatively,                                                                  (3)

•identity                                      (4)

such that these diagrams commute.

Definition of a Commuting diagram: an arrow formed by composing the arrows from a source object to a target object is the same regardless of the path that generates it.  (Trivially a diagram that consists of a single path of arrows commutes.)

These two diagrams expressed in text are:

                                 A <–– D  =  f • (g  •  h )  =  (f  • g ) • h                                           (3')

and                            1A •  f  =  f        ,           f  •  1B  =  f  .                                                  (4')

The concept of an identity is fundamental in information modelling - it expresses the important concept of the identifying attribute, or the identifier.  Transitivity is implied by associativity.

The hidden power of this mathematics of category theory is that the assertions have extraordinarily wide application by virtue of their simplicity, their weakness, and thus their generality.  Because so little is assumed, the mathematics can be applied in many different ways.  It does, however, take significant interpretation to make good use of the mathematics.

Further axioms defining the categorical constructs of product, pullback, other limits, functor, commuting diagram have important and intuitively appealing application to information.

References

Barr, M and C. Wells, 1990.  Category Theory for Computer Scientists.  Prentice Hall.  ISBN 0 13-120486-6

Bunge, M., 1977.  Treatise on Basic Philosophy: Volume 3: Ontology I: The Furniture of the World,  Reidel, Dordrecht, Holland.

Colomb, R.M. and Weber, R., 1998 Completeness and Quality of an Ontology for an Information System.  in N. Guarino (ed.) Formal Ontology in Information Systems (International Conference on Formal Ontology in Information Systems - FOIS’98 - Trento, Italy, 6-8 June, 1998). IOS-Press (Amsterdam, Oxford, Tokyo, Washington, DC) 207-217.

Curry, H.B., 1980.  Some philosophical aspects of combinatorial logic.  In Barwise, J., H.J. Keisler and K. Kunnen (eds.), The Kleene Symposium, 1978.  North Holland, pp. 85-101.

Dampney, C.N.G., 1998.  The event as a fundamental construct of information systems.  Proceedings of Ninth Australasian Conference on Information Systems, 29Sep-2Oct.  University of New South Wales, Sydney,  pp 120-133. ISBN 0 7334 0498 7

Dampney, C N G and R M Colomb, 1994. Semantic Correspondence in Integrating CASE TOOL Repository Schemas.  Journal of Information and Software Technology, Vol 36 No 2, pp87-96.

Dampney C N G, M.S.J Johnson, and G.P. Munro, 1992.  An illustrated mathematical foundation for ERA.  Proceedings of the Institute of Mathematics and Its Applications,  Oxford University Press,  pp 77-84.

Dampney, C N G  and M S J Johnson, 1995.  Application of Consistent Dependency to Corporate and Project Information Models.  Lecture notes in Computer Science , 1021.  pp445--446,  Springer-Verlag.

Dampney C N G,  M S J Johnson, and P. Deuble, 1993.  Taming Complex Information Systems.   Conference on Complex Systems 92, Australian National University, Canberra, December, pp210-222.  in   Green, David G and Terry J Bossomaier (),  Complex Systems: From Biology to Computation.  IOS Press, Amsterdam. ISBN 90 5199 1177

Dampney CNG , MSJ Johnson, P. Dazeley and V. Reich, 1994.   A higher order ``commuting diagram'' structure that supports very large information system data and process architecture,  International Federation for Information Processing Transactions, vol. A54,  pp211--222, North Holland.

Johnson, M.S.J. and  C.N.G. Dampney, 1993,  On the value of commutative diagrams in information modelling.  in Nivat et al (Eds.) Springer Workshops in Computing,, pp47--60, Springer-Verlag, London.

Johnson, M.S.J.,  R. Rosebrugh and R.J. Wood, 1997.  Entity relationship models and sketches, submitted to  Mathematical Structures in Computer Science, July, 1997, 18pp.

Katis, P., N, Sabadini and R.F.C. Walters, 1997a,b.  Span (Graph): A categorical algebra of transition systems.  pp 307-321; Representing place/transition nets in Span(Graph). pp 322-336.  in Johnson, M. (Ed.), 1997.  Algebraic Methodology and Software Technology.  Springer Lecture Notes in Computer Science 1349, 594pp.

Milner, R., 1980.  A Calculus of Communicating Systems. Lecture Notes in Computer Science 92, Springer-Verlag, 171pp.  ISBN: 3-540-10235-3.

Milner, R., 1990.  Operational algebraic semantics of concurrent processes.   in  J. van Leeuwen, (Ed.), 1990. Handbook of Theoretical Computer Science, Elsevier Science Publishers B.V. Chapter 10. pp 1201- 1242.  ISBN:  0444 88074 7.

Rumbaugh, J., I Jacobson, and G Booch, 1999.  The Unified Modelling Language Reference Manual.  Addison-Wesley, 550pp.  ISBN:  0-201-30998-X.

Wand, Y., and R. Weber, 1995.  On the deep structure of information systems.  Information systems Journal, no5, vol 3, pp203-223.

Weber, R., 1997.  Ontological Foundations of Information Systems.  Coopers & Lybrand Accounting Research Methodology Monograph No 4., ISSN 1321-2605.

COPYRIGHT

C.N.G. Dampney and M.S.J. Johnson (c) 1999. The authors assign to the IS Foundations Workshop held at the Department of Computing, Macquarie University on Wednesday 29 September 1999,  and to educational and non-profit institutions, a non-exclusive licence to use this document for personal use and in courses of instruction provided that the article is used in full and this copyright statement is reproduced.  The authors also grant a non-exclusive licence to the IS Foundations Workshop to publish this document in full in the Workshop Proceedings. Those documents may be published on the World Wide Web, CD-ROM, in printed form, and on mirror sites on the World Wide Web. Any other usage is prohibited without the express permission of the authors.


 


Proceedings of Information Systems Foundations Workshop 1999
Department of Computing, ICS

Macquarie University
CNG (Kit) Dampney - Web Page Updated 7th September 2000