mewsings, a blog
Tuesday, January 24, 2006
The Naked Model
Strip the term relational from relational model and you have an unadorned model. So as not to confuse this with other possible meanings, we should be more precise. This model is typically termed a data model. A data model is employed in the design, construction, and maintenance of computer software systems.
The goal of this article is to get us to a common understanding of the term data model while also giving more indication of where these mewsings are headed. Before zeroing in on the meaning of data model, let's look at some similar terms used in software development that are NOT the same. For example, is this data model minus the relational adjective a...
...Conceptual Data Model (CDM)? Nope.
The CDM results from analyzing an area to be automated, capturing requirements, and communicating these between those who know the subject areas and those who will develop a software system. While the CDM can be back-of-a-napkin informal, there are many techniques for adding rigor, including the use of Entity-Relationship or UML Class Diagrams.
...Logical Data Model (LDM)? Nope.
This is the one that concerns me. Please don't confuse the naked data model with the logical data model, OK? When talking about a particular system, an LDM might be called the data model by some. However, the LDM is different from the term data model being discussed in this blog, so when I write data model sans adjectives, I am not referring to an LDM. The LDM results from structuring a specific CDM and communicating that structure to the computer.
...Physical Data Model (PDM)? Nope.
Only those writing the low-level database software need to know anything about the physical model, in theory (knowing grin goes here). Pretty much the only time you will hear me talk about the physical data model is if I am saying that I am not talking about the physical data model.
Each of these three possible glossary entries is related to a particular problem space being modeled for incorporation in a computer system. The data model we are talking about is more abstract. Data models such as the RM have implications for all LDMs.
Now that we know what our data model is not, let's turn our attention to what it is. The Relational Model (RM), introduced in an earlier blog, is a sweet, tight, mathematical model based on set theory and predicate logic. While you might have a hint that I'm putting the RM on trial over the course of these mewsings, I really do appreciate predicate logic and adore set theory. I applaud the cleverness in modeling data with both set theory and predicate logic. It can be quite helpful. For example, if we organize data and prepare query languages aligned with first-order predicate logic, we can prove that our queries will return accurate results with respect to the data, in a finite amount of time. Also, if we choose a mathematically simplified data model, we can implement a mathematically simplified query language.
In addition to appreciating mathematics, I also like religion. But I hope to debunk some of the RM religion that has come along with the application of these mathematics to data. The current use of the RM has been pervasive enough in the industry that it will take me some time to lay out a case. If all goes well, I plan to have closing arguments sometime before the end of 2006. I will also admit that while I think I have a good case, I don't have it all formed into words in my head just waiting to hit paper. Writing in blog-sized units should help me refine and crystallize my thinking. I hope that you, the jury, enjoy taking the journey through the evidence with me.
I would like to enter into evidence the Information Principle as Exhibit A. I will use a quotation from C. J. Date, who is quoting E. F. (Ted) Codd. Both of these men have been at the center of relational data modeling.
Exhibit A: The Information Principle
"The Information Principle (which I heard Ted refer to on occasion as the fundamental principle underlying the relational model) [is]...
The entire information content of a relational database is represented in one and only one way: namely, as attribute values within tuples within relations." (Date, Edgar F. Codd, A Tribute, www.sigmod.org/codd-tribute.html)
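To make the principle concrete, here is a minimal sketch in Python of what "attribute values within tuples within relations" looks like. The Employee relation, its attribute names, and the manager_name helper are all made up for illustration; none of them come from Date or Codd. The point is only that every fact, including who manages whom, is carried by a value inside a tuple, not by row order, duplicates, or pointers.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Hypothetical heading: the attribute names and types of one relation."""
    emp_id: int
    name: str
    manager_id: int  # a value that happens to match another tuple's emp_id


# A relation modeled as a set of tuples: no ordering, no duplicates.
employees = {
    Employee(1, "Pat", 1),
    Employee(2, "Lee", 1),
    Employee(2, "Lee", 1),  # stating the same fact twice adds nothing
}


def manager_name(relation, emp_id):
    """Find a manager by comparing attribute values, not by following a pointer."""
    by_id = {t.emp_id: t for t in relation}
    return by_id[by_id[emp_id].manager_id].name


print(len(employees))
print(manager_name(employees, 2))
```

A Python set is only a stand-in here; a real relational system also enforces types and constraints that this sketch leaves out.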
A data model is related to the representation of data
Tuck this point away: a data model is related to the representation of data. Now let's move on to a definition of a generic data model, using Date to rephrase Codd.
Codd defines a data model in a 1980 paper, "Data models in database management." By his definition, a data model consists of a collection of data structure types, operators that can be applied to instances of these types, and consistency rules that define valid states for the data.
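The three ingredients in that definition can be sketched in a few lines of Python. This is a toy under stated assumptions, not Codd's formalism: the Relation class, its key argument, and the parts example are hypothetical names chosen for illustration. The class itself plays the data structure type, restrict is one operator, and the key check in insert is one consistency rule that defines which states are valid.

```python
class Relation:
    """Toy structure type: a heading plus a set of rows (illustrative only)."""

    def __init__(self, heading, key, rows=()):
        self.heading = tuple(heading)  # attribute names, in a fixed order
        self.key = tuple(key)          # attributes that must identify a row
        self.rows = set()
        for row in rows:
            self.insert(row)

    def _key_of(self, row):
        return tuple(row[self.heading.index(a)] for a in self.key)

    def insert(self, row):
        """Add a row, enforcing the consistency rule (key uniqueness)."""
        row = tuple(row)
        if len(row) != len(self.heading):
            raise ValueError("row does not match heading")
        if row in self.rows:
            return  # set semantics: an identical row changes nothing
        if any(self._key_of(r) == self._key_of(row) for r in self.rows):
            raise ValueError("key constraint violated")
        self.rows.add(row)

    def restrict(self, predicate):
        """Operator: a new relation holding only rows that satisfy predicate."""
        kept = (r for r in self.rows if predicate(dict(zip(self.heading, r))))
        return Relation(self.heading, self.key, kept)


parts = Relation(("part_no", "color"), ("part_no",), [(1, "red"), (2, "blue")])
red = parts.restrict(lambda t: t["color"] == "red")

try:
    parts.insert((1, "green"))  # same key as an existing row
    rejected = False
except ValueError:
    rejected = True
```

Notice that once the structure, an operator, and a rule are written down in executable form, what we are holding is a small programming sublanguage for working with data.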
Objects, operators, and, effectively, rules for assignment… Hmmm… If we were to implement a data model, what would we have? Let's take a look at a recent definition of data model from Date.
A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior. (C. J. Date, An Introduction to Database Systems, Addison Wesley, 8e, 2003, pp. 15-16)
The implementation of a data model is a programming language
I conclude from this that the implementation of a data model is a programming language, whether a general purpose programming language or not. Also, each programming language provides an implementation of a data model or perhaps more than one. Put another way, a data model is an abstraction of a programming language or programming sublanguage.
Now that we have some clarification of the term data model, I will make a claim that readers will likely accept, as I have never heard anyone argue otherwise. The RM is not necessary. It is not necessary for developing software solutions, maintaining large shared databases, or any other purpose in the world of software development. Any software solution that can be developed while employing the RM could be written without it, using other data models. I will follow this up in a future blog by showing that the RM is not sufficient for developing and maintaining data-based software. Once we are all on the same page that the RM is neither necessary nor sufficient, we can look at what the purpose of the RM is and discuss its comparative usefulness.
My beef with the RM is related both to normalization theory as taught in colleges and universities, discussed in the Is Codd Dead? blog, and to the way the RM, or parts thereof, is used in the practice of software development and maintenance today. The RM shapes the thinking of software developers in ways that are often not the most effective.
The RM is not necessary
And, by the way, if you are thinking that the RM need not be obvious in a developer's programming language but could be hidden behind the scenes, then my work is done. That would mean that no computer language would need to use the Information Principle, and neither you nor I would need to use the RM as a data model. We can use any programming language that does not represent itself as an implementation of the RM to employ an alternative data model. Did I mention that the RM is not necessary?