Tincat Group, Inc. - Mewsings, a Software Development and Data Modeling Blog

--dawn

Tuesday, January 24, 2006

The Naked Model

Strip the term relational from relational model and you have an unadorned model. So as not to confuse this with other possible meanings, we should be more precise. This model is typically termed a data model. A data model is employed in the design, construction, and maintenance of computer software systems.

The goal of this article is to get us to a common understanding of the term data model while also giving more indication of where these mewsings are headed. Before zeroing in on the meaning of data model, let's look at some similar terms used in software development that are NOT the same. For example, is this data model minus the relational adjective a...

...Conceptual Data Model (CDM)? Nope.
The CDM results from analyzing an area to be automated, capturing requirements, and communicating these between those who know the subject areas and those who will develop a software system. While the CDM can be back-of-a-napkin informal, there are many techniques for adding rigor, including the use of Entity-Relationship or UML Class Diagrams.

...Logical Data Model (LDM)? Nope.
This is the one that concerns me. Please don't confuse the naked data model with the logical data model, OK? When talking about a particular system, an LDM might be called the data model by some. However, the LDM is different from the term data model being discussed in this blog, so when I write data model sans adjectives, I am not referring to an LDM. The LDM results from structuring a specific CDM and communicating that structure to the computer.

...Physical Data Model (PDM)? Nope.
Only those writing the low-level database software need to know anything about the physical model, in theory (knowing grin goes here). Pretty much the only time you will hear me talk about the physical data model is if I am saying that I am not talking about the physical data model.

Each of these three possible glossary entries is related to a particular problem space being modeled for incorporation in a computer system. The data model we are talking about is more abstract. Data models such as the RM have implications for all LDMs.

Now that we know what our data model is not, let's turn our attention to what it is. The Relational Model (RM), introduced in an earlier blog, is a sweet, tight, mathematical model based on set theory and predicate logic. While you might have a hint that I'm putting the RM on trial over the course of these mewsings, I really do appreciate predicate logic and adore set theory. I applaud the cleverness in modeling data with both set theory and predicate logic. It can be quite helpful. For example, if we organize data and prepare query languages aligned with first-order predicate logic, we can prove that our queries will return accurate results with respect to the data, in a finite amount of time. Also, if we choose a mathematically simplified data model, we can implement a mathematically simplified query language.
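
As a rough illustration of that last point (a sketch only, with made-up supplier data and function names that appear in neither Codd nor Date), a query aligned with predicate logic can be read as a set comprehension over a finite relation. Because the quantifiers range over a finite set of tuples, evaluation always terminates.

```python
# Hypothetical relation: each tuple asserts "supplier S is located in city C".
located_in = {("S1", "London"), ("S2", "Paris"), ("S3", "London")}


def suppliers_in(city):
    """Return { s | (s, c) is in located_in and c == city }."""
    return {s for (s, c) in located_in if c == city}


# A universally quantified statement, checked by ranging over the finite relation:
# "no supplier is located in two different cities".
no_supplier_in_two_cities = all(
    c1 == c2
    for (s1, c1) in located_in
    for (s2, c2) in located_in
    if s1 == s2
)
```

The answer to `suppliers_in` is itself a set, so it can be fed into further queries of the same kind.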

In addition to appreciating mathematics, I also like religion. But I hope to debunk some of the RM religion that has come along with the application of these mathematics to data. The current use of the RM has been pervasive enough in the industry that it will take me some time to lay out a case. If all goes well, I plan to have closing arguments sometime before the end of 2006. I will also admit that while I think I have a good case, I don't have it all formed into words in my head just waiting to hit paper. Writing in blog-sized units should help me refine and crystallize my thinking. I hope that you, the jury, enjoy taking the journey through the evidence with me.

I would like to enter into evidence the Information Principle as Exhibit A. I will use a quotation from C. J. Date who is quoting E. F. (Ted) Codd. Both of these men have been at the center of relational data modeling.

Exhibit A: The Information Principle

"The Information Principle (which I heard Ted refer to on occasion as the fundamental principle underlying the relational model) [is]...

The entire information content of a relational database is represented in one and only one way: namely, as attribute values within tuples within relations." (Date, Edgar F. Codd, A Tribute, www.sigmod.org/codd-tribute.html)
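
One way to picture the principle, as a minimal sketch with hypothetical names and data (this is an illustration, not Date's or Codd's own example): a relation is a set of tuples, so nothing may be conveyed by position or ordering. Information that lives only in the order of a list has to be restated as an attribute value before it fits.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    # Hypothetical attributes, chosen only for illustration.
    emp_id: int
    name: str
    dept: str


# A relation: a set of tuples, with no duplicates and no meaningful order.
employees = frozenset({
    Employee(1, "Ada", "R&D"),
    Employee(2, "Grace", "Ops"),
})

# Here the ranking is carried only by list position, which is information
# represented in some way other than an attribute value.
ranked_names = ["Grace", "Ada"]


class Ranking(NamedTuple):
    name: str
    rank: int


# To satisfy the principle, the position becomes an explicit attribute value.
rankings = frozenset(
    Ranking(name, rank) for rank, name in enumerate(ranked_names, start=1)
)
```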

A data model is related to the representation of data

Tuck this point away: a data model is related to the representation of data. Now let's move on to a definition of a generic data model, using Date to rephrase Codd.

Codd defines a data model in a 1980 paper, "Data models in database management." By his definition, a data model consists of a collection of data structure types, operators that can be applied to instances of these types, and consistency rules that define valid states for the data.
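
The three parts of that definition can be sketched in a few lines of code. What follows is a toy, assuming a single structure type (`Relation`), two operators (`select` and `project`), and one kind of consistency rule (a key constraint). All names are invented for this illustration and it is not any real product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Tuple

# A row is stored as (attribute, value) pairs sorted by attribute name.
Row = Tuple[Tuple[str, object], ...]


def make_row(**attrs) -> Row:
    return tuple(sorted(attrs.items()))


@dataclass(frozen=True)
class Relation:
    """Data structure type: a heading, a set of rows, and a declared key."""
    heading: FrozenSet[str]
    body: FrozenSet[Row]
    key: FrozenSet[str]

    def __post_init__(self):
        # Consistency rules: only valid states can be constructed.
        if not self.key <= self.heading:
            raise ValueError("key must be a subset of the heading")
        for row in self.body:
            if {name for name, _ in row} != set(self.heading):
                raise ValueError("row does not match heading")
        key_values = [
            tuple(v for n, v in row if n in self.key) for row in self.body
        ]
        if len(key_values) != len(set(key_values)):
            raise ValueError("duplicate key value")

    # Operators: each takes a relation and yields a relation.
    def select(self, predicate: Callable[[Dict[str, object]], bool]) -> "Relation":
        kept = frozenset(r for r in self.body if predicate(dict(r)))
        return Relation(self.heading, kept, self.key)

    def project(self, *names: str) -> "Relation":
        kept = frozenset(names)
        body = frozenset(
            tuple(pair for pair in row if pair[0] in kept) for row in self.body
        )
        # After projection, the only key we can be sure of is the whole heading.
        return Relation(kept, body, kept)


parts = Relation(
    heading=frozenset({"part_no", "color"}),
    body=frozenset({
        make_row(part_no=1, color="red"),
        make_row(part_no=2, color="blue"),
    }),
    key=frozenset({"part_no"}),
)
red_parts = parts.select(lambda r: r["color"] == "red")
colors = parts.project("color")
```

Swapping `Relation` for a different structure type (a tree or an ordered list, say) with its own operators and rules would, by the same definition, give a different data model.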

Objects, operators, and, effectively, rules for assignment…Hmmm… If we were to implement a data model what would we have? Let's take a look at a recent definition of data model from Date.

A data model is an abstract, self-contained, logical definition of the objects, operators, and so forth, that together constitute the abstract machine with which users interact. The objects allow us to model the structure of data. The operators allow us to model its behavior. (C. J. Date, An Introduction to Database Systems, Addison Wesley, 8e, 2003, p 15-16)

The implementation of a data model is a programming language

I conclude from this that the implementation of a data model is a programming language, whether a general purpose programming language or not. Also, each programming language provides an implementation of a data model or perhaps more than one. Put another way, a data model is an abstraction of a programming language or programming sublanguage.

Now that we have some clarification of the term data model, I will make a claim that is likely agreeable to readers as I have never heard anyone argue otherwise. The RM is not necessary. It is not necessary for developing software solutions, maintaining large shared databases, or any other purpose in the world of software development. Any software solutions that can be developed while employing the RM could be written without it, using other data models. I will follow this up in a future blog by showing that the RM is not sufficient for developing and maintaining data-based software. Once we are all on the same page that the RM is neither necessary nor sufficient, we can look at what the purpose of the RM is and discuss its comparative usefulness.

My beef with the RM is related both to normalization theory as taught in colleges and universities, discussed in the Is Codd Dead? blog, and to the way the RM, or parts thereof, is used in the practice of software development and maintenance today. It shapes the thinking of software developers in ways that are often not the most effective.

The RM is not necessary

And, by the way, if you are thinking that the RM need not be obvious in a developer's programming language but could be hidden behind the scenes, then my work is done. That would mean that no computer language would need to use the Information Principle, and neither you nor I would need to use the RM as a data model. We can use any programming language that does not represent itself as an implementation of the RM to employ an alternative data model. Did I mention that the RM is not necessary?

17 Comments:

At 12:51 PM, January 25, 2006 , Hugh Darwen said...: I wish Codd hadn't called his theory a "model of data". At my university I don't teach "the relational model of data". I teach "relational database theory" instead. (And by contrast somebody else teaches SQL on the same course.)

Note that Codd didn't call his theory a "data model", though. There's a subtle difference!

As for the CDM, I prefer the term "conceptual schema", if only to avoid using the confusing term "model". Because a conceptual schema is a model of an enterprise, I think it would be more accurate to call it an enterprise model. It's not a model of data, nor is it even constructed of data.

I have not previously come across the term "data model" for what I have learned to call the logical schema and the physical schema (a.k.a., better, storage schema).

I hate the way normalization theory has been taught in universities and colleges, too. Codd pretty well admitted he had made mistakes with 2NF and 3NF and he put them right with BCNF. As BCNF is so much easier to understand than 2NF and 3NF, why do we continue to bother with 2NF and 3NF at all? Similarly, 4NF was superseded by 5NF when 4NF was discovered to be inadequate, and 5NF is easier to understand, so why bother with 4NF?

Personally, I think it's better to start with a 6NF design (i.e., maximum decomposition) and then consider opportunities for "denormalizing" (preferably not to below 5NF, though). I expect you would hate that even more.

You won't be surprised to hear that I think your suggestion that "no computer programming language would need to use the Information Principle" is nonsense. I can't bring myself to respond otherwise to that here. My views on the advantages of relational databases (not that many people have ever seen one yet, thanks to SQL!) are well known, via my various collaborations with Chris Date.
At 1:27 PM, January 25, 2006 , --dawn said...: Mr. Darwen, I am honored to have you reading my mewsings. I have certainly read many of your writings. I do want to be precise in my terminology and accurate in my statements, even if they are peppered with my own perspective (hard to have it otherwise).

I do like the term data model, although (or because) it is very broad but understand why you might like "theory." They are different concepts, of course. A data model need not be backed by any particular mathematical theory just as language might not be backed by any particular mathematical theory and, yet, is used to model facts.

I am fine with calling a conceptual model a "conceptual schema." However, I prefer "conceptual data model" or "conceptual model" because the term "model" suits most end-users better than the term "schema" and this device is used with both IT professionals and subject matter experts or end-users.

If you google for "data-model -logical -conceptual" you find that there are many sites of specific logical models for organizations referring to their logical data model as simply their "data model." That is why I wanted to be clear that was not how I was using the term.

I agree on BCNF (sans 1NF in my world) and admit that I always have to look up normal forms beyond that to recall when they come into play. I think it is a shame that the work with functional dependencies has been offered with the statement that the data must first be in 1NF.

I'm not certain what you think to be nonsense. I was guessing that some folks might think that the RM could be hidden from view of the developer so that the developer could perform an insert in an ordered list, for example, while behind the scenes the "proper" relational activity was taking place. I was indicating that if that were the case, then the representation of the data structures would not be exclusively through relations and I could close my case. My interest is in the interaction between IT professionals and their tools in developing and maintaining software systems, particularly data-based software.

Thanks again for reading and responding. I admire your work, as well as that of Chris Date and even that of Fabian Pascal ;-) Cheers! --dawn
At 4:17 PM, January 25, 2006 , --dawn said...: I just realized I had wanted to reply to Hugh's point about possibly calling the conceptual model an enterprise model. I do not like that term when doing project work. If talking about an existing model or schema, I'm OK with using the adjective "enterprise." However, if a project is underway, using such terms can lead to huge scope overrun (and time and budget overrun). When a team decides to come up with a schema for the entire enterprise rather than working to nail down the project scope to something smaller than "everything" the result is typically not pretty. --dawn
At 2:32 AM, January 26, 2006 , x@c.d.t. said...: Dawn: A data model need not be backed by any particular mathematical theory

Darwen: I wish Codd hadn't called his theory a "model of data"

I view the "relational model of data" as a way of putting data into formulae.

Dawn: Codd defines a data model in a 1980 paper Data models in database management. By his definition a data model consists of a collection of data structure types, operators that can be applied to instances of these types and consistency rules that define valid states for the data.

What is this if not mathematics?

How can one automate something with computers without involving mathematics?

Mathematically, relations are just sets. Can you imagine Mathematics without sets?

Dawn: the RM is neither necessary nor sufficient
At 8:12 AM, January 26, 2006 , --dawn said...: Well, x, I'll take your point that there is necessarily mathematics involved in data model and implementation thereof. It need not be so neat and tidy as the RM, and, more importantly, might not be modeled as mathematics prior to implementation.

For example, those developing the PICK, MUMPS, or XML environments did not start with a mathematical model and implement it. Those implementing SQL started with the mathematics, even if falling a bit short on the implementation. Some might say that SQL is not backed by any particular mathematical theory because it is not a pure implementation of the RM. Most would say that XML is not backed by any particular mathematical theory, even if it can be modeled with mathematics after the fact.

But, yes, take the type of set that is a relation and the type of relation that is a function and you can model input, processing, and output.
At 9:11 AM, January 26, 2006 , x@c.d.t. said...: For example, those developing the PICK, MUMPS, or XML environments did not start with a mathematical model and implement it.

Maybe they started with one without being aware :-)

The question is how could one put data into a computer and work with it without using the kind of set that is a relation and the kind of operations that are selection, projection, etc. and if the alternative is better.
At 9:30 AM, January 26, 2006 , --dawn said...: OK, I'll be more philosophical and rephrase. Those developing PICK, MUMPS, and XML were unaware of starting with a mathematical model just as I often fail to tune into such when eating lunch. The beauty and order of creation are there none-the-less.

I agree with the second statement except that I don't see this as "the question." I abstract to functions and composition of functions everywhere, including selection, projection, and ripple deletes. Input-processing-output can be seen as the high level function. Functions, of course, are relations (by mathematical definition, not necessarily by everyone's definition within the computer world).

So, x, what is this alternative of which you speak?
At 9:48 AM, January 26, 2006 , JOG said...: Arguments about nomenclature and semantics. There are never winners here. I personally loathe the term data model as it is pretty vapid - Data is nothing more than a one or a zero, a groove in a record, a bead moved down an abacus. How on earth can such an abstract nugget of nothingness ever be "modelled"? The term is meaningless; that is, meaningless over and above that which we decide this newly established abstract noun "datamodel" to mean.

My thought process would be as follows: to any observer data is meaningless without interpretation, and it is only the combination of the two which generates information, which in turn is useful to us. RM, as an example, provides both structure and mechanism to facilitate both data storage and consequential interpretation. As such imo, what we are talking about here is an "Information model". (Again this is a tricky area to broach as the term 'information' has been hijacked by more disciplines, ranging from psychology to communication theory, than any other term I can think of. However to me, it remains correct in this application of terminology.)

I'd also address the points:

>> What is this if not mathematics ?
RM appears to me very much mathematics. But only in the sense that Codd's relational model, while ultimately having little to do with mathematical relations, re-visioned the relation concept (or relationship concept - as dawn has previously pointed out - which Codd originally intended to rename it when he integrated attribute headers into the system) with mathematical rigour.

>> Mathematically, relations are just sets.
Mathematically database relations are not "relations" but sets of finite partial maps.

>> Can you imagine Mathematics without sets ?
Well, yes - many many people imagine Mathematics without sets and are proponents therein (even though I am not currently one of them). The whole mathematical field of Mereology does just that, where the concept of a set is replaced with the concept of a fusion. See: http://en.wikipedia.org/wiki/Mereology.

All best, Jim.
At 10:13 AM, January 26, 2006 , x@c.d.t. said...: So, x, what is this alternative of which you speak?

I prefer a servile telepathic computer manipulating data as pictures. But that's just me.

The alternative to relational model, of course.
At 10:16 AM, January 26, 2006 , --dawn said...: Hi Jim -- I had seen that term "Mereology" before, but didn't know what it was, so I followed your link. Interesting. I love that it appeals to that fine mathematical, ah hem, principle of Occam's razor, something the RM does as well.

I have been known to say that database relations with at least one candidate key (which might be redundant, depending on how a database relation is defined) are all functions (mapping key to tuple). If you look at it as mapping from domains to values or set of domains to tuples, then data can be modeled, as you suggest, with partial functions.

Using "functions" rather than relations as a basis for our modeling has the advantage of helping the perception of data and process as two sides of the same coin.

While it could be an academic exercise to view mathematics without sets, I see no reason to exclude the set metaphor from our bag of mathematical models. I also see using relations as the sole metaphor for data, excluding trees, di-graphs, partial maps, etc., as an academic exercise.

You might notice that other than a dfn tag in the Is Codd Dead? blog, I have so far resisted defining data. If we call the 0's and 1's data, then they are of little interest in the discussion of data modeling. I tend to be with Date and others defining data and information to be the same for our purposes. It would be meaningless to model anything that has no meaning.

Cheers! --dawn
At 10:20 AM, January 26, 2006 , --dawn said...: Being a very visual and telepathic person, I like your alternative, x. Yes, I knew it was an alternative to the RM, but was wondering what you think that might be so as to get my homework done for future blogs. smiles --dawn
At 5:49 PM, January 27, 2006 , Mike Preece said...: Hi Dawn

Congratulations on the fine job you're doing here.

With reference to something you said in response to a previous comment...

"If we call the 0's and 1's data, then they are of little interest in the discussion of data modeling. I tend to be with Date and others defining data and information to be the same for our purposes. It would be meaningless to model anything that has no meaning."

...I would like to see an exploration of the word "relevance" in "relation" to data models. What I mean is - is it correct/ideal to treat all related data the same way, disregarding relevance? Do you see where I'm going with this?

Mike.
At 10:45 AM, February 08, 2006 , Anonymous said...: I just want to see the naked model picture ... DMM
At 11:17 AM, February 08, 2006 , --dawn said...: You might think it would be wise for me to ignore this DMM character, but I happen to know someone with those initials who would be inclined to say hello this way. So, if you happen to be the person who in the 1980's once said something like "I'm the best systems programmer on the planet, but yer the best d*mn application programmer I ever met" then welcome to my blog, you old geezer. If this is some other DMM, well then go away ;-) Cheers! --dawn
At 12:16 AM, January 24, 2007 , richard said...: Dawn.
You said "We can use any programming language that does not represent itself as an implementation of the RM to employ an alternative data model". You quoted from C. Date to suggest that he would support this idea.

I agree with your reasoning.

Richard
At 5:03 AM, January 24, 2007 , --dawn said...: Thanks for your comment, Richard. My reasoning has oft been called into question, so a note like this is much appreciated. Thanks! --dawn
At 6:10 PM, July 30, 2007 , Chris Travers said...: Hi;

This was an interesting and thought-provoking article. The basic premise seems to be that the relational model is simply one possibility of modelling information within an application and I would agree with this. I would further agree that programming languages provide all the tools necessary to create data models. Certainly the proponents of Model-view-controller frameworks would agree as well.

However, as much as the relational model is not necessary, it makes everyone's life a heck of a lot easier. In short the RM is not necessary in the same sense that steel, concrete, and wood are not necessary to build a house-- you can build a structure without using any of these but they make your life a whole lot easier.

In short the relational model allows application developers to move certain cross-cutting concerns out of the program flow and handle them using a declarative rather than a procedural approach. In this way it is similar to but almost entirely different from aspect-oriented programming.

The relational model is not strictly necessary, but it is the best approach we have to handle large amounts of complex information such that the information is always internally consistent, valid, and as complete as necessary.

So while I agree with your reasoning as far as you take it, I am not entirely sure I would take it as implementation advice ;-)

Best Wishes,
Chris Travers
