Tincat Group, Inc. - Mewsings, a Software Development and Data Modeling Blog

mewsings, a blog

--dawn

Tuesday, February 21, 2006

Don't Suffer Impedance

A considerable amount of software development consists of mapping and converting data from one format to another, one schema to another. When software is used to bridge the gap between the person and the machine, there is an obvious need for translation. Impedance mismatch refers to the difference between the output of one process and input of another, requiring a transformation to connect the processes. There is a huge impedance mismatch between the thoughts of a person and the 0's and 1's of a computer, for example. (Although there is perhaps less of a mismatch in the case of Nick here than for others.)

Let's narrow the scope. As fascinating as it might be to discuss in a future mewsing, I don't want to start with a person's thoughts here, so let's move to the point where software components collect data from, or present data to, a person. So we will start with the user interface at one end. Without loss of generality for these purposes, we can narrow further to text-based UI data. We could return to this UML class as a model for an example XHTML (and therefore XML) web page. [Tip: mouse-over acronyms to get the expanded form.]

At the other end, the operating system works with the hardware to handle the translation of data to 0's and 1's. Additionally, let's assume a database product that has at least CRUD services and communicates with the OS. For example, this could be an SQL-DBMS. In summary, we will look at text data at the point of the user interface to and from the interface with a database product. As an example, we will start with an XML page at one end and an SQL-DBMS at the other.

While impedance can be measured in electrical engineering, in software development it is a much more loosely-defined term often used to sell products or claim superiority. Most definitions of impedance mismatch within software development, as used in the phrase OO-RM impedance mismatch, provide information specific to OO and RM, so I'll try my hand at a more generic description. An impedance mismatch occurs when there is enough difference in the data model used for the output of one process and the data model employed for the input of another to require a transformer. This transformer would be analogous to an electrical transformer, with the definition left to the reader.

The number of transformations of any kind relates at least to the size and scope of any given project, but the number of places where there is an impedance mismatch relates to the architecture and product choices for the solution. If there might be such a mismatch wherever we switch data models, and data models are abstractions for programming languages or sublanguages (see The Naked Model for a description of a data model), we can search for them by looking at places where we switch programming languages.

In our example, we could use JavaScript to read and write UI values via the DOM of the XML page. We could pass these data using XML to Java, PHP, Ruby, C, C++, Perl, Python, or even your favorite derivative of Dartmouth BASIC, going from data entry on our XHTML page into some middle tier. We could otherwise GET or POST into this middle tier with name=value pairs, but I only mention that so you don't point it out.

If we take our data into an OO structure in the middle tier, there is a change between the UI and the middle tier or within the middle tier that requires a transformation. This XML-OO or Strings-OO transformation is worth a closer look in a later discussion, but permits similar or identical data structures to be used. Each language has the ability to work with XML, for example.
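As a rough illustration of that XML-OO step, here is a minimal sketch in Python using only the standard library. The `<person>` payload, its element names, and the `Person` class are all hypothetical, made up for this example rather than taken from any real page. The point is only that the nested, ordered shape of the XML carries over to the object (and back) without restructuring.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical form payload; element names are illustrative assumptions.
FORM_XML = """
<person>
  <name>Pat</name>
  <phones>
    <phone>555-0100</phone>
    <phone>555-0199</phone>
  </phones>
</person>
""".strip()


@dataclass
class Person:
    name: str
    phones: list = field(default_factory=list)  # ordered, multivalued


def person_from_xml(text: str) -> Person:
    """XML -> object: nesting and document order map straight across."""
    root = ET.fromstring(text)
    return Person(
        name=root.findtext("name"),
        phones=[p.text for p in root.iter("phone")],
    )


def person_to_xml(person: Person) -> str:
    """Object -> XML: the list is written back in its existing order."""
    root = ET.Element("person")
    ET.SubElement(root, "name").text = person.name
    phones = ET.SubElement(root, "phones")
    for number in person.phones:
        ET.SubElement(phones, "phone").text = number
    return ET.tostring(root, encoding="unicode")


person = person_from_xml(FORM_XML)
round_tripped = person_from_xml(person_to_xml(person))
```

A transformation is still required here, but neither side has to give up its lists or its nesting to make it work.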

Then we have a transition between our middle tier and the database by way of SQL. This is well documented as a place where there is an impedance mismatch. Of course there are many proprietary extensions to SQL, but for most implementations (e.g. SQL-92), three of the differences that will need to be addressed somewhere between the front-end and SQL are 1) NF2 vs. 1NF, 2) lists vs. unordered data, and 3) two-valued vs. three-valued logic (or nulls as empty sets/strings vs. SQL-style NULLs).
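To make those three differences concrete, here is a small sketch in Python against an in-memory SQLite database. The `person` record, the table layout, and the `position` column are assumptions invented for this example, and SQLite is only standing in for "an SQL-DBMS"; other products differ in their details.

```python
import sqlite3

# Hypothetical middle-tier record: nested, ordered, with a missing value.
person = {"id": 1, "name": "Pat", "nickname": None,
          "phones": ["555-0100", "555-0199"]}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, nickname TEXT);
    CREATE TABLE phone (person_id INTEGER, position INTEGER, number TEXT,
                        PRIMARY KEY (person_id, position));
""")

# 1) NF2 vs. 1NF: the nested list is split out into rows of a second table.
# 2) Lists vs. unordered data: order survives only as an explicit column.
conn.execute("INSERT INTO person VALUES (?, ?, ?)",
             (person["id"], person["name"], person["nickname"]))
conn.executemany("INSERT INTO phone VALUES (?, ?, ?)",
                 [(person["id"], i, n) for i, n in enumerate(person["phones"])])

# Rebuilding the list requires ORDER BY on that manufactured column.
phones = [row[0] for row in conn.execute(
    "SELECT number FROM phone WHERE person_id = ? ORDER BY position",
    (person["id"],))]

# 3) Two-valued vs. three-valued logic: None == None is True in Python,
#    but NULL = NULL evaluates to NULL in SQL, so no row is matched.
python_match = person["nickname"] == None  # deliberate ==, to mirror SQL's =
sql_eq = conn.execute(
    "SELECT COUNT(*) FROM person WHERE nickname = NULL").fetchone()[0]
sql_is = conn.execute(
    "SELECT COUNT(*) FROM person WHERE nickname IS NULL").fetchone()[0]
```

Note that every line of the transformer lives on the Python side: the splitting, the position numbering, the reassembly, and the special-casing of `IS NULL`.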

It might be worth noting that the SQL side does not feel the pain. SQL is not a general purpose programming language, and the expectation is typically that the transformer required to address this impedance mismatch will be handled entirely by "the other guy." Whether this has been a cause of resentment in companies that organize with a separate group responsible for development and maintenance of the database aspects of software development is anyone's guess.

There might be good reasons to put up with these mismatches, but what would it take to minimize the number of impedance mismatches in a particular application? As indicated in the ripple delete example, we could use a data model similar to the UI on the back-end. Could we similarly choose to implement the front-end using the RM? What would an RM UI be? I don't mean that the data are stored using the RM, but that the data model for the actual UI form would conform to the RM. If we were to apply the Information Principle to the UI, we would need the entire information content to be represented only as attribute values within tuples within relations. While that is feasible with a data store, that would require no lists or ordered multivalued attributes, for example, which is not a sacrifice that can be made in a user interface.

An arbitrary UI, therefore, cannot use the RM for its data model. Given that the RM was developed for the purpose of working with large shared data banks, it is understandable that it might not also be useful as a UI data model. But if we were to decide that life is too short for impedance, we would have to eliminate the RM from the solution. Unlike other data models, the RM is not sufficient for writing software.

4 Comments:

At 11:19 PM, February 21, 2006 , Larry Hazel said...: Dawn,

I had to stop in the middle of reading your fine blog to look up CRUD. I now know that your CRUD is not the stuff we scraped off scuzzy Marines with hard bristle brushes, but rather is an SQL term relating to Create, Read, Update and Delete of records. Of course, the same Google search revealed that all blogs were crud, but I discounted that definition.

I think I've got OO figured out. That must be object oriented??, but RM still has me stumped.

By the way, where does EDI fall into your transformation scheme?

Larry
At 4:58 AM, February 22, 2006 , --dawn said...: Hi Larry -- I try to put acronym html tags around acronyms and also occasionally dfn for definitions of terms. Then a mouseover should spell it out. Firefox was kind enough (?) to overlook the fact that I spelled it acronyn in the opening tag since I spelled the closing tag correctly, so it was a mystery to me why it wasn't working in IE. Mystery solved and I fixed that now, so if you point to it, you should get Create, Read, Update, Delete. I didn't think of that as an SQL acronym since I have used it since sometime in the 80's unrelated to SQL, but it might have arisen there.

As for the RM, I do put the acronym tags around it some of the time, but since I refer to the Relational Model frequently, I don't do it every time. I think I'll go through and tag all of those too. I was concerned that people might feel compelled to mouseover anything with the dots (Firefox) or dashes (IE) under them or that it would be distracting in some way. Someone told me they don't see the dashes, however. Gotta love the browser UI, eh?

Thanks for reading and giving feedback. Cheers! -dawn
At 9:53 AM, February 24, 2006 , Greg said...: Dawn,

Excellent series of posts, I'm looking forward to more.

>>>> This XML-OO or Strings-OO transformation is worth a closer look in a later discussion, but permits similar or identical data structures to be used. Each language has the ability to work with XML, for example.

Is the UI to middle tier data model transformation really any different from middle tier to RM transformation? Isn't the 'ability to work with XML' just a data model transformation tool that's built into the language?
At 10:34 AM, February 24, 2006 , --dawn said...: Thanks, Greg, much appreciated. Yes, there is a difference between any UI to mid-tier transformation and that of a mid-tier (e.g. OO language) to RM transformation in that an implementation of the RM is, by definition, not a general purpose programming language. While Relations and related operations can be incorporated into general purpose languages, a key concept with the RM is that Relations with attributes be the only data structure (said loosely).

So, while in the UI to mid-tier interface either side can handle all or part of any required transformation, with the RM (SQL being the primary implementation-ish), the mid-tier always has to do the required work to communicate with it. Additionally, there is more work to do because the language on the other side (e.g. SQL) is not as full-featured a language.

While I'm here, I'll also comment related to some e-mails I have received on this. Some folks think I am saying that I want the logical data model for the persistence layer to be identical to the logical data model for some UI form and that is not the case. There will almost always be transformations related to a difference in a view and a logical data model for persistence since those are decoupled. Think of the impedance mismatch transformations as related to the more abstract data models of the interfaces. Perhaps it is easier to think about the differences between xml documents, objects, and relations.

Which gets back to your question. Relations restrict data structures, particularly in the area of ordering data. XML goes to the opposite extreme of having everything ordered. It works better to ignore irrelevant ordering when it is present than to manufacture meaningful ordering when none is available.

Cheers! --dawn
