Tincat Group, Inc. - Mewsings, a Software Development and Data Modeling Blog

--dawn

Sunday, February 19, 2006

Don't Suffer Impedance

photo by Judi Hengeveld

A considerable amount of software development consists of mapping and converting data from one format to another, one schema to another. When software is used to bridge the gap between the person and the machine, there is an obvious need for translation. Impedance mismatch refers to the difference between the output of one process and the input of another, requiring a transformation to connect the processes. There is a huge impedance mismatch between the thoughts of a person and the 0's and 1's of a computer, for example. (Although there is perhaps less of a mismatch in the case of Nick here than for others.)

Let's narrow the scope. As fascinating as it might be to discuss in a future mewsing, I don't want to start with a person's thoughts here, so let's move to the point where software components collect data from, or present data to, a person. So we will start with the user interface at one end. Without loss of generality for these purposes, we can narrow further to text-based UI data. We could return to this UML class as a model for an example XHTML (and therefore XML) web page.

At the other end, the operating system works with the hardware to handle the translation of data to 0's and 1's. Additionally, let's assume a database product that has at least CRUD services and communicates with the OS. For example, this could be an SQL-DBMS. In summary, we will look at text data at the point of the user interface to and from the interface with a database product. As an example, we will start with an XML page at one end and an SQL-DBMS at the other.

While impedance can be measured in electrical engineering, in software development it is a much more loosely defined term, often used to sell products or claim superiority. Most definitions of impedance mismatch within software development, as used in the phrase OO-RM impedance mismatch, provide information specific to OO and RM, so I'll try my hand at a more generic description. An impedance mismatch occurs when there is enough difference between the data model used for the output of one process and the data model employed for the input of another to require a transformer. This transformer would be analogous to an electrical transformer, with the definition left to the reader.

The number of transformations of any kind relates at least to the size and scope of any given project, but the number of places where there is an impedance mismatch relates to the architecture and product choices for the solution. If there might be such a mismatch wherever we switch data models, and data models are abstractions for programming languages or sublanguages (see The Naked Model for a description of a data model), we can search for them by looking at places where we switch programming languages.

In our example, we could use JavaScript to read and write UI values via the DOM of the XML page. We could pass these data, as XML, from data entry on our XHTML page into a middle tier written in Java, PHP, Ruby, C, C++, Perl, Python, or even your favorite derivative of Dartmouth BASIC. We could otherwise GET or POST into this middle tier with name=value pairs, but I only mention that so you don't point it out. If we take our data into an OO structure in the middle tier, there is a change between the UI and the middle tier or within the middle tier that requires a transformation. This XML-OO or Strings-OO transformation is worth a closer look in a later discussion, but it permits similar or identical data structures to be used. Each of these languages has the ability to work with XML, for example.
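To make the Strings-OO step a little more concrete, here is a minimal Python sketch. It assumes a hypothetical Person class in the middle tier and a form body posted as name=value pairs; the class and the field names (name, age, nickname) are invented for illustration and do not come from any real application.

```python
from dataclasses import dataclass, field
from urllib.parse import parse_qs


@dataclass
class Person:
    # Hypothetical middle-tier class; the fields are illustrative only.
    name: str
    age: int
    nicknames: list = field(default_factory=list)


def person_from_form(body: str) -> Person:
    """Transform name=value pairs (all strings) into a typed object."""
    pairs = parse_qs(body)  # every value arrives as a list of strings
    return Person(
        name=pairs["name"][0],
        age=int(pairs["age"][0]),             # string -> int
        nicknames=pairs.get("nickname", []),  # repeated key -> ordered list
    )


person = person_from_form("name=Nick&age=7&nickname=Nicky&nickname=Mr.+N")
print(person)
```

The transformer here is small: the strings become typed attributes, and a repeated name becomes a list that keeps its order. The shape of the data on both sides stays much the same.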

Then we have a transition between our middle tier and the database by way of SQL. This is well-documented as a place where there is an impedance mismatch. Of course there are many proprietary extensions to SQL, but for most implementations (e.g., SQL-92) three of the differences that will need to be addressed somewhere between the front-end and SQL are 1) NF2 vs. 1NF, 2) lists vs. unordered data, and 3) two-valued vs. three-valued logic (or nulls as empty sets/strings vs. SQL-style NULLs).
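The three differences can be seen in a small sketch using Python's built-in sqlite3 module as a stand-in for an SQL-DBMS. The record, the table names, and the column names below are all hypothetical, and SQLite is only one implementation, so treat this as an illustration under those assumptions rather than a statement about every product.

```python
import sqlite3

# Hypothetical front-end record: nested, with an ordered, multivalued field.
person = {"id": 1, "name": "Nick",
          "phones": ["555-0100", "555-0199"], "email": None}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE person_phone (
        person_id INTEGER REFERENCES person(id),
        position  INTEGER,   -- list order becomes an explicit attribute
        phone     TEXT,
        PRIMARY KEY (person_id, position)
    );
""")

# 1) Nested vs. 1NF: the multivalued field is split out into its own table.
conn.execute("INSERT INTO person VALUES (?, ?, ?)",
             (person["id"], person["name"], person["email"]))
# 2) Lists vs. unordered data: enumerate() records each element's position.
conn.executemany("INSERT INTO person_phone VALUES (?, ?, ?)",
                 [(person["id"], i, p) for i, p in enumerate(person["phones"])])

# Reading back: the order is only guaranteed because the query asks for it.
phones = [row[0] for row in conn.execute(
    "SELECT phone FROM person_phone WHERE person_id = ? ORDER BY position",
    (person["id"],))]

# 3) Three-valued logic: "email = NULL" is unknown, not true, so nothing
#    matches, whereas in Python None == None is simply True.
eq_matches = conn.execute(
    "SELECT COUNT(*) FROM person WHERE email = NULL").fetchone()[0]
is_matches = conn.execute(
    "SELECT COUNT(*) FROM person WHERE email IS NULL").fetchone()[0]

print(phones, eq_matches, is_matches)
```

All three adjustments (the extra table, the position column, and the IS NULL test) are written on the middle-tier side of the conversation.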

It might be worth noting that the SQL side does not feel the pain. SQL is not a general purpose programming language, and the expectation is typically that the transformer required to address this impedance mismatch will be handled entirely by "the other guy." Whether this has been a cause of resentment in companies that organize with a separate group responsible for development and maintenance of the database aspects of software development is anyone's guess.

There might be good reasons to put up with these mismatches, but what would it take to minimize the number of impedance mismatches in a particular application? As indicated in the ripple delete example, we could use a data model similar to the UI on the back-end. Could we similarly choose to implement the front-end using the RM? What would an RM UI be? I don't mean that the data are stored using the RM, but that the data model for the actual UI form would conform to the RM. If we were to apply the Information Principle to the UI, we would need the entire information content to be represented only as attribute values within tuples within relations. While that is feasible with a data store, that would require no lists or ordered multivalued attributes, for example, which is not a sacrifice that can be made in a user interface.
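A rough sketch of what a list loses under that constraint, assuming we model a relation very loosely as a Python set of tuples (a simplification for illustration, not an implementation of the RM). The list contents are made up.

```python
# A UI list: order matters and duplicates are allowed.
ui_list = ["intro", "verse", "chorus", "verse", "chorus"]

# Loose model of a relation: a set of 1-tuples. No order, no duplicates.
relation = {(item,) for item in ui_list}

# Five list entries collapse to three tuples, and the sequence is gone.
print(len(ui_list), len(relation))

# To keep the information, position has to become an attribute value,
# and rebuilding the list means sorting on that attribute.
relation_with_position = {(i, item) for i, item in enumerate(ui_list)}
rebuilt = [item for _, item in sorted(relation_with_position)]
```

Carrying the position as an ordinary attribute and sorting to rebuild the list is itself a transformation, which is the kind of mismatch under discussion.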

An arbitrary UI, therefore, cannot use the RM for its data model. Given that the RM was developed for the purpose of working with large shared data banks, it is understandable that it might not also be useful as a UI data model. But if we were to decide that life is too short for impedance, we would have to eliminate the RM from the solution. Unlike other data models, the RM is not sufficient for writing software.
