mewsings, a blog

Tuesday, February 28, 2006
The LIST of GIRLS
The Relational Model (RM) is neither necessary (see The Naked Model) nor sufficient (see Don't Suffer Impedance). That said, it is useful. There are pros and cons to employing it. In order to be able to compare its usefulness to that of tools based on approaches other than the RM, we need to know what else is out there. This mewsing is not an opinion piece, but includes some things old, some things new, some things borrowed, and some things blue (Big Blue, that is).
GIRLS stands for Generalized Information Retrieval Language & System
While I plan to talk about a variety of possibilities in the future and am particularly curious about the future of XML-DBMS tools, the RM alternative with which I have the most direct experience is the MultiValue or PICK data model. I'll start there with a couple of blog entries.
Codd's papers, including a PDF of his 1970 paper, are readily available on the web. That is not the case with early papers related to the Nelson-Pick data model. Although my source materials are not-always-easy-to-read copies, and my scan, resize, and Adobe skills leave room for improvement, I spent the time allocated for this mewsing turning two historical papers into PDFs. I think these papers are available only from this site, but if anyone knows of other sources, please let me know, as I would be happy to point to better versions of them. I also provide a link below to Don Nelson's resume, which indicates that he worked under F. George Steele, who invented and developed the Digital Differential Analyzer. After studying under Steele, Nelson developed the GIRLS and GIM-1 specifications at TRW.
You might have noticed that GIRLS stands for Generalized Information Retrieval Language & System. Many flavors of Pick have been developed over the years, as indicated in the MultiValue Family Tree poster. Unlike SQL, which has a single name covering many different implementations, GIRLS has had almost as many names as implementations. GIRLS has been named UniQuery, ENGLISH, FRENCH, AQL, ACCESS, Info/Access, jQL, RetrieVe, Vision, RECALL, QMQuery, CMQL, queryON, R/LIST, and INFORM. Current implementations of GIRLS are available from many different vendors, most (all?) of which are listed here, ordered by a complex algorithm.
- IBM U2, UniData & UniVerse
- Ladybridge OpenQM (open source & commercial versions)
- Revelation OpenInsight
- jBASE
- InterSystems Caché for MultiValue
- Northgate Reality
- MaVerick (Java-based open source)
- Raining Data, successor to Pick Systems
- ONgroup
- EDP UniVision
What is your preference, GIRLS or SQL? There are many differences between them, but I'll save that discussion for next time and just mention a few right now. GIRLS can query data that is in NF2 (non-1NF); it employs two-valued logic (no SQL NULLs); and in place of the SELECT of SQL is the LIST of GIRLS.
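To make the two-valued versus three-valued point a little more concrete, here is a minimal sketch. It uses SQLite through Python's standard `sqlite3` module as a stand-in SQL engine; the `animal` table and its rows are invented for illustration. The second half treats a missing value as an empty string, which is only an assumption about what a two-valued convention might look like and is not GIRLS syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animal (name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO animal VALUES (?, ?)",
    [("elephant", "grey"), ("mouse", None), ("flamingo", "pink")],
)

# Three-valued logic: NULL = NULL is neither true nor false.
# SQLite hands the result back as NULL, which Python sees as None.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The row with a NULL color satisfies neither predicate, so the two
# result sets together do not cover the whole table.
grey = [r[0] for r in conn.execute(
    "SELECT name FROM animal WHERE color = 'grey' ORDER BY name")]
not_grey = [r[0] for r in conn.execute(
    "SELECT name FROM animal WHERE color <> 'grey' ORDER BY name")]

# Two-valued logic (assumed convention): a missing value is an empty
# string, and every comparison is simply True or False.
records = {"elephant": "grey", "mouse": "", "flamingo": "pink"}
not_grey_2vl = sorted(n for n, c in records.items() if c != "grey")

print(null_equals_null, grey, not_grey, not_grey_2vl)
```

In the SQL half the mouse appears in neither result; in the two-valued half it lands in the "not grey" group.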
Tuesday, February 21, 2006
Don't Suffer Impedance
A considerable amount of software development consists of mapping and converting data from one format to another, one schema to another. When software is used to bridge the gap between the person and the machine, there is an obvious need for translation. Impedance mismatch refers to the difference between the output of one process and the input of another, requiring a transformation to connect the processes. There is a huge impedance mismatch between the thoughts of a person and the 0's and 1's of a computer, for example. (Although there is perhaps less of a mismatch in the case of Nick here than for others.)
Let's narrow the scope. As fascinating as it might be to discuss in a future mewsing, I don't want to start with a person's thoughts here, so let's move to the point where software components collect data from, or present data to, a person. So we will start with the user interface at one end. Without loss of generality for these purposes, we can narrow further to text-based UI data. We could return to this UML class as a model for an example XHTML (and therefore XML) web page.
At the other end, the operating system works with the hardware to handle the translation of data to 0's and 1's. Additionally, let's assume a database product that has at least CRUD services and communicates with the OS. For example, this could be an SQL-DBMS. In summary, we will look at text data at the point of the user interface to and from the interface with a database product. As an example, we will start with an XML page at one end and an SQL-DBMS at the other.
While impedance can be measured in electrical engineering, in software development it is a much more loosely-defined term often used to sell products or claim superiority. Most definitions of impedance mismatch within software development, as used in the phrase OO-RM impedance mismatch, provide information specific to OO and RM, so I'll try my hand at a more generic description. An impedance mismatch occurs when there is enough difference in the data model used for the output of one process and the data model employed for the input of another to require a transformer. This transformer would be analogous to an electrical transformer, with the definition left to the reader.
The number of transformations of any kind relates at least to the size and scope of any given project, but the number of places where there is an impedance mismatch relates to the architecture and product choices for the solution. If there might be such a mismatch wherever we switch data models, and data models are abstractions for programming languages or sublanguages (see The Naked Model for a description of a data model), we can search for them by looking at places where we switch programming languages.
In our example, we could use JavaScript to read and write UI values via the DOM of the XML page. We could pass these data using XML to Java, PHP, Ruby, C, C++, Perl, Python, or even your favorite derivative of Dartmouth BASIC, going from data entry on our XHTML page into some middle tier. We could otherwise GET or POST into this middle tier with name=value pairs, but I only mention that so you don't point it out.
If we take our data into an OO structure in the middle tier, there is a change between the UI and the middle tier, or within the middle tier, that requires a transformation. This XML-OO or Strings-OO transformation is worth a closer look in a later discussion, but it permits similar or identical data structures to be used on both sides. Each of these languages can work with XML, for example.
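As a small illustration of the Strings-to-structure step, here is a sketch of name=value pairs arriving in a Python middle tier. It relies only on the standard library's `urllib.parse.parse_qs`; the posted string and its field names are made up to echo the example page and are not taken from any real form.

```python
from urllib.parse import parse_qs

# A hypothetical POST body: one field name repeated for a multi-select.
posted = (
    "text1=elephant&text3=2"
    "&multiSelect1=grey&multiSelect1=pink&multiSelect1=ivory"
)

# parse_qs returns a dict mapping each name to a list of string values,
# keeping repeated names in the order they were posted.
form = parse_qs(posted)

colors = form["multiSelect1"]   # already an ordered list
count = int(form["text3"][0])   # strings still need converting to types

print(form)
print(colors, count)
```

The transformation here is modest: the list survives intact, and the remaining work is mostly converting strings to the types the middle tier wants.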
Then we have a transition between our middle tier and the database by way of SQL. This is well-documented as a place where there is an impedance mismatch. Of course there are many proprietary extensions to SQL, but for most implementations (e.g. those based on SQL-92) three of the differences that will need to be addressed somewhere between the front-end and SQL are: 1) NF2 vs. 1NF, 2) lists vs. unordered data, and 3) two-valued vs. three-valued logic (or nulls as empty sets/strings vs. SQL-style NULLs).
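The three differences can be sketched as a small "transformer" that sits between a nested front-end record and 1NF-style rows. This is only one possible shape for such code: the function names `to_rows` and `from_rows`, the `page_id` key, the explicit `position` column, and the choice to map an empty string to `None` (standing in for SQL NULL) are all assumptions made for illustration.

```python
def to_rows(page_id, record):
    """Split one nested record into a parent row plus ordered child rows."""
    parent = {"page_id": page_id}
    children = []
    for name, value in record.items():
        if isinstance(value, list):
            # 1) NF2 vs. 1NF: each list element becomes its own row.
            # 2) Lists vs. unordered data: order is kept in a position column.
            for position, item in enumerate(value, start=1):
                children.append((page_id, name, position, item))
        else:
            # 3) Empty string on the UI side becomes None (SQL NULL) here.
            parent[name] = None if value == "" else value
    return parent, children


def from_rows(parent, children):
    """Rebuild the nested record from a parent row and its child rows."""
    record = {k: ("" if v is None else v)
              for k, v in parent.items() if k != "page_id"}
    # Child rows may arrive in any order; sort on (name, position).
    for _, name, _, item in sorted(children, key=lambda c: (c[1], c[2])):
        record.setdefault(name, []).append(item)
    return record


record = {"text1": "elephant", "text2": "",
          "multiSelect1": ["grey", "pink", "ivory"]}
parent, children = to_rows(1, record)
rebuilt = from_rows(parent, list(reversed(children)))
print(parent, children, rebuilt)
```

One caveat of this sketch: an empty list produces no child rows at all, so it cannot be rebuilt from the rows alone without consulting the schema.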
It might be worth noting that the SQL side does not feel the pain. SQL is not a general purpose programming language, and the expectation is typically that the transformer required to address this impedance mismatch will be handled entirely by "the other guy." Whether this has been a cause of resentment in companies that organize with a separate group responsible for development and maintenance of the database aspects of software development is anyone's guess.
There might be good reasons to put up with these mismatches, but what would it take to minimize the number of impedance mismatches in a particular application? As indicated in the ripple delete example, we could use a data model similar to the UI on the back-end. Could we similarly choose to implement the front-end using the RM? What would an RM UI be? I don't mean that the data are stored using the RM, but that the data model for the actual UI form would conform to the RM. If we were to apply the Information Principle to the UI, we would need the entire information content to be represented only as attribute values within tuples within relations. While that is feasible with a data store, that would require no lists or ordered multivalued attributes, for example, which is not a sacrifice that can be made in a user interface.
An arbitrary UI, therefore, cannot use the RM for its data model. Given that the RM was developed for the purpose of working with large shared data banks, it is understandable that it might not also be useful as a UI data model. But if we were to decide that life is too short for impedance, we would have to eliminate the RM from the solution. Unlike other data models, the RM is not sufficient for writing software.
Tuesday, February 07, 2006
The Model Behind the interFace
An interface is the face that computer software shows to a person, other software, or possibly hardware devices. While data models are often discussed related to databases and storing data, this mewsing is about data models behind software interfaces in general and user interfaces in particular.
Let's take an example of a browser-based UI page with three text fields, one of which requires an integer value; one single selection drop-down; two multi-selection drop-downs; one text area; one radio button; and one date entry via a free-form text field. Using all the creativity I can muster right now, I'll name them as indicated in the UML class shown here.
Developing software is a process of modeling data and behavior. One set of data we can model is that which will be entered by the user. This single page of data could be backed by a view/schema modeled with this single UML box. We could use XML or JSON, for example, within the software to define and work with this view of data.
Similarly, if we are working not with a UI but with a data exchange interface, such as one using web services, we could use this same data model. This could be the model for a single record of data. For this example I'll include some sample values. I'll use an XML-ish format (because I wish XML had arrays like this) to model this view. [Note: I'll start the array index at 1, but I'm noting that I'm doing that just to retain my credentials in the real-programmers-start-counting-at-zero world.]
<MyExchange>
  <text1>elephant</text1>
  <text2>ears</text2>
  <text3>2</text3>
  <singleSelect>mouse</singleSelect>
  <multiSelect1>
    <multiSelect1[1]>grey</multiSelect1[1]>
    <multiSelect1[2]>pink</multiSelect1[2]>
    <multiSelect1[3]>ivory</multiSelect1[3]>
  </multiSelect1>
  <multiSelect2 />
  <textArea>These are the times that try men's souls
  </textArea>
  <radioButton>Africa</radioButton>
  <dateText>01JAN06</dateText>
</MyExchange>
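The same record can be sketched in JSON, which does have ordered arrays. The snippet below builds it as a Python dictionary and round-trips it through the standard `json` module. Storing `text3` as an integer rather than a string is an assumption based on the field requiring an integer value, and Python's index 0 corresponds to index 1 in the XML-ish version.

```python
import json

# The MyExchange record as a nested Python structure.
my_exchange = {
    "text1": "elephant",
    "text2": "ears",
    "text3": 2,                                   # assumed to be an integer
    "singleSelect": "mouse",
    "multiSelect1": ["grey", "pink", "ivory"],    # ordered list
    "multiSelect2": [],                           # nothing selected
    "textArea": "These are the times that try men's souls\n",
    "radioButton": "Africa",
    "dateText": "01JAN06",
}

encoded = json.dumps(my_exchange, indent=2)   # text suitable for exchange
decoded = json.loads(encoded)                 # back to a nested structure

print(encoded)
print(decoded["multiSelect1"][0])
```

The lists, including the empty one, survive the round trip with their order intact, so no separate transformer is needed between the two forms.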
An arbitrary web page cannot have an SQL view as a data model. While views need not be in 3rd or 5th normal form or BCNF, you cannot define an SQL view that is not in 1NF. Using my favorite definition of an SQL view as a stored query, we see that while we can get a lot of different result sets from an SQL query, we cannot get a single web page of data if said data includes lists. Lists or arrays are very common in user interfaces as well as throughout the rest of software development. SQL-DBMS advocates have been known to say things like "You can use reporting tools to represent the view in whatever form you like--that is a representation issue." You might recall from a previous blog that the RM is all about representation, however.
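Here is a rough sketch of what that looks like in practice, using SQLite through Python's `sqlite3` module as a stand-in for a plain SQL-DBMS. The table, column, and view names are invented for illustration, and products with array or JSON extensions may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, text1 TEXT, radio_button TEXT);
    CREATE TABLE page_color (page_id INTEGER, position INTEGER, color TEXT);
    INSERT INTO page VALUES (1, 'elephant', 'Africa');
    INSERT INTO page_color VALUES (1, 1, 'grey'), (1, 2, 'pink'), (1, 3, 'ivory');
    CREATE VIEW page_view AS
        SELECT p.page_id, p.text1, p.radio_button, c.position, c.color
        FROM page p JOIN page_color c ON c.page_id = p.page_id;
""")

# One web page of data comes back as three flat rows, with the scalar
# fields repeated once for every element of the list.
rows = conn.execute("SELECT * FROM page_view ORDER BY position").fetchall()
for row in rows:
    print(row)
```

Turning those three rows back into one page with one list is left to whoever consumes the view.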
The inability to get a view that is not normalized is a failure of SQL-DBMS tools, while the current state of the RM has made accommodations by redefining 1NF. I suspect I'll bring that up in a future blog, but for now I'll just make the point that even with some new variations on the RM that permit relation-valued attributes, ordered lists are still not included in the model.
Now that we have our UI or web services interface modeled, what might we want to do with data that are hosted by this model? We might want to select, project, join… basically we might want to do anything we otherwise do with data. These data need not come from a disk, they could come from a web page or pages, a web service or other interface, or a process that generates data and stores it in memory, for example.
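As a minimal sketch of selecting and projecting data that never touched a disk, the snippet below works on a list of in-memory records that include a multivalued attribute. The helper names `select` and `project` and the sample records are hypothetical, chosen only to mirror the vocabulary above.

```python
# In-memory records, as they might arrive from a web page or web service.
records = [
    {"text1": "elephant", "singleSelect": "mouse",
     "multiSelect1": ["grey", "pink", "ivory"]},
    {"text1": "flamingo", "singleSelect": "shrimp",
     "multiSelect1": ["pink"]},
    {"text1": "raven", "singleSelect": "corn",
     "multiSelect1": []},
]


def select(rows, predicate):
    """Keep only the records for which the predicate is true."""
    return [r for r in rows if predicate(r)]


def project(rows, names):
    """Keep only the named attributes of each record."""
    return [{n: r[n] for n in names} for r in rows]


# Select records whose list contains "pink", then project two attributes.
pink = select(records, lambda r: "pink" in r["multiSelect1"])
result = project(pink, ["text1", "multiSelect1"])
print(result)
```

The predicate looks inside a list-valued attribute directly, and the projected records still carry their lists in order.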
Are there any of these statements with which you disagree?
- Data modeling is required for all interfaces and, therefore, throughout the process of software development.
- When data values are provided in data models related to a UI or any other interface, there might be a requirement to do any type of manipulation of or queries against this data.
- When working with a UI data model, it is not possible to work exclusively with normalized data.
Therefore, it is not just important, but necessary, to have models of data other than the RM. Whatever the other data model, it has the same requirements for manipulating and querying the data as data models that are specific to DBMS tools. Data in these models must be projected, inspected, dejected, neglected, and selected (apologies to Arlo Guthrie and Alice's Restaurant).
Even if we decide to make changes to whatever data model we use for the UI when we work with large shared data banks, we cannot make the RM the data model across the board in software development. We must have ordered lists, for example. Before we turn our attention to the face of the database, I want to be sure you are with me on this point. The User simply requires a more full-featured model behind their Interface.

