mewsings, a blog


← Previous   Next →

Sunday, March 12, 2006

In Every Job That Must be Done There is a Data Element of Fun

Data Carpet Bag
In every job that must be done there is an element of fun. You find the fun and snap, the job's a game. -Mary Poppins

While SQL has many advantages over the languages stemming from GIRLS, the reverse is also true as the latter is quite charming. I will continue my exploration of some of the advantages of the GIRLS family of query languages. I am not proposing that the entire industry adopt GIRLS, but there are things we can learn from this language and related data model.

There are fewer files in an MV-based application than tables in a similar SQL-DBMS solution. This makes sense given that MV solutions are not in 1NF, so properties with multiple values need not be split out into separate tables. With more nouns modeled as properties of entities, more files than tables are related to primary entities of people, places, things, or events. There are fewer nouns modeled as weak entities in an MV solution.

Users of the GIRLS-related query language ask questions about the data by way of exactly one file at a time, somewhat analogous to using a single SQL view in a query. That doesn't mean they can only report against base (stored) data identified in that file dictionary, but that they are viewing the data in the context of a single entity at a time. Just as a secretary in the 1950's picked a file cabinet in which to find information, the user of the MV query language chooses a single file for each question they ask.

Each MV file, or entity, provides one lens through which a user looks at the world of data. This might seem very restrictive, but just like Mary Poppins hauling a large coat rack out of her small carpet bag, data of all sorts can come from each such file view of the data. Through the eyes of a file dictionary, a user can query data stored under that name, or data derived from any of the data in the system, including aggregated data.

Each dictionary is like a Land's End catalog section related to one entity. The user can shop from any one of the available catalog sections in a system. Did you want to be in the STUDENTS section, FACULTY, CLASS_ROSTERS, or ACCOUNTS_RECEIVABLE section...? Once you are there, you can pick the data you want from the list of data elements on-hand. Some data are cross-listed in multiple catalog sections.

A file dictionary is a logical view of data. We could see not only a student ID, name, and class level, but a computed GPA, all majors, and the name of each of the students' advisors through a single dictionary. Anything you can think to ask about a student, you could ask the STUDENTS entity.


As mentioned in the previous mewsing, this listing will show one result (not necessarily one line) per student. The majors and advisor names will show on multiple lines when there are multiple entries for these.

Pickies have been doing OLAP directly against their OLTP data since the 70's.

It is likely worth a full exploration in a future mewsing, but I'll just make a quick note that because of this approach to the data, metadata, and the related query language, Pickies have been doing OLAP directly against their OLTP data since the '70's, without a need to reshape it first. This is very cool. Of course they also create data marts for some analysis when appropriate, such as when data need to be frozen. A file with a compound key (called a multi-part key in MV) is automatically a virtual fact table from which you can retrieve or derive any data you would want from the dimensions, forming a virtual fact-dimension or cube perspective of the data.

One downside for this approach is that the carpet bags are not pre-packed with everything. You must add to the vocabularly for the queries (the schema) by defining virtual fields for anything that is not defined as a stored data element under this file name. So the file dictionaries become very long catalogs of base or stored data and definitions for derived data. To help develop complex virtual elements, which I'll refer to as taking your medicine, the spoonful of sugar is that you may refer to subroutines written in Pick/BASIC, a very data-savvy general purpose programming language. So, the sky's the limit, but typically each virtual element dictionary item must be constructed either in the dictionary or inline in a query. Unlike SQL, GIRLS is not powerful enough to write every query strictly from the base schema for stored data.

Adding derived data elements is quite gratifying in a big-bang- for-the-buck way

Developing virtual data elements in Pick has the feeling of enlarging the catalog of offerings in the logical view for the user querying the data. Developing views in a SQL-DBMS has the feeling of shrinking the view of the data for users. With the former, the dictionaries, or logical views, can get junked up with lots of unnecessary vocabulary. Developers talk about the need to clean the dictionaries as one might suggest cleaning an office. In spite of that, there is a sense among both end-users and developers that this is our junk and, therefore, within our control. The environment is very empowering for users. With SQL, your views can get brittle with difficulty adding attributes and retaining performance and existing reports on the view (for example, when you add a property with multiple values resulting in more rows in the changed view). There are variations on virtual field definition in several other DBMS's, with Microsoft SQL Server likely having the best such implementation among the SQL-DBMS's.

I can't quite put my finger on it, but compared to coding SQL views, adding derived data elements is quite gratifying in a big-bang-for-the-buck way. It is often a big point of collaboration between developers and end-users. It's very plain to see how simple it is to think in terms of querying your enitities individually. No matter what new requirements end-users have for querying the data, you can add virtual data elements into the carpet bag for an entity without performance penalties for adding elements nor the same performance penalities often incurred with an SQL JOIN. This makes the possibilities for queries about that entity even more magic. "In every job that must be done there is an element of fun."

← Previous   Next →


At 9:59 AM, March 20, 2006 , Anonymous Dave Bucklin said...

I always get a kick out of relating the story of Dick Pick's Generalized Information Retrieval Language System to other MV programmers. There are, indeed, some of us who haven't been around since those times.

Coming to SQL from MV, I was puzzled at the lack of saved-lists. Though SQL offers many functions and formulas for manipulating output, MV dictionaries allow programmers to "set it and forget it" as they say.

At 9:23 PM, March 23, 2006 , Blogger --dawn said...

Hi Dave -- Yes, good points. Savedlists are on my list of possible topics for the future. They are both simple and powerful. Those thinking in SQL terms think that a view is similar, but it just isn't the same and isn't as easy to work with.
Cheers! --dawn


Post a Comment

<< Home