Google Web

mewsings, a blog


Saturday, March 11, 2006

In Every Job That Must Be Done There Is a Data Element of Fun

Data Carpet Bag
In every job that must be done there is an element of fun. You find the fun and snap, the job's a game. -Mary Poppins

While SQL has many advantages over the languages coming from GIRLS, I will continue my exploration of some of the advantages of the GIRLS family of query languages. I am not proposing that the entire industry adopt GIRLS but we can learn from what it does well.

There are fewer files in the related MV-based application than tables in a SQL-DBMS solution. This makes sense given that MV solutions are not in 1NF, so properties with multiple values need not be split out into separate tables. Many of the files are related to primary entities for the application such as people, places, things, or events. With more nouns modeled as properties for strong entities, there are fewer weak entities in an MV solution.

Users of the GIRLS-related query language ask questions about the data by way of exactly one file/entity at a time, someone analogous to using a single SQL view in a query. That doesn't mean they can only report against base data identified in that file dictionary, but that they are viewing the data through the eyes of a single entity at a time. Just as the secretary in the 1950's picked a file cabinet in which to find information, the user of the MV query language, chooses a single file for each question they ask.

Each MV file, or entity, provides one lens through which a user looks at the world of data. This might seem very restrictive, but just like Mary Poppins hauling a large coat rack out of her small carpet bag, data of all sorts can come from each such file view of the data. Through the eyes of a file dictionary, a user can query data stored under that name, or data derived from any of the data in the system, including aggregated data. Each dictionary is, effectively, a catalog section of data related to a modeled entity. The user can shop from any one of the available catalog sections in a system.

For example, we could see not only a student ID, name, and class level, but a computed GPA, all majors, and the name of each of the students' advisors through a single dictionary. Anything you can think to ask about a student, you could ask the STUDENTS entity.


As mentioned in the previous mewsing, this listing will show one result (not necessarily one line) per student. The majors and advisor names will show on multiple lines when there are multiple entries for these.

OLAP is something that Pickies have done on their OLTP data from the start

It is likely worth a full exploration, but I'll just note that because of this approach to the data, metadata, and the related query language, OLAP is something that Pickies have done on their OLTP data from the start. A file with a compound key (called a multi-part key in MV) somewhat automatically is a virtual fact table from which you can retrieve or derive any data you would want from the dimensions, forming a virtual fact-dimension perspective of the data.

One big downside for this approach is that the carpet bags are not pre-packed with everything. You must add to the vocabularly for the queries (the schema) by defining virtual fields for anything that is not defined as a stored data element under this file name. So the file dictionaries become very long catalogs of base or stored data and definitions for derived data. To help develop complex virtual elements, which I'll refer to as taking your medicine, the spoonful of sugar is that you may refer to subroutines written in Pick (DataBASIC), a very data-savvy general purpose programming language. So, the sky's the limit, but typically each virtual element dictionary item must be constructed either in the dictionary or inline in a query. Unlike SQL, GIRLS is not powerful enough to write every query once the base schema is developed and the database populated.

I've developed virtual data elements in Pick, which has the feeling of enlarging the catalog of offerings in the logical view for the user querying the data. I've developed SQL views in a SQL-DBMS, which has the feeling of shrinking the view of the data for the users. With the former, the dictionaries, or logical views, can get junked up with lots of unnecessary vocabulary. Developers talk about the need to clean the dictionaries as one might suggest cleaning an office. With SQL, your views can get brittle with difficulty adding attributes and retaining performance and existing reports on the view (for example, when you add a property with multiple values resulting in more rows in the changed view).

I can't quite put my finger on it, but compared to coding SQL views, adding derived data elements is quite gratifying in a big-bang-for-the-buck way. No matter what new requirements end-users have for querying the data, you can add virtual data elements which make a carpet bag for queries about an entity even more magic. "In every job that must be done there is an element of fun."


Post a Comment

<< Home

Litter Box

Atom feed

Add to Google

Some of this is worth pawing through.