mewsings, a blog
Wednesday, March 08, 2006
Data for Every 1
Queries and reports. Let's talk about 'em. My plan was to do an overview comparing the query language formerly known as GIRLS with SQL in this one blog entry and then move on to features of reporting tools in general. But I'm scratching that and blending the two. Instead of listing a whole bunch of differences, I'm going to take it nice, and easy, 1 thought at a time. Please recognize that this is not about trying to convince anyone to use my favorite query language; it is much more general than that, with concepts that might also apply to XQuery or any other query language.
Terms that crop up with SQL include the RM, 1NF, and 3VL. Terms that relate to GIRLS are MV, Pick, NF2, and 2VL. Through the eyes of these two query languages, I hope to illustrate some significant differences in data models.
Let's dive into a contrived example of a school system booster club sale of frozen pizzas: multiple pizzas sold through multiple school booster clubs in a single school system. If we had an information system, we might want to query it asking about the people who bought the pizzas, perhaps for a mailing list. We might also want information about the pizzas, maybe for forecasting of supplies for future events. We might ask which booster clubs sold which pizzas so we deliver the pizzas to the right places. Those are all 1 entity instance questions. We are asking a question about each 1 instance of people, pizzas, or clubs.
It is typical to want 1 chunk of information for every 1 instance of an entity.
It is typical to want information about some entity or other, whether people, places, things, or events. Additionally, it is typical to want 1 chunk of information for every 1 instance of such an entity. Optionally, you might want to aggregate information into groupings bigger than 1 asking, for example, how many pizzas each booster club ordered. We might otherwise split out the instances by requesting information about a multivalued property. For example, we might ask how many pizzas have pepperoni as a topping.
In any case, starting with 1 and then listing, grouping, or splitting from there is a very common way to think. 1 might even suggest it is natural. If you walked into a business in the 1970's, or even today, you would find filing cabinets dedicated to 1 single entity, such as Customers or Orders, often with a folder for each instance of that 1 entity. Why? It has to do with how we think, I would think. 1 might be the loneliest number, but it is conceptually very powerful.
A pizza chef team has been organized to assemble all of the uncooked pizzas and put them in boxes, with cooking instructions attached. They would like each box to have a Pizza Description Label, including a unique Pizza ID, the type of crust, the list of cheeses used, and the list of toppings. Sure, you might have mixed cheeses and toppings in a single attribute or split out meats and veggies into separate ones, but just stick with me and don't let your mind wander in that way. (I'll also hold off any discussion of list compared to set or bag for a rainy day. Let's just say that this list of toppings is rather like a shopping list.)
LIST PIZZAS CRUST CHEESES TOPPINGS WITH PIZZA_ID = "12345"
PIZZA_ID, CRUST, CHEESE, TOPPING
PIZZA_ID = "12345"
But this isn't going to be quite the same, is it? There are insignificant details such as whether column headings and unique id's come along for the ride, but then there is the matter of how to create the PIZZA_VIEW and what the output from the above SELECT would be like. The view might be produced with something like this:
CREATE VIEW PIZZA_VIEW AS (SELECT P1.PIZZA_ID, CRUST, CHEESE, TOPPING FROM (PIZZA AS P1 JOIN PIZZA_CHEESE AS P2 ON P1.PIZZA_ID = P2.PIZZA_ID) JOIN PIZZA_TOPPING AS P3 ON P2.PIZZA_ID = P3.PIZZA_ID)
If we take the above view and the previous SELECT on it, we might get a label something like the one below.
I'm a bit rusty on SQL and I'm not doing anything clever here to handle the multiple 1-M (one-to-many) relationships, but feel free to add comments on how the view or query on this data could be set up better. I think the party line is that the data are all there and how it is displayed is a task for a reporting tool. But look at how the query language shows up this key difference in the data models. With GIRLS we ask questions while thinking about 1 thing, listing, aggregating, or splitting them all the while with a sense of 1-ness.
A significant issue for professionals who are learning SQL after only knowing languages that derived from GIRLS is the change to thinking about going from the many to the 1 in their thinking instead of from the 1 to the many. It didn't seem like they ever had to learn to start with 1, but they definitely do have to learn to start with many instead.
Moving the story forward, we have just found out that the green peppers are delayed and we have decided not to wait any longer. We are going to have the delivery team get all the pizzas without peppers to the right booster clubs now. The delivery team has requested a list of the Pizza IDs that need to be loaded in the delivery van. So, we need a listing of all Pizza IDs for pizzas without peppers as a topping.
LIST PIZZAS WITH EVERY TOPPING <> "PEPPERS"
SELECT P1.PIZZA_ID FROM PIZZA AS P1 WHERE P1.PIZZA_ID NOT IN (SELECT P2.PIZZA_ID FROM PIZZA_TOPPING AS P2 WHERE TOPPING = "PEPPERS")
It just doesn't matter to GIRLS if there is an attribute, such as topping, that has multiple values for a single pizza. We can still look at data for every 1.