mewsings, a blog

--dawn


Monday, November 06, 2006

Cowboys with Promiscuous Databases


In Northwest Iowa we have lots of cows but no cowboys. We have cattle farms that, as best I can tell, are much like hog farms.

Somewhere between Iowa and Wyoming the backdrop changes from cattle farmers to cattle ranchers. We move from confinement lots to more open ranges. Where the land is fertile, it is farmed. The cows remain rounded up and typically crowded together. Farm land is planned out, designed, and structured. Farmers designate areas for corn, soy, or cows. They work the land, toiling over its makeup throughout the seasons.

I'm no expert, but it seems that ranches are left with a more natural order. As you head towards the mountain regions, the rockier unfarmed land is available for cattle. With less control on their movements, these cattle spread out a bit more. Cowboys round them up and prompt them to move as a group to another location when needed.

[Photos: Iowa cattle farm above, Wyoming cattle range below]

We might say that the confinement lot is organized, constrained, and controlled in accordance with economic and other mathematical principles, while the ranch provides a more promiscuous landscape. The ranch is also organized, mind you, but a natural order arises from the land. Cowboys interact with the landscape to herd the cattle, performing tasks that might have been designed into a confinement lot.

Using the second definition from dictionary.com, promiscuous means consisting of parts, elements, or individuals of different kinds brought together without order.

Lest there be confusion, let me state that as we turn our heads from cows to data, I will interpret the word order in this definition to refer broadly to the organization and structuredness of a database, not to the ordering of attributes or rows.

[Photo: Real cowboys, Wyoming, Nov 2006]

Choose from among the most common dictionary definitions of the term database, such as a collection of related facts or perhaps one that requires the use of a computer, if you prefer. Most readers will likely be familiar with the process of designing databases by employing relational modeling, given that this is taught in college courses as well as on the job. The design is organized, constrained, and controlled according to mathematical principles from set theory and first-order predicate logic. Like the cattle farmer's land, a model could be drafted showing how the database designer will structure the database. Structure of this sort attracts certain personalities (farmers?) and not others (ranchers?). You might guess that I, in particular, feel more at home on the range.

Any model other than a relational data model might seem promiscuous to the software development profession. I chose this derogatory, yet enticing, term in part because of the seeming unorderedness, comparatively, of legacy data models. This is not unlike the seeming unorderedness of a cattle ranch compared to the more obvious structure of the cattle farm. I also like using the term promiscuous here because our profession currently sees these alternative database tools as improper, even if increasingly enticing. I predict that our industry will be seduced by something resembling these legacy databases enough to switch to considering not-really-relational databases as mainstream again in the future, especially as SQL becomes less attractive as our interface language to data.


Note that although I do need a term for the databases about which I have been writing, often referred to as embedded in marketing literature, I will likely not latch onto this term as that would put me in the uncomfortable position of endorsing promiscuity. I can live with that discomfort for this one blog entry.

Ask a rancher or cowboy how they divide up the land, and they might suggest that the land divides itself. The water is here, the grass there, and the rocks up this way. They might draw or paint you a picture. Ask a database cowboy, one working with a more promiscuous database than those based on the relational model, how they model the data and you might hear an analogous response. The data orders itself.

Ask what steps a database cowboy takes to design a database, and you will likely hear that the first step is to have a good understanding of the landscape, the business. Then you define the scope of your project, putting a fence around it, and then you record what you see inside that fence. By looking at the landscape, you can make a computerized model of this reality for your database implementation. The implementation is a model of the business, not unlike a painting of the range.

I am well aware this scenario generates laughter from some, ridicule from others, as it sounds so unscientific. But as my colleague Anthony Youngman (aka Wol) would suggest, relational modeling is mathematical but not very scientific. The RM imposes an order using mathematical terms such as predicate and relation, typically avoiding terms matching the problem domain such as thing, entity, property, empty, and list, terms used by cowboys working with promiscuous databases. Relational modeling includes putting data into Nth normal form, while the database cowboy knows the land and paints what he sees.

For anyone confused by the imprecision of this description, perhaps the Jayne VanDoe example in the Is Codd Dead? mewsing provides more hints. By the way, regarding science and databases, have the terms relational model and experiment ever made it into the same sentence? We need to return to the science and art, the craft, of databases, modeling by painting what we see and testing our models over time. I'll grant that there is a need for more empirical data related to the effectiveness and resources required over time for all varieties of databases.

At the risk of repeating what I have said in earlier blog entries, but for the sake of any new readers, I will briefly suggest three features that might distinguish a seemingly promiscuous database from one that more closely implements the relational model.

  • 2VL

    Most, if not all, languages that work with the data employ two-valued logic.

  • NF2

    The data need not be in what has traditionally been called first normal form. Attribute values may be arrays or multivalues.

  • Constraints as data

    This one needs a sweet acronym and a better description, but the idea is that constraints related to attribute types are typically specified with data, rather than with metadata, and are enforced outside of a DBMS, rather than by one. Rather promiscuous, wouldn't you say?
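
To make these three features a bit more concrete, here is a minimal Python sketch. It is not how any Pick/MultiValue product actually stores or checks anything. The record layout, the `constraints` list, and the function names `is_empty` and `violations` are all made up for illustration. The record holds a list-valued attribute (NF2), the rules live in ordinary data that the application reads (constraints as data), and every test answers plain true or false, with an empty value standing in where SQL would use NULL (2VL).

```python
# Hypothetical customer record, one record per "thing", not in first normal form.
customer = {
    "id": "C1001",
    "name": "Pat Example",
    "phones": ["712-555-0101", "712-555-0102"],  # multivalued attribute (NF2)
    "email": "",                                  # empty value, not NULL
}

# Constraints kept as ordinary data that the application reads and enforces,
# rather than as schema metadata enforced by a DBMS. The rule names are invented.
constraints = [
    {"attribute": "name", "rule": "required"},
    {"attribute": "phones", "rule": "max_values", "limit": 3},
]


def is_empty(value):
    """Two-valued test: a value is either empty or it is not."""
    return value == "" or value == []


def violations(record, rules):
    """Return a list of readable messages for every rule the record breaks."""
    problems = []
    for rule in rules:
        name = rule["attribute"]
        value = record.get(name, "")
        if rule["rule"] == "required" and is_empty(value):
            problems.append(f"{name} is required")
        elif rule["rule"] == "max_values":
            # Treat a single value as a list of one so both shapes are counted alike.
            values = value if isinstance(value, list) else [value]
            if len(values) > rule["limit"]:
                problems.append(f"{name} has more than {rule['limit']} values")
    return problems


print(violations(customer, constraints))
```

Changing a rule here means editing a row in `constraints`, not altering a schema, which is the promiscuous part. The cost is that nothing stops a program that skips the `violations` call from writing whatever it likes.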

Let's take a look at legacy databases. As it turns out, the data handled with/in/using databases termed legacy is current, not primarily legacy, data. While the conventional wisdom, which some would say has been proven, has been to migrate from legacy databases to SQL-DBMS tools, signs point to a return of such proven approaches as the use of two-valued, rather than three-valued, logic. Additionally, more and more work with databases is done by developers without the tired, old 1NF requirement, often by way of object-relational or XML-RM mappings. There is reason to suggest that the future of data modeling resembles the past, the data modeling done by our current database cowboys.

In case you are asking Where's the beef? (perhaps you cannot see the pictures herein), the next blog entry will start looking at specific design patterns used when designing for one model of promiscuous databases, the Pick/MultiValue databases. While I am not as familiar with other not-really-relational models, it is very likely that these best practices will translate to best practices in many other environments as well. And, as always, I fear they are apt to irritate or even infuriate some RM enthusiasts. Heigh ho.

Although cowboys typically prefer using an apprentice approach with new recruits, a cowboy handbook might be in order so that the next generation of cowboys can learn from the best practices of those who have gone before. While it once looked like these cowboys were a dying breed, with the new wild, wild west of the internet, database cowboys look like they will be around for as long as the farmers. In the next blog entry, I will have something for you to sink your teeth into. The least we can do is pass along some tips from seasoned cowboys on how they have been saddling promiscuous databases for the past half-century.

[Photo: Cowboy Cafe]


3 Comments:

At 9:16 PM, November 06, 2006, Anonymous Ross Ferris said...

I am concerned that the term "cowboy" usually has a derogatory context as it applies to IT – to label any section of the industry as cowboys (unless you are talking about the SQueaLers) may backfire.

I understand that you are milking the analogy for all it is worth, appealing to the romanticized stereotype, but I think you will find that the modern cowboy is not only at home on the range, but also knows a thing or two about a gaggle of technology offerings (and that is no bull!)

Of course, we COULD use some consistent branding!

 
At 5:42 PM, November 09, 2006, Blogger --dawn said...

Hi Ross -- Yes, I was aware that the term "cowboys" has a downside, but so does talking about non-SQL DBMS tools, especially those that are 40 years old. I don't know if you have cowboys down under, but I gotta tell you I was swept away in the romance of the setting when I saw these two cowboys herding the cows through a tunnel under the highway I was driving the other day. By the time I stopped and took a picture they were too far away to get a really good photo. That is when it hit me that I live where there are cows, but no cowboys -- such a shame. So, I'm making an analogy, and, as such, it surely has limitations, but in the picture I am drawing there are pros and cons to having the cowboy do this work when structure and constraints put in place up front could accomplish the same thing.

I included the picture at the end to show that the modern cowboy knows about trucks too, not just horses, but I dropped the point from the blog entry.

Speaking of milking, I'm enough of a city girl (the first several decades of life) that when I took pictures of cattle near home I had to try to figure out how to tell the difference between a dairy and a not-dairy ;-) Otherwise the difference to me was only in the signage.

Along with the need for swoopy database tools and enough of a standard that third parties could consistently "plug in" to all MV databases, the MV/Pick space has definitely lacked for branding, although several seem to have tried (Gus Giobbi with the MV logo, for example).

I don't know if that covered your comments, but I both agree with your comments and took delight in your awful, I mean great, puns. I almost attempted to match 'em, but I'm no match. cheers! --dawn

 
At 6:20 AM, May 07, 2007, Blogger Scott W. Ambler said...

In the Agile Data community we've been talking about these sorts of ideas for a while: spending a bit of time to understand the landscape at first, but then letting the details evolve over time as they may. Unfortunately this goes completely against the "think everything through up front then force the DB schema on to the developers" mentality that we often see within the traditional data community. Sadly, the traditional data folks are well entrenched in many organizations and so adept at corporate politics that it's very difficult to promote any sort of change.

We need to challenge the traditional mindset more often. We need to educate traditional data professionals in emerging techniques such as database refactoring, database regression testing (how many traditional data professionals do you know that talk about data quality but NEVER seem to talk about testing? Think about the implications of that for a while), and continuous database integration. Until they realize that there are better options out there, they'll never consider changing.

- Scott

 
