
mewsings, a blog

--dawn

Wednesday, May 24, 2006

Constraints Factored In and Out

Neck Tie Constraint

It's time to factor constraints into the equation. A constraint is any component of software, including code, metadata, or data, designed to limit possibilities. Constraints implement "business" rules, such as the rule that only men may be elders in churches in Sioux Center, IA.

Every aspect of a software product is constraining. Remove all constraints from software and you no longer have software. While I will use the term constraint with a broad definition, some use it only for constraint specifications formed as propositions (logical predicates, to be precise). Some prefer or even insist upon using a declarative, rather than OO or procedural, language for constraints. While I see the charm in that approach, I will be almost programming-language-agnostic in this mewsing.

From a theory perspective, we can model both data and constraints as propositions, then use predicate logic to AND these propositions together. For example, if we are defining a person P to the computer, and we are collecting a birthdate for that person, we might refine our definition of a person using a constraint Q that specifies a living person's age to be <= 140. To validate our data, we can then ask the question: does P ^ Q hold?
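
As a rough illustration of the P ^ Q test, here is a minimal Python sketch. The record layout, the function names is_person, age_within_limit and validate, and the days/365 approximation of age are all assumptions made for the example; nothing here is prescribed by the theory.

```python
from datetime import date

MAX_AGE_YEARS = 140  # the limit used in the text


def is_person(record: dict) -> bool:
    """P: the record carries a name and a birthdate (hypothetical layout)."""
    return isinstance(record.get("name"), str) and isinstance(
        record.get("birthdate"), date
    )


def age_within_limit(record: dict, today: date) -> bool:
    """Q: age is <= 140, approximating a year as 365 days."""
    age_years = (today - record["birthdate"]).days / 365
    return age_years <= MAX_AGE_YEARS


def validate(record: dict, today: date) -> bool:
    """P ^ Q: Q is only evaluated when P holds."""
    return is_person(record) and age_within_limit(record, today)


today = date(2006, 5, 24)
print(validate({"name": "Ann", "birthdate": date(1950, 1, 1)}, today))      # True
print(validate({"name": "Old Tom", "birthdate": date(1800, 1, 1)}, today))  # False
```

Passing today in as an argument, rather than reading the clock inside Q, keeps the proposition repeatable when it is tested.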

An aside: Predicate logic is relevant to the choice of data models. Simple propositions (e.g. those without lists) require only first-order predicate logic, which makes for a simpler theory than higher-order logic does. However, it is simpler for me to work with property lists, for example, than to normalize as described in Is Codd Dead?. We will have to bring Occam in for that debate at some point so he can tell us whether he is relevant to this discussion or not.

A sculptor wields the chisel, and the stricken marble grows to beauty. WILLIAM CULLEN BRYANT

No matter how a software product implements one constraint or another, there are two parts to any software constraint: a specification and a service. A constraint specification is developed using a computer language, whether a general purpose language such as Java, a database sublanguage such as SQL, a declarative rules language such as OCL, or a homegrown or vendor-supplied proprietary language, perhaps specified using XML documents.

A full range of acronyms could also be relevant to the development of a constraint service that applies a constraint and returns a result. The constraint service reads in the constraint as part of the input (Q above) along with whatever needs to be verified against the constraint (P) and performs the test (P ^ Q?), taking appropriate action based on the test result.

Note re terminology, feel free to skip: Constraint services are often referred to as validation services, which is a somewhat narrower term. To clarify, I'm trying to use the term rules for the conceptual realm, the analysis aspects of the project, with constraints as the implementation of those rules in the software. Some rules' implementations, aka constraints by this terminology, do not constrain but assist, perhaps by providing a suggestion, for example. The rest of the software that implements a constraint, other than the specification, is what I am referring to as a constraint service, even in those cases where it does not constrain the user or the data. So, it might be better to skip the term constraint altogether and speak only of rules. Clear as mud? For the examples here, consider all mention of constraint services to be validation services even though I am using a broader term.

Sometimes the specification and service are interwoven, tightly coupled. If a user enters a birthdate, there might be code similar to age = (today - birthdate)/365; if age > 140 show errorMessage. Alternatively, there could be a constraint similar to (today - birthdate)/365 <= 140 specified as input to a constraint service, or rules engine. The latter is typically the case when constraints are applied by a DBMS while the former is often the case when a constraint is applied to user input. Implementation of the same constraint is often partitioned differently in different components of a single software product.
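
To make the two styles concrete, here is a hedged Python sketch of both. The first function interweaves specification and service; the second half passes the specification to a tiny, generic service. The names check_birthdate_coupled, max_age and constraint_service are made up for the illustration, and a real rules engine would be far richer.

```python
from datetime import date
from typing import Callable, Dict, List, Optional

# Style 1: specification and service interwoven in one piece of code.
def check_birthdate_coupled(birthdate: date, today: date) -> Optional[str]:
    age = (today - birthdate).days / 365
    if age > 140:
        return "Age may not exceed 140"
    return None


# Style 2: the specification is an input to a generic service.
Constraint = Callable[[dict], bool]


def max_age(limit_years: int, today: date) -> Constraint:
    """Build a specification equivalent to (today - birthdate)/365 <= limit."""
    def spec(record: dict) -> bool:
        return (today - record["birthdate"]).days / 365 <= limit_years
    return spec


def constraint_service(record: dict, constraints: Dict[str, Constraint]) -> List[str]:
    """Apply every named constraint and return the names of those that fail."""
    return [name for name, holds in constraints.items() if not holds(record)]


today = date(2006, 5, 24)
specs = {"age <= 140": max_age(140, today)}
print(check_birthdate_coupled(date(1800, 1, 1), today))          # Age may not exceed 140
print(constraint_service({"birthdate": date(1800, 1, 1)}, specs))  # ['age <= 140']
```

In the second style the service knows nothing about ages; adding another constraint means adding another entry to specs, not touching the service.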

End-users, rather than developers, might maintain a constraint. We could give our users a means of changing the male-only elders constraint, for example. This could be implemented by something as simple as a system-wide WOMEN_ELDERS Yes/No flag or as complex as a general engine that interprets specifications such as if Person's gender = 'F' then mayBeElder = false. When an end-user might need to change a constraint, the specification is data, even if it is also code, often stored in a database.
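
Both ends of that range can be sketched in a few lines of Python. The settings dictionary, the JSON rule layout, and the function names below are hypothetical, invented only to show a specification living as data that an end-user could edit; they are not taken from any particular rules product.

```python
import json

# Simple end: a system-wide Yes/No flag that a user can flip.
def may_be_elder_flag(person: dict, settings: dict) -> bool:
    return person["gender"] != "F" or settings["WOMEN_ELDERS"]


# Complex end: rules stored as data (here JSON, as might be kept in a database)
# and interpreted by a small general engine. The layout is an assumption.
RULES_JSON = """
[
  {"if":   {"field": "gender", "equals": "F"},
   "then": {"field": "mayBeElder", "value": false}}
]
"""


def apply_rules(person: dict, rules: list) -> dict:
    """Return a copy of person with every matching rule's action applied."""
    result = dict(person)
    for rule in rules:
        condition, action = rule["if"], rule["then"]
        if result.get(condition["field"]) == condition["equals"]:
            result[action["field"]] = action["value"]
    return result


rules = json.loads(RULES_JSON)
print(may_be_elder_flag({"gender": "F"}, {"WOMEN_ELDERS": False}))            # False
print(apply_rules({"gender": "F", "mayBeElder": True}, rules)["mayBeElder"])  # False
print(apply_rules({"gender": "F", "mayBeElder": True}, [])["mayBeElder"])     # True
```

The last line shows the point of treating the specification as data: deleting the rule changes the behavior with no change to the engine.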

Moving along, have you seen the following? Web-based software that includes constraints 1) coded in JavaScript to get that quick response time when verifying entered data in the browser environment; these constraints are also 2) specified and applied in a language associated with the web or app server; additionally, such constraints are 3) specified to and applied by the DBMS on a database server. There are three separate specifications and three separate services for the same constraint. For example, our age <= 140 constraint might be specified using the JavaScript language, using OCL in XML, and using SQL. The associated constraint services might be coded in JavaScript and Java with the third being a component of a proprietary DBMS. Wow! All that and we are able to ensure that our software only models people who are no older than 140 years.

I hesitate to mention that the birthdate might have been entered by someone working for another company whose software already validated this data up and down before passing it to your web service so you can do likewise, perhaps before passing it on. We verify, verify, then verify again to ensure quality data, of course, provided the date was entered correctly and no one (in Hollywood) lies about their age. Perhaps this is another example of measuring with a micrometer and cutting with an axe.

This approach to constraints results in bloated software with high maintenance costs. It is a sorry state, indeed, but I'll admit that I take a small amount of delight in the irony that the database world claims concern about redundant data while being a primary player in the repeated specification of the same constraint.

Factoring out Constraint Management

Why don't we validate our data once, so we avoid the triple-specification and triple-code for services situation? The simple answer is trust. Given JavaScript and a web browser UI as implemented today (where a user could use Greasemonkey to change data after it has been validated, for example), a middle tier cannot trust that data verified in the browser is the same data it sees. While there could be more trust between application code in a middle tier and the DBMS, there is no automated mechanism for certifying an application so that a DBMS could have a security feature that is able to accept data from a trusted source.

Without thinking about the possibility of certificates and signatures for each piece of data that has been verified, we are not going to stop the redundancy of repeatedly performing the verification. There is potential to eliminate the DBMS verification if the application software is owned by the same organization and adequate quality assurance is done for the DBMS to accept the validations of the middle tier (and why, pray tell, wouldn't that be the case?). But in order to keep some folks happy, let's just say that all of these validations must transpire. Surely it is not also required that each constraint specification and the code for each service be written in a different language, right?

In designing software, we partition our solution, modeling using metaphors and related implementations for such things as objects, services, functions, or sets. We factor or refactor our software solutions so that we pull out frequently-used components for reuse. If we have encoded a constraint such as age <= 140, we could, in theory, reuse both the specification and the validation service wherever it is needed. Deciding what to factor out in the overall scope of a software project, how to partition our software, is part of the software design process.

What keeps us from factoring out constraint specifications and constraint services? If a SQL-DBMS tool is part of the software solution, constraints might be encoded in the database schema. With most DBMS tools it is not feasible, whether for performance or other reasons, to reuse the specification and constraint services of the DBMS throughout related applications. Even if it were, it is unlikely that all relevant constraints would or should be implemented there.

Some organizations choose to put a minimum number of constraints in the DBMS. I hate to say it out loud as I know it is not a popular opinion, but I favor restricting the DBMS schema to a bare minimum of constraints any time the same organization that owns the database also controls the applications that update it. I have yet to see a case where one organization permits another to write directly to its databases (although there might be such), so this is pretty much always my recommendation.

Back-end constraint services can be packaged with the organization's CRUD services used by all applications. The database validations and related manipulations can then use the same constraints as the applications. While working with database management systems lacking even foreign key constraints, I once found it troubling how much cost savings there seemed to be in that development environment. I would never have guessed then that I would end up recommending such a strategy. Now I see that these constraints were still in the overall solution, even if lacking in the DBMS schema.
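
One way to picture constraint services packaged with CRUD services is the following Python sketch. It uses an in-memory dictionary in place of a real database, and the names PersonCrud, PERSON_CONSTRAINTS, violated and ConstraintViolation are hypothetical; the fixed TODAY value is only there to keep the example repeatable.

```python
from datetime import date
from typing import Callable, Dict, List

Constraint = Callable[[dict], bool]
TODAY = date(2006, 5, 24)  # fixed for the example; a real service would use the clock


class ConstraintViolation(Exception):
    """Raised when a record fails one or more constraints."""


def violated(record: dict, constraints: Dict[str, Constraint]) -> List[str]:
    """The shared constraint service: names of the constraints that fail."""
    return [name for name, holds in constraints.items() if not holds(record)]


# One specification, usable by the CRUD layer and by any application code.
PERSON_CONSTRAINTS: Dict[str, Constraint] = {
    "age <= 140": lambda r: (TODAY - r["birthdate"]).days / 365 <= 140,
}


class PersonCrud:
    """CRUD service that checks every write against the shared constraints."""

    def __init__(self, constraints: Dict[str, Constraint]):
        self._constraints = constraints
        self._rows: Dict[int, dict] = {}
        self._next_id = 1

    def _check(self, record: dict) -> None:
        failed = violated(record, self._constraints)
        if failed:
            raise ConstraintViolation(", ".join(failed))

    def create(self, record: dict) -> int:
        self._check(record)
        row_id = self._next_id
        self._rows[row_id] = dict(record)
        self._next_id += 1
        return row_id

    def read(self, row_id: int) -> dict:
        return dict(self._rows[row_id])

    def update(self, row_id: int, changes: dict) -> None:
        candidate = {**self._rows[row_id], **changes}
        self._check(candidate)  # the stored row is untouched if this raises
        self._rows[row_id] = candidate

    def delete(self, row_id: int) -> None:
        del self._rows[row_id]
```

An application that wants to warn the user before saving can call violated(record, PERSON_CONSTRAINTS) itself, so the write path and the application share one specification and one service.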

On the UI front, if JavaScript is required, it could potentially be generated from app server languages as is done with the new Google Web Toolkit for AJAX development and Ruby on Rails AJAX efforts, although debugging such generated code could be very unpleasant. I would prefer that validations for the UI be performed in the middle tier, with JavaScript using AJAX for asynchronous validations. When the constraint directly affects the UI widget, such as permitting a selection from a drop-down list, JavaScript will still need to get such constraint data to populate the UI, but it need not duplicate it.
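
A middle tier along those lines might expose two small handlers, sketched here in Python with plain JSON strings standing in for AJAX requests and responses. The ALLOWED_VALUES table, the handler names, and the message shapes are assumptions for the illustration; no particular web framework is implied.

```python
import json

# Single source of the constraint data (hypothetical field and values).
ALLOWED_VALUES = {"gender": ["F", "M"]}


def handle_options(field: str) -> str:
    """Return the choices the browser needs to populate a drop-down list."""
    return json.dumps({"field": field, "options": ALLOWED_VALUES.get(field, [])})


def handle_validate(body: str) -> str:
    """Validate one field value sent by the browser as a JSON request body."""
    request = json.loads(body)
    field, value = request["field"], request["value"]
    allowed = ALLOWED_VALUES.get(field)
    # In this sketch, fields with no list of allowed values are unconstrained.
    valid = allowed is None or value in allowed
    return json.dumps({"field": field, "valid": valid})


print(handle_options("gender"))
print(handle_validate('{"field": "gender", "value": "X"}'))
```

The browser script only renders what handle_options returns and reports what handle_validate says; the list of allowed values lives in one place.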

We could certainly get closer to having a single specification for a constraint and a single service that validates based on the constraint than what is often the case in large software development efforts. Is it time to refactor our solutions so as not to lock constraint specifications and related services within the DBMS schema? Factoring constraints into our software design might just mean factoring them out.
