Wednesday, October 31, 2012

MongoDB and the Relational Car

Sometimes you want your data all nicely normalized, and sometimes you don't. This has come into sharp relief as I go through MongoDB training this week. By way of background, I have a fair amount of experience with many types of databases, data modeling and data thinking in general, so it is interesting and fun to learn about new ways of thinking.
But forst a story. In the dark ages (maybe 1974 or 1975), I wondered about mailing lists. So much so that I devised a way of tracking some of the uses of data among companies - especially early markets in buying and selling of information. I would sign up for a magazine using some unique variant of my names. Keep track of which variant I used for which magazine, and then see what solicitations I would get through the mail using that name variant. Most instructive. American Express sent mail to the largest number of variants.
I do the same things to this day - making up email names addresses just for long enough to validate that I want the service I have signed up for, and then wait and see what else arrives. Of course everything that arrives is by definition spam. But I digress.
I also thought about the "relational car" i.e. what would the world be like if I normalized my vehicles. Kept the wheels with the wheels, the engines with the engines,... You get the idea. I think it is likely that I would be late for work every day. First join all the piece parts together to make a suitable version (assuming that the children hadn't emptied the fuel tank the night before, thus putting the equivalent of a lock on the tank). Then drive off. After coming home, put the updated parts back. Updated???? Yes, because the tires are now more worn...
Clearly from the primary use of having a vehicle as transport, the relational car is far from ideal. I am much better off with the assembled car.
That's kind of how I think in Mongo. Often times the data are much more useful when put together by primary usage than when all normalized and accessed with joins. But, of course, not all the time.
That led me onto thinking about the quality of "relationships" (among the data entities). Many others, in more learned writings than mine, have fussed about different qualities in relationships. Composition is different from association, etc. So a Purchase Order might be composed of Many (at least 1) Purchase Order Line Items, so it seems reasonable to think of these line items inherently bound up with the POs. So the document oriented approach looks pretty good. But, things are less rosy, when I think about the association between the Product and the PO. There it is probably unreasonable to bury the product data inside a PO document. And there are several different kinds of associations we might want to consider.
So, when learning the document oriented DBMS (MongoDB), I am finding myself revisiting types of relationships and whether the distinctions are important. For me I have come down on the side of, I probably care in my master systems, those systems of record that actually run the business. But in those which are simply systems of reference, maybe it makes a whole lot of sense not to worry about the normalization, required schemas and other aspects that make the document oriented databases so interesting.
In the relational model the foreign key is the only relationship condtruct available. Even the cretion of "link" relations relies on the Forign Key. That doesn't seem to me to be a powerful enough construct to express the nuances of the kind of relationship and thus its associated semantics.
Oh, and circling around to something interesting about composition types of relationships, we do have some interesting delete anomalies. If we say that an A is composed of 1 or more Bs, what do we do when attempting to remove the last remaining B. That should somehow kill off the A, of course - or we expressed the rule incorrectly.