Tuesday, November 20, 2012

Grumpy Old Man and MongoDB - Indexes and things

This week (week 4 in the excellent 10gen class on MongoDB) has us looking at things like indexes, profiling, etc.
I am getting used to the syntax (but still dislike the "programming in quotes" model and the use of cryptic special values for specifying sort sequence, etc.).
Lovely looking feature for geospatial indexes, but quite tricky to use. At the base level, the distance measures on the spherical model are expressed in radians, so we have to do that conversion somewhere. PITA so far. I can see why, but that isn't exactly handy. I would like (and will build) some other mechanisms to sort that out.
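A minimal sketch of the conversion, in plain JavaScript so it pastes into the shell. The Earth radius figure is the usual mean-radius approximation, and the helper names and the "places" collection are my own inventions for illustration. The command document follows the geoNear form used through db.runCommand(...) at the time of writing; treat the exact fields as an assumption and check them against your server version.

```javascript
// Mean Earth radius in miles (an approximation; the Earth is not a perfect sphere).
const EARTH_RADIUS_MILES = 3959;

// On a sphere, an arc length divided by the radius gives the angle in radians.
function milesToRadians(miles) {
  return miles / EARTH_RADIUS_MILES;
}

// The reverse conversion, for reading distances back out of a result.
function radiansToMiles(radians) {
  return radians * EARTH_RADIUS_MILES;
}

// Hypothetical helper: builds the command document only, it does not run it.
// The collection name passed in is illustrative.
function nearCommand(collection, lng, lat, maxMiles) {
  return {
    geoNear: collection,
    near: [lng, lat],          // longitude first, then latitude
    spherical: true,
    maxDistance: milesToRadians(maxMiles),
  };
}

const tenMileSearch = nearCommand("places", -93.27, 44.98, 10);
```

In the shell, the resulting document would be handed to db.runCommand(tenMileSearch), so the radians arithmetic lives in one place instead of being scattered through every query.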
If for no other reason, the radians-based model doesn't handle directionality well. Maybe I want coffee shops within 10 miles north of me (because I am heading in that direction), none south of me, and maybe within 1 mile on each side of the route. I am sure I could code that!
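Here is roughly how I would code the "north of me" case on the client side, after a plain radius query has narrowed the candidates. It is a sketch under loud assumptions: the route runs due north, distances are short enough for a flat-earth approximation, and the function and field names (offsetMiles, northCorridor, lat, lng) are mine, not anything MongoDB provides.

```javascript
// Miles per degree of latitude, using a mean Earth radius of 3959 miles (about 69).
const MILES_PER_DEGREE = (Math.PI / 180) * 3959;

// Approximate north/east offset of a point from an origin, in miles.
// Flat-earth approximation: fine for a few tens of miles, not for long hauls.
function offsetMiles(origin, point) {
  const north = (point.lat - origin.lat) * MILES_PER_DEGREE;
  const east =
    (point.lng - origin.lng) *
    MILES_PER_DEGREE *
    Math.cos((origin.lat * Math.PI) / 180); // longitude degrees shrink with latitude
  return { north, east };
}

// Keep only points ahead of us (north), within maxNorthMiles,
// and no more than halfWidthMiles either side of a due-north route.
function northCorridor(origin, points, maxNorthMiles, halfWidthMiles) {
  return points.filter((p) => {
    const { north, east } = offsetMiles(origin, p);
    return north >= 0 && north <= maxNorthMiles && Math.abs(east) <= halfWidthMiles;
  });
}

// Made-up sample data.
const me = { lat: 45.0, lng: -93.0 };
const shops = [
  { name: "ahead", lat: 45.05, lng: -93.0 },
  { name: "behind", lat: 44.95, lng: -93.0 },
  { name: "off to the side", lat: 45.05, lng: -92.9 },
  { name: "too far", lat: 45.2, lng: -93.0 },
];
const worthStopping = northCorridor(me, shops, 10, 1).map((s) => s.name);
```

A route that is not due north would need the offsets rotated to the heading first, which is where the real work would be.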
And then for some reason, the MongoDB shell treats the geospatial spherical model differently from other models. It is invoked through the db.runCommand(...) syntax and not the usual db.collectionname.find(...) syntax.
Also, since you don't specify which index to use in the db.runCommand(...) syntax, the command fails if you happen to have two 2d indexes defined. Promising feature, but it could use work.
The utilities are handy: mongotop and mongostat are helpful indeed.
In many ways MongoDB reminds me of the 1970s system ADABAS, but with updated syntax.

Wednesday, November 14, 2012

Grumpy old man and MongoDB - Transactions

I am beyond scared by the possibility of using MongoDB for any kind of meaningful transactional system.

We always have a balance between "getting stuff through the system" and "getting sufficient accuracy". Sufficient here is really key. ACID properties are vital to ensure that we don't see incomplete transactions WHEN THE KINDS OF TRANSACTIONS WE ARE PROCESSING MUST NOT BE SEEN UNTIL COMPLETE. (caps deliberate).

The "classic" example is the movement of money from one account to another. While the money is being moved, decisions based on the value of either account will be flawed. The "from account" will have a smaller balance than we think, and the "to account" a larger one. So we should probably wait until the transfer transaction has completed before allowing any process to make decisions based on the balance in either account.

In the MongoDB world the update to each account is itself atomic, but there appears to be no overarching transaction context. So it is possible (though not very probable) for the document that represents the "from" account to show that the amount has been debited while the "to" account has yet to be credited. That assumes the system does debits before credits. It doesn't have to, of course, although I think it would be foolish not to.
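To make the window concrete, here is a toy simulation in plain JavaScript. Nothing in it talks to MongoDB: the accounts object and the incBalance function are hypothetical stand-ins for two documents and a single-document atomic update, and the observe callback plays the part of another process reading between the two updates.

```javascript
// Two "documents", held in memory for the sake of the illustration.
const accounts = {
  from: { _id: "from", balance: 100 },
  to: { _id: "to", balance: 50 },
};

// Stand-in for one atomic single-document update (think of an increment on one account).
function incBalance(id, amount) {
  accounts[id].balance += amount;
}

// What a reporting process would compute if it read both accounts right now.
function totalBalance() {
  return accounts.from.balance + accounts.to.balance;
}

// Debit first, then credit, with no transaction wrapped around the pair.
function transfer(amount, observe) {
  incBalance("from", -amount);
  observe(totalBalance()); // a reader arriving here sees money that has vanished
  incBalance("to", amount);
}

const seen = [];
seen.push(totalBalance());                 // before the transfer
transfer(30, (total) => seen.push(total)); // in the middle of it
seen.push(totalBalance());                 // after the transfer
```

Each update is fine on its own; it is the reader in the middle who gets the wrong answer, and nothing in the two-update approach stops that reader from arriving.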

The designers of the major database management systems (relational or not) have thought carefully through those kinds of implications. They have made sure that records are somehow locked to prevent this kind of behavior. They ensure that updates on both the "from" and "to" sides of the transaction are both handled - or neither is.

Do I really trust a developer with the kind of skills I have to get this right in every case if I get no help from the underlying data management system? I don't think so. I would much rather see the transactional systems using transactional databases. And use these powerful engines (like MongoDB) for situations where I don't have to rely on transactional behaviors.

Now the number of cases where transactional behavior of this nature is actually required may be smaller than we think. Oftentimes we see a small transactional component (moving the money) that is not tied to the delivery of the goods. See this excellent post from Gregor Hohpe.

Friday, November 9, 2012

Grumpy old man and MongoDB - Database Design

It is week three in the MongoDB class put on by 10gen. The instructors have done a great job. The material flows well and is presented nicely. So kudos to the guys.

One of the privileges of being old and grumpy is that you learn that there are no mysteries in system design. However, there are new paradigms sometimes. We have that in the MongoDB world and there are many cases where it can make a big difference. Essentially I am now beginning to think of MongoDB as "relational database with embedded arrays". I don't know for sure (I haven't done the math, nor am I likely to) that MongoDB will support the Relational Calculus. It should (probably, but again, I have not done the math!) support SQL pretty well, especially a very vanilla form that doesn't use constraints, etc. I am not sure of the value of the DDL aspects of SQL, although I guess one could do that. Much more important would be the layering of SQL for data manipulation.
Even expressing a join would be fine - and if the data were embedded more power to it. SQL as data access layer vs SQL all the way through the storage subsystem.
There are some semantics changes of course - because of the lack of a real "key" in an embedded document, some of the join-like processing will potentially be a bit odd. Essentially we have to treat the values in an embedded document as we would in a materialized view.
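A small sketch of what I mean by join-like processing over an embedded array, in plain JavaScript. The order document, its field names, and the flatten function are all made up for illustration. Note that the embedded items have no key of their own, so the array position has to stand in for one.

```javascript
// Hypothetical order document with its line items embedded as an array.
const order = {
  _id: 1,
  customer: "Ann",
  items: [
    { sku: "A-100", qty: 2 },
    { sku: "B-200", qty: 1 },
  ],
};

// Produce one flat row per embedded item, carrying the parent's key along,
// much as a join between an orders table and an order-lines table would.
function flatten(doc) {
  return doc.items.map((item, position) => ({
    orderId: doc._id,
    customer: doc.customer,
    position, // array index used as a surrogate key for the embedded item
    sku: item.sku,
    qty: item.qty,
  }));
}

const rows = flatten(order);
```

The rows are derived from the document rather than stored, which is why they behave like a materialized view: change the embedded array and the "joined" rows have to be regenerated.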
SQL Update and Delete operations are less likely to behave as they do in an RDB. The implications of deletion on embedded documents are subtle. However, I can see some great opportunities for some stereotypes here.

This post by Bill Kent is one of the all-time great articles on thinking about choices in representation of a simple 'fact'. The paper was written in 1988.

As a long time teacher of data modeling (my classes pre-date relational databases!), I have come to a couple of realizations:
  • The approach that I take to logical (E/R, not expressed as tables) modeling won't change with MongoDB
  • There should be some pretty simple guidelines for converting an E/R model to a MongoDB implementation
  • The best looking uses for MongoDB are where something else has already done the validation and linking - insertion into MongoDB becomes an organizational exercise.
  • MongoDB gives some flexibility in order of insertion even when things are linked. So some of the convoluted exercises we have done when creating systems of references in conventional relational databases may go away.
  • The modeling tools (like Embarcadero and ER/WIN) are less help than they used to be - except maybe as pure diagramming tools. This one I am less sure of, since all I have ever seen from these tools is modeling as a relational exercise. If there are other ways possible, I haven't really seen them.
I am looking forward to week 4.

Friday, November 2, 2012

syntactic sucralose

In programming languages there is a concept called "syntactic sugar". As Wikipedia describes it: "In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express."
In some languages (especially the MongoDB shell), there is the reverse concept. There are language features that are present to "make it work" but have no bearing on anything in the program's context. I call these syntactic sucralose. They are the only things available to get the desired result, but leave a bitter taste in your mouth afterwards.
The case that riled me up today was the $unset "operator" in MongoDB's shell interface. To unset (eliminate) a name/value pair in a MongoDB document, you write something of the following form for the second argument of the .update method.
{$unset :{foo : 1}}. The 1 in this case is a mandatory placeholder that has no relation to the current value of foo. In fact you could put anything that is a legitimate value (string, date, ObjectId, integer, boolean...) in place of the 1. Whatever is placed there is also evaluated. So, for example, the code fragment

,{$unset :{foo : x++}} does result in both foo becoming unset and x being incremented.

Even if the value is an unbound variable name, it still is acceptable.
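The evaluation part is ordinary JavaScript behavior: the shell builds the object literal before the update ever sees it. The sketch below shows that in plain JavaScript. The applyUnset function is a deliberately simplified stand-in I wrote for illustration, not MongoDB's implementation; it only mimics the point that the value is ignored.

```javascript
let x = 0;

// Building the update document evaluates x++ as a side effect:
// foo receives the old value of x (0), and x becomes 1.
const update = { $unset: { foo: x++ } };

// Simplified stand-in for $unset: remove each named field,
// never looking at the value that was supplied for it.
function applyUnset(doc, updateDoc) {
  const result = { ...doc };
  for (const field of Object.keys(updateDoc.$unset)) {
    delete result[field];
  }
  return result;
}

const before = { _id: 1, foo: "bar", baz: 2 };
const after = applyUnset(before, update);
// after has no foo; x has quietly changed along the way.
```

So the mischief is real: an expression with side effects tucked into a value that nobody reads.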

Lots of scope for mischief. This should come with a government health warning.