Thursday, September 23, 2010

...aaS the alphabet

AaaS - Architecture as a Service
BaaS - Business as a Service
CaaS - Corruption as a Service
DaaS - Data as a Service
EaaS - Enterprise as a Service
FaaS - Father as a Service
GaaS - Government as a Service
HaaS - Avocado
IaaS - Infrastructure as a Service
JaaS - Java as a Service
KaaS - Karma as a Service
LaaS - Logging as a Service
MaaS - Mother as a Service
NaaS - Network as a Service
OaaS - Outsourcing as a Service
PaaS - Platform as a Service
QaaS - Quality as a Service
RaaS - Recovery as a Service
SaaS - Software as a Service
TaaS - Testing as a Service
UaaS - Undertaking as a Service
VaaS - Verification as a Service
WaaS - Water as a Service
XaaS - Xenophobia as a Service
YaaS - Yet Another as a Service
ZaaS - Zero as a Service

Sunday, September 19, 2010

Dangerous conversations with Nigel Green

Every couple of weeks, Nigel and I talk about "stuff". This week it was more on his Agile Business Change Design (ABCD) style (details here).

There's still a bit that is causing us both to scratch our heads - and that is how to transition from actor-based behaviors to role-based thinking. Value networks do a fantastic job of showing what happens by role, but current-state organizations are not necessarily organized that way.

In the current state we often see the same role being performed by different people/organizations, or an organization taking on many roles - or, more usually, a combination of both. That's likely to be inefficient, but efficiency isn't really the goal.

The beauty of the ABCD style is that it is lightweight, simple, effective, and narrative based (as opposed to technique and methodology based). However, there are some points at which technique rears its ugly head. That can potentially be a slippery slope - leading to large, methodology-bounded, consultant-driven, expensive approaches. A favorite technique, applied by MBAs especially, is "cluster analysis": grouping "things" by common properties in order to look for some thread on which to hang a thesis. However, just because it can be long-winded and overblown doesn't mean it has to be. The property that we need to cluster around here is the property of "like". When I am checking someone in, I may have to change a reservation, so I am acting "like" a reservation agent.
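As a toy illustration of that lightweight clustering (the actors, behaviors, and role names below are all invented for the example), group the actors by the behaviors they perform - each distinct behavior names an essential role, and an actor holds a role whenever they act "like" it:

```python
from collections import defaultdict

# Observed in the operational-reality narratives: who does what.
observed = {
    "check-in agent": {"check in passenger", "change reservation"},
    "call center":    {"change reservation", "take payment"},
    "gate agent":     {"check in passenger", "board passenger"},
}

# Cluster around "like": each behavior names an essential role,
# and an actor holds that role whenever they act "like" it.
roles = defaultdict(set)
for actor, behaviors in observed.items():
    for behavior in behaviors:
        roles[behavior].add(actor)

for role, actors in sorted(roles.items()):
    print(f"role '{role}': played by {', '.join(sorted(actors))}")
```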
The ABCD Change Agent (ugh, I hate that term) should quickly, in the synthesis step, be able to discern the essential roles from the "Commander's Imperative". The actual organizational structures (silos, or "cylinders of excellence") will come out of the operational reality narratives. The skill comes in allowing those in the operational reality to see the roles in a non-threatening way - to recognize that the process isn't an attempt to sabotage their work and working lives, but to more clearly delineate what the organization plans to have done.
Help desks/support centers are often quite confusing in this kind of analysis because the agent at the desk is by definition some kind of generalist. The agent will often have to act in a variety of different roles - and will overlap with people playing the same role in other parts of the organization.
Help desk staff are extremely valuable to have in these sessions because of their breadth - and because of what they see at the Demand end of the operation - they are in touch with what is actually happening, as opposed to what "management" thinks should be happening.
So for the ABCD CA, a goal is to make sure that people on the demand side of the organization - where the money is made, and where the contacts with the market the company serves are established and reinforced - are fully and properly represented. After all, we want to hear how it is, not some manager's or director's fantasy of how it should be.

Monday, September 6, 2010

Transactions and eventual consistency

My friend, colleague and fellow architect Nigel Green (Twitter @taotwit) and I were on a call this morning. Putting the world right, of course, but as a by-product of that, we got to discuss systems of record and systems of reference. That led us into some further discussion about Operational Data Stores (special cases of Systems of Reference) and eventual consistency. The emerging pattern is an important one (and not the only one possible).

So first some clarity around terms. I am using System of Record here to mean the authoritative system that can always provide the absolutely, legally correct value of an item. It is the "ultimate truth" for the item. Every other copy belongs to a System of Reference.

So in my favourite home banking example, my bank's view of the transactions is the system of record; my view (in my local Quicken copy) is a system of reference. Checks don't bounce because of what happens in Quicken - they bounce because there isn't enough money in the account maintained by the bank.

There is a special kind of system of reference - one that can affect transactions in the system of record. It is especially important where the transactions executed in the system of record are relatively long running. Take, for example, an airline reservation. I am going to simplify the business rules and the way things actually work, just to keep this post short. There is clearly more, but I just want to expose a pattern. The system of reference I will be describing here is one that can act on the system of record, e.g. by causing a booking leg to be canceled. An example: if you don't check in for a specific leg, the rest of the legs in the booking may be canceled.

It might (and I use might advisedly) be sensible to make a system of reference responsible for that determination, and the application of the rules, rather than trying to do the rule processing in a system highly optimised for transaction processing. Detecting negative events ("the dog didn't bark") is often quite time consuming and compute expensive.

So a pattern might be for the system of record to deliver a stream of transactions through the event network to one of these systems of reference. The system of reference can then make decisions/determinations about "things" that it will want either to report on or, sometimes, to use to cause changes to the system of record. There are a couple of ways it could cause the system of record to be updated.
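Before looking at those update options, here is a minimal Python sketch of the determination side. The event types, field names, and the 30-minute cutoff are all my inventions, not anything from a real reservation system - the point is just a system of reference consuming the transaction stream and doing the expensive negative-event detection:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical events arriving from the system of record's stream.
@dataclass
class LegBooked:
    booking_id: str
    leg_id: str
    departs_at: datetime

@dataclass
class CheckedIn:
    booking_id: str
    leg_id: str

class ReservationReference:
    """System of reference: consumes the transaction stream and
    detects the negative event - the check-in that never happened."""

    def __init__(self):
        self.open_legs = {}  # (booking_id, leg_id) -> departure time

    def apply(self, event):
        # Stay (eventually) consistent with the system of record.
        if isinstance(event, LegBooked):
            self.open_legs[(event.booking_id, event.leg_id)] = event.departs_at
        elif isinstance(event, CheckedIn):
            self.open_legs.pop((event.booking_id, event.leg_id), None)

    def missed_checkins(self, now, cutoff=timedelta(minutes=30)):
        """Legs past their check-in cutoff with no check-in seen.
        These become cancel *requests* sent back to the system of
        record - the determination happens here, the transaction there."""
        return [leg for leg, departs in self.open_legs.items()
                if now >= departs - cutoff]
```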

It could treat its own world as a kind of write-through cache. In other words, updates could be made to the system of reference (and any persistent stores that it maintains), and the updating method could issue "change transactions" to the system of record. But what happens if the system of record refuses the transaction for some reason? Now we have to back out the change in the system of reference. Sounds like a case for two-phase commit - whoopee, now we can really gum up the works.

Another approach might be to make the change ONLY in the system of record and wait for that change to come through to the system of reference. That's a kind of eventual consistency model: the system of reference is eventually consistent with the system of record. This is very satisfactory if the time intervals are short. So if the system of reference were within a second or two of the system of record, this eventual consistency model might be very handy. If the lag were several minutes, days or weeks, it might be very unsatisfactory indeed.
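A minimal sketch of that second approach (invented names throughout; the "pipe" here is a direct callback, so the lag is zero - in real life it would be a message bus): the system of reference never writes its own copy directly, it sends the change to the system of record and applies it only when the confirming event comes back:

```python
class SystemOfRecord:
    """Stand-in for the authoritative system: validates a change,
    applies it, and publishes the result to subscribers."""

    def __init__(self):
        self.accounts = {}
        self.subscribers = []

    def post_change(self, account, delta):
        balance = self.accounts.get(account, 0)
        if balance + delta < 0:
            return False  # refused - and nothing to back out anywhere
        self.accounts[account] = balance + delta
        for notify in self.subscribers:       # the "event network"
            notify(account, self.accounts[account])
        return True

class SystemOfReference:
    """Keeps a local copy that is updated ONLY from events, so it is
    eventually consistent with the system of record."""

    def __init__(self, record):
        self.view = {}
        self.record = record
        record.subscribers.append(self.on_event)

    def on_event(self, account, balance):
        self.view[account] = balance  # consistency arrives here

    def request_change(self, account, delta):
        # No local write: send the change to the system of record and
        # let the new state flow back through the event network.
        return self.record.post_change(account, delta)

record = SystemOfRecord()
reference = SystemOfReference(record)
reference.request_change("checking", 100)
print(reference.view["checking"])  # 100 - already flowed back
```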

So in my Quicken example, I would be inclined to update the Quicken ledger as I created the transactions (e.g. wrote the checks) and issue the transactions separately (mailing the checks), realizing that I may have to compensate later if some expected event did not occur in the meantime.

In my airline example, I would be very inclined to issue the update only to the system of record, let it apply the rules, and then have it notify a change event after it had done its magic. I would need to make sure the pipe is high speed so the changes can be notified quickly enough, and we would still have to have some compensation mechanisms in the system of reference for when "weirdness happens". However, forcing the updates through the system of record first, and keeping the system of reference eventually consistent with it, gives us a very high performance system with quite a simple set of mechanisms.

Now, thinking this way, I can start to see RESTful capabilities on my system of reference. I genuinely am manipulating resources in a very straightforward way. It is all about GET/POST: GET from the system of reference for use of the data, POST through to the system of record to change it. Internally, the system of reference synchs with the system of record through the event delivery network.
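Sketching what that surface might look like (Flask chosen purely for brevity, and reusing the two classes from the sketch above - the routes and payloads are my assumptions, not a real API):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
record = SystemOfRecord()              # from the earlier sketch
reference = SystemOfReference(record)

@app.get("/accounts/<account>")
def read_account(account):
    # GETs are served from the system of reference - local, cheap,
    # and eventually consistent.
    return jsonify({"account": account,
                    "balance": reference.view.get(account, 0)})

@app.post("/accounts/<account>/changes")
def post_change(account):
    # POSTs pass straight through to the system of record; the new
    # state comes back via the event delivery network.
    delta = request.get_json()["delta"]
    accepted = record.post_change(account, delta)
    return jsonify({"accepted": accepted}), (202 if accepted else 409)
```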

A neat bundle of thinking for one kind of problem.

Monday, July 5, 2010

Meaning and understanding

This post from Steve Baker (Twitter @stevebaker), the author of the excellent book "The Numerati", makes the point that the pace at which language is evolving away from formalisms makes it harder and harder for computers to keep up and deduce meaning. It is already well known that a change of emphasis, the introduction of a pause, and other forms of affect can dramatically change the meaning or intent of an utterance or a written communication.
Formalisms work when there is very precise definition - or the need for very precise definition. So, for example, when a process can be broken into very definite steps, it makes sense to create a formal process for it. However, there are many processes that cannot be broken into discrete steps (e.g. "The Innovation Process", whatever that is), in which case it makes no sense to attempt to apply Lean or other process formalization techniques.
Likewise in language: at the surface level the nuances of human language are so varied that attempting to represent human speech or communication through formalisms is just plain crazy. There is, as Steve Baker argues, just too much nuance, too great a rate of change of language and nuance, to rely on a formal language approach.
Human to human communication is really negotiation about meaning. When I am in dialog with the short order cook about how I want my sandwich, I may not be terribly precise. The communication depends to some extent on how well I know the cook. So when I want a "dab" of mayonnaise on my sandwich, I have to negotiate with the cook as to what a "dab" means. If the cook and I have a shared understanding or context (like Dagwood and the cook in the comic strip), then it's easy. However, the first time I eat at a specific sandwich shop, there will be no shared context, so we will negotiate over the meaning of a "dab". The conversation might go something like this:

.......
Me: Oh, and I would like a dab of mayonnaise on the sandwich too please.
Cook (holding up a palette knife with some mayonnaise on it): About this much?
Me: No, a bit less please - and oh, could you put it directly on the bread, not on top of the meat?
Cook: Sure
Me: Thanks

So at that point we have established what I meant when I said "dab", but that isn't a universal definition. It is a local shared context. So next time I go to the same shop, with the same cook, it might go more quickly - assuming that the cook remembers and that the shop allows for variation.
As we look to apply this to systems, we are faced with the same sorts of difficulties. The more tightly specified an interaction is, the more throughput and efficiency we can get, but the less customer-specific variability the system can tolerate. Is it even reasonable to think of the sandwich-making process described above as a "system"? Yes, I think it is - after all, once we have agreed on terms, the process is smooth. The trick for us in the systems world is where, in the grand scheme of interaction, we put the agreement of terms so that the system can operate efficiently behind the scenes. Where does that necessary customization come from, so the customer gets a personalized service, yet the system is efficient enough to run cost effectively?
Continuing the example from above, perhaps the person I am talking to is not the cook, but simply an expeditor. The expeditor's job is to translate from my requirement to the back end system (the line of sandwich makers) in a way that means I am going to get a customized (enough) sandwich. So perhaps the expeditor has access to all the ingredients and knowledge of the standard process, so that when creating the instructions to the sandwich maker, any exceptional ingredients or amounts of ingredients are transmitted with the instructions. So the palette knife with mayonnaise travels with the instructions.
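In systems terms (everything below is invented for illustration), the expeditor is a translation layer: it resolves the locally negotiated fuzzy terms into the back end's standard vocabulary and attaches any true exceptions to the order:

```python
# The local shared context negotiated at the counter: what this
# customer's fuzzy terms mean, in the back end's units.
shared_context = {"dab": "5g", "normal": "15g"}

def expedite(item, amount, placement="on the meat"):
    """Translate a negotiated request into a standard instruction,
    flagging anything the sandwich line would not do by default."""
    resolved = shared_context.get(amount, amount)
    instruction = {"item": item, "amount": resolved}
    if placement != "on the meat":            # an exception to the
        instruction["exception"] = placement  # standard process
    return instruction

# "A dab of mayonnaise, directly on the bread" becomes a precise,
# machine-followable instruction - the palette knife travels with it.
print(expedite("mayonnaise", "dab", placement="directly on the bread"))
```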
That's an ugly solution, but one we could easily see where a shop generally doesn't allow for customization (because it is trying to be super efficient) but realizes that in today's market some customization is necessary to keep the customers happy.
Continuing the sandwich metaphor: Au Bon Pain used to have a process whereby the customer would choose a bread, a spread, and a topping (or more) and have a quasi-customized sandwich. So the possible ingredients were offered and some rules about composition defined, but final customization (a bit less mayonnaise than is standard, please) was not an option.
In computer-based systems we have the same things going on. At the front end there is negotiation going on. In this Wired Magazine article, the writer observes that there are some 550 improvements being made to the "Google Search Algorithm". Each of these improvements is intended to more accurately divine the intent of the user. In reality they aim to apply several contexts in which a search might be conducted - essentially a negotiation between the knowledge managed by Google and the meaning intended by the user. Once the negotiation is complete, a powerful transaction system can act on the knowledge.

And that is the knowledge holy grail - enable negotiation at the edge through powerful tools, and back them up with access to extremely efficient, cost effective processing systems.

Saturday, July 3, 2010

The iPhone4 antenna and our industry's approach...

Apple's scrambling around the iPhone4 issues reminds me of something from years back. In days of yore, we programmed on large computers in languages like PL/I. I was pretty knowledgeable about PL/I - and understood how it used memory, what quirks there were in IBM's pretty amazing optimizing compiler. Occasionally debugging required us to examine that thing of beauty - the core dump.

So one day, I was teaching a "how to read a core dump" class when a production problem arose and the on-duty team could not resolve it. The dump was duly presented to me to interpret, so I made it a class exercise. The symptom was simple: the program raised a "division by zero" exception. The PL/I language has a special construct called an "On Unit", which is invoked when a matching exception arises. The general error on unit was invoked, the system produced its dump, and all seemed to make sense. Except that there was no division being performed in the expression.

More PL/I history here - PL/I does automatic conversion of values in expressions to ensure that it manages the precision correctly. And just occasionally, it converts from the internal packed decimal format (where each half byte holds a digit 0-9 and the rightmost half byte holds a value a-f to denote the sign) to a pure binary value. There is even a special machine instruction for doing this.

So on inspecting the actual place the program stopped, we found it was the machine instruction that does a convert to binary (and not a division instruction). Weird. On further examination of the oracle (the IBM Messages and Codes manual), we saw that the Convert to Binary instruction will raise a division-by-zero exception in the event that the decimal number is too big. Strange exception for that condition, but OK, now we knew. Of course, by the time the exception is raised to the PL/I language layer, it is also treated as a "zerodivide" exception. And that's what gets reported.
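For the curious, here is a toy Python reconstruction of what we had learned - my own sketch, nothing to do with IBM's implementation: unpack the half-byte digits, apply the sign nibble, and, like the real instruction, report an overflow as a divide exception:

```python
def convert_to_binary(packed: bytes) -> int:
    """Toy model of Convert to Binary: packed decimal in,
    signed 32-bit integer out."""
    digits = []
    for byte in packed:
        digits.append(byte >> 4)     # high half byte
        digits.append(byte & 0x0F)   # low half byte
    *number, sign = digits           # rightmost half byte is the sign
    if any(d > 9 for d in number) or sign < 0x0A:
        raise ValueError("not valid packed decimal")
    value = int("".join(map(str, number)))
    if sign in (0x0B, 0x0D):         # B and D denote negative
        value = -value
    if not -2**31 <= value <= 2**31 - 1:
        # The real instruction reports this case as a divide
        # exception - hence the baffling "zerodivide".
        raise ZeroDivisionError("decimal value too big for binary target")
    return value

print(convert_to_binary(bytes([0x12, 0x3C])))   # +123
```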

Naturally enough we raised this as an issue with IBM, expecting either some clever fix in the language bindings ("before actually raising zerodivide we will check the opcode, and if it is convert to binary we will raise an overflow exception instead of zerodivide"), or an "oh, interesting" reaction - the hardware should probably raise overflow instead of division by zero. No such luck. The documentation was fixed instead. It now reads (under the zerodivide section) that sometimes zerodivide can be raised when converting operands in an expression to binary. (Not a verbatim statement - this all happened a long time ago.)

So what's the parallel with Apple? Maybe none - maybe it really is some overly aggressive bar calculation - but if there is a serious underlying problem, it is a whole lot easier to fix at the documentation level ("you are holding it wrong") than at the software level ("we will recalculate the bars"), and easier there than at the fundamental hardware or platform level ("we designed the antenna wrong").

It is natural to look for the least costly and least invasive fix to a problem, but sometimes it backfires.

Friday, April 9, 2010

Caches, write through and interesting semantics

Today I was using Groove (the Microsoft Office tool that allows peer to peer sharing and acts as a nice offline front end for SharePoint).

So imagine an ability to have an offline copy of a SharePoint document store which you are sharing with others.

Now, since this contains items from SharePoint, what should the delete behaviors be? If I delete something from this store, should it be deleted from SharePoint too? What if I don't have delete rights in SharePoint but do in Groove? What if the person I am sharing the offline content with can delete it in Groove but doesn't have rights to delete in SharePoint, while I do - does my synch with SharePoint cause it to get deleted? So that's the beginning. Let's say that we allow the deletes - at least for now.

Now, since the Groove tool is essentially an offline container of the content, it is behaving like a cache. Normally when we delete a container, we delete all the contents too. But if we trickle that delete down into the contents, we will potentially delete the contents from SharePoint, which is probably not what you want to do. So here we have a case where an individual delete removes the item from the cache and from the backing system, but a delete of the container doesn't push the delete to the backing system.
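A sketch of those asymmetric delete semantics (an invented class - this is not how Groove is actually built):

```python
class BackingStore:
    """Stand-in for SharePoint."""
    def __init__(self):
        self.items = {}
    def delete(self, name):
        self.items.pop(name, None)

class OfflineContainer:
    """Groove-like offline copy: a cache over the backing store."""
    def __init__(self, backing):
        self.items = {}
        self.backing = backing

    def delete_item(self, name):
        # Item delete: write-through - remove the item locally AND
        # from the backing system.
        self.items.pop(name, None)
        self.backing.delete(name)

    def delete_container(self):
        # Container delete: local only - drop the cached copies but
        # leave the backing system untouched.
        self.items.clear()
```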

I wonder if there is a better way of thinking about this, because it feels strange (and yet delivers exactly the desired outcome!)

Tuesday, March 23, 2010

Ranting (again) - This time on User Interfaces

The background to this story is that madame is a swimmer. She saw a friend with a really cool enclosure that allows her to put an ipod shuffle into a waterproof headband. Then, when she is mindlessly pounding out the lengths, she can listen to her favourite tunes, learn French, or whatever. Since we don't have ipod anythings, it was time to get some. That also means installing the itunes environment onto our various computers. And that's really where this rant starts.

What a horrible piece of software itunes (at least on the PC - I don't have a mac) is. I expected a whole lot better from the company devoted to high style and usability. Sure it works, but it breaks a fundamental rule. The rule is that when you need to manipulate something, you do it directly and not by proxy.

The naive user (me) might expect that to manage the playlist on the ipod shuffle, you would simply add things to that playlist and delete them from it. What could be simpler, you might ask?

That, of course, is not how it works. For some (doubtless good, but arcane) reason, you create the playlist in itunes itself - on the PC, and then copy that playlist in its entirety to the shuffle. Ok, you think, that deals with some synchronization issues. But it is a royal pain in the patoot as we say here in Texas.

I don't listen to much music, but I do like to listen to tech and marketing talk topics. But I don't usually want to hear them more than once on the shuffle. I listen to the new topics each day in the car, to and from work. That means that, because of the lack of controls on the shuffle, I need to delete the listened items each day from the shuffle's playlists. However, I want to keep them in other playlists on the PC itself, so I can organize them. It would be great simply to pull up the playlist on the shuffle, mark the items I have heard, and press a "disappear" button. But oh no, it doesn't work that way.

So why might it be like that? Perhaps the designers of itunes were only thinking "tunes" - in other words, the use case is that people like to listen to the same stuff over and over again. Perhaps there are problems with conflicting playlist names, and this means that I can't have the same named playlist on 2 devices with 2 different sets of recordings. Perhaps the project team were rushed and produced something "just good enough" - they didn't have anyone delivering other stories for their iterations in development. Well, that doesn't cut it for me - and worse, I don't know how I could have discovered this prior to purchase. I thought that the apple brand stood for clean design, usability, etc. In some areas it does; itunes isn't one of those areas.