Thursday, July 12, 2012

Clouds and failures

Cloud infrastructure gives us a utility view of computing. Models like Amazon's Elastic Compute Cloud (EC2) give us scalability on demand - and consumption-based pricing. But along with all of the benefits of the model, there are some downsides that are less well understood or considered. The first is that you are responsible for choosing the right disaster recovery options. If (when) the cloud infrastructure hiccups, you want your systems to stay operational and available. You don't want the very public beatings that occur when Amazon takes a hit and sites hosted by Amazon become unavailable.
Second, there are many more players in the mix than you might think. There is a whole collection of management platforms that help you deal with the complexity of the underlying cloud. I'll use RightScale as an example here. The RightScale environment can be used to deploy, manage and maintain a specific deployment. That is a whole bunch (technical term, I know) of software that essentially provides processes and tools to configure/start/stop/script/debug/deploy servers in the cloud. That is some fairly complex software. It can break too. Or it can be taken down for a maintenance window. After all, even management software has to be upgraded. So an outage in your management software could become problematic. You need to understand the location, uptime guarantees, etc. of your management software.
Above that - because even the management software environments can be a bit cumbersome - companies might add their own layers of management software. Think of them as super processes that run canned "scripts" for performing the most common tasks. These can fail too. Often corporations include these as part of their internal infrastructure, providing some handy authentication/authorization services integrated with corporate LDAP or other directory services. Where does that live? What's its failure model? What kind of downtime does it have to take?
There are many moving parts here - so while there are considerable benefits to moving certain kinds of applications and services to the cloud, there may well be more failure scenarios to plan for.
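To put a rough number on "more moving parts", here is a minimal Python sketch. It assumes that making a change to a deployment needs every layer to be up at the same time, and that the layers fail independently. Both are simplifications, and the availability figures are made up for illustration, not anyone's published guarantees.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def serial_availability(availabilities):
    """Availability of a chain in which every layer must be up at once.

    Assumes the layers fail independently of each other.
    """
    return prod(availabilities)


# Hypothetical figures, chosen only for illustration.
layers = {
    "cloud provider": 0.9995,
    "management platform": 0.999,
    "in-house tooling and directory services": 0.998,
}

combined = serial_availability(layers.values())
downtime_hours = (1 - combined) * HOURS_PER_YEAR

for name, availability in layers.items():
    print(f"{name}: {availability:.4%}")
print(f"whole chain: {combined:.4%} (about {downtime_hours:.1f} hours a year)")
```

Whatever numbers you plug in, the chain comes out less available than its weakest layer, which is why the failure model of each layer is worth asking about separately.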
