Archive

Archive for the ‘HaaS’ Category

Why the Choice of Cloud Computing Type May Depend On Who’s Buying

November 15, 2008

Thanks to Ron K. Jeffries' Cloudy Thinking blog, I was directed to RedMonk's Stephen O'Grady (whom I now subscribe to directly) and his excellent post titled Cloud Types: Fabric vs Instance. Stephen makes a sharp observation about the nature of Infrastructure as a Service (increasingly called "Utility Computing" by Tim O'Reilly followers) and Platform as a Service (that name, at least, remains consistent). His observation is this:

“…Tim seems to feel that they are aspects of the types, while I’m of the opinion that they instead define the type. For example, by Tim’s definition, one characteristic of Utility Computing style clouds is virtual machine instances, where my definitions rather centers on that.

Here’s how I typically break down cloud computing styles:

Fabric

Description: A problematic term, perhaps, because a few of the vendors employ it towards different ends, but I use it because it’s descriptive. Rather than deploy to virtualized instances, developers building on this style cloud platform write instead to a fabric. The fabric’s role is to abstract the underlying physical and logical architecture from the developer, and – typically – to assume the burden of scaling.
Example: Google App Engine

Instance

Description: Instance style clouds are characterized by, well, instances. Unlike the fabric cloud, there is little to no abstraction present within instance based clouds: they generally recreate – virtually – a typical physical infrastructure composed of instances that include memory, processing cycles, and so on. The lack of abstraction can offer developers more control, but this control is typically offered at the cost of transparent scaling.
Example: Amazon EC2″

I love that distinction. First, for those struggling to see how Amazon/GoGrid/Flexiscale/etc. relate to Google/Microsoft/Salesforce.com/Intuit/etc., it delineates a very clear difference. If you are reserving servers on which to run applications, it is IaaS. If you are running your application without caring which and how many resources are consumed, then it is PaaS. Easy.
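To make the distinction concrete, here is a rough sketch of the two styles side by side. None of the names below belong to a real vendor API; they are hypothetical stand-ins meant only to show where the abstraction line sits.

    # Hypothetical sketch; no real vendor API, just where the abstraction line sits.

    # Instance style (IaaS): you reserve servers, then decide what runs on them
    # and when to add more. Scaling, patching and monitoring remain your problem.
    def deploy_instance_style(cloud, image_id, how_many):
        servers = [cloud.launch_instance(image_id) for _ in range(how_many)]
        for server in servers:
            server.install("my-web-app")
            server.start("my-web-app")
        return servers

    # Fabric style (PaaS): you hand the platform a request handler and let it
    # worry about which machines run it, and how many.
    def handle_request(request):
        return "Hello from the fabric, %s" % request.get("user", "anonymous")

    def deploy_fabric_style(fabric):
        fabric.register_handler("/hello", handle_request)  # no servers in sight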

However, I am even more excited by a thought that occurred to me as I read the post. One of the things that this particular distinction points out is the likelihood that the buyers of each type would be different classes of enterprise IT professionals.

It's not black and white, but I would be willing to bet heavily that:

  • The preponderance of interest in IaaS is from those whose primary concern is system administration; those with complex application profiles, who want to tweak scalability themselves, and who want the freedom to determine how data and code get stored, accessed and acted upon.

  • The preponderance of interest in PaaS is from those whose primary concern is application development; those with a functional orientation, who would rather focus on creating application experiences than worry about how to architect for deployment in a web environment (or whatever the framework provides).

In other words, server jockeys choose instances, while code jockeys choose fabric.

Now, the question quickly becomes, if developers can get the functionality and scalability/reliability/availability required from PaaS, without hiring the system administrators, why would any enterprise choose IaaS unless they were innovating at the architecture level? On the other hand, if all you want to do is add capacity to existing functionality, or you require an unusual or even innovative architecture, or you need to guarantee that certain security and continuity precautions are in place, why would you ever choose PaaS?

This, in turn, boils right back down to the PaaS spectrum I spoke of recently. Choose your cloud type based on your true need, but also take into account the skill set you will require. Don’t focus on a single brand just because it’s cool to your peers. Pick IaaS if you want to tweak infrastructure, otherwise by all means find the PaaS platform that best suits you. You’ll probably save in the long run.

Now, I've clearly suppressed the fact that developers probably still want some portability…though I must note that choosing a programming language alone limits function portability. (Perhaps that's OK if the productivity gains outweigh the likelihood of having to port.) Also, the things that system administrators are doing in the enterprise are extremely important, like managing security, data integrity and continuity. There are no guarantees that any of the existing PaaS platforms can help you with any of that.

Something to think about, anyway. What do you think? Will developers lean towards PaaS, while system administrators lean towards IaaS? Who will win the right to choose within the enterprise?

Salesforce.com Announces They Mean Business

November 5, 2008

I had some business to take care of in downtown San Francisco this morning, and on my way to my destination, I strolled past Moscone Center, the site of this year's Dreamforce conference. The news coming out of that conference had piqued my interest a day earlier–I'll get to that in a minute–but when I saw the graphics and catch phrase of the conference, I had to laugh. Not in mockery, mind you; it was just ironic.

There, spanning the vast entrances of both Moscone North and South was nothing but blue skies and fluffy white…wait for it…clouds. In other words, the single theme of the conference visuals was, I can only assume, cloud computing. Not CRM, not “making your business better”, but an implementation mechanism; a way of doing IT. That’s the irony, in my mind; that in this amazing month or so of cloud computing history, one of the companies most aggressively associating themselves with cloud computing was a CRM company, not a compute capacity or storage provider.

Except, Salesforce.com was already blurring the lines between PaaS and SaaS, even as they open the door to their partners and customers taking advantage of IaaS where it makes sense. Even before Marc Benioff’s keynote yesterday, it was clear that force.com was far more than a way to simply customize the core CRM offering. Granted, most applications launched there took advantage of Salesforce.com data or services in one way or another, but there was clear evidence that the SF gang were targeting a PaaS platform that stood alone, even as it provided the easiest way to draw customers into the CRM application.

The core of the new announcement, Sites, appears to be simply an extension of this. The idea behind Sites is to provide a web site framework that allows customers to address both Intranet and Internet applications without needing to run any infrastructure on-premises. Of course, if you find the built-in SF integration makes adopting the CRM platform easier, then SF would be happy to help. Their goal, you see, is summed up in the conference catch phrase: "The End of Software". (Of course, let's just ignore the fact that force.com is a software development platform, any way you cut it.)

Skeptical that you can get what you need from a single PaaS offering? Here's where the genius part of the day's announcements comes in: simply utilize Amazon for the computing and storage needs that force.com cannot provide. Heck, yeah.

Allow me to observe something important here. First, note that Salesforce does not have an existing packaged software model; thus, there is no incentive whatsoever to offer an on-premises alternative. Touché, Microsoft. Second, note that Salesforce.com has no problem whatsoever with partnering with someone who does something better than they do. En garde, Google. Finally, pay attention to the fact that Salesforce.com is expanding its business offerings in a way that both serves existing customers in increasingly powerful ways and invites new, non-CRM customers to use productive tools that just happen to include integration with the core offering. PaaS as a marketing hook, not necessarily a business model in and of itself. (If it succeeds on its own, that's icing on the cake.)

In a three week period that has seen some of the most revolutionary cloud computing announcements, Salesforce.com managed to not only keep themselves relevant, but further managed to make a grab for significant cloud mindshare. Fluffy, white, cloud mindshare.

Update: The Cloud Computing Bill of Rights

Thanks to all who provided input on the first revision of the Cloud Computing Bill of Rights. The feedback has been incredible, including several eye-opening references and some basic concepts that were missed the first time through. An updated "CCBOR" is below, but I first want to directly outline the changes and credit those who provided input.

  1. Abhishek Kumar points out that government interference in data privacy and security rights needs to be explicitly acknowledged. I hear him loud and clear, though I think the customer can expect only that laws will remain within the constitutional (or doctrinal) bounds of their particular government, and that government retains the right to create law as it deems necessary within those parameters.

    What must also be acknowledged, however, is that customers have the right to know exactly what laws are in force for the cloud systems they choose to use. Does this mean that vendors should hire civil rights lawyers, or that the customer is on their own to figure that out? I honestly don’t know.

  2. Peter Laird's "The Good, Bad, and the Ugly of SaaS Terms of Service, Licenses, and Contracts" is a must-read when it comes to data rights. It finds for enterprises what NPR observed the other night for individuals: that you have very few data privacy rights right now, that your provider probably has explicit provisions protecting itself while exposing you or your organization, and that the cloud exposes risks that enterprises avoid by owning their own clouds.

    This reinforces the notion that we must understand that privacy is not guaranteed in the cloud, no matter what your provider says. As Laird puts it:

    “…[A] customer should have an explicit and absolute right to data ownership regardless of how a contract is terminated.”

  3. Ian Osbourne asks “should there be a right to know where the data will be stored, and potentially a service level requirement to limit host countries?” I say absolutely! It will be impossible for customers to obey laws globally unless data is maintained in known jurisdictions. This was the catalyst for the “Follow the Law Computing” post. Good catch!

  4. John Marsh of GeekPAC links to his own emerging attempt at a Bill of Rights. In it, he points out a critical concept that I missed:

    “[Vendors] may not terminate [customer] account[s] for political statements, inappropriate language, statements of sexual nature, religious commentary, or statements critical of [the vendor’s] service, with exceptions for specific laws, eg. hate speech, where they apply.”

    Bravo, and noted.

  5. Unfortunately, the federal courts have handed down a series of rulings that challenge the ability of global citizens and businesses to do business securely and privately in the cloud. This Bill of Rights is already under grave attack.

Below is the complete text of the second revision of the Cloud Computing Bill of Rights. Let’s call the first “CCBOR 0.1” and this one “CCBOR 0.2”. I’ll update the original post to reflect the versioning.

One last note. My intention in presenting this post was not to become the authority on cloud computing consumer rights. It is, rather, the cornerstone of my Cloud Computing Architecture discussion, and I need to move on to that discussion's next point. I'm working on setting up a WIKI for this "document". Is there anyone out there in particular who would like to host it?

The Cloud Computing Bill of Rights (0.2)

In the course of technical history, there exist few critical innovations that forever change the way technical economies operate; forever changing the expectations that customers and vendors have of each other, and the architectures on which both rely for commerce. We, the parties entering into a new era driven by one such innovation–that of network based services, platforms and applications, known at the writing of this document as “cloud computing”–do hereby avow the following (mostly) inalienable rights:

  • Article I: Customers Own Their Data

    1. No vendor shall, in the course of its relationship with any customer, claim ownership of any data uploaded, created, generated, modified, hosted or in any other way associated with the customer's intellectual property, engineering effort or media creativity. This also includes account configuration data, customer generated tags and categories, usage and traffic metrics, and any other form of analytics or metadata collection.

      Customer data is understood to include all data directly maintained by the customer, as well as that of the customer’s own customers. It is also understood to include all source code and data related to configuring and operating software directly developed by the customer, except for data expressly owned by the underlying infrastructure or platform provided by the vendor.

    2. Vendors shall always provide, at a minimum, API level access to all customer data as described above. This API level access will allow the customer to write software which, when executed against the API, allows access to any customer maintained data, either in bulk or record-by-record as needed. As standards and protocols are defined that allow for bulk or real-time movement of data between cloud vendors, each vendor will endeavor to implement such technologies, and will not attempt to stall such implementation in an attempt to lock in its customers. (An illustrative sketch of what such data access might look like to a customer follows these articles.)

    3. Customers own their data, which in turn means they own responsibility for the data's security and adherence to privacy laws and agreements. As with monitoring and data access APIs, vendors will endeavor to provide customers with the tools and services they need to meet their own customers' expectations. However, customers are responsible for determining a vendor's relevance to specific requirements, and for providing backstops, auditing and even indemnification as required by agreements with their own customers.

      Ultimately, however, governments are responsible for the regulatory environments that define the limits of security and privacy laws. As governments can choose any legal requirement that works within the constraints of their own constitutions or doctrines, customers must be aware of what may or may not happen to their data in the jurisdictions in which data resides, is processed or is referenced. As constitutions vary from country to country, it may not even be required for governments to inform customers what specific actions are taken with or against their data. That laws exist that could put their data in jeopardy, however, is the minimum that governments should convey to the market.

      Customers (and their customers) must leverage the legislative mechanisms of any jurisdiction of concern to change those parameters.

      In order for enough trust to be built into the online cloud economy, however, governments should endeavor to build a legal framework that respects corporate and individual privacy, and overall data security. While national security is important, governments must be careful not to create an atmosphere in which the customers and vendors of the cloud distrust their ability to securely conduct business within the jurisdiction, either directly or indirectly.

    4. Because regulatory effects weigh so heavily on data usage, security and privacy, vendors shall, at a minimum, inform customers specifically where their data is housed. A better option would be to provide mechanisms by which users can choose where their data will be stored. Either way, vendors should also endeavor to work with customers to assure that their system designs do not conflict with known legal or regulatory obstacles. This is assumed to apply to primary, backup and archived data instances.
  • Article II: Vendors and Customers Jointly Own System Service Levels

    1. Vendors own, and shall do everything in their power to meet, service level targets committed to with any given customer. All required effort and expense necessary to meet those explicit service levels will be spent freely and without additional expense to the customer. While the specific legally binding contracts or business agreements will spell out these requirements, it is noted here that these service level agreements are entered into expressly to protect both the customer’s and vendor’s business interests, and all decisions by the vendor will take both parties equally into account.

      Where no explicit service level agreement exists with a customer, the vendor will endeavor to meet any expressed service level targets provided in marketing literature or the like. At no time will it be acceptable for a vendor to declare a high level of service at a base price, only to later indicate that that level of service is only available at a higher premium price.

      It is perfectly acceptable, however, for a vendor to expressly sell a higher level of service at a higher price, as long as they make that clear at all points where a customer may evaluate or purchase the service.

    2. Ultimately, though, customers own their service level commitments to their own internal or external customers, and the customer understands that it is their responsibility to take into account possible failures by each vendor that they do business with.

      Customers relying on a single vendor to meet their own service level commitments enter into an implicit agreement to tie their own service level commitments to the vendor’s, and to live and die by the vendor’s own infrastructure reliability. Those customers who take their own commitments seriously will seek to build or obtain independent monitoring, failure recovery and disaster recovery systems.

    3. Where customer/vendor system integration is necessary, the vendor must offer options for monitoring the viability of that integration at as many architectural levels as required to allow the customer to meet their own service level commitments. Where standards exist for such monitoring, the vendor will implement those standards in a timely and complete fashion. The vendor should not underestimate the importance of this monitoring to the customer's own business commitments.

    4. Under no circumstances will vendors terminate customer accounts for political statements, inappropriate language, statements of sexual nature, religious commentary, or statements critical of the vendor’s service, with exceptions for specific laws, e.g. hate speech, where they apply.
  • Article III: Vendors Own Their Interfaces

    1. Vendors are under no obligation to provide “open” or “standard” interfaces, other than as described above for data access and monitoring. APIs for modifying user experience, frameworks for building extensions or even complete applications for the vendor platform, or such technologies can be developed however the vendor sees fit. If a vendor chooses to require developers to write applications in a custom programming language with esoteric data storage algorithms and heavily piracy protected execution systems, so be it.

      If it seems that this completely abdicates the customer’s power in the business relationship, this is not so. As the “cloud” is a marketplace of technology infrastructures, platforms and applications, the customer exercises their power by choosing where to spend their hard earned money. A decision to select a platform vendor that locks you into proprietary Python libraries, for instance, is a choice to support such programming lock-in. On the other hand, insistence on portable virtual machine formats will drive the market towards a true commodity compute capacity model.

      The key reason for giving vendors such power is to maximize innovation. By restricting how technology gets developed or released, the market risks restricting the ways in which technologists can innovate. History shows that eventually the “open” market catches up to most innovations (or bypasses them altogether), and the pace at which this happens is greatly accelerated by open source. Nonetheless, forcing innovation through open source or any other single method runs the risk of weakening capitalist entrepreneurial risk taking.

    2. The customer, however, has the right to use any method legally possible to extend, replicate, leverage or better any given vendor technology. If a vendor provides a proprietary API for virtual machine management in their cloud, customers (aka “the community”, in this case) have every right to experiment with “home grown” implementations of alternative technologies using that same API. This is also true for replicating cloud platform functionality, or even complete applications–though, again, the right only extends to legal means.

      Possibly the best thing a cloud vendor can do to extend their community, and to encourage innovation on their platform from community members, is to open their platform as much as possible. By making themselves the "reference platform" for their respective market space, an open vendor creates a petri dish of sorts for cultivating differentiating features and successes on their platform. Protective, proprietary vendors are on their own.

These three articles serve as the baseline for customer, vendor and, as necessary, government relationships in the new network-based computing marketplace. No claim is made that this document is complete, or final. These articles may be changed or extended at any time, and additional articles can be declared, whether in response to new technologies or business models, or simply to reflect the business reality of the marketplace. It is also a community document, and others are encouraged to bend and shape it in their own venues.
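Article I, Section 2 is the most immediately actionable of these articles for a customer today. As a closing illustration, below is a minimal sketch of the kind of export tool that right implies you should always be able to write; the client class, endpoints, parameters and JSON shapes are invented for illustration and belong to no particular vendor.

    # Hypothetical customer-side export client; endpoints, parameters and JSON
    # shapes are illustrative only and belong to no specific vendor's API.
    import json
    import requests

    class CloudDataClient:
        """Pulls customer-owned data out of a cloud vendor, record by record or in bulk."""

        def __init__(self, base_url, api_key):
            self.base_url = base_url.rstrip("/")
            self.headers = {"Authorization": "Bearer %s" % api_key}

        def iter_records(self, collection, page_size=500):
            """Record-by-record access: walk every page of a collection."""
            page = 1
            while True:
                resp = requests.get(
                    "%s/data/%s" % (self.base_url, collection),
                    params={"page": page, "per_page": page_size},
                    headers=self.headers,
                )
                resp.raise_for_status()
                records = resp.json().get("records", [])
                if not records:
                    return
                for record in records:
                    yield record
                page += 1

        def export_all(self, collection, path):
            """Bulk access: dump an entire collection to a local file, one JSON record per line."""
            with open(path, "w") as out:
                for record in self.iter_records(collection):
                    out.write(json.dumps(record) + "\n")

    # Usage: a nightly, off-site copy of everything the customer owns.
    # client = CloudDataClient("https://api.example-vendor.com", "secret-key")
    # client.export_all("contacts", "contacts-backup.jsonl")

The point is not this particular code, of course; it is that a customer should always be in a position to write something like it.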

Are We Overselling the Cloud to Ourselves?

I was doing some casual reading tonight (which is all I have time to do lately, it seems), when I came across this post from Thomas Wailgum of CIO.com on InfoWorld. (Ain't syndication grand?) The majority of the post is commentary from Gartner about the relative infancy of SaaS ERP solutions compared to their on-premises brethren. Interesting in and of itself, but not normally worthy of a post here.

However, on the second page, I came across the following quote:

Other inhibitors to more widespread SaaS ERP adoption, Ganly contends, relate to total cost of ownership (TCO). TCO of “SaaS ERP suites likely will be significant and may not compare favorably with on-premises solutions,” she adds. This problem applies to vendors as well. SaaS vendors “often have unrealistic expectations of their operating costs,” she writes. “The multitenant architecture needed for SaaS ERP suites results in high internal efforts and costs for the initial setup and the ongoing maintenance and upgrade of the system.”

Security has also been an issue with SaaS ERP offerings, “especially with regard to financial data and privacy concerns,” Ganly writes. “Vendors must prove to organizations that are considering SaaS ERP adoption that their security and privacy concerns are unfounded through super low-cost or no-cost, proof-of-concept trials, encouraging early adoption through value pricing and getting early adopters to share their success stories.”

[Emphasis mine.]

It occurs to me that this is a really good point to consider when looking at the economics of the cloud computing market. For SaaS vendors, cost-of-sales is still high, as the sale is (and always will be) a hybrid of the traditional enterprise sales model: high investment in building customer relationships, proving technical and business feasibility, and navigating corporate politics, though likely with fewer of the “big meeting” costs found in traditional relationship sales.

Thus, the “economies of scale” from data center operations may be vastly overshadowed by cost of acquiring customers.
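A back-of-the-envelope example makes the point; every number below is invented purely for illustration.

    # Invented numbers, purely for illustration: even a healthy multi-tenant
    # infrastructure margin disappears if each customer is expensive to acquire.
    infrastructure_cost_per_customer_per_year = 2000   # after economies of scale
    subscription_revenue_per_customer_per_year = 6000
    cost_to_acquire_one_customer = 15000               # enterprise-style sales cycle
    average_customer_lifetime_years = 3

    lifetime_margin = average_customer_lifetime_years * (
        subscription_revenue_per_customer_per_year
        - infrastructure_cost_per_customer_per_year
    )
    print(lifetime_margin - cost_to_acquire_one_customer)  # -3000: a loss on every customer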

However, a "pure infrastructure" play (such as poster child Amazon) eliminates most of the cost-of-sales if it can prove a low barrier to entry and significant flexibility of use. Most customers discover Amazon, get set up for free, then pay nominal charges to figure out for themselves how to use the platform. There is no real data lock-in, as the storage services are essentially device storage (as opposed to specific data schemas), so the cost of choosing not to move forward with a pure IaaS vendor is relatively low.

There are few CxO-level relationships between AWS and their customers (though I don't doubt there are several with, say, financial services megaliths with deep pockets and an interest in influencing AWS).

The point is, when most technologists think of the cloud, they think of something like Amazon, not something like DemandERP. But much of the value of the cloud comes from getting the resources you need in (usually) an on-demand model. If the price and experience can't be both superior to on-premises ERP for the customer and profitable for the vendor…well, as the kids say these days, "fail".

I worry that many of the boutique IaaS vendors are also going to fall into the same trap–not understanding how the cost of acquiring customers to a specialized platform or service will wipe out the economies-of-scale savings of multitenancy. There will be a lot of churn out there in the coming years, and a lot of wispy corpses floating in the clouds. Caveat emptor.

Oh, and to the point of the efficiency of Amazon’s model: Jeff Barr notes that he can’t find a failed startup that used EC2/S3 as its core infrastructure:

“One of the major value propositions of Amazon Web Services is the utility pricing plan. That is, you only pay for what you use, and the cost is very low. Sometimes it feels like I am just saying that: not because there is any doubt that it’s true; rather because it’s difficult to produce metrics to back up assertions that “low cost utility pricing” is truly a game changer.

Then it hit me… Looking at the list of Start-Up Project presentations on Slideshare’s site, I realized that not a single one of these companies is “off the air”; that is, they all are still in business. In the Startup world that is nothing short of amazing—especially in this economy. (Some of the decks on Slideshare’s site are not from last year’s startup events; however even those other companies appear to be alive and well.) Amazon can’t take all the credit for this track record; however it does seem to be a solid data point that validates the value proposition.”

That is amazing, if it holds true.

The Principles of a Cloud Oriented Architecture

The market is hot. The technologies are appearing fast and furious. The tools you need are out there, but they are young, often untested, and of unpredictable reliability. You've researched the economics, and you know now that cloud computing is a) here to stay, and b) offers economic advantages that–if realized–could stretch your IT budget and quite possibly catapult your career.

Now what?

What is often overlooked in the gleeful rush to cloud computing is the difficulty in molding the early technologies in the space into a truly bulletproof (or even bullet-resistant) business infrastructure. You see it all over the Internet: the push and pull between innovation and reliability, the concerns about security, monitoring and control, even the constant confusion over what cloud computing entails, what technologies to select for a given problem, and how to create an enterprise-class business system out of those technologies.

The truth is, cloud computing doesn't launch our technical architectures into the future. It is, at its heart, an economic model that drives the parameters around how you acquire, pay for and scale the infrastructure architectures you already know. It's not a question of changing the problems you have to solve when utilizing data centers, just a change to the division of responsibilities among yourself, your organization, your cloud providers and the Internet itself.

To this end, I offer you a series of posts (perhaps moving to a WIKI in the near future) describing in depth my research into what it takes to deliver a systems architecture with the following traits:

  1. It partially or entirely incorporates the cloud for at least one layer of the Infrastructure/Platform/Application stack.
  2. It is focused on consumers of cloud technologies, not on the requirements of those delivering cloud infrastructures, whether public or private (or even dark).
  3. It takes into account the variety of technical, economic and even political factors that systems running in the "cloud" must contend with.
  4. It is focused at least as much on the operational aspects of the system as on the design and development aspects.

The idea here is not to introduce an entirely new paradigm–that’s the last thing we need given the complexity of the task ahead of us. Nor is it to replace the basic principles of SOA or any other software architecture. Rather, the focus of this series is on how to best prepare for the new set of requirements before us.

Think about it. We already deal (or try to deal) with a world in which we don't entirely have control over every aspect of the environment our applications live in. If we are software developers, we rely on others to build our servers, configure our networks, provide us storage and weld them all together into a cohesive unit. System administrators are, in large enterprises anyway, specializing in OS/application stacks, networking, storage or system management. (Increasingly you can add facilities and traditional utilities to this list.)

Even when we outsource to others–shifting responsibility for management of parts or all of our IT infrastructure to a vendor–the vendor doesn’t have control over significant elements of the end-to-end operations of our applications; namely, the Internet itself. But with outsourcing, we typically turn over entire, intact architecture stacks, with a few, very well bounded integration points to manage (if any) between outsourced systems and locally maintained systems.

The cloud is going to mess this up. I say this not just because the business relationship is different from outsourcing, but also because what you are "turning over" can be a *part* of a system stack. SmugMug outsources storage and job processing, but not the web experience that relies on both. Applications that run entirely on EC2/S3 outsource the entire infrastructure, but not the application development, or even the application system management. (This is why RightScale, Hyperic and others are finding some traction with AWS customers.)
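As a rough illustration of what "turning over part of a stack" can look like, consider the sketch below: the web experience and business rules stay in the customer's own code, while only the photo bytes are handed to an S3-style blob store. The storage client and its methods are hypothetical stand-ins, not any real SDK.

    # Hypothetical illustration: only one responsibility (photo storage) is turned
    # over to the cloud; the user experience and business rules remain local code.
    class PhotoService:
        def __init__(self, blob_store, bucket):
            self.blob_store = blob_store   # stand-in for an S3-style client
            self.bucket = bucket

        def upload(self, photo_id, image_bytes):
            # The cloud owns durability and capacity for this one layer...
            self.blob_store.put(self.bucket, "photos/%s.jpg" % photo_id, image_bytes)

        def render_page(self, photo_id):
            # ...but the page itself is still ours to build, operate and monitor.
            url = self.blob_store.public_url(self.bucket, "photos/%s.jpg" % photo_id)
            return "<html><body><img src='%s'/></body></html>" % url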

To prepare for a cloud oriented architecture, one must understand what responsibilities lie where. So, I'll give you a teaser of what is to come with the short-short version of where I see these responsibilities lying (subject to change as I talk to others, including yourselves if you choose to comment on this post):

  • The enterprise has responsibility for the following:
    • Defining the business solution to be solved, the use cases that define that solution, and the functional requirements to deliver those use cases
    • Evaluating the selection of technical and economic approaches for delivering those functional requirements, and selecting the best combination of the two. (In other words, the best combination may not contain either the best technical or best economic selection, but will outweigh any other combination of the two.)
    • Owning the service level agreements with the business for the delivery of those use cases. This is critically important. More on this below.
  • The cloud provider has responsibility for the following:
    • Delivering what they promised you (or the market) that they would deliver. No more, no less.
    • Providing you with transparent and honest billing and support services.
  • The Internet itself is only responsible for providing you with an open, survivable infrastructure for interconnecting the networks you need to run your applications and/or services. There are no promises here about reliability or scalability or even availability. It should be considered a technical wilderness, and treated accordingly.

Now, about SLAs. Your cloud provider does not own your SLAs, you do. They may provide some SLAs that support your own, but they are not to be blamed if you fail to achieve the SLAs demanded of you. If your applications or services fail because the cloud failed, you failed. Given that, don’t “outsource” your SLAs, at least not logically. Own them.

In fact, I would argue that the single most important function of a cloud-centric IT shop, after getting required business functionality up and running in the first place, is monitoring and actively managing that functionality; switching vendors, if necessary, to continue service at required levels. The one big piece of IT-specific software that should always run in IT data centers, in my opinion, is the NOC infrastructure. (Although, perhaps in this context it's more of a Cloud Operations Center, but I hate the resulting acronym for obvious reasons.)
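As a sketch of what that Cloud Operations Center function boils down to, here is a minimal check-and-switch loop. The provider names, probe URLs and polling interval are all invented, and the actual cutover (DNS, load balancing, data replication) is waved away in a comment; that part is the real work.

    # Skeleton of a check-and-switch loop for an in-house "Cloud Operations Center".
    # Provider names, probe URLs and the polling interval are invented.
    import time
    import requests

    PROVIDERS = {
        "primary-cloud": "https://status.primary.example.com/health",
        "fallback-cloud": "https://status.fallback.example.com/health",
    }

    def healthy(url, timeout=5):
        try:
            return requests.get(url, timeout=timeout).status_code == 200
        except requests.RequestException:
            return False

    def choose_active_provider(active):
        if healthy(PROVIDERS[active]):
            return active
        for name, url in PROVIDERS.items():
            if name != active and healthy(url):
                return name   # the real cutover (DNS, data, load balancers) happens here
        return active         # nobody looks healthy; stay put and alert a human

    active = "primary-cloud"
    while True:
        active = choose_active_provider(active)
        time.sleep(60)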

I'll focus more on these responsibilities in future posts. All posts in this series will be tagged "coa principles". Please feel free to provide me feedback in the comments, contact me to review your thoughts on this topic, or simply send me links that you think I should be aware of. I am also working to find other bloggers who wish to take ownership of parts of this primer (cloud security, for example), so let me know if you are interested there as well.

I am excited about this. This body of knowledge (or at least the faint traces of knowledge) has been rattling around inside my head for some time, and it feels good to finally be sharing it with you.

Cloud Outages, and Why *You* Have To Design For Failure

I haven't posted for a while because I have been thinking…a lot…about cloud computing, inevitable data center outages, and what it means for application architectures. Try as I might to put the problem on the cloud providers, I keep coming back to one bare fact: the cloud is going to expose a lot of the shortcomings of today's distributed architectures, and this time it's up to us to make things right.

It all started with some highly informative posts from the Data Center Knowledge blog chronicling outages at major hosting companies, and failures that helped online companies learn important lessons about scaling, etc. As I read these posts, the thought that struck me was, "Well, of course. These types of things are inevitable. Who could possibly predict every possible negative influence on an application, much less a data center?" I've been in enough enterprise IT shops to know that even the very best are prepared for something unexpected to happen. In fact, what defines the best shops is that they assume failure and prepare for it.

Then came the stories of disgruntled employees locking down critical information systems or punching the emergency power kill switch on their way out the door. Whether or not you are using the cloud, human psychology being what it is, we have to live every day with immaturity or even just plain insanity.

Yet, each time one of the big name cloud vendors has an outage–Google had one, as did Amazon a few times, including this weekend–there are a bunch of IT guys crying out, “Well, there you go. The cloud is not ready for production.”

Baloney, I say. (Well, I actually use different vocabulary, but you get the drift.) Truth is, the cloud is just exposing people's unreasonable expectations of what a distributed, disparate computing environment provides. The idea that some capacity vendor is going to give you 100% up time for years on end–whether they promised it or not–is just delusional. Getting angry at your vendor for an isolated incident or pooh-poohing the market in general just demonstrates a lack of understanding of the reality of networked applications and infrastructure.

If you are building an application for the Internet–much less the cloud–you are building a distributed software system. A distributed system, by definition, relies on a network for communication. Some years ago, Peter Deutsch and others at Sun postulated a series of fallacies that tend to be the pitfalls all distributed systems developers run into at one time or another in their careers. Hell, I still have to check my work against these each and every time I design a distributed system.

Key among these is the delusion that the network is reliable. It isn’t, it never has been, and it never will be. For network applications, great design is defined by the application or application system’s ability to weather undesirable states. There are a variety of techniques for achieving this, such as redundancy and caching, but I will dive into those in more depth in a later post. (A great source for these concepts is http://highscalability.com.)
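Here is a small sketch of the two techniques just named, retrying against redundant capacity and falling back to a cache. The function fetch_from_service is a stand-in for any remote call, and the in-memory cache is only illustrative.

    # Minimal sketch: retry with exponential backoff (assuming some redundancy
    # behind the call), then fall back to a possibly stale local cache.
    import time

    _cache = {}

    def fetch_with_fallback(key, fetch_from_service, retries=3):
        delay = 1
        for _ in range(retries):
            try:
                value = fetch_from_service(key)
                _cache[key] = value        # refresh the cache on every success
                return value
            except IOError:
                time.sleep(delay)
                delay *= 2                 # back off between attempts
        if key in _cache:
            return _cache[key]             # serve stale data rather than fail outright
        raise RuntimeError("Service unavailable and no cached copy for %r" % key)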

Some of the true pioneers in the cloud realized this early. Phil Wainewright notes that Alan Williamson of Mediafed made what appears to be a prescient decision to split their processing load between two cloud providers, Amazon EC2/S3 and FlexiScale. Even Amazon themselves use caching to mitigate S3 outages on their retail sites (see the bottom of the linked post for their statement).

Michael Hickins notes in his E-Piphanies blog that this may be an amazing opportunity for some skilled entrepreneurs to broker failure resistance in the cloud. I agree, but good distributed system hygiene begins at home. The best statement I've seen is a comment on ReadWriteWeb:

“People rankled about 5 hours of downtime should try providing the same level of service. In my experience, it’s much easier to write-off your own mistakes (and most organizations do), than it is to understand someone else’s — even when they’re doing a better job than you would.”

Amen, brother.

So, in a near future post I’ll go into some depth about what you can do to utilize a “cloud oriented architecture”. Until then, remember: Only you can prevent distributed application failures.

Which Sun Do You Orbit?

I love cloud computing. I love the concept, I love many of the implementations, and I love the opportunity that such a major disruption creates for entrepreneurs and tech giants alike. There is much to be excited about, though the market is in its infancy.

Or markets, if you look closely. Simon noted that at his OSCON presentation this year he ended up on the receiving end of an extended diatribe from a gentleman who was arguing adamantly that software would never be portable between Amazon EC2 and Google AppEngine (which is probably very true). Simon's response was right on the money:

“I must admit I was somewhat perplexed at why this person ever thought they would and why they were talking to me about it. I explained my view but I also thought that I’d reiterate the same points here.

From the ideas of componentisation, the software stack contains three main stable layers of subsystems from the application to the framework to hardware. This entire software stack is shifting from a product to a service based economy (due to commoditisation of IT) and this will eventually lead to numerous competitive utility computing markets based upon open sourced standards at the various layers of this stack.

These markets will depend upon substitutability (which includes portability and interoperability) between providers. For example you might have multiple providers offering services which match the open SDK of Google App Engine or another market with providers matching Eucalyptus. What you won’t get is substitutability from one layer of the stack (e.g. the hardware level where EC2 resides) to another (e.g. the framework level where GAE resides). They are totally different things: apples and pears.”

I want to take Simon's "stack" theory and refine it further. Look at the layers of the stack, and note that there appears to be a relatively small number of companies in each that can actually drive a large following to their particular set of "standards". In the platform space, of course, Google's Python-focused (for now) restricted library set is where much of the focus is, but no one has counted out Bungee Labs yet, nor is anyone ignoring what Yahoo might do in this space. Each vendor has their framework (as Simon rightly calls the platform itself), and each has a few followers building tools, extensions, replications and other projects aimed at both benefiting from and extending the platform. The diagram below identifies many of the current central players, or "suns", that exist in each technology stack today:

Credit: Kent Langley, ProductionScale

I call these communities of central players and satellites “solar systems” (though perhaps it would be more accurate to call them “nodes and edges”, as we will see later).

In each solar system–say the Google AppEngine solar system–you will find an enthusiastic community of followers who thoroughly learn the platform, push its limits, and frequently (though not in every case) find economic and productivity benefits that keep them coming back. Furthermore, the most successful satellite projects will attract their own satellites, and an ever changing environment will form, though the original central players will likely maintain their role for decades (basically until the market is disrupted by an even better technical paradigm).

You already see a very strong Amazon system forming. RightScale, Enomalism and ELASTRA are all key satellites to AWS's sun. Now you are even starting to hear about satellites of satellites in that space, such as GigaSpaces' partnership with RightScale. However, if you look closely at this system, you begin to see the breakdown in the strict interpretation of this analogy, as several of the players (CohesiveFT's ElasticServer On-Demand, for example) are starting to address multiple suns in a particular "stack". Thus my earlier comment that perhaps a nodal analogy is somewhat better.

The key here is that for some time to come, technologies created for the cloud will be attached to one or two so-called solar systems in the stack the technology addresses. Slowly, standards will start to appear (as one solar system begins to dominate or subsume the others), and eventually the stack will function as a commodity market, though (I would argue) still centered around one key player. By the time this happens, some cross-pollination of the stacks themselves will start to happen (as has already happened with the prototype of GAE running in EC2), at which point new gaps in standards will be identified. This is going to take probably two decades to play out entirely, at which point the cloud market will probably already face a major disruptive alternative (or "reinvention").

I say this not to be cynical nor to pontificate for pontification's sake. I say this because I believe developers are already starting to choose their "solar system", and thus their technological options are already being dictated by which satellite technologies apply to their chosen sun. Recognizing this as OK, in fact natural to the process, and acknowledging that religious wars between platforms–or at least stacks–are kind of pointless, will make for a better climate to accelerate the consolidation of technical platforms into a small set of commodity markets. Then the real fun begins.

Of course, I’m a big fan of religious wars myself…