Archive

Archive for the ‘coa principles’ Category

The enterprise "barrier-to-exit" to cloud computing

December 2, 2008 Leave a comment

An interesting discussion ensued on Twitter this weekend between myself and George Reese of Valtira. George–who recently posted some thought provoking posts on O’Reilly Broadcast about cloud security, and is writing a book on cloud computing–argued strongly that the benefits gained from moving to the cloud outweighed any additional costs that may ensue. In fact, in one tweet he noted:

IT is a barrier to getting things done for most businesses; the Cloud reduces or eliminates that barrier.

I reacted strongly to that statement; I don’t buy that IT is that bad in all cases (though some certainly is), nor do I buy that simply eliminating a barrier to getting something done makes it worth while. Besides, the barrier being removed isn’t strictly financial, it is corporate IT policy. I can build a kick butt home entertainment system for my house for $50,000; that doesn’t mean it’s the right thing to do.

However, as the conversation unfolded, it became clear that George and I were coming at the problem from two different angles. George was talking about many SMB organizations, which really can’t justify the cost of building their own IT infrastructure, but have been faced with a choice of doing just that, turning to (expensive and often rigid) managed hosting, or putting a server in a colo space somewhere (and maintaining that server). Not very happy choices.

Enter the cloud. Now these same businesses can simply grab capacity on demand, start and stop billing at their leisure and get real world class power, virtualization and networking infrastructure without having to put an ounce of thought into it. Yeah, it costs more than simply running a server would cost, but when you add the infrastructure/managed hosting fees/colo leases, cloud almost always looks like the better deal. At least that’s what George claims his numbers show, and I’m willing to accept that. It makes sense to me.

I, on the other hand, was thinking of medium to large enterprises which already own significant data center infrastructure, and already have sunk costs in power, cooling and assorted infrastructure. When looking at this class of business, these sunk costs must be added to server acquisition and operation costs when rationalizing against the costs of gaining the same services from the cloud. In this case, these investments often tip the balance, and it becomes much cheaper to use existing infrastructure (though with some automation) to deliver fixed capacity loads. As I discussed recently, the cloud generally only gets interesting for loads that are not running 24X7.

(George actually notes a class of applications that sadly are also good candidates, though they shouldn’t necessarily be: applications that IT just can’t or won’t get to on behalf of a business unit. George claims his business makes good money meeting the needs of marketing organizations that have this problem. Just make sure the ROI is really worth it before taking this option, however.)

This existing investment in infrastructure therefore acts almost as a “barrier-to-exit” for these enterprises when considering moving to the cloud. It seems to me highly ironic, and perhaps somewhat unique, that certain aspects of the cloud computing market will be blazed not by organizations with multiple data centers and thousands upon thousands of servers, but by the little mom-and-pop shop that used to own a couple of servers in a colo somewhere that finally shut them down and turned to Amazon. How cool is that?

The good news, as I hinted at earlier, is that there is technology that can be rationalized financially–through capital equipment and energy savings–which in turn can “grease the skids” for cloud adoption in the future. Ask the guys at 3tera. They’ll tell you that their cloud infrastructure allows an enterprise to optimize infrastructure usage while enabling workload portability (though not running workload portability) between cloud providers running their stuff. VMWare introduced their vCloud initiative specifically to make enterprises aware of the work they are doing to allow workload portability across data centers running their stuff. Cisco (my employer) is addressing the problem. In fact, there are several great products out there who can give you cloud technology in your enterprise data center that will open the door to cloud adoption now (with things like cloudbursting) and in the future.

If you aren’t considering how to “cloud enable” your entire infrastructure today, you ought to be getting nervous. Your competitors probably are looking closely at these technologies, and when the time is right, their barrier-to-exit will be lower than yours. Then, the true costs of moving an existing data center infrastructure to the cloud will become painfully obvious.

Many thanks to George for the excellent discussion. Twitter is becoming a great venue for cloud discussions.

Do Your Cloud Applications Need To Be Elastic?

November 22, 2008 Leave a comment

I got to spend a few hours at Sys-Con’s Cloud Computing Expo yesterday, and I have to say it was most certainly an intellectually stimulating day. Not only was just about every US cloud startup represented in one way or another, but included were an unusual conference session, and a meetup of fans of CloudCamp.

While listening in on a session, I overheard one participant ask how the cloud would scale their application if they couldn’t replicate it. This triggered a strong response in me, as I really feel for those that confuse autonomic infrastructures with magic applied to scaling unscalable applications. Let me be clear, the cloud can’t scale your application (much, at least) if you didn’t design it to be scaled. Period.

However, that caused me to ask myself whether or not an application had to be horizontally scalable in order to gain economically while running in an Infrastructure as a Service (IaaS) cloud. The answer, I think, is that it depends.

Chris FlexFleck of Citrix wrote up a pretty decent two part explanation of this on his blog a few weeks ago. He starts out with some basic costs of acquiring and running 5 Quad-core servers–either on-premises (amortized over 3 years at 5%) or in a colocation data center–against the cost of running equivalent “high CPU” servers 24X7 on Amazon’s EC2. The short short of his initial post is that it is much more expensive to run full time on EC2 than it is to run on premises or in the colo facility.

How much more expensive?

  • On-premises: $7800/year
  • Colocation: $13,800/year
  • Amazon EC2: $35,040/year

I tend to believe this reflects the truth, even if its not 100% accurate. First, while you may think “ah, Amazon…that’s 10¢ a CPU hour”, in point of fact most production applications that you read about in the cloud-o-sphere are using the larger instances. Chris is right to use high CPU instances in his comparison at 80¢/CPU hour. Second, while its tempting to think in terms of upfront costs, your accounting department will in fact spread the capital costs out over several years, usually 3 years for a server.

In the second part of his analysis, however, Chris notes that the cost of the same Amazon instances vary based on the amount of time they are actually used, as opposed to the physical infrastructure that must be paid for whether it is used or not (with the possible exception of power and AC costs). This comes into play in a big way if the same instances are used judiciously for varying workloads, such as the hybrid fixed/cloud approach he uses as an example.

In other words, if you have an elastic load, plan for “standard” variances on-premises, but allow “excessive” spikes in load to trigger instances on EC2, you suddenly have a very compelling case relative to buying enough physical infrastructure to handle excessive peaks yourself. As Chris notes:

“To put some simple numbers to it based on the original example, let’s assume that the constant workload is roughly equal to 5 Quadcore server capacity. The variable workload on the other hand peaks at 160% of the base requirement, however it is required only about 400 hours per year, which could translate to 12 hours a day for the month of December or 33 hours per month for peak loads such as test or batch loads. The cost for a premise only solution for this situation comes to roughly 2X or $ 15,600 per year assuming existing space and a 20% factor of safety above peak load. If on the other hand you were able to utilize a Cloud for only the peak loads the incremental cost would be only $1,000. ( Based on Amazon EC2 )

Premise Only
$ 15,600 Annual cost ( 2 x 7,800 from Part 1 )
Premise Plus Cloud
$ 7,800 Annual cost from Part 1
$ 1,000 Cloud EC2 – ( 400 x .8 x 3 )
$ 8,800 Annual Cost Premise Plus Cloud “

The lesson of our story? Using the cloud makes the most sense when you have an elastic load. I would postulate that another option would be a load that is not powered on at full strength 100% of the time. Some examples might include:

  • Dev/test lab server instances
  • Scale-out applications, especially web application architectures
  • Seasonal load applications, such as personal income tax processing systems or retail accounting systems

On the other hand, you probably would not use Infrastructure as a Service today for:

  • That little accounting application that has to run at all times, but has at most 20 concurrent users
  • The MS Exchange server for your 10 person company. (Microsoft’s multi-tenant Exchange online offering is different–I’m talking hosting your own instance in EC2)
  • Your network monitoring infrastructure

Now, the managed hosting guys are going to probably jump down my throat with counter arguments about the level of service provided by (at least their) hosting clouds, but my experience is that all of these clouds actually treat self-service as self-service, and that there really is very little difference between do-it-yourself on-premises and do-it-yourself in cloud.

What would change these economics to the point that it would make sense to run any or all of your applications in an IaaS cloud? Well, I personally think you need to see a real commodity market for compute and storage capacity before you see the pricing that reflects economies in favor of running fixed loads in the cloud. There have been a wide variety of posts about what it would take [pdf] to establish a cloud market in the past, so I won’t go back over that subject here. However, if you are considering “moving my data center to the cloud”, please keep these simple economics in mind.

The PaaS Spectrum: Choosing Your Coding Cloud

October 12, 2008 Leave a comment

Platform as a Service is a fascinating space to me. As I noted in one of my reviews of Google AppEngine when it was released, there is a certain development experience that comes with a good distributed platform that understands both simple development-test cycles, yet also reduces the complexity of delivering highly scalable and reliable applications to a complex data center. With the right platform, and there are many, a development team can leapfrog the hours of pain and effort required to stitch together hardware, software, networking and storage to create a bulletproof web application.

At Cloud Camp Silicon Valley earlier this month, a group of us discussed this in some depth. A crowd of about thirty assorted representatives of cloud vendors and customers alike engaged in a lively discussion of what the elements of cloud oriented architectures are, and how one chooses the right architecture.

I spoke (perhaps too much) about software fluidity, and it was noted that many PaaS platforms limit that fluidity, rather than enable it. Think Google AppEngine or force.com or Bungee Connect. Great products, but not exactly built to make your application portable and dynamic. (Google and others are open sourcing all or part of their platforms, but the ecosystem to port applications isn’t there yet. See below.) So, the conclusion went, perhaps you choose some PaaS offerings when time-to-market was the central tenet of your project, not portability. Others (possibly including IaaS, if you want to be technical) make sense when portability is your primary concern, but you’ll have to do more work to get your application out the door.

This creates a spectrum on which PaaS offerings would fall:

This makes perfect sense to me. Choose to use an “all-in-one” platform from a single software+service+infrastructure vendor, and they can hide much of the complexity of coding, deploying and operating your application from you. On the other hand, select an infrastructure-only vendor using “standard” OS images (typically based on one of the major linux distros) but little else, and you can port your applications to your hearts content, but you’ll have to do all of the configuration of database connection, middleware memory parameters, etc. yourself. Many platforms will lie somewhere in the middle of this spectrum, but the differences between the edges is striking, and most platforms will fall towards one end or the other.

For an example of a relatively “extreme right” platform, take a look at Force.com, the application platform provided by, and tied closely to, Salesforce.com, and its APEX language. How much do they provide in the way of productivity? Well, Rich Unger, an engineer at Salesforce.com (and one of the participants in the CloudCamp SV discussion), has an excellent blog that covers APEX. Here’s one example that he gives:

“Database operations don’t have to establish a connection (much less manage a connection pool) [in APEX]. Also, the object-relational mapping is built into the language. It’s even statically typed. Let’s say you want the first and last names of all your contacts. In Java, there are many ways to set this up, depending on whether you’re using JPA, straight JDBC, entity beans, etc. In general, you must to at least these four things:

  1. Write an entity class, and annotate it or map it to a DB table in an xml file
  2. Configure the DB and connection pool
  3. acquire a connection in the client code
  4. Perform the query

In Apex, you’d do just do #4:

Contact[] mycontacts = [select firstname, lastname from Contact];
for (Contact c : mycontacts) {
System.debug(c.firstname + ' ' + c.lastname);
}

That’s it. You could even shorten this by putting the query right into the for loop. The language knows how to connect to the DB. There’s no configuration involved. I’m not hiding any XML files. Contact is a standard data type. If you wanted a custom data type, you’d configure that through the Salesforce.com UI (no coding). Adding the new type to the DB automatically configures the O-R mapping. Furthermore, if you tried:

Account[] myaccounts = [select firstname, lastname from Contact];

…it wouldn’t even compile. Static typing right on down to the query. Try that by passing strings into Jdbc!”

Freakin’ brilliant! That is, as long as you wanted to write an application that used the Salesforce.com databases and ran on the Force.com infrastructure. Not code that you can run on AppEngine or EC2.

On the other hand, I’ve been working with GoGrid for a little while getting Alfresco to run in a clustered configuration on their standard images. It has been amazing, and helped along both by the fact that GoGrid gives you root access to the virtual server (very cool!), and that the standard Alfresco Enterprise download (trial version available for free) contains a Tomcat instance, and installs with a tar command, a single properties file change and a database script. So, combine a CentOS 64-bit image with Alfresco 2.2 Enterprise, make sure iptables has port 8080 open, and away you go. The best thing is that–in theory–I should be able to grab the relevant files from that CentOS image, copy them to a similar image on, say, Flexiscale, and be up and running in minutes. However, I did have to manage some very techie things; I had to edit iptables, for instance, and know how to confirm that I had the right Java version for Tomcat.

By the way, long term operational issues are similarly affected by your choice of PaaS provider. If you have root access to the server, you must handle a measurable percentage of issues that are driven by configuration changes over time in your system. On the other hand, if your code is running on a complete stack that the vendor maintains for backward compatibility, and that hides configuration issues from you at the get-go, you may not have to do much of anything to keep your system running at reasonable service levels.

Today, the choice is up to you.

I wonder, though, if this spectrum has to be so spread out. For example, as I wrote recently, I see a huge opportunity for application middleware vendors, such as GigaSpaces, BEA and JBOSS, to provide a “portability layer” that would allow both reduced configuration on prebuilt app server/OS images, but at the same time allow the application on top of the app server to be portable to just about any instance of that server in the cloud. (There would likely be more configuration required on the middleware option than the APEX example earlier. For instance, the application server and/or application itself would have to be “pointed” to the database server.)

Google AppEngine should, in theory, be on this list. However, while they open sourced the API and development “simulator”, they have not provided source to the true middleware itself–the so-called Google “magic dust”. Implementing a truly scalable alternative AppEngine platform is an exercise left up to the reader. Has anyone built a true alternative AppEngine-compatible infrastructure yet? I hear rumors of what’s to come, but as far as I know, nothing exists today. So, AppEngine is not yet portable. To be fair, there is no JBOSS “cloud edition” yet, either. GigaSpaces is the only vendor I’ve seen actively pursue this route.

While we are waiting for more flexible options, you are left with a choice to make. Do you need it now at all costs? Do you need it portable at all costs? Or do you need something in between? Where you fall in the PaaS spectrum is entirely up to you.

Elements of a Cloud Oriented Architecture

In my post, The Principles of Cloud Oriented Architectures, I introduced you to the concept of a software system architecture designed with “the cloud” in mind:

“…I offer you a series of posts…describing in depth my research into what it takes to deliver a systems architecture with the following traits:

  1. It partially or entirely incorporates the clouds for at least one layer of the Infrastructure/Platform/Application stack.
  2. Is focused on consumers of cloud technologies, not the requirements of those delivering cloud infrastructures, either public or private (or even dark).
  3. Takes into account a variety of technical, economic and even political factors that systems running in the “cloud” must take into account.
  4. Is focused at least as much on the operational aspects of the system as the design and development aspects

The idea here is not to introduce an entirely new paradigm–that’s the last thing we need given the complexity of the task ahead of us. Nor is it to replace the basic principles of SOA or any other software architecture. Rather, the focus of this series is on how to best prepare for the new set of requirements before us.”

I followed that up with a post (well, two really) that set out to define what our expectations of “the cloud” ought to be. The idea behind the Cloud Computing Bill of Rights was not to lay out a policy platform–though I am flattered that some would like to use it as the basis of one— but rather to set out some guidelines about what cloud computing customers should anticipate in their architectures. In this continuing “COA principles” series, I intend to lay out what can be done to leverage what vendors deliver, and design around what they fail to deliver.

With that basic framework laid out, the next step is to break down what technology elements need to be considered when engineering for the cloud. This post will cover only the list of some such elements as I understand them today (and feel free to use the comments below to add your own insights), and future posts will provide a more thorough analysis of individual elements and/or related groups of elements. The series is really very “stream of consciousness”, so don’t expect too much structure or continuity.

When considering what elements matter in a Cloud Oriented Architecture, we consider first that we are talking about distributed systems. Simply utilizing Salesforce.com to do your Customer Relationship Management doesn’t require an architecture; integrating it with your SAP billing systems does. As your SAP systems most likely don’t run in Salesforce.com data centers, the latter is a distributed systems problem.

Most distributed systems problems have just a few basic elements. For example:

  • Distribution of responsibilities among component parts

  • Dependency management between those component parts

  • Scalability and reliability

    • Of the system as a whole
    • Of each component
  • Data Access and Management

  • Communication and Networking

  • Monitoring and Systems Management

However, because cloud computing involves leveraging services and systems entirely outside of the architect’s control, several additional issues must be considered. Again, for example:

  • How are the responsibilities of a complex distributed system best managed when the services being consumed are relatively fixed in the tasks they can perform?

  • How are the cloud customer’s own SLA commitments best addressed when the ability to monitor and manage components of the system may be below the standards required for the task?

  • How are the economics of the cloud best leveraged?

    • How can a company gain the most work for the least amount of money?
    • How can a company leverage the cloud marketplace for not just cost savings, but also increased availability and system performance?

In an attempt to address the more cloud-specific distributed systems architecture issues, I’ve come up with the following list of elements to be addressed in a typical Cloud Oriented Architecture:

  • Service Fluidity – How does the system best allow for static redeployment and/or “live motion” of component pieces within and across hardware, facility and network boundaries? Specific issues to consider here include:

    • Distributed application architecture, or how is the system designed to manage component dependencies while allowing the system to dynamically find each component as required? (Hint: this problem has been studied thoroughly by such practices as SOA, EDA, etc.)
    • Network resiliency, or how does the system respond to changes in network location, including changes in IP addressing, routing and security?
  • Monitoring – How is the behavior and effectiveness of the system measured and tracked both to meet existing SLAs, as well as to allow developers to improve the overall system in the future? Issues to be considered here include:

    • Load monitoring, or how do you measure system load when system components are managed by multiple vendors with little or know formal agreements of how to share such data with the customer or each other?
    • Cost monitoring, or how does the customer get a accurate accounting of the costs associated with running the system from their point of view?
  • Management – How does the customer configure and maintain the overall system based on current and ongoing technical and business requirements? Examples of what needs to be considered here includes:

    • Cost, or what adjustments can be made to the system capacity or deployment to provide the required amount of service capacity at the lowest cost possible? This includes ways to manage the efficiency of computation, networking and storage.
    • Scalability, or how does the system itself allow changes to capacity to meet required workloads? These changes can happen:
      • vertically (e.g. get a bigger box for existing components–physically or virtually)
      • horizontally (e.g. add or remove additional instances of one or more components as required)
      • from a network latency perspective (adjust the ways in which the system accesses the network in order to increase overall system performance)
    • Availability, or how does the system react to failure or any one component, or any group of components (e.g. when an entire vendor cloud goes offline)?
  • Compliance – How does the overall system meet organizational, industry and legislative regulatory requirements–again, despite being made up of components from a variety of vendors who may themselves provide computing in a variety of legal jurisdictions?

Now comes the fun of breaking these down a bit, and talking about specific technologies and practices that can address them. Please, give me your feedback (or write up your criticism on your own blog, but link here so I can find you). Point me towards references to other ways to think about the problem. I look forward to the conversation.

Update: The Cloud Computing Bill of Rights

Thanks to all that provided input on the first revision of the Cloud Computing Bill of Rights. The feedback has been incredible, including several eye opening references, and some basic concepts that were missed the first time through. An updated “CCBOR” is below, but I first want to directly outline the changes, and credit those that provided input.

  1. Abhishek Kumar points out that government interference in data privacy and security rights needs to be explicitly acknowledged. I hear him loud and clear, though I think the customer can expect only that laws will remain within the constitutional (or doctrinal) bounds of their particular government, and that government retains the right to create law as it deems necessary within those parameters.

    What must also be acknowledged, however, is that customers have the right to know exactly what laws are in force for the cloud systems they choose to use. Does this mean that vendors should hire civil rights lawyers, or that the customer is on their own to figure that out? I honestly don’t know.

  2. Peter Laird’s “The Good, Bad, and the Ugly of SaaS Terms of Service, Licenses, and Contracts” is a must read when it comes to data rights. It finds for enterprises what was observed by NPR the other night for individuals; that you have very few data privacy rights right now, that your provider probably has explicit provisions protecting them and exposing you or your organization, and the cloud exposes risks that enterprises avoid by owning their own clouds.

    This reinforces the notion that we must understand that privacy is not guaranteed in the cloud, no matter what your provider says. As Laird puts it:

    “…[A] customer should have an explicit and absolute right to data ownership regardless of how a contract is terminated.”

  3. Ian Osbourne asks “should there be a right to know where the data will be stored, and potentially a service level requirement to limit host countries?” I say absolutely! It will be impossible for customers to obey laws globally unless data is maintained in known jurisdictions. This was the catalyst for the “Follow the Law Computing” post. Good catch!

  4. John Marsh of GeekPAC links to his own emerging attempt at a Bill of Rights. In it, he points out a critical concept that I missed:

    “[Vendors] may not terminate [customer] account[s] for political statements, inappropriate language, statements of sexual nature, religious commentary, or statements critical of [the vendor’s] service, with exceptions for specific laws, eg. hate speech, where they apply.”

    Bravo, and noted.

  5. Unfortunately, the federal courts have handed down a series of rulings that challenge the ability of global citizens and businesses to do business securely and privately in the cloud. This Bill of Rights is already under grave attack.

Below is the complete text of the second revision of the Cloud Computing Bill of Rights. Let’s call the first “CCBOR 0.1” and this one “CCBOR 0.2”. I’ll update the original post to reflect the versioning.

One last note. My intention in presenting this post was not to become the authority on cloud computing consumer rights. It is, rather, the cornerstone of my Cloud Computing Architecture discussion, in which I need to move on to the next point. I’m working on setting up a WIKI for this “document”. Is there anyone out there in particular that would like to host it?

The Cloud Computing Bill of Rights (0.2)

In the course of technical history, there exist few critical innovations that forever change the way technical economies operate; forever changing the expectations that customers and vendors have of each other, and the architectures on which both rely for commerce. We, the parties entering into a new era driven by one such innovation–that of network based services, platforms and applications, known at the writing of this document as “cloud computing”–do hereby avow the following (mostly) inalienable rights:

  • Article I: Customers Own Their Data

    1. No vendor shall, in the course of its relationship with any customer, claim ownership of any data uploaded, created, generated, modified, hosted or in any other way associated with the customer’s intellectual property, engineering effort or media creativity. This also includes account configuration data, customer generated tags and categories, usage and traffic metrics, and any other form of analytics or meta data collection.

      Customer data is understood to include all data directly maintained by the customer, as well as that of the customer’s own customers. It is also understood to include all source code and data related to configuring and operating software directly developed by the customer, except for data expressly owned by the underlying infrastructure or platform provided by the vendor.

    2. Vendors shall always provide, at a minimum, API level access to all customer data as described above. This API level access will allow the customer to write software which, when executed against the API, allows access to any customer maintained data, either in bulk or record-by-record as needed. As standards and protocols are defined that allow for bulk or real-time movement of data between cloud vendors, each vendor will endeavor to implement such technologies, and will not attempt to stall such implementation in an attempt to lock in its customers.

    3. Customers own their data, which in turn means they own responsibility for the data’s security and adherence to privacy laws and agreements. As with monitoring and data access APIs, vendors will endeavor to provide customers with the tools and services they need to meet their own customers’ expectations. However, customers are responsible for determining a vendor’s relevancy to specific requirements, and to provide backstops, auditing and even indemnification as required by agreements with their own customers.

      Ultimately, however, governments are responsible for the regulatory environments that define the limits of security and privacy laws. As governments can choose any legal requirement that works within the constraints of their own constitutions or doctrines, customers must be aware of what may or may not happen to their data in the jurisdictions in which data resides, is processed or is referenced. As constitutions vary from country to country, it may not even be required for governments to inform customers what specific actions are taken with or against their data. That laws exist that could put their data in jeopardy, however, is the minimum that governments convey to the market.

      Customers (and their customers) must leverage the legislative mechanisms of any jurisdiction of concern to change those parameters.

      In order for enough trust to be built into the online cloud economy, however, governments should endeavor to build a legal framework that respects corporate and individual privacy, and overall data security. While national security is important, governments must be careful not to create an atmosphere in which the customers and vendors of the cloud distrust their ability to securely conduct business within the jurisdiction, either directly or indirectly.

    4. Because regulatory effects weigh so heavily on data usage, security and privacy, vendors shall, at a minimum, inform customers specifically where their data is housed. A better option would be to provide mechanisms by which users can choose where their data will be stored. Either way, vendors should also endeavor to work with customers to assure that their systems designs do not conflict with known legal or regulatory obstacles. This is assumed to apply to primary, backup and archived data instances.
  • Article II: Vendors and Customers Jointly Own System Service Levels

    1. Vendors own, and shall do everything in their power to meet, service level targets committed to with any given customer. All required effort and expense necessary to meet those explicit service levels will be spent freely and without additional expense to the customer. While the specific legally binding contracts or business agreements will spell out these requirements, it is noted here that these service level agreements are entered into expressly to protect both the customer’s and vendor’s business interests, and all decisions by the vendor will take both parties equally into account.

      Where no explicit service level agreement exists with a customer, the vendor will endeavor to meet any expressed service level targets provided in marketing literature or the like. At no time will it be acceptable for a vendor to declare a high level of service at a base price, only to later indicate that that level of service is only available at a higher premium price.

      It is perfectly acceptable, however, for a vendor to expressly sell a higher level of service at a higher price, as long as they make that clear at all points where a customer may evaluate or purchase the service.

    2. Ultimately, though, customers own their service level commitments to their own internal or external customers, and the customer understands that it is their responsibility to take into account possible failures by each vendor that they do business with.

      Customers relying on a single vendor to meet their own service level commitments enter into an implicit agreement to tie their own service level commitments to the vendor’s, and to live and die by the vendor’s own infrastructure reliability. Those customers who take their own commitments seriously will seek to build or obtain independent monitoring, failure recovery and disaster recovery systems.

    3. Where customer/vendor system integration is necessary, the vendor’s must offer options for monitoring the viability of that integration at as many architectural levels as required to allow the customer to meet their own service level commitments. Where standards exist for such monitoring, the vendor will implement those standards in a timely and complete fashion. The vendor should not underestimate the importance of this monitoring to the customer’s own business commitments.

    4. Under no circumstances will vendors terminate customer accounts for political statements, inappropriate language, statements of sexual nature, religious commentary, or statements critical of the vendor’s service, with exceptions for specific laws, e.g. hate speech, where they apply.
  • Article III: Vendors Own Their Interfaces

    1. Vendors are under no obligation to provide “open” or “standard” interfaces, other than as described above for data access and monitoring. APIs for modifying user experience, frameworks for building extensions or even complete applications for the vendor platform, or such technologies can be developed however the vendor sees fit. If a vendor chooses to require developers to write applications in a custom programming language with esoteric data storage algorithms and heavily piracy protected execution systems, so be it.

      If it seems that this completely abdicates the customer’s power in the business relationship, this is not so. As the “cloud” is a marketplace of technology infrastructures, platforms and applications, the customer exercises their power by choosing where to spend their hard earned money. A decision to select a platform vendor that locks you into proprietary Python libraries, for instance, is a choice to support such programming lock-in. On the other hand, insistence on portable virtual machine formats will drive the market towards a true commodity compute capacity model.

      The key reason for giving vendors such power is to maximize innovation. By restricting how technology gets developed or released, the market risks restricting the ways in which technologists can innovate. History shows that eventually the “open” market catches up to most innovations (or bypasses them altogether), and the pace at which this happens is greatly accelerated by open source. Nonetheless, forcing innovation through open source or any other single method runs the risk of weakening capitalist entrepreneurial risk taking.

    2. The customer, however, has the right to use any method legally possible to extend, replicate, leverage or better any given vendor technology. If a vendor provides a proprietary API for virtual machine management in their cloud, customers (aka “the community”, in this case) have every right to experiment with “home grown” implementations of alternative technologies using that same API. This is also true for replicating cloud platform functionality, or even complete applications–though, again, the right only extends to legal means.

      Possibly the best thing a cloud vendor can do to extend their community, and encourage innovation on their platform from community members is to open their platform as much as possible. By making themselves the “reference platform” for their respective market space, an open vendor creates a petrie dish of sorts for cultivating differentiating features and successes on their platform. Protective proprietary vendors are on their own.

These three articles serve as the baseline for customer, vendor and, as necessary, government relationships in the new network-based computing marketplace. No claim is made that this document is complete, or final. These articles may be changed or extended at any time, and additional articles can be declared, whether in response to new technologies or business models, or simply to reflect the business reality of the marketplace. It is also a community document, and others are encouraged to bend and shape it in their own venues.

The Cloud Computing Bill of Rights

Update: Title and version number added before Cloud Computing Bill of Rights text below.

Before you architect your application systems for the cloud, you have to set some ground rules on what to expect from the cloud vendors you either directly or indirectly leverage. It is important that you walk into these relationships with certain expectations, in both the short and long term, and both those that protect you and those that protect the vendor.

This post is an attempt to capture many of the core rights that both customers and vendors of the cloud should come to expect, with the goal of setting that baseline for future Cloud Oriented Architecture discussions.

This is but a first pass, presented to the community for feedback, discussion, argument and–if deserved–derision. Your comments below will be greatly appreciated in any case.

The Cloud Computing Bill of Rights (0.1)

In the course of technical history, there exist few critical innovations that forever change the way technical economies operate; forever changing the expectations that customers and vendors have of each other, and the architectures on which both rely for commerce. We, the parties entering into a new era driven by one such innovation–that of network based services, platforms and applications, known at the writing of this document as “cloud computing”–do hereby avow the following (mostly) inalienable rights:

  • Article I: Customers Own Their Data

    1. No vendor shall, in the course of its relationship with any customer, claim ownership of any data uploaded, created, generated, modified, hosted or in any other way associated with the customer’s intellectual property, engineering effort or media creativity. This also includes account configuration data, customer generated tags and categories, usage and traffic metrics, and any other form of analytics or meta data collection.

      Customer data is understood to include all data directly maintained by the customer, as well as that of the customer’s own customers. It is also understood to include all source code and data related to configuring and operating software directly developed by the customer, except for data expressly owned by the underlying infrastructure or platform provided by the vendor.

    2. Vendors shall always provide, at a minimum, API level access to all customer data as described above. This API level access will allow the customer to write software which, when executed against the API, allows access to any customer maintained data, either in bulk or record-by-record as needed. As standards and protocols are defined that allow for bulk or real-time movement of data between cloud vendors, each vendor will endeavor to implement such technologies, and will not attempt to stall such implementation in an attempt to lock in its customers.

  • Article II: Vendors and Customers Jointly Own System Service Levels

    1. Vendors own, and shall do everything in their power to meet, service level targets committed to with any given customer. All required effort and expense necessary to meet those explicit service levels will be spent freely and without additional expense to the customer. While the specific legally binding contracts or business agreements will spell out these requirements, it is noted here that these service level agreements are entered into expressly to protect the customer’s business interests, and all decisions by the vendor will take this into account.

      Where no explicit service level agreement exists with a customer, the vendor will endeavor to meet any expressed service level targets provided in marketing literature or the like. At no time will it be acceptable for a vendor to declare a high level of service at a base price, only to later indicate that that level of service is only available at a higher premium price.

      It is perfectly acceptable, however, for a vendor to expressly sell a higher level of service at a higher price, as long as they make that clear at all points where a customer may evaluate or purchase the service.

    2. Ultimately, though, customers own their service level commitments to their own internal or external customers, and the customer understands that it is their responsibility to take into account possible failures by each vendor that they do business with.

      Customers relying on a single vendor to meet their own service level commitments enter into an implicit agreement to tie their own service level commitments to the vendor’s, and to live and die by the vendor’s own infrastructure reliability. Those customers who take their own commitments seriously will seek to build or obtain independent monitoring, failure recovery and disaster recovery systems.

    3. Where customer/vendor system integration is necessary, the vendor’s must offer options for monitoring the viability of that integration at as many architectural levels as required to allow the customer to meet their own service level commitments. Where standards exist for such monitoring, the vendor will implement those standards in a timely and complete fashion. The vendor should not underestimate the importance of this monitoring to the customer’s own business commitments.

  • Article III: Vendors Own Their Interfaces

    1. Vendors are under no obligation to provide “open” or “standard” interfaces, other than as described above for data access and monitoring. APIs for modifying user experience, frameworks for building extensions or even complete applications for the vendor platform, or such technologies can be developed however the vendor sees fit. If a vendor chooses to require developers to write applications in a custom programming language with esoteric data storage algorithms and heavily piracy protected execution systems, so be it.

      If it seems that this completely abdicates the customer’s power in the business relationship, this is not so. As the “cloud” is a marketplace of technology infrastructures, platforms and applications, the customer exercises their power by choosing where to spend their hard earned money. A decision to select a platform vendor that locks you into proprietary Python libraries, for instance, is a choice to support such programming lock-in. On the other hand, insistence on portable virtual machine formats will drive the market towards a true commodity compute capacity model.

      The key reason for giving vendors such power is to maximize innovation. By restricting how technology gets developed or released, the market risks restricting the ways in which technologists can innovate. History shows that eventually the “open” market catches up to most innovations (or bypasses them altogether), and the pace at which this happens is greatly accelerated by open source. Nonetheless, forcing innovation through open source or any other single method runs the risk of weakening capitalist entrepreneurial risk taking.

    2. The customer, however, has the right to use any method legally possible to extend, replicate, leverage or better any given vendor technology. If a vendor provides a proprietary API for virtual machine management in their cloud, customers (aka “the community”, in this case) have every right to experiment with “home grown” implementations of alternative technologies using that same API. This is also true for replicating cloud platform functionality, or even complete applications–though, again, the right only extends to legal means.

      Possibly the best thing a cloud vendor can do to extend their community, and encourage innovation on their platform from community members is to open their platform as much as possible. By making themselves the “reference platform” for their respective market space, an open vendor creates a petrie dish of sorts for cultivating differentiating features and successes on their platform. Protective proprietary vendors are on their own.

These three articles serve as the baseline for customer/vendor relationships in the new network-based computing marketplace. No claim is made that this document is complete, or final. These articles may be changed or extended at any time, and additional articles can be declared, whether in response to new technologies or business models, or simply to reflect the business reality of the marketplace. It is also a community document, and others are encouraged to bend and shape it in their own venues.

Comments, complaints or questions can be directed to the author through the comments section below.

The Principles of a Cloud Oriented Architecture

The market is hot. The technologies are appearing fast and furious. The tools you need are out there, but they are young, often untested, and always deliver unpredictable reliability. You’ve researched the economics, and you know now that cloud computing is a) here to stay, and b) offers economic advantages that–if realized–could stretch you IT budget and quite possibly catapult your career.

Now what?

What is often overlooked in the gleeful rush to cloud computing is the difficulty in molding the early technologies in the space into a truly bulletproof (or even bullet-resistant) business infrastructure. You see it all over the Internet; the push and pull between innovation and reliability, the concerns about security, monitoring and control, even the constant confusion over what entails cloud computing, what technologies to select for a given problem, and how to create an enterprise-class business system out of those technologies.

The truth is, cloud computing doesn’t launch our technical architectures into the future. It is, at its heart, an economic model that drives the parameters around how you acquire, pay for and scale the infrastructure architectures you already know. Its not a question of changing the required problems to solve when utilizing data centers, just a change to the division of responsibilities amongst yourself, your organization, your cloud providers and the Internet itself.

To this end, I offer you a series of posts (perhaps moving to a WIKI in the near future) describing in depth my research into what it takes to deliver a systems architecture with the following traits:

  1. It partially or entirely incorporates the clouds for at least one layer of the Infrastructure/Platform/Application stack.
  2. Is focused on consumers of cloud technologies, not the requirements of those delivering cloud infrastructures, either public or private (or even dark).
  3. Takes into account a variety of technical, economic and even political factors that systems running in the “cloud” must take into account.
  4. Is focused at least as much on the operational aspects of the system as the design and development aspects

The idea here is not to introduce an entirely new paradigm–that’s the last thing we need given the complexity of the task ahead of us. Nor is it to replace the basic principles of SOA or any other software architecture. Rather, the focus of this series is on how to best prepare for the new set of requirements before us.

Think about it. We already deal (or try to deal) with a world in which we don’t entirely have control over every aspect of the world our applications live in. If we are software developers, we rely on others to build our servers, configure our networks, provide us storage and weld them all together into a cohesive unit. System administrators are, in large enterprises anyway, specializing in OS/application stacks, networking, storage or system management. (Increasingly you can add facilities and traditional utilities to this list.)

Even when we outsource to others–shifting responsibility for management of parts or all of our IT infrastructure to a vendor–the vendor doesn’t have control over significant elements of the end-to-end operations of our applications; namely, the Internet itself. But with outsourcing, we typically turn over entire, intact architecture stacks, with a few, very well bounded integration points to manage (if any) between outsourced systems and locally maintained systems.

The cloud is going to mess this up. I say this not just because the business relationship is different from outsourcing, but also because what you are “turning over” can be a *part* of a system stack. Smugmug outsources storage and job processing, but not the web experience that relies on both. Applications that run entirely on EC2/S3 outsource the entire infrastructure, but not the application development, or even the application system management. (This is why RightScale, Hyperic and others are finding some traction with AWS customers.)

To prepare for a cloud oriented architecture, one understand what responsibilities lie where. So, I’ll give you a teaser of what is to come with the short-short version of where I see these responsibilities lying (subject to change as I talk to others, including yourselves if you choose to comment on this post):

  • The enterprise has responsibility for the following:
    • Defining the business solution to be solved, the use cases that define that solution, and the functional requirements to deliver those use cases
    • Evaluating the selection of technical and economic approaches for delivering those functional requirements, and selecting the best combination of the two. (In other words, the best combination may not contain either the best technical or best economic selection, but will outweigh any other combination of the two.)
    • Owning the service level agreements with the business for the delivery of those use cases. This is critically important. More on this below.
  • The cloud provider has responsibility for the following:
    • Delivering what they promised you (or the market) that they would deliver. No more, no less.
    • Providing you with transparent and honest billing and support services.
  • The Internet itself is only responsible for providing you with an open, survivability reliable infrastructure for interconnecting the networks you need to run your applications and/or services. There are no promises here about reliability or scalability or even availability. It should be considered a technical wilderness, and treated accordingly.

Now, about SLAs. Your cloud provider does not own your SLAs, you do. They may provide some SLAs that support your own, but they are not to be blamed if you fail to achieve the SLAs demanded of you. If your applications or services fail because the cloud failed, you failed. Given that, don’t “outsource” your SLAs, at least not logically. Own them.

In fact, I would argue that the single most important function of a cloud-centric IT shop after getting required business functionality up and running in the first place, is monitoring and actively managing that functionality; switching vendors, if necessary, to continue service at required levels. The one big piece of IT-specific software that should always run in IT data centers, in my opinion, is the NOC infrastructure. (Although, perhaps in this context its more of a Cloud Operations Center, but I hate the resulting acronym for obvious reasons.)

I’ll focus more on these responsibilities in future posts. All posts in this series will be tagged “coa principles”. Please feel free to provide me feedback in the comments, contact me to review your thoughts on this topic, or simply to send me links that you think I should be aware of. I am also working to find other bloggers who wish to take ownerships of parts of this primer (cloud security, for example) so let me know if you are interested there as well.

I am excited about this. This body of knowledge (or at least the faint traces of knowledge) have been rattling inside my head for some time, and it feels good to finally be sharing them with you.