Archive for the ‘Uncategorized’ Category

The art of failure in cloud’s complex system

After reading Sidney Dekker’s insightful book, Drift into Failure, about evaluating failure in a complex systems environment, I’m currently fascinated by all of the ways cloud computing will challenge the “best practices” of the client-server era. Dekker explores disaster avoidance and evaluation in complex adaptive systems environments, where small changes in initial conditions lead to large variances in outcome, repeating individual actions rarely results in the same outcome system-wide, and pinpointing “causes” of failure is nearly impossible.

Put that into the context of IT. Think about all of the ways we have tried to lock down the environment in which an application runs, so that there are no variances in initial conditions (in theory—more on that later). For decades now, we’ve been trying to guarantee that each layer of the computing stack is an environment that is predictable, reliable, and stable.

Is cloud a poor foundation?

Unfortunately, cloud computing screws that up, big time. For clouds, resource pools create a combinatoric problem where even the most reliable individual components combine to give you a largely unstable system-wide infrastructure. If you have 1000 servers that are each 99.9% reliable on any given day (about 1 failure every 3 years per server), the chance of at least one server failing somewhere in the system on a given day is 1-(.999^1000), or about 63.2%. That works out to a failure on roughly three days out of every five (if my rusty math is right).

99.98% reliability (about 1 failure every 14 years for each device) at the same scale still leaves you with about 1 failure every five days.
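For anyone who wants to check this kind of arithmetic, here is a minimal Python sketch. It assumes each reliability figure is the probability that a single server survives one day, and that servers fail independently of one another; both are simplifying assumptions of the sketch, and the function names are only illustrative.

```python
def p_any_failure(n_components: int, reliability: float) -> float:
    """Chance that at least one of n independent components fails in one period."""
    return 1.0 - reliability ** n_components


def expected_failures(n_components: int, reliability: float) -> float:
    """Average number of component failures per period."""
    return n_components * (1.0 - reliability)


if __name__ == "__main__":
    # Assumed pool size of 1000 servers, with per-day reliability figures.
    for reliability in (0.999, 0.9998):
        p = p_any_failure(1000, reliability)
        e = expected_failures(1000, reliability)
        print(f"{reliability:.2%} reliable x 1000 servers: "
              f"P(any failure today) = {p:.1%}, expected failures/day = {e:.2f}")
```

The useful habit is to keep the two questions apart: "how likely is it that something fails today?" and "how many failures should I expect per day?" At this scale they give quite different-looking numbers.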

Now apply that math to the entire cloud infrastructure, services, applications and data environment. Your rate of something in the system failing will astound you. A rock solid foundation, rigid and unchanging, is impossible.

In fact, trying to build a rigid architecture for any software system in the cloud is asking for stability. As a recent post about the stability vs reliability tradeoff (in the context of economics) makes quite clear, stability doesn’t work out quite the way you’d expect in a complex system. So, software in the cloud has to be flexible, rather than rigid.

Cloud’s answer: resiliency

A resilient foundation, on the other hand, is quite achievable, if you treat the cloud as the complex system that it is. This is the key thing that Dekker has taught me so far. When designing applications, don’t concentrate on drilling down farther and farther in the design specs trying to make sure each function is perfectly designed and optimized for a static set of conditions.

Instead, think “up and out”. For example:

  • Design the application from components that are themselves designed to survive in an unstable systems environment.
  • Make sure every call out to an API assumes that API is untrustworthy (in the performance and availability sense—if not the security and functionality sense).
  • Make sure all service functions are written to handle even the most ridiculous inputs from future clients.
  • Put limits on the amount of time you’ll wait for a remote service to respond.
  • Have backup contingencies for data if the primary source of that data is unavailable—such as falling back to a cached value, if applicable.
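A few of these habits can be sketched in a handful of lines. The Python below is only an illustration, not anyone's production pattern: `resilient_fetch`, `slow_service`, and the in-memory `_cache` are hypothetical names invented for this example, and the 256-character key limit is an arbitrary assumption. It rejects nonsensical input, caps how long it waits on the remote call, and falls back to the last cached value when the call is slow or fails.

```python
import concurrent.futures
import time
from typing import Callable, Dict, Optional

# Shared worker pool for remote calls. A call that overruns its deadline keeps
# running in the background, but the caller stops waiting for it.
_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

# Last known good value per key (hypothetical in-memory cache).
_cache: Dict[str, str] = {}


def resilient_fetch(key: str,
                    remote_call: Callable[[str], str],
                    timeout_s: float = 0.5,
                    default: Optional[str] = None) -> Optional[str]:
    """Call remote_call(key), but never trust it to be fast or available."""
    # Handle ridiculous inputs up front instead of passing them downstream.
    if not isinstance(key, str) or not key or len(key) > 256:
        raise ValueError("key must be a non-empty string of at most 256 characters")

    future = _pool.submit(remote_call, key)
    try:
        # Limit the time we are willing to wait for the remote service.
        value = future.result(timeout=timeout_s)
    except Exception:
        # Timeout or remote error: fall back to the cached value, if any.
        return _cache.get(key, default)

    _cache[key] = value
    return value


def slow_service(key: str) -> str:
    """Hypothetical stand-in for a remote API that responds too slowly."""
    time.sleep(0.3)
    return "late answer for " + key


if __name__ == "__main__":
    print(resilient_fetch("profile", lambda k: "fresh answer for " + k))
    # The slow call times out, so the cached "fresh" value is served instead.
    print(resilient_fetch("profile", slow_service, timeout_s=0.05))
```

The design choice worth noticing is that the fallback lives in the caller, not in the remote service: the component assumes its dependencies will misbehave and decides for itself what "good enough" looks like when they do.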

This is the art of building components for a complex adaptive computing environment. Think about what can go wrong with the relationships between things with the same (or greater) fervor as what can go wrong with each individual thing.

That’s the essence of how complex adaptive systems will most radically change the “best practices” of IT: the center of availability design will have to shift from the vertical stack of a single unit of deployment (an application or a service), to the relationships between all elements of the total system being operated, including the system’s relationships with other external factors, including other systems.

In future posts, I want to explore this topic in much more depth. However, to get you started, I highly recommend catching up on what Netflix is doing in this regard. Two posts in particular, “Making the Netflix API More Resilient” and “Fault Tolerance in a High Volume, Distributed System”, are especially practical if you are building a complex cloud application at scale.

Oh, and one other thing.  How fun is this going to be for enterprise architects…? 😉

Categories: Uncategorized

Five resources for learning complex adaptive systems

February 21, 2012

As I get started contributing more regularly to this blog, I wanted to give everyone a baseline for understanding what I mean when I talk about complex adaptive systems (or CAS). It’s not an easy subject to master, not because the core concepts are difficult, but because the effects of those concepts on the world around us are so rich, varied and profound.

So, I quickly wanted to make everyone aware of a few resources that I consider great ways to get in touch with the core of CAS. Some of them are relatively passive—you can just read or watch them at your leisure. Others are quite interactive, and you can use them to get “hands-on” experience. All of them are the most engaging that I’ve found to date.

  1. Complexity: The Emerging Science At The Edge Of Order And Chaos, by M. Mitchell Waldrop – At this point, this book might be called a classic, given the fact that it was originally published in 1992, but it is still the most engaging introduction to the multi-disciplinary beginnings of complexity theory that I am aware of. By telling the story of the famed Santa Fe Institute, and how key members got interested in both the beginnings of complexity theory and the institute itself, Waldrop manages to bring newcomers to the subject in both an intelligent and entertaining fashion. A must read if you don’t know much of anything about complexity theory and CAS.
  2. Complex Adaptive Systems: An Introduction to Computational Models of Social Life, by John H Miller and Scott E Page – This is an excellent guide to the computational models that scientists are using to understand and experiment with complex adaptive systems. Chapter after chapter introduce you to models both simple and complex, with enough information about how the models were formed that you can create them yourself in the programming environment of your choice. However, even if you don’t choose to recreate the models, it is an eye-opening read.
  3. TUDelft SPM 4530 and SPM 9555 – “Agent Based Modeling of complex energy and industrial networks” and “Agent Based Modeling of Complex Adaptive Systems – Advanced” – I originally found this course material via iTunes, but this Wiki page is the official page of a freely available course delivered by Igor Nikolic, Assistant Professor at the Energy and Industry group, Faculty of Technology, Policy and Management, Delft University of Technology. You can step through each of his lectures, as well as see the practical assignments. The only thing you can’t get access to is the actual output of the students taking the class, which makes a ton of sense to me. The cool thing is, the practicals use the next tool extensively. There are hours of material here, so be prepared to spend a few nights and weekends on it.
  4. NetLogo – This is just about the coolest tool I’ve found to date for modeling complex adaptive systems. Described as a “multi-agent programmable modeling environment”, NetLogo provides a basic engine and a simple modeling language that allows you to create all kinds of amazing CAS models. It also has a big list of sample models, including classics such as flocking birds, forest fires and even a model of PageRank from Larry Page and Sergey Brin’s Stanford days. This one is really fun for understanding a large cross section of CAS modeling. One of these days, I’m hoping to create an original “from scratch” model of some aspect of cloud computing in this tool.
  5. Drift into Failure, by Sidney Dekker – I am currently reading this one, so it may be a bit premature to recommend it as a starting place. It certainly is a dense work, but what makes it important to me is its clear statement of situations where complexity takes many apparently good, honest decisions and aggregates them into less than desirable outcomes. This is a must read for those who get CAS to a certain extent, but need to better understand why CAS means we have to think differently about software and systems design and operations in the cloud. This is the very meat of what I think most programmers and operators don’t understand about cloud today.

There will be other resources that I will share with you as time goes by. (For example, Adrian Cockcroft of Netflix pointed me to this post, which I haven’t digested yet, but whose title intrigues me to no end.) In the meantime, please enjoy these, and please post comments with your own recommendations for key sources of complex adaptive systems knowledge. Or cloud knowledge, for that matter. I look forward to learning from you as much as sharing my own learning with you.

The Wisdom of Clouds is moving!!!

December 8, 2008

Finally! It’s been a long time coming, but the “morphing” of The Wisdom of Clouds I’ve hinted at a couple of times in the last month is finally here. Dan Farber and Margaret Kane, the good editors at CNET, have agreed to publish this blog (with a slight name change) on the CNET Blog Network. Henceforth the blog will be titled “The Wisdom of the Clouds”, and located at Please go there and subscribe today.

CNET is one of the most respected IT news sources, and with about 15 million unique visitors a month this is a huge opportunity to broaden the cloud computing discussion to the mainstream IT community. The other members of the CNET Blog Network include such thought leaders as Matt Asay, Gordon Haff and Peter N. Glaskowsky, and I am humbled to be listed among them.

However, I also want to recognize and thank each of you for helping to make The Wisdom of Clouds what it is today. At the beginning of 2008, I had a little over 120 subscribers. This last week saw a record 948 subscribers, with over 200 of you reading each new post within 24 hours of it hitting the feeds, and about 50 more reading the same on the blog pages itself. It has been tremendously enriching to see the uptake in interest, and I am grateful to each of you for your interest, attention and feedback. Thank you.

Unfortunately, this transition will not be without its inconveniences. As you may have guessed, I will no longer be publishing to this site; for now will become an archive site for the two years or so of posts that I’ve written since early in my Cassatt days. I will frequently reference back to those posts initially, but all new material will appear at CNET. If you want to follow where the conversation goes from here, it is important that you go to the CNET URL and subscribe.

I will probably continue to publish my bookmarks to the existing feed for a while, but I want to consolidate that traffic with the article publications over time. Stay tuned for how that will work out. I won’t be bookmarking my own posts as a rule; thus subscribe to the new feed.

Please let me know if you have any problems or concerns with the transition, and I hope that each and every one of you will continue to be a part of my own education about the cloud and its consequences. As always, I can be reached at jurquhart at (ignore this) yahoo dot com.

Again, thank you all, and I’ll see you on The Wisdom of the Clouds.

The Two Faces of Cloud Computing

December 6, 2008

One of the fun aspects of a nascent cloud computing market is that there are “veins” of innovative thinking to be mined from all of the hype. Each of us discovers these independently, though the velocity of recognition increases greatly as the effects of “asymmetrical follow” patterns take effect. Those “really big ideas” of cloud computing usually start as a great observation by one or a few independent bloggers. If you are observant, and pay attention to patterns in terminology and concepts, you can get a jump on the opportunities and intellectual advances triggered by a new “really big idea”.

One of these memes that I have been noticing more and more in the last week is that of the two-faceted cloud; the concept that cloud computing is beginning to address two different market needs, that of large scale web applications (the so-called “Web 2.0” market), and that of traditional data center computing (the so-called “Enterprise” market). As I’ll try to explain, this is a “reasonably big idea” (or perhaps “reasonably big observation” is a more accurate portrayal).

I first noticed the meme when I was made aware of a Forrester report titled “There Are Two Types Of Compute Clouds: Server Clouds And Scale-Out Clouds Serve Very Different Customer Needs”, written by analyst Frank E. Gillett. The abstract gives the best summary of the concept that I’ve found to date:

“Cloud computing is a confusing topic for vendor strategists. One reason? Most of us confuse two fundamentally different types of compute clouds as one. Server clouds support the needs of traditional business apps while scale-out clouds are designed for massive, many-machine workloads such as Web sites or grid compute applications. Scale-out clouds differ from server clouds in five key ways: 1) much larger workloads; 2) loosely coupled software architecture; 3) fault tolerance in software, not hardware; 4) simple state management; and 5) server virtualization is for provisioning flexibility — not machine sharing. Strategists must update their server virtualization plans to embrace the evolution to server cloud, while developing a separate strategy to compete in the arena for scale-out clouds.”

Get it? There are two plans of attack for an enterprise looking to leverage the cloud:

  • How do you move existing load to the IaaS, PaaS, and SaaS providers?
  • How do you leverage the new extremely large scale infrastructures used by the Googles and Amazons of the world to create new competitive advantage?

Around then I started seeing references to other posts that suggested the same thing; that there are two customers for the cloud: those that need to achieve higher scale at lower costs than possible before, and those that want to eliminate data center capital in favor of a “pay-as-you-go” model.

I’m not sure how revolutionary this observation is (obviously many people noticed it before it clicked with me), but it is important. Where is it most obvious? In my opinion, the three PaaS members of the “big four” are good examples:

  • Google is the sole Scale-out vendor on the list…for now. I hear rumors that Microsoft may explore this as well, but for now it is not Mr. Softy’s focus.
  • Microsoft’s focus is, on the other hand, the enterprise. By choosing a .NET centric platform, Azure, complete with Enterprise Service Bus and other integration-centric technologies, they have firmly targeted the corporate database applications that run so much of our economy today.
  • is perhaps the most interesting in that they chose to compete for enterprises with and Sites, but through a “move all your stuff here” approach. Great for the users, but perhaps a disadvantage to those wishing to build stand-alone systems, much less those wishing to integrate with their on-premises SAP instances.

The point here, I guess, is that comparisons between Scale-out and Enterprise clouds, while sometimes tempting (especially in the Google vs. Microsoft case), are rather useless. They serve different purposes, often for completely different audiences, and enterprise IT organizations would do better to focus their efforts on the specific facet of cloud computing that applies to a given project. If you are a budding PaaS vendor, understand the distinction, and focus on the technologies required to meet your market’s demand. Don’t try to be “all cloud to all people”.

Except, possibly, if you are Microsoft…

What is the value of IT convenience?

November 29, 2008

RPath’s Billy Marshall wrote a post that is closely related to a topic I have been thinking about a lot lately. Namely, Billy points out that the effect of server virtualization hasn’t been to satisfy the demand on IT resources, but simply to accelerate that demand through simplifying resource allocation. Billy gives a very clear example of what he means:

“Over the past 2 weeks, I have had a number of very interesting conversations with partners, prospects, customers, and analysts that lead me to believe that a virtual machine tsunami is building which might soon swamp the legacy, horizontal system management approaches. Here is what I have heard:

Two separate prospects told me that they have quickly consumed every available bit of capacity on their VMware server farms. As soon as they add more capacity, it disappears under the weight of an ever pressing demand of new VMs. They are scrambling to figure out how they manage the pending VM sprawl. They are also scrambling to understand how they are going to lower their VMware bill via an Amazon EC2 capability for some portion of the runtime instances.

Two prominent analysts proclaimed to me that the percentage of new servers running a hypervisor as the primary boot option will quickly approach 90% by 2012. With all of these systems sporting a hypervisor as the on-ramp for applications built as virtual machines, the number of virtual machines is going to explode. The hypervisor takes the friction out of the deployment process, which in turn escalates the number of VMs to be managed.”

The world of Infrastructure as a Service isn’t really any different:

“Amazon EC2 demand continues to skyrocket. It seems that business units are quickly sidestepping those IT departments that have not yet found a way to say “yes” to requests for new capacity due to capital spending constraints and high friction processes for getting applications into production (i.e. the legacy approach of provisioning servers with a general purpose OS and then attempting to install/configure the app to work on the production implementation which is no doubt different than the development environment). I heard a rumor that a new datacenter in Oregon was underway to support this burgeoning EC2 demand. I also saw our most recent EC2 bill, and I nearly hit the roof. Turns out when you provide frictionless capacity via the hypervisor, virtual machine deployment, and variable cost payment, demand explodes. Trust me.”

Billy isn’t the only person I’ve heard comment about their EC2 bill lately. Justin Mason commented on my post, “Do Your Cloud Applications Need to be Elastic?”:

“[W]e also have inelastic parts of the infrastructure that could be hosted elsewhere at a colo for less cost, and personally, I would probably have done this given the choice; but mgmt were happier just to use EC2 as widely as possible, despite the additional costs, since it keeps things simpler.”

In each case, management chooses to pay more for convenience.

I think these examples demonstrate an important decision point for IT organizations, especially during these times of financial strife. What is the value of IT convenience? When is it wise to choose to pay more dollars (or euros, or yen, or whatever) to gain some level of simplicity or focus or comfort? In the case of virtualization, is it always wise to leverage positive economic changes to expand service coverage? In the case of cloud computing, is it always wise to accept relatively high price points per CPU hour over managing your own cheaper compute loads?

I think there are no simple answers, but there are some elements that I would consider if the choice was mine:

  • Do I already have the infrastructure and labor skills I need to do it just as well or better than the cloud? If I were to simply apply some automation to what I already have, would it deliver the elasticity/reliability/agility I want without committing a monthly portion of my corporate revenues to an outside entity?

  • Is virtualization and/or the cloud the only way to get the agility I need to meet my objectives? The answer here is often “yes” for virtualization, but is it as frequently for cloud computing?

  • Do I have the luxury of cash flow that allows for me to spend up a little for someone else to worry about problems that I would have to handle otherwise? Of course, this is the same question that applies to outsourcing, managed hosting, etc.

One of the reasons you’ve seen a backlash against some aspects of cloud computing, or even a rising voice to the “it’s the same thing we tried before” argument, is that much of the marketing hype out there is starting to ignore the fact that cloud computing costs money; costs enough to provide a profit to the vendor. Yes, it is true that many (most?) IT organizations have lacked the ability to deliver the same efficiencies as the best cloud players, but that can change and change quickly if those same organizations were to look to automation software and infrastructure to provide that efficiency.

My advice to you: if you already own data centers, and if you want convenience on a budget, balance the cost of Amazon/GoGrid/Mosso/whoever with the value delivered by Arjuna/3TERA/Cassatt/Enomaly/etc./etc./etc., including controlling your virtualization sprawl and preparing you for using the cloud in innovative ways. Consider making your storage and networking virtualization friendly.

Sometimes convenience starts at home.

Two! Two! Two! Two great Overcast podcasts for your enjoyment

November 29, 2008

It’s been a busy week or so for Geva Perry and me, as we took Overcast to a joint podcast with John Willis’s CloudCafe podcast, and had a fabulous discussion with Greg Ness of Both podcasts are available from the Overcast blog.

The discussion with John focused on definitions in the cloud computing space, and some of the misconceptions that people have about the cloud, what it can and can’t do for you, and what all that crazy terminology refers to. John is an exceptionally comfortable host, and his questions drove a deep conversation about what the cloud is, various components of cloud computing, and adjunct terms like “cloudbursting”. It was a lot of fun to do, and I am grateful for John’s invitation to do this.

Greg Ness demonstrated his uniquely deep understanding of what network security entails in a virtualized data center, and how automation is the linchpin of protecting that infrastructure. Topics ranged from this year’s DNS exploit and the pace at which systems are getting patched to address it, to the reasons why the static network we all knew and loved is DOA in a cloud (or even just a virtualized) world. I really admire Greg, and find his ability to articulate difficult concepts with the help of historical insight very appealing. I very much appreciate his taking time out of his busy day to join us.

We are busy lining up more great guests for future podcasts, so stay tuned–or better yet, subscribe to Overcast at the Overcast blog.

Is IBM the ultimate authority on cloud computing?

November 24, 2008

There was an interesting announcement today from IBM regarding their new “Resilient Cloud” seal of approval–a marketing program targeted at cloud providers, and at customers of the cloud. The idea is simple, if I am reading this right:

  • IBM gets all of the world’s cloud vendors to pay them a services fee to submit their services to a series of tests that validate (or not) whether the cloud is resilient, secure and scalable. Should the vendor’s offering pass, they get to put a “Resilient Cloud” logo on their web pages, etc.

  • Customers looking for resilient, secure and scalable cloud infrastructure then can select from the pool of “Resilient Cloud” offerings to build their specific cloud-based solutions. Oh, and they can hire IBM services to help them distinguish when to go outside for their cloud infrastructure, and when to convert their existing infrastructure. I’m sure IBM will give a balanced analysis as to the technology options here…

I’m sorry, but I’m a bit disappointed with this. IBM has been facing a very stiff “innovator’s dilemma” when it comes to cloud computing, as noted by GigaOm’s Stacey Higginbotham:

“IBM has been pretty quiet about its cloud efforts. In part because it didn’t want to hack off large customers buying a ton of IBM servers by competing with them. The computing giant hasn’t been pushing its own cloud business until a half-hearted announcement at the end of July, about a month and half after a company exec had told me IBM didn’t really want to advertise its cloud services.”

She goes on to note, however, that IBM has some great things in the works, including a research project in China that shows great promise. That’s welcome news, and I look forward to IBM being a major player on the cloud computing stage again. However, this announcement is just an attempt at making IBM the “godfather” of the cloud market, and that’s not interesting in the least.

Still, I bet if you want to be an IBM strategic partner, you’d better get on board with the program. Amazon, are you going to pay the fee? Microsoft? Google? Anyone?
