Archive for the ‘complexity’ Category

Exploring cloud and complex systems

February 13, 2012

This post—this blog—has been a long time coming.

While many may know me from my long journey exploring cloud computing (the first few years of which are archived on this blog, followed by three years at CNET writing The Wisdom of Clouds, and now my continuing work on GigaOm/cloud), for some time now I’ve been keenly interested in a more specific topic under the cloud computing umbrella: how cloud computing is driving application architectures to adopt the traits of complex adaptive systems.

Complex adaptive systems (CAS) are fascinating beasts. Described by complex systems pioneer John Holland as “systems that have a large number of components, often called agents, that interact and adapt or learn”, CAS are the reason major systems in nature work—from biology to ecology to economics and society. CAS allow for constant change, with an emphasis on changes that make the system stronger, though there is a constant risk of negative events as well.

What fascinates me is that CAS resist being broken down into component parts; you cannot establish clear cause-and-effect relationships between the agents involved and the emergent behavior of the system as a whole. In fact, the sheer complexity of these systems means that predicting the outcome of any given action within a system is at best difficult, and quite likely impossible.

An excellent example of this, one I’ve used before, is a pile of sand on a table. Imagine dropping more sand, one grain at a time, onto that pile. Quick: how many grains of sand will fall off the table with each grain added to the “system”? It is impossible to predict.

Now, granted, a table of sand isn’t exactly “adaptive”, but it is indicative of what complexity does to predictability.
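For the curious, the grain-dropping thought experiment can even be played with in code. Here is a toy version in the spirit of the Bak–Tang–Wiesenfeld sandpile model; the grid size, toppling threshold and grain count are arbitrary choices of mine, purely for illustration:

```python
import random

def topple(grid, size, threshold=4):
    """Relax the pile: any cell holding `threshold` or more grains sheds
    one grain to each of its four neighbors; grains pushed past the edge
    fall off the 'table'. Returns how many grains fell off."""
    fell_off = 0
    unstable = True
    while unstable:
        unstable = False
        for r in range(size):
            for c in range(size):
                if grid[r][c] >= threshold:
                    grid[r][c] -= threshold
                    unstable = True
                    for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                        if 0 <= nr < size and 0 <= nc < size:
                            grid[nr][nc] += 1
                        else:
                            fell_off += 1
    return fell_off

def drop_grains(size=10, grains=2000, seed=42):
    """Drop grains one at a time at random spots; record the spill size
    (grains lost off the table) caused by each individual grain."""
    random.seed(seed)
    grid = [[0] * size for _ in range(size)]
    spills = []
    for _ in range(grains):
        r, c = random.randrange(size), random.randrange(size)
        grid[r][c] += 1
        spills.append(topple(grid, size))
    return spills
```

Run it and the per-grain spill counts are wildly uneven: long stretches of zero punctuated by the occasional avalanche. Even with the rules fully known, the next spill is anyone’s guess.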

CAS and IT

When applied to IT—everything from architectures to markets to organizations—the science of complex adaptive systems has deep ramifications for the ways we plan, design, build, troubleshoot and adapt our most important applications of technology to business. We can’t do everything top-down anymore. Much, much more of our work has to be built and maintained from the bottom up.

Now, I am a novice at all this, so much of what I believe I know today will likely turn out to be wrong. As a quick example, the “Cloud as Complex Systems Architecture” presentation I am giving at Cloud Connect in Santa Clara, CA on Tuesday was supposed to center on how you automate the operations of individual software components to survive in the cloud. “Focus on tweaking agent automation” was going to be my message.

However, that is counter to the reality of what has to happen; it is just as important to evaluate the system as a whole, and adjust *whatever* needs to be adjusted when issues are identified system-wide. In other words, an agent-level focus is exactly the kind of thing that gets you in trouble in complex systems. You see? I am still learning.

I hope you will join me as I take this journey. Follow me on Twitter at @jamesurquhart. Subscribe to the RSS feed for this blog. Leave comments. Challenge me. Point me to new sources of information. Tell me I’m full of it and should start over. In return, I promise to listen, and to share my own journey, including insights I gain from books, online courses and the other very smart people I am very lucky to interact with.

This is going to be fun. I’m pumped to get started.


Children of the Net: Why Our Descendants Will Love The Cloud

January 23, 2008

Our children–or perhaps our grandchildren–won’t remember a time when there was a PC on every desk, or when you had to go to Fry’s Electronics to buy a shrink-wrapped copy of your favorite game. This, as Nick notes frequently in The Big Switch, is one of the real parallels between what our ancestors went through with electrification and what we have yet to go through with compute utilities. Heck, I already find it hard to remember when I didn’t have access to the World Wide Web, and in what year all of that changed. Also, I’m frankly already taking the availability of services from the cloud for granted.

My Dad used to tell me stories of when he lived in a house in Scotland with only a few lights and no other electrical appliances, no indoor plumbing and no telephone. I can’t imagine living like that, but it was just about 50-60 years ago. Those born in the latter half of the twentieth century (in an industrialized country) are perhaps the first to live a lifetime without seeing or experiencing life without multiple sockets in every room. It is unimaginable what life was like for our ancestors pre-electrification.

There will likely be both positive and negative consequences that come from any innovation, but the innovator’s descendants won’t remember things any other way. In the end, once basic needs are taken care of, all humankind cares about is lifestyle anyway, so the view of how “good” an “era” is is largely driven by how well those needs are taken care of. One of those basic needs is the need to create/learn/adapt, but another is the need for predictability of outcome. This constant battle between the yearning for freedom and the yearning for control is what makes human culture evolve in brilliantly intricate ways.

I for one hold out hope that our descendants will be increasingly satisfied with their lifestyles, which–in the end–is probably what we all want to see happen. Will those lifestyles be better or worse from our perspective as ancestors? Who knows…but it won’t really matter, now, will it?

Of course, one of the biggest challenges to humanity is meeting even the basic needs of its entire population. To date, the species has failed to achieve this—the study of economics is largely targeted at understanding why this is. Cloud computing could, as Nick suggests, actually make it more difficult for some groups of people to meet their basic needs, but I would argue that this would be counterproductive to the rest of society.

At the core of my argument is the fact that so much of online business is predicated on massive numbers of people being able to afford a given product. Nick argues that life in the newspaper world shows us the future of most creative enterprises; the ease with which the masses can create and find content makes it difficult to sell advertising to support newspapers, thus the papers struggle. But if huge numbers of people are out of work, with no one valuing their talents and experience, that will lead to less consumer spending. Less consumer spending will lead to less advertising, which will in turn lead to less income for “the cloud” (i.e. those companies making money from advertising in the cloud). It’s a vicious cycle for online properties/services, and one I think will fail to come to pass.

The alternative is that the best of the talent out there continues to find ways to get paid, while the masses are still encouraged to participate. Newspaper journalists are already finding opportunities online, though perhaps at a slower pace than some would like. I believe that ventures such as YouTube will create economic opportunities for videographers and filmmakers to rise above the noise. Musicians are already experimenting with alternative online promotion and sales tools that will change the way we find, buy and consume music. Yes, the long tail will flourish, but the head of the tail will continue to make bank.

The result of this is simply a shifting of the economic landscape, not a wholesale collapse into a black hole. Yeah, the wealth gap thing is a big deal (see Nick’s book), but I believe that the rich are going to start investing some of that money back into the system when the new distribution mechanisms of the online world mature–and that should create jobs, fund creative talent and create a new world in which those that adapt thrive, and those that don’t struggle.

Did I mention I think the utility computing market is a complex adaptive system?

"Social Production" vs. "Greed" Online

I want to start my comparison of Yochai Benkler’s tome, “The Wealth of Networks: How Social Production Transforms Markets and Freedom”, and Nick Carr’s “The Big Switch: Rewiring the World from Edison to Google” with coverage of the direct critique of the former in the latter.

Benkler proposes that we are entering a new phase of economic history, which he calls the “networked information economy”. Counter to the prior industrial economy, this phase is highlighted by the rising effect of “non-market” production on the creation of intellectual capital, made possible by the near zero cost of creating and sharing content on the Internet.

According to Benkler, in a network based economy:

  1. “Individuals can do more for themselves independently of the permission or cooperation of others.”
  2. “Individuals can do more in loose affiliation with others, rather than requiring stable, long-term relations, like coworker relations or participation in formal organizations, to underwrite effective cooperation.”

As a result of this, says Benkler, “we can make the twenty-first century one that offers individuals greater autonomy, political communities greater democracy, and societies greater opportunities for cultural self-reflection and human connection.”

In chapter 7 of Carr’s book, titled “From the Many to the Few”, Carr makes an argument for the inequitable effects of social networking and unpaid content creation. With specific reference to Benkler and others writing about the rising importance of the so-called “gift economy”, he notes that

“[t]here’s truth in such claims, as anyone looking at the Web today can see…[b]ut there is a naivete, or at least a short-sightedness, to these arguments as well. The Utopian rhetoric ignores the fact that the market economy is rapidly subsuming the gift economy.”

As evidence, Carr notes that two of the most important Web 2.0 acquisitions of the last couple of years—that of Flickr by Yahoo, and YouTube by Google—were driven in large part by the incredible economics of these companies. When Flickr was acquired for $35 million, there were fewer than 10 people on staff. YouTube had fewer than 70 employees when it was bought for $1.65 billion.

However, perhaps the most astounding comparison between the two is that both had millions of people producing, organizing and promoting content, but effectively none of them got a single dime of equity. When YouTube was sold, each of the 3 founders got about a third of a billion dollars for 10 months of work. It’s hard to argue that Google bought the web site software for that price. Google bought content and traffic, both of which were largely attributable to those unpaid millions.

I think Carr is right, unfortunately, that we overestimate the influence that “open” technologies will have on the incumbent industrial system. Carr notes important evidence like the growing income gap between the richest Americans and the rest of us, as well as the struggle that newspapers and other media companies are having to generate sufficient income to sustain their businesses—and, in turn, their employees’ standard of living. I will add that even the distinct line between “open source” and “proprietary” projects is blurring, as Anne Zelenka notes on GigaOM today. The result of this trend will, of course, be mixed. At times the content created out of love, frustration or even narcissism will loosen the grip of corporate systems on our society, but these may always be offset by new controls and entrepreneurial successes by these same systems.

On the other hand, I think Nick is too skeptical about the amount of change that will beset business in the coming decades. It is easy to think of ways to provide equity to those that produce content, and I believe someone will come up with a business that does so in the next year or two. Furthermore, the process of democracy itself may be changed significantly in the next two decades, as both the government and entities seeking influence over the government (or seeking to loosen the control of government) find new ways to tweak the system. Jon Udell at Microsoft has covered an interesting corollary, public access to government data, and noted some of the progress made in that space.

Those of you that have read me for a while know that I am extremely interested in complexity theory and its applications to technological development. In the end, I believe what we are going to see in the next decade is an “edge of chaos” process, where the forces of liberalization continually struggle against the forces of social and economic inertia. In the long term, however, I believe that this process will continually better the lives of those swept up in it; with (significant) luck, the lives of everyone on Earth. What is left to chance, however, is the amount of pain and suffering that may be felt as change takes place.

Some more skepticism about Amazon WS as a business plan

December 18, 2007

I’ve been interested in Don McAskill’s review of Amazon’s SimpleDB in light of SmugMug’s future plans. He is very positive that he can use this service for what it is intended for: infinitely scalable storage and retrieval of small, structured data sets. His post is a good one if you want to get a clearer idea of what you can and shouldn’t do with SimpleDB.

However, I worry for Don. As a growing number of voices have been pointing out, committing your business growth to Amazon, especially as a startup, may not be a great thing. Kevin Burton, founder and CEO of spinn3r and Tailrank, notes that this depends on what your processing and bandwidth profiles are, but there are a large number of services that would do better to buy capacity from, say, a traditional managed hosting facility.

Burton uses the term “vendor lock-in” a few times, which certainly echoes comments that Simon and I have been making recently. But Burton brings up an additional point about bandwidth costs that I think has to be carefully considered before you jump on the Amazon-as-professional-savior bandwagon. He notes that for his bandwidth-intensive business, Amazon would cost 3X what it currently costs spinn3r to access the net.

Burton goes on to suggest an alternative that he would love to see happen: bare metal capacity as a service. Similar to managed hosting, the idea would be for the system vendors to lease systems for a cost somewhat above what it would take to buy the system, but broken down over 2-3 years. Since the credit worthiness of most startups is an issue, lease default concerns can be mitigated by keeping the systems on the vendor’s premises. Failure to pay would result in blocked access to the systems, for both the customer and their customers.
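To make the pricing idea concrete, here is a back-of-the-envelope sketch of how such a lease might be priced. The function name, markup rate and dollar figures are purely illustrative assumptions of mine, not anything Burton proposed:

```python
def monthly_lease_price(server_cost, term_months, markup=0.25):
    """Hypothetical lease pricing: the vendor recovers the server's
    purchase cost plus a markup, spread evenly over the lease term.
    All parameters are illustrative, not real vendor pricing."""
    return server_cost * (1 + markup) / term_months

# A $3,000 server leased over 36 months at a 25% markup
# works out to roughly $104 per month.
price = monthly_lease_price(3000, 36)
```

The point is simply that the vendor earns the hardware cost plus margin over the term, while the startup trades a capital outlay for a predictable monthly fee—and, since the gear never leaves the vendor’s premises, a missed payment costs the customer access rather than costing the vendor the hardware.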

I like this concept as a hybrid between the “cloud” concepts and traditional server ownership. Startups can get the capacity they need without committing capital that could be used to hire expertise instead. On the negative side, however, this does nothing to reduce operational costs at the server levels, other than eliminating rack/stack costs. And Burton says nothing about how such an operation would charge for bandwidth, one of his key concerns about Amazon.

There have been a few other voices that have countered Kevin, and I think they should definitely be heard as this debate grows. Jay at thecapacity points out the following:

[B]usiness necessitates an alternate reality and if expediency, simplicity and accuracy mean vendor constraint, so be it.

I agree with this, but I think that it is critical that businesses choose to be locked in with open eyes, and a “disaster recovery” plan should something go horribly wrong. Remember, it wasn’t that long ago that Amazon lost a few servers accidentally.

(Jay seems to agree with this, as he ends his post with:

When companies talk about outsourcing these components, or letting a vendor’s software product dictate their business & IT processes… I always check to make sure my lightsaber is close.

This is in reference to Marc Hedlund’s post, “Jedi’s build their own lightsabers”.)

Nitin Borwankar, a strong proponent of Amazon SimpleDB, commented on Kevin’s post that SimpleDB is a long tail play, and that the head of the data world would probably want to run on their own servers. This is an incredibly interesting statement, as it seems to suggest that even though SimpleDB scales almost infinitely from a technical perspective, it doesn’t so much from a business perspective.

On a side note, it’s been a while since I spoke about complexity theory and computing, but let me just say that this tension between “Get’r done” and “ye kanna take our freedom!” is exactly the kind of tension you want in a complex system. As long as utility/cloud computing stays at the phase change between these two needs, we will see fabulous innovation that allows computing technologies to remain a vibrant and ever-innovating ecosphere.

I love it.

A Helping Hand Comes In Handy Sometimes

You may remember my recent post on how data centers resemble complex adaptive systems. This description of a data center differs in one glaring way from a true complex adaptive system, however: data centers require some form of coordinated management beyond what any single entity can provide. In a truly complex adaptive system, there would be no “policy engines” or even Network Operations Centers. Each server, each switch, each disk farm would attempt to adapt to its surroundings, and either survive or die.

Therein lies the problem, however. Unlike a biological system, the corporate economy, or even a human society, a data center cannot afford to have one of its individual entities (or “agents” in complex systems parlance) arbitrarily disappear from the computing environment. It certainly cannot rely on “trial and error” to determine what survives and what doesn’t. (Of course, in terms of human management of IT, this is often what happens, but never mind…)

Adam Smith called the force that guides selfish individuals to work together for the common benefit of the community the “invisible hand”. The metaphor is good for explaining how decentralized adaptive systems can organize for the greater good without a guiding force, but the invisible hand depends on the failure of those agents that don’t adapt.

Data centers, however, need a “visible hand” to quickly correct some (most?) agent failures. To automate and scale this, certain omnipotent and omnipresent management systems must be mixed into the data center ecology. These systems are responsible for maintaining the “life” of dying agents, particularly if the agents lose the ability to heal themselves.
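A minimal sketch of what such a “visible hand” might look like in code, assuming the environment supplies its own health-check and restart hooks (every name here is hypothetical, invented for illustration):

```python
import time

def visible_hand(agents, check_health, restart, interval=30, rounds=1):
    """Toy 'visible hand': periodically probe each agent and restart
    the ones that can no longer heal themselves. `check_health` and
    `restart` are hooks supplied by the environment; this is an
    illustrative sketch, not a real management product."""
    revived = []
    for round_no in range(rounds):
        for agent in agents:
            if not check_health(agent):
                restart(agent)       # intervene before the agent "dies"
                revived.append(agent)
        if round_no < rounds - 1:
            time.sleep(interval)     # wait before the next sweep
    return revived
```

A real policy engine would of course layer on escalation, rate limiting and root-cause logic, but the core loop—observe, compare to policy, intervene—is exactly the part a purely invisible hand would leave to die.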

Now, a topic for another post is the following: can several individual resource pools, each with their own policy engine, be joined together in a completely decentralized model?

VMWare TSX and Reducing Complexity in the Data Center

I had a busy last few days of the week last week. On Wednesday, I attended VMWare TSX in Las Vegas, and on Friday, I had the chance to hear Bill Coleman speak about utility computing and the events that have led up to its sudden resurgence in the marketplace.

All in all, TSX was one of the most informative VMWare events I have ever attended. I only had the chance to attend three sessions–CPU scheduling, ESX networking and DRS/HA–but all three were packed with useful information. (The slides linked here are from a TSX conference in Nice, Italy, April 3-5, 2007. They are a little different from the slides I saw, but are similar enough to communicate the basic concepts.)

If you don’t know much about VMWare CPU scheduling, check out that deck. Sure, it’s basic scheduling stuff, but it is very helpful when it comes to understanding how VMWare settings affect processor share. The networking deck is also critical if you must deploy network applications to virtual machines.

The DRS/HA deck has some helpful tips, but also clearly demonstrates the limited scale of DRS/HA. A 16-physical-node limit per HA cluster, for example, is going to be problematic for most medium to large data centers. Furthermore, these are very server-centric technologies; the concept of Service Level Automation is clearly missing, as there is no concept at all of a service or application to be measured. They are hinting at a few new app-level monitors in a later release, but I just don’t think monitoring service levels from a business perspective is very important to VMWare.

Bill’s speech to the IT department of a large manufacturer was very interesting, if for no other reason than it clearly spelled out the argument for reducing complexity in the data center. (For a quick and dirty argument, see this article.) We are definitely at a crossroads now; IT can choose to attack complexity with people or technology. Most of us are betting technology will win. Furthermore, Bill told the assembled techies, the early adopters of any platform technology get the best jobs when that platform becomes mainstream. Almost nobody is predicting that utility computing will fail in the long term, so now is the time to jump aboard and get involved.

I should have some time to complete the Service Level Automation Deconstructed series this week. Stay tuned for more.

Complexity and the Data Center

I just finished rereading a science book that has been tremendously influential on how I now think of software development, data center management and how people interact in general. Complexity: The Emerging Science at the Edge of Order and Chaos, by M. Mitchell Waldrop, was originally published in 1992, but remains today the quintessential popular tome on the science of complex systems. (Hint: READ THIS BOOK!)

John Holland (as told in Waldrop’s history) defined complex systems as having the following traits:

  • Each complex system is a network of many “agents” acting in parallel
  • Each complex system has many levels of organization, with agents at any one level serving as the building blocks for agents at a higher level
  • Complex systems are constantly revising and rearranging their building blocks as they gain experience
  • All complex adaptive systems anticipate the future (though this anticipation is usually mechanical and not conscious)
  • Complex adaptive systems have many niches, each of which can be exploited by an agent adapted to fill that niche

Now, I don’t know about you, but this sounds like enterprise computing to me. It could be servers, network components, software service networks, supply chain systems, the entire data center, the entire IT operations and organization, etc. What we are all building here is self-organizing…we may think we have control, but we are all acting as agents in response to the actions and conditions imposed by all those other agents out there.
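As a thought experiment, the bottom-up flavor of Holland’s traits can be sketched in a few lines: agents with no central coordinator, each imitating better-performing peers, producing system-wide improvement nobody planned. This is purely my own illustration—a caricature, not a serious CAS model:

```python
import random

def simulate_agents(n_agents=50, steps=200, seed=1):
    """Minimal agent sketch: each agent holds a 'strategy' score; each
    step, a random agent imitates a better-performing random peer, with
    a small mutation. No agent sees the whole system, yet overall
    performance drifts upward. Illustrative only."""
    random.seed(seed)
    scores = [random.random() for _ in range(n_agents)]
    for _ in range(steps):
        i, j = random.randrange(n_agents), random.randrange(n_agents)
        if scores[j] > scores[i]:
            # copy the fitter peer's strategy, imperfectly
            scores[i] = scores[j] + random.uniform(-0.05, 0.05)
    return scores
```

Swap “strategy score” for server configurations, team practices or product designs and the parallel to IT is hard to miss: local imitation and variation, global improvement, no one in charge.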

A good point about viewing IT as a complex system can be found in Johna Till Johnson‘s Network World article, “Complexity, crisis and corporate nets“. Johna’s article articulates a basic concept that I am still struggling to verbalize regarding the current and future evolution of data centers. We are all working hard to adapt to our environments by building architectures, organizations and processes that are resistant to failure. Unfortunately, the entire “ecosystem” is bound to fail from time to time. And there is no way to predict how or when. The best you can do is prepare for the worst.

One of the key reasons that I find Service Level Automation so interesting is that it provides a key “gene” to the increasingly complex IT landscape: the ability to “evolve” and “heal” at the physical infrastructure level. Combine this with good, resilient software architectures (e.g. SOA and BPM) and solid feedback loops (e.g. BAM, SNMP, JMX, etc.) and your job as the human “DNA” gets easier. And, as the dynamic and automated nature of these systems gets more sophisticated, our IT environments get more and more self-organizing, learning new ways to optimize themselves (often with human help) even as the environment they are adapting to constantly changes.
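To show what I mean by a feedback loop, here is a sketch of one turn of a hypothetical service-level control loop: observe a metric, compare it to the objective, adjust capacity. None of these names correspond to a real product API; they are assumptions for illustration:

```python
def service_level_loop(measure, scale_to, target_latency_ms=200,
                       step=1, min_n=1, max_n=20, current=2):
    """One iteration of a toy service-level feedback loop.
    `measure` returns the observed latency in ms; `scale_to` applies
    the new instance count. Thresholds and names are illustrative."""
    latency = measure()
    if latency > target_latency_ms and current < max_n:
        current += step          # missing the objective: add capacity
    elif latency < target_latency_ms * 0.5 and current > min_n:
        current -= step          # comfortably under target: shed capacity
    scale_to(current)
    return current
```

Wire the `measure` side to BAM/SNMP/JMX-style monitoring and the `scale_to` side to provisioning automation, run it continuously, and you have the crude outline of the “evolve and heal” gene described above.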

In the end, I like to think that no matter how many boneheaded decisions corporate IT makes, no matter how many lousy standards or products are introduced to the “ecosystem”, the entire system will adjust and continually attempt to correct for our weaknesses. In the end, despite the rise and fall of individual agents (companies, technologies, people, etc.), the system will continually work to serve us better…at least until that unpredictable catastrophic failure tears it all down and we start fresh.