Archive for the ‘active power management’ Category

Is a Grid a Cloud? Probably not, but…

Sam Johnston has recently been writing some very provocative posts (provocative as in “thought producing” as well as, at times, “controversial”). One of his latest is his missive on cloud computing, and the confusion created by vendors pushing their grid platforms as defining cloud computing.

He has some good points, and I recommend reading the post. However, very early on he makes a statement that I think clearly demonstrates his own flawed logic when it comes to the term “private cloud”. In the first paragraph, he says:

“Some of this confusion is understandable given issues get complex quickly when you start peeling off the layers, however much of it comes from the very same opportunistic hardware & software vendors who somehow convinced us years ago that clusters had become grids. These same people are now trying to convince us that grids have become clouds in order to sell us their version of a ‘private cloud’ (which is apparently any large, intelligent and/or reliable cluster).”

[Emphasis mine.]

There’s the root problem, right there. By equating a “private cloud” with “any large, intelligent and/or reliable cluster”, he misses much of what the private cloud is–and biases his definition from the point of view of traditional job based grid computing (which does act very much like a cluster).

Let’s use my alma mater as an example of a private cloud infrastructure vendor that does not sell a clustering platform–at least not in the traditional sense of the word, as it relates to software. Cassatt does not tie a bunch of servers into a single, interconnected unit for a workload run on top of it. In fact, that remains the job of the software platform deployed into Cassatt, if it is indeed desired. There is no software coordination intelligence in Cassatt today (other than some dependency management to control startup and shutdown).

Cassatt works purely at the server and OS level. No, it doesn’t create an OS cluster, because the OS isn’t aware that it is being managed. All that Cassatt does is gather server resources into a general pool that can be assigned as needed to meet capacity (and reliability) demands as defined by the service levels applied to the software payloads. If Cassatt sees that application A needs more capacity, it grabs another server. If an instance of server B goes down, Cassatt creates a new instance with the same IP address and hostname (if safe) as the original.
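To make the pooling idea concrete, here is a minimal Python sketch. It is not Cassatt’s implementation or API; the names `Instance`, `ResourcePool`, `add_capacity` and `replace_failed` are invented for illustration, and the sketch assumes it is always safe to reuse the original hostname and IP address.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Instance:
    """A running payload and the physical server hosting it (hypothetical model)."""
    hostname: str
    ip_address: str
    server: str


class ResourcePool:
    """A general pool of spare servers, handed out on demand."""

    def __init__(self, servers):
        self.free = list(servers)

    def acquire(self):
        """Take the next spare server out of the pool."""
        if not self.free:
            raise RuntimeError("no spare capacity in the pool")
        return self.free.pop(0)

    def add_capacity(self, hostname, ip_address):
        """Application needs more capacity: grab another server for it."""
        return Instance(hostname, ip_address, self.acquire())

    def replace_failed(self, failed):
        """Recreate a failed instance on a spare server.

        The hostname and IP address are kept; only the server changes.
        The failed server is not returned to the pool.
        """
        return replace(failed, server=self.acquire())


pool = ResourcePool(["server-1", "server-2", "server-3"])
app_a = pool.add_capacity("app-a-1", "10.0.0.11")
recovered = pool.replace_failed(app_a)
```

Note that nothing here coordinates the software itself: the pool only decides which server hosts which payload, which is the distinction from a cluster.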

Cassatt is not job based. Any running server payload, including web applications, enterprise applications or “always on” monitoring and feed reading processes, can be hosted in exactly the same manner as batch jobs. Cassatt doesn’t do queueing of jobs; it just provisions servers as needed to meet the service levels defined for business workloads.

Read Cassatt’s web site for more. They say it much better than I am expressing it now.

The point is, though, that Cassatt is not a cluster, it is a resource pool, and as such acts much more like a cloud than a grid. Sam may say “well, that’s just autonomic computing” and he’s right, but the cloud is autonomic. So calling an autonomic system running behind an enterprise firewall a “private cloud” is not much of a stretch at all.

By the way, ksankar of http://doubleclix.wordpress.com notes nine great differences between a grid and a cloud. I think he captured more of my own thinking about this subject in that one post than I’ve been able to express in the last three years. Worth a read as well.

Finally, subscribe to Sam’s blog. He’s asking some important questions, and deserves your attention.

Thinking about SLAuto in a frenzied cloud

I’ve been quite silent for a week or two, mostly because of my responsibilities as a sales engineer, doing my part in closing key deals for my employer. I’ve spent this time sitting in meetings, installing and configuring software, and measuring power savings in large dev/test lab installations. (By large I mean hundreds approaching thousands of servers.) All in all, it’s been a successful couple of weeks, but it’s kept me from keeping too close an eye on the big news coming out of the cloud and utility computing markets.

However, as I thought about this more, I realized that I have drifted significantly from my core subject, Service Level Automation (or SLAuto), in the last six months or so–mostly due to the incredible burst of cloud computing innovation to be announced and/or delivered in that time frame. I still believe that there are two key components to an open cloud market that scales:

  • Portable platforms that allow customers to change vendors on a whim
  • Automation that takes action to acquire, release or replace services based on pre-determined service targets

The latter, simply said, is SLAuto.
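As a rough illustration of the second bullet, the following Python sketch decides whether to acquire, release or replace capacity from a pre-determined target. The thresholds, the `ServiceTarget` fields and the `decide` function are assumptions made up for this example, not part of any real SLAuto product.

```python
from dataclasses import dataclass


@dataclass
class ServiceTarget:
    """Pre-determined service target (hypothetical fields)."""
    max_avg_load: float      # acquire capacity above this average load
    min_avg_load: float      # release capacity below this average load
    min_instances: int = 1   # never shrink below this many healthy instances


def decide(target, loads, unhealthy):
    """Return a list of (action, instance) tuples.

    loads maps instance name -> load between 0.0 and 1.0;
    unhealthy is the set of instance names that failed a health check.
    """
    # Failed instances are always replaced first.
    actions = [("replace", name) for name in sorted(unhealthy)]

    healthy = {n: load for n, load in loads.items() if n not in unhealthy}
    if not healthy:
        return actions

    average = sum(healthy.values()) / len(healthy)
    if average > target.max_avg_load:
        actions.append(("acquire", None))
    elif average < target.min_avg_load and len(healthy) > target.min_instances:
        # Give back the least busy instance.
        actions.append(("release", min(healthy, key=healthy.get)))
    return actions
```

A real system would run something like this on a schedule and hand the resulting actions to whatever platform actually provisions the services.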

Of course, what is happening is sort of the nascent birth of cloud computing technologies, where the DNA hasn’t had a chance to recombine to build long term survivability into any given “species” yet. We all knew that AWS was doing cool things, but who knew that they would cross the chasm in terms of customer demand as completely as they did? Yet, there is no portability story for Amazon (at least not off of Amazon); and the market forming for SLAuto (see RightScale and others) is tightly tied to the Amazon platform.

The rest of the “big” announcements are worse: Microsoft has no concept of management in Live Mesh (other than synchronization) that I can see, and Google and Yahoo are both building platforms with developers in mind, where service levels are a business agreement, not a platform differentiator. I understand we are taking baby steps here, but I wonder how long it is before corporate IT realizes that they are both a) locked in (at least in an economic sense), and b) paying too much to operate software that doesn’t even run in their data center.

Now, I say all of this, but truth be told, most corporate IT shops don’t do SLAuto today. So, why should this change in the cloud? I hinted at it earlier: scale. Not scale of functional execution or data access, as we usually think of the term, but scale of market–the speed at which companies will need to respond to the ever evolving marketplace for cloud services and platforms. As the self-professed “open” nature of Google and Yahoo’s platforms becomes more of a reality, combined with true innovation in “industry” standard APIs (for capacity management, code platforms and feature integration), there is little doubt that pressure will be on the IT shop to optimize the cost of delivering business services to the rest of the company. Again, I argue that this cannot be done without SLAuto. Prove me wrong.

I am really concerned that SLAuto is still considered “bleeding edge” in most IT shops. It’s not rocket science, and the future of IT cost management almost certainly has to be built around it. On the other hand, perhaps as some of the customers I worked with over the last couple of weeks serve as references to the value of SLAuto–at least in terms of energy costs–more IT shops will understand its urgency.

Green Aware Programming

February 13, 2008

monkchips writes about “green aware programming“, as coined by Christopher O’Connor, vice-president strategy and market management, Tivoli. I responded and pointed out that “green = cheap” in the utility computing world.

Analyzing the Green opportunity

February 11, 2008

I just want to quickly bring Ken Oestreich’s analysis of the Green Grid meeting in San Francisco (Day 1 and Day 2) and its aftermath to your attention. Pay special attention to the aftermath post, as it is one of the most well thought out statements of the status and opportunity for the Green Grid organization I have seen.

Ken really knows his stuff with respect to the Green Data Center movement, so if you have any interest in the subject at all, subscribe to his blog. His earlier analysis of DC energy efficiency metrics is an all time classic on the subject.

The IT Power Divide

September 28, 2007

The electric grid and the computing grid (RoughType: Nicholas Carr): Nicholas describes the incredible disconnect between IT’s perception of power as an issue

…[O]nly 12% of respondents believe that the energy efficiency of IT equipment is a critical purchasing criterion.

and the actual scale of the issue in reality

…[A] journeyman researcher named David Sarokin has taken a crack at estimating the overall amount of energy required to power the country’s computing grid…[which] amounts to about 350 billion kWh a year, representing a whopping 9.4% of total US electricity consumption.

Amen, brother. In fact, the reason you haven’t heard from me as often in the last two to three weeks is that I have been steadfastly attending a variety of conferences and customer prospect meetings discussing Active Power Management and SLAuto. What I’ve learned is that there are deep divides between the IT and facility views of electrical efficiency:

  • IT doesn’t see the electric bill, so they think power is mostly an upfront cost issue (building a data center with enough power to handle eventual needs) and an ongoing capacity issue (figuring out how to divide power capacity among competing needs). However, their bottom line remains meeting the service needs of the business.

  • Facilities doesn’t see the constantly changing need for information technology of the business, and sees electricity mostly as an upfront capacity issue (determining how much power to deliver to the data center based on square footage and proposed kW/sq ft) and an ongoing cost issue (managing the monthly electric bill). The bottom line in this case is value, not business revenue.

Thus, IT believes that once they get a 1 MW data center, they should figure out how to efficiently use that 1 MW–not how to squeeze efficiencies out of the equipment to run at some number measurably below 1 MW. Meanwhile, facilities gets excited about any technology that reduces overall power consumption and maintains excess power capacity, but lacks the insight into what approaches can be taken that will not impact the business’s bottom line.

With an SLAuto approach to managing power for data centers, both organizations can be satisfied–if they would only take the time to listen to each other’s needs. IT can get a technical approach that has minimal (or zero) effect on system productivity, while facilities sees a more “optimal” power bill every month. Furthermore, facilities can finally integrate IT into the demand curtailment programs offered by their local power utilities, which can generate significant additional rebates for the company.

Let me know what you think here. Am I off base? Do you speak regularly with your facilities/IT counterpart, and actively search for ways to reduce the cost of electricity while meeting service demand?

Links – 09/10/2007

September 10, 2007

Brave New World (Isabel Wang): I can’t begin to express how sorry I am to see Isabel Wang leave the discussion, as her voice has been one of the clearest expressions of the challenges before the MSP community. However, I understand her need to go where her heart takes her, and I wish her the best of luck in all of her endeavors.

(Let me also offer my condolences to Isabel and the entire 3TERA community for the loss of their leader and visionary, Vlad Miloushev. His understanding of the utility computing opportunity for MSPs will also be missed.)

MTBF: Fear and Loathing in the Datacenter (Aloof Architecture: Aloof Schipperke): Aloof discusses his mixed feelings about my earlier post on changing the mindset around power cycling servers. I understand his fears, and hear his concerns; MTBF (or more to the point, MTTF) isn’t a great indicator of actual service experience. However, even by conservative standards, the quality and reliability of server components has improved vastly in the last decade. Does that mean perfection? Nope. But as Aloof notes, our bad experiences get ingrained in the culture, so we overcompensate.

CIOs Uncensored: Whither The Role Of The CIO? (InformationWeek: John Sloat): Nice generality, Bob! Seriously, does he really expect that *every* IT organization will shed its data centers for service providers? What about defense? Banking? Financial markets? While I believe that most IT shops are going to go to a general contractor/architect role, I think there is still a big enough market for enterprise data centers that markets to support them will go on for years to come.

That being said, most of you out there should look at your own future with a service-oriented computing (SOC?) world in mind.

An easy way to get started with SLAuto

September 4, 2007

It’s been an interesting week, leading up to the Labor Day weekend, but as of this morning I get to talk more openly about one project that has been taking a great deal of my time. As I have blogged about Service Level Automation (“SLAuto”), it may have dawned on some of you that achieving nirvana here means changing a lot about your current architecture and practices.

For example, decoupling software from hardware is easy to say, but requires significant planning and execution to implement (though this can be simplified somewhat with the right platform). Building the correct monitors, policies and interfaces is also time intensive work that requires the correct platform for success. However, as noted before, the biggest barriers to implementing SLAuto and utility computing are cultural.

There is an opportunity out there right now to introduce SLAuto without all of the trappings of utility computing, especially the difficult decoupling of software from hardware. It is an opportunity that the Silicon Valley is going ga-ga over, and it is a real problem with real dollar costs for every data center on the planet.

The opportunity is energy consumption management, aka the “green data center”.

Rather than pitch Cassatt’s solution directly, I prefer to talk about the technical opportunity as a whole. So let’s evaluate what is going on in the “GDC” space these days. As I see it, there are three basic technical approaches to “green” right now:

  1. More efficient equipment, e.g. more power efficient chips, server architectures, power distribution systems, etc.
  2. More efficient cooling, e.g. hot/cold aisles, liquid cooling, outside air systems, etc.
  3. Consolidation, e.g. virtualization, mainframes, etc.

Still, there is something obvious missing here: no matter which of these technologies you consider, not one of them is actually going to turn off unused capacity. In other words, while everyone is working to build a better light bulb or to design your lighting so you need fewer bulbs, no one is turning off the lights when no one is in the room.

That’s where SLAuto comes in. I contend that there are huge tracts of computing in any large enterprise where compute capacity runs idle for extended periods. Desktop systems are certainly one of the biggest offenders, as are grid computing environments that are not pushed to maximum capacity at all times. However, possibly the biggest offender in any organization that does in-house development, extensive packaged system customization or business system integration is the dev/test environment.

Imagine such a lab where capacity that will be unused each evening/weekend, or for all but two weeks of a typical development cycle, or at all times except when testing a patch to a three year old rev of product, was shut down until needed. Turned off. Non-operational. Idle, but not idling.

Of course, most lab administrators probably feel extremely uncomfortable with this proposition. How are you going to do this without affecting developer/QA productivity? How do you know it’s OK to turn off a system? Why would my engineers even consider allowing their systems to be managed this way?

SLAuto addresses these concerns by simply applying intelligence to power management. A policy-based approach means a server can be scheduled for shutdown each evening (say, at 7PM), but be evaluated before shutdown against a set of policies that determine whether it is actually OK to complete the shutdown.

Some example policies might be:

  • Are certain processes running that indicate a development/build/test task is still underway?
  • Is a specific user account logged in to the system right now?
  • Has disk activity been extremely low for the last four hours?
  • Did the owner of the server or one of his/her designated colleagues “opt-out” of the scheduled shutdown for that evening?

Once these policies are evaluated, we can see if the server meets the criteria to be shut down as requested. If not, keep it running. Such a system needs to also provide interfaces for both the data center administrators and the individual server owners/users to control the power state of their systems at all times, set policies and monitor power activities for managed servers.
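The policy check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ServerState` fields, the process and user names, and the function names are all hypothetical, and a real system would have to collect this state from the managed servers.

```python
from dataclasses import dataclass, field

# Hypothetical values; a real deployment would configure these per lab.
BUSY_PROCESSES = {"make", "ant", "junit"}
PROTECTED_USERS = {"buildmaster"}
QUIET_HOURS_REQUIRED = 4.0


@dataclass
class ServerState:
    """Snapshot of a server taken just before its scheduled shutdown."""
    running_processes: set = field(default_factory=set)
    logged_in_users: set = field(default_factory=set)
    hours_of_low_disk_activity: float = 0.0
    opted_out: bool = False


# Each policy returns True when it is OK to proceed with the shutdown.
def no_build_or_test_running(state):
    return not (state.running_processes & BUSY_PROCESSES)


def no_protected_user_logged_in(state):
    return not (state.logged_in_users & PROTECTED_USERS)


def disk_quiet_long_enough(state):
    return state.hours_of_low_disk_activity >= QUIET_HOURS_REQUIRED


def owner_did_not_opt_out(state):
    return not state.opted_out


POLICIES = [
    no_build_or_test_running,
    no_protected_user_logged_in,
    disk_quiet_long_enough,
    owner_did_not_opt_out,
]


def shutdown_blockers(state):
    """Names of the policies that object to shutting this server down."""
    return [policy.__name__ for policy in POLICIES if not policy(state)]


def should_shut_down(state):
    """Shut down only if every policy agrees; otherwise keep it running."""
    return not shutdown_blockers(state)
```

Returning the names of the blocking policies, rather than a bare yes or no, gives administrators and server owners something to look at when a machine stays on overnight.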

I’ll talk more about this in the coming week, but I welcome your input. Would you shut down servers in your lab? Your grid environment? Your production environment? What are your concerns with this approach? What policies come to mind that would be simple and/or difficult to implement?