In June I participated in a Forrester webinar hosted by Glenn O'Donnell called "DevOps: Friction-Free Collaboration for Development and Operations." DevOps (a portmanteau of development and operations) is quite the buzzword now, and the discussion yielded many attempts to define it. One thing we agreed on: unfortunately, development and operations are two organizational entities that tend not to get along very well. In this post I'll explain why DevOps represents a mental shift from "us versus them" to a more cohesive, results-oriented approach.

Luke Kanies is the CEO and a founder of Puppet Labs, where he provides the vision and direction for the company as it builds products for next generation IT automation. He is the original author of Puppet Labs' primary product, Puppet. Prior to Puppet Labs, he was a consultant, speaker and author on topics related to configuration management and system administration. Puppet Labs will host PuppetConf, an operations conference in Portland, on September 22-23, 2011.

Getting Over the Wall of Confusion

In my experience in operations there's always been a difference in perspective between Dev and Ops, but it's always been more of an impediment than a benefit. The common goal should be getting apps deployed as quickly, safely, and efficiently as possible, but each group instead has a more short-term priority not necessarily related to the results the business is looking for. Lee Thompson (formerly of E*TRADE, now of DTO Solutions) coined the term "wall of confusion" to describe the apparent inability of development and operations teams to communicate around a common goal, and this wall of confusion is a critical barrier to effective teamwork.

Meanwhile, operations is using tools built a decade ago, following practices developed even before that, and generally applying them as one-off, script-driven solutions. Even worse, in many cases operations has had heavyweight tools and processes forced on them from above, which they build tools to work around rather than with. As a result, operations can't clearly specify to development what a deployable application looks like, so development can't consistently deliver software that operations can get into the infrastructure.

Another significant limiting factor in getting software delivered is that operations has traditionally moved extremely slowly in tool adoption. Developers are running with two screens and using the best tools money can buy or hackers can build, whether it's the fastest processors, most powerful editors or compilers doing parallel compilation across a cluster of however many nodes.

But at the end of the day, the business goals are the same -- Ops and Devs should always be moving forward, adopting the most powerful tools available and asking the fundamental question: "How can I more rapidly deliver reliable, scalable services that are critical for the business?"

DevOps Means Everyone Can Be Iron Man

The best companies have realized that outsourcing development is a bad idea, because having great developers who can build great applications is core to almost every business these days. In the same way, organizations are realizing that the primary goal should be making sysadmins more productive, not automating the people away. The previous generation of operations tools was built around rigid automation that attempted to protect the network from the people, assuming that the most critical work can be done by software. Modern DevOps tools recognize that the best people need powerful tools, not restrictive tools -- they want their tools to make them more powerful, like Iron Man, not limited-use robots. Because they're powerful, these tools don't provide the same kind of protections that more limited tools do, but that same power provides very fast recovery abilities when things do go wrong. To extend the superhero metaphor, with great power comes great responsibility. You can destroy the universe, but without that ability you also can't build the universe from scratch whenever needed.

I don't think more engineering is the answer, but we do need more power and more capabilities. Operations also needs to understand that they are in the line of fire and they need to provide direct business value. If you're a sysadmin and you don't know why your company exists, what products you're delivering and how those products provide value to customers, then you need to get out of the way -- understanding "what" and "why" is critical.

In the survey our company conducted of more than 700 people in both operations and development, people showed us what "powers" they expected from implementing DevOps -- 55% ranked "automation of configuration management tasks" as the top expected benefit. The other top benefits were "improve the quality of software deployments" and "cultural change, collaboration and cooperation."

DevOps Means Sharing Responsibility

Ideally, companies work out a system in which whoever makes the mistake pays the price. The reason Ops is so often scared of Dev deploying is that Dev doesn't really care how secure their apps are, how hard they are to deploy, how hard they are to keep running or how many times you have to restart them, because Ops pays the price for those mistakes, not Dev. In most organizations the mandate of a developer is merely to produce a piece of software that works on a workstation -- if it worked on your workstation and doesn't work in production, it's Operations' fault for not being able to get it onto thousands of machines all around the world.

Google is a great example of switching up that process. When they deploy new applications, the developers carry the pagers until they stop going off -- only when the developers stop getting outage alerts does operations take over the operational running of a system.

Another excellent example is MessageOne, now a division of Dell. At MessageOne developers delivered Puppet modules that deployed and maintained their apps, and operations was responsible for the overall running of the system. If there was an outage because an application wasn't maintaining itself, such as not cleaning up logs, Operations would fix it but then file a bug with Development.

In contrast, an environment where developers are responsible for change and operations is responsible for stability produces inevitable conflict. This is unfortunately the status quo in most organizations. As the graph below shows, the biggest barriers to DevOps adoption are that "the value of DevOps isn't understood outside my group" and "there is no common management structure between development and operations." Helping both groups understand the value of common, results-oriented goals goes a long way toward fixing this kind of dysfunction. The ultimate goal is to move to a world where operations is a service provider to the application organization, allowing developers to directly deploy and manage their own applications on a secure, stable platform built and maintained by operations.

In the shorter term, if you're in development, start worrying about what your code looks like in production. Ask yourself, "When my app is up and running, how do I know if it's running well?" Deploying an app to production on something like Amazon EC2 is also a fantastic way to see things from an operations perspective.

If you're in operations, ask yourself how you can best meet the needs of the people you work for, rather than just solving the problems put in front of you. Start adopting tools of the trade and join the tools communities, whether it's Puppet, Chef, Func, Capistrano or Rails. Get on IRC and get on the mailing lists. Start making mistakes and getting messy and pretty soon you'll be an expert.

Getting Involved in DevOps

DevOps may just be emerging as a concept, but its practitioners are active and passionate about getting involved online and learning more by attending conferences and camps. Our survey showed DevOps discussions are happening in these forums:

Star Trek ops photo by Kreg Steppe