Best practices for change management in the age of DevOps

Change management in IT organizations is usually associated with bureaucracy, long and painful approval processes, and a lot of time-consuming manual effort. So when you see "change management" and "DevOps" in the same sentence, they look like they don't belong together. DevOps is a continuous, iterative approach with speed and efficiency at its core. Change management is often perceived as the total opposite. It's a bit like the misconception people have about planning and agile: "We don't plan, because we are agile." In this case, it goes: "We are IT, and therefore we don't do DevOps." The truth is that, with technology constantly changing and evolving, it's more important than ever for organizations to adopt DevOps principles in order to stay relevant and deliver on commitments and goals in a timely manner. Simply put, organizations that do not move on from their traditional change management processes and cannot deliver change at a fast pace will be left behind.

Being an enterprise software company, Atlassian knows a thing or two about delivering change rapidly. Changes to our products, including Jira Software, Jira Service Desk, Confluence, Bitbucket, and Trello don't just happen out of the blue. We have processes, practices, and mechanisms in place to help ensure that changes are deployed safely and quickly. We believe that our approach to managing change in software development can be applied to IT services and operations, whether it's upgrading an existing network infrastructure, installing a new tool, or implementing a brand-new technology across an entire organization.

Before we talk about Atlassian's approach to change management, let's start with a short history of the company.

Atlassian was founded in 2002 by two university graduates who decided to start a software company because they didn't want to spend their time in the corporate world and wear suits every day. With that simple goal, Atlassian became real with the launch of Jira 1.0.

Fast forward to 2020, and Atlassian has grown to over 5,000 employees, with offices around the globe. The company has gone from a team of two coding in the comfort of their homes to thousands of developers worldwide writing code and releasing software changes multiple times a day, every day. Atlassian engineering leaders have seen firsthand how important it is to have a combination of principles, processes, and mechanisms as the organization fulfills its mission of unleashing the potential of every team via its collaboration software products.

In this article, we share insights from four senior Atlassian engineering leaders on the approaches to change management that they have seen work within the company. We spoke to Andre Serna, Head of Engineering for Server, looking after on-premise products; Jonathon Creenaune, Head of Cloud Foundations, looking after infrastructure and platform services; Paul Slade, Head of Engineering at Experience Platform, looking after platform teams for our Cloud products; and Zak Islam, Head of Engineering at Tech Teams, who looks after product teams for our flagship products like Jira Service Desk and Bitbucket. These leaders have a combined tenure of 27 years at Atlassian, so they’ve seen a lot. We hope you can learn from their experiences.

Based on insights shared by these senior engineering leaders from Atlassian, the following five best practices can help transform a slow, inefficient way of managing changes into a streamlined process while still fulfilling the objectives of a traditional change management process – traceability, transparency, and risk management.

1. Embrace Agile and DevOps principles to reduce human effort and errors

Agile and DevOps principles like an iterative approach, continuous delivery, and automation are not new to startups and the software development industry, but they are often seen as less applicable to highly process-driven, heavily regulated industries like government and banking. But what if we told you that leveraging Agile and DevOps principles is essential to swift and seamless change management in today's business landscape?

Take it from Andre, a Head of Engineering looking after on-premise Atlassian products, where a majority of the customer base is large enterprises.

"Agile and DevOps principles can definitely work hand in hand with regulation and compliance. For example, when I was working for the Commerce team, we were the first team within Atlassian to introduce regulation and compliance processes as we went through our Initial Public Offering (IPO), in the form of SOX compliance. We introduced lightweight, automated, and effective processes to meet compliance and regulation requirements. Automation should play a big role in this day and age. Automation enables segregation of duties. For example, if a change meets certain criteria – say two people have signed off on a code change via a pull request – it can be used as a sign-off for compliance. You will note that here, we are leveraging agile ways of working and DevOps principles by introducing smart automation through sensible rules and thresholds to existing processes. Smart workflow, automation, and agile processes are beneficial in any industry, including heavily regulated ones," he says.
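As a minimal sketch of the kind of rule Andre describes, the check below treats a pull request as signed off once two people other than the author have approved it. The `PullRequest` type, its field names, and the threshold of two are hypothetical and chosen for illustration; they are not taken from Atlassian's tooling or from any compliance standard.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical pull request record; the field names are illustrative."""
    author: str
    approvers: set[str] = field(default_factory=set)


def meets_signoff(pr: PullRequest, required_approvals: int = 2) -> bool:
    """Return True when enough people other than the author have approved."""
    # Segregation of duties: the author's own approval never counts.
    independent = pr.approvers - {pr.author}
    return len(independent) >= required_approvals


reviewed = PullRequest(author="dana", approvers={"lee", "sam"})
self_approved = PullRequest(author="dana", approvers={"dana", "lee"})
print(meets_signoff(reviewed), meets_signoff(self_approved))
```

Because the rule is evaluated by code rather than by a person, the result can be recorded alongside the change as its compliance evidence.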

Paul, who looks after platform teams for Cloud versions of our flagship products, agrees wholeheartedly. He says, "I believe more and more businesses need to embrace technology in their change management, from using technology to automate processes to providing employees with e-learning and in-product onboarding of changes so they understand the how and why behind those changes and are able to take full advantage of them."

2. Autonomy over authority for rapid decision making

Many IT organizations that practice traditional change management are familiar with the Information Technology Infrastructure Library (ITIL). ITIL contains a set of detailed practices for IT service management (ITSM) focused on aligning IT services with business needs. One widely known ITIL component is CAB, which stands for Change Advisory Board. The CAB delivers support by approving requested changes and assisting in the assessment and prioritization of changes. This body is generally made up of IT and Business representatives and might include a change manager, user managers and groups, technical experts, possible third parties, and customers.

While there are many benefits of CABs, they can sometimes contribute to slow decision-making, as a group of people from the CAB would need to process change requests, prioritize these requests manually, assess the risks and impact of each change, go back and forth with multiple change requestors to obtain necessary information, obtain alignment within the CAB, and finally come to a decision. It's not uncommon for IT organizations to have change requests that take months, or even whole quarters. This is very inefficient if a change request is small and doesn't warrant this kind of time commitment from all parties involved.

At Atlassian, we opt for autonomy over authority when it comes to making decisions about changes. Jonathon, Head of Cloud Foundations, shared an approach that works well within his department. He explains, "We use a daily release review process within our department to help determine risk of change and unleash ownership to those rolling out changes; we do this via asking relevant questions. To be clear, daily release review is not like an approval step that you often experience with CAB; it's more Subject Matter Experts (SMEs) asking questions to uncover blind spots and prompt thinking so the person rolling out the change can make an informed decision about their approach."

We asked Jonathon how he controls the risk of proposed changes. "For risk assessment, the first line manager has the accountability to determine how risky a change is. This is done way before the daily release review process. At Atlassian, we are open and trusting. We communicate openly and we challenge directly. This approach of autonomy over authority enables swift decision-making while providing clear accountability and expectations," he says.

3. Plan for releasing change from the very beginning, without adding extra processes

Failing to plan is planning to fail, and this also applies to releasing changes. The key, however, is not to introduce a new process for planning but to integrate planning into the existing processes for performing the work. Jonathon described how this works in practice in his department, Cloud Foundations, which is responsible for infrastructure, build engineering, deployment, provisioning, and many other platform services that power Atlassian products and development. Agility is top of mind there, but system failures have a direct impact on all employees’ productivity, so he is familiar with best practices for making changes safely and swiftly.

"Our change management process starts from the very beginning, from when a Jira ticket is created for any type of work. We start thinking about risk management, identify how we're going to roll out changes effectively, and what mechanisms we would use. For example: Do we need to feature flag this change or not, what would be the ideal soaking period, what are our considerations on how to test or roll out, etc," Jonathon explains.
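One way to picture this is to keep the rollout questions on the ticket itself, so planning happens inside the existing workflow. The sketch below is an assumption about how that might look: `RolloutPlan` and its fields are hypothetical names that mirror the questions Jonathon lists, not a real Jira schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RolloutPlan:
    """Hypothetical rollout questions recorded when a ticket is created."""
    use_feature_flag: Optional[bool] = None
    soak_hours: Optional[int] = None
    test_approach: Optional[str] = None


def unanswered(plan: RolloutPlan) -> list[str]:
    """List the rollout questions that still have no answer."""
    return [name for name, value in vars(plan).items() if value is None]


plan = RolloutPlan(use_feature_flag=True)
print(unanswered(plan))
```

A ticket with unanswered questions is a prompt for discussion, not a blocker; the point is that risk thinking starts when the work starts.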

4. Minimize the negative impact of change by preparing for failures and incidents

We understand that one of the fundamental reasons for IT organizations to practice traditional change management processes is to minimize the negative impact changes can have on customers and businesses. We talked with Zak, Head of Engineering for Cloud versions of our flagship products, on his approach to minimizing negative impacts of change for Cloud customers. He says, "It comes down to the mechanisms that we have. CI/CD sounds amazing, but in reality, it's difficult to be 100 percent sure about how something in a master code branch is going to work out in production. This is why we should have canary environments. At Atlassian, we soak our code changes in one environment first and deploy our code changes in stages."

He adds, "Blast radius is the lever that I use in ensuring safe yet fast deployments; we can move fast when we have a smaller blast radius. With a smaller blast radius, customer impact is small if anything goes wrong."

Zak continues, "The key for us is to make it easy for engineers to deploy to production in a way that is seamless and fast. Safe and fast. For example, for the Jira monolith, we used to deploy to all regions at the same time. But we realized that a breaking change could bring every region down and customer impact would be very large. That's not good. So now we deploy in stages. To elaborate a bit further: we have six AWS Regions, so first we deploy to an internal production-like environment, then to a specific region in production, and then, when we have enough confidence that the change is safe, we deploy to all regions."
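The staged approach Zak describes can be sketched as a loop that widens the rollout only after the current stage looks healthy. This is an illustration under stated assumptions: the target names, the `deploy` callback, and the `is_healthy` check are all hypothetical stand-ins, and a real pipeline would also include a soak period between stages.

```python
from typing import Callable, Sequence


def staged_rollout(
    stages: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> list[str]:
    """Deploy one stage at a time and stop as soon as a stage looks unhealthy.

    Returns the targets that received the change, which is the blast radius.
    """
    deployed: list[str] = []
    for stage in stages:
        for target in stage:
            deploy(target)
            deployed.append(target)
        # Only widen the rollout once every target in this stage is healthy.
        if not all(is_healthy(target) for target in stage):
            break
    return deployed


# Hypothetical targets: an internal production-like environment, one
# production region, then the remaining regions.
STAGES = [
    ["internal-prod-like"],
    ["region-1"],
    ["region-2", "region-3", "region-4", "region-5", "region-6"],
]

# Simulate a breaking change that is detected in the first production region.
reached = staged_rollout(
    STAGES,
    deploy=lambda target: None,
    is_healthy=lambda target: target != "region-1",
)
print(reached)
```

In the simulated failure, the change never reaches the last five regions, which is the smaller blast radius in action.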

Zak says that as software becomes more ubiquitous and critical to running many businesses, it presents a new set of challenges. "Confidence in our products working as expected in all regions in all environments is gained via monitoring, anomaly detection and alerting. All of this should be as automatic as possible."

He adds, "It's good to trust your software and systems, but they change. There is a misconception that if you leave your systems and code alone, everything will stay the same. That's absolutely not true. There are network changes, operating system upgrades, and other changes that you may not know about. Therefore, it's important to trust your systems, but verify continuously, through mechanisms such as war games and chaos engineering practices, to make sure they are resilient. When something goes wrong, the failure should show up in monitoring tools and trigger an automatic alert."
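To make "automatic alert" concrete, here is a minimal sketch of one simple anomaly check: flag a metric reading that sits far from its recent mean. The function name, the sample values, and the three-standard-deviation threshold are assumptions for illustration; production anomaly detection is usually more sophisticated than this.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        # Not enough data to estimate normal behavior.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # A perfectly flat history: any deviation at all is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


# Hypothetical error-rate samples (percent) from recent monitoring windows.
recent_error_rates = [0.9, 1.1, 1.0, 1.2, 0.8]
if is_anomalous(recent_error_rates, latest=5.0):
    print("ALERT: error rate is outside its normal range")
```

A check like this would run continuously against live metrics, so that a failure surfaces as an alert without anyone having to watch a dashboard.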

5. Address both internal process pain points and customer pain points around change

At Atlassian, the very reason for releasing changes is so we can provide better value to our customers, whether they're internal employees or external users of our software. We believe this is the same for many organizations.

Paul is a big believer in continuously improving our processes and products. He emphasizes finding and solving the pain points of both your organization and your customers. "For us, our pain points stem from what we are optimizing for internally – safety and agility – versus what our customers want – stability and control. Our customers, rightfully, want to understand how software or product changes impact their businesses, to know the timeline of the changes, and to have some degree of control. Internally, we are optimizing for reducing risk through small batch sizes and therefore maximizing the throughput of change. We also want to learn from our change so that we can continuously improve and do better for our customers next time. These two motivations can sometimes be at odds with each other," Paul says.

When asked about how Atlassian is handling those pain points, he answers, "We are focusing on increasing the robustness of tooling and flexibility in how we roll out change so that we can give the control back to customers. As our tooling gets better, our ability to manage risk in a controlled manner improves and we can then provide flexibility to our customers on how they would like to accept or implement the change within their organizations."

Final words of wisdom

Finally, we'll leave you with key pieces of advice that Atlassian engineering leaders have for organizations wanting to leverage DevOps principles and practice modern change management.

Firstly, Andre hits the nail on the head by going back to why organizations are doing change management in the first place. He says, "Mindset is important. People often think change management means change avoidance. There is a tendency to equate change management with signoffs and making people jump through hoops, rather than achieving the end goal of good traceability. It's important to remember that change management is meant to facilitate change and make it traceable. My advice is to make use of technology, tools, and automation rules to track the entire process instead of having people keep track of this manually."

Jonathon shared a guiding principle that his department follows on change management, especially when it comes to non-user-facing changes like database version upgrades or infrastructure improvements. "To me, every decision we make in my department should be about, 'how we can make sure that we do not need to do change management?' In an ideal world, nobody should need to know about the changes that we are making because those changes are transparent. So we need to think about how to make changes effectively, smoothly, seamlessly, and transparently. That's our guiding principle," he says.

Paul’s advice is to look at the problem in two stages and solve it in a phased approach. He says, "The first step is an internal process and it is about automation and managing the risk that comes with change. My advice is to separate the code deployment from turning on the change, with the goal of enabling faster rollback and better targeting of cohorts. In other words, progressive rollout."

"The second step has to do with flexibility. Once the process is mechanized, offering more flexibility to customers and users via tools, offering them what they need, helps them recognize that change is important and how they can go about digesting that change via change boarding," he continues.
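The first step Paul describes, separating code deployment from turning on the change, is commonly done with a percentage-based feature flag. The sketch below is one possible shape for it, not Atlassian's implementation: the flag name, user IDs, and hashing scheme are hypothetical. Each user is mapped to a stable bucket from 0 to 99, and the change is on for users whose bucket falls below the current rollout percentage.

```python
import hashlib


def bucket(flag: str, user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99 for a given flag."""
    digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if the already-deployed change is switched on for this user."""
    return bucket(flag, user_id) < rollout_percent


# The code is deployed everywhere; only the percentage changes over time.
users = [f"user-{n}" for n in range(1000)]
for percent in (0, 10, 50, 100):
    enabled = sum(is_enabled("new-editor", user, percent) for user in users)
    print(f"{percent}% rollout: enabled for {enabled} of {len(users)} users")
```

Because a user's bucket never changes, raising the percentage only adds users to the cohort, and setting it back to 0 acts as a rollback without a new deployment.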

Last but not least, Zak has a piece of short and sweet – yet highly actionable – advice. He advises, "Start with the right best practices, and then back them up with tools, and constantly review your progress. When there is a problem, whether it's a process or tool, don't invent a new solution. Instead, identify and solve the root cause."

We hope you leverage our strength in software to help bridge the gap between development and IT/Ops teams. Atlassian has all the tools needed to help automate this process, and provide context from code to incident. As more companies are becoming software companies, following an efficient change management process becomes increasingly important. In this dynamic environment, providing a superior customer experience is a key differentiator, and shipping value to customers faster becomes critical for success. It's now time to stop treating change with a "one size fits all" approach and use smart automation, tools, technologies, and practices to ship value faster without compromising on risk.