Feature flagging, as described in our first tip below and in our webinar about how we built the new Jira experience, is an essential part of the way Atlassian builds products. When done well, feature flagging can provide enormous value to software teams. However, without the right practices in place there can be significant drawbacks.
These five tips will help you and your team get started off right.
Tip 1: Understand why you want to use feature flags.
There are many potential use cases to explore. We use feature flags to help our engineering teams practice continuous delivery while ensuring feature releases to customers go smoothly. We do this by separating code deployments from feature releases, and we put most new code we introduce into our products behind a flag. This practice enables our engineering teams to work fast with less risk of disrupting the customer experience. It also helps us separate features from one another, which means we can roll forward or roll back individual features without affecting other features. This frees teams to keep work flowing into production.
Turning on features and rolling them out progressively gives our product teams more control over the experience we want to deliver. It also builds a tighter feedback loop with customers, which ultimately helps our teams deliver value to them faster. When we release, we can measure the success of a feature and ask for feedback before rolling out more broadly. If a release doesn’t go as planned (we get negative feedback, something breaks, there’s a bug), we can turn the feature off by using feature flags as a kill switch.
Feature flags make a particularly effective kill switch because you can turn a feature off in seconds without any code rollbacks.
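A progressive rollout with a kill switch can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular feature management tool works: the `Flag` class, the `bucket` helper, and the flag key are hypothetical names, and hashing the user ID into 100 buckets is one common way to keep each user's experience stable as the percentage grows.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Flag:
    """Hypothetical flag record; a real tool would store this remotely."""
    key: str
    rollout_percent: int = 0  # share of users who see the feature, 0-100
    killed: bool = False      # kill switch: overrides the rollout percentage


def bucket(flag_key: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(flag: Flag, user_id: str) -> bool:
    """Return True if this user should see the feature right now."""
    if flag.killed:
        return False  # no code rollback needed, the feature is simply off
    return bucket(flag.key, user_id) < flag.rollout_percent
```

Because the bucket depends only on the flag key and the user ID, a user who sees the feature at 10% still sees it at 50%, and setting `killed = True` turns the feature off for everyone without a redeploy.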
Tip 2: Measure the success of your feature.
Operationally, we ask ourselves: does the feature introduce new problems when the flag is turned on? We monitor the logs as we turn a flag on so that we catch errors before they become major problems. If we see errors, we switch the flag off, fix the problem, and then resume the rollout when ready.
Strategically, we also look at whether the feature is doing what it’s supposed to be doing. For example, if someone is introducing a performance improvement, we monitor to ensure the new code actually improves performance. The same goes if someone is trying to improve usability of a product. There needs to be a form of measurement in place that confirms we are actually improving usability, that is, moving the right metrics in a positive direction.
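The "see errors, switch the flag off" loop can be illustrated with a small guard. The article does not say whether this step is manual or automated, so treat the following as a hypothetical sketch: the `RolloutGuard` class, the 5% threshold, and the minimum sample size are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class RolloutGuard:
    """Hypothetical guard that disables a flag when errors spike."""
    max_error_rate: float = 0.05  # assumed threshold, tune per feature
    min_requests: int = 20        # avoid reacting to tiny samples
    enabled: bool = True
    requests: int = 0
    errors: int = 0

    def error_rate(self) -> float:
        return self.errors / self.requests if self.requests else 0.0

    def record(self, ok: bool) -> None:
        """Record one request served by the flagged code path."""
        self.requests += 1
        if not ok:
            self.errors += 1
        too_many_errors = self.error_rate() > self.max_error_rate
        if self.requests >= self.min_requests and too_many_errors:
            # Switch the flag off; resuming the rollout stays a human decision.
            self.enabled = False
```

The guard only switches the flag off. Turning it back on after the fix is deliberately left to a person, which matches the "fix the problem, then resume" step above.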
Bonus tip: Check to make sure your new code is actually running. In some cases, nothing behind the feature flag is being hit in production. We use logging or other tools to confirm that the new code is actually running.
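One lightweight way to check that flagged code is being hit is to count executions of each side of the flag. The sketch below uses an in-process counter purely for illustration; in production this would more likely be a log line or a metric. The `record_path` helper, the `search` function, and the flag key are hypothetical.

```python
from collections import Counter

# Hypothetical in-process counter; real systems would emit logs or metrics.
path_hits = Counter()

FLAG_KEY = "jira.search.new-ranking"  # illustrative flag name


def record_path(flag_key: str, variant: str) -> None:
    """Count one execution of the given side ("on" or "off") of a flag."""
    path_hits[(flag_key, variant)] += 1


def search(query: str, new_ranking_enabled: bool) -> str:
    if new_ranking_enabled:
        record_path(FLAG_KEY, "on")
        return f"new:{query}"
    record_path(FLAG_KEY, "off")
    return f"old:{query}"


def new_code_was_hit(flag_key: str) -> bool:
    """True once the "on" side of the flag has run at least once."""
    return path_hits[(flag_key, "on")] > 0
```

If `new_code_was_hit` stays false after the flag is turned on, the new code is probably unreachable, for example because the flag is checked in the wrong place.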
Tip 3: Make sure you have guiding principles in place for managing feature flags.
Without principles in place, you may introduce more technical debt into your products, which can make them harder to support and debug. Accordingly, we’ve developed several principles that govern feature flags across our business. For example, we have a naming convention that ensures stakeholders introducing changes always communicate:
- The product(s) they’re making a change to (for Atlassian, this could be one of several)
- The part of that product they’re making a change to (e.g., in Jira Software, the board, the issue view, etc.)
- The new feature or behavior they’re introducing
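A naming convention like this is easy to enforce with a small validator. The article does not spell out the exact format, so the dotted `product.area.feature` pattern with lowercase, hyphenated segments below is an assumption made for the example, as is the `parse_flag_name` helper.

```python
import re

# Assumed format: product.area.feature, each segment lowercase kebab-case.
SEGMENT = r"[a-z0-9]+(?:-[a-z0-9]+)*"
FLAG_NAME = re.compile(rf"({SEGMENT})\.({SEGMENT})\.({SEGMENT})")


def parse_flag_name(name: str) -> dict:
    """Split a flag name into product, area, and feature, or raise ValueError."""
    match = FLAG_NAME.fullmatch(name)
    if match is None:
        raise ValueError(
            f"flag name {name!r} does not follow product.area.feature"
        )
    product, area, feature = match.groups()
    return {"product": product, "area": area, "feature": feature}
```

For instance, `parse_flag_name("jira-software.board.new-card-layout")` yields the three parts, while a name such as `newCardLayout` is rejected before the flag is ever created.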
As a rule of thumb, we always delete feature flags after a feature has been rolled out to 100% of customers. We also aspire to remove flags as quickly as possible, though the timing varies with the rollout of each feature. We don’t believe in leaving feature flags indefinitely in our code base because (among other reasons) they add complexity, which can make it harder to implement future features.
We also limit the number of places in the code base where a flag can switch behavior (ideally to a single place). Rather than introduce several switches throughout our code base, our best practice is to duplicate code on either side of the flag and then delete the code on the inactive side of the flag once it’s rolled out to 100%.
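The single-switch idea might look like the sketch below: the old and new implementations are kept as complete, separate functions, and the flag is consulted in exactly one place. The function names and the export example are hypothetical.

```python
from typing import Callable, Sequence


def export_issues_legacy(issues: Sequence[str]) -> str:
    """Old behavior, kept intact until the flag reaches 100%."""
    return ",".join(issues)


def export_issues_new(issues: Sequence[str]) -> str:
    """New behavior, a full copy rather than scattered flag checks."""
    return "\n".join(issues)


def choose_exporter(flag_on: bool) -> Callable[[Sequence[str]], str]:
    """The one place in this sketch where the flag changes behavior."""
    return export_issues_new if flag_on else export_issues_legacy
```

Cleanup after a full rollout is then mechanical: delete `export_issues_legacy` and `choose_exporter`, and call `export_issues_new` directly.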
Tip 4: Create a workflow around feature flagging so there’s a clear process for how flags should be rolled out, managed, and cleaned up.
We’ve developed a kanban board with a workflow to capture each state of a feature flag: flag creation, feature rollout to internal instances, feature rollout to customers, feature rollout to 100% (and ready for cleanup), and process complete.
We’ve built an automated mechanism to keep this workflow up to date. Every time a flag is created in our feature management tool, an issue in Jira is automatically created. As the feature behind the flag gets rolled out, the issue in Jira (representing the flag) automatically moves through the specific states of the workflow.
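The mapping from a flag's rollout to a workflow state can be sketched as a pure function. This is not the actual integration between the feature management tool and Jira; the `Targeting` fields and the `workflow_state` function are hypothetical, and only the five states themselves come from the workflow described above.

```python
from dataclasses import dataclass
from enum import Enum


class FlagState(Enum):
    CREATED = "flag created"
    INTERNAL_ROLLOUT = "rolled out to internal instances"
    CUSTOMER_ROLLOUT = "rolling out to customers"
    FULLY_ROLLED_OUT = "rolled out to 100%, ready for cleanup"
    DONE = "complete"


@dataclass
class Targeting:
    """Hypothetical snapshot of how a flag is currently targeted."""
    internal_enabled: bool = False
    customer_percent: int = 0
    flag_removed: bool = False  # flag and inactive code have been deleted


def workflow_state(targeting: Targeting) -> FlagState:
    """Derive the workflow state an issue should be in for this targeting."""
    if targeting.flag_removed:
        return FlagState.DONE
    if targeting.customer_percent >= 100:
        return FlagState.FULLY_ROLLED_OUT
    if targeting.customer_percent > 0:
        return FlagState.CUSTOMER_ROLLOUT
    if targeting.internal_enabled:
        return FlagState.INTERNAL_ROLLOUT
    return FlagState.CREATED
```

An automation could call a function like this whenever targeting changes and transition the corresponding issue if the derived state differs from its current one.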
Bonus tip: We’ve also created a Stride bot for reminders. The bot sends out a weekly reminder when flags move from one state of the workflow to the next, if the targeting of a flag changes, or if flags can be removed.
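The logic behind such reminders can be approximated by comparing last week's flag snapshot with this week's. The sketch below only builds the message text and does not talk to any chat service; the `Snapshot` record, the `weekly_reminders` function, and the flag names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical weekly record of one flag."""
    state: str
    customer_percent: int


def weekly_reminders(
    previous: Dict[str, Snapshot], current: Dict[str, Snapshot]
) -> List[str]:
    """Build reminder lines for state changes, targeting changes, and cleanup."""
    messages = []
    for key in sorted(current):
        now = current[key]
        before = previous.get(key)
        if before is not None and before.state != now.state:
            messages.append(
                f"{key}: moved from '{before.state}' to '{now.state}'"
            )
        if before is not None and before.customer_percent != now.customer_percent:
            messages.append(
                f"{key}: targeting changed from "
                f"{before.customer_percent}% to {now.customer_percent}%"
            )
        if now.customer_percent == 100:
            messages.append(f"{key}: rolled out to 100%, flag can be removed")
    return messages
```

A flag that sits at 100% keeps producing the cleanup line every week until it is deleted, which is the nudge that keeps stale flags from lingering.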
Tip 5: Create a plan to give visibility and context to relevant stakeholders about new features being rolled out.
With feature flags, there can be different cohorts of customers seeing different versions of the same product. The right tooling and communications strategies are key for ensuring all teams are well equipped to support, market, and debug the respective products. A starting point for solving this problem is integrating a feature management tool with Jira issues to make sure everyone who views an issue has the right context.
We also run a daily in-person meeting between Site Reliability Engineers and developers to roll out new feature flags. Before the daily session, each developer responsible for rolling out a new feature fills in a shared Confluence template with the changes they’re making, the expected outcomes, contact details, and links to relevant dashboards.
The templates with these details are published in a shared Confluence space and organized by date. (There’s a new one each day.) There’s a Stride room to discuss issues immediately (so we don’t needlessly hold back new features) or ask questions for the following day’s meeting.