Using an event-driven architecture to improve Jira Software responsiveness

Imagine a team working remotely or collocated. The team is made up of engineers, architects, designers, and product managers, and they all use Jira Software to collaborate on their day-to-day work.

However, every time one of them updates an issue, project, or board, their fellow team members do not see the change instantaneously. Instead they have to wait for approximately a minute before the board is updated or, even worse, they have to refresh their page to see the changes.

This, until recently, was the situation with Next-Gen Jira Software: the front-end had to request updated board data every minute to receive updates. This happened regardless of whether any change had actually taken place, which both delayed the user experience until the next poll and generated a very large number of unnecessary requests, because the vast majority of polling attempts found nothing new for the front-end.

We needed a solution that facilitated collaboration within seconds, not minutes.

Exploring an event-driven architecture

We were pretty sure that an event-driven architecture (EDA) would solve our problems. Event-driven architectures consist of event producers and consumers that are loosely coupled via event channels. The event producers are systems of record that generate events whenever a significant update happens, for example when the summary of a Jira issue is updated. Consumers can subscribe to these events and perform further processing in a completely decoupled way. In fact, the producer would typically not know whether there are any consumers or what they are using the events for.
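To make the producer/consumer decoupling concrete, here is a minimal in-process sketch. It is not how StreamHub or Jira work internally; the `EventChannel` class, the `JiraEvent` shape, and the event type strings are all hypothetical names chosen for illustration.

```typescript
// Hypothetical event shape and handler type, for illustration only.
type JiraEvent = { type: string; issueId: number };
type Handler = (jiraEvent: JiraEvent) => void;

class EventChannel {
  private handlers = new Map<string, Handler[]>();

  // Consumers register interest in an event type.
  subscribe(type: string, handler: Handler): void {
    const existing = this.handlers.get(type) ?? [];
    this.handlers.set(type, [...existing, handler]);
  }

  // Producers publish without knowing who, if anyone, is listening.
  // Returns the number of handlers that were invoked.
  publish(jiraEvent: JiraEvent): number {
    const targets = this.handlers.get(jiraEvent.type) ?? [];
    targets.forEach((handler) => handler(jiraEvent));
    return targets.length;
  }
}

const demoChannel = new EventChannel();
const seenByBoard: number[] = [];
const seenByAudit: number[] = [];

// Two independent consumers of the same event type.
demoChannel.subscribe("issue-updated", (e) => seenByBoard.push(e.issueId));
demoChannel.subscribe("issue-updated", (e) => seenByAudit.push(e.issueId));

// The producer code is identical whether there are two consumers or none.
const deliveredCount = demoChannel.publish({ type: "issue-updated", issueId: 42 });
const unheardCount = demoChannel.publish({ type: "project-deleted", issueId: 0 });
```

Adding a third consumer here means one more `subscribe` call and no change to the publishing code, which is the property we were after.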

This made sense for Jira because we already had event streams of issue and project updates in use by our backend services. All that was needed was a way to deliver the right events to the right front-end. Atlassian's platform already had most of the building blocks, but we had to assemble them in the right way to deliver this new experience.

Routing of backend events

The routing of backend events diagram.

At the core of the solution lies JiRT (Jira realtime service): a new service that we created specifically to bridge the gap between Jira's event stream, which already generated backend events for most updates in Jira, and the front-end that users are interacting with.

On one end, JiRT receives events from Jira using Atlassian's internal event bus, StreamHub, which provides decoupled service-to-service communication through an EDA. This infrastructure delivers messages with low latency and guarantees at-least-once delivery of events.

Since event producers and consumers are decoupled by StreamHub, no changes were required to either Jira or StreamHub to start receiving events – perhaps the single greatest benefit of adopting an EDA. Though StreamHub uses AWS Kinesis internally, each consumer provides their own SQS queue where events are delivered. We could simply configure StreamHub with the events we need (which Jira already sends to StreamHub for other purposes) and the queue where we want events to be delivered, and start consuming the events in JiRT.

Once JiRT has received an event, it extracts the relevant information, such as project, board, and issue identifiers, and pushes it to a client event channel to notify any subscribed front-end clients that an update has occurred. Because any front-end can subscribe to these channels, we are careful not to transmit sensitive information with the event: permissions may change between subscription and the event being received. Instead we transmit only identifiers and require the client to request more data if needed, so that the user can be authorised as normal.

The infrastructure providing the client channels is the FPS (Frontend PubSub) service, which is another part of the Atlassian platform that was already used by other products. FPS performs two functions in this solution:

  1. It creates an abstraction over the underlying realtime client event delivery infrastructure. We are currently using PubNub for this, but FPS allows us to integrate seamlessly without needing to know any PubNub specifics. To do that, it provides both the client libraries (for our web and mobile clients) and the backend APIs to deliver these events. This also means that we can transparently switch to another provider if we ever want to.
  2. At the time that a client subscribes to a channel, FPS checks that the logged-in user has access to the backing resource using our Permissions service. This ensures we never have to worry about an unauthorized client gaining access to the event feed. In this case the backing resource is a Jira project or board.
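The second function, checking access at subscription time, can be sketched roughly as follows. This is an assumption-laden illustration rather than the FPS implementation: `ChannelRegistry`, `PermissionCheck`, and the resource id strings are hypothetical, and the real Permissions service is replaced by a plain callback.

```typescript
// Hypothetical stand-in for a call to a permissions service.
type PermissionCheck = (userId: string, resourceId: string) => boolean;

class ChannelRegistry {
  private subscribers = new Map<string, Set<string>>();
  private canView: PermissionCheck;

  constructor(canView: PermissionCheck) {
    this.canView = canView;
  }

  // Access is checked once, at the moment the client subscribes.
  subscribe(userId: string, resourceId: string): boolean {
    if (!this.canView(userId, resourceId)) return false;
    const current = this.subscribers.get(resourceId) ?? new Set<string>();
    current.add(userId);
    this.subscribers.set(resourceId, current);
    return true;
  }

  subscriberCount(resourceId: string): number {
    return this.subscribers.get(resourceId)?.size ?? 0;
  }
}

// Illustrative permission table: only "alice" may view "project/15001".
const registry = new ChannelRegistry(
  (userId, resourceId) => userId === "alice" && resourceId === "project/15001",
);
const aliceJoined = registry.subscribe("alice", "project/15001");
const malloryJoined = registry.subscribe("mallory", "project/15001");
```

Because the check happens only when the subscription is created, a later permission change is not reflected in the channel itself, which is one reason the events carry identifiers only.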

You might be wondering at this point how the event is structured to provide useful information to the client that correlates to a backend change. To achieve this, every event carries two standardised pieces of information from end to end:

  • The ARI (Atlassian Resource Identifier), which uniquely identifies any content across all of Atlassian's products. ARIs always follow the format ari:cloud:<resource_owner>:<cloud_id>:<resource_type>/<resource_id>, which provides enough information to resolve any content. In this case the resource is a project, and an example ARI would be ari:cloud:jira:4dfd1093-e001-45b2-b76a-527eb0b1d6c0:project/15001.
  • The AVI (Atlassian Event Identifier), which identifies the type of event that occurred on the resource. The AVI has the format avi:<source>:<action>:<entity>, and we subscribe to several of these events in JiRT, for example:
avi:jira:assigned:issue
avi:jira:commented:issue
avi:jira:created:issue
avi:jira:deleted:project
avi:jira:deleted:issue
avi:jira:mentioned:issue
avi:jira:transitioned:issue
avi:jira:updated:issue
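As a small illustration of how regular these identifiers are, the following sketch parses both formats. It assumes only the two formats quoted above; the `Ari` and `Avi` interfaces and the `parseAri` and `parseAvi` helpers are hypothetical names, and real identifiers may follow rules this post does not describe.

```typescript
interface Ari {
  resourceOwner: string;
  cloudId: string;
  resourceType: string;
  resourceId: string;
}

interface Avi {
  source: string;
  action: string;
  entity: string;
}

// Parses ari:cloud:<resource_owner>:<cloud_id>:<resource_type>/<resource_id>.
// Returns null when the value does not match that shape.
function parseAri(value: string): Ari | null {
  const match = /^ari:cloud:([^:]+):([^:]+):([^:/]+)\/(.+)$/.exec(value);
  if (match === null) return null;
  const [, resourceOwner, cloudId, resourceType, resourceId] = match;
  return { resourceOwner, cloudId, resourceType, resourceId };
}

// Parses avi:<source>:<action>:<entity>.
// Returns null when the value does not have exactly four non-empty parts.
function parseAvi(value: string): Avi | null {
  const parts = value.split(":");
  const wellFormed =
    parts.length === 4 && parts[0] === "avi" && parts.every((part) => part !== "");
  if (!wellFormed) return null;
  const [, source, action, entity] = parts;
  return { source, action, entity };
}

const projectAri = parseAri(
  "ari:cloud:jira:4dfd1093-e001-45b2-b76a-527eb0b1d6c0:project/15001",
);
const createdAvi = parseAvi("avi:jira:created:issue");
```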

In the front-end, clients specify the resources and events that they want to receive by providing these ARIs and AVIs as parameters to FPS when subscribing to a channel. This has the benefit that we can change the events we receive in the front-end without necessarily needing any backend changes. By using these standards we are able to deliver events in a consistent way that is not limited to a single experience such as the board screen, and we have already seen this being taken up by other teams working on Jira issue view and roadmaps. When creating an EDA, it is definitely worth clearly defining such events and resources to achieve this generic reusability.

Events received from Jira via StreamHub are transformed by JiRT into the following client event payload:

{
  "channels": [...],
  "type": "avi:jira:<action>:<entity>",
  "payload": {
    "projectId": <number>,
    "issueId": <number>,
    "atlassianId": <string>
  }
}
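A transformation along these lines could look like the sketch below. The `ClientEvent` type mirrors the payload above, but the `BackendEvent` shape, its field names, and the channel name are assumptions made for illustration; the post does not show the StreamHub event schema or how JiRT names its channels.

```typescript
// Hypothetical backend event; the real StreamHub schema is not shown here.
interface BackendEvent {
  avi: string;
  projectId: number;
  issueId: number;
  actorAtlassianId: string;
  summary?: string; // user-generated content that must not reach the client channel
}

// Mirrors the client event payload shown above.
interface ClientEvent {
  channels: string[];
  type: string;
  payload: { projectId: number; issueId: number; atlassianId: string };
}

// Copies identifiers field by field, so anything not listed is dropped.
function toClientEvent(backendEvent: BackendEvent, channels: string[]): ClientEvent {
  return {
    channels,
    type: backendEvent.avi,
    payload: {
      projectId: backendEvent.projectId,
      issueId: backendEvent.issueId,
      atlassianId: backendEvent.actorAtlassianId,
    },
  };
}

const sampleClientEvent = toClientEvent(
  {
    avi: "avi:jira:updated:issue",
    projectId: 15001,
    issueId: 10042,
    actorAtlassianId: "user-123",
    summary: "confidential launch plan",
  },
  ["hypothetical-project-channel"],
);
```

Building the payload from an explicit list of fields, rather than removing known-sensitive ones, means a field added to the backend event later is not broadcast by accident.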

The event contains enough information for the client to determine whether the board should be partially or fully updated. For example, if a new issueId is received for a board in an avi:jira:created:issue event, we can choose to fetch only that one issue, while other events may require a full refresh. We also send the atlassianId, which uniquely identifies the user that triggered the event and is useful for filtering events. The event provides no potentially sensitive information such as user-generated content. This has to be retrieved from the backend (if needed), so that we can apply permission checks for the logged-in user.
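The decision described above can be sketched as a small pure function. This is one possible reading of the text, not the board's actual logic: `decideRefresh`, `RefreshDecision`, and `ReceivedEvent` are hypothetical names, and treating every event from the current user as ignorable is a simplifying assumption.

```typescript
type RefreshDecision =
  | { kind: "ignore" }
  | { kind: "fetchIssue"; issueId: number }
  | { kind: "fullRefresh" };

interface ReceivedEvent {
  type: string;
  payload: { projectId: number; issueId: number; atlassianId: string };
}

function decideRefresh(
  received: ReceivedEvent,
  currentUserId: string,
  knownIssueIds: Set<number>,
): RefreshDecision {
  // Filter out events that the current user triggered themselves.
  if (received.payload.atlassianId === currentUserId) return { kind: "ignore" };

  // A newly created issue that is not on the board yet can be fetched on its own.
  const isNewIssue =
    received.type === "avi:jira:created:issue" &&
    !knownIssueIds.has(received.payload.issueId);
  if (isNewIssue) return { kind: "fetchIssue", issueId: received.payload.issueId };

  // Everything else falls back to refreshing the whole board.
  return { kind: "fullRefresh" };
}

const issuesOnBoard = new Set<number>([1, 2, 3]);
const ownEventDecision = decideRefresh(
  { type: "avi:jira:updated:issue", payload: { projectId: 1, issueId: 2, atlassianId: "me" } },
  "me",
  issuesOnBoard,
);
const createdDecision = decideRefresh(
  { type: "avi:jira:created:issue", payload: { projectId: 1, issueId: 4, atlassianId: "other" } },
  "me",
  issuesOnBoard,
);
const transitionedDecision = decideRefresh(
  { type: "avi:jira:transitioned:issue", payload: { projectId: 1, issueId: 2, atlassianId: "other" } },
  "me",
  issuesOnBoard,
);
```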

Front-end integration

In the web front-end, we created a reusable React component to better integrate with our front-end stack. This component is a wrapper that loads the FPS JavaScript client asynchronously to reduce initial load time and to keep our application bundles smaller. It has a very simple component API and exposes callbacks that allow consumers to receive events.

<Realtime
    channels={[ ... ]}
    events={[ ... ]}
    onJoin={(channels, analyticEvent) => { ... }}
    onReceive={(event, analyticEvent) => { ... }}
    onLeave={(channels, analyticEvent) => { ... }}
/>

The component has the following properties:

  • channels: List of channels to join
  • events: List of supported events to subscribe to
  • onJoin: Optional callback invoked when the client has joined the given channels
  • onReceive: Callback invoked when a subscribed event is received
  • onLeave: Optional callback invoked when the client has left the given channels

Depending on the use case, it's worth considering how these events would impact the application and what strategy to use for updating your data. For the Next-Gen board and backlog, we limited complexity on the client by simply assuming, for each received event, that the board or backlog is out of date and needs its data refreshed. To limit the load generated on the backend during heavy front-end activity, we use the following strategies to reduce the number of updates:

  • User self-triggered events are ignored. In the future, we may use end-to-end tracing headers to refine this for the case where a user is using multiple clients.
  • Incoming events are buffered in the client while there is an in-flight API request or an ongoing drag operation.
  • Once those conditions have cleared, the buffered events are emitted.
  • Events flushed from the buffer are processed one at a time, and while an event is being processed others are ignored, which guarantees that only one request is handled at a time.
  • While an event is being processed, it may be discarded if any of the conditions above starts.
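The first four strategies can be sketched as a small state machine. This is a simplified illustration under stated assumptions, not the board's implementation: `RefreshGate` and its method names are hypothetical, the in-flight request and drag conditions are collapsed into a single `blocked` flag, and the last rule (discarding an event that is already being processed) is left out for brevity.

```typescript
interface BoardEvent {
  type: string;
  atlassianId: string;
}

class RefreshGate {
  private buffer: BoardEvent[] = [];
  private blocked = false; // true during an in-flight API request or a drag operation
  private refreshing = false; // true while an event-triggered refresh is running
  private currentUserId: string;
  private startRefresh: () => void;

  constructor(currentUserId: string, startRefresh: () => void) {
    this.currentUserId = currentUserId;
    this.startRefresh = startRefresh;
  }

  receive(boardEvent: BoardEvent): void {
    if (boardEvent.atlassianId === this.currentUserId) return; // self-triggered: ignore
    if (this.blocked) {
      this.buffer.push(boardEvent); // hold until the blocking condition clears
      return;
    }
    this.process();
  }

  setBlocked(blocked: boolean): void {
    this.blocked = blocked;
    if (blocked) return;
    // The condition has cleared: emit everything that was buffered.
    const pending = this.buffer;
    this.buffer = [];
    pending.forEach(() => this.process());
  }

  // Called by the host application when the refresh request completes.
  refreshFinished(): void {
    this.refreshing = false;
  }

  private process(): void {
    if (this.refreshing) return; // one refresh at a time; other events are ignored
    this.refreshing = true;
    this.startRefresh();
  }
}

let refreshRequests = 0;
const gate = new RefreshGate("me", () => {
  refreshRequests += 1;
});
const fromOther: BoardEvent = { type: "avi:jira:updated:issue", atlassianId: "other" };

gate.receive({ type: "avi:jira:updated:issue", atlassianId: "me" }); // ignored
gate.setBlocked(true); // e.g. the user starts dragging a card
gate.receive(fromOther); // buffered
gate.receive(fromOther); // buffered
gate.setBlocked(false); // flush: the two buffered events collapse into one refresh
const requestsAfterFlush = refreshRequests;
gate.refreshFinished();
gate.receive(fromOther); // a new refresh can start now
const requestsAfterNext = refreshRequests;
```

Since every event is treated as "the board is out of date", collapsing a burst of events into one refresh loses nothing.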

There are also simpler cases like the issue view screen where we don't need as many safeguard mechanisms, since the scope is reduced to only the one issue being viewed and fewer updates are expected compared to the board.

So where does this leave us?

If you're trying to enable looser coupling with high throughput between services or make your application responsive to backend changes, it is definitely worth looking into an EDA. We were able to combine a couple of existing platform components with a really simple service in order to deliver a powerful new feature quickly. Best of all, we were able to do so without impacting any other teams – the events we needed were already emitted into the platform and adding consumers was supported by the platform itself.

By extending this event-driven approach to the front-end in a generic way, other teams are now also able to consume these backend event streams, enabling more parts of our front-end to be responsive to backend changes. With the success of this project we're extending our use of EDA in Jira Software, both by the events we produce and consume, and by exploring other techniques enabled by this approach such as event sourcing.