Atlassian Rovo Dev Research: What Types of Code Review Comments Do Developers Most Frequently Resolve?

Rovo Dev flags 2.8× more bugs and 1.4× more maintainability issues than humans, helping Atlassian teams resolve the code review comments that matter most and ship quality code faster.

What is Rovo Dev Code Reviews?

Rovo Dev is an LLM-powered agent that reviews pull requests and posts actionable suggestions to accelerate merges and deployments. It targets the issues developers most often act on (readability, bugs, and maintainability) and formats each suggestion so authors can apply the in-line change with a single click.

Beyond individual comments, Rovo Dev raises important quality and maintainability issues while respecting repository style guides. The goal is simple: handle the high-volume checks and post actionable (resolvable) comments, freeing up human time for higher-level work. To do that well, we first needed to answer the question: what types of code review comments do developers most frequently resolve?

Atlassian’s latest research, “What Types of Code Review Comments Do Developers Most Frequently Resolve?,” was accepted at the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025, Seoul, South Korea), one of the most prestigious software engineering conferences (ranked CORE A*). The research was conducted in collaboration with Dr. Patanamon Thongtanunam from The University of Melbourne.

In this research, we built an LLM-based classifier to investigate the types of review comments written by humans and by Rovo Dev, and which types developers most frequently resolve.

https://atlassianblog.wpengine.com/wp-content/uploads/2025/11/codereview.mp4

Breaking Down Code Review Comments

Our research categorizes code review comments into four main types, plus one additional catch-all category.

How Rovo Dev and Human Reviewers Differ

Approach:
To compare Rovo Dev and human reviewers, we used OpenAI’s GPT-4.1 to automatically classify thousands of code review comments into four types: readability, bug, maintainability, or design. Prompts were carefully crafted and refined, with the model providing both explanations and confidence scores for each classification.
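To make the pipeline concrete, here is a minimal sketch of prompting a model for a JSON verdict and validating it before use. The prompt wording, the JSON schema, and the “other” catch-all label are illustrative assumptions, not the study’s actual artifacts:

```python
import json

# The four comment types from the study, plus an assumed catch-all label.
COMMENT_TYPES = {"readability", "bug", "maintainability", "design", "other"}

def build_prompt(comment: str, diff_hunk: str) -> str:
    """Assemble a classification prompt (illustrative wording, not the study's)."""
    return (
        "Classify the following code review comment into exactly one type: "
        "readability, bug, maintainability, design, or other.\n"
        'Reply with JSON: {"type": ..., "explanation": ..., "confidence": 0-1}.\n\n'
        f"Diff under review:\n{diff_hunk}\n\nComment:\n{comment}"
    )

def parse_classification(raw_reply: str) -> dict:
    """Validate the model's JSON verdict before trusting it downstream."""
    verdict = json.loads(raw_reply)
    if verdict.get("type") not in COMMENT_TYPES:
        raise ValueError(f"unexpected type: {verdict.get('type')!r}")
    if not 0.0 <= float(verdict.get("confidence", -1.0)) <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return verdict

# Example: parsing a hand-written model reply.
reply = '{"type": "bug", "explanation": "Off-by-one in loop bound.", "confidence": 0.92}'
print(parse_classification(reply)["type"])  # → bug
```

Validating the reply against a fixed label set matters in practice, since models occasionally invent labels outside the requested taxonomy.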

Reliability & Sanity Check:
Six experts manually reviewed 336 internal comments to refine the prompts and confirm strong agreement. For further validation, two annotators independently labeled 100 comments, with a third annotator resolving any disagreements. We found moderate agreement between human and LLM classifications, supporting the reliability of our approach.
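For intuition, “moderate agreement” is usually read off a chance-corrected statistic such as Cohen’s kappa; the paper’s exact metric and data are not reproduced here, so this is just a sketch on made-up labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Raw fraction of items where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["bug", "readability", "bug", "design", "maintainability", "bug"]
llm   = ["bug", "readability", "bug", "maintainability", "maintainability", "design"]
print(round(cohens_kappa(human, llm), 3))  # → 0.538
```

By the common Landis–Koch convention, kappa values between 0.41 and 0.60 are interpreted as “moderate” agreement.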

Dataset:

In this work, we analyzed around 4,000 review comments from Atlassian internal projects and 1,000 from open-source projects. The key distinction is that internal projects mainly use TypeScript, JavaScript, and Kotlin, while open-source projects are mostly Java, C++, and Python.

Results:

When we compared Rovo Dev’s code review comments to those written by humans, we found each brings unique strengths. On Atlassian’s internal projects, Rovo Dev flagged 2.8× more bugs and 1.4× more maintainability issues than human reviewers.

Figure 1: The distribution of human-written vs. Rovo Dev comments for Atlassian’s internal projects

For open-source projects, Rovo Dev leaned into maintainability (1.6× more such comments than human reviewers), whereas humans provided more design-related feedback. This complementary focus means Rovo Dev helps teams catch more actionable issues, while humans contribute deeper contextual and architectural insights.

Figure 2: The distribution of human-written vs. Rovo Dev comments for open-source projects

Which Comments Actually Get Resolved?

Figure 3: Example of code-resolve UI in Bitbucket

Not all comments lead to code changes. Our analysis showed that developers are most likely to resolve comments about readability, bugs, and maintainability: these comment types tend to be specific and actionable, and code changes were made to resolve them at rates between 36% and 43%. Design comments, which often require broader changes or more discussion, were less likely to be resolved within the same pull request. By focusing on the types of comments that developers actually address, Rovo Dev helps teams move faster and improve code quality, while human reviewers can focus on higher-level improvements.

Figure 4: The resolution rate for each Rovo Dev comment type in Atlassian projects.
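For intuition, a resolution rate is simply resolved comments divided by total comments of that type. A minimal sketch on made-up data (not the study’s numbers):

```python
from collections import defaultdict

def resolution_rates(comments):
    """comments: iterable of (type, was_resolved) pairs -> rate per type."""
    resolved, total = defaultdict(int), defaultdict(int)
    for ctype, was_resolved in comments:
        total[ctype] += 1
        resolved[ctype] += int(was_resolved)
    return {t: resolved[t] / total[t] for t in total}

# Toy sample: each pair is (comment type, whether the author resolved it).
sample = [("bug", True), ("bug", True), ("bug", False),
          ("design", False), ("design", False), ("design", True),
          ("readability", True), ("readability", False)]

for ctype, rate in sorted(resolution_rates(sample).items()):
    print(f"{ctype}: {rate:.0%}")
# → bug: 67%
# → design: 33%
# → readability: 50%
```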

Key Takeaway

Focusing on the types of code review comments that developers actually resolve, like readability, bugs, and maintainability, can make code reviews faster and more effective. Software teams at Atlassian integrated Rovo Dev in their workflows, combining Rovo Dev’s targeted, actionable suggestions with the broader insights of human reviewers. Now, they can catch more issues, reduce bottlenecks, and ship higher-quality code with greater confidence.

Keen to try? Curious to learn more? Check out Rovo Dev today!
