Atlassian Rovo Dev Research: What Types of Code Review Comments Do Developers Most Frequently Resolve?

Rovo Dev flags 2.8× more bugs and 1.4× more maintainability issues than humans, helping Atlassian teams resolve the code review comments that matter most and ship quality code faster.

What is Rovo Dev Code Reviews?

Rovo Dev is an LLM-powered agent that reviews pull requests and posts actionable suggestions to accelerate merges and deployments. It targets the issues developers most often act on (readability, bugs, and maintainability) and formats each suggestion so authors can apply the in-line change with a single click.

Beyond individual comments, Rovo Dev raises important quality and maintainability issues while respecting repository style guides. The goal is simple: handle the high-volume checks and post actionable (resolvable) comments, freeing up human time for higher-level work. To do that well, we first needed to answer the question: what types of code review comments do developers most frequently resolve?

Atlassian’s latest research, “What Types of Code Review Comments Do Developers Most Frequently Resolve?,” was accepted at the 40th IEEE/ACM International Conference on Automated Software Engineering (ASE 2025, Seoul, South Korea), one of the most prestigious software engineering conferences (ranked CORE A*). The research was conducted in collaboration with Dr. Patanamon Thongtanunam from The University of Melbourne.

In this research, we built an LLM-based classifier to investigate the types of review comments written by humans and by Rovo Dev, and which types developers most frequently resolve.

https://atlassianblog.wpengine.com/wp-content/uploads/2025/11/codereview.mp4

Breaking Down Code Review Comments

Our research categorizes code review comments into four main types, plus one additional catch-all category.

How Rovo Dev and Human Reviewers Differ

Approach:
To compare Rovo Dev and human reviewers, we used OpenAI’s GPT-4.1 to automatically classify thousands of code review comments into four types: readability, bug, maintainability, or design. Prompts were carefully crafted and refined, with the model providing both explanations and confidence scores for each classification.
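To make the pipeline concrete, here is a minimal sketch of prompting a model for a JSON verdict and validating it before use. The prompt wording, the JSON schema, and the “other” catch-all label are illustrative assumptions, not the study’s actual artifacts:

```python
import json

# The four comment types from the study, plus an assumed catch-all label.
COMMENT_TYPES = {"readability", "bug", "maintainability", "design", "other"}

def build_prompt(comment: str, diff_hunk: str) -> str:
    """Assemble a classification prompt (illustrative wording, not the study's)."""
    return (
        "Classify the following code review comment into exactly one type: "
        "readability, bug, maintainability, design, or other.\n"
        'Reply with JSON: {"type": ..., "explanation": ..., "confidence": 0-1}.\n\n'
        f"Diff under review:\n{diff_hunk}\n\nComment:\n{comment}"
    )

def parse_classification(raw_reply: str) -> dict:
    """Validate the model's JSON verdict before trusting it downstream."""
    verdict = json.loads(raw_reply)
    if verdict.get("type") not in COMMENT_TYPES:
        raise ValueError(f"unexpected type: {verdict.get('type')!r}")
    if not 0.0 <= float(verdict.get("confidence", -1.0)) <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    return verdict

# Example: parsing a hand-written model reply.
reply = '{"type": "bug", "explanation": "Off-by-one in loop bound.", "confidence": 0.92}'
print(parse_classification(reply)["type"])  # → bug
```

Validating the reply against a fixed label set matters in practice, since models occasionally invent labels outside the requested taxonomy.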

Reliability & Sanity Check:
Six experts manually reviewed 336 internal comments to refine the prompts and confirm strong agreement. For further validation, two annotators independently labeled 100 comments, with a third annotator resolving any disagreements. We found moderate agreement between human and LLM classifications, supporting the reliability of our approach.
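For intuition, “moderate agreement” is usually read off a chance-corrected statistic such as Cohen’s kappa; the paper’s exact metric and data are not reproduced here, so this is just a sketch on made-up labels:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators' label lists."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Raw fraction of items where the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

human = ["bug", "readability", "bug", "design", "maintainability", "bug"]
llm   = ["bug", "readability", "bug", "maintainability", "maintainability", "design"]
print(round(cohens_kappa(human, llm), 3))  # → 0.538
```

By the common Landis–Koch convention, kappa values between 0.41 and 0.60 are interpreted as “moderate” agreement.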

Dataset:

In this work, we analyzed around 4,000 review comments from Atlassian internal projects and 1,000 from open-source projects. The key distinction is that internal projects mainly use TypeScript, JavaScript, and Kotlin, while open-source projects are mostly Java, C++, and Python.

Results:

When we compared Rovo Dev’s code review comments to those written by humans, we found each brings unique strengths. On Atlassian’s internal projects, Rovo Dev flagged 2.8× more bugs and 1.4× more maintainability issues than human reviewers.

Figure 1: The distribution of human-written vs. Rovo Dev comments for Atlassian’s internal projects

For open-source projects, Rovo Dev leaned into maintainability (1.6× more such comments than human reviewers), whereas humans provided more design-related feedback. This complementary focus means Rovo Dev helps teams catch more actionable issues, while humans contribute deeper contextual and architectural insights.

Figure 2: The distribution of human-written vs. Rovo Dev comments for open-source projects

Which Comments Actually Get Resolved?

Figure 3: Example of code-resolve UI in Bitbucket

Not all comments lead to code changes. Our analysis showed that developers are most likely to resolve comments about readability, bugs, and maintainability: these comment types tend to be specific and actionable, and code changes were made to resolve them at rates between 36% and 43%. Design comments, which often require broader changes or more discussion, were less likely to be resolved within the same pull request. By focusing on the types of comments that developers actually address, Rovo Dev helps teams move faster and improve code quality, while human reviewers can focus on higher-level improvements.

Figure 4: The resolution rate for each Rovo Dev comment type in Atlassian projects.
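For intuition, a resolution rate is simply resolved comments divided by total comments of that type. A minimal sketch on made-up data (not the study’s numbers):

```python
from collections import defaultdict

def resolution_rates(comments):
    """comments: iterable of (type, was_resolved) pairs -> rate per type."""
    resolved, total = defaultdict(int), defaultdict(int)
    for ctype, was_resolved in comments:
        total[ctype] += 1
        resolved[ctype] += int(was_resolved)
    return {t: resolved[t] / total[t] for t in total}

# Toy sample: each pair is (comment type, whether the author resolved it).
sample = [("bug", True), ("bug", True), ("bug", False),
          ("design", False), ("design", False), ("design", True),
          ("readability", True), ("readability", False)]

for ctype, rate in sorted(resolution_rates(sample).items()):
    print(f"{ctype}: {rate:.0%}")
# → bug: 67%
# → design: 33%
# → readability: 50%
```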

Key Takeaway

Focusing on the types of code review comments that developers actually resolve, like readability, bugs, and maintainability, can make code reviews faster and more effective. Software teams at Atlassian integrated Rovo Dev in their workflows, combining Rovo Dev’s targeted, actionable suggestions with the broader insights of human reviewers. Now, they can catch more issues, reduce bottlenecks, and ship higher-quality code with greater confidence.

Keen to try? Curious to learn more? Check out Rovo Dev today!
