How it started
What happens when a project estimated at two to three quarters gets built in 14 working days?
In 2022, the team built App Updates, a feature that gives organization admins access to release notes about Atlassian changes and controls over how they’re delivered (Keep up with changes | Atlassian Support). It was a great experience for admins, but every other user was stuck with basic in-product change notifications.
By 2024, we started asking – why not give everyone access to App Updates? Seeing what’s new and what’s coming is valuable for every type of user, not just admins. But the project to build a public version was estimated at two to three quarters, and with competing priorities, it never got funded.
Then we got Rovo Dev.
The question became: how fast can we build a public version with AI-assisted coding, using just 2 part-time frontend engineers (each at ~30% availability) and 2 part-time backend engineers?
The answer was 14 working days for a fully functional frontend, covered by unit tests. The first pull request landed on Feb 19, 2026, and core frontend functionality was complete by Mar 6. From there, the team spent another 5 weeks on analytics, integration tests, design alignment, and production hardening, making it 7 weeks total from first commit to production.
For context, the manual estimate was 6 months. That’s over 3.5× faster.

Knowledge graph
Rovo Dev, like other AI tools, operates on patterns – a so-called “knowledge graph”.
At a high level, AI functions as an advanced autocomplete system – it automatically completes content, even entire files, by selecting the most contextually appropriate options based on its training data, prompt, and current context.
To get the best results, instruct the AI agent to “copy” or “create something similar” by providing specific examples that enhance its knowledge graph and ensure that autocomplete has relevant references to consult.
Project knowledge wiki
Every time you ask a question about your repository, the AI agent does research, going through all of the code. This not only wastes tokens and time, but also produces inconsistent results.
Instead, pick the most expensive model you have access to, do deep research on the codebase, and dump the findings into a local wiki folder explaining the architecture, pitfalls, best practices, code paths, etc. Next time, your task can reference the wiki and perform a limited search through part of the repo instead of going through everything again.
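As an illustration, such a wiki can simply be a handful of markdown files next to your other agent config. The folder name and file split below are just one possible layout, not a Rovo Dev convention:

```
.agents/wiki/
  architecture.md     # high-level module map and data flow
  code-paths.md       # where key features live and which files to touch
  pitfalls.md         # known gotchas, flaky areas, legacy patterns to avoid
  best-practices.md   # conventions for components, state management, tests, naming
```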
LLM Knowledge system from Andrej Karpathy

Continuous improvement through Memory
We want Rovo Dev to learn from its mistakes and continuously improve. A self-learning → improving cycle is what makes our AI better every time we use it.
We solved this by adding a system prompt instructing Rovo Dev to record any challenge it faces in the .agents/memories/ folder. A test is failing that is not supposed to fail? A file is not located where it should be? Record it as a memory.
Then periodically pick up the memory files and improve the skills/wiki so the same error doesn’t happen again.
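For example, a memory can be a tiny markdown note. The file name and fields below are just an illustration, not a prescribed Rovo Dev format:

```
.agents/memories/2026-03-02-flaky-updates-table-test.md

Problem: the updates-table snapshot test fails when run in isolation.
Cause:   it relies on a mocked date that is only configured in the shared test setup.
Fix:     import the shared setup explicitly in the generated test file.
Action:  update the testing skill/wiki so new tests always import the shared setup.
```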
Jira tickets as tasks, memory, and audit logs
How good are your Jira stories? Sometimes developers only get a few lines of description, and days, weeks, or months later, when someone tries to pick it up… it’s hard to understand what was supposed to be done.
If Rovo Dev can code and has context about the codebase, why don’t we use Rovo Dev to create well-defined Jira tickets with all the depth and relevant technical information? Switch to Orchestrator agent mode:
> You are tasked to create a Jira ticket with a new task. Use the create-story skill. The task is to move from a tier-based cache to a flattened cache for cookie keys. Ask me questions to clarify, one by one.
This ticket not only has great depth and quality, but can also easily serve as a starting prompt for your Executor agent.
Code quality guardrails
It is quite hard for AI to understand whether the code is written correctly – is it working and doing what it is supposed to do?
Any type of test is a clue for the AI, a way to validate that it produced the expected outcome.
- Types (TypeScript, Python, etc.) assist AI in understanding code by clarifying the expected variable types and potential parameter variations, eliminating the need for guessing. Type checks serve as the first line of defense for code quality – once they pass, we can advance to the next stage
- The next level is unit tests – comprehensive unit test coverage is AI’s “eyes” – it checks the output of the code, validates what it produces. Once they pass, we can advance to the next stage
- Integration tests allow AI to “see” the entire application. They confirm that the app compiles successfully and verify how humans interact with it. AI uses these observations to ensure that interactions are done correctly.
- VR (visual regression) tests – the next bastion is how the UI actually looks. To the AI, it does not matter whether a pixel is red or green, but a VR test “teaches” it the difference and that it matters.
- The final bastion of quality is Pollinator E2E tests and PDV (post-deployment verification) checks on staging and production before exposure to customers. Historically, Pollinators were expensive (in terms of time) to maintain – not anymore with Rovo Dev.
- Bonus point: to ensure our AI has good test coverage, we can introduce Stryker mutation testing. I haven’t done it myself yet; this is my next step!
In summary, tests are how we give AI “eyes” into the application we built – an understanding of how humans interact with it and what it should look like.
Developing tests is cheap – AI is great at writing them; I would argue it does it better than engineers. We have always seen tests as a chore or a “tax” we need to pay. That does not matter to AI.
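To make the “eyes” metaphor concrete, here is a minimal sketch of the kind of unit test Rovo Dev can generate and then run against its own output. The UpdatesList component and its props are hypothetical, not the actual App Updates code:

```tsx
import '@testing-library/jest-dom';
import { render, screen } from '@testing-library/react';
import { UpdatesList } from './UpdatesList'; // hypothetical component

describe('UpdatesList', () => {
  it('renders one row per update', () => {
    const updates = [
      { id: '1', title: 'New navigation', status: 'ROLLING_OUT' },
      { id: '2', title: 'Dark mode', status: 'COMING_SOON' },
    ];

    render(<UpdatesList updates={updates} />);

    // These assertions are the pass/fail signal the AI agent iterates against.
    expect(screen.getByText('New navigation')).toBeInTheDocument();
    expect(screen.getByText('Dark mode')).toBeInTheDocument();
    // Assuming the component renders a table: one header row + two data rows.
    expect(screen.getAllByRole('row')).toHaveLength(3);
  });
});
```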

Pro Tip
Prompt to Rovo Dev:
You are tasked to create a prompt for another agent that will generate a new image for me. The image should show an AI writing a lot of tests and looking very happy. Ask me questions one by one to create a better prompt
….. questions/answers …..
Excellent! I have everything I need. Here’s the final image generation prompt based on your answers:
A cartoon illustration of a friendly android with subtle robotic features (a glowing screen-like face with circuit patterns on its skin) sitting in a minimalist white room. The android has a huge, joyful grin with star-shaped sparkling eyes, radiating pure happiness. It is surrounded by multiple floating screens, each displaying lines of test code being rapidly typed — screens show snippets like describe(), it(), expect(), and green checkmarks. The android has multiple arms, each typing on a different keyboard or screen simultaneously. The overall mood is energetic, joyful, and chaotic in the best possible way. Clean white background, bright and vibrant cartoon style, high detail illustration.
Engineering knowledge is more valuable than ever
My first fear was that my engineering skills and experience would become irrelevant in a world where AI can produce okay-quality code very fast. I quickly realized that I was wrong.
In a new world of AI-assisted engineering, there are 2 types of people:
AI operators
People who have limited technical depth and experience in coding and building large, scalable applications. These could be non-engineers such as PMs or EMs, or some junior engineers.
They focus on:
- Creating a quick PoC or prototype – there is no need or intention to build a large, scalable, secure system; they want to quickly “hack” something that works and looks okay to prove the concept
- Adding features and fixes to existing projects that are well set up for AI usage – leveraging the knowledge graph and the patterns and skills already set up in the repository
AI engineers
People who have depth of experience building complex applications and are actively learning fast-growing AI tools, practices, and ecosystems.
Their strength is that they can do what AI operators can do, but they can also ship production-ready features using AI assistants, guide AI through complex systems, and ensure the solution AI builds is scalable and aligned with the system’s architecture.
They add skills, knowledge graphs, and code patterns to enable AI to produce high-quality code.
Right now, we need good engineers more than ever before.
Practical tips
… or what we learned along the way while building Public App Updates with Rovo Dev.
Point to examples
If you want something similar to X, point Rovo Dev to the example and ask it to analyze the code and create a plan for how to achieve the same in the current working folder.
Use the create-plan skill. Load this example folder "/../afm/.../org-app-updates", analyze the code, and create a plan to build a similar new table with a list of updates. Ask me questions one by one.
Tell Rovo Dev to ask questions
You think you told Rovo Dev everything, but you didn’t. Asking questions turns on an interactive mode (you can use a skill for that) that prompts Rovo Dev to find gaps in your design and ask questions to clarify them.
Use skill ask-questions, interview me about this plan_table.md going into each of the decisions and exploring pros/cons and gaps.
Use the right MCP servers
You need to make sure you are using your organization’s design components. You can either point to the documentation, use the local knowledge graph, or use the right MCP server for it.
Context optimization through Skills
A skill is an additional prompt injected into your context to fine-tune Rovo Dev’s behavior.
The core idea is that a smaller context is better: we must strike a balance by providing the AI agent with sufficient information to complete its task while avoiding irrelevant details. For example, injecting design system rules into the context while the agent is tasked with fixing a Bitbucket Pipelines configuration file “pollutes” the context and makes the AI worse.
There are many resources explaining what a skill is; I simply want to highlight that they are excellent for refining tasks and preventing the use of overly large contexts, which can degrade AI agent performance.
Create scripts for repeated tasks
If you find Rovo Dev creating scripts on the fly, consider generating the script once and reusing it. This will improve the AI agent’s performance and give you consistent outcomes.
.agents/skills/batch-upload/SKILL.md:
Use this skill when you are batch uploading source maps. Execute .agents/skills/batch-upload/scripts/batch_upload.py to perform this action.
Optimize your cost
It is easy to just use the most expensive model (for example, Opus), but AI usage is not free for any organization – you can hit your limits fast and end up “hacking” the system by switching sites, etc. Instead, organize your models into “Advisors” and “Executors”.
Your Advisor or Orchestrator (Opus) creates a plan and passes the plan to the Executor subagent (Sonnet) to execute.
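As an illustration, the hand-off can be driven by a prompt along these lines (the wording and the example task are ours, not a built-in Rovo Dev command):
You are the Orchestrator. Use the create-plan skill to produce a step-by-step plan for adding pagination to the updates table, then pass the plan to the Executor subagent (Sonnet) for implementation and review its output against the plan.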
AI can understand what the app “looks like”
We can give AI “eyes” into how the app looks using a few tricks:
- Add VR tests – they are snapshots of pages and give AI an understanding of how pixels are organized and when something does not look as expected
- Add a playwright-cli skill with Storybooks – we can give Rovo Dev a playwright-cli skill that allows it to open the app, inspect components, understand the generated CSS, and fix issues (see the sketch below)
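A rough sketch of what such a check can look like with Playwright pointed at a locally running Storybook. The story URL, selectors, and expected styles are made up for illustration:

```ts
import { test, expect } from '@playwright/test';

test('updates table story matches the design system', async ({ page }) => {
  // Hypothetical Storybook story for the updates table component.
  await page.goto('http://localhost:6006/iframe.html?id=updates-table--default');

  const header = page.getByRole('columnheader', { name: 'Update' });
  await expect(header).toBeVisible();

  // Inspecting the generated CSS lets the agent catch mismatches with design tokens.
  await expect(header).toHaveCSS('font-weight', '600');

  // A screenshot comparison doubles as a lightweight VR check.
  await expect(page).toHaveScreenshot('updates-table-default.png');
});
```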
Impact?
Here’s the bottom line – a fully working, production-ready frontend+backend for Public App Updates, in 7 weeks from first commit to production (Feb 19 – Apr 9, 2026). A traditional manual build was estimated at 6 months. The numbers speak for themselves:
- 7 weeks actual build time (first commit → production-ready) vs 6 months estimated manually — that’s over 3.5× faster
- ~14,000 lines of code written in total (source + tests)
- 26 unit test files with ~322 individual test cases, plus 3 integration test files — comprehensive coverage from the start
- 30 React components across a clean, modular package structure
- Core frontend functionality was done in just 10 days (Feb 19 – Mar 4) — the remaining weeks were spent on analytics, integration tests, deployment pipelines, and production hardening
This isn’t just about speed – the quality bar was not lowered. Full TypeScript, full unit test coverage, integration tests, and a scalable package architecture. Rovo Dev didn’t cut corners; it eliminated the slow parts.
Areas to improve in your team
Make sure engineering teams are embracing AI engineering
Not everyone is at the same stage of AI engineering. Help teams that are behind; learn from teams that are ahead. Play as a team.
Reduce meetings, alignment time
Explore how we can leverage AI and smaller teams to reduce the number of meetings and the time needed to align on direction, user experience, etc.
Reduce Pull Request review time while keeping a high-quality bar
Pull request reviews are still a critical quality bar. There is a worrying trend of “approved with no comments” – the volume of submitted code is huge, and it is hard to review everything manually.

