What We’re Building

We’ve built a platform that will be the secure execution engine behind Atlassian’s AI agent infrastructure.

As part of this we needed a way to run a Firecracker microVM orchestrator on Kubernetes. So we built one: Fireworks.

You submit an OCI container image and a command, and it boots a hardware-isolated Firecracker VM and runs your workload. Features include 100 ms warm starts, live migration between hosts, eBPF network policy enforcement, shared volumes, snapshot filesystem restore, and sidecar sandboxes. To do this we had to build a scheduler, an autoscaler, node agents, an Envoy ingress layer, Raft persistence, and much more.
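The submission interface can be pictured as a small spec: an image, a command, and VM sizing. A minimal sketch in Python, assuming a hypothetical `JobSpec` shape (the real Fireworks API is not shown here, so every field name below is illustrative):

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class JobSpec:
    """Hypothetical shape of a Fireworks job submission (illustrative only)."""
    image: str                       # OCI container image reference
    command: list[str]               # command to run inside the microVM
    vcpus: int = 1                   # hardware-isolated VM sizing
    memory_mib: int = 256
    volumes: list[str] = field(default_factory=list)  # shared volumes to mount

    def to_json(self) -> str:
        # Serialize for submission to the orchestrator's API.
        return json.dumps(asdict(self))

spec = JobSpec(image="docker.io/library/alpine:3.20",
               command=["sh", "-c", "echo hello"])
payload = spec.to_json()
```

The point is only that the unit of work is a plain, declarative spec; everything else (scheduling, boot, networking) happens behind the API.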

The Setup Is Simple

I’m not doing anything groundbreaking. My workflow looks like this:

  • Three workspaces, each checked out on a different branch, each with an agent working
  • Split terminal: agent on one side, a shell on the other so I can poke at things while it works
  • Always have something running. If your agents are idle, you’re leaving productivity on the table
  • Treat code as a black box. If you can comprehensively validate via inputs and outputs, you often don’t need to read the code to know what it’s doing

Treat It Like a Real Engineer

Just like you wouldn’t expect a human to ship working code without access to a real environment, your AI needs end-to-end access too. Give it the tools to verify its own work:

  • Access to deploy and test in a dev environment. This is where autonomous development really clicks. The agent catches its own mistakes and fixes them in a loop
  • Raising PRs, spawning independent agents for self-review, addressing feedback, reading pipeline output, updating tickets
  • It needs to be involved in all parts of the SDLC, not just code generation
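The catch-and-fix loop the agent runs can be sketched abstractly: deploy, run the tests, feed failures back, repeat until green or a retry budget is spent. A toy illustration with stubbed deploy/test/fix callables (none of these functions are the real tooling):

```python
def verify_loop(deploy, run_tests, fix, max_attempts=5):
    """Deploy, test, and feed failures back until the suite passes.

    deploy()      -> None      (push the build to a dev environment)
    run_tests()   -> list[str] (empty list means green)
    fix(failures) -> None      (agent addresses the reported failures)
    """
    for attempt in range(1, max_attempts + 1):
        deploy()
        failures = run_tests()
        if not failures:
            return attempt           # green: report how many loops it took
        fix(failures)
    raise RuntimeError(f"still failing after {max_attempts} attempts")

# Toy harness: a "bug counter" that takes two fix passes to clear.
state = {"bugs": 2}
attempts = verify_loop(
    deploy=lambda: None,
    run_tests=lambda: ["e2e: boot timeout"] * state["bugs"],
    fix=lambda failures: state.update(bugs=state["bugs"] - 1),
)
```

The retry budget matters: an agent without one can spin forever on a failure it can't diagnose, which is exactly when a human should be pulled back in.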

Queue prompts. Read what it’s doing and anticipate when it will return and what the next task will be.

Some common ones for me:

  • Write e2e tests, deploy to a dev shard, then loop on issues until the e2e suite passes
  • !review-pr (a prompt shortcut to spin up an independent subagent to review)
  • Create PR, wait for CI branch builds, address issues, then address comments on PR from Rovo Dev PR bot

Context Feedback Loops

Context feedback loops go hand in hand with verification. Encourage iterative development in the e2e environment, and architect your dev environment so developers get real, independent shards that won’t break anyone else’s work.

Skills

Skills are useful for specific domains or common actions within your repo.

Internally we’ve built lots of skills: skills for raising PRs, for using the CLI, and for specific domains like Raft and gRPC.

We’ve built a meta-workflow/orchestration skill for Fireworks development. It doesn’t do one narrow technical thing; instead, it gives the agent a set of “golden path” loops for how to work on Fireworks changes end-to-end.

Another example is a skill that automates deploying, operating, and tearing down isolated Fireworks dev shards on the shared AWS scms Kubernetes cluster.

Subagents / personas

For review, have an adversarial persona subagent that spins up and reviews what the main agent has written.

I have this one tied to a !review-pr prompt shortcut that spins it up as an independent subagent.

How We Validate

With no hand-written code, validation is everything. Our approach:

• AI writes the e2e tests too. The agent writes tests, deploys to a dev shard, runs them, and loops on failures until they pass. The test suite is the primary proof that things work.
• Dev shard loop: Every feature gets deployed to an isolated dev shard on a real cluster. The agent deploys, tests e2e, fixes issues, redeploys. This catches integration issues that unit tests miss.
• CI pipeline as quality gate: Every PR runs lint, vet, tests, and Helm validation. The agent reads pipeline output and addresses failures before requesting review.
• Progressive rollout: main deploys to dev without PRGB, so we can validate internally fast. Production gets canary deploys across multiple clusters.
• If I need to verify, I test outputs, not read code. Submit a job, check it boots in 100ms, verify migration preserves state, confirm network policy blocks what it should. Black box validation.
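Black-box validation like this reduces to assertions over observable outputs rather than reading code. A minimal sketch, assuming a job result is just a dict of observations (the field names here are illustrative, not the real API):

```python
def validate_job(result: dict) -> list[str]:
    """Black-box checks on a job's observable outputs; returns problems found."""
    problems = []
    # Warm-start budget: the VM should be up within 100 ms.
    if result.get("boot_ms", float("inf")) > 100:
        problems.append(f"warm start too slow: {result.get('boot_ms')} ms")
    # Live migration must preserve in-VM state.
    if result.get("migrated") and not result.get("state_preserved"):
        problems.append("live migration lost state")
    # Network policy: denied destinations should never be reachable.
    for dest in result.get("denied_egress_reached", []):
        problems.append(f"network policy failed to block {dest}")
    return problems

ok = validate_job({"boot_ms": 84, "migrated": True, "state_preserved": True,
                   "denied_egress_reached": []})
bad = validate_job({"boot_ms": 140, "migrated": True, "state_preserved": False,
                    "denied_egress_reached": ["10.0.0.9:443"]})
```

Each check is an input/output contract, so a human (or an agent) can verify a change without opening the implementation at all.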

Your Team Needs to Be Agentic Too

If you’re blocked on human review, your throughput is gated by the slowest reviewer. Teams need to embrace AI-assisted reviews and shift their attention to the high level: architecture, design intent, risk, rather than nitpicking details. The agents can handle the details.

For bigger, scarier PRs: spin up an independent agent to review before a human even looks at it.

Consider a prod branching model. We need to move fast, and we’ve made great progress: our team now has main deploy to dev without PRGB (Peer Review/Green Build). This lets us ship to internal test cases and ourselves faster. We can’t afford to wait hours for a human PR review, especially in a multi-timezone world.

Invest in Your AI Setup

Spend real time on your repo’s AI configuration: skills, agent definitions, memory files. Continuously update them. This isn’t set-and-forget; it’s a living system that gets better the more you feed it.

Your Role Changes

You become more of an architect and builder. Work with the AI to explore architecture options. Let it suggest implementation details. Press it with your domain knowledge. Then let it implement.

When something important is happening, read along with the agent’s thinking as it works. You don’t need to write the code, but you should understand what’s being built.

Talk to your code through the Agent

Want to know how something works? Ask the agent. It explores the repo and returns a natural language explanation, often with key implementation snippets.

Want to make a small adjustment, suggest an improvement, or propose a new approach? Ask. It’s especially powerful to explain why you want that change. This lets the model use its knowledge to contextualize the goal, often leading to stronger outputs.

An agent-driven interface or tool suits this paradigm, keeping you focused on the outcome while delegating details to your agentic coworker.

Mitigate risk with strong design, not manual review

If you’re not hand-writing code, your safety net shifts:

  • CI/CD pipelines: your automated quality gate
  • Sharding: limit the blast radius of any single change
  • RBAC (role-based access control) / JIT (just-in-time) access: control who (and what) can write
  • Progressive rollouts & canary deploys across multiple clusters
  • AI-written e2e tests: this is your primary validation harness. If you’re reading any code, read the tests
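A progressive rollout gate can be thought of as a pure decision over observed error rates: promote only while the canary stays within a tolerance of the baseline. A hedged illustration (the thresholds and function shape are made up for this sketch, not our production policy):

```python
def canary_decision(baseline_error_rate: float,
                    canary_error_rate: float,
                    max_ratio: float = 1.5,
                    min_floor: float = 0.001) -> str:
    """Promote, hold, or roll back a canary based on relative error rates.

    min_floor avoids flapping on tiny absolute rates (e.g. 0% vs 0.01%).
    """
    if canary_error_rate <= max(min_floor, baseline_error_rate):
        return "promote"
    if canary_error_rate <= baseline_error_rate * max_ratio:
        return "hold"                # within tolerance: keep watching
    return "rollback"

d1 = canary_decision(0.010, 0.008)   # canary healthier than baseline
d2 = canary_decision(0.010, 0.013)   # slightly worse, within 1.5x tolerance
d3 = canary_decision(0.010, 0.030)   # well past tolerance
```

Encoding the gate as a pure function keeps it testable on its own, which matters when no human is reading the diff that triggered the rollout.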

What’s working really well with Rovo Dev

The all-in-one access to Atlassian products is genuinely great. Having Bitbucket and Pipelines integration in the agent has been a game changer: the agent can raise PRs, read diffs, and monitor builds without leaving the conversation. That makes it a seriously compelling daily driver.

Outcome

We’re building things we never would have committed to before. Too long, too complex, not enough domain expertise on the team. Even two months ago, I wouldn’t have believed we’d have a Firecracker-based microVM platform with 100ms warm starts and live migration between hosts, built in four weeks, entirely by LLMs.