Inside Atlassian

Creating with Rovo: How We Built a Collaborative AI Canvas

Rovo is our hero AI solution at the core of our platform. Through the Teamwork Graph, it can search across internal and third‑party knowledge, surfacing the right information from tools like Confluence, Jira, and connected apps. It can also interact with and create objects across the ecosystem – from Jira issues to Confluence pages.

What it didn’t have was a fully featured creation canvas: a collaborative space where people and AI can co-create, iterate on, and refine rich content together in real time. That gap was the catalyst for Creating content with Rovo.

The core insight behind creating with Rovo was simple: Creation should be a collaboration between the user and the AI, not a handoff from one to the other.

This post goes behind the scenes of content creation with Rovo and covers the core principles behind the experience, the technical implementation for each content type, and how we evaluate and monitor quality.

https://atlassianblog.wpengine.com/wp-content/uploads/2026/04/create-with-rovo-rev3-edited-v4-3.mp4

Core principles

We didn’t want to build a standalone AI writing tool siloed inside a single product.

Creating with Rovo has an entry point in Confluence, but it’s natively built on Rovo and accessible across every Rovo surface. Creation should feel like a natural extension of the chat experience: you ask a question, co‑author a page, iterate on a whiteboard, and refine a database – all in the same conversation, with the same AI that understands your context.

That led to a few core design principles:

Technical implementation

At a high level, Rovo’s backend is built around a top‑level orchestrator agent with access to a set of specialized skills – cross‑product search (including third‑party products), Teamwork Graph search, a Jira retrieval agent, and more. The orchestrator invokes these skills based on the user’s intent.
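As a rough mental model of this pattern – not Atlassian's actual API, all names below are hypothetical – the orchestrator amounts to routing a classified intent to a registered skill:

```typescript
// Hypothetical sketch of an orchestrator-with-skills pattern: a top-level
// agent routes a user request to one of several registered skills.

interface Skill {
  name: string;
  // Returns true if this skill can handle the classified intent.
  canHandle(intent: string): boolean;
  invoke(request: string): string;
}

class Orchestrator {
  private skills: Skill[] = [];

  register(skill: Skill): void {
    this.skills.push(skill);
  }

  // Dispatch to the first skill that claims the intent; in the real
  // system this decision is made by an LLM, not by string matching.
  handle(intent: string, request: string): string {
    const skill = this.skills.find((s) => s.canHandle(intent));
    if (!skill) return `No skill registered for intent: ${intent}`;
    return skill.invoke(request);
  }
}

const orchestrator = new Orchestrator();
orchestrator.register({
  name: "confluence-creation",
  canHandle: (intent) => intent === "create-page",
  invoke: (req) => `drafting page for: ${req}`,
});
```

In practice the routing decision is itself model-driven, so "canHandle" is really a prompt-level concern; the registry shape is what carries over.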

To support creating content with Rovo, we introduced a new Confluence creation and editing skill – a purpose‑built agent responsible for:

This agent is not isolated. It inherits Rovo’s full context: conversation history, connected knowledge sources, and relevant work items. A user can ask Rovo to:

“Create a project plan based on our Q3 goals,”

and the agent will pull the right context from the Teamwork Graph before generating the document.

Producing Confluence content with an LLM

Confluence content types have very different underlying representations and constraints:

To maximize quality, we needed to find the right model + output format for each content type. That meant running evaluations across different LLMs and formats, and ultimately landing on different approaches for each. This testing process yielded unexpected discoveries about current frontier LLM capabilities.

Pages / Live Docs

For pages, we needed to:

For page creation, we have the LLM produce ADF (nested JSON) directly, then pass it through a proprietary ADF repair library. This library:
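The repair library itself is proprietary and its exact fixes are not listed here, but as a hedged sketch of the idea (node types and rules below are illustrative), a repair pass walks the nested JSON tree and salvages what it can rather than rejecting the whole document:

```typescript
// Illustrative sketch only: LLMs occasionally emit ADF nodes that
// violate the schema, so a repair pass prunes invalid nodes and keeps
// the rest of the document intact. The allow-list here is a toy subset.

type AdfNode = {
  type: string;
  text?: string;
  content?: AdfNode[];
  [key: string]: unknown;
};

const KNOWN_TYPES = new Set([
  "doc", "paragraph", "text", "heading", "bulletList", "listItem",
]);

function repair(node: AdfNode): AdfNode | null {
  // Drop nodes of unknown type instead of failing the whole document.
  if (!KNOWN_TYPES.has(node.type)) return null;
  // Text nodes must carry a non-empty string.
  if (node.type === "text") {
    return typeof node.text === "string" && node.text.length > 0 ? node : null;
  }
  const children = (node.content ?? [])
    .map(repair)
    .filter((c): c is AdfNode => c !== null);
  return { ...node, content: children };
}
```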

For page edits, we defined a small set of editor‑style commands that the LLM can call to manipulate the ADF – for example:

Within each tool call, the LLM produces ADF as the value. Editing runs as an agentic loop with reflection: think of it as a coding agent, but for Atlassian documents.
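The actual command set isn't enumerated above; as a hypothetical illustration of the shape (command names and document structure are assumptions), such editor-style commands could be modeled as a small tagged union applied in a batch:

```typescript
// Hypothetical editor-style commands for manipulating an ADF document's
// top-level content. The real command set is not public; this shows the
// general "LLM emits commands, runtime applies them" pattern only.

type AdfNode = { type: string; text?: string; content?: AdfNode[] };

type EditCommand =
  | { op: "replace_node"; index: number; value: AdfNode }
  | { op: "insert_after"; index: number; value: AdfNode }
  | { op: "delete_node"; index: number };

// Apply a batch of commands in order; each command's value is itself ADF.
function applyCommands(doc: AdfNode, commands: EditCommand[]): AdfNode {
  const content = [...(doc.content ?? [])];
  for (const cmd of commands) {
    switch (cmd.op) {
      case "replace_node":
        content[cmd.index] = cmd.value;
        break;
      case "insert_after":
        content.splice(cmd.index + 1, 0, cmd.value);
        break;
      case "delete_node":
        content.splice(cmd.index, 1);
        break;
    }
  }
  return { ...doc, content };
}
```

In the agentic loop described above, the model would inspect the document after each batch, reflect, and issue further commands until the edit is complete.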

https://atlassianblog.wpengine.com/wp-content/uploads/2026/04/edit-demo-2.mp4

Whiteboards

Whiteboards add another layer of complexity: spatial layout and visual semantics.

We evaluated a number of output formats, including:

We created an extensive dataset for content-generation evals and had both humans and LLMs (judging from output screenshots) rate the results. The judgments centered on visual quality, connectors and grouping, and layout quality. The ability to parse and stream the data also influenced the decision. Ultimately, SVG was the clear winner.

LLMs are trained on vast amounts of data. We found that SVG drew the firmest parallel to how people think about infinite‑canvas boards (shapes, text, positions) and to what LLMs can reliably produce and understand.

For whiteboards we:

<svg viewBox="0 0 1200 800" xmlns="http://www.w3.org/2000/svg">
  <rect id="background" x="0" y="0" width="1200" height="800" fill="#ffffff"/>
  
  <text id="haiku-title" x="600" y="250" text-anchor="middle" font-size="32" font-weight="bold">
    <tspan x="600" dy="0">Haiku</tspan>
  </text>
  
  <text id="haiku-text" x="600" y="350" text-anchor="middle" font-size="24" font-style="italic">
    <tspan x="600" dy="0">Cherry blossoms fall</tspan>
    <tspan x="600" dy="40">Soft petals dance on the breeze</tspan>
    <tspan x="600" dy="40">Spring whispers goodbye</tspan>
  </text>
  
  <line id="decoration-line" x1="400" y1="520" x2="800" y2="520" stroke="#dfd8fd"/>

  <text id="haiku-form" x="600" y="580" text-anchor="middle" font-size="14">
    <tspan x="600" dy="0">5 - 7 - 5 syllables</tspan>
  </text>
</svg>

Ingesting the LLM’s output chunks in real time is handled by a streaming SVG parser working in tandem with constraint-solving algorithms that enforce containment and resolve layout overlaps. This lets users watch their whiteboards being assembled in real time.

https://atlassianblog.wpengine.com/wp-content/uploads/2026/04/whiteboard_streaming_small-2.mov

For editing, we:

We also introduced a special todo_list tool so that the LLM lays out its plan before making changes. This simple pattern significantly improved quality for complex, multi‑step edits.

<todo_list>
1. Make all stickies red
2. Move all stickies to the left
3. Create more red stickies below
</todo_list>

Example output for editing – TODO lists come first, always.
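Downstream code only needs a small extractor to read the plan back out of the tagged block. A sketch, inferring the format from the example above:

```typescript
// Extract the numbered plan from a <todo_list> block in model output.
// Format inferred from the example above; not the production parser.

function parseTodoList(output: string): string[] {
  const match = output.match(/<todo_list>([\s\S]*?)<\/todo_list>/);
  if (!match) return [];
  return match[1]
    .split("\n")
    .map((line) => line.replace(/^\s*\d+\.\s*/, "").trim())
    .filter((line) => line.length > 0);
}
```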

Databases

Databases are built by the LLM as three separate CSVs (schema, views, data) wrapped in XML tags. This keeps schema, presentation, and data clearly separated and parseable as they stream in.

For creation, the model always produces three CSV sections wrapped in XML tags:
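The exact tag names and column schemas are not shown here, but as a purely hypothetical illustration of the shape, such a payload and a section extractor might look like:

```typescript
// Hypothetical illustration of a three-CSV-in-XML database payload.
// Tag names and columns are invented for this sketch; the real format
// is not public. The extractor pulls each section out by tag.

const examplePayload = `
<schema>
name,type
Title,text
Status,select
</schema>
<views>
name,kind
All items,table
</views>
<data>
Title,Status
Launch plan,In progress
</data>
`;

function extractSection(payload: string, tag: string): string {
  const match = payload.match(new RegExp(`<${tag}>([\\s\\S]*?)</${tag}>`));
  return match ? match[1].trim() : "";
}

const schema = extractSection(examplePayload, "schema");
const data = extractSection(examplePayload, "data");
```

Because each section is independently delimited, the client can start building the table structure from the schema section before the data section has finished streaming.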

For edits, the model receives the database in this representation plus the user’s selection, then outputs two CSVs:

  1. Metadata changes – schema, views, filters, sorts
  2. Data changes – add/edit/delete rows

Each row in these CSVs represents a single, declarative change that downstream code can parse and apply reliably.

https://atlassianblog.wpengine.com/wp-content/uploads/2026/04/databae-edit-2.mp4

Streaming generated content to the client

Determining what the LLM should produce was only half the problem. The other half was delivering that output to the frontend – incrementally, in real time, and across multiple rendering surfaces.

Rovo Chat / Content iFrame communication in Rovo Canvas

The Canvas uses the same shared Rovo Chat platform component that powers chat on every other Rovo surface.

Rather than building a simplified preview, we render content with the same components used by the main Confluence experience, embedded in an iframe. That means the canvas has full feature parity with the content objects in Confluence.

To support this, we defined a new streaming API contract for LLM actions:

This contract has to work on every Rovo‑supported surface, so we use the same commands for creation and editing within the Rovo Canvas as for editing content in Confluence directly with Rovo.

Rovo Chat editing regular content in Confluence

From within the iFrame, the user can select text and ask for edits, which need to be executed through the Rovo Chat component outside of the iFrame. It was therefore clear we needed a bidirectional communication channel between chat and content objects:

The result is the Rovo Bridge API – a library that lets distinct applications communicate with each other using local function calls, while abstracting away the underlying transport.

Under the hood:
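The internals are elided here, but one plausible shape for such a bridge – sketched over an abstract message port rather than real window objects, with all names hypothetical – is an RPC layer that matches responses to callers by request id:

```typescript
// Hypothetical sketch of a function-call bridge between a host page
// (chat) and an embedded iframe (content). Calls are serialized into
// messages; responses are matched back by id. In a browser, Port would
// wrap window.postMessage / the message event.

type BridgeMessage = {
  id: number;
  kind: "request" | "response";
  method?: string;
  args?: unknown[];
  result?: unknown;
};

interface Port {
  send(msg: BridgeMessage): void;
  onMessage(handler: (msg: BridgeMessage) => void): void;
}

class Bridge {
  private nextId = 0;
  private pending = new Map<number, (result: unknown) => void>();
  private handlers = new Map<string, (...args: unknown[]) => unknown>();

  constructor(private port: Port) {
    port.onMessage((msg) => {
      if (msg.kind === "request" && msg.method) {
        const fn = this.handlers.get(msg.method);
        const result = fn ? fn(...(msg.args ?? [])) : undefined;
        port.send({ id: msg.id, kind: "response", result });
      } else if (msg.kind === "response") {
        this.pending.get(msg.id)?.(msg.result);
        this.pending.delete(msg.id);
      }
    });
  }

  // Expose a local function so the other side can call it by name.
  expose(method: string, fn: (...args: unknown[]) => unknown): void {
    this.handlers.set(method, fn);
  }

  // Call a function exposed on the other side of the bridge.
  call(method: string, ...args: unknown[]): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.port.send({ id, kind: "request", method, args });
    });
  }
}
```

Abstracting the transport behind a port interface is what lets the same call sites work regardless of where chat and content happen to be rendered.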

Evaluations and monitoring

Offline evals were run daily using comprehensive datasets across all content types, leveraging a novel screenshot-based evaluation approach with LLM judges that assessed not just task completion, but visual quality, tone, and knowledge accuracy – consistently baselined against human feedback.
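One concrete piece of that baselining can be sketched as a pure function: given per-example scores from the LLM judge and from humans, compute per-dimension disagreement so judge drift is visible. The dimension names follow the post; everything else is illustrative.

```typescript
// Illustrative sketch: mean absolute disagreement between LLM-judge
// scores and human ratings, per evaluation dimension. Not the actual
// eval harness; shapes and scales are assumptions.

type Scores = { visual: number; tone: number; accuracy: number };

function meanAbsDisagreement(judge: Scores[], human: Scores[]): Scores {
  const n = Math.min(judge.length, human.length);
  const sum: Scores = { visual: 0, tone: 0, accuracy: 0 };
  for (let i = 0; i < n; i++) {
    sum.visual += Math.abs(judge[i].visual - human[i].visual);
    sum.tone += Math.abs(judge[i].tone - human[i].tone);
    sum.accuracy += Math.abs(judge[i].accuracy - human[i].accuracy);
  }
  return {
    visual: sum.visual / n,
    tone: sum.tone / n,
    accuracy: sum.accuracy / n,
  };
}
```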

Online evals tracked success rate and task completion metrics on live customer and internal traffic, providing a real-world signal decoupled from frozen datasets.

For real-time reliability and quality monitoring, the team set up automated health checks that validated agentic flows, ensuring the right sub‑agents, tools, and content actions were invoked correctly, not just that a single high‑level API call succeeded. This is on top of extensive reliability monitoring dashboards and SLOs, with detectors for both sudden error spikes and gradual degradations.

Conclusion

Building for AI requires a shift in mindset – the industry is moving so fast that the code, prompts, and models will continue to evolve rapidly. But a robust eval suite, extensive metrics for online experimentation, and comprehensive reliability monitoring are what allow rapid iteration with confidence.

Creating content with Rovo is a new foundational experience that will serve as the building block for upcoming Confluence AI features. This is only the beginning.
