Introduction

As technical writers working on a wiki, one of the problems we face is that Confluence has no built-in feature for detecting a range of content errors: broken internal links, broken images, broken Jira issues macros (specifically), macros that have failed to render (in general), and inclusions that cannot find the page to insert (covering the {include} and {excerpt-include} macros). On top of that, a wiki tends to accumulate a lot of pages and edits from many contributors, and migrating your system between versions (something we do very often) can occasionally affect your content.

To address this problem, I’ve been threatening for some time to write a program that scans for content errors in Confluence and quickly creates a detailed report. During our recent Tech Writer innovation day, I finally coded it up. It’s a Python script that connects to a Confluence instance, logs in, gets a list of pages in a given space over XML-RPC, then checks the source of each page for content errors using regular expressions. Best of all, it completes all this on the hefty Confluence documentation set in around one hour.
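The flow described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code. It uses Python 3's `xmlrpc.client` (the original targets Python 2.7, where the module is called `xmlrpclib`), and it assumes the `login`, `getPages`, and `getPage` methods of Confluence's `confluence1` XML-RPC namespace. The names `ERROR_PATTERNS`, `find_errors`, and `scan_space` are made up for this example, and the regular expressions are placeholders: the text that really signals an error depends on your Confluence version, and on whether you inspect stored markup or rendered HTML.

```python
import re
import xmlrpc.client

# Hypothetical patterns: placeholders only. Adjust them to whatever your
# Confluence version actually emits for broken links and failed macros.
ERROR_PATTERNS = {
    "broken link": re.compile(r'class="[^"]*\berror\b[^"]*"'),
    "failed macro": re.compile(r"Unable to render|Unknown macro", re.IGNORECASE),
}


def find_errors(source):
    """Return the sorted names of the patterns that match the page source."""
    return sorted(name for name, pattern in ERROR_PATTERNS.items()
                  if pattern.search(source))


def scan_space(server, username, password, space_key):
    """Log in, list the pages in a space, and check each page's source.

    `server` is anything exposing the `confluence1` namespace, for example
    xmlrpc.client.ServerProxy(base_url + "/rpc/xmlrpc").
    """
    api = server.confluence1
    token = api.login(username, password)
    results = {}
    for summary in api.getPages(token, space_key):
        page = api.getPage(token, summary["id"])
        errors = find_errors(page["content"])
        if errors:
            results[summary["title"]] = errors
    return results
```

Passing the server object in, rather than creating it inside `scan_space`, makes it easy to try the scanning logic against a stand-in object before pointing it at a live instance.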

I call it “The Busted Stuff Report”.

Program Details

The Busted Stuff Report scans for the range of content issues mentioned above. As it scans, it builds an interactive HTML report.
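A report of this kind could be assembled along the following lines. This is a sketch only: the function name `build_report`, the `(title, url, problem)` tuple layout, and the table markup are assumptions for illustration and are not taken from the script itself.

```python
from html import escape


def build_report(space_key, findings):
    """Render findings as a simple HTML page with one clickable row each.

    `findings` is a list of (title, url, problem) tuples.
    """
    rows = []
    for title, url, problem in findings:
        rows.append(
            '<tr><td><a href="{0}">{1}</a></td><td>{2}</td></tr>'.format(
                escape(url, quote=True), escape(title), escape(problem)))
    return (
        "<html><body>"
        "<h1>Content errors in {0}</h1>"
        "<table><tr><th>Page</th><th>Problem</th></tr>{1}</table>"
        "</body></html>"
    ).format(escape(space_key), "".join(rows))
```

Escaping the page titles and URLs matters here, because wiki page names routinely contain characters such as `&` and `<` that would otherwise break the report's markup.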

The Tech Writer who owns the documentation space can then simply click on the links in the report to view the problems and fix them directly. It also prints pictorial ASCII-art checks and crosses into the console, so you can get a subliminal sense of the number of errors as they whiz past. Finally, it builds a detailed log file on the local drive for troubleshooting.
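The console marks and the log file could be produced with something like the following. Again, this is a hedged sketch rather than the script's own code: `make_logger`, `record_result`, the `[ok]` and `[x]` marks, and the log format are all invented for the example, and it relies only on Python's standard `logging` module.

```python
import logging


def make_logger(log_path):
    """Create a logger that writes detailed entries to a local file."""
    logger = logging.getLogger("busted_stuff_sketch")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


def record_result(logger, title, errors):
    """Print a check or cross for one page and log the details."""
    mark = "[x]" if errors else "[ok]"
    print(mark, title)
    if errors:
        logger.warning("%s: %s", title, ", ".join(errors))
    else:
        logger.info("%s: no problems found", title)
    return mark
```

Keeping the console output terse and pushing the detail into the log file means the terminal stays readable during a long scan, while the file still has everything needed to troubleshoot a specific page.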

Viewing the Source

You can check out the (open) source code for the Busted Stuff Report on BitBucket:

https://bitbucket.org/edawson/busted-stuff-report/src/fecc38a2a47d/BustedStuffReport.py

To run the script, you’ll need Python 2.7.2. If you’re going to try it out, be aware that the script is designed for our specific use case at Atlassian, where Confluence is our public-facing system for delivering online documentation. As a result, it only scans pages that are open to anonymous users (the general public). This also helps it catch content issues caused by restricted content embedded in unrestricted content (a particularly curly problem). Because of this, make sure that the username and password you initially supply to the script have the same permissions as an anonymous user. This open-content-only approach means that you can run the script from anywhere you have Internet access to your Confluence server.

Summary

The Busted Stuff Report is now part of our Tech Writing toolset for improving maintenance of the Atlassian documentation. When something goes wrong, the Busted Stuff Report can find it, fast.

Thanks for reading.
