The JIRA data anonymizer is a small tool that uses XSLT to anonymize potentially sensitive text in XML backups. Unfortunately it uses a lot of memory: XSLT works on a DOM tree, so it has to load the whole XML document into memory before it can transform anything.
We are solving this properly in JIRA 3.7 with a built-in anonymizer, but since a customer needed a solution now, I rewrote the anonymizer in STX, an XSLT-like language that processes a stream of SAX events instead of building a DOM. The STX template looks like this:

<?xml version="1.0"?>
<!--
JIRA Data anonymizer
Written in STX (http://stx.sourceforge.net/), an XSLT-like language which uses
relatively little memory.
Copyright (c) 2005-2006 Atlassian
$Revision: 1.3 $
-->
<stx:transform xmlns:stx="http://stx.sourceforge.net/2002/ns" version="1.0" pass-through="none">
  <stx:template match="Action/@body | */@description | Issue/@environment | Issue/@summary | NotificationInstance/@email | ChangeItem/@newstring | ChangeItem/@oldstring | FileAttachment/@filename | NotificationScheme/@name | PermissionScheme/@name | Resolution/@name | CustomFieldValue/@textvalue | OSPropertyText/@value | Project/@url">
    <stx:attribute name="{local-name(.)}"><stx:value-of select="translate(., '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')"/></stx:attribute>
  </stx:template>
  <stx:template match="Action/body/text() | */description/text() | Issue/environment/text() | Issue/summary/text() | NotificationInstance/email/text() | ChangeItem/newstring/text() | ChangeItem/oldstring/text() | FileAttachment/filename/text() | NotificationScheme/*/text() | PermissionScheme/*/text() | Resolution/*/text() | CustomFieldValue/textvalue/text() | OSPropertyText/value/text()">
    <stx:value-of select="translate(., '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx')"/>
  </stx:template>
  <!-- Default rule: copy everything across -->
  <stx:template match="node()|@*" priority="-1">
    <stx:copy>
      <stx:process-attributes />
      <stx:process-children />
    </stx:copy>
  </stx:template>
</stx:transform>

This processes XML files of arbitrary size in roughly constant memory (as a test, I transformed a 280 MB file with -Xmx32m in about 2 minutes). If you ever need to do a simple text transformation on a large XML file, STX is a very useful little tool.
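
For readers without an STX processor to hand, the same idea can be sketched as a SAX filter in Python's standard library. This is only an illustration of the streaming approach, not the anonymizer itself: the names `Anonymizer`, `anonymize`, `SENSITIVE_ATTRS` and `SENSITIVE_TEXT` are made up for this example, and the two sets cover only a few of the paths matched by the template above.

```python
import string
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator
from xml.sax.xmlreader import AttributesImpl

# Map every ASCII digit and letter to 'x', mirroring the translate() call
# in the STX template. Punctuation and whitespace are left untouched.
MASK = str.maketrans(string.digits + string.ascii_letters, "x" * 62)

# Hypothetical subset of the (element, attribute) pairs to mask.
SENSITIVE_ATTRS = {("Issue", "summary"), ("Issue", "environment"), ("Action", "body")}
# Hypothetical subset of the (parent, child) element pairs whose text is masked.
SENSITIVE_TEXT = {("Issue", "summary"), ("Issue", "environment"), ("Action", "body")}


class Anonymizer(XMLFilterBase):
    """SAX filter that masks selected attribute values and text nodes."""

    def __init__(self, parent=None):
        super().__init__(parent)
        self._stack = []  # names of the currently open elements

    def startElement(self, name, attrs):
        masked = {
            key: value.translate(MASK) if (name, key) in SENSITIVE_ATTRS else value
            for key, value in attrs.items()
        }
        self._stack.append(name)
        super().startElement(name, AttributesImpl(masked))

    def endElement(self, name):
        self._stack.pop()
        super().endElement(name)

    def characters(self, content):
        # Text may arrive in several chunks; masking is per character,
        # so chunk boundaries do not affect the result.
        if len(self._stack) >= 2 and tuple(self._stack[-2:]) in SENSITIVE_TEXT:
            content = content.translate(MASK)
        super().characters(content)


def anonymize(source, out):
    """Stream XML from `source` (path or file object) to the text stream `out`."""
    reader = Anonymizer(xml.sax.make_parser())
    reader.setContentHandler(XMLGenerator(out, encoding="utf-8"))
    reader.parse(source)
```

Calling `anonymize("backup.xml", open("anonymized.xml", "w", encoding="utf-8"))` (both file names are placeholders) would stream the document through the filter. Only the stack of open element names is kept in memory, which is why this style of processing does not grow with the size of the input.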
