3 (More) Lessons from Building Software Development Tools for Google MarketPlace

(Part I)

In the previous article, I described how we came up with the idea for the Jira Studio Activity Bar as part of our Google MarketPlace and Jira Studio integration. Things were going pretty well. We were able to get data from Google Apps through the Google Apps data APIs, using OAuth for authentication. We were able to connect to the GTalk servers, retrieve a user’s buddy list, and get buddy updates. We could send messages to buddies and receive messages from them. More importantly, we were able to do it in a scalable way using Comet techniques, such as long-polling.

You didn’t think it was going to be that easy, did you?

Things were going well. Which is to say, the ground hadn’t fallen out from under us… yet. Then someone on the Jira Studio team realized that even though we were going to be deploying to Tomcat 6 and using Atmosphere to take advantage of asynchronous IO, we weren’t really going to be saving that many resources. He explained that the standard Studio setup is to run all our application servers behind Apache. Ok, that’s typical enough. What’s the big deal? Well, apparently Apache can be a bit of a resource hog.

Your asynchronous IO is nice and all, but…

The Studio Apache configuration uses prefork mode. Each of those Apache processes takes up about 20MB. And since Apache uses blocking IO, every long-polling connection would hold on to one of those processes for as long as it stayed open, so all the effort to scale on the application server side might be for nothing if we tied up a bunch of Apache processes that ate up the limited amount of memory available. We discussed switching to the worker (threaded) configuration, but decided it didn’t really buy us that much in savings. We suggested deploying all the applications behind a web server implemented using asynchronous IO, like Nginx, but that would have been too big a configuration change to make without adequate time to test.
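
To put rough numbers on that worry, here is a back-of-the-envelope sketch in TypeScript. The 20MB per process figure is the one quoted above; the client counts and the function name are made up for illustration, and it assumes every idle chat client pins exactly one prefork process.

```typescript
// Rough memory estimate for long-polling clients held open by Apache prefork.
// The 20 MB figure comes from the text; the client counts are hypothetical.
const MB_PER_APACHE_PROCESS = 20;

// Assumption: in prefork mode each open connection occupies one process, and
// a long-polling client keeps its connection open almost all of the time.
function apacheMemoryMb(concurrentLongPolls: number): number {
  return concurrentLongPolls * MB_PER_APACHE_PROCESS;
}

for (const clients of [10, 100, 500]) {
  console.log(`${clients} idle chat clients -> ~${apacheMemoryMb(clients)} MB of Apache processes`);
}
```

Under those assumptions, a few hundred people who merely have a Studio page open would cost gigabytes of memory before anyone has typed a single chat message.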

Finally, we hit upon a solution. We could just have browsers connect directly to the application server that the ActivityBar webapp was running on, bypassing Apache entirely! All the regular Studio apps could continue running at http://yourcompany.jira.com/ and the ActivityBar webapp would run at http://yourcompany.jira.com:8000/ or something similar. Users wouldn’t care because the Activity Bar webapp is only ever accessed in the background. Ok, we can put that issue to bed…

You can’t do that!

Wait, what’s that, you say? Oh no, you’re right! A different port means a different origin, so now we’re violating the same origin policy that we had been able to avoid before! Hold on a second, though. If my memory serves me correctly, I remember seeing a solution to this very problem when we were working on the new dashboard system in Jira 4. There is an RPC system in Shindig that allows the gadget (which is in an iframe and in some circumstances is loaded from a different origin than the container) to send messages to the container. It has about 5 different implementations for all the different variations of browsers that are out there: it uses window.postMessage on all the latest browsers that support it, some funky VBScript for IE 6 & 7, a few other techniques for older Gecko and WebKit, and a fallback method using nested iframes. Could we use that? It’s been developed and battle hardened by the Google developers, and it would fit our needs perfectly. But how easy would it be to just drop it in?


As luck would have it, the Shindig RPC JavaScript code can be pretty easily adapted to run outside Shindig. All we needed to do was create an iframe that is loaded from the ActivityBar webapp, use that to do all of our Ajax requests and long polling with the ActivityBar server, and use the adapted Shindig RPC system to send messages between the application page and the iframe. Phew, disaster averted.
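
For a feel of what that page-to-iframe messaging involves, here is a minimal TypeScript sketch of just the window.postMessage transport. It is not the Shindig RPC API: the message shape, the service names, and the functions `createReceiver` and `encodeCall` are hypothetical, and the older-browser transports mentioned above are left out entirely.

```typescript
// Hypothetical message envelope passed between the page and the hidden iframe.
interface RpcMessage {
  service: string; // made-up service name, e.g. "chat.incoming"
  payload: unknown;
}

// The only parts of the browser's MessageEvent that this sketch relies on.
interface IncomingEvent {
  origin: string;
  data: unknown;
}

type Handler = (payload: unknown) => void;

// Builds a "message" listener that accepts messages from a single origin and
// dispatches them by service name. Returns true only when a handler ran.
function createReceiver(
  trustedOrigin: string,
  handlers: Map<string, Handler>
): (event: IncomingEvent) => boolean {
  return (event) => {
    if (event.origin !== trustedOrigin) return false; // ignore other origins
    if (typeof event.data !== "string") return false;
    let parsed: unknown;
    try {
      parsed = JSON.parse(event.data);
    } catch {
      return false; // not one of our messages
    }
    if (typeof parsed !== "object" || parsed === null) return false;
    const msg = parsed as Partial<RpcMessage>;
    if (typeof msg.service !== "string") return false;
    const handler = handlers.get(msg.service);
    if (!handler) return false;
    handler(msg.payload);
    return true;
  };
}

// Serializes a call into the string handed to postMessage.
function encodeCall(service: string, payload: unknown): string {
  const msg: RpcMessage = { service, payload };
  return JSON.stringify(msg);
}

// In a browser, the wiring would look roughly like this (not executed here):
//   window.addEventListener("message", createReceiver(IFRAME_ORIGIN, handlers));
//   iframe.contentWindow.postMessage(encodeCall("chat.send", body), IFRAME_ORIGIN);
```

The origin check is the important part of the sketch: the iframe and the page live on different origins by design, so each side should only act on messages from the one origin it expects.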

But wait, what about…

The HTTP/1.1 spec (RFC 2616), section 8.1.4, specifies that a “single-user client SHOULD NOT maintain more than 2 connections with any server or proxy.” Most browsers these days tend to stretch that number to a maximum of 8 connections, but IE6 and IE7 still respect that 13-year-old suggestion. Each window or tab with the ActivityBar running holds a connection open for long polling, so if a user opens more than two windows or tabs to their instance of Studio in IE6 or IE7, that will eat up all their connections and additional windows or tabs will appear to hang while they wait for a connection to become available. But wait, it gets better! If your Studio usage habits are anything like mine then you’ve typically got 6-10 tabs open with issues, issue searches, reviews, wiki pages, etc. So even on browsers where that limit is stretched, you’re likely to run into the same problem.

A bit of research turned up a number of workarounds, but the one we chose was to use multiple subdomains. Browsers apply the connection limit per hostname, so giving each window or tab its own hostname gives it its own pool of connections. Since we had already done the work to get around the same origin policy, we didn’t need to worry about that; it would “just work”. The big question to answer was whether or not we’d be able to make this change in our deployment environment on such short notice. The guys over at Contegix came through for us on this one and got the deployment environment for Jira Studio modified to set up the DNS aliasing that we needed. Now, if you look in Firebug or another web browser debugging tool, you’ll see connections being made to http://chat1234.yourcompany.jira.com in one tab and http://chat4321.yourcompany.jira.com in another.
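
As a sketch of the idea, each page could pick its chat hostname something like this. The `chatNNNN` pattern mirrors the hostnames above, but how the number is actually chosen is an assumption on my part, as is the `chatHostFor` function itself; it also relies on DNS resolving every such name to the same ActivityBar server.

```typescript
// Picks a per-tab chat hostname so that each tab's long-polling connection
// counts against a different hostname's connection limit.
// Assumption: a random 4-digit number per page load; two tabs can still
// collide now and then, which only means they share one connection pool.
function chatHostFor(baseHost: string, random: () => number = Math.random): string {
  const id = 1000 + Math.floor(random() * 9000); // 1000..9999
  return `chat${id}.${baseHost}`;
}

// Example: three tabs opened against the same Studio instance.
for (let tab = 1; tab <= 3; tab++) {
  console.log(`tab ${tab} long-polls http://${chatHostFor("yourcompany.jira.com")}/`);
}
```

The random source is passed in as a parameter only so the function can be exercised deterministically; in a page you would just call `chatHostFor("yourcompany.jira.com")`.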

Lessons Learned

  1. Coordinate with your administrators. We, the development team, had it all planned out how the chat portion of the Activity Bar could scale. But we didn’t fully understand the deployment scenario, so all that planning was almost worthless. We were lucky to realize the situation we were in before we got to deployment and all the issues got worked out. But it very easily could have happened differently and customers would have suffered for it. Always talk to the administrators that will be deploying your software so everyone fully understands how things need to work.
  2. The same origin policy can be a real pain! This is a bit of a repeat from the last post, but it popped up to bite us again when we were accessing our own services! Fortunately, with a little research, we were able to leverage an existing, known solution to solve our problems. With the open source community having the breadth and depth of knowledge that it does, it always pays to look around and see how others have solved the problems you run into and, if you’re lucky, you can leverage their solutions.
  3. HTTP isn’t really meant for chat. That 2 connection limit is a strong indication of that. It is an old recommendation, but even modern browsers have an 8 connection limit for good reason: HTTP is meant for doing request/response interactions, not 2-way communication. It will be nice, in 20 years, when all the browsers have support for HTML5 and WebSockets so we can do real 2-way communication. That being said, I’m happy with what we’ve managed to achieve here.

Now that it’s all over…

Phew! We can finally step back and enjoy the fruits of our labor! Those were some major technical hurdles that we had to learn about and overcome in a short amount of time. It’s hard to believe all that took place in the short span of a few months. Everyone at Atlassian and Contegix really pulled together nicely to make it all happen. And, again, special thanks to Jean-Francois Arcand from the Atmosphere project for rapidly fixing bugs that I reported!

But you know all that I just said about stepping back and enjoying the fruits of our labor? Yeah, you can pretty much forget about the whole stepping back thing. We installed this on our Studio instance so we could dogfood it and already have plenty of ideas for improvements. Among those are:

  • More quick add links – We’ve got a quick link so you can insert, into your chat, a link to the current page you’re on with a single click. On the “Upcoming Events” Google Calendar tab there is a “Quick add” button. But we can also add quick add links for issues, wiki pages, and other convenience functions.
  • Auto IM Translation – This is a really cool feature implemented by one of our developers during our last ShipIt that will automatically translate chats that you receive into the language you choose.
  • Streaming application updates – Right now the application feeds are fetched from the browser when they are needed and they are only fetched once to avoid sending 7-8 spurious HTTP requests. It would be much cooler if the ActivityBar webapp made these requests on behalf of the browser and, using the communications link already established, sent updates as they were found. Then you could sit on your Jira dashboard and have continuous feedback on everything going on in your project.
  • Installable on your own servers? – One of our goals in all of our integration work with Studio has been to make as much of what we do as possible benefit our behind-the-firewall customers as well. There is a little bit of work in the ActivityBar webapp itself that still needs to be done to make this possible; we’re currently hard-wired to connect to the GTalk servers, for instance. Most of the effort to make this work in behind-the-firewall settings will need to be done in the apps themselves. We had to customize (read: hack) each application to get the ActivityBar to show up on every single page across every single app, something that isn’t possible with the current plugin system in our applications. Something that comes close is Web Resource Contexts in Confluence, but there are a few additional bits of information that are required that can’t be provided purely as web-resources. But stay tuned, ’cause if we do that you just know it will be pluggable like all Atlassian’s other apps and you’ll be able to add your own application tabs!

This is part two of a two-part blog post. Part I
