(The pain of) managing content in Confluence

Creating content in Confluence is easy and fun. It’s not perfect – in fact even content creation in Confluence still has a long way to go, but compared to the competition the tools are excellent and the important thing is: it just works. People do create content, in great quantity at great speed. The area where Confluence needs to develop is in the mature phase of wikis – content management.

This article series is an attempt to give Atlassian some feedback about Confluence’s (lack of) content management tools. It is also meant to serve as a discussion platform where Confluence users around the world can swap ideas and evolve the features discussed in the articles. Hopefully it will spur discussion within Atlassian as well and help them focus on what’s important for more mature instances. How do you make the most of your content as the wiki matures?

Each article will focus on one or two topics. This first one introduces the subject and discusses content refactoring.

Don’t get me wrong, I love Confluence

Atlassian Confluence is an awesome product. First of all, it’s a wiki. And wikis are the best invention since the Internet. Seriously. They enable collaboration independent of space and time, and even work pretty well for almost-real-time collaboration. Confluence is one of the most popular enterprise wikis, i.e. wikis that are specifically designed to fit into corporate use. It integrates well into other business applications and systems such as Microsoft Active Directory and Microsoft Office. In addition it has flexible rights management, enables hierarchical organization of pages and has a rich text editor that makes creating content fast and easy. That last bit is probably Confluence’s top selling argument, but it also works against Confluence in the long run.

You see, everyone loves creating content with Confluence. It’s just that no one loves managing that content. There’s a lot of praise for Confluence already, so I will only briefly reiterate some of the features that make Confluence wonderful:

  • Content creation (Atlassian’s blog is full of praise by guest posters)
  • A flexible plugin framework with hundreds of great plugins, both free and commercial (some of which are featured in this article)
  • Ease of maintenance (Confluence is well documented and the architecture is rather simple)
  • Atlassian has, as they say, legendary support so solving problems is quick and (mostly) painless… as long as the solution is not a feature request (in which case solving it may not be possible in the near future or at all).

Some context

I’ll start by giving a bit of background so you know where these opinions are coming from. I’m a Confluence administrator in an IT service company of 160 employees, situated in Helsinki, Finland. About 100 of those are Confluence users. We use Confluence to document our clients’ infrastructure, processes, key personnel as well as our own systems and processes. We have client, team, process and project spaces; technology-specific KB spaces; and some miscellaneous spaces, about a hundred global spaces in total.

We’ve been using Confluence for two years now, and after working for a year in on-site support I took responsibility for maintaining and developing the wiki. You can read more in an earlier post that describes my role in the company in more detail. Having solved tickets and worked with clients first-hand, I knew exactly why we needed a documentation system: we had none prior to Confluence. All our documentation was in Office documents scattered across a network drive. No search engine indexed the contents of those documents, and we had no tagging to rely on. Documentation was a mess.

Adoption was not easy, but I had faith in the idea of the wiki and in Confluence, so I persevered, and a year after the project began we already had thousands of pages of documentation and dozens of daily users. After two years the wiki is still growing at a steady pace, but even though we started working on content management tools and strategies almost a year ago, we are now starting to run into problems with the wealth of content in the wiki.

Why I am writing this

Here’s the problem: we have a lot of content. And a lot of new content is generated on a daily basis. Adding new content is almost too easy, thanks in part to the Doc Import function.

Here’s the skinny: 6000 pages are edited at a rate of 2500 edits per month by our 120 users. More detailed figures follow.

Users

We have 120 users in total, of whom around 90 are daily users. The daily user breakdown is as follows:

  • 50 light users (1-4 pages edited each month)
  • 20 moderate users (5-9 pages edited each month)
  • 20 heavy users (10+ pages edited each month)

Content

Our instance houses 12 000 distinct content objects and counting. Content breakdown by type:

  • 6500 pages (current versions)
  • 2100 data attachments (so excluding image attachments)
  • 1100 blog posts

Growth

Every month, on average, 700 new content objects are created. Breakdown by type:

  • +300 new pages
  • +100 new comments
  • +100 new blog posts
  • +200 data attachments

Managing all this content in Confluence is by no means easy, and I administer our wiki full-time. The problem will grow worse with time unless something is done about the basic management tools.

I tried to find constructive criticism about these issues with Confluence on the web, but my search efforts were mostly in vain. There is discussion, but it’s not very structured. There are first impressions and tons of advice on getting started and generating participation, but I am more interested in more mature instances where content overabundance has started to become a problem. There is, of course, a plethora of issues logged at the Atlassian issue tracker but they, too, are unorganized and drawing any conclusions from the abundance of (currently over 6000 unresolved) issues would require countless hours of research and analysis.

So on to the current issues. We have been able to mitigate or work around some problems by implementing some sort of tool or process ourselves. Others have been addressed using a plugin, but the solution still leaves room for improvement. In this first article of the series, I’ll focus on content refactoring. Each topic will be introduced in a problem-solution-benefits-implementation format.

Problem 1: Content refactoring

This is certainly the area where Confluence has the most room for improvement. There are certain limitations that make content refactoring at times very burdensome – virtually impossible for an admin untrained in SQL (as some operations entail making direct changes to the database).

An organization changes as it grows. Products and processes change, organizational structure changes, projects are handed over to new people. In many of these situations, content needs to be refactored so that it remains useful in the new context. There are two refactoring-specific actions that are impossible as of Confluence 3.5 and earlier: moving blog posts and renaming space keys. These limitations have the following broad-ranging implications:

  1. Two spaces cannot be merged, keeping the blog posts in both spaces.
  2. A space cannot be split in two, moving part of the blog posts into the new space.
  3. Renaming a space along with the spacekey is not supported, possibly causing URLs to become inconsistent with the space name and role.
  4. A blog post cannot be moved to the correct space after accidentally posting it in the wrong space.
  5. A wiki-wide refactoring involving renaming spaces according to new naming conventions is not possible on spacekey/URL level, again causing inconsistencies between space names and URLs.

There is a workaround for the spacekey problem, but it is by no means simple or error-resistant. It involves exporting the space into XML, performing a few search & replace operations and importing the XML back into Confluence. After that the original space needs to be deleted. All links pointing to the old space have to be fixed by hand (or using the unsupported global search and replace macro, which seems to be incompatible with Confluence 3.2 or newer). Of course this presents a new problem… Since usage statistics are based on spacekeys and not space IDs, usage statistics will effectively reset for that space.
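The link-fixing step of this workaround is mostly mechanical, so it can be sketched in a few lines. The snippet below is only an illustration, not a supported tool: it assumes the links are written in Confluence 3.x wiki markup as [SPACEKEY:Page Title] or [alias|SPACEKEY:Page Title], the function name rewrite_space_links is made up for this example, and it covers neither the XML export format nor every link syntax.

```python
import re


def rewrite_space_links(markup: str, old_key: str, new_key: str) -> str:
    """Point wiki-markup links at new_key instead of old_key.

    Assumption: links look like [OLDKEY:Page] or [alias|OLDKEY:Page].
    """
    # Match "[" plus an optional "alias|" part, followed by "OLDKEY:".
    pattern = re.compile(r"(\[(?:[^\]|]*\|)?)" + re.escape(old_key) + ":")
    # Keep the bracket/alias prefix and swap only the space key.
    return pattern.sub(lambda m: m.group(1) + new_key + ":", markup)


if __name__ == "__main__":
    page = "See [OLD:Home] and [the docs|OLD:Docs#intro], not [OTHER:Home]."
    print(rewrite_space_links(page, "OLD", "NEW"))
```

Links to other spaces are left alone because the key has to follow the opening bracket (or the alias separator) directly. A real migration would still need a manual review of the result.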

In addition to these severe caveats there are a couple of minor issues that are more an annoyance since they can be worked around, but a more robust and user-friendly solution would be preferable:

  1. Space templates can be created manually using the Copy Space Plugin, but cannot be deployed flexibly (for instance using only certain parts of the page structure and leaving others out).
  2. Similarly, parts of a space template (such as one branch of a space’s pagetree) cannot be copied without using a scripting plugin, which prevents fast and flexible user-generated content from templates.

Possible solutions

  1. Enable moving blog posts. Can this really be so difficult? Does this entail anything else than changing the space ID that the post is linked to in the database? (Discussion)
  2. Enable merging spaces. This would simply mean moving all content from one space to another, updating all links in the process.
  3. Enable renaming spacekeys live. No export + replace + import hassle. Change space structure so that spacekeys behave similarly to page titles – renaming a spacekey automatically updates links to said space. Which aspects of the architecture specifically prevent implementing this today? (Discussion)
  4. Enable copying pages with descendants. This is needed often for certain page structure templates. (Discussion)
  5. Enable copying a space in part by choosing pagetree branches using checkboxes (similar to Space Export).
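To make the last two proposals concrete, here is a minimal sketch of what "copy with descendants" and "copy only the selected branches" mean on a page tree. It is an in-memory model only, not Confluence code: the Page class and both function names are hypothetical, and attachments, comments and labels are left out.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Page:
    """Hypothetical stand-in for a wiki page: a title plus child pages."""
    title: str
    children: List["Page"] = field(default_factory=list)


def copy_with_descendants(page: Page) -> Page:
    """Return an independent copy of a page and its whole subtree."""
    return Page(page.title, [copy_with_descendants(c) for c in page.children])


def copy_selected(page: Page, selected: Set[str]) -> List[Page]:
    """Copy every branch whose root title was ticked in `selected`.

    A selected page is copied with all its descendants. An unselected
    page is skipped, but its children are still searched.
    """
    if page.title in selected:
        return [copy_with_descendants(page)]
    copies: List[Page] = []
    for child in page.children:
        copies.extend(copy_selected(child, selected))
    return copies


if __name__ == "__main__":
    templates = Page("Templates", [
        Page("Project template", [Page("Plan"), Page("Risks")]),
        Page("Process template", [Page("Steps")]),
    ])
    for branch in copy_selected(templates, {"Project template"}):
        print(branch.title, [child.title for child in branch.children])
```

The set of selected titles plays the role of the checkboxes: one template space can hold several page structures, and only the ticked branch ends up in the new space.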


The benefits are rather case-specific, but on a general level you would gain the following:

  1. Spaces can be centrally managed, including merging spaces and using one or more space templates for creating new spaces.
  2. Space naming conventions can be managed. Spacekey naming conventions can be implemented after the test / pilot / adoption period and revised from time to time.
  3. Blog posts become as flexible to manage as pages. They can be moved along with attachments and comments. Extremely useful when merging spaces.
  4. Space structure / pagetree templates can be managed. It is possible to create certain page structures for documentation, e.g. project templates, process templates, product templates, system templates… These templates can be deployed to any space instantly. Using the partial space copy several templates could be managed in one space, minimizing the overhead for space template management (one space for each template).

How it would work

The implementation for most of these is pretty straightforward, at least UI-wise.

  1. Moving blog posts should work like a simplified version of moving a page: choose a new space, and that’s it.
  2. Merging spaces naturally needs to address the page title problem. Two pages cannot have the same title in the same space, therefore some kind of conflict resolution for conflicting page titles needs to exist when merging two spaces. But I don’t see this as a major problem – make the user choose from two options:
    1. Manually resolve each conflict. This would ask the user about each conflict and enable renaming one of the pages to something else. Page titles could be compared before the question is asked so that the user can be told how many conflicts have to be resolved before making the choice.
    2. Automatically resolve conflicts by renaming conflicting page titles from the space that is being merged (i.e. imported). A report could and should be shown after the operation, listing the conflicting titles that were renamed.
  3. I assume the problem with renaming spacekeys live today stems from updating all links pointing to that space in real-time. Is this perhaps resource intensive? Does it in fact take so long for larger spaces that some pages may be edited during the operation, preventing some links from being updated?
    1. Perhaps locked pages could be queued somehow, so that if the operation finds a link to be updated and the page is locked for editing, it would simply schedule the update of that link for later, when the page isn’t locked anymore.
    2. If the operation is very resource intensive and might cause Confluence to be sluggish or unusable during the operation, perhaps it could be scheduled for off-production hours (or, in a 24/7 usage model, a low-usage period). In the worst case, if Confluence is indeed rendered useless during such an intensive operation, perhaps a maintenance mode screen could be presented during the operation.
  4. Copying pages with descendants is already possible using the Confluence Command Line Interface, so why not implement the exact same functionality via the GUI? The operation itself is simple (the equivalent CCLI commands follow):
    1. ./confluence -a copyPage --space OLDSPACE --title PAGETITLE --newSpace NEWSPACE --parent NEWPARENTPAGE --descendents --copyAttachments --copyComments --copyLabels (basic "copy with descendants" functionality; the user simply chooses the new space and parent page)
    2. Additional GUI elements to customize the operation would be: --children (only direct page children, not all descendants), --newtitle (choose a new title upon copying), --copyAttachments, --copyComments, --copyLabels (show checkboxes that the user can check or uncheck to include or exclude page attachments, comments and labels).
    3. As a bonus feature not covered by the CCLI, a parameter --prefix or --suffix that inserts a desired prefix/suffix before/after the title of each copied page. This would make generating those project/product etc. templates so much easier.
  5. Finally, copying a space in part would work similarly to the space export in that the user would choose the parts of the space pagetree that need to be copied into the new space. Additionally, the user could specify to include blog posts (would require implementation of #1 first).
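The automatic conflict resolution described for merging spaces is simple enough to sketch. The snippet below is a rough illustration working on page titles only, under my own assumptions: the function names, the " (merged)" suffix and the numbering scheme are invented for this example and say nothing about how Confluence would actually do it.

```python
from typing import Dict, List, Tuple


def find_conflicts(target: List[str], incoming: List[str]) -> List[str]:
    """Titles present in both spaces, so the user can be told the count up front."""
    return sorted(set(target) & set(incoming))


def merge_titles(
    target: List[str], incoming: List[str], suffix: str = " (merged)"
) -> Tuple[List[str], Dict[str, str]]:
    """Merge incoming titles into target, renaming any that collide.

    Returns the merged title list and a report mapping each renamed
    incoming title to its new title.
    """
    taken = set(target)
    merged = list(target)
    renamed: Dict[str, str] = {}
    for title in incoming:
        new_title = title
        if new_title in taken:
            # Try "Title (merged)", then "Title (merged) 2", and so on.
            new_title = title + suffix
            counter = 2
            while new_title in taken:
                new_title = f"{title}{suffix} {counter}"
                counter += 1
            renamed[title] = new_title
        taken.add(new_title)
        merged.append(new_title)
    return merged, renamed


if __name__ == "__main__":
    target_space = ["Home", "Servers"]
    merged_space = ["Home", "Printers"]
    print("Conflicts to resolve:", find_conflicts(target_space, merged_space))
    titles, report = merge_titles(target_space, merged_space)
    print("Merged titles:", titles)
    print("Renamed:", report)
```

The report dictionary is what the post-merge summary would be built from, and find_conflicts covers the up-front count that lets the user choose between manual and automatic resolution.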

There is another problem closely related to content refactoring, namely the lack of proper taxonomy in Confluence. There are labels, which are an excellent tool for organizing and using content dynamically in different contexts. But the problem is managing or organizing labels.

Taxonomy and label management, however, is a large enough subject to cover in a blog post of its own – I will tackle that in the next article.