Managing content in Confluence: Taxonomy

This article is part of a series that focuses on the difficulty of managing content in Confluence and offers improvement suggestions through constructive criticism. In this installment I will address one of the biggest problems in Confluence: utilizing labels efficiently to organize information.

When you manage content, you need some sort of plan. Wikis are not inherently optimized for (or even meant for) structured, hierarchical information, but why should they not support the creation of a hierarchy? Why not even several different hierarchies, if they are needed?

Confluence does have the ability to arrange pages in a hierarchical fashion. With plugins, it is even possible to arrange spaces in logical hierarchies. This facilitates the grouping of information together, but pages can only have one parent and one space. They cannot belong to several parents or spaces, so how do we group pages dynamically?

Labels: Yes :-)

Enter labels. Confluence features labels as a way to tag pages and spaces with keywords which are then used to dynamically group data. What is the problem then? It is that labels themselves cannot be organized. Not even the simplest label refactoring tools come bundled with Confluence.

Label cloud in Confluence. What’s the use of these things anyway?

Here’s what’s possible. You can:

  1. Label pages and blogs individually
  2. List pages by label
  3. Show label clouds for a specific space or globally
  4. List all the labels in use in a specific space or globally
  5. Show the related labels for the current page

Out of all these features, only features 1 and 2 are useful. There are a few plugins to assist in this endeavor, some of which are maintained and others not. These are:

  • The Label Management Plugin, only developed for a short while and then forgotten. We have contributed to the plugin community by keeping this one up to date and working with the latest versions of Confluence. But it is highly limited in functionality and does not integrate well at all with the rest of Confluence. It is difficult to use and cannot be customized except by editing the source directly – it has no control panels or settings of any sort.
  • The Adaptavist Label Tools Plugin, which merely features macros to add specific labels to pages simply by selecting them from a list (which cannot be centrally managed) or automatically adding them (through some sort of workflow for instance).
  • The Adaptavist Synonym Plugin, which enables the creation of synonyms, although the functionality here is again rather limited. It merely enables searching of pages by a label and all its synonyms, such as printer and printers and pritners if a word is often misspelled. But what about related labels such as inkjet-printers and laser-printers (child labels) or peripherals (parent label)?

Honestly none of these brings us very far in actual label management. What we need is a comprehensive way to view and reorganize labels both globally and on a per-space basis.

Management: No :-(

The Label Management Plugin. Ugly as hell, hasn’t been updated since it was created in 2006. Can’t be themed either, since all styles are hard-coded into the plugin source…

Here’s a list of things that Confluence – even with all its plugin arsenal – cannot do:

  1. Rename a label.
  2. Combine or merge two or more labels into one.
  3. Define relationships between labels. This renders the following features impossible:
    1. Synonyms (with the option to maintain consistent labeling automatically, meaning incorrect labels are substituted by correct ones)
    2. Label categories, families or groups (the label printers contains child labels laser-printers, inkjet-printers, matrix-printers and is a child of the label peripherals)
    3. Searching or filtering content by label category
  4. Describe a label in detail
  5. Maintain label inheritance.
  6. Incidentally, labeling attachments is also impossible, which makes attachments a whole lot harder to find sometimes. Update: Atlassian has implemented attachment labels in Confluence 4.2.

Let me break these points down further to explain why these features are necessary.

Renaming labels is pretty basic. Without it, if a definition, project name, team name (or any other aspect of content that can be assigned to a page using a label) changes, the label cannot simply be given a new name. Instead, each page that has the label must be relabeled manually, one page at a time: remove the old label, add the new one. Can it be that hard to change one field in one table in the database? For the value it would bring to keeping labels current and accurate, it is almost criminal that it hasn’t been implemented yet.
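To illustrate why a rename ought to be cheap, here is a minimal sketch in Python using an in-memory SQLite database. The schema (the tables `label` and `page_label`), the function names and the sample labels are all made up for this example; this is not Confluence's actual schema or API. It only shows that when pages reference a label by id, changing one name field renames the label everywhere.

```python
import sqlite3

# Hypothetical, simplified schema: NOT Confluence's real tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    CREATE TABLE page_label (
        page TEXT NOT NULL,
        label_id INTEGER NOT NULL REFERENCES label(id)
    );
    INSERT INTO label (id, name) VALUES (1, 'project-apollo');
    INSERT INTO page_label VALUES ('Roadmap', 1), ('Meeting notes', 1);
""")


def rename_label(conn, old_name, new_name):
    """Rename a label in place and return the number of label rows changed."""
    cur = conn.execute(
        "UPDATE label SET name = ? WHERE name = ?", (new_name, old_name)
    )
    conn.commit()
    return cur.rowcount


def pages_with_label(conn, name):
    """Return the titles of all pages carrying the given label, sorted."""
    rows = conn.execute(
        "SELECT pl.page FROM page_label pl "
        "JOIN label l ON l.id = pl.label_id "
        "WHERE l.name = ? ORDER BY pl.page",
        (name,),
    )
    return [row[0] for row in rows]


# One update, and both pages now carry the new name.
renamed = rename_label(conn, "project-apollo", "project-artemis")
```

No page is touched individually: the pages keep pointing at the same label id, and only the label's name changes.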

Merging labels is also pretty basic, although it does involve a more complicated operation under the hood. But it’s still by no means nuclear physics – merely a few more queries in the database to replace many labels with one. Let’s say I notice that half of my users are labeling knowledge base articles related to Windows 7 with win7 and the other half with windows7. Obviously these two are the same label, so why have both? It’s unnecessary clutter in my label taxonomy and makes finding and grouping content more difficult. So I want to merge the two – just replace the longer windows7 with the already existing win7.

Once again I have to find every page with the longer label and remove and add labels one by one until I have replaced every instance of windows7. This can be scripted using the Confluence Command Line Interface, but that usually is not an option for a content manager – most content managers are simply not that technical in nature and do not possess the required skills. It should be doable through the GUI. Type or find the label to replace, then type or find the label to replace it with. Then again perhaps I want to ensure that win7 stays as the definitive Windows 7 label. That brings me to the next point – Synonyms.
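The merge itself is not complicated. Below is a rough sketch in Python that models pages as a plain dictionary of title to label set. The data structure, the function name `merge_labels` and the page titles are assumptions for illustration only; nothing here talks to a real Confluence instance.

```python
def merge_labels(page_labels, old, new):
    """Replace `old` with `new` on every page; return the titles that changed.

    `page_labels` maps a page title to a set of label names (a hypothetical
    in-memory stand-in for the real content store).
    """
    changed = []
    for title, labels in page_labels.items():
        if old in labels:
            labels.discard(old)
            labels.add(new)  # a set, so a page that had both keeps one copy
            changed.append(title)
    return changed


kb_pages = {
    "Installing drivers": {"windows7", "howto"},
    "Known issues": {"win7", "windows7"},
    "Printer setup": {"win7"},
}

changed = merge_labels(kb_pages, "windows7", "win7")
```

Because labels are stored as sets, a page that already carried both `win7` and `windows7` simply ends up with `win7` once, which is the behavior I would expect from a merge tool in the GUI as well.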

Synonyms are needed when several different terms with the same meaning need to be defined so a user can use the correct term – or, better yet, have the system automatically convert the synonym to the defined root term. It is also possible to mistype a label name, but thanks to Confluence’s label autocompletion, that doesn’t happen too often. Still, it is good if those situations, too, can be anticipated and dealt with in advance.
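Automatic conversion to a root term could be as simple as a lookup table. The following Python sketch assumes a centrally managed synonym map (the `SYNONYMS` dictionary and both function names are hypothetical) and rewrites whatever the user typed into the agreed root label.

```python
# Hypothetical synonym table: each known variant maps to its root term.
SYNONYMS = {
    "windows7": "win7",
    "printers": "printer",
    "pritners": "printer",  # a frequent misspelling, anticipated in advance
}


def canonical(label, synonyms=SYNONYMS):
    """Return the root term for a label, or the cleaned label if none is defined."""
    key = label.strip().lower()
    return synonyms.get(key, key)


def normalize(labels, synonyms=SYNONYMS):
    """Convert every label to its root term and drop the resulting duplicates."""
    return sorted({canonical(label, synonyms) for label in labels})
```

For example, `normalize(["Printers", "pritners", "win7", "windows7"])` collapses the four labels into the two root terms `printer` and `win7`.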

Label groups are necessary to make semantic categories for labels. It is paramount in designing a space to define certain terms that the space topic revolves around. But how are those terms related to each other? A page hierarchy provides only a single way to organize information, but oftentimes multiple relations need to be drawn between content.

The Confluence Page Tree makes navigating content hierarchically possible. Useful, eh?

Both of the above features could be utilized in searching and filtering content by label. How can I look for content in a group of labels? What about a label group and all its subgroups? Could I browse content by label, by label group, perhaps navigate a label tree reminiscent of the page tree?
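To make the idea of a label group and all its subgroups concrete, here is a small sketch in Python. It assumes each label has at most one parent, stored in a hypothetical `PARENT` mapping built from the printer example earlier in this article; the function names and page titles are likewise invented for illustration.

```python
# Hypothetical taxonomy: child label -> parent label.
PARENT = {
    "printers": "peripherals",
    "laser-printers": "printers",
    "inkjet-printers": "printers",
    "matrix-printers": "printers",
}


def descendants(label, parent=PARENT):
    """Return `label` together with every label below it in the hierarchy."""
    found = {label}
    grew = True
    while grew:  # keep sweeping until no new child labels turn up
        grew = False
        for child, par in parent.items():
            if par in found and child not in found:
                found.add(child)
                grew = True
    return found


def pages_in_group(page_labels, group, parent=PARENT):
    """Return sorted titles of pages carrying the group label or any sub-label."""
    wanted = descendants(group, parent)
    return sorted(
        title for title, labels in page_labels.items() if labels & wanted
    )


hw_pages = {
    "Toner guide": {"laser-printers"},
    "Scanner FAQ": {"peripherals"},
    "VPN setup": {"network"},
}
```

With this data, `pages_in_group(hw_pages, "printers")` finds only the toner guide, while `pages_in_group(hw_pages, "peripherals")` also picks up the scanner FAQ. Supporting several parents per label would need a different mapping, but the search would work the same way.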

Label descriptions would simply give more context to each label. For which purpose was the label created? What did the label creator have in mind when the label was created? Can it be used in multiple contexts? This metadata is important so the label always has a purpose for existing.

Label inheritance is useful because labels are used to build RSS feeds and recently-updated feeds of content in a group of pages. It reduces space clutter because not every topic needs its own space just so it can be tracked easily. Sure, for this purpose there is the Descendant Notifications plugin, but this would still be a nice addition to the label management tool suite. The Label Management Plugin does let child pages inherit labels, but it does not maintain them: it only slaps the labels on upon page creation. If a page already exists and is moved, its labels are not updated accordingly. This unfortunately greatly diminishes the value of the plugin because you cannot rely on the inheritance rules to work in every situation.
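One way to keep inheritance correct after a move is to not copy labels onto child pages at all, and instead compute the inherited labels from the page's current ancestors whenever they are needed. The Python sketch below shows that idea with a hypothetical `Page` class; it is my own illustration of the approach, not how the Label Management Plugin or Confluence works.

```python
class Page:
    """Hypothetical page model: a title, its own labels and an optional parent."""

    def __init__(self, title, own_labels=(), parent=None):
        self.title = title
        self.own_labels = set(own_labels)
        self.parent = parent

    def effective_labels(self):
        """Own labels plus the labels of every ancestor, computed on demand."""
        labels = set(self.own_labels)
        node = self.parent
        while node is not None:
            labels |= node.own_labels
            node = node.parent
        return labels


hardware = Page("Hardware", {"hardware"})
software = Page("Software", {"software"})
guide = Page("Setup guide", {"howto"}, parent=hardware)

before = guide.effective_labels()  # inherits from "Hardware"
guide.parent = software            # move the page under another parent
after = guide.effective_labels()   # now inherits from "Software" instead
```

Because nothing is stamped onto the page at creation time, there is nothing to clean up when it moves. The trade-off is that every lookup walks up the tree, which a real implementation would probably want to cache.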

Finally, attachment labeling is needed simply so that attachments are easier to find. Office attachments, for instance, have metadata which could be used to automatically label attachments upon uploading them. But that information is not used. Why? Update: Atlassian has implemented attachment labels in Confluence 4.2.

How to get labels under control

The answer to my questions above is rather straightforward: Expand the current labeling system and create some tools to manage labels. Enable parent-child relations, synonyms and inheritance. Create tools for viewing, browsing, replacing and merging labels by space and globally. Sounds simple enough – but I know it is never simple to make architectural changes to a mature system that is already in use by almost 11 000 organizations worldwide. Still – the implications of having or not having taxonomy management are major. The bigger the database is, the more important these tools become.

I will discuss one solution to a feature I mentioned above in more detail, namely the label grouping function.

Apart from the uses I already described (viewing, arranging, searching and filtering content), I see the groups being used for labeling new content. Often the user does not fully understand the context in which she is creating the new data. I have already suggested one way for Confluence to suggest related content upon creating new content, but this is another opportunity to take advantage of space taxonomies.

When creating a new page or even editing an existing one, the user has the option to add labels using the label editor. Confluence suggests existing labels that you have used recently and the most popular labels in the space.

The current Confluence label editor suggests recent and popular labels, but does not enable browsing them in semantic groups or hierarchies

A primitive UI mockup to illustrate how labels could be grouped semantically. As you can see, drawing is not my strongest suit.

But what if the label editor let the user:

  • Search existing labels to choose from (this is currently approximated with the label autocompletion feature)
  • Browse the space taxonomy or view a hierarchical label map to choose labels from (simply click on the labels to add them)
  • Optionally browse the global taxonomy or view a hierarchical global label map to choose labels from

These could be shown in a drop-down dialog (that pushes content on the screen down) or pop-up dialog (similar to the Restrictions, Linking, Image browser and other pop-up dialogs) that is closed after labels are entered. This way the user would have a better idea which context(s) she is working in and which labels she can choose from. If no suitable labels exist, new ones can always be added to the pool (and perhaps merged or reorganized later).

What’s the use?

The ability to define parent and child labels enables creating a space taxonomy and even a global taxonomy. By looking at the space taxonomy, you will know what kind of content exists in the space, and can often derive new relations between the content in the space. When a refactoring is necessary, say when a team is split up or two teams are joined together, they can combine or split their taxonomies to better suit the new working paradigm.

  • Users have another way of locating relevant content using global and space taxonomies. They are also better informed about the information structure of existing content when creating and modifying content.
  • Space Admins can employ taxonomy to gain deeper insight into the content in their spaces. They are then better equipped to refactor content when necessary.
  • The information architect, content strategist or system administrator can draw new parallels between content when more complex relations can be defined and managed. This aids content strategy as a whole, as content owners stay better aware of the content in their domain.

In short, organizing content dynamically in times of transition becomes much easier with proper tools to analyze, regroup, rename, merge, refactor and then re-analyze your content. I propose that no single improvement would facilitate content management more than this suite of tools collectively called taxonomy management.