20 July 2011

What would the killer documentation tool do?

I was inspired to write this in response to a passing comment in the current issue of Southern Communicator where Emily Cotlier mentioned that some delegates at the Writers UA Conference bemoaned the absence of open source Help Authoring Tools (HATs).

Why aren’t there any solutions that support the end-to-end content life cycle without the need to roll out proprietary software on every desktop? This issue has puzzled me for over a decade too. But going further, I’d like to see an open source solution for the real-world needs of business and technical documentation, not just the contrived "tech-writer world" of HATs. I’ve written in the ASTC LinkedIn group what I think of HATs as a solution to real-world information management: lame.

Here's what I think the world really needs: systems that enable technical and business folks to easily create content in templates designed by professionals, using a familiar interface, and within a framework that supports re-use without manual slicing and dicing. And it has to track unlimited versions (with review comments) at the paragraph level.

Commercial products like Author-it, RoboHelp and Flare can go part way towards this objective, assuming that a company is willing to splash out for the requisite proprietary software on every desktop, then re-train every employee to use it. Meanwhile, back in the real world, folks go about their business creating content with word processors, just as they have always done and will do for the foreseeable future. This actually creates a discontinuity when such content has to be exchanged with single-source products. Sure, you can import word processor files, but what happens when the various authors and reviewers make changes to it? You have to import it again, and to the best of my knowledge there is no single-sourcing product that can merge the new version with the existing version, nor retain review comments.

Yet this is how people have always processed corporate and technical documents. I can think of two reasons for this misfit of problem and solution: (a) it is not in the vendors’ commercial interest to support it, because they want you to buy their tool for every desktop, including those of non-professional writers; (b) they don’t have sufficient process understanding to recognise the need for it. It does seem that HATs and other forms of content management mostly get designed by folks who have at best an immature understanding of information lifecycles.

I made such a solution in Lotus Notes back in the 90s. It kept all versions of all topics, along with review comments. From a practical point of view, the Notes word processor stinks (think Notepad). And it has not improved since that time, either. As soon as you decide to use a real word processor, the ability to retain multiple versions and review comments is lost. But within that Notes-only project, it was a viable solution. You’d think a more generalised version would be feasible now, using an open source platform.

If only there were open source HATs, perhaps people who truly understand the processes could get involved and make them fit the real world. Actually they would not be HATs because “help” would be an optional form of content in them, along with web site content, learning material (including tests), presentations, artwork, social media feeds, and so on.

Let’s face it, current HATs are online help authoring tools that have had a few other “easy” formats bolted on. Even the new products have just brushed up the stale metaphor.

Many organisations have gone down the path of wikis. In reality they are a weak solution to the problem and I think the main drivers are (a) there’s not much else on offer (b) they are open source (c) they are chosen by developers, who think documentation is pretty much the same thing as source code.

This last point is probably a major influence on the design of wikis, too. Mature markup languages already existed before wikis were ever invented. OK, the designers didn’t like the idea of embedding complex tags, but unfortunately they also lost the subtlety and versatility provided by HTML and XML and didn't replace it with anything else. In the process they have locked wikis out of the types of content and functionality that I’ve mentioned previously.

Question: why does the tagging have to be embedded in the text at all? The text can easily be stored in a database, thus removing the need for it to contain any tags.
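A minimal sketch of that idea, assuming a simple two-table layout: one table holds the plain text of each paragraph, and a second holds the tagging as character ranges that point into it. The table names, column names, and the `render_html` helper are all hypothetical, and the sketch only handles spans that do not overlap.

```python
import sqlite3
from html import escape


def make_store():
    """Create an in-memory store: plain text in one table, tagging in another."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE paragraph (id TEXT PRIMARY KEY, body TEXT NOT NULL);
        CREATE TABLE span (paragraph_id TEXT, start INTEGER, stop INTEGER, tag TEXT);
    """)
    return db


def render_html(db, paragraph_id):
    """Combine the stored text with its spans to produce one HTML paragraph."""
    (body,) = db.execute(
        "SELECT body FROM paragraph WHERE id = ?", (paragraph_id,)
    ).fetchone()
    spans = db.execute(
        "SELECT start, stop, tag FROM span WHERE paragraph_id = ? ORDER BY start",
        (paragraph_id,),
    ).fetchall()
    out, pos = [], 0
    for start, stop, tag in spans:
        out.append(escape(body[pos:start]))                      # untagged text
        out.append(f"<{tag}>{escape(body[start:stop])}</{tag}>")  # tagged range
        pos = stop
    out.append(escape(body[pos:]))
    return "<p>" + "".join(out) + "</p>"


db = make_store()
db.execute("INSERT INTO paragraph VALUES (?, ?)", ("p1", "Press Save to keep changes."))
db.execute("INSERT INTO span VALUES (?, ?, ?, ?)", ("p1", 6, 10, "b"))  # the word "Save"
html = render_html(db, "p1")
stored = db.execute("SELECT body FROM paragraph WHERE id = 'p1'").fetchone()[0]
```

The stored text never contains a tag. The markup only appears when a particular output format is generated, so the same paragraph could be rendered to other formats by a different function.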

Evangelists sometimes tell me that XML is the answer because XML can do anything. My impression is that XML can’t do anything by itself and the environments in which it is managed seem to be absurdly complicated for making the same output as a word processor. I once had to make some simple changes to a DocBook template – things like headers and footers, title page, table of contents, styles and so on. It became a forensic investigation, trying to figure out where each of these things was actually defined. Mostly I ended up modifying XSLT files, which is not my idea of a productive design environment.

What I found lacking from the XML world was pretty much everything that I expect to be available in a document design and management system. XML was chosen by the developers because it was a good fit with their programming source control.

The path of least resistance is document management. Systems that suck all manner of content, including word processor documents, into a repository. Usually they provide workflow, collaboration and distribution, all of which are important. SharePoint is popular for smaller organisations, probably because of ease of installation and maintenance (disclosure: I have SharePoint 2003). SharePoint actually has a nice feature that seems to be lacking in open source solutions: it has limited awareness of document content. Unfortunately this is restricted to Word’s document properties (in my version, anyway).

Imagine the possibilities if the document management system was aware of all of the document’s content. I don’t just mean a dopey full text search: I’m thinking about a system that is aware of the structure and tagging of all the content. Suddenly you have the possibility of content re-use without destroying the source document. Even better, the source document is something that anybody in the organisation can create and edit, using the software they are familiar with.

The key to this is the ability to “round trip” content. That is, import the collection of paragraphs representing a document, do what you want with them in the content management system (including re-use to different media and document types), then reproduce the original document and let the reviewers do their thing again. When second and subsequent versions get imported, each paragraph needs to be tagged as a version of its predecessor. This is where products like RoboHelp and Author-it lose the plot. You just can’t do that, because content management does not communicate with third-party content creation, so you don’t know if a paragraph is entirely new, or just a new version. Flare gets around this by providing a "lite" version of its editor that you can install on every desktop, but this is just a variation of the HAT vendor trying to colonise every desktop.
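The round trip can be sketched in a few lines, assuming each paragraph carries an ID out of the repository and brings it back on re-import. The `Repository` class and its method names are hypothetical and illustrate the bookkeeping only, not any existing product.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Paragraph:
    pid: str
    versions: list = field(default_factory=list)  # oldest first
    comments: list = field(default_factory=list)  # (version number, note)


class Repository:
    def __init__(self):
        self.paragraphs = {}

    def import_document(self, incoming):
        """incoming: list of (pid or None, text). Returns pids in document order."""
        order = []
        for pid, text in incoming:
            if pid is None or pid not in self.paragraphs:
                pid = pid or str(uuid.uuid4())  # entirely new paragraph
                self.paragraphs[pid] = Paragraph(pid)
            para = self.paragraphs[pid]
            if not para.versions or para.versions[-1] != text:
                para.versions.append(text)      # new version of a known paragraph
            order.append(pid)
        return order

    def comment(self, pid, note):
        """Attach a review comment to the current version of a paragraph."""
        para = self.paragraphs[pid]
        para.comments.append((len(para.versions), note))

    def export_document(self, order):
        """Reproduce the document, with IDs, from the latest versions."""
        return [(pid, self.paragraphs[pid].versions[-1]) for pid in order]


repo = Repository()
first = repo.import_document([(None, "Install the widget."), (None, "Restart the server.")])
repo.comment(first[1], "Is one restart enough?")

# A reviewer edits the second paragraph and appends a new one.
draft = repo.export_document(first)
revised = [draft[0], (draft[1][0], "Restart the server twice."), (None, "Check the log.")]
second = repo.import_document(revised)
```

Because the second paragraph comes back with its ID, the repository records it as version 2 of the same paragraph and keeps the comment made against version 1. The third paragraph arrives without an ID, so it is treated as new.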

Tagging of paragraphs is the key. Each one needs a globally unique ID that stays with it for life, and is traceable. This means the word processor has to allow “foreign” content in its content. Perhaps surprisingly, Word supports this but OpenOffice does not. You can easily test this by writing some custom content directly into the respective XML content files, then opening and saving. Word keeps the custom content, but OpenOffice does not.
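That open-and-save test could be automated along these lines. This is a rough sketch under stated assumptions: the custom content is taken to be a namespaced ID attribute on paragraph elements (the post does not say how it was encoded), and the namespace URI and the `count_custom_ids` helper are made up for illustration. Both .docx and OpenDocument files are zip packages, with the body in `word/document.xml` and `content.xml` respectively.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CUSTOM_NS = "urn:example:paragraph-id"  # hypothetical namespace for the custom IDs


def count_custom_ids(package, member, attr_ns=CUSTOM_NS, attr_name="id"):
    """Count elements in one XML part of a zip package that carry the custom ID."""
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(member))
    key = f"{{{attr_ns}}}{attr_name}"  # ElementTree spells namespaced names as {uri}name
    return sum(1 for el in root.iter() if key in el.attrib)


# Toy package standing in for a real file; the XML here is NOT valid WordprocessingML.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "word/document.xml",
        f'<doc xmlns:x="{CUSTOM_NS}">'
        '<p x:id="a1">One</p><p x:id="a2">Two</p><p>Three</p></doc>',
    )
buf.seek(0)
found = count_custom_ids(buf, "word/document.xml")
```

Run against a real file before and after it has been opened and saved in the word processor, a drop in the count would show that the custom content was discarded.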

I keep asking vendors if their tool respects custom content and get the same answer: "it validates against a DTD". In other words "no".

Will we ever see a “perfect” content management system that does it all? Programmers have given it their best shot and fallen a long way short of the mark. I fear it won’t happen until some savvy big-picture technical communicators get involved early in the design process. Which is to say, probably never.