Documenting It: 2012

December 30, 2012

Ode to Olinks

The documentation set I work on uses olinks for just about all of the cross references. Their flexibility and the fact that they are not resolved until publication time make them powerful and ideal for modular documents.
Like xrefs they can use the DocBook gentext system to build the text for a link and like link they allow you to provide your own link text. Unlike either of them, they do not use ids that must be resolvable at validation time. In fact an unresolvable olink is not a validation error because they can be used to link to documents outside the scope of the current document such as another book in a product library or an article. The resolution of the targets to actual endpoints is left up to the publication system.
This flexibility, naturally, is both friend and foe. On the friend side, it makes writing modular documents easier by eliminating the validation a errors that crop up when trying to create a link between two XML documents that will ultimately end up as part of the same publication. This is one of the main reasons we started using olinks instead of xrefs. All of the writers found the constant validation errors distracting. Naturally this is only a problem if you are writing DocBook XML documents with a real XML editor. Teams that use text editors with some syntax highlighting and auto tag closing features will not be distracted by the validation errors. Of course, they also won't know a document is invalid until it blows up in publication.
The other strength of using olinks is that a team can architect a library that is more that just a loose collection of related books. The books in a library can have working links between each other. One book can reference a topic that is discussed in another book and provide the user with a functional link directly to the required information. This is possible without olinks, and I have worked on libraries that attempted it. However, without olinks, or some similar mechanism, deep links between books in a library is a bear to maintain and most resource constrained teams will not succeed. The argument can also be made that deep links between books in a library is not valuable. Given the difficulty of maintaining them using "standard" methods, the argument is correct. However, using olinks lowers the cost to the point that not using them is letting your readers down.
On the foe side of the equation, using olinks does add some overhead to your publication system and adds some extra steps in setting up a library. If you are using a publication system based on the DocBook XSLT style sheets, the added overhead is fairly minimal. Most of the work is already included in the base style sheets. You simply need to turn it on and add a preprocessing run to the system. The preprocessing run will calculate the target databases needed to resolve the targets to real endpoints. We currently have our system set up so that an author can skip this step when doing builds of a single book for test purposes. However, the preprocessing is done by default when building an individual book. When building the entire library, all of the books in the library are preprocessed before any output generation happens and that step cannot be skipped.
The added steps in building a library are not all exactly steps. Some of them are things that all of the writers working on the library must keep in mind and rules to which they must adhere. The first added step is determining the layout of the generated HTML site. This allows the system to calculate the relative paths between all of the books in the library. Part of the site mapping process includes creating book IDs for each book in the library. These IDs are used to identify the link targets when they are outside the scope of the current book.
Most of the remaining overhead involves sticking to a few simple rules. You shouldn't change any IDs once they are in use without making sure you change them throughout the entire set. You should rebuild the target databases whenever you make changes to a book. If you do make changes to IDs, rebuild all of the books in the library to make sure the link targets resolve. The most important thing is to watch for warning during book generation to ensure that all of the links resolve. Mostly, these are all just basic best practices is any writing job.
So there are lots of benefits for the minor costs. A responsible, well disciplined team of writers should be capable of using olinks without a problem if they have the need for linking between books or are doing modular documentation.

November 28, 2012

Reuse

When you have to maintain a sprawling documentation set, the ability to reuse content can be a lifesaver. It can also be a disaster. The wall staving off disaster is strategy, consistency, and discipline. Without a good strategy, reuse becomes a rat's nest. If the writers are not consistent in applying the strategy, reuse creates snarls. If the writers are not disciplined, reuse exacerbates the problems.
The first thing that needs be done in forming a good reuse strategy is defining what reuse means. One definition of reuse is that content modules are used in multiple published documents. For example, a standard warning message is placed in a module and then that module is imported into any document that requires it. Another common definition of reuse that content is cloned (copied) to other modules where it may be useful. For example, the standard warning is simply pasted into all of the modules that require it.
In my mind the first definition of reuse is the one that knowledge set maintainers should aspire. It truly leads to reduction in workload and possibility for error. Writers only need to maintain a single copy of a module and when that module is updated, all importing modules take advantage of the update. The second definition, in contrast, saves writers some amount of work up front since they do not have to originate any content, but increases the maintenance load on the backend. A change in one of the cloned sections requires the writer to not only update the original, but to hunt down all of the copies, determine if the change is appropriate, and then make the update. It is more work and far more error prone.
The idea of cloning content is not without merit and does have a place in a solid reuse strategy, but by itself is not a solid reuse strategy. Cloning is useful when the content in a module is a close, but not exact, fit. For example, two products may use a common log-in module and have a similar log-in procedure. However, the username/password requirements may be very different or one of the products may require an additional step. In this case, it may make sense to maintain two copies of the log-in procedure.
Cloning is also routinely used to perform versioning. Each product library I work on has at least three versions. Each of these versions is a clone of the other versions. The entire collection of modules is cloned into a single version such that module A and module B will share the same version in an instance of a library instance. Trying to make an update to multiple version of a library will highlight the issues with cloning as a primary reuse strategy.
So, if cloning is not a useful primary reuse strategy what is? Reuse is complex and any strategy will require many tactics:
* constraining writing to make all of the content stylistically uniform
* wise use of variables
* wise use of conditional text
* sensible chunking rules
* open communication channels
* sensible cloning rules
* scoping rules
* clear versioning policies for shared content collections
Using variables and conditional text make it easier to share modules between products or in places where there are minor variations required in the content. They are useful for places where a product name changes or when two products use different version schemes. Conditional text can allow for slight variances in procedures. Variables and conditional text can have pitfalls as well. They can hinder translation and can get convoluted and hard to manage. When a module becomes too heavy with conditionals and variables, it might be a good idea to consider cloning.
One of the most important parts of a reuse strategy is the size of the modules allowed. They must be fine grained enough to maximize reuse. For example, a book is not a great a great module size since a books reusability is limited to one per library. The modules need to be large grained enough to maintain. For example, a phrase, or even a sentence, does not make a great content module because there would simply be too many of them to manage. I generally think that the DocBook section element is a good module delimiter. Sections are fine grained enough to be shared in multiple places in a library or set of libraries and rough grained enough to hold a useful amount of information. In case by case instances, tables, lists, admonitions, and examples also make good modules.
In situations where you are only dealing with one product library a strict versioning policy may not be critical. All of the modules will ostensibly share the same version for an entire library. However, if you are working in an environment where products share large components, it makes sense to have a strict and well understood policy in place about how component and product versions work. We currently have two products that share a number of common components and the products can at any one time be using different versions of the components. To handle this we version the documentation for each component independently of the products in which they are used. Each product imports the required version of the component sets and when a release is built tags are added to the component libraries to mark the product revision. This allows us to make on going updates to all of the content with reasonable assurance that we won't accidentally cross contaminate a product library. It does, however, add administrative overhead and require some extra caution on the part of the writers.
There is not a one-size-fits-all answer for how to implement these things. Every team has slightly different requirements, slightly different content, and slightly different capabilities. If your requirements put ease of translation over ease of reuse you will choose a different set or parameters. If your team is made up of newbies, you will choose a less strict set of parameters. Your tools will also alter the parameters you choose. The trick is to choose wisely and honestly.
Once you have chosen stick to the plan and look for ways to improve the process. Just know that once you have chosen, changing your mind becomes harder and harder. Reuse creates tangles and dependencies that are not easily unwound.

November 2, 2012

Attribution and Provenance

I was recently involved in a discussion about reuse and one of the recurring issues was that writers didn't want other writers modifying their modules without being consulted.
In a distributed, multi-writer team there is always the chance that two writers will make a change to the same module. When reuse is added into the mix, there is the problem that one writers changes are incompatible with at least one use of the module and the only real solution to the problem is to branch the module. It is naive to think that everyone will always do the right thing and so there is a requirement to be able to track changes and have writers names attached to changes. This requirement makes it possible to easily rollback mistakes and to hold writers accountable such that mistakes are less likely to happen. Attaching a writer's name to a change also makes it easier to coordinate future changes because the next writer to come along can see who has been working on a module and coordinate with them to decide if updates require a branch or not.
Attaching a writers name to a module's change log was not an issue for this group, partly because they are working in a system that really doesn't support branching or any robust change tracking mechanism, but mostly because they were more hung up on the fact that another writer can change their modules. It was an issue of ownership that is exacerbated by a system that lists all of the writers that ever contributed to a book as an author. Much of the discussions about how to manage the issue of modifying reused topics focused around how manage the ownership issue and devolved into a discussion about how to keep track of the authors of a module.
This is unproductive. In order for reuse to work, in fact for modular writing to be effective at all, the concept of ownership needs to be extended to all of the modules that make up the content base. No one writer can own a content module. Technical writers in a group project, regardless of if the group is a corporate writing team or an open source project, cannot, if they want to create good content in an efficient manner, retain the ownership of any one piece of the whole. Ownership of pieces can be destructive because it makes writers reluctant to make changes to modules they don't own, creates situations where writers are upset when a change is made to a module they own, and fosters an environment where writers focus on making their modules great instead of making the whole project great. In the end technical writers working on a team are not authors; they are contributors. Authors are entities that publish complete works that are intended for standalone consumption.
I know writers generally don't like to hear that they are not authors. I know that I don't. I like to get credit for my work and see my byline. I worked as a reporter for several years and I write several blogs. In both cases, I am an author and own the content. In both cases, I produce complete works that are intended for standalone publication and consumption. As a reporter, I did work on articles with other reporters and how the byline, and hence ownership of the work, was determined depended largely on how much each reporter contributed. If it was a two person effort and both split the work equally, the byline was shared. In teams bigger than two, typically, at least one of the reporters was relegated to contributor.
However, I also work as a technical writer and contributor to a number of open source projects. In both cases, I write content that is published and in which I take pride. The difference is that they are large group efforts of which my contributions are only a part (sometimes a majority part, sometimes a tiny part). Publicly, I cannot claim authorship for the content produced by the efforts. There is little way to distinguish my contributions from the others and attempting to do so does not benefit the reader. Do I get credit for this work? Within the projects I do because all of the change tracking systems associate my changes with my name. I do not make contributions to a project that requires personal attribution for my contributions, nor do I make contributions that prohibit derivative works. Both feel detrimental to the purpose of the projects. How can one make updates if no derivatives are allowed on a content module? Most of the efforts do use licenses that restrict redistribution and derivative works, but these are for the entire body of work.
There is the issue of provenance in environments that accept outside contributions or produce works that are an amalgam of several projects. This is largely a CYA legal issue, but it is a big issue. Fortunately, it is a problem with several working solutions. The open source communities have all developed ways of managing provenance as have any company that ships functionality implemented by a third party. One of the most effective ways of managing the issue of provenance is to limit the types of licenses under which your project is allowed to accept.
Personally, I would restrict direct contributions to a single license that doesn't require direct attribution of the contributor and allows derivative works. Ideally, contributions should be only be accepted if the contributor agrees to hand over rights to the project which eliminates all of the issues.
For indirect contributions, the issue is a little more thorny. You want to maximize the resources available to your project while minimizing your exposure to legal troubles and unwanted viral license terms. For example, Apache doesn't allow projects to use GPL licensed code because it is too viral. However, they do allow the use of LGPL binaries since they don't infect the rest of the project. This also means knowing what is and isn't allowed by the licenses which which you want to work. For example, if your I project wants to use a work that requires attribution and doesn't allow derivative works, you need to have a policy in place about how you redistribute the work, like only distribute the PDFs generated by the author.
Tracking provenance need not be hard. For direct contributions, you just need to ensure that all contributors accept the terms required for contribution and that is that. For indirect contributions, they should be handled like third party dependencies and have the license terms associated directly with the third-party in a separate database. They only need to be consulted when the project is being prepped for a release to ensure that legal obligations are being met.
The take away:
* the concept of ownership is destructive and counter productive to large group projects
* provenance is an issue, but not a problem if properly scoped

October 19, 2012

Topics and the Death of Technical Writing

Like the link bait title? Well stay and enjoy the post....

I find topic-based writing evangelists to be an annoying scourge to the technical writing profession. That is not to say that I am a narrative apologist. I just find the narrative crowd to be more reasonable than the topic crowd. Maybe it is that the narrative crowd has accepted their defeat and the the topic crowd are still fighting the war. I think is that the narrative crowd know that on the fundamentals they are right and are willing to adopt new thinking to improve the profession while the topi crowd knows that on the fundamentals they are weak and cannot tolerate challenges to the dogma.
I fall somewhere in the middle of the two crowds. I believe that the narrative crowd knows what it takes to make technical documentation that is useful for all ranges of users. Knowledge transmission depends on context and order. Many things need to be done in a set order. Knowledge builds on prior knowledge. The larger organizational structures-books, sets, help systems, libraries-we use to present that information need to reflect these truths. In a lot of cases the building blocks must also bend to these truths and provide hints to the larger structure. It also true that there will be some amount of bridging, or glue, content that will need to created to provide context and flow through the material.
I also think that the idea of breaking technical documentation into atomic chunks that can be reused and distributed to a number of writers is a good practice. There is no reason that a section in a book cannot be written as a standalone piece of content that is eligible for reuse where appropriate. In the case of online help, each of the pages in the help should be as independent as possible. Even in books that are presented in HTML should have pages that can intelligibly standalone. In these cases, there is a high probability that the pages will be viewed out of order and without the supporting context.
I think the strengths of narrative and topic can be, and have been, merged into successful systems. I like to call it modular writing and it is not an original idea. It doesn't require any fancy tools or even a special writing tag set like DITA or DocBook. All it takes is a little discipline and agreed to conventions. To make it work, you simply agree to what constitutes a content module. For example, the team I work with uses the section element. You also agree to some basic ground rules for the boundaries of a content module. We have a loose definition that a module is the smallest set of content that can usefully standalone. This does not mean that it is just a concept or a task. In a lot of cases, what the topic crowd deems a concept or a task does not constitute a set of information that can usefully standalone because a free floating task may require some context to make it useful instead of just simply a set of steps a monkey can perform.
Unlike topic-based system which disdains bridge content, leaves space for bridge content which can, if it makes sense, be included in a module, or it can be put in a separate module. Our system allows it to be placed in the map outside the scope of any module. We use DocBook, so bridge material can be placed in a chapter before any sections or in a section before any sub-sections.
So where topic evangelists proclaim that bridge content is wasteful and that context is a quaint idea clung to by dinosaurs, I proclaim that they have a very useful place in writing. I also proclaim that they are not always required or even desirous. These things are important tools to help guide novices and provide background for experienced users. They aren't useful to a ninja user who just needs a refresher on navigating a UI or remembering the syntax for an API.
Ditching them impoverishes the overall content and makes the content little more than a collection of tiny factlets that are strung together by a gossamer strand. It means that you can hire a bunch of low skill writers to gather the factlets and beat them into the required form while some "information architect" assembles the pieces into what at first glance looks like a sufficient documentation set. It is the assembly line model. Unfortunately assembly line documentation often results in things like your typical auto manual or coffee maker instructions.

July 11, 2012

SVG in DocBook

One of the most difficult things about multi-format publishing is images. Images that are scaled to look good in HTML usually look pixelated when scaled for PDF. Images that are scaled for PDF or print are usually too small for the Web.
DocBook will allow you to specify different image files for each output format, but that presents its own set of problems. While disks pace is cheap, it is not free. There is also the real danger of the images getting out of sync.
One solution is scalable vector graphics. They are infinitely scalable and are tiny. The problem is getting them to render. Fortunately most modern browsers support SVGs. Unless you need to support users that are stuck on IE 6, HTML shouldn't be a problem. PDF requires a processor that supports SVG rendering. We use XEP by Render X and it does an excellent job supporting SVG.
I had always wanted to play with SVGs and see if our publication tools could handle it. Somehow, the pressures of getting content written and updating our templates trumped doing it. Thankfully, Fintan Bolton stepped up and tried it out for some complex graphics he wanted to use in documenting the advanced security features in CXF.
Our DocBook XSLT tool chain handled SVGs with only minor modifications and they look fabulous. Fintan did discover a few gotchas along the way though:

XEP is extremely slow at processing SVG files, because it has to download the SVG DTD for every image it processes. The solution was to add the DTD to our local catalog. However, it turns out that configuring XEP to use the XML Catalog is tricky. Fintan found this post, which explains how to do it: http://tech.groups.yahoo.com/group/dita-users/message/16362

When creating your image, be conservative in your choice of fonts. If the XEP processor needs to do font substitution, it can result in very badly spaced type. So far, I have had no problems using common fonts like Arial, Courier New, and Helvetica New (actually, XEP substitutes Helvetica New --> Helvetica, but that substitution works well).

Do not set format="SVG" in the DocBook imagedata element. This results in HTML mapping to the HTML

Documenting It

December 30, 2012

Ode to Olinks

November 28, 2012

Reuse

November 2, 2012

Attribution and Provenance

October 19, 2012

Topics and the Death of Technical Writing

July 11, 2012

SVG in DocBook

May 17, 2012

New Content and a New Format

May 10, 2012

We Only Notice When It Is Bad

April 11, 2012

Fuse Enterprise and EPubs

March 2, 2012

Keeping Track of Everything

February 21, 2012

Content Management

February 1, 2012

Difficult Things

January 12, 2012

FuseSource has Documentation

January 5, 2012

The Value of Documentation

Popular Posts