March 15, 2013

Map Existing Structures Instead of Using the Three Topic Types

It is not that I don't like the kernel that germinated topics. I do like the idea of breaking big ideas into smaller, more manageable, and reusable chunks. It is one of the cornerstones of good writing.
What I don't like is the reduction of all things into three containers that are both too restrictive and not specific enough. Tasks, for example, cannot contain any conceptual information, despite the fact that for most complex actions a reader will need some conceptual information to ground the task and explain its purpose in the larger scheme. Also, given the context-free nature of topics, a task cannot depend on any other tasks, despite the fact that many tasks are meta-tasks where each step is itself another task.
One way to solve the need for adding context to a task is to redefine task to include an overview block that allows for conceptual information. Another way is to define a concept type that, by definition, precedes a task to provide the required context. Both cases create a more specific, and more useful, architecture for writing.
Similarly, to solve the meta-task issue one could define a new task type that allows dependencies on other tasks. This type, called a procedure, doesn't need to have hard dependencies; it could allow for output generation without inclusion of the sub-tasks. However, it would make it harder to ignore the need for the sub-tasks.
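To make that concrete, here is a purely illustrative sketch of what such a procedure module might look like if it were modeled as a DocBook section; the role value, titles, and file names are all invented:

    <section role="procedure"
             xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Deploying a broker</title>
      <!-- overview block: the conceptual grounding a bare task type forbids -->
      <para>Deploying a broker makes it available to production clients.
      The deployment pulls together several smaller tasks.</para>
      <!-- each step is itself a task, pulled in from its own module -->
      <xi:include href="tasks/configure_broker.xml"/>
      <xi:include href="tasks/start_broker.xml"/>
      <xi:include href="tasks/verify_broker.xml"/>
    </section>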
It is not that information architects are not free to make new content types; it is that most don't. They have their three types and try to force everything into them. They ignore the fact that an existing information set will have organically developed topic types that make sense for it. In most instances the argument is that the starting point set was narrative and therefore flawed. It needs to be tamed into the three canonical types for its own good.
The mistake here is that by assuming the new model is better, they lose the native intelligence in the existing structure. They assume it has none and impose the new model anyway. Unfortunately, this approach typically results in more work and no net increase in the value of the information.
A far better approach is to analyze the structures used throughout the existing set and attempt to build types, based on the canonical types, onto which the old structures map. This requires some upfront work, but it makes the move into topics, or modules, smoother. It also retains the knowledge encoded in the existing architecture, which has grown up as a reflection of the needs of the information, the needs of the consumers, and the needs of the authors. Hopefully, the standardization of the existing structures will result in a net increase in value because it smooths out the bumps in the existing set instead of chopping it up. It also gives the authors more investment in the task of migration and makes them better able to spot places where the content can be improved.
The other benefit of remembering that the structure of existing sets has value is that it sparks an iterative process. The architecture can be modified as needed. New types can be introduced; old types can be refined or removed.

January 21, 2013

Just a Bunch of Books

One of the things that has been occupying my brain lately is the difference between thinking of a product library as a bunch of books and thinking of it as a bunch of knowledge modules. In both models a library will have something called books that are used to organize the content, because a book is such a well understood concept for organizing written text. Readers expect to see a list of books that contain smaller divisions called chapters. They understand how to navigate inside that abstraction.
The difference between a bunch of books and a bunch of knowledge modules is mostly a production concern. It has an impact on the reader's experience, since the resulting libraries can be very different, but it is not something a reader needs to be aware of to work with the published content. While I am an advocate of the module-centric approach, because I think it provides more flexibility for the production side and the potential for a richer experience for the reader, I do not believe that a library constructed using a book-centric approach cannot have the same richness as a module-centric one.
The major difference between the two approaches is how content is chunked into buckets. In a book-centric model, the book is the primary chunk level. Every smaller block used to flesh out a book is written with a view to building a single entity. This may or may not lead to what is currently derided as the narrative style of writing, where there is a flow from one small block to the next and each block is contextually dependent on the other small blocks in the book. It does mean that the possible small blocks are predetermined by the predetermined set of books. So library design goes something like:
* What users will our product have?
* What set of high-level knowledge will they need to work with the product?
* What set of books should we create to cover the knowledge requirements of the users?
* Create the set of books.
* For each book, determine what specific knowledge the users will need and expect.
* For each book, create the content to satisfy the users.
In a module-centric model, the basic chunk level is much smaller. It would typically be at a level where each module contains a digestible block of knowledge that a user will find useful. That is an intentionally vague definition, since I believe that for any given project the module is best determined by the writers working on the project. This model can be used to create things that feel narrative, since a module can contain content that bridges between modules or provides context that glues modules together. It doesn't, however, predetermine the set of possible knowledge modules around a set of big chunks. The books can be decided on late in the game, as the content builds up and the way to organize them becomes clearer. Library design goes like this:
* Who will use the product?
* What will the users want to do with the product?
* What specific tasks will the users need to do to accomplish their goals?
* What knowledge will the users need to accomplish these tasks?
* What knowledge modules does this map to?
* Create the knowledge modules.
* What modules need to be grouped to illuminate a task?
* Create collections that map into book-like structures.
* What glue is needed to hold the collections together?
* Create the glue.
The books are created after most of the content is written. This gives you some added agility in creating the library because you can modify the organization as new information arrives. It also gives you flexibility in terms of reusing information. There may be modules that go in more than one book; you can simply include such a module without cloning, or clone it if that makes more sense.
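Since the tool set I use is DocBook, a collection in this model can be as simple as a book file that pulls finished modules in by reference; a module that belongs in two books is simply included from both. A rough sketch, with invented titles and file names:

    <book xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Security Guide</title>
      <chapter>
        <title>Authentication</title>
        <!-- glue paragraph written for this particular collection -->
        <para>This chapter collects the modules related to configuring
        authentication for the broker.</para>
        <!-- modules written independently; some also appear in other books -->
        <xi:include href="modules/enable_authentication.xml"/>
        <xi:include href="modules/configure_credentials.xml"/>
      </chapter>
    </book>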
A collection-centric model does change the way writers work and does require some extra discipline. Instead of working on a book and not needing to worry about what other writers are doing, writers in this model work on a set of modules and must consider how their work fits into the whole. For example, instead of writing a security guide, a writer might write all of the security-related modules. Those modules may all be built up into a security guide, but a few may also be used in other books. Writers need to communicate with each other more to coordinate updates to shared modules.

January 3, 2013

Translation or Internationalization

A while back someone passed around a quote from the Firefox team that said something like "We should strive to ensure that every user, no matter what language they speak, can have a consistent experience with our product." The reason for sending it around was to prod the writing team to strive for the same thing.
I generally agree with the sentiment of the statement. Language shouldn't be a barrier to accessing knowledge or a software product. Documentation and user interfaces should be usable regardless of a person's native language. I won't attempt to argue what subset of languages are useful because that is purely a business/resourcing issue.
The statement got me thinking about the differences between making a UI available in multiple languages and making documentation available in multiple languages. I often hear translation used to describe both efforts, but I think that papers over a lot of differences.
UIs are not so much translated as they are internationalized. In general, making a UI available in a second language involves translating all of the labels and warning messages into a second, or third, language. So there is some translation being done, but it is fairly simple stuff. Most of the labels and warning messages are single words or short direct statements. It takes skill to be sure, but it is a pretty straightforward task.
Documentation really does need to be translated. In general, documentation requires more than a simple parsing of labels and direct statements into a second language. Yes, there are plenty of instances where documentation is little more than steps and reference tables, which are just labels and short direct statements, but that is pretty low-hanging fruit. I would also argue that even though steps are short and direct, because they are part of a larger whole they should be treated as more than strings that can be changed without consideration of the context. With documentation, because it is a dense collection of language, you really need to consider the whole body of the work and translate it into a new language. This may mean rewriting parts of the content to be more understandable to speakers of the second language. For example, cultural references always sneak into content because they can help explain complex ideas. There are also structures, like glossaries, that don't always have a direct mapping into the second language.
I have seen strategies for translation that attempt to streamline the process by treating the text like a collection of strings. It seems to me that while they may grease the wheels a little, they cannot produce truly good quality content. The systems all place a number of restrictions on the content originator to make sure the strings can be easily translated. Often you end up with something that is mediocre in multiple languages, but is done quickly and efficiently.
Wouldn't it be better to create great content in one language and then, if required, have full translations done? It may not be as efficient, but it will probably make for happier readers.

December 30, 2012

Ode to Olinks

The documentation set I work on uses olinks for just about all of the cross references. Their flexibility and the fact that they are not resolved until publication time make them powerful and ideal for modular documents.
Like xrefs, they can use the DocBook gentext system to build the text for a link, and, like link, they allow you to provide your own link text. Unlike either of them, they do not use IDs that must be resolvable at validation time. In fact, an unresolvable olink is not a validation error, because olinks can be used to link to documents outside the scope of the current document, such as another book in a product library or an article. The resolution of the targets to actual endpoints is left up to the publication system.
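For anyone who hasn't used them, an olink names its target indirectly, using a document ID plus a target ID, instead of pointing at an ID that must exist in the current document. The IDs below are invented for illustration:

    <!-- xref: the linkend must resolve within the current document -->
    <xref linkend="broker-config"/>

    <!-- olink: targetdoc names another book in the library and targetptr
         names an ID inside it; resolution happens at publication time -->
    <olink targetdoc="security_guide" targetptr="enable-tls"/>

    <!-- like link, an olink can carry its own link text -->
    <olink targetdoc="security_guide" targetptr="enable-tls">enabling TLS</olink>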
This flexibility, naturally, is both friend and foe. On the friend side, it makes writing modular documents easier by eliminating the validation errors that crop up when trying to create a link between two XML documents that will ultimately end up as part of the same publication. This is one of the main reasons we started using olinks instead of xrefs. All of the writers found the constant validation errors distracting. Naturally, this is only a problem if you are writing DocBook XML documents with a real XML editor. Teams that use text editors with some syntax highlighting and auto tag closing features will not be distracted by the validation errors. Of course, they also won't know a document is invalid until it blows up in publication.
The other strength of using olinks is that a team can architect a library that is more than just a loose collection of related books. The books in a library can have working links between each other. One book can reference a topic that is discussed in another book and provide the user with a functional link directly to the required information. This is possible without olinks, and I have worked on libraries that attempted it. However, without olinks, or some similar mechanism, deep links between books in a library are a bear to maintain and most resource-constrained teams will not succeed. The argument can also be made that deep links between books in a library are not valuable. Given the difficulty of maintaining them using "standard" methods, the argument is correct. However, using olinks lowers the cost to the point that not using them is letting your readers down.
On the foe side of the equation, using olinks does add some overhead to your publication system and adds some extra steps in setting up a library. If you are using a publication system based on the DocBook XSLT style sheets, the added overhead is fairly minimal. Most of the work is already included in the base style sheets. You simply need to turn it on and add a preprocessing run to the system. The preprocessing run will calculate the target databases needed to resolve the targets to real endpoints. We currently have our system set up so that an author can skip this step when doing builds of a single book for test purposes. However, the preprocessing is done by default when building an individual book. When building the entire library, all of the books in the library are preprocessed before any output generation happens and that step cannot be skipped.
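In rough terms, the two passes use the stock DocBook XSLT olink parameters. An Ant-flavored sketch (book names, paths, and output files are invented; the parameter names are the standard ones):

    <!-- pass 1: collect this book's olink targets into a target.db file -->
    <xslt in="books/security_guide/book.xml"
          out="build/security_guide/targets.html"
          style="docbook-xsl/html/docbook.xsl">
      <param name="collect.xref.targets" expression="only"/>
      <param name="targets.filename"
             expression="build/security_guide/target.db"/>
    </xslt>

    <!-- pass 2: normal output generation, resolving olinks against the
         library-wide target database -->
    <xslt in="books/security_guide/book.xml"
          out="build/security_guide/index.html"
          style="docbook-xsl/html/docbook.xsl">
      <param name="target.database.document" expression="olinkdb.xml"/>
      <param name="current.docid" expression="security_guide"/>
    </xslt>

The collection pass is the step an author can skip for quick single-book test builds.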
The added steps in building a library are not all exactly steps. Some of them are things that all of the writers working on the library must keep in mind and rules to which they must adhere. The first added step is determining the layout of the generated HTML site. This allows the system to calculate the relative paths between all of the books in the library. Part of the site mapping process includes creating book IDs for each book in the library. These IDs are used to identify the link targets when they are outside the scope of the current book.
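The target database is where that site layout and those book IDs live. A minimal sketch (directory names and book IDs invented), with each book's collected target.db file pulled in by XInclude:

    <targetset xmlns:xi="http://www.w3.org/2001/XInclude">
      <sitemap>
        <dir name="docs">
          <dir name="security_guide">
            <document targetdoc="security_guide" baseuri="index.html">
              <xi:include href="build/security_guide/target.db"/>
            </document>
          </dir>
          <dir name="admin_guide">
            <document targetdoc="admin_guide" baseuri="index.html">
              <xi:include href="build/admin_guide/target.db"/>
            </document>
          </dir>
        </dir>
      </sitemap>
    </targetset>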
Most of the remaining overhead involves sticking to a few simple rules. You shouldn't change any IDs once they are in use without making sure you change them throughout the entire set. You should rebuild the target databases whenever you make changes to a book. If you do make changes to IDs, rebuild all of the books in the library to make sure the link targets resolve. The most important thing is to watch for warnings during book generation to ensure that all of the links resolve. Mostly, these are all just basic best practices in any writing job.
So there are lots of benefits for the minor costs. A responsible, well-disciplined team of writers should be capable of using olinks without a problem if they have the need for linking between books or are doing modular documentation.

November 28, 2012

Reuse

When you have to maintain a sprawling documentation set, the ability to reuse content can be a lifesaver. It can also be a disaster. The wall staving off disaster is strategy, consistency, and discipline. Without a good strategy, reuse becomes a rat's nest. If the writers are not consistent in applying the strategy, reuse creates snarls. If the writers are not disciplined, reuse exacerbates the problems.
The first thing that needs to be done in forming a good reuse strategy is defining what reuse means. One definition of reuse is that content modules are used in multiple published documents. For example, a standard warning message is placed in a module and then that module is imported into any document that requires it. Another common definition of reuse is that content is cloned (copied) to other modules where it may be useful. For example, the standard warning is simply pasted into all of the modules that require it.
In my mind the first definition of reuse is the one that knowledge set maintainers should aspire to. It truly leads to a reduction in workload and in the possibility for error. Writers only need to maintain a single copy of a module, and when that module is updated, all importing modules take advantage of the update. The second definition, in contrast, saves writers some amount of work up front, since they do not have to originate any content, but increases the maintenance load on the back end. A change in one of the cloned sections requires the writer to not only update the original, but to hunt down all of the copies, determine if the change is appropriate, and then make the update. It is more work and far more error prone.
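To make the contrast concrete, with import-style reuse the warning lives in one file and every module that needs it pulls it in by reference instead of pasting it in. File names and wording here are invented:

    <!-- shared/support_warning.xml: the single maintained copy -->
    <warning>
      <para>Changing these settings on a running system can invalidate
      your support agreement.</para>
    </warning>

    <!-- any module that needs the warning imports it instead of cloning it -->
    <section xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Tuning the broker</title>
      <xi:include href="shared/support_warning.xml"/>
      <!-- the rest of the module's content -->
    </section>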
The idea of cloning content is not without merit and does have a place in a solid reuse strategy, but by itself is not a solid reuse strategy. Cloning is useful when the content in a module is a close, but not exact, fit. For example, two products may use a common log-in module and have a similar log-in procedure. However, the username/password requirements may be very different or one of the products may require an additional step. In this case, it may make sense to maintain two copies of the log-in procedure.
Cloning is also routinely used to perform versioning. Each product library I work on has at least three versions. Each of these versions is a clone of the other versions. The entire collection of modules is cloned into a single version, such that module A and module B will share the same version in any given instance of a library. Trying to make an update to multiple versions of a library will highlight the issues with cloning as a primary reuse strategy.
So, if cloning is not a useful primary reuse strategy, what is? Reuse is complex and any strategy will require many tactics:
* constraining writing to make all of the content stylistically uniform
* wise use of variables
* wise use of conditional text
* sensible chunking rules
* open communication channels
* sensible cloning rules
* scoping rules
* clear versioning policies for shared content collections
Using variables and conditional text makes it easier to share modules between products or in places where minor variations are required in the content. They are useful for places where a product name changes or when two products use different version schemes. Conditional text can allow for slight variances in procedures. Variables and conditional text have pitfalls as well. They can hinder translation and can get convoluted and hard to manage. When a module becomes too heavy with conditionals and variables, it might be a good idea to consider cloning.
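As an invented example of both tactics in DocBook: a product name can live in an entity that each product build defines, and product-specific content can be marked with a profiling attribute that the profiling stylesheets filter on:

    <!-- variable: each product build supplies its own value, for example
         from a per-product .ent file -->
    <!ENTITY product "Fuse MQ Enterprise">

    <para>Install &product; before continuing.</para>

    <!-- conditional text: kept or dropped by the DocBook profiling
         stylesheets, e.g. by setting profile.condition to "enterprise" -->
    <para condition="enterprise">Register the installation with the
    management console.</para>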
One of the most important parts of a reuse strategy is the size of the modules allowed. They must be fine-grained enough to maximize reuse. For example, a book is not a great module size, since a book's reusability is limited to one per library. The modules also need to be coarse-grained enough to maintain. For example, a phrase, or even a sentence, does not make a great content module because there would simply be too many of them to manage. I generally think that the DocBook section element is a good module delimiter. Sections are fine-grained enough to be shared in multiple places in a library or set of libraries and coarse-grained enough to hold a useful amount of information. On a case-by-case basis, tables, lists, admonitions, and examples also make good modules.
In situations where you are only dealing with one product library, a strict versioning policy may not be critical. All of the modules will ostensibly share the same version for an entire library. However, if you are working in an environment where products share large components, it makes sense to have a strict and well understood policy in place about how component and product versions work. We currently have two products that share a number of common components, and the products can at any one time be using different versions of the components. To handle this we version the documentation for each component independently of the products in which they are used. Each product imports the required version of the component sets, and when a release is built, tags are added to the component libraries to mark the product revision. This allows us to make ongoing updates to all of the content with reasonable assurance that we won't accidentally cross-contaminate a product library. It does, however, add administrative overhead and require some extra caution on the part of the writers.
There is not a one-size-fits-all answer for how to implement these things. Every team has slightly different requirements, slightly different content, and slightly different capabilities. If your requirements put ease of translation over ease of reuse, you will choose a different set of parameters. If your team is made up of newbies, you will choose a less strict set of parameters. Your tools will also alter the parameters you choose. The trick is to choose wisely and honestly.
Once you have chosen, stick to the plan and look for ways to improve the process. Just know that once you have chosen, changing your mind becomes harder and harder. Reuse creates tangles and dependencies that are not easily unwound.

November 2, 2012

Attribution and Provenance

I was recently involved in a discussion about reuse and one of the recurring issues was that writers didn't want other writers modifying their modules without being consulted.
In a distributed, multi-writer team there is always the chance that two writers will make a change to the same module. When reuse is added into the mix, there is the problem that one writer's changes are incompatible with at least one use of the module, and the only real solution to that problem is to branch the module. It is naive to think that everyone will always do the right thing, so there is a requirement to be able to track changes and have writers' names attached to changes. This requirement makes it possible to easily roll back mistakes and to hold writers accountable, such that mistakes are less likely to happen. Attaching a writer's name to a change also makes it easier to coordinate future changes, because the next writer to come along can see who has been working on a module and coordinate with them to decide whether updates require a branch.
Attaching a writer's name to a module's change log was not an issue for this group, partly because they are working in a system that really doesn't support branching or any robust change tracking mechanism, but mostly because they were more hung up on the fact that another writer can change their modules. It was an issue of ownership that is exacerbated by a system that lists every writer who ever contributed to a book as an author. Much of the discussion about how to manage the issue of modifying reused topics focused on how to manage the ownership issue and devolved into a discussion about how to keep track of the authors of a module.
This is unproductive. In order for reuse to work, in fact for modular writing to be effective at all, the concept of ownership needs to be extended to all of the modules that make up the content base. No one writer can own a content module. Technical writers in a group project, regardless of whether the group is a corporate writing team or an open source project, cannot, if they want to create good content in an efficient manner, retain ownership of any one piece of the whole. Ownership of pieces can be destructive because it makes writers reluctant to make changes to modules they don't own, creates situations where writers are upset when a change is made to a module they own, and fosters an environment where writers focus on making their modules great instead of making the whole project great. In the end, technical writers working on a team are not authors; they are contributors. Authors are entities that publish complete works that are intended for standalone consumption.
I know writers generally don't like to hear that they are not authors. I know that I don't. I like to get credit for my work and see my byline. I worked as a reporter for several years and I write several blogs. In both cases, I am an author and own the content. In both cases, I produce complete works that are intended for standalone publication and consumption. As a reporter, I did work on articles with other reporters and how the byline, and hence ownership of the work, was determined depended largely on how much each reporter contributed. If it was a two person effort and both split the work equally, the byline was shared. In teams bigger than two, typically, at least one of the reporters was relegated to contributor.
However, I also work as a technical writer and contributor to a number of open source projects. In both cases, I write content that is published and in which I take pride. The difference is that they are large group efforts of which my contributions are only a part (sometimes a majority part, sometimes a tiny part). Publicly, I cannot claim authorship for the content produced by the efforts. There is little way to distinguish my contributions from the others and attempting to do so does not benefit the reader. Do I get credit for this work? Within the projects I do because all of the change tracking systems associate my changes with my name. I do not make contributions to a project that requires personal attribution for my contributions, nor do I make contributions that prohibit derivative works. Both feel detrimental to the purpose of the projects. How can one make updates if no derivatives are allowed on a content module? Most of the efforts do use licenses that restrict redistribution and derivative works, but these are for the entire body of work.
There is the issue of provenance in environments that accept outside contributions or produce works that are an amalgam of several projects. This is largely a CYA legal issue, but it is a big issue. Fortunately, it is a problem with several working solutions. The open source communities have all developed ways of managing provenance, as has any company that ships functionality implemented by a third party. One of the most effective ways of managing the issue of provenance is to limit the types of licenses under which your project is allowed to accept contributions.
Personally, I would restrict direct contributions to a single license that doesn't require direct attribution of the contributor and allows derivative works. Ideally, contributions should only be accepted if the contributor agrees to hand over rights to the project, which eliminates all of the issues.
For indirect contributions, the issue is a little more thorny. You want to maximize the resources available to your project while minimizing your exposure to legal troubles and unwanted viral license terms. For example, Apache doesn't allow projects to use GPL licensed code because it is too viral. However, they do allow the use of LGPL binaries since they don't infect the rest of the project. This also means knowing what is and isn't allowed by the licenses with which you want to work. For example, if your project wants to use a work that requires attribution and doesn't allow derivative works, you need to have a policy in place about how you redistribute the work, like only distributing the PDFs generated by the author.
Tracking provenance need not be hard. For direct contributions, you just need to ensure that all contributors accept the terms required for contribution, and that is that. Indirect contributions should be handled like third-party dependencies and have the license terms associated directly with the third party in a separate database. The terms only need to be consulted when the project is being prepped for a release, to ensure that legal obligations are being met.
The takeaway:
* the concept of ownership is destructive and counterproductive in large group projects
* provenance is an issue, but not a problem if properly scoped

October 19, 2012

Topics and the Death of Technical Writing

Like the link bait title? Well stay and enjoy the post....

I find topic-based writing evangelists to be an annoying scourge on the technical writing profession. That is not to say that I am a narrative apologist. I just find the narrative crowd to be more reasonable than the topic crowd. Maybe it is that the narrative crowd has accepted its defeat while the topic crowd is still fighting the war. I think it is that the narrative crowd knows that on the fundamentals it is right and is willing to adopt new thinking to improve the profession, while the topic crowd knows that on the fundamentals it is weak and cannot tolerate challenges to the dogma.
I fall somewhere in the middle of the two crowds. I believe that the narrative crowd knows what it takes to make technical documentation that is useful for all ranges of users. Knowledge transmission depends on context and order. Many things need to be done in a set order. Knowledge builds on prior knowledge. The larger organizational structures-books, sets, help systems, libraries-we use to present that information need to reflect these truths. In a lot of cases the building blocks must also bend to these truths and provide hints to the larger structure. It is also true that there will be some amount of bridging, or glue, content that will need to be created to provide context and flow through the material.
I also think that the idea of breaking technical documentation into atomic chunks that can be reused and distributed to a number of writers is a good practice. There is no reason that a section in a book cannot be written as a standalone piece of content that is eligible for reuse where appropriate. In the case of online help, each of the pages in the help should be as independent as possible. Even books that are presented in HTML should have pages that can stand alone intelligibly. In these cases, there is a high probability that the pages will be viewed out of order and without the supporting context.
I think the strengths of narrative and topic can be, and have been, merged into successful systems. I like to call it modular writing, and it is not an original idea. It doesn't require any fancy tools or even a special writing tag set like DITA or DocBook. All it takes is a little discipline and agreed-upon conventions. To make it work, you simply agree on what constitutes a content module. For example, the team I work with uses the section element. You also agree to some basic ground rules for the boundaries of a content module. We have a loose definition that a module is the smallest set of content that can usefully stand alone. This does not mean that it is just a concept or a task. In a lot of cases, what the topic crowd deems a concept or a task does not constitute a set of information that can usefully stand alone, because a free-floating task may require some context to make it useful instead of just being a set of steps a monkey can perform.
Unlike topic-based systems, which disdain bridge content, our system leaves space for it. Bridge content can, if it makes sense, be included in a module, or it can be put in a separate module. It can also be placed in the map outside the scope of any module. We use DocBook, so bridge material can be placed in a chapter before any sections or in a section before any sub-sections.
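In practice it looks something like this (titles and file names invented): the chapter-level paragraph is glue that sets context, and the modules stay reusable sections:

    <chapter xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Securing the broker</title>
      <!-- bridge content: lives in the map, outside any module -->
      <para>The broker supports several authentication mechanisms, and most
      deployments combine the tasks below.</para>
      <!-- standalone modules, each a section that can be reused elsewhere -->
      <xi:include href="modules/enable_authentication.xml"/>
      <xi:include href="modules/configure_ssl.xml"/>
    </chapter>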
So where topic evangelists proclaim that bridge content is wasteful and that context is a quaint idea clung to by dinosaurs, I proclaim that both have a very useful place in writing. I also proclaim that they are not always required or even desirable. These things are important tools to help guide novices and provide background for experienced users. They aren't useful to a ninja user who just needs a refresher on navigating a UI or remembering the syntax for an API.
Ditching them impoverishes the overall content and makes it little more than a collection of tiny factlets strung together by a gossamer strand. It means that you can hire a bunch of low-skill writers to gather the factlets and beat them into the required form while some "information architect" assembles the pieces into what at first glance looks like a sufficient documentation set. It is the assembly line model. Unfortunately, assembly line documentation often results in things like your typical auto manual or coffee maker instructions.

July 11, 2012

SVG in DocBook

One of the most difficult things about multi-format publishing is images. Images that are scaled to look good in HTML usually look pixelated when scaled for PDF. Images that are scaled for PDF or print are usually too small for the Web.
DocBook will allow you to specify different image files for each output format, but that presents its own set of problems. While disk space is cheap, it is not free. There is also the real danger of the images getting out of sync.
One solution is scalable vector graphics. They are infinitely scalable and tiny. The problem is getting them to render. Fortunately, most modern browsers support SVGs. Unless you need to support users that are stuck on IE 6, HTML shouldn't be a problem. PDF requires a processor that supports SVG rendering. We use XEP by RenderX, and it does an excellent job supporting SVG.
I had always wanted to play with SVGs and see if our publication tools could handle it. Somehow, the pressures of getting content written and updating our templates trumped doing it. Thankfully, Fintan Bolton stepped up and tried it out for some complex graphics he wanted to use in documenting the advanced security features in CXF.
Our DocBook XSLT tool chain handled SVGs with only minor modifications and they look fabulous. Fintan did discover a few gotchas along the way though:

  • XEP is extremely slow at processing SVG files, because it has to download the SVG DTD for every image it processes. The solution was to add the DTD to our local catalog. However, it turns out that configuring XEP to use the XML Catalog is tricky. Fintan found this post, which explains how to do it: http://tech.groups.yahoo.com/group/dita-users/message/16362

  • When creating your image, be conservative in your choice of fonts. If the XEP processor needs to do font substitution, it can result in very badly spaced type. So far, I have had no problems using common fonts like Arial, Courier New, and Helvetica New (actually, XEP substitutes Helvetica New --> Helvetica, but that substitution works well).

  • Do not set format="SVG" in the DocBook imagedata element. In HTML output, setting it maps the image to an element that encloses it in a cute scrollbox, but the scrollbox is too small. Leaving the attribute off avoids the problem, as sketched below.
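    A figure that follows these guidelines might look like the markup below (file name and sizing invented); the point is simply that no format attribute is set:

        <mediaobject>
          <imageobject>
            <!-- no format="SVG" attribute; the .svg file extension is enough -->
            <imagedata fileref="images/sts_flow.svg" width="150mm" scalefit="1"/>
          </imageobject>
        </mediaobject>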

    The results from Fintan's experimenting look too good to be true. I cannot wait to roll out some documents with the new graphics.

    May 17, 2012

    New Content and a New Format

    FuseSource just released enterprise versions of their message broker and ESB. In addition we released a new management console and a new version of Fuse IDE. Together the products aim to make it easy for companies to consume open source integration technology.
    As part of the effort to make the technology easier to consume, we beefed up the documentation. We continued the upgrades to the messaging documentation. We added some more introductory information and overview documentation. We did more work on making the content more procedural as well.
    The most exciting thing, at least for me, was introducing ePub versions of all of the books. Now users can access the content offline on their mobile devices, including iPads, Android devices, Nooks, and any other e-reader.
    Getting the DocBook to ePub transform to work in our build environment has been a long-term goal of mine. The DocBook XSLT project comes with support for ePub, but it requires running support programs. I needed it all to work using just XSLT and Ant, which proved harder than I had anticipated. The support programs are just used to do the packaging, so I figured that it would be easy to use Ant scripts to do the packaging. I was wrong. Getting everything in the right place was tricky, but the real catch was getting the encoding right. There were also some issues with namespaces getting in the way. The community, as always, was helpful in sorting through the issues.
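    For anyone attempting the same thing, the packaging crux is that the ePub container requires the mimetype file to be the first entry in the archive and stored uncompressed. A minimal Ant sketch of the commonly used two-pass zip trick (property names invented, and not necessarily exactly what our build does):

        <!-- the mimetype entry must come first and must not be compressed -->
        <zip destfile="${epub.file}" compress="false">
          <fileset dir="${epub.dir}" includes="mimetype"/>
        </zip>
        <!-- add the rest of the content with normal compression, leaving the
             stored mimetype entry untouched -->
        <zip destfile="${epub.file}" update="true" keepcompression="true">
          <fileset dir="${epub.dir}" includes="META-INF/**,OEBPS/**"/>
        </zip>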
    Check out the end result in the Fuse MQ Enterprise and Fuse ESB Enterprise at the FuseSource Documentation page.

    May 10, 2012

    We Only Notice When It Is Bad

    One of the frustrating things about working in technical documentation is that people only notice the documentation when there is something wrong with it. Customers rarely mention that the documentation is excellent because they just expect it to be complete and thorough. The comments only ever come in when the documentation does not meet a customer's needs. Sometimes the complaints about the documentation are really about a lack in the actual product, but the user only notices it when they are looking at the documentation.
    This fact has several impacts on documentation teams:
    The most obvious is that writers can easily get demoralized. It is imperative that internal customers-the tech support team, the sales team, the development managers, the product managers-provide some positive feedback. The documentation team will get plenty of negative feedback, so there is no danger that their heads will swell.
    Because customers rarely think about the documentation, they rarely provide requirements for the documentation. This means that the product management team rarely provides detailed guidance for the documentation team. Most often, the guidance is that we need to document the new features. Occasionally, if a customer has complained about something, it is listed as a requirement, but without any clear information about the underlying issue the customer is having. The requirement will be something like "we need more documentation about Java development."
    The largest impact of customers' lack of thinking about documentation is that it results in a lack of investment. The lack of investment is in money, personnel, and time. Because customers generally don't have requirements for documentation, the impression is that it isn't important. Because the only comments that are received about documentation are complaints, the impression is that the documentation team is less than good. The result is that development teams are always looking for ways to hire fewer writers and squeeze more efficiency out of the documentation process. The result is that the documentation usually gets poorer and the complaints go up.
    Curating knowledge for complex technical products is an inefficient process. Dragging information out of SMEs takes time, organizing the information takes time, trying out the processes takes time. There is no real way to improve the speed of these steps. The push for modular, context-free documentation is an outgrowth of this push for efficiency, and it also, generally, creates poorer documentation. Context is an important part of written content.
    What can we do to address these issues? Educate internal customers about the importance of documentation. Make sure they understand that the fact that customers complain shows that they care. The fact that they don't mention the documentation probably means that the documentation team is doing a good job. As writers we need to remember that silence from the customers is the biggest compliment.

    April 11, 2012

    Fuse Enterprise and EPubs

    FuseSource just released a public beta of our Enterprise open source integration products. The idea behind the Enterprise versions of the products is to make the underlying Apache software more accessible for typical enterprise customers. A big part of that push was around improving the documentation and examples. We took a long look at our content and tried to reorganize it to be more user-task oriented as opposed to feature oriented. We also tried to add more "intro" level content. It was quite a bit of work and it is still ongoing.
    Another thing the doc team did to make the content more accessible was add a new publication format: EPub. All of the documentation for the enterprise products is available in EPub and looks great on iPads. Most of the work for making this happen was done by the excellent people who work on the DocBook XSLT project. Using the work they did on transforming DocBook into EPub, along with help from them on the mailing list, I managed to integrate the EPub transforms into our publication system.
    Check them out at Fuse MQ Enterprise Documentation and Fuse ESB Enterprise Documentation.

    March 2, 2012

    Keeping Track of Everything

    It is no secret that I suffer from gadgetophilia. What is a little surprising to me is my love of data. I always thought that tracking a paddle on GPS was useful while on the water, and it was fun to see your speed at the end of the trip. I never really thought I'd be interested in that data later.
    A few years ago I got a Garmin Forerunner as a cycling computer. It tracks cadence, speed, heart rate, and your course. It also lets you download the data to a computer for tracking your workouts. I figured what the heck, it would be cool to see where I've been riding. Now, I have three years of ride data and I constantly compare new rides with past rides to track my progress. It is a little bit of an obsession.
    I've also been keeping track of my weight because my doctor told me it was the best way to diet. Seeing the trend line would keep me motivated. It never really worked, but I did it anyway. The graph was sort of neat. When our scale died a few weeks ago, I wanted one that would automatically track my weight. I ended up with the Withings scale. It records weight, BMI, and body composition data and automatically uploads it to the Web. I find this super cool and love looking at the graph.
    This obsession with data extends to photos as well. I love the way iPhoto can show where a picture was taken, and I love the fact that my iPhone automatically adds that information. It saves me from compulsively adding the data manually. If I have to, I do it while I'm adding face data, because that is super cool too.
    Initially I worried that maybe keeping track of all this stuff was unhealthy; it was just another time killing obsession. As I thought about it more I realized that it was just another form of journalling in a sense and that some of the data was actually helpful. In fact, human beings have been obsessed with keeping track of things forever. Technology just makes it easier.
    I have always been a journal keeper. Writing things down started out as a crazy teenage dream about having source material for an autobiography for when I was famous. Then it became a creative outlet and a way to work out the stresses of life. The journal is also a good way to keep things in perspective. It provides a window to the past, both good and bad, that can help refocus what is happening in the present. It can also provide clues as to what is happening in the present-sort of like medical records.
    The face and places data with the photos serves a similar role. It provides context for the pictures. It adds to the memory. It also makes the photos easier to find.
    The workout data and the weight data don't serve a real memory purpose, but they do help in keeping track of your health. I can easily see that last summer I was in better shape than I am now. That is no surprise since the stationary bike is easier than a real bike. I can also easily see that I am in better shape this year than I was at the same time last year. So, when I drag the real bike out of the garage, I will be able to gauge what is a good starting point for training. When my health anxiety gets going, I can see proof that I'm in good physical shape.
    I think that the data craze is here to stay, and not just for me. Anyone can keep and track reams of data about themselves cheaply and easily. For a hundred dollars you can buy a wristband that monitors your activity throughout the day and the quality of your sleep. With a smartphone you can do even more.
    Applications like Facebook, Pinterest, and Instagram are more ways we keep records of our lives. They are taking the place of journals, folders, and photo albums. They are just easier to update, store, and share.
    Of course the downside of all this is that companies now have access to all of this information too. When it was written on paper in your drawer or in your bookcase, you controlled access to the information. Now Facebook, Google, Apple, Garmin, Fitbit, and other companies can use the data for their own ends. You just have to trust them to be good shepherds and not sell your data to the wolves.
    That is probably easier with companies that view you as their customer instead of their product.... So it pays to know the business model of the companies who have your data.

    February 21, 2012

    Content Management

    In addition to the complexity of gathering, organizing, and presenting technical information, technical writers have to deal with managing multiple versions of the same content. Sometimes the versions correspond to different versions of a product and sometimes they are tied to different products. For example we maintain documentation for at least two versions of four different products. Two of the products share content with a third product. Keeping track of what version of a document goes where and where to make changes so that they apply to the proper products and versions is complicated.
    For the last ten years, I have worked at companies that use a source control system for documentation content management. In general, source control systems are a good match for a documentation content management system. They use a centralized repository for storing content objects. They provide change tracking and versioning. They provide mechanisms for managing and viewing different versions of the same set of content objects.
    I personally have used ClearCase, Subversion, and Git. They all have done the job asked of them. In ClearCase we didn't really tax the system much. We were working with FrameMaker BLOBs and only working on a single version of the content at a time. We also relied on the build team to manage the view specs, the branching, the tagging, etc. We used Subversion in the same way we used ClearCase, but slowly started working with XML content objects that were shared across product lines. Things got trickier once we started sharing content and we lost the team that managed the views and the branching. By the time we got to Git, we were completely on our own managing a spaghetti of branches and tags.
    The drawback to all of them is that they are tailored to being used by developers. The interfaces are hard to use and non-intuitive for writers. They also lack a lot of features that writers need, like content searching and flow control. For example, to get the proper view of a content library in SVN we had to write and maintain checkout scripts to get the proper versions of all the proper repositories. If you worked on multiple versions of a product, you needed to have multiple working copies. It works well when it is all set up properly, but it is a major pain point. It is also fragile. It is easy to make a mistake that breaks a lot of things and is hard to undo.
    There has to be a better way. I had always heard that there were products tailored for managing documentation content, but I never had the budget.
    Recently, I have started investigating CMS systems. The first trouble was figuring out what type of content management system to look at. There are Web content management systems that are used to build Web sites. There are enterprise content management systems, like SharePoint, that are designed to store all sorts of content objects.
    I am interested in component content management systems. This type of system stores content modules that can be assembled into final documents. They typically use XML as the content model and provide features for dynamic ID generation and linking. They also typically offer hooks into the major XML editors. They also offer searching and process control features.
    I don't expect that any system will completely hide the ugliness of managing multiple versions of a diverse set of content. It is an inherently tricky problem. There will always be the problem of maintaining a coherent methodology for identifying variants and making sure changes get applied to the proper variants, etc. My hope is that a CMS system places a more friendly veneer over the process so that it is easier for a writer to work with.
    I'd rather have writers focused on content than on managing the versions of the content.