March 15, 2013

Map Existing Structures Instead of Using the Three Topic Types

It is not that I don't like the kernel that germinated topics. I do like the idea of breaking big ideas into smaller, more manageable, and reusable chunks. It is one of the cornerstones of good writing.
What I don't like is the reduction of all things into three containers that are both too restrictive and not specific enough. Tasks, for example, cannot contain any conceptual information, despite the fact that for most complex actions a reader will need some conceptual information to ground the task and explain its purpose in the larger scheme. Also, given the context-free nature of topics, a task cannot depend on any other tasks, despite the fact that many tasks are meta-tasks where each step is itself another task.
One way to solve the need for adding context to a task is to redefine task to include an overview block that allows for conceptual information. Another way is to define a concept type that, by definition, precedes a task to provide the required context. Both cases create a more specific, and more useful, architecture for writing.
Similarly, to solve the meta-task issue one could define a new task type that allows dependencies on other tasks. This type, called a procedure, doesn't need to have hard dependencies; it could allow for output generation without inclusion of the sub-tasks. However, it would make it harder to ignore the need for the sub-tasks.
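To make that concrete, here is a purely illustrative sketch of what such a procedure module might look like if it were modeled as a DocBook section; the role value, titles, and file names are all invented:

    <section role="procedure"
             xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Deploying a broker</title>
      <!-- overview block: the conceptual grounding a bare task type forbids -->
      <para>Deploying a broker makes it available to production clients.
      The deployment pulls together several smaller tasks.</para>
      <!-- each step is itself a task, pulled in from its own module -->
      <xi:include href="tasks/configure_broker.xml"/>
      <xi:include href="tasks/start_broker.xml"/>
      <xi:include href="tasks/verify_broker.xml"/>
    </section>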
It is not that information architects are not free to make new content types; it is that most don't. They have their three types and try to force everything into them. They ignore the fact that an existing information set will have organically developed topic types that make sense for it. In most instances the argument is that the starting point set was narrative and therefore flawed. It needs to be tamed into the three canonical types for its own good.
The mistake here is that by assuming the new model is better, they lose the native intelligence in the existing structure. They assume it has none and impose the new model anyway. Unfortunately, this approach typically results in more work and no net increase in the value of the information.
A far better approach is to analyze the structures used throughout the existing set and attempt to build types, based on the canonical types, onto which the old structures map. This requires some upfront work, but it makes the move into topics, or modules, smoother. It also retains the knowledge encoded in the existing architecture, which has grown up as a reflection of the needs of the information, the needs of the consumers, and the needs of the authors. Hopefully, the standardization of the existing structures will result in a net increase in value because it smooths out the bumps in the existing set instead of chopping it up. It also gives the authors more investment in the task of migration and makes them better able to spot places where the content can be improved.
The other benefit of remembering that the structure of existing sets has value is that it sparks an iterative process. The architecture can be modified as needed. New types can be introduced; old types can be refined or removed.

January 21, 2013

Just a Bunch of Books

One of the things that has been occupying my brain lately is the difference between thinking of a product library as a bunch of books and thinking of it as a bunch of knowledge modules. In both models a library will have something called books that are used to organize the content, because a book is such a well understood concept for organizing written text. Readers expect to see a list of books that contain smaller divisions called chapters. They understand how to navigate inside that abstraction.
The difference between a bunch of books and a bunch of knowledge modules is mostly a production concern. It has an impact on the reader's experience, since the resulting libraries can be very different, but it is not something a reader needs to be aware of to work with the published content. While I am an advocate of the module-centric approach, because I think it provides more flexibility for the production side and the potential for a richer experience for the reader, I do not believe that a library constructed using a book-centric approach cannot have the same richness as a module-centric one.
The major difference between the two approaches is how content is chunked into buckets. In a book-centric model, the book is the primary chunk level. Every smaller block used to flesh out a book is written with a view to building a single entity. This may or may not lead to what is currently derided as the narrative style of writing, where there is a flow from one small block to the next and each block is contextually dependent on the other small blocks in the book. It does mean that the possible small blocks are predetermined by the predetermined set of books. So library design goes something like:
* What users will our product have?
* What set of high-level knowledge will they need to work with the product?
* What set of books should we create to cover the knowledge requirements of the users?
* Create the set of books.
* For each book, determine what specific knowledge the users will need and expect.
* For each book, create the content to satisfy the users.
In a module-centric model, the basic chunk level is much smaller. It would typically be at a level where each module contains a digestible block of knowledge that a user will find useful. That is an intentionally vague definition, since I believe that for any given project the module is best determined by the writers working on the project. This model can be used to create things that feel narrative, since a module can contain content that bridges between modules or provides context that glues modules together. It doesn't, however, predetermine the set of possible knowledge modules around a set of big chunks. The books can be decided on late in the game, as the content builds up and the way to organize them becomes clearer. Library design goes like this:
* Who will use the product?
* What will the users want to do with the product?
* What specific tasks will the users need to do to accomplish their goals?
* What knowledge will the users need to accomplish these tasks?
* What knowledge modules does this map to?
* Create the knowledge modules.
* What modules need to be grouped to illuminate a task?
* Create collections that map into book-like structures.
* What glue is needed to hold the collections together?
* Create the glue.
The books are created after most of the content is written. This gives you some added agility in creating the library because you can modify the organization as new information arrives. It also gives you flexibility in terms of reusing information. There may be modules that go in more than one book; you can simply include such a module without cloning, or clone it if that makes more sense.
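Since the tool set I use is DocBook, a collection in this model can be as simple as a book file that pulls finished modules in by reference; a module that belongs in two books is simply included from both. A rough sketch, with invented titles and file names:

    <book xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Security Guide</title>
      <chapter>
        <title>Authentication</title>
        <!-- glue paragraph written for this particular collection -->
        <para>This chapter collects the modules related to configuring
        authentication for the broker.</para>
        <!-- modules written independently; some also appear in other books -->
        <xi:include href="modules/enable_authentication.xml"/>
        <xi:include href="modules/configure_credentials.xml"/>
      </chapter>
    </book>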
A collection-centric model does change the way writers work and does require some extra discipline. Instead of working on a book and not needing to worry about what other writers are doing, writers in this model work on a set of modules and must consider how their work fits into the whole. For example, instead of writing a security guide, a writer might write all of the security-related modules. Those modules may all be built up into a security guide, but a few may also be used in other books. Writers need to communicate with each other more to coordinate updates to shared modules.

January 3, 2013

Translation or Internationalization

A while back someone passed around a quote from the Firefox team that said something like "We should strive to ensure that every user, no matter what language they speak, can have a consistent experience with our product." The reason for sending it around was to prod the writing team to strive for the same thing.
I generally agree with the sentiment of the statement. Language shouldn't be a barrier to accessing knowledge or a software product. Documentation and user interfaces should be usable regardless of a person's native language. I won't attempt to argue what subset of languages are useful because that is purely a business/resourcing issue.
The statement got me thinking about the differences between making a UI available in multiple languages and making documentation available in multiple languages. I often hear translation used to describe both efforts, but I think that papers over a lot of differences.
UIs are not so much translated as they are internationalized. In general, making a UI available in a second language involves translating all of the labels and warning messages into a second, or third, language. So there is some translation being done, but it is fairly simple stuff. Most of the labels and warning messages are single words or short direct statements. It takes skill to be sure, but it is a pretty straightforward task.
Documentation really does need to be translated. In general, documentation requires more than a simple parsing of labels and direct statements into a second language. Yes, there are plenty of instances where documentation is little more than steps and reference tables, which are just labels and short direct statements, but that is pretty low-hanging fruit. I would also argue that even though steps are short and direct, because they are part of a larger whole they should be treated as more than strings that can be changed without consideration of the context. With documentation, because it is a dense collection of language, you really need to consider the whole body of the work and translate it into a new language. This may mean rewriting parts of the content to be more understandable to speakers of the second language. For example, cultural references always sneak into content because they can help explain complex ideas. There are also structures, like glossaries, that don't always have a direct mapping into the second language.
I have seen strategies for translation that attempt to streamline the process by treating the text like a collection of strings. It seems to me that while they may grease the wheels a little, they cannot produce truly good quality content. The systems all place a number of restrictions on the content originator to make sure the strings can be easily translated. Often you end up with something that is mediocre in multiple languages, but is done quickly and efficiently.
Wouldn't it be better to create great content in one language and then, if required, have full translations done? It may not be as efficient, but it will probably make for happier readers.

December 30, 2012

Ode to Olinks

The documentation set I work on uses olinks for just about all of the cross references. Their flexibility and the fact that they are not resolved until publication time make them powerful and ideal for modular documents.
Like xrefs, they can use the DocBook gentext system to build the text for a link, and, like link, they allow you to provide your own link text. Unlike either of them, they do not use IDs that must be resolvable at validation time. In fact, an unresolvable olink is not a validation error, because olinks can be used to link to documents outside the scope of the current document, such as another book in a product library or an article. The resolution of the targets to actual endpoints is left up to the publication system.
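For anyone who hasn't used them, an olink names its target indirectly, using a document ID plus a target ID, instead of pointing at an ID that must exist in the current document. The IDs below are invented for illustration:

    <!-- xref: the linkend must resolve within the current document -->
    <xref linkend="broker-config"/>

    <!-- olink: targetdoc names another book in the library and targetptr
         names an ID inside it; resolution happens at publication time -->
    <olink targetdoc="security_guide" targetptr="enable-tls"/>

    <!-- like link, an olink can carry its own link text -->
    <olink targetdoc="security_guide" targetptr="enable-tls">enabling TLS</olink>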
This flexibility, naturally, is both friend and foe. On the friend side, it makes writing modular documents easier by eliminating the validation errors that crop up when trying to create a link between two XML documents that will ultimately end up as part of the same publication. This is one of the main reasons we started using olinks instead of xrefs. All of the writers found the constant validation errors distracting. Naturally, this is only a problem if you are writing DocBook XML documents with a real XML editor. Teams that use text editors with some syntax highlighting and auto tag closing features will not be distracted by the validation errors. Of course, they also won't know a document is invalid until it blows up in publication.
The other strength of using olinks is that a team can architect a library that is more than just a loose collection of related books. The books in a library can have working links between each other. One book can reference a topic that is discussed in another book and provide the user with a functional link directly to the required information. This is possible without olinks, and I have worked on libraries that attempted it. However, without olinks, or some similar mechanism, deep links between books in a library are a bear to maintain and most resource-constrained teams will not succeed. The argument can also be made that deep links between books in a library are not valuable. Given the difficulty of maintaining them using "standard" methods, the argument is correct. However, using olinks lowers the cost to the point that not using them is letting your readers down.
On the foe side of the equation, using olinks does add some overhead to your publication system and adds some extra steps in setting up a library. If you are using a publication system based on the DocBook XSLT style sheets, the added overhead is fairly minimal. Most of the work is already included in the base style sheets. You simply need to turn it on and add a preprocessing run to the system. The preprocessing run will calculate the target databases needed to resolve the targets to real endpoints. We currently have our system set up so that an author can skip this step when doing builds of a single book for test purposes. However, the preprocessing is done by default when building an individual book. When building the entire library, all of the books in the library are preprocessed before any output generation happens and that step cannot be skipped.
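In rough terms, the two passes use the stock DocBook XSLT olink parameters. An Ant-flavored sketch (book names, paths, and output files are invented; the parameter names are the standard ones):

    <!-- pass 1: collect this book's olink targets into a target.db file -->
    <xslt in="books/security_guide/book.xml"
          out="build/security_guide/targets.html"
          style="docbook-xsl/html/docbook.xsl">
      <param name="collect.xref.targets" expression="only"/>
      <param name="targets.filename"
             expression="build/security_guide/target.db"/>
    </xslt>

    <!-- pass 2: normal output generation, resolving olinks against the
         library-wide target database -->
    <xslt in="books/security_guide/book.xml"
          out="build/security_guide/index.html"
          style="docbook-xsl/html/docbook.xsl">
      <param name="target.database.document" expression="olinkdb.xml"/>
      <param name="current.docid" expression="security_guide"/>
    </xslt>

The collection pass is the step an author can skip for quick single-book test builds.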
The added steps in building a library are not all exactly steps. Some of them are things that all of the writers working on the library must keep in mind and rules to which they must adhere. The first added step is determining the layout of the generated HTML site. This allows the system to calculate the relative paths between all of the books in the library. Part of the site mapping process includes creating book IDs for each book in the library. These IDs are used to identify the link targets when they are outside the scope of the current book.
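The target database is where that site layout and those book IDs live. A minimal sketch (directory names and book IDs invented), with each book's collected target.db file pulled in by XInclude:

    <targetset xmlns:xi="http://www.w3.org/2001/XInclude">
      <sitemap>
        <dir name="docs">
          <dir name="security_guide">
            <document targetdoc="security_guide" baseuri="index.html">
              <xi:include href="build/security_guide/target.db"/>
            </document>
          </dir>
          <dir name="admin_guide">
            <document targetdoc="admin_guide" baseuri="index.html">
              <xi:include href="build/admin_guide/target.db"/>
            </document>
          </dir>
        </dir>
      </sitemap>
    </targetset>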
Most of the remaining overhead involves sticking to a few simple rules. You shouldn't change any IDs once they are in use without making sure you change them throughout the entire set. You should rebuild the target databases whenever you make changes to a book. If you do make changes to IDs, rebuild all of the books in the library to make sure the link targets resolve. The most important thing is to watch for warnings during book generation to ensure that all of the links resolve. Mostly, these are all just basic best practices in any writing job.
So there are lots of benefits for the minor costs. A responsible, well-disciplined team of writers should be capable of using olinks without a problem if they have the need for linking between books or are doing modular documentation.

November 28, 2012

Reuse

When you have to maintain a sprawling documentation set, the ability to reuse content can be a lifesaver. It can also be a disaster. The wall staving off disaster is strategy, consistency, and discipline. Without a good strategy, reuse becomes a rat's nest. If the writers are not consistent in applying the strategy, reuse creates snarls. If the writers are not disciplined, reuse exacerbates the problems.
The first thing that needs to be done in forming a good reuse strategy is defining what reuse means. One definition of reuse is that content modules are used in multiple published documents. For example, a standard warning message is placed in a module and then that module is imported into any document that requires it. Another common definition of reuse is that content is cloned (copied) to other modules where it may be useful. For example, the standard warning is simply pasted into all of the modules that require it.
In my mind the first definition of reuse is the one that knowledge set maintainers should aspire to. It truly leads to a reduction in workload and in the possibility for error. Writers only need to maintain a single copy of a module, and when that module is updated, all importing modules take advantage of the update. The second definition, in contrast, saves writers some amount of work up front, since they do not have to originate any content, but increases the maintenance load on the back end. A change in one of the cloned sections requires the writer to not only update the original, but to hunt down all of the copies, determine if the change is appropriate, and then make the update. It is more work and far more error prone.
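To make the contrast concrete, with import-style reuse the warning lives in one file and every module that needs it pulls it in by reference instead of pasting it in. File names and wording here are invented:

    <!-- shared/support_warning.xml: the single maintained copy -->
    <warning>
      <para>Changing these settings on a running system can invalidate
      your support agreement.</para>
    </warning>

    <!-- any module that needs the warning imports it instead of cloning it -->
    <section xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Tuning the broker</title>
      <xi:include href="shared/support_warning.xml"/>
      <!-- the rest of the module's content -->
    </section>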
The idea of cloning content is not without merit and does have a place in a solid reuse strategy, but by itself is not a solid reuse strategy. Cloning is useful when the content in a module is a close, but not exact, fit. For example, two products may use a common log-in module and have a similar log-in procedure. However, the username/password requirements may be very different or one of the products may require an additional step. In this case, it may make sense to maintain two copies of the log-in procedure.
Cloning is also routinely used to perform versioning. Each product library I work on has at least three versions. Each of these versions is a clone of the other versions. The entire collection of modules is cloned into a single version, such that module A and module B will share the same version in any given instance of a library. Trying to make an update to multiple versions of a library will highlight the issues with cloning as a primary reuse strategy.
So, if cloning is not a useful primary reuse strategy, what is? Reuse is complex and any strategy will require many tactics:
* constraining writing to make all of the content stylistically uniform
* wise use of variables
* wise use of conditional text
* sensible chunking rules
* open communication channels
* sensible cloning rules
* scoping rules
* clear versioning policies for shared content collections
Using variables and conditional text makes it easier to share modules between products or in places where minor variations are required in the content. They are useful for places where a product name changes or when two products use different version schemes. Conditional text can allow for slight variances in procedures. Variables and conditional text have pitfalls as well. They can hinder translation and can get convoluted and hard to manage. When a module becomes too heavy with conditionals and variables, it might be a good idea to consider cloning.
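As an invented example of both tactics in DocBook: a product name can live in an entity that each product build defines, and product-specific content can be marked with a profiling attribute that the profiling stylesheets filter on:

    <!-- variable: each product build supplies its own value, for example
         from a per-product .ent file -->
    <!ENTITY product "Fuse MQ Enterprise">

    <para>Install &product; before continuing.</para>

    <!-- conditional text: kept or dropped by the DocBook profiling
         stylesheets, e.g. by setting profile.condition to "enterprise" -->
    <para condition="enterprise">Register the installation with the
    management console.</para>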
One of the most important parts of a reuse strategy is the size of the modules allowed. They must be fine-grained enough to maximize reuse. For example, a book is not a great module size, since a book's reusability is limited to one per library. The modules also need to be coarse-grained enough to maintain. For example, a phrase, or even a sentence, does not make a great content module because there would simply be too many of them to manage. I generally think that the DocBook section element is a good module delimiter. Sections are fine-grained enough to be shared in multiple places in a library or set of libraries and coarse-grained enough to hold a useful amount of information. On a case-by-case basis, tables, lists, admonitions, and examples also make good modules.
In situations where you are only dealing with one product library, a strict versioning policy may not be critical. All of the modules will ostensibly share the same version for an entire library. However, if you are working in an environment where products share large components, it makes sense to have a strict and well understood policy in place about how component and product versions work. We currently have two products that share a number of common components, and the products can at any one time be using different versions of the components. To handle this we version the documentation for each component independently of the products in which they are used. Each product imports the required version of the component sets, and when a release is built, tags are added to the component libraries to mark the product revision. This allows us to make ongoing updates to all of the content with reasonable assurance that we won't accidentally cross-contaminate a product library. It does, however, add administrative overhead and require some extra caution on the part of the writers.
There is not a one-size-fits-all answer for how to implement these things. Every team has slightly different requirements, slightly different content, and slightly different capabilities. If your requirements put ease of translation over ease of reuse, you will choose a different set of parameters. If your team is made up of newbies, you will choose a less strict set of parameters. Your tools will also alter the parameters you choose. The trick is to choose wisely and honestly.
Once you have chosen, stick to the plan and look for ways to improve the process. Just know that once you have chosen, changing your mind becomes harder and harder. Reuse creates tangles and dependencies that are not easily unwound.

November 2, 2012

Attribution and Provenance

I was recently involved in a discussion about reuse and one of the recurring issues was that writers didn't want other writers modifying their modules without being consulted.
In a distributed, multi-writer team there is always the chance that two writers will make a change to the same module. When reuse is added into the mix, there is the problem that one writer's changes are incompatible with at least one use of the module, and the only real solution to that problem is to branch the module. It is naive to think that everyone will always do the right thing, so there is a requirement to be able to track changes and have writers' names attached to changes. This requirement makes it possible to easily roll back mistakes and to hold writers accountable, such that mistakes are less likely to happen. Attaching a writer's name to a change also makes it easier to coordinate future changes, because the next writer to come along can see who has been working on a module and coordinate with them to decide whether updates require a branch.
Attaching a writer's name to a module's change log was not an issue for this group, partly because they are working in a system that really doesn't support branching or any robust change tracking mechanism, but mostly because they were more hung up on the fact that another writer can change their modules. It was an issue of ownership that is exacerbated by a system that lists every writer who ever contributed to a book as an author. Much of the discussion about how to manage the issue of modifying reused topics focused on how to manage the ownership issue and devolved into a discussion about how to keep track of the authors of a module.
This is unproductive. In order for reuse to work, in fact for modular writing to be effective at all, the concept of ownership needs to be extended to all of the modules that make up the content base. No one writer can own a content module. Technical writers in a group project, regardless of whether the group is a corporate writing team or an open source project, cannot, if they want to create good content in an efficient manner, retain ownership of any one piece of the whole. Ownership of pieces can be destructive because it makes writers reluctant to make changes to modules they don't own, creates situations where writers are upset when a change is made to a module they own, and fosters an environment where writers focus on making their modules great instead of making the whole project great. In the end, technical writers working on a team are not authors; they are contributors. Authors are entities that publish complete works that are intended for standalone consumption.
I know writers generally don't like to hear that they are not authors. I know that I don't. I like to get credit for my work and see my byline. I worked as a reporter for several years and I write several blogs. In both cases, I am an author and own the content. In both cases, I produce complete works that are intended for standalone publication and consumption. As a reporter, I did work on articles with other reporters and how the byline, and hence ownership of the work, was determined depended largely on how much each reporter contributed. If it was a two person effort and both split the work equally, the byline was shared. In teams bigger than two, typically, at least one of the reporters was relegated to contributor.
However, I also work as a technical writer and contributor to a number of open source projects. In both cases, I write content that is published and in which I take pride. The difference is that they are large group efforts of which my contributions are only a part (sometimes a majority part, sometimes a tiny part). Publicly, I cannot claim authorship for the content produced by the efforts. There is little way to distinguish my contributions from the others and attempting to do so does not benefit the reader. Do I get credit for this work? Within the projects I do because all of the change tracking systems associate my changes with my name. I do not make contributions to a project that requires personal attribution for my contributions, nor do I make contributions that prohibit derivative works. Both feel detrimental to the purpose of the projects. How can one make updates if no derivatives are allowed on a content module? Most of the efforts do use licenses that restrict redistribution and derivative works, but these are for the entire body of work.
There is the issue of provenance in environments that accept outside contributions or produce works that are an amalgam of several projects. This is largely a CYA legal issue, but it is a big issue. Fortunately, it is a problem with several working solutions. The open source communities have all developed ways of managing provenance, as has any company that ships functionality implemented by a third party. One of the most effective ways of managing the issue of provenance is to limit the types of licenses under which your project is allowed to accept contributions.
Personally, I would restrict direct contributions to a single license that doesn't require direct attribution of the contributor and allows derivative works. Ideally, contributions should only be accepted if the contributor agrees to hand over rights to the project, which eliminates all of the issues.
For indirect contributions, the issue is a little more thorny. You want to maximize the resources available to your project while minimizing your exposure to legal troubles and unwanted viral license terms. For example, Apache doesn't allow projects to use GPL licensed code because it is too viral. However, they do allow the use of LGPL binaries since they don't infect the rest of the project. This also means knowing what is and isn't allowed by the licenses with which you want to work. For example, if your project wants to use a work that requires attribution and doesn't allow derivative works, you need to have a policy in place about how you redistribute the work, like only distributing the PDFs generated by the author.
Tracking provenance need not be hard. For direct contributions, you just need to ensure that all contributors accept the terms required for contribution, and that is that. Indirect contributions should be handled like third-party dependencies and have the license terms associated directly with the third party in a separate database. The terms only need to be consulted when the project is being prepped for a release, to ensure that legal obligations are being met.
The takeaway:
* the concept of ownership is destructive and counterproductive in large group projects
* provenance is an issue, but not a problem if properly scoped

October 19, 2012

Topics and the Death of Technical Writing

Like the link bait title? Well stay and enjoy the post....

I find topic-based writing evangelists to be an annoying scourge on the technical writing profession. That is not to say that I am a narrative apologist. I just find the narrative crowd to be more reasonable than the topic crowd. Maybe it is that the narrative crowd has accepted its defeat while the topic crowd is still fighting the war. I think it is that the narrative crowd knows that on the fundamentals it is right and is willing to adopt new thinking to improve the profession, while the topic crowd knows that on the fundamentals it is weak and cannot tolerate challenges to the dogma.
I fall somewhere in the middle of the two crowds. I believe that the narrative crowd knows what it takes to make technical documentation that is useful for all ranges of users. Knowledge transmission depends on context and order. Many things need to be done in a set order. Knowledge builds on prior knowledge. The larger organizational structures-books, sets, help systems, libraries-we use to present that information need to reflect these truths. In a lot of cases the building blocks must also bend to these truths and provide hints to the larger structure. It is also true that there will be some amount of bridging, or glue, content that will need to be created to provide context and flow through the material.
I also think that the idea of breaking technical documentation into atomic chunks that can be reused and distributed to a number of writers is a good practice. There is no reason that a section in a book cannot be written as a standalone piece of content that is eligible for reuse where appropriate. In the case of online help, each of the pages in the help should be as independent as possible. Even books that are presented in HTML should have pages that can stand alone intelligibly. In these cases, there is a high probability that the pages will be viewed out of order and without the supporting context.
I think the strengths of narrative and topic can be, and have been, merged into successful systems. I like to call it modular writing, and it is not an original idea. It doesn't require any fancy tools or even a special writing tag set like DITA or DocBook. All it takes is a little discipline and agreed-upon conventions. To make it work, you simply agree on what constitutes a content module. For example, the team I work with uses the section element. You also agree to some basic ground rules for the boundaries of a content module. We have a loose definition that a module is the smallest set of content that can usefully stand alone. This does not mean that it is just a concept or a task. In a lot of cases, what the topic crowd deems a concept or a task does not constitute a set of information that can usefully stand alone, because a free-floating task may require some context to make it useful instead of just being a set of steps a monkey can perform.
Unlike topic-based systems, which disdain bridge content, our system leaves space for it. Bridge content can, if it makes sense, be included in a module, or it can be put in a separate module. It can also be placed in the map outside the scope of any module. We use DocBook, so bridge material can be placed in a chapter before any sections or in a section before any sub-sections.
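In practice it looks something like this (titles and file names invented): the chapter-level paragraph is glue that sets context, and the modules stay reusable sections:

    <chapter xmlns:xi="http://www.w3.org/2001/XInclude">
      <title>Securing the broker</title>
      <!-- bridge content: lives in the map, outside any module -->
      <para>The broker supports several authentication mechanisms, and most
      deployments combine the tasks below.</para>
      <!-- standalone modules, each a section that can be reused elsewhere -->
      <xi:include href="modules/enable_authentication.xml"/>
      <xi:include href="modules/configure_ssl.xml"/>
    </chapter>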
So where topic evangelists proclaim that bridge content is wasteful and that context is a quaint idea clung to by dinosaurs, I proclaim that both have a very useful place in writing. I also proclaim that they are not always required or even desirable. These things are important tools to help guide novices and provide background for experienced users. They aren't useful to a ninja user who just needs a refresher on navigating a UI or remembering the syntax for an API.
Ditching them impoverishes the overall content and makes it little more than a collection of tiny factlets strung together by a gossamer strand. It means that you can hire a bunch of low-skill writers to gather the factlets and beat them into the required form while some "information architect" assembles the pieces into what at first glance looks like a sufficient documentation set. It is the assembly line model. Unfortunately, assembly line documentation often results in things like your typical auto manual or coffee maker instructions.

July 11, 2012

SVG in DocBook

One of the most difficult things about multi-format publishing is images. Images that are scaled to look good in HTML usually look pixelated when scaled for PDF. Images that are scaled for PDF or print are usually too small for the Web.
DocBook will allow you to specify different image files for each output format, but that presents its own set of problems. While disk space is cheap, it is not free. There is also the real danger of the images getting out of sync.
One solution is scalable vector graphics. They are infinitely scalable and tiny. The problem is getting them to render. Fortunately, most modern browsers support SVGs. Unless you need to support users that are stuck on IE 6, HTML shouldn't be a problem. PDF requires a processor that supports SVG rendering. We use XEP by RenderX, and it does an excellent job supporting SVG.
I had always wanted to play with SVGs and see if our publication tools could handle it. Somehow, the pressures of getting content written and updating our templates trumped doing it. Thankfully, Fintan Bolton stepped up and tried it out for some complex graphics he wanted to use in documenting the advanced security features in CXF.
Our DocBook XSLT tool chain handled SVGs with only minor modifications and they look fabulous. Fintan did discover a few gotchas along the way though:

  • XEP is extremely slow at processing SVG files, because it has to download the SVG DTD for every image it processes. The solution was to add the DTD to our local catalog. However, it turns out that configuring XEP to use the XML Catalog is tricky. Fintan found this post, which explains how to do it: http://tech.groups.yahoo.com/group/dita-users/message/16362

  • When creating your image, be conservative in your choice of fonts. If the XEP processor needs to do font substitution, it can result in very badly spaced type. So far, I have had no problems using common fonts like Arial, Courier New, and Helvetica New (actually, XEP substitutes Helvetica New --> Helvetica, but that substitution works well).

  • Do not set format="SVG" in the DocBook imagedata element. In HTML output, setting it maps the image to an element that encloses it in a cute scrollbox, but the scrollbox is too small. Leaving the attribute off avoids the problem, as sketched below.
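    A figure that follows these guidelines might look like the markup below (file name and sizing invented); the point is simply that no format attribute is set:

        <mediaobject>
          <imageobject>
            <!-- no format="SVG" attribute; the .svg file extension is enough -->
            <imagedata fileref="images/sts_flow.svg" width="150mm" scalefit="1"/>
          </imageobject>
        </mediaobject>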

    The results from Fintan's experimenting look too good to be true. I cannot wait to roll out some documents with the new graphics.

    May 17, 2012

    New Content and a New Format

    FuseSource just released enterprise versions of their message broker and ESB. In addition we released a new management console and a new version of Fuse IDE. Together the products aim to make it easy for companies to consume open source integration technology.
    As part of the effort to make the technology easier to consume, we beefed up the documentation. We continued the upgrades to the messaging documentation. We added some more introductory information and overview documentation. We did more work on making the content more procedural as well.
    The most exciting thing, at least for me, was introducing ePub versions of all of the books. Now users can access the content offline on their mobile devices, including iPads, Android devices, Nooks, and any other e-reader.
    Getting the DocBook to ePub transform to work in our build environment has been a long-term goal of mine. The DocBook XSLT project comes with support for ePub, but it requires running support programs. I needed it all to work using just XSLT and Ant, which proved harder than I had anticipated. The support programs are just used to do the packaging, so I figured that it would be easy to use Ant scripts to do the packaging. I was wrong. Getting everything in the right place was tricky, but the real catch was getting the encoding right. There were also some issues with namespaces getting in the way. The community, as always, was helpful in sorting through the issues.
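    For anyone attempting the same thing, the packaging crux is that the ePub container requires the mimetype file to be the first entry in the archive and stored uncompressed. A minimal Ant sketch of the commonly used two-pass zip trick (property names invented, and not necessarily exactly what our build does):

        <!-- the mimetype entry must come first and must not be compressed -->
        <zip destfile="${epub.file}" compress="false">
          <fileset dir="${epub.dir}" includes="mimetype"/>
        </zip>
        <!-- add the rest of the content with normal compression, leaving the
             stored mimetype entry untouched -->
        <zip destfile="${epub.file}" update="true" keepcompression="true">
          <fileset dir="${epub.dir}" includes="META-INF/**,OEBPS/**"/>
        </zip>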
    Check out the end result in the Fuse MQ Enterprise and Fuse ESB Enterprise at the FuseSource Documentation page.

    May 10, 2012

    We Only Notice When It Is Bad

    One of the frustrating things about working in technical documentation is that people only notice the documentation when there is something wrong with it. Customers rarely mention that the documentation is excellent because they just expect it to be complete and thorough. The comments only ever come in when the documentation does not meet a customer's needs. Sometimes the complaints about the documentation are really about a lack in the actual product, but the user only notices it when they are looking at the documentation.
    This fact has several impacts on documentation teams:
    The most obvious is that writers can easily get demoralized. It is imperative that internal customers-the tech support team, the sales team, the development managers, the product managers-provide some positive feedback. The documentation team will get plenty of negative feedback, so there is no danger that their heads will swell.
    Because customers rarely think about the documentation, they rarely provide requirements for the documentation. This means that the product management team rarely provides detailed guidance for the documentation team. Most often, the guidance is that we need to document the new features. Occasionally, if a customer has complained about something, it is listed as a requirement, but without any clear information about the underlying issue the customer is having. The requirement will be something like "we need more documentation about Java development."
    The largest impact of customers' lack of thinking about documentation is that it results in a lack of investment. The lack of investment is in money, personnel, and time. Because customers generally don't have requirements for documentation, the impression is that it isn't important. Because the only comments that are received about documentation are complaints, the impression is that the documentation team is less than good. The result is that development teams are always looking for ways to hire fewer writers and squeeze more efficiency out of the documentation process. The result is that the documentation usually gets poorer and the complaints go up.
    Curating knowledge for complex technical products is an inefficient process. Dragging information out of SMEs takes time, organizing the information takes time, trying out the processes takes time. There is no real way to improve the speed of these steps. The push for modular, context-free documentation is an outgrowth of this push for efficiency, and it also, generally, creates poorer documentation. Context is an important part of written content.
    What can we do to address these issues? Educate internal customers about the importance of documentation. Make sure they understand that the fact that customers complain shows that they care. The fact that they don't mention the documentation probably means that the documentation team is doing a good job. As writers we need to remember that silence from the customers is the biggest compliment.

    April 11, 2012

    Fuse Enterprise and EPubs

    FuseSource just released a public beta of our Enterprise open source integration products. The idea behind the Enterprise versions of the products is to make the underlying Apache software more accessible for typical enterprise customers. A big part of that push was around improving the documentation and examples. We took a long look at our content and tried to reorganize it to be more user-task oriented as opposed to feature oriented. We also tried to add more "intro" level content. It was quite a bit of work and it is still ongoing.
    Another thing the doc team did to make the content more accessible was add a new publication format: EPub. All of the documentation for the enterprise products is available in EPub and looks great on iPads. Most of the work for making this happen was done by the excellent people who work on the DocBook XSLT project. Using the work they did on transforming DocBook into EPub, along with help from them on the mailing list, I managed to integrate the EPub transforms into our publication system.
    Check them out at Fuse MQ Enterprise Documentation and Fuse ESB Enterprise Documentation.

    March 2, 2012

    Keeping Track of Everything

    It is no secret that I suffer from gadgetophilia. What is a little surprising to me is my love of data. I always thought that tracking a paddle on GPS was useful while on the water, and it was fun to see your speed at the end of the trip. I never really thought I'd be interested in that data later.
    A few years ago I got a Garmin Forerunner as a cycling computer. It tracks cadence, speed, heart rate, and your course. It also lets you download the data to a computer for tracking your workouts. I figured what the heck, it would be cool to see where I've been riding. Now, I have three years of ride data and I constantly compare new rides with past rides to track my progress. It is a little bit of an obsession.
    I've also been keeping track of my weight because my doctor told me it was the best way to diet. Seeing the trend line would keep me motivated. It never really worked, but I did it anyway. The graph was sort of neat. When our scale died a few weeks ago, I wanted one that would automatically track my weight. I ended up with the Withings scale. It records weight, BMI, and body composition data and automatically uploads it to the Web. I find this super cool and love looking at the graph.
    This obsession with data extends to photos as well. I love the way iPhoto can show where a picture was taken, and I love the fact that my iPhone automatically adds that information. It saves me from compulsively adding the data manually. If I have to, I do it while I'm adding face data, because that is super cool too.
    Initially I worried that maybe keeping track of all this stuff was unhealthy; it was just another time killing obsession. As I thought about it more I realized that it was just another form of journalling in a sense and that some of the data was actually helpful. In fact, human beings have been obsessed with keeping track of things forever. Technology just makes it easier.
    I have always been a journal keeper. Writing things down started out as a crazy teenage dream about having source material for an autobiography for when I was famous. Then it became a creative outlet and a way to work out the stresses of life. The journal is also a good way to keep things in perspective. It provides a window to the past, both good and bad, that can help refocus what is happening in the present. It can also provide clues as to what is happening in the present-sort of like medical records.
    The face and places data with the photos serves a similar role. It provides context for the pictures. It adds to the memory. It also makes the photos easier to find.
    The workout data and the weight data don't serve a real memory purpose, but they do help in keeping track of your health. I can easily see that last summer I was in better shape than I am now. That is no surprise since the stationary bike is easier than a real bike. I can also easily see that I am in better shape this year than I was at the same time last year. So, when I drag the real bike out of the garage, I will be able to gauge what is a good starting point for training. When my health anxiety gets going, I can see proof that I'm in good physical shape.
    I think that the data craze is here to stay, and not just for me. Anyone can keep and track reams of data about themselves cheaply and easily. For a hundred dollars you can buy a wristband that monitors your activity throughout the day and the quality of your sleep. With a smartphone you can do even more.
    Applications like Facebook, Pinterest, and Instagram are more ways we keep records of our lives. They are taking the place of journals, folders, and photo albums. They are just easier to update, store, and share.
    Of course the downside of all this is that companies now have access to all of this information too. When it was written on paper in your drawer or in your bookcase, you controlled access to the information. Now Facebook, Google, Apple, Garmin, Fitbit, and other companies can use the data for their own ends. You just have to trust them to be good shepherds and not sell your data to the wolves.
    That is probably easier with companies that view you as their customer instead of their product.... So it pays to know the business model of the companies who have your data.

    February 21, 2012

    Content Management

    In addition to the complexity of gathering, organizing, and presenting technical information, technical writers have to deal with managing multiple versions of the same content. Sometimes the versions correspond to different versions of a product and sometimes they are tied to different products. For example we maintain documentation for at least two versions of four different products. Two of the products share content with a third product. Keeping track of what version of a document goes where and where to make changes so that they apply to the proper products and versions is complicated.
    For the last ten years, I have worked at companies that use a source control system for documentation content management. In general, source control systems are a good match for a documentation content management system. They use a centralized repository for storing content objects. They provide change tracking and versioning. They provide mechanisms for managing and viewing different versions of the same set of content objects.
    I personally have used ClearCase, Subversion, and Git. They all have done the job asked of them. In ClearCase we didn't really tax the system much. We were working with FrameMaker BLOBs and only working on a single version of the content at a time. We also relied on the build team to manage the view specs, the branching, the tagging, etc. We used Subversion in the same way we used ClearCase, but slowly started working with XML content objects that were shared across product lines. Things got trickier once we started sharing content and we lost the team that managed the views and the branching. By the time we got to Git, we were completely on our own managing a spaghetti of branches and tags.
    The drawback to all of them is that they are tailored to being used by developers. The interfaces are hard to use and non-intuitive for writers. They also lack a lot of features that writers need, like content searching and flow control. For example, to get the proper view of a content library in SVN we had to write and maintain checkout scripts to get the proper versions of all the proper repositories. If you worked on multiple versions of a product, you needed to have multiple working copies. It works well when it is all set up properly, but it is a major pain point. It is also fragile. It is easy to make a mistake that breaks a lot of things and is hard to undo.
    There has to be a better way. I had always heard that there were products tailored for managing documentation content, but I never had the budget.
    Recently, I have started investigating CMS systems. The first trouble was figuring out what type of content management system to look at. There are Web content management systems that are used to build Web sites. There are enterprise content management systems, like SharePoint, that are designed to store all sorts of content objects.
    I am interested in component content management systems. This type of system stores content modules that can be assembled into final documents. They typically use XML as the content model and provide features for dynamic ID generation and linking. They also typically offer hooks into the major XML editors. They also offer searching and process control features.
    I don't expect that any system will completely hide the ugliness of managing multiple versions of a diverse set of content. It is an inherently tricky problem. There will always be the problem of maintaining a coherent methodology for identifying variants and making sure changes get applied to the proper variants, etc. My hope is that a CMS system places a more friendly veneer over the process so that it is easier for a writer to work with.
    I'd rather have writers focused on content than on managing the versions of the content.