November 2, 2012

Attribution and Provenance

I was recently involved in a discussion about reuse and one of the recurring issues was that writers didn't want other writers modifying their modules without being consulted.
In a distributed, multi-writer team there is always the chance that two writers will make a change to the same module. When reuse is added into the mix, there is the problem that one writers changes are incompatible with at least one use of the module and the only real solution to the problem is to branch the module. It is naive to think that everyone will always do the right thing and so there is a requirement to be able to track changes and have writers names attached to changes. This requirement makes it possible to easily rollback mistakes and to hold writers accountable such that mistakes are less likely to happen. Attaching a writer's name to a change also makes it easier to coordinate future changes because the next writer to come along can see who has been working on a module and coordinate with them to decide if updates require a branch or not.
Attaching a writers name to a module's change log was not an issue for this group, partly because they are working in a system that really doesn't support branching or any robust change tracking mechanism, but mostly because they were more hung up on the fact that another writer can change their modules. It was an issue of ownership that is exacerbated by a system that lists all of the writers that ever contributed to a book as an author. Much of the discussions about how to manage the issue of modifying reused topics focused around how manage the ownership issue and devolved into a discussion about how to keep track of the authors of a module.
This is unproductive. In order for reuse to work, in fact for modular writing to be effective at all, the concept of ownership needs to be extended to all of the modules that make up the content base. No one writer can own a content module. Technical writers in a group project, regardless of if the group is a corporate writing team or an open source project, cannot, if they want to create good content in an efficient manner, retain the ownership of any one piece of the whole. Ownership of pieces can be destructive because it makes writers reluctant to make changes to modules they don't own, creates situations where writers are upset when a change is made to a module they own, and fosters an environment where writers focus on making their modules great instead of making the whole project great. In the end technical writers working on a team are not authors; they are contributors. Authors are entities that publish complete works that are intended for standalone consumption.
I know writers generally don't like to hear that they are not authors. I know that I don't. I like to get credit for my work and see my byline. I worked as a reporter for several years and I write several blogs. In both cases, I am an author and own the content. In both cases, I produce complete works that are intended for standalone publication and consumption. As a reporter, I did work on articles with other reporters and how the byline, and hence ownership of the work, was determined depended largely on how much each reporter contributed. If it was a two person effort and both split the work equally, the byline was shared. In teams bigger than two, typically, at least one of the reporters was relegated to contributor.
However, I also work as a technical writer and contributor to a number of open source projects. In both cases, I write content that is published and in which I take pride. The difference is that they are large group efforts of which my contributions are only a part (sometimes a majority part, sometimes a tiny part). Publicly, I cannot claim authorship for the content produced by the efforts. There is little way to distinguish my contributions from the others and attempting to do so does not benefit the reader. Do I get credit for this work? Within the projects I do because all of the change tracking systems associate my changes with my name. I do not make contributions to a project that requires personal attribution for my contributions, nor do I make contributions that prohibit derivative works. Both feel detrimental to the purpose of the projects. How can one make updates if no derivatives are allowed on a content module? Most of the efforts do use licenses that restrict redistribution and derivative works, but these are for the entire body of work.
There is the issue of provenance in environments that accept outside contributions or produce works that are an amalgam of several projects. This is largely a CYA legal issue, but it is a big issue. Fortunately, it is a problem with several working solutions. The open source communities have all developed ways of managing provenance as have any company that ships functionality implemented by a third party. One of the most effective ways of managing the issue of provenance is to limit the types of licenses under which your project is allowed to accept.
Personally, I would restrict direct contributions to a single license that doesn't require direct attribution of the contributor and allows derivative works. Ideally, contributions should be only be accepted if the contributor agrees to hand over rights to the project which eliminates all of the issues.
For indirect contributions, the issue is a little more thorny. You want to maximize the resources available to your project while minimizing your exposure to legal troubles and unwanted viral license terms. For example, Apache doesn't allow projects to use GPL licensed code because it is too viral. However, they do allow the use of LGPL binaries since they don't infect the rest of the project. This also means knowing what is and isn't allowed by the licenses which which you want to work. For example, if your I project wants to use a work that requires attribution and doesn't allow derivative works, you need to have a policy in place about how you redistribute the work, like only distribute the PDFs generated by the author.
Tracking provenance need not be hard. For direct contributions, you just need to ensure that all contributors accept the terms required for contribution and that is that. For indirect contributions, they should be handled like third party dependencies and have the license terms associated directly with the third-party in a separate database. They only need to be consulted when the project is being prepped for a release to ensure that legal obligations are being met.
The take away:
* the concept of ownership is destructive and counter productive to large group projects
* provenance is an issue, but not a problem if properly scoped

No comments:

Post a Comment