Friday, June 6, 2008

Taming Off-line Content

In my posts on the long tail of enterprise content, I neglected to address content stored on employees' desktops, flash drives and other storage devices (off-line content). Although it is difficult to assess the value of this content, I believe that, for the most part, it simply extends the long tail to the right of the curve. The challenge is the same as for any content in the long tail: how does the organization ensure that content meeting its criteria for risk (or other value attribute) is placed under the control of the designated systems? However, content stored on an employee's chosen device, outside the control of the enterprise content management system, presents an additional challenge for discovery, retention and destruction. Locating this content when the time comes to take action, such as destruction, can be a problem. Think about a document that an employee has copied to their flash drive for "safe-keeping". If the organization is required to retain this document for 3 years, how does it ensure that all copies of the document, including the one taken off-line, are destroyed at the end of the retention period? The organization can certainly define policies for document handling and block port access to prevent external storage devices from being attached to company-issued computers. It may also consider blocking e-mail with attachments and locking down computer hard disks so nothing can be stored on them (good luck with that one!).

Gaining Control: Self-Containing/Describing Content

So far organizations have managed to get a good handle on content sent as e-mail attachments (at least somewhat). However, there does not appear to be an easy answer for off-line content without doing something so drastic that any productivity gain the organization had anticipated is largely eroded. The more I think about this issue, the more I'm convinced that the silver bullet (if there ever was such a thing) is to have content that can tell you everything about itself: content that carries around with it more than just its metadata (e.g. author, department, type, age) but also information such as: (1) its access control list; (2) the application that was used to create it (not just the MIME type or document extension); (3) what business processes it participates in; (4) what organization owns it; and (5) what organizations are allowed to read it. This metadata must be persisted with the content and be readable by any operating system and/or application software.
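To make the idea a little more concrete, here is a minimal sketch in Python of what such a self-describing package might carry. Everything in it is an assumption made for illustration: the `ContentEnvelope` name, the field names, the JSON encoding and the `may_read` check are all hypothetical and not an existing standard or product. It only shows the kind of information that would travel with the content; it does not enforce anything, and real enforcement would need the operating system and application support discussed in this post.

```python
import base64
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ContentEnvelope:
    """Hypothetical wrapper that keeps descriptive data together with the content."""
    # Ordinary metadata
    author: str
    department: str
    content_type: str
    created: str                    # ISO 8601 date, from which age can be derived
    # The additional information listed above
    access_control: dict            # (1) principal -> list of permissions
    creating_application: str       # (2) the application, not just a MIME type
    business_processes: list        # (3) processes the content participates in
    owner_organization: str         # (4) organization that owns the content
    reader_organizations: list      # (5) organizations allowed to read it
    payload: bytes = field(default=b"", repr=False)

    def to_json(self) -> str:
        """Serialize metadata and content into one portable text document."""
        record = asdict(self)
        record["payload"] = base64.b64encode(self.payload).decode("ascii")
        return json.dumps(record, indent=2)

    @classmethod
    def from_json(cls, text: str) -> "ContentEnvelope":
        """Rebuild the envelope from its serialized form."""
        record = json.loads(text)
        record["payload"] = base64.b64decode(record["payload"])
        return cls(**record)

    def may_read(self, organization: str) -> bool:
        """Advisory check only: the owner and listed readers may read."""
        return (organization == self.owner_organization
                or organization in self.reader_organizations)


# Example values are made up for illustration.
doc = ContentEnvelope(
    author="J. Smith",
    department="Finance",
    content_type="contract",
    created="2008-06-06",
    access_control={"j.smith": ["read", "write"], "auditors": ["read"]},
    creating_application="Hypothetical Word Processor 12",
    business_processes=["vendor-onboarding"],
    owner_organization="Acme Corp",
    reader_organizations=["Acme Legal Partners"],
    payload=b"Contract text goes here.",
)

# The copy on a flash drive would be this single serialized document.
restored = ContentEnvelope.from_json(doc.to_json())
```

Because the description and the content are serialized together, a copy taken off-line still says who owns it and who may read it. Whether any software honors that information is a separate problem.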

Implementing this concept will require cooperation among leading vendors of both application and operating system software, as well as standards organizations. Existing technologies such as XML and DRM contain aspects of what is required, but the portability of DRM solutions remains an issue today for various reasons. WinFS, a technology from Microsoft (MSFT), has some of the critical operating system support needed for this; it has, however, had its own share of problems. XML is not really self-describing, because to consume a piece of XML you must first understand its structure and the relationships between its elements to get any meaning out of it.

In any case, while we wait for the industry to provide the necessary infrastructure support for true self-describing content, organizations can either reward employees for not taking content off-line or punish employees who do. I'm not advocating either approach as the solution to the problem; these are just suggestions.
