Thoughts on Content Management and Open Source.

Wednesday, December 29, 2004

Workflow in CMS and Beyond

Depending on who you talk to, Workflow is a core feature of a Content Management System. Many people looking for a CMS have Workflow in their selection criteria and many products calling themselves a CMS have Workflow on their feature lists. However, after experiencing workflow as both a requirement and a solution, I have learned that people's definitions vary widely. For some products, having the most basic of approval mechanisms (two states, unpublished and published, and a designated group of users that can change that state) is enough to claim workflow. At the other end of the spectrum, other products fully implement workflow standards.

The Workflow Management Coalition (WfMC) publishes a document called their Reference Model that describes workflow as "The computerised facilitation or automation of a business process, in whole or in part." This is pretty general (although I am sure there are some who would say that workflow is just about business processes and not about computers). Workflow is commonly discussed as a set of states and transitions. A state, or "wait state," is a step in the process that requires action from an external actor. A transition is an action that changes from one state to another. A workflow definition describes a set of states and transitions and their relationships (the transitions that are available from each state, the participants who can initiate these transitions, and the conditions required for those transitions to happen) as well as the inputs and data that get captured along the way. A workflow instance is a single execution of the process.
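To make the vocabulary concrete, here is a minimal sketch in Python of a definition (states, transitions, and who may initiate them) and an instance that executes it. This is only an illustration of the concepts above, not the API of any particular workflow engine or CMS; the class names, the editorial states, and the roles are all made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Transition:
    """An action that moves an item from one state to another."""
    name: str
    source: str
    target: str
    allowed_roles: frozenset  # participants who may initiate it


@dataclass
class WorkflowDefinition:
    """The formal description: states plus the transitions between them."""
    states: set
    initial_state: str
    transitions: list

    def available(self, state, role):
        """Transitions a participant with `role` may initiate from `state`."""
        return [t for t in self.transitions
                if t.source == state and role in t.allowed_roles]


@dataclass
class WorkflowInstance:
    """One execution of a definition for a single piece of content."""
    definition: WorkflowDefinition
    state: str = ""
    history: list = field(default_factory=list)

    def __post_init__(self):
        # Start in the definition's initial state unless one was given.
        self.state = self.state or self.definition.initial_state

    def fire(self, transition_name, role):
        """Apply a transition if it is available to this role right now."""
        for t in self.definition.available(self.state, role):
            if t.name == transition_name:
                self.history.append((self.state, t.name, t.target))
                self.state = t.target
                return self.state
        raise PermissionError(
            f"{role!r} cannot {transition_name!r} from state {self.state!r}")


# Hypothetical editorial workflow (state and role names are assumptions).
EDITORIAL = WorkflowDefinition(
    states={"draft", "in_review", "published"},
    initial_state="draft",
    transitions=[
        Transition("submit", "draft", "in_review", frozenset({"author"})),
        Transition("reject", "in_review", "draft", frozenset({"editor"})),
        Transition("publish", "in_review", "published", frozenset({"editor"})),
    ],
)

item = WorkflowInstance(EDITORIAL)
item.fire("submit", role="author")
item.fire("publish", role="editor")
print(item.state)  # published
```

The point of separating the two classes is that the definition is data: the process can change without touching the code that runs instances. A product that only lets users assign ad hoc tasks to each other has the instance side without the definition.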

In his Server Side article The State of Workflow, Tom Baeyens describes a Workflow Management System as something "that takes as input a formal description of business processes and maintains the state of processes executions, thereby delegating activities amongst people and applications." Many CMS do not fully implement the concept of a workflow definition. What they call workflow is more of a task management system where any user can assign a task to anyone and no formal process is followed. CMS Matrix is somewhat reliable in evaluating this feature.

If you want to learn more about workflow theory, there are some excellent resources available. One of my favorites is Workflow Patterns, which describes different patterns in workflow definitions along with very cool Flash movies that illustrate the behavior of each pattern. I think that patterns are useful here because they provide a language of commonly found workflow parts. So when I say something like "Synchronizing Merge", you know exactly what I am talking about. Some people will call this jargon but I think it is a good thing because it gives a shorthand way to express familiar concepts. The Workflow Patterns site also has an excellent reading list.

There are many products (Open and Closed Source) which specialize solely in Workflow. These workflow engines integrate with other applications to provide workflow functionality. Most of them have a syntax (usually XML based) for defining workflows and an API that is able to receive events and trigger external processes. If you are looking at implementing a CMS only for the workflow, and your workflow is sufficiently complex and subject to change, you should consider this type of solution. Look here for a list of products. Another useful feature on the Workflow Patterns site is a page that indicates what patterns the major commercial Workflow engines support. When I get around to it, I would love to evaluate some Open Source workflow engines along these lines. Speaking of Open Source Workflow engines, here is a list of engines implemented in Java.

The industry is rich with standards, especially in the area of Web Services, which has the potential to coordinate workflows across application boundaries. One of the most interesting standards is XPDL, which is published by WfMC. This is an XML schema that is used by several workflow engines to define workflows. The presence of a standard here means that you could theoretically port your workflow from one engine to another simply by moving a file. I emphasize theoretically here. Several workflow engines (Open Source and proprietary) use XPDL but, to my knowledge, no CMS does. It may be that XPDL is overkill for your typical CMS workflow definition.

Now that I have gotten you all excited about workflow, I would like to leave you with some very practical and sobering advice: less is more. Usually during the analysis stage of a project, workflows become very complicated. There are many factors that drive complexity: the fear of excluding stakeholders from the project, the fear of losing control, overblown promises when the project is sold, and just plain bad business process. More often than not, after painstakingly implementing complex workflows, I am asked to rip them out in favor of very simple solutions before the system is deployed because it is impossible to get anything published. Think about simplifying/streamlining/optimizing business processes before they are encoded into a workflow definition. Don't design for the most complex, infrequently used cases. Instead, suggest managing these exceptional cases offline.

Tuesday, December 28, 2004

Wachovia "Content Access Services" Case Study

This case study by Gilbane's Content Technology Works (CTW), other than being an advertisement for IBM, shows a successful execution of a content integration strategy. The client is Wachovia, a very large financial services company (4th in the US), that grew out of acquisitions and, consequently, had several document repositories based on different technologies: IBM ImagePlus, FileNet, and Mobius. Libraries of digitized forms and documents are a huge deal in financial services companies because they need to keep every interaction with their customers at hand for customer service and regulatory reasons. As an aside, I once had a client that knew they had to do something major with imaging when the weight of their paper archives started to bow the floors and break the windows of their building.

In order to integrate the business operations of the merged companies, Wachovia employees needed to work with information about their customers from multiple sources. Project requests started surfacing for desktop applications that integrated customer information from multiple departments. Rather than trying to migrate all the content over into one system, Wachovia started building a common content access layer with adaptors for different document management systems. The core technology they used was IBM's DB2 Information Integrator Content Edition, which already came with a FileNet adaptor. Wachovia also built new adaptors for ImagePlus and Mobius and then added their own XML-based API to insulate Wachovia's client applications from the Information Integrator API.
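The shape of that architecture can be sketched in a few lines of Python. This is only an illustration of the adaptor idea, not Wachovia's or IBM's actual API: the class names, the `find` method, and the sample records are all hypothetical, and the in-memory adaptor stands in for code that would really call FileNet, ImagePlus, or Mobius.

```python
from abc import ABC, abstractmethod


class RepositoryAdapter(ABC):
    """Common interface that every back-end repository is wrapped in."""

    @abstractmethod
    def find(self, customer_id):
        """Return a list of document records for the customer."""


class InMemoryAdapter(RepositoryAdapter):
    """Stand-in adaptor; a real one would call the vendor system."""

    def __init__(self, source, documents):
        self.source = source
        self._documents = documents  # {customer_id: [title, ...]}

    def find(self, customer_id):
        return [{"source": self.source, "title": title}
                for title in self._documents.get(customer_id, [])]


class ContentAccessLayer:
    """Single entry point that client applications talk to."""

    def __init__(self, adapters):
        self._adapters = list(adapters)

    def find_documents(self, customer_id):
        # Ask every repository and merge the answers into one list.
        results = []
        for adapter in self._adapters:
            results.extend(adapter.find(customer_id))
        return results


# Hypothetical sample data, one adaptor per repository named in the study.
layer = ContentAccessLayer([
    InMemoryAdapter("FileNet", {"c-1": ["Loan application"]}),
    InMemoryAdapter("ImagePlus", {"c-1": ["Signature card"],
                                  "c-2": ["Check image"]}),
    InMemoryAdapter("Mobius", {}),
])

for doc in layer.find_documents("c-1"):
    print(doc["source"], "-", doc["title"])
```

Client applications only ever see `ContentAccessLayer`, so adding or retiring a repository means writing or removing one adaptor rather than changing every client.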

From reading the case study, a couple of really good decisions stand out.

  • Rather than adopt a strategy of migrating all content into a single system such as FileNet, Wachovia chose an integration approach where heterogeneous systems could coexist. This makes sense for several reasons:
    • It is cheaper. Migrating terabytes of content from one format to another has a huge up-front cost.
    • It leverages existing investments. Not just licensing but also customization, integration and training.
    • It is lower risk. Wachovia does not have to bet the business on one technology. There is so much volatility in the marketplace that no one knows what is going to be a market leader and what will fall behind.
    • It allows best of breed selection. Different groups may have different needs that are best satisfied with different technologies.

  • Rather than have their client applications access the Information Integrator API directly, Wachovia added an abstraction layer. This will enable Wachovia to more easily swap out Information Integrator if a better technology comes along.

In all, I think this is an excellent approach to the vision of "Enterprise Content Management." It is practical, cost effective and reduces dependency on a single provider.

Two New Plone Books

Over the past couple of months, two new books have come out on Plone:

  • The Definitive Guide to Plone by Andy McKay (a lead developer on the Plone team), available online for free and on paper from Apress.
  • Plone Content Management Essentials by Julie Meloni available from SAMS.

The review on Slashdot is very thorough and interesting.

Plone is becoming a real market leader in content management software. Consider that:

  • "The UK's Ministry of Defence's Defence Academy says it chose Plone and Zope, an open source content management system and supporting application server, for the software's functionality, not its negligible cost." (Infoconomy Article)
  • Two major newspaper sites (Boston Globe and the San Diego Union-Tribune) use Plone.

The chief objection that I hear about Plone is Python and the lack of in-house skills. I have a feeling that with books like these, this may change. I wonder if Python and Zope could become the de facto technologies for content management. Given that Zope has been designed from the ground up as a content management focused application server and Python is an extremely effective and high performing programming language, I would not count the possibility out.


Tuesday, December 21, 2004

CMS Exit Strategies

A while back I posted that I was hearing rumblings about Microsoft losing interest in their CMS product (MCMS, formerly NCompass). This "Ask Tony" article from CMS Watch is another data point that supports this theory and also gives some strategies for migrating off the platform.

Just two years ago, there seemed to be so much momentum around MCMS and, with a company as formidable and resolute as Microsoft behind it, MCMS seemed like a really "safe" buy. If it does turn out that Microsoft is shifting their strategy away from MCMS, it looks like the conservative buyer is running out of options. It is not enough to look at market share and company financials anymore. With all the turmoil in the marketplace, buyers need to think about substitutability in their content management technology strategy. And it's not just bankruptcy or sunsetting that you have to worry about. Other concerns are changes in product direction, the escalating support and maintenance costs, and just general product neglect.

Below is a list of questions that should be asked during a CMS purchase:

  • Do the licensing terms offer protection against the vendor discontinuing the product? For example, some smaller vendors put their code in escrow to protect their customers in the event that they go out of business.
  • Does the technology store the data in proprietary formats? If so, are there ways to get your content out?
  • If you decide to move off the platform, how much of the custom integration work will be lost?
  • Is the architecture monolithic or divisible? A component-based architecture with open APIs increases the likelihood that you will be able to gradually move off the platform with less pain by substituting components with better options. Products that run on top of standard open platforms (for example: J2EE or LAMP) have an advantage.
  • What kind of programming skills does it take to maintain their software? Beware of proprietary or rare scripting languages. Those skills are harder to find and also history suggests that products move toward standardization. Just look at how Vignette dragged their customers from Tcl to ASP then JSP. When given a choice of integration technologies, use the more standards based one.
  • Talk to long standing customers. Ask them about migrations to newer versions. Were they required by the vendor? Did they solve problems or add functionality? Was the functionality useful? Was the migration expensive and/or disruptive?

These questions often get overlooked during the excitement of a deal and the focus on feature functionality. It is hard enough to choose the best solution for your organization right now. Thinking about the future and events that you cannot predict and have no control over makes it even harder. But, whether your content is your product or it supports your product, it is too important not to have an exit strategy if things get bad.



Wednesday, December 15, 2004

Open Source FAQ for my Mother and other Non-Digerati

Since we announced Optaros a lot of friends and family have been asking for an explanation of Open Source. In the spirit of re-use, I thought I would publish some of my answers to typical questions. If you are reading this Blog, you may know all this stuff, but you also might be asked similar questions so feel free to send your friends this way.

  • What is Open Source?

    Open Source is a licensing model for software where the software and the source code are distributed without requiring licensing fees. There are a number of Open Source software licenses that have different stipulations. The Open Source Definition, written by Bruce Perens, contains 10 tests to determine whether a software license can qualify as Open Source. Open Source used to be called Free Software (and it still is by some people). The name Open Source was adopted because "Free" had a connotation of being worthless, rather than its intended meaning of Freedom as in Liberty. The concept of Free Software has been around for a very long time.


  • You mean like Linux?

    Linux has been getting a lot of media attention but there are thousands of software applications distributed as Open Source. Today, Open Source Software is mostly used in infrastructure (operating systems such as Linux, web servers that serve web pages, databases) and sub-components of software, but a growing share of Open Source Software is software that non-programmers can use both at work (such as accounting systems, spreadsheets, word processing applications and publishing systems) and at home (such as web browsers, Instant Message clients, and games).


  • Who writes Open Source Software?

    While the stereotype of an introverted high-school student programming in his pajamas is a colorful image, it is inaccurate. Most Open Source programmers are paid to code, either by software companies such as IBM and Oracle that have Open Source contributors on staff, or as consultants and internal technical staff who use Open Source Software and contribute back improvements they make when working for their employer.

  • Why do people give away things for free?

    A lot of Open Source software started out as commercial software and was opened as a competitive business strategy to increase support and services revenues through increased market share. Even proprietary software is given away for free for these same reasons but those deals are kept secret. Proprietary software companies will underwrite Open Source projects to lower the overall price by making the infrastructure that their software runs on free. For example, SAP open sourced a database to make SAP cheaper for their clients to run; this also took revenue away from their competitor Oracle, which sells databases. As for the individual contributors, different people do it for different reasons. Some contribute because any contribution back to the community will be further improved upon and those improvements will be available to the original author. Others contribute because Open Source is the best way to expose their talent.

  • Is Open Source Software any good?

    Open Source is just a licensing model. The quality of the software is up to the developers. Consequently, there is some very good Open Source Software and some very bad Open Source Software. Nevertheless, the Open Source model enables processes and attracts individuals that are capable of rapidly creating very high quality software. The key here is that everything is open. People that are qualified and inspired can contribute, and the results of their efforts are visible for everyone to see. On the second point, people generally step up when their code is open for criticism (and oh yeah, the Open Source community can be very critical) in the same way that restaurants keep their kitchens cleaner if their patrons can see in. Closed software, on the other hand, is a little like sausages - you may not want to know what's inside.

  • What stops some unqualified or nefarious person from writing code that will hurt my computer?

    All projects have their own governance systems. Within each project there is a trusted circle of contributors, called committers, that take responsibility for evaluating every contribution for accuracy and appropriateness before entering it in the code base. Many of the better run projects have special test programs (called unit tests) that automatically check inputs and outputs for accuracy. I know many proprietary software companies that do not apply this level of rigor in their QA.

  • Why are companies using Open Source Software?

    Cost savings are just one reason why companies are interested in using Open Source. The companies that are really benefiting from Open Source Software use it because of the flexibility that it provides. Companies that use Open Source Software do not have to worry about a vendor ceasing to support a product. They can also be more self-sufficient in diagnosing and solving problems with the software. There is nothing worse than knowing of a bug and having a software vendor deny it.

  • Who can I call when it breaks?

    It depends on the software. Some software is supported just like proprietary software. In these cases there are companies that sell maintenance or support contracts that give you access to a help desk. In other cases, there are mailing lists and forums where you can post questions and people are generally very helpful in giving answers. These lists are also searchable so usually you can find your question already answered.

  • How do I try Open Source?

    A really good place to start is to dump Microsoft Internet Explorer and start using the Open Source browser Firefox. I think you will find it is a better product and you will not have to worry about all that annoying Ad-Ware.



Tuesday, December 14, 2004

Drupal v. Plone

There is a good discussion on the Drupal site comparing Plone and Drupal. Many of the contributors have experience with both systems and the general consensus is that Drupal is good to quickly stand up small sites and Plone is better for larger more complicated content management initiatives.

Here is a summary of the comments:

  • Plone is an enterprise-grade CMS that can support complicated workflow and permissioning and high traffic volumes. Drupal is targeted for smaller sites.

  • Drupal's LAMP based architecture is more open than Plone's Zope (all in one application server and database).

  • Python is higher performing and more object oriented, but more difficult to learn than PHP.

  • Plone has fewer but more polished releases than Drupal.

  • Drupal's UI is very intuitive and requires little training.



Announcing Optaros

Today we are officially announcing our company: Optaros. We are a consulting and systems integration firm that helps large enterprises solve IT business problems by providing services and solutions that maximize the benefits of Open Source Software.


If your company is considering Open Source or is looking for an alternative to custom or packaged software for solving business problems, let us know.




Friday, December 10, 2004

Content as Cathedral and Bazaar

I have been thinking about the idea that ECM (one central system for all content) is a myth and I feel like there is a lot of credence to it. As an implementer of CMS, I know how hard content management projects can be if several departments are involved. It is very difficult to get all the stakeholders (creators and consumers of content) at the table and even those who are represented are not happy with the compromises that they have to make. As an architect, designing a universal content model that is both all encompassing and easy to use is very difficult (it is easier in the case of unstructured content such as a document management system where everything is either a document or a folder).

If the project is successful, once the system is in place and people start really using it, they start to learn its potential and come up with innovative new applications and extensions. The list of enhancement requests grows and grows but, the more constituencies that use the system, the harder it is to scope future releases. Users start to feel like they are not being heard and lose a sense of ownership over the system. Interest in the system declines as users direct their creativity to circumvent the system with work-arounds (EMAIL!) that undermine the spirit of the initiative of building a high quality centralized content repository.

In lots of ways I am reminded of Eric Raymond's famous metaphor The Cathedral and the Bazaar. For those who are not familiar with the metaphor, the cathedral is a centrally managed and architected structure and the bazaar is a decentralized organism that is defined by individual motivations adding up to market forces. The reason why I like this metaphor for content is that, in many cases, content is a creative process where people have an idea or knowledge and want to communicate it. A marketplace is created when creators and consumers of content connect and collaborate. As Raymond put it: "a great babbling bazaar of differing agendas and approaches". A cathedral is controlled by the vision of a central authority. It is formal. Nothing gets in without the architect's approval.

Like a society, an organization needs both: a bazaar for employees to navigate, collaborate, and innovate; and a cathedral to show the world its achievement and its aspirations. The bazaar is the internal team workspaces, departmental wikis, and blogs. The cathedral is the publishing process and product. To drive this metaphor into the ground, I picture the cathedral's architect working his way through the bazaar to identify the images and artifacts that he will incorporate into the cathedral.

When an ECM vendor tries to shoe-horn a company's bazaar into their cathedral, they threaten to either make the cathedral too chaotic or to stifle the creative energy that drives the company. In a real world example, if I wrote up some instructions for solving a common problem, and I decided that my peers could benefit, in that moment, I want to publish something. If I have to jump through hoops to get it done, I may just forget about it and let it die on my hard drive, or at most, email it around to the people that I know could use the information. However, if I could publish it to a more permanent place, the information has a better chance of surviving.

So what can be done to make the bazaar less chaotic and intimidating? I think that the answer lies in enterprise search and other technologies that can deal with navigating a repository that was not primarily meant to be navigated. Search is what keeps the Web useful. The litmus test for a good repository is that it improves, rather than degrades, over time and I think the Web, and search, passes that test. I saw a presentation on some very cool search technologies that make me feel confident that things are going to get even better. Another strategy to sort out all of this confusion is the directory model like DMOZ. A while ago, I did some work for the Department of Defense, and learned about DefenseLINK where military sites can register to be indexed and included in a directory. In order to be included in DefenseLINK, you have to meet certain standards (like accessibility) and have someone responsible for the content. I think they also require some metadata although I am not sure how it is used.

These strategies shift the burden of organizing the universe of content away from the content creators at creation time to a centralized organization or technology trying to make sense of everything after the fact. I don't underestimate the difficulty of this responsibility but I think it is more realistic than to expect all content communities (small groups of creators and consumers) to adhere to the same set of rules and expend additional effort when it is not related to the immediate task at hand.


If you have some ideas in this area, please shoot me an email.


Monday, December 06, 2004

Content Management Overview

On the last day of the Gilbane conference, Erik Hartman demonstrated his new site Content Management Overview. The initiative is based on CMSML and has information on 92 content management systems. The site will allow vendors to submit information about their product (moderated by Erik's team) and is designed to be used to narrow down the field of products considered in a CMS selection. I think the tool has the potential to be very useful, especially if the vendors and OS projects participate, and there is reason to believe they will. Erik said that he is frequently asked by vendors why their product is not included. He showed an email from a Vignette executive as an example.

During his presentation, which preceded Erik's, Bob Doyle handed management of the CMSML repository over to Erik. CMS Review will continue to support the CMS comparison tool using data from Hartman's site.

I would encourage people to use the Content Management Overview site. If you disagree with any product claims, you can add comments directly on the site.





Open Source version of Microsoft Content Management Server

I just heard that Artemis Software is distributing an out-of-the-box integration of Microsoft Content Management Server called MCMS.RAPID that is to be installed on top of MCMS 2002. The interesting thing is that MCMS.RAPID is an open source project. Read the license here. I was not able to find an announcement on Microsoft's website.

The email announcement that I received made it seem like this was an Open Source CMS. But in reality it is just a set of customizations that provide a higher starting point for an MCMS development project. You still need to purchase the full MCMS product from Microsoft. I have heard criticism of MCMS for being very expensive to license. This set of extensions might reduce the total cost of ownership and make MCMS a more attractive option. Still, as I noted in another post, Microsoft does not seem to be paying much attention to this product so I would think carefully before purchasing it.





Wednesday, December 01, 2004

Keynote Themes from Gilbane

The first day of the Gilbane Conference had a very interesting Keynote panel of industry analysts who focus on Content Management technologies. The panelists were:


  • Steven Ashley, Senior VP, Research from Baird & Co.

  • Joshua Duhl, Research Director of Content Management from IDC

  • Hadley Reynolds, Vice President & Research Director from Delphi

  • Kyle McNabb, Senior analyst from Forrester Research

  • Alan Pelz-Sharpe, Vice President, North America from Ovum

The panelists were asked to speak for 5 minutes on something that they think the audience should know about trends and direction of ECM. Here were the highlights....

Joshua Duhl kicked it off with the bold statement that ECM is a myth. There has been a big consolidation of companies (and their capabilities) and there is more to come. Vendors will say that they have it all working together but they don't. This idea was echoed by Alan Pelz-Sharpe, who said that companies have so many forms of content and processes that it is unrealistic and impractical to manage it all within one system. On the other side, Kyle McNabb said that ECM is very real and focused on the idea of "one throat to choke." In the Q&A there was a similar topic about best of breed vs. ECM. Kyle McNabb recommended big ECM vendors who have complete solutions with best of breed components. I think the general consensus was that ECM is a vision (not a product) worth pursuing for some companies if the cost/benefit is right.

Steven Ashley, who focuses on financial analysis of CM companies, made the point that all the money has been, and continues to be, in Document Management. He pointed to three huge deals that FileNet made ($10MM, $9MM, and $8MM). He said that Documentum is doing equally well. Oracle and Microsoft are preparing to compete heavily in this area. The interesting thing is that an informal audience poll said that most of the audience was focused on Web Content Management.


Most companies underbudget their CM initiatives. They should expect to spend 2 to 3 times the license cost on services. The established ECM players will have an advantage in integration costs over the platform players (Oracle and Microsoft).
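As a back-of-the-envelope illustration of that rule of thumb, here is a tiny Python sketch. The function name and the $100,000 license figure are made up for the example, and the total assumes license plus services only (no hardware or internal staff time).

```python
def cm_budget_range(license_cost, low=2, high=3):
    """Return (min_total, max_total) when services cost low..high times the license."""
    # Total = license + services, with services = multiplier * license.
    return (license_cost * (1 + low), license_cost * (1 + high))


low_total, high_total = cm_budget_range(100_000)
print(low_total, high_total)  # 300000 400000
```

In other words, under these assumptions a $100,000 license implies a $300,000 to $400,000 project, not a $100,000 one.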

Other interesting points


  • Pretty much everyone agreed on Alan Pelz-Sharpe's point that no one wants to do compliance. Customers just see it as a cost with no upside.

  • The analysts on the panel had no interest in Open Source CMS. The topic did not come up until one of the audience (not me) asked the question. Alan Pelz-Sharpe said that he did a little research into the market and found that some of the Web Content Management products were pretty good and may present a good deal for small to mid size companies. Generally the analysts didn't follow them and avoided recommending them.

  • I am not sure if I heard him right but I think that Joshua Duhl made the point that Microsoft did not offer a content management system. I was surprised because it was such a huge topic last year (remember, Microsoft bought NCompass and started selling it as Microsoft Content Management Server). I talked to a couple friends in the vendor community and learned that MS has basically done nothing with the product since purchasing it and all the buzz is around SharePoint and their document management solutions. Is Microsoft stepping away from their Content Management Server product?

