Thoughts on Content Management and Open Source.

Wednesday, December 28, 2005

"ZOracle" Part III: Connecting Zope to an Oracle database

Database Adaptors and SQL Methods

If you have ever evaluated Zope, you have probably heard that, in addition to using the included object database (the ZODB), Zope can access most SQL-compliant relational databases. The typical framework for doing this is Database Adaptors and ZSQL Methods. Database Adaptors, commonly called DAs, are plug-in products that give you a connection to a particular database. For example, there is a popular DA for MySQL called ZMySQLDA. Once you have a DA, you can use the Zope Management Interface (ZMI) to create a ZSQL Connection to connect to the database and ZSQL Methods to execute queries against that database. The whole process is pretty well documented here. What you get when you go this route is a published Zope object that gives you a result set based on a query, or a handle on a query that can update the database (using SQL update, delete, or insert commands). ZSQL Methods are most useful if you want to create a page in Zope (written in either ZPT or DTML) that displays information stored in a relational database.

Unfortunately, there are not a lot of DA options for Oracle. The main open source one is DCOracle2, which is no longer active, although many people still use it. My experience with compiling DCOracle2 for Oracle 10g looked like this. Even after that, it was still flaky. The best instructions I found are here, but before you start with DCOracle2, look at SQL Relay's DA. If you have money to spend, you can also look at eGenix mxODBC, which talks to Oracle over ODBC and is not too bad at $120 per server, unless you are running on *NIX. In that case you need to buy an Oracle ODBC driver, which may cost $1,599.00 per server (Windows Oracle ODBC drivers are free). The mxODBC DA is the most configurable DA that I have seen.

However, if you want to do more heavy lifting with a relational database, this framework is a little weak because you probably don't want to manage all your SQL logic in the ZMI, especially if you want to access the data from Python-based classes sitting on a file system (deployment gets difficult here). So the next level in working with relational data is to use a DA and a regular SQL method in your Python code. That might look a little like this (note: all code samples are meant to be illustrative; I have left out important bits that are required for the code to run):


from Products.ZSQLMethods.SQL import SQL
import zLOG
from zLOG import DEBUG

# The methods below belong to the DAO class.

def __init__(self, context, map):
    self._context = context
    #.....


def selectProperties(self, pid):
    # Build the SQL Method; its one argument is propertyId. The WHERE clause
    # is completed by a <dtml-sqlvar> tag for propertyId (not shown here).
    setattr(self, '_selectProperties',
            SQL('_selectProperties', '', CONNECTION_NAME, 'propertyId',
                'SELECT %s FROM %s WHERE SCHEMA_ID = ' %
                (self._fieldList(map), RDBDAO.TABLE_NAME[self._context.id])))

    # Wrap the SQL Method in the acquisition context of the calling object.
    method = self._selectProperties.__of__(self._context)
    return method(propertyId=pid)


def setLocals(self):
    try:
        results = self.selectProperties(self._context.propertyId)
        columns = results.names()

        if len(results) < 1:
            raise Exception("Can't find my database row.")

        # Copy every known, non-null column into the local dictionary.
        for record in results:
            for column in columns:
                if column.lower() in self._propertyTypes:
                    if record[column] is not None:
                        self._rdata[column.lower()] = self._conversion.toZope(
                            record[column], self._propertyTypes[column.lower()])
                    else:
                        zLOG.LOG('RDBDAO', DEBUG, "no value for ", column.lower())
    #.....

In the selectProperties method we create a new method called _selectProperties, which is a SQL Method. Don't get too caught up in the syntax. The only thing I would call out is the use of "<dtml-sqlvar>", which does things like apply proper quoting and escape bad characters when the type is string. The CONNECTION_NAME variable is just a string that matches the name of the ZSQL Connection that you set up to point to the database you want to talk to. If you put the ZSQL Connection right at the root folder, your object will have a reference to it through Acquisition. However, this doesn't happen if the object does not exist within the acquisition hierarchy or is so new that the acquisition context has not yet been set. So we wrap the _selectProperties method in another method called selectProperties, which calls the private _selectProperties method using the context of the calling class that was passed to the __init__ method; hence the syntax __of__(self._context). Then the setLocals method runs the query and puts the results in a local dictionary. The _conversion object contains methods to do data conversion, like handling dates.

Notice how there is no syntax for opening and closing a connection. That is all handled in the background by the ZSQL Connection. This example does not do a database update. If it did, you might see a SQL method that issued a query of "commit." Connection pooling and other configuration depend on the implementation of the DA and tend to be extremely simple: there are very few parameters to adjust, which gives you little control over how your application manages connections. Still, this method of data access goes pretty far, as long as the DA behaves reliably. Where this framework starts to fall down is when you work with really long strings such as CLOBs (Character Large Objects). The problem arises because SQL methods only accept simple SQL statements. The update code would look like this:


import os

def updateProperties(self, propertyId, args):
    # Declare propertyId plus one argument per property being updated.
    setattr(self, '_updateProperties',
            SQL('_updateProperties', '', CONNECTION_NAME,
                'propertyId ' + ' '.join([k for k in args]),
                self.sqlUpdateString(args)))

    method = self._updateProperties.__of__(self._context)
    # The keyword must match the declared argument name (propertyId).
    return method(propertyId=propertyId, **args)

def sqlUpdateString(self, map):
    """This method creates an update statement based on the property map.
    There may be more properties than we will be setting when we execute this statement
    but that is taken care of by the optional argument on the dtml-sqlvar tag.
    """
    sqlString = "UPDATE " + RDBDAO.TABLE_NAME[self._context.id]
    sqlString += " SET " + ", ".join(
        ["%s = %s" % (k.upper(), self._conversion.setValue(k, DBTYPE, self._propertyTypes[k]))
         for k in map])
    sqlString += ", MODIFIED_DATE_TIME=%s" % (
        DBTYPE == 'MySQL' and 'CURRENT_TIMESTAMP()' or 'SYSTIMESTAMP',)
    sqlString += ", MODIFY_INSTANCE='%s:%s'" % (os.getenv('HOSTNAME'), INSTANCE_HOME)
    # The <dtml-sqlvar> tag for propertyId completes the WHERE clause (not shown here).
    sqlString += " WHERE PROPERTY_ID = "
    return sqlString

Here the _conversion class puts in the appropriate <dtml-sqlvar> syntax and does some additional data conversions. In Oracle, a SQL statement can only be up to a certain number of characters (I can't remember how many), and if you try to issue a query like "UPDATE tableA SET fieldA = '[some multiple thousand character string such as the body of an article]'", you will get an error. MySQL tends to be a little more forgiving, but there are limits. In order to set large string values, you need to use "bound variables" so that the values travel separately from the statement text and the update query gets assembled on the database side. DAs and SQL Methods don't do this very well (actually, they don't do it at all).
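To make the difference between inlining a value and binding it concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in (it is not Oracle and does not share Oracle's limits), and the table and column names are made up for illustration.

```python
import sqlite3

# Stand-in database: sqlite3 is used only because it ships with Python.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE article (property_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO article (property_id, body) VALUES (1, '')")

# A long value containing a quote character, like an article body.
body = "It's a long article body. " * 2000

# Inlined: the value becomes part of the SQL text, so the statement grows
# with the data and the quoting has to be handled by hand.
inlined = "UPDATE article SET body = '%s' WHERE property_id = 1" % body.replace("'", "''")

# Bound: the SQL text stays short and constant; the value travels separately.
bound = "UPDATE article SET body = :body WHERE property_id = :pid"
conn.execute(bound, {"body": body, "pid": 1})
conn.commit()

stored = conn.execute("SELECT body FROM article WHERE property_id = 1").fetchone()[0]
print(len(inlined), len(bound), stored == body)
```

The bound statement is the same few dozen characters no matter how large the article body gets, which is the property that matters when the database limits the size of a statement or a literal.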

SQL Relay

To get around this limitation, we used SQL Relay which allowed our Python DAO to talk directly to the database without going through the Zope DA pathway. Although SQL Relay also has a DA, we didn't use it because it does not support bound variables. But the Python client libraries, which we used in our code, do support binding. SQL Relay consists of several components:

  • A set of connection daemons which hold the connection to the database open.
  • A listener, which is a daemon that listens on a specified port and forwards requests to a connection
  • A client that can talk to the listener
  • A cache manager daemon that maintains the query cache and removes stale result sets


You can put the listener on the server with the database or the server with the application server. Our design had everything but the clients sitting on the relational database server. The system is also extremely configurable through editing various XML files.

So now, with SQL Relay, our update code looks a little like this:


from SQLRelay import PySQLRClient

def __init__(self, context, map, connection):
    """The connection object is passed in from DBTransactionManager. The syntax looks like this:
    self._connection = PySQLRClient.sqlrconnection(_CON_INFO['host'], _CON_INFO['port'], '', _CON_INFO['user'], _CON_INFO['password'], 0, 1)

    The user name and password used here are not the database user name and password. They are set up in SQL Relay for clients to use. See the SQL Relay configuration documentation for more
    information.
    """

    self._connection = connection
    #.....

def sqlUpdateString(self, datamap):
    """This method creates an update statement based on the property map.
    toQueryTemplate supplies the placeholder for each column, which is
    bound to a value later in persist.
    """
    sqlString = "UPDATE " + RDBDAO.TABLE_NAME[self._context.id] + " SET "
    params = []
    for k in datamap.keys():
        params.append(" %s=%s" % (
            k.upper(),
            self._conversion.toQueryTemplate(k, self._propertyTypes[k])))
    sqlString += ",".join(params)
    sqlString += " WHERE PROPERTY_ID = %s" % self._context.propertyId
    return sqlString

def persist(self):
    query = self.sqlUpdateString(datamap)
    cur = PySQLRClient.sqlrcursor(self._connection)
    #.....
    try:
        cur.prepareQuery(query)
        for k in datamap:
            if k.upper() == 'BODY':
                body = self._conversion.setValue(
                    DBTYPE, self._rdata[k], self._propertyTypes[k])
                # CLOB values get their own bind call with an explicit length.
                cur.inputBindClob(k.upper(), body, len(body))
            else:
                cur.inputBind(
                    k.upper(),
                    self._conversion.setValue(
                        DBTYPE, self._rdata[k], self._propertyTypes[k]))

        cur.executeQuery()
        if cur.affectedRows() > 0:
            zLOG.LOG('SQLRDAO', DEBUG,
                     "Update %d rows " % cur.affectedRows())
        else:
            zLOG.LOG('SQLRDAO', ERROR,
                     "Database returned error: %s " % cur.errorMessage())
            raise DBError(cur.errorMessage())
    #.......

With this setup, we were able to have a robust DAO that talks to an Oracle database and can handle all sorts of data types, including CLOBs and BLOBs. We also have a configurable database connection framework that can be used to interface with several different databases. If I have an opportunity to work on another Zope project that needs relational database connectivity, SQL Relay (either using the DA or the client libraries) will be the first option that I try.

"ZOracle" Part II: The Solution

In my last post, ZOracle Part I, I described the requirements and some background on a recent project to rewire an existing Zope CMF-based CMS to use an Oracle based relational repository.

As I hinted earlier, the key to the solution was an aspect of the existing architecture that placed all of a content asset’s attributes in a structure based on Zope's PropertySheet. Each object had multiple property sheets that represented groups of attributes, in much the same way Alfresco uses Aspects to allow simple objects to be extended with additional data or capabilities. PropertySheets are normally backed by a Python dictionary and inherit from Persistent, so they get stored in the ZODB when associated with other ZODB persisted objects. What we did was create a new kind of property sheet, called RDBProperties, that was backed by a Data Access Object rather than a ZODB persisted dictionary. The DAO encapsulated all the code to read from and write to the database. This enabled us to experiment with various database connectivity strategies and do comparative testing (see Part III: Connecting to an Oracle database).
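The delegation idea can be sketched in plain Python. This is not the project's RDBProperties class or Zope's PropertySheet API; the class names, the in-memory DAO, and the method names (loosely modeled on getProperty-style accessors) are all assumptions made for illustration.

```python
class InMemoryDAO:
    """Hypothetical stand-in for the real DAO; keeps one row in a dict."""

    def __init__(self, row):
        self._row = dict(row)
        self.dirty = False

    def get(self, name):
        return self._row[name]

    def set(self, name, value):
        self._row[name] = value
        self.dirty = True  # remember that the row needs writing back


class RDBPropertiesSketch:
    """Property-sheet-like object that delegates storage to a DAO."""

    def __init__(self, dao):
        self._dao = dao

    def getProperty(self, name, default=None):
        try:
            return self._dao.get(name)
        except KeyError:
            return default

    def setProperty(self, name, value):
        self._dao.set(name, value)


sheet = RDBPropertiesSketch(InMemoryDAO({"title": "Draft"}))
sheet.setProperty("title", "Final")
print(sheet.getProperty("title"), sheet.getProperty("missing", "n/a"))
```

Because callers only see the property sheet interface, the DAO behind it can be swapped (ZSQL Methods, SQL Relay, or something else) without touching the rest of the application.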

We decided to keep things simple by creating a new database table for every different PropertySheet definition and having a corresponding column for each property. This normalized design was desirable because it made things easier for external applications hoping to make sense of the data. So, if there are 5 asset classes, and 3 possible property sheets to use, there would be three backing tables. We decided to have Oracle manage the primary keys for property sheets with sequences. We stored the unique property sheet ID on the Zope side for retrieval and also captured the Zope-derived object ID in the database so we could reconstruct objects outside of Zope. Other than making sure that we got our data types right and finding a reliable way to talk to the database (see next article), this part of the prototype was pretty straightforward.
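As a rough illustration of the one-table-per-property-sheet layout, the sketch below generates a CREATE TABLE statement from a sheet definition. The sheet names, property names, and the ZOPE_OBJECT_ID column are hypothetical, and sqlite3 stands in for Oracle (so an integer primary key takes the place of an Oracle sequence).

```python
import sqlite3

# Hypothetical property sheet definitions: sheet name -> {property: SQL type}.
SHEETS = {
    "ARTICLE_PROPS": {"TITLE": "TEXT", "BODY": "TEXT"},
    "RIGHTS_PROPS": {"OWNER": "TEXT", "EXPIRES": "TEXT"},
}


def create_table_sql(sheet_name, properties):
    """Build one CREATE TABLE statement: one column per property."""
    columns = ["PROPERTY_ID INTEGER PRIMARY KEY", "ZOPE_OBJECT_ID TEXT"]
    columns += ["%s %s" % (name, sql_type)
                for name, sql_type in sorted(properties.items())]
    return "CREATE TABLE %s (%s)" % (sheet_name, ", ".join(columns))


conn = sqlite3.connect(":memory:")
for sheet_name, properties in SHEETS.items():
    conn.execute(create_table_sql(sheet_name, properties))

# List the tables that now exist: one per property sheet definition.
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
print(tables)
```

The number of tables follows the number of property sheet definitions, not the number of asset classes that use them.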

The only problem that remained was deciding when to write back to the database. Zope is somewhat elusive in this regard. By design, the programmer doesn’t really know when persistent objects are being written to the database. It just kind of happens in the background. We needed to make sure that we kept the relational database up to date, but we also didn’t want to write to the database too often and create a new performance problem. To do this, we extended Zope’s transaction manager, TM (Shared.DC.ZRDB.TM.TM), with a new derived class called DBTransactionManager. This gave us hooks to execute logic at the beginning and end of the transaction, and also when a transaction is aborted. (In Zope, a transaction is defined as what happens from the beginning of the HTTP request to when the response is sent. This is different from a database transaction.) A new DBTransactionManager is created the first time a DAO is requested and then reused in each subsequent DAO instantiation within the Zope transaction. Our DBTransactionManager also had a collection (dictionary) of property DAOs that were used in the current Zope transaction, so within a transaction we could cache values and then wait until the end of the transaction to write back to the database (or not, in the case of an abort). At the end of the transaction, the DBTransactionManager iterated through its collection of DAOs and called their persist method, and then, at the end of it all, called a global commit method, which committed the database transaction.
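The write-back flow described above can be sketched without Zope. The classes below are stand-ins, not the project's code and not the Shared.DC.ZRDB.TM.TM API: the names TransactionManagerSketch, RecordingDAO, get_dao, finish, and abort are assumptions, and a list of strings takes the place of a real database.

```python
class RecordingDAO:
    """Hypothetical DAO that records when it is persisted."""

    def __init__(self, name, log):
        self.name = name
        self._log = log

    def persist(self):
        self._log.append("persist %s" % self.name)


class TransactionManagerSketch:
    """Collects the DAOs touched in one request and flushes them at the end."""

    def __init__(self, log):
        self._daos = {}   # one DAO per property sheet ID
        self._log = log

    def get_dao(self, sheet_id):
        # Reuse the DAO if this transaction has already asked for it.
        if sheet_id not in self._daos:
            self._daos[sheet_id] = RecordingDAO(sheet_id, self._log)
        return self._daos[sheet_id]

    def finish(self):
        # End of the request: write every DAO, then commit once.
        for dao in self._daos.values():
            dao.persist()
        self._log.append("commit")
        self._daos.clear()

    def abort(self):
        # Aborted request: drop cached DAOs without writing anything.
        self._daos.clear()
        self._log.append("abort")


log = []
tm = TransactionManagerSketch(log)
first = tm.get_dao("sheet-1")
again = tm.get_dao("sheet-1")
tm.get_dao("sheet-2")
tm.finish()
print(first is again, log)
```

Deferring every write to the end of the request means each DAO is written at most once per Zope transaction, and an aborted request never reaches the relational database at all.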

In this design, Zope (actually ZEO) still manages concurrency because, as far as it knows, it still owns the objects. In our tests, Zope still complained when two transactions were trying to update the same object at the same time. The design also worked with multiple ZEO clients talking to the same ZEO database and Oracle database. Search and general object maintenance were also still managed within Zope. Whenever a content asset was stored in the repository, it was indexed with portal_catalog. Also, all of the other functionality of the CMS operated normally - essentially unaware that anything unconventional was happening underneath. Only some rigorous testing can answer the question of whether this improves the performance of the application. It will certainly reduce the size of the ZODB and also reduce the frequency with which the ZODB thinks it needs to write data (Zope considers the data contained in the DAO “volatile” and therefore unworthy of persistence). But we did meet the requirement of data being accessible to any technology capable of accessing an Oracle database.

Next: Connecting Zope to an Oracle database

Tuesday, December 27, 2005

"ZOracle" Part I: The Problem

Optaros recently finished a project to build a prototype that adapted an elaborate Zope CMF-based custom CMS to persist content to an Oracle database rather than the ZODB. The reason for doing this was that the ZODB was not performing adequately under the heavy load that the CMS was subject to and was not open to non-Zope technologies that our client wanted to share data with at the database layer. The next set of blog posts will talk about the problem, various solutions, and what we did. These posts are slightly more technical than other posts on this blog and I won’t be insulted if some of the more management types just skim through them ;)

The problem

The system that we were working with has a very large repository (45 GB of text - images and other binary files are stored outside of the ZODB) that is continually being written to (tens of thousands of new objects a day). They use FileStorage, rather than DirectoryStorage, because there are so many objects in the ZODB that the operating system would run out of inodes. Because the database is so big and gets bombarded by so many write requests (the ZODB is effectively single threaded and is optimized for reading rather than writing), the system’s performance is just barely acceptable. There is also a risk of data corruption which would lead to extensive down time which would be disastrous for this mission critical application.

In a Zope CMF based application, everything is stored in the ZODB (except, in this case, binary files, which are stored directly on the file system). This includes objects themselves, version information, history, the search indexes (called portal_catalog), and, to some extent, code. While maintaining the search index represents a significant amount of overhead in this application, the primary target for removal from the ZODB was the actual content objects themselves, because there was a desire to expose the content within the repository (read only) to non-Python applications. Oracle as a repository was particularly desirable because the client owns a site license for Oracle and wants to leverage Oracle’s capabilities for administration, tuning, and backup and recovery.

The system already uses ZEO but technologies that would relieve pressure on the storage tier, such as Zope Replication Services, were tried and failed because of the write-intensity of the application. The right solution would improve performance and store content in fielded data (as in relational tables) rather than the ZODB. Also critical, the solution needed to go in smoothly with as little disruption as possible to the sophisticated and complex application sitting on top. None of the existing solutions seemed to have much promise.

  • OracleStorage was ruled out because, in addition to being somewhat stagnant over the past few years, it fails on the requirement of being open to non-Zope technologies. OracleStorage stores Zope objects in Python pickles, which are serialized Python objects (equivalent to Serializable in Java). Non-Python applications would have a difficult time reading pickles.
  • The newer project APE, an Object-Relational Mapping layer for Zope (like Hibernate in the Java world), looked like a viable option, but earlier prototypes using APE suffered from poor performance. There was also concern about how the underlying caching mechanism would behave under load. The ultimate deal breaker was that the documentation on configuring APE was pretty thin.
  • Another solution, which may still be used as a fall-back, was to have a nightly script that iterates through the ZODB and writes to an Oracle schema. This would solve the problem of having the content available in a relational database, but it would not solve any performance issues and would not safeguard against corruption. This option could be selected in conjunction with OracleStorage if OracleStorage were more actively maintained.
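The point in the first bullet about pickles being hard for non-Python applications to read can be seen in a few lines. This is only an illustration: the asset dictionary is made up, and JSON is used here as a stand-in for a fielded, language-neutral representation such as relational columns.

```python
import json
import pickle

# Hypothetical content attributes for one asset.
asset = {"title": "Quarterly report", "author": "jdoe"}

# What a pickle-based storage writes: a Python-specific byte stream.
pickled = pickle.dumps(asset)

# A fielded, language-neutral representation for comparison.
fielded = json.dumps(asset, sort_keys=True)

print(type(pickled).__name__, len(pickled))
print(fielded)

# Only a pickle-aware reader gets the attributes back.
restored = pickle.loads(pickled)
```

The pickled form mixes the values with opcodes from Python's pickle protocol, so a consumer written in another language would need to implement that protocol before it could find the title or author.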

The solution that we wound up going with took advantage of a particular design characteristic of the system: that all of a content asset's attributes were actually stored outside of the asset in a class derived from Zope’s PropertySheet.

Next: an overview of the solution we went with.

Monday, December 19, 2005

New Optaros White Paper: The Growth of Open Source Software in Organizations

Optaros just released a new white paper that looks at how companies are using open source software. The study is based on a survey of 512 U.S. companies and government organizations. The report observes that companies are starting to use open source for more than infrastructure and browsers. 42% of respondents said they were using open source portals and/or content management systems somewhere within their organization. 16% were using open source in customer relationship management. Read the press release here or the whole report here.

Plone "Distros"

Plone4Artists and New Zealand's "Government Web Guidelines compliant Content Management System" are two examples of a group with specific needs putting together a "stack" of Plone technologies, with some customizations and making it available to a wider community. This is similar to the concept of "Distros" in Linux, which have been very effective in spreading the use of Linux. If this turns into a powerful trend, success will depend on the degree to which the Plone distros keep in sync with the core Plone community. Failure to do so may lead to fragmentation and compatibility issues.

iECM: Interoperable Enterprise Content Management

iECM is a new standard being developed through AIIM - the organization that developed the first definition of ECM. The goal is to define a standard or set of standards that allows different Enterprise Content Management software to work together in a heterogeneous environment. As you would expect, if you have been reading this blog, I am very hopeful for this standard. Such a standard would move ECM from a "one CMS to rule all" vision to a more practical distributed environment where different systems are connected.

Based on this, you can understand the concern that my experience with the registration process raised. After looking at the iECM Blog and seeing nothing, I figured I should register to the mailing list to see what is going on. And that is where the trouble began.

It turns out that, in order to register, you need to fill out a PDF form. I run Fedora Core 4 and the default PDF viewer is Evince. In Evince, it looked like you were supposed to print out the form, fill it out in ink, then send it in. Not seeing a mail address, or a place to sign (why else would they require a mail form?), I knew something was up. I figured it was an active PDF form after working with them on a project for a big insurance company that needed this technology for some sort of compliance.

So I downloaded and installed the Adobe Acrobat Viewer. Using the Adobe Viewer, I saw a submit button! But my hassle was not over. After I filled out the form and hit submit, I got a pop-up asking me to select my mail client. Thunderbird was not on the list. I had to save a file locally and then attach it to an email to the address on the pop-up (copying and pasting the address and subject was not enabled).

This is not my idea of interoperability. Interoperability would be to use a standard HTML web form, not a proprietary file format and a proprietary viewer. If it had to be an Adobe technology, they could have used Macromedia's ColdFusion or JRun and Java. This also hints that this group is really oriented toward document management and not web content management. I am still hopeful that this standard can bear some fruit and become meaningful in the content management industry, but this experience was discouraging.

Thursday, December 15, 2005

Alfresco and Plone

[Note (2/27/2008): Alfresco has improved its web content management capabilities quite a bit since I wrote this post. For a more up to date assessment, check out my more recent review of Alfresco that covers version 2.2]

Alfresco Software frequently describes Alfresco as the first open source Enterprise Content Management (ECM) System. To the extent that they are the first well funded open source project to aggressively invade the territory of industry incumbents like Documentum and FileNet, that is true. However, they are not the first open source project with document management capabilities. While there are a couple of other open source CMS that are designed to do document management (Contineo and Xinco), that part of the open source CMS landscape has been dominated by Plone. What follows is a short analysis of the relative positioning of the two projects.

I recently went to training for Alfresco and really like the software. I was amazed by what the team has done in such a short period of time. The Alfresco team had the benefit of being able to rapidly "assemble" their application using best-of-breed open source components. Equally important, the lessons learned at Documentum seem to have been applied to the design of Alfresco (the development team is largely composed of Documentum alumni; in fact, a lead Alfresco developer was employee of the year in his last year at Documentum).

A particularly compelling aspect of Alfresco is the openness of the architecture. Alfresco supports Microsoft's CIFS (Common Internet File System) protocol that allows you to mount the repository, or a sub-folder of the repository, as a Microsoft Windows Network File Share. Doing so makes it possible for users to unconsciously interact with the CMS by working in their natural ways. Content rules, that are triggered when files are moved in and out of folders, can execute functions like add metadata, start workflow, or send emails behind the scenes. Making the CMS "invisible" like this is a very good way to help ensure adoption. Alfresco also supports WebDAV, is JSR 170 level 1 compliant (level 1 is the watered down, read-only version of the spec which is still useful in integration with other applications), and has a Web Services interface. Support of these standards makes Alfresco very attractive in distributed, heterogeneous architectures which is where I think content management is going. It is a nice departure from the mainstream ECM vision of centralization.

Inside, the application is highly configurable. One interesting feature is the application of the concept of "aspects." If you are a Java programmer, you have probably heard the buzz about Aspect Oriented Programming (AOP). The general idea is that an "Aspect" is a general set of attributes or capabilities that can be assigned to an object without relying on inheritance through the class hierarchy. For some reason, it was easier for me to get my head around applying aspects to content assets than it was for me to figure out AOP. In Alfresco, there are "aspects" like "versionable" or "categorized." These concepts allow content types to be very simple and, if they desire, users can add attributes to a single instance of a content asset. Defining content types and aspects, along with almost all configuration, is done by editing XML configuration files. I think it was smart not to build a sophisticated configuration user interface at this stage of the project. The people that you want to make these customizations should be comfortable editing XML files.
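The idea of attaching aspects to individual content assets, rather than baking them into a class hierarchy, can be sketched as follows. This is not Alfresco's API (Alfresco is Java and configured through XML); the ContentNode class, its methods, and the property names are hypothetical and written in Python only for consistency with the other examples on this blog.

```python
class ContentNode:
    """Hypothetical content asset with a type and a set of aspects."""

    def __init__(self, name, content_type):
        self.name = name
        self.content_type = content_type
        self.aspects = set()
        self.properties = {}

    def add_aspect(self, aspect_name, **defaults):
        # Attach the aspect to this instance only; the class is unchanged.
        self.aspects.add(aspect_name)
        for key, value in defaults.items():
            self.properties.setdefault(key, value)

    def has_aspect(self, aspect_name):
        return aspect_name in self.aspects


memo = ContentNode("memo.doc", "document")
report = ContentNode("report.doc", "document")

# Only the memo becomes versionable and categorized.
memo.add_aspect("versionable", version_label="1.0")
memo.add_aspect("categorized", categories=[])

print(memo.has_aspect("versionable"), report.has_aspect("versionable"))
```

Both nodes share the same simple content type, yet only one of them carries the extra attributes, which is what lets content types stay small.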

Based on their experience of most clients wanting very basic workflow, Alfresco's workflow model is very simple. Workflows are designed using folders to represent states and then using rules to add simple approve/reject choices that can trigger other events. This system would be a little awkward for implementing complex workflows that involve splits and merges and synchronization with other approvals. Alfresco plans to address this by adding BPEL support in later releases.

As mentioned earlier, versioning has been implemented as an aspect so that any content type may be versioned. One small quirk with versioning is that, when you use the CIFS interface and edit the file directly, every save creates a new version, which is a full copy of the file on the file system. This would consume a lot of disk space if you were editing a large video file and saving frequently to prevent data loss.

Alfresco handles all file types, but support for Microsoft Office and PDF formats is the strongest. Using OpenOffice components, Alfresco is able to extract text for the full text search index (powered by Lucene) and transform documents into PDF format. The system is architected to be extended with new "transformers" that can handle other conversions. I have already talked to clients that would want to extend Alfresco in this way.

So, is Alfresco the perfect open source ECM? Not quite. At least not yet. First of all, Alfresco is not all open source. Features like group based access control and clustering are actually "Shared Source" and require monthly subscription fees to use. Without these features, it would be difficult to roll Alfresco out to a large group of users. So you could say that the "E" part of the "ECM" is not open source. The second issue is that, at this point, Alfresco does not handle web content, another critical part of the classic ECM definition. This was intentional. The Alfresco team wanted to start with a solid foundation and then grow into other aspects of content management. The demand for affordable document management solutions, and the scarcity of open source projects that do it, make this a wise choice. I actually don't mind the absence of WCM functionality. CMS that try to do too much are often difficult to use, and it would be better to get it right than throw it in sloppily. Alfresco says it is going to add WCM later and that the team they have assembled to do it understands WCM. I hope that, from their experience at Documentum, they learned the lesson that WCM is not just a matter of managing another type of file.

So how does Alfresco stack up to Plone? There is a large degree of functional overlap between Plone and Alfresco. They both have the functionality necessary for groups of users to manage and share documents: access control, search, metadata, etc. Plone also supports WebDAV and has a mechanism where files are automatically updated on the server when edited with a client application such as Microsoft Word. But it does not support CIFS. Alfresco has the advantage of a content rules framework, which Plone is missing because of its lack of an event model. Alfresco has a better content versioning system. The many companies who have standardized on Java will feel more comfortable working with a Java solution (although their standard application servers may not run Java 1.5, which is required by Alfresco - WebSphere does not). Also, Alfresco, with its open architecture, has more options for integration than Zope based applications.

There are several areas where Plone has a significant edge, the most notable of which is handling web content. Plone is an effective and elegant hybrid of a document management system and a web content management system. Plone's workflow model is more robust than Alfresco's. The other significant advantage that Plone has is its maturity, which has led to a broad install base, excellent documentation (including several professionally published books), and an extensive library of add-on extensions that provide capabilities ranging from a blog to eCommerce.

Based on all this, I think both applications have their uses. I would use Alfresco for a targeted document management solution that would fit into a larger enterprise content management architecture - perhaps as a node, or collection of nodes, in a deployment like the one described in this presentation which Travis Wissinks gave at the KMWorld & Intranets conference last month. I would use Plone to build an all-in-one intranet or extranet where I wanted to mix article, page, and file content and opportunistically deploy new features to improve collaboration and retention. I would also use Plone as a department-level knowledge management system because of features like threaded discussions around content assets, event calendar, and native RSS support.

Full text now available

I have changed my settings so that the full text is now available through RSS. Enjoy!

Geek Social

Last night I attended the Genius Workshop at Christophers Restaurant in Cambridge. In the words of founder and organizer Shimon Rura, the title is meant to be more aspirational than descriptive. The structure is very informal: just a bunch of smart and creative people talking about what they are working on and new ideas. This is the third one of these events that I have attended and every one has been a fun and interesting experience.

The theme that was discussed on our side of the table was Web 2.0 initiatives. Many people in the group were working on projects (jobs, hobbies or both) that explored new and innovative ways to store and use information. On my end of the table there were:


Many interesting topics were discussed, but two stand out. The first is that VCs have a hard time getting involved with these Web 2.0 projects, which require so little capital (the VCs don't have the bandwidth to manage tens of sub-one-million-dollar investments). The second is how these new services can make money when people expect everything for free and the incumbent has very little advantage over a new market entrant willing to give the service away to get market share.

Monday, December 12, 2005

When Open Source?

A couple of weeks ago, I presented in a session called Open Source vs. Hosted CMS Strategies at the Gilbane Conference on Content Management Technologies. Jim Howard from CrownPeak and Gregor Rothfuss from Apache Lenya also presented. Lynda Moulton moderated. My presentation, "When Open Source," is posted on the Optaros website.

I like the concept of the session: alternatives to the traditional software licensing model. I think that the market has moved beyond the oversimplistic "build vs. buy." The primary point in my talk was that the three acquisition models have different strengths and complement each other. Open Source CMS presents the greatest opportunity for small to medium-sized, occasionally updated websites and as a framework for building unique, high-end, content-based applications (see slide 4 in my presentation).

Jim and Gregor both agreed with my model, so those hoping for a debate may have been disappointed. Instead, we all talked about encroaching on the domain of the commercially licensed software market. I guess if we had been joined by someone from a traditional CMS vendor, there might have been some sparks. Maybe next year.

Friday, December 09, 2005

We are hiring!

Optaros is looking for experienced content management professionals to help build the Content Management and Collaboration Practice. For more information, please check out our job announcement.

Wednesday, December 07, 2005

Boston PHP User Group Meeting

Last night Optaros hosted the Boston PHP User Group. Zend provided the pizza. Addison Wesley pitched in door prizes. IBM provided David Boloker, their CTO of Emerging Technology, who gave an excellent overview of IBM's interest and involvement in PHP. He also brought along two colleagues from his engineering team (Adam and Brian - I didn't catch their last names) who showed demos of some PHP-related projects.

In his talk, David discussed the concept of "Situational Applications," where a not-so-technical person could quickly put together an application to respond to an event or capitalize on an opportunity. The technical skill required was compared to that of a good spreadsheet programmer back in the old days (as in Lotus 1-2-3). This analogy resonated with me as someone who used to build database reports in preparation for corporate management meetings. The concept is the same: pull together all sorts of information and crunch it in different ways for decision support. But now, the information can come from so many more sources, including live data feeds.

IBM sees PHP as a key to achieving this agility in application development. PHP can serve as a malleable front end to back-end systems through Web Services or RSS. The demo used to make this point was of a project called QED Wiki, which is based on WakkaWiki. In addition to being a regular wiki, where you can add content, QED is an application development environment where you can script applications based on wiki content and external data. The demo used the scenario of a major hardware store chain that creates situational applications based on weather conditions, because sales depend on the weather (hurricanes = tarps and plywood; snow = shovels and salt). With a little bit of script, similar to writing spreadsheet functions, Adam was able to create a nice little application based on data from NOAA and Google Maps showing the weather conditions at different stores within the chain. I think it was a stretch when David said that a typical store manager could put something like this together, but I see the point.
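
To give a feel for how small the "spreadsheet function" part of such an application can be, here is a rough sketch in Python. It is not QED Wiki's scripting language, which I did not get to look at closely. The store names, the `STOCK_BY_CONDITION` table, and the `stock_recommendations` function are all made up for illustration, and the hard-coded conditions stand in for what would really come from a live weather feed.

```python
# Hypothetical lookup table: weather condition -> products to push.
# The two entries mirror the examples from the demo; the rest is assumed.
STOCK_BY_CONDITION = {
    "hurricane": ["tarps", "plywood"],
    "snow": ["shovels", "salt"],
}


def stock_recommendations(store_conditions):
    """Map each store to the products suggested by its current weather.

    store_conditions is a dict of store name -> condition string. In a real
    situational application this would be filled from an external data feed;
    here it is passed in directly so the sketch stays self-contained.
    """
    return {
        store: STOCK_BY_CONDITION.get(condition, [])
        for store, condition in store_conditions.items()
    }


if __name__ == "__main__":
    # Made-up stores and conditions, for illustration only.
    print(stock_recommendations({"Miami": "hurricane", "Boston": "snow"}))
```

The point of the sketch is only that the logic is a lookup plus a loop. The hard part of a real situational application is getting at the data sources, which is what the wiki environment is meant to hide.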

The next demo was of a project to build a PHP plugin for Eclipse. As someone who has to program in a different language every other project, I have grown to depend on an IDE (Integrated Development Environment) for syntax hints. The PHP plugin, while very early (according to Boloker, our group was among the first to see it), looks very useful. It has all that you would expect: code completion, debugging, navigation.... It was unclear how this fits in with Zend Studio, but Zend is collaborating on the project and has contributed its debugging technology. If I had to guess, I would say that Zend Studio will eventually be based on Eclipse and Zend will sell value-add features such as remote debugging and code deployment. The relationship between IBM and Zend was described as happy and productive with success stories like Zend Core for IBM and this Eclipse initiative.

If any of you are in the Boston area on January 3rd, 2006, you should come over to 60 Canal Street to see Mitch Pirtle talk about Joomla.

Monday, December 05, 2005

Another use for RSS

A little while back, Charlie Wood wrote about using RSS as a mechanism for Lightweight Enterprise Application Integration (EAI). The general idea is that if you need to synchronize data between two applications, you might be able to do it by having each application listen to the other's RSS feed. I like the simplicity of this idea and the fact that it is able to leverage something that many systems are already building in. However, since this is a "pull" technology, it will not support real-time synchronization unless there is some way for the source system to notify the target system to re-check the RSS feed. Otherwise the target system would have to check the source system at fairly frequent intervals. Still, for occasionally updated data (such as content in a CMS), it seems very workable.
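
Here is a minimal sketch, in Python, of what the "listening" side might look like. It is my own illustration and not something from Charlie's post: the `SAMPLE_FEED` content and the `new_items` function are hypothetical, and the sketch assumes that every item in the source feed carries a stable `guid` that the target system can remember between polls.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed published by the source system (for example, a CMS).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>CMS updates</title>
    <item><guid>doc-1</guid><title>Press release</title></item>
    <item><guid>doc-2</guid><title>Product page</title></item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (guid, title) pairs for items not seen on an earlier poll.

    seen_guids is a set owned by the target system. It is updated in place
    so that the next poll only reports items that appeared in the meantime.
    """
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is None or guid in seen_guids:
            continue  # no usable identifier, or already synchronized
        seen_guids.add(guid)
        fresh.append((guid, item.findtext("title", default="")))
    return fresh


if __name__ == "__main__":
    seen = set()
    print(new_items(SAMPLE_FEED, seen))  # first poll reports both items
    print(new_items(SAMPLE_FEED, seen))  # second poll reports nothing new
```

In a real integration the feed would be fetched over HTTP on a timer, and that polling interval is exactly the trade-off described above: shorter intervals get closer to real time at the cost of more requests against the source system.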

Something to think about before investing in a heavy messaging architecture.

Navigating the Open Source CMS Landscape

In November, I presented at the KM World and Intranets conference in San Jose. It was a fun conference. There was a sizable group of CM Professionals and it was great to hang out and enjoy each other's company:

  • Tony Byrne of CMS Watch gave an excellent presentation, "Making Sense of the CMS Vendor Landscape," based on The CMS Report. Tony brings so much experience and rationality into the CMS selection process. You should definitely read or hear him before making any decisions.
  • James Robertson of Step Two Designs talked about selecting a CMS with narratives. This is such a healthy departure from the old, dysfunctional, feature-matrix-based selection process. I talked about the same topic in my presentation but James said it so much more eloquently. After reading his blog and exchanging email, it was great to finally meet him in person.
  • Lisa Welchman from Welchman Consulting gave some valuable homiletic horror stories in her presentation "Lessons Learned from CM Implementations."
  • Although I didn't get to see her presentation ("Making a Business Case for CMS"), I enjoyed meeting Jane McConnell from NetStrategy on a rare visit from France.
  • Speaking of European invasions, Janus Boye of Boye IT from Denmark made the trip and we finally were able to meet face to face. He and Travis Wissink of Anexinet teamed up to talk about content integration through Portals, Content Bridges and Enterprise Service Buses.
  • Jeff Potts from Navigator gave an excellent case study on an intranet initiative at Southwest Airlines. It was a nice mix of technical description, methodology, and the organizational factors that helped the project succeed.

Events like these make the value of a CM Professionals membership obvious.

Another conference highlight was Tony Byrne's production of CMS Idol, where several vendors were each given seven minutes to demo their product and were then scathingly critiqued, American Idol style, by a panel of judges (James Robertson as Simon, Lisa Welchman as Paula, and Janus Boye as Randy). The winner was determined by the audience. Tony hosted another CMS Idol at the Boston Gilbane Conference. It is an excellent concept. In seven minutes you are able to see what the CMS vendor really thinks is important and see the personality of the company. Some companies try to show breadth and range, others highlight simplicity and fitness to task. I also think these events are useful to the vendors because they can see how different demo styles resonate with a sample audience - just like a focus group.

Thursday, December 01, 2005

eZ components vs. Zend PHP Framework

Maarten Manders writes a nice analysis of two application frameworks being developed for PHP: eZ components and Zend PHP Framework. While there is still not enough visibility for a thorough technical comparison, the article does a good job of summarizing the need for a good framework, guessing at the direction of these frameworks, and describing the buzz in the PHP community.

One interesting point is that the eZ components framework has the same web templating framework used in eZ publish which is somewhat like the popular Smarty templating engine. It sure would be great not to have to use a different templating syntax with every CMS.

Zope Development Tools

I just got a chance to view the automatically generated Plone API documentation site: http://api.plone.org/. As someone with a Java background, I was really happy to see such a familiar Javadoc-style format. Of course, since Python is a dynamically typed language, the documentation cannot tell the data type of a method's arguments (one of the primary things I use Javadoc for). In order for that to happen, the documentation generation tool would have to run the program and record what variable types are successfully used as method arguments. The main advantage that the API documentation has over looking at the actual code is that it shows a full inheritance tree and lists methods that are inherited from super-classes.
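
For the curious, the inheritance part is something Python can tell you about itself at runtime. The sketch below is only an illustration of that idea using the standard `inspect` module. It is not how the Plone API site is generated, and the `Base` and `Document` classes and the `inherited_methods` helper are hypothetical stand-ins for real Zope classes.

```python
import inspect


class Base:
    def title(self):
        return "untitled"

    def describe(self):
        return "base object"


class Document(Base):
    def describe(self):  # overrides Base.describe
        return "document"

    def body(self):
        return ""


def inherited_methods(cls):
    """Map each method that cls inherits without overriding to its defining class."""
    result = {}
    for name, _member in inspect.getmembers(cls, inspect.isfunction):
        if name in vars(cls):
            continue  # defined or overridden on cls itself
        # Walk the method resolution order to find where the method lives.
        for klass in cls.__mro__[1:]:
            if name in vars(klass):
                result[name] = klass.__name__
                break
    return result


if __name__ == "__main__":
    print([klass.__name__ for klass in Document.__mro__])
    print(inherited_methods(Document))
```

Note that nothing here says what types `title` or `describe` accept or return, which is the limitation mentioned above.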

Another tool that might be useful for Java programmers looking for a friendly, familiar face is PyDev, a Python plugin for Eclipse. PyDev does source tree navigation, text completion, method definition lookup, auto-indent, parentheses matching, code search, and automatic, real-time error checking just like what you get when you use Eclipse for Java. I find the error checking especially useful because it shows you immediately if your indents are wrong.

For debugging a Zope application, I was recently introduced to the Python debugger PDB. It is not a visual tool integrated into your IDE, but if you run Zope in the foreground (zopectl fg), and you add the line import pdb;pdb.set_trace() into your code, you enter an interactive session where you can step through the code, write code statements, and check variable values. Here is some good documentation to help you out: http://plone.org/documentation/how-to/using_pdb
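
As a small illustration of where the breakpoint line goes, here is a sketch using a plain Python function rather than real Zope code. The `compute_total` function and its `debug` flag are hypothetical. The flag is only there so that the function runs normally unless you ask for the debugger.

```python
import pdb


def compute_total(prices, debug=False):
    """Sum a list of prices, optionally pausing in the debugger first."""
    if debug:
        # Execution stops here and hands control to the interactive prompt
        # in the terminal where the process is running in the foreground.
        pdb.set_trace()
    total = sum(prices)
    return total


if __name__ == "__main__":
    # Call with debug=True to drop into the debugger before the sum.
    print(compute_total([1, 2, 3]))
```

Once you are at the (Pdb) prompt, n steps to the next line, s steps into a call, p followed by an expression prints its value, and c continues execution.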
