THE BULLETIN: SEYBOLD NEWS & VIEWS ON ELECTRONIC PUBLISHING

Volume 7, No. 35
May 30, 2002
 
IN THIS ISSUE

INSIDER PERSPECTIVE

by Luke Cavanagh
 
 
Taxonomy Management Becoming a Key Issue in CM

"Water, water everywhere, nor any drop to drink." --Samuel Taylor Coleridge

Barak Pridor, CEO of ClearForest Corporation, estimates that 80 percent of the information in any given enterprise's digital innards consists of unstructured data that, by nature, is very difficult to categorize manually. Pridor's company is one of a growing army of vendors developing software that automates indexing and categorization for the storehouses of digital content that organizations of all types are creating, ingesting, processing and storing every day. There is content, content everywhere; but how do you find the bit you really need at any given time?

For the owner of a growing collection of content, a taxonomy, or formal classification scheme, can help determine where and under what headings content is filed, just as a library stamps a number on the spine of a book and files it in the stacks accordingly. To end users, taxonomies offer a way to browse items in categories of interest, just as you might browse a section of books in a library. Full-text searches are wonderful at pinpointing documents that match very specific queries, but they won't necessarily return related documents that don't contain the search term. In this way, a taxonomy not only brings order to digital collections for those maintaining the content, but also helps keep the collection accessible as it grows.
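To make the contrast between searching and browsing concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the three sample documents, the slash-separated category paths, and the function names `full_text_search` and `browse_category` are our assumptions, not any vendor's data model or API.

```python
# Hypothetical mini-collection: each document carries its text and the
# taxonomy category it has been filed under (a slash-separated path).
DOCS = [
    {"id": 1, "text": "Quarterly earnings rose on strong ad sales.",
     "category": "Business/Finance"},
    {"id": 2, "text": "The central bank held interest rates steady.",
     "category": "Business/Finance"},
    {"id": 3, "text": "New press installs cut makeready time.",
     "category": "Publishing/Print"},
]


def full_text_search(docs, term):
    """Return ids of documents whose text contains the term (case-insensitive)."""
    term = term.lower()
    return [d["id"] for d in docs if term in d["text"].lower()]


def browse_category(docs, category):
    """Return ids of documents filed at the category or anywhere beneath it."""
    return [
        d["id"]
        for d in docs
        if d["category"] == category or d["category"].startswith(category + "/")
    ]


# A search for "earnings" finds only the document that uses that word...
search_hits = full_text_search(DOCS, "earnings")
# ...while browsing the category also surfaces the related document.
browse_hits = browse_category(DOCS, "Business/Finance")
```

In this toy collection the search returns only document 1, while browsing Business/Finance returns documents 1 and 2, because the second was filed under the same heading even though it never mentions earnings.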

Automation at the heart
The most difficult part of the problem lies in automating the categorization process, so that content is filed in the right place without bogging down business processes and without requiring new staff. A handful of vendors are focused on this problem, typically offering some level of semantically aware language processing and a taxonomy creation and maintenance toolkit. The vendors' approaches vary too greatly to compare in any depth here (although we'll be exploring this issue in greater depth in future editions of The Seybold Report), but all businesses need to realize that these tools are coming to the forefront in electronic publishing. Among the names vying for traction in the categorization software field are Autonomy, Sageware, Smartlogic, Semio, Inxight, Yellowbrix, Quiver, Applied Semantics, ClearForest, Engenium, Stratify, Triplehop Technologies and Data Harmony (just to name a "select" few).

Partners are key
One commonality among most vendors in this budding space is pricing in the low six figures for a typical installation (most also have between 10 and 50 customers). Another similarity across nearly all of the vendors with which we've spoken is a general understanding of the need to integrate with commercial content-management systems. The CM market seems to be reaching the same recognition, as agreements with categorization vendors are becoming a regular occurrence---Documentum has licensed Semio's taxonomy structures, Fatwire and EidosMedia have agreements in place with Autonomy, and Stellent reports standard integration offerings with a number of vendors out of the box, including SmartLogic and Autonomy. As we've seen with so many technological challenges in the CM space---reliance on standard databases, the move to J2EE application servers, the deployment of Web services---new functionality tends to become standard quickly among CM vendors once a few have adopted it. It appears that we're at a point where customer needs and the maturity of CM systems and categorization tools are beginning to intersect, which means the time is right for increased partnering and possible acquisition activity between these two markets.

Tips for buyers
Categorization software offerings are far from uniform in functionality, so thorough evaluation on the part of buyers is of utmost importance. Before deciding which offering works best, it helps to try to gauge the accuracy of the indexing results the software will provide. Many vendors say that their semantic processing can automatically map a piece of content to the correct category in a taxonomy as much as 80 percent of the time; off the record, some have told us that 50 to 60 percent is far more realistic. Also, consider the system's learning curve. Many of the available offerings are self-learning systems that grow and adapt over time, learning to recognize content types and language patterns. Try to discern how long it will take before the system can shed its training wheels and recognize content on its own. Also, find out what taxonomy schemes a given system supports out of the box and what will need to be customized. Semio and Applied Semantics, for example, offer support for vertical, industry-specific taxonomies, but many companies may want to develop and use an in-house taxonomy specific to their own organization. Another important factor is the software's facilities for manually altering, expanding and maintaining taxonomies. Administration interfaces and user-friendliness (read: idiot-proofing) are as important here as they are in CM systems. Finally, ease of integration, from both a technical and a human perspective, is a basic consideration. This space is not quite like CM yet, where J2EE and SOAP compliance are requisites for survival; offerings are varied, young, and often proprietary. An eye toward long-term compatibility can save big bucks down the road.
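One simple way to gauge accuracy during an evaluation is to have staff hand-categorize a sample of your own documents and then compare the software's assignments against that sample. The sketch below, in Python, shows the arithmetic only. The document ids, category names and the function `categorization_accuracy` are hypothetical, and it assumes each document belongs to exactly one category, which may not hold for every taxonomy.

```python
def categorization_accuracy(predicted, expected):
    """Fraction of hand-labeled documents the software filed correctly.

    Both arguments map a document id to a single category name.
    Documents the software left uncategorized count as misses.
    """
    if not expected:
        raise ValueError("expected labels must not be empty")
    hits = sum(
        1 for doc_id, category in expected.items()
        if predicted.get(doc_id) == category
    )
    return hits / len(expected)


# Hypothetical evaluation sample: categories assigned by hand...
expected = {
    "doc1": "Finance", "doc2": "Print", "doc3": "Finance",
    "doc4": "Legal", "doc5": "Print",
}
# ...and the categories a tool under evaluation assigned to the same documents.
predicted = {
    "doc1": "Finance", "doc2": "Print", "doc3": "Legal",
    "doc4": "Legal", "doc5": "Finance",
}

accuracy = categorization_accuracy(predicted, expected)
print(f"{accuracy:.0%}")  # 3 of 5 correct, prints 60%
```

Five documents are far too few for a real trial; the sample should be large enough, and drawn from enough of your own content types, that the figure can be set fairly against a vendor's claim.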

But make no mistake about it: The categorization software business is developing as we speak, and the software being created may well be the next big must-buy item in your organization.

*Luke Cavanagh is editor of The Bulletin.

--------------------------------------------------------
The Bulletin
P.O. Box 644, Media, PA 19063
phone: (610) 565-2480; fax: (610) 565-4659
Editor: Luke Cavanagh
Production: Ed Rozecki, Giap Edwards
Send comments to: lcavanagh@seyboldreport.com
--------------------------------------------------------
For subscription information:
voice: (800) 325-3830 or (610) 565-6864
fax: (610) 565-1858
Email: pubsvcs@seyboldreport.com
via Email: $195/year (52 issues)
--------------------------------------------------------

Copyright © 2002 by Seybold Publications. Reproduction or distribution in whole or in part without written permission is prohibited.