Normalization of Web Content
I define normalization of content as a way to break a story into
its constituent parts for the purposes of better management.
Of course, I am influenced by other normalizations in the software
industry, such as the normalization of relational databases and the normalization
of software requirements, which serve a similar purpose.
I first felt the need to normalize content in 1999 while importing an existing website into a Content Management System (CMS).
This phase becomes necessary to interoperate with a CMS, and even to populate one, because
different CMSs handle annotations, hyperlinks, and templates in different ways.
By providing normalization filters, CMS vendors can give authors the freedom to use whatever tools they
need to write the story. In our case, we use Microsoft FrontPage to write the content, and the normalization
process gets rid of all the custom tags and junk Microsoft inserts into the document. Any look and feel elements (page margins,
non-standard formatting) are also eliminated this way.
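As a rough illustration of what such a normalization filter might look like, here is a minimal sketch in Python using only the standard library. It is not the filter we actually run: the class name `NormalizingFilter`, the function `normalize`, and the lists of tags and attributes treated as junk are all assumptions made for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule sets: the article does not list what its filter removes.
# Tags removed from the output (text inside is kept, except CSS in <style>).
DROP_TAGS = {"font", "meta", "style", "span"}
# Look-and-feel attributes removed from any tag that is kept.
DROP_ATTRS = {"style", "class", "bgcolor", "face", "size",
              "topmargin", "leftmargin"}
# Elements that never have an end tag.
VOID_TAGS = {"br", "hr", "img"}


class NormalizingFilter(HTMLParser):
    """Re-emits a document without editor-specific tags and styling."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.in_style = False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True
        # Namespaced tags such as <o:p> are treated as editor junk (assumption).
        if tag in DROP_TAGS or ":" in tag:
            return
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS:
                continue
            if value is None:
                kept.append(name)
            else:
                kept.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join([tag] + kept) + ">")

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        if tag in DROP_TAGS or ":" in tag or tag in VOID_TAGS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Skip CSS text; keep (and re-escape) everything else.
        if not self.in_style:
            self.out.append(escape(data, quote=False))

    # Comments (including editor comments) are dropped because
    # handle_comment is not overridden.


def normalize(html_text):
    parser = NormalizingFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


sample = ('<body topmargin="0"><p class="MsoNormal"><font face="Arial">'
          'Hello <b>world</b></font><o:p></o:p></p>'
          '<!--webbot bot="Timestamp" --></body>')
print(normalize(sample))  # <body><p>Hello <b>world</b></p></body>
```

A rule table like this keeps the filter independent of the authoring tool: supporting another editor would mostly mean adding its tags and attributes to the sets.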
What to do with the links is a problem in content normalization. Like everything else, the Kamat Content CrowBot uses rules
to handle the links. If the link points to an existing, valid, internal resource, it is maintained; otherwise it is
tagged to indicate a link that might become available in the future. The separation
of links from the story has turned out to be a great feature in the production of Kamat's Potpourri. Our customers
may or may not have licensed a linked story. Those with a lot of content (and hence many linkable stories) get a
richly annotated story, while print customers do not get any hyperlinks.
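The link rules described above could be sketched roughly as follows. This is not the CrowBot's actual code: the function names, the `pending-link` markup used to tag unavailable links, and the example paths are hypothetical, and a regular expression stands in for a real HTML parser to keep the sketch short.

```python
import re
from urllib.parse import urlsplit

# Matches simple anchors of the form <a href="...">text</a> (assumption:
# the story has already been normalized, so anchors carry no other attributes).
LINK_RE = re.compile(r'<a\s+href="([^"]*)"\s*>(.*?)</a>',
                     re.IGNORECASE | re.DOTALL)


def is_internal(href):
    """A link is treated as internal if it has no scheme and no host."""
    parts = urlsplit(href)
    return not parts.scheme and not parts.netloc


def render_links(story, available, keep_links=True):
    """Apply the link rules to one story.

    available  -- set of internal paths that exist for this customer
    keep_links -- False for print output, where every hyperlink is removed
    """
    def rewrite(match):
        href, text = match.group(1), match.group(2)
        if not keep_links:
            return text                      # print: plain text only
        if is_internal(href) and href in available:
            return match.group(0)            # valid internal link: keep as-is
        # Otherwise tag it as a link that may become available later.
        return '<span class="pending-link" data-target="%s">%s</span>' % (href, text)

    return LINK_RE.sub(rewrite, story)


story = ('See <a href="/stories/a.htm">story A</a> and '
         '<a href="/stories/b.htm">story B</a>.')
licensed = {"/stories/a.htm"}

print(render_links(story, licensed))
print(render_links(story, licensed, keep_links=False))
```

Because the decision depends only on the `available` set passed in, the same story can be rendered differently for each customer, depending on which linked stories that customer has.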
First Written: Wednesday, September 05, 2001
Last Modified: 1/29/2003