Fun with XSLT - my draft thematic taxonomy

Over the past few days I’ve been doing some work on a new XSLT-based topic navigation for my weblog. I started it over xmas, but had parked it since the new year because of a couple of bugs. My goal was to swap my Radio Userland-hosted OPML-to-HTML transform (see Weblog Archive – by Topic in my menu) with a custom XML-to-HTML transform hosted on my own server. The reason I want to use a custom XML format rather than OPML is that it’s more flexible – I can potentially do lots of clever things with the XML data in the future, using XPath and the like, whereas OPML would be limiting in that respect. I also want to host it on my own server to improve download speed. So I picked up the XML topic nav work again this week and pretty quickly solved the issues that were bugging me at xmas. It’s funny how parking troublesome code for a couple of weeks can clear the mind and make the fog disappear!
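As a rough illustration of the kind of XPath-style query a custom XML file makes possible, here is a minimal Python sketch using the standard library. The element and attribute names (taxonomy, theme, group, post, name, title) and the post titles are made up for the example; they are not the actual file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy file: every element name, attribute name and
# post title below is an assumption made for illustration only.
TAXONOMY_XML = """
<taxonomy>
  <theme name="Writing">
    <post title="Why I write"/>
    <group name="Nanowrimo 2003">
      <post title="Nanowrimo day 1"/>
      <post title="Nanowrimo wrap-up"/>
    </group>
  </theme>
  <theme name="Microcontent">
    <post title="Microcontent basics"/>
  </theme>
</taxonomy>
""".strip()


def titles_in_theme(root, theme_name):
    """Return the titles of every post under the named theme, at any depth."""
    # ElementTree supports a limited XPath subset, including attribute predicates.
    theme = root.find(f"./theme[@name='{theme_name}']")
    if theme is None:
        return []
    # iter() walks the theme's subtree in document order.
    return [post.get("title") for post in theme.iter("post")]


root = ET.fromstring(TAXONOMY_XML)
print(titles_in_theme(root, "Writing"))
```

The same lookup against an outline format would mean walking generic outline nodes and guessing which ones are themes, which is the sort of limitation a purpose-built XML format avoids.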

My ideal is to do the XSL transformation on the server side, rather than the client (browser) side. With so many different browsers on the Web, it would be a nightmare to second-guess how each of them will process the XSL transformation, whereas with a server-side transformation I know exactly how my server will handle the task. Basically it comes down to this: it’s one less thing for the user’s browser to do when reading my site. Why get the client to process the XSL transform if I can do it on my own turf (my server)?

But having just said all that, I currently don’t have the correct server configuration to do the XSLT processing. I’m used to working with IIS at work, so I was able to come up with a nifty ASP solution to transform my XML to HTML. But my weblog runs on Apache, so ASP can’t be used (there may be a plug-in somewhere to get around this, but I wouldn’t bet on it being an easy implementation). Far better that I do it in a language Apache understands, and the obvious one is PHP. So I’ve investigated using PHP to do the transformation and this will probably be my long-term solution. However it requires me to install two things on my server, which I’ve yet to do – Sablotron and Expat. Together they will enable me to do XSLT transformations on my Apache server using PHP. There are other options too: Java/Cocoon and Perl/AxKit, to name a couple I found while searching. However I know very little about those options. If there are any XSLT experts out there who can advise me on the best way to run XSLT transformations server-side, I’d appreciate it.
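The server-side idea itself doesn’t depend on any one toolset. As a rough sketch only, here it is in Python, with the standard library’s ElementTree standing in for a real XSLT processor: this is plain tree-walking code, not XSLT, and not the PHP/Sablotron setup described above. The XML element and attribute names and the sample titles are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical input: the element/attribute names and titles are assumptions.
SAMPLE = """<taxonomy>
  <theme name="Writing">
    <post title="Why I write"/>
  </theme>
  <theme name="Microcontent">
    <post title="Tags &amp; feeds"/>
  </theme>
</taxonomy>"""


def render_topic_nav(xml_text):
    """Turn the taxonomy XML into a nested HTML list, entirely on the server."""
    root = ET.fromstring(xml_text)
    parts = ["<ul>"]
    for theme in root.findall("theme"):
        # Escape text so characters like & stay valid in the HTML output.
        parts.append(f"<li>{escape(theme.get('name'))}<ul>")
        for post in theme.findall("post"):
            parts.append(f"<li>{escape(post.get('title'))}</li>")
        parts.append("</ul></li>")
    parts.append("</ul>")
    return "".join(parts)


html_out = render_topic_nav(SAMPLE)
print(html_out)
```

Whatever does the transforming, the browser only ever receives finished HTML, which is the whole appeal of doing the work server-side.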

My short-term solution is to do the XSL transformations on the client-side, using JavaScript. And yes I know I just talked myself out of doing this a couple of paragraphs up. But I really want to see how my XML transforms look now, and JavaScript is the quickest way. Besides it doesn’t hurt to experiment with both client-side and server-side, to see the differences for myself.

Here is a test page I’ve done: it’s an HTML page with JavaScript (c/o W3Schools) that uses an XSL file to transform a selected section of my XML file into HTML. A caveat: it currently only works with Internet Explorer. I haven’t been able to track down a cross-browser version that will work in Mozilla and Firebird etc. Note that there is also a way to transform straight XML-to-XSL-to-HTML in modern browsers (e.g. IE6) without using JavaScript. However to do that I’d need multiple XML files (or else do some tricky things to bundle multiple XSL files into one XSL file). As the purpose of my topic nav is to have a single XML file to update, and bearing in mind this is a short-term solution, I decided to use JavaScript to do the job.

Hey, what are you trying to achieve with all this XSLT processing?

That’s a good question; allow me to explain. I recently converted my taxonomy to a flatter hierarchy, with a maximum of 3 levels. In line with this, I also decided I only want to categorise each weblog post into one category. This may seem to go against the grain of the latest in weblog taxonomy trends (see Jon Udell’s Dynamic Categories post), but there is a method to my madness. I hope.

I was browsing through an introductory book on Wittgenstein, as you do, and I read that his major philosophical work called the Tractatus is ordered using a decimal numbering system. He lays out his arguments like so:

1 -> 2 -> 3 (first level)
1.1 -> 1.2 -> 1.3 (subordinate to first level)
1.11 -> 1.12 -> 1.13 (subordinate to second level)

My understanding, based on my limited reading of Wittgenstein, is that he structured the Tractatus using seven main theses. For each thesis, he drilled down and analysed it using the above numbering system. Not to sound pompous or anything, but this is similar to what I do with my weblog. I have a dozen or so topics that I regularly write on and it’s tempting to think of these as theses. They’re probably more like themes than theses, but hey it’s just one letter difference 🙂 My recurring themes are things like Universal Canvas and Microcontent.
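The decimal numbering shown above can be read mechanically. Here is a minimal Python sketch, assuming (as in the three example rows) that every extra digit after the decimal point means one level deeper; the function names are made up for the illustration.

```python
def level(number):
    """Depth of a number: "1" is level 1, "1.1" is level 2, "1.11" is level 3."""
    _, _, decimals = number.partition(".")
    return 1 + len(decimals)


def parent(number):
    """The number this one is subordinate to, or None for a top-level number."""
    whole, _, decimals = number.partition(".")
    if not decimals:
        return None  # top-level numbers have no parent
    if len(decimals) == 1:
        return whole  # "1.2" hangs off "1"
    return f"{whole}.{decimals[:-1]}"  # "1.12" hangs off "1.1"


for n in ["1", "1.1", "1.12"]:
    print(n, "level", level(n), "parent", parent(n))
```

Numbers are handled as strings rather than floats so that trailing digits are never lost to rounding.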

To make a long story short, I discovered Wittgenstein’s numbering system wasn’t suitable for my weblog taxonomy. However I did end up with a manifesto of 12 themes that generally revolve around the subject of the Two-Way Web (it’s not restrictive though). I’ve categorised each of my posts into one of those 12 themes. There is one further level below that, so that I can bundle things together if need be – e.g. my collection of posts about Nanowrimo 2003 is categorised as Top > Writing > Nanowrimo 2003.
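The one-category-per-post rule keeps paths like the one above easy to compute. Here is a small Python sketch, assuming each category records a single parent; the lookup table and function name are hypothetical, and only the three category names come from the example above.

```python
# Hypothetical child -> parent table; "Top" is the root of the taxonomy.
PARENT_OF = {
    "Top": None,
    "Writing": "Top",
    "Nanowrimo 2003": "Writing",
}

MAX_LEVELS = 3  # assumption: "Top" counts as the first of the three levels


def breadcrumb(category):
    """Build a path such as "Top > Writing > Nanowrimo 2003" for a category."""
    trail = []
    current = category
    while current is not None:
        trail.append(current)
        current = PARENT_OF[current]  # KeyError if the category is unknown
    if len(trail) > MAX_LEVELS:
        raise ValueError(f"{category!r} is nested deeper than {MAX_LEVELS} levels")
    return " > ".join(reversed(trail))


print(breadcrumb("Nanowrimo 2003"))
```

Because every category has exactly one parent, there is only ever one path to print, which would not hold if posts or categories could live in several places at once.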

To bring this post full circle, currently I’m using Radio Userland’s OPML-to-HTML service to produce the above taxonomy. As I mentioned, I’ve got a draft XML-to-HTML version going that uses client-side XSLT. Here is my list of 12 main categories (a plain html file for now) and here is a list of all my weblog posts categorised using this taxonomy (this one uses XSL).

In future I will add extra bits of data to the XML file (e.g. dates, maybe even the content of the posts). This is another advantage of using XML over OPML. I’ll also eventually introduce some dynamic categorisation, a la Udell. All of this XML exploration may be leading me inexorably towards a tool like Syncato, which stores all its content in XML.

In summary I think my thematic taxonomy will help me keep my weblog writing on topic. And from a reader’s perspective, you will be able to explore any one of ‘12 paths to Two-Way Web enlightenment’ 😉