Building an XML Workbench

Like the workbench in my garage, my Web development workbench is often covered with partially assembled widgets, various loose nuts and bolts, and lots of tools, at least while I'm in learning mode. But once I've evaluated the project and decided which materials and tools I'll need, I need a clean space to work in. So, with a swipe of an arm, I clear off the workbench and neatly place the bare necessities on the bench top. Then, and only then, am I ready to begin.

This month, I'll show you how to set up a workbench of tools to process XML on your Web server. It's a low-end solution. That is, I'm not assuming you have high-speed network access, or that you have administrative privileges on your server. All you need is a 56-kbps modem and support for Java servlets on your server. The tools I describe are portable, meaning you can set up a similar workbench under Mac, UNIX, or Windows. When you're finished installing the tools I describe in this column, you'll have everything you need to take advantage of XML on your Web site. And best of all, most of these tools are free.

Server-Side XML

After many months of examining XML tools, I've come to two conclusions. First, client-side processing of XML will always be doomed to inconsistent support. Let's face it, Web developers still jump through hoops to get many of HTML 4's features to behave consistently in the major browsers. In fact, we can use XML and XSL on the server to solve that very problem. Imagine a system that stores its documents in XML format, then uses XSL to dynamically transform these documents into basic HTML. You can even define an XSL style sheet for each brand of browser you intend to support and, with a little browser detection, serve HTML optimized for that specific browser. In fact, you'll be able to do so with the tools described in this column. Second, I've decided that Java is generally a better choice for processing XML on the server than CGI scripting is. Not only is Java faster, more portable, and better suited to general network programming, but there are also far more XML tools available in Java than in Perl.

To understand how the tools on our workbench interoperate, it might help to look at the big picture. The Java Development Kit and Java Servlet Development Kit from Sun and the JRun Pro Servlet Engine from Live Software form the foundation. Other key components are the XML Parser for Java, LotusXSL, and XML Enabler, which are freely available from IBM. See "Online" for the URLs to get all of these tools.

I also recommend that you grab a copy of Java Servlet Programming by Jason Hunter and William Crawford (O'Reilly). [Editor's Note: For a brief profile of the book, see the News & Notes section in the February 1999 issue.]

Java Development Kit

The Java Development Kit (JDK) is the basis for all Java development, so you'll need to install the JDK before anything else. If you've already installed the JDK, make sure you have the proper version. The IBM tools are mixed in their support of the JDK: The XML Parser for Java supports JDK 1.2, but the XSL processor works only under JDK 1.1, so that will be our base platform.

Also, keep in mind that you may have already installed the JDK as part of a commercial Java development environment. For example, I'm using Symantec's Visual Café for developing code, compiling classes, and so on. As part of the installation, Visual Café automatically installs JDK 1.1. The bottom line: check whether you have a JDK installed and, if so, ensure that it is version 1.1.x. You can run both versions of the JDK side by side, so you don't have to give up the latest update to build this workbench. If you plan to install both versions, check the JDK documentation for configuration details.

If you've downloaded the JDK from Sun's Web site (see "Online"), the installation process varies depending on the platform you're running. The Windows version comes as an executable archive. Double-clicking on the archive file invokes the JDK installer, which creates the directory structure and unpacks the tools and documentation bundles. On Solaris, you'll have to do the unpacking manually. Once you've unpacked the archive, you can delete it to recover the nearly 9MB of disk space that the archive file takes up.

Do not indiscriminately unpack every .zip file you see in the directory tree. Java automatically locates class files that are stored in archives, including JAR (Java ARchive) and .zip files. In particular, you'll find a file named classes.zip in the lib directory, which contains all of the core Java classes. Do not unzip this file.

Depending on whether you've obtained the JDK directly from Sun or through third-party software, you may have to set environment variables. If you've followed the installation procedures for the "raw" JDK you don't need to set CLASSPATH. (See the text box titled "All About CLASSPATH" for general information on setting and using this environment variable.)
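If you're unsure what your class path actually contains, a few lines of Java can tell you. The following is a minimal sketch, not part of any of the tools described here: the class name ClassPathReport and its methods are my own invention, and it is written for a current JDK rather than JDK 1.1. It splits the class path into entries and reports whether each one is a directory, an archive (.jar or .zip), or missing from disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting the class path; not part of the JDK or any tool in this column.
final class ClassPathReport {

    /** Splits a class path string on the platform separator (";" on Windows, ":" on UNIX). */
    static List<String> entries(String classPath) {
        List<String> result = new ArrayList<>();
        if (classPath == null) {
            return result;
        }
        for (String entry : classPath.split(Pattern.quote(File.pathSeparator))) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    /** Labels an entry as "missing", "directory", "archive", or "other". */
    static String describe(String entry) {
        File file = new File(entry);
        String lower = entry.toLowerCase();
        if (!file.exists()) {
            return "missing";
        }
        if (file.isDirectory()) {
            return "directory";
        }
        if (lower.endsWith(".jar") || lower.endsWith(".zip")) {
            return "archive";
        }
        return "other";
    }

    public static void main(String[] args) {
        // java.class.path holds the class path the running JVM was started with.
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.out.println(describe(entry) + "  " + entry);
        }
    }
}
```

An entry reported as "missing" is a likely culprit when classes can't be found, since a mistyped path fails silently.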

To Servlet and Protect

Servlets have become increasingly popular for many reasons. First, they can generate documents on the fly, thus replacing CGI scripts. They also afford the benefits of Java, including built-in support for network sockets, database connectivity, and string manipulation. More importantly, servlets are easily portable to any Java-enabled Web server. In fact, I'm able to develop my servlets on my Windows 95 machine and copy them directly over to my Web server, which is running Apache under Linux.

To use Java servlets, you must first ensure that your server supports them. All of the major Web servers support servlets, but that support may not be enabled. Check with your system administrator or Web-hosting service to determine whether your server supports servlets. If it doesn't, then you'll need to get one of the many servlet engines available. JRun Pro, which I'll describe in a moment, should serve those needs quite nicely.

The servlet API is a standard Java extension and comes as part of the JDK 1.2. However, we're using JDK 1.1, so you'll need to download the Java Servlet Development Kit 2.0 from Sun's Web site; see "Online." Once you have the JSDK, unzip the archive to a directory on your hard disk. Assuming you've installed the JSDK in a directory called "jsdk," you'll need to include the path to the jsdk/lib/jsdk.jar file in your CLASSPATH.
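One quick way to confirm that jsdk.jar really is on your class path is to ask Java whether it can find a servlet class. The sketch below is an assumption-laden illustration: ClassProbe is a hypothetical name of my own, and the code targets a current JDK. It simply tries to load a class by name without initializing it. By default it looks for javax.servlet.http.HttpServlet, the base class for HTTP servlets in the servlet API.

```java
// Hypothetical diagnostic; not part of the JSDK or JRun.
final class ClassProbe {

    /** Returns true if the named class can be located on the current class path. */
    static boolean isAvailable(String className) {
        try {
            // "false" means: locate the class but do not run its static initializers.
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = args.length > 0
                ? args
                : new String[] {"javax.servlet.http.HttpServlet"};
        for (String name : names) {
            System.out.println(name + ": " + (isAvailable(name) ? "found" : "NOT found"));
        }
    }
}
```

You can pass other class names on the command line to check the XML and XSL libraries the same way once they're installed.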

The JSDK also provides a simple Java server, ServletRunner, for testing servlets locally before deploying. However, I'm using JRun Pro version 2.2.1 from Live Software for this purpose. JRun is a server extension that implements the Java Servlet API, and it includes a collection of Java classes that acts as an interface layer between your servlets and your Web server. JRun isn't required, but it has proved to be an extremely useful addition for my servlet development. JRun supports many advanced features including servlet chaining and filtering, dynamic reloading of modified servlets, <SERVLET> tag support, user-session tracking, an integrated JRun Web server, and much more. JRun also improves performance through native code that interfaces directly with your Web server. I should also mention that the basic version of JRun (freely available; see "Online") was selected by the editors of Web Techniques and Web Review as the best Java Tool at the 1998 Web Tools awards. I'll talk a lot about JRun Pro (and other servlet engines) in future projects.

To obtain JRun, you'll need to fill out Live Software's online registration form. Live Software's Web site generates an automatic email message telling you where to download the software. The download file is another executable archive that invokes an installation program. The installer program also launches the JRun Connector Wizard, which guides you through the process of installing connectors between your Web server and the JRun Servlet Engine.

Whichever approach you take, you should test your configuration by running some sample servlets. Do this on both your local development machine and the server.

The XML Processor

The next part of your development platform is the XML processor. Either Microsoft's XML parser or IBM's XML Parser for Java (XML4J) can run with our other tools. I wanted to look at XML4J because it was recently updated to support both the XML 1.0 Recommendation, released in February 1998, and the Document Object Model (DOM) Level 1 specification. The XML4J processor also supports XML namespaces and Simple API for XML (SAX) 1.0. It also includes an XPointer package, which parses XPointer expressions, can generate an XPointer based on a node in the document tree, and lets your application search for nodes referenced by XPointers. (If you're interested in XPointers, I'll discuss them in a future column.) The processor supports nearly 40 encodings as specified in the <?xml encoding=...> declaration, including several variants of UTF, ISO, and EBCDIC encodings. XML4J also supports a feature, "validating generation," that lets an application query a DTD and generate a document with the corresponding structure. All very cool stuff, and necessary for serious XML development.
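To give a feel for the event-driven SAX style, here's a minimal sketch that counts how many times each element appears in a document. It comes with some assumptions: it uses the SAX support bundled with current JDKs (through javax.xml.parsers) rather than XML4J's own classes or the SAX 1.0 interfaces, and the ElementCounter class and the sample document are made up for illustration.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; uses the JDK's built-in SAX parser, not XML4J.
final class ElementCounter extends DefaultHandler {

    private final Map<String, Integer> counts = new TreeMap<>();

    /** Called by the parser once for every start tag it encounters. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        counts.merge(qName, 1, Integer::sum);
    }

    /** Parses the given XML text and returns element name -> number of occurrences. */
    static Map<String, Integer> count(String xml) {
        ElementCounter handler = new ElementCounter();
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
        return handler.counts;
    }

    public static void main(String[] args) {
        // Made-up sample data, loosely in the spirit of a personnel file.
        String xml = "<personnel><person id=\"a\"/><person id=\"b\"/></personnel>";
        System.out.println(count(xml));
    }
}
```

The parser never builds a tree here; it just reports events as it reads, which is why SAX suits large documents. DOM, by contrast, hands you the whole document as a tree you can navigate.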

Installation is a matter of downloading the appropriate version of xml4j.zip from IBM's AlphaWorks Web site and unpacking it into a new directory. Our next tool, LotusXSL, will determine the appropriate version. I'm running version 1.1.1.4. To test the installation from Windows, open up a DOS window and issue the command:

type data\personal.xml

If you've installed XML4J properly, the personal.xml file will appear onscreen. Next, run the Java Runtime Environment (JRE) tool (part of the JDK) with the command line in Example 1.

Note that the JAR files must be specified in the command line. That's because JRE ignores the CLASSPATH environment variable. If all goes well, this command line invokes XJParse, which parses personal.xml and checks the syntax. This command line also regenerates personal.xml. If no error messages are shown, the test has passed and you should be able to display the personal.xml file onscreen again using the DOS type command.
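If you'd like to run the same kind of syntax check from your own code, the idea can be sketched in a few lines. This is not XJParse, and it doesn't use XML4J's classes: it's a hypothetical WellFormedCheck class built on the parser that ships with current JDKs. It reports whether a document is well formed and does no validation against a DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; checks well-formedness only, with the JDK's built-in parser.
final class WellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false if the parser rejects it. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // syntax error in the document
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException("parser could not be set up or input not readable", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<person><name>Pat</name></person>"));
        System.out.println(isWellFormed("<person><name>Pat</person>"));
    }
}
```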

Once you have the processor working, you can start experimenting with some of the tools, including a Channel Definition Format (CDF) editor, a CDF viewer, and SiteOutliner, which scans a Web site and reports its profile in CDF format. If you've installed the Java Swing library, you can also run the Tree Viewer, which displays a tree structure of an XML document.

Adding XSL

The XSL processor I've chosen for our workbench is LotusXSL. The processor uses XML4J to parse an XML document and output a source tree. LotusXSL takes this source tree and creates a result tree, which is used to output a document. IBM is careful to note that LotusXSL is an experimental tool. That's because XSL is still a draft specification and the official W3C XSL recommendation is still pending at the time of this writing. Most significantly, LotusXSL does not support flow objects, a big part of XSL. These flow objects conceptually parallel the formatting objects in Cascading Style Sheets. However, there are still many unresolved issues related to implementing flow objects. So, for the time being, you can transform XML documents to be output as HTML, which can include CSS style rules.
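The source-tree-to-result-tree idea is easier to picture with a tiny transformation in hand. The sketch below makes several assumptions: it uses the javax.xml.transform API found in current JDKs instead of LotusXSL's own interface, and the style sheet follows XSLT 1.0 as later standardized, which differs from the draft syntax LotusXSL implements. The XmlToHtml class and the sample data are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical example; uses the JDK's built-in XSLT processor, not LotusXSL.
final class XmlToHtml {

    // Made-up source document.
    static final String SAMPLE_XML = "<note><to>Reader</to></note>";

    // Made-up style sheet: turns the <note> element into an HTML paragraph.
    static final String SAMPLE_XSL =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/note\">"
        + "<p>Hello, <xsl:value-of select=\"to\"/></p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the style sheet to the XML text and returns the serialized result. */
    static String transform(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSL));
    }
}
```

The XML holds only data, and the style sheet decides how that data becomes HTML, so swapping in a different style sheet changes the output without touching the document.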

To install LotusXSL, download the latest release from the AlphaWorks Web site (see "Online"). I'm using version 1.1.1.6 (second release) for this project. If you're running Windows, you'll get a .zip file that you should unpack into a new directory. The unpacking process extracts the documentation, the source files, and another .zip file containing the binaries. You'll need to unpack this file from the root LotusXSL directory to complete the installation. You can test the installation by opening a DOS window and going to the /testsuite directory, then entering the command line:

test test1

This invokes a batch file that temporarily resets your CLASSPATH and runs the processor on a test case. Assuming you plan to use the processor on your workbench, be sure to add this path string to your permanent CLASSPATH.

During this part of the installation, I ran into a number of problems that may bite you, too. First, I ran out of environment space. The test.bat file works by saving your old CLASSPATH in a separate environment variable. Next, it redefines your CLASSPATH by adding two new JAR files to the path and appending the old CLASSPATH. All of this can eat up environment space, causing you to receive an "out of environment space" error. I solved this by removing the saved CLASSPATH setting and hard-coding the entire CLASSPATH string; see "All About CLASSPATH" for additional details.

Once I'd solved the environment-space problem, I tried running test.bat again. This time I began receiving errors that some of the classes couldn't be found. I carefully checked the CLASSPATH, and verified that all class and JAR files were correctly installed. Eventually, I concluded that the current version of the XSL processor didn't support the most recent XML4J release. The LotusXSL documentation stated that you needed XML4J version 1.1.1.9. However, a newer version, XML4J 1.1.1.4, had just been posted. Despite the apparently lower version number, 1.1.1.4 was a newer release; the unorthodox version numbering scheme only added to the confusion. When I returned to the AlphaWorks Web site two days later to download the older version of XML4J, an update to the XSL processor, which supported XML4J 1.1.1.4, had been posted. Once I had the proper versions installed, things worked seamlessly. I suppose there's something to be said for shrink-wrapped software and bundled solutions.

XML Enabler

The final tool, XML Enabler, is a servlet that takes an HTTP request from a browser, and uses information in the HTTP header to determine which type of browser made the request. The servlet then selects an XSL style sheet from a collection of style sheets, transforms the data into HTML, and sends it back in a response. By customizing different style sheets for various browsers, you can optimize the HTML output for each specific browser. So now, you'll be able to render XML data in virtually any browser. Once you've installed the tools described here, all you have to do is define a mapping between the browser types you want to support and their corresponding style sheets. Of course, you'll have to define the style sheets themselves. I'll tackle that next month. First, let's install the final component in our XML workbench.
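The browser-to-style-sheet mapping can be pictured as a simple lookup keyed on the User-Agent header. The sketch below shows the general idea only and is not XML Enabler's actual code or configuration format. The StyleSheetSelector class, the tokens it looks for, and the style sheet file names are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of browser detection; not how XML Enabler is implemented.
final class StyleSheetSelector {

    static final String DEFAULT_SHEET = "default.xsl";

    // Insertion order matters: Internet Explorer also identifies itself as "Mozilla",
    // so the more specific "MSIE" token has to be tested first.
    private static final Map<String, String> SHEETS = new LinkedHashMap<>();
    static {
        SHEETS.put("MSIE", "ie.xsl");
        SHEETS.put("Lynx", "text.xsl");
        SHEETS.put("Mozilla", "netscape.xsl");
    }

    /** Picks a style sheet name based on the User-Agent header value. */
    static String select(String userAgent) {
        if (userAgent == null) {
            return DEFAULT_SHEET;
        }
        for (Map.Entry<String, String> entry : SHEETS.entrySet()) {
            if (userAgent.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return DEFAULT_SHEET;
    }

    public static void main(String[] args) {
        System.out.println(select("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"));
        System.out.println(select("Mozilla/4.5 [en] (Win95; I)"));
        System.out.println(select("SomeOtherBrowser/1.0"));
    }
}
```

In a servlet, the header value would come from the request object; falling back to a default style sheet keeps unrecognized browsers from getting an error page.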

Assuming you've downloaded the XML Enabler archive (see "Online") and unpacked the file to a new directory, the next step is to add support for your servlet to your Web server. There are many ways to do this depending on your server, so you'll have to rely on your server's documentation for the precise steps. However, most servers support a servlet.properties file in which you can associate your servlet with its class. If you're using the ServletRunner from the JSDK, or possibly the Java Web Server, you'd add the line shown in Example 2(a) to your servlet.properties file.

You'll also want to place the XML Enabler package in your servlets directory. Then, assuming your servlet engine is running, you should be able to access XML Enabler through your browser. You must pass the name of the XML document to be parsed as a parameter in the URL; see Example 2(b).

Conclusion

After examining different approaches to XML, it's clear to me that XML will be used primarily under the covers of your Web server. Client-side processing of XML makes sense only in an intranet where you're certain that designated browsers support XML. I believe that in the future there will be significant advantages to client-side XML in, say, an all-Microsoft environment. For the heterogeneous world, however, server-side XML just makes more sense.

There's a lot to chew on here. But if you're serious about using XML on your Web site, the effort will be worth it. Next month, I'll show you what new magic you can perform with your new tools. Until then, <WaveGoodBye/>.


Michael publishes BeyondHTML.com, speaks on XML in the Ken North Expert Seminars, and serves as Web Techniques' editor at large. He can be reached at mfloyd@lifestylesSantaCruz.com.




Copyright © 2003 CMP Media LLC