Microsoft XML: Faults and Fixes

By Michael Floyd

Golfers have been saying that there's more than one way to make par since long before Tiger Woods came along. XML is a little bit like golf that way—it gives you several paths toward your goal. You can use XML features, the XSL transformation language, or even the Document Object Model (DOM) to solve the same problem. And because XML integrates easily with other technologies, you can often enlist its services to create other solutions.

As a case in point, in developing the Rocket XML Framework (see February 2000's "Beyond HTML"), I've been struggling with including XML and XSL from external documents. In developing a solution, I've explored many of Microsoft XSL's dark corners. I've also discovered many shortcomings in Microsoft's implementation of the current standards. Ultimately, these flaws make it difficult to do common tasks. This month, I'd like to explore one of those shortcomings and show different methods you can use to get around it.

Style Sheet Includes

When developing style sheets, you'll probably find yourself writing the same code for several versions. For example, Rocket uses different style sheets to display the same data in separate browsers. Each style sheet applies the same basic logic to a class of documents: it traverses the document tree and transforms each document instance into HTML. Because various browsers handle HTML differently, Rocket includes several versions of a style sheet, each tuned for a specific version of one of the major browsers (that is, Navigator and Internet Explorer). In addition, a "catchall" generic style sheet outputs a set of minimal HTML for any browser it does not specifically recognize. (Now all of your browser detection code has a purpose!)

When style sheets tune for the nuances between browser types (and sometimes between versions of the same browser), their formatting details typically vary only slightly. Thus, it would be beneficial to write the "static" code once and have a mechanism that lets you suck in different pieces for each browser type. Then each style sheet would simply include the code that handles browser variances. One obvious benefit to this is that you could make general changes to the user interface from a single file, rather than having to modify every style sheet. (In Rocket's case, this could be 30 or 40 style sheets.)

The current XSLT standard (and, indeed, the earlier XSL draft specification) describes the <xsl:include> element for just such a purpose. That is, you can use the <xsl:include> element in a style sheet to include another style sheet. Thus, you could have something that looks like Listing 1.

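This code includes somestylesheet.xsl at the point at which it's referenced. (XSLT 1.0 requires <xsl:include> to be a top-level element, that is, a direct child of <xsl:stylesheet>, so it sits alongside the templates rather than inside one.)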
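As a rough illustration of the kind of markup involved (not the column's own listing), the Python sketch below holds a hypothetical including style sheet in a string and uses the standard library's xml.dom.minidom to find its top-level <xsl:include> elements. Only the file name somestylesheet.xsl comes from the text; the template body and the helper name included_hrefs are assumptions. Python's standard library has no XSLT engine, so the sketch inspects the style sheet but does not run it.

```python
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical including style sheet. Only the file name
# somestylesheet.xsl comes from the column; the template is a placeholder.
STYLESHEET = """<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:include href="somestylesheet.xsl"/>
  <xsl:template match="/">
    <HTML><BODY><xsl:apply-templates/></BODY></HTML>
  </xsl:template>
</xsl:stylesheet>"""


def included_hrefs(xsl_source: str) -> list[str]:
    """Return the href of every top-level xsl:include in a style sheet."""
    root = minidom.parseString(xsl_source).documentElement
    return [
        child.getAttribute("href")
        for child in root.childNodes
        if child.nodeType == child.ELEMENT_NODE
        and child.namespaceURI == XSL_NS
        and child.localName == "include"
    ]


if __name__ == "__main__":
    print(included_hrefs(STYLESHEET))
```

A processor that implements <xsl:include> would treat the included file's templates as if they had been written in the including style sheet at that position.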

Now comes the part where specification and reality part company. Because Rocket currently uses Active Server Pages to generate XML dynamically, it relies on the parser included in Internet Explorer 5. Unfortunately, the XSL parser that comes with Internet Explorer 5 does not support the <xsl:include> element.

Simulating Include with Entities

So what about a workaround? Microsoft suggests that you can simulate an include using a general entity. Recall that general entities in XML let you perform replacements. To reference an entity, you prefix its name with an ampersand and end it with a semicolon. The simplest examples of replacement are XML's predefined entities, which let you handle special characters like angle brackets. For example, when the XML parser sees &lt; it replaces this string with a literal <, which would otherwise be interpreted as markup.

A general entity is more often one that you define yourself. (Note: General refers to the fact that the entity can be used within a document's content, as opposed to a parameter entity, which can be used only within the DTD.) For example, I could define an entity called Product whose replacement string is "The Rocket XML Framework". In the DTD, I write:

<!ENTITY Product "The Rocket XML Framework">

With this defined in the DTD, I can use it in a document that says, "Today, BeyondHTML released &Product;, a framework that lets you add XML capability to your Web site in five minutes."

When the parser sees the ampersand in the preceding text, it knows that everything up to the semicolon is an entity name. At this point, the parser performs the replacement, inserting the string "The Rocket XML Framework" into the text. Furthermore, the processor parses the replacement text as part of the document, which lets you include markup in the replacement string.
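The replacement can be watched in action with any XML parser. The sketch below is a minimal illustration that uses Python's standard xml.dom.minidom as a stand-in for the Microsoft parser. The entity declaration and the sentence come from the text above; the <release> element name and the helper expanded_text are made up for the example.

```python
from xml.dom import minidom

# Internal DTD subset declaring the column's Product entity, followed by a
# one-element document that references it. The <release> element name is
# an assumption made for this sketch.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE release [
  <!ENTITY Product "The Rocket XML Framework">
]>
<release>Today, BeyondHTML released &Product;, a framework that lets you add XML capability to your Web site in five minutes.</release>"""


def expanded_text(xml_source: str) -> str:
    """Return the root element's text after entity references are replaced."""
    root = minidom.parseString(xml_source).documentElement
    # Join every text node in case the parser splits text around the entity.
    return "".join(
        node.data for node in root.childNodes if node.nodeType == node.TEXT_NODE
    )


if __name__ == "__main__":
    print(expanded_text(DOCUMENT))
```

By the time the application sees the text, the reference &Product; is gone and the replacement string sits in its place.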

When you define this entity directly in the DTD, you are creating an internal general entity. However, to simulate an include, you must be able to reference an external file. As you might guess, this type of reference is an external general entity. How might that look in the DTD?

<!ENTITY myStylesheet SYSTEM "http://www.somedomain.com/somestylesheet.xsl">

A declaration like this one, with no NDATA annotation, names an external parsed general entity, which is exactly what we need: the processor would pull in the style sheet and parse it as part of the document. (Unparsed entities, by contrast, refer to images and other non-XML objects that the parser shouldn't touch; a processor that treated the style sheet that way would include it but never process it.) Unfortunately, the XML specification makes reading external parsed entities optional for nonvalidating processors, and the Microsoft XSL processor does not support them. First tee shot out of bounds.

Combining CSS and XSL

In the case of importing formatting styles, I came up with another solution. However, this solution assumes the browser is HTML 4.0 compliant and supports Cascading Style Sheets (CSS). Of course, Rocket makes use of CSS to format the HTML transformations for Navigator 4 and Internet Explorer 4, so this part is easy.

Here's how it works: You develop an XSL style sheet to traverse your XML document and generate your HTML transformations. Next, you create a CSS style sheet to format that HTML. Then, you link the CSS style sheet to your XSL document by placing a <LINK> at the start of your transformation. The <LINK> element, which HTML 4 defines for pointing to external resources like CSS style sheets, lets me maintain all of the styles used to format my documents in one external file. So, to link to somestyle.css, I write my XSL code to look like that in Listing 2.

Now it's time for the games you play when working with both HTML and XML. You see, HTML's <LINK> has no closing tag, which means it is not well-formed XML, and the XSL processor chokes when it sees the tag with no matching end tag. You could indicate an empty element by including a trailing slash, but this has unpredictable results in most browsers. Your other choice is to include a closing </LINK> to make the element well formed. While this is not legal HTML syntax, the browser simply ignores the closing tag. What makes this so distasteful is that as new Internet devices proliferate, they'll likely be less forgiving of illegal syntax than today's bloated browsers.
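The well-formedness problem is easy to reproduce outside the browser. The sketch below feeds the three spellings of the link to Python's standard XML parser, which stands in here for the XSL processor's parser. Only the file name somestyle.css comes from the text; the <HEAD> wrapper, the REL and TYPE values, and the helper is_well_formed are assumptions made for illustration.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Three ways of writing the link inside a style sheet's output template.
# Only somestyle.css comes from the column; the HEAD wrapper and the
# other attribute values are assumptions.
HTML_STYLE = '<HEAD><LINK REL="stylesheet" TYPE="text/css" HREF="somestyle.css"></HEAD>'
EMPTY_ELEMENT = '<HEAD><LINK REL="stylesheet" TYPE="text/css" HREF="somestyle.css"/></HEAD>'
EXPLICIT_CLOSE = '<HEAD><LINK REL="stylesheet" TYPE="text/css" HREF="somestyle.css"></LINK></HEAD>'


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        minidom.parseString(markup)
    except ExpatError:
        return False
    return True


if __name__ == "__main__":
    for label, markup in [
        ("HTML style", HTML_STYLE),
        ("trailing slash", EMPTY_ELEMENT),
        ("closing tag", EXPLICIT_CLOSE),
    ]:
        print(label, is_well_formed(markup))
```

The HTML-style spelling is rejected because the parser reaches </HEAD> while <LINK> is still open; the other two spellings parse, which is why the choice comes down to how browsers react rather than to what the XML parser accepts.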

For those using Microsoft's parsers, there's another solution on the horizon. Microsoft recently released a "Technology Preview" of its XML and XSL processors. The XSL processor seems to support most of the features described in the current XSLT and XPath specifications. This includes <xsl:include>. Unfortunately, the "Technology Preview" does not support <xsl:import>, a related mechanism that brings in another style sheet whose rules the importing style sheet can override, rather than simply including it at the point of reference.

Simulating Import

I said earlier that there's more than one way to make par, and in XML there are usually several. In this case, I want to define entries for a navigation bar in a single XML document. Then from a style sheet, I want to read this document, combine it with the main document, and transform the results into a neatly formatted Web page. This approach lets me set the navigation entries in a single file and propagate them throughout the entire site with minimum fuss.

This problem differs slightly from the style-sheet include problem. Rather than simply including a document, I need to read in a document tree from one document (the navigation document) and combine it with an existing tree structure (the Web page). Once again, the XSLT standard anticipates the need. The element I had in mind is <xsl:import>, although strictly speaking it imports another style sheet, at lower precedence, rather than a source document; the standard's document() function is what reads a second source tree. But as I've already mentioned, Microsoft's XSL processor (even the "Technology Preview" edition) does not support <xsl:import>. This is one feature for which I'd like to lobby.

Another solution (to tide you over in the meantime) makes use of the DOM. Although you may want to groan at the thought of programming, this is easy. The idea is to create two DOM objects, one to represent the XML document, and another to represent the navigation document. Actually, you'll create a third object to represent your style sheet. This method is described in a previous column (see November 1999's "Serving XML with Active Server Pages"). Then you can simply append the navigation document to the root element of the main document, attach a style sheet, and return the results to the client. The ASP fragment in Example 1 shows you how.

Once the three document objects have been created, this script gets the root elements for the navDocument and the xmlDocument, respectively. Then, the script uses the DOM's appendChild method to append the root element of the navDocument to the root of the xmlDocument. Because we are appending the root element of the navDocument, all of its child elements are also included. And as append suggests, the navDocument elements are appended after the last element in the XML document. The resulting document is shown in Example 2.
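The merge step can be sketched without ASP. The Python below is an approximation that uses the standard xml.dom.minidom in place of the Microsoft DOM objects, so it is not the column's script. The names article, SiteNav, and item1 and the text "Topic 1" follow the article; the <title> element, item2, and the helper merge_navigation are invented for the example. The style-sheet step is left out because Python's standard library has no XSLT engine.

```python
from xml.dom import minidom

# Placeholder documents. The names article, SiteNav, item1 and the text
# "Topic 1" follow the column; everything else is invented for the sketch.
MAIN_XML = "<article><title>Sample page</title></article>"
NAV_XML = "<SiteNav><item1>Topic 1</item1><item2>Topic 2</item2></SiteNav>"


def merge_navigation(main_source: str, nav_source: str) -> minidom.Document:
    """Append the navigation document's root element to the main root."""
    xml_document = minidom.parseString(main_source)
    nav_document = minidom.parseString(nav_source)
    # importNode makes a deep copy owned by xml_document; the column's
    # script hands the navigation root to appendChild directly.
    nav_copy = xml_document.importNode(nav_document.documentElement, True)
    xml_document.documentElement.appendChild(nav_copy)
    return xml_document


if __name__ == "__main__":
    merged = merge_navigation(MAIN_XML, NAV_XML)
    print(merged.documentElement.toxml())
```

Because the whole SiteNav subtree is copied and appended, it ends up as the last child of article, with its item elements intact beneath it.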

From the style sheet, it's a simple matter of traversing to article/SiteNav/item1 to display Topic 1.

Returning to the ASP script, the call to Response.Write() wraps xmlDocument.transformNode(xslDocument), which applies the style sheet to the newly formed XML document and returns the transformed result. Again, refer to my November 1999 column for details.

Conclusion

We've looked at XSL, CSS, general entities, and the DOM as different ways of solving similar problems. While this approach may seem to cover disparate subjects, it reflects the changeable nature of XML development. To solve a particular problem, you must be prepared to attack it from many different angles. More importantly, never forget the basic axiom: There's more than one way to make par.

Michael is the author of Building Web Sites with XML from Prentice Hall, and architect of the Rocket XML framework. He is also the publisher of BeyondHTML.com. He can be reached at mfloyd@lifestylesSantaCruz.com.




Copyright © 2003 CMP Media LLC