XML

LAST UPDATED: 08 March 2009 17:02:07 -0600

--------------------------------------------------------------------------------

XSQL SERVLET

Oracle has a nifty servlet for converting the results of SQL queries into XML. It's part of the Oracle XDK (XML development kit). It combines SQL, XML, XSLT, and HTTP into a streamlined method for delivering SQL result sets as XML over the Internet.

Check it out:

Oracle Technology Network

--------------------------------------------------------------------------------

XSL EDITOR

Let's face it--XSL is not the cleanest of scripting languages. Editing XSL documents can get hairy, and quick! Fortunately, one of our readers sent me a link to a neat XSL tool at the VBXML site. It's called XSL Tester and it's pretty nice. Check it out:

XSL Tester

--------------------------------------------------------------------------------

XMLINT

There are never too many ways to validate an XML document. Microsoft has a nice command-line tool, called XMLINT, for parsing and validating XML documents. It is available at the Microsoft Web site (look under XML Downloads, XML Validation Tool). You run XMLINT from a DOS command prompt, passing the name of the XML file as a command-line parameter. Specify -w if you want to check only for well-formedness and skip DTD validation. If XMLINT finds any problems, it generates an informative message and tells you the offending line number. Otherwise, it just echoes the name of the file.

Microsoft XML
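
If you don't have XMLINT handy, you can get a rough equivalent of its well-formedness check from a few lines of script. The following is a minimal sketch in Python (my choice of language, not something XMLINT uses); the function name check_well_formed is made up for this example. It checks well-formedness only, roughly like XMLINT's -w mode, and does not perform DTD validation.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None


good = "<note>\n  <to>Reader</to>\n</note>"
bad = "<note>\n  <to>Reader</from>\n</note>"

print(check_well_formed(good))  # no problems found
print(check_well_formed(bad))   # line, column, and reason for the first error
```

Like XMLINT, the sketch stops at the first problem and reports the offending line number.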

--------------------------------------------------------------------------------

SIMPLIFY XML WITH VBSCRIPT CLASS

Level: Advanced
Categories:
* VBScript
* XML
* ASP
Browsers targeted:
* Internet Explorer 5
* Internet Explorer 4

VBScript has long been considered a fairly rudimentary language, lacking the robustness of the full-featured Visual Basic product and lacking any real tools for even basic object-oriented programming (something that JavaScript does handle). However, with the release of VBScript 5.0 (part of the general Internet Explorer 5 upgrade, although the scripting engine can be installed separately for IE4), at least some of these basic objections are being answered.

One of the most common problems with ASP scripts (which is where most VBScript code is written) is that the combination of ASP and HTML tends to produce highly coupled code that is difficult to change and manage. VBScript 5.0, however, introduced the ability to build classes. Specifically, the Class and End Class keywords let you define encapsulated entities that expose properties and methods (and events in a limited fashion). You can use such classes to simplify your Web page generation radically.

For example, the following VBScript class, CWebPage, exposes two methods: GetPageData, which reads the name of an XML file, the name of an XSL transform file, and a MIME type from the request (all optional) and performs the XML transform; and ShowOutput, which writes the result to the response stream.

<%@LANGUAGE="VBScript" %>
<%
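Class CWebPage
    Dim xmlSource
    Dim xslTransform
    Dim mimeType
    Dim resultText

    ' Reads the "source", "transform", and "mimetype" request parameters,
    ' loads the documents, and stores the output in resultText.
    Public Sub GetPageData()
        Dim sourceFileName
        Dim transformFileName
        sourceFileName = Request("source")
        transformFileName = Request("transform")
        mimeType = Request("mimetype")
        If mimeType = "" Then
            mimeType = "text/html"
        End If
        resultText = ""
        ' Without a source document there is nothing to transform or return.
        If sourceFileName <> "" Then
            Set xmlSource = CreateObject("Microsoft.XMLDOM")
            xmlSource.async = False    ' finish loading before the document is used
            xmlSource.load Server.MapPath(sourceFileName & ".xml")
            If transformFileName <> "" Then
                Set xslTransform = CreateObject("Microsoft.XMLDOM")
                xslTransform.async = False
                xslTransform.load Server.MapPath(transformFileName & ".xsl")
                resultText = xmlSource.transformNode(xslTransform)
            Else
                ' No stylesheet requested: return the raw XML.
                resultText = xmlSource.xml
                mimeType = "text/xml"
            End If
        End If
        Response.ContentType = mimeType
    End Sub

    ' Writes the stored output to the response stream.
    Public Sub ShowOutput()
        Response.Write resultText
    End Sub
End Class
%>
<%
Function Main()
    Dim WebPage
    Set WebPage = New CWebPage
    WebPage.GetPageData
    WebPage.ShowOutput
End Function

Main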
%>

The advantage of using classes in this page can be seen in the Main() function. Classes can be loaded through include directives, so that the entire body of the code consists of the includes and the Main function call (which should be in the calling page simply for testing purposes).

You can then make calls to your server to download XML formatted with the given XSL stylesheet, and the output is sent to your browser as formatted HTML. This makes it ideal for using with browsers that don't support XML natively (everything but IE5) or those that support it but can't handle printing XML documents properly (IE5).

--------------------------------------------------------------------------------

XML: SITES TO SEE

I've had many requests for references to XML Web sites and tutorials. Here are a few that I always start with when I'm looking for XML information on the Web:

W3C: Official XML Specification http://www.w3c.org/xml

The annotated (and much less confusing) specs http://www.xml.com/axml/axml.html

Tools, news, articles, etc. http://www.xml.com, http://www.xml-zone.com/

Tutorial http://www.projectcool.com/developer/xmlz/

Finally, I have to highly recommend you do NOT visit these sites--you may no longer have a reason to read my tips!

--------------------------------------------------------------------------------

RETRIEVING RESPONSES AS XML

Level: Advanced
Categories:
* VBScript
* ASP
* XML
Browsers targeted:
* Internet Explorer 5
* Internet Explorer 4
* Internet Explorer 3
* Netscape Navigator 3
* Netscape Navigator 4

The ASP Request object can be confusing to beginning Web programmers. One common task is retrieving all of the values sent in a form or query string without knowing precisely what to expect. The Request object appears to be a single collection of name/value pairs, but in fact it's more complex. It's actually an interface that exposes several distinct subcollections, including these four:

* QueryString--This retrieves all of the name/value pairs that were sent in the URL query string or were sent as part of a form with its method set to GET.
* Form--This retrieves all of the name/value pairs that were sent through a form via a POST method.
* Cookies--This retrieves all of the cookies that have been defined for this page.
* ServerVariables--This retrieves the server variables that were sent as part of the HTTP header or are maintained by the server.

One problem that I frequently encounter is a need to load the contents of the Request object into an XML document that I can then pass into an XSL filter. Knowing that the Request object supports all four collections, I wrote a small routine that queries each collection in turn and turns the results into an XML document:

<%@LANGUAGE="VBScript"%>
<%
'GetRequestKeys.asp
' Builds a <keys> document with one <collection> element per Request collection.
Function getRequestXML()
    Dim xmlDoc
    Set xmlDoc = Server.CreateObject("Microsoft.XMLDOM")
    xmlDoc.loadXML "<keys/>"
    setKeysCollection xmlDoc, "querystring"
    setKeysCollection xmlDoc, "form"
    setKeysCollection xmlDoc, "cookies"
    setKeysCollection xmlDoc, "servervariables"
    Set getRequestXML = xmlDoc
End Function

' Appends a <collection id="..."> element holding one <key> element per entry.
Function setKeysCollection(xmlDoc, collectionName)
    Dim items, collectionNode, keyNode, key
    Set items = Eval("Request." & collectionName)
    Set collectionNode = xmlDoc.createElement("collection")
    collectionNode.setAttribute "id", collectionName
    For Each key In items
        Set keyNode = xmlDoc.createElement("key")
        keyNode.setAttribute "id", key
        ' Read the value from this collection, not from Request as a whole.
        keyNode.setAttribute "value", items(key)
        collectionNode.appendChild keyNode
    Next
    xmlDoc.documentElement.appendChild collectionNode
    Set setKeysCollection = collectionNode
End Function

Function main()
    Dim requestDoc
    Set requestDoc = getRequestXML()
    Response.ContentType = "text/xml"
    Response.Write requestDoc.xml
End Function

main
%>

--------------------------------------------------------------------------------

XML ZONE

The XML Zone is an excellent Web site for XML resources. It displays news that is updated daily, contains links to some of the best XML stuff on the Internet, and posts articles from XML Magazine. It also has a feature called Ask The XML Pro that lets you post a question and browse answers to previously asked questions. (Topics are even categorized for easy browsing.)

XML Zone http://www.xml-zone.com/

--------------------------------------------------------------------------------

XML VOCABULARY

You may have heard the term XML Vocabulary and wondered what it means. An XML Vocabulary is a PUBLIC XML DTD that is used within an industry. One of the fundamental design goals of XML has always been to facilitate the creation of a common, non-proprietary medium for the exchange of data. In the XML world, this means the creation of industry-standard DTDs, also known as Vocabularies. Some examples are the Chemical Mark-up Language (CML) and Electronic Business XML (ebXML).

--------------------------------------------------------------------------------

XML VIA HTTP

HTTP is a great protocol for exchanging XML files over the Internet. You typically associate HTTP with transferring HTML files from a Web server to a browser. However, the HTTP protocol is not limited to HTML. HTTP works just as well for any text-based data, including XML.

Using HTTP to transfer XML data allows you to make use of existing Web servers and network infrastructure, avoid firewall issues (HTTP typically uses the well-known port 80), and reuse server-side technologies like CGI and Java Servlets with which you may already be familiar. Exchanging your XML data can be as simple as an HTTP PUT or GET!
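
To give a feel for how little is involved, here is a minimal sketch in Python (my choice of language for illustration) that builds an HTTP PUT request carrying an XML body. The URL and the <item> payload are made up for the example, so the line that would actually send the request is left commented out.

```python
import urllib.request

# Hypothetical endpoint and payload, used only for illustration.
URL = "http://example.com/inventory/item42.xml"
payload = '<?xml version="1.0" encoding="UTF-8"?><item><qty>3</qty></item>'

request = urllib.request.Request(
    URL,
    data=payload.encode("utf-8"),
    # The content type tells the server that the body is XML, not HTML.
    headers={"Content-Type": "text/xml; charset=utf-8"},
    method="PUT",
)

print(request.get_method(), request.full_url)

# Sending is one more call; it stays commented out because the URL is made up:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

A GET works the same way: drop the data and method arguments and read the XML from the response body.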

--------------------------------------------------------------------------------

XML TERMINOLOGY--WELL FORMED VS. VALID

As with any technology, XML has its share of terminology and buzzwords. Two common terms you'll often hear in relation to XML documents are well-formed and valid. A well-formed document is one that adheres to the syntax rules laid down by the XML spec. Among other things, this means that the element names are all legal, every start tag has a matching end tag, and elements are properly nested. In addition to being well formed, a valid XML document must adhere to the structural constraints defined by a DTD.

More on DTDs later.

--------------------------------------------------------------------------------

XML TAG NAMES

It's always a good idea to develop standards for naming entities in a document structure. XML is no exception.

Large, complex documents full of XML mark-up can get ugly in a hurry. Although it's not specified by the XML spec, a de facto standard for naming XML tags is to lowercase the first letter and uppercase the start of additional words. For example:

<thisIsATag></thisIsATag>

Make your XML files easier to maintain by picking a tag-naming standard and sticking to it!

--------------------------------------------------------------------------------

XML TAG NAME GOTCHAS

The XML spec states that the string 'xml', or any combination of upper- and lowercase letters x,m,l at the beginning of a tag name, is "reserved for standardization." This means that tag names like <xmlFileName> and <XMLtag> should not be used. Although most XML parsers will allow tag names that start with 'xml', you should avoid it.

It's also legal to use a colon in a tag name, but it's not recommended. The XML spec reserves the use of the colon for namespaces. For instance, a tag using a namespace indicator would be of the form

<namespace:tagName>

Typically, an XML parser will not throw up a red flag if you use colons in your tag names, but using them is nevertheless a bad habit and should be avoided.

If you aren't sure what a namespace is, we'll have more on that later. For now, avoid using colons and the word 'xml' in your tag names!
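
If you want to enforce these two rules in your own tooling, the check is easy to script. This is a small sketch in Python; the function name tag_name_warnings and the wording of the warnings are my own inventions, and the check covers only the two gotchas from this tip, not the full XML naming rules.

```python
def tag_name_warnings(name):
    """Return a list of warnings for a proposed XML tag name."""
    warnings = []
    # Any mix of upper- and lowercase x, m, l at the start is reserved.
    if name.lower().startswith("xml"):
        warnings.append("starts with 'xml' (reserved for standardization)")
    # The colon is reserved for namespace prefixes.
    if ":" in name:
        warnings.append("contains ':' (reserved for namespaces)")
    return warnings


for candidate in ["xmlFileName", "XMLtag", "order:item", "fileName"]:
    print(candidate, tag_name_warnings(candidate))
```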

--------------------------------------------------------------------------------

XML SCHEMAS

When I hear the word schema, I typically think of relational databases, table columns, and field types. In the XML world, schema refers to using XML instead of DTDs to define the structure of XML documents. XML Schema is a soon-to-be-released standard from the World Wide Web Consortium. It has its roots in four other standards:

* XML-Data
* Data definition markup language (DDML)
* Document content definitions (DCDs), based on Resource Description Framework (RDF)
* Schema for object-oriented XML (SOX)

All of these standards are similar in that they define a way to replace DTDs with XML. You can read more about the standards and XML Schema at the W3C's Web site:

http://www.w3.org/

--------------------------------------------------------------------------------

XML SCHEMA SAMPLE

Today's tip gives you a taste of what an XML schema looks like:

For the DTD:

<!DOCTYPE Name [
<!ELEMENT Name (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>
]>

the XML schema would be:

<schema xmlns="http://www.w3.org/1999/XMLSchema">
<element name="Name" type="NameType"/>
<complexType name="NameType">
<element name="First" type="string"/>
<element name="Last" type="string"/>
</complexType>
</schema>

This is a pretty simple example--we'll get into more detail later!

--------------------------------------------------------------------------------

XML POCKET REFERENCE

For those of us with major-memory-impairment (MMI), "The XML Pocket Reference" is a great book to keep handy. It contains a tutorial for learning the basics, and it's a great reference for the XML and XSL specifications. Best of all, it costs only about eight bucks!

XML Pocket Reference
by Robert Eckstein
O'Reilly & Associates, October 1999 (107 pages)
http://www.amazon.com/exec/obidos/ASIN/1565927095/tipworld

--------------------------------------------------------------------------------

XML NOTEPAD

After you've been using XML for a while, Notepad starts to lose its luster as an editor. Fortunately, there are several nice XML editors available to make our life a little easier. Among them, Microsoft XML Notepad has a key feature that always gets my attention: It's free! XML Notepad provides a clean, simple-to-use interface for creating XML documents. It has a pane on the left side that displays the XML in a tree structure and a pane on the right side that lets you edit content.

Microsoft XML Notepad http://msdn.microsoft.com/xml/notepad/intro.asp

--------------------------------------------------------------------------------

XML MAGAZINE ONLINE

You've probably seen a copy or two of XML Magazine--it's one of the leading periodicals for XML info. But have you been to its Web site recently? The company publishes most of its articles there, creating a great source of XML information that's available even after your coworker walks off with your copy of the magazine.

XML Magazine http://www.xmlmag.com/

--------------------------------------------------------------------------------

XML MACROS

Internal XML entities are basically macros for XML. You can define an internal entity in a DTD, like this:

<!ENTITY byline "by John Doe, (c) 1996">

and then reference it in your XML document:

<Author> &byline; </Author>

The XML parser is required to replace every occurrence of the entity reference (&byline;) with the text defined by the ENTITY declaration in the DTD. Therefore, the parser would return the markup like so:

<Author> by John Doe, (c) 1996 </Author>

This is useful if you have a piece of text that will be repeated throughout your XML document.
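
You can watch the substitution happen with any parser that reads the internal DTD subset. The following is a minimal sketch using Python's standard ElementTree module (Python is my choice here for illustration); the entity is declared in an inline DOCTYPE so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

document = """<!DOCTYPE Author [
  <!ENTITY byline "by John Doe, (c) 1996">
]>
<Author> &byline; </Author>"""

# The parser expands &byline; before the application ever sees the text.
author = ET.fromstring(document)
print(repr(author.text))
```

The application never sees the entity reference itself, only the replacement text.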

--------------------------------------------------------------------------------

XML IS CASE SENSITIVE

You probably know by now that XML (unlike HTML) is case sensitive. What this means is that <starttag> and </STARTTAG> do not match as a pair of start-end tags. One of the consequences of XML's case sensitivity is that keywords have to be capitalized. Whenever you use keywords like DOCTYPE, ELEMENT, and ATTLIST, they must be in upper case.

--------------------------------------------------------------------------------

XML IN YOUR PALM

If I ever lose my head, I will know where to start looking for it... in my Palm Pilot. I wouldn't know my own phone number if I didn't have a trusty little PDA keeping track of it. Needless to say, when I ran across an article about using XML with a Palm Pilot, I was a happy puppy. Norman Walsh, a staff engineer with Sun Microsystems, has written a great article about synching your Palm database with other desktop applications using XML. Check it out:

XML from Your Palm http://www.sun.com/software/xml/developers/palm/

--------------------------------------------------------------------------------

XML IN THE INDUSTRY

A clear indicator of XML's presence in the market is the widespread support from industry leaders. The "big boys" in the field have provided some great resources and tools for XML developers. Here are just a few worth checking out:

Microsoft http://msdn.microsoft.com/xml/default.asp

Oracle http://www.oracle.com/xml

IBM http://www.ibm.com/developer/xml

Sun http://www.javasoft.com/xml

--------------------------------------------------------------------------------

XML ESCAPE

The less-than (<) and ampersand (&) symbols can appear in a document only as markup delimiters. If you need to use one of these symbols as content in an XML document, you have to "escape" it by using &lt; for < and &amp; for &. (In the next tip, I'll show you an exception to this rule.)
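
If you generate XML from a program, let a library do the escaping rather than doing it by hand. As a small illustration, here is a sketch using the escape and unescape helpers from Python's standard library (Python is my choice of language; the sample string is made up). Note that escape also converts the greater-than sign to &gt;, which is harmless.

```python
from xml.sax.saxutils import escape, unescape

raw = "if a < b && b < c"

# escape() replaces &, <, and > with their predefined entity references.
escaped = escape(raw)
print(escaped)

# unescape() reverses the substitution.
print(unescape(escaped) == raw)
```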

--------------------------------------------------------------------------------

XML ENCODING

Recently, you received tips on using the encoding attribute to tell an XML processor the character encoding your XML document uses. For example, the following XML declaration indicates that the document uses UTF-8, an encoding built on 8-bit units:

<?xml version="1.0" encoding="UTF-8"?>

The funny thing is, the XML processor has to know what the encoding standard is before it can read any of the document, including the first line, which specifies the encoding being used. Sounds like the proverbial chicken-and-egg problem. XML processors get around this by reading the first few characters (which should always be <?xml) and matching them against their UTF-8 and UTF-16 values. Using this little trick, XML processors can determine whether the characters are 8-bit or 16-bit and read the rest of the <?xml declaration. Of course, I wouldn't rely on the processor auto-detecting the character encoding. It's always a good idea to remove any ambiguity and include the encoding attribute.
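
The sniffing trick can be sketched in a few lines. This is a simplified illustration in Python, not the code of any real parser: the function name sniff_encoding_family is my own, and it ignores cases such as UTF-32 and EBCDIC that Appendix F of the XML spec also covers.

```python
def sniff_encoding_family(first_bytes):
    """Guess the encoding family from the first bytes of an XML document."""
    # A byte order mark, when present, settles the question immediately.
    if first_bytes.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (byte order mark)"
    if first_bytes.startswith(b"\xfe\xff"):
        return "UTF-16, big-endian (byte order mark)"
    if first_bytes.startswith(b"\xff\xfe"):
        return "UTF-16, little-endian (byte order mark)"
    # Otherwise, look at how the characters "<?" are laid out in memory.
    if first_bytes.startswith(b"\x00<\x00?"):
        return "UTF-16, big-endian (no byte order mark)"
    if first_bytes.startswith(b"<\x00?\x00"):
        return "UTF-16, little-endian (no byte order mark)"
    if first_bytes.startswith(b"<?xm"):
        return "8-bit, ASCII-compatible (read the encoding attribute next)"
    return "unknown (UTF-8 is assumed when there is no declaration)"


declaration = '<?xml version="1.0"?>'
print(sniff_encoding_family(declaration.encode("utf-8")))
print(sniff_encoding_family(declaration.encode("utf-16-be")))
```

Once the processor knows the family, it can read the rest of the declaration and pick up the exact encoding name from the encoding attribute.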

--------------------------------------------------------------------------------

XML DECLARATIONS

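XML documents should always begin with an XML declaration. But is it required? Nope.

The following is a well-formed XML document:

<?xml version="1.0"?>
<person>
<name>john</name>
<age>10</age>
</person>

and so is this:

<person>
<name>john</name>
<age>10</age>
</person>

Either way, the document needs a single root element (here, <person>) that contains everything else.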
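
You can confirm this with any parser. In the following minimal sketch (in Python, my choice for illustration; the <person> wrapper element is an assumption added for the example), the document parses with or without the declaration. What the parser does insist on is a single root element, so two sibling elements at the top level are rejected.

```python
import xml.etree.ElementTree as ET

with_declaration = (
    '<?xml version="1.0"?><person><name>john</name><age>10</age></person>'
)
without_declaration = "<person><name>john</name><age>10</age></person>"

# Both forms parse to the same tree.
for text in (with_declaration, without_declaration):
    root = ET.fromstring(text)
    print(root.tag, [child.tag for child in root])

# Two top-level elements are not a well-formed document.
try:
    ET.fromstring("<name>john</name> <age>10</age>")
    two_roots_ok = True
except ET.ParseError as err:
    two_roots_ok = False
    print("not well-formed:", err)
```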

--------------------------------------------------------------------------------

XML COMMENTS

XML borrows its commenting style from HTML. The comment starts with <!-- and ends with -->.

Here's an example:

<!-- this is a comment -->

You can put comments in DTDs and XML documents. An XML parser is required to ignore everything between the start and end comment delimiters. There are, however, a couple of gotchas with comments: You cannot put a double dash (--) within a comment block, and you cannot put a comment inside markup (within a tag, for example).
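
Both gotchas are easy to demonstrate. Here is a small sketch in Python (my choice of language for illustration; the sample documents are made up) that feeds three documents to a parser and records which ones it accepts.

```python
import xml.etree.ElementTree as ET

samples = {
    "plain comment": "<doc><!-- this is a comment --><a/></doc>",
    "double dash inside comment": "<doc><!-- count-- until zero --><a/></doc>",
    "comment inside a tag": "<doc><a <!-- note --> /></doc>",
}

results = {}
for label, text in samples.items():
    try:
        ET.fromstring(text)
        results[label] = "ok"
    except ET.ParseError:
        # The parser refuses the document instead of guessing what was meant.
        results[label] = "rejected"
    print(label, "->", results[label])
```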

--------------------------------------------------------------------------------

A BIZTALK BOOK WITH AN EARLY LOOK AT BTS

Consider reading "Understanding BizTalk," by John Matranga, Stephen Tranchida, and Bart Preecs. Extensively researched and reviewed for accuracy (at the moment, anyway) by several Microsoft insiders, the book offers the best public look to date at Microsoft's ideas for how XML fits into its selection of server-side solutions for business-to-business and business-to-consumer sales. There's also an advance look at BizTalk Server (BTS), and though it will probably change before final release, the early peek is tantalizing.

"Understanding BizTalk"
by John Matranga, Stephen Tranchida, and Bart Preecs
Sams Press
ISBN 0672317877
http://www.amazon.com/exec/obidos/ASIN/0672317877/tipworld

--------------------------------------------------------------------------------

XML BOOKS

One of our readers dropped me a line to recommend the following two books. I've not checked them out yet, but he tells me these are great as introductory material for the newbie and as a reference material for the more experienced XML developer.

XML: A Primer (2nd Edition)
By Simon St.Laurent
IDG Books Worldwide, 9/1999
ISBN: 076453310X
http://www.amazon.com/exec/obidos/ASIN/076453310X/tipworld

XML Unleashed
By Michael Morrison, David Brownell, and Frank Boumphrey
Sams, 12/1999
ISBN: 0672315149
http://www.amazon.com/exec/obidos/ASIN/0672315149/tipworld

--------------------------------------------------------------------------------

XML AND JAVA SERVLETS

Servlets are the Java answer to CGI. You can specify a servlet as the target of an HTTP request and dynamically generate HTML that is returned to the Web browser. The great thing about servlets is that they are not limited to generating HTML; you can use them to return XML documents as well. The following is a snippet of code that shows how to return XML from a servlet:

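import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class XmlServlet extends HttpServlet {
    public void service(HttpServletRequest request, HttpServletResponse response)
            throws IOException, ServletException {
        // Tell the client that the body is XML, not HTML.
        response.setContentType("text/xml");

        PrintWriter out = response.getWriter();

        out.println("<?xml version=\"1.0\"?>");
        out.println("<name>Tarzan</name>");
    }
}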

A key aspect of this code is the call to response.setContentType(), which sets the content type to "text/xml".

--------------------------------------------------------------------------------

XML AND HTML--DISTANT COUSINS

For those of you with a background in HTML just now starting to look at XML, there are a few gotchas that always pop up: First, XML tag names are case-sensitive; <myTag> and </mytag> will not match. In HTML, you could liberally use (or abuse) any combination of lower and upper case in tag names. With XML, however:

<aTag> some data </aTag> --good
<aTag> some data </atag> --bad

Also, you must always be sure to include the starting and ending tag or the XML parser will throw up a red flag. In HTML, you can sometimes leave off ending tags, but in XML a start tag without a matching end tag will cause a parser to return an error.
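
To see the red flags for yourself, try feeding both kinds of mistake to a parser. This is a minimal sketch in Python (my choice of language; the helper name parses and the <list> sample are made up for the example).

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if text is accepted by the XML parser, else False."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(parses("<aTag> some data </aTag>"))         # matching case
print(parses("<aTag> some data </atag>"))         # case mismatch
print(parses("<list><item>one<item>two</list>"))  # HTML-style missing end tags
```

Only the first document is accepted; the other two make the parser report an error.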

--------------------------------------------------------------------------------

XML AND BUSINESS APPLICATION SERVICE PROVIDERS

 

Lots of people write to me asking what XML is good for. Almost without exception, these people are independent operators or members of small organizations. XML hasn't really become accessible to these subsets of the computer world yet. Consider for a minute how small businesses work. They employ very few people, hiring only those who relate directly to the company's product or service. Most other functions are contracted out to other companies--legal services, payroll, distribution, whatever. Those functions that aren't farmed out are kept in-house mainly because the available contractors aren't set up for small jobs. Advertising falls into this latter category.

Think about the possibility that XML has here. If a standardized system of tags enabled you to exchange data with other companies easily, the cost of contracting out even more work would fall. You could use XML to share data with an accountant (sort of like you might export QuickBooks files for the accountant's review now). You could give a marketing agency direct access to some of your sales figures, enabling them to have immediate feedback on their work. The same goes for payroll, benefits, and lots more. Easy data exchange enables small businesses to be more agile. For a look at how this might work, look at a couple of pages:

News about XML-facilitated Application Service Providers: http://www.aspnews.com/forum.htm

A semi-working Web app that hints at how multiple services might integrate in the future: http://www.gldialtone.com/webledger.htm

A list of industries in which someone's working on the ASP model: http://www.aspnews.com/dirwhodef.htm

--------------------------------------------------------------------------------

 

XHTML--PART 1 OF 4

Extensible HTML (XHTML) is the next-generation HTML. I'm sure you're aware that HTML is the presentation language of the Internet. HTML has done an excellent job of providing a platform-neutral, easy-to-understand language for creating Web pages. Unfortunately, HTML comes in many different flavors, and most browsers hide the differences. XHTML 1.0 is the current W3C recommendation for the latest version of HTML. In a nutshell, XHTML takes HTML 4.01 and applies the syntax rules of XML. These basic guidelines will get you started converting HTML to XHTML:

* All elements with content must have a start and end tag.
* Empty elements must have an end tag, or be closed with a / at the end of the start tag.
* Names, including elements and attributes, should be lowercase.
* Elements must be properly nested.
* Script elements should be placed in a CDATA section to avoid improper parsing.
* Attribute values must be within single or double quotes.

We'll look at examples of these in our next tip.

--------------------------------------------------------------------------------

XHTML--PART 2 OF 4

Following up on our previous tip, here are some samples of HTML gotchas in XHTML:

1. All elements with content must have a start and end tag:

<p> -- no good
<p> </p> -- very good

2. Empty elements must have an end tag, or be closed by adding a / at the end of the start tag:

<br> -- no good
<br /> -- much better

3. Names should be in lowercase:

<FORM ACTION=...> -- oh no!
<form action=""> -- whew!

4. Elements must be properly nested:

<b><i> .. </b></i> -- out of order
<b><i> .. </i></b> -- ooh, I like this

5. Script elements should be placed in a CDATA section to avoid improper parsing:

<script> .. </script>

should be replaced with...

<script> <![CDATA[ ... ]]> </script>

6. Attribute values must be within single or double quotes:

<img src=/hi/img.gif> -- this scares me
<img src="/hi/img.gif" /> -- I feel better

--------------------------------------------------------------------------------

XHTML--PART 3 OF 4

Keeping with our HTML theme, let's take a look at a typical (simple) HTML page and see what its XHTML counterpart would look like.

Here is a simple HTML page:

<HTML>
<HEAD><TITLE>Hello World!</TITLE></HEAD>
<BODY>
Hello world!
<HR>
<p>
Hello again!
</BODY>
</HTML>

When converted to XHTML, it looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Hello World!</title></head>
<body>
<p>Hello world!</p>
<hr/>
<p>Hello again!</p>
</body>
</html>

Here are a few things to note:

* An XML declaration has been added.
* A DOCTYPE declaration has been added.
* A namespace (xmlns) attribute appears on the <html> tag.
* All element and attribute names are lowercase.
* All tags are properly closed, either with a / or an ending tag.
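
One practical payoff of the conversion is that ordinary XML tools can now read the page. The following is a minimal sketch in Python (my choice of language for illustration; the legacy_html string is a condensed, made-up version of the HTML page above). It parses the XHTML version, pulls out the paragraph text, and shows that the old HTML version is rejected by the same parser.

```python
import xml.etree.ElementTree as ET

xhtml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Hello World!</title></head>
<body>
<p>Hello world!</p>
<hr/>
<p>Hello again!</p>
</body>
</html>"""

root = ET.fromstring(xhtml.encode("utf-8"))

# Because of the xmlns attribute, every element name carries the namespace.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"
paragraphs = [p.text for p in root.iter(XHTML_NS + "p")]
print(root.tag)
print(paragraphs)

# The original HTML is not well-formed XML, so the same parser rejects it.
legacy_html = "<HTML><BODY>Hello world!<HR><p>Hello again!</BODY></HTML>"
try:
    ET.fromstring(legacy_html)
    legacy_ok = True
except ET.ParseError:
    legacy_ok = False
print(legacy_ok)
```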

--------------------------------------------------------------------------------

XHTML--PART 4 OF 4

I hope I've piqued your interest in taking the next step with HTML and you're ready to start converting all of your HTML documents to XHTML. I know what you're thinking: "Is he nuts?" (Never mind the answer to that.) If you have a stockpile of HTML documents you just can't wait to dig into, or if you're just curious, there is a great tool called HTML Tidy that will help you convert HTML to XHTML.

HTML Tidy http://www.w3.org/People/Raggett/tidy/

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: WHITESPACE

In HTML, it's possible to use whitespace liberally. In fact, it's good practice to do so, to make your code more readable and more easily editable. XHTML isn't nearly so forgiving, and different parsers will react to large quantities of whitespace in different ways. In particular, you want to avoid whitespace within attribute sequences, so something like this is a bad idea:

<input type="button" value="Press Me" id="button1" />

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: TAG MATCHING

One of the differences between HTML and XHTML is that the XHTML specification is absolutely strict about matching tags. In HTML, something like this is acceptable:

<LI> Cod liver oil

In XHTML, that's not okay, and you'd have to take care to match opening tags with closing tags, like this:

<li> Wart remover </li>

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: SCRIPT REFERENCES

XHTML gets confused if your scripts contain certain (fairly common) character sequences, so it's best to always keep your scripts in remote files and refer to them from within XHTML documents. For example, if you use the decrement operator (--) in a script, an XHTML parser will get confused and misinterpret things. If you refer to a separate file, the problem goes away. You can conceal your script code with HTML/XHTML comments for now, but that approach may not work under future versions of XHTML. The same rule goes for stylesheets.

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: LOWERCASE LETTERS

By and large, XHTML is identical to the HTML you've probably gained some familiarity with during recent years. Most of the tags are the same, and in fact, the main surface difference is that you (as the page creator) are bound to obey stricter rules about how you attach tags and attributes to the elements of your documents.

One of the biggest differences between HTML and XHTML is that XHTML requires all tag and attribute names to be in all lowercase letters. In HTML, you can use <Img>, <IMG>, and <img> interchangeably. Not so in XHTML. The tag that defines image elements is <img>, and there can be no debate about that. The same rule goes for element attributes. All lowercase is the standard.

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: IDENTIFIERS

In HTML, we've traditionally used the NAME attribute to assign a unique identifier to document elements, mainly so we can refer to those elements easily in scripts and other instruction sets. XHTML prefers the ID attribute in place of the NAME attribute. Therefore, where HTML would use something like this...

<INPUT TYPE="BUTTON" VALUE="Press Me" NAME="button1">

...XHTML would use something like this (note the new structure for empty tags and the lowercase attribute value):

<input type="button" value="Press Me" id="button1"/>

--------------------------------------------------------------------------------

XHTML REQUIREMENTS: EMPTY TAGS

Last time, we learned that XHTML is picky about the way you tag elements in your documents. XHTML won't let you get away with implied closing tags (as in <P> tags without </P> tags). But what, you may ask, do you do with HTML tags that are empty--that is, that don't have closing tags? To embed an image, we've traditionally used this tag:

<IMG SRC="filename">

No dice in XHTML. In that environment, you have to use XML-style empty elements. Therefore, the tag above would be restructured like this:

<img src="filename"/>

The forward-slash character must appear before the closing angle bracket.

--------------------------------------------------------------------------------

XHTML FLAVORS

The XHTML specification defines three DTDs to be used in XHTML documents. All XHTML documents must include a DOCTYPE declaration that points to one of these variations:

XHTML Transitional

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

This is the low-impact XHTML DTD and simply requires that you clean up your HTML to follow the XML language constraints. It also allows the use of tags like <font> to control visual aspects of the page. I recommend using Transitional when you first start converting HTML to XHTML.

XHTML Strict

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/strict.dtd">

As the name implies, this DTD has some additional constraints for hardcore XML compatibility. It also assumes the layout of the page is specified using a technology like Cascading Style Sheets (CSS), so presentational tags like <font> and <center> are not allowed.

XHTML Frameset

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">

This is to be used if your HTML uses framesets.

As usual, these DTDs are freely available for your perusal at the W3 Web site:

http://www.w3.org/TR/2000/REC-xhtml1-20000126/#dtds
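If you want to find out which flavor a document claims to be, you can read the public identifier out of its DOCTYPE. The sketch below uses Python's standard xml.dom.minidom module. The FLAVORS table and the xhtml_flavor helper are names invented for this example, and the code looks only at the public identifier, not at the DTD's URL.

```python
from xml.dom import minidom

# Public identifiers from the three XHTML 1.0 DOCTYPE declarations.
FLAVORS = {
    "-//W3C//DTD XHTML 1.0 Transitional//EN": "Transitional",
    "-//W3C//DTD XHTML 1.0 Strict//EN": "Strict",
    "-//W3C//DTD XHTML 1.0 Frameset//EN": "Frameset",
}

def xhtml_flavor(document_text: str) -> str:
    """Return the XHTML flavor named by the DOCTYPE, or "unknown"."""
    doc = minidom.parseString(document_text)
    if doc.doctype is None:
        return "unknown"
    return FLAVORS.get(doc.doctype.publicId, "unknown")

sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">'
    "<html><head><title>Sample</title></head><body></body></html>"
)
print(xhtml_flavor(sample))
```

Note that minidom does not download or apply the DTD, so this identifies the declared flavor without validating against it.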

--------------------------------------------------------------------------------

WORKING WITH EMPTY ELEMENTS

In HTML, the <IMG> element is known as an empty element. There is no </IMG> tag. The <IMG> tag is used to insert a piece of information--specifically, a reference to an image file and some information about how it is to be displayed--into a document. Nonempty elements, such as those defined by <H1> and </H1>, wrap a passage of text between an opening tag and a closing tag.

You can have empty elements in XML documents, too. Let's say we want an XML document that lists the drawings associated with a particular architectural project. We won't write any sort of parser that displays actual image files, but we will explore the XML rules that allow us to create and use empty elements.

For the purposes of this project, we'll say we want to be able to list three kinds of drawings--floor plans, elevations, and cross-sections. So, we need to define three elements in our DTD. That's for next time.

--------------------------------------------------------------------------------

WML VARIABLES

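WML variables provide a facility for storing data while the user navigates cards in a deck. WML variables live in the browser context rather than in a single card, which means they're visible from all cards in the deck (and they stay set when the user moves on to another deck, until the context is reset). There are two main ways to create and set a WML variable: via <setvar> and through any input element. Here is a sample using the <setvar> tag, which assigns the value "admin" to the variable named "login" (in a real deck, <setvar> must appear inside a task such as <go>, <prev>, or <refresh>):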

<setvar name="login" value="admin" />

Here is a sample of an input field that assigns to the "login" variable the string entered by the user:

<input name="login" type="password" />

--------------------------------------------------------------------------------

WML HYPERLINKS

Hyperlinks in WML are created using the <go> tag. Building on the example from a few tips back, you can see how to use the <go> tag to navigate between cards in a deck.

<wml>

<card id="card1"> <p> page 1 </p>

<do type="accept" label="card 2"> <go href="#card2"/> </do>

</card>

<card id="card2"> <p> page 2 </p>

<do type="accept" label="card 1"> <go href="#card1"/> </do>

</card>

</wml>

This sample introduces a few new concepts. The <do> tag with type="accept" simply displays a submit-like option to the user. When it is selected, the <go> tag within the <do> tag will be executed. In addition, the href attribute of the <go> tag uses the familiar pound-notation (#name) for linking to a card within the same deck. Lastly, the <go> tag uses the shorthand notation defined by XML for specifying an empty tag: <tag/>.
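Since every #name link has to point at a card that actually exists in the deck, it can be handy to check that mechanically. Here's a rough sketch in Python using the standard xml.etree.ElementTree module. It works on a well-formed copy of the two-card deck (every <do> closed), and the broken_links helper is a name made up for this example.

```python
import xml.etree.ElementTree as ET

deck_text = """
<wml>
  <card id="card1">
    <p> page 1 </p>
    <do type="accept" label="card 2"> <go href="#card2"/> </do>
  </card>
  <card id="card2">
    <p> page 2 </p>
    <do type="accept" label="card 1"> <go href="#card1"/> </do>
  </card>
</wml>
"""

def broken_links(deck_xml: str) -> list:
    """Return the '#name' hrefs that do not match any card id in the deck."""
    deck = ET.fromstring(deck_xml)
    card_ids = {card.get("id") for card in deck.findall("card")}
    missing = []
    for go in deck.iter("go"):
        href = go.get("href", "")
        # Only same-deck links of the form "#cardname" are checked here.
        if href.startswith("#") and href[1:] not in card_ids:
            missing.append(href)
    return missing

print(broken_links(deck_text))  # an empty list means every link has a target
```

Links to other decks (plain URLs without a leading #) are ignored by this check.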

--------------------------------------------------------------------------------

WML DECK

WML files are commonly called decks. A deck is the smallest unit of WML that is transferred to a WAP device. It can be thought of as a single page of interaction, much like a single HTML page. Since WAP devices are typically scarce on screen real estate, a deck is broken up into cards. A WAP browser displays only one card at a time. A deck is like bundling several closely related Web pages into a single Web page and transmitting it as a single unit--the deck is the WML file, and each card would be a page. Here is a simple example of a WML application with two cards:

<wml>

<card id="card1"> <p> page 1 </p> </card>

<card id="card2"> <p> page 2 </p> </card>

</wml>

--------------------------------------------------------------------------------

WIRELESS APPLICATION PROTOCOL

The Wireless Application Protocol (WAP) is a standard proposed by the WAP Forum for Internet-enabling wireless devices. The WAP Forum--an organization formed by Nokia, Phone.com, Ericsson, and Motorola--has a membership that reads like a who's who of heavy-hitters in the wireless and Internet technology industry. Its goal is to bring Internet content and services to digital cell phones and other wireless devices.

The WAP specification defines an application development environment and transport protocols for creating and delivering content to WAP-enabled devices. You might be wondering, "What does this have to do with XML?" Well, the client portion of a WAP application is written using an XML mark-up language called Wireless Markup Language (WML). In the next few tips, we'll talk about WAP and how to use WML.

WAP forum http://www.wapforum.org/

--------------------------------------------------------------------------------

WINDOWS DNA EXPLAINED

Microsoft excels at attaching marketing words to whole collections of products and technologies with individual identities. Such is the case with Windows DNA and Windows DNA 2000. Windows DNA comprises Windows NT Server 4, Visual Studio, Site Server 3, SNA Server 4, SQL Server 7, Transaction Server (MTS), Message Queue (MSMQ), IIS 4, and the Component Object Model (COM) architecture. Visual Studio is itself an assemblage of Visual Basic, Visual C++, and half a dozen other development environments. Basically, then, Windows DNA is an umbrella term for Microsoft's products that facilitate the storage, retrieval, and sharing of data. The problem is, Windows DNA isn't particularly open. You can write adapters that ease interorganizational data sharing, but that's a lot of trouble. An XML-based solution is needed, and Windows DNA 2000 serves that need.

--------------------------------------------------------------------------------

WINDOWS DNA 2000 EXPLAINED

Windows DNA 2000 incorporates generally the same list of software (though in updated form--Windows 2000 Advanced Server and SQL Server 2000 fall under the Windows DNA 2000 umbrella). Interestingly, Windows DNA 2000 also incorporates a new product called Microsoft BizTalk Server (BTS). That piece of software is responsible for facilitating the exchange of information among applications and businesses. No one has really seen it to date--a beta is due out in the middle of 2000--but some details have come out. We'll get into them over the next few days.

--------------------------------------------------------------------------------

WEB SITE: XML DOM

By now, you are probably familiar with the acronym DOM. It stands for Document Object Model, which is the standard for accessing the contents of an XML document using a nested tree structure. That sounds easy, but when you start digging into DOM you will find a multitude of interfaces, properties, and methods. The DevX Web site listed below will help you wade through the DOM interface:

XML Document Object Model http://www.devx.com/upload/free/features/xml/objectmodel/xmldom1.asp

--------------------------------------------------------------------------------

WAP TOOLS

I'm sure all the recent talk about WAP has got you itching to start developing applications in WML. Fortunately, there are several development kits and tools available that enable you to create and test WAP applications using your desktop PC. Here are a few:

Phone.com offers an SDK for developing HDML and WML applications: http://www.phone.com/

Nokia has a popular WAP development SDK: http://www.nokia.com/wap

Here is a cool WAP browser that is great for viewing WML content: http://www.slobtrot.com/winwap/

--------------------------------------------------------------------------------

VB-XML BOOK

If Visual Basic is a tried and true friend, and you're wondering how to use it with XML, look no further. I recommend you get a copy of "Professional Visual Basic 6 XML." It covers XML history and background, provides detail on how to parse and validate XML, and includes coverage of a myriad of XML-related technologies--all this as it relates to Visual Basic!

Professional Visual Basic 6 XML by James Britt, Teun Duynstee Wrox Press, April 2000 (500 pages) http://www.amazon.com/exec/obidos/ASIN/1861003323/tipworld

--------------------------------------------------------------------------------

VB AND XML--PART 1 OF 7

I'm going to spend the next few tips showing you the ABCs of parsing an XML document using Visual Basic. As usual, if you want to process XML documents from application software, you will need an XML parser. Microsoft distributes a free XML parser that works with Visual Basic. It's commonly referred to as MSXML and is available here:

http://msdn.microsoft.com/xml/general/msxmlprev.asp

MSXML supports both the SAX and DOM methods of processing a document. Recall that SAX is an event-based method and that DOM reads the XML document into an in-memory tree structure that you can access via an API. We will use the DOM method, and in particular the following classes and interfaces:

* DOMDocument: Top-level class for a DOM document; loads and validates the document
* IXMLDOMNode: Single node in the DOM tree
* IXMLDOMNodeList: List of nodes from a DOM tree

The following DevX site is a great reference tool for learning more about the DOM interface:

DevX: XML Document Object Model http://www.devx.com/upload/free/features/xml/objectmodel/xmldom1.asp

--------------------------------------------------------------------------------

VB AND XML--PART 2 OF 7

Today's tip gets us started on parsing XML documents from Visual Basic. The first step is to make sure you've set up the Visual Basic environment to use the XML parser. If you're using MSXML, this is done by selecting Project, References. Make sure the Microsoft XML component is selected. The following code shows how to use the DOMDocument class to load an XML document (test.xml) located in the same directory as the application:

Set xml = New DOMDocument
xml.Load (App.Path & "\test.xml")

Pretty simple, huh? In our next tip, we'll expand this to include handling errors and validating the document.

--------------------------------------------------------------------------------

VB AND XML--PART 3 OF 7

In our previous tip, we saw the bare-bones, simplest way to load an XML document into a DOMDocument object. Today's tip will create a more generic VB function, which will validate and load an XML document using the following code:

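Private Function LoadXMLDocument(f As String) As DOMDocument

    On Error GoTo LoadXMLError

    Dim myErr As IXMLDOMParseError
    Dim doc As DOMDocument

    Set doc = New DOMDocument
    doc.async = False            ' wait for the load to finish before checking for errors
    doc.validateOnParse = True   ' validate against the DTD while parsing
    doc.Load f

    ' A parse or validation problem shows up in parseError, not as a runtime error
    Set myErr = doc.parseError
    If myErr.errorCode <> 0 Then
        Debug.Print "xml parse error " & myErr.reason
        Set LoadXMLDocument = Nothing
        Exit Function
    End If

    Set LoadXMLDocument = doc
    Exit Function

LoadXMLError:
    ' Any other runtime error lands here; myErr may not be set yet
    Debug.Print "xml load error " & Err.Description
    Set LoadXMLDocument = Nothing

End Function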

Here is how you would use this code:

Dim xml As DOMDocument
Set xml = LoadXMLDocument(App.Path & "\test.xml")
If xml Is Nothing Then
    Debug.Print "XML is not valid"
Else
    Debug.Print "XML is valid"
End If

Note that the function uses the validateOnParse property of the DOMDocument class to instruct the parser to validate the document. It then uses the parseError property to determine whether an error occurred.

--------------------------------------------------------------------------------

VB AND XML--PART 4 OF 7

Now that you've validated your XML and loaded it into a DOMDocument, you're ready to start processing elements. One interface you'll deal with when grabbing elements out of a DOM tree is IXMLDOMNode. Here are a few of the properties commonly used with IXMLDOMNode:

* dataType: Access the node's data type
* childNodes: Access child nodes
* attributes: Access the attributes for the node
* xml: Retrieve the xml for the node and all its children
* text: Contains the text content of a node and all its children

This sample code walks through the child nodes of the document root element and prints their xml content:

Dim root As IXMLDOMNode
Set root = xml.documentElement

Dim node As IXMLDOMNode
For Each node In root.childNodes
    Debug.Print node.xml
Next

--------------------------------------------------------------------------------

VB AND XML--PART 5 OF 7

When processing XML documents, you'll often want to pull out the content of specific nodes. For instance, suppose we have the following XML:

<MyBusinessData>
  <Address> ... </Address>

  .. other tags here

  <Customers>
    <Customer>
      <Name> Joe Green </Name>
      <Phone> 888 888 8888 </Phone>
    </Customer>
    <Customer>
      <Name> Becky Smith </Name>
      <Phone> 999 999 9999 </Phone>
    </Customer>
  </Customers>
</MyBusinessData>

Now, let's suppose you want to print a list of customer names. Fortunately, the IXMLDOMNode class has a method that allows you to extract a list of nodes based on the element name. Here is the sample code to retrieve all <Customer> tags from the root document and then print the contents of the <Name> tag:

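Dim root As IXMLDOMNode
Set root = xml.documentElement

Dim node As IXMLDOMNode
Dim listNodes As IXMLDOMNodeList
' <Customer> elements sit inside <Customers>, so the pattern names both levels
Set listNodes = root.selectNodes("Customers/Customer")
For Each node In listNodes
    Debug.Print node.selectSingleNode("Name").Text
Next

This example introduces one new interface, IXMLDOMNodeList, which provides a simple container for a collection of IXMLDOMNode objects. The example also introduces the selectNodes method of IXMLDOMNode, which takes a single pattern string indicating which nodes you want to select, relative to the node it is called on, and returns a list of those nodes. Because the <Customer> elements are grandchildren of the root element, the pattern has to step through <Customers>; a bare "Customer" would match nothing here.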
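For readers who don't have Visual Basic handy, the same lookup can be sketched in Python with the standard xml.etree.ElementTree module. This is only a rough equivalent, not MSXML: findall plays the role of selectNodes, and the sample data is a trimmed copy of the business document above.

```python
import xml.etree.ElementTree as ET

business_data = """
<MyBusinessData>
  <Customers>
    <Customer> <Name> Joe Green </Name> <Phone> 888 888 8888 </Phone> </Customer>
    <Customer> <Name> Becky Smith </Name> <Phone> 999 999 9999 </Phone> </Customer>
  </Customers>
</MyBusinessData>
"""

root = ET.fromstring(business_data)

# The path is relative to the root element, so it has to step through <Customers>.
names = [customer.findtext("Name").strip()
         for customer in root.findall("Customers/Customer")]
print(names)

# A path that skips <Customers> finds nothing, because <Customer> is not a direct child.
print(root.findall("Customer"))
```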

--------------------------------------------------------------------------------

VB AND XML--PART 6 OF 7

So far, all the tips in this series have demonstrated ways to load and read an XML document. However, you'll often want to update your document as well. Here is a code sample that updates the text of a node and saves the new XML document to a file:

' Select the first customer's name, assuming the sample document from Part 5
Set node = root.selectSingleNode("Customers/Customer/Name")
If Not node Is Nothing Then
    node.Text = "Bubba"
End If

xml.save App.Path & "\customer.xml"

This code works because the Text property of the IXMLDOMNode interface can be used to change the value of an element. In addition, the DOMDocument class has a save method that can be used to write an XML document to a file.
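Here is the same update-then-save idea as a small Python sketch, again using xml.etree.ElementTree rather than MSXML. The tiny customer document is made up for the example, and the result is serialized to a string so you can see it; writing to a file is a one-line change noted in the comment.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Customers><Customer><Name>Joe Green</Name></Customer></Customers>"
)

node = root.find("Customer/Name")   # first match, or None if nothing matches
if node is not None:
    node.text = "Bubba"

# Serialize the modified tree; ET.ElementTree(root).write(path) would save it to a file.
updated = ET.tostring(root, encoding="unicode")
print(updated)
```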

--------------------------------------------------------------------------------

VB AND XML--PART 7 OF 7

Today's tip wraps up the VB and XML series. We've already seen how to load, read, and write an XML document using Visual Basic and the Microsoft XML parser. Obviously, I have only scratched the surface--and those of you who would like to dig into the details of XML and Visual Basic should check out the VBXML site. It's a great site for VB/XML-related resources and sample code.

One last thing about VB and XML: Note that some of the attributes and methods supported by MSXML are extensions to the W3C DOM specification.

VBXML http://www.vbxml.com/

--------------------------------------------------------------------------------

VALIDATE XHTML

So you've spent hours converting all of your HTML documents to XHTML and you're wondering, "Now what?" If you're like me, you would like a little instant gratification for a job well done. In lieu of taking a vacation, you might want to try W3's XHTML validator. You simply enter a URL and hit the Submit button, and it will download and validate your document. If your document is valid, it prints a nice summary page congratulating you. Otherwise, it displays a message showing what went wrong.

W3's XHTML validator http://validator.w3.org/

--------------------------------------------------------------------------------

UTF-UH-OH-8

It seems I was way off the mark in some previous tips concerning the use of UTF-8 character set encoding. Recall that the xml processing instruction has an encoding attribute that can be used to specify what character set is being used:

<?xml version="1.0" encoding="UTF-8"?>

In a previous tip, I had stated that UTF-8 uses one byte to represent characters, making it unusable for large character sets. This is incorrect--UTF-8 uses one byte only for ASCII characters. It uses more than one byte (up to four) for everything else; most Asian characters, for example, take three bytes each.

Thanks to our readers for keeping me honest!
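If you'd like to see the variable-length encoding for yourself, this short Python sketch encodes a few sample characters and counts the bytes. The characters were picked by us for illustration; they're written as escapes so the listing survives any file encoding.

```python
samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin letter with an accent",
    "\u65e5": "CJK ideograph",
    "\U0001F600": "character outside the Basic Multilingual Plane",
}

# Encode each character on its own and record how many bytes UTF-8 needs.
byte_counts = {ch: len(ch.encode("utf-8")) for ch in samples}

for ch, description in samples.items():
    print(f"{description}: {byte_counts[ch]} byte(s)")
```

The four samples come out at one, two, three, and four bytes respectively.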

--------------------------------------------------------------------------------

USING INTERNAL AND EXTERNAL SUBSETS

It is quite common to use both internal and external subsets at the same time. This can be useful if you want to include application-wide elements in all your XML documents. Here is an example:

In the file common.dtd, put the following:

<!ELEMENT application_common (name,version)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT version (#PCDATA)>

In your XML file, put

<?xml version="1.0"?>
<!DOCTYPE Customer SYSTEM "common.dtd" [
  <!ELEMENT Customer (First,Last,application_common?)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>Mean</First>
  <Last>Greene</Last>
  <application_common>
    <name>My Application</name>
    <version>1.0</version>
  </application_common>
</Customer>

Note the use of a SYSTEM specifier in the DOCTYPE declaration and the addition of an internal subset as well. The internal subset builds on the elements defined in the external DTD by adding an optional child element (application_common) to the document root element (Customer). This is a good way to include common elements across several different DTDs.

--------------------------------------------------------------------------------

USING BIZTALK EDITOR

Microsoft BizTalk Server (BTS) includes what appears, from early reports, to be a strong XML document type definition (DTD) editor. It's based on a tree analogy, which means you can expand and collapse elements to see and edit their subsidiary elements. It's easy to adjust elements' attributes, too, so you'll have no problem making elements required, optional, or dependent on other fields. There's also a facility for importing existing DTDs and editing them to fit your specific needs.

--------------------------------------------------------------------------------

UNDERSTANDING THE ROLE OF BIZTALK SERVER

The idea behind Microsoft BizTalk Server (BTS), which remains deep in development, is that it's an engine for establishing and managing the rules that govern the exchange of data between two business processes--two businesses, particularly. Say, for example, that Telstra (an Australian telecommunications company) buys switches for its networks from Nortel (a maker of such equipment). For the purchase to go ahead, several things need to happen:

* Telstra needs to announce its specifications, including product, quantity, delivery date, warranty requirements, and so on.
* Nortel needs to state its ability to meet the specifications, and at what price it can do so.
* Telstra needs to accept Nortel's quote.
* The order needs to be filled and delivery confirmed.

Easy enough, but there's potentially rather a lot of paperwork involved. A lot of human effort could be expended on reading values out of one company's forms and entering them in the databases of the other. BTS establishes relationships between fields in the companies' databases, allowing direct connectivity between Telstra's "orders-placed" database and Nortel's "work-in-progress" database. BTS might also handle translation issues, such as the conversion of Canadian dollars into Australian dollars in this case. A big part of BTS's job is to promote secure data exchange--allowing Nortel to see what it needs to see in Telstra's database, and no more.

--------------------------------------------------------------------------------

UNDERSTANDING BTS PIPELINES

Central to the operation of Microsoft BizTalk Server (BTS) is what's called a pipeline. A pipeline is a pathway by which information may travel, complete with rules about how the information is formatted and used. A pipeline would specify a data source (a company or person) and a data destination (another company or person) for an established business relationship. Alternately, there could be no specified source (an option that comes in handy, for example, when a buyer is shopping for a vendor) or no specified destination (useful when a company sells its products to all comers). The pipeline also would specify how data transiting it should be formatted, and how fields on either end of the pipeline map to the transmission format. For example, a vendor's invoice fields could be made to map directly into a buyer's purchase order fields.

--------------------------------------------------------------------------------

TIP OF THE YEAR

Yep, this is it. Put this one on your refrigerator, make copies, and send it to the folks. I once spent hours trying to figure out why the XML parser I was using could not read a SYSTEM DTD that I was specifying like this:

<!DOCTYPE Customer SYSTEM "common.dtd">

where common.dtd was in the same directory as the XML file I was processing. It turns out that some XML parsers are smart enough to convert the filename to a valid URI; some are not. To get around the problem, I had to change the DOCTYPE declaration to this:

<!DOCTYPE Customer SYSTEM "file:/D:/JOHN/xmltips/test/common.dtd">
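If you'd rather not build the URI by hand, a few lines of code can do it. The sketch below is in Python and uses only the standard library; the path is the same example location used above. It produces the three-slash spelling (file:///D:/...), which is another common way to write a local file URI.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

# Example location of the DTD, matching the path used in the tip above.
dtd_path = PureWindowsPath("D:/JOHN/xmltips/test/common.dtd")

# Use forward slashes and percent-encode anything (such as spaces) that a URI can't carry.
dtd_uri = "file:///" + quote(dtd_path.as_posix(), safe="/:")

doctype = f'<!DOCTYPE Customer SYSTEM "{dtd_uri}">'
print(doctype)
```

For a path on the machine you're running on, Python's Path.as_uri() method does the same job in one call.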

--------------------------------------------------------------------------------

THE XHTML NAMESPACE

In writing an XHTML document, you must include an announcement of the XHTML namespace immediately after the XHTML DTD declaration. The namespace definition basically imports a complete set of HTML-like tags for you to use in applying formatting and other design rules to your documents. In XHTML, the namespace declaration appears as an attribute of the <html> tag that opens the document, like this:

<html xmlns="http://www.w3.org/1999/xhtml">

That brings in the XHTML variations of the familiar HTML elements. However, one of the top attractions of XHTML is that it allows you to import and use your own (or at least, other) namespaces. That's the topic for next time.
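To see what the namespace declaration does to element names, here's a small Python sketch using xml.etree.ElementTree. It assumes the namespace URI published in the final XHTML 1.0 Recommendation, and the sample page is made up for the example.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"  # namespace URI from the XHTML 1.0 Recommendation

page = (
    f'<html xmlns="{XHTML_NS}">'
    "<head><title>Sample</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

root = ET.fromstring(page)

# ElementTree reports names as {namespace-uri}local-name.
print(root.tag)

# Child elements inherit the default namespace, so lookups must include it.
title = root.find(f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title")
print(title.text)
```

The point to take away is that a namespace-aware parser treats <title> in this page as "title in the XHTML namespace," not as a bare name.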

--------------------------------------------------------------------------------

THE XHTML DTD

When writing an XHTML document, you're required to declare which Document Type Definition (DTD) the document follows, the same as with any XML document. For XHTML, the DTD announcement looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

That must be the first set of lines in any XHTML document.

--------------------------------------------------------------------------------

THE REAL WORLD

Have you ever wanted to talk to your PDA and have it take notes for you? Or maybe tell your Web browser to return to the previous page by verbal command? VoiceXML is an initiative to provide just those capabilities. It is an XML-based markup language targeted at providing voice access and interactive voice response to Web-based content and applications. VoiceXML is designed for supporting two-way, interactive dialogs that combine audio output (synthesized speech and digitized audio) with spoken input (speech recognition).

VoiceXML Forum http://www.voicexml.org/

alphaWorks http://www.alphaworks.ibm.com/tech/voicexml/

--------------------------------------------------------------------------------

THE REAL WORLD

Every once in a while I like to throw in an example of how XML is being used in the industry. Microsoft's Channel Definition Format (CDF) is one example. CDF is an XML-based format that describes the content for an active channel. An active channel is basically a group of Web pages. The main distinction is that while a user is viewing your channel, the page is updated automatically.

Here is a pseudo-sample to give you a flavor of what CDF looks like:

<?xml version="1.0"?>
<CHANNEL HREF="http://blah.blah/x.html">
  <ABSTRACT>This is a sample channel</ABSTRACT>
  <TITLE>Sample Channel</TITLE>
  <LOGO HREF="http://blah.blah/icon.ico" STYLE="icon" />
  <LOGO HREF="http://blah.blah/image.gif" STYLE="image" />
  <LOGO HREF="http://blah.blah/wide.gif" STYLE="image-wide" />
  <ITEM HREF="http://blah.blah/item1/">
    <ABSTRACT>Item 1</ABSTRACT>
    <TITLE>Item 1</TITLE>
  </ITEM>
  <ITEM HREF="http://blah.blah/item2/">
    <ABSTRACT>Item 2</ABSTRACT>
    <TITLE>Item 2</TITLE>
  </ITEM>
</CHANNEL>

Learn more here:

http://www.pcworld.com/r/tw/1%2C2061%2Ctw-xm5-43%2C00.html

http://www.pcworld.com/r/tw/1%2C2061%2Ctw-xm5-42%2C00.html

--------------------------------------------------------------------------------

THE NAME GAME

A valid XML tag name starts with a letter or one of a couple of punctuation characters, followed by a combination of letters, digits, and punctuation marks. Specifically, a tag name must start with a letter (a..z, A..Z), underscore, or colon.

The rest of the name can be any combination of letters, digits, colons, hyphens, periods, and underscores. Some examples of valid tag names are

<tagName></tagName>
<TAG-NAME></TAG-NAME>
<_tagName1></_tagName1>

Some invalid tag names are

<,name></,name>
<3name></3name>
<a tag></a tag>
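The naming rule is easy to turn into a quick check. The Python sketch below is an ASCII-only approximation (the full XML rule also admits many non-ASCII letters), and the names NAME_PATTERN and is_valid_tag_name are made up for this example.

```python
import re

# ASCII-only approximation of the XML naming rule described above:
# first character is a letter, underscore, or colon; the rest may also
# include digits, hyphens, and periods.
NAME_PATTERN = re.compile(r"[A-Za-z_:][A-Za-z0-9_:.\-]*")

def is_valid_tag_name(name: str) -> bool:
    """Return True if the whole string matches the naming rule."""
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["tagName", "TAG-NAME", "_tagName1", ",name", "3name", "a tag"]:
    print(candidate, is_valid_tag_name(candidate))
```

One caution: although a colon is technically allowed, it is best left alone in your own names, since XML namespaces use it to separate a prefix from a local name.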

--------------------------------------------------------------------------------

TESTING THE UNIQUENESS REQUIREMENT

Let's modify our XML document to look like this:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings4.dtd">

<drawingList>

<floorPlan name="First Floor" URL="http://www.yahoo.com" />

<floorPlan name="First Floor" URL="http://www.yahoo.com" />

<elevation name="East Elevation" URL="http://www.yahoo.com" />

<elevation name="South Elevation" URL="http://www.yahoo.com" />

<crossSection name="CS A-A" URL="http://www.yahoo.com" />

<crossSection name="CS B-B" URL="http://www.yahoo.com" />

</drawingList>

Save that as drawingList4x.xml and load it with Microsoft Internet Explorer.

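We expect an error, because the two floorPlan elements have the same name--something that's forbidden by the ID type in the attribute definition. (Strictly speaking, ID values must also be valid XML names, so a value containing a space, such as "First Floor", is itself invalid.) But IE doesn't return an error. This goes to prove that you can't always trust a browser as a tester of XML validity.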
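When the browser won't catch the problem, a few lines of code will. This Python sketch uses xml.etree.ElementTree, which does not validate against the DTD, so it simply counts the name values itself. It's a stand-in for a real validating parser, not a replacement for one.

```python
import xml.etree.ElementTree as ET
from collections import Counter

drawing_list = """
<drawingList>
  <floorPlan name="First Floor" URL="http://www.yahoo.com" />
  <floorPlan name="First Floor" URL="http://www.yahoo.com" />
  <elevation name="East Elevation" URL="http://www.yahoo.com" />
  <elevation name="South Elevation" URL="http://www.yahoo.com" />
  <crossSection name="CS A-A" URL="http://www.yahoo.com" />
  <crossSection name="CS B-B" URL="http://www.yahoo.com" />
</drawingList>
"""

root = ET.fromstring(drawing_list)

# Count every name attribute in the document; ID values must be unique document-wide.
counts = Counter(element.get("name") for element in root.iter() if element.get("name"))
duplicates = sorted(name for name, count in counts.items() if count > 1)
print(duplicates)
```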

--------------------------------------------------------------------------------

SYSTEM EXTERNAL SUBSETS

You have two options when specifying the location of a DTD in a DOCTYPE declaration: SYSTEM or PUBLIC. If SYSTEM is used, the DTD specifier should contain a URI (Uniform Resource Identifier) that points to a DTD file. The DTD file, together with any internally defined elements, makes up all the declarations needed to validate the document. Here is an example:

<!DOCTYPE AddressBook SYSTEM "http://www.myserver.com/xml/address.dtd">

--------------------------------------------------------------------------------

STYLE SHEETS

There are two main technologies used for displaying XML documents: Extensible Stylesheet Language (XSL) and Cascading Style Sheets (CSS). Today's tip will show you how to use CSS with XML documents. You can associate a style sheet with an XML document using the <?xml-stylesheet?> processing instruction, as follows:

<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="foo.css" ?>
<FOO> Hello XML! </FOO>

The CSS file could contain

FOO {display: block; font-size: 24pt; font-weight: bold;}

The attributes for the processing instruction are

* href: The style sheet location
* type: The style sheet language (for example, text/css for CSS, or text/xsl for XSL in Internet Explorer)
* media: The target media (screen, print, etc.)
* title: Title for the style sheet (not of much use)
* alternate: yes or no; tells the style sheet engine whether this is an alternate style sheet

You can use the style sheet processing instruction for XSL style sheets, too--more on that later!
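The "attributes" of a processing instruction aren't real XML attributes; to a parser they're just text inside the instruction. Here's a Python sketch using the standard xml.dom.minidom module that finds the instruction and picks the name="value" pairs apart with a regular expression. The stylesheet_attributes helper is a name invented for this example.

```python
import re
from xml.dom import minidom

document_text = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/css" href="foo.css"?>'
    "<FOO>Hello XML!</FOO>"
)

def stylesheet_attributes(xml_text: str) -> dict:
    """Return the pseudo-attributes of the first xml-stylesheet instruction."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The instruction's data is plain text, so pull out name="value" pairs by hand.
            return dict(re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data))
    return {}

print(stylesheet_attributes(document_text))
```

This simple pattern only handles double-quoted values, which is all the example above needs.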

--------------------------------------------------------------------------------

STANDALONE DOCUMENTS

By now, everyone has seen the standard XML processing instruction at the beginning of an XML document:

<?xml version="1.0" ?>

One of the attributes of the XML processing instruction is standalone. Possible values are yes and no. For example:

<?xml version="1.0" standalone="yes"?>

If standalone is set to yes, the XML document is declaring that it does not depend on any external markup declarations (such as those in an external DTD). Some XML processors have optimized algorithms for handling standalone documents, so if your document doesn't use any external stuff, it's a good idea to say so.

--------------------------------------------------------------------------------

SCHEMA

In reading books and articles about XML--which at this stage of its life remains mired in large quantities of hype--you encounter the word "schema" a lot. You can be led to believe that a "schema" is something you create and/or use in the process of building XML documents and parsing them.

Strictly speaking, no. A schema, in the XML sense, is a set of rules for marking up documents with XML tags. A DTD of the sort we've been creating in the previous series of tips is one kind of schema. There are others, most of them still ideas and proposals that haven't yet been standardized and may never be. Schemas other than DTDs can solve such problems as DTD's poor extensibility and lack of capacity for inheritance.

Grammar hint: The word "schema" is singular, not plural. The valid plural forms of the word are "schemas" and "schemata," with the former seeming more popular.

--------------------------------------------------------------------------------

REUSING DTDS

It's always a good idea to reuse work you've done--particularly work that has been tried and tested. Fortunately, XML provides a simple way to help you reuse DTDs. By using parameter entities (note the % in the declaration), you can include external DTDs as follows:

<!ENTITY % mycompanydtd SYSTEM "dtds/company.ent">

.. other stuff here ..

%mycompanydtd; <!-- include the DTD here -->

This can be extremely useful, especially if you need to share common DTDs across multiple projects or organizations and you want to avoid the nightmare of maintaining multiple versions in different files.

--------------------------------------------------------------------------------

REQUIRING UNIQUENESS WITH ID

One of the requirements we originally laid down in the specification for the list of drawings was that each instance of a given element would be required to have a unique name attribute. We can accomplish this with an XML keyword: ID. ID goes into the DTD in the same place we put CDATA before. Keep in mind that ID values must be unique across the whole document, not just among elements of the same type.

Here's a modified DTD:

<!ELEMENT drawingList (floorPlan*, elevation*, crossSection*)>

<!ELEMENT floorPlan EMPTY>

<!ATTLIST floorPlan name ID #REQUIRED URL CDATA #REQUIRED >

<!ELEMENT elevation EMPTY>

<!ATTLIST elevation name ID #REQUIRED URL CDATA #REQUIRED >

<!ELEMENT crossSection EMPTY>

<!ATTLIST crossSection name ID #REQUIRED URL CDATA #REQUIRED >

Save that as drawings4.dtd. You'll find that if you save projectDrawings3.xml as projectDrawings4.xml and change the DTD reference to drawings4.dtd, projectDrawings4.xml is just as valid as the unmodified projectDrawings3.xml.

Next time, we'll test the uniqueness requirement.

--------------------------------------------------------------------------------

REQUIEM FOR SOME ATTRIBUTES

We've defined empty elements, but they're truly empty. They're valid XML, but they contain no information. What we need is a way for each instance of each of our empty tags to carry some useful information. Since we want to create lists of drawings, each element should carry a couple of things:

* The name of the drawing
* A URL where the drawing may be found

For the purposes of this exercise, the URL won't refer to anything real; it will just be a placeholder.

Put into XML terms, we want to require each instance of each of our elements to have exactly one name--which should be unique--and exactly one URL, which also should be unique.

Next time: Modifying the DTD to include attributes.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 1 OF 5

If you've ever written code to create a graphical user interface (GUI), you know things can get hairy--and fast. For example, every time you want to change a label or add a button, you have to re-code, re-compile, and re-distribute the whole ball of wax. Bean Markup Language (BML) is a free toolkit from IBM for creating, configuring, and connecting Java classes using XML. It comes with two applications: a compiler and an interpreter. The compiler generates Java code based on what you've defined in a BML file. The interpreter reads the BML and creates the GUI at runtime.

The next few tips will discuss BML, so if you want to follow along, download it here:

http://www.alphaworks.ibm.com/tech/bml

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 2 OF 5

There is no better way to get started with any new programming paradigm than with the standard "Hello World" application. So let's jump right in with a sample "Hello World" in BML:

<?xml version="1.0"?>
<bean class="java.awt.Panel">
  <add>
    <bean class="java.awt.Label">
      <property name="text" value="XML Tips Rule!!"/>
    </bean>
  </add>
</bean>

If you have installed the BML toolkit, you can run the application by opening a command window and typing

java demos.drivers.PlayerDriver helloxml.bml

where helloxml.bml is the file containing your BML markup. You will also have to add the BML root directory (bml-root), {bml-root}/lib/xml4j_2_0_11.jar, and {bml-root}/lib/bmlall.jar to the Java classpath. This sample demonstrates two basic BML concepts: adding beans to the application (java.awt.Panel, java.awt.Label) and setting bean properties (text).

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 3 OF 5

I hope I've piqued your interest with our look at BML. Today's tip describes each of the ten tags defined in the BML Document Type Definition (DTD).

These tags are used for creating and connecting beans:

* <bean> Create a new bean or look one up.
* <args> Pass arguments to the constructor of a bean.
* <add> Add a bean to another bean (like adding a label inside a panel).
* <string> Create an instance of the java.lang.String class.

These tags allow you to set properties on a bean:

* <property> Set or get a bean property.
* <field> Set or get a bean field.

Here are the rest:

* <event-binding> Create an event connection from one bean to another.
* <cast> Convert a bean reference to another type.
* <script> Embed a script.
* <call-method> Call a method on a bean.

If you like reading DTDs in your spare time (and really, who doesn't?), take a look at the bml.dtd file in the doc directory where you installed the BML toolkit.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 4 OF 5

All applications need a way to handle events--like when a button is pushed or text is entered in an edit field. BML handles events using the <event-binding> tag. It supports several kinds of event handling, including the standard Java event listener pattern. The <event-binding> tag takes the form

<event-binding name="event-set-name">
  <bean .. />
</event-binding>

where the name attribute specifies the event to bind and the nested bean is an instance of a class that implements the matching listener interface. As usual, this is easier to see with a sample. Below is a BML file that creates two beans: an ActionListener (myEventHandler) and a button. The button will generate action events when it is pressed. Notice that the first <add> tag creates an instance of the class myEventHandler and binds it to the id myHandler. The <bean> tag for the button contains an <event-binding> tag that references the handler bean by its id.

<?xml version="1.0"?>
<bean class="java.awt.Panel">
  <add>
    <bean class="myEventHandler" id="myHandler"/>
  </add>
  <add>
    <bean class="java.awt.Button">
      <event-binding name="action">
        <bean source="myHandler"/>
      </event-binding>
    </bean>
  </add>
</bean>

Here, then, is myEventHandler.java:

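import java.awt.Component;
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;

// Listener bean that the BML file refers to by the id myHandler
public class myEventHandler extends Component implements ActionListener {
    // Called each time the button fires an action event
    public void actionPerformed(ActionEvent a) {
        System.out.println("howdy");
    }
}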

BML also supports a couple other flavors of <event-binding>, which include calling a method of a class or calling a JavaScript function within the BML file.

--------------------------------------------------------------------------------

REAL-WORLD XML: BML--PART 5 OF 5

I'm not convinced BML is ready for primetime, mission-critical deployment of applications. However, I'm not convinced it isn't, either. It certainly presents an alternative, dynamic approach to creating applications. It also hints at some of the possible uses for XML outside the usual data definition and Web-page worlds. BML starts to get more useful as you embed it in your own applications. PlayerDriver, which comes with the toolkit, is a good starting point for learning how to do that. Overall, I think the BML toolkit is easy to use, has great documentation, and is just plain neat.

--------------------------------------------------------------------------------

PUBLIC EXTERNAL SUBSETS

If the PUBLIC specifier is used in a DOCTYPE declaration, things can get a bit nebulous. Using PUBLIC means the identifier names a "well-known" DTD. It gives the XML processor the opportunity to locate the DTD using its own algorithms. It might have a local copy it can use, or possibly retrieve it from a database. The key point is that the public identifier does not specify a particular file, but instead a "well-known" name for the DTD. In addition, the means of finding the DTD is left to the XML processor. Here is an example:

<!DOCTYPE Chemical PUBLIC "global/Chemical" "chemical.dtd">

(The names are not real; this is only a sample. XML 1.0 requires a system identifier, here chemical.dtd, to follow the public identifier as a fallback.)

--------------------------------------------------------------------------------

PUBLIC AND SYSTEM SUBSETS

You can use both PUBLIC and SYSTEM specifiers in a DOCTYPE declaration. This lets the XML processor try to look up the PUBLIC identifier (if it can), while still having the option of loading it from a file. Here is an example:

<!DOCTYPE AddressBook PUBLIC "mycompany/global/Address" "http://www.myserver.com/xml/address.dtd">

Note that the SYSTEM keyword is left out.

--------------------------------------------------------------------------------

PROCESSING INSTRUCTIONS

The creators of the XML specifications went to great lengths to create an open standard that is not dependent on any particular application, operating system, or platform. (This, obviously, is a good thing!) However, in the real world there are times when a little hint to the application that processes your XML document can pay huge dividends. The XML specification defines processing instructions (PIs) to accommodate this need. PIs take the following form:

<?target ...instruction... ?>

where target is the name of the target application, and ...instruction... is the directive to the targeted application. Naturally, <? and ?> are the PI delimiters.

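Here is an example of PI syntax you've likely been using without realizing it:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

This is the XML declaration. Strictly speaking, the XML specification treats it as a declaration of its own rather than a PI (the target name xml is reserved), but it has the same form. It tells any XML processor the version, encoding, and standalone status of the XML document. Note that standalone accepts only the values yes and no.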
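To see how an application actually receives a PI, here is a minimal sketch using Python's standard expat binding. The xml-stylesheet target and the file name style.xsl are illustrative choices, not something this tip prescribes.

```python
from xml.parsers import expat

def collect_instructions(xml_text):
    """Return (target, data) pairs for every PI that expat reports."""
    found = []
    parser = expat.ParserCreate()
    # expat calls this handler once per processing instruction
    parser.ProcessingInstructionHandler = (
        lambda target, data: found.append((target, data))
    )
    parser.Parse(xml_text, True)
    return found

# Illustrative document: the stylesheet PI and its file name are made up.
sample = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    '<doc/>'
)
instructions = collect_instructions(sample)
print(instructions)
```

Only the xml-stylesheet instruction reaches the handler; expat hands the leading XML declaration to a separate callback (XmlDeclHandler).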

--------------------------------------------------------------------------------

PREDEFINED ENTITIES

XML predefines entities for the characters that double as markup. This is useful if you need to include a character in your content that the XML processor would otherwise look at as markup. The predefined entities are:

* &lt; for (<)
* &apos; for (')
* &amp; for (&)
* &quot; for (")
* &gt; for (>)

For example, the following XML is not well formed because the processor would try to interpret the less-than (<) character as markup:

<formula>x < y = 8</formula>

Instead, you should use the predefined entity for the less-than character (&lt;):

<formula>x &lt; y = 8</formula>
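If you build XML from program data, you rarely type these entities by hand. As a small sketch in Python (standard library only; the helper name is_well_formed is just for illustration), escape() does the substitution and the parser turns it back into the original character:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "x < y = 8"
escaped = escape(raw)  # replaces &, < and > with their entity references
formula = ET.fromstring("<formula>" + escaped + "</formula>")

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(escaped)
print(formula.text)
print(is_well_formed("<formula>x < y = 8</formula>"))
```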

--------------------------------------------------------------------------------

PCDATA

PCDATA stands for Parsed Character Data, which is used as a content model for defining an element in a DTD. For example, the following defines an element named Description, which can contain PCDATA:

<!ELEMENT Description (#PCDATA)>

An occurrence in an XML document would look like this:

<Description> this is a description of something </Description>

The Parsed in PCDATA means that an XML parser will read the content, looking for markup characters like < and &. The counterpart to PCDATA is a CDATA section, whose content is passed through without being scanned for markup.
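The difference is easy to see from a program. This sketch (Python standard library; the sample text is made up) writes the same content once as parsed character data with entities and once inside a CDATA section:

```python
import xml.etree.ElementTree as ET

# Parsed character data: markup characters must be written as entities
parsed = ET.fromstring("<Description>a &lt; b &amp; c</Description>")

# CDATA section: the same characters can appear literally
literal = ET.fromstring("<Description><![CDATA[a < b & c]]></Description>")

print(parsed.text)
print(literal.text)
```

Both elements end up with identical text; only the way it is written in the file differs.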

--------------------------------------------------------------------------------

PARSING XML

XML files are great, but by themselves they don't do a whole lot. To put XML files to use, developers typically write software programs using an XML parser. The parser provides an API to process the elements of an XML document.

Here are some pointers to XML parsers (all but the last are written in Java):

http://xml.apache.org/xerces-j/
http://www.alphaworks.ibm.com/tech/xml4j/
http://java.sun.com/xml/download.html
http://www.jclark.com/xml/xp/index.html
http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp
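Whatever parser you pick, the pattern is much the same: hand it the document, then walk the elements through its API. Here is a minimal sketch that uses the parser bundled with Python rather than one of the parsers listed above; the Customer document is borrowed from the DTD tips.

```python
import xml.etree.ElementTree as ET

xml_text = """<?xml version="1.0"?>
<Customer>
  <First>John</First>
  <Last>Doe</Last>
</Customer>"""

root = ET.fromstring(xml_text)

# Iterating over an element yields its child elements in document order
fields = {child.tag: child.text for child in root}

print(root.tag)
print(fields)
```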

--------------------------------------------------------------------------------

PARSERS, PARDNER

XML is pretty neat, but you're not going to get very far without a parser. Here are a few of the most popular ones:

XML4J - IBM's Java parser http://www.alphaworks.ibm.com/tech/xml4j

XML4C - IBM's C++ parser http://www.alphaWorks.ibm.com/tech/xml4c

MSXML - Microsoft Parser, works with Visual Basic and Visual C++ http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp

JAXP - Sun's Java parser http://java.sun.com/xml/download.html

--------------------------------------------------------------------------------

NAMESPACES

Namespaces are used to ensure uniqueness among element names. While it's not a requirement to use namespaces, it is highly recommended. For instance, if you were creating a DTD for your business data that contained an element named <Customer> and you wanted to exchange information with a partner company that also uses the element name <Customer>, you could potentially run into some serious tag-name clashing. Namespaces were created to solve this problem. We'll dive more into namespaces soon.
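As a preview, here is a sketch of how two <Customer> elements can coexist once each is bound to its own namespace. The prefixes and the example.com URIs are hypothetical; Python's ElementTree reports each element name with its namespace URI in braces.

```python
import xml.etree.ElementTree as ET

# Both namespace URIs below are placeholders for illustration.
xml_text = """<Exchange xmlns:us="http://example.com/ns/us"
                    xmlns:them="http://example.com/ns/partner">
  <us:Customer>Acme</us:Customer>
  <them:Customer>Globex</them:Customer>
</Exchange>"""

root = ET.fromstring(xml_text)
tags = [child.tag for child in root]

# Look up our own Customer without any risk of picking up the partner's
ours = root.find("{http://example.com/ns/us}Customer")

print(tags)
print(ours.text)
```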

--------------------------------------------------------------------------------

MSXML PREVIEW RELEASE

Microsoft has made available a preview release of its XML parser, MSXML. It's an update to the product release of March 2000. This release now has support for the SAX2 API and XSLT/XPath.

Here are the links:

http://msdn.microsoft.com/downloads/webtechnology/xml/msxml.asp
http://msdn.microsoft.com/workshop/xml/articles/sax2jumpstart.asp

--------------------------------------------------------------------------------

MS AND XML

You may not know it, but you most likely already have an XML parser on your PC. Microsoft includes a parser with Internet Explorer 5--in the form of a standard COM object implemented in a file named msxml.dll. If you search your system and find that file, you've got yourself a parser! You can use it with any ActiveX/COM-capable development environment, such as Visual Basic or ASP.

--------------------------------------------------------------------------------

MORE WML

Wireless Markup Language (WML) is an XML-based markup language used to create applications for use with WAP-enabled devices. The WAP specification borrows heavily from the existing request-and-reply model used by the Internet today. A WAP device makes a request to a WAP gateway for a WML file, much like a Web browser makes a request to a Web server for an HTML file. A browser on the WAP device (called a micro-browser) renders the page according to the contents of the WML file. WML performs the same function as HTML; it contains XML tags that define what the page looks like. Here is Hello in WML:

<wml>
  <card>
    <p> Hello </p>
  </card>
</wml>

--------------------------------------------------------------------------------

MORE ON ENCODING

XML by default supports Unicode characters. Unicode is the evolution of ASCII to include support for virtually all written languages. ASCII only uses 7 bits--which is fine for English--but doesn't come close to providing enough range for languages like Chinese. For that reason, Unicode assigns every character a number (a code point) from a much larger range, and encodings such as UTF-8 and UTF-16 define how those numbers are stored as bytes. An XML processor is required to support UTF-8 and UTF-16 character encoding. (The numbers refer to the size of the code unit, not of the character: UTF-8 uses one to four 8-bit units per character, and UTF-16 uses one or two 16-bit units.) If you're using UTF-8 or UTF-16, you can leave off the encoding attribute. If you're using another character encoding, you must specify it using the encoding attribute.

Here are some examples:

<?xml version="1.0"?>, must use either UTF-8 or UTF-16.

<?xml version="1.0" encoding="ISO-8859-1"?>, uses Latin-1 encoding, which is close to (but not identical with) the Windows-1252 character set that Western-language versions of Microsoft Windows use by default.

<?xml version="1.0" encoding="UTF-8"?>, uses UTF-8, which takes one to four bytes per character.
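To get a feel for how many bytes a character really takes in each encoding, here is a short Python sketch. The four sample characters (a Latin letter, an accented letter, a Chinese character, and an emoji) are arbitrary picks.

```python
# Sample characters: 'a', e-acute, a CJK ideograph, and an emoji
samples = ["a", "\u00e9", "\u4e2d", "\U0001F600"]

# Number of bytes each character occupies in UTF-8 and in UTF-16
utf8_sizes = [len(ch.encode("utf-8")) for ch in samples]
utf16_sizes = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_sizes, utf16_sizes):
    print(ascii(ch), "UTF-8 bytes:", n8, "UTF-16 bytes:", n16)
```

UTF-8 grows from one to four bytes across these samples, while UTF-16 needs two bytes for the first three and four for the emoji.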

--------------------------------------------------------------------------------

MEET THE FUNCTOIDS

In Microsoft BizTalk Server (BTS), it's possible to take multiple fields from a source document and manipulate them before putting the results into single fields in the destination document. Such manipulations might include arithmetic operations, string manipulations, or anything else you care to write in ECMA-262 script (which is basically a standardized version of JavaScript). The transformation rules are essentially Extensible Stylesheet Language (XSL) transformations, but BTS calls them--gloriously--functoids. Let's hope this term makes it beyond the beta stage and becomes part of the lingo.

--------------------------------------------------------------------------------

LINKING TO EXTERNAL FILES

In XHTML, you're generally bound to refer to scripts and stylesheets as files that are independent of your XHTML documents. The syntax for doing this is exactly the same as that in standard HTML (except in standard HTML, imports usually are a matter of convenience more than anything else). The syntax looks like this:

<script type="text/javascript" src="/libraries/math.js"> </script>

That line allows the file in which it appears to refer to functions in the file math.js, which resides in the libraries folder. The file contains only JavaScript code--no HTML or XHTML at all.

David Wall, based near Washington, D.C., works as a writer, lecturer, and consultant. You'll find example code at http://www.davidwall.com/xml. You can contact David at xml_dave@davidwall.com.

--------------------------------------------------------------------------------

LET XML SCHEMAS DO THE WORK

XML schemas attempt to overcome two basic shortcomings of DTDs: the lack of data typing and a complex syntax. DTDs use a unique, often confusing syntax for describing the content constraints of an XML document type. XML schemas are written using XML, so you don't have to learn a new language. In addition, since schemas are written with XML you can parse them using standard XML tools to obtain the metadata for the schema.

XML schemas also provide a way to apply data typing to element content. With a DTD there is no way to specify a datatype for element content. For example, neither of the elements

<salary>1000</salary>
<salary>hi</salary>

would be rejected by a validating parser using a DTD. With an XML schema, you could specify that <salary> can contain only integer values and let the parser do the data validation for you.

--------------------------------------------------------------------------------

JARGON

Sometimes the jargon associated with a technology can be overwhelming. XML is definitely no exception. Here are a few commonly used XML terms dealing with document structure:

* Root element: Every XML document must have a root element. It must always be the first element, and all other elements are sub-elements of the root element.
* Document entity: The term document entity comes into use when you're dealing with XML that is not stored in a file. It is often the case that an XML document is transmitted over the Internet between cooperating software programs. The term document entity is used to describe the entire XML document. We typically associate this with an XML file. When dealing with byte streams, it is simply the chunk that makes up each XML document.
* Child element: A child element is an element nested directly inside another element. If the child element does not have any children of its own, it is called a leaf element.
* Parent element: A parent element is an element that contains other elements (child elements).
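These terms map directly onto what a parser hands you. As a rough sketch in Python (the Order document is only a sample), the following sorts the elements of a small document into parents and leaves:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<Order>"
    "<Customer><First>John</First><Last>Doe</Last></Customer>"
    "<Item>Palm V</Item>"
    "</Order>"
)

# len(element) is the number of child elements it contains
parents = [el.tag for el in root.iter() if len(el) > 0]
leaves = [el.tag for el in root.iter() if len(el) == 0]

print("root:", root.tag)
print("parents:", parents)
print("leaves:", leaves)
```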

--------------------------------------------------------------------------------

IS BIZTALK REALLY OPEN: THE BIG QUESTION

Because it's so closely integrated with specific applications (and not a little bit because BTS is a Microsoft product), it's fair to ask whether the BizTalk concept is really reliant on open XML standards or merely paying lip service to them as Microsoft marches off to create yet another market-dominating, standards-stifling product. At this early stage, it seems as though the company has approached XML quite well, by making its products read and write standards-compliant XML code. The company simply has built strong XML tools into its commercial server software and seems to be hoping to attract people to Windows (and Windows network information services) on the strength of that software. Let's beware of any effort to extend XML, though--that happened with HTML and JavaScript, and the results were some products that, while nifty in their own right, damaged the community. Here's hoping that Microsoft sticks with standards-compliant XML tools.

--------------------------------------------------------------------------------

INTRODUCING XHTML

In an XML universe that's nothing if not loaded with specifications and sequences of letters to describe them, Extensible Hypertext Markup Language (XHTML) sounds promising. This emerging standard (they're all emerging standards, aren't they?) promises to take HTML, the Web markup language that's already well established, and add to it some of the benefits of XML, which isn't nearly as widely understood or supported.

You can read the XHTML 1.0 specification at the World Wide Web Consortium's Web site:

http://www.w3.org/TR/xhtml1/

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 1 OF 3

DTD stands for Document Type Definition. It refers to a formatted ASCII file that defines what tags, attributes, and tag relationships are allowable for a class of XML documents. DTDs are used in conjunction with a validating parser to ensure that XML documents are valid.

Remember that a valid XML document, in addition to being well formed, adheres to the language semantics specified by a DTD. Over the next few tips, we'll explain how to create and use a DTD. Note that DTDs will most likely be supplanted by XML Schemas (we'll get into these later), but for now they are still in widespread use and supported by many tools.

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 2 OF 3

One way to specify a DTD is to include it right in the text of the XML file. The following is a sample XML file that contains a DTD:

<?xml version="1.0"?>

<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>John</First>
  <Last>Doe</Last>
</Customer>

If you're not up on DTDs, we'll cover more on the syntax later. The point here is that the DTD can be contained in the text along with the XML.

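Given the above XML file, a validating parser will report an error (many APIs surface it as an exception) if the XML does not adhere to the constraints of the DTD. A non-validating parser will not check those constraints, although it still reads the internal subset for things like entity declarations and default attribute values.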
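You can watch a non-validating parser at work with a quick experiment. The parser bundled with Python is non-validating, so this sketch feeds it a Customer document that breaks the DTD (the required First element is missing) and shows that it is accepted anyway:

```python
import xml.etree.ElementTree as ET

# Well formed, but not valid: the DTD demands (First,Last)
invalid_but_well_formed = """<?xml version="1.0"?>
<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>
<Customer>
  <Last>Doe</Last>
</Customer>"""

root = ET.fromstring(invalid_but_well_formed)  # no error is raised
children = [child.tag for child in root]
print(children)
```

A validating parser given the same input would flag the missing First element.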

--------------------------------------------------------------------------------

INTRODUCING DTD--PART 3 OF 3

The most common way to specify a DTD for an XML document is to include a DOCTYPE declaration with an external reference. For example, using the following XML file:

<?xml version="1.0"?>

<!DOCTYPE Customer SYSTEM "customer.dtd">

<Customer>
  <First>John</First>
  <Last>Doe</Last>
</Customer>

the file customer.dtd would contain:

<!ELEMENT Customer (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>

In XML-speak, the DTD file is called the external subset of the full DTD for the XML file.

--------------------------------------------------------------------------------

INTERNAL SUBSETS TAKE PRECEDENCE

Declarations in the internal subset take precedence over those in an external subset. (The processor reads the internal subset first, and for entities and attribute defaults the first declaration it sees is the one that counts.) This is particularly handy if you want to make use of an existing DTD but need to tweak it a little to suit your needs. Here is a sample XML file using an internal and external DTD where the internal subset overrides the value of an entity defined in the external subset:

<?xml version="1.0"?>
<!DOCTYPE Sample SYSTEM "company.dtd" [
  <!ELEMENT Sample (copyNotice)>
  <!ELEMENT copyNotice (#PCDATA)>
  <!ENTITY copy "my copyright notice">
]>

<Sample>
  <copyNotice> &copy; </copyNotice>
</Sample>

where company.dtd is a separate file and could contain the following:

<!ENTITY copy "Company Wide Copyright Notice">

In this example, the internal subset defines an entity copy that replaces the externally defined entity of the same name. So the parsed XML would come back as

<Sample>
  <copyNotice> my copyright notice </copyNotice>
</Sample>
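The rule behind this behavior is that the first declaration of an entity wins, and the internal subset is read before the external one. Python's bundled parser does not fetch external DTD files by default, so the sketch below only imitates the situation: it places both declarations in the internal subset, in the order a processor would encounter them.

```python
import xml.etree.ElementTree as ET

# The second declaration stands in for the one in company.dtd,
# which a processor would read after the internal subset.
xml_text = """<?xml version="1.0"?>
<!DOCTYPE Sample [
  <!ENTITY copy "my copyright notice">
  <!ENTITY copy "Company Wide Copyright Notice">
]>
<Sample><copyNotice>&copy;</copyNotice></Sample>"""

notice = ET.fromstring(xml_text).find("copyNotice").text
print(notice)
```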

--------------------------------------------------------------------------------

INTERNAL AND EXTERNAL SUBSETS

When talking about document type definitions (DTDs) and the associated Document Type Declaration (<!DOCTYPE>), you will often hear the terms internal and external subset. Internal subset refers to a document type definition that is declared inside the XML file. Here is an example:

<?xml version="1.0"?>
<!DOCTYPE Customer [
  <!ELEMENT Customer (First,Last)>
  <!ELEMENT First (#PCDATA)>
  <!ELEMENT Last (#PCDATA)>
]>

<Customer>
  <First>Mean Joe</First>
  <Last>Greene</Last>
</Customer>

External subsets are used when the DTD is contained in an external file, and the DOCTYPE declaration refers to it. Here is the previous example using an external subset:

<?xml version="1.0"?>
<!DOCTYPE Customer SYSTEM "customer.dtd">

<Customer>
  <First>Mean Joe</First>
  <Last>Greene</Last>
</Customer>

where customer.dtd is a separate file and would contain the following:

<!ELEMENT Customer (First,Last)>
<!ELEMENT First (#PCDATA)>
<!ELEMENT Last (#PCDATA)>

--------------------------------------------------------------------------------

GRASPING THE BIZTALK SPECIFICATION

Professional computer programmers spend much of their time making systems interoperate. That is, they write parsers, filters, converters, and adapters that take the output of one program and render it readable by another. The idea is that the business that uses the two pieces of software is made more efficient by having an automatic translator that sits between them.

The problem is, custom programming work is very expensive. Hiring a team of programmers to integrate your applications can do much to offset the financial gain that comes from integration. A more generic solution is needed, but the generic solution can be no less robust or reliable than custom work.

This is the purpose of BizTalk, a data-interchange scheme developed by Microsoft and others. Based on XML standards, BizTalk is meant to facilitate data interchange among applications--including applications running on separate companies' computers. BizTalk, therefore, is a tool for (among other things) buying and selling across networks. It's been in the works for more than a year now, and it's getting to be mature enough to reward experimentation.

--------------------------------------------------------------------------------

GO AGAIN!

The previous tip showed you how to navigate between cards in a deck using the <go> tag. Why stop there? You can also use the <go> tag to navigate to other WML decks. Just as in HTML, you can use the href attribute to specify a WML deck:

<go href="http://someserver.com/wml/deck1.wml" />

You can take this one step further and generate dynamic WML pages by specifying a server-side script (CGI, servlets, ASP) instead of a WML file. Your script is responsible for generating valid WML in response to the request. Here is a sample:

<go href="books.cgi" method="get">
  <postfield name="author" value="Jordan"/>
</go>

Note the use of the <postfield> tag to pass values to the script.

--------------------------------------------------------------------------------

FIXED ATTRIBUTES

When I first saw FIXED attributes, I was a little confused. If an attribute always has a fixed value, why in the world would you bother to specify it in a DTD? The answer is quite simple: Fixed attributes allow you to have a default attribute value that you don't have to specify all the time. This makes your XML leaner and cleaner. For example:

<!ELEMENT House EMPTY>
<!ATTLIST House type CDATA #FIXED "ranch">

When the parser encounters <House/>, it's the same as <House type="ranch"/>. (If a document does spell out the attribute, its value must match the fixed value to be valid.)
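Here is a quick way to confirm that the parser fills in the attribute. This sketch uses Python's expat binding, which reports attributes supplied by the DTD along with those written in the document; the helper name element_attributes is made up for this example.

```python
from xml.parsers import expat

def element_attributes(xml_text):
    """Return {element name: attribute dict} as reported by expat."""
    seen = {}
    parser = expat.ParserCreate()

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen

doc = """<!DOCTYPE House [
  <!ELEMENT House EMPTY>
  <!ATTLIST House type CDATA #FIXED "ranch">
]>
<House/>"""

house = element_attributes(doc)["House"]
print(house)
```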

--------------------------------------------------------------------------------

FINDING THE XML IN BIZTALK SERVER

We've been talking about pipelines under Microsoft BizTalk Server (BTS). Pipelines establish paths and rules by which data can go from one company's server to the server of another company, completing a business transaction in the process. So, where's the XML? XML markup comes into play as the data fields are transmitted from the source to the destination. While in transit, the data are in a sort of message format, not unlike an email message. Individual pieces of data (quantity, description, delivery date, and so on) are tagged in XML. XML document type definitions (DTDs) define the markup, and it's possible to translate (via mappings you specify) the source XML format into the destination XML format. It's also possible to draw DTDs and other markup specifications from those stored at biztalk.org and other library sites.

--------------------------------------------------------------------------------

ENTITIES

There are two types of entities: general and parameter. General entities are what you normally think of as an entity in XML, and are typically used as replacement text in the content of a document. Parameter entities are used strictly within a DTD, also as replacement text.

General entities are declared like this:

<!ENTITY copyright "MyCompany.com, Inc., 1999">

and are referenced using an ampersand (&) and semicolon (;) as delimiters:

&copyright;

Parameter entities are declared like this:

<!ENTITY % peopleAttrib "name CDATA #IMPLIED age CDATA #IMPLIED weight CDATA #IMPLIED">

and are referenced using a percent (%) and semicolon (;) as delimiters:

%peopleAttrib;

Next time we'll see a common use for parameter entities.

--------------------------------------------------------------------------------

ENCODINGS

Have you ever seen the following and wondered what the heck encoding is and what that funny looking value is?

<?xml version="1.0" encoding="ISO-8859-1"?>

If you're like me, you probably pay little attention to the encoding attribute... and most of the time that's fine. The encoding attribute is used to tell an XML processor what standard the characters in the XML document are encoded with. An encoding standard is simply a specification of how a character is represented in bits--typically how many bits and what character each possible value represents. For instance, ASCII defines 7-bit characters, where the value 97 is the letter a, the value 98 is the letter b, and so on.
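To see the encoding attribute earn its keep, consider this small Python sketch. The sample document is invented: it is stored as Latin-1 bytes and parsed once with the declaration and once without it.

```python
import xml.etree.ElementTree as ET

text = '<?xml version="1.0" encoding="ISO-8859-1"?><name>caf\u00e9</name>'
latin1_bytes = text.encode("iso-8859-1")  # e-acute becomes the single byte 0xE9

# With the declaration, the parser decodes the bytes correctly
name = ET.fromstring(latin1_bytes).text

def parses(data):
    """Return True if the parser accepts the bytes, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# Same bytes without the encoding attribute: the parser assumes UTF-8
undeclared = latin1_bytes.replace(b' encoding="ISO-8859-1"', b"")

print(ascii(name))
print(parses(undeclared))
```

Without the declaration the parser falls back to UTF-8, where the lone byte 0xE9 is not a legal character, so the document is rejected.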

In general, most people ignore the encoding attribute. For those of you who have to deal with alternative character sets, I'll provide a few more tips over the next weeks.

--------------------------------------------------------------------------------

EMPTY ELEMENTS IN XML DOCUMENTS

In our previous tip, we used this DTD to define three empty elements for use in a list of architectural drawings:

<!ELEMENT floorPlan EMPTY>

<!ELEMENT elevation EMPTY>

<!ELEMENT crossSection EMPTY>

Now let's look at the syntax for using these elements in an XML document. Pay close attention--this is one of those situations in which HTML knowledge will cause trouble for you.

In XML, empty elements open with the < character as usual but close with a /> sequence. Therefore, an empty element called isNATOMember, which we might have used in the list of countries we worked with in a previous tip, would appear in an XML document like this:

<isNATOMember/>

That's the general syntax. Here's a document that uses our empty elements:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings.dtd">

<floorPlan/>

<elevation/>

<crossSection/>

Save that as projectDrawings.xml. Though you might have predicted that your browser wouldn't display any sort of formatted text (after all, there's no stylesheet defined here), you might not have anticipated the error that appears. Next time, we'll see why we get that error.
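If you don't have a browser handy, you can provoke a parser error from a few lines of code. The sketch below (Python standard library; the drawingList wrapper element is an assumption based on the DOCTYPE name) points at one problem any parser will report with a document like this: three top-level elements instead of a single root.

```python
import xml.etree.ElementTree as ET

def parse_error(xml_text):
    """Return the parser's error message, or None if the text is accepted."""
    try:
        ET.fromstring(xml_text)
        return None
    except ET.ParseError as err:
        return str(err)

without_root = "<floorPlan/><elevation/><crossSection/>"

# Assumed fix: wrap the elements in a single drawingList root element
with_root = "<drawingList>" + without_root + "</drawingList>"

print(parse_error(without_root))
print(parse_error(with_root))
```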

--------------------------------------------------------------------------------

EMPTY ELEMENTS IN A DTD

In our previous tip, we decided to create a DTD that includes definitions of three empty elements: one for floor plans, one for elevations, and one for cross-sections. This DTD would be used to generate lists of drawings associated with a building project.

According to XML syntax, the key to defining an empty element, logically enough, is the keyword EMPTY. Use EMPTY in an element definition, and you've defined an element that requires no closing tag.

The DTD looks like this:

<!ELEMENT floorPlan EMPTY>

<!ELEMENT elevation EMPTY>

<!ELEMENT crossSection EMPTY>

Save that as drawings.dtd.

This DTD allows us to use empty elements called floorPlan, elevation, and crossSection in an XML document. In our next tip, you'll see the syntax to use in the XML document itself.

--------------------------------------------------------------------------------

ELEMENTS OR ATTRIBUTES

Here are a few factors to consider when trying to decide whether to use elements or attributes:

Elements can contain nested elements and content; attributes can contain only content. Obviously, if there's any chance you may need to add nested structure to a data container, elements are the way to go. On the other hand, attributes provide more options for constraining the type of data in the container, and they can contain default values. The ability to limit the possible values and provide default data lets the parser do some of the work for you.

--------------------------------------------------------------------------------

ELEMENT CONTENT

Today's tip describes your options for content type within an element. An element's content type is defined in a DTD using the ELEMENT specifier:

<!ELEMENT ElementName ( .. content type .. )>

The options for content type are element-content, mixed-content, character-content, and empty-content. Element-content consists of nested elements only. Mixed-content can contain elements and character data. Character-content can contain character data only. Empty-content is self-explanatory. Here are samples of each:

<!ELEMENT Name (a,b,c)> - element content
<!ELEMENT Name (#PCDATA | a)*> - mixed content (#PCDATA must come first)
<!ELEMENT Name (#PCDATA)> - character content
<!ELEMENT Name EMPTY> - empty content (no parentheses, no # sign)

--------------------------------------------------------------------------------

ELEMENT ATTRIBUTES

Attributes are used to associate name-value pairs with an element. You declare them in a DTD; in a document, they can appear only in a start tag or an empty-element tag. Here is how they're declared:

<!ATTLIST 'name' 'attribute definitions' >

where 'name' is the name of the element and 'attribute definitions' is a list of attribute definitions for that element. Attribute definitions have a name, type, and default specifier.

Here's an example:

<!ELEMENT Lights (#PCDATA)>
<!ATTLIST Lights state (on|off) "on">

The XML would look like

<Lights state="off"> light sample </Lights>
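The "on" at the end of the ATTLIST declaration is the default specifier, so a document that omits the state attribute still gets one. A minimal sketch with Python's expat binding (the helper name light_state is hypothetical) shows both cases:

```python
from xml.parsers import expat

DTD = """<!DOCTYPE Lights [
  <!ELEMENT Lights (#PCDATA)>
  <!ATTLIST Lights state (on|off) "on">
]>"""

def light_state(element_markup):
    """Parse one Lights element and return the state the parser reports."""
    states = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: states.append(attrs.get("state"))
    )
    parser.Parse(DTD + element_markup, True)
    return states[0]

explicit = light_state('<Lights state="off"> light sample </Lights>')
defaulted = light_state('<Lights> light sample </Lights>')
print(explicit, defaulted)
```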

More on attributes later!

--------------------------------------------------------------------------------

ELECTRONIC BUSINESS XML (EBXML)

Electronic Business XML (ebXML) is an initiative aimed at developing a technical framework that will enable XML to be used for electronic exchange of business data. The United Nations body for Trade Facilitation and Electronic Business (UN/CEFACT) and the Organization for the Advancement of Structured Information Standards (OASIS) have joined forces to form ebXML. A major aspect of ebXML is the participation of industry leaders in working groups with the intent of generating DTDs and schemas for e-commerce using XML.

If you're interested in e-commerce, ebXML is something to keep an eye on.

Electronic Business XML (ebXML) http://www.ebxml.org/

--------------------------------------------------------------------------------

DTD ELEMENTS--STANDARD AND NESTED

You declare an element in a DTD using the following format:

<!ELEMENT 'name' ('content-spec')>

where 'name' is the name of the tag and 'content-spec' defines what the tag can contain.

There are two basic element types that make up most XML DTDs (and which will be discussed in the next few tips). The first looks like this:

<!ELEMENT 'name' (#PCDATA)>

and is the standard XML tag that contains text data. For example, this:

<!ELEMENT Address (#PCDATA)>

would define a tag that looks like this:

<Address>any text here</Address>

The other type of element consists of a series of nested elements and looks like this:

<!ELEMENT 'name' ('name','name',etc.)>

In this case, the DTD might look like this:

<!ELEMENT Name (First,Last)> <!ELEMENT First (#PCDATA)> <!ELEMENT Last (#PCDATA)>

and the resulting XML would look like this:

<Name> <First>John</First> <Last>Doe</Last> </Name>

You can accomplish a lot with these two simple types of elements. PCDATA allows you to define nodes that contain data, and nested elements enable you to provide structure. Of course, XML offers many more options for declaring elements, and we'll dive into more of those in future tips.

--------------------------------------------------------------------------------

DTD ELEMENTS: THE ROOT ELEMENT

Every XML document must have exactly one root element in which all other elements are contained. When you use the DOCTYPE declaration to include a DTD, the DOCTYPE name has to match the root element name.

For example:

<?xml version='1.0'?> <!DOCTYPE Order [ <!ELEMENT Order (Customer,Item)> <!ELEMENT Customer (#PCDATA)> <!ELEMENT Item (#PCDATA)> ]>

<Order> <Customer>John Doe</Customer> <Item>Palm V</Item> </Order>

Note that the name following DOCTYPE and the name of the root element are both Order.

--------------------------------------------------------------------------------

DTD ELEMENTS: MORE MODIFIERS

XML provides two modifiers you can use to dictate ordering of nested elements.

The comma (,) specifies sequential ordering. For example:

<!ELEMENT Name (First,Middle,Last)>

Name must contain a First, Middle, and Last element, in that order.

<Name> <First>John</First> <Middle>C</Middle> <Last>Doe</Last> </Name>

The pipe (|) is called a choice modifier. It allows you to specify a list of choices. For example:

<!ELEMENT Sport (Football | Baseball | Basketball)>

Sport must contain one (and only one) of the tags in the choice list.

<Sport> <Basketball>Hey</Basketball> </Sport>

--------------------------------------------------------------------------------

DTD ELEMENTS: MODIFIERS

Recall that you can declare an XML element that contains nested elements. The following example creates a Name element that must contain a First and Last element:

<!ELEMENT Name (First,Last)>

XML provides several modifiers for specifying the number of occurrences of nested tags. The plus sign (+) indicates one or more, the asterisk (*) indicates zero or more, and the question mark (?) indicates zero or one.

Here is an example:

<!ELEMENT Name (First+)>

Name must contain one or more First elements, for example:

<Name> <First>Joe</First> <First>Joseph</First> </Name>

If no modifier is added, the nested element must appear exactly one time.

<!ELEMENT Name (First)>

In this case, Name must contain exactly one First element.

--------------------------------------------------------------------------------

DTD ELEMENTS: MIXED-CONTENT ELEMENTS

Mixed-content elements can contain a mixture of PCDATA and one or more other elements. A mixed-content element must start with PCDATA followed by | and a list of other element types. In addition, it must end with the zero or more modifier (*).

Here is an example:

<!ELEMENT item (#PCDATA)> <!ELEMENT items (#PCDATA | item)*>

<items> Here is some text <item>item 1</item> and some more text <item>item 2</item> </items>

--------------------------------------------------------------------------------

DTD ELEMENTS: EMPTY ELEMENTS

As you may know, XML has a shortcut syntax for specifying an empty element:

<Name></Name> can also be expressed as <Name/>

Likewise, you can specify that an element must be empty in the DTD element declaration:

<!ELEMENT myFlag EMPTY>

<myFlag>On</myFlag> is invalid!

--------------------------------------------------------------------------------

DTD ELEMENTS: ANY ELEMENTS

The ANY element specifier indicates an element that can contain any other defined element or PCDATA. It's declared by using the ANY keyword as the content specifier:

<!ELEMENT anything ANY>

The following is an example of an XML file using an ANY element:

<?xml version="1.0"?>

<!DOCTYPE Customer [ <!ELEMENT Customer (Name)> <!ELEMENT Name ANY> <!ELEMENT First (#PCDATA)> <!ELEMENT Last (#PCDATA)> <!ELEMENT Middle (#PCDATA)> ]>

<Customer> <Name> <First>John</First> <Last>Doe</Last> </Name> </Customer>

The Name element can contain any of the other defined elements (in any order), as well as PCDATA.

--------------------------------------------------------------------------------

DTD ELEMENTS: AN EXAMPLE

The past few tips have focused on declaring elements in a DTD. Now, we want to provide some examples using these techniques.

<!ELEMENT Order (Customer, Item*)+>

The Order element must contain one or more occurrences of a sequence: a Customer element followed by zero or more Item elements. The asterisk (*) makes Item optional and repeatable, and the plus (+) sign indicates that the whole sequence can occur one or more times.

<Order> <Customer> .. </Customer> <Item> .. </Item> <Customer> .. </Customer> <Item> .. </Item> <Item> .. </Item> </Order>

<!ELEMENT Food (Egg | Apple | Steak)+>

Food can contain one or more occurrences of Egg, Apple, or Steak, in any order.

<Food> <Steak> .. </Steak> <Egg> .. </Egg> <Egg> .. </Egg> </Food>

<!ELEMENT Document (Title, Author+, (Para | Img)+, Summary?)>

Document must contain a Title, followed by one or more Author tags, followed by one or more (Para or Img) tags, optionally followed by a Summary tag.

<Document> <Title> .. </Title> <Author> .. </Author> <Img> .. </Img> <Para> .. </Para> <Para> .. </Para> </Document>

I'm sure you've guessed by now that the options are endless, but we hope these examples have given you a taste of how you can mix and match element modifiers to define the semantics of your XML documents.
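
One way to get a feel for these content models is to notice that they behave like regular expressions over the sequence of child element names. Here is a rough Python sketch of that idea for the Document model above. The helper names (child_string, matches_document_model) are made up for illustration, and this is not a validating parser; it only checks the order and count of child tags.

```python
import re

# Hypothetical helper: encode a sequence of child element names as
# "Name Name ... " so a content model can be written as a regex.
def child_string(names):
    return "".join(name + " " for name in names)

# (Title, Author+, (Para | Img)+, Summary?) expressed over that encoding.
DOCUMENT_MODEL = re.compile(r"Title (Author )+((Para|Img) )+(Summary )?")

def matches_document_model(names):
    """Return True if the child names satisfy the Document content model."""
    return DOCUMENT_MODEL.fullmatch(child_string(names)) is not None

print(matches_document_model(["Title", "Author", "Img", "Para", "Para"]))  # True
print(matches_document_model(["Author", "Title", "Img", "Para"]))          # False
```

The comma becomes plain concatenation, the pipe becomes regex alternation, and the +, *, and ? modifiers carry over unchanged.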

--------------------------------------------------------------------------------

DOM AND SAX

XML parsers process XML documents, making the elements and data available to an application via an API. There are two parsing methodologies used for XML: DOM and SAX. An XML parser supports one or both of these APIs. DOM allows you to read the XML data using API calls to walk a nested tree structure. It's useful if your application is concerned with the structure of the document. SAX is based on callbacks; your application is called as each element is encountered while parsing the document. SAX is great for parsing large documents, since all of the data isn't pulled into memory at one time.
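
To make the difference concrete, here is a small sketch using the DOM and SAX modules that ship with Python's standard library (xml.dom.minidom and xml.sax). The sample document and the names dom_child_names, NameCollector, and sax_element_names are my own illustrations, not part of any particular parser discussed in these tips.

```python
import xml.dom.minidom
import xml.sax

XML_TEXT = "<Name><First>John</First><Last>Doe</Last></Name>"

# DOM: the whole document is loaded into a tree, which you then walk.
def dom_child_names(text):
    doc = xml.dom.minidom.parseString(text)
    return [node.tagName for node in doc.documentElement.childNodes
            if node.nodeType == node.ELEMENT_NODE]

# SAX: the parser calls back into your handler as each element starts.
class NameCollector(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def sax_element_names(text):
    handler = NameCollector()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.names

print(dom_child_names(XML_TEXT))    # ['First', 'Last']
print(sax_element_names(XML_TEXT))  # ['Name', 'First', 'Last']
```

With DOM you ask the tree questions after parsing is finished; with SAX you have to remember whatever you need as the callbacks go by.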

--------------------------------------------------------------------------------

DISPLAYING XML

For a large class of applications, XML is a tool for the electronic exchange of data. In these cases, your XML may never be seen by human eyes. However, XML is also commonly used as an intermediary format for data that may be viewed through many types of interfaces, from Web browser to smart-phone. Typically, the content is stored as XML. When it is requested by a client, it is transformed into the client-specific format (HTML for a browser, for example).

You could, of course, use a homegrown solution to transform the XML--but fortunately, there are two widely accepted technologies to fill the void: Cascading Style Sheets (CSS) and Extensible Stylesheet Language (XSL). CSS is a relatively simple technology that was originally created for HTML and has been extended for XML. XSL is more like a scripting language and has become quite popular. It is typically mentioned in conjunction with XSL Transformations (XSLT). An XSL stylesheet describes how an XML file should be presented, and an XSLT processor applies it to transform the XML into the client format.

--------------------------------------------------------------------------------

CONDITIONAL SECTIONS--PART 1 OF 2

Most programming languages support the notion of conditionally compiling a section of code into the final executable, based on the presence of a keyword. XML supports a similar concept, known as conditional sections. A conditional section can appear only in the external subset of a DTD.

Here's an example of how you might declare a conditional section:

<![INCLUDE [<!ELEMENT DebugRecord (timestamp,description)> <!ELEMENT timestamp (#PCDATA)> <!ELEMENT description (#PCDATA)> ]]>

The INCLUDE keyword tells the processor to include this section in the DTD. If IGNORE were used instead, the section would not be included. In the next tip, I'll show you a good use for conditional sections.

--------------------------------------------------------------------------------

CONDITIONAL SECTIONS--PART 2 OF 2

In our previous tip, we showed you how to optionally include a section in a DTD. Today's tip will give you a concrete example of how this might be useful. It's pretty typical in the real world to have a system in production and still have ongoing application development. The sample code below shows how you can optionally include a DebugRecord in your XML files, which can be very useful during development or for debugging production problems.

Here is the XML file. Note the entity debug and the inclusion of debug.dtd:

<?xml version="1.0"?> <!DOCTYPE Customer SYSTEM "debug.dtd" [ <!ELEMENT Customer (First,Last,DebugRecord?)> <!ELEMENT First (#PCDATA)> <!ELEMENT Last (#PCDATA)> ...

<!ENTITY % debug "INCLUDE"> ]>

<Customer> <First>john</First> <Last>allen</Last> <DebugRecord> <timestamp>88</timestamp> <description>howdy</description> </DebugRecord> </Customer>

and here is debug.dtd:

<![%debug; [<!ELEMENT DebugRecord (timestamp,description)> <!ELEMENT timestamp (#PCDATA)> <!ELEMENT description (#PCDATA)> ]]>

By setting the debug entity to INCLUDE, each XML file can optionally include a DebugRecord. Set this entity to IGNORE to leave it out.

--------------------------------------------------------------------------------

COMMON WML ELEMENTS

Today's tip lists some common WML elements and what they do:

<wml> </wml> Root element for all WML decks

<head> </head> Similar to HTML, you can specify optional information about the deck as a whole.

<card> </card> Used to define a card in the deck. Decks can have multiple cards.

<table> </table> Used to define a table, just like HTML.

<setvar> </setvar> Used to set the value of a deck-wide variable. (More on this later.)

<go> </go> Similar to the HTML <a>; specifies a URL to go to based on a user action.

<prev> </prev> Navigate to the previous card.

<input> </input> Creates an edit field for data entry.

These are just a few frequently used elements to give you a flavor of how WML works. I hope I've piqued your curiosity!

--------------------------------------------------------------------------------

COMING TO A THEATER NEAR YOU

That's right--XML is on video. "Introduction to XML" is an executive-level introduction to XML available on VHS tape. If you're looking for nuts and bolts, you'll not get it here--but if you're looking for an excellent introduction to XML and ways to use it to implement systems, this tape is for you. I have to admit, I haven't purchased the video for myself yet--I'm holding out for the DVD version with Dolby Digital Surround Sound.

Introduction to XML (VHS) Director: Bryan L. Bell http://www.amazon.com/exec/obidos/ASIN/0967848806/tipworld

--------------------------------------------------------------------------------

CHARACTER REFERENCE

What do you do if you want to embed in your XML a character that you cannot type on your keyboard? To handle this, XML supports character references. A character reference allows you to specify a number that, when parsed, will be replaced by the equivalent Unicode character. A character reference starts with &#x followed by the hexadecimal character code, or &# followed by the decimal character code, and ends with a semicolon. For example, to display a copyright symbol you would use the following:

<copyright>&#169; 2000 My Company, all rights reserved.</copyright>
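
If you want to see the substitution happen, here is a quick sketch using Python's standard xml.etree.ElementTree module (my choice of parser for illustration). It parses the decimal form and the equivalent hexadecimal form, &#xA9;, and confirms that both come back as the same character.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same Unicode character (U+00A9).
decimal = ET.fromstring("<copyright>&#169; 2000 My Company</copyright>")
hexadecimal = ET.fromstring("<copyright>&#xA9; 2000 My Company</copyright>")

print(decimal.text == hexadecimal.text)  # True
print(decimal.text[0] == "\u00a9")       # True
```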

--------------------------------------------------------------------------------

CDATA

If you need to put into an XML document a chunk of text that should not be interpreted as markup, the CDATA section is for you. A CDATA section takes the following form:

<![CDATA[ 'your stuff here' ]]>

CDATA sections are basically a convenience for document authors. A common use would be to embed an example of XML that you don't want to be mistaken for markup. For example, using this:

<![CDATA[ <name>Jane</name> ]]>

the <name> tag would not be interpreted by a parser as markup.
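
Here is a short sketch of what a parser hands back for a CDATA section, using Python's standard xml.etree.ElementTree module. The wrapping <example> element is my own addition, since a CDATA section has to live inside some element.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<example><![CDATA[ <name>Jane</name> ]]></example>")

# The <name> tag inside the CDATA section is plain text, not a child element.
print(len(root))        # 0
print(repr(root.text))  # ' <name>Jane</name> '
```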

--------------------------------------------------------------------------------

BUT IT WON'T RENDER...

You may wonder why we've been ignoring the fact that the documents we've created don't render. When you load them into your browser, all you see is a listing of XML code. We haven't even tried to create a stylesheet.

The point of XML is not to generate pretty documents for publication on a network. XML documents CAN be made to render prettily, but it's almost as if the facilities for doing that are afterthoughts. XML is designed to serve as a container for information, which is to be extracted and used by a computer program.

In the case of the list of drawings we've been working on, a program might look at the list documents, determine which drawings relate to which projects, and present the user with an attractive interface for accessing his or her image files. The real interface-rendering work would be done by the program that read the XML files and extracted information from them.

--------------------------------------------------------------------------------

BOOKS

A few readers have asked me to suggest good XML books, particularly those related to Java. Here are three I recommend:

Java and XML by Brett McLaughlin O'Reilly & Associates June 2000 ISBN: 0596000162 498 pages $39.95 http://www.oreilly.com/catalog/javaxml/ http://www.amazon.com/exec/obidos/ASIN/0596000162/tipworld

XML by Example by Benoit Marchal Que December 1999 ISBN: 0789722429 425 pages $24.95 http://www.amazon.com/exec/obidos/ASIN/0789722429/tipworld

Professional XML by Wrox Team (Editors) Wrox Press January 2000 ISBN: 1861003110 1,169 pages $49.99 http://www.wrox.com/Consumer/Store/Details.asp?ISBN=1861003110 http://www.amazon.com/exec/obidos/ASIN/1861003110/tipworld

--------------------------------------------------------------------------------

BIZTALK: THE BIG PICTURE

BizTalk Server (BTS) is more than a suite of tools for editing DTDs, setting up XSL transformations, and managing document-flow rules. BizTalk also has found its way into many elements of the Microsoft server-side universe. Most notably, Microsoft Site Server, Commerce Edition, has been redesigned to center on BizTalk. In particular, Commerce Server's popular order pipelines have been redesigned around BizTalk document type definitions, which means it will be easier for site developers to use existing schemes as starting points for their projects. Also, the contents of Microsoft Wallet and Microsoft Passport have been rewritten to comply with BizTalk specs, which means such data will fit more easily into commerce systems.

--------------------------------------------------------------------------------

AVOID XML BLOAT

If you're concerned about bloat in your XML, pay attention to how your Unicode text is encoded. Unicode was designed to handle character sets for all languages. Its UTF-16 encoding does this by using at least 16 bits for each character. This is an excellent idea if you're defining a standard for use on a global scale. But if you're trying to build applications that have to be moved across low-bandwidth Internet connections, it's not so great. Fortunately, there is an easy workaround: In addition to UTF-16, XML supports UTF-8, which uses only 8 bits for each ASCII character and more only for characters outside that range.

Here is the XML declaration you would use:

<?xml version="1.0" encoding="UTF-8"?>
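
To put numbers on the savings, here is a small Python sketch that encodes the same string both ways and counts the bytes. The sample strings are my own. For markup and text in the ASCII range, UTF-8 needs one byte per character where UTF-16 needs two; characters outside that range take more than one byte in UTF-8.

```python
ascii_text = "<Name>John</Name>"

utf8_bytes = ascii_text.encode("utf-8")
utf16_bytes = ascii_text.encode("utf-16-le")  # little-endian, no byte-order mark

print(len(ascii_text))   # 17 characters
print(len(utf8_bytes))   # 17 bytes
print(len(utf16_bytes))  # 34 bytes

# A non-ASCII character such as the copyright sign needs two bytes in UTF-8.
print(len("\u00a9".encode("utf-8")))  # 2
```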

--------------------------------------------------------------------------------

ATTRIBUTES AND ENTITIES

Parameter entities are typically used to declare common attributes for elements. For example, if you want every element in your document to have a database id attribute, you could do the following:

Declare the parameter entity:

<!ENTITY % dataid "dbid CDATA #REQUIRED">

Declare an element:

<!ELEMENT person (#PCDATA)>

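Declare an attribute list for the element using a parameter entity reference (this works only in an external DTD file, because parameter entity references inside a declaration are not allowed in the internal subset):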

<!ATTLIST person %dataid; name CDATA #REQUIRED address CDATA #REQUIRED >

--------------------------------------------------------------------------------

ATTRIBUTE TYPES

There are two datatypes you'll frequently use with attributes: string and enumeration. String types allow you to enter arbitrary text data as the attribute value. Enumeration dictates a list of allowable values. Here's an example:

<?xml version="1.0"?>

<!DOCTYPE Person [ <!ELEMENT Person EMPTY> <!ATTLIST Person sex (male|female) "male" hair CDATA #IMPLIED > ]>

<Person sex="male" hair="brown"/>

The 'sex' attribute can only be 'male' or 'female'--anything else will cause the validating parser to generate an error. The 'hair' attribute, on the other hand, can contain anything--even 'purple'.

--------------------------------------------------------------------------------

ATTRIBUTE DEFAULTS

When declaring an attribute for an XML element, you must always define a default declaration. The term default declaration is a bit misleading, however, since it tells the parser not only what the default value is, but also whether the attribute's presence is required. A value of #REQUIRED means the attribute must be present, while #IMPLIED means the attribute is optional and has no default value. #FIXED indicates that the attribute has a fixed value, which follows the keyword. If none of these is specified, the default value itself must be specified with the attribute definition.

Here are some examples, which may clarify these values:

<!ATTLIST Person sex (male|female) "male" hair CDATA #IMPLIED eyes CDATA #REQUIRED ears CDATA #FIXED "yes" >

The default value for 'sex' is 'male'. The 'hair' attribute has no default value. The attribute 'eyes' must always be specified in a <Person> tag. The 'ears' attribute can have only one value: 'yes'.
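
You can watch the defaults being applied with a short Python sketch. It uses the standard xml.etree.ElementTree module, which is not a validating parser but does fill in default values declared in the internal DTD subset. The sample <Person> document is my own and only specifies 'eyes'.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE Person [
<!ELEMENT Person EMPTY>
<!ATTLIST Person
  sex (male|female) "male"
  hair CDATA #IMPLIED
  eyes CDATA #REQUIRED
  ears CDATA #FIXED "yes" >
]>
<Person eyes="blue"/>"""

person = ET.fromstring(DOC)

print(person.get("sex"))   # male  (default value supplied by the parser)
print(person.get("ears"))  # yes   (fixed value supplied by the parser)
print(person.get("hair"))  # None  (#IMPLIED: no value unless you give one)
print(person.get("eyes"))  # blue  (specified in the document)
```

Because this parser does not validate, it would not complain if 'eyes' were missing; a validating parser would.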

--------------------------------------------------------------------------------

ASSOCIATING ATTRIBUTES WITH ELEMENTS IN A DTD

We want to modify the DTD that defines the structure of our list of drawings to require each instance of each element to have two attributes. One of the attributes will contain the drawing's name; one of the attributes will contain a URL that refers to the drawing.

The key to adding attributes to an element's definition in a DTD is the <!ATTLIST> instruction. ATTLIST allows you to define a list of attributes and their characteristics. Here's our DTD, modified to include ATTLIST definitions:

<!ELEMENT drawingList (floorPlan*, elevation*, crossSection*)>

<!ELEMENT floorPlan EMPTY>

<!ATTLIST floorPlan
name CDATA #REQUIRED
URL CDATA #REQUIRED
>

<!ELEMENT elevation EMPTY>

<!ATTLIST elevation
name CDATA #REQUIRED
URL CDATA #REQUIRED
>

<!ELEMENT crossSection EMPTY>

<!ATTLIST crossSection
name CDATA #REQUIRED
URL CDATA #REQUIRED
>

Save that as drawings3.dtd.

Here, you see three instances of the <!ATTLIST> instruction. Each fills four lines. In each four-line segment, the middle two lines are the most important. On these lines, the first word ("name" or "URL") is the name of the attribute. The second word (CDATA) can be used to put restrictions on what you can assign to the attribute. CDATA is short for "character data" and means that the attribute can be assigned any value. The third word (#REQUIRED) says that every instance of the element must contain this attribute. There are other words that can take the place of #REQUIRED to specify other rules--we'll get to them soon.

--------------------------------------------------------------------------------

ASSOCIATING ATTRIBUTES WITH ELEMENTS

Now that we've defined our attributes in a DTD, let's use them to add value to our XML document. In this modification, we'll take advantage of the DTD's allowance for multiple instances of each element. We'll also--as we're now required to do--add attributes (and values for them) to each of the empty elements. Here's the code:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings3.dtd">

<drawingList>

<floorPlan name="First Floor" URL="http://www.yahoo.com" />

<floorPlan name="Second Floor" URL="http://www.yahoo.com" />

<elevation name="East Elevation" URL="http://www.yahoo.com" />

<elevation name="South Elevation" URL="http://www.yahoo.com" />

<crossSection name="CS A-A" URL="http://www.yahoo.com" />

<crossSection name="CS B-B" URL="http://www.yahoo.com" />

</drawingList>

Save that as projectDrawings3.xml.

This document illustrates how to use empty elements with attributes in an XML document. It's just like HTML, really--you just use the name of the attribute, follow it with an equal sign (=), and put the assigned value in double quotes ("). No problem.

--------------------------------------------------------------------------------

ANY ORDER

I recently received a question that makes for a good XML tip: How do you specify a DTD so that the content of an element must contain a constrained number of elements, but the order of the nested elements doesn't matter? For instance, a <Name> element must contain a <First> and <Last> tag, but in any order. This is what we came up with:

<?xml version="1.0"?> <!DOCTYPE top [ <!ELEMENT top ( (a,b) | (b,a) )*> <!ELEMENT a (#PCDATA)> <!ELEMENT b (#PCDATA)> ]>

In this case, both the following XML documents are valid:

<top> <a>hi</a> <b>hi</b> </top>

<top> <b>hi</b> <a>hi</a> </top>

Basically, we ended up specifying each of the possible combinations on the content model of the top-level tag. Any suggestions of how to do this another way?

--------------------------------------------------------------------------------

ANNOTATED XML SPECIFICATION

Have you ever tried to read the XML specification? Sometimes I think it's written in a language that no inhabitant of Earth can understand. Fortunately, a great resource is available that provides some interpretation of the language used there: The Annotated XML Specification. This site is a handy companion to the specification and offers a verbatim copy of the XML specification, with tons of links throughout that lead to simple explanations we all can understand.

Annotated XML Specification http://www.xml.com/axml/axml.html

--------------------------------------------------------------------------------

ADDING YOUR OWN NAMESPACE IN XHTML

XHTML documents can use combinations of tags from the XHTML namespace (which are very much like old HTML tags) and tags from other namespaces. You declare the XHTML namespace in the <html> tag, like this:

<html xmlns="http://www.w3.org/1999/xhtml">

Then, to make other namespaces available, you declare them with attributes like this one, also in the <html> tag:

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:mytags="http://www.davidwall.com" >

You can add as many namespaces as you care to define.

--------------------------------------------------------------------------------

ADDING A TOP-LEVEL ELEMENT IN AN XML DOCUMENT

Let's modify our XML document so it uses the top-level element we defined in our DTD in the previous tip. Here's what it should look like now:

<?xml version="1.0"?>

<!DOCTYPE drawingList SYSTEM "drawings2.dtd">

<drawingList>

<floorPlan/>

<elevation/>

<crossSection/>

</drawingList>

Save that as projectDrawings2.xml.

Note the reference to the modified DTD file in the DOCTYPE line, and note that the three empty elements are now defined as part of a surrounding drawingList element. You can load this in Microsoft Internet Explorer and not see an error. You'll just see the code, of course--this document isn't meant for attractive display--but you won't get a parsing error. This means we've created a well-formed, valid XML document that involves empty tags.

But what is it good for? Absolutely nothing. In our next tip, we'll see how to make this document more functional.

--------------------------------------------------------------------------------

ADDING A TOP-LEVEL ELEMENT IN A DTD

In our previous tip, you created an XML document (called projectDrawings.xml) that included three instances of empty elements. You defined the empty elements in a DTD correctly, but when you attempt to load projectDrawings.xml into Microsoft Internet Explorer, it gives you an error, specifically:

Only one top level element is allowed in an XML document.

What's that mean? It means the document has more than one top-level element, so it isn't well formed. An XML document has to have a single top-level element. A top-level element is an element that includes everything else. The opening tag of the top-level element is the first tag in the XML document; the closing tag of the top-level element is the last tag in the XML document. We need to alter our DTD to declare a top-level element--something you've seen before. Here's the modified DTD:

<!ELEMENT drawingList (floorPlan?, elevation?, crossSection?)>

<!ELEMENT floorPlan EMPTY>

<!ELEMENT elevation EMPTY>

<!ELEMENT crossSection EMPTY>

Save that as drawings2.dtd.

We also need to modify the XML document itself--that's for next time.

--------------------------------------------------------------------------------

ACRONYMS GALORE

Any technology worth its salt has a slew of acronyms associated with it. Here are just a few of XML's:

XML (eXtensible Markup Language)--The subject of these tips
XSL (eXtensible Stylesheet Language)--Typically used to convert XML to HTML
XSLT (XSL Transformations)--Used with XSL to convert XML to other formats
XHTML (eXtensible HyperText Markup Language)--Newer spec for enhancing HTML with XML-like features
DTD (Document Type Definition)--Defines structure semantics for XML documents
DOM (Document Object Model)--One of the parsing methodologies
SAX (Simple API for XML)--The other popular parsing methodology

--------------------------------------------------------------------------------

A QUICK PARSE

You may not know this, but the latest version of Microsoft Internet Explorer can parse and display XML. Explorer also will verify that the document is well formed. If it's not well formed, IE displays an error message and tells you the line number where the infraction occurs. Explorer also puts little plus and minus signs next to tags so you can expand and contract branches of the tree. If you're looking for a quick way to view or validate an XML document, pull it up in Internet Explorer.
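
If you don't have Internet Explorer handy, the same kind of quick well-formedness check can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The function name check_well_formed is my own. Like the browser check, it reports where the problem is, but it does not validate against a DTD.

```python
import xml.etree.ElementTree as ET

def check_well_formed(text):
    """Return None if text is well formed, else the (line, column) of the error."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return err.position
    return None

print(check_well_formed("<a><b/></a>"))     # None

# The <b> tag is never closed, so the parser objects on line 3.
position = check_well_formed("<a>\n<b>\n</a>")
print(position[0])                          # 3
```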

--------------------------------------------------------------------------------

A BOOK ABOUT THE POTENT COMBINATION OF JAVA AND XML

 

Because they're both so network-centric, the Java programming language and XML have real potential to create synergy. Hiroshi Maruyama, Kent Tamura, and Naohiko Uramoto explore this happy union in their "XML and Java: Developing Web Applications." The three programmers show how to use Java to maximum effect in XML applications and go into some detail on the Simple API for XML (SAX) and its power to simplify parsing operations. They also get into using Java programs as intermediaries between databases and XML files, an application in which Java's ability to interact with many platforms comes in handy. The book has met a positive reception on Amazon.com. If you know Java and XML (this book won't teach you either), the book is worth a look.

"XML and Java: Developing Web Applications" http://www.amazon.com/exec/obidos/ASIN/0201485435/tipworld

--------------------------------------------------------------------------------

XML WHITESPACE

Whitespace is one of the few non-markup characters to which XML pays special attention. In a nutshell, the XML specification dictates that whitespace in content should be preserved and passed unmodified to applications, while whitespace within markup or attributes can be removed. In XML documents, whitespace is typically used to distinguish elements and enhance readability. So what is whitespace? XML whitespace consists of these four characters:

ASCII space: hex 20
Tab: hex 09
Carriage Return: hex 0D
Line Feed: hex 0A
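
Here is a small Python sketch of that behavior, using the standard xml.etree.ElementTree module. The <note> element and its label attribute are made up for illustration. Whitespace in the element's content comes through untouched, while the tab inside the attribute value is normalized to a single space.

```python
import xml.etree.ElementTree as ET

# The attribute value contains a literal tab; the content has extra spaces
# and a trailing line feed.
root = ET.fromstring('<note label="a\tb">  two  spaces\n</note>')

print(repr(root.text))          # '  two  spaces\n'
print(repr(root.get("label")))  # 'a b'
```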

--------------------------------------------------------------------------------

XML FAQ

I'm always looking for an easy way to get answers quickly! Here is a great introductory site for learning the basics of XML, as well as a good starting place if you have an XML question. The site includes how-to stuff for the novice, as well as information for developers and hardcore SGML fans. While the site doesn't provide very in-depth information, I have nevertheless found it to be quite useful.

XML FAQ http://www.ucc.ie/xml/

--------------------------------------------------------------------------------

VERSIONING DTDS

Out in the real world, we always fight the battle between maintaining backward compatibility and enhancing our applications with new features. In the XML world, this usually means maintaining different versions of what is logically one DTD. One method proposed by Benoit Marchal in his book "XML by Example" is to put versioning information in the DTD URI. For example, the URI in the following DOCTYPE declaration includes the version (1.0):

<!DOCTYPE 'name' SYSTEM "http://www.myserver.com/dtd/1.0/business.dtd">

where 'name' is the name of the root element. This helps do two things: The application can inspect the URI to figure out what version it's using, and we can easily manage multiple versions of the same DTD without mucking up the DTD itself.

--------------------------------------------------------------------------------

UNPARSED ENTITIES

Entities are the subject of much conversation in the XML specification. Recall that a general entity is much like a substitution macro in most programming languages (I'm thinking of C's #define statements). An entity can be defined in a DTD like so:

<!ENTITY myhello 'hola'>

and then referenced in an XML document like this:

<someTag> This is how you say hello in Spanish : &myhello; </someTag>

When the document is parsed, the value returned for <someTag> will be 'This is how you say hello in Spanish : hola'. That is an example of what is called a parsed entity. There's also an entity type that is unparsed. Unparsed entities by definition are not processed by the XML processor. They are used for non-XML content like images or sound files. Here is an example of an unparsed entity:

<!ENTITY myImage SYSTEM "http://www.myserver.com/images/my.gif" NDATA GIF>

In this case, the NDATA GIF part is an XML notation and indicates a data type.
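
To see the parsed-entity substitution in action, here is a brief Python sketch using the standard xml.etree.ElementTree module. It wraps the myhello declaration in an internal DTD subset (my own arrangement for the sake of a self-contained example) and prints the text the parser returns.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE someTag [
<!ENTITY myhello 'hola'>
]>
<someTag>This is how you say hello in Spanish : &myhello;</someTag>"""

text = ET.fromstring(DOC).text
print(text)  # This is how you say hello in Spanish : hola
```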

--------------------------------------------------------------------------------

SHOW ME SOME ID

One of the possible types you can specify for an attribute is ID. For those of you familiar with database terminology, an XML ID attribute is like a primary key for a specific instance of an element. Just like a primary key, no two elements can have the same ID. In addition, no element can have more than one ID attribute, and its default declaration must be #IMPLIED or #REQUIRED. It wouldn't make sense to have a default or #FIXED value for an ID, since this would violate its uniqueness. An ID value must also be a valid XML name, so it can't start with a digit. Here is how you declare and use ID attributes:

<?xml version="1.0"?>

<!DOCTYPE Company [ <!ELEMENT Company (employee*)> <!ELEMENT employee (#PCDATA)> <!ATTLIST employee empid ID #REQUIRED> ]>

<Company> <employee empid="e44">john</employee> <employee empid="e84">albert</employee> <employee empid="e94">wesley</employee> </Company>

Note, the empid field is required--and since its type is ID, values must be unique within the document.

--------------------------------------------------------------------------------

REFERENCES TO UNPARSED ENTITIES

Unparsed entities can be used only in attribute values declared to be of type ENTITY or ENTITIES. Accordingly, you CAN'T do the following:

<!ENTITY myImage SYSTEM "http://www.myserver.com/images/me.gif" NDATA Gif>

... <p>My picture: &myImage;</p>

The reason this doesn't work is that an XML parser wouldn't know what to do with embedded, unparsed data (most likely binary, like an image file). XML parsers deal only with markup or text. This raises the question, "What can you do with an unparsed entity?" As usual, this is best answered with an example:

<?xml version="1.0"?>

<!DOCTYPE images [

<!NOTATION GIF SYSTEM "gifviewer.exe">

<!ELEMENT images (image*)>

<!ELEMENT image EMPTY>
<!ATTLIST image src ENTITY #REQUIRED>
<!ATTLIST image name CDATA #REQUIRED>

<!ENTITY myLogo1 SYSTEM "http://www.myserver.com/images/logo1.gif" NDATA GIF>
<!ENTITY myLogo2 SYSTEM "http://www.myserver.com/images/logo2.gif" NDATA GIF>
]>

<images>
  <image name="logo-right" src="myLogo1" />
  <image name="logo-left" src="myLogo2" />
  <image name="logo-bottom" src="myLogo1" />
</images>

Note the declaration of a NOTATION (more on this later), and that the src attribute is of type ENTITY. If you try to specify a value other than the name of a defined entity for the src attribute, the document will be invalid.
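To see what an application actually receives for unparsed entities, here is a minimal sketch using Python's standard xml.parsers.expat module. The helper name resolve_images is made up for illustration, and the sample is a shortened copy of the document above. Expat is non-validating, so it only reports the declarations; mapping the src attribute back to an entity is left to the application.

```python
import xml.parsers.expat

DOC = """<?xml version="1.0"?>
<!DOCTYPE images [
<!NOTATION GIF SYSTEM "gifviewer.exe">
<!ELEMENT images (image*)>
<!ELEMENT image EMPTY>
<!ATTLIST image src ENTITY #REQUIRED>
<!ATTLIST image name CDATA #REQUIRED>
<!ENTITY myLogo1 SYSTEM "http://www.myserver.com/images/logo1.gif" NDATA GIF>
<!ENTITY myLogo2 SYSTEM "http://www.myserver.com/images/logo2.gif" NDATA GIF>
]>
<images>
<image name="logo-right" src="myLogo1" />
<image name="logo-left" src="myLogo2" />
</images>"""


def resolve_images(xml_text):
    """Return (name, system id, handler) for each <image> element."""
    notations = {}  # notation name -> handler application
    entities = {}   # entity name -> (system id, notation name)
    images = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    def on_start(tag, attrs):
        if tag == "image":
            # The attribute value is the bare entity name, with no & or ;
            system_id, notation = entities[attrs["src"]]
            images.append((attrs["name"], system_id, notations[notation]))

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return images


for name, url, handler in resolve_images(DOC):
    print(name, url, handler)
```

The parser never fetches the GIF files; it simply hands over the system identifier and the notation name, which is exactly the division of labor the tip describes.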

--------------------------------------------------------------------------------

MARKUP

The word markup is thrown around a lot in XML circles. It's definitely a term that covers a lot of territory in the XML specification. Fortunately, markup is easy to differentiate from text because it always starts with the character < and ends with the character >, or begins with the character & and ends with the character ;.

--------------------------------------------------------------------------------

JABBER XML

When I heard about Jabber XML, I thought the name sounded cool so I looked into it. Turns out more than the name is cool. Jabber XML is an instant messaging technology that relies heavily on XML. At its core, Jabber defines an API and abstraction layer based on XML to provide a medium for applications to exchange data. So what does that mean? Well, you can read more about it at the Jabber site:

Jabber http://www.jabber.com/

--------------------------------------------------------------------------------

IDREFS

So, what can you do with an ID attribute? Well, for starters, you can reference it using another type of attribute called IDREF. When you use an IDREF attribute, it must contain a value that appears as the value of an ID attribute somewhere in the same document. Here is an example:

<?xml version="1.0"?>

<!DOCTYPE Company [
  <!ELEMENT Company (employee*)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee empid ID #REQUIRED>
  <!ATTLIST employee boss IDREF #IMPLIED>
]>

<Company>
  <employee empid="e44">john</employee>
  <employee empid="e84">john</employee>
  <employee empid="e94" boss="e89">john</employee>
</Company>

Here, the boss attribute is defined as an IDREF, so it must contain a value used by some empid attribute. Also, notice that the sample uses "e89" in the boss attribute, which does not appear in any empid attribute. Therefore, my document is not valid!
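Python's standard library does not include a validating parser, so the following is only a hand-rolled sketch of the two ID/IDREF rules, not real DTD validation. The function name check_ids is hypothetical, and the attribute names are passed in by hand because nothing here reads the ATTLIST declarations.

```python
import xml.etree.ElementTree as ET

DOC = """<Company>
  <employee empid="e44">john</employee>
  <employee empid="e84">albert</employee>
  <employee empid="e94" boss="e89">wesley</employee>
</Company>"""


def check_ids(xml_text, id_attr="empid", idref_attr="boss"):
    """Report duplicate ID values and IDREFs that point nowhere."""
    root = ET.fromstring(xml_text)
    seen = set()
    problems = []
    # First pass: collect every ID value and flag repeats.
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in seen:
                problems.append(f"duplicate ID {value}")
            seen.add(value)
    # Second pass: every IDREF must match an ID found anywhere in the document.
    for element in root.iter():
        ref = element.get(idref_attr)
        if ref is not None and ref not in seen:
            problems.append(f"dangling IDREF {ref}")
    return problems


print(check_ids(DOC))
```

Two passes are used because an IDREF may legally point at an ID that appears later in the document.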

--------------------------------------------------------------------------------

ID NAMES

When I first tried using ID attributes, I was attempting to dump a database table to XML. I thought an ID attribute would provide a cool way to store my database key with the corresponding XML element. I quickly found that ID values are bound by the same naming rules as XML element names. For instance, you cannot start them with a digit. If you use integer types for database keys... well, you see the problem. An easy way to get around this is to prepend a letter to the key. If the database key for Bubba in the Person table is 22, you could use something like this:

<Person id="i22">Bubba</Person>
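As a rough sketch of that workaround, the helper below turns a database key into an ID value and checks it against an ASCII-only approximation of the XML name rule. Both function names are invented for this example, and the pattern is a deliberate simplification: the real rule also allows many non-ASCII letters.

```python
import re

# ASCII-only approximation of an XML name (an assumption for brevity):
# a letter or underscore, then letters, digits, dots, hyphens, underscores.
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_xml_name(value):
    return NAME_RE.fullmatch(value) is not None


def key_to_id(key, prefix="i"):
    """Build an ID value such as 'i22' from a database key such as 22."""
    candidate = f"{prefix}{key}"
    if not is_xml_name(candidate):
        raise ValueError(f"{candidate!r} is not usable as an ID")
    return candidate


print(is_xml_name("22"))   # a bare integer key is rejected
print(key_to_id(22))       # the prefixed form is accepted
```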

--------------------------------------------------------------------------------

HEX OR DECIMAL ENTITY REFERENCES

When using character references, you have the option of specifying the character using a decimal or hexadecimal number. Recall that you can embed a character reference in XML using the characters &#, followed by a number, and terminated with a ;. If the reference begins with &#x, the digits provide a hexadecimal representation of the character's code. If it begins just with &#, the digits provide a decimal representation of the character's code.
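A quick way to convince yourself that the two forms are interchangeable is to parse both. This small sketch uses Python's standard ElementTree module; the element name c is arbitrary.

```python
import xml.etree.ElementTree as ET

# 233 decimal and E9 hexadecimal both name the character e-acute;
# 65 decimal and 41 hexadecimal both name the letter A.
root = ET.fromstring("<c>&#233; &#xE9; &#x41;&#65;</c>")
text = root.text

print(text)
```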

--------------------------------------------------------------------------------

CDATA

I'm often asked what a CDATA section is and why you'd use one. A CDATA section marks a part of a document that should not be parsed as markup. That is, if you have a block of text you want the XML parser to pass through untouched, you put it in a CDATA section. CDATA sections are primarily used to embed XML samples in XML documents. Here is an excerpt from an XML document that contains samples:

<?xml version="1.0"?>

<!DOCTYPE samples [
  <!ELEMENT samples (sample*)>
  <!ELEMENT sample (#PCDATA)>
]>

<samples>
  <sample>
<![CDATA[
This is a sample of how to use the buffy tag...

<buffy> slayer </buffy>
]]>
  </sample>
</samples>
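The effect is easy to observe from code. In this sketch, which uses Python's standard ElementTree module and a shortened version of the sample, the CDATA content comes back as plain text, while the same text without CDATA is parsed into a child element.

```python
import xml.etree.ElementTree as ET

WITH_CDATA = (
    "<samples><sample><![CDATA[How to use the buffy tag: "
    "<buffy> slayer </buffy>]]></sample></samples>"
)
WITHOUT_CDATA = (
    "<samples><sample>How to use the buffy tag: "
    "<buffy> slayer </buffy></sample></samples>"
)

protected = ET.fromstring(WITH_CDATA).find("sample")
unprotected = ET.fromstring(WITHOUT_CDATA).find("sample")

print(protected.text)    # the literal text, angle brackets included
print(len(protected))    # number of child elements inside CDATA version
print(len(unprotected))  # number of child elements without CDATA
```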

--------------------------------------------------------------------------------

NOTATIONS

Notations are XML's way of handling unparsed (typically binary) data. XML parsers are simply not equipped to handle binary data formats in a generic, cross-platform manner. Instead, notations are used to associate a name with an external handler application. For instance, the following notation could be used for GIF image files:

<!NOTATION GIF SYSTEM "gifviewer.exe">

assuming you have an application called gifviewer.exe that knows what to do with a GIF image. There really isn't much you can do with this. It simply provides a way for an XML document to say "Hey, I've got some non-XML data, and here is an application that can handle it."

--------------------------------------------------------------------------------

XML ATTRIBUTES

Most of us (myself included) would probably see nothing wrong with the following XML declaration:

<?xml version="1.0" standalone="no" encoding="UTF-8"?>

Well, we'd be wrong! Although it is rarely called out explicitly, the grammar in the XML specification fixes the order of the declaration's pseudo-attributes, so they must appear as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>

Of course, only version is required; encoding and standalone are optional.
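If you want to check the ordering rule for yourself, a parser will happily complain. The sketch below feeds both orderings to Python's standard ElementTree module (which sits on top of the expat parser); is_well_formed is just a hypothetical helper name, and the root element r is arbitrary.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


GOOD = b'<?xml version="1.0" encoding="UTF-8" standalone="no"?><r/>'
BAD = b'<?xml version="1.0" standalone="no" encoding="UTF-8"?><r/>'

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```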

--------------------------------------------------------------------------------

NAMESPACE SPECIFICATIONS

One of the fundamental concepts behind XML can also be one of its biggest drawbacks. The ability to define your own markup language has proven to be a simple, yet powerful, tool--so much so, that new XML vocabularies are popping up faster than the weeds in my backyard. Now that we have all of these nifty XML vocabularies, there is a pretty good chance that somebody is using the same tag names you are. Commonly known as tag name collision, this is a fundamental problem with the XML spec and DTDs in general. What do you do when you want to combine two XML vocabularies and they both define the same element name with different semantic meanings? To address this problem, the W3C introduced XML namespaces, which basically allow you to define a context for a set of tags. The official definition from the XML namespace spec goes like this:

An XML namespace is a collection of names, identified by a URI reference [RFC2396], which are used in XML documents as element types and attribute names.

That's pretty ambiguous and, to be quite honest, I'm not so sure that namespaces really solve the problem. In the next few tips, I'll give you a brief introduction to what namespaces can (and can't) do. In the meantime, check out the specs below:

Namespaces in XML http://www.w3.org/TR/REC-xml-names/

--------------------------------------------------------------------------------

MS AND XML--UPDATED

Stephen Z., one of your fellow tip readers, emailed me a useful tidbit I would like to pass on to everyone: In a previous tip I had mentioned that Internet Explorer 5.5 includes an XML parser (MSXML) and is quite handy when you want to quickly view or validate an XML document. Well, it turns out that IE 5.5 also includes an XSL parser, although neither is fully compliant with the W3C standards. However, Microsoft has made available a beta release with additional goodies and better compliance with the standards.

Thanks, Stephen!

MSXML Parser http://msdn.microsoft.com/xml/general/msxmlprev.asp

--------------------------------------------------------------------------------

JAXP TUTORIAL

To this day, I find the APIs for most XML parsers to be quite confusing. Typically, you can breeze through the documentation of an API and get a pretty good idea of the different classes or functions. XML parsers are supposed to provide a Document Object Model (DOM) and/or Simple API for XML (SAX) interface for programmers to use, but too often I look at the APIs and it's hard to figure out where to start.

You've probably guessed where I'm going with this... Sun has a nice tutorial for using its Java API for XML Parsing (JAXP). It takes you step by step through using JAXP for DOM and SAX parsing. Give it a try.

Sun's Java XML Parser Tutorial http://java.sun.com/xml/tutorial_intro.html

--------------------------------------------------------------------------------

IBM XSL EDITOR

IBM has been a forerunner in providing great tools, SDKs, and support for the XML community. Just one of the many XML software tools it provides is an XSL/XML editor. IBM's XSL Editor features include the ability to create, import, and edit XSL and XML documents. It allows you to set "break points" on the style sheet and source document. It also features a nifty collapsible tree view. Check it out.

IBM XSL Editor http://www.alphaworks.ibm.com/tech/xsleditor

--------------------------------------------------------------------------------

USING NAMESPACES

You won't get too far with namespaces without talking about qualified names. Qualified names are element names "qualified" with the namespace name, as in this example:

<?xml version ="1.0" ?>
<bicycle xmlns:parts="http://www.huffy.com/parts">
  <parts:tire />
  <parts:brakes />
</bicycle>

This XML document declares a namespace prefix called parts and binds it to the URI http://www.huffy.com/parts. You can see the element names tire and brakes are prepended by the prefix followed by a colon. The combination of [prefix + : + local name] is defined in the XML namespace specification as a qualified name. Using namespaces is as simple as declaring a namespace using xmlns and using qualified names instead of standard element names.

--------------------------------------------------------------------------------

SCOPE OF NAMESPACES

Previously, I told you it's usually a good idea to declare namespaces in the root element. However, there are times when you want to take advantage of the scoping facilities provided by namespaces. Scoping in namespaces is quite similar to scoping of variables in most programming languages. A namespace declaration applies to the element within which it is declared, as well as any contained elements. Here is an example:

<PurchaseOrder xmlns="http://www.amazon.com/">
  <Order>
    <Book xmlns="http://www.amazon.com/books">
      <Cost>12.00</Cost>
      <Title>The Fly</Title>
    </Book>
    <CD xmlns="http://www.amazon.com/music">
      <Cost region="LX">19.00</Cost>
      <Title artist="elvis" name="hound dog" />
    </CD>
  </Order>
</PurchaseOrder>

This is very similar to the multiple namespaces discussed in our previous tip, except the namespaces are declared at the point they are used instead of within the root element. Because each one is declared as a default namespace (xmlns with no prefix), the elements inside its scope do not need prefixes.
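To watch the scoping rules at work, you can ask a namespace-aware parser which namespace each element ended up in. This sketch uses Python's standard ElementTree module, which reports every element name as {namespace URI}local-name; the sample is a trimmed copy of the purchase order.

```python
import xml.etree.ElementTree as ET

DOC = """<PurchaseOrder xmlns="http://www.amazon.com/">
  <Order>
    <Book xmlns="http://www.amazon.com/books">
      <Cost>12.00</Cost>
    </Book>
    <CD xmlns="http://www.amazon.com/music">
      <Cost region="LX">19.00</Cost>
    </CD>
  </Order>
</PurchaseOrder>"""

root = ET.fromstring(DOC)

# <Order> inherits the default namespace declared on the root element.
order_tag = root[0].tag

# Each <Cost> picks up the namespace of the nearest enclosing declaration.
cost_tags = [el.tag for el in root.iter() if el.tag.endswith("}Cost")]

print(order_tag)
print(cost_tags)
```

The two Cost elements look identical in the source but come out with different expanded names, which is what keeps the vocabularies from colliding.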

--------------------------------------------------------------------------------

NAMESPACES AND ATTRIBUTES

The use of namespaces is not limited to elements--you can apply them to attributes as well. Here is an example:

<orderReport xmlns:order="http://myserver.com/orders" xmlns:catalog="http://myserver.com/catalog">

<item order:orderId="1234" catalog:itemNumber="456" />
<item order:orderId="5678" catalog:itemNumber="789" />

</orderReport>

As with element names, this allows you to pull attributes from multiple sources into a single document.

--------------------------------------------------------------------------------

MULTIPLE NAMESPACES

The motivation behind using namespaces is the ability to use elements from multiple vocabularies in a single document. Now, let's take a look at how you declare and use multiple namespaces. Although not required, it's a good idea to declare your namespaces in the root element, as in this example:

<PurchaseOrder xmlns:orders="http://www.amazon.com/"
               xmlns:books="http://www.amazon.com/books"
               xmlns:music="http://www.amazon.com/music">
  <orders:Order>
    <books:Book>
      <books:Cost>12.00</books:Cost>
      <books:Title>The Fly</books:Title>
    </books:Book>
    <music:CD>
      <music:Cost region="LX">19.00</music:Cost>
      <music:Title artist="elvis" name="hound dog" />
    </music:CD>
  </orders:Order>
</PurchaseOrder>

You can see that <PurchaseOrder> declares three namespace prefixes: orders, books, and music. The books and music vocabularies have elements that would collide (Cost and Title) if not for the use of namespaces.

--------------------------------------------------------------------------------

DECLARING NAMESPACES

Declaring a namespace is very straightforward. To do so, the XML namespace specification provides us with the reserved word xmlns. It can be added to any element as an attribute, but it is typically specified within the root element. The attribute value should be a unique URI, as in this example:

<rootElement xmlns="http://www.myserver.com">

You may wonder "Why the URI?" Well, it's quite simple. The major design goal of XML namespaces is to guarantee uniqueness of an element or attribute name, which boils down to guaranteeing uniqueness of the namespace. For this, the W3C relies on domain names, which by definition are unique. The use of the URI as the xmlns attribute value serves no other purpose than to provide a unique string that can be associated with the namespace. This is best illustrated by using the alternate form of the xmlns attribute:

<rootElement xmlns:myNamespace="http://www.myserver.com">

In this case, the prefix "myNamespace" is bound to the URI, and it is the URI that makes the namespace unique.

--------------------------------------------------------------------------------

XML PARSER TEST SUITE

One simply cannot overstate the importance of software suites conforming to the specifications they support. XML technology can be severely crippled if XML parsers do not fully, and correctly, support the specifications. So how do you know if your XML parser conforms to the specification? Fortunately, OASIS has made available on its Web site a conformance test suite. It contains contributions from some major players in the XML industry, including Sun, Fuji Xerox, OASIS/NIST, and James Clark.

XML Conformance Test Suite http://www.oasis-open.org/committees/xmltest/testsuite.htm

--------------------------------------------------------------------------------

VB XML SAX

We recently ran a series of tips on using the Microsoft XML parser with Visual Basic to parse XML documents. For that discussion, we were using the DOM method of parsing. Recall that there are two popular XML parsing methods, DOM and SAX. DOM uses an in-memory tree-like model with APIs that let you navigate the nodes. SAX uses an event-based method where you process the XML document via callbacks. The latest version of MSXML supports SAX in addition to DOM and is discussed at the URL below:

JumpStart for Creating a SAX2 Application with Visual Basic http://msdn.microsoft.com/xml/articles/Vbsax2jumpstart.asp

--------------------------------------------------------------------------------

NAMESPACES AND VALIDATION

You may have noticed that I haven't mentioned anything about validating documents that use namespaces. This is where namespaces fall short of being useful by themselves. The XML namespace specification does not require support for DTDs or validation. In fact, without some DTD trickery, it's a one-or-the-other situation. Namespaces can be useful, but so is validation. It's up to you to decide which makes sense for your application. Namespaces used in conjunction with XML Schema go a long way to rectifying this problem. More on that later!

--------------------------------------------------------------------------------

NAMESPACE URI IS CASE SENSITIVE

Most of us are accustomed to being able to enter www.yahoo.com or WWW.YAHOO.COM and getting to the same site. That's because the host name part of a URL is not case-sensitive. Namespace URIs, on the other hand, are compared character for character, so the following two namespaces are not identical:

<rootElement xmlns="http://www.myserver.com">

<rootElement xmlns="http://www.MyServer.com">

For XML namespaces, case matters!

--------------------------------------------------------------------------------

NAMESPACE PREFIXES AND URIS

Being able to declare and use a namespace prefix is neat but doesn't buy you much without the use of URIs. You're free to use any name you like for a prefix, which leads you right back to the problem of name collision if somebody else uses the same prefix. This is where URIs pick up the slack. URIs are by definition unique, which makes your namespace unique regardless of the prefix. Here is an example:

<rootElement>
  <item xmlns:ref="http://www.myserver.com" />
  <item xmlns:ref2="http://www.myserver.com" />
</rootElement>

The prefixes ref and ref2 actually refer to the same namespace, since they are bound to an identical URI.
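A namespace-aware parser makes the point concrete, because it throws the prefix away and keeps only the URI. The sketch below uses Python's standard ElementTree module on a variation of the sample in which the prefixes are actually applied to the element names; the third element, with the invented prefix ref3, differs from the others only in the case of its URI.

```python
import xml.etree.ElementTree as ET

DOC = """<rootElement>
  <ref:item xmlns:ref="http://www.myserver.com" />
  <ref2:item xmlns:ref2="http://www.myserver.com" />
  <ref3:item xmlns:ref3="http://www.MyServer.com" />
</rootElement>"""

# ElementTree reports each element as {namespace URI}local-name.
tags = [child.tag for child in ET.fromstring(DOC)]

same_uri_matches = tags[0] == tags[1]        # different prefix, same URI
case_variant_matches = tags[0] == tags[2]    # URI differs only by case

print(tags)
print(same_uri_matches, case_variant_matches)
```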

--------------------------------------------------------------------------------

USING ENTITIES AS ATTRIBUTE VALUES

Today we'll take a look at how you use entities as values for attributes. You're probably familiar with how to define an attribute for an element that can contain plain text data, as in this example:

<!ELEMENT Dog EMPTY> <!ATTLIST Dog breed CDATA #REQUIRED>

<Dog breed="any text" />

What you might not know is that you can reference an ENTITY for the attribute value, as shown here:

<?xml version="1.0"?>

<!DOCTYPE Test [
  <!ENTITY howdy "hello">
  <!ELEMENT Test EMPTY>
  <!ATTLIST Test greeting CDATA #REQUIRED>
]>

<Test greeting="&howdy;" />

In this case, an XML parser will return hello for the value of the attribute greeting.
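Here is a minimal sketch that confirms the substitution, using Python's standard ElementTree module. Its underlying parser is non-validating, but it does read the internal DTD subset and expand internal entities declared there.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<!DOCTYPE Test [
<!ENTITY howdy "hello">
<!ELEMENT Test EMPTY>
<!ATTLIST Test greeting CDATA #REQUIRED>
]>
<Test greeting="&howdy;" />"""

root = ET.fromstring(DOC)
greeting = root.get("greeting")  # the parser has already expanded &howdy;

print(greeting)
```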

--------------------------------------------------------------------------------

PRESERVING WHITESPACE

When you're editing XML documents, it is common to use whitespace characters (space, tab, carriage-return, line-feed) to set off markup and enhance readability. The XML specification dictates that a parser must pass all whitespace that is not part of the markup through to the application, even though most applications will simply ignore it. However, for the occasion when whitespace should be preserved, you can use a special attribute named xml:space. The appearance of this attribute, and its corresponding value, is an indication to applications that the whitespace within the element should be preserved. A good example of when whitespace has meaning would be a source code listing, as in this example:

<sourceCodeListing xml:space="preserve">
// silly algorithm that does nothing
int x = 0, z = 0;
for (int i = 0; i &lt; 1; i++) {
    x = x + (i * 3);
    for (int j = 0; j &lt; x; j++) {
        z = z + j;
    }
}
</sourceCodeListing>

In this case, you would definitely want to preserve whitespace to maintain readability.

--------------------------------------------------------------------------------

NOTATION ATTRIBUTES

The XML specification defines the concept of notations to deal with external (typically binary) data. Simply put, notations are a way to associate a name with an external application. In other words, they're kind of like file type associations in Windows. Here are a few examples:

<!NOTATION bmp SYSTEM "paint.exe"> <!NOTATION gif SYSTEM "gif.exe">

Here, then, is an example using a notation as an attribute value:

<!ATTLIST img type NOTATION (bmp|gif) "bmp" >

This declaration says that the img element has an attribute named type whose type is NOTATION. The type attribute is restricted to bmp or gif, with the default being bmp. When using a validating parser, the possible values must match previously defined notations.

--------------------------------------------------------------------------------

ENTITIES AND ATTRIBUTES

Previously, we've seen how to use standard, parsed entities as substitution text in attribute values. Here, then, is an example of code that shows how to reference external, unparsed entities as attribute values:

<?xml version="1.0"?>

<!DOCTYPE Test [
  <!NOTATION gif SYSTEM "gifapp.exe">
  <!ENTITY logo SYSTEM "mylogo.gif" NDATA gif>
  <!ELEMENT Test EMPTY>
  <!ATTLIST Test img ENTITY #REQUIRED>
]>

<Test img="logo"/>

There are several things that have to happen to make this work:

1. Declare a notation to be used with the entity.
2. Declare an unparsed, external entity.
3. Declare an attribute with type ENTITY. (This is a key difference from the previous example, which used CDATA!)
4. Use the entity name as the attribute value, i.e. img="logo".

This technique may be useful if you have a construct that could potentially change, but you need to reference it in many places. Also, note that the entity reference does not have the usual & (ampersand) and ; (semi-colon) characters around it.

--------------------------------------------------------------------------------

XML SCHEMA

As of October 24, 2000, XML Schema has finally become a W3C Candidate Recommendation. A Candidate Recommendation means it has been released for public consumption and feedback. Recall that XML Schema is a proposed replacement for DTDs. It allows you to specify document vocabularies without some of the drawbacks of DTDs. Most of the current work on XML Schema is based on combining the best features of a group of competing, proprietary schema technologies. It's taken a while to get this far, and we hope to see the final W3C recommendation before long!

XML Schema http://www.w3.org/XML/Schema.html

--------------------------------------------------------------------------------

VOICEXML

VoiceXML is a nifty new technology that uses XML to enable voice interaction between wireless devices and middle-tier data services. Client-side technologies for wireless devices are a hotbed of debate and competition. One of the front-runners is WAP, which uses WML to define client-side interactivity. WML is an XML-based language that is analogous to HTML for WAP devices. VoiceXML is similar to WML in that it uses XML to define the user interface. It differs in that it uses voice instead of point-and-click. For a great introduction to VoiceXML, check out this article:

An Introduction to VoiceXML http://www.wirelessdevnet.com/training/voicexml/voicexmloverview.html

--------------------------------------------------------------------------------

PARENTHESES IN ELEMENT DEFINITIONS

When building DTDs for anything but simple examples, you quickly find that it can be an exercise in futility to try modeling all of the possible combinations and nesting of elements for "real" business data. You can, however, use parentheses in element definitions to make life a little simpler (or at least make your DTDs easier to follow). Consider this example:

<!ELEMENT data (d1, (d2 | d3), (d4 | (d5,d6,d7))) >

This may look a bit confusing at first, but as you start to dissect it, things get a little clearer. The <data> element must contain a <d1> element, followed by <d2> OR <d3>, followed by <d4> OR the combination of <d5>, <d6>, <d7> (in that order).
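One way to get comfortable with nested content models is to notice that they are really regular expressions over child element names. The sketch below rewrites the model above as a Python regular expression by hand (the translation is mine, not something a parser produces) and tests a few child sequences against it.

```python
import re

# Hand translation of (d1, (d2 | d3), (d4 | (d5,d6,d7))):
# a comma becomes a space between names, and | stays an alternation.
MODEL = re.compile(r"d1 (d2|d3) (d4|d5 d6 d7)")


def matches(children):
    """True if the list of child element names satisfies the model."""
    return MODEL.fullmatch(" ".join(children)) is not None


print(matches(["d1", "d2", "d4"]))
print(matches(["d1", "d3", "d5", "d6", "d7"]))
print(matches(["d1", "d2", "d3", "d4"]))  # d2 AND d3 is not allowed
print(matches(["d1", "d2", "d5"]))        # d5 must be followed by d6, d7
```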

--------------------------------------------------------------------------------

EASY WAY TO FIND A NODE

Sometimes parsing XML is akin to the needle-in-a-haystack problem, especially with large, complex documents. Fortunately, Microsoft's XML parser extends the standard DOM with a simple way to narrow a search to a single node. The method is available on any node, including the DOMDocument object, and is appropriately called selectSingleNode. Here is a sample using VB and MSXML:

Set node = item.selectSingleNode("myNode")
If Not node Is Nothing Then
    '// code here
Else
    '// error stuff here
End If
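If you are not working in VB, most parsers offer something comparable. As a rough analogue (not the MSXML API), Python's standard ElementTree module has a find method that returns the first matching child, or None when there is no match. The element names below are placeholders.

```python
import xml.etree.ElementTree as ET

item = ET.fromstring("<item><myNode>found me</myNode><other/></item>")

node = item.find("myNode")         # first matching child element
missing = item.find("noSuchNode")  # no match: find returns None

# Compare against None explicitly rather than relying on truthiness.
if node is not None:
    print(node.text)
if missing is None:
    print("no such node")
```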

--------------------------------------------------------------------------------

A GREAT WEB SITE

The XML Directory of Resources is a great Web site for hunting down XML stuff and getting the latest XML news. It's laid out similar to Yahoo!, with links to tools, reference materials, tutorials, and happenings in the industry. I check it frequently and recommend you do the same.

XML Directory http://www.xmldir.com/

--------------------------------------------------------------------------------

XML SCHEMA: SIMPLETYPE

Previously, we saw how to use XML Schema to define simple text elements without attributes. Today, we'll start to look at what facilities XML Schema provides for defining named types. The simplest way to declare a named type is to use the simpleType element. Here is an example:

<simpleType name="nameType"> <restriction base="string" /> </simpleType>

<simpleType name="ageType"> <restriction base="positiveInteger" /> </simpleType>

Accordingly, the following code defines elements that use our named types:

<element name="first" type="nameType" /> <element name="age" type="ageType" />

As you can see in our sample code, simpleType contains an attribute that defines the name of the type and contains a restriction element to define the data type. In our next tip, we'll look at how you start building more complex types and elements.

--------------------------------------------------------------------------------

XML SCHEMA TO DISPLACE DTD

If you haven't heard about it before, you soon will. XML Schema has been talked about for quite some time and is now poised to displace DTDs as the preferred way to specify an XML vocabulary. XML Schema has been developed to plug a few holes with DTDs. For instance:

* The DTD syntax is different from XML and therefore requires mastering another notation.
* DTDs don't support data typing; for example, there is no way to specify that an element must contain an integer.
* DTDs do not support inheritance, which would simplify their reuse.
* DTDs do not support namespaces.

The W3C released a candidate recommendation for the XML Schema specification in October 2000, and you can check it out at the site below. We'll also spend some time over the next few days looking at what you can do with XML Schema.

XML Schema http://www.w3.org/XML/Schema.html

--------------------------------------------------------------------------------

XML SCHEMA BY EXAMPLE

I think the best way to learn any new technology is to start with simple examples. With that in mind, let's begin with a look at the most common declaration you will see in a DTD--the creation of elements. Here is a simple example that declares leaf elements using XML Schema:

<element name="first" type="string"/> <element name="age" type="positiveInteger"/>

A DTD equivalent would look like this:

<!ELEMENT first (#PCDATA)> <!ELEMENT age (#PCDATA)>

Both of these examples declare two elements: <first> and <age>. The first thing to note is that XML Schema actually uses XML to specify the elements. Also, note the use of specific data types (string and positiveInteger) for the elements.

--------------------------------------------------------------------------------

XML OVER HTTP

We focus a lot on creating, editing, validating, and displaying XML documents, but one thing you don't hear much about is transporting those documents. Much of the usefulness (and hype) around XML relates to its ability to express business data in a self-describing, platform-independent manner. That is all well and good, but for most enterprise business models you need an efficient way to automatically move your XML to the various applications that process it. Add to that the likelihood of these applications existing on different platforms, on different networks, in different geographic locations, and... well, you might have a problem.

The simplicity of XML lends itself well to transport by any number of existing protocols, but I recommend you consider HTTP. HTTP has found fame in transporting HTML documents between Web servers and browsers, but at its core, HTTP is a simple protocol that doesn't care what kind of data it's transferring from the server. That information can just as readily be XML as HTML. Combine this with the plethora of server-side technologies (ASP, Servlets, etc.) that process HTTP requests, and you've got a great solution for transporting XML.

--------------------------------------------------------------------------------

XML NAMESPACE PREFIX

You may recall that the declaration of a namespace includes a namespace prefix and a namespace name. Consider the example

<root xmlns:pre='http://myserver.com'>

where 'pre' is the prefix and 'http://myserver.com' is the name.

Keep in mind that there are some constraints on the naming of a prefix: It must begin with an underscore or a letter (a..z, A..Z). In addition, prefixes that begin with the sequence of letters x, m, l in any combination of upper or lower case are reserved. The prefix 'xml' is by definition associated with the name

http://www.w3.org/XML/1998/namespace

Also, the prefix 'xmlns' is used for binding namespaces and is also reserved.

--------------------------------------------------------------------------------

XML DATABASE PRODUCTS

I have to be honest: I've found XML's place in the database world a bit confusing. I guess it's kind of cool to store XML in a database column, but that's really not much more than storing text in a BLOB. In the world of middleware messaging and B2B data exchanges, XML's benefits are easily perceived and instantly tangible. But XML in the database...? I dunno. As usual, I found enlightenment from one of our peers. Ronald Bourret has written an article titled "XML Database Products." Check it out here:

XML Database Products http://www.rpbourret.com/xml/XMLDatabaseProds.htm

--------------------------------------------------------------------------------

WEB SITE: XML COVER PAGES

The XML Cover Pages is a great Web site for XML developers. It is sponsored by OASIS, which has been a major player in the XML world from day one. (The name, by the way, comes from Managing Editor Robin Cover.) The site is a one-stop shopper's dream for articles, news, and pointers to XML resources and related technologies. Lots of articles and extensive coverage of all the technologies that surround XML make this worth a regular stop.

XML Cover Pages http://www.oasis-open.org/cover/

--------------------------------------------------------------------------------

NAMESPACES AND ATTRIBUTES

You're probably aware that an attribute cannot be specified twice for a single element. For example, the following is NOT well-formed:

<root x="1" x="2">

This gets a little more interesting when dealing with namespaces. For example, the following XML demonstrates some different combinations of namespaces and attributes that are all illegal:

<root xmlns:ns1="http://myserver.com" xmlns:ns2="http://myserver.com" >

<child ns1:x="1" ns1:x="2" /> -- illegal for the same reason as if you were not using a namespace
<child ns1:x="1" ns2:x="2" /> -- can't do this because ns1 and ns2 are bound to an identical namespace name

</root>

As a point of interest, note that this restriction does not involve the default namespace, because a default namespace does not apply to unprefixed attributes. For example, here a default namespace is declared identical to ns1:

<root xmlns:ns1="http://myserver.com" xmlns="http://myserver.com" >
  <child ns1:x="1" x="2" /> -- this is ok
</root>
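The three cases can be tried against a namespace-aware parser. This sketch uses Python's standard ElementTree module, whose underlying expat parser runs with namespace processing turned on; the helper name parses and the constant names are invented for this example.

```python
import xml.etree.ElementTree as ET


def parses(xml_text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


SAME_PREFIX = (
    '<root xmlns:ns1="http://myserver.com">'
    '<child ns1:x="1" ns1:x="2" /></root>'
)
SAME_URI = (
    '<root xmlns:ns1="http://myserver.com" xmlns:ns2="http://myserver.com">'
    '<child ns1:x="1" ns2:x="2" /></root>'
)
DEFAULT_NS = (
    '<root xmlns:ns1="http://myserver.com" xmlns="http://myserver.com">'
    '<child ns1:x="1" x="2" /></root>'
)

print(parses(SAME_PREFIX), parses(SAME_URI), parses(DEFAULT_NS))

# In the accepted case the two attributes have different expanded names:
# one carries the namespace URI, the unprefixed one carries none.
child = ET.fromstring(DEFAULT_NS)[0]
attribute_names = sorted(child.attrib)
print(attribute_names)
```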

--------------------------------------------------------------------------------

LOOK WHO USES XML

Are you curious about who is using XML, or maybe how they are using it? XML Tree is a great Web site for finding XML content providers, and it even contains a catalog of sites that provide XML content via HTTP requests. One of XML Tree's best features is the ability to test the content provider by executing a request and showing you the XML it returns.

XML Tree http://www.xmltree.com/

--------------------------------------------------------------------------------

EASY WAY TO FIND A NODE

Call me lazy, but I'm always looking for an easier way to get things done. Not too long ago, I did a tip on how you can easily locate a node in a document tree using Visual Basic and Microsoft's XML parser. With thanks to one of our readers, here is another way to do the same thing:

Set nodes = item.getElementsByTagName("description")
If nodes.length > 0 Then
    'Do one thing
Else
    'Do another
End If
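For readers working outside Visual Basic, the same DOM call exists in other bindings. Here is a rough Python equivalent using the standard library's minidom; the sample document and the helper name has_descendant are assumptions made for the sketch. Note that getElementsByTagName searches all descendants of the node, not just its direct children.

```python
from xml.dom import minidom

# Hypothetical sample data standing in for the "item" node of the tip.
doc = minidom.parseString(
    "<item><title>Tip</title><description>Find a node</description></item>"
)
item = doc.documentElement


def has_descendant(node, tag_name):
    """Return True if any descendant element of node is named tag_name."""
    nodes = node.getElementsByTagName(tag_name)
    return nodes.length > 0


if has_descendant(item, "description"):
    print("description found")      # do one thing
else:
    print("description missing")    # do another
```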

--------------------------------------------------------------------------------

XML SCHEMA: SPECIFYING TYPES AND CONSTRAINTS

One of XML Schema's greatest benefits over DTDs is the ability to specify the data types and constraints of allowable content for a given tag. With DTDs, pretty much everything is a string and it's left to the application to validate that an integer field really does contain an integer. With XML Schema, however, you can build into the document structure sanity checks that will occur when the document is run through a validating parser. Most of this is done using the <restriction> tag.

The following example defines a named type that must contain an integer with a value between 10 and 20, inclusive. Note that this was impossible to do in a DTD!

<simpleType name="myInteger">
  <restriction base="integer">
    <minInclusive value="10" />
    <maxInclusive value="20" />
  </restriction>
</simpleType>
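To make the check concrete, here is a small Python sketch of what a validating parser effectively does with the myInteger type: it reads the two facets and applies them to candidate content. This is a hand-rolled illustration using only the standard library, not a schema validator; the function names load_bounds and is_valid_my_integer are invented for the example.

```python
import re
import xml.etree.ElementTree as ET

SCHEMA_FRAGMENT = """
<simpleType name="myInteger">
  <restriction base="integer">
    <minInclusive value="10" />
    <maxInclusive value="20" />
  </restriction>
</simpleType>
"""

# Optional sign followed by ASCII digits, as an integer is written in XML.
INTEGER_PATTERN = re.compile(r"[+-]?[0-9]+")


def load_bounds(fragment):
    """Read the minInclusive and maxInclusive facets from the fragment."""
    restriction = ET.fromstring(fragment).find("restriction")
    low = int(restriction.find("minInclusive").get("value"))
    high = int(restriction.find("maxInclusive").get("value"))
    return low, high


def is_valid_my_integer(text, bounds):
    """Check that text is an integer lying within the inclusive bounds."""
    text = text.strip()
    if not INTEGER_PATTERN.fullmatch(text):
        return False
    low, high = bounds
    return low <= int(text) <= high


bounds = load_bounds(SCHEMA_FRAGMENT)
for candidate in ["10", "15", "20", "9", "21", "abc", "15.0"]:
    print(candidate, "->", is_valid_my_integer(candidate, bounds))
```

With a DTD, every one of these candidates would be accepted as character data and the range check would fall to the application.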

This is just the tip of the iceberg. XML Schema has an even more extensive ability to constrain the content of elements, as we'll see in upcoming tips.

--------------------------------------------------------------------------------

XML SCHEMA: PUTTING IT ALL TOGETHER

Today's tip will bring together some sample code we've been building over the past few tips. For simplicity, we've been discussing pieces of XML Schema without showing how they fit into a complete XML Schema document. Now, let's look at the following code, which is a combination of samples we've used in recent tips:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2000/08/XMLSchema">

  <!-- declare elements -->
  <xsd:element name="person" type="personType" />

  <!-- declare attributes -->
  <xsd:attribute name="country" type="xsd:string" />

  <!-- declare complex types -->
  <xsd:complexType name="nameType">
    <xsd:sequence>
      <xsd:element name="first" type="xsd:string" />
      <xsd:element name="middle" type="xsd:string" />
      <xsd:element name="last" type="xsd:string" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="addressType">
    <xsd:sequence>
      <xsd:element name="city" type="xsd:string" />
      <xsd:element name="state" type="xsd:string" />
    </xsd:sequence>
    <xsd:attribute ref="country" />
  </xsd:complexType>

  <xsd:complexType name="personType">
    <xsd:sequence>
      <xsd:element name="name" type="nameType" />
      <xsd:element name="address" type="addressType" />
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>

Since XML Schemas are themselves XML documents, note the inclusion of the XML declaration. Also, it is typical to use a namespace for elements belonging to the XML Schema vocabulary. By convention, the namespace prefix 'xsd' is used, but it is not required.
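The point about the prefix can be seen by parsing the same schema skeleton under two different prefixes. The sketch below uses Python's ElementTree, which reports each tag as {namespace name}local-name; the TEMPLATE string is a cut-down stand-in for the schema above, and the prefix 'foo' is an arbitrary choice for the comparison.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the schema above, with the prefix left as a placeholder.
TEMPLATE = (
    '<{p}:schema xmlns:{p}="http://www.w3.org/2000/08/XMLSchema">'
    '<{p}:element name="person" type="personType"/>'
    "</{p}:schema>"
)


def expanded_tags(prefix):
    """Parse the template with the given prefix and list expanded tag names."""
    root = ET.fromstring(TEMPLATE.format(p=prefix))
    return [node.tag for node in root.iter()]


with_xsd = expanded_tags("xsd")
with_foo = expanded_tags("foo")

print(with_xsd)
print(with_xsd == with_foo)
```

A namespace-aware processor sees the same element names either way; only the namespace name the prefix is bound to matters.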

--------------------------------------------------------------------------------

XML SCHEMA: INDICATING QUANTITY

XML Schema provides more flexibility than DTDs when defining the cardinality of elements. Recall that the DTD syntax lets you indicate only four cases: exactly one (no suffix), zero or more (*), optional (?), or one or more (+). For example, take a look at this code:

<!ELEMENT order (customer, item+, mail_to?, bill_to)>

Here, we've defined an <order> element that must contain one <customer> tag, one or more <item> tags, optionally a <mail_to> tag, and one <bill_to> tag.

XML Schema provides the same functionality using the attributes 'minOccurs' and 'maxOccurs'. Consider the following example, which accomplishes the same as the above DTD but uses XML Schema:

<complexType name="orderType">
  <sequence>
    <element name="customer" type="string" />
    <element name="item" type="string" minOccurs="1" maxOccurs="unbounded" />
    <element name="mail_to" type="string" minOccurs="0" />
    <element name="bill_to" type="string" />
  </sequence>
</complexType>

Note that minOccurs="0" makes the element optional. Also, the default value for both is 1, which means if you don't specify otherwise, one occurrence is required.
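As a rough illustration of how the defaults play out, the following Python sketch reads minOccurs and maxOccurs from the orderType definition (defaulting each to 1) and counts the children of a sample order. It checks counts only, not the order imposed by the sequence tag, and it is not a schema validator; the function names and the sample orders are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

ORDER_TYPE = """
<complexType name="orderType">
  <sequence>
    <element name="customer" type="string" />
    <element name="item" type="string" minOccurs="1" maxOccurs="unbounded" />
    <element name="mail_to" type="string" minOccurs="0" />
    <element name="bill_to" type="string" />
  </sequence>
</complexType>
"""


def occurrence_bounds(complex_type_xml):
    """Map each element name to (min, max); max is None for 'unbounded'."""
    bounds = {}
    for el in ET.fromstring(complex_type_xml).iter("element"):
        low = int(el.get("minOccurs", "1"))        # default is 1
        high_text = el.get("maxOccurs", "1")       # default is 1
        high = None if high_text == "unbounded" else int(high_text)
        bounds[el.get("name")] = (low, high)
    return bounds


def counts_ok(order_xml, bounds):
    """Check only how often each child occurs, ignoring child order."""
    counts = Counter(child.tag for child in ET.fromstring(order_xml))
    if any(tag not in bounds for tag in counts):
        return False
    for name, (low, high) in bounds.items():
        seen = counts[name]
        if seen < low or (high is not None and seen > high):
            return False
    return True


bounds = occurrence_bounds(ORDER_TYPE)
good = "<order><customer>c</customer><item>a</item><item>b</item><bill_to>x</bill_to></order>"
no_item = "<order><customer>c</customer><bill_to>x</bill_to></order>"
print(bounds)
print(counts_ok(good, bounds), counts_ok(no_item, bounds))
```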

--------------------------------------------------------------------------------

XML SCHEMA: COMPLEXTYPE

Continuing with our introduction to XML Schema, let's look at how you define elements that contain sub-elements. XML Schema has two basic ways of defining types: the simpleType tag and the complexType tag. We've seen how the simpleType tag is used to declare leaf elements, so let's look at how to use the complexType tag to begin adding structure. Consider the following example:

<complexType name="nameType">
  <sequence>
    <element name="first" type="string" />
    <element name="middle" type="string" />
    <element name="last" type="string" />
  </sequence>
</complexType>

<complexType name="addressType">
  <sequence>
    <element name="city" type="string" />
    <element name="state" type="string" />
  </sequence>
</complexType>

<complexType name="personType">
  <sequence>
    <element name="name" type="nameType" />
    <element name="address" type="addressType" />
  </sequence>
</complexType>

<element name="person" type="personType" />

This sample defines three types: nameType, addressType, and personType. It also defines one element--person--that is of type personType. You should note that defining a named type does not define an element in the schema. In other words, the above schema doesn't provide for tags like <addressType> in the XML. Named types are a convenient feature that makes the schema more readable and allows for reuse of data types.

Here, then, is some XML that is valid according to our schema:

<person>
  <name>
    <first>Harry</first>
    <middle>T</middle>
    <last>Potter</last>
  </name>
  <address>
    <city>Los Angeles</city>
    <state>California</state>
  </address>
</person>

--------------------------------------------------------------------------------

XML SCHEMA: CHOICE AND ALL

Previously, we've seen how to use the sequence tag within the complexType tag to indicate a required sequence of sub-elements. In addition to the sequence tag, you can use the choice and all tags to further refine your options. Here is an example of each:

<choice>
  <element name="a" type="string" />
  <element name="b" type="string" />
</choice>

<all>
  <element name="a" type="string" />
  <element name="b" type="string" />
</all>

The choice tag allows exactly one of its children to appear in an instance, whereas the all tag lets its child elements appear in any order, each at most once. With the default minOccurs of 1, as here, each child must appear exactly once.
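The difference between the two groups can be sketched in a few lines of Python. The functions below mimic the rules for the a/b example only, assuming the default minOccurs and maxOccurs of 1 on every child; the names child_tags, matches_choice, and matches_all are invented for the illustration, and the wrapper element x is arbitrary.

```python
import xml.etree.ElementTree as ET

NAMES = ["a", "b"]  # the children declared inside <choice> and <all> above


def child_tags(xml_text):
    """Return the tag names of the direct children, in document order."""
    return [child.tag for child in ET.fromstring(xml_text)]


def matches_choice(tags, names):
    """choice: exactly one child, and it must be one of the declared names."""
    return len(tags) == 1 and tags[0] in names


def matches_all(tags, names):
    """all: every declared name exactly once, in any order."""
    return sorted(tags) == sorted(names)


only_a = child_tags("<x><a>1</a></x>")
b_then_a = child_tags("<x><b>2</b><a>1</a></x>")

print(matches_choice(only_a, NAMES), matches_choice(b_then_a, NAMES))
print(matches_all(only_a, NAMES), matches_all(b_then_a, NAMES))
```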

--------------------------------------------------------------------------------

XML SCHEMA: ADDING ATTRIBUTES

Today's tip demonstrates how to add attributes to elements using XML Schema. Consider the following example:

<complexType name="addressType">
  <sequence>
    <element name="city" type="string" />
    <element name="state" type="string" />
  </sequence>
  <attribute name="country" type="string" />
</complexType>

<element name="address" type="addressType" />

Here, we've added a 'country' attribute to 'addressType'. As a result, the XML would look like this:

<address country="US">
  <city>Los Angeles</city>
  <state>California</state>
</address>

Attributes can also be declared, and referenced, outside the context of a particular complexType. This is quite useful if you need to reuse an attribute in different elements. Here is an example:

<attribute name="country" type="string" />

<complexType name="addressType">
  <sequence>
    <element name="city" type="string" />
    <element name="state" type="string" />
  </sequence>
  <attribute ref="country" />
</complexType>

Here, the 'ref' attribute is used to reference the previously declared attribute.
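To show what resolving a 'ref' amounts to, here is a small Python sketch that looks up the attributes of a named type and follows any 'ref' back to the top-level declaration. The enclosing schema tag in the sample string is added only so the fragment parses as one document, and the function name resolve_attributes is an assumption of this example rather than part of any schema API.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <attribute name="country" type="string" />

  <complexType name="addressType">
    <sequence>
      <element name="city" type="string" />
      <element name="state" type="string" />
    </sequence>
    <attribute ref="country" />
  </complexType>
</schema>
"""


def resolve_attributes(schema_xml, type_name):
    """Return {attribute name: type} for a named complexType, following refs."""
    root = ET.fromstring(schema_xml)
    # Top-level attribute declarations, indexed by name.
    global_attrs = {a.get("name"): a for a in root.findall("attribute")}
    ctype = next(
        t for t in root.findall("complexType") if t.get("name") == type_name
    )
    resolved = {}
    for attr in ctype.findall("attribute"):
        ref = attr.get("ref")
        decl = global_attrs[ref] if ref else attr
        resolved[decl.get("name")] = decl.get("type")
    return resolved


address_attrs = resolve_attributes(SCHEMA, "addressType")
print(address_attrs)
```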

--------------------------------------------------------------------------------

XML SCHEMA

If you've been reading recent tips, I've likely convinced you that XML Schema offers a much richer set of features for defining XML vocabularies than what could be done with DTDs. Schemas are simple and flexible, and they offer critical features such as data typing and inheritance--both of which DTDs lack. I believe one of the biggest improvements over DTDs is the ability to describe an XML vocabulary using XML. This opens the door for using existing tools for validating and processing XML. In the coming weeks, I will continue to include the occasional XML Schema tip, but those of you anxious to get started should check out the Web sites listed below:

W3C XML Schema Specifications http://www.w3.org/XML/Schema.html

XMLschema.com http://apps.xmlschema.com/

IBM alphaWorks http://alphaworks.ibm.com/

--------------------------------------------------------------------------------

XML AND DELPHI

One of the best features of XML is its total independence from any particular platform or development tool. I've seen XML applications written in many languages, including Java, Visual Basic, and C++. I've also had the chance to work with Borland's development tool, Delphi. Although Delphi hasn't garnered the popularity of Java and Visual Basic, I have found that those of us who love it REALLY love it! If you're one such zealot and want to find some resources for parsing XML using Delphi, check out these Web sites:

Open XML http://www.philo.de/xml/

Basic XML Parsing in Delphi http://homepages.borland.com/ccalvert/TechPapers/Delphi/XMLSimple/XMLSimple.html

--------------------------------------------------------------------------------

NEW XML FORUM

Sun has started a new forum for XML users. I frequently pop in on it to get answers to my own XML questions. Call me old-fashioned, but I've always found newsgroups to be a valuable source of information--and a great way to get your finger on the pulse of a technology.

Dot-Com Builder XML Forum http://dcbforum.sun.com/list/discuss.dcb.xml

--------------------------------------------------------------------------------

ORACLE XML TOOLS

The Oracle XML Developer's Kit (XDK) contains a set of tools that cover the complete life cycle of an XML document--including creating, editing, transforming, and parsing. I've been burnt a few times playing with freeware XML tools and have found the Oracle toolkit to be a welcome relief as a full-featured, commercial-quality toolset.

Oracle XML Developer's Kit (XDK) http://technet.oracle.com/tech/xml/

--------------------------------------------------------------------------------

XML DECLARATION API

I'm sure everyone knows what an XML declaration is. If you don't, here is an example to refresh your memory:

<?xml version="1.0" ?>

I've frequently referred to this as an XML Processing Instruction. Thanks to one of our readers, however, I've now been set straight. As you may recall, a Processing Instruction (PI) is the XML specification's concession that things are not always neat and tidy in the real world. It gives document authors an opportunity to issue instructions to a particular application that may process the document. The PI takes the form

<?PITarget 'stuff'?>

where PITarget is a name that identifies the application, and 'stuff' can be pretty much anything and is required to be passed through to the application. While the XML declaration uses the same notation, it is explicitly called out in the specification as reserved for standardization and is not a Processing Instruction.
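Parsers reflect this distinction. In the Python sketch below, which uses the standard library's minidom, a real Processing Instruction shows up as a node in the document tree, while the XML declaration does not; the parser consumes it and exposes the version as a property of the document. The xml-stylesheet instruction and the file name style.xsl are sample data chosen for the illustration.

```python
from xml.dom import minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet type="text/xsl" href="style.xsl"?>'
    "<root/>"
)

# Collect the targets of every processing-instruction node at document level.
pi_targets = [
    node.target
    for node in doc.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]

print(pi_targets)    # the declaration is not among the PI nodes
print(doc.version)   # the declaration's version is kept as document metadata
```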


COPYRIGHT 1998 - 2009 All names used are Trademarks of the respective companies

Copyright © 2009 HALO Computer Technology
Last modified: 03/11/09