Re: Identifying (X)HTML without MIME

From: Asbjørn Ulsberg <asbjorn@tigerstaden.no>
Date: Mon, 08 Nov 2004 22:42:58 +0100
To: trejkaz@xaoza.net, "James Cerra" <jfcst24_public@yahoo.com>
Cc: www-html@w3.org
Message-ID: <opsg509wfxuvpchu@quark>

On Tue, 9 Nov 2004 08:05:01 +1100, Trejkaz Xaoza <trejkaz@xaoza.net> wrote:

> If it doesn't start with "<?xml" but has a DOCTYPE near the top, then  
> it's SGML, and you perform similar rules based on what you see after it.

As far as I see it, an XHTML document can start like this:

   1. <?xml ...?>
   2. <!DOCTYPE ...>
   3. <html xmlns="http://...">

Not all of these are valid prologs for an XHTML document, but some are
valid for XML documents in general. The XML declaration is nonetheless
optional, so any valid XHTML document may start with just a DOCTYPE. HTML
documents may start that way too, so you actually have to parse the
DOCTYPE to know which type of (X)HTML document it is.
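
Parsing the DOCTYPE can be as simple as pulling out its public identifier
and checking which family it belongs to. The following is a minimal Python
sketch, not a real SGML or XML parser; the name doctype_family and the
prefix checks are assumptions made for illustration, and only
double-quoted W3C-style public identifiers are recognised.

```python
import re

# Captures the public identifier of a DOCTYPE such as
#   <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" ...>
# Only double-quoted identifiers are handled in this sketch.
DOCTYPE_PUBLIC = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"([^"]*)"',
    re.IGNORECASE,
)


def doctype_family(text):
    """Return 'xhtml', 'html', or None, judged by the public identifier.

    Hypothetical helper: it looks only at the identifier's prefix and
    ignores the system identifier entirely.
    """
    match = DOCTYPE_PUBLIC.search(text)
    if match is None:
        return None
    public_id = match.group(1)
    if public_id.startswith("-//W3C//DTD XHTML"):
        return "xhtml"
    if public_id.startswith("-//W3C//DTD HTML"):
        return "html"
    return None
```

A document without a DOCTYPE, or with one this sketch does not recognise,
yields None, so the caller still needs some other rule for those cases.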

What I'd do is the following:

   1.   Trigger XML parsing mode if:
   1.1. The document starts with <?xml ...?>
   1.2. The document element is <html> with an attribute called 'xmlns'
        whose value is 'http://www.w3.org/1999/xhtml'.

   2.   Trigger SGML parsing mode if:
   2.1. The document starts with a DOCTYPE that says it's HTML:

        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" ...>
        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ...>
        <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" ...>

        You may of course cater for HTML versions other than 4.01; the
        approach is the same: add their DOCTYPEs to your checker.

   2.2. The document element is <html> with no 'xmlns' attribute.

I could have added the point «1.3. The document starts with a DOCTYPE that
says it's XHTML», but that isn't necessary, as all XHTML documents must
have the <html> element in the XHTML namespace.

I would also do the checks in this order, so that you fall back to SGML if
any XHTML checks fail. Falling back to XML from SGML would give a much
higher failure rate, I think.
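
The checks above, applied in that order, could be sketched roughly as
follows in Python. This is an illustration under stated assumptions rather
than a robust sniffer: it uses regular expressions instead of a real
parser, the name parsing_mode and the returned (mode, rule) pairs are made
up for this example, and leading whitespace before the XML declaration or
DOCTYPE is tolerated even though strict XML would not allow it.

```python
import re

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Rule 1.1: an XML declaration at the start of the document.
XML_DECL = re.compile(r"\s*<\?xml\s")
# Rule 2.1: a DOCTYPE whose public identifier names an HTML DTD.
HTML_DOCTYPE = re.compile(
    r'\s*<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD HTML [^"]*"',
    re.IGNORECASE,
)
# Start tag of the <html> element, with its raw attribute text (if any).
HTML_START_TAG = re.compile(r"<html(\s[^>]*)?>", re.IGNORECASE)
# A plain 'xmlns' attribute (not a prefixed 'xmlns:foo') and its value.
XMLNS_ATTR = re.compile(r"""\sxmlns\s*=\s*(["'])(.*?)\1""")


def parsing_mode(text):
    """Return (mode, rule): mode is 'xml' or 'sgml', rule names the check
    that fired ('1.1', '1.2', '2.1', '2.2') or 'fallback'."""
    if XML_DECL.match(text):
        return ("xml", "1.1")

    root = HTML_START_TAG.search(text)
    attrs = ""
    if root is not None and root.group(1):
        attrs = root.group(1)
    ns = XMLNS_ATTR.search(attrs)
    if ns is not None and ns.group(2) == XHTML_NS:
        return ("xml", "1.2")

    if HTML_DOCTYPE.match(text):
        return ("sgml", "2.1")
    if root is not None and ns is None:
        return ("sgml", "2.2")

    # None of the rules matched; fall back to SGML rather than XML.
    return ("sgml", "fallback")
```

For example, a document beginning with an HTML 4.01 DOCTYPE followed by a
bare <html> tag is reported as SGML by rule 2.1, while the same document
with xmlns="http://www.w3.org/1999/xhtml" on <html> is caught earlier by
rule 1.2 and treated as XML.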

-- 
Asbjørn Ulsberg         -=|=-        asbjornu@hotmail.com
«He's a loathsome offensive brute, yet I can't look away»
Received on Monday, 8 November 2004 21:41:45 GMT