Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding

From: आशीष शुक्ला <wahjava@gmail.com>
Date: Sat, 3 Jun 2006 11:58:44 +0530
Message-ID: <d9a03f10606022328r19ed92edsd9b4bd3ad96d586a@mail.gmail.com>
To: www-html@w3.org

On 6/3/06, Philip TAYLOR <P.Taylor@rhul.ac.uk> wrote:
>
> आशीष शुक्ला "Wah Java !!" wrote:
>
> > If a UA (user agent) finds a "Content-Type" in a <meta> tag in the HTML
> > document, it should use that to identify the document's character encoding,
> > because it is a part of the document. The server's reply should only
> > be considered when the document doesn't explicitly state its character
> > encoding.
>
> Much as I think your argument has merit, I cannot see how you
> can resolve the following paradox : suppose, in some as-yet
> unknown encoding (say, ISO-9999-9), the character positions
> which in ISO-8859-1 correspond to the letters "M", "E", "T"
> and "A" correspond instead to the letters "B", "O", "D" and "Y".
> Now the server says that the document is in ISO-8859-1,
> so when the UA sees
>
>         <META http-equiv="content-type" content="text/html; charset=iso-9999-9">
>
> it interprets the META directive as you would wish.  But in so
> doing, it starts to parse the document on the basis of it being
> expressed in ISO-9999-9, whereupon it discovers that there wasn't
> a META directive at all, there was, rather, a(n ill-formed) BODY
> tag. But because it now knows there /was/ no META directive, it
> parses using ISO-8859-1.  But that means there IS a META
> directive.  And so on.  I'm sure you see the problem ...
>
> Philip Taylor
>
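
The paradox quoted above can be reproduced with a toy model. This is only a sketch in Python: ISO-9999-9 is the made-up encoding from Philip's message, and the SWAP table and decode_iso_9999_9 function are hypothetical names assumed here for illustration, not a real codec.

```python
# Toy model of the hypothetical "ISO-9999-9" from the quoted message.
# Assumption: it is identical to ISO-8859-1 except that the byte values for
# "M", "E", "T", "A" decode as "B", "O", "D", "Y".
SWAP = str.maketrans("META", "BODY")


def decode_iso_9999_9(raw: bytes) -> str:
    """Decode bytes under the made-up encoding (not a real codec)."""
    return raw.decode("iso-8859-1").translate(SWAP)


raw = b'<META http-equiv="content-type" content="text/html; charset=iso-9999-9">'

as_latin1 = raw.decode("iso-8859-1")
as_hypothetical = decode_iso_9999_9(raw)

print(as_latin1[:5])        # <META
print(as_hypothetical[:5])  # <BODY
```

The same bytes read as a META tag under one encoding and as a BODY tag under the other, which is exactly the loop described in the quoted message.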
-- begin excerpt --
To address server or configuration limitations, HTML documents may
include explicit information about the document's character encoding;
the META element can be used to provide user agents with this
information.
For example, to specify that the character encoding of the current
document is "EUC-JP", a document should include the following META
declaration:
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

The META declaration must only be used when the character encoding is
organized such that ASCII-valued bytes stand for ASCII characters (at
least until the META element is parsed).  META declarations should
appear as early as possible in the HEAD element.
-- end excerpt --

Copied from: http://www.w3.org/TR/html4/charset.html#h-5.2.2 

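This is an excerpt from the HTML 4.01 specification. In other words, you
have to organize your document so that everything up to the <META> tag is
ASCII. The UA can then find the META declaration before it knows the real
encoding, so the paradox does not arise. I think this is what the excerpt
means.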
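
A minimal sketch of how a UA could act on that rule, in Python. It is not how any particular browser works: the function names, the regular expression, the 1024-byte window and the ISO-8859-1 fallback are all assumptions made for illustration. The idea is to look for the META declaration in the raw bytes, treating them as ASCII, and only then decode the whole document.

```python
import re

# Hypothetical pattern: a <meta ...> tag containing charset=NAME, matched on
# raw bytes so that no encoding has to be chosen before the search.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)


def sniff_meta_charset(raw: bytes, window: int = 1024):
    """Return the charset named in an early META declaration, or None."""
    match = META_CHARSET.search(raw[:window])
    return match.group(1).decode("ascii").lower() if match else None


def decode_html(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode with the META charset if one is found, else with the fallback."""
    return raw.decode(sniff_meta_charset(raw) or fallback)


html = (
    "<html><head>"
    '<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
    "<title>日本語</title></head></html>"
)
page = html.encode("euc_jp")
utf16_page = html.encode("utf-16-le")  # not ASCII-compatible

print(sniff_meta_charset(page))        # euc-jp
print("日本語" in decode_html(page))    # True
print(sniff_meta_charset(utf16_page))  # None
```

The last line shows why the specification adds its restriction: in UTF-16 the ASCII-valued bytes do not stand for ASCII characters, so the byte search never finds the declaration.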

Thanks
Ashish Shukla
-- 
Ashish Shukla "Wah Java !!"
आशीष शुक्ला

  ,= ,-_-. =.
 ((_/)o o(\_))
  `-'(. .)`-'
      \_/

My blah, blah, blah at http://wahjava.blogspot.com/ 
My webpages at http://www.geocities.com/wah_java_dotnet/ 

My GPG Fingerprint: BBA9 AD7D BA71 61EB BE46 8CF5 E44A C663 A03F 4261

My GPG keys at
http://keyserv.nic-se.se:11371/pks/lookup?op=get&search=0xA03F4261 
--
Supercomputers are for people too rich and too stupid to design
efficient algorithms -- Steven Skiena, Department of Computer Science,
SUNY Stony Brook.
Received on Saturday, 3 June 2006 06:28:55 GMT