Re: Problem in publishing multilingual HTML document on web in UTF-8 encoding
From: आशीष शुक्ला <wahjava@gmail.com>
Date: Sat, 3 Jun 2006 11:58:44 +0530
Message-ID: <d9a03f10606022328r19ed92edsd9b4bd3ad96d586a@mail.gmail.com>
To: www-html@w3.org

On 6/3/06, Philip TAYLOR <P.Taylor@rhul.ac.uk> wrote:
>
>
> आशीष शुक्ला "Wah Java !!" wrote:
>
> > If UA (user agent), finds a "Content-Type" in <meta> tag in HTML document,
> > it should use that to identify the document's character encoding,
> > because it is a part of the document. The server's reply should only
> > be considered when document doesn't explicitly states its character
> > encoding.
>
> Much as I think your argument has merit, I cannot see how you
> can resolve the following paradox : suppose, in some as-yet
> unknown encoding (say, ISO-9999-9), the character positions
> which in ISO-8859-1 correspond to the letters "M", "E", "T"
> and "A" correspond instead to the letters "B", "O", "D" and "Y".
> Now the server says that the document is in ISO-8859-1,
> so when the UA sees
>
> <META http-equiv="content-type" content="text/html; charset=iso-9999-9">
>
> it interprets the META directive as you would wish. But in so
> doing, it starts to parse the document on the basis of it being
> expressed in ISO-9999-9, whereupon it discovers that there wasn't
> a META directive at all, there was, rather, a(n ill-formed) BODY
> tag. But because it now knows there /was/ no META directive, it
> parses using ISO-8859-1. But that means there IS a META
> directive. And so on. I'm sure you see the problem ...
>
> Philip Taylor
>

-- begin excerpt --
To address server or configuration limitations, HTML documents may include explicit information about the document's character encoding; the META element can be used to provide user agents with this information.

For example, to specify that the character encoding of the current document is "EUC-JP", a document should include the following META declaration:

<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">

The META declaration must only be used when the character encoding is organized such that ASCII-valued bytes stand for ASCII characters (at least until the META element is parsed). META declarations should appear as early as possible in the HEAD element
-- end excerpt --

Copied from: http://www.w3.org/TR/html4/charset.html#h-5.2.2
An excerpt from the HTML 4.01 specification.

So, in other words, you have to organize your content so that everything up to the <META> tag is ASCII. I think this is what the excerpt means.

Thanks
Ashish Shukla
-- 
Ashish Shukla "Wah Java !!" आशीष शुक्ला

  ,= ,-_-. =.
 ((_/)o o(\_))
  `-'(. .)`-'
      \_/

My blah, blah, blah at http://wahjava.blogspot.com/
My webpages at http://www.geocities.com/wah_java_dotnet/
My GPG Fingerprint: BBA9 AD7D BA71 61EB BE46 8CF5 E44A C663 A03F 4261
My GPG keys at http://keyserv.nic-se.se:11371/pks/lookup?op=get&search=0xA03F4261
--
Supercomputers are for people too rich and too stupid to design efficient algorithms
  -- Steven Skiena, Department of Computer Science, SUNY Stony Brook.

Received on Saturday, 3 June 2006 06:28:55 GMT
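
To illustrate the rule quoted in the excerpt, here is a minimal Python sketch of how a user agent might look for the META declaration by reading the first bytes of a document as ASCII only. The function name sniff_meta_charset, the 1024-byte limit, and the regular expression are assumptions made for this illustration; they are not taken from the HTML 4.01 specification or from any browser.

```python
import re

# Hypothetical pattern (an assumption for this sketch): find "charset=..."
# inside a META tag while treating the input purely as ASCII-valued bytes.
META_CHARSET = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([a-z0-9._:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(raw: bytes, limit: int = 1024):
    """Return the lowercased charset named in a META declaration, or None.

    Only the first `limit` bytes are examined, and they are never decoded
    with the declared encoding, so the scan cannot change its own result.
    """
    match = META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


if __name__ == "__main__":
    page = (
        '<HTML><HEAD>'
        '<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">'
        '<TITLE>\u65e5\u672c\u8a9e</TITLE></HEAD></HTML>'
    )
    # EUC-JP keeps ASCII characters as single ASCII-valued bytes,
    # so the declaration is visible to the ASCII-only scan.
    print(sniff_meta_charset(page.encode("euc_jp")))
    # UTF-16 puts a zero byte next to every ASCII character,
    # so the same declaration is not found.
    print(sniff_meta_charset(page.encode("utf-16-le")))
```

Because the scan never switches to the declared encoding while it is still looking for the declaration, the loop Philip describes does not arise for ASCII-compatible encodings. An encoding like his hypothetical ISO-9999-9, where ASCII-valued bytes stand for other letters, is exactly the case in which the excerpt says the META declaration must not be used.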