RE: Problem with LANG keyword

From: Reuven Nisser <rnisser@ofek-liyladenu.org.il>
Date: Tue, 23 Sep 2003 23:53:06 +0300
To: "BIGELOW,JIM (HP-Boise,ex1)" <jim.bigelow@hp.com>
Cc: <www-html@w3.org>, "'shaula haitner'" <shaula@shaula.co.il>, "'Yuval Rabinovich'" <yuval@faz.co.il>, "'Gertel Hasson'" <gilagh@netvision.net.il>
Message-ID: <EOEHIKCGOKGNIEEKJHEKIEJDDGAA.rnisser@ofek-liyladenu.org.il>

Hello,
It does not matter whether I use Unicode or your way of encoding. Consider
the following markup:

<body lang="en,he,ar" dir="ltr">
<p>The following are two letters in Hebrew,
&#x05D0; &#x05D1;
while these are three Arabic letters,
&#x0644; &#x0647; &#x062C;.
The letter forms both evolved from the ancient
Aramaic alphabet.
</p>
</body>

You can still "know" automatically which part is Arabic, which is Hebrew and
which is English. So, marking the whole text as English, Hebrew and Arabic
is enough.
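
As a rough illustration of what "knowing automatically" could look like, here is a minimal Python sketch that groups letters by the script named in their Unicode character names. The helper names script_of and script_runs are made up for this example, and the check only tells Hebrew, Arabic and Latin script apart. It identifies the script of each character, which is not necessarily the same thing as the language of the text.

```python
import unicodedata

def script_of(ch):
    """Return a coarse script label for one character, based on its Unicode name."""
    if not ch.isalpha():
        return None  # spaces, digits and punctuation carry no script here
    name = unicodedata.name(ch, "")
    for script in ("HEBREW", "ARABIC", "LATIN"):
        if name.startswith(script):
            return script
    return "OTHER"

def script_runs(text):
    """Group consecutive letters of the same script into (script, letters) runs."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if script is None:
            continue  # non-letters are skipped and do not break a run
        if runs and runs[-1][0] == script:
            runs[-1][1].append(ch)
        else:
            runs.append((script, [ch]))
    return [(script, "".join(chars)) for script, chars in runs]

# Hypothetical sample text using the same letters as the markup above.
sample = "Hebrew: \u05D0 \u05D1, Arabic: \u0644 \u0647 \u062C."
print(script_runs(sample))
```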

Now, using 8-bit mode:

<body lang="en,he" dir="ltr">
<p>The following are two letters in Hebrew,
&#224; &#225;
</p>
</body>

Or using a text created with Notepad on Hebrew Windows:

<body lang="en,he" dir="ltr">
<p>The following are two letters in Hebrew,
א ב
</p>
</body>

The same holds here: you know automatically which part is Hebrew and which
is English.
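
For the 8-bit case, the following small Python sketch shows how the two byte values 224 and 225 from the example above map to characters. It assumes the text was saved in the Windows Hebrew code page (cp1255), and contrasts that with reading the same bytes as ISO-8859-1. Which letters come out depends on the code page the reader assumes.

```python
# The two byte values used in the 8-bit example (224 and 225).
raw = bytes([0xE0, 0xE1])

as_hebrew = raw.decode("cp1255")   # assumed: Windows Hebrew code page
as_latin = raw.decode("latin-1")   # the same bytes read as ISO-8859-1

print([f"U+{ord(c):04X}" for c in as_hebrew])  # ['U+05D0', 'U+05D1']
print([f"U+{ord(c):04X}" for c in as_latin])   # ['U+00E0', 'U+00E1']
```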

Regards,
Reuven Nisser
Ofek Liyladenu

-----Original Message-----
From: BIGELOW,JIM (HP-Boise,ex1) [mailto:jim.bigelow@hp.com]
Sent: Tuesday, September 23, 2003 8:21 PM
To: Reuven Nisser
Subject: RE: Problem with LANG keyword



Reuven Nisser wrote:
> ...
> This is especially true when using Unicode. There one can mix
> Hebrew, Arabic and English in the same text without any conflict.
> ...

The report <cite>Unicode in XML and other Markup Languages</cite> [1]
discusses the many situations where markup is preferred over Unicode
characters for encoding information about structure and presentation.  See
Section 3.9 Language Tag Characters [2].

Therefore, I think that use of the language attribute in elements that
enclose spans of text from a given language is preferred over discovering
the language based on the Unicode character.  For example:

<body lang="en" dir="ltr">
<p>The following are two letters in Hebrew,
<q lang="he" dir="rtl">&#x05D0; &#x05D1;</q>
while these are three Arabic letters,
<q lang="ar" dir="rtl">&#x0644; &#x0647; &#x062C;</q>.
The letter forms both evolved from the ancient
Aramaic alphabet.
</p>
</body>
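
To illustrate how a consumer might read the language from markup like this, here is a minimal Python sketch using the standard html.parser module. The class name LangSpans and the shortened sample text are made up for this example, and it only models simple inheritance of the lang attribute from enclosing elements.

```python
from html.parser import HTMLParser

class LangSpans(HTMLParser):
    """Collect (lang, text) pairs, inheriting lang from enclosing elements."""

    def __init__(self):
        super().__init__()
        self.stack = []   # (tag, effective lang) for each open element
        self.spans = []   # (lang, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        inherited = self.stack[-1][1] if self.stack else None
        self.stack.append((tag, dict(attrs).get("lang", inherited)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element, if there is one.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())  # normalise whitespace
        if text:
            lang = self.stack[-1][1] if self.stack else None
            self.spans.append((lang, text))

# Shortened, hypothetical version of the markup above.
markup = (
    '<body lang="en" dir="ltr"><p>Two Hebrew letters, '
    '<q lang="he" dir="rtl">&#x05D0; &#x05D1;</q> '
    'and three Arabic letters, '
    '<q lang="ar" dir="rtl">&#x0644; &#x0647; &#x062C;</q>.</p></body>'
)

parser = LangSpans()
parser.feed(markup)
parser.close()
for lang, text in parser.spans:
    print(lang, repr(text))
```

Each text run is reported with the language of its nearest enclosing element that carries a lang attribute, so no guessing from character codes is needed.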

Jim Bigelow

[1] http://www.w3.org/TR/unicode-xml/ 
[2] http://www.w3.org/TR/unicode-xml/#Language 
Received on Tuesday, 23 September 2003 16:53:22 GMT