Declaring The XHTML File Encoding
XHTML files, like all other text files, are saved using a particular character encoding. Since there are many different character encodings in the world, and you have no idea what your visitor's browser default settings are, it's always a good idea to explicitly declare which encoding you used to make your Web page. Here's an example of how to declare the character encoding; in this case UTF-8, a Unicode encoding, is used:
Code Sample
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
</head>
When a browser sees this <meta> tag, it will know that the page was encoded using UTF-8 and will display it properly (provided that you really did encode the page in UTF-8). XHTML requires that you declare the encoding if it is anything other than the default UTF-8 or UTF-16 encodings. You can also use the XML declaration to specify the character encoding. Some character encodings are almost identical: windows-1252, for example, matches iso-8859-1 except in the byte range 0x80 to 0x9F, where windows-1252 adds printable characters such as curly quotes and the euro sign.
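As a minimal sketch of the XML declaration mentioned above, the following page declares UTF-8 both in the XML declaration and in the <meta> tag. The XHTML 1.0 Strict doctype, the title, and the body text are placeholders chosen for illustration, not requirements. Note that the XML declaration, if present, must be the very first thing in the file, with nothing (not even a blank line or a comment) in front of it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<!-- Same encoding as in the XML declaration on the first line -->
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<!-- Placeholder title, for illustration only -->
<title>Encoding example</title>
</head>
<body>
<!-- Placeholder content, for illustration only -->
<p>This page is saved and declared as UTF-8.</p>
</body>
</html>
```

The two declarations should always name the same encoding, and that encoding should be the one the file was actually saved in. Keeping the <meta> tag as well is a reasonable precaution, since a browser that treats the page as ordinary HTML is likely to ignore the XML declaration.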
International Character Set Codes
The World Wide Web Consortium (W3C) strongly recommends the use of UTF-8 wherever possible: UTF-8 can be used for all languages and is the recommended charset on the Internet. Support for it is rapidly increasing. That being said, here is a partial listing of languages, their two-letter language codes, and the older charsets typically used for them:
| Language (two-letter code) | Charset |
|---|---|
| Afrikaans (AF) | iso-8859-1, windows-1252 |
| Albanian (SQ) | iso-8859-1, windows-1252 |
| Arabic (AR) | iso-8859-6 |
| Basque (EU) | iso-8859-1, windows-1252 |
| Bulgarian (BG) | iso-8859-5 |
| Byelorussian (BE) | iso-8859-5 |
| Catalan (CA) | iso-8859-1, windows-1252 |
| Croatian (HR) | iso-8859-2, windows-1250 |
| Czech (CS) | iso-8859-2 |
| Danish (DA) | iso-8859-1, windows-1252 |
| Dutch (NL) | iso-8859-1, windows-1252 |
| English (EN) | iso-8859-1, windows-1252 |
| Esperanto (EO) | iso-8859-3* |
| Estonian (ET) | iso-8859-15 |
| Faroese (FO) | iso-8859-1, windows-1252 |
| Finnish (FI) | iso-8859-1, windows-1252 |
| French (FR) | iso-8859-1, windows-1252 |
| Galician (GL) | iso-8859-1, windows-1252 |
| German (DE) | iso-8859-1, windows-1252 |
| Greek (EL) | iso-8859-7 |
| Hebrew (HE, formerly IW) | iso-8859-8 |
| Hungarian (HU) | iso-8859-2 |
| Icelandic (IS) | iso-8859-1, windows-1252 |
| Inuit (Eskimo) languages | iso-8859-10* |
| Irish (GA) | iso-8859-1, windows-1252 |
| Italian (IT) | iso-8859-1, windows-1252 |
| Japanese (JA) | shift_jis, iso-2022-jp, euc-jp |
| Korean (KO) | euc-kr |
| Lapp | iso-8859-10* ** |
| Latvian (LV) | iso-8859-13, windows-1257 |
| Lithuanian (LT) | iso-8859-13, windows-1257 |
| Macedonian (MK) | iso-8859-5, windows-1251 |
| Maltese (MT) | iso-8859-3* |
| Norwegian (NO) | iso-8859-1, windows-1252 |
| Polish (PL) | iso-8859-2 |
| Portuguese (PT) | iso-8859-1, windows-1252 |
| Romanian (RO) | iso-8859-2 |
| Russian (RU) | koi8-r, iso-8859-5 |
| Scottish Gaelic (GD) | iso-8859-1, windows-1252 |
| Serbian (SR) cyrillic | windows-1251, iso-8859-5*** |
| Serbian (SR) latin | iso-8859-2, windows-1250 |
| Slovak (SK) | iso-8859-2 |
| Slovenian (SL) | iso-8859-2, windows-1250 |
| Spanish (ES) | iso-8859-1, windows-1252 |
| Swedish (SV) | iso-8859-1, windows-1252 |
| Turkish (TR) | iso-8859-9, windows-1254 |
| Ukrainian (UK) | iso-8859-5 |
* = scarce support in browsers.
** = Lapp doesn't have a 2-letter code; a three-letter code (lap) is proposed in NISO Z39.53.
*** = Serbian can be written in the Latin alphabet (most commonly used) or in Cyrillic (mostly encoded as windows-1251).
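
If you do have to use one of the older charsets from the table, the declaration works the same way as for UTF-8. Here is a sketch that assumes a Polish page saved as iso-8859-2; the choice of language and charset is only an illustration, so substitute the value that matches how your file was actually saved.

```xml
<?xml version="1.0" encoding="iso-8859-2"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="pl" lang="pl">
<head>
<!-- Assumed charset for this example: iso-8859-2, as listed for Polish -->
<meta http-equiv="content-type" content="text/html; charset=iso-8859-2"/>
<!-- Placeholder title, for illustration only -->
<title>Legacy charset example</title>
</head>
<body>
<!-- Placeholder content, for illustration only -->
<p>This page is saved and declared as iso-8859-2.</p>
</body>
</html>
```

Because iso-8859-2 is not one of the default encodings, this is a case where the declaration cannot be left out.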