Registration Rules
The Verisign Shared Registration System (SRS) supports Internationalized Domain Names (IDNs) containing code points from various Unicode scripts.
Verisign has developed a policy for IDN registrations that specifies permissible and prohibited code points. The policy is implemented in the following five validation rules. An IDN that adheres to all five rules is considered a valid registration.
1. IETF Standards
The IDNA2008 specification defines rules and algorithms that permit or prohibit Unicode code points in IDN registrations. Verisign complies fully with all of the RFC documents that comprise the IDNA2008 standard. Please review the IETF Standards.
2. Restrictions on Specific Languages
All IDN registrations require a three-letter Language Tag. CHI, for instance, is the tag for the Chinese language. If the Language Tag associated with the registration is in the following table, then Verisign has a List of Included Characters for that language. The requested IDN must be entirely contained within this List of Included Characters. If even one code point from the IDN is not a valid character for this language, then the registration is rejected.
The following table lists the languages that have an associated List of Included Characters.
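The check described in this rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tag `ZZA` and its character list are hypothetical placeholders, not an actual Verisign Language Tag or List of Included Characters, and the function name is invented for this example.

```python
from typing import Optional

# Hypothetical placeholder tag and character list, for illustration only.
# The real Lists of Included Characters are published by the registry.
EXAMPLE_TAG = "ZZA"
INCLUDED_CHARACTERS = {
    EXAMPLE_TAG: frozenset("abcdefghijklmnopqrstuvwxyz0123456789-"),
}


def offending_code_points(label: str, language_tag: str) -> Optional[list]:
    """Return the code points in `label` that fall outside the language's list.

    Returns None when the tag has no List of Included Characters, meaning
    this rule does not apply to the registration.
    """
    allowed = INCLUDED_CHARACTERS.get(language_tag.upper())
    if allowed is None:
        return None
    return [f"U+{ord(ch):04X}" for ch in label if ch not in allowed]


if __name__ == "__main__":
    print(offending_code_points("example", "ZZA"))
    print(offending_code_points("ex\u00e4mple", "ZZA"))
    print(offending_code_points("example", "FRE"))
```

An empty list means every code point is in the list and the label passes this rule; a single entry in the list is enough to reject the registration.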
3. Restrictions on Commingling of Scripts
If the Language Tag specified in the IDN registration is not in the above table, and so does not have a List of Included Characters, then Verisign applies an alternate restriction to prevent commingling of different scripts in a single domain.
The Unicode Standard defines a set of Unicode Scripts by assigning each code point exactly one Unicode Script value. As a rule, Verisign’s registries reject the commingling of code points from different Unicode scripts. That is, if an IDN contains code points from two or more Unicode scripts, then that IDN registration is rejected. For example, a character from the Latin script cannot be used in the same IDN label with any Cyrillic character. All code points within an IDN label must come from the same Unicode script. This is done to prevent confusable code points of different scripts from appearing in the same IDN.
Again, this rule applies only to languages for which there is no strictly defined List of Included Characters. For example, the FRE Language Tag, indicating the French language, does not have a strict List of Included Characters, and so the commingling rule applies. All code points in a French domain must come from a single script, but that script may be any of the valid Unicode-defined scripts.
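The commingling rule can be illustrated with a short Python sketch. Python's standard library does not expose the Unicode Script property, so the example relies on a deliberately tiny, hand-written range table covering only a few lowercase Latin, Greek, and Cyrillic letters. The table, the function names, and the choice to treat digits and the hyphen as script-neutral are all assumptions made for this illustration and do not describe the registry's actual implementation.

```python
# Deliberately tiny, hand-written subset of Unicode script ranges (assumption:
# a real check would use the full Unicode Scripts data file).
SCRIPT_RANGES = [
    (0x0061, 0x007A, "Latin"),     # a-z
    (0x00E0, 0x00F6, "Latin"),     # à-ö
    (0x00F8, 0x00FF, "Latin"),     # ø-ÿ
    (0x03B1, 0x03C9, "Greek"),     # α-ω
    (0x0430, 0x045F, "Cyrillic"),  # а-џ
    (0x0030, 0x0039, "Common"),    # digits
    (0x002D, 0x002D, "Common"),    # hyphen
]


def script_of(ch: str) -> str:
    """Look up the script of one character in the illustrative table."""
    cp = ord(ch)
    for first, last, script in SCRIPT_RANGES:
        if first <= cp <= last:
            return script
    return "Unknown"


def is_single_script(label: str) -> bool:
    """True when all code points come from one script.

    Assumption: "Common" code points (digits, hyphen) are ignored, and any
    code point outside the illustrative table causes a rejection.
    """
    scripts = {script_of(ch) for ch in label}
    if "Unknown" in scripts:
        return False
    return len(scripts - {"Common"}) <= 1


if __name__ == "__main__":
    print(is_single_script("paypal"))        # all Latin
    print(is_single_script("p\u0430ypal"))   # Cyrillic 'а' mixed into Latin
```

The second label looks identical to the first in many fonts, which is exactly the kind of confusable the rule is meant to stop.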
The following table lists Unicode Scripts, and the associated table of allowed code points.
Unicode Scripts and Associated Code Points
Armenian
Avestan
Balinese
Bamum
Batak
Bengali
Bopomofo
Brahmi
Buginese
Buhid
Canadian Aboriginal
Carian
Cham
Cherokee
Coptic
Cuneiform
Cyrillic
Devanagari
Egyptian Hieroglyphs
Ethiopic
Glagolitic
Greek
Gujarati
Gurmukhi
Han
Hangul
Hanunoo
Hebrew
Hiragana
Imperial Aramaic
Inscriptional Pahlavi
Inscriptional Parthian
Javanese
Kaithi
Kannada
Katakana
Kayah Li
Kharoshthi
Khmer
Lao
Lepcha
Limbu
Lisu
Lycian
Lydian
Malayalam
Mandaic
Meetei Mayek
Mongolian
Myanmar
New Tai Lue
Nko
Ogham
Ol Chiki
Old Persian
Old South Arabian
Old Turkic
Oriya
Phags Pa
Phoenician
Runic
Samaritan
Saurashtra
Sinhala
Sundanese
Syloti Nagri
Syriac
Tagalog
Tagbanwa
Tai Le
Tai Tham
Tai Viet
Tamil
Telugu
Thaana
Thai
Tibetan
Tifinagh
Vai
Yi
For a comprehensive list of all Unicode Code Points allowed for IDN registration, click here.
4. ICANN’s Restricted Unicode Code Points
The Verisign SRS also adheres to ICANN’s Guidelines for the Implementation of Internationalized Domain Names. Section 5 of that document outlines characters that are allowed by the IETF standard but should be prohibited for IDN registration. For this reason, the Verisign SRS prohibits those Unicode code points in all registrations. A complete list of ICANN’s restricted Unicode points is here.
5. Special Characters
There are exactly two (2) Unicode characters whose latest definitions are not backward compatible with previous versions of the IDNA Standard. The Latin Sharp S and Greek Final Sigma were previously mapped to alternate characters. Clients and Registries compliant with the older standard would, for instance, map a Latin Sharp S into two lowercase Latin letter S characters. This mapping is irreversible. The latest version of the IDNA standard does not apply this mapping. So, whereas the Latin Sharp S was previously prohibited (mapped into other characters), the latest standard allows Registries to accept this character at their own discretion.
Because these changes are not backward compatible, Verisign has elected to continue to disallow these two (2) characters, until a clear and fair approach to their registration has been reached and communicated.
CHARACTER | UNICODE POINT | GLYPH |
---|---|---|
Latin Small Letter Sharp S | U+00DF | ß |
Greek Small Letter Final Sigma | U+03C2 | ς |
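The following Python sketch shows both halves of this rule: a check that flags the two disallowed characters, and a demonstration of the older, irreversible mapping. The function names are invented for this example. The mapping demonstration relies on Python's built-in `idna` codec, which implements the earlier IDNA standard rather than IDNA2008; it is used here only to show the legacy behavior and is not a statement about how the registry processes labels.

```python
# The two characters from the table above, keyed by code point.
DISALLOWED_SPECIAL = {
    "\u00DF": "Latin Small Letter Sharp S",
    "\u03C2": "Greek Small Letter Final Sigma",
}


def find_special_characters(label: str) -> list:
    """Return the code points in `label` that this rule disallows."""
    return [f"U+{ord(ch):04X}" for ch in label if ch in DISALLOWED_SPECIAL]


def legacy_ascii(label: str) -> bytes:
    """Convert a label with Python's built-in codec for the older standard.

    Under that standard the Sharp S is mapped to "ss" before encoding, so
    the original spelling cannot be recovered from the result.
    """
    return label.encode("idna")


if __name__ == "__main__":
    print(find_special_characters("stra\u00dfe"))
    print(legacy_ascii("stra\u00dfe"))
```

Because the legacy conversion of a label containing the Sharp S yields the same result as the "ss" spelling, two visually different labels would collide under the older standard, which is the backward-compatibility problem described above.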