Internationalized Domain Names

Internationalized Domain Name Registration Rules

Verisign has developed a policy for IDN registrations that specifies which code points are permitted and which are prohibited. The policy is implemented as the following five validation rules, and an IDN that satisfies all five is accepted as a valid registration.

1. IETF Standards

The IDNA2008 specification defines the rules and algorithms that determine which Unicode code points are permitted or prohibited in IDN registrations. Verisign complies with all of the RFCs that make up the IDNA2008 standard.
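Full IDNA2008 validation relies on the derived-property tables and contextual rules of the RFCs, which Python's standard library does not implement (its built-in `idna` codec follows the older IDNA2003). The last step, converting an already-validated Unicode label (U-label) into its ASCII form (A-label), is Punycode (RFC 3492) plus the `xn--` prefix, and can be sketched with the standard `punycode` codec. The helper names `to_a_label` and `to_u_label` are hypothetical, and the sketch assumes the label has already passed IDNA2008 validation.

```python
import unicodedata

ACE_PREFIX = "xn--"  # ASCII-compatible encoding prefix used by IDNA

def to_a_label(u_label: str) -> str:
    """Hypothetical helper: convert one already-validated U-label to an A-label."""
    # IDNA2008 requires U-labels to be in Unicode Normalization Form C.
    if unicodedata.normalize("NFC", u_label) != u_label:
        raise ValueError("label is not in NFC")
    if u_label.isascii():
        # Pure-ASCII labels are used as-is, without Punycode.
        return u_label
    return ACE_PREFIX + u_label.encode("punycode").decode("ascii")

def to_u_label(a_label: str) -> str:
    """Hypothetical helper: reverse the conversion for an xn-- label."""
    if not a_label.lower().startswith(ACE_PREFIX):
        return a_label
    return a_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")

print(to_a_label("bücher"))         # xn--bcher-kva
print(to_u_label("xn--bcher-kva"))  # bücher
```

Because this conversion is lossless, the registry can store either form and recover the other.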

2. Restrictions on Specific Languages

All IDN registrations require a three-letter language tag; for example, CHI denotes Chinese. If the registration's language tag appears in the table below, Verisign maintains a list of included characters for that language, and the requested IDN must consist entirely of characters from that list. If even one code point in the IDN is not a valid character for that language, the registration is rejected.

List of Included Characters

Language tag   Language
AZE            Azerbaijani
BEL            Belarusian
BUL            Bulgarian
CHI            Chinese
GRE            Greek
JPN            Japanese
KOR            Korean
KUR            Kurdish
MAC            Macedonian
MOL            Moldavian
POL            Polish
RUS            Russian
UKR            Ukrainian
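The included-character rule amounts to a set-membership test on every code point. The sketch below illustrates it under loudly stated assumptions: the character lists are hypothetical, abbreviated stand-ins (Verisign's real lists are not reproduced here), whether digits and the hyphen are included is assumed, and the function name `check_included_characters` is hypothetical.

```python
from typing import Optional

# Hypothetical, abbreviated included-character lists for illustration only.
# Including digits and the hyphen is an assumption of this sketch.
INCLUDED_CHARACTERS = {
    "RUS": frozenset("абвгдеёжзийклмнопрстуфхцчшщъыьэюя0123456789-"),
    "POL": frozenset("aąbcćdeęfghijklłmnńoóprsśtuwyzźż0123456789-"),
}

def check_included_characters(label: str, language_tag: str) -> Optional[bool]:
    """Return True/False when the tag has an included-character list, or None
    when it has none (the script-commingling rule would then apply instead)."""
    allowed = INCLUDED_CHARACTERS.get(language_tag.upper())
    if allowed is None:
        return None
    # A single code point outside the list is enough to reject the label.
    return all(ch in allowed for ch in label)

print(check_included_characters("пример", "RUS"))   # True
print(check_included_characters("primer", "RUS"))   # False
print(check_included_characters("exemple", "FRE"))  # None
```

Returning `None` for tags without a list keeps the dispatch to the next rule explicit rather than silently accepting the label.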

3. Restrictions on Commingling of Scripts

If the language tag specified in the IDN registration is not in the above table, and so does not have a list of included characters, then Verisign applies an alternate restriction to prevent commingling of different scripts in a single domain.

The Unicode Standard defines a set of scripts and assigns each code point exactly one Script property value. As a rule, Verisign’s registries reject any IDN label whose code points come from two or more Unicode scripts: all code points within a label must belong to the same script. For example, a character from the Latin script cannot appear in the same IDN label as any Cyrillic character. The rule prevents visually confusable characters from different scripts, such as Latin "a" (U+0061) and Cyrillic "а" (U+0430), from appearing in the same IDN.

Again, this rule applies only to languages that do not have a strictly defined list of included characters. For example, the FRE language tag (French) has no such list, so the commingling rule applies: all code points in a French domain name must come from a single script, but that script may be any valid Unicode script.
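The commingling check collects the scripts used in a label and rejects it when there is more than one. Python's `unicodedata` module does not expose the Unicode Script property, so this sketch substitutes a rough heuristic that infers the script from the start of each character's Unicode name; a real implementation would use the Unicode Scripts.txt data. Treating digits, the hyphen, and combining marks as script-neutral is also an assumption, and the function names are hypothetical.

```python
import unicodedata

# Hypothetical approximation: infer a script from the character name prefix.
# This list is illustrative and incomplete; Scripts.txt is the real source.
SCRIPT_NAME_PREFIXES = (
    "LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "HEBREW", "ARABIC",
    "DEVANAGARI", "THAI", "HIRAGANA", "KATAKANA", "HANGUL", "CJK",
)

def approximate_script(ch: str) -> str:
    name = unicodedata.name(ch, "")
    for prefix in SCRIPT_NAME_PREFIXES:
        if name.startswith(prefix):
            return prefix
    # Digits, the hyphen, combining marks and unrecognised characters are
    # treated as script-neutral here (an assumption, not stated policy).
    return "NEUTRAL"

def scripts_in_label(label: str) -> set:
    return {approximate_script(ch) for ch in label} - {"NEUTRAL"}

def passes_commingling_rule(label: str) -> bool:
    # Reject the label if its code points span two or more scripts.
    return len(scripts_in_label(label)) <= 1

print(passes_commingling_rule("paypal"))       # True
print(passes_commingling_rule("p\u0430ypal"))  # False: U+0430 is Cyrillic
print(passes_commingling_rule("français"))     # True
```

The second call shows the confusable case the rule targets: the label looks like an all-Latin word but mixes in a Cyrillic letter.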

4. ICANN’s IDN Implementation Guidelines

The Verisign Shared Registration System (SRS) also adheres to ICANN’s Guidelines for the Implementation of Internationalized Domain Names.

5. Special Characters

Two Unicode characters are treated by the current IDNA standard (IDNA2008) in a way that is not backward compatible with the previous version (IDNA2003): the Latin Small Letter Sharp S and the Greek Small Letter Final Sigma. Under the older standard, both were mapped to other characters. For instance, clients and registries compliant with IDNA2003 map a Latin Sharp S to two lowercase Latin letter s characters ("ss") and a Greek Final Sigma to an ordinary small sigma (σ, U+03C3). These mappings are irreversible: the original character cannot be recovered from the result. IDNA2008 does not apply these mappings. So, whereas a Latin Sharp S could never previously be registered as itself, because it was always mapped to other characters, the current standard permits the character and leaves each registry to decide whether to accept it.
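Python's built-in `idna` codec implements the older standard (IDNA2003, RFC 3490), including its Nameprep mapping step, so it can demonstrate the irreversible mapping described above. The wrapper names `idna2003_encode` and `idna2003_decode` are hypothetical conveniences around the standard codec.

```python
def idna2003_encode(domain: str) -> bytes:
    # The standard "idna" codec applies IDNA2003 Nameprep, which case-folds
    # the Latin Sharp S to "ss" and the Greek Final Sigma to an ordinary sigma.
    return domain.encode("idna")

def idna2003_decode(ascii_domain: bytes) -> str:
    return ascii_domain.decode("idna")

mapped = idna2003_encode("straße")
print(mapped)                   # b'strasse'
print(idna2003_decode(mapped))  # strasse (the original ß cannot be recovered)

# Two distinct Greek labels collapse to the same ASCII form.
print(idna2003_encode("ας") == idna2003_encode("ασ"))  # True
```

Because "straße" and "strasse" produce the same result under IDNA2003, older clients cannot distinguish them, which is the compatibility problem the policy below responds to.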

Because these changes are not backward compatible, Verisign has elected to continue disallowing these two characters until a clear and fair approach to their registration has been reached and communicated.

Character                        Code point   Glyph
Latin Small Letter Sharp S       U+00DF       ß
Greek Small Letter Final Sigma   U+03C2       ς
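Enforcing this rule is a simple membership check against the two code points in the table. The sketch below is illustrative rather than Verisign's implementation; the names `disallowed_characters_in` and `passes_special_character_rule` are hypothetical.

```python
import unicodedata

# The two code points listed in the table above, which the policy disallows.
DISALLOWED_SPECIAL_CHARACTERS = frozenset({"\u00DF", "\u03C2"})

def disallowed_characters_in(label: str) -> list:
    """Return the Unicode names of any disallowed special characters in label."""
    return [unicodedata.name(ch) for ch in label
            if ch in DISALLOWED_SPECIAL_CHARACTERS]

def passes_special_character_rule(label: str) -> bool:
    # Any occurrence of either character causes the registration to be rejected.
    return not disallowed_characters_in(label)

print(passes_special_character_rule("strasse"))  # True
print(disallowed_characters_in("straße"))        # ['LATIN SMALL LETTER SHARP S']
```

Reporting the character names, rather than only a pass/fail result, makes it clear to the registrant why a label was refused.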