Version: 1.0
Title: Why we need to make a policy decision to block IDNs
Date: November 2006
Status: Final
Author: Sam Vilain

Why we need to make a policy decision to block IDNs

Background, and the problem, in theory

Internationalized Domain Names (IDNs) are layered on top of the DNS. Names that cannot be represented in regular ASCII are converted via a system called “punycode” to a form that is within the specifications of the DNS, as specified in November 1983 with RFC 882.

In this document I will cover some of the problems known to us so far that are relevant to domains in major NZ languages other than English (that is, those with 10,000 or more speakers of the corresponding oral languages in the 2001 Census). Some extra domain names are also listed to demonstrate malicious attacks that use unexpectedly inconspicuous Unicode characters.

In each of the tables, the first column shows the domain name as it would normally be displayed by a program that has good Unicode support (but bad IDN support). The second column shows the same name with every non-ASCII Unicode character written as an escape, e.g. \x{12AB} (which would indicate Unicode character U+12AB), and the third column gives the punycode equivalent.

Accented Latin Characters

Unicode                      Escaped                        Punycode
kāhuimaunga.maori.nz         k\x{101}huimaunga.maori.nz     xn--khuimaunga-ufb.maori.nz
(combining accent version)   ka\x{304}huimaunga.maori.nz    xn--kahuimaunga-4kg.maori.nz
façon.co.nz                  fa\x{00e7}on.co.nz             xn--faon-1oa.co.nz
(combining accent version)   fac\x{0327}on.co.nz            xn--facon-rld.co.nz

Unicode has more than one way of representing accented Latin characters. For example, somebody could purchase “kāhuimaunga.maori.nz” represented in the traditional way, using the single Unicode character “Latin Small Letter A with Macron” (U+0101), or with a regular “a” followed by a “Combining Macron” (U+0304).
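The two encodings of the macron can be compared with a few lines of Python. This is a minimal sketch, not a description of how any registry actually works: the helper names `canonical_label` and `to_ace` are hypothetical, and it assumes that Unicode NFC normalization is an acceptable way to fold the combining form into the precomposed one.

```python
import unicodedata

# The same visible label, encoded two different ways.
precomposed = "k\u0101huimaunga"   # U+0101 LATIN SMALL LETTER A WITH MACRON
combining = "ka\u0304huimaunga"    # "a" followed by U+0304 COMBINING MACRON


def canonical_label(label: str) -> str:
    """Fold combining sequences into precomposed characters (NFC)."""
    return unicodedata.normalize("NFC", label)


def to_ace(label: str) -> str:
    """Encode one label as punycode with the "xn--" prefix (hypothetical helper)."""
    return "xn--" + label.encode("punycode").decode("ascii")


if __name__ == "__main__":
    # Compared as raw strings, the two labels are different names.
    print(precomposed == combining)
    print(to_ace(precomposed), to_ace(combining))
    # After normalization they collapse to a single name.
    print(canonical_label(combining) == precomposed)
```

Compared byte for byte, the two labels and their punycode forms differ, which should correspond to the two rows in the table above. Only after normalization do they compare equal.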
Without special knowledge of Unicode, the registry will not know that these two punycode domains are homographs, and will happily allow different registrants for them.

Traditional vs Simplified Chinese Script

Unicode      Escaped            Punycode
爱.gen.nz    \x{7231}.gen.nz    xn---u0x.gen.nz
愛.gen.nz    \x{611b}.gen.nz    xn---vgu.gen.nz
⺝.gen.nz    \x{2e9d}.gen.nz    xn---kwj.gen.nz
月.gen.nz    \x{6708}.gen.nz    xn---7ov.gen.nz

Chinese has two scripts in use in New Zealand. The first two domains above (爱 and 愛) are the same word, in Simplified and Traditional script respectively. It is understood that each Traditional character has a single Simplified equivalent. Chinese readers are, in general, well aware of this, so it need not imply blocking either script. There is also a Unicode page for Chinese “radicals” (character components), some of which have identical-looking corresponding words, as the last two rows show.

Indic Languages

Unicode           Escaped                                                      Punycode
महात्मा.org.nz      \x{092e}\x{0939}\x{093e}\x{0924}\x{094d}\x{092e}\x{093e}.org.nz    xn---h2btb6bzac7h.org.nz
ચોપડી.org.nz      \x{0a9a}\x{0acb}\x{0aaa}\x{0aa1}\x{0ac0}.org.nz              xn---5dco0a8f5b.org.nz
ચાેપડી.org.nz     \x{0a9a}\x{0abe}\x{0ac7}\x{0aaa}\x{0aa1}\x{0ac0}.org.nz      xn---5dco0a9em3b.org.nz
ॐ.gen.nz          \x{0950}.gen.nz                                              xn---q3b.gen.nz
ૐ.gen.nz          \x{0ad0}.gen.nz                                              xn---pfc.gen.nz

The two Indic scripts in use in NZ (Hindi/Devānagarī and Gujarati) contain “signs” which are much like accents, so homographs (or near homographs) exist within each script. There is also a Sanskrit character shared between the scripts, “om”; the difference between the two scripts' versions of this character is stylistic only.
The wealth of weird characters in Unicode

Unicode           Escaped                            Punycode
√2.net            \x{221A}2.net                      xn--2-tbo.net
❽҉.co.nz          \x{277d}\x{0489}.co.nz             xn---s3a102p.co.nz
кiwibank.co.nz    \x{043A}iwibank.co.nz              xn--iwibank-cig.co.nz
最好十.co.nz      \x{6700}\x{597d}\x{5341}.co.nz     xn---kkrx8lqum.co.nz
最好☩.co.nz      \x{6700}\x{597d}\x{2629}.co.nz     xn---q4h398pqqi.co.nz

People might be familiar with the more “chic” Unicode characters, such as mathematics symbols, and with how cool it looks when you follow a “DINGBAT NEGATIVE CIRCLED DIGIT EIGHT” with a “COMBINING CYRILLIC MILLIONS SIGN”. But there is certainly potential for attack: shown above is a URL that looks similar to “kiwibank.co.nz” but uses a “CYRILLIC SMALL LETTER KA” instead of the regular Latin “k”. There is also a “CROSS OF JERUSALEM” that looks very much like the Chinese ideogram for “10”.

A “phisher” who knows of a piece of software that supports Unicode but has a naïve IDN implementation could register these domains and use them to fool users of that software into thinking they are on the right web site. As more and more operating systems support Unicode domain names internally, this problem will spread to more and more pieces of software.

At the ICANN conference in Wellington, it was mentioned that it is essentially up to registries to ensure that abusive IDNs are not registered. That being said, there are some attacks that domain registration policy cannot prevent:

Unicode                    Escaped                           Punycode
kiwibank.co.nz∕login.pl    kiwibank.co.nz\x{2215}login.pl    kiwibank.co.xn--nzlogin-df0f.pl

In this example, the Unicode character “Division Slash”, which looks a lot like an ASCII slash, makes it look as though the domain name ends at “.nz”, when really the name carries on and is actually a domain registered in the .pl ccTLD. The URL could equally have been “kiwibank.co.nz∕login.evil.com/”, in which case the deceptive labels are subdomains of evil.com and out of the control of the .com registry under which it lies.
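The Division Slash example can be made concrete by splitting the host name on the real label separator. The following is only an illustrative sketch: the function name `non_ascii_symbols` is hypothetical, and treating every non-ASCII character that is not a letter, mark or number as suspicious is an assumption made for this example, not a rule taken from any standard.

```python
import unicodedata

# The host name from the table above, with U+2215 DIVISION SLASH in place of "/".
host = "kiwibank.co.nz\u2215login.pl"

# Only "." separates DNS labels, so the lookalike slash stays inside a label.
labels = host.split(".")


def non_ascii_symbols(name: str) -> list:
    """List non-ASCII characters whose general category is not letter, mark or number."""
    found = []
    for ch in name:
        if ord(ch) > 127 and unicodedata.category(ch)[0] not in "LMN":
            found.append((f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return found


if __name__ == "__main__":
    print(labels[-1])               # the real top-level domain
    print(labels[-2])               # the label that contains the lookalike slash
    print(non_ascii_symbols(host))  # characters a client might choose to flag
```

The last label is “pl”, so the name resolves under the .pl ccTLD regardless of what the text before it appears to say.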
Impact – the problem, in practice

In the SRS staging environment, we registered “кiwibank.co.nz” and the “kiwibank.co.nz” domain (with an inserted invisible space) and tried to access them using various browsers.

A reasonably current release of Firefox (1.5.0.3) on Ubuntu Linux displays the link in the status bar as punycode; it also rendered the invisible space as a visible space on the screen, and removed it from the URL. Below is a screenshot, with what seem to be “bugs” in the Unicode support highlighted.

● Firefox did not try to display the URL in the location bar in the native script
● Firefox rendered a zero-width space with width
● Firefox does not display link targets in the status bar using the native script

This is because the Firefox developers are aware of the issues and have, by default, erred on the side of caution with their IDN support. Firefox has a “whitelist” of TLDs with “sensible” IDN policies [1], and displays the real domain for those TLDs.

[1] http://www.mozilla.org/projects/security/tldidnpolicylist.html

This example screenshot from Wikipedia shows the display with a whitelisted ccTLD:

Exhibit A: Korean-enabled location bar

Testing with IE 6 on Windows showed no support for IDNs whatsoever; however, searching the web for IDN plugins for IE revealed a wealth of software, much of which will be buggy but which nonetheless has some users.

[Screenshot: IDN-enabled IE 6]
[Screenshot: IDN-enabled Outlook]

IE 7's “solution” to the problem (in the final, released version) is to require the user to click yet another “Ok” or “Cancel” dialogue box to enable display of entire scripts, one by one [2]. So, if you visit a Russian site and decide to allow display of Russian characters, then from that point on your copy of IE will happily display “pаypal.com” (with a Cyrillic small letter “a”, a famous demonstration of the IDN homograph attack [3]) indistinguishably from “paypal.com”.
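A per-label check for mixed scripts catches names like the “pаypal.com” and “кiwibank.co.nz” examples even after a whole script has been enabled. The sketch below is a rough heuristic under stated assumptions: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name is used as a stand-in for its script, and the function names are hypothetical.

```python
import unicodedata


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts in a label from Unicode character names."""
    scripts = set()
    for ch in label:
        # ASCII digits and hyphens are shared by every script; skip them.
        if ch.isascii() and (ch.isdigit() or ch == "-"):
            continue
        # Combining marks attach to a base letter and carry no script of their own here.
        if unicodedata.category(ch).startswith("M"):
            continue
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


def is_mixed_script(label: str) -> bool:
    """True when the label draws letters from more than one (approximate) script."""
    return len(scripts_in_label(label)) > 1


if __name__ == "__main__":
    for label in ["paypal", "p\u0430ypal", "\u043aiwibank", "k\u0101huimaunga"]:
        print(ascii(label), sorted(scripts_in_label(label)), is_mixed_script(label))
```

A label written entirely in one script passes this check, including accented Latin names in either precomposed or combining form. It would not catch a lookalike built wholly from a single non-Latin script, so it can only complement a registration policy, not replace one.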
Summary

Some of the problems outlined in this document are already addressed by relevant standards, but these are far from complete. The above cases should help demonstrate the need to adopt a cautious policy with regard to registrations under the .nz namespace, and that blocking all registrations starting with “xn--” until clear policy is made is a sensible first decision.

[2] http://blogs.msdn.com/ie/archive/2006/07/31/684337.aspx
[3] http://www.shmoo.com/idn/

Copyright © 2006 New Zealand Registry Services. All Rights Reserved.