All Cyrillic .eu domains to be deleted
Eurid has announced that Cyrillic domain names in .eu will be deleted a year from now.
The registry said that it’s doing so to comply with the “no script mixing” recommendations for internationalized domain names, which are designed to limit the risk of homograph phishing attacks.
The deletions will kick in May 31, 2019, and only apply to names that have Cyrillic before the dot and Latin .eu after.
Cyrillic names in Eurid’s Cyrillic ccTLD .ею will not be affected.
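To make the "no script mixing" idea concrete, here is a rough sketch of how such a check might work. It is not Eurid's actual implementation. It assumes, as a shortcut, that the first word of a character's Unicode name ("CYRILLIC", "LATIN") is a good enough stand-in for its script, and the function names and sample domains are made up for illustration.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Return a rough set of script names used by the letters in a label.

    Assumption: the first word of each character's Unicode name
    (e.g. "CYRILLIC", "LATIN") is treated as its script.
    """
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


def mixes_scripts(domain: str) -> bool:
    """True if the labels of a domain use more than one script between them."""
    found = set()
    for label in domain.split("."):
        found |= scripts_in(label)
    return len(found) > 1


# Hypothetical sample names, not taken from the registry.
for name in ("пример.eu", "пример.ею", "example.eu"):
    print(name, "mixed" if mixes_scripts(name) else "single script")
```

Under this shortcut, a Cyrillic label under Latin .eu counts as mixed, while the same label under .ею does not.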
The plan has been in place since Eurid adopted the IDNA2008 standard three years ago, but evidently not all registrants have dropped their affected names yet.
Bulgaria is the only EU member state to use Cyrillic in its national language.
Emojis coming to another ccTLD
dotFM is to make emoji domain names available in the .fm ccTLD it manages.
The company said today that it’s currently taking expressions of interest in ‘premium’ emoji inventory, and that such domains will be registerable at an unspecified point in future.
It’s published a list of single-emoji domains it plans to sell.
Emoji domains “will be available based on Unicode Consortium Emoji Version 5.0 standards using single code point; and allowing a mix of letters and emoji characters under the top-level .FM, as well as the dotRadio extensions, .RADIO.fm and .RADIO.am”, dotFM said.
Very few TLDs allow emojis to be registered today.
The most prominent is .ws, which is Western Samoa’s ccTLD, marketed as an abbreviation for “web site”.
.fm is the ccTLD for Micronesia, but dotFM markets it to radio stations.
As ccTLDs, .ws and .fm are not subject to the ICANN rules that essentially ban emoji domains contractually in gTLDs.
Emojis use the same encoding as internationalized domain names, but do not feature in the IDN standards because they’re not used in real spoken languages.
Emoji domains are usually considered not entirely practical due to the inconsistent ways they can be rendered by applications.
New gTLDs still a crappy choice for email — study
New gTLDs may not be the best choice of domain for a primary email address, judging by new research.
Over 20% of the most-popular web sites do not fully understand email addresses containing long TLDs, and Arabic email addresses are supported by fewer than one in 10 sites, a study by the Universal Acceptance Steering Group has found.
Twitter, IBM and the Financial Times are among those sites highlighted as having only partial support for today’s wide variety of possible email addresses.
Only 7% of the sites tested were able to support all types of email address.
The study, carried out by Donuts and ICANN staff, looked at 749 websites (in the top 1,000 or so as ranked by Alexa) that have forms for filling in email addresses.
On each site, seven different email addresses were input, to see whether the site would accept them as valid.
The emails used different combinations of ASCII and Unicode before the dot and mixes of internationalized domain name and ASCII at the second and top levels.
These were the results (also available in the PDF of the report): [image: results table]
The problem with these numbers, it seems to me, is the lack of a control. There’s no real baseline to judge the numbers against.
There’s no mention in the paper about testing addresses that use .com or decades-old ccTLDs, which would have highlighted web sites with broken scripts that reject all emails.
But if we assume, as the paper appears to, that all the tested web sites were 100% compliant for .com domains, the scores for new gTLDs are not great.
There are currently over 800 TLDs over four characters in length, but according to the UASG research 22% of web sites will not recognize them.
There are 150 IDN TLDs, but a maximum of 30% of sites will accept them in email addresses.
When it comes to right-to-left scripts, such as Arabic, the vast majority of sites are totally hopeless.
UASG dug into the code of the tested sites when it could and found that most of them use client-side code — JavaScript processing a regular expression — to verify addresses.
A regular expression is a complex bit of code that can look something like this: /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/
It’s not every coder’s cup of tea, but it can get the job done with minimal client-side resource overheads. Most coders, the UASG concludes, copy regex they found on a forum and maybe tweak it a bit.
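Here is a small sketch of why the choice of regex matters. PERMISSIVE is the pattern quoted above, ported to Python. LEGACY is a hypothetical example of the forum-sourced kind of pattern that assumes ASCII and a TLD of two to four letters. It is not taken from the UASG paper or from any tested site, and the sample addresses are invented.

```python
import re

# The permissive pattern quoted above: anything before the "@", one or more
# dot-separated labels, then a final label of at least two characters.
PERMISSIVE = re.compile(r"^.+@(?:[^.]+\.)+(?:[^.]{2,})$")

# Hypothetical legacy-style pattern: ASCII only, TLD of two to four letters.
LEGACY = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")

# Invented sample addresses covering a short TLD, a long TLD and an IDN.
SAMPLES = [
    "user@example.com",
    "user@example.photography",
    "user@例え.テスト",
]

for address in SAMPLES:
    permissive_ok = bool(PERMISSIVE.match(address))
    legacy_ok = bool(LEGACY.match(address))
    print(f"{address}: permissive={permissive_ok} legacy={legacy_ok}")
```

The legacy-style pattern accepts only the .com address. The permissive one accepts all three, at the cost of letting through plenty of strings that are not deliverable addresses either.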
This should not be shocking news to anyone. I’ve known about it since 2009 or earlier when I first started ripping code from StackOverflow.
However, the UASG seems to have been working on the assumption that more sites are using off-the-shelf software libraries, which would have allowed the problem to be fixed in a more centralized fashion.
It concludes in its paper that much greater “awareness raising” needs to happen before universal acceptance comes closer to reality.
Five million Indian government workers to get IDN email
The Indian government has announced plans to issue fully Hindi-script email addresses to some five million civil servants.
The Ministry of Electronics and Information Technology announced the move, which will see each government employee given an @सरकार.भारत email address, in a statement this week.
सरकार.भारत transliterates as “sarkar.bharat”, or “government.india”.
The first stage of the roll-out will see the five million employees given @gov.in addresses, which apparently most of them do not already have.
Expanding the use of local scripts seems to be a secondary motivator to the government’s desire to bring control of government employee email back within its borders in a centralized fashion.
“The primary trigger behind the policy was Government data which resides on servers outside India and on servers beyond the control of the Government of India,” the MEITY press release states.
India currently has the largest number of internationalized domain names, at the top level, of any country.
NIXI, the local ccTLD manager, is in control of no fewer than 16 different ccTLDs in various scripts, with ample room for possible expansion in future.
The registry has been offering free IDN domains alongside .in registrations for about a year, according to local reports.
There are about two million .in domains registered today, according to the NIXI web site.
India to have SIXTEEN ccTLDs
While most countries are content to operate using a single ccTLD, India is to up its count to an unprecedented 16.
It already has eight, but ICANN’s board of directors at the weekend approved the delegation of an additional eight.
The new ccTLDs, which have yet to hit the root, are .ಭಾರತ, .ഭാരതം, .ভাৰত, .ଭାରତ, .بارت, .भारतम्, .भारोत, and .ڀارت.
If Google Translate and Wikipedia can be trusted, these words all mean “India” in, respectively, Kannada, Malayalam, Bengali, Odia, Arabic, Nepali, Hindi and Sindhi.
They were all approved under ICANN’s IDN ccTLD Fast Track program and will not operate under ICANN contract.
India already has seven internationalized domain name versions of its ccTLD in seven other scripts, along with its vanilla ASCII .in.
National Internet Exchange of India (NIXI) will be ccTLD manager for the whole lot.
India may have as many as 122 languages, according to Wikipedia, with 30 spoken by more than a million people.
Forget emojis, you can buy Egyptian hieroglyph .com domains
Call them the Emojis of the Ancient World.
Egyptian hieroglyphs were once the cutting edge of written communication, and it turns out Verisign lets you register .com domains using them.
Internationalized domain names expert Andre Schapp discovered a couple months ago that the Unicode code points for the ancient script have been approved in 16 Verisign gTLDs, and apparently no others.
This means that domains such as [image: hieroglyph domain name] should resolve.
Unfortunately, DI’s database does not support these characters, so I’m having to use images.
But at least one domain investor seems to have snapped up a few dozen single-pictograph Egyptian hieroglyph names about a month ago, and his page has clickable links.
Whether you see the hieroglyph or the Punycode, prefixed “xn--“, seems to depend on your browser configuration.
Ancient Egyptian is apparently not the only dead script that Verisign supports.
According to IANA, you can also get .com domains in Sumero-Akkadian cuneiform, which went out of fashion in the second century CE, as well as Phoenician, one of the world’s oldest known alphabets.
Then there’s Imperial Aramaic, Meitei, Kharosthi, ‘Phags-pa, Sylheti Nagari and goodness knows how many other extinct writing systems.
It seems .com has been approved for 237 IDN scripts, in total. Let it not be said that Verisign does not offer domainers ample opportunity to spunk their cash on gibberish.
No Klingon, though.
About that $3,800 emoji domain sale…
The debate over the age of the emoji domain name ☮.com may have been settled. It probably is as old as it was claimed to be.
You may recall that last week I blogged about the €3,400 ($3,816) sale of the domain to an end user. It wasn’t a big sale or a big story, but it’s so rare to see an emoji name sell I thought it was worth a few paragraphs.
It had been claimed, and I reported, that the name was 16 years old, having been registered in April 2001.
Later that day, ICANN principal technologist Paul Hoffman, who was co-author of the IDNA2003 standard that governed how non-ASCII domains were represented in the DNS, questioned whether the name could possibly be that old.
Under IDNA2003, IDNs are encoded with the “xn--” prefix. While applications may render ☮.com as the “peace” symbol, in the DNS it is in fact xn--v4h.com.
Hoffman told me that the prefix had been picked more or less at random in March 2003, so there was no way a speculator could have known in April 2001 how to register a domain that would have no meaning for another two years.
In addition, the Punycode standard that converts non-Latin characters to ASCII was not finalized until 2003 either.
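For the curious, the mapping from ☮.com to its DNS form can be reproduced with Python's built-in "punycode" codec. This is a simplified sketch: it skips the normalization and validity checks the IDNA standards require and just applies Punycode plus the prefix. The function names are my own.

```python
def to_a_label(label: str) -> str:
    """Convert one domain label to the ASCII form stored in the DNS.

    Simplification: no IDNA normalization or validity checks are applied,
    only Punycode encoding plus the "xn--" prefix.
    """
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def to_ascii_domain(domain: str) -> str:
    """Apply to_a_label to every dot-separated label of a domain."""
    return ".".join(to_a_label(label) for label in domain.split("."))


print(to_ascii_domain("☮.com"))  # xn--v4h.com
```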
It seemed more likely that the creation date in the Whois record was incorrect, so I updated the original blog post with the new information.
That kicked off a bit of a debate in the comments about scenarios in which the creation date was correct. Some commenters wondered whether the original buyer had registered many domains with different prefixes with the hope of getting lucky.
What none of us considered was that the domain itself changed between 2001 and 2003. Given new information Hoffman supplied over the weekend, that now strikes me as the most plausible scenario.
What most of us had forgotten was that Verisign launched an IDN registration test-bed all the way back in December 2000 (archive.org link).
That roll-out, controversial at the time, encoded the domains with Punycode predecessor RACE and used the bq-- prefix.
However, after the IDNA2003 and Punycode standards were published in 2003, Verisign then converted all of the existing IDN .com domains over to the two new standards. Names beginning bq-- were changed to xn--, and the encoding of the subsequent characters was changed.
So ☮.com very probably was registered in 2001, but in ASCII it was a completely different domain name back then.
We seem to have a rare(ish) case here of the creation date in the Whois being “right” but the domain name itself being “wrong”.
There may be as many as half a million .com domains with similar issues in their Whois.
I hope this clears up any confusion.
Emoji domains get a 👎 from security panel
The use of emojis in domain names has been discouraged by ICANN’s Security and Stability Advisory Committee.
In a paper late last week, SSAC told ICANN that emojis — aka emoticons or smileys — lack standardization, are barred by the relevant domain name technical standards, and could cause user confusion.
Emoji domains, while technically possible, are not particularly prevalent on the internet right now.
They’re implicitly banned in gTLDs due to the contractual requirement to adhere to the IDNA2008 standard, which restricts internationalized domain names to actual spoken human languages, and the only ccTLD I’m aware of actively marketing the names is Samoa’s .ws.
There was a notable example of Coca-Cola registering 😃.ws (xn--h28h.ws) for a billboard marketing campaign in Puerto Rico a couple of years ago, but that name has since expired and been registered by an Australian photographer.
The SSAC said that emoji use should be banned in TLDs and discouraged at the second level for several reasons.
Mainly, the problem is that while emojis are described in the Unicode standards, there’s no standardization across devices and applications as to how they are displayed.
A certain degree of creative flair is permitted, meaning a smiling face in one app may look unlike the technically same emoji in another app. On smaller screens and with smaller fonts, technically different emojis may look alike.
This could lead to confusion, which could lead to security problems, SSAC warns:
It is generally difficult for people to figure out how to specify exactly what happy face they are trying to produce, and different systems represent the same emoji with different code points. The shape and color of emoji can change while a user is viewing them, and the user has no way of knowing whether what they are seeing is what the sender intended. As a result, the user is less likely to reach the intended resource and may instead be tricked by a phishing site or other intentional misrepresentation.
SSAC added that it:
strongly discourages the registration of any domain name that includes emoji in any of its labels. The SSAC also advises registrants of domain names with emoji that such domains may not function consistently or may not be universally accessible as expected
The brief paper can be read here (pdf).
Companies losing $10 BILLION by ignoring new gTLDs — report
The world economy is “conservatively” losing out on almost $10 billion of annual revenue due to a lack of support for new gTLDs and internationalized domain names, according to an ICANN-commissioned research report.
The report, conducted by Analysys Mason for the semi-independent Universal Acceptance Steering Group, calculated that patchy new gTLD support means $3.6 billion of activity is lost, with lack of IDN support costing $6.2 billion.
Despite “new” gTLDs being around for a decade and a half, there are still plenty of web sites and apps that incorrectly assume that all TLDs are either two or three characters. Others don’t support non-Latin scripts.
This leads to internet users abandoning transactions, the report says, when their email addresses are rejected as invalid.
Mason calculated the $3.6 billion number by multiplying the estimated number of email addresses using new gTLD domains (152 million) by the estimated average annual revenue generated per email address ($360), then calculating what portion of these transactions cannot happen due to incomplete TLD support.
Earlier research by .CLUB Domains suggests that 13% of sites do not support new gTLDs, so that’s the number Mason used. The researchers then cut the number in half, to account for the 50% of people it reckons would simply switch to an email address in a legacy TLD name.
That gets you to $3.6 billion of potential revenue lost for want of gTLD support.
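The arithmetic is easy to check using only the figures quoted above. This is a back-of-the-envelope sketch, and the variable names are mine. The report's own calculation may include steps not described here.

```python
# Figures as quoted in this post.
addresses = 152_000_000      # estimated email addresses on new gTLD domains
revenue_per_address = 360    # estimated average annual revenue per address, USD
unsupported_share = 0.13     # share of sites that do not support new gTLDs
switch_share = 0.50          # share of users who fall back to a legacy TLD address

gross = addresses * revenue_per_address     # total annual revenue in play
at_risk = gross * unsupported_share         # revenue hitting unsupported sites
lost = at_risk * (1 - switch_share)         # revenue lost after fallbacks

print(f"${lost / 1e9:.2f} billion")
```

That works out to roughly $3.56 billion, which rounds to the $3.6 billion headline number.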
Another, more cynical way to spin this would be to say that new gTLDs are causing $3.6 billion of economic damage. After all, if everyone were to use legacy TLDs there would be no problem.
For the IDN number, Mason calculated how many users of five major language groups (Russian, Chinese, Arabic, Vietnamese and Indian languages) are not currently online, then estimated how much revenue would be generated if just 5% of these users (17 million people) were persuaded online by the existence of IDN TLDs.
The report was commissioned in order to raise awareness of the financial benefits of universal acceptance.
The UASG has spent most of its efforts so far focusing on UA as a “bug fix” to be communicated to engineers, so the report is intended to broaden its message to catch the attention of the money people too.
The report, which goes into much more detail about how the numbers were arrived at, can be downloaded here.
Why you can’t register emojis in gTLDs
The popular “emoji” smiley faces are banned as gTLD domain names for technical reasons, according to ICANN.
Emojis are a form of emoticon that originated on Japanese mobile networks but are now used by 12-year-old girls worldwide due to their support on Android and iPhone operating systems.
It emerged last week that Coca-Cola has registered a bunch of smiley-face domain names under .ws, the Samoan ccTLD, for use in a billboard advertising campaign in Puerto Rico.
.ws was selected because it’s one of only a few TLDs that allow emojis to be registered. Coke is spinning its choice of TLD as an abbreviation for “We Smile”.
This got me thinking: would emojis be something new gTLD registries could start to offer in order to differentiate themselves?
Coke’s emoji domains, it turns out, are just a form of internationalized domain name, like Chinese or Arabic or Greek.
Emoji symbols are in the Unicode standard and could therefore be converted to the ASCII-based, DNS-compatible Punycode under the hood in web browsers and other software.
One of Coke’s (smiley-face).ws domain names is represented as xn--h28h.ws in the DNS.
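Going the other way, you can recover the symbol hiding behind an xn-- label. Here is a minimal sketch using Python's built-in "punycode" codec, again skipping the checks a full IDNA implementation would apply. The function name is made up.

```python
import unicodedata


def from_a_label(label: str) -> str:
    """Decode an "xn--" label back to Unicode; other labels pass through.

    Simplification: no IDNA validity checks are applied to the result.
    """
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


symbol = from_a_label("xn--h28h")
# Print the code point and its official Unicode character name.
print(f"U+{ord(symbol):04X}", unicodedata.name(symbol))
```

The label decodes to a single code point, U+1F603, one of the smiley faces in the Unicode emoticons block.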
Unfortunately for gTLD registries, ICANN told DI last night that emojis are not permitted in gTLDs.
“Emoticons cannot be used as IDNs as these code points are DISALLOWED under IDNA2008 protocol,” ICANN said in a statement.
IDNA2008 is the latest version of the IETF standard used to define what Unicode characters can and cannot appear in IDNs.
RFC 5892 specifies what can be included in an IDNA2008 domain name, eliminating thousands of letters and symbols that were permissible under the old IDNA2003 standard.
These characters were ostensibly banned due to the possibility of IDN homograph attacks — when bad guys set up spoof web sites on IDNs that look almost indistinguishable from a domain used by, for example, a bank or e-commerce site.
But Unicode, citing Google data, reckons symbols could only ever be responsible for 0.000016% of such attacks. Most homograph attacks are much simpler, relying on for example the visual similarity of I and l.
Regardless, because IDNA2008 only allows Unicode characters that are actually used in spoken human languages, and because gTLD registries are contractually obliged to adhere to the IDNA2008 technical standards, emojis are not permitted in gTLDs.
All new gTLDs have to provide ICANN with a list of the Unicode code points they plan to support as IDNs when they undergo pre-delegation testing. Asking to support characters incompatible with IDNA2008 would result in a failed test, ICANN tells us.
ICANN does not regulate ccTLDs, of course, so the .ws registry is free to offer whatever domains it wants.
However, ICANN said that emoji domains are only currently supported by software that has not implemented the newer IDN protocol:
Emoticon domains only work in software that has not implemented the latest IDNA standard. Only the older, deprecated version of the IDNA standard allowed emoticons, more or less by accident. Over time, these domains will increasingly not work correctly as software vendors update their implementations.
So Coke, while winning brownie points for novelty, may have registered a bunch of damp squibs.
ICANN also told us that, regardless of what the technical standards say, you’d never be able to apply for an emoticon as a gTLD due to the “letters only” principle, which already bans numbers in top-level strings.