Latest news of the domain name industry


New gTLDs still a crappy choice for email — study

Kevin Murphy, September 28, 2017, Domain Tech

New gTLDs may not be the best choice of domain for a primary email address, judging by new research.

Over 20% of the most popular web sites do not fully understand email addresses containing long TLDs, and Arabic email addresses are supported by fewer than one in 10 sites, a study by the Universal Acceptance Steering Group has found.

Twitter, IBM and the Financial Times are among those sites highlighted as having only partial support for today’s wide variety of possible email addresses.

Only 7% of the sites tested were able to support all types of email address.

The study, carried out by Donuts and ICANN staff, looked at 749 websites (in the top 1,000 or so as ranked by Alexa) that have forms for filling in email addresses.

On each site, seven different email addresses were input, to see whether the site would accept them as valid.

The emails used different combinations of ASCII and Unicode before the @, and mixes of internationalized domain names and ASCII at the second and top levels.
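As a rough illustration of that test matrix (not UASG's actual test harness), the hypothetical helper below splits an address at its last @ and reports whether the local part, the second-level label and the TLD are ASCII. The sample addresses are made up.

```python
def classify_address(address: str) -> dict:
    """Hypothetical helper: describe which parts of an address are ASCII.

    Splits at the last '@', then treats the final two dot-separated labels
    of the domain as the second-level label and the TLD.
    """
    local, sep, domain = address.rpartition("@")
    labels = domain.rstrip(".").split(".")
    if not sep or len(labels) < 2:
        raise ValueError(f"not a usable address: {address!r}")
    return {
        "local_part": "ASCII" if local.isascii() else "Unicode",
        "second_level": "ASCII" if labels[-2].isascii() else "IDN",
        "top_level": "ASCII" if labels[-1].isascii() else "IDN",
    }


# Made-up sample addresses covering a few of the combinations.
for sample in ["info@example.com", "info@example.भारत", "पत्र@सरकार.भारत"]:
    print(sample, classify_address(sample))
```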

These were the results (the full report is available as a PDF):

[Image: table of UASG test results for IDN email addresses]

The problem with these numbers, it seems to me, is the lack of a control. There’s no real baseline to judge the numbers against.

There’s no mention in the paper of testing addresses that use .com or decades-old ccTLDs, which would have highlighted web sites with broken scripts that reject all emails.

But if we assume, as the paper appears to, that all the tested web sites were 100% compliant for .com domains, the scores for new gTLDs are not great.

There are currently over 800 TLDs longer than four characters, but according to the UASG research, 22% of web sites will not recognize them.

There are 150 IDN TLDs, but a maximum of 30% of sites will accept them in email addresses.

When it comes to right-to-left scripts, such as Arabic, the vast majority of sites are totally hopeless.

UASG dug into the code of the tested sites when it could and found that most of them use client-side code — JavaScript processing a regular expression — to verify addresses.

A regular expression is a complex bit of code that can look something like this: /^.+@(?:[^.]+\.)+(?:[^.]{2,})$/

It’s not every coder’s cup of tea, but it can get the job done with minimal client-side resource overheads. Most coders, the UASG concludes, copy a regex they found on a forum and maybe tweak it a bit.
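To see why the choice of copied pattern matters, here is a rough comparison in Python, whose `re` module accepts the pattern quoted above unchanged once the JavaScript `/.../` delimiters are dropped. It is set against a hypothetical "forum-style" pattern that allows only ASCII and caps the TLD at four letters; that second pattern is an illustrative assumption, not one taken from any of the tested sites.

```python
import re

# The pattern quoted above, without the JavaScript /.../ delimiters.
ARTICLE_PATTERN = re.compile(r"^.+@(?:[^.]+\.)+(?:[^.]{2,})$")

# Hypothetical forum-style pattern: ASCII-only, TLD limited to 2-4 letters.
FORUM_PATTERN = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$")

SAMPLES = [
    "info@example.com",
    "info@example.photography",  # TLD longer than four characters
    "info@example.भारत",          # IDN TLD
    "पत्र@सरकार.भारत",             # Unicode local part, IDN domain
]


def accepts(pattern: re.Pattern, address: str) -> bool:
    """True if the whole address matches the given pattern."""
    return pattern.fullmatch(address) is not None


for address in SAMPLES:
    print(address, "article:", accepts(ARTICLE_PATTERN, address),
          "forum:", accepts(FORUM_PATTERN, address))
```

With these inputs the article's pattern accepts all four addresses, while the forum-style pattern accepts only the .com one. The permissive pattern is hardly strict, though: `.+` and `[^.]` also match spaces, so it lets plenty of malformed input through.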

This should not be shocking news to anyone. I’ve known about it since 2009 or earlier, when I first started ripping code from Stack Overflow.

However, the UASG seems to have been working on the assumption that more sites use off-the-shelf software libraries, which would have allowed the problem to be fixed in a more centralized fashion.
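A shared library is the kind of place where this could be fixed once for everyone. The function below is a hypothetical sketch of such a helper, not something the UASG recommends. It converts the domain to ASCII with Python's built-in `idna` codec (which implements the older IDNA2003 rules; the third-party `idna` package implements IDNA2008), then applies ordinary letter-digit-hyphen checks to each label. The local part is only checked for being non-empty, on the assumption that non-ASCII local parts should be allowed, as SMTPUTF8 (RFC 6531) permits.

```python
import re

# One DNS label: letters, digits, hyphens; 1-63 chars; no edge hyphens.
LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_plausible_address(address: str) -> bool:
    """Hypothetical shared validator that accepts IDN and long-TLD addresses."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        return False
    try:
        # IDN labels become 'xn--' A-labels; pure ASCII passes through.
        ascii_domain = domain.encode("idna").decode("ascii")
    except UnicodeError:
        return False
    labels = ascii_domain.split(".")
    if len(labels) < 2:
        return False
    return all(LDH_LABEL.fullmatch(label) for label in labels)


print(is_plausible_address("info@example.photography"))
print(is_plausible_address("पत्र@सरकार.भारत"))
```

Because the Unicode handling happens in one place, a fix to this helper would reach every form that calls it, which is the centralization the UASG had hoped for.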

It concludes in its paper that much greater “awareness raising” needs to happen before universal acceptance comes closer to reality.

Five million Indian government workers to get IDN email

Kevin Murphy, August 30, 2017, Domain Registries

The Indian government has announced plans to issue fully Hindi-script email addresses to some five million civil servants.

The Ministry of Electronics and Information Technology announced the move, which will see each government employee given an @सरकार.भारत email address, in a statement this week.

सरकार.भारत transliterates as “sarkar.bharat”, or “government.india”.

The first stage of the roll-out will see the five million employees given @gov.in addresses, which apparently most of them do not already have.

Expanding the use of local scripts seems to be secondary to the government’s desire to bring government employee email back within its borders and under centralized control.

“The primary trigger behind the policy was Government data which resides on servers outside India and on servers beyond the control of the Government of India,” the MEITY press release states.

India currently has more internationalized top-level domains than any other country.

NIXI, the local ccTLD manager, is in control of no fewer than 16 different ccTLDs in various scripts, with ample room for possible expansion in future.

The registry has been offering free IDN domains alongside .in registrations for about a year, according to local reports.

There are about two million .in domains registered today, according to the NIXI web site.

India to have SIXTEEN ccTLDs

While most countries are content to operate using a single ccTLD, India is to up its count to an unprecedented 16.

It already has eight, but ICANN’s board of directors at the weekend approved the delegation of an additional eight.

The new ccTLDs, which have yet to hit the root, are .ಭಾರತ, .ഭാരതം, .ভাৰত, .ଭାରତ, .بارت, .भारतम्, .भारोत, and .ڀارت.
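In the root zone, these ccTLDs will appear as ASCII "xn--" labels rather than in their native scripts. A quick way to see those forms is Python's built-in `idna` codec. This sketch assumes that IDNA2003, which the codec implements, gives the same result as the current IDNA2008 rules for these particular labels; the printed values have not been checked against the IANA root zone database.

```python
# The eight new ccTLDs listed above, without their leading dots.
NEW_CCTLDS = ["ಭಾರತ", "ഭാരതം", "ভাৰত", "ଭାରତ", "بارت", "भारतम्", "भारोत", "ڀارت"]


def to_a_label(u_label: str) -> str:
    """Encode one Unicode label as its ASCII 'xn--' form (IDNA2003 codec)."""
    return u_label.encode("idna").decode("ascii")


for tld in NEW_CCTLDS:
    print(f".{tld} -> .{to_a_label(tld)}")
```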

If Google Translate and Wikipedia can be trusted, these words all mean “India” in, respectively, Kannada, Malayalam, Bengali, Odia, Arabic, Nepali, Hindi and Sindhi.

They were all approved under ICANN’s IDN ccTLD Fast Track program and will not operate under ICANN contract.

India already has seven internationalized domain name versions of its ccTLD in seven other scripts, along with its vanilla ASCII .in.

The National Internet Exchange of India (NIXI) will be the ccTLD manager for the whole lot.

India may have as many as 122 languages, according to Wikipedia, with 30 spoken by more than a million people.

Forget emojis, you can buy Egyptian hieroglyph .com domains

Call them the Emojis of the Ancient World.

Egyptian hieroglyphs were once the cutting edge of written communication, and it turns out Verisign lets you register .com domains using them.

Internationalized domain names expert Andre Schappo discovered a couple of months ago that the Unicode code points for the ancient script have been approved in 16 Verisign gTLDs, and apparently no others.

This means that domains such as [image: a hieroglyph domain] should resolve.

Unfortunately, DI’s database does not support these characters, so I’m having to use images.

But at least one domain investor seems to have snapped up a few dozen single-pictograph Egyptian hieroglyph names about a month ago, and his page has clickable links.

Whether you see the hieroglyph or the Punycode, prefixed “xn--”, seems to depend on your browser configuration.
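The "xn--" form is the Punycode (RFC 3492) encoding of the label with that prefix added. The sketch below applies Python's `punycode` codec directly to a single hieroglyph. U+13000 (EGYPTIAN HIEROGLYPH A001) is an arbitrary example, not one of the registered names, and the sketch skips the extra IDNA validation a registry or browser would apply.

```python
ACE_PREFIX = "xn--"

# Arbitrary example character: U+13000 EGYPTIAN HIEROGLYPH A001.
HIEROGLYPH = "\U00013000"


def to_ace(label: str) -> str:
    """Punycode-encode a label and add the ACE prefix."""
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(ace_label: str) -> str:
    """Strip the ACE prefix and decode the Punycode back to Unicode."""
    if not ace_label.startswith(ACE_PREFIX):
        raise ValueError(f"no {ACE_PREFIX} prefix: {ace_label!r}")
    return ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


ace = to_ace(HIEROGLYPH)
print(f"{HIEROGLYPH}.com -> {ace}.com")
```

The two functions are inverses, so a browser can always recover the hieroglyph from the ASCII form; whether it chooses to display it is a separate policy decision.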

Ancient Egyptian is apparently not the only dead script that Verisign supports.

According to IANA, you can also get .com domains in Sumero-Akkadian cuneiform, which went out of fashion in the second century CE, as well as Phoenician, the oldest verified alphabet.

Then there’s Imperial Aramaic, Meitei, Kharosthi, ‘Phags-pa, Sylheti Nagari and goodness knows how many other historic or little-used writing systems.

It seems .com has been approved for 237 IDN scripts, in total. Let it not be said that Verisign does not offer domainers ample opportunity to spunk their cash on gibberish.

No Klingon, though.

About that $3,800 emoji domain sale…

Kevin Murphy, June 5, 2017, Domain Tech

The debate over the age of the emoji domain name ☮.com may have been settled. It probably is as old as it was claimed to be.

You may recall that last week I blogged about the €3,400 ($3,816) sale of the domain to an end user. It wasn’t a big sale or a big story, but it’s so rare to see an emoji name sell I thought it was worth a few paragraphs.

It had been claimed, and I reported, that the name was 16 years old, having been registered in April 2001.

Later that day, ICANN principal technologist Paul Hoffman, who was co-author of the IDNA2003 standard that governed how non-ASCII domains were represented in the DNS, questioned whether the name could possibly be that old.

Under IDNA2003, IDNs are encoded with the “xn--” prefix. While applications may render ☮.com as the “peace” symbol, in the DNS it is in fact xn--v4h.com.
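The mapping is easy to reproduce with Python's built-in `idna` codec, which implements IDNA2003. Libraries that follow the newer IDNA2008 rules treat symbols such as ☮ as disallowed, so they would be expected to refuse this name outright.

```python
PEACE_DOMAIN = "\u262e.com"  # the ☮.com name from the sale

ace = PEACE_DOMAIN.encode("idna")  # IDNA2003 ToASCII, label by label
print(ace)                         # b'xn--v4h.com'
print(ace.decode("idna"))          # ☮.com
```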

Hoffman told me that the prefix had been picked more or less at random in March 2003, so there was no way a speculator could have known in April 2001 how to register a domain that would have no meaning for another two years.

In addition, the Punycode standard that converts non-ASCII characters to ASCII was not finalized until 2003 either.

It seemed more likely that the creation date in the Whois record was incorrect, so I updated the original blog post with the new information.

That kicked off a bit of a debate in the comments about scenarios in which the creation date was correct. Some commenters wondered whether the original buyer had registered many domains with different prefixes with the hope of getting lucky.

What none of us considered was that the domain itself changed between 2001 and 2003. Given new information Hoffman supplied over the weekend, that now strikes me as the most plausible scenario.

What most of us had forgotten was that Verisign launched an IDN registration test-bed all the way back in December 2000 (archive.org link).

That roll-out, controversial at the time, encoded the domains with Punycode predecessor RACE and used the bq-- prefix.

However, after the IDNA2003 and Punycode standards were published in 2003, Verisign converted all of the existing IDN .com domains over to the two new standards. Names beginning bq-- were changed to xn--, and the encoding of the subsequent characters was changed.

So ☮.com very probably was registered in 2001, but in ASCII it was a completely different domain name back then.

We seem to have a rare(ish) case here of the creation date in the Whois being “right” but the domain name itself being “wrong”.

There may be as many as half a million .com domains with similar issues in their Whois.

I hope this clears up any confusion.