The most confusing new gTLDs (allegedly)

Kevin Murphy, March 26, 2010, 15:34:51 (UTC), Domain Registries

I don’t know how I missed it until today, but I’ve discovered ICANN has a web-based tool that will be used to determine whether new gTLDs could be confused with existing strings.
The Sword Group algorithm compares applied-for strings with a list of existing TLDs and reserved words such as “icann” and “ripe”.
It looks for “visual similarity”, which covers not only shared sequences of characters but also pixel-by-pixel similarities between individual characters.
Each match is assigned a numerical score, and any match scoring below 30 is not considered worth reporting.
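To make that workflow concrete, here is a minimal sketch of a compare, score and filter pass in Python. It is not Sword's algorithm and makes no attempt at pixel-level comparison. As an assumption, it substitutes the standard library's `difflib.SequenceMatcher` similarity ratio, scaled to 0-100, and checks candidates against a short, hypothetical reference list before applying the same cutoff of 30.

```python
from difflib import SequenceMatcher

# Reporting cutoff from the article: matches scoring below 30 are dropped.
REPORT_THRESHOLD = 30

# Hypothetical reference list: a few existing TLDs plus reserved words.
# The real tool's list is much longer.
REFERENCE_STRINGS = [".com", ".net", ".tel", ".name", ".co", "icann", "ripe", "afrinic"]


def similarity_score(candidate: str, existing: str) -> int:
    """Stand-in score: character-sequence similarity scaled to 0-100.

    Only character sequences are compared; this does NOT model the
    pixel-by-pixel glyph similarity the Sword tool is described as using.
    """
    a = candidate.lower().lstrip(".")
    b = existing.lower().lstrip(".")
    return round(100 * SequenceMatcher(None, a, b).ratio())


def reportable_matches(candidate: str, references=REFERENCE_STRINGS):
    """Score a candidate against every reference string and keep only
    matches at or above the threshold, highest score first."""
    scored = [(ref, similarity_score(candidate, ref)) for ref in references]
    kept = [(ref, score) for ref, score in scored if score >= REPORT_THRESHOLD]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for candidate in [".game", ".eco", ".tokyo"]:
        print(candidate, reportable_matches(candidate))
```

Run as a script, it prints each sample string with the matches that clear the cutoff; `.game`, for example, keeps only `.name`. Scores from this stand-in metric are not expected to line up with Sword's.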
As an experiment, I ran each of the strings on newTLDs.tv’s list of publicly announced TLD hopefuls through the “pre-production” version of the algorithm that is currently available.
Here are my findings.
1. The algorithm is pretty much worthless.
See below.
2. The top string barely resembles its match.
These are the seven most confusing strings, ranked by the score of their highest-scoring match:

String     Top match    Score
.lat       .tel         77
.team      icann        71
.nai       .net         70
.africa    afrinic      67
.game      .name        67
.lac       alac         67
.eco       .co          62

As you can see, .lat scores highest. It has two characters in common with .tel, but they’re transposed at either end of the string.
According to this algorithm, there’s a significantly higher risk of visual confusion between “team” and “icann” or between “lat” and “tel” than between “lli” and “li”.
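For contrast, a plain character-sequence measure ranks these pairs the other way round. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in metric (an assumption, not anything Sword is known to use), scaled to 0-100 so it can sit beside the scores reported above.

```python
from difflib import SequenceMatcher


def sequence_score(a: str, b: str) -> int:
    """Character-sequence similarity scaled to 0-100.

    Stand-in metric only; this is not Sword's visual-similarity score.
    """
    return round(100 * SequenceMatcher(None, a, b).ratio())


# Pairs discussed in the article, with the Sword scores it reports
# (.lli's 61 comes from its resemblance to the ccTLD .li).
PAIRS = [
    ("lat", "tel", 77),
    ("team", "icann", 71),
    ("lli", "li", 61),
]

for a, b, sword in PAIRS:
    print(f"{a!r} vs {b!r}: Sword {sword}, sequence metric {sequence_score(a, b)}")
```

Under this metric “lli” vs “li” scores 80, while “lat” vs “tel” scores 33 and “team” vs “icann” scores 22: the reverse of the ordering the Sword scores imply.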
3. Three-letter strings often look like ccTLDs.
All of these three-letter new gTLD strings score 61, on the basis that they comprise an existing ccTLD and one additional character:
.ski, .bay, .bzh, .car, .cym, .eus, .fra, .gal, .gay, .ker, .lli, .med, .mma, .ngo, .nrw, .sic and .vin.
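The rule behind those 61s can be approximated with a simple check: does deleting one character from a three-letter string leave an existing two-letter ccTLD? The sketch below is a reconstruction of the rule as stated, not Sword's code. It uses a small, hand-picked subset of real ccTLDs for illustration (a real check would load the full delegated list) and assumes the extra character may sit at any position, which the tool's output does not make clear.

```python
# Hypothetical subset of two-letter ccTLDs, for illustration only.
# A real check would use the full list of delegated ccTLDs.
SAMPLE_CCTLDS = {"ba", "ca", "fr", "li", "me", "ng", "nr", "si", "sk", "vi"}


def cctld_plus_one(label: str, cctlds=SAMPLE_CCTLDS) -> set:
    """Return the ccTLDs obtainable by deleting exactly one character
    from a three-letter label (e.g. 'bay' -> {'ba'}).

    Assumption: the extra character may be at any position.
    """
    label = label.lower().lstrip(".")
    if len(label) != 3:
        return set()
    # Every string formed by removing the character at position i.
    candidates = {label[:i] + label[i + 1:] for i in range(len(label))}
    return candidates & cctlds


for tld in [".ski", ".bay", ".lli", ".vin", ".xxx"]:
    print(tld, sorted(cctld_plus_one(tld)))
```

Note that some strings match more than one ccTLD under this rule: “ski” reduces to both “sk” and “si”.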
4. Berlin and New York better watch out.
According to Sword, .berlin scores 58, on the basis that it looks like “afrinic”.
New York City’s .nyc, when compared to “nic”, scores 59.
Barcelona’s .bcn scores an extra point, 60, due to its similarity to “nro”.
No, really, it does.
5. Not many strings score below 30.
Of the 82 strings I tested, only five scored below 30 and were not reported: .quebec, .rugby, .tokyo, .xxx and .saarland.
Conclusion
Assuming the new gTLD evaluators have the good sense to ignore the output of this algorithm, I don’t think any of the new TLD applicants need to worry too much about Sword.
If good sense is lacking, they may have a problem.

Comments (4)

  1. My colleague Jothan Frakes has aptly re-dubbed this the “S-Word Algorithm.”

  2. Michele says:

    Between “overarching issues” and other general madness, this would be hilarious if it wasn’t so depressing.

  3. Kevin Murphy says:

    Well, let’s hope the algorithm will get a revamp before it matters.

  4. […] The most confusing new gTLDs (allegedly) […]
