Donuts had seven new gTLDs added to the DNS root zone today.
The strings are: .diamonds, .tips, .photography, .directory, .kitchen, .enterprises and .today.
The nic.tld domain in each new gTLD is already resolving, redirecting users to Donuts’ official site at donuts.co.
There are now 31 live new gTLDs, 26 of which belong to Donuts subsidiaries.
Whichever of Amazon, Google and Demand Media wins .wow (all three have applied for the string) will have to block over 200,000 second-level strings, due to the risk of name collisions.
That’s tens of thousands more names than any other applied-for gTLD string.
Here are the top 20 gTLDs, ranked by the number of collisions:
The average new gTLD string has 7,346 potential collisions, according to our preliminary analysis of the lists ICANN published for 1,318 strings this morning.
As blogged earlier, 9.8 million unique domain names are to be blocked in total.
Seventeen gTLDs appear to have been provided with empty lists, so they will not have to block any domains in order to proceed to delegation with ICANN.
ICANN has asked new gTLD registry operators to block a total of 9.8 million domain names, due to the perceived risk of damage from name collisions.
To put it another way, Verisign has managed to take close to 10 million domain names off the market.
ICANN today delivered second-level domain block-lists for 1,327 new gTLDs. Combined, the number of unique blocked domains is just over 9.8 million, according to DI’s preliminary analysis.
Some of the lists relate to gTLDs that will not be approved because they’re in mutually exclusive contention sets with other strings (for example, .unicorn and .unicom).
Twenty-five unfortunate gTLD applicants did not receive lists, because ICANN said they do not qualify for the block-list-based “Alternate Path to Delegation”.
We’re currently crunching the numbers and will have more information later today, with a bit of luck.
With all the talk recently about DNS Namespace Collisions, the heretofore relatively obscure Day In The Life (“DITL”) datasets maintained by the DNS-OARC have been getting a lot of attention.
While these datasets are well known to researchers, I’d like to take the opportunity to provide some background and talk a little about how these datasets are being used to research the DNS Namespace Collision issue.
The Domain Name System Operations Analysis and Research Center (“DNS-OARC”) began working with the root server operators to collect data in 2006. The effort was dubbed “Day In The Life of the Internet” (DITL).
Root server participation in the DITL collection is voluntary and the number of contributing operators has steadily increased; in 2010, all of the 13 root server letters participated. DITL data collection occurs on an annual basis and covers approximately 50 contiguous hours.
DNS-OARC’s DITL datasets are attractive for researching the DNS Namespace Collision issue because:
- DITL contains data from multiple root operators;
- The robust annual sampling methodology (with samples dating back to 2006) allows trending; and
- It’s available to all DNS-OARC Members.
More information on the DITL collection is available on DNS-OARC’s site at https://www.dns-oarc.net/oarc/data/ditl.
Terabytes and terabytes of data
The data consists of the raw network “packets” destined for each root server. Contained within the network packets are the DNS queries. The raw data consists of many terabytes of compressed network capture files and processing the raw data is very time-consuming and resource-intensive.
While several researchers have looked at DITL datasets over the years, the current collisions-oriented research started with Roy Hooper of Demand Media. Roy created a process to iterate through this data and convert it into intermediate forms that are much more usable for researching the proposed new TLDs.
We started with his process and continued working with it; our code is available on GitHub for others to review.
Finding needles in DITL haystacks
The first problem faced by researchers interested in new TLDs is isolating the relatively few queries of interest among many terabytes of traffic that are not of interest.
Each root operator contributes several hundred – or several thousand – files full of captured packets in time-sequential order. These packets contain every DNS query reaching the root that requests information about DNS names falling within delegated and undelegated TLDs.
The first step is to search these packets for DNS queries involving the TLDs of interest. The result is one file per TLD containing all queries from all roots involving that TLD. If the input packet is considered a “horizontal” slice of root DNS traffic, then this intermediary work product is a “vertical” slice per TLD.
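The per-TLD bucketing step can be sketched roughly as follows. This is a simplified illustration that assumes the query names have already been extracted from the raw packets; `bucket_by_tld` and the sample names are invented for this sketch, not JAS’s actual code:

```python
from collections import defaultdict

def bucket_by_tld(queries, tlds_of_interest):
    """Group query names into one bucket per TLD of interest.

    `queries` is an iterable of fully-qualified query names as seen at
    the root (e.g. "mail.corp.kitchen."); names whose rightmost label
    is not in `tlds_of_interest` are dropped.
    """
    buckets = defaultdict(list)
    for qname in queries:
        labels = qname.rstrip(".").lower().split(".")
        if labels and labels[-1] in tlds_of_interest:
            buckets[labels[-1]].append(qname)
    return dict(buckets)

# One "vertical" slice per TLD from a "horizontal" stream of queries:
slices = bucket_by_tld(
    ["mail.corp.kitchen.", "www.example.com.", "printer.wow."],
    {"kitchen", "wow"},
)
```

The real pipeline does this over terabytes of capture files, writing one output file per TLD rather than holding buckets in memory.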
These intermediary files are much more manageable, ranging from just a few records to 3 GB. To support additional investigation and debugging, the intermediary files that JAS produces are fully “traceable” such that a record in the intermediary file can be traced back to the source raw network packet.
The DITL data contain quite a bit of noise, primarily DNS traffic that was not actually destined for the root. Our process filters the data by destination IP address so that the only remaining data is that which was originally destined for the root name servers.
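Conceptually, that destination-address filter amounts to something like the sketch below. The helper name is invented, only two of the 13 root letters’ published IPv4 addresses are shown for illustration, and the real filter must cover every letter and both address families:

```python
import ipaddress

# Two of the 13 root letters' published IPv4 addresses, shown for
# illustration only; the actual filter covers all letters, IPv4 and IPv6.
ROOT_ADDRS = {
    ipaddress.ip_address("198.41.0.4"),   # a.root-servers.net
    ipaddress.ip_address("192.33.4.12"),  # c.root-servers.net
}

def destined_for_root(dst_ip):
    """Keep only packets whose destination is a root server address."""
    return ipaddress.ip_address(dst_ip) in ROOT_ADDRS
```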
JAS has made these intermediary per-TLD files available to DNS-OARC members for further analysis.
The intermediary files are comparatively small and easy to parse, opening the door to more elaborate research. For example, JAS has written various “second passes” that classify queries, separate queries that use valid syntax at the second level from those that don’t, detect “randomness,” fit regular expressions to the queries, and more.
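As a flavor of what such a second pass might do, here is a hedged sketch that separates second-level labels with valid classic LDH (letter-digit-hyphen) host-name syntax from those without it. The function names are illustrative, not JAS’s code:

```python
import re

# Classic LDH host-name label: 1-63 letters, digits or hyphens,
# not starting or ending with a hyphen.
LDH = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")

def second_level_label(qname):
    """Return the second-level label of a query name, or None."""
    labels = qname.rstrip(".").lower().split(".")
    return labels[-2] if len(labels) >= 2 else None

def has_valid_sld_syntax(qname):
    """True if the query's second-level label is syntactically valid LDH."""
    label = second_level_label(qname)
    return label is not None and bool(LDH.match(label))
```

A classifier like this is what lets the per-TLD files be split into syntactically valid and invalid query populations.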
We have also checked to confirm that second-level queries that look like Punycode IDNs (those starting with ‘xn--’) are valid Punycode. It is interesting to note the tremendous volume of erroneous, technically invalid, and/or nonsensical DNS queries that make it to the root.
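In Python, for instance, a Punycode validity check along those lines could look like this sketch (not the actual pipeline code):

```python
import codecs

def is_valid_punycode_label(label):
    """Return True if an 'xn--' label decodes as valid Punycode."""
    if not label.lower().startswith("xn--"):
        return False
    try:
        # Decode the part after the ACE prefix with the Punycode codec.
        codecs.decode(label[4:], "punycode")
        return True
    except UnicodeError:
        return False
```

For example, `xn--mnchen-3ya` (münchen) decodes cleanly, while a label like `xn--!!!` does not.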
Also of interest is that the datasets are dominated by query strings that appear random and/or machine-generated.
Google’s Chrome browser generates three random 10-character queries upon startup in an effort to detect network properties. Those “Chrome 10” queries together with a relatively small number of other common patterns comprise a significant proportion of the entire dataset.
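A crude filter for such probes might look like the following sketch. It is a heuristic only: it matches any bare 10-letter label, which is the observed “Chrome 10” shape, rather than reproducing Chrome’s actual generator:

```python
import re

# Heuristic only: Chrome's startup probes are observed as single
# random alphabetic labels ten characters long.
PROBE = re.compile(r"^[a-z]{10}$")

def looks_like_chrome_probe(qname):
    """True for a bare 10-letter label with no dots."""
    name = qname.rstrip(".").lower()
    return "." not in name and bool(PROBE.match(name))

looks_like_chrome_probe("qkzhvlmwfd")    # True
looks_like_chrome_probe("www.example.com")  # False
```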
Research is being done in order to better understand the source of these machine-generated queries.
More technical details and information on running the process are available on the DNS-OARC web site.
This is a guest post written by Kevin White, VP Technology, JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.
Initial Evaluation on the first round of new gTLD applications is almost done, with only two bids now remaining in that stage of the program.
ICANN last night published the delayed IE results for PricewaterhouseCoopers’ .pwc and the Better Business Bureau’s .bbb, both of which were passes.
The only two applications remaining in IE are Kosher Marketing Assets’ .kosher and Google’s .search.
The latter is believed to be held up by technical changes Google has made to its bid, removing the plan to make .search a “dotless” gTLD, a practice ICANN has banned on stability grounds.
Eight applications are currently in Extended Evaluation, having failed to achieve passing scores during IE.