DNS Namespace Collisions: Detection and Response [Guest Post]

Jeff Schmidt, November 28, 2013, Domain Tech

Those tracking the namespace collision issue in Buenos Aires heard a lot regarding the potential response scenarios and capabilities. Because this is an important, deep, and potentially controversial topic, we wanted to get some ideas out early on potential solutions to start the conversation.
Since risk can almost never be driven to zero, a comprehensive approach to risk management contains some level of a priori risk mitigation combined with investment in detection and response capabilities.
In my city of Chicago, we tend to be particularly sensitive about fires. In Chicago, like in most cities, we have a priori protection in the form of building codes, detection in the form of smoke/fire alarms, and response in the form of 9-1-1, sprinklers, and the very capable Chicago Fire Department.
Let’s think a little about what the detection and response capabilities might look like for DNS namespace collisions.
Detection: How do we know there is a problem?
Rapid detection and diagnosis of problems helps to both reduce damage and reduce the time to recovery. Physical security practitioners invest considerably in detection, typically in the form of guards and sensors.
Most meteorological events are detected (with some advance warning) through the use of radars and predictive modeling. Information security practitioners are notoriously light with respect to systematic detection, but we’re getting better!
If there are problematic DNS namespace collisions, the initial symptoms will almost certainly appear through various IT support mechanisms, namely corporate IT departments and the support channels offered by hardware/software/service vendors and Internet Service Providers.
When presented with a new and non-obvious problem, professional and non-professional IT practitioners alike will turn to Internet search engines for answers. This suggests that a good detection/response investment would be to “seed” support vendors/fora with information/documentation about this issue in advance and in a way that will surface when IT folks begin troubleshooting.
We collectively refer to such documentation as “self-help” information. ICANN has already begun developing documentation designed to assist IT support professionals with namespace-related issues.
In the same way that radar gives us some idea where a meteorological storm might hit, we can make reasonable predictions about where issues related to DNS namespace collisions are most likely to first appear.
While problems could appear anywhere, we believe it is most likely that scenarios involving remote (“road warrior”) use cases, branch offices/locations, and Virtual Private Networks are the best places to focus advance preparation.
This educated guess is based on the observation that DNS configurations in these use cases are often brittle due to complexities associated with dynamic and/or location-dependent parameters. Issues may also appear in Small and Medium-sized Enterprises (SMEs) with limited IT sophistication.
This suggests that proactively reaching out to vendors and support mechanisms with a footprint in those areas would also be a wise investment.
Response: Options, Roles, and Responsibilities
In the vast majority of expected cases, the IT professional “detectors” will also be the “responders” and the issue will be resolved without involving other parties. However, let’s consider the situations where other parties may be expected to have a role in response.
For the sake of this discussion, let’s assume that an Internet user is experiencing a problem related to a DNS namespace collision. I use the term “Internet user” broadly as any “consumer” of the global Internet DNS.
At this point in the thought experiment, let’s disregard the severity of the problem. The affected party (or parties) will likely exercise the full range of typical IT support options available to them – vendors, professional support, IT savvy friends and family, and Internet search.
If any of these support vectors are aware of ICANN, they may choose to contact ICANN at any point. Let’s further assume the affected party is unable and/or unwilling to correct the technical problem themselves and ICANN is contacted – directly or indirectly.
There is a critical fork in the road here: Is the expectation that ICANN provide technical “self-help” information or that ICANN will go further and “do something” to technically remedy the issue for the user? The scope of both paths needs substantial consideration.
For the rest of this blog, I want to focus on the various “do something” options. I see a few options; they aren’t mutually exclusive (one could imagine an escalation through these and potentially other options). The options are enumerated for discussion only and order is not meaningful.

  • Option 1: ICANN provides technical support above and beyond “self-help” information to the impacted parties directly, including the provision of services/experts. Stated differently, ICANN becomes an extension of the impacted party’s IT support structure and provides customized/specific troubleshooting and assistance.
  • Option 2: The Registry provides technical support above and beyond “self-help” information to the impacted parties directly, including the provision of services/experts. Stated differently, the Registry becomes an extension of the impacted party’s IT support structure and provides customized/specific troubleshooting and assistance.
  • Option 3: ICANN forwards the issue to the Registry with a specific request to remedy. In this option, assuming all attempts to provide “self-help” are not successful, ICANN would request that the Registry make changes to their zone to technically remedy the issue. This could include temporary or permanent removal of second level names and/or other technical measures that constitute a “registry-level rollback” to a “last known good” configuration.
  • Option 4: ICANN initiates a “root-level rollback” procedure to revert the state of the root zone to a “last known good” configuration, thus (presumably) de-delegating the impacted TLD. In this case, ICANN would attempt – on an emergency basis – to revert the root zone to a state that is not causing harm to the impacted party/parties. Root-level rollback is an impactful and potentially controversial topic and will be the subject of a follow-up blog.

One could imagine all sorts of variations on these options, but I think these are the basic high-level degrees of freedom. We note that ICANN’s New gTLD Collision Occurrence Management Plan and SAC062 contemplate some of these options in a broad sense.
Some key considerations:

  • In the broader sense, what are the appropriate roles and responsibilities for all parties?
  • What are the likely sources to receive complaints when a collision has a deleterious effect?
  • What might the Service Level Agreements look like in the above options? How are they monitored and enforced?
  • How do we avoid the “cure is worse than the disease” problem – limiting the harm without increasing risk of creating new harms and unintended consequences?
  • How do we craft the triggering criteria for each of the above options?
  • How are the “last known good” configurations determined quickly, deterministically, and with low risk?
  • Do we give equal consideration to actors that are following the technical standards vs. those depending on technical happenstance for proper functionality?
  • Are there other options we’re missing?

On Severity of the Harm
Obviously, the severity of the harm can’t be ignored. Short of situations where there is a clear and present danger of bodily harm, severity will almost certainly be measured economically and from multiple points of view. Any party expected to “do something” will be forced to choose between two or more economically motivated actors: users, Registrants, Registrars, and/or Registries experiencing harm.
We must also consider that just as there may be users negatively impacted by new DNS behavior, there may also be users that are depending on the new DNS behavior. A fair and deterministic way to factor severity into the response equation is needed, and the mechanism must be compatible with emergency invocation and the need for rapid action.
Request for Feedback
There is a lot here, which is why we’ve published this early in the process. We eagerly await your ideas, feedback, pushback, corrections, and augmentations.
This is a guest post written by Jeff Schmidt, CEO of JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

These are the top 50 name collisions

Kevin Murphy, November 19, 2013, Domain Tech

After we spent the last 36 hours crunching ICANN’s lists of almost 10 million new gTLD name collisions, the DI PRO collisions database is back online, and we can start reporting some interesting facts.
First, while we reported yesterday that 1,318 new gTLD applicants will be asked to block a total of 9.8 million unique domain names, the number of distinct second-level strings involved is somewhat smaller.
It’s 6,806,050, according to our calculations, still a bewilderingly high number.
The most commonly blocked string, as expected, is “www”. It’s on the block-lists for 1,195 gTLDs, over 90% of the total.
Second is “2010”. I currently have no explanation for this, but I’m wondering if it’s an artifact of the years of Day In The Life data upon which ICANN based its lists.
Protocol-related strings such as “wpad” and “isatap” also rank highly, as do strings matching popular TLDs such as “com”, “org”, “uk” and “de”. Single-character strings are also very popular.
The brand with the most blocks (free trademark protection?) is unsurprisingly Google.
The string “google” appears as an exact match on 930 gTLDs’ lists. It appears as a substring of 1,235 additional blocked strings, such as “google-toolbar” and “googlemaps”.
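For readers who want to reproduce counts like these from the published lists, here is a minimal sketch in Python. It is not the code behind the DI PRO database; the `block_lists` mapping and its contents are invented toy data, and the function names are hypothetical.

```python
def distinct_strings(block_lists):
    """Count distinct second-level strings across all gTLD block-lists."""
    seen = set()
    for labels in block_lists.values():
        seen.update(label.lower() for label in labels)
    return len(seen)


def summarize_blocks(block_lists, needle):
    """Return (gTLDs blocking `needle` exactly,
    distinct other strings that contain `needle`)."""
    exact_tlds = 0
    containing = set()
    for labels in block_lists.values():
        lowered = {label.lower() for label in labels}
        if needle in lowered:
            exact_tlds += 1
        containing.update(l for l in lowered if needle in l and l != needle)
    return exact_tlds, len(containing)


# Toy data only: three imaginary gTLDs with tiny block-lists.
block_lists = {
    "example1": ["www", "google", "googlemaps"],
    "example2": ["www", "google-toolbar", "googlemaps"],
    "example3": ["www", "2010"],
}

print(distinct_strings(block_lists))            # 5
print(summarize_blocks(block_lists, "google"))  # (1, 2)
```

Note that the same string blocked in many gTLDs counts once toward the distinct total, which is why that figure is smaller than the total number of blocked domains.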
Facebook, Yahoo, Gmail, YouTube and Hotmail also feature in the top 100 blocked brands.
DI PRO subscribers can search for strings that interest them, discovering how many and which gTLDs they’re blocked in, using the database.
Here’s a table of the top 50 blocked strings.
[table id=22 /]

ICANN blocks almost 10 million new gTLD domains

Kevin Murphy, November 18, 2013, Domain Registries

ICANN has asked new gTLD registry operators to block a total of 9.8 million domain names, due to the perceived risk of damage from name collisions.
To put it another way, Verisign has managed to take close to 10 million domain names off the market.
ICANN today delivered second-level domain block-lists for 1,327 new gTLDs. Combined, the number of unique blocked domains is just over 9.8 million, according to DI’s preliminary analysis.
Some of the lists relate to gTLDs that will not be approved because they’re in mutually exclusive contention sets with other strings (for example, .unicorn and .unicom).
Twenty-five unfortunate gTLD applicants did not receive lists, because ICANN said they do not qualify for the block-list-based “Alternate Path to Delegation”.
We’re currently crunching the numbers and will have more information later today, with a bit of luck.

Demystifying DITL Data [Guest Post]

Kevin White, November 16, 2013, Domain Tech

With all the talk recently about DNS Namespace Collisions, the heretofore relatively obscure Day In The Life (“DITL”) datasets maintained by the DNS-OARC have been getting a lot of attention.
While these datasets are well known to researchers, I’d like to take the opportunity to provide some background and talk a little about how these datasets are being used to research the DNS Namespace Collision issue.
The Domain Name System Operations Analysis and Research Center (“DNS-OARC”) began working with the root server operators to collect data in 2006. The effort was coined “Day In The Life of the Internet (DITL).”
Root server participation in the DITL collection is voluntary and the number of contributing operators has steadily increased; in 2010, all of the 13 root server letters participated. DITL data collection occurs on an annual basis and covers approximately 50 contiguous hours.
DNS-OARC’s DITL datasets are attractive for researching the DNS Namespace Collision issue because:

  • DITL contains data from multiple root operators;
  • The robust annual sampling methodology (with samples dating back to 2006) allows trending; and
  • It’s available to all DNS-OARC Members.

More information on the DITL collection is available on DNS-OARC’s site.
Terabytes and terabytes of data
The data consists of the raw network “packets” destined for each root server. Contained within the network packets are the DNS queries. The raw data consists of many terabytes of compressed network capture files and processing the raw data is very time-consuming and resource-intensive.
[table id=20 /]
While several researchers have looked at DITL datasets over the years, the current collisions-oriented research started with Roy Hooper of Demand Media. Roy created a process to iterate through this data and convert it into intermediate forms that are much more usable for researching the proposed new TLDs.
We started with his process and continued working with it; our code is available on GitHub for others to review.
Finding needles in DITL haystacks
The first problem faced by researchers interested in new TLDs is isolating the relatively few queries of interest among many terabytes of traffic that are not of interest.
Each root operator contributes several hundred – or several thousand – files full of captured packets in time-sequential order. These packets contain every DNS query reaching the root that requests information about DNS names falling within delegated and undelegated TLDs.
The first step is to search these packets for DNS queries involving the TLDs of interest. The result is one file per TLD containing all queries from all roots involving that TLD. If the input packet is considered a “horizontal” slice of root DNS traffic, then this intermediary work product is a “vertical” slice per TLD.
These intermediary files are much more manageable, ranging from just a few records to 3 GB. To support additional investigation and debugging, the intermediary files that JAS produces are fully “traceable” such that a record in the intermediary file can be traced back to the source raw network packet.
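The "horizontal to vertical" step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the JAS code: it assumes the packets have already been parsed into `(source_file, offset, qname)` tuples (a hypothetical record layout), and it keeps the file name and offset with each record so that a record stays traceable to its source capture.

```python
from collections import defaultdict


def tld_of(qname):
    """Return the rightmost label of a query name, lowercased."""
    return qname.rstrip(".").lower().split(".")[-1]


def slice_by_tld(queries, tlds_of_interest):
    """Group parsed queries into one list per TLD of interest.

    `queries` yields (source_file, offset, qname) tuples; the first two
    fields are kept so each record can be traced to its capture file.
    """
    slices = defaultdict(list)
    for source_file, offset, qname in queries:
        tld = tld_of(qname)
        if tld in tlds_of_interest:
            slices[tld].append((source_file, offset, qname))
    return dict(slices)


# Hypothetical parsed records from two capture files.
queries = [
    ("a.pcap", 0, "www.corp."),
    ("b.pcap", 7, "mail.home"),
    ("b.pcap", 9, "example.com."),
]
per_tld = slice_by_tld(queries, {"corp", "home"})
print(sorted(per_tld))  # ['corp', 'home']
```

In a real run each per-TLD list would be written to its own file rather than held in memory, given the volume of data involved.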
The DITL data contain quite a bit of noise, primarily DNS traffic that was not actually destined for the root. Our process filters the data by destination IP address so that the only remaining data is that which was originally destined for the root name servers.
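A destination-address filter of this kind might look like the following Python sketch. The addresses are documentation placeholders (192.0.2.1 and 2001:db8::1), not real root server addresses, and the `dst` field name is an assumption about how a parsed record is laid out.

```python
import ipaddress

# Placeholder addresses from the documentation ranges; a real run would
# load the actual root server addresses for the capture period.
ROOT_ADDRESSES = {
    ipaddress.ip_address("192.0.2.1"),
    ipaddress.ip_address("2001:db8::1"),
}


def destined_for_root(record):
    """True if the record's destination address is a known root address."""
    return ipaddress.ip_address(record["dst"]) in ROOT_ADDRESSES


records = [
    {"dst": "192.0.2.1", "qname": "www.corp."},
    {"dst": "203.0.113.5", "qname": "noise.example."},
    {"dst": "2001:0db8:0000:0000:0000:0000:0000:0001", "qname": "a.home."},
]
kept = [r for r in records if destined_for_root(r)]
print(len(kept))  # 2
```

Comparing parsed address objects rather than strings means that differently written forms of the same IPv6 address still match.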
JAS has made these intermediary per-TLD files available to DNS-OARC members for further analysis.
Then what?
The intermediary files are comparatively small and easy to parse, opening the door to more elaborate research. For example, JAS has written various “second passes” that classify queries, separate queries that use valid syntax at the second level from those that don’t, detect “randomness,” fit regular expressions to the queries, and more.
We have also checked to confirm that second level queries that look like Punycode IDNs (start with ‘xn--‘) are valid Punycode. It is interesting to note the tremendous volume of erroneous, technically invalid, and/or nonsensical DNS queries that make it to the root.
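To give a flavor of what such a second pass could look like, here is a small Python sketch of two checks: letter-digit-hyphen syntax for a label, and whether an "xn--" label decodes as Punycode. It is a loose approximation rather than our actual second-pass code, it uses Python's built-in `punycode` codec, and it does not attempt full IDNA validation.

```python
import re

# Letters, digits, hyphens; 1 to 63 characters; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_ldh(label):
    """True if the label follows letter-digit-hyphen syntax."""
    return bool(LDH_LABEL.match(label.lower()))


def is_valid_punycode(label):
    """True if an 'xn--' label decodes to a non-empty Unicode string."""
    label = label.lower()
    if not label.startswith("xn--"):
        return False
    try:
        decoded = label[4:].encode("ascii").decode("punycode")
    except UnicodeError:
        # Raised for non-ASCII input and for undecodable Punycode.
        return False
    return decoded != ""


print(is_valid_ldh("www"), is_valid_ldh("-bad"), is_valid_ldh("a_b"))
print(is_valid_punycode("xn--mnchen-3ya"))  # decodes to "münchen"
```

A label can pass the syntax check and still fail the Punycode check, which is one reason to run them as separate classifications.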
Also of interest is that the datasets are dominated by query strings that appear random and/or machine-generated.
Google’s Chrome browser generates three random 10-character queries upon startup in an effort to detect network properties. Those “Chrome 10” queries together with a relatively small number of other common patterns comprise a significant proportion of the entire dataset.
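A first-cut heuristic for spotting these queries might look like the Python sketch below. It assumes the random label is exactly 10 lowercase ASCII letters and that it appears as the leftmost label; both are simplifying assumptions, and the `(source_address, qname)` record layout is hypothetical. Because ordinary 10-letter words match the same pattern, the sketch only flags sources that sent at least three distinct matching labels.

```python
import re
from collections import defaultdict

# Assumption: the random label is exactly 10 lowercase ASCII letters.
CHROME10 = re.compile(r"^[a-z]{10}$")


def looks_like_chrome10(qname):
    """True if the leftmost label matches the assumed 10-letter pattern."""
    first_label = qname.rstrip(".").split(".")[0]
    return bool(CHROME10.match(first_label))


def chrome10_sources(queries):
    """Return sources that sent at least three distinct matching labels."""
    per_source = defaultdict(set)
    for src, qname in queries:
        if looks_like_chrome10(qname):
            per_source[src].add(qname.rstrip(".").split(".")[0])
    return {src for src, labels in per_source.items() if len(labels) >= 3}


# Toy records using documentation addresses.
queries = [
    ("198.51.100.7", "qwertyuiop"),
    ("198.51.100.7", "asdfghjklz."),
    ("198.51.100.7", "zxcvbnmqwe"),
    ("198.51.100.9", "www.example"),
]
print(chrome10_sources(queries))  # {'198.51.100.7'}
```

A pattern match alone says nothing about randomness, so a more careful pass would combine it with timing or a statistical measure before labeling traffic as machine-generated.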
Research is being done in order to better understand the source of these machine-generated queries.
More technical details and information on running the process are available on the DNS-OARC web site.

This is a guest post written by Kevin White, VP Technology, JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

Name collisions expert JAS to guest blog on DI

Kevin Murphy, November 14, 2013, Domain Tech

JAS Global Advisors, the consultancy hired by ICANN to provide the final analysis on the risks posed by name collisions in new gTLDs, is to exclusively guest-blog its work here on DI.
ICANN picked JAS to provide a “Name Collision Occurrence Management Framework” earlier this week.
Its job is to basically figure out how new gTLD registries — some of which have been told to block many thousands of potential collisions from their zones — can identify and mitigate the risks, if any, posed by these names.
The framework will help registries reduce the size of their block-lists, in other words.
JAS expects to provide a short series of guest posts over the next few months, explaining the state of the project as it progresses. Reader comments will be read, I’m assured.
JAS CEO Jeff Schmidt said: “The macro intent is to shorten the feedback cycle so folks can see where we are incrementally and comment along the way.”
I’m hoping that the guest posts will provide DI readers with insight into the issue that is as disinterested as DI’s usual coverage, but better informed on the nitty-gritty of the affected technologies.
JAS is a regular consultant for ICANN. It was one of the independent evaluators for the new gTLD program itself.
I’m told that JAS doesn’t have financial relationships with either new gTLD applicants, which generally think the collision risks have been overstated, or Verisign, which says they could cause real damage.
JAS isn’t getting paid for the posts; nor is DI getting paid to carry them.
The first post in the series will appear soon, probably Friday.

Over half the world’s biggest brands will be blocked in new gTLDs

Kevin Murphy, November 12, 2013, Domain Registries

More than half of the world’s most-famous brand names already stand to benefit from blocks in new gTLDs, due to the name collisions policy introduced by ICANN recently.
That’s the preliminary conclusion of a quick analysis of the 37 block-lists already published.
Using Interbrand’s list of the top 100 most valuable brands, we find that only 32 do not appear anywhere — either as strings or substrings — on the collisions lists we have today.
Fifty-nine brands are to be blocked as exact matches in at least one new gTLD. Five brands are blocked exactly in 10 or more.
Brand owners blocked in collision lists may not have to fork out for as many defensive registrations, but may also face complications when registries finally start whittling down their lists.
We present the full table of results below, for which the following explanations might be needed:

  • Brand/String — The brands have been normalized to ASCII strings, removing punctuation not compatible with the DNS protocol and converting accented characters to their unaccented equivalents (for example, “Nescafé” becomes “Nescafe”). For DI PRO subscribers, each string links to a search on the database for that string.
  • Exact Matches — The number of gTLDs (currently out of 37) in which this exact-match brand will be blocked.
  • Unique Strings — The number of strings containing this brand that appear on block-lists. In some cases this may provide misleading results due to the usual overkill you get when matching substrings. For example, two-character brands such as 3M and HP get a lot of hits, the vast majority of which do not appear to relate to the brand itself, whereas every hit for Google does in fact refer to the brand.
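For the curious, the normalization step described above can be approximated in Python as follows. This is a sketch, not the exact routine used for the table: the function name is hypothetical, and choices such as lowercasing the result and dropping spaces are assumptions.

```python
import re
import unicodedata


def normalize_brand(name):
    """Reduce a brand name to a DNS-compatible ASCII string."""
    # Split accented characters into base letter plus combining mark,
    # then drop everything that is not plain ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep only letters, digits, and hyphens (assumption: spaces are dropped).
    cleaned = re.sub(r"[^a-z0-9-]", "", ascii_only.lower())
    return cleaned.strip("-")


print(normalize_brand("Nescafé"))  # nescafe
print(normalize_brand("H&M"))      # hm
```

Short results such as "hm" illustrate the substring overkill mentioned above, since a two-character string will match inside many unrelated blocked strings.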

[table id=19 /]
The numbers will of course grow rapidly as ICANN publishes more collisions lists.
If there’s sufficient interest from DI PRO subscribers in this breakdown being kept up to date on an ongoing basis, I’ll bolt it on to the existing collisions database.

NTAG rubbishes new gTLD collision risk report

Kevin Murphy, August 15, 2013, Domain Policy

The New gTLD Applicants Group has slated Interisle Consulting’s report into the risk of new gTLDs causing security problems on the internet, saying the problem is “overstated”.
The group, which represents applicants for hundreds of gTLDs and has a non-voting role in ICANN’s GNSO, called on ICANN to reclassify hundreds of “Uncalculated” risk strings as “Low” risk, meaning they would face a shorter delay before their eventual delegation and less uncertainty about it.
But NTAG said it “agreed” that the high-risk .corp and .home “should be delayed while further studies are conducted”. The current ICANN proposal is actually to reject both of these strings.
NTAG was responding to ICANN’s proposal earlier this month to delay 523 applications (for 279 strings) by three to six months while further studies are carried out.
The proposal was based on Interisle’s study of DNS root server logs, which showed many millions of daily queries for gTLDs that currently do not exist but have been applied for.
The worry is that delegating those strings would cause problems such as downtime or data leakage, where sensitive information intended for a recipient on the same local network would be sent instead to a new gTLD registry or one of its (possibly malicious) registrants.
NTAG reckons the risk presented by Interisle has been overblown, and it presented a point-by-point analysis of its own. It called for everything except .corp and .home to be categorized “Low” risk, saying:

We recognize that a small number of applied for names may possibly pose a risk to current operations, but we believe very strongly that there is no quantitative basis for holding back strings that pose less measurable threat than almost all existing TLDs today. This is why we urge the board to proceed with the applications classified as “Unknown Risk” using the mitigations recommended by staff for “Low Risk” strings. We believe the 80% of strings classified as “Low Risk” should proceed immediately with no additional mitigations.

The group pointed to a recent analysis by Verisign (which, contrarily, was trying to show that new gTLDs should be delayed) which included data about previous new gTLD delegations.
That report (pdf) said that .xxx was seeing 4,018 look-ups per million queries at the DNS root (PPM) before it was delegated. The number for .asia was 2,708.
If you exclude .corp and .home, both of those PPM numbers are multiples larger than the equivalent measures of query volume for every applied-for gTLD today, also according to Verisign’s data.
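The PPM measure is simply a TLD's share of root queries scaled to one million. A short Python sketch makes the comparison concrete; the function name is hypothetical and the raw counts passed in are illustrative, chosen only so the results match the PPM figures quoted from the reports.

```python
def queries_per_million(tld_queries, total_queries):
    """Queries for one TLD per million queries seen at the root."""
    if total_queries <= 0:
        raise ValueError("total_queries must be positive")
    return tld_queries * 1_000_000 / total_queries


# Illustrative counts: 4,018 and 2,708 hits in a sample of one million.
xxx_ppm = queries_per_million(4018, 1_000_000)
asia_ppm = queries_per_million(2708, 1_000_000)
print(xxx_ppm, asia_ppm)  # 4018.0 2708.0

# How many times larger one PPM figure is than another.
print(round(xxx_ppm / asia_ppm, 2))
```

Because PPM is a ratio, it allows comparisons across capture years with very different total query volumes, though it says nothing about why the queries were sent.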
NTAG said:

None of these strings pose any more risk than .xxx, .asia and other currently operating TLDs.

the least “dangerous” current gTLD on the chart, .sx, had 331 queries per million in 2006. This is a higher density of NXDOMAIN queries than all but five proposed new TLDs. Again, .sx was launched successfully in 2012 with none of the problems predicted in these reports.

Verisign’s report, which sought to provide a more qualitative risk analysis based on some data-supported guesses about where the error traffic is coming from and why, anticipated this interpretation.
Verisign said:

This could indicate that there is nothing to worry about when adding new TLDs, because there was no global failure of DNS when this was done before. Alternately, one might conclude that traffic volumes are not the only indicator of risk, and the semantic meaning of strings might also play a role. We posit that in some cases, those strings with semantic meanings, and which are in common use (such as in speech, writing, etc.) pose a greater risk for naming collision.

The company spent most of its report making somewhat tenuous correlations between its data (such as a relatively large number of requests for .medical from Japanese IP addresses) and speculative impacts (such as “undiagnosed system failures” at “a healthcare provider in Japan”).
NTAG, by contrast, is playing down the potential for negative outcomes, saying that in many cases the risks introduced by new gTLDs are no different from collision risks at the second level in existing TLDs.

Just as the NTAG would not ask ICANN to halt .com registrations while a twelve month study is performed on these problems, we believe there is no reason to introduce a delay in diversifying the Internet’s namespace due to these concerns.

While it stopped short of alleging shenanigans this time around, NTAG also suggested that future studies of root server error traffic could be gamed if botnets were engaged to crapflood the roots.
Its own mitigation plan, which addresses Interisle’s specific concerns, says that most of the reasons that non-existent TLDs are being looked up are either not a problem or can be easily mitigated.
For example, it says that queries for .youtube that arrived in the form of a request for “” are probably browser typos and that there’s no risk for users if they’re taken to the YouTube dot-brand instead of
In another example, it points out that requests for “.cisco” or “.toshiba” without any second-level domains won’t resolve anyway, if dotless domains are banned in those TLDs. (NTAG, which has influential members in favor of dotless domains, stopped short of asking for a blanket ban.)
The Interisle report, and ICANN’s proposal to deal with it, are open for public comment until September 17. NTAG’s response is remarkably quick off the mark, for guessable reasons.