Latest news of the domain name industry

Verisign plans TLD standards group

Kevin Murphy, September 22, 2014, Domain Tech

Verisign is trying to form a new industry standards-setting association for domain name registries and registrars.

To be called the Registration Operations Association™ (yes, according to its web site it is apparently already trademarked), Verisign wants potential members of the group to meet in October to figure out whether such an association is needed and what its remit would be.

But the Domain Name Association apparently has other ideas, suggesting in a recent blog post that the DNA would be the best place for these kinds of technical discussions to take place.

In the second of a series of three blog posts revealing the ROA plan, Verisign senior director Scott Hollenbeck said:

The primary purpose of an association would be to facilitate communication and technical coordination among implementers and operators of the EPP protocol and its current extensions to address interoperability and efficiency obstacles.

EPP is the Extensible Provisioning Protocol used by registrars to transact with all gTLD and many ccTLD registries. It’s an IETF standard written by Hollenbeck over a decade ago.

One of the problems with it is that it is “extensible” by design, so every time a registry extends it to deal with a peculiarity of a particular TLD, partner registrars have to code new connectors.

In a world of hundreds of new gTLDs, that becomes burdensome, Hollenbeck explained in his posts.

An industry association such as the formative ROA could help registries with common requirements standardize on a single EPP extension, streamlining interoperability.
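To make the interoperability problem concrete, here is a minimal Python sketch of how a registrar might compare the extensions that two registries announce. It assumes the standard EPP greeting layout, in which a server lists extension namespaces as extURI elements. The two "urn:example:..." URIs and the helper names are made up for illustration and do not belong to any real registry.

```python
import xml.etree.ElementTree as ET

EPP_NS = {"epp": "urn:ietf:params:xml:ns:epp-1.0"}
SECDNS = "urn:ietf:params:xml:ns:secDNS-1.1"  # a widely deployed standard extension


def sample_greeting(ext_uris):
    """Build a heavily abbreviated EPP greeting (illustrative, not a full one)."""
    ext = "".join(f"<extURI>{uri}</extURI>" for uri in ext_uris)
    return (
        '<epp xmlns="urn:ietf:params:xml:ns:epp-1.0"><greeting><svcMenu>'
        "<version>1.0</version><lang>en</lang>"
        "<objURI>urn:ietf:params:xml:ns:domain-1.0</objURI>"
        f"<svcExtension>{ext}</svcExtension>"
        "</svcMenu></greeting></epp>"
    )


def extension_uris(greeting_xml):
    """Return the set of extension namespace URIs announced in a greeting."""
    root = ET.fromstring(greeting_xml)
    return {el.text.strip() for el in root.findall(".//epp:extURI", EPP_NS)}


# Hypothetical registries: both support DNSSEC, each has its own pricing extension.
registry_a = extension_uris(sample_greeting([SECDNS, "urn:example:registry-a:fees-0.1"]))
registry_b = extension_uris(sample_greeting([SECDNS, "urn:example:registry-b:pricing-0.3"]))

shared = registry_a & registry_b                  # one connector covers both
bespoke = (registry_a | registry_b) - shared      # each needs its own connector

print("shared:", sorted(shared))
print("bespoke:", sorted(bespoke))
```

Every entry in the bespoke set is a connector somebody has to write and maintain, which is the burden a common extension would remove.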

That would be good for new gTLDs.

It’s no secret that many registrars are struggling to keep up with new gTLD launches while providing a good customer experience, as Andrew Allemann pointed out last week.

The need for cooperation seems plain; the question now is what is the correct forum.

While Verisign is pushing for a new group, the DNA reckons the task could be best performed under its own umbrella.

Executive director Kurt Pritz blogged:

Given its multi-functional and global diversity, the DNA will be an effective place to coordinate discussion of these issues and to involve broader domain name industry involvement.

Verisign isn’t a DNA member. In fact, it appears to be the only significant back-end registry provider in the western world not to have purchased a membership.

But Pritz said in his post that technical discussions would not be limited to DNA members only — anyone would be able to participate without coughing up the $5,000 to $50,000 a year the group charges:

Recognizing that industry-wide issues are… well … industry wide, the DNA Board determined that this work must include those inside and outside the DNA, welcoming all domain name industry members. Scott and others from Verisign and other firms are invited regardless of whether they join the DNA.

So is the industry going to have to deal with two rival standards-setting groups?

In the many years I was a general Silicon Valley tech reporter, I must have written scores of articles about new technologies spurring the creation of competing “standards” organizations.

Usually, this involved pitting an incumbent monopolist such as Microsoft against a coalition of smaller rivals.

It makes for great headlines, but I’m not sure the domain name industry is big enough to support or require multiple groups tackling the same problems.

With resource-strapped registries and registrars already struggling to make new gTLDs work in any meaningful way, I doubt their geeks would appreciate duplicating their efforts.

I don’t know whether the DNA or ROA would be the best venue for the work, but I strongly suspect the work itself, which almost certainly needs to be done, only needs to be done once.

Verisign wants interested parties to meet in Los Angeles on October 16, just as the ICANN meeting there concludes. The meeting may also be webcast for those unable to attend in person.

Twitter starts supporting (some) new gTLDs

Kevin Murphy, March 7, 2014, Domain Tech

Twitter has started recognizing new gTLDs on its web page and on Tweetdeck.

As of some point in the last 48 hours, you can type something like “nic.berlin” or “fire.plumbing” in a tweet and Twitter will automatically turn it into a clickable link.

The switcheroo seems to have happened in the last two days, as this conversation may illustrate.

But there seems to be some delay — about a month, by my reckoning — in the support.

Domains such as nic.sexy, which is in a TLD delegated November 14, become clickable, but domains in more recent delegations such as .okinawa, which hit the root on Wednesday, are not.

Going back through the DI PRO Calendar, it seems that any TLD delegated on February 5 or earlier gets clickable links in Twitter and those delegated over the last four weeks do not.
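I don't know how Twitter actually implements this, but the behavior is what you would expect from software that checks domains against a periodically refreshed list of TLDs. The Python sketch below is a guess at the general technique, not Twitter's code. The KNOWN_TLDS snapshot and the linkify function are hypothetical.

```python
import re

# Hypothetical snapshot of the TLD list, taken before .okinawa was delegated.
KNOWN_TLDS = {"com", "net", "org", "berlin", "plumbing", "sexy"}

# One or more dot-terminated labels followed by an alphabetic TLD.
DOMAIN_RE = re.compile(r"\b((?:[a-z0-9-]+\.)+([a-z]{2,}))\b", re.IGNORECASE)


def linkify(text, known_tlds=KNOWN_TLDS):
    """Wrap domain-like tokens in a link, but only if their TLD is in the snapshot."""
    def replace(match):
        host, tld = match.group(1), match.group(2).lower()
        if tld in known_tlds:
            return f'<a href="http://{host}">{host}</a>'
        return host  # unknown TLD: leave as plain text
    return DOMAIN_RE.sub(replace, text)


print(linkify("try nic.berlin or fire.plumbing"))
print(linkify("try nic.okinawa"))
```

Until the snapshot is refreshed, a name in a newly delegated TLD is just text, which would explain a lag of a few weeks.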

I’m not sure why TLDs delegated in the last month are not supported, but I imagine it could be an annoyance during registries’ pre-launch marketing.

It’s difficult to overestimate how important application support is for the new gTLD program.

If new gTLDs don’t look like web addresses, there’s going to be a big barrier to adoption. A .link domain that isn’t clickable isn’t much use and nobody wants to have to copy-paste URLs.

Support for new gTLDs for Twitter’s 232 million active users is a big step along the road to universal acceptance of all TLDs, which ICANN has identified as a problem.

New gTLD registries given way to free up millions of blocked names

Kevin Murphy, February 27, 2014, Domain Tech

Up to 9.8 million new gTLD domain names are to get a get-out-of-jail card, with the publication yesterday of ICANN’s plan to mitigate the risk of damaging name collisions.

If you're a loyal DI reader, the details of the plan will not come as a great surprise. It was developed by JAS Global Advisors and previewed in a guest post by CEO Jeff Schmidt in January.

Name collisions are scenarios where a TLD delegated by ICANN to the public DNS matches a TLD that one or more organizations already use on their internal networks.

Verisign, in what many view as protectionist propaganda, has been arguing that name collisions could cause widespread technical and economic damage and even a risk to life.

Things might stop working and secret data might leak out of corporate networks, Verisign warns.

JAS’ proposed solution, which ICANN has opened for public comment, is quite clever, I think.

Called “controlled interruption”, it will see new gTLD registries being asked to wildcard the entire second level of their TLDs to point to the IP address 127.0.53.53.

If there’s a name collision on example.corp the company using that TLD on its network will notice unusual behavior and will have an opportunity to fix the problem.

Importantly, no data apart from the DNS look-up will leak out of their networks — the 127/8 IP address block is reserved by various standards for local uses only.

The registry will essentially bounce the DNS request back to the network making the request. If that behavior causes problems, the network administrator will presumably check her logs, notice the odd IP address, and Google it for further information.
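As a rough illustration, here is a small Python sketch of the check an administrator's script could make on a resolved address. It only uses the standard ipaddress module. The function name is my own invention, not something from the JAS report.

```python
import ipaddress

FLAG_IP = ipaddress.ip_address("127.0.53.53")
LOOPBACK_NET = ipaddress.ip_network("127.0.0.0/8")


def is_controlled_interruption(answer):
    """True if a resolved address is the controlled-interruption flag IP."""
    return ipaddress.ip_address(answer) == FLAG_IP


# The flag IP sits inside 127/8, so traffic sent to it stays on the local machine.
print(FLAG_IP in LOOPBACK_NET, FLAG_IP.is_loopback)
print(is_controlled_interruption("127.0.53.53"))
print(is_controlled_interruption("192.0.2.10"))
```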

Today, she’ll find a Slashdot article about the name collisions plan, which should put the admin on the road to figuring out the problem and fixing her network. In future, maybe ICANN will rank for the term.

Registries would be able to choose whether to wildcard their whole TLD or to only point to 127.0.53.53 those second-level names currently on their collisions block lists.

In either case, the redirection would only last for the first 120 days after delegation. That’s the same duration as the quiet period ICANN already imposes on new delegations, during which only “nic.” may resolve.
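The two modes and the 120-day window can be sketched as a toy model in Python. This is only my reading of the proposal, not registry software. The function name, the example dates and the convention that block_list=None means "wildcard the whole TLD" are all assumptions.

```python
from datetime import date, timedelta

FLAG_IP = "127.0.53.53"
INTERRUPTION_DAYS = 120


def interruption_answer(label, today, delegated, block_list=None):
    """Return the flag IP if `label` is under controlled interruption, else None.

    block_list=None models the wildcard mode (every second-level name);
    otherwise only labels on the registry's collisions block list are affected.
    """
    window_end = delegated + timedelta(days=INTERRUPTION_DAYS)
    if not (delegated <= today < window_end):
        return None  # the 120 days are up: the issue is considered closed
    if block_list is None or label in block_list:
        return FLAG_IP
    return None


delegated = date(2014, 3, 1)  # hypothetical delegation date
print(interruption_answer("anything", date(2014, 3, 15), delegated))
print(interruption_answer("shop", date(2014, 3, 15), delegated, block_list={"www"}))
print(interruption_answer("www", delegated + timedelta(days=120), delegated))
```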

After the 120 days are up, the name collisions issue would be considered permanently closed for that TLD.

If this goes ahead, the plan will allow registries to unblock as many as 9.8 million domain names representing 6.8 million unique second-level labels, according to the DI PRO collisions database.

It could also put an end to the argument about whether name collisions really were a significant problem (160,000 new gTLD names are already live and we haven’t heard any reports of collisions yet).

Pointing to the fact that new TLDs, some of which showed evidence of collisions, were getting delegated rather regularly before the current new gTLD round, JAS said in its report:

We do not find that the addition of new Top Level Domains (TLDs) fundamentally or significantly increases or changes the risks associated with DNS namespace collisions. The modalities, risks, and etiologies of the inevitable DNS namespace collisions in new TLD namespaces will resemble the collisions that already occur routinely in the other parts of the DNS.

However…

Collisions in all TLDs and at all levels within the global Internet DNS namespace have the ability to expose potentially serious security and availability problems and deserve serious attention.

JAS calls its plan “a conservative buffer between potential legacy usage of a TLD and the new usage”.

As wildcarding is currently prohibited by ICANN’s standard Registry Agreement (ironically, to prevent a repeat of Verisign’s Site Finder) an amendment is going to be needed, as the JAS plan acknowledges.

The drawback of the plan is that if an organization is relying on a colliding internal TLD, whatever systems use that TLD could break under the plan. The 127/8 redirection is a way to help them resolve the breakage, not always to prevent it happening at all.

For new gTLD registries it’s pretty good news, however. There are many thousands of potentially valuable premium names blocked under the current regime that would be made available for sale.

If you’re an applicant for .mail, however, it’s a different story. The JAS report says .mail should be reserved forever, putting it in the same category as .home and .corp:

the use of .corp and .home for internal namespaces/networks is so overwhelming that the inertia created by such a large “installed base” and prevalent use is not likely reversible. We also note that RFC 6762 suggests that .corp and .home are safe for use on internal networks.

Like .corp and .home, the TLD .mail also exhibits prevalent, widespread use at a level materially greater than all other applied-for TLDs. Our research found that .mail has been hardcoded into a number of installations, provided in a number of example configuration scripts/defaults, and has a large global “installed base” that is likely to have significant inertia comparable to .corp and .home. As such, we believe .mail’s prevalent internal use is also likely irreversible and recommend reservation similar to .corp and .home.

In other words, .mail is dead and the five remaining applicants for the string are probably going to be forced to withdraw through no fault of their own. Should these companies get a full refund from ICANN?

Bing: domains just not that relevant to SEO

Kevin Murphy, January 20, 2014, Domain Tech

Anyone who thinks that having an exact-match keyword domain automatically promotes their web site to the top of search results is in for a rude awakening, according to a top guy at Bing.

In a blog post, Bing senior product manager Duane Forrester tried to debunk the “myth… That merely having a popular keyword in the domain will help that site, regardless of content, rank on the high volume keyword”.

Forrester wrote:

Ranking today is a result of so many signals fed into the system the words used in a domain send less and less information into the stack as a percentage of overall decision making signals.

There are no shortcuts. Even the new generic top level domains (gTLDs) coming out near the end of February will be treated in this manner. Domain spamming isn’t new, so sites that provide value, are relevant and that people like will rank as usual. They won’t rank “just because” they have certain words in them, and thinking that keyword stuffing a domain (think: cars.cars) will give you an edge is dangerous.

Forrester’s post is not a condemnation of keyword domains, however. He does not deny that the domain is one factor Bing takes into account in rankings, albeit one of very many.

Rather, it seems he’s trying to point out that it’s possible to get decent search traffic even when your domain has nothing to do with your content (he gives satire site The Onion as an example).

His overall message is that creating good content is the way to get good SEO, something that will come as absolutely no surprise to anyone who’s been paying attention to the pronouncements of search engine companies for the last several years.

Controlled interruption as a means to prevent name collisions [Guest Post]

Jeff Schmidt, January 8, 2014, Domain Tech

This is a guest post written by Jeff Schmidt, CEO of JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

One of JAS’ commitments during this process was to “float” ideas and solicit feedback. This set of thoughts poses an alternative to the “trial delegation” proposals in SAC062. The idea springs from past DNS-related experiences and has an effect we have named “controlled interruption.”

Learning from the Expired Registration Recovery Policy

Many are familiar with the infamous Microsoft Hotmail domain expiration in 1999. In short, a Microsoft registration for passport.com (Microsoft’s then-unified identity service) expired Christmas Eve 1999, denying millions of users access to the Hotmail email service (and several other Microsoft services) for roughly 20 hours.

Fortunately, a well-intended technology consultant recognized the problem and renewed the registration on Microsoft’s behalf, yielding a nice “thank you” from Microsoft and Network Solutions. Had a bad actor realized the situation, the outcome could have been far different.

The Microsoft Hotmail case and others like it led to the current Expired Registration Recovery Policy.

More recently, Regions Bank made news when its domains expired, and countless others go unreported. In the case of Regions Bank, the Expired Registration Recovery Policy seemed to work exactly as intended – the interruption inspired immediate action and the problem was solved, resulting in only a bit of embarrassment.

Importantly, there was no opportunity for malicious activity.

For the most part, the Expired Registration Recovery Policy is effective at preventing unintended expirations. Why? We call it the application of “controlled interruption.”

The Expired Registration Recovery Policy calls for extensive notification before the expiration, then a period when “the existing DNS resolution path specified by the Registrant at Expiration (“RAE”) must be interrupted” – as a last-ditch effort to inspire the registrant to take action.

Nothing inspires urgent action more effectively than service interruption.

But critically, in the case of the Expired Registration Recovery Policy, the interruption is immediately corrected if the registrant takes the required action — renewing the registration.

It’s nothing more than another notification attempt – just a more aggressive round after all of the passive notifications failed. In the case of a registration in active use, the interruption will be recognized immediately, inspiring urgent action. Problem solved.

What does this have to do with collisions?

A Trial Delegation Implementing Controlled Interruption

There has been a lot of talk about various “trial delegations” as a technical mechanism to gather additional data regarding collisions and/or attempt to notify offending parties and provide self-help information. SAC062 touched on the technical models for trial delegations and the related issues.

Ideally, the approach should achieve these objectives:

  • Notifies systems administrators of possible improper use of the global DNS;
  • Protects these systems from malicious actors during a “cure period”;
  • Doesn’t direct potentially sensitive traffic to Registries, Registrars, or other third parties;
  • Inspires urgent remediation action; and
  • Is easy to implement and deterministic for all parties.

Like unintended expirations, collisions are largely a notification problem. The offending system administrator must be notified and take action to preserve the security and stability of their system.

One approach to consider as an alternative trial delegation concept would be an application of controlled interruption to help solve this notification problem. The approach draws on the effectiveness of the Expired Registration Recovery Policy with the implementation looking like a modified “Application and Service Testing and Notification (Type II)” trial delegation as proposed in SAC062.

But instead of responding with pointers to application layer listeners, the authoritative nameserver would respond with an address inside 127/8 — the range reserved for localhost. This approach could be applied to A queries directly and MX queries via an intermediary A record (the vast majority of collision behavior observed in DITL data stems from A and MX queries).

Responding with an address inside 127/8 will likely break any application depending on a NXDOMAIN or some other response, but importantly also prevents traffic from leaving the requestor’s network and blocks a malicious actor’s ability to intercede.

In the same way as the Expired Registration Recovery Policy calls for “the existing DNS resolution path specified by the RAE [to] be interrupted”, responding with localhost will hopefully inspire immediate action by the offending party while not exposing them to new malicious activity.

If legacy/unintended use of a DNS name is present, one could think of controlled interruption as a “buffer” prior to use by a legitimate new registrant. This is similar to the CA Revocation Period as proposed in the New gTLD Collision Occurrence Management Plan which “buffers” the legacy use of certificates in internal namespaces from new use in the global DNS. Like the CA Revocation Period approach, a set period of controlled interruption is deterministic for all parties.

Moreover, instead of using the typical 127.0.0.1 address for localhost, we could use a “flag” IP like 127.0.53.53.

Why? While troubleshooting the problem, the administrator will likely at some point notice the strange IP address and search the Internet for assistance. Making it known that new TLDs may behave in this fashion and publicizing the “flag” IP (along with self-help materials) may help administrators isolate the problem more quickly than just using the common 127.0.0.1.

We could also suggest that systems administrators proactively search their logs for this flag IP as a possible indicator of problems.
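A proactive log search of this kind might look like the following Python sketch. The sample log lines are invented for illustration and the function name is hypothetical. The pattern is written to avoid matching the flag IP when it is merely part of a longer dotted number.

```python
import re

# Match 127.0.53.53 as a whole address, not as part of a longer number.
FLAG_RE = re.compile(r"(?<![\d.])127\.0\.53\.53(?!\.?\d)")


def flagged_lines(lines):
    """Return the log lines that mention the controlled-interruption flag IP."""
    return [line for line in lines if FLAG_RE.search(line)]


# Invented sample log entries.
sample_log = [
    "Jan 08 10:01:02 mail postfix: connect to intranet.corp[127.0.53.53]:25: refused",
    "Jan 08 10:01:05 web01 app: upstream 10.127.0.53.53 timed out",
    "Jan 08 10:01:09 web01 app: resolved db.internal to 127.0.0.1",
]

for line in flagged_lines(sample_log):
    print(line)
```

Only the first sample line is reported. The second contains the same digits inside a different address and is skipped.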

Why the repeated 53? Preserving the 127.0/16 seems prudent to make sure the IP is treated as localhost by a wide range of systems; the repeated 53 will hopefully draw attention to the IP and provide another hint that the issue is DNS related.

Two controlled interruption periods could even be used — one phase returning 127.0.53.53 for some period of time, and a second slightly more aggressive phase returning 127.0.0.1. Such an approach may cover more failure modes of a wide variety of requestors while still providing helpful hints for troubleshooting.

A period of controlled interruption could be implemented before individual registrations are activated, or for an entire TLD zone using a wildcard. In the case of the latter, this could occur simultaneously with the CA Revocation Period as described in the New gTLD Collision Occurrence Management Plan.
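To make the two options concrete, here is a Python sketch that writes out zone-file-style records for either a whole-TLD wildcard or individual labels. It covers A queries directly and MX queries through an intermediary A record, as described above. The "mx-flag" host name, the TTL and the function name are assumptions made for illustration.

```python
FLAG_IP = "127.0.53.53"


def interruption_records(tld, labels=None, mx_label="mx-flag", ttl=3600):
    """Build zone-file-style lines for a controlled interruption period.

    labels=None produces a wildcard covering the whole TLD;
    otherwise one A/MX pair is produced per listed second-level label.
    """
    owners = ["*"] if labels is None else sorted(labels)
    mx_target = f"{mx_label}.{tld}."
    # The intermediary A record that every MX answer points at.
    records = [f"{mx_target} {ttl} IN A {FLAG_IP}"]
    for owner in owners:
        fqdn = f"{owner}.{tld}."
        records.append(f"{fqdn} {ttl} IN A {FLAG_IP}")
        records.append(f"{fqdn} {ttl} IN MX 10 {mx_target}")
    return records


print("\n".join(interruption_records("example")))
print("\n".join(interruption_records("example", labels={"www", "mail"})))
```

The wildcard form stays the same size however many names are involved, which is part of why the TLD-at-once approach is simpler to operate than a string-by-string one.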

The ability to “schedule” the controlled interruption would further mitigate possible effects.

One concern in dealing with collisions is the reality that a potentially harmful collision may not be identified until months or years after a TLD goes live — when a particular second level string is registered.

A key advantage to applying controlled interruption to all second level strings in a given TLD in advance and at once via wildcard is that most failure modes will be identified during a scheduled time and before a registration takes place.

This has many positive features, including easier troubleshooting and the ability to execute a far less intrusive rollback if a problem does occur. From a practical perspective, avoiding a complex string-by-string approach is also valuable.

If there were to be a catastrophic impact, a rollback could be implemented relatively quickly, easily, and with low risk while the impacted parties worked on a long-term solution. A new registrant and associated new dependencies would likely not be adding complexity at this point.

Request for Feedback

As stated above, one of JAS’ commitments during this process was to “float” ideas and solicit feedback early in the process. Please consider these questions:

  • What unintended consequences may surface if localhost IPs are served in this fashion?
  • Will serving localhost IPs cause the kind of visibility required to inspire action?
  • What are the pros and cons of a “TLD-at-once” wildcard approach running simultaneously with the CA Revocation Period?
  • Is there a better IP (or set of IPs) to use?
  • Should the controlled interruption plan described here be included as part of the mitigation plan? Why or why not?
  • To what extent would this methodology effectively address the perceived problem?
  • Other feedback?

We anxiously await your feedback — in comments to this blog, on the DNS-OARC Collisions list, or directly. Thank you and Happy New Year!

Bing already recognizes new gTLDs (Google doesn’t)

Kevin Murphy, December 16, 2013, Domain Tech

Microsoft seems to be ahead of its rival Google when it comes to recognizing new gTLDs in their respective search engines.

Doing an advanced search for sites within specific new gTLDs on Bing is returning results today. The same cannot be said for Google, however.

Here’s an example of a search results page limited to Uniregistry’s .sexy:

The same type of search seems to work for .tattoo (Uniregistry) and .ruhr (Regiodot) but not for .uno or for any of Donuts’ many Latin-script gTLDs (which all currently redirect to donuts.co).

Sometimes the searches work with a dot, sometimes they don’t.

Searching for Donuts’ and other registries’ IDN gTLDs also seems to work in Bing, but only when you search for the A-label (eg .xn--unup4y) rather than the U-label (.游戏).
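For anyone unfamiliar with the two forms, this short Python sketch converts between them using the standard library's built-in "idna" codec. That codec implements the older IDNA 2003 rules, so treat it as an illustration rather than a complete IDN toolkit. The helper names are my own.

```python
def to_a_label(u_label):
    """Convert a Unicode label (U-label) to its ASCII form (A-label)."""
    return u_label.encode("idna").decode("ascii")


def to_u_label(a_label):
    """Convert an ASCII A-label back to its Unicode form."""
    return a_label.encode("ascii").decode("idna")


print(to_a_label("游戏"))        # xn--unup4y
print(to_u_label("xn--unup4y"))  # 游戏
```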

New gTLD support appears to be a work in progress at Microsoft, in other words, but the company does seem to be further along than Google, which so far doesn’t return any results for the same queries.

DNS Namespace Collisions: Detection and Response [Guest Post]

Jeff Schmidt, November 28, 2013, Domain Tech

Those tracking the namespace collision issue in Buenos Aires heard a lot regarding the potential response scenarios and capabilities. Because this is an important, deep, and potentially controversial topic, we wanted to get some ideas out early on potential solutions to start the conversation.

Since risk can almost never be driven to zero, a comprehensive approach to risk management contains some level of a priori risk mitigation combined with investment in detection and response capabilities.

In my city of Chicago, we tend to be particularly sensitive about fires. In Chicago, like in most cities, we have a priori protection in the form of building codes, detection in the form of smoke/fire alarms, and response in the form of 9-1-1, sprinklers, and the very capable Chicago Fire Department.

Let’s think a little about what the detection and response capabilities might look like for DNS namespace collisions.

Detection: How do we know there is a problem?

Rapid detection and diagnosis of problems helps to both reduce damage and reduce the time to recovery. Physical security practitioners invest considerably in detection, typically in the form of guards and sensors.

Most meteorological events are detected (with some advance warning) through the use of radars and predictive modeling. Information security practitioners are notoriously light with respect to systematic detection, but we’re getting better!

If there are problematic DNS namespace collisions, the initial symptoms will almost certainly appear through various IT support mechanisms, namely corporate IT departments and the support channels offered by hardware/software/service vendors and Internet Service Providers.

When presented with a new and non-obvious problem, professional and non-professional IT practitioners alike will turn to Internet search engines for answers. This suggests that a good detection/response investment would be to “seed” support vendors/fora with information/documentation about this issue in advance and in a way that will surface when IT folks begin troubleshooting.

We collectively refer to such documentation as “self-help” information. ICANN has already begun developing documentation designed to assist IT support professionals with namespace-related issues.

In the same way that radar gives us some idea where a meteorological storm might hit, we can make reasonable predictions about where issues related to DNS namespace collisions are most likely to first appear.

While problems could appear anywhere, we believe it is most likely that scenarios involving remote (“road warrior”) use cases, branch offices/locations, and Virtual Private Networks are the best places to focus advance preparation.

This educated guess is based on the observation that DNS configurations in these use cases are often brittle due to complexities associated with dynamic and/or location-dependent parameters. Issues may also appear in Small and Medium-sized Enterprises (SMEs) with limited IT sophistication.

This suggests that proactively reaching out to vendors and support mechanisms with a footprint in those areas would also be a wise investment.

Response: Options, Roles, and Responsibilities

In the vast majority of expected cases, the IT professional “detectors” will also be the “responders” and the issue will be resolved without involving other parties. However, let’s consider the situations where other parties may be expected to have a role in response.

For the sake of this discussion, let’s assume that an Internet user is experiencing a problem related to a DNS namespace collision. I use the term “Internet user” broadly as any “consumer” of the global Internet DNS.

At this point in the thought experiment, let’s disregard the severity of the problem. The affected party (or parties) will likely exercise the full range of typical IT support options available to them – vendors, professional support, IT savvy friends and family, and Internet search.

If any of these support vectors are aware of ICANN, they may choose to contact ICANN at any point. Let’s further assume the affected party is unable and/or unwilling to correct the technical problem themselves and ICANN is contacted – directly or indirectly.

There is a critical fork in the road here: Is the expectation that ICANN provide technical “self-help” information or that ICANN will go further and “do something” to technically remedy the issue for the user? The scope of both paths needs substantial consideration.

For the rest of this blog, I want to focus on the various “do something” options. I see a few options; they aren’t mutually exclusive (one could imagine an escalation through these and potentially other options). The options are enumerated for discussion only and order is not meaningful.

  • Option 1: ICANN provides technical support above and beyond “self-help” information to the impacted parties directly, including the provision of services/experts. Stated differently, ICANN becomes an extension of the impacted party’s IT support structure and provides customized/specific troubleshooting and assistance.
  • Option 2: The Registry provides technical support above and beyond “self-help” information to the impacted parties directly, including the provision of services/experts. Stated differently, the Registry becomes an extension of the impacted party’s IT support structure and provides customized/specific troubleshooting and assistance.
  • Option 3: ICANN forwards the issue to the Registry with a specific request to remedy. In this option, assuming all attempts to provide “self-help” are not successful, ICANN would request that the Registry make changes to their zone to technically remedy the issue. This could include temporary or permanent removal of second level names and/or other technical measures that constitute a “registry-level rollback” to a “last known good” configuration.
  • Option 4: ICANN initiates a “root-level rollback” procedure to revert the state of the root zone to a “last known good” configuration, thus (presumably) de-delegating the impacted TLD. In this case, ICANN would attempt – on an emergency basis – to revert the root zone to a state that is not causing harm to the impacted party/parties. Root-level rollback is an impactful and potentially controversial topic and will be the subject of a follow-up blog.

One could imagine all sorts of variations on these options, but I think these are the basic high-level degrees of freedom. We note that ICANN’s New gTLD Collision Occurrence Management Plan and SAC062 contemplate some of these options in a broad sense.

Some key considerations:

  • In the broader sense, what are the appropriate roles and responsibilities for all parties?
  • What are the likely sources to receive complaints when a collision has a deleterious effect?
  • What might the Service Level Agreements look like in the above options? How are they monitored and enforced?
  • How do we avoid the “cure is worse than the disease” problem – limiting the harm without increasing risk of creating new harms and unintended consequences?
  • How do we craft the triggering criteria for each of the above options?
  • How are the “last known good” configurations determined quickly, deterministically, and with low risk?
  • Do we give equal consideration to actors that are following the technical standards vs. those depending on technical happenstance for proper functionality?
  • Are there other options we’re missing?

On Severity of the Harm

Obviously, the severity of the harm can’t be ignored. Short of situations where there is a clear and present danger of bodily harm, severity will almost certainly be measured economically and from multiple points of view. Any party expected to “do something” will be forced to choose between two or more economically motivated actors: users, Registrants, Registrars, and/or Registries experiencing harm.

We must also consider that just as there may be users negatively impacted by new DNS behavior, there may also be users that are depending on the new DNS behavior. A fair and deterministic way to factor severity into the response equation is needed, and the mechanism must be compatible with emergency invocation and the need for rapid action.

Request for Feedback

There is a lot here, which is why we’ve published this early in the process. We eagerly await your ideas, feedback, pushback, corrections, and augmentations.

This is a guest post written by Jeff Schmidt, CEO of JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

These are the top 50 name collisions

Kevin Murphy, November 19, 2013, Domain Tech

We've spent the last 36 hours crunching ICANN’s lists of almost 10 million new gTLD name collisions. The DI PRO collisions database is now back online, and we can start reporting some interesting facts.

First, while we reported yesterday that 1,318 new gTLD applicants will be asked to block a total of 9.8 million unique domain names, the number of distinct second-level strings involved is somewhat smaller.

It’s 6,806,050, according to our calculations, still a bewilderingly high number.

The most commonly blocked string, as expected, is “www”. It’s on the block-lists for 1,195 gTLDs, over 90% of the total.

Second is “2010”. I currently have no explanation for this, but I’m wondering if it’s an artifact of the years of Day In The Life data upon which ICANN based its lists.

Protocol-related strings such as “wpad” and “isatap” also rank highly, as do strings matching popular TLDs such as “com”, “org”, “uk” and “de”. Single-character strings are also very popular.

The brand with the most blocks (free trademark protection?) is unsurprisingly Google.

The string “google” appears as an exact match on 930 gTLDs’ lists. It appears as a substring of 1,235 additional blocked strings, such as “google-toolbar” and “googlemaps”.
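For readers who want to run similar counts on their own copy of the lists, here is a minimal Python sketch of the three numbers discussed above: how many gTLDs block a string, how many distinct strings there are overall, and which other blocked strings contain a given string. The gTLD names and block-lists in the example are made up for illustration, and the function names are ours, not part of any ICANN or DI PRO tooling.

```python
def gtld_count(block_lists, string):
    """Number of gTLDs whose block-list contains `string` as an exact match."""
    return sum(1 for strings in block_lists.values() if string in strings)


def distinct_strings(block_lists):
    """Set of distinct second-level strings across all block-lists."""
    return set().union(*block_lists.values())


def containing_strings(block_lists, string):
    """Distinct blocked strings that contain `string` but are not equal to it."""
    return {s for s in distinct_strings(block_lists) if string in s and s != string}


# Hypothetical block-lists keyed by gTLD (illustrative data only).
block_lists = {
    "example1": {"www", "google", "googlemaps"},
    "example2": {"www", "google-toolbar"},
    "example3": {"google", "mail"},
}

total_blocked_names = sum(len(strings) for strings in block_lists.values())
print(total_blocked_names, len(distinct_strings(block_lists)))
print(gtld_count(block_lists, "google"), sorted(containing_strings(block_lists, "google")))
```

The gap between the total number of blocked names and the number of distinct strings comes from the same string appearing on many gTLDs' lists.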

Facebook, Yahoo, Gmail, YouTube and Hotmail also feature in the top 100 blocked brands.

DI PRO subscribers can search for strings that interest them, discovering how many and which gTLDs they’re blocked in, using the database.

Here’s a table of the top 50 blocked strings.

String      gTLD Count
www         1195
2010        1187
com         1124
wpad        1048
net         1032
isatap      1030
org         1008
mail        964
google      930
ww          911
uk          908
info        905
http        901
de          900
us          897
co          881
local       872
edu         865
cn          839
a           839
e           837
ru          836
m           833
ca          831
c           826
it          821
tv          817
server      817
in          814
gov         814
wwww        810
f           804
facebook    803
br          803
fr          799
ftp         796
au          796
yahoo       794
1           784
w           780
biz         778
g           776
forum       776
my          764
cc          762
jp          761
s           758
images      754
webmail     753
p           749

Demystifying DITL Data [Guest Post]

Kevin White, November 16, 2013, Domain Tech

With all the talk recently about DNS Namespace Collisions, the heretofore relatively obscure Day In The Life (“DITL”) datasets maintained by the DNS-OARC have been getting a lot of attention.

While these datasets are well known to researchers, I’d like to take the opportunity to provide some background and talk a little about how these datasets are being used to research the DNS Namespace Collision issue.

The Domain Name System Operations Analysis and Research Center (“DNS-OARC”) began working with the root server operators to collect data in 2006. The effort was coined “Day In The Life of the Internet (DITL).”

Root server participation in the DITL collection is voluntary and the number of contributing operators has steadily increased; in 2010, all of the 13 root server letters participated. DITL data collection occurs on an annual basis and covers approximately 50 contiguous hours.

DNS-OARC’s DITL datasets are attractive for researching the DNS Namespace Collision issue because:

  • DITL contains data from multiple root operators;
  • The robust annual sampling methodology (with samples dating back to 2006) allows trending; and
  • It’s available to all DNS-OARC Members.

More information on the DITL collection is available on DNS-OARC’s site at https://www.dns-oarc.net/oarc/data/ditl.

Terabytes and terabytes of data

The data consists of the raw network “packets” destined for each root server. Contained within the network packets are the DNS queries. The raw data consists of many terabytes of compressed network capture files, and processing it is very time-consuming and resource-intensive.

Year    Size
2006    230G
2007    741G
2008    2T
2009    806G
2010    6.6T
2011    4.6T
2012    8.2T
2013    4.7T

While several researchers have looked at DITL datasets over the years, the current collisions-oriented research started with Roy Hooper of Demand Media. Roy created a process to iterate through this data and convert it into intermediate forms that are much more usable for researching the proposed new TLDs.

We started with his process and continued working with it; our code is available on GitHub for others to review.

Finding needles in DITL haystacks

The first problem faced by researchers interested in new TLDs is isolating the relatively few queries of interest among many terabytes of traffic that are not of interest.

Each root operator contributes several hundred – or several thousand – files full of captured packets in time-sequential order. These packets contain every DNS query reaching the root that requests information about DNS names falling within delegated and undelegated TLDs.

The first step is to search these packets for DNS queries involving the TLDs of interest. The result is one file per TLD containing all queries from all roots involving that TLD. If the input packet is considered a “horizontal” slice of root DNS traffic, then this intermediary work product is a “vertical” slice per TLD.

These intermediary files are much more manageable, ranging from just a few records to 3 GB. To support additional investigation and debugging, the intermediary files that JAS produces are fully “traceable” such that a record in the intermediary file can be traced back to the source raw network packet.
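To make the “horizontal to vertical” idea concrete, here is a minimal Python sketch of the slicing step. It is not the JAS code. It assumes the packets have already been parsed into simple records, and the field names (`qname`, `source_file`, `packet_index`) and file names are hypothetical. Carrying the source file and packet position along with each record is one simple way to keep the output traceable.

```python
from collections import defaultdict


def tld_of(qname):
    """Return the rightmost label of a query name, lowercased ('' for the root)."""
    return qname.rstrip(".").lower().split(".")[-1]


def slice_by_tld(records, tlds_of_interest):
    """Group query records by TLD, keeping only the TLDs of interest."""
    slices = defaultdict(list)
    for record in records:
        tld = tld_of(record["qname"])
        if tld in tlds_of_interest:
            # The whole record is kept, including where it came from.
            slices[tld].append(record)
    return dict(slices)


# Hypothetical parsed records; field and file names are illustrative only.
records = [
    {"qname": "www.example.corp.", "source_file": "root-a-0001.pcap", "packet_index": 17},
    {"qname": "printer.HOME", "source_file": "root-a-0001.pcap", "packet_index": 42},
    {"qname": "example.com.", "source_file": "root-k-0007.pcap", "packet_index": 3},
]

per_tld = slice_by_tld(records, {"corp", "home"})
for tld, rows in sorted(per_tld.items()):
    print(tld, [(r["source_file"], r["packet_index"]) for r in rows])
```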

The DITL data contain quite a bit of noise, primarily DNS traffic that was not actually destined for the root. Our process filters the data by destination IP address so that the only remaining data is that which was originally destined for the root name servers.
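The destination filter can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than our production code: the `dst_ip` field name is hypothetical, and the addresses below are placeholders from the documentation ranges, not real root server addresses. A real run would substitute the root server addresses in use at the time of each capture.

```python
import ipaddress

# Placeholder addresses (documentation ranges), NOT real root server addresses.
ROOT_ADDRESSES = {
    ipaddress.ip_address("192.0.2.1"),
    ipaddress.ip_address("2001:db8::1"),
}


def destined_for_root(record, root_addresses=ROOT_ADDRESSES):
    """True if the record's destination address is one of the root addresses."""
    return ipaddress.ip_address(record["dst_ip"]) in root_addresses


def filter_root_traffic(records, root_addresses=ROOT_ADDRESSES):
    """Keep only the records that were originally sent to a root address."""
    return [r for r in records if destined_for_root(r, root_addresses)]


# Hypothetical records; only the first two were sent to a "root" address.
captured = [
    {"qname": "www.example.corp.", "dst_ip": "192.0.2.1"},
    {"qname": "mail.home.", "dst_ip": "2001:0db8:0:0:0:0:0:1"},
    {"qname": "noise.example.", "dst_ip": "198.51.100.7"},
]
print([r["qname"] for r in filter_root_traffic(captured)])
```

Comparing parsed address objects rather than raw strings means that different spellings of the same IPv6 address still match.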

JAS has made these intermediary per-TLD files available to DNS-OARC members for further analysis.

Then what?

The intermediary files are comparatively small and easy to parse, opening the door to more elaborate research. For example, JAS has written various “second passes” that classify queries, separate queries that use valid syntax at the second level from those that don’t, detect “randomness,” fit regular expressions to the queries, and more.
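As an illustration of what one such second pass might look like, the Python sketch below separates queries whose second-level label uses valid hostname syntax (letters, digits and hyphens, 1 to 63 characters, no leading or trailing hyphen) from those that do not. The category names are our own for this example, and this is a simplified stand-in for the actual classifiers.

```python
import re

# Letters, digits and hyphens; 1-63 characters; no leading or trailing hyphen.
LDH_LABEL = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?")


def second_level_label(qname):
    """Return the label immediately left of the TLD, or None if there is none."""
    labels = qname.rstrip(".").lower().split(".")
    return labels[-2] if len(labels) >= 2 else None


def classify(qname):
    """Classify a query name by the syntax of its second-level label."""
    sld = second_level_label(qname)
    if sld is None:
        return "tld-only"
    if LDH_LABEL.fullmatch(sld):
        return "valid-sld"
    return "invalid-sld"


for name in ["www.example.corp.", "_ldap._tcp.corp", "-bad-.home", "corp"]:
    print(name, classify(name))
```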

We have also checked to confirm that second level queries that look like Punycode IDNs (start with ‘xn--’) are valid Punycode. It is interesting to note the tremendous volume of erroneous, technically invalid, and/or nonsensical DNS queries that make it to the root.
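A rough version of the Punycode check can be written with Python's built-in `punycode` codec. This is a sketch, not the check we actually run: it treats a label as valid if the part after ‘xn--’ decodes, re-encodes to the same text, and contains at least one non-ASCII character. That last condition is our assumption about what a genuine IDN label looks like, and full IDNA validation involves more rules than this.

```python
ACE_PREFIX = "xn--"


def is_valid_punycode_label(label):
    """Rough check that an 'xn--' label decodes as Punycode and round-trips."""
    label = label.lower()
    if not label.startswith(ACE_PREFIX):
        return False
    payload = label[len(ACE_PREFIX):]
    try:
        decoded = payload.encode("ascii").decode("punycode")
        round_trip = decoded.encode("punycode").decode("ascii")
    except ValueError:  # UnicodeError is a subclass of ValueError
        return False
    # Assumption: a genuine IDN label decodes to at least one non-ASCII character.
    return round_trip == payload and any(ord(ch) > 127 for ch in decoded)


for label in ["xn--mnchen-3ya", "xn--999", "example"]:
    print(label, is_valid_punycode_label(label))
```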

Also of interest is that the datasets are dominated by query strings that appear random and/or machine-generated.

Google’s Chrome browser generates three random 10-character queries upon startup in an effort to detect network properties. Those “Chrome 10” queries together with a relatively small number of other common patterns comprise a significant proportion of the entire dataset.
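A simple heuristic for flagging candidate “Chrome 10” queries is sketched below in Python. It assumes the random strings are exactly ten ASCII letters in the leftmost label, which is our simplification. Ordinary ten-letter words will also match, so treat a hit as a candidate rather than a confirmed Chrome probe.

```python
import re

# Assumption: the random probe label is exactly ten ASCII letters.
TEN_LETTERS = re.compile(r"[a-z]{10}")


def looks_like_chrome10(qname):
    """True if the leftmost label of the query is exactly ten letters."""
    leftmost = qname.rstrip(".").lower().split(".")[0]
    return TEN_LETTERS.fullmatch(leftmost) is not None


for name in ["qwkzmvbtle", "qwkzmvbtle.home.", "www.example.corp", "abc123defg"]:
    print(name, looks_like_chrome10(name))
```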

Research is under way to better understand the source of these machine-generated queries.

More technical details and information on running the process are available on the DNS-OARC web site.

This is a guest post written by Kevin White, VP Technology, JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

Name collisions expert JAS to guest blog on DI

Kevin Murphy, November 14, 2013, Domain Tech

JAS Global Advisors, the consultancy hired by ICANN to provide the final analysis on the risks posed by name collisions in new gTLDs, is to exclusively guest-blog its work here on DI.

ICANN picked JAS to provide a “Name Collision Occurrence Management Framework” earlier this week.

Its job is to basically figure out how new gTLD registries — some of which have been told to block many thousands of potential collisions from their zones — can identify and mitigate the risks, if any, posed by these names.

The framework will help registries reduce the size of their block-lists, in other words.

JAS expects to provide a short series of guest posts over the next few months, explaining the state of the project as it progresses. Reader comments will be read, I’m assured.

JAS CEO Jeff Schmidt said: “The macro intent is to shorten the feedback cycle so folks can see where we are incrementally and comment along the way.”

I’m hoping that the guest posts will provide DI readers with insight into the issue that is as disinterested as DI’s usual coverage, but better informed on the nitty-gritty of the affected technologies.

JAS is a regular consultant for ICANN. It was one of the independent evaluators for the new gTLD program itself.

I’m told that JAS doesn’t have financial relationships with any new gTLD applicants, which generally think the collision risks have been overstated, or with Verisign, which says they could cause real damage.

JAS isn’t getting paid for the posts; nor is DI getting paid to carry them.

The first post in the series will appear soon, probably Friday.