Demystifying DITL Data [Guest Post]

Kevin White, November 16, 2013, Domain Tech

With all the talk recently about DNS Namespace Collisions, the heretofore relatively obscure Day In The Life (“DITL”) datasets maintained by the DNS-OARC have been getting a lot of attention.

While these datasets are well known to researchers, I’d like to take the opportunity to provide some background and talk a little about how these datasets are being used to research the DNS Namespace Collision issue.

The Domain Name System Operations Analysis and Research Center (“DNS-OARC”) began working with the root server operators to collect data in 2006. The effort was dubbed the “Day In The Life of the Internet” (DITL).

Root server participation in the DITL collection is voluntary and the number of contributing operators has steadily increased; in 2010, all of the 13 root server letters participated. DITL data collection occurs on an annual basis and covers approximately 50 contiguous hours.

DNS-OARC’s DITL datasets are attractive for researching the DNS Namespace Collision issue because:

  • DITL contains data from multiple root operators;
  • The robust annual sampling methodology (with samples dating back to 2006) allows trending; and
  • It’s available to all DNS-OARC Members.

More information on the DITL collection is available on DNS-OARC’s site at https://www.dns-oarc.net/oarc/data/ditl.

Terabytes and terabytes of data

The data consists of the raw network “packets” destined for each root server. Contained within the network packets are the DNS queries. The raw data consists of many terabytes of compressed network capture files and processing the raw data is very time-consuming and resource-intensive.

Year    Size
2006    230 GB
2007    741 GB
2008    2 TB
2009    806 GB
2010    6.6 TB
2011    4.6 TB
2012    8.2 TB
2013    4.7 TB

While several researchers have looked at DITL datasets over the years, the current collisions-oriented research started with Roy Hooper of Demand Media. Roy created a process to iterate through this data and convert it into intermediate forms that are much more usable for researching the proposed new TLDs.

We started with his process and continued working with it; our code is available on GitHub for others to review.

Finding needles in DITL haystacks

The first problem faced by researchers interested in new TLDs is isolating the relatively few queries of interest from the many terabytes of traffic that are not.

Each root operator contributes several hundred – or several thousand – files full of captured packets in time-sequential order. These packets contain every DNS query reaching the root that requests information about DNS names falling within delegated and undelegated TLDs.

The first step is to search these packets for DNS queries involving the TLDs of interest. The result is one file per TLD containing all queries from all roots involving that TLD. If the input packet is considered a “horizontal” slice of root DNS traffic, then this intermediary work product is a “vertical” slice per TLD.
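To make the “vertical slice” step concrete, here is a minimal sketch in Python using the scapy packet library; the file path and TLD list are illustrative stand-ins, not our actual inputs or code:

    from collections import defaultdict
    from scapy.all import PcapReader, DNSQR

    TLDS_OF_INTEREST = {"home", "corp", "club"}  # hypothetical selection

    def slice_by_tld(pcap_path):
        """Bucket the DNS query names in one capture file by TLD."""
        buckets = defaultdict(list)
        for pkt in PcapReader(pcap_path):  # iterates packets lazily
            if not pkt.haslayer(DNSQR):   # skip packets with no DNS question
                continue
            qname = pkt[DNSQR].qname.decode(errors="replace").rstrip(".")
            tld = qname.rsplit(".", 1)[-1].lower()
            if tld in TLDS_OF_INTEREST:
                buckets[tld].append(qname)
        return buckets

Run once per capture file, appending each bucket to its per-TLD output file, and you end up with the one-file-per-TLD intermediary form described above.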

These intermediary files are much more manageable, ranging from just a few records to 3 GB. To support additional investigation and debugging, the intermediary files that JAS produces are fully “traceable” such that a record in the intermediary file can be traced back to the source raw network packet.

The DITL data contain quite a bit of noise, primarily DNS traffic that was not actually destined for the root. Our process filters the data by destination IP address so that the only remaining data is that which was originally destined for the root name servers.
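The sketch below shows that filter, assuming a set of known root server addresses (only two of the thirteen letters shown, IPv4 only; the full list is published at root-servers.org):

    from scapy.all import IP

    ROOT_SERVER_IPS = {
        "198.41.0.4",     # a.root-servers.net
        "192.58.128.30",  # j.root-servers.net
        # ...the other eleven letters omitted for brevity
    }

    def destined_for_root(pkt):
        """Keep only packets actually addressed to a root name server."""
        return pkt.haslayer(IP) and pkt[IP].dst in ROOT_SERVER_IPS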

JAS has made these intermediary per-TLD files available to DNS-OARC members for further analysis.

Then what?

The intermediary files are comparatively small and easy to parse, opening the door to more elaborate research. For example, JAS has written various “second passes” that classify queries, separate queries that use valid syntax at the second level from those that don’t, detect “randomness,” fit regular expressions to the queries, and more.
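As an illustration of one such second pass (a sketch, not our production code), a second-level label can be tested against RFC 1035-style letter-digit-hyphen syntax:

    import re

    # LDH: 1-63 letters, digits or hyphens, with no leading or
    # trailing hyphen.
    LDH_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")

    def has_valid_sld_syntax(qname):
        labels = qname.rstrip(".").split(".")
        return len(labels) >= 2 and bool(LDH_LABEL.match(labels[-2]))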

We have also checked to confirm that second-level queries that look like Punycode IDNs (those starting with “xn--”) are valid Punycode. It is interesting to note the tremendous volume of erroneous, technically invalid, and/or nonsensical DNS queries that make it to the root.
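Python’s built-in punycode codec offers a simple way to run that kind of check; a failed decode marks the label as invalid. This sketch illustrates the idea, not our exact test:

    def is_valid_punycode_label(label):
        """True if an 'xn--' label decodes as valid Punycode (RFC 3492)."""
        if not label.lower().startswith("xn--"):
            return False
        try:
            label[4:].encode("ascii").decode("punycode")
            return True
        except (UnicodeError, ValueError):
            return False

    # is_valid_punycode_label("xn--80aswg") -> True (decodes to "сайт")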

Also of interest is that the datasets are dominated by query strings that appear random and/or machine-generated.

Google’s Chrome browser generates three random 10-character queries upon startup in an effort to detect network properties. Those “Chrome 10” queries, together with a relatively small number of other common patterns, comprise a significant proportion of the entire dataset.
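A crude heuristic for flagging those probes (again a sketch; a real classifier would also consider query timing and source diversity) is to match bare, all-letter labels of exactly ten characters:

    import re

    CHROME_PROBE = re.compile(r"^[a-z]{10}$")

    def looks_like_chrome_probe(qname):
        """Flag bare ten-letter labels, the 'Chrome 10' startup pattern."""
        return bool(CHROME_PROBE.match(qname.rstrip(".").lower()))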

Research is ongoing to better understand the sources of these machine-generated queries.

More technical details, and information on running the process, are available on the DNS-OARC web site.

This is a guest post written by Kevin White, VP Technology, JAS Global Advisors LLC. JAS is currently authoring a “Name Collision Occurrence Management Framework” for the new gTLD program under contract with ICANN.

Name collisions expert JAS to guest blog on DI

Kevin Murphy, November 14, 2013, Domain Tech

JAS Global Advisors, the consultancy hired by ICANN to provide the final analysis on the risks posed by name collisions in new gTLDs, is to exclusively guest-blog its work here on DI.

ICANN picked JAS to provide a “Name Collision Occurrence Management Framework” earlier this week.

Its job is to basically figure out how new gTLD registries — some of which have been told to block many thousands of potential collisions from their zones — can identify and mitigate the risks, if any, posed by these names.

The framework will help registries reduce the size of their block-lists, in other words.

JAS expects to provide a short series of guest posts over the next few months, explaining the state of the project as it progresses. Reader comments will be read, I’m assured.

JAS CEO Jeff Schmidt said: “The macro intent is to shorten the feedback cycle so folks can see where we are incrementally and comment along the way.”

I’m hoping that the guest posts will provide DI readers with insight into the issue that is as disinterested as DI’s usual coverage, but better informed on the nitty-gritty of the affected technologies.

JAS is a regular consultant for ICANN. It was one of the independent evaluators for the new gTLD program itself.

I’m told that JAS doesn’t have financial relationships with any new gTLD applicants, which generally think the collision risks have been overstated, or with Verisign, which says the collisions could cause real damage.

JAS isn’t getting paid for the posts; nor is DI getting paid to carry them.

The first post in the series will appear soon, probably Friday.

Here’s how to display new IDN gTLDs in Chrome

Kevin Murphy, October 24, 2013, Domain Tech

A lot of people have noticed since the first four new gTLDs were delegated yesterday that Google’s Chrome browser doesn’t seem to handle internationalized domain names.

In fact it does, but if you’re an English-speaking user you’ll probably need to make a few small configuration changes, which should take less than a minute, to make it work.

If you’re using Chrome and you click this link (http://nic.сайт), chances are your address bar is going to automatically translate it and display it as http://nic.xn--80aswg/.

As far as the DNS is concerned, these are the same URLs. They’re just displayed differently by Chrome, depending on your browser’s display languages settings.
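You can see the mapping for yourself with Python’s built-in IDNA codec (IDNA 2003, which suffices for this example), which applies the same kind of ToASCII conversion the browser performs before lookup:

    >>> "сайт".encode("idna")
    b'xn--80aswg'
    >>> b"xn--80aswg".decode("idna")
    'сайт'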

If you want to see the Cyrillic version in your address bar, simply:

  • Go to the Chrome Settings menu via the toolbar menu or by typing chrome://settings into the address bar.
  • Click the “Language and input settings” button. It’s in the Advanced options bit, which may be hidden at first. Scroll all the way down to unhide.
  • Click the Add button to add the languages you want to support in the address bar.

Right now, you can see all three active IDN gTLDs in their intended scripts by adding Arabic, Chinese (Simplified Han) and Russian. As gTLDs in other scripts are added, you’ll need to add those too.

Simple.

Thanks to DNS jack-of-all-trades Jothan Frakes for telling me how to do this.

New gTLD applicants get a way to avoid name collision delay

Kevin Murphy, October 9, 2013, Domain Tech

ICANN has given blessed relief to many new gTLD applicants by wiping potentially months off their path to delegation.

Its New gTLD Program Committee this week adopted a new “New gTLD Collision Occurrence Management Plan” which aims to tackle the problem of clashes between new gTLDs and names used on private networks.

The good news is that the previous categorization of strings according to risk, which would have delayed “uncalculated risk” gTLDs by months pending further study, has been scrapped.

The two “high risk” strings — .home and .corp — don’t catch a break, however. ICANN says it will continue to refuse to delegate them “indefinitely”.

For everyone else, ICANN said it will conduct additional studies into the risk of name collisions, above and beyond what Interisle Consulting already produced.

The study will take into account not only the frequency that new gTLDs currently generate NXDOMAIN traffic in the DNS root, but also the number of second-level domains queried, the diversity of requesting sources, and other factors.
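In code terms, the per-TLD metrics being weighed might look something like this Python sketch, which assumes (query name, source IP) pairs have already been extracted from the capture data:

    def collision_metrics(records):
        """records: iterable of (qname, src_ip) pairs for one TLD."""
        queries = 0
        sources = set()
        slds = set()
        for qname, src in records:
            queries += 1
            sources.add(src)
            labels = qname.rstrip(".").split(".")
            if len(labels) >= 2:
                slds.add(labels[-2].lower())
        return {
            "queries": queries,          # raw NXDOMAIN volume
            "distinct_slds": len(slds),  # breadth of second-level names
            "distinct_sources": len(sources),  # diversity of requesters
        }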

Any new gTLD applicant that does not wish to wait for this study will be able to proceed to delegation without delay, but only if they block huge numbers of second-level domains at launch.

The registries will have to block every SLD that was queried in their gTLD according to the Day in the Life of the Internet data that Interisle used in its study.

This list will vary by TLD, but in the most severe cases is likely to extend to tens of thousands of names. In many cases, it’s likely to be a few thousand names.

Fortunately, studies conducted by the likes of Donuts and Neustar indicate that many of these SLDs — maybe even the majority — are likely to be invalid strings, such as those with an underscore or other non-DNS character, or the randomly generated 10-character strings of gibberish produced by Google Chrome.

In other words, the actual number of potentially salable domains that registries will have to block may turn out to be much lower than it appears at first glance.

Each SLD will have to be blocked in such a way that it continues to return NXDOMAIN responses, as they all do today.
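Verifying that behaviour is straightforward. A minimal check using the dnspython library (assuming dnspython 2.x; the domain in the comment is a made-up example) might look like:

    import dns.resolver

    def still_nxdomain(name):
        """True if the name still returns NXDOMAIN after delegation.

        Timeouts and other failures propagate; handle them in real use.
        """
        try:
            dns.resolver.resolve(name, "A")
            return False
        except dns.resolver.NXDOMAIN:
            return True

    # e.g. still_nxdomain("example-blocked-sld.club")  # hypothetical name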

Because the DITL data represented a 48-hour snapshot in May 2013, and may not include every potentially affected string, ICANN is also proposing to give organizations a way to:

report and request the blocking of a domain name (SLD) that causes demonstrably severe harm as a consequence of name collision occurrences.

The process will allow the deactivation (SLD removal from the TLD zone) of the name for a period of up to two (2) years in order to allow the affected party to effect changes to its network to eliminate the DNS request leakage that causes collisions, or mitigate the harmful impact.

One has to wonder if any trademark lawyers reading this will think: “Ooh, free defensive registration!” It will be interesting to see if any of them give it a cheeky shot.

I’ve got a feeling that most new gTLD applicants will want to take ICANN up on its offer. It’s not an ideal solution for them, but it does give them a way to get into the root relatively quickly.

There’s no telling what ICANN’s additional studies will find, but there’s a chance it could be negative for their string(s) — getting delegated at least mitigates the risk of never getting delegated.

The new ICANN proposal may in some cases interfere with their plans to market and use their TLDs, however.

Take a dot-brand such as .cisco, which the networking company has applied for. Its block list is likely to have about 100,000 strings on it, increasing the chances that useful, brandable SLDs are going to be taken out of circulation for a while.

ICANN is also proposing to conduct an awareness-raising campaign, using the media, to let network operators know about the risks that new gTLDs may present to their networks.

Depending on how effective this is, new registries may be able to forget about getting positive column inches for their launch — if a journalist is handed a negative angle for a story on a plate, they’ll take it.

Mockapetris hired as ICANN security advisor

Kevin Murphy, October 7, 2013, Domain Tech

DNS inventor Paul Mockapetris has been recruited by ICANN to act as senior security advisor to the Generic Domains Division under its president, Akram Atallah.

It’s not clear precisely what Mockapetris’ role will be, though it doesn’t appear to be a full-time position. He is still chairman and chief scientist of DNS software vendor Nominum.

ICANN recently recorded an interview with Mockapetris in which he pooh-poohed Verisign’s campaign against new gTLDs on security grounds, saying name collisions were not a new phenomenon.

It’s not the first time ICANN has hired a “name” as a security advisor.

One of the inventors of public key cryptography, Whitfield Diffie, became VP of information security under former CEO Rod Beckstrom, but quietly disappeared not too long after Fadi Chehade took over last year.

Crocker to speak at second gTLD collisions summit

Kevin Murphy, September 28, 2013, Domain Tech

ICANN chair Steve Crocker is among a packed line-up of speakers for an event on Tuesday that will address the potential security risks of name collisions in the new gTLD program.

It’s the second TLD Security Forum, a series of events organized by new gTLD applicants unhappy with ICANN’s proposal to delay hundreds of “uncalculated risk” applied-for gTLDs.

The first event, held in August, was notable for statements playing down the risk from the likes of Google and DigiCert.

While Crocker is scheduled to speak on Tuesday, anyone expecting insight into the ICANN board’s thinking on name collisions is likely to be disappointed.

The title of his talk is “The Current State of DNSSEC Deployment”, which isn’t directly relevant to the issue.

Due to conflict-of-interest protections, Crocker is also not a member of ICANN’s New gTLD Program Committee, which is tasked with making decisions about the collision problem.

While Crocker’s views may wind up remaining private, we can’t say the same for Amy Mushahwar and Dan Jaffe, representing the Association of National Advertisers, both of whom are also speaking.

The ANA is firmly in the Verisign camp on this issue, claiming that gTLD name collisions create unacceptable security risks for organizations on the internet.

Also on the line-up for Tuesday are Laureen Kapin of the US Federal Trade Commission and Gabriel Rottman of the American Civil Liberties Union, both of whom could bring new perspectives to the debate.

The TLD Security Forum begins at 9am at the Washington Hilton and Heights Meeting Center in Washington, DC. It’s free to attend and will be webcast for those unable to show up in person.

Phishing domains double in 2013

Kevin Murphy, September 20, 2013, Domain Tech

The number of domain names registered for phishing attacks doubled in the first half of the year, according to the latest data from the Anti-Phishing Working Group.

The APWG identified 53,685 phishing domains, of which 12,173 are believed to have been registered by phishers. The remainder belonged to legitimate registrants whose web servers had been compromised.

This 12,173 number — up from 5,835 in the year-ago period — is the important one for the domain name industry, as it is there that registries and registrars have the ability to make a difference.

“The increase is due to a sudden uptick in domain registrations by Chinese phishers,” the APWG said in its Domain Name Use and Trends 1H2013 report (pdf). Chinese targets accounted for 8,240 (68%) of the registered domains.

This works out to about 66 maliciously registered domains per day on average, or less than half a percent of the total number of domains registered across all TLDs daily.

According to the APWG, the number of phishing domains that actually contain a brand or a variation of a brand is smaller still, at 1,244. That’s flat on the second half of 2012.

It works out to about seven new trademark-infringing phishing domain names per day that a brand owner somewhere in the world (though probably China) has to deal with.

APWG reiterated what it has said in previous reports:

most maliciously registered domain names offered nothing to confuse a potential victim. Placing brand names or variations thereof in the domain name itself is not a favored tactic, since brand owners are proactively scanning Internet zone files for their brand names. As we have observed in the past, the domain name itself usually does not matter to phishers, and a domain name of any meaning, or no meaning at all, in any TLD, will usually do. Instead, phishers often place brand names in subdomains or subdirectories.

.CLUB offers solution to name collision risks

Kevin Murphy, September 16, 2013, Domain Tech

.CLUB Domains has come up with a simple workaround for its applied-for .club gTLD being categorized as risky by ICANN.

The company wants to reserve the top 50 .club domains that currently see DNS root traffic, so that if and when .club goes live the impact on organizations that use .club internally will be greatly reduced.

It’s not a wholly original idea, but .CLUB seems to be unique at the moment in that it actually knows what those 50 strings are, having commissioned an Interisle Consulting report on its proposed gTLD.

You’ll recall that Interisle is the company that ICANN commissioned to quantify the name collisions problem in the first place.

Its report is what ICANN used to categorize all applied-for gTLD strings into low, high and “uncalculated” risks, putting .club into the uncalculated category, delaying it by months.

(Interisle was at pains to point out in its report for .CLUB that it is not making any recommendations, interpreting the data, or advocating any solutions. Still, nice work if you can get it.)

By reserving the top 50 clashes — presumably in such a way that they will continue to return error responses after .club is delegated — .CLUB says .club would slip into ICANN’s definition of a low-risk string.

In a letter to ICANN (pdf) sent today, .CLUB chief technology officer Dirk Bhagat wrote:

blocking the 50 SLD strings from registration would prevent 52,647 out of the 89,533 queries from a potential collision (58.88%). After blocking the top 50 strings as SLD strings, only 36,886 (41.12%) queries remain, which is 12,114 fewer invalid queries at the root than .engineering, which ICANN classified as a low risk gTLD.

He adds that a further chunk of the remaining SLDs are random strings that appear to have been created by Google’s Chrome browser and that, many say, pose no risk of name collisions, reducing the risk further.

It’s hard to argue with the logic there, other than to say that ICANN’s categorization system itself has already come in for heavy criticism for drawing unjustified, arbitrary lines.

The list of domains .CLUB proposes to block is pretty interesting, including some strings that appear to be trademarks, the names of likely .club registrants, or potentially premium names.

Verisign targets bank claims in name collisions fight

Kevin Murphy, September 15, 2013, Domain Tech

Verisign has rubbished the Commonwealth Bank of Australia’s claim that its dot-brand gTLD, .cba, is safe.

In a lengthy letter to ICANN today, Verisign senior vice president Pat Kane said that, contrary to CBA’s claims, the bank is only responsible for about 6% of the traffic .cba sees at the root.

It’s the latest volley in the ongoing fight about the security risks of name collisions — the scenario where an applied-for gTLD string is already in broad use on internal networks.

CBA’s application for .cba has been categorized as “uncalculated risk” by ICANN, meaning it faces more reviews and three to six months of delay while its risk profile is assessed.

But in a letter to ICANN last month, CBA said “the cause of the name collision is primarily from CBA internal systems” and “it is within the CBA realm of control to detect and remediate said systems”.

The bank was basically claiming that its own computers use DNS requests for .cba already, and that leakage of those requests onto the internet was responsible for its relatively high risk profile.

At the time we doubted that CBA had access to the data needed to draw this conclusion and Verisign said today that a new study of its own “shows without a doubt that CBA’s initial conclusions are incorrect”.

Since the publication of Interisle Consulting’s independent review into root server error traffic — which led to all applied-for strings being split into risk categories — Verisign has evidently been carrying out its own study.

While Interisle used data collected from almost all of the DNS root servers, Verisign’s seven-week study only looked at data gathered from the A-root and J-root, which it manages.

According to Verisign, .cba gets roughly 10,000 root server queries per day — 504,000 in total over the study window — and hardly any of them come from the bank itself.

Most appear to be from residential apartment complexes in Chiba, Japan, where network admins seem to have borrowed the local airport code — also CBA — to address local devices.

About 80% of the requests seen come from devices using DNS Service Discovery services such as Bonjour, Verisign said.

Bonjour is an Apple-created technology that allows computers to use DNS to automatically discover other LAN-connected devices such as printers and cameras, making home networking a bit simpler.

Another source of the .cba traffic is McAfee’s antivirus software (McAfee is owned by Intel), which Verisign said uses DNS to check whether code is virus-free before executing it.

While error traffic for .cba was seen from 170 countries, Verisign said that Japan — notable for not being Australia — was the biggest source, with almost 400,000 queries (79% of the total). It said:

Our measurement study reveals evidence of a substantial Internet-connected infrastructure in Japan that lies beneath the surface of the public-facing internet, which appears to rely on the non-resolution of the string .CBA.

This infrastructure appear hierarchical and seems to include municipal and private administrative and service networks associated with electronic resource management for office and residential building facilities, as well as consumer devices.

One apartment block in Chiba is responsible for almost 5% of the daily .cba queries — about 500 per day on average — according to Verisign’s letter, though there were 63 notable sources in total.

ICANN’s proposal for reducing the risk of these name collisions causing problems would require CBA, as the registry, to hunt down and warn organizations of .cba’s impending delegation.

Verisign reiterates the point made by RIPE NCC last month: this would be quite difficult to carry out.

But it does seem that Verisign has done a pretty good job tracking down the organizations that would be affected by .cba being delegated.

The question that Verisign’s letter and presentation do not address is: what would happen to these networks if .cba were delegated?

If .cba is delegated, what will McAfee’s antivirus software do? Will it crash the user’s computer? Will it allow unsafe code to run? Will it cause false positives, blocking users from legitimate content?

Or will it simply fail gracefully, causing no security problems whatsoever?

Likewise, what happens when Bonjour expects .cba to not exist and it suddenly does? Do Apple computers start leaking data about the devices on their local network to unintended third parties?

Or does it, again, cause no security problems whatsoever?

Without satisfactory answers to those questions, maybe name collisions could be introduced by ICANN with little to no effect, meaning the “risk” isn’t really a risk at all.

Answering those questions will of course take time, which means delay, which is not something most applicants want to hear right now.

Verisign’s study targeted CBA because CBA singled itself out by claiming to be responsible for the .cba error traffic, not because CBA is a client of rival registry Afilias.

The bank can probably thank Verisign for its study, which may turn out to be quite handy.

Still, it would be interesting to see Verisign conduct a similar study on, say, .windows (Microsoft), .cloud (Symantec) or .bank (Financial Services Roundtable), which are among the 35 gTLDs with “uncalculated” risk profiles that Verisign promised to provide back-end registry services for before it decided that new gTLDs were dangerous.

You can read Verisign’s letter and presentation here. I’ve rotated the PDF to make the presentation more readable here.

Artemis plans name collision conference next week

Kevin Murphy, August 16, 2013, Domain Tech

Artemis Internet, the NCC Group subsidiary applying for .secure, is to run a day-long conference devoted to the topic of new gTLD name collisions in San Francisco next week.

Google, PayPal and DigiCert are already lined up to speak at the event, and Artemis says it expects 60 to 70 people, many of them from major new gTLD applicants, to show up.

The free-to-attend TLD Security Forum will discuss the recent Interisle Consulting report into name collisions, which compared the problem in some cases to the Millennium Bug and recommended extreme caution when approving new gTLDs.

Brad Hill, head of ecosystem security at PayPal, will speak to “Paypal’s Concerns and Recommendations on new TLDs”, according to the agenda.

That’s notable because PayPal is usually positioned as being aligned with the other side of the debate — it’s the only company to date that Verisign has been able to quote in support of its own concerns about name collisions.

The Interisle report led to ICANN recommending months of delay for hundreds of new gTLD strings — basically every string that already gets more daily root server error traffic than legitimate queries for .sj, the existing TLD with the fewest look-ups.

The New TLD Applicants Group issued its own commentary on these recommendations earlier this week, apparently drafted by Artemis CTO Alex Stamos, calling for all strings except .home and .corp to be treated as low risk.

NTAG also said in its report that it has been discussing with SSL certificate authorities ways to potentially speed up risk-mitigation for the related problem of internal name certificate collisions, so it’s also notable that DigiCert’s Dan Timpson is slated to speak at the Forum.

The event may be webcast for those unable to attend in person, according to Artemis. If it is, DI will be “there”.

On the same topic, ICANN yesterday published a video interview with DNS inventor Paul Mockapetris, in which he recounted some name collision anecdotes from the Mesolithic period of the internet. It’s well worth a watch.