Major registries posting “fabricated” Whois data
One or more of the major gTLD registries are publishing Whois query data that may be “fabricated”, according to some of ICANN’s top security minds.
The Security and Stability Advisory Committee recently wrote to ICANN’s top brass to complain about inconsistent and possibly outright bogus reporting of Whois port 43 query volumes.
SSAC said (pdf):
it appears that the WHOIS query statistics provided to ICANN by registry operators as part of their monthly reporting obligations are generally not reliable. Some operators are using different methods to count queries, some are interpreting the registry contract differently, and some may be reporting numbers that are fabricated or otherwise not reflective of reality. Reliable reporting is essential to the ICANN community, especially to inform policy-making.
SSAC says that the inconsistency of the data makes it very difficult to make informed decisions about the future of Whois access and to determine the impact of GDPR.
While the letter does not name names, I’ve replicated some of SSAC’s research and I think I’m in a position to point fingers.
In my opinion, Google, Verisign, Afilias and Donuts appear to be the causes of the greatest concern for SSAC, but several others exhibit behavior SSAC is not happy about.
I reached out to these four registries on Wednesday and have published their responses, if I received any, below.
SSAC’s concerns relate to the monthly data dumps that gTLD registries new and old are contractually obliged to provide ICANN, which publishes the data three months later.
Some of these stats concern billable transactions such as registrations and renewals. Others are used to measure uptime obligations. Others are largely of academic interest.
One such stat is “Whois port 43 queries”, defined in gTLD contracts as “number of WHOIS (port-43) queries responded during the reporting period”.
According to SSAC, and confirmed by my look at the data, there appears to be a wide divergence in how registries and back-end registry services providers calculate this number.
The most obvious example of bogosity is that some registries are reporting identical numbers for each of their TLDs. SSAC chair Rod Rasmussen told DI:
The largest issue we saw at various registries was the reporting of the exact or near exact same number of queries for many or all of their supported TLDs, regardless of how many registered domain names are in those zones. That result is a statistical improbability so vanishingly small that it seems clear that they were reporting some sort of aggregate number for all their TLDs, either as a whole or divided amongst them.
While Rasmussen would not name the registries concerned, my research shows that the main culprit here appears to be Google.
In its December data dumps, it reported exactly 68,031,882 port 43 queries for each of its 45 gTLDs.
If these numbers are to be believed, .app with its 385,000 domains received precisely the same amount of port 43 interest as .gbiz, which has no registrations.
As SSAC points out, this is simply not plausible.
A Google spokesperson has not yet responded to DI’s request for comment.
Similarly, Afilias appears to have reported identical data for a subset of its dot-brand clients’ gTLDs, 16 of which purportedly had exactly 1,071,939 port 43 lookups in December.
Afilias has many more TLDs that did not report identical data.
An Afilias spokesperson told DI: “Afilias has submitted data to ICANN that addresses the anomaly and the update should be posted shortly.”
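SSAC's letter does not spell out how it spotted the duplicates, but this kind of check is easy to sketch. The snippet below is a minimal illustration, not SSAC's actual method: it groups TLDs by their reported port 43 count and flags any count shared by more than one TLD. The function name and the third TLD in the sample are made up for the example; the .app and .gbiz figures are the ones quoted above.

```python
from collections import defaultdict


def find_identical_counts(reported, min_group=2):
    """Return {count: [tlds]} for every count reported by at least min_group TLDs."""
    groups = defaultdict(list)
    for tld, count in reported.items():
        groups[count].append(tld)
    # Keep only counts that several TLDs share; sort names for stable output.
    return {
        count: sorted(tlds)
        for count, tlds in groups.items()
        if len(tlds) >= min_group
    }


# .app and .gbiz use the December figure quoted in the article.
# "hypothetical-tld" and its count are invented purely for illustration.
sample = {
    "app": 68_031_882,
    "gbiz": 68_031_882,
    "hypothetical-tld": 1_234_567,
}

suspicious = find_identical_counts(sample)
print(suspicious)  # {68031882: ['app', 'gbiz']}
```

Run against a full month of registry reports, any large group sharing one number would be a candidate for the "aggregate divided amongst TLDs" explanation Rasmussen describes.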
SSAC’s second beef is that one particular operator may have reported numbers that “were altered or synthesized”. SSAC said in its letter:
In a given month, the number of reported WHOIS queries for each of the operator’s TLDs is different. While some of the TLDs are much larger than others, the WHOIS query totals for them are close to each other. Further statistical analysis on the number of WHOIS queries per TLD revealed that an abnormal distribution. For one month of data for one of the registries, the WHOIS query counts per TLD differed from the mean by about +/- 1%, nearly linearly. This appeared to be highly unusual, especially with TLDs that have different usage patterns and domain counts. There is a chance that the numbers were altered or synthesized.
I think SSAC could be referring here to either Donuts or Verisign.
Looking again at December’s data, all but one of Donuts’ gTLDs reported port 43 queries between 99.3% and 100.7% of the mean average of 458,658,327 queries.
Is it plausible that .gripe, with 1,200 registrations, is getting almost as much Whois traffic as .live, with 343,000? Seems unlikely.
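For anyone who wants to repeat the exercise, here is a rough sketch of the "everything within about 1% of the mean" test. It is my own illustration rather than SSAC's analysis; the function names are invented, the 1% tolerance is taken from the SSAC letter, and the sample counts are hypothetical numbers chosen to sit near the Donuts mean.

```python
from statistics import mean


def deviation_band(counts):
    """Return (mean, lowest ratio to mean, highest ratio to mean)."""
    avg = mean(counts.values())
    ratios = [count / avg for count in counts.values()]
    return avg, min(ratios), max(ratios)


def looks_flat(counts, tolerance=0.01):
    """True if every TLD's count lies within +/- tolerance of the mean."""
    _, low, high = deviation_band(counts)
    return low >= 1 - tolerance and high <= 1 + tolerance


# Hypothetical per-TLD counts, NOT real registry data.
clustered = {"tld-a": 455_500_000, "tld-b": 458_700_000, "tld-c": 461_800_000}
varied = {"tld-a": 12_000, "tld-b": 3_400_000, "tld-c": 458_000_000}

print(looks_flat(clustered))  # True
print(looks_flat(varied))     # False
```

A flat result on its own proves nothing, but when the TLDs involved range from a thousand registrations to hundreds of thousands, it is the kind of pattern that deserves an explanation.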
Donuts has yet to provide DI with its comments on the SSAC letter. I’ll update this post and tweet the link if I receive any new information.
All of the gTLDs Verisign manages on behalf of dot-brand clients, and some of its own non-.com gTLDs, exhibit the same pattern as Donuts in terms of all queries falling within +/- 1% of the mean, which is around 431 million per month.
So, as I put to Verisign, .realtor (~40k regs) purportedly has roughly the same number of port 43 queries as .comsec (which hasn’t launched).
Verisign explained this by saying that almost all of the port 43 queries it reports come from its own systems. A spokesperson told DI:
The .realtor and .comsec query responses are almost all responses to our own monitoring tools. After explaining to SSAC how Verisign continuously monitors its systems and services (which may be active in tens or even hundreds of locations at any given time) we are confident that the accuracy of the data Verisign reports is not in question. The reporting requirement calls for all query responses to be counted and does not draw a distinction between responses to monitoring and non-monitoring queries. If ICANN would prefer that all registries distinguish between the two, then it is up to ICANN to discuss that with registry operators.
It appears from the reported numbers that Verisign polls its own Whois servers more than 160 times per second. Donuts’ numbers are even larger.
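The per-second figure is simple division. As a quick sanity check, assuming a 31-day December, Verisign's rough per-TLD mean of 431 million queries and the Donuts mean of 458,658,327 quoted earlier (the function name is mine):

```python
def queries_per_second(monthly_queries, days_in_month=31):
    """Average query rate implied by a monthly total."""
    seconds_in_month = days_in_month * 24 * 60 * 60
    return monthly_queries / seconds_in_month


# ~431 million is the approximate Verisign figure from the article.
verisign_rate = queries_per_second(431_000_000)
# 458,658,327 is the Donuts mean for December from the article.
donuts_rate = queries_per_second(458_658_327)

print(round(verisign_rate), round(donuts_rate))  # 161 171
```

Those are averages per TLD, so a registry running dozens of TLDs with similar numbers would be answering a multiple of that rate in total.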
I would guess, based on the huge volumes of queries being reported by other registries, that this is common (but not universal) practice.
SSAC said that it approves of the practice of monitoring port 43 responses, but it does not think that registries should aggregate their own internal queries with those that come from real Whois consumers when reporting traffic to ICANN.
Either way, it thinks that all registries should calculate their totals in the same way, to make apples-to-apples comparisons possible.
Afilias’ spokesperson said: “Afilias agrees that everyone should report the data the same way.”
As far as ICANN goes, its standard registry contract is open to interpretation. It doesn’t really say why registries are expected to collect and supply this data, merely that they are obliged to do so.
The contracts do not specify whether registries are supposed to report these numbers to show off the load their servers are bearing, or to quantify demand for Whois services.
SSAC thinks it should be the latter.
You may be thinking that the fact that it’s taken a decade or more for anyone to notice that the data is basically useless means that it’s probably not all that important.
But SSAC thinks the poor data quality interferes with research on important policy and practical issues.
It’s rendered SSAC’s attempt to figure out whether GDPR and ICANN’s Temp Spec have had an effect on Whois queries pretty much futile, for example.
The meaningful research in question also includes work leading to the replacement of Whois with RDAP, the Registration Data Access Protocol.
Finally, there’s the looming possibility that ICANN may before long start acting as a clearinghouse for access to unredacted Whois records. If it has no idea how often Whois is actually used, that’s going to make planning its infrastructure very difficult, which in turn could lead to downtime.
Rasmussen told DI: “Our impression is that all involved want to get the numbers right, but there are inconsistent approaches to reporting between registry operators that lead to data that cannot be utilized for meaningful research.”
ICANN confirms GoDaddy Whois probe
ICANN is looking into claims that GoDaddy is in breach of its registrar accreditation contract.
The organization last week told IP lawyer Brian Winterfeldt that his complaint about the market-leading registrar throttling and censoring Whois queries over port 43 is being looked at by its compliance department.
The brief note (pdf) says that Compliance is “in receipt of the correspondence and will address it under its process”.
Winterfeldt is annoyed that GoDaddy has started removing contact information from its port 43 Whois responses, in what the company says is an anti-spam measure.
It’s also started throttling port 43 queries, causing no end of problems at companies such as DomainTools.
Winterfeldt wrote last month “nothing in their contract permits GoDaddy to mask data elements, and evidence of illegality must be obtained before GoDaddy is permitted to throttle or deny port 43 Whois access to any particular IP address”.
It’s worth saying that ICANN is not giving any formal credibility to the complaint merely by looking into it.
But while it’s usual for ICANN to publish its responses to correspondence it has received and published, it’s rather less common for it to disclose the existence of a compliance investigation before it has progressed to a formal breach notice.
It could all turn out to be moot anyway, given the damage GDPR is likely to do to Whois across the industry in a matter of weeks.
Lawyer: GoDaddy Whois changes a “critical” contract breach
GoDaddy is in violation of its ICANN registrar contract by throttling access to its Whois database, according to a leading industry lawyer.
Brian Winterfeldt of the Winterfeldt IP Group has written to ICANN to demand its compliance team enforces what he calls a “very serious contractual breach”.
At issue is GoDaddy’s recent practice, introduced in January, of masking key fields of Whois when accessed in an automated fashion over port 43.
The company no longer shows the name, email address or phone number of its registrants over port 43. Web-based Whois, which has CAPTCHA protection, is unaffected.
It’s been presented as an anti-spam measure. In recent years, GoDaddy has been increasingly accused (wrongly) of selling customer details to spammers pitching web hosting and SEO services, whereas in fact those details have been obtained from public Whois.
But many in the industry are livid about the changes.
Back in January, DomainTools CEO Tim Chen told us that, even as a white-listed known quantity, its port 43 access was about 2% of its former levels.
And last week competing registrar Namecheap publicly complained that Whois throttling was hindering inbound transfers from GoDaddy.
Winterfeldt wrote (pdf) that “nothing in their contract permits GoDaddy to mask data elements, and evidence of illegality must be obtained before GoDaddy is permitted to throttle or deny port 43 Whois access to any particular IP address”, adding:
The GoDaddy whitelist program has created a dire situation where businesses dependent upon unmasked and robust port 43 Whois access are forced to negotiate wholly subjective terms for access, and are fearful of filing complaints with ICANN because they are reticent to publicize any disruption in service, or because they fear retaliation from GoDaddy…
This is a very serious contractual breach, which threatens to undermine the stability and security of the Internet, as well as embolden other registrars to make similar unilateral changes to their own port 43 Whois services. It has persisted for far too long, having been officially implemented on January 25, 2018. The tools our communities use to do our jobs are broken. Cybersecurity teams are flying blind without port 43 Whois data. And illegal activity will proliferate online, all ostensibly in order to protect GoDaddy customers from spam emails. That is completely disproportionate and unacceptable
He did not disclose which client, if any, he was writing on behalf of, presumably due to fear of reprisals.
He added that his initial outreaches to ICANN Compliance have not proved fruitful.
ICANN said last November that it would not prosecute registrar breaches of the Whois provisions of the Registrar Accreditation Agreements, subject to certain limits, as the industry focuses on becoming compliant with the General Data Protection Regulation.
But GoDaddy has told us that the port 43 throttling is unrelated to GDPR and to the compliance waiver.
Masking Whois data, whether over port 43 or not, is likely to soon become a fact of life anyway. ICANN’s current proposal for GDPR compliance would see public Whois records gutted, with only accredited users (such as law enforcement) getting access to full records.
GoDaddy and DomainTools scrap over Whois access
GoDaddy has seriously limited DomainTools’ access to its customers’ Whois records, pissing off DomainTools.
DomainTools CEO Tim Chen this week complained to DI that its access to Whois has been throttled back significantly in recent months, making it very difficult to keep its massive database of domain information up to date.
Chen said that DomainTools is currently only able to access GoDaddy’s Whois over port 43 at about 2% of the rate it had previously.
He said that this has been going on for about six months and that the market-leading registrar has been unresponsive to its requests to have previous levels restored.
“By throttling access to the data by 98% they’re defeating the ability of security practitioners to get data on GoDaddy domains,” Chen said. “It’s particularly troublesome because they [GoDaddy] are such a big part of DNS.”
“We have customers who say the quality of GoDaddy data is just degrading across the board, either through direct look-ups or in some of the DomainTools products themselves,” he said.
DomainTools customers include security professionals trying to hunt down the source of attacks and intellectual property interests trying to locate pirates and cybersquatters.
GoDaddy today confirmed to DI that it has been throttling DomainTools’ Whois access, and said that it’s part of ongoing anti-spam measures.
In recent years there’s been an increase in the amount of spam — usually related to web design, hosting, and SEO — sent to recent domain registrants using email addresses harvested from new Whois records.
GoDaddy, as the market-share leader in retail domain sales, takes a tonne of flak from customers who, unaware of standard Whois practice, think the company is selling their personal information to spammers.
This kind of Twitter exchange is fairly common on GoDaddy’s feed:
Being bombarded by web developers after purchasing domain frm @GoDaddy
I paid for that domain and u selling my personal info like anything.gotta switch frm godaddy.— Vikas Rawat (@VikasRa87555925) January 12, 2018
While GoDaddy is not saying that DomainTools is directly responsible for this kind of activity, throttling its port 43 traffic is one way the company is trying to counter the problem, VP of policy James Bladel told DI tonight.
“Companies like [DomainTools] present a challenge,” he said. “While we may know these folks, we don’t know who their customers are.”
But that’s just a part of the issue. GoDaddy was also concerned about the amount of resources DomainTools was consuming, and its own future legal responsibilities under the European Union’s forthcoming General Data Protection Regulation.
“When [Chen] says they’re down to a fraction or a percentage of what they had previously, well what they had previously was they were updating and archiving Whois almost in real time,” Bladel said. “And that’s not going to fly.”
“That is not only, we feel, not congruent with our responsibilities to our customers’ data, but it’s also, later on down the road, exactly the kind of thing that GDPR and other regulations are designed to stop,” he said.
GDPR is the EU law that, when it fully kicks in in May, gives European citizens many more rights over the sharing and processing of their private data.
Bladel added that DomainTools is still getting more Whois access than other parties using port 43.
“They have a level of access that is much, much higher than what they would normally have as a registrar,” he said, “but much lower than I think they want, because they want to effectively download and keep current the entirety of the Whois database.”
I’m not getting a sense from GoDaddy that it’s likely to backtrack on its changes.
Indeed, the company also today announced that from January 25 it will start to “mask” key elements of Whois records when queried over port 43.
GoDaddy told high-value customers such as domainers today that port 43 queries will no longer return the registrant’s first name, last name, email address or phone number.
Bulk Whois users such as registrars (and, I assume, DomainTools) that have been white-listed via the “GoDaddy Port43 Process” will continue to receive full records.
Its web-based Whois, which includes a CAPTCHA gateway to prevent scraping, will continue to function as normal.
Bladel said that these changes are NOT related to GDPR, nor to the fact that ICANN said a couple months back that it would not enforce compliance with Whois provisions of the Registrar Accreditation Agreement, subject to certain conditions.