SpamHaus now publishing better TLD abuse data

Kevin Murphy, May 20, 2016, 09:48:30 (UTC), Domain Registries

SpamHaus has updated its “10 Most Abused Top Level Domains” list to provide much more useful insight into abuse levels.
Rather than simply showing unexplained percentages of “badness” in each TLD, the spam-fighting organization’s daily report now exposes the hard numbers, in domain terms, underneath.
For example, on today’s list Famous Four Media’s .download is the most-abused TLD with 82% bad domains.
That percentage is based on SpamHaus categorizing as abusive 11,431 of the 13,945 .download domains that crossed its systems.
But the gTLD has 67,500 domains in its zone file, so the actual percentage of abusive domains could be as low as about 17%, much lower than SpamHaus’s 82%.
Whether you think the 82% metric is fair will depend on whether you think SpamHaus’s sample — about 20% of the full .download zone — is representative.
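The three percentages above all come from the same three counts. A minimal sketch of the arithmetic, using only the figures quoted in this post (the variable names and the `pct` helper are illustrative, not anything SpamHaus publishes):

```python
# Figures quoted in the article for .download
flagged = 11_431   # domains SpamHaus categorized as abusive
seen = 13_945      # .download domains that crossed SpamHaus's systems
zone = 67_500      # domains in the .download zone file


def pct(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


spamhaus_rate = pct(flagged, seen)  # share of seen domains that were flagged
coverage = pct(seen, zone)          # share of the zone that SpamHaus saw
floor_rate = pct(flagged, zone)     # floor: assumes no unseen domain is abusive

print(spamhaus_rate, coverage, floor_rate)
```

The last figure is only a floor: it treats every domain SpamHaus did not see as clean, which is the most favorable assumption possible for the registry.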
Some of the other TLDs on its list have even smaller sample sizes.
Minds + Machines’ .work is ranked #2 on the SpamHaus list with 73.3% badness, based on a SpamHaus-seen sample of 6,297 domains, something like 7% of the full .work zone.
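A comparable floor for .work can be estimated from the quoted figures alone, without the exact zone size. This is a rough sketch: the 7% coverage is the approximate figure given above, so the result is approximate too, and the function name is purely illustrative.

```python
def floor_rate(badness_pct, coverage_pct):
    """Lower bound on zone-wide abuse (in percent), assuming every
    domain outside the observed sample is clean."""
    return badness_pct * coverage_pct / 100


# .work: 73.3% badness on a sample covering roughly 7% of the zone
work_floor = floor_rate(73.3, 7)

# Implied number of flagged .work domains in the 6,297-domain sample
work_flagged = round(6_297 * 73.3 / 100)

print(round(work_floor, 1), work_flagged)
```

On these numbers the floor for .work comes out at roughly 5% of the zone, well below the 73.3% headline figure.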
Registries criticized SpamHaus for publishing misleading data when this list was first published in March, and I agreed with them.
Now that the group is publishing empirical data alongside its percentages, the conversation can shift to something along the lines of:
“Is it okay that at least 17% of .download domains are abusive?”
To which the answer, I believe, is a clear “Hell, no.”
The SpamHaus daily report can be found here.

Comments (13)

  1. Luc Rossini says:

    You are misunderstanding the data.
    Certainly .download may have 67,500 domains in its zone file, but only 13,750 active .download domains have been seen by Spamhaus’ systems in the last 2 weeks (the rolling period the charts are based on), which means 53,750 .download domains are simply not in use, i.e: not active… i.e: they are parked.
    Spamhaus does not take parked domains into account.

    • Kevin Murphy says:

      I am not misunderstanding the data. You may be misunderstanding the article.

      • Luc Rossini says:

        OK, just pointing out that your article appears to be saying Spamhaus should include .download’s 53,750 inactive (parked) domains in the formula otherwise the result is based on only “about 20% of the full .download zone”.
        But the fact is, only “about 20%” of the full .download zone is active, meaning “80%” of the full zone file is inactive/parked unused domains. What Spamhaus is saying is: Of the active domains in the full zone file, 82% are abusive.
        By the same reasoning: Car maker makes 1000 cars. Only 100 of those cars are on the road, the rest are all parked in a field waiting to be shipped. Of the 100 cars driving on the road, 82% burst into flames. The car maker then says “Well, it’s not 82% because we made 1000 and the other 900 haven’t burst into flames (‘coz none of them have been driven yet)”… that would be quite disingenuous 🙂

        • Kevin Murphy says:

          We seem to differ as to what constitutes an “active” domain.
          By my definition, an active domain is one that appears in the zone file. By yours, an active domain is a domain that SpamHaus has seen.
          The domains that appear in the zone file are of course a subset of the overall domains that have been registered. I would consider a registered domain that is not in the zone file as inactive.
          But the main thrust of the article is this: before Spamhaus started reporting its sample size, it was possible for people to say “Look, 80% of the domains .EXAMPLE have sold are abusive!” which was not necessarily true.
          Now we have better data, we can have better arguments about the extent of abuse in TLDs new and old 🙂

          • Luc Rossini says:

            Obviously we differ completely on what constitutes an active domain. Spamhaus defines “active” as “actually being used on the internet, not dormant or parked”.
            The abuse solution for Registries then is simple: add one billion random dormant domains to your zone file and insist folks use the full zone file count in any stats concerning visible abuse. Then abuse stats will always be a mere tiny “0.001% abused”. Problem solved! 🙂

          • Kevin Murphy says:

            Parked domains are active by definition. If you visit them in your browser they resolve to web sites. If you click on the ads you see there somebody will get paid. They’re fully functioning commerce web sites.
            They may be low-value web sites but they’re not in any sense “inactive” as I understand the word.
            If a registry were to add a billion dormant domains to their zones just in order to fool Spamhaus’s stats, it would cost them $250,000,000 in ICANN fees, btw.

    • Rubens Kuhl says:

      Luc, could you clarify how SpamHaus sees non-spam active domains? I would imagine devices like SpamPots only getting spam all the time without any ham.

  2. Andrew says:

    Key takeaway: inverse relationship between price and amount of domains used for spam.

  3. Acro says:

    Luc – On the subject of parked domains, I’ve seen this scheme being used by abusive domains: while the “www” variant is indeed parked, another host name is created and serves malware. So far seen this with .xyz domains. The hostnames can vary from “pcrepair” to random strings. Perhaps Spamhaus needs to expand on its definition.

  4. Bret Fausett says:

    The “percentage of abusive” TLDs metric is deceptive, since it only looks at the tip of the iceberg. Overall volume is more reflective of what end-users actually see in their inboxes. The spam rates for higher volume TLDs are com (6.3%), net (7.3%), and info (11.3%). If you multiply those percentages by the sizes of those zone files – even subtracting parked pages or whatever else Spamhaus doesn’t count – you get numbers that make .download look like a tiny gnat in a swarm of abuse.
    If you’re going to publish “percentage of abusive”, you ought to at least also publish “overall abuse” as a companion metric. Percentage of abuse alone really disguises the problem.

    • Kevin Murphy says:

      That’s possibly true for some users (not me — almost all the spam that makes it to my inbox is from new gTLD domains), but if Spamhaus is trying to make the point that some registries are not doing enough to combat abuse (which I think it is) there is probably some value in the “percentage of abuse” numbers.

      • Luis Munoz says:

        Kevin, this might actually be evidence of what Bret’s saying.
        You see, anti-spam filters evolve based on what they see. The fact that your anti-spam filters are leaking new gTLDs could be interpreted as evidence that filter vendors are seeing way more spam from legacy TLDs.
        To put things in perspective, if you took out 100% of all spam using new gTLDs at the cost of 1% effectiveness on detection over legacy TLDs, you would get the same or more spam.
