Feb
21
What I Learnt About 342,740 Domains
I recently parsed 12,573 keywords, extracting the top 100 Google results for each one. After cleaning up the data I was left with over 1.2 million web pages and 342,740 unique domains.
For the last week or so I have been looking for interesting information within this mountain of data.
I published data on top domains, sites, Google’s use of Images, Products and News results and some strange URLs I noticed.
This data is part of my project to bring more science to SEO, initially by doing a correlation study into Google’s algorithm.
Domain Data
I was looking into domain related data and I spotted some interesting patterns, nothing groundbreaking but just some stuff you might find cool.
I should have some domain related correlation data out later this week so this is an insight into the domain dataset.
The domain name and the ending for that domain is a really important choice for a new webmaster. When you make that choice you're choosing your brand for life, plus it's a super important decision from an SEO point of view.
There are a couple of considerations to take into account: the user and the search engine. The correlation data I’ll show you later this week should take care of the SEO point of view.
But from a user’s point of view you obviously want a domain that’s memorable, easy to type and easy to link to. And that in and of itself is a big factor in SEO. If people are linking to the wrong domain you’re losing out on valuable link juice. If people can’t type it then you’re losing out on type-in traffic, and if users can’t share it then say goodbye to some social media clicks.
Domain TLDs
The domain ending or TLD (Top Level Domain) is probably the most important part of a website’s address.
There is a technical difference between a TLD and a domain ending. For example .uk is a TLD but .co.uk is not; it’s technically a second-level domain under the .uk TLD. But webmasters and users don’t care about technical definitions, so I’m going to treat them as the same for the purposes of this article.
If you have a really catchy, social, SEO perfect site name it’s useless unless you have the right ending to that great name.
Turns out not that many people are typing www.greatname.washingtondc.museum.
I extracted the domain endings for all the sites in my data and collected a neat list of all the domain endings I could identify in these 1.2 million URLs.
I used a list of all known TLDs from the Mozilla crew (I think?), but I can’t find the link so if you know the list I’m talking about please post the link in the comments section.
Luckily I downloaded and cleaned up the list into a nicely formatted text file so you can iterate through and check for matches if you're running tests yourself.
Update: Thanks to Kris Turkaly who left a link to the list in the comments: http://publicsuffix.org/list/
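If you want to try the matching step yourself, here's a minimal sketch of one way to do it. This isn't the script I ran, just an illustration: the function names and the tiny SAMPLE_SUFFIXES set are made up for the example, and it ignores the wildcard and exception rules in the real list. It simply picks the longest ending that appears in the set.

```python
from urllib.parse import urlsplit

# Tiny illustrative sample (an assumption for this sketch);
# the real list at publicsuffix.org is far longer.
SAMPLE_SUFFIXES = {"com", "org", "net", "uk", "co.uk", "ie", "us"}


def split_host(host, suffixes=SAMPLE_SUFFIXES):
    """Return (name, ending) using the longest matching ending, or (host, None)."""
    labels = host.lower().rstrip(".").split(".")
    # Start with the longest candidate ending so "co.uk" wins over "uk".
    for i in range(1, len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in suffixes:
            return ".".join(labels[:i]), candidate
    return host.lower(), None


def registered_domain(url, suffixes=SAMPLE_SUFFIXES):
    """Return the site name plus its ending (e.g. example.co.uk), or None."""
    host = urlsplit(url).hostname or ""
    name, ending = split_host(host, suffixes)
    if ending is None:
        return None
    # Keep only the label directly in front of the ending (drops "www" etc.).
    return name.split(".")[-1] + "." + ending
```

Calling registered_domain on each URL and collecting the results in a set would give you the unique domains, and the second value from split_host is the domain ending.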
After running my scripts and programs through the data it turns out there were 437 different domain endings in the dataset.
Thinking about it, that’s a pretty small number of TLDs for 1.2 million URLs, but as you will see there is huge dominance by just 3 of the domain endings.
I ranked them in order of number of sites out of the 342,740 that had that TLD. Here’s a handy Excel list of all 437 TLDs in that descending order.
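The ranking itself is just a count and a sort. Here's a rough sketch, assuming you already have each unique domain mapped to its ending (the rank_endings name and the dict layout are my own choices for the example):

```python
from collections import Counter


def rank_endings(domain_endings):
    """Rank endings by number of unique domains, most common first.

    domain_endings: dict mapping each unique domain to its ending,
    e.g. {"example.co.uk": "co.uk"}.
    """
    counts = Counter(domain_endings.values())
    # most_common() returns (ending, count) pairs in descending order of count.
    return counts.most_common()
```

Because the input is keyed by unique domain, a site that shows up for lots of keywords still only counts once towards its ending.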
And a nice graph of the top 5 domain TLDs:
(You can hover over each bar with your mouse and you’ll get the exact numbers)
It was hard to see some of the smaller TLDs, so here are the next 45 (blown up and zoomed in) with .us repeated, i.e. the 5th to 50th most popular extensions. Even within this subset there’s a really huge drop-off from the top of the list. Combine that with the top 5 TLDs and you see a gigantic dominance of the top 2 or 3 domain endings.
Of course this dataset isn’t designed to find the most popular TLDs, but it probably gives a pretty good idea of what users are used to.
Realistically .com and .org are the only global domain extensions you should be going for from a user’s point of view. And even if you own a .org you should be on the lookout for the .com variation.
Domain Length
I thought it would be interesting to see the distribution of domain lengths, so I counted the length in characters of all 342,740 sites without the domain ending.
Out of interest, the average domain name length was 14.75 characters, but the most common length was 8 characters, with around 1 in 12 sites being 8 characters long.
Again you can hover over each circle for exact stats.
Here’s an Excel file based on the above graph with the domain lengths and number of domains with the corresponding length.
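If you'd like to run the same kind of length numbers on your own list, something along these lines would do it. It's a sketch rather than my actual script: length_stats is a made-up name, and it assumes you pass in the names with the domain ending already stripped off.

```python
from collections import Counter
from statistics import mean


def length_stats(names):
    """Summarise name lengths (names must already exclude the domain ending)."""
    lengths = [len(name) for name in names]
    histogram = Counter(lengths)
    # The single most frequent length and how many names have it.
    most_common_length, count = histogram.most_common(1)[0]
    return {
        "mean": mean(lengths),
        "mode": most_common_length,
        "mode_share": count / len(lengths),
        # length -> number of names, sorted by length for easy charting
        "histogram": dict(sorted(histogram.items())),
    }
```

The histogram entry is the same shape as the Excel file: one row per length with the number of domains of that length.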
This post is pretty good at showing what users are used to and are likely to accept. Of course common sense is still required. For example, if you're in Ireland, Irish users are very much at home with .ie domain names, so grabbing both that and the .com domain name might be a good idea.
Again, if you find a really great, catchy name that’s longer than normal then go for it and if you find a short one that ticks all the boxes go for it too.
Hope you enjoyed this post and stay tuned for some correlation data later this week.