What I looked at was mostly from a user’s point of view: not very scientific, but pretty significant. I wanted to show what users were used to and found normal in terms of domains: which domain TLD (.com, .org, .net, etc.) they expected, plus some number crunching on average domain lengths.
If you don’t have time to read the post, there were two basic findings.
- If you’re buying a domain, get a .com. It turns out .com domains are heavily represented in the dataset, with 78% of the domains I looked at being a .com. Even if you want a local TLD (e.g. .fr or .ie) or you want to go for .org, .net, .info, etc., it’s really important that you try to get your hands on the .com version of the domain. Users are so used to it that they assume a domain is .com unless it’s particularly obvious to them that it isn’t.
- The second finding, if less interesting, was that the average domain length was 15 characters, while the most common length was 8 characters. Obviously common sense applies here: shorter = better, but the domain must make sense to your audience.
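To make the difference between the average and the most common length concrete, here is a minimal sketch in Python. The sample domains are made up for illustration and are not the dataset behind the figures above, and counting only the part before the first dot is an assumption about how length is measured.

```python
from statistics import mean, mode

def domain_label_lengths(domains):
    """Length of each domain name before its first dot, e.g. 'techcrunch.com' -> 10.

    Assumes bare registered domains with no 'www.' or other subdomain.
    """
    return [len(domain.split(".")[0]) for domain in domains]

# Hypothetical sample, for illustration only.
sample = ["seomoz.org", "techcrunch.com", "google.com", "theopenalgorithm.com", "bbc.co.uk"]

lengths = domain_label_lengths(sample)
average_length = mean(lengths)      # pulled upwards by a few long domains
most_common_length = mode(lengths)  # the length that occurs most often
```

A handful of very long domains drags the average up without changing the most common length, which is how an average of 15 and a most common length of 8 can sit side by side.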
Today I’m publishing my first correlation data!
A correlation shows the size of the relationship between ranking well and whatever factor I am testing.
All correlations are between -1 and +1. A negative (-) number means a negative relationship, i.e. as you do more of it your rankings are likely to suffer, while a positive number means that as you do more of it your rankings are likely to improve.
The closer the correlation is to either extreme (-1 or +1), the stronger the relationship, i.e. the more powerful/important the factor.
For example, a 0.8 correlation indicates a stronger relationship than a 0.3.
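For readers who like to see the mechanics, here is a rough sketch of how a correlation between a factor and rankings could be computed. The choice of Spearman’s rank correlation is an assumption (it is a common choice for ranked data), and the positions and factor values are invented. Because position 1 is the best result, the positions are negated so that a positive correlation means "more of the factor goes with ranking better".

```python
from statistics import mean

def ranks(values):
    """Assign 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(xs, ys):
    """Pearson correlation; undefined (division by zero) if either list is constant."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return covariance / (var_x * var_y) ** 0.5

def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: search result positions and a made-up factor score.
positions = [1, 2, 3, 4, 5]
factor = [50, 40, 30, 20, 10]

# Negate positions so that "better ranking" is the larger number.
correlation = spearman(factor, [-p for p in positions])
```

In this toy case the factor falls steadily as the position gets worse, so the correlation sits at the +1 end of the scale. Real data is far noisier, which is why values like 0.3 are already worth a look.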
Correlation studies aren’t perfect and don’t prove causation. If you haven’t heard of correlations before, then this post on what they are, how to do one and what their weaknesses are will be very helpful.
I’ll be looking at a few factors at a time and then publishing the data on that set of factors. I’ll be posting more data on various factors over the coming weeks and months, and I hope to break them down into actionable chunks and sections within the algorithm so that SEOs/readers can digest them easily.
But today I’m looking at the factors that SEOs and new webmasters should consider before they buy a domain name.
Here’s the correlation data for the TLD a domain had.
The correlations are very low, which likely means that there is little or no relationship, but the minor correlations do show a mild preference for .com and .org domains, while .net, .info and .us were negatively correlated.
It’s hard to say what is causing this small correlation. We know Google has taken action against individual TLDs in the past, and it seems probable that some TLDs are treated as spammy and others as credible, but the small correlations seen below are more likely to be a combination of Google’s preferences in domain TLDs and the types of sites that register each domain TLD.
While the correlation data below isn’t that significant, it does seem to be in line with other data I have seen (some of which is below), which leads me to believe that there is merit in it. If you were down to a straight shoot-out between two domain TLDs (and couldn’t get both), this would be a neat tie-breaker.
Probably not something to get too worried about but interesting data.
As you will see throughout this post, .com domains are way out in front and .org domains are the next best, but it appears that beyond those you are in the danger zone unless you are registering a local TLD like .fr or .co.uk.
Exact Match Domains
One of the most heralded and contentious factors among SEOs is exact match domains, i.e. domains that are the same as what the user searched for, e.g. keyword: “the open algorithm”, exact match: http://www.theopenalgorithm.com.
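As a rough illustration of that definition, an exact match check could look something like the sketch below. The function names and the normalisation rules (lowercasing, dropping spaces and punctuation from the keyword, ignoring a leading "www." and the TLD) are assumptions made for the example, not a description of how the data in this post was processed.

```python
import re
from urllib.parse import urlparse

def registered_label(url):
    """Return the host name without a leading 'www.' and without the TLD.

    Simplification: any other subdomain (e.g. 'blog.example.com') is not handled.
    """
    # urlparse only fills in the host when the string contains '//'.
    host = urlparse(url if "//" in url else "//" + url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host.split(".")[0]

def is_exact_match(keyword, url):
    """True if the keyword, with spaces and punctuation removed, equals the domain label."""
    squashed = re.sub(r"[^a-z0-9]", "", keyword.lower())
    return squashed != "" and squashed == registered_label(url)

# Examples using domains mentioned in this post.
open_algorithm = is_exact_match("the open algorithm", "http://www.theopenalgorithm.com")
seomoz = is_exact_match("seomoz", "seomoz.org")
techcrunch = is_exact_match("tech", "techcrunch.com")
```

Here the first two checks come out as exact matches, while "tech" against techcrunch.com does not, because the keyword only covers part of the domain.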
Previous studies by SEOMoz have shown it to be a very powerful factor.
There are a number of points to be aware of before reviewing the findings.
Exact match domains make sense from a user’s point of view. If a domain is an exact match for the search query then it is likely to be relevant to the query. In addition it is possible that the user is searching specifically for that site.
There’s a potential anchor text boost in having an exact match domain. When you own an exact match domain, it is likely that more sites will link to you with the keyword as the anchor text. For example, I would imagine that tons of webmasters link to SearchEngineLand just as I did there, with the site’s name as its anchor text.
As a result, the correlation for exact match domains may be slightly inflated by their presumed benefit to anchor text related factors, unless Google have a clause in their algorithm to negate this effect.
If a domain belongs to a company or some other organisation, then surely the fact that it is an entity is a better reason to show it than the mere fact that it is an exact match domain. For example, seomoz.org is highly relevant to the search query “seomoz”, but is seo.com the best result for the search query “SEO”?
We know that Google do a reasonable job of figuring out whether a domain is an entity or not, which some SEOs believe is the reason why exact match domains do so well, and why as an SEO tactic they might not be as powerful as the data suggests. This is because domains that are entities, e.g. “seomoz.org”, are exact matches when people search for that entity. The argument is that these entity searches skew the results for all the non-entity domains that happen to be exact matches in these types of studies and in my dataset.
Having said all that, I’ll let the data talk and you can infer your own conclusions.
Comments: These correlations are really interesting and very significant. They are pretty much in line with SEOMoz’s results, although they only tested for exact match and exact match .com domains. I have run tests on other, less influential factors, and comparing the above data to those results and to the SEOMoz correlation study, it seems that exact match domains are one of the most powerful factors, but they seem to have declined in significance over the last few years.
SEOMoz reported a 0.38 correlation for .com exact matches in their original study and then a 0.22 correlation in their latest. My data shows a further decline, which seems important to note.
I would speculate that this decrease in the correlation of exact match domains can be attributed to Google refining their algorithms to detect when an exact match domain was bought for ranking benefits and when it is an entity that deserves the push up in the rankings.
The correlation data broken down into the various domain TLDs is important because beyond .com and .org exact matches there is a significant drop off in terms of a relationship between having one and ranking better in Google.
There are a couple of reasons why this may be the case:
- Google may value these domain TLDs less and implement algorithms to penalise them (or reward others more) for being exact match domains.
- The domains with the less common TLDs may have been bought because there is less demand for them and therefore getting a domain that the webmaster believes will rank well is easier. Thus it would be other algorithmic factors (potentially the entity extraction ones) that would penalise the non .com exact matches.
This information is very important if you are registering a domain. It seems highly likely that something, either directly in the algorithm or an indirect factor, is causing .com exact matches to rank significantly higher than their counterparts.
I parsed the search results from proxies within the US, which doesn’t have a widely used national TLD (although .us is its official one, in practice it’s not in widespread use and doesn’t hold the value of other local TLDs), so it’s hard to tell whether other local TLDs, for example .co.uk or .ie, which are more popular in their home countries, would have similar correlations to .com exact match domains.
The correlation would most likely vary based on the country and how the local TLD is used and managed in that country.
In the future I hope to run crawls from proxies within other English speaking countries with these prominent local TLDs, and will then be able to answer that question.
Note: The amount of data available for .info and .us domains wasn’t anywhere near as much as for the other TLDs, and thus the margin of error is likely to be higher. I have a very large dataset (1.2 million URLs), and because of this and the fact that the results seem in line with both common sense and the TLD correlation data above, I suggest that they are relatively accurate but are likely to be slightly less accurate than the .com, .org and .net correlations.
Hyphenated Exact Match Domains
Ah yes, the good old hyphenated exact match. Often portrayed as the next best option if you can’t get that exact match .com domain.
Let’s see if that portrayal has merit:
Wow. That’s really interesting. Hyphenated exact match domains are just nowhere near as correlated as exact match domains without the hyphens.
Plus they aren’t very user friendly, so maybe it’s time to rethink our strategy on hyphens?
The only potential caveat compared with the non-hyphenated correlation data is that an entity is even less likely to use hyphens in its domain name.
Once again we see .com domains on top with .org in second.
Note: due to insufficient quantities of data I didn’t test .info or .us domains for this factor.
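For reference, a hyphenated exact match as discussed in this section could be detected with something like the sketch below. The function name and the rule that the keyword’s words must be joined by single hyphens are assumptions for illustration, and the hyphenated domain in the example is hypothetical.

```python
import re

def is_hyphenated_exact_match(keyword, domain):
    """True if the domain label is the keyword's words joined by hyphens.

    Assumes a bare domain such as 'the-open-algorithm.com', optionally with 'www.'.
    """
    words = re.findall(r"[a-z0-9]+", keyword.lower())
    if len(words) < 2:
        # A single-word keyword has nothing to hyphenate.
        return False
    label = domain.lower()
    if label.startswith("www."):
        label = label[4:]
    label = label.split(".")[0]
    return label == "-".join(words)

# Hypothetical hyphenated domain versus the plain exact match.
hyphenated = is_hyphenated_exact_match("the open algorithm", "the-open-algorithm.com")
plain = is_hyphenated_exact_match("the open algorithm", "theopenalgorithm.com")
```

Keeping this check separate from the plain exact match check means a domain is counted under one factor or the other, never both.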
Now we’re delving into some more unique factors.
A partial match domain is when you have the keyword in your domain but it’s not an exact match domain, e.g. “tech” and techcrunch.com.
Partial match ratio means what percentage of the domain is taken up by the partial match. With the Techcrunch example it would be: “tech” = 4 characters, domain (“techcrunch”) = 10 characters, partial match ratio = (4/10)*100 = 40%.
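The same calculation as a small Python sketch. The function name is made up for the example, and returning 0 when the keyword is absent or when the domain is an exact match is an assumption (exact matches are covered by their own factor).

```python
def partial_match_ratio(keyword, domain_label):
    """Percentage of the domain label's characters taken up by the keyword.

    Returns 0.0 when the keyword is not in the label or when the label is
    an exact match for the keyword.
    """
    squashed = keyword.lower().replace(" ", "")
    label = domain_label.lower()
    if not squashed or squashed not in label or squashed == label:
        return 0.0
    return 100 * len(squashed) / len(label)

# The Techcrunch example from the text: 4 of 10 characters.
techcrunch_ratio = partial_match_ratio("tech", "techcrunch")
```

For the Techcrunch example this gives the same 40% as the arithmetic above.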
Note: the * beside partial match domain, partial match ratio and keyword is first word in domain means that I have excluded exact match domains in calculating the correlation for these factors. This is just common sense because Google wouldn’t reward exact match domains twice for a very similar factor.
Also note that the negative correlations for the number of characters and hyphens in the domain name mean that as the number of characters/hyphens in the domain name increases, the ranking of that domain decreases, i.e. longer/more is worse, shorter/fewer = better.
I suspect that the number of characters in a domain name is not something Google worries about unless it penalises very long domains. But what this large negative correlation most likely shows is that there are other factors that are impacted by having a long domain.
For example, the social shareability of your domain is reduced because in social cyberspace shorter = better. Also, websites probably lose out on that brand factor or potential type-in traffic due to the increased character length.
With the low correlations for partial match domains and the partial match ratio, it appears as if having the keyword in some of your domain isn’t very beneficial. It’s either exact match or forget it.
There’s a ton of data to digest here with 20 factors tested.
.com domains, be they tested on their own or as part of exact and hyphenated exact match domains, came out on top, with .org consistently in second.
If you’re buying a domain, .com came out a convincing winner in both the correlation data and the user/usage data.
Exact match domains are very significantly correlated to ranking well but there is a significant drop off in influence for non .com domains. .org exact matches were relatively well correlated but beyond that there was a continued progression towards nearly no relationship.
It is likely that exact match domains bought on less popular TLDs, e.g. .info or .net, are either targeted directly by Google for looking suspicious (in that they are more likely to be bought for their ranking potential), or they are bought for their ranking potential and Google penalises them with its entity detection algorithms or through other factors.
Hyphenated exact matches beyond .com ones held nearly no correlation.
Domains with fewer characters and fewer hyphens in the domain name did significantly better.
And having a partial match domain, even a relatively well populated one, had only marginal benefit.
Of course the data doesn’t prove causation, but with some common sense and mental analytics I have come up with a handy list of takeaways for the next time you are buying a domain:
- Buy .com, .org, or a local TLD (.com preferably).
- Avoid other TLDs like the plague!
- Search hard for an exact match, but don’t dilute the brand of the site to get one.
- No hyphens please (unless absolutely necessary).
- Shorter is way better.
- If you can’t get an exact match, don’t compromise branding to get a partial match; it’s not worth it (although having your main keyword in the domain name might be a good branding idea).
- Entities are important, own your space with marketing, PR, clever link building, microdata, etc.
Hope you enjoyed the post, and if you have any thoughts, ideas, criticisms or possible explanations, please leave a comment below.