Domain Name SEO

Last week I published data on 342,740 domains that I extracted from the dataset I have built for TheOpenAlgorithm project.

What I looked at was mostly from a user’s point of view: not very scientific, but pretty significant. I wanted to show what users were used to and found normal in terms of domains: which domain TLD (.com, .org, .net, etc.) they expected, plus some number crunching on average domain lengths.

If you don’t have time to read the post, there were two basic findings.

  • If you’re buying a domain, get a .com. It turns out .com domains are heavily represented in the dataset, with 78% of the domains I looked at being a .com. Even if you want a local TLD (e.g. .fr or .ie) or you want to go for .org, .net, .info, etc., it’s really important that you try to get your hands on the .com version of the domain. Users are so used to it that they assume a domain is .com unless it’s particularly obvious to them that it isn’t.
  • The second finding, if less interesting, was that the average domain length was 15 characters, with the most common length being 8 characters. Obviously common sense applies here: shorter = better, but the domain must make sense to your audience.

 

Today I’m publishing my first correlation data!

If you’re not sure what a correlation study is or what it’s good for, read this article!

A correlation shows the size of the relationship between ranking well and whatever factor I am testing.

All correlations are between -1 and +1. A negative (-) number means a negative relationship, i.e. as you do more of it your rankings are likely to suffer. A positive number means that as you do more of it your rankings are likely to improve.

The closer the correlation is to either extreme (-1 or +1), the stronger the relationship, i.e. the more powerful/important the factor.

For example, a 0.8 correlation indicates a stronger relationship than a 0.3.
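To make the scale concrete, here is a minimal sketch of how a correlation between a factor and ranking could be computed. It is only an illustration: the choice of Spearman's rank correlation, the function names and the five made-up results are all assumptions, not the method or data behind this study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat an undefined correlation as "no relationship".
        return 0.0
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, smallest value first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = shared
        start = end + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data: five results for one query, position 1 = top of the page.
positions = [1, 2, 3, 4, 5]
factor = [9, 7, 8, 3, 1]  # made-up factor values, one per result

# Negate the positions so that a higher score means a better ranking.
ranking_score = [-p for p in positions]
correlation = spearman(factor, ranking_score)
```

With these made-up numbers the factor mostly rises as the ranking improves, so `correlation` comes out strongly positive; reversing the `factor` list flips the sign.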

Correlation studies aren’t perfect and don’t prove causation. If you haven’t heard of correlations before, then this post on what they are, how to do one and what their weaknesses are will be very helpful.

I’ll be looking at a few factors at a time and then publishing the data on that set of factors. I’ll be posting more data on various factors over the coming weeks and months, and I hope to break them down into actionable chunks and sections within the algorithm so that SEOs/readers can digest them easily.

But today I’m looking at the factors that SEOs and new webmasters should consider before they buy a domain name.

Domain TLD

Here’s the correlation data for the TLD a domain had.

Chart: Domain TLDs/Endings Correlations (domain TLD correlation data)

 

The correlations are very low, which likely means that there is little or no relationship, but the minor correlations do show a mild preference for .com and .org domains, while .net, .info and .us were negatively correlated.

It’s hard to say what is causing this small correlation. We know Google has taken action against individual TLDs in the past, and it seems probable that some TLDs are treated as more spammy or more credible than others, but the small correlations seen above are more likely to reflect a combination of Google’s preferences in domain TLDs and the types of sites that register each TLD.

While the correlation data above isn’t that significant, it does seem to be in line with other data I have seen (some of which is below), which leads me to believe that there is merit in it. If you were down to a straight shoot-out between two domain TLDs (and couldn’t get both), this would be a neat tie-breaker.

Probably not something to get too worried about, but interesting data.

As you will see throughout this post .com domains are way out in front and .org domains are the next best, but it appears that beyond those you are in the danger zone unless you are registering a local TLD like .fr or .co.uk.
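As a rough illustration of how a TLD can be turned into a number that a correlation can work with, the sketch below marks each result with a 1 if it is a .com and a 0 otherwise, then correlates that indicator with ranking. Everything here is an assumption made for the example: the hostnames are invented, `tld_of` is a naive helper that only looks at the text after the last dot (so `example.co.uk` would give `uk`), and a plain Pearson correlation stands in for whatever measure the study actually used.

```python
from math import sqrt


def tld_of(host):
    """Naive TLD: the text after the last dot, lowercased."""
    return host.lower().rsplit(".", 1)[-1]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Assumption: treat an undefined correlation as "no relationship".
        return 0.0
    return cov / sqrt(var_x * var_y)


# Hypothetical results for one query, listed from position 1 downwards.
hosts = ["example.com", "example.org", "sample.com", "example.net", "example.info"]
positions = [1, 2, 3, 4, 5]

# 1 if the result is a .com, 0 otherwise.
is_com = [1 if tld_of(h) == "com" else 0 for h in hosts]

# Negate the positions so that a higher score means a better ranking.
ranking_score = [-p for p in positions]

com_correlation = pearson(is_com, ranking_score)
```

In this invented list the two .com results sit near the top, so `com_correlation` is positive. Repeating the same steps with a different indicator (for example `tld_of(h) == "info"`) gives one number per TLD.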

Exact Match Domains

One of the most heralded and contentious factors among SEOs is exact match domains, i.e. domains that are the same as what the user searched for, e.g. keyword: “the open algorithm”, exact match = http://www.theopenalgorithm.com.
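A minimal sketch of how such a check could work is shown here. The helper names are hypothetical and the rules are assumptions: the keyword is lowercased and stripped of spaces and punctuation, a leading `www.` is dropped, and only the first label of the hostname is compared, which ignores subdomains and multi-part TLDs such as .co.uk.

```python
from urllib.parse import urlsplit


def domain_name_of(url):
    """Return the first label of the hostname, without a leading 'www.'."""
    # urlsplit only recognises the host when a scheme or '//' is present.
    if "//" not in url:
        url = "//" + url
    host = urlsplit(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host.split(".")[0]


def is_exact_match(keyword, url):
    """True if the keyword, with spaces and punctuation removed, equals the domain name."""
    squashed = "".join(ch for ch in keyword.lower() if ch.isalnum())
    return squashed != "" and squashed == domain_name_of(url)
```

For instance, `is_exact_match("the open algorithm", "http://www.theopenalgorithm.com")` is true under these rules, while `is_exact_match("SEO", "seomoz.org")` is not.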

Previous studies by SEOMoz have shown it to be a very powerful factor.

There are a number of points to be aware of before reviewing the findings.

Exact match domains make sense from a user’s point of view. If a domain is an exact match for the search query then it is likely to be relevant to the query. In addition it is possible that the user is searching specifically for that site.

There’s a potential anchor text boost in having an exact match domain. When you own one, it is likely that more sites will link to you with the keyword as the anchor text. For example, I would imagine that tons of webmasters link to SearchEngineLand just as I did there, with the site’s name as its anchor text.

As a result there is the potential for the correlation for exact match domains to be slightly inflated due to its presumed benefit to anchor text related factors, unless Google have a clause in their algorithm to negate this effect.

If a domain belongs to a company or some other organisation, then surely the fact that it is an entity is a better reason to show it than purely the fact that it is an exact match domain. For example, seomoz.org is highly relevant to the search query “seomoz”, but is seo.com the best result for the search query “SEO”?

We know that Google do a reasonable job of figuring out whether a domain is an entity or not, which some SEOs believe is the reason why exact match domains do so well and why, as an SEO tactic, they might not be as powerful as the data suggests. This is down to the fact that domains that are entities, e.g. “seomoz.org”, are exact matches when people search for that entity. The argument is that these entity searches skew the results for all the non-entity domains that happen to be exact matches in these types of studies and in my dataset.

Having said all that, I’ll let the data talk and you can infer your own conclusions.

Chart: Exact Match Domains

 

Comments: These correlations are really interesting and very significant. They are pretty much in line with SEOMoz’s results, although SEOMoz only tested for exact match and exact match .com domains. I have run tests on other, less influential factors, and comparing the above data to those results and to the SEOMoz correlation study, it seems that exact match domains are one of the most powerful factors, but they seem to have declined in significance over the last few years.

SEOMoz reported a 0.38 correlation for .com exact matches in their original study and then a 0.22 correlation in their latest; my data shows a further decline, which seems important to note.

I would speculate that this decrease in the correlation of exact match domains can be attributed to Google refining their algorithms to detect when an exact match domain is bought for ranking benefits and when it is an entity that deserves the push up in the rankings.

The correlation data broken down into the various domain TLDs is important because beyond .com and .org exact matches there is a significant drop off in terms of a relationship between having one and ranking better in Google.

There are a couple of reasons why this may be the case:

  • Google may value these domain TLDs less and implement algorithms to penalise them (or reward others more) for exact match domains.
  • The domains with the less common TLDs may have been bought because there is less demand for them and therefore getting a domain that the webmaster believes will rank well is easier. Thus it would be other algorithmic factors (potentially the entity extraction ones) that would penalise the non .com exact matches.

This information is very important if you are registering a domain. It seems highly likely that something, either directly in the algorithm or an indirect factor, is causing .com exact matches to rank significantly higher than their counterparts.

I parsed the search results from proxies within the US, which doesn’t really have a national TLD (although .us is its official one, in practice it’s not in widespread use and doesn’t hold the value of other local TLDs), so it’s hard to tell if other local TLDs, for example .co.uk or .ie, which are more popular in their home countries, would have similar correlations to .com exact match domains.

The correlation would most likely vary based on the country and how the local TLD is used and managed in that country.

In the future I hope to run crawls from proxies within other English speaking countries with these prominent local TLDs, and will then be able to answer that question.

Note: There wasn’t anywhere near as much data available for .info and .us domains as for the other TLDs, and thus the margin of error is likely to be higher. I have a very large dataset (1.2 million URLs), and because of this and the fact that the results seem in line with both common sense and the TLD correlation data above, I suggest that they are relatively accurate but likely to be slightly less accurate than the .com, .org and .net correlations.

Hyphenated Exact Match Domains

Ah yes, the good old hyphenated exact match. Often portrayed as the next best option if you can’t get that exact match .com domain.
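For clarity, here is one way a hyphenated exact match could be detected. This is a sketch under my own assumptions rather than the rule used for the data: the function name is hypothetical, it takes the domain name without its TLD, and it counts any domain that contains at least one hyphen and equals the keyword once the hyphens are removed.

```python
def is_hyphenated_exact_match(keyword, domain_name):
    """True if domain_name has a hyphen and matches the keyword once hyphens are removed."""
    squashed = "".join(ch for ch in keyword.lower() if ch.isalnum())
    name = domain_name.lower()
    if "-" not in name or not squashed:
        return False
    return name.replace("-", "") == squashed
```

Under these rules `the-open-algorithm` is a hyphenated exact match for “the open algorithm”, while `theopenalgorithm` is a plain exact match and is not counted here.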

Let’s see if that portrayal has merit:

Chart: Correlation Data for Hyphenated Exact Match Domains

 

Wow. That’s really interesting. Hyphenated exact match domains are just nowhere near as correlated as exact match domains without the hyphens.

Plus they aren’t very user friendly, so maybe it’s time to rethink our strategy on hyphens?

The only potential caveat in comparing this with the non-hyphenated correlation data is that an entity is even less likely to use hyphens in its domain name.

Once again we see .coms on top with .org in second.

Note: due to insufficient quantities of data I didn’t test .info or .us domains for this factor.

More Data

Now we’re delving into some more unique factors.

Chart: More Domain Related Correlation Data

A partial match domain is when you have the keyword in your domain but it’s not an exact match domain, e.g. “tech” and techcrunch.com.

The partial match ratio means the percentage of the domain that is taken up by the partial match. With the TechCrunch example it would be: “tech” = 4 characters, domain (“techcrunch”) = 10 characters, partial match ratio = (4/10)*100 = 40%.
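The calculation above can be sketched in a few lines. The function name is hypothetical, and two details are assumptions on my part: the keyword is lowercased and stripped of spaces and punctuation before comparing, and the domain name is passed in without its TLD.

```python
def partial_match_ratio(keyword, domain_name):
    """Percentage of the domain name taken up by the keyword (0.0 if not a partial match)."""
    squashed = "".join(ch for ch in keyword.lower() if ch.isalnum())
    name = domain_name.lower()
    # An exact match, or a keyword that is absent, is not a partial match.
    if not squashed or squashed == name or squashed not in name:
        return 0.0
    return 100 * len(squashed) / len(name)
```

Calling `partial_match_ratio("tech", "techcrunch")` reproduces the 40% from the TechCrunch example.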

 

Note: the * beside “partial match domain”, “partial match ratio” and “keyword is first word in domain” means that I have excluded exact match domains in calculating the correlation for these factors. This is just common sense, because Google wouldn’t reward exact match domains twice for a very similar factor.

Also note that the negative correlations for the number of characters and hyphens in the domain name mean that as the number of characters/hyphens in the domain name increases, the ranking of that domain decreases, i.e. longer/more = worse, shorter/fewer = better.

I suspect that the number of characters in a domain name is not something Google worries about unless it penalises very long domains. But what this large negative correlation most likely shows is that there are other factors that are impacted by having a long domain.

For example, the social shareability of your domain is reduced, because in social cyberspace shorter = better. Also, websites probably lose out on that brand factor or potential type-in traffic due to the increased character length.

With the low correlations for partial match domains and the partial match ratio, it appears as if having the keyword in some of your domain isn’t very beneficial. It’s either exact match or forget it.

Summary

There’s a ton of data to digest here with 20 factors tested.

.com domains, be they tested on their own or as part of exact and hyphenated exact match domains, came out on top, with .org consistently in second.

If you’re buying a domain, .com came out a convincing winner in both the correlation data and the user/usage data.

Exact match domains are very significantly correlated to ranking well but there is a significant drop off in influence for non .com domains. .org exact matches were relatively well correlated but beyond that there was a continued progression towards nearly no relationship.

It is likely that these exact match domains bought on less popular TLDs, e.g. .info and .net, are either targeted directly by Google for looking suspicious (in that they are more likely to be bought for their ranking potential), or they are bought for their ranking potential and Google penalises them through its entity detection algorithms or through other factors.

Hyphenated exact matches beyond .com ones held nearly no correlation.

Domains with fewer characters and fewer hyphens in the domain name did significantly better.

And having a partial match domain, even a relatively well populated one, had only marginal benefit.

Actionable takeaways

Of course the data doesn’t prove causation, but with some common sense and mental analytics I have come up with a handy list of takeaways for the next time you are buying a domain:

  • Buy .com, .org, or a local TLD (.com preferably).
  • Avoid other TLDs like the plague!
  • Search hard for an exact match, but don’t dilute the brand of the site to get one.
  • No hyphens please (unless absolutely necessary).
  • Shorter is way better.
  • If you can’t get an exact match, don’t compromise branding to get a partial match; it’s not worth it (although having your main keyword in the domain name might be a good branding idea).
  • Entities are important, own your space with marketing, PR, clever link building, microdata, etc.

Hope you enjoyed the post and if you have any thoughts, ideas, criticisms, possible explanations please leave a comment below.

21 thoughts on “Domain Name SEO”

    1. Anonymous

      Thanks, I was expecting some difference but the size of difference was much larger than I thought it would be.

  1. Rishi Lakhani

    Sometimes gut feel is just as interesting to analyse as a study – interestingly, most of my experiences with tests show pretty much the same picture as your results here, except for the .net extension. I still get them running at the same strength for ranking as a .org, and in some niches better than a .com.

    1. Anonymous

      Interesting that you see that in your experience. Still, that’s why data’s great, because some of those little beliefs we have may not be true, and big data has a way of dispelling big myths.

      It may be the case that domain TLDs are treated differently in different industries or that the correlation is not due to a direct factor but other interconnected factors.

      Having said that the correlation data isn’t perfect and while it seems to be one of the better models around there are going to be things the study gets wrong or is off by some small correlation point.

  2. Autocrat (Lyndon NA)

    The below assumes that your data set(s) did not have equal quantities of domain types (TLD, keyworded vs non-keyworded, hyphenated vs non-hyphenated etc.).
    (I appreciate that it is highly unlikely you could obtain such things and have unbiased data :D)

    Okay – this is likely to make me unpopular, but…

    You seem to be attempting to state what Users will be comfortable with in the SERPs, based on volume of various TLD types.

    Of course there are a higher number of .coms in the results;
    * They are older
    * They have been around longer than many other ccTLDs
    * They are constantly touted as being more trusted.
    * They are often pushed (falsely) as being more SE friendly.

    Further – how can you base a statement of what Searchers will be happier with based on volume present … surely such a statement should be based on research with actual searchers/users?

    .

    For those reading this – there is No Direct Ranking Factor in TLD type.
    Google makes no distinction between a .com or a .net or a .co.uk.

    .

    What’s next … Hyphens.
    Is it not possible that due to points similar to the .com above, that there are fewer such domains?

    .

    At the end of the day – you are looking at results that are skewed due to, well, information like that presented here.

    It basically becomes a self-fulfilling prophecy :(

    +

    Your link to SearchEngineLand is malformed (you’re missing the TLD).

    1. Mark Collier

      Thanks for your comments and thoughts. I’m not quite sure you understand the idea and methodology behind a correlation study and its purpose; I would recommend you read: http://www.theopenalgorithm.com/the-project/the-science-of-correlation-studies/

      But you do ask some good questions that other people may have, so here are my answers:

      Question: “The below assumes that your data set(s) did not have equal quantities of domain types (TLD, keyworded vs non-keyworded, hyphenated vs non-hyphenated etc.).”

      Answer: Such a dataset would not represent Google’s search results or algorithm and therefore would poison the dataset and the results. It is a correlation study, and therefore you want results as Google sees them, not as some sort of unbiased, evenly distributed outlook of the web.

      Question: “You seem to be attempting to state what Users will be comfortable with in the SERPs, based on volume of various TLD types.
      Of course there are a higher number of .coms in the results;
      * They are older
      * They have been around longer than many other ccTLDs
      * They are constantly touted as being more trusted.
      * They are often pushed (falsely) as being more SE friendly.
      Further – how can you base a statement of what Searchers will be happier with based on volume present … surely such a statement should be based on research with actual searchers/users?”

      Answer: OK, so the first part, about the reasons why .com domains are more popular, is true, but I’m not trying to prove why they are more popular; I am merely stating that they are more popular.

      The second part, about using volume as a means to determine what users are used to, is also a fair point. A better test would be some sort of study into user behaviour. But failing that, all I have done is state what they are used to: with more .coms being seen by users, you can say that they are more used to .com domains, and it is well known that humans are comfortable with what we are used to.

      Question: “For those reading this – there is No Direct Ranking Factor in TLD type.
      Google makes no distinction between a .com or a .net or a .co.uk.”

      Answer: First of all, I never said there was a direct ranking factor; I am merely publishing the data. Second, neither of us can say for sure if there is a ranking factor.

      But I must say that you are totally incorrect that Google makes no distinction between TLDs. Why do .ie domains do well in Ireland and not so well in the USA? Why do .co.uk domains do well in the UK and not Ireland?

      It almost definitely has something to do with domain TLDs!

      Question: “Is it not possible that due to points similar to the .com above, that there are fewer such domains?”

      Answer: Yes, there were fewer such domains, but that would have no impact on the correlation data because correlations aren’t based on volume but on ranking. Again, I refer you to the above article or to Googling correlation studies.

      Question: “Your link to SearchEngineLand is malformed (you’re missing the TLD).”

      Answer: The link was for demonstration purposes, to show how SOME webmasters would link to SearchEngineLand (others, perhaps you, would link with the TLD).

      Question: “At the end of the day – you are looking at results that are skewed due to, well, information like that presented here.
      It basically becomes a self-fulfilling prophecy.”

      Answer: Please read up on correlation studies, how they are conducted and how they work. I have taken several serious steps to ensure the integrity of my dataset.

      1. Autocrat (Lyndon NA)

        Nice answers.

        I’m not sure you understand the problem of bias in the purchase process, nor the knock on effect that has had on the SERPs, and thus the data you have.
        :D

        Seriously – the general point I’m making (above and now) is that the TLDs have no direct ranking influence.
        (and yes – I can say that – as it’s been covered with Google numerous times in the past … even directly by myself)

        The only direct factor that a TLD will have in regards to Ranking is Global vs Geo.
        Everything else is merely coincidental.

        That said – a TLD (be it G or CC) may influence searchers/visitors … but no idea how many searchers actually look at the URL in the SERPs.

        So though it’s a nice study – it has little practical use/value.
        If it was a ranking factor – it would be fantastic though.
        So don’t think of this as being a knock, it’s not.
        It’s solid work, and very well done!
        I hope you can keep it up.

        1. Mark Collier

          I have to disagree.

          While I agree that MOST LIKELY general domain TLDs aren’t a direct ranking factor, it is true as you mentioned that country specific TLDs are a direct ranking factor and it is also true that Google have taken direct action on TLDs before: http://www.dailyblogtips.com/google-removed-all-co-cc-domains-from-its-index/

          It is also true that I have stated that the correlation data doesn’t prove causation.

          Additionally, neither you nor I can tell what Google are doing, so nobody can say with absolute certainty what is a factor and what isn’t.

          I have to counter the claim that the study has little practical application. It does: think about it, if you were buying a domain today, you would probably want a .com exact match, and if that wasn’t available, you probably wouldn’t worry about a partial match or a hyphenated match and would just look at the strength of the brand.

    2. Simon Dalley (SEO Consultant)

      I disagree with your points on hyphens – I remember years ago when all you had to do was buy a domain with hyphens in it and you could get a rank. It was getting to be a real problem in the rankings because everyone was doing it and Google’s index was getting full of these junk websites. I am sure Google targeted the domains containing hyphens; I don’t see hyphens coming back in the serps anytime soon.

      I also think that Google weights .coms more than any other TLD – I’ve seen this in our results. I normally advise clients to choose a .co.uk domain (us being in the UK), but recently I’ve found that a .com with a partial match keyword combination can far outstrip the performance of the .co.uk domains in the UK.

  3. Cuyler Pagano

    Wow great research project. This has some interesting results I didn’t expect. This will be good knowledge for work with start up companies who haven’t chosen domain names yet or for companies looking for a new domain name if they are branching out or expanding.

  5. José Féria

    The study you’ve done is very good. I have worked on the web since several years before Google appeared, and I have had the privilege of observing the evolution of ranking in Google. One of the factors I have seen lose strength is the use of exact match domains, but either way I see that there are many reasons behind it. One is the excessive use of parked domains – this meant that those who use exact match domains in competitive spaces had to create a site that actually related to the domain name.
    Yet users still prefer the exact match domain in the serps, and Google greatly respects user choice …
    As for users’ preference for dot com domains – we are in the era of the dot com, and not in the era of dot org, dot us, or whatever. When thinking of a domain, we think of dot com, not dot anything. For example, I am interested in SEO, and therefore when I go to the place with the best information (SEOmoz) I still write dot com, even though the domain name has another extension. I’m sure many of you do the same with many sites!

    There is a difference between creating a brand and having a brand … it can reach the same place, but one already exists before it gets there

    booking.com is “booking” because of its dot com domain
    I’m sure SEO.com, if placed in a market with fewer SEO professionals to compete with, would also rank better in the serps!

  6. Marcos Lujan

    Stunning work. The negative correlation on .net, .info and .us not only matches what I’ve been observing but also makes sense. I come across a lot of spammy .info and .net sites but not so many .com or .org sites (proportionally).

    I would love to see you produce figures for domains which have combinations of negative factors. 

    For example, how did domains that were .net AND had 2 hyphens do vs exact match .com sites without hyphens for domains between say 6 and 12 letters long (excluding hyphens).

    I’m willing to bet when negative factors are added (hyphens+.info) the negative factors have a stronger effect overall.

    1. Mark Collier

      Great comment Marcos, I have been thinking about the relationships between factors for a while. I hope to incorporate more of this type of data in my next correlation study, as hopefully I can base these potential interdependent relationships on the findings from this correlation study as your example displays perfectly.

      Thanks

      Mark

  7. Julio Fernandez

    Good article; thanks! Under your takeaway, you have “Avoid other TLDs like the plague!”  but as more ccTLDs have content for areas that are not the corresponding country, Google will have to look at the webmaster tools setting for geolocation of the ccTLD.  They already do that with the ccTLD for Colombia, .CO.  where my .CO could have content for readers in the USA or any other country I select, instead of Colombia.  I would love additional info on ccTLDs.

    1. Mark Collier

      Agreed that from a direct algorithmic implementation point of view the recommendation may be a moot point, but correlations also likely include indirect factors such as user sentiment, etc. that impact direct ranking factors, which in this case supports the recommendation.

  8. abarley

    The results here are very interesting. The negative associations with .info and .net are something that I have personally noticed, as opposed to .org and .com sites, where you don’t seem to get as many spam-enriched sites, so it makes sense.
