Links – Huge Correlation Between Link Building and Google Ranking

Links have been an integral part of SEO since Google joined the scene.

But recently link building’s popularity has taken a bit of a hit, with many believing that Google has reduced the weighting of PageRank in its algorithm. According to many within the industry, the emergence of social signals and other factors indicating user satisfaction has eclipsed (or will eclipse) links as the primary ranking factor.

But this speculation hasn’t been mirrored in my data. Over the course of this post we will examine over 40 link related factors, all of which correlate very well, and a number of which are the most heavily weighted factors in my study.

The main finding from this data is how well links correlate with ranking in Google. I have tested over 150 potential ranking factors in 6 categories and, without a doubt, links stand head and shoulders above any other category of factors.

Link building is a bit of an ugly duckling within the industry: everybody knows its importance, but very few are effective in its practice.

Unlike changing title tags, building quality links requires skill, creativity and determination. It’s not easy work and it’s not low-hanging fruit, but based on the data below, it appears to be the most rewarding.

While I won’t discuss link building strategies in this post, I would like to mention that I feel many strategies are extremely inefficient and unproductive and a lot of the theory behind this area of SEO is fundamentally flawed. I will be publishing some more of these ideas, with anecdotal evidence in the future.

The project

The below data is based on a dataset of the top 100 results in Google, for 12,573 keywords.

I have analysed this data using Spearman’s rank correlation coefficient, looking for relationships between individual factors and ranking in Google.
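For readers unfamiliar with the statistic, here is a minimal Python sketch of how a mean Spearman correlation could be computed for one factor across several keywords. It is an illustration, not the code used in the study: the function names, the input layout (each keyword mapped to a list of position and factor-value pairs) and the sign convention (positions are negated so that a positive coefficient means higher factor values go with better rankings) are all assumptions.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; returns None if either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_correlation(serps):
    """Average the per-keyword rho for one factor.

    serps maps keyword -> list of (position, factor_value) pairs
    (hypothetical layout). Keywords where rho is undefined are skipped.
    """
    rhos = []
    for results in serps.values():
        # Assumed convention: negate positions so position 1 ranks highest.
        positions = [-position for position, _ in results]
        factor_values = [value for _, value in results]
        rho = spearman(positions, factor_values)
        if rho is not None:
            rhos.append(rho)
    return mean(rhos)
```

Calling `mean_correlation` with one keyword where the factor falls steadily down the results and another where it rises steadily would average a rho of 1 and a rho of -1, giving 0.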

I have already published some of the results from the study including domain name related factors, on page factors and domain authority signals.

This is all part of a greater project to bring more science to SEO and make it a truly data driven industry.

There are inherent issues with correlations and they don’t prove anything per se, but as I have covered these issues before I won’t rehash old information. If this is your first time on the site, please read this and this.

I would like to thank SEOMoz for providing incredible access to both their amazing Mozscape API, from which the below results are derived and their expertise and advice. In particular I’d like to thank Rand Fishkin, Dr. Matt Peters and the API support team for all their help.

Data

This Excel Spreadsheet provides the keyword by keyword correlation figures from which the above mean correlations are derived.

Breakdown

Google’s algorithm doesn’t just look at how many links there are to a page; it looks at quality signals and website authority indicators, and tries to protect against manipulation.

Basically, just building links isn’t good enough; certain kinds of links are better than others.

Below I have covered the types and areas of link building that are thought to be utilised within the algorithm.

General Links

The correlations for general link counts are significantly lower than those for specific counts such as the # of IPs/Cblocks/Domains/Subdomains.

This supports the fact that Google looks at several factors and classifiers when considering the quality of the source of a link.

While this certainly isn’t an interesting finding, it is important in that it supports a known fact and therefore increases the likelihood that the data gathered and the resulting correlations are correct and represent what’s actually happening within the Google algorithm.

I investigate which particular classifiers and types of links would be best in a link profile, below.

Cblocks and IPs

 

Both the number of unique Cblocks and IPs linking to a site are thought to indicate the diversity of a link profile.

Google want to see a variety of sites “voting” for a website’s content. The weighting of each additional link from the same site is reduced relative to a link from a new source.

Knowing this, many webmasters began to build “lens sites” whose sole goal was to link to the mother site.

It is believed that, to counter this, Google implemented an algorithm that could figure out if a link was coming from the same source (i.e. the same webmaster) as the site that was being linked to.

There are a number of factors that Google likely use in such an algorithm, but it would make sense that Google treat links coming from the same IP or Cblock as more likely to be coming from the same webmaster, and thus marginally less trustworthy.

While the data doesn’t prove or disprove this theory, it does show a higher level of correlation for the # of Cblocks/IPs linking than for a general count of the # of links to a page/site/subdomain. Although the difference is small it could support the above theory.
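To make the three counts concrete, here is a small Python sketch that tallies total links, unique IPs and unique Cblocks from a list of linking IP addresses. It assumes a Cblock means the first three octets of an IPv4 address (a /24 network) and simply skips IPv6 addresses; the function name and the returned keys are hypothetical, and this is not how the Mozscape API computes its figures.

```python
import ipaddress


def link_diversity(linking_ips):
    """Count links, unique IPv4 addresses and unique /24 blocks.

    linking_ips: one IP address string per inbound link (hypothetical input).
    """
    linking_ips = list(linking_ips)
    unique_ips = set()
    unique_cblocks = set()
    for raw in linking_ips:
        ip = ipaddress.ip_address(raw)
        if ip.version != 4:
            continue  # Cblocks are an IPv4 notion; IPv6 is ignored in this sketch
        unique_ips.add(ip)
        # strict=False masks the host bits, e.g. 192.0.2.10 -> 192.0.2.0/24
        unique_cblocks.add(ipaddress.ip_network(f"{ip}/24", strict=False))
    return {
        "links": len(linking_ips),
        "unique_ips": len(unique_ips),
        "unique_cblocks": len(unique_cblocks),
    }
```

Four links from 192.0.2.10 (twice), 192.0.2.77 and 198.51.100.5 would count as 4 links, 3 unique IPs and 2 unique Cblocks, which is why the more specific counts can tell a different story from the raw link count.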

With this data and using some common sense, I would recommend following the current industry practice of building a diversified link profile.

Domains and Subdomains

 

Again the above data further enhances the argument for a diversified link profile.

It also shows a potentially interesting, albeit small, difference between the # of unique domains and the # of unique subdomains linking, with the # of unique domains coming out on top.

While the difference is too small to make a concrete conclusion, such data would certainly point us in the direction of building links from a diversified set of domains, and treating subdomains on the same root domain as related to each other and therefore each additional link from a separate subdomain on the same root domain as slightly less valuable than the link before it.

Links to the page

The above data conforms to the seemingly obvious conclusion that if you want to get a page to rank well, then building links directly to that page is the best way to get that to happen.

While most SEOs will find that stupidly basic, I have seen some SEOs suggest that domain-level links would be more powerful or a better use of time. The data just doesn’t support that strategy if you are trying to increase the ranking of a specific page.

Links to the page’s domain vs. subdomain

 

Interestingly the strong performance of domains vs. subdomains as the source of a link, is not matched in the location/target of a link. If we are to believe that such marginal differences are important, then the data may suggest (as a number of industry watchers have stated) that Google treat subdomains as separate to the root domain in looking at the host’s (which could be the domain or subdomain) authority.

This seems strange, and I may be reading too much into the data but if the above statement was the case, then Google’s treatment of subdomains as separate sources of content would not be matched by their treatment of subdomains on the same root domain as essentially the same source of links.

If such a conclusion were to be made, it would most likely be explained by the likelihood that Google doesn’t just look at whether it’s a subdomain or not, and instead uses much more advanced algorithms to figure out whether a subdomain should be considered part of the same domain.

Thus Google would understand that blogname.wordpress.com is not related to wordpress.com but blog.exampledomain.com is related to exampledomain.com.
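As a toy illustration of that distinction, the Python sketch below groups linking hosts by root domain, except where the root is a platform known to host unrelated sites on subdomains. Everything here is an assumption made for the example: the `SHARED_HOSTS` list is hypothetical, the root is taken naively as the last two labels (which is wrong for suffixes like .co.uk, where a real implementation would consult the Public Suffix List), and nothing is known about how Google actually makes this call.

```python
# Hypothetical list of platforms whose subdomains belong to unrelated owners.
SHARED_HOSTS = {"wordpress.com", "blogspot.com"}


def naive_root(host):
    """Return the last two labels of a hostname (naive; ignores .co.uk etc.)."""
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def source_key(host):
    """Return the name under which links from this host are grouped."""
    host = host.lower().rstrip(".")
    root = naive_root(host)
    if root in SHARED_HOSTS and host != root:
        return host  # treat each hosted blog as its own source
    return root  # otherwise subdomains collapse into the root domain


def count_unique_sources(linking_hosts):
    """Count distinct link sources after grouping related subdomains."""
    return len({source_key(host) for host in linking_hosts})
```

Under these assumptions, `source_key("blogname.wordpress.com")` stays as `blogname.wordpress.com`, while `source_key("blog.exampledomain.com")` collapses to `exampledomain.com`.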

Nofollow vs. Followed

 

Here is a classic case of inter-related factors impacting on each other’s correlations. We know that nofollowed links carry no direct SEO benefit, although they may affect other factors, e.g. someone clicks on a nofollow link and then shares the page on Twitter.

A page with a lot of nofollow links pointing to it is far more likely to have a lot of followed links pointing to it.

This is because different types of links tend to hold standard ratios within a link profile, and any deliberate alteration by a webmaster is only likely to result in a small shift in those ratios.

There are many inter-factor relationships going on in the above data. Nofollow links may indeed carry no search engine benefit, but could still show the strong correlations, as above.
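One way to probe this kind of confounding, which was not part of the analysis in this post, is a first-order partial correlation: how much of the nofollow correlation remains once followed links are held fixed. The Python sketch below applies the standard formula to three pairwise coefficients; the function and parameter names are hypothetical and the numbers in the usage note are invented for illustration, not taken from the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation of x and y with the linear effect of z removed.

    Here x could be ranking, y the nofollow link count and z the followed
    link count. Applying the formula to Spearman coefficients is a common
    approximation (a partial rank correlation).
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("undefined when z is perfectly correlated with x or y")
    return (r_xy - r_xz * r_yz) / denominator
```

With made-up inputs of 0.3 for ranking vs. nofollow, 0.5 for ranking vs. followed and 0.6 for nofollow vs. followed, the partial correlation comes out at zero: in that invented case the nofollow correlation would be fully accounted for by followed links.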

Marginal differences in the correlations shown by different categories of links, e.g. followed vs. nofollowed, may be more important than they appear at face value.

This is why I have read a lot into such small differences.

SEOMoz Metrics

SEOMoz have created a number of algorithms that are meant to mimic Google’s link related algorithms. I don’t know the exact make-up of these algorithms, but I thought it would be interesting to test their performance, to check whether using these metrics as a measure of the success of your link building is a good idea.

If you are interested, here’s the general make-up of these algorithms: MozTrust, MozRank, Domain Authority, Page Authority.
 

Wow! Moz really seem to have done a great job developing their algorithms. In keeping with the above data on the value of page level metrics, Page Authority comes out at an astounding .36 correlation, which is massive, making it the highest correlated factor out of the 150+ I have tested.

Comments

The link related data is, in my opinion, on par with the on page factors as the most interesting and important to the SEO industry. Both lead to the same conclusion: on page factors are far less important than off page factors.

Links aren’t just about SEO

Building links isn’t just an exercise in SEO, it’s also an exercise in marketing. Links can drive a lot of direct traffic from people clicking on them and can also build your brand name.

It’s important to factor the direct traffic value of links into your link building decisions. This is particularly evident where a second, third or fourth link from the same site may seem like a step down in SEO importance but may still provide high value direct traffic.

Links aren’t dead!

If I read another article proclaiming PageRank or link building is dead, I’ll scream. It’s very simple: the scientific data does not support the speculative claims that the value of PageRank or link building has been reduced.

In fact in many cases their level of correlation has increased, not decreased since Moz conducted their 2011 study.

Link related factors are far and away the highest correlated set of factors.

While we in the SEO industry recognise the importance of links, I don’t think we convert this idea into action. I don’t believe that SEOs spend the right proportion of their time on link building. And SEO blogs, conferences and experts certainly don’t talk enough about how to do great link building.

There definitely isn’t enough data available on what the best link building strategies are, with the majority of link related blog posts stemming from speculation, not data driven proof, something I hope to address scientifically through this project.

I welcome presentations like this from Mike King, that back up strategies with solid data.

Bottom line – spend a whole lot more time link building.

21 thoughts on “Links – Huge Correlation Between Link Building and Google Ranking”

  1. Lyndon NA

    “…
    But recently link building’s popularity has taken a bit of a hit, with
    many believing that Google have reduced its weighting of PageRank in the
    algorithm. The emergence social signals and other factors indicating
    user satisfaction have according to many within the industry eclipsed
    (or will in the future will eclipse) links as the primary ranking
    factor.
    …”

    Yup – and most people still think that Public PR scores are solely to do with Link Value/Link Juice … whereas the chances are that the metric is actually a combined score of not only Link juice, but a few other factors – such as Trust (of at least 1 sort, if not 2).
    As for Links being replaced by Social … Links are here to stay.  Social is a balancer, a way to help gauge whether the links are deserved/natural etc.
    You won’t see Link Value disappear – it will simply decline a tad as social picks up a bit.

    G handles links differently than it did a few years ago, and distinctly differently than it did at the start.  They make various adjustments.
    But it’s not just “links” – There’s a ton of additionals, which could include things like;
    what type of link, where it is, what’s next to it, what the Origin/Destination pages are about, what the site(s) are about, how trusted they are in general, how trusted they are for specific subjects, whether the page appears natural or contrived, Link text (or img alt attrib.), title attribute, whether there are numerous links from the Origin to Destination, whether there is reciprocity present, whether the accruement pattern matches natural patterns, whether there were precursors to a rapid acquisition rate, whether there is a natural fall-off rate, the age of the link, the activity on the page/site … etc. etc. etc. etc. etc. etc. and so on.

    So much more than just “links”.

  2. Brian Crouch

    Hi Mr. Collier, 

    Link-building certainly isn’t dead, but the term for it is indeed due for a public image rehabilitation. When a webmaster/site owner creates a brilliant tool, downloadable asset, blog post or slideshow, that isn’t normally called “link-building” in common parlance, though we know that when properly promoted to the right audiences, it will likely attract links (and social shares). SEOs of course say the content is a fundamental first step for link acquisition, thus it’s a step in link-building, but their marketing departments and clients think of it as content creation. They have relegated the term “link-building” to the specific practices of asking/beseeching/begging/negotiating for links, or perhaps more unfortunately using directories, splogs, article sites, to build links.

    I’m a big fan of Michael Martinez and Eric Ward on the topic. 

  3. Mike Blumenthal

    I am not that familiar with the logic behind the construction of the SEOMOZ tools but as I recall the measurements made available were those that correlated well with rank… thus showing that the things it measures correlate well with rank confirms their system design but probably can’t be relied on as an independent proof.

    1. Mark Collier

      I think how they were designed is likely to be a company secret, although if they were designed solely to correlate with rank, they would still have value in that they managed to find a balance in their algorithms to represent the balance in Google’s algorithm relatively well.

  4. Ben Cook

    If the goal of this project really is to bring better math & science to the industry, why do you continue to distribute these studies while ignoring the critiques of people like Sean Golliher who commented on your Domain Authority post?

    You’ve not yet addressed his point that averaging the correlations of 10k results is not an acceptable method, and yet you continue to disseminate your faulty findings.

    The fact that you’re using SEOmoz’s data, while getting help from their “data scientist” and using their faulty methods renders your findings regarding Moz’s metrics completely useless.

    1. Mark Collier

      First of all I have addressed a number of the criticisms of the project. Not the one you mentioned above but many others.

      I have been totally transparent about my dealings with Moz, they provided incredible free access to their API and I spent an hour on Skype with Dr. Matt Peters to ask him questions on what method they followed, how they handled certain data, where they got data like social media shares from, etc.

      Other than that Moz had no involvement in the operation of the study, I wasn’t even in touch with them for approx. 1-2 months when I conducted the research.

      They have been a fantastic source of data and funding and by no means biased the study.

      1. Ben Cook

         I would love for you to address the concerns raised by Sean. He’s consistently shown that your method is flawed.

        Also, you mentioned Moz has been a source of funding. Have you mentioned that previously or disclosed that somewhere? I admit I might have missed that, forgive me if that’s the case.

    2. CandleForex

       If you think the project is flawed, rather than critique it, why not write it yourself?

      Then me and many others have even more info to make up their mind(s).

  5. Neil Ferree

    I will withhold my vote on whether the new Majestic “visual” link metrics Citation Flow and Trust Flow will rival MozRank, but I must commend you on one of the better explanations of how and why links are in no way dead in the water. Strong work!

  6. Alex Irvin

    I haven’t studied data on well-ranking websites in different industries of varying competitiveness.  I have, however, run several experiments in an attempt to get a page to rank for a very competitive term.  Without fail – links from quality sites that are relevant to the query and which point to a site with decent content are what drives high rankings.  Social sharing, on-page, load time, surely help, but in my experience it is the (quality) links – far and away – which had the most impact.  So, yea, good study.

  7. Ted

    Mark,

     

    I love your site and your scientific approach to SEO for Google. I am a bit jealous in the respect that I have thought of launching a similar site and find yours is probably more interesting than mine would be. You are stifling my motivation. This comment is not all accolades however.

    I am perplexed by some of your interpretations of the data in a number of these posts. Maybe I am misunderstanding some of these things. For instance, in many posts you are quick to point out instances where your data reinforces assumptions you had about certain aspects of SEO. You also have pointed out some things that should make us rethink our positions on topics such as whether homepage PageRank matters more than we think.

    Then we come to this post where your data shows that there is a highly positive correlation between nofollow links and high rankings. It also shows that there is a highly positive correlation between followed links and high rankings. According to the math (if your method of analysis is trustworthy) then you have essentially proven a relationship between nofollow links and high rankings. Yet, you go on to explain it away and that maybe it’s just because pages with lots of followed links also get lots of nofollow links. And, maybe nofollow links are still meaningless for SEO.

    Well, if that is in fact the case, then the fact that your statistical analysis shows a high correlation between the nofollow and high rankings would suggest that your entire statistical analysis is severely flawed to the point of being scientifically useless. It is useless because you can only make sense of it if you apply your own prejudices about how you already think ranking for Google works.

    If we are really to believe that this method of studying rankings by correlation is to be even close to accurate, then you can’t just explain away cases where the analysis doesn’t fit with your pre-existing beliefs.

    I am accusing you of doing what politicians do all the time; they shape the interpretations of the data so that it appears to reinforce the positions they want to take on the subject.

    I think you are on to something with your site and your scientific approach. I just think that your scientific methods are nowhere near accurate enough or developed well enough to prove anything. All you are doing thus far is using this data correlation as a neat talking point that brings a person no closer to understanding the algorithms than he would come from simply looking at the top ten results of any search with a handful of metrics about each page and domain ranking in the top ten.

    1. Mark Collier

      Hi Ted

      I sort of agree and disagree with your comment.

      I accept that correlations aren’t pure proof of a ranking factor and do require interpretation to determine whether or not we should look at them as serious factors.

      Obviously this interpretation could be impacted by my personal biases or opinions which may or may not be true. This is why I try to outline my reasoning for believing something about a specific factor that may mitigate the correlation data, so if others see flaws in my reasoning they can be pointed out and the interpretation amended.

      If we were to take correlations at face value without interpretations that would be wrong and sometimes these interpretations can be wrong, so yes this is a flaw with correlations.

      But if you provide sound, logical reasoning for your interpretations of these results then in the majority of cases these correlations provide a really good insight into how Google’s algorithm is likely to work.

