Domain Authority

New Correlation Data Suggests Enhanced Importance of Site-Wide SEO

 

SEOs are huge believers in signals relating to Google’s overall perception of a website.

It makes a lot of sense: if Google can understand that Wikipedia’s articles are typically of a higher standard than eHow’s, then it can make better decisions about the quality and relevance of web pages on those domains.

By using this data, search engines can also make quick decisions about new content published by these sites. Fresh content won’t have gained the links and other time-related ranking signals of an established article, but it may still be relevant to the user. This is particularly true of news or “query deserves freshness” results.

In addition to gathering data that might indicate the quality of the content published on a site, Google is thought to gather data on the geographical location, type of user, industry, etc. that the site targets. Much of this data is difficult, and in many cases impossible, to gather without being Google; a site’s average SERP CTR or bounce rate, for example.

Overall, it is fair to say that Google utilises a range of models to gather and analyse domain-level data pointing to the authority of a website as a whole.

The potential value of domain-level factors to the webmaster is immense: a single site-wide improvement may affect the ranking of several thousand pages on the site. Domain-level SEO offers easy-to-implement strategies that can deliver a much higher ROI than page-by-page work.

What data Google collects, and how much influence it has on the overall ranking of a web page, has been theorised and debated for many years.

Overall, what we will see in this article is that domain authority signals are relatively highly correlated with ranking, and that for the most part the industry’s theories surrounding these factors have been correct, which is refreshing in light of some stunning on-page correlation data.

The study

Over the past two months I have gathered data on 31 domain authority signals for the top 100 Google results for 12,573 keywords.

I have analysed this data using Spearman’s rank correlation coefficient, looking for relationships between individual factors and ranking position in Google.
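
For anyone who wants to reproduce the approach, here is a minimal sketch of the calculation as described above: for each keyword, Spearman’s rho is computed between a factor’s values and the positions of the top 100 results, and the per-keyword coefficients are then averaged into the mean correlations reported below. The data layout and the scipy dependency are my own assumptions for illustration, not the exact pipeline used in the study.

```python
# Minimal sketch: a per-keyword Spearman correlation, averaged across keywords.
# Assumes results_by_keyword maps each keyword to (ranking_position, factor_value)
# pairs for its top 100 results; the real study covered 12,573 keywords.
from statistics import mean

from scipy.stats import spearmanr


def mean_spearman(results_by_keyword):
    """Average Spearman's rho between one factor and ranking position."""
    per_keyword_rho = []
    for results in results_by_keyword.values():
        positions = [rank for rank, _ in results]        # 1 = top result
        factor_values = [value for _, value in results]  # e.g. domain age in days
        rho, _p_value = spearmanr(positions, factor_values)
        per_keyword_rho.append(rho)
    # Note: with position 1 as the best rank, a factor that helps ranking gives a
    # negative rho here; studies usually flip the sign so that "correlates with
    # ranking well" reads as a positive number.
    return mean(per_keyword_rho)


# Toy example with invented numbers, purely for illustration:
sample = {
    "blue widgets": [(1, 3500), (2, 2900), (3, 3100), (4, 800)],
    "red widgets":  [(1, 4100), (2, 1200), (3, 2600), (4, 900)],
}
print(mean_spearman(sample))
```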

I have also studied several other areas of SEO. I have published some of these results (including domain-name-related factors and on-page factors), although others haven’t been made public yet and will be published over the coming weeks.

This is all part of a greater project to bring more science to SEO and make it a truly data driven industry.

There are inherent issues with correlations, and they don’t prove anything per se, but as I have covered these issues before I won’t rehash old information. What I will suggest is that if this is your first time on the site, please read this and this.

I would like to thank Link Research Tools for generously providing me with free access to their highly useful API, from which all the correlations below are derived.

Please note: while domain-level link metrics could have been included in this post, I have decided to deal with all link-related factors in a separate post, which will be published in the near future.

Data

[Chart: Domain Authority Signals]

If you wish to see the keyword-by-keyword correlations behind the mean correlations reported above, feel free to download this spreadsheet with all the relevant data.

Definitions

Here are some handy definitions in case you aren’t sure what some of the above factors are:

  • Domain age is the time since the domain was first registered.
  • PageSpeed rating is Google’s score out of 100 for how well a page performs on several indicators of how quickly it loads. The higher the score, the faster the page.
  • Days to domain expiry is the time until the domain expires and needs to be re-registered.
  • Alexa and Compete rank are both independent measures of how much traffic a site gets. The lower the number, the more traffic the site is supposedly getting.
  • Basic, intermediate and advanced reading levels are Google’s measures of the reading standard of a given page.

 

Trust indicators

[Chart: Domain Trust Indicators]

Google is always trying to figure out how trustworthy a site and its content are. Many theories have emerged as to which factors are likely to affect the trustworthiness of a whole site.

Domain age is a classic. While I am personally sceptical about its use as a direct ranking factor, it does seem to have a strong relationship with ranking well in Google, with a correlation of nearly 0.2, which is highly significant.

How much of this can be written off to the extra time established sites have had to build links and content, and to plain common sense (a site that has run for a significant length of time will only have survived by providing for users’ needs), is hard to determine. Domain age is also a factor that is impossible to manipulate; it is only worth considering when procuring a new web property.

But by saying that it’s impossible to manipulate, I am strengthening the case for Google’s use of the factor. So the truth is, it’s difficult to say whether it is a factor or not. It does correlate well, so I would suggest that if you come across a situation where domain age is being considered, give it some, but not substantial, weight in whatever decision you are making.

Homepage PageRank, and PageRank in general, is one of the most hotly debated topics on the SEO circuit. We all know the PageRank Toolbar’s problems and how unrepresentative it is of the real PageRank that Google calculates and uses within its algorithm.

But at the same time, the social data Google may pull from APIs could be more complete than the data I have access to, and the internal Google link graph is even larger than the gigantic SEOMoz link graph, yet we treat these representations of what Google sees as perfectly good.

My point is not that social data and link counts should be disregarded, but that perhaps some, if not all, of our suspicion about the value of PageRank as a metric is misplaced.

The importance of PageRank is backed up by its mighty performance in the correlation study: it is the highest-correlated domain-level authority signal, at 0.244.

This, together with data on domain-level link metrics which I will be publishing in the coming weeks, has solidified my view that Google certainly weights and utilises domain link popularity in the ranking of content on a site.

Thus it is reasonable to recommend the already popular approach of building links to the homepage and to the domain as a whole.

Whether homepage link building warrants special treatment is dubious. In general I would advise a strategy of building links to the domain as a whole, linking to the homepage only when it feels right and not because of any particular strategy.

Days to domain expiry is an intriguing idea: the theory is that how far into the future a webmaster registers a domain indicates their intent to create a long-term resource for users.

The marginal correlation of 0.089 probably suggests it carries minimal or no weight within the algorithm. That said, it is an easy and inexpensive factor to manipulate, and even a marginal boost in search engine performance would be worth the negligible risk.

There have been theories in the past suggesting it matters most for newly registered sites, which again fits basic common sense.

I can recommend registering your domain for 3+ years as a simple, one-time SEO tactic that may or may not affect ranking but certainly has no significant downside.
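
Both registration-related factors above (domain age and days to expiry) can be read straight from WHOIS records if you want to check where a domain stands. Here is a minimal sketch assuming the third-party python-whois package, whose creation_date and expiration_date fields sometimes come back as lists depending on the registrar; it is an illustration, not part of the study’s tooling.

```python
# Minimal sketch: domain age and days to expiry from WHOIS data.
# Assumes the third-party "python-whois" package (pip install python-whois).
from datetime import datetime

import whois


def first_date(value):
    """Some registrars return a list of dates; take the first one."""
    return value[0] if isinstance(value, list) else value


def registration_factors(domain):
    record = whois.whois(domain)
    created = first_date(record.creation_date)
    expires = first_date(record.expiration_date)
    now = datetime.now()
    return {
        "domain_age_days": (now - created).days if created else None,
        "days_to_expiry": (expires - now).days if expires else None,
    }


print(registration_factors("example.com"))
```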

Site size

[Chart: Site Size Correlation Data]

Alexa and Compete rank: I doubt that the amount of traffic a site gets is a ranking factor, but the significant correlations here may point to a deeper tendency for larger sites to do well in Google.

Whether this is due to ranking factors that favour larger sites, to these sites performing better on non-discriminative factors, or to something else, is worth pondering.

What I will say is that, in general, sites are large because they are useful to users, and it’s a search engine’s job to try to find sites that are helpful and useful for users.

The same logic should hold for the number of a site’s pages in Google’s index. While this is highly unlikely to be a direct ranking factor, it is perhaps an indicator of other factors actually implemented in the algorithm.

Taken at face value, the data would suggest, somewhat surprisingly, that larger sites perform worse, although the reliability of the indexed-page counts Google provides appears to have affected the results.

I would like to test this factor and other similar indicators further before drawing a definite conclusion.

Geographic targeting

[Chart: IP Location of Web Server]

The near-random correlations for the geographic location of host servers are not surprising and, in fact, not very interesting at all.

I tested it purely to check whether there was any significant correlation, but I didn’t expect one, as the searches from which these correlations are drawn were conducted on Google.com.

Geographic targeting is largely thought to come into play outside the USA. In the future I hope to conduct studies on non-US versions of Google and recheck this factor; in the meantime the data is inconclusive, and the industry’s current thinking on server location should be followed.

Reading Levels

[Chart: Homepage (Google) Reading Levels]

The data here is somewhat flawed: Link Research Tools didn’t return data for a significant number of domains on this factor, and homepage reading levels may not match page-level reading levels. Even so, the idea of such a factor, and testing it, is very interesting.

It is something that I believe Google uses in the personalisation of search results. For example, if Google has figured out you are an eight-year-old, then maybe you don’t want Shakespeare or research papers returned; you want content written in the language that you, as an eight-year-old, use. Not to mention that not many eight-year-olds are searching for “Macbeth” or “quantum physics”.

A broad correlation study can’t support a recommendation on what language you as a webmaster should use, but it is an interesting topic and something to consider when you are writing. Who is your audience, and are you writing in their language?
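
Google’s reading-level classifier is proprietary, so there is no public way to query the exact levels charted above. When auditing your own copy, a common stand-in is a readability formula such as the Flesch-Kincaid grade level; the rough sketch below (with a deliberately naive syllable counter) is only a proxy, not what Google measures.

```python
# Rough sketch: Flesch-Kincaid grade level as a stand-in for "reading level".
# This is NOT Google's classifier, just a widely used readability formula.
import re


def count_syllables(word):
    """Crude estimate: count runs of consecutive vowels in the word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_kincaid_grade(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    # Standard Flesch-Kincaid grade-level formula.
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59


# A lower grade means simpler language; most useful for comparing your own pages.
print(flesch_kincaid_grade("The cat sat on the mat. It was a sunny day."))
```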

Registrar

[Chart: Domain Registrars]

This was a rather cheeky test. It was never likely to reveal a ranking factor; it is more likely to reflect the success achieved by sites registered through the registrars above.

I wasn’t surprised to see GoDaddy with the worst correlation, as its add-on products and clientele don’t exactly suggest quality or high editorial standards; not that many registrars do.

Once you understand and are disciplined in your implementation of SEO and general website-ownership standards and strategies, the registrar you choose shouldn’t affect your ranking. But if you are new to the game or easily led astray, then a registrar and host that promotes these standards may prove a more fruitful path.

Miscellaneous

[Chart: Other Domain Authority Signals]

The PageSpeed result is important: it suggests that if a site follows good principles with regard to how its content loads, it will be rewarded with higher rankings. Tests on a page-by-page basis would be even more conclusive, but this reasonably high correlation for homepage-level PageSpeed vindicates some of the excitement generated by Google’s announcement that it uses site loading speed in rankings.
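
If you want to check your own homepage against this factor, the score can be pulled programmatically. Below is a minimal sketch assuming Google’s public PageSpeed Insights v5 REST endpoint and its lighthouseResult response shape; the modern API reports performance on a 0 to 1 scale, converted to 0 to 100 here, and the exact field paths are my assumption rather than something verified in this study.

```python
# Minimal sketch: fetch a PageSpeed performance score for a homepage.
# Assumes the PageSpeed Insights v5 REST endpoint and its response layout.
import requests

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"


def pagespeed_score(url, api_key=None):
    params = {"url": url, "strategy": "desktop"}
    if api_key:  # optional; an API key mainly raises quota limits
        params["key"] = api_key
    response = requests.get(PSI_ENDPOINT, params=params, timeout=60)
    response.raise_for_status()
    data = response.json()
    # Lighthouse reports the performance category score in the 0 to 1 range.
    return data["lighthouseResult"]["categories"]["performance"]["score"] * 100


print(pagespeed_score("https://example.com/"))
```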

The incredibly large correlations for both total and nofollowed external links on the homepage of a site are puzzling to say the least, although the internal data seems more explainable.

While I have some ideas about what may be causing such large correlations, primarily around the type of site that would link to another website from its homepage, I have no real explanation. If you have an idea, a guess, or have seen this in the field, then please leave a comment below the post.

Social metrics

[Chart: Homepage Level Social Media Metrics]

Wow! I saved the best till last.

Some super interesting social media correlations, with the general theme being that social media is really important.

The fact that Facebook and Google+ links to the homepage of a site are the lowest correlated of the bunch is rather strange. The Facebook result could be explained by a possible block on Google accessing Facebook data. But Google+?

Perhaps this indicates that homepage social media shares are not used as a ranking factor, but that the other social networks have user bases so good at recommending quality content that their share counts actually act as a measure of the quality of the site as a whole, which would explain the high correlations.

The fact that Google+ has a relatively small user base may also mean that its confounding influence on other factors, such as links arising from the extra traffic that heavy Google+ sharing would send to a site, is minimised.

Another explanation is that Google uses Digg, Reddit and StumbleUpon data more than we realise, and that we should focus more effort on these social networks and Twitter.

But again, I’m not certain what these correlations mean. If you have any ideas about them, or you have seen Reddit, Digg or StumbleUpon marketing result in increased rankings for your site, then please leave a comment below.

Further study of these factors on a page level basis would tell us more about these speculations.

Summary

The correlations for domain-level authority signals are noticeably higher than those seen for on-page factors.

Domain-level factors are ideal starting points for an SEO and often provide a one-time, easy change that could, based on the above results, have a substantial impact on ranking.

Even if you disregard the individual factors above as ranking signals, it would still be more than fair to conclude that domain-level SEO is very powerful and that you should constantly be trying to improve the domain through site-wide enhancements.

Some of the results, in particular the social metrics and homepage links, are somewhat puzzling, and I am looking forward to hearing what people think are the likely causes of such strong correlations.

I will be publishing the link related domain authority factors in the coming weeks, so stay tuned.

11 thoughts on “Domain Authority”

  1. Winooski

    Quite enlightening, to say the least.

    I’m particularly interested in the PageRank correlation. A few years back, Google removed PageRank reporting from their Webmaster Tools because, per Google’s Susan Moskwa (as reported in http://www.seroundtable.com/pagerank-webmaster-tools-12767.html):

    “We’ve been telling people for a long time that they shouldn’t focus on PageRank so much; many site owners seem to think it’s the most important metric for them to track, which is simply not true. We removed it because we felt it was silly to tell people not to think about it, but then to show them the data, implying that they should look at it.”

    What your analysis appears to show, however, is that homepage PageRank value is significantly positively correlated with Google ranking. As the whole “correlation is not causation” motto reminds us, that doesn’t necessarily mean that a homepage with a high-value PageRank *causes* a better rank…but it might be the case that the things a webmaster has to do that yield higher homepage PageRank values (namely, build valuable inbound links to the homepage) also happen to yield higher rankings. So were Ms. Moskwa and her employer misleading us? It would appear to be the case.

    Sidebar: We shouldn’t discuss PageRank without noting that a page’s true PageRank value is never known to us. Instead, the 0-10 ranking represents PageRank *tiers* in which the page in question belongs, but as to whether those tiers are equally-sized segments on a linear scale or some other scale (e.g., logarithmic), we just don’t know.

    1. Mark Collier

      Great comment. I wouldn’t say she was misleading us, but rather that she was right in saying that many webmasters focussed too much time on PageRank and not enough on building quality content and other factors.

      Having said that, in the upper echelons of the SEO world it appears we have gone full circle, with the metric now devalued too much, to the point where it is almost disregarded.

      I agree there are issues with how the Toolbar is measured, but as I stated in the post, there are also issues with many other SEO metrics that we treasure.

      Totally agree that it is the actions that result in a high PageRank score that cause ranking, but PR is a good measure of how you are doing at those actions.

  2. Gary

    Hi Mark

    Great post, I like what you are trying to achieve here

    Can I ask why you feel a correlation of 0.2 is “significant?”

    Best

    Gary

    1. Mark Collier

      Hi Gary

      Good question, my primary reason is that in comparison with many other factors which I have tested and others have personally found highly effective, a .2 correlation is relatively high.

      For example it is higher than exact match .com domains and all on-page factors I analysed.

      In addition, as the Google algorithm is made up of thousands of contributing, individual factors, I can’t hold the correlations found in this study to quite the same standard in terms of significance as would traditionally be applied through more focussed scientific tests.

      Thanks

      Mark

  3. SeanGolliher

    The first thing that should come to one’s mind when looking at statistical data is “why is this data incorrect”. I glanced at the first column of your data regarding domain names and you averaged all the correlation coefficients together. If I am reading this correctly they range from 0.18 to -0.18, which is an indication of why correlation coefficients aren’t additive. You need to provide mathematical proof that you can justify this and also discuss your level of significance. I am not sure what the long-term goal of this website is but I would think you would start with a mathematical justification for your approach to the overall problem. This type of analysis was proven to be inconclusive a while ago when analyzing a similar study on LDA. Running simple correlation exercises is the wrong approach to the problem. Without mathematical proof of a long-term strategy you are really spinning your wheels and applying the wrong mathematical techniques to the problem. There are other ways to go about this. We talked about some of the issues with this approach in 2010: http://www.seangolliher.com/2010/seo/185/ and some other explanations about adding together these coefficients are here: http://www.miislita.com/information-retrieval-tutorial/a-tutorial-on-standard-errors.pdf . Much of your data for domain names shows zero or negative correlation values.

  4. Barry Magennis

    Hi Mark, just discovered your website. Very interesting reading and a new fresh approach such as yours is very much welcomed. You mention query deserves freshness, I was wondering if you had any correlation factors for ranking in search terms where QDF is the overriding factor on page 1 for a particular keyword?

    1. Mark Collier

      Hey Barry, thanks and great question. I haven’t extracted any specific correlation data for subsections of SERPs like QDF, but I am running a second, larger iteration of the study in two months’ time and will most likely look at factors like that.

  5. Rich Brooks

    Do keep going Sir! 

     

    Good post (and concept) though I encourage you to look at dynamic modelling over correlation, examine error correction models and state change for algo proxies that will reveal magnitude too (and ultimately enable us SEOers and Clients to allocate resources…. erm… Optimally :))

     

    I note your acknowledged redundancy of IP location (for self or BLs?). As you’ll already know ‘IP-location-as-authority’ is hugely query-specific and depends whether we’re invoking local variables in the algo… clearly we may want a diverse, international BL footprint or otherwise a specifically local one. Regardless, we are behoved to maximise IP Block/Class-C separation in one of either 2 or three dimensions (once the above is determined). The math in this gets more fun as we add further dimensions (e.g. IPv6). It would be nice to put heads together on this but, again, its query-specificity reduces any real insight…

     

    Keep up the good work

     

    Best

     

    Rich

  6. victor willemse

    Great post Mark, really interesting find on the social media front, will really have to look into Stumble and Digg for my sites.. have only been implementing FB and G+

  7. drew

    I just found your site and absolutely love the work you’re doing. This may or may not be because it validates my own beliefs about SEO.

    Is there any chance you’ll be doing comparative analysis of factors pre and post Penguin/Panda updates?

