Hi, I'm Mark Collier, and I'm the author and owner of this blog.
In a sentence, this blog is dedicated to finding, weighting and publishing as many of the search engines' algorithmic factors as possible.
Basically I’m trying to apply statistical and scientific methods to prove or disprove SEO theories.
To start with I’m just looking at Google but in the future hopefully I will extend the project to all the major search engines.
This project was originally conceived as a science project. Being interested in SEO and ranking higher in the search engines, I knew the importance of the elusive 200 factors, or 10,000 algorithmic elements (depending on how and what you count). And I sat in amazement whenever a new algorithmic factor was announced by Google or Bing and the webmaster community went wild.
Links, traffic, site restructuring, press releases, countless blog posts, for every new factor announced. Essentially I noticed a pattern: everybody wanted these 200 factors (and the 10,000 elements), and they wanted them badly.
But what amazed me was that very few people were, in an open fashion, trying to find all the factors, weight them and tell the world.
Sure, there was speculation and the odd bit of research, but no real central or coordinated effort to apply science to SEO.
A lot of people say it's impossible to reverse engineer a search engine algorithm, and they're partially right. We'll never find all the factors; Google is constantly changing, evolving and introducing new ones. But we can certainly expand our knowledge of the search engines and gain a better understanding of what makes them tick.
Why don’t the search engines just tell us the factors?
Search engines are complex beasts, and if competitors got their hands on a search engine's company secrets they could steal its best ideas.
Another argument Google have put forward is that if everyone knew what went into the algorithm then spammers could abuse it and rank higher than they should.
That's not an argument I subscribe to: in general I believe that information should be open, and Google should be able to react to spam attacks no matter how open their algorithm is.
This blog is not about standing up to the man, but if openness and transparency turn out to be by-products of the project and this blog, that's really just a side benefit.
This blog is about opening up and demystifying the search industry and proving or disproving SEO theories with science.
I see so many webmasters spending a fortune on SEOs who don't know a thing about search engine optimization (that's not to say there aren't great SEOs out there). I've seen SEO reports costing hundreds that claim real PageRank is the same as Toolbar PageRank, and that you should boost your link building efforts when a PR update is in progress.
And I also see a lot of new webmasters confused and making elementary mistakes that get them penalized, and in a way the world and the internet are penalized too, because we can't find their great content in the search engines.
By opening up this information and building broader knowledge of what the algorithms truly comprise, webmasters will be able to do more of the SEO themselves. More importantly, black hat SEOs will be less likely to outrank white hat ones, provided white hatters apply that knowledge while keeping their great content at the forefront of their websites and blogs.
All you need to know about TheOpenAlgorithm Project: In bite sized format
I was trying out some really cool video production software and created a video outlining some of the nitty-gritty details of the project in a smooth, linear fashion.
Disclaimer: while Spearman's correlation coefficient is a good guide to whether something is a ranking factor or not, it's not perfect, and correlation doesn't imply causation, but more on that later.
Where did I come up with TheOpenAlgorithm.com?
Well I was trying to come up with a name that represented the blog and was original.
The Open part comes from open source, a term rooted in crowdsourcing and in being more open and transparent than companies have traditionally been. That is both a goal of this blog (open up the search engines) and the way this blog is run (feel free to contribute resources, ideas, posts or possible factors).
The Algorithm part is obviously because the blog is based on reverse engineering and creating a hypothetical algorithm that mimics the major search engines' algorithms, but you probably already know that.
I don’t see any ads, how do you cover your costs?
This is a community blog where people can contribute to the project, so I will endeavor to keep it free of money-making strategies for as long as possible. But there are significant costs to running a blog (hosting, domain, laptop, internet connection, designer) and, most importantly, the thousands of hours I will pour into the project.
For example, the cost of running the first round of tests will be approximately $2,000, plus thousands of hours of coding, analysing, writing, etc.
As a high school student naturally I don’t have massive amounts of cash and so far I have funded the project from my savings so any donations of time, products, services, ideas or funds are greatly appreciated.
So far a number of companies have given me free access to their APIs, services and tools and a number of individuals have spent time looking over my code or talking with me about search engines and potential algorithmic factors.
But don't be surprised to see the odd affiliate link here or there either; if I'm referring you to a product, I have tested it, used it myself or know somebody who has.
I do plan to make some money from the site in the future, hopefully not with ads but maybe I’ll develop some software or neat tools, who knows?
The reason I am happy to take a loss for the first couple of months of the project is that I have watched The Social Network.
Anyone who has seen that great movie will remember how hard Zuckerberg fought putting ads on the site, as he wanted it to "become cool first" and thought ads would stop it from being cool. Look at Facebook now: it's worth billions.
While this blog is never going to be Facebook and that’s fine, I still want The Open Algorithm to be cool and provide a great service before trying to make some money back from it.
I even wrote an article about this theory.
Just thought I'd warn you about my future plans and what you're getting into if you get hooked on search engine goodness!
Update: since writing this I've added a page to the site listing all our supporters, companies and individuals who have donated time, resources and expertise for free to help the project.
Who am I?
Name: Mark Collier
DOB: 30 April 1995
Where do you live?: Dublin, Ireland
Interests: I like algorithms and search. I started learning to program (Python) because of this project and I’m loving it. I’m no expert (yet) but I can get what I need done. I’m fascinated by artificial intelligence and I’m going to take an online course in it and I plan to study computer science in college.
Sports: Cricket (I love my cricket and I play for Clontarf Cricket Club), soccer for Howth Celtic.
Other websites: I had a couple of failures that did a lot of things wrong in terms of SEO, but I was about 12 at the time, so forgive me.
I used to run 2buildbacklinks.com before I sold it to a Danish man. It was relatively successful, if not the best website, and some of the content leaves a lot to be desired, but that's the learning curve all webmasters go through.
In the future I hope to use a number of scientific methods in this project, but for the initial testing round (and most likely the second) I will be looking at the correlation between individual factors and ranking well in Google, using a statistical formula called Spearman's correlation coefficient.
Spearman Correlation Coefficient, in Plain English
A great statistician called Charles Spearman created a fantastic formula that quantifies (assigns a rating to) the relationship between two variables. For two lists of n paired values with no ties, it works out to: ρ = 1 − (6 Σ dᵢ²) / (n(n² − 1)), where dᵢ is the difference between the two ranks of each pair.
OK, I promised plain English, right? So remember, we said the formula is looking at the relationship between two variables.
In this case our first variable is the ranking, i.e. where in the search results the page you are testing is placed, e.g. #1, #2, #3, etc.
The second variable is the factor you are testing. For example, you might be testing the relationship between PageRank and search engine ranking, so the second variable would be PageRank = 5, 4, 3, etc.
When you feed those two lists of numbers into the formula, you get back a single number: say, a correlation of 0.894 (remember, this is only an example).
The correlation you get back will always be between -1 and 1, and the closer to the extremes (-1 or 1) you get, the stronger the relationship.
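To make that concrete, here is a minimal Python sketch of Spearman's formula, using made-up positions and factor scores (no tied values, since the simple form of the formula doesn't handle ties):

```python
def spearman(xs, ys):
    """Spearman's rho for two equal-length lists with no tied values:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(values):
        ordered = sorted(values)          # rank 1 = smallest value
        return [ordered.index(v) + 1 for v in values]
    n = len(xs)
    d_sq = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))

# Hypothetical data: positions 1-5 in a results page and a made-up
# factor score for the page at each position.
positions = [1, 2, 3, 4, 5]
scores = [9, 7, 4, 5, 2]

print(round(spearman(positions, scores), 3))  # → -0.9
```

Note that rho comes out negative here because position #1 is the best spot: a factor that grows as ranking improves shrinks as the position number rises. Correlation studies often flip the sign (or reverse the positions) so that a helpful factor reads as positive.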
If the correlation is a negative number, e.g. -0.5, the relationship is negative. Page load time's correlation, for example, will probably be negative, because the longer a page takes to load, the greater the penalty from Google.
Imagine PageRank did have a positive correlation of 0.894. That would be a very strong positive relationship, meaning an extremely close link between PageRank and ranking well in Google.
But correlation doesn't mean causation. A lot of people think that just because there's a strong relationship between one variable and another, one must influence the other. That's not necessarily the case; it only implies a relationship.
Third-party factors can come into play. For example, let's say that Google didn't count Facebook likes in their algorithm, but we found a really strong correlation between Facebook likes and ranking well.
That would most likely be because an increase in Facebook likes probably means two things: it's a good article, and might do well in Google for that reason; and more people will visit the article and link to it from their websites, meaning higher PageRank and higher ranking.
So what we'd see is Facebook likes (again, just an example) having a strong relationship with ranking well, but not because they are in the Google algorithm.
That's why Spearman's rank correlation coefficient isn't perfect and doesn't really prove anything; it's just a good guide to what is likely or unlikely.
As an SEO this doesn't matter quite so much, because you shouldn't mind taking steps that increase ranking indirectly as well as directly. So in the example above, an SEO should still build Facebook likes to increase ranking, even if likes aren't in the algorithm.
(Disclaimer: there are cases where the formula will give a correlation even though, third-party influence included, there is no impact on ranking. That's why you have to look at the results critically, with common sense in mind, and test each correlation for causation.)
As a scientist it does matter, because I want to find out what goes into the search engine algorithms, not just what causes a site to rank well; they are two totally different things.
Now let’s recap:
- You are measuring the strength of the relationship between two variables.
- Our two variables are the page's position in the search results and the number given to the factor we are testing. For example, if I searched for Coca Cola and the Coca Cola homepage came up #1 on Google, then the first variable = 1 and the second variable (in this case PageRank) = 7.
- The relationship between the first variable (the ranking) and the second variable (in the example, PageRank) is given a number, the ρ (rho) in the formula, between -1 and 1.
- A negative number means it's a factor that will possibly lower your ranking, and a positive number means it possibly has a positive impact on your search engine ranking.
- The closer to 1, negative or positive, the stronger the relationship and possibly the greater the weight in the algorithm.
- So if PageRank, for example, was given the number 0.894 and page speed (again, not a real example) was given the number 0.1, you would say that PageRank has a greater relationship with your search engine ranking, because 0.894 is closer to 1 than 0.1 is.
- Fact: if a factor is assigned the number 0, there is no correlation/relationship between it and the ranking of a page. So if I tested a random, silly factor like whether the page mentions "cats", the number returned from the Spearman analysis should come out around 0, since it's (hopefully!) not a factor.
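In practice I won't hand-roll the formula for the real tests; since I'm coding them in Python, a library such as SciPy can compute Spearman's rho directly, ties included. A sketch with made-up numbers, assuming SciPy is installed:

```python
from scipy.stats import spearmanr

# Hypothetical data for the top ten results of one query: the position
# in the results and the PageRank of the page at that position.
positions = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
pageranks = [7, 6, 6, 5, 5, 4, 4, 3, 2, 2]  # tied values are handled for us

rho, p_value = spearmanr(positions, pageranks)
print(rho)  # strongly negative: PageRank falls as the position number rises
```

The p-value returned alongside rho estimates how likely a correlation this strong would be if the two variables were actually unrelated, which helps separate real patterns from noise. A full study would also average rho over thousands of queries rather than trusting a single results page.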
This is a decent method for determining the relationship, and I would like to credit the SEOmoz team for pointing me toward the Spearman coefficient.
But we want to know what causes a site to rank well and not just relationships and correlations.
While the initial testing won’t involve looking at causation, I do hope to look at causation in the future.
Of course the correlation is an excellent guideline, and any causal studies would be based on the correlation findings, but determining causation is of huge interest to me.
The most likely method would be to get a bunch of webmasters together and run scientific tests on their websites. For example, one week we could change the titles on a set of pages to include the keywords they wanted to rank for, and see if that had an impact on ranking.
Again this would be a massive project which would also adhere to scientific guidelines but it’s the correlation study first and then I can look at causation.
What you can do is follow me on Twitter or subscribe to the RSS feed. Once you have done that, please feel free to interact with me and everyone else in the comments section with your ideas, thoughts and brilliance.
Also, if you are great at programming (or have contacts who are), are willing to help out, or have worked in a search engine or an algorithm-driven environment, please contact me, as I need all the help I can get.
Author, writer, owner, blogger and I guess scientist.