The source of the following post is Robert Adler (BOFU2U) of AutomatedTendencies.com. Robert has a longstanding reputation within the internet marketing community. He is well known for thinking outside the box and specialises in controlling search engines for a variety of nefarious ends. In his spare time he makes a habit of dropping nuggets of wisdom, out of context, into various closed Skype chats, which both delights and confuses the hell out of most people. If there was a single moment of enlightenment in my understanding of search, it came immediately after reading the following rather sizable nugget, during the sleepless night that followed as I started to connect the dots.
Robert started by breaking down the base operation of the search engine, excluding the storage of information, into five core functions:
- Monitors (I will explore these in a later post).
Crawlers cover many different jobs in their daily routine. Mainly they’re known for fetching pages and sending them back to the parents for indexing & classification. They’re the grunts. The minutemen. The gofers of Google’s infrastructure. These are the guys that make sure that whatever needs to be obtained, is obtained. Simplified down to a programmatic level, think of it along the lines of telling a dog to fetch a ball, a bone, or anything else. The dog fetches the item and then hopefully brings it back to you. Crawlers don’t inspect the item, and they don’t parse or classify it; they simply grab and return.
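To make the "grab and return" idea concrete, here is a minimal sketch of a crawler that does nothing but fetch. The function names `fetch` and `crawl`, and the choice to record a failed fetch as `None`, are my own assumptions for illustration; nothing here is meant to describe how Google's crawlers are actually built.

```python
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return the raw response body as bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def crawl(urls, fetcher=fetch):
    """Fetch each URL and hand back the raw bytes untouched.

    There is deliberately no parsing or classification here: the crawler
    only grabs and returns. A failed fetch is stored as None (an
    assumption of this sketch) so a later stage can decide what to do.
    """
    fetched = {}
    for url in urls:
        try:
            fetched[url] = fetcher(url)
        except OSError:
            # Network errors, timeouts and HTTP errors raised by urlopen.
            fetched[url] = None
    return fetched
```

Passing the fetcher in as a parameter keeps the dog and the ball separate: you can swap in a fake fetcher to try the loop without touching the network.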
Once the crawler has collected the big mess of unstructured data that is your website / blog post, that data needs to be turned into a format that is easier for the classifier to understand. In school, teachers would often recommend writing an outline before writing a paper, to help expand on the topics and give the deliverable a better structure. This stage is more or less the opposite: the paper comes first and is then slimmed down to an outline. However, these guys are not actually doing any classification; they are simply creating a more readable format to speed up the rest of the workflow.
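As a rough illustration of slimming the paper down to an outline, the sketch below uses Python's standard `html.parser` to reduce a raw page to its title, headings and outbound links. Which fields belong in the outline is purely my assumption; `OutlineParser` and `to_outline` are hypothetical names, not anything Robert described.

```python
from html.parser import HTMLParser


class OutlineParser(HTMLParser):
    """Reduce raw HTML to a small outline: title, headings and links."""

    HEADING_TAGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.outline = {"title": "", "headings": [], "links": []}
        self._current = None  # tag whose text is currently being collected
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADING_TAGS:
            self._current = tag
            self._buffer = []
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.outline["links"].append(href)

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        # Collapse runs of whitespace so the outline stays tidy.
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.outline["title"] = text
        else:
            self.outline["headings"].append((tag, text))
        self._current = None


def to_outline(html):
    """Return the outline dictionary for one HTML document."""
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser.outline
```

Note that nothing in here decides what the page is about. It only produces a tidier structure for the next stage to work with.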
Classifiers determine what a page is really about. They also determine whether or not a link is “related” or “relevant” to the page it links to. This happens not just at the page level but also at the website level, the category level and even the tag level (more on this later).
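Here is a toy sketch of classification at more than one level. The keyword lists in `TOPIC_KEYWORDS` are invented for the example, and simple keyword counting is an assumption on my part; a real classifier would be far more sophisticated. The point is only to show a page-level verdict, a site-level verdict built from the pages, and a link relevance check built from both ends of the link.

```python
import re
from collections import Counter

# Hypothetical topic vocabulary, made up for this sketch.
TOPIC_KEYWORDS = {
    "Reference/Education": {"teach", "tefl", "course", "student", "school"},
    "Shopping/Clothing": {"dress", "shoes", "jacket", "fashion", "sale"},
}


def classify_text(text):
    """Return the best-matching topic for a piece of text, or None."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {
        topic: sum(words[word] for word in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def classify_site(pages):
    """Site-level topic: the most common page-level topic, or None."""
    topics = [topic for topic in map(classify_text, pages) if topic is not None]
    return Counter(topics).most_common(1)[0][0] if topics else None


def link_is_relevant(source_text, target_text):
    """Treat a link as relevant when both ends share a topic."""
    source_topic = classify_text(source_text)
    return source_topic is not None and source_topic == classify_text(target_text)
```

The same `classify_text` call could just as easily be pointed at a category page or a tag page, which is how one verdict per level falls out of a single function.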
Judges, as I like to call them, are the level after the classifiers. They take the classification and spread the “juice” or “scores” according to the information they were given in the previous steps. This is what determines whether or not a page should receive credit for something, and what score it should receive from the links pointing in to it. How old are the links pointing in? How old is the page, so that credit is properly passed from it to the children it points to? This is all handled at the judge level.
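A judge can be pictured as a function that adds up the credit each inbound link passes on. The sketch below is entirely hypothetical: the one-year ramp in `age_weight`, the `irrelevant_penalty` value and the idea that older links earn more weight are assumptions chosen to make the example work, not known ranking rules.

```python
from dataclasses import dataclass


@dataclass
class InboundLink:
    source_score: float  # score of the linking page
    relevant: bool       # the classifier's verdict on this link
    age_days: int        # how long the link has existed


def age_weight(age_days, maturity_days=365):
    """Assumed ramp: a link reaches full weight after maturity_days."""
    return min(age_days, maturity_days) / maturity_days


def judge(links, irrelevant_penalty=0.25):
    """Sum the credit that each inbound link passes to the target page."""
    total = 0.0
    for link in links:
        # Irrelevant links still pass something, just heavily discounted
        # (an assumption of this sketch).
        relevance = 1.0 if link.relevant else irrelevant_penalty
        total += link.source_score * relevance * age_weight(link.age_days)
    return total
```

Whatever the real weights are, the shape is the same: the judge consumes what the classifier decided and turns it into a number for the page.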
Connecting the Dots
Have you ever looked at Majestic’s Topical Trust Flow and noticed that its topics are an exact match for the DMOZ open directory category tree? Have you ever wondered why?
If you look at the top ten results for 50-100 keywords in a particular niche, have you ever noticed a pattern in the topical trust flow?
E.g. the majority of top 10 results for TEFL keywords are ARTS/EDUCATION and REFERENCE/EDUCATION, while fashion niches are dominated by SHOPPING/CLOTHING.
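If you want to check for that pattern yourself, a quick tally is enough. The sketch below assumes you have already collected the primary topical category for each top 10 result by hand or from an export; `dominant_topics` and the input shape (keyword mapped to a list of category labels) are my own invention, and no tool's API is being called here.

```python
from collections import Counter


def dominant_topics(results_by_keyword, top_n=3):
    """Return the most common topics and their share of all results.

    results_by_keyword maps each keyword to the list of topical
    categories seen in its top results.
    """
    counts = Counter(
        topic
        for topics in results_by_keyword.values()
        for topic in topics
    )
    total = sum(counts.values())
    return [(topic, count / total) for topic, count in counts.most_common(top_n)]
```

Run it across 50-100 keywords in one niche and the categories at the top of the list are the ones the niche appears to reward.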
The longer you stare at the data, the clearer it becomes that tools such as Majestic are giving you a window into the effects of the classification/judge parts of Google’s scoring workflow (topical page rank). This helps the search engine establish whether a keyword is a good match for a specific search query. It also allows the search engine to get a sense of whether a link graph is relevant well below the first and second tier of links, by giving page rank a flavour. Before topic-sensitive page rank, Google was limited in what it could see below the first tier due to CPU run time. While the search engine still cannot see past tier one, it can now taste the eau de spam of a low quality link profile.
Topical page rank / topical trust flow passes through links like water through pipes, just as regular page rank does. A pure topical signal gives the search engine the impression that a site is an authority in its niche. So what’s the key takeaway? Simple: figure out which classification the search engine considers to be a good match for the keywords you are going after, and aim for a robust onsite strategy backed up with links that are topically and thematically relevant, ensuring your site fits the appropriate classification.
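For the curious, the water through pipes picture can be sketched in a few lines. This follows the commonly described topic-sensitive variant of PageRank, where the random jump lands only on a seed set of pages for one topic instead of on any page. The function name, the damping value of 0.85, the fixed iteration count and the handling of pages with no outbound links are assumptions of this sketch, and it is not a claim about what Google or Majestic actually run.

```python
def topical_pagerank(links, topic_pages, damping=0.85, iterations=50):
    """Topic-biased PageRank by power iteration.

    links maps each page to the list of pages it links to.
    topic_pages is the seed set of pages known to belong to the topic.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    seeds = [page for page in pages if page in topic_pages]
    if not seeds:
        raise ValueError("topic_pages must contain at least one known page")

    # The random jump only ever lands on topic seeds.
    teleport = {p: (1.0 / len(seeds) if p in topic_pages else 0.0) for p in pages}
    rank = dict(teleport)

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) * teleport[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outbound links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # No outbound links: hand the rank back to the topic seeds.
                for p in pages:
                    new_rank[p] += damping * rank[page] * teleport[p]
        rank = new_rank
    return rank
```

In a small graph where an education hub links to a TEFL site and an unrelated link farm links to a fashion site, seeding the education hub leaves the fashion site with no education flavoured rank at all, however many links the farm throws at it. That is the flavour the paragraph above is talking about.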