For several years now Google has maintained two separate indexes, referred to as the main index and the supplemental index. The number of pages appearing in the supplemental index increased substantially in early 2006. Our understanding of the supplemental index is still evolving, informed partly by the limited explanations, both official and unofficial, that Google has given of how it works. However, certain features have become clear, and you need to be aware of them:
- Pages that trigger spam filters are more likely to be removed from the index altogether than to be placed in the supplemental index; the supplemental index is not a form of penalty that can be lifted through requests to be reincluded.
- Pages in the supplemental index are less likely to be returned to users undertaking a search and less likely to be ranked well in the search results served to users.
- Pages are very likely to end up in the supplemental index if they closely match pages elsewhere on your site (so-called duplicate content).
- Pages may end up in the supplemental index if they closely match pages on other sites that are cited more often than yours (i.e., where you appear to have syndicated – or even plagiarized – content).
- Pages from very large sites may end up in the supplemental index if insufficient PageRank has been passed to the page (more on this later, page 129).
- Pages that are hard to crawl (e.g., because they use too many parameters in a URL or are larger than 101k) may end up in the supplemental index.
- According to Google’s unofficial spokesperson Matt Cutts, the pages held in the supplemental index are parsed differently and held in the form of a “compressed summary,” meaning that “not every single word in every single phrase relationship” has been fully indexed.
Scary, eh? Despite Google’s attempts to placate webmasters – and to insist that ending up in the supplemental index is no tragedy – for some website owners the changes have been calamitous, with many hundreds of their pages disappearing into relative obscurity. Perhaps this is why Google has recently taken steps to hide which pages are in the supplemental index, by removing the supplemental marker from search engine results pages.
You might wonder why Google bothers with a supplemental index at all if it is so controversial. There are two simple reasons: quality and cost effectiveness. Five years ago, the search engine market was all about who had the biggest index (the so-called size wars). Today, with the number of web pages stretching into the billions, the challenge is much more about quality than quantity. Less is, in fact, more. Google has recognized this and is trying to find ways to present an ever leaner, better-quality set of results.
Not indexing every single word and phrase relationship in the supplemental index also saves Google money over the long term, as it does not need to purchase as many servers and data centers. It can also manage and limit its carbon footprint and help the environment. The latter goes down well with stakeholders.
There are, in practice, four main SEO steps you should take to counter the effects of the supplemental index. I cover these in greater depth later on, but for now a simple summary should suffice:
- Ensure that the navigation of your site is even. If the structure of your site is like a tree, with the directories being the branches and the pages the leaves, your tree should look symmetrical, rather than lopsided, if you want the life-giving sap of PageRank to pass evenly down through to each and every page.
- Try to ensure that at least 10–15% of all the inbound links to your site are “deep links” to important internal content or category pages.
- Use tools like the Similar Page Checker at www.webconfs.com/similar-page-checker.php to test pairs of pages on your site that might be too similar to one another. If a pair fails the test (or you have a strong suspicion the pages may already be in the supplemental index), consider rewriting them from scratch and giving them different URLs (without using 301 redirects). This should get you a fresh start.
- Make sure that your URLs do not contain too many parameters and that each page is smaller than 100k.
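If you are comfortable with a little scripting, the checks in the list above can be roughed out in Python. Treat this as a sketch rather than a definitive tool: the function names are my own, the limit of two URL parameters is purely an assumption (the advice above only says "too many"), the 100k limit is read here as 100 × 1,024 bytes, and the similarity score comes from Python's standard difflib module as a stand-in, so it will not match the figures reported by the Similar Page Checker.

```python
from difflib import SequenceMatcher
from urllib.parse import parse_qsl, urlsplit

MAX_PARAMS = 2                # assumption: the text only says "too many"
MAX_PAGE_BYTES = 100 * 1024   # the "100k" guideline, read as 100 KB


def count_query_params(url):
    """Count the key=value pairs in a URL's query string."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def has_too_many_params(url, limit=MAX_PARAMS):
    """True if the URL carries more parameters than the assumed limit."""
    return count_query_params(url) > limit


def page_too_large(html_bytes, limit=MAX_PAGE_BYTES):
    """True if the raw page is at or above the size limit."""
    return len(html_bytes) >= limit


def deep_link_share(link_targets, homepage):
    """Fraction of inbound links that point at something other than the homepage."""
    if not link_targets:
        return 0.0
    home = homepage.rstrip("/")
    deep = [t for t in link_targets if t.rstrip("/") != home]
    return len(deep) / len(link_targets)


def text_similarity(text_a, text_b):
    """Rough 0-to-1 similarity of two page texts (not the webconfs measure)."""
    return SequenceMatcher(None, text_a, text_b).ratio()
```

You would feed these functions data you have gathered yourself: a list of the URLs your inbound links point to, the saved HTML of each page, and the visible text of any two pages you suspect are near-duplicates. A deep link share below 0.10 suggests you are short of the 10–15% target mentioned above.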
To find out which of your pages are in the supplemental index, the most reliable technique is to compare the full list of your indexed pages with the list of your pages in the main index (which can still be identified):
- To get total pages indexed, type into Google: site:www.yourdomain.com
- To get pages in the main index, type into Google:
site:www.yourdomain.com -inallurl:www.yourdomain.com
- Pages in supplemental index = Total pages indexed – Pages in the main index.
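The same subtraction works on the lists of URLs themselves, not just on the counts. The following Python sketch assumes you have copied the URLs from each set of results by hand into two lists; the function names and the example URLs are hypothetical, and nothing here queries Google for you.

```python
def build_queries(domain):
    """Return the two search strings described above for a given domain."""
    total_query = f"site:{domain}"
    main_query = f"site:{domain} -inallurl:{domain}"
    return total_query, main_query


def supplemental_pages(all_indexed, main_index):
    """URLs that are indexed but do not appear in the main index list."""
    return sorted(set(all_indexed) - set(main_index))


if __name__ == "__main__":
    # Hypothetical URLs, pasted in by hand from the two sets of results.
    all_indexed = [
        "http://www.yourdomain.com/",
        "http://www.yourdomain.com/widgets/",
        "http://www.yourdomain.com/widgets/old-range.html",
    ]
    main_index = [
        "http://www.yourdomain.com/",
        "http://www.yourdomain.com/widgets/",
    ]
    for query in build_queries("www.yourdomain.com"):
        print(query)
    for url in supplemental_pages(all_indexed, main_index):
        print("Probably supplemental:", url)
```

Using a set difference rather than comparing counts tells you which pages to work on, not merely how many there are.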
By printing off and comparing the two lists, you can work out which pages on your site are in the supplemental index. If you can’t find a single one, well done! If you find loads, don’t panic. Simply follow the instructions above and persevere. If you really struggle to shake off the supplementals, give me a shout on the forum and I will see what I can do to help.