SEO – On-page optimization

Remember my insistence on having the right (keyword) ingredients before you start cooking your meal? Or my assertion that 20% of your time should be spent preparing those ingredients? Having learnt about phrases that pay, you will now understand why: optimization effort spent on a poor set of keywords is fruitless. In this section, you are (finally) ready to start cooking with your keywords.

On-page SEO covers all the techniques you can apply to your own site. Off-page SEO, by contrast, covers the techniques you can get other site owners to use that help your site rank well. On-page techniques apply to all the content on your web servers, including your pages and other assets (such as Word documents, PDF files, videos, and more).

The on-page elements include page titles, page metadata, headings, body text, internal links, outbound links, and image alt tags, as well as a host of other assets. I will walk through each item in turn, in the order in which they generally appear in a well-formed HTML page. For each item, I will show you what to do for best effect and consider its overall importance in the mix. I cover asset optimization in a separate section.

SEO – The supplemental index

For several years now, Google has maintained two separate indexes, referred to as the main index and the supplemental index. The number of pages appearing in the supplemental index increased substantially in early 2006. Understanding of how the supplemental index works is still evolving, informed partly by Google’s limited explanations, both official and unofficial. However, certain features have become clear, and you need to be aware of them:

  • Pages that trigger spam filters are more likely to be removed from the index altogether than to be placed in the supplemental index. Equally, the supplemental index is not a penalty that can be lifted through a reinclusion request.
  • Pages in the supplemental index are less likely to be returned to users undertaking a search and less likely to be ranked well in the search results served to users.
  • Pages are very likely to end up in the supplemental index if they closely match pages elsewhere on your site (so-called duplicate content).
  • Pages may end up in the supplemental index if they closely match pages on other sites that are cited more often than yours (i.e., where you appear to have syndicated – or even plagiarized – content).
  • Pages from very large sites may end up in the supplemental index if insufficient PageRank has been passed to them (more on this later).
  • Pages that are hard to crawl (e.g., because they use too many parameters in a URL or are larger than 101 KB) may end up in the supplemental index.
  • According to Google’s unofficial spokesperson Matt Cutts, the pages held in the supplemental index are parsed differently and held in the form of a “compressed summary,” meaning that “not every single word in every single phrase relationship” has been fully indexed.

Scary, eh? Despite Google’s attempts to placate webmasters – and to insist that ending up in the supplemental index is no tragedy – for some website owners the changes have been calamitous, with many hundreds of their pages disappearing into relative obscurity. Perhaps this is why Google has recently taken steps to hide which pages are in the supplemental index, by removing the supplemental marker from search engine results pages.

You might wonder why Google bothers with a supplemental index at all if it is so controversial. There are two simple reasons: quality and cost effectiveness. Five years ago, the search engine market was all about who had the biggest index (the so-called size wars). Today, with the number of web pages stretching into the billions, the challenge is much more about quality than quantity. Less is, in fact, more. Google has recognized this and is trying to find ways to present an ever leaner and better quality set of results.

Not indexing every single word and phrase relationship in the supplemental index also saves Google money over the long term, because it needs fewer servers and data centers. It also helps Google limit its carbon footprint, which goes down well with stakeholders.

There are, in practice, four main SEO steps you should take to counter the effects of the supplemental index. I cover these in greater depth later on, but for now a simple summary should suffice:

  • Ensure that the navigation of your site is even. If the structure of your site is like a tree, with the directories being the branches and the pages the leaves, your tree should look symmetrical, rather than lopsided, if you want the life-giving sap of PageRank to pass evenly down through to each and every page.
  • Try to ensure that at least 10–15% of all the inbound links to your site are “deep links” to important internal content or category pages.
  • Test pages on your site that might be too similar to one another using tools like the Similar Page Checker at www.webconfs.com/similar-page-checker.php. If the page pairs fail the test (or you have a strong suspicion they may already be in the supplemental index), consider rewriting them from scratch and giving them different URLs (without using 301 redirects). This should get you a fresh start.
  • Make sure that the URLs do not contain too many parameters and that each page is smaller than 100 KB.
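The last two checks in this list can be roughed out in code. The Python sketch below rests on several assumptions. MAX_PARAMS = 2 is a hypothetical threshold, because no exact figure is given for "too many" parameters. The near-duplicate test uses word shingles with Jaccard similarity. That is a common technique, but it is not necessarily what the Similar Page Checker does internally.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed thresholds. MAX_BYTES mirrors the 100 KB guideline above;
# MAX_PARAMS is hypothetical, since no exact "too many" figure is given.
MAX_BYTES = 100 * 1024
MAX_PARAMS = 2


def param_count(url):
    """Count the query-string parameters in a URL (blank values included)."""
    return len(parse_qsl(urlsplit(url).query, keep_blank_values=True))


def too_many_params(url):
    """Flag URLs with more parameters than the assumed MAX_PARAMS."""
    return param_count(url) > MAX_PARAMS


def size_ok(html):
    """True if the page, encoded as UTF-8, is smaller than MAX_BYTES."""
    return len(html.encode("utf-8")) < MAX_BYTES


def shingles(text, k=3):
    """Set of overlapping k-word sequences ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of two texts' shingle sets: 0.0 (none shared) to 1.0 (all shared)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(param_count("http://www.example.com/p.php?id=3&cat=7&sort=asc"))  # 3
    print(similarity("cheap red widgets for sale in london today",
                     "cheap red widgets for sale in paris today"))  # 0.5
```

The two example pages differ by a single word, yet they share only half their shingles. The text does not say what score should worry you, so treat consistently high scores between page pairs as a prompt to rewrite one of them.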

To find out which of your pages are in the supplemental index, the most reliable technique is to compare the total number of pages indexed with the number of pages in the main index (which can still be identified):

  • To get total pages indexed, type into Google: site:www.yourdomain.com
  • To get pages in the main index, type into Google:
    site:www.yourdomain.com -inallurl:www.yourdomain.com
  • Pages in supplemental index = Total pages indexed – Pages in the main index.

By printing off and comparing the two lists, you can work out which pages on your site are in the supplemental index. If you can’t find a single one, well done! If you find loads, don’t panic. Simply follow the instructions above and persevere. If you really struggle to shake off the supplementals, give me a shout on the forum and I will see what I can do to help.
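The subtraction above amounts to a set difference between two lists of URLs. The Python sketch below assumes you have copied the result URLs from each query into its own plain text file, one URL per line. The file names used below are hypothetical.

```python
def load_urls(path):
    """Read one URL per line from a text file, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def supplemental_pages(all_indexed, main_index):
    """URLs found by the site: query but missing from the main-index query."""
    return sorted(set(all_indexed) - set(main_index))
```

For example, `supplemental_pages(load_urls("site_all.txt"), load_urls("site_main.txt"))` lists the pages to investigate. Write URLs the same way in both files (for instance, with or without trailing slashes). Otherwise the same page can show up as a difference.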

SEO – How Google interrogates the index

The list of documents that contain a word is called a posting list, and looking for documents that contain more than one word is called “intersecting” the posting lists. In the earlier “gulf war” example, the intersected list contains documents 9 and 22.
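Posting lists and their intersection can be sketched in a few lines of Python. The documents and IDs below are invented for illustration, and a real search engine's index is far more elaborate. The sketch shows only the core idea. Each word maps to a sorted list of document IDs. A multi-word query then walks those lists together, starting with the shortest so that the running result stays small.

```python
from collections import defaultdict


def build_index(docs):
    """Map each word to a sorted posting list of the IDs of docs containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            postings[word].add(doc_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def intersect(a, b):
    """Step through two sorted posting lists together, keeping shared doc IDs."""
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            shared.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def search(index, query):
    """Intersect the posting lists of all query words, shortest list first."""
    lists = sorted((index.get(w, []) for w in query.lower().split()), key=len)
    if not lists:
        return []
    result = lists[0]
    for plist in lists[1:]:
        result = intersect(result, plist)
    return result


docs = {  # invented documents, not the ones from the original example
    1: "the gulf war of 1991",
    2: "gulf coast beaches",
    3: "war in the persian gulf",
    4: "war and peace",
}
index = build_index(docs)
print(index["gulf"], index["war"])  # [1, 2, 3] [1, 3, 4]
print(search(index, "gulf war"))    # [1, 3]
```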

Google’s search engine essentially performs two tasks:

  • Finding the set of pages (from the index) that contain the user’s query somewhere.
  • Ranking the matching pages in order of importance and relevance.
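The split between these two tasks can be illustrated with a toy Python sketch. Finding matches is a simple check that every query word appears in a document. The ranking score, which counts how often the query words occur, is a deliberately naive placeholder and does not reflect how Google actually ranks pages.

```python
def find_matches(docs, query):
    """Task 1: IDs of documents that contain every word of the query."""
    terms = query.lower().split()
    return [doc_id for doc_id, text in docs.items()
            if all(t in text.lower().split() for t in terms)]


def rank(docs, matches, query):
    """Task 2: order matches by a toy relevance score.

    The score (total occurrences of the query words) is a placeholder;
    Google's real ranking weighs many signals, including links.
    """
    terms = query.lower().split()

    def score(doc_id):
        words = docs[doc_id].lower().split()
        return sum(words.count(t) for t in terms)

    # Highest score first; ties broken by doc ID so the order is deterministic.
    return sorted(matches, key=lambda d: (-score(d), d))


docs = {
    1: "seo tools",
    2: "seo tips and seo tools",
    3: "cooking tips",
}
matches = find_matches(docs, "seo tools")  # [1, 2]
print(rank(docs, matches, "seo tools"))     # [2, 1]
```

Keeping the stages separate means the cheap filtering step shrinks the candidate set before the more expensive scoring runs.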

I cover the latter in greater depth in the section on “landing the links.”


SEO – How Google stores the index

Google makes this process quicker by using hundreds of computers to store its index. When a query is processed, the task of identifying the pages containing the query words is divided among many machines, speeding up the task immensely. To return to our library analogy, if one person had to search a 70-page index in a book to find one phrase that did not occur in the main alphabetical sequence, it might take up to a minute to locate the text. However, if 70 people each had a page of the index and were working as a team, this task would take a few seconds at most.
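The divide-and-conquer idea can be sketched in Python, with threads on one machine standing in for Google's many computers. Splitting documents across shards by ID is an assumption made for illustration, because the text does not say how Google partitions its index. Each shard is searched independently and the partial results are merged. This mirrors the 70 people in the analogy, each checking their own page and pooling what they find.

```python
from concurrent.futures import ThreadPoolExecutor


def make_shards(docs, n_shards):
    """Split the documents across n_shards groups by doc ID."""
    shards = [{} for _ in range(n_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % n_shards][doc_id] = text
    return shards


def search_shard(shard, terms):
    """Search one shard: IDs of its docs containing every term."""
    return [doc_id for doc_id, text in shard.items()
            if all(t in text.lower().split() for t in terms)]


def parallel_search(shards, query):
    """Send the query to every shard at once, then merge the partial answers."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, terms), shards))
    return sorted(doc_id for part in partials for doc_id in part)


docs = {1: "gulf war", 2: "gulf coast", 3: "war poems", 4: "the gulf war"}
shards = make_shards(docs, 2)  # shard 0 holds docs 2 and 4; shard 1 holds 1 and 3
print(parallel_search(shards, "gulf war"))  # [1, 4]
```

In Python, threads illustrate the structure rather than delivering a real speed-up. The gain the text describes comes from separate machines each scanning a fraction of the index.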

SEO – Google stop words

The Google Search box ignores certain common words, such as “where” and “how,” as well as certain single digits and letters. In its official FAQ Google says, “these terms rarely help narrow a search and can slow search results.” Of course, the main reason such words are not indexed is that indexing them would massively increase the size of the Google index (at great computing cost and with limited user benefit).

These stop words include (but are not limited to) i, a, about, an, and, are, as, at, be, by, for, from, how, in, is, it, of, on, or, that, the, this, to, was, what, when, where, who, will, with.

However, Google is good at recognizing when a stop word is being used in an uncommon way. For example, a search for “the good, the bad and the ugly” is read by Google as “good bad ugly,” whereas a search for “the who” is not ignored but processed as it is, returning results for the well-known rock band.
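The stop-word behavior described here can be approximated with a short Python sketch. It uses the word list quoted above. The rule for “the who” is a hypothetical heuristic: keep the stop words when nothing else would be left. It happens to reproduce both examples, but it is not Google's documented logic.

```python
import re

# The stop-word list quoted above (which, as noted, is not exhaustive).
STOP_WORDS = {
    "i", "a", "about", "an", "and", "are", "as", "at", "be", "by",
    "for", "from", "how", "in", "is", "it", "of", "on", "or", "that",
    "the", "this", "to", "was", "what", "when", "where", "who", "will", "with",
}


def query_terms(query):
    """Lowercase the query, split it into words and drop stop words.

    Hypothetical fallback: if every word is a stop word (e.g. "the who"),
    keep them all, since dropping them would leave nothing to search for.
    """
    words = re.findall(r"[a-z0-9']+", query.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    return kept if kept else words


print(query_terms("the good, the bad and the ugly"))  # ['good', 'bad', 'ugly']
print(query_terms("the who"))                         # ['the', 'who']
```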