Posts Tagged ‘seo’

What is URL Rewriting? Why Rewrite URLs?

What is URL Rewriting?

URL Rewriting is a server-side technique for mapping URL requests to request handlers.

Typically there is a direct mapping between the request URL and the handler for that request. All requests that end in .php will be handled by a PHP script with the given name. Similarly, request paths that end in .html will typically be handled by a static file handler. The mapping between URL and handler is static, and depends solely on the extension of the requested URL.

URL Rewriting allows administrators to more flexibly map between the incoming requests and the actual resource that handles the request on the server. For example, using URL Rewriting, requests that have a .html extension could be served by ASP.NET, or requests that have no extension could be served by a PHP script.

Most URL Rewriters match the incoming URL against a set of patterns, and rewrite the URL according to which patterns match. The most powerful and flexible rewriters describe these patterns with Regular Expressions. Some rewriters also allow rewriting based on other factors, including the request headers, server variables, and even the state of the server filesystem.
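To make the pattern-matching idea concrete, here is a minimal sketch of a regex-driven rewriter in Python. The rule table is hypothetical and is not IIRF's configuration syntax; it simply mirrors the examples above, sending extensionless URLs to a PHP script and .html URLs to ASP.NET. The first matching rule wins, and captured groups are substituted into the target.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule table: (public URL pattern, internal URL template).
# These rules only illustrate the idea; they are not IIRF syntax.
RULES: List[Tuple[re.Pattern, str]] = [
    # Extensionless public URLs handled by a PHP script.
    (re.compile(r"^/products/(\d+)$"), r"/product.php?id=\1"),
    # Public .html URLs handled by ASP.NET.
    (re.compile(r"^/([a-z0-9-]+)\.html$"), r"/page.aspx?name=\1"),
]


def rewrite(path: str) -> Optional[str]:
    """Return the internal URL for the first rule whose pattern matches.

    Returns None when no rule matches, meaning the request would fall
    through to the server's normal extension-based handler.
    """
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            # Substitute captured groups (\1, \2, ...) into the template.
            return match.expand(template)
    return None


print(rewrite("/products/42"))  # /product.php?id=42
print(rewrite("/about.html"))   # /page.aspx?name=about
print(rewrite("/style.css"))    # None
```

Real rewriters add ordering flags, conditions on headers and server variables, and "stop processing" markers; this sketch keeps only the core match-and-substitute step.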

Many URL Rewriters can also redirect requests. This has led to confusion in the terms Rewrite and Redirect. To learn the difference, see Redirecting versus Rewriting. IIRF can perform URL redirects as well as URL rewrites.
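The practical difference can be sketched in a few lines. In this hypothetical Python example (the `/items/` and `/catalog.php` paths are made up), a rewrite serves the internal resource directly so the client keeps its original URL, while a redirect returns a 301 status with a `Location` header, so the browser issues a second request and shows the new URL.

```python
from typing import Dict, NamedTuple, Optional


class Response(NamedTuple):
    status: int
    headers: Dict[str, str]
    body: str


# Hypothetical server-side resources, keyed by internal path.
RESOURCES = {"/catalog.php?item=42": "<h1>Item 42</h1>"}


def to_internal(public_path: str) -> Optional[str]:
    """Hypothetical mapping from a public URL to the internal one."""
    prefix = "/items/"
    if public_path.startswith(prefix):
        return "/catalog.php?item=" + public_path[len(prefix):]
    return None


def serve_with_rewrite(public_path: str) -> Response:
    """Rewrite: serve the internal resource under the public URL.

    The client never sees the internal path; its address bar is unchanged.
    """
    internal = to_internal(public_path)
    if internal is not None and internal in RESOURCES:
        return Response(200, {}, RESOURCES[internal])
    return Response(404, {}, "Not Found")


def serve_with_redirect(public_path: str) -> Response:
    """Redirect: tell the client to fetch a different URL itself.

    The browser makes a second request and displays the new URL.
    """
    target = to_internal(public_path)
    if target is None:
        return Response(404, {}, "Not Found")
    return Response(301, {"Location": target}, "")


print(serve_with_rewrite("/items/42"))
print(serve_with_redirect("/items/42"))
```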

Why Rewrite URLs?

There are many reasons to rewrite URLs:

1. Search Engine Optimization (SEO)
SEO is a broad topic, but the main goal is to assist search engines in finding content on a web site. One aspect of that is optimizing the URLs themselves.

2. Making user-friendly URLs.
Similar in effect to SEO, this lets you present friendly public URLs wherever users see them, in links and in the browser's address bar. Elements within URLs that are meaningful only to server-side technology, including the extension of the server-side script or web app platform, can be obscured from the public.

3. Faking people out.
In some cases the web administrator would like to conceal the server-side technology that is being used. URL Rewriting allows, for example, a public URL that ends in .jsp to be handled by a .php script.

4. Routing requests.
You can force certain requests to use a secure connection (https), or a particular server.

5. Server-side technology migrations.
When migrating from one technology to another in stages, URL rewriting can be used to keep the URL space stable while things change on the server back-end. URL Rewriting can also be used to support migration of “old” or stale URLs to the new URL namespace, when those changes occur.

6. Injecting custom processing.
In some cases, a server administrator may wish to inject new, additional server-side processing for well-known existing URLs. One example is inserting special image-handling logic behind a .jpg URL: you may wish to block access to image URLs from outside referrers, to limit bandwidth leeching.

7. Filtering URL requests.
An administrator may want to restrict access to certain URLs based on the referer, the requesting IP address, and so on.
You can imagine lots of other reasons, too.

IIRF lets you do any of these things.
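Reasons 4, 6, and 7 above (routing, injected processing, filtering) all come down to inspecting a request and deciding what to do with it. The sketch below is a hypothetical decision function in Python, not IIRF code: the host name, secure path prefixes, and blocked address are made-up policy values. Note the assumption that requests with no Referer header are allowed, since many browsers and proxies omit it.

```python
from typing import Optional, Set
from urllib.parse import urlsplit

# Hypothetical policy values, for illustration only.
SITE_HOST = "www.example.com"
SECURE_PREFIXES = ("/account/", "/checkout/")
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif")
BLOCKED_IPS: Set[str] = {"203.0.113.7"}


def route(scheme: str, host: str, path: str,
          client_ip: str, referer: Optional[str]) -> str:
    """Return 'forbidden', 'redirect:<url>', or 'pass' for a request."""
    # Filtering: refuse requests from blocked addresses.
    if client_ip in BLOCKED_IPS:
        return "forbidden"
    # Anti-leeching: images may only be embedded by this site's own pages.
    # Assumption: a missing Referer is allowed through.
    if path.lower().endswith(IMAGE_EXTENSIONS) and referer:
        if urlsplit(referer).hostname != SITE_HOST:
            return "forbidden"
    # Routing: force HTTPS for sensitive sections of the site.
    if scheme == "http" and path.startswith(SECURE_PREFIXES):
        return f"redirect:https://{host}{path}"
    return "pass"


print(route("http", SITE_HOST, "/account/settings", "198.51.100.1", None))
print(route("https", SITE_HOST, "/img/logo.jpg", "198.51.100.1",
            "https://other.example.net/page"))
```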

From IIRF, an ISAPI filter solution for IIS


SEO – On-page optimization

You remember my insistence on having the right (keyword) ingredients before attempting to cook your meal? Or my assertion that 20% of your time should be spent on preparing those ingredients? Having learnt about phrases that pay, you will understand why. Optimization effort spent on a poor set of keywords is a fruitless exercise. In this section, you are (finally) ready to get cooking with your keywords.

On-page SEO means all the techniques that you can use on your own site, as distinct from off-page SEO, which covers the techniques you can get other site owners to use to help your site rank well. These techniques apply to all the content of your web servers, including your pages and other assets (such as Word documents, PDF files, videos, and more).

The on-page elements include page titles, page metadata, headings, body text, internal links, off-page links, and image alt tags, as well as a host of other assets. I will walk through each item in turn, in the order in which they generally appear in a well-formed HTML page. For each item, I will show you what to do for best effect and consider the overall importance of the item in the mix. I cover asset optimization in a separate section.
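As a starting point for walking through those elements, here is a hypothetical on-page audit built on Python's standard `html.parser`. It collects the page title, the meta description, the headings, and a count of images with missing or empty alt text. The class name and the choice of checks are assumptions for illustration; it does not judge keyword use, only whether the elements are present.

```python
from html.parser import HTMLParser
from typing import List, Optional

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class OnPageAudit(HTMLParser):
    """Collect title, meta description, headings, and images lacking alt text."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description: Optional[str] = None
        self.headings: List[str] = []
        self.images_missing_alt = 0
        self._in_title = False
        self._heading_tag: Optional[str] = None
        self._heading_text: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute values may be None
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (attributes.get("name") or "").lower() == "description":
            self.meta_description = attributes.get("content")
        elif tag in HEADING_TAGS:
            self._heading_tag, self._heading_text = tag, []
        elif tag == "img" and not (attributes.get("alt") or "").strip():
            self.images_missing_alt += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == self._heading_tag:
            text = "".join(self._heading_text).strip()
            self.headings.append(f"{tag}: {text}")
            self._heading_tag = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._heading_tag:
            self._heading_text.append(data)


page = """<html><head><title>Blue Widgets | Example Shop</title>
<meta name="description" content="Hand-made blue widgets."></head>
<body><h1>Blue Widgets</h1><img src="w.jpg"><img src="x.jpg" alt="Blue widget"></body></html>"""

audit = OnPageAudit()
audit.feed(page)
print(audit.title)               # Blue Widgets | Example Shop
print(audit.meta_description)    # Hand-made blue widgets.
print(audit.headings)            # ['h1: Blue Widgets']
print(audit.images_missing_alt)  # 1
```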


SEO – The supplemental index

For several years now Google has maintained two separate indexes, referred to as the main index and the supplemental index. The number of pages appearing in the supplemental index increased substantially in early 2006. The understanding people have of the supplemental index is still evolving, informed partly by limited explanations of how it works from Google, both official and unofficial. However, there are certain features that have become clear and that you need to be aware of:

  • Pages that trigger spam filters are more likely to be removed from the index altogether than to be placed in the supplemental index; the supplemental index is not a form of penalty that can be lifted through requests to be reincluded.
  • Pages in the supplemental index are less likely to be returned to users undertaking a search and less likely to be ranked well in the search results served to users.
  • Pages are very likely to end up in the supplemental index if they closely match pages elsewhere on your site (so-called duplicate content).
  • Pages may end up in the supplemental index if they closely match pages on other sites that are cited more often than yours (i.e., where you appear to have syndicated – or even plagiarized – content).
  • Pages from very large sites may end up in the supplemental index if insufficient PageRank has been passed to the page (more on this later).
  • Pages that are hard to crawl (e.g., because they use too many parameters in a URL or are larger than 101k) may end up in the supplemental index.
  • According to Google’s unofficial spokesperson Matt Cutts, the pages held in the supplemental index are parsed differently and held in the form of a “compressed summary,” meaning that “not every single word in every single phrase relationship” has been fully indexed.

Scary, eh? Despite Google’s attempts to placate webmasters – and to insist that ending up in the supplemental index is no tragedy – for some website owners the changes have been calamitous, with many hundreds of their pages disappearing into relative obscurity. Perhaps this is why Google has recently taken steps to hide which pages are in the supplemental index, by removing the supplemental marker from search engine results pages.

You might wonder why Google bothers with a supplemental index at all if it is so controversial. There are two simple reasons: quality and cost effectiveness. Five years ago, the search engine market was all about who had the biggest index (the so-called size wars). Today, with the number of web pages stretching into the billions, the challenge is much more about quality than quantity. Less is, in fact, more. Google has recognized this and is trying to find ways to present an ever leaner and better quality set of results.

Not indexing every single word and phrase relationship in the supplemental index also saves Google money over the long term, as it does not need to purchase as many servers and data centers. It can also manage and limit its carbon footprint and help the environment. The latter goes down well with stakeholders.

There are, in practice, four main SEO steps you should take to counter the effects of the supplemental index. I cover these in greater depth later on, but for now a simple summary should suffice:

  • Ensure that the navigation of your site is even. If the structure of your site is like a tree, with the directories being the branches and the pages the leaves, your tree should look symmetrical, rather than lopsided, if you want the life-giving sap of PageRank to pass evenly down through to each and every page.
  • Try to ensure that at least 10–15% of all the inbound links to your site are “deep links” to important internal content or category pages.
  • Test pages on your site that might be too similar to one another using tools like the Similar Page Checker at www.webconfs.com/similar-page-checker.php. If the page pairs fail the test (or you have a strong suspicion they may already be in the supplemental index), consider rewriting them from scratch and giving them different URLs (without using 301 redirects). This should get you a fresh start.
  • Make sure that the URLs do not contain too many parameters and that the size of each page is less than 100k.
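Several of these checks can be roughed out in code. The Python sketch below is hypothetical: the similarity measure is Jaccard similarity over three-word shingles, a common near-duplicate technique that is not necessarily what the Similar Page Checker uses. The parameter limit of 2 is an assumption (the text does not define "too many"), and "100k" is read as 100 × 1024 bytes. The deep-link share can be compared against the 10–15% target above.

```python
import re
from typing import Iterable, List, Set, Tuple
from urllib.parse import parse_qsl, urlsplit

MAX_PARAMS = 2          # assumption: the text does not say how many is "too many"
MAX_BYTES = 100 * 1024  # "less than 100k", read here as 100 KiB (an assumption)


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Overlapping word n-grams ("shingles") of a page's text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(text_a: str, text_b: str) -> float:
    """Jaccard similarity of two shingle sets: 0.0 disjoint, 1.0 identical."""
    a, b = shingles(text_a), shingles(text_b)
    return len(a & b) / len(a | b)


def crawl_warnings(url: str, page_bytes: int) -> List[str]:
    """Flag URLs with many query parameters and pages at or over the size limit."""
    warnings = []
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    if len(params) > MAX_PARAMS:
        warnings.append(f"{len(params)} URL parameters")
    if page_bytes >= MAX_BYTES:
        warnings.append(f"page is {page_bytes} bytes")
    return warnings


def deep_link_share(inbound_targets: Iterable[str]) -> float:
    """Fraction of inbound links pointing somewhere other than the home page."""
    targets = list(inbound_targets)
    if not targets:
        return 0.0
    deep = sum(1 for url in targets if urlsplit(url).path not in ("", "/"))
    return deep / len(targets)


print(similarity("blue widgets for sale today", "blue widgets for sale now"))
print(crawl_warnings("http://www.example.com/p?a=1&b=2&c=3", 50_000))
print(deep_link_share(["http://www.example.com/", "http://www.example.com/guides/"]))
```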

To find out which of your pages are in the supplemental index, the most reliable technique is to compare your total pages indexed to the total pages in the main index (which can still be identified):

  • To get total pages indexed, type into Google: site:www.yourdomain.com
  • To get pages in the main index, type into Google:
    site:www.yourdomain.com -inallurl:www.yourdomain.com
  • Pages in supplemental index = Total pages indexed – Pages in the main index.

By printing off and comparing the two lists, you can work out which pages on your site are in the supplemental index. If you can’t find a single one, well done! If you find loads, don’t panic. Simply follow the instructions above and persevere. If you really struggle to shake off the supplementals, give me a shout on the forum and I will see what I can do to help.
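If printing and comparing by hand gets tedious, the subtraction above is just a set difference. In this hypothetical Python sketch, the two URL lists stand in for results you have copied out of the two site: searches; the URLs themselves are made up.

```python
from typing import Iterable, List


def supplemental_pages(all_indexed: Iterable[str], main_index: Iterable[str]) -> List[str]:
    """Pages in supplemental index = total pages indexed - pages in the main index."""
    main = {url.strip() for url in main_index}
    return sorted({url.strip() for url in all_indexed} - main)


# Hypothetical URL lists copied from the two sets of search results.
total_pages = ["http://www.yourdomain.com/", "http://www.yourdomain.com/a.html",
               "http://www.yourdomain.com/b.html", "http://www.yourdomain.com/c.html"]
main_pages = ["http://www.yourdomain.com/", "http://www.yourdomain.com/b.html"]

print(supplemental_pages(total_pages, main_pages))
# ['http://www.yourdomain.com/a.html', 'http://www.yourdomain.com/c.html']
```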


SEO – How Google interrogates the index

The list of documents that contain a word is called a posting list, and looking for documents with more than one word is called “intersecting a posting list.” Our intersected list in the gulf war example above contains documents 9 and 22.
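The textbook way to intersect posting lists is to keep each list sorted by document id and walk the lists in step, which takes time proportional to their combined length. The Python sketch below shows that merge; the documents are made up, and this illustrates the general technique rather than Google's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map each word to its posting list: sorted ids of documents containing it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for doc_id in sorted(docs):  # ascending ids keep every posting list sorted
        for term in set(docs[doc_id].lower().split()):
            index[term].append(doc_id)
    return dict(index)


def intersect(p1: List[int], p2: List[int]) -> List[int]:
    """Merge-intersect two sorted posting lists in O(len(p1) + len(p2))."""
    i = j = 0
    result = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            result.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return result


def search(index: Dict[str, List[int]], query: str) -> List[int]:
    """Return ids of documents containing every word of the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Start from the shortest posting list so intermediate results stay small.
    lists = sorted((index.get(term, []) for term in terms), key=len)
    result = list(lists[0])
    for postings in lists[1:]:
        result = intersect(result, postings)
        if not result:
            break
    return result


docs = {1: "blue widgets for sale", 2: "red widgets",
        3: "blue sky", 4: "Blue widgets and gadgets"}
index = build_index(docs)
print(index["blue"], index["widgets"])  # [1, 3, 4] [1, 2, 4]
print(search(index, "blue widgets"))    # [1, 4]
```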

Google’s search engine essentially performs two tasks:

  • Finding the set of pages (from the index) that contain the user’s query somewhere.
  • Ranking the matching pages in order of importance and relevance.

I cover the latter in greater depth in the section on “landing the links.”


SEO – How Google stores the index

Google makes this process quicker by using hundreds of computers to store its index. When a query is processed, the task of identifying the pages containing the query words is divided among many machines, speeding up the task immensely. To return to our library analogy, if one person had to search a 70-page index in a book to find one phrase that did not occur in the main alphabetical sequence, it might take up to a minute to locate the text. However, if 70 people each had a page of the index and were working as a team, this task would take a few seconds at most.
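The team-of-70 idea can be sketched as a sharded index. In this hypothetical Python example, each document is stored on exactly one shard, every shard answers the query for its own documents at the same time, and the partial answers are merged. Because no document is split across shards, the union of the per-shard results is the complete answer. Threads stand in for separate machines here; this is an illustration of the divide-and-merge pattern, not a description of Google's architecture.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set

Shard = Dict[str, Set[int]]  # term -> ids of documents stored on this shard


def make_shards(docs: Dict[int, str], num_shards: int) -> List[Shard]:
    """Partition documents across shards; each document lives on one shard."""
    shards: List[Shard] = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]
        for term in text.lower().split():
            shard.setdefault(term, set()).add(doc_id)
    return shards


def search_shard(shard: Shard, terms: List[str]) -> Set[int]:
    """Find documents on one shard that contain every query term."""
    postings = [shard.get(term, set()) for term in terms]
    return set.intersection(*postings) if postings else set()


def parallel_search(shards: List[Shard], query: str) -> List[int]:
    """Send the query to every shard at once, then merge the partial answers."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = list(pool.map(lambda shard: search_shard(shard, terms), shards))
    return sorted(set().union(*partial_results))


docs = {1: "blue widgets for sale", 2: "red widgets",
        3: "blue sky", 4: "Blue widgets and gadgets"}
shards = make_shards(docs, num_shards=2)
print(parallel_search(shards, "blue widgets"))  # [1, 4]
```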
