Possible "Speed Hindrance" for SE Spiders:
• URLs with 2+ dynamic parameters, e.g. http://www.url.com/page.php?id=4&CK=34rr&User=%Tom% (spiders may be reluctant to crawl complex URLs like this because they often produce errors for non-human visitors; see the URL-rewriting sketch after this list for one common workaround)
• Pages with more than 100 unique links to other pages on the site (spiders may not follow each one)
• Pages buried more than 3 clicks/links from the home page of a website (unless many external links point to the site, spiders will often ignore such deep pages)
• Pages requiring a "Session ID" or Cookie to enable navigation (spiders may not be able to retain these elements as a browser user can)
• Pages split into "frames", which can hinder crawling and confuse the engine about which page to rank in the results
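
One common workaround for the dynamic-parameter problem is URL rewriting: the server exposes a clean, static-looking address while still running the same script. Below is a minimal sketch for an Apache server with mod_rewrite enabled; the path and parameter names are illustrative, borrowed from the example URL above.

    # .htaccess: map a clean URL such as /page/4 onto the dynamic script
    RewriteEngine On
    # Capture the numeric id from the path and hand it to page.php;
    # QSA appends any remaining query-string parameters, L stops rule processing
    RewriteRule ^page/([0-9]+)/?$ page.php?id=$1 [L,QSA]

With a rule like this in place, internal links can point at http://www.url.com/page/4, a form spiders crawl far more readily than the multi-parameter version.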
Possible "Fences" for SE Spiders:
• Pages accessible only by filling out a form and clicking a submit button (spiders generally do not submit forms)
• Pages reachable only through a drop-down menu (an HTML select element)
• Documents accessible only via a search box
• Documents blocked deliberately (via a robots meta tag or robots.txt file; see the examples after this list)
• Pages requiring a login
• Pages that redirect before showing content (search engines call this "cloaking" or bait-and-switch and may actually ban sites that use this tactic)
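
Both deliberate-blocking mechanisms mentioned above are plain, standardized text. A robots.txt file placed at the root of the domain tells compliant spiders what not to fetch (the /private/ path here is just a placeholder):

    # http://www.url.com/robots.txt
    User-agent: *          # applies to all spiders
    Disallow: /private/    # do not crawl anything under /private/

Alternatively, a robots meta tag in an individual page's <head> leaves the page fetchable but asks that it not be indexed and that its links not be followed:

    <meta name="robots" content="noindex, nofollow">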
The key to ensuring that a site's contents are fully crawlable is to provide direct HTML links to each page you want the search engine spiders to index. Remember that if a page cannot be reached from the home page (where most spiders are likely to start their crawl), it is unlikely to be indexed by the search engines. A sitemap (discussed later in this guide) can be of tremendous help for this purpose.
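
A sitemap in the standard XML format (the sitemaps.org protocol, supported by the major engines) is simply a list of the URLs you want crawled. Here is a minimal example using this guide's sample domain; real entries may also carry optional tags such as <lastmod> and <changefreq>:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>http://www.url.com/</loc>
      </url>
      <url>
        <loc>http://www.url.com/page/4</loc>
      </url>
    </urlset>

Save the file as sitemap.xml at the site root and point spiders to it with a Sitemap: http://www.url.com/sitemap.xml line in robots.txt; this gives them a direct path to every page, including ones buried more than three clicks deep.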

