
Robots & Spider Crawls
Search engines like Google develop software programs designed to “crawl” the millions of websites and web pages that make up the internet. These programs are known as “spiders” or “crawlers.” Since we often refer to the internet as the web, the term spider seemed a good fit for programs that crawl through its millions of pages.

How Spiders Crawl:
When a new website is created, the webmaster submits the website address to Google and other search engines like Yahoo! and MSN. The website is added to a list of new sites that crawlers or robots will visit. It typically takes several weeks for the spider to pay its initial visit to a newly created website. When the spider reaches the website, it automatically navigates through the site, scanning for keywords and tagging such as meta tags, and following the inbound and outbound links and the various components of the site. As a spider or robot “crawls” through the website, the software is essentially forming a snapshot of the website and all its individual web pages at that particular point in time. That snapshot or “memory” of the website and its individual web pages is cached or “filed.”

Create a robots.txt File:
Creating a robots.txt file will not improve your search engine positioning, but it does tell robots which files you will not allow to be crawled and indexed in the search engines. When a robot crawls your site, it looks for the robots.txt file. If it doesn't find one, it automatically assumes that it may crawl and index the entire site. Not having a robots.txt file can also create unnecessary 404 errors in your server logs, because every robot request for the missing file is logged as an error, making it more difficult to track "real" 404 errors. Assuming you want your entire site indexed and only want to stop the unnecessary 404 errors from occurring, you should upload a simple robots.txt file to the root directory of your domain.
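The lookup described above can be sketched in a few lines of Python. This is only an illustration of the general behavior, not how any particular search engine's spider is written: the function names `robots_txt_url` and `may_crawl`, the robot name "AnyBot", and the example.com addresses are all made up for the example. The rule parsing is done by Python's standard `urllib.robotparser` module.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_txt_url(page_url):
    """Return the address a robot would check: robots.txt in the root of the domain."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_body, user_agent, page_url):
    """Decide whether a page may be crawled.

    robots_body is the text of robots.txt, or None when the server
    answered 404 (no file found). Both names are hypothetical.
    """
    if robots_body is None:
        # No robots.txt: the robot assumes the whole site is open.
        return True
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    page = "http://www.example.com/private/report.html"
    print(robots_txt_url(page))
    print(may_crawl(None, "AnyBot", page))
    print(may_crawl("User-agent: *\nDisallow: /private/", "AnyBot", page))
```

Note that the missing-file case still costs one failed request per visit, which is where the extra 404 entries in the server log come from.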
The simplest robots.txt file uses two rules:
* User-agent: (the robot the following rule applies to)
* Disallow: (the URL you want to block)

This code allows all robots to crawl all files:

    User-agent: *
    Disallow:

Add a robots.txt File:
Simply create a text document and save it as "robots.txt". Do not use an HTML editor to create the file unless it has the ability to save a plain text (ASCII) document. Most computers will allow you to create a text document using Notepad.
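If you would rather generate the file than type it by hand, a short Python script can write it as plain ASCII text. This is a minimal sketch under a few assumptions: the helper names `build_robots_txt` and `write_robots_txt` are invented for this example, the rules apply to all robots (`User-agent: *`), and the `/cgi-bin/` path in the usage lines is only a placeholder.

```python
from pathlib import Path


def build_robots_txt(disallowed_paths=()):
    """Build robots.txt text for all robots.

    With no paths given, an empty Disallow line is written,
    which lets every robot crawl every file.
    """
    lines = ["User-agent: *"]
    if disallowed_paths:
        lines.extend(f"Disallow: {path}" for path in disallowed_paths)
    else:
        lines.append("Disallow:")
    return "\n".join(lines) + "\n"


def write_robots_txt(directory, disallowed_paths=()):
    """Save the rules as a plain ASCII file named robots.txt in the given directory."""
    target = Path(directory) / "robots.txt"
    target.write_text(build_robots_txt(disallowed_paths), encoding="ascii")
    return target


if __name__ == "__main__":
    # Allow everything (the simple file described above).
    print(build_robots_txt(), end="")
    # Block one placeholder directory instead.
    print(build_robots_txt(["/cgi-bin/"]), end="")
```

The script only creates the file locally; it still has to be uploaded to the root directory of your domain.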
Website Creations was developed to empower online users in making
quality choices on their personal and business websites. Website Creations
offers creative solutions to design, enhance and improve your website.
Website Creations
11300 N Fuego Drive, Dunnellon, FL 34434 Phone: 352 533-3210