
 
Robots & Spider Crawls:
Search engines like Google develop software programs that are designed to “crawl” the millions of websites and web pages that make up the internet. These programs are known as “spiders,” “crawlers,” or “robots.” Since we often refer to the internet as the web, the term spider seemed a good fit to describe programs that crawl through its millions of pages.


How Spiders Crawl:

When a new website is created, the webmaster submits the website address to Google and other search engines like Yahoo! and MSN. The website is added to a list of new sites that crawlers or robots will visit. It can take several weeks for the spider to pay its initial visit to a newly created website. When the spider reaches the website, it automatically navigates through the site, reading keywords and tags such as meta tags, following the site's internal and outbound links, and examining the various components of the site. As a spider or robot visits and “crawls” through the website, the software is essentially forming a snapshot of the website and all its individual web pages at that particular point in time. That snapshot or “memory” of the website and its individual web pages is cached or “filed.”
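
As a rough illustration of what a spider collects from a single page, here is a minimal Python sketch that uses only the standard library. It is not how any real search engine works internally; the class name PageScanner, the function scan_page, and the sample page are all made up for this example.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Hypothetical helper: collects meta tags and link targets from one page."""

    def __init__(self):
        super().__init__()
        self.meta = {}    # meta name -> content
        self.links = []   # href values, in the order they appear

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            # Record tags such as <meta name="keywords" content="...">
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            # Record every link the spider could follow next
            self.links.append(attrs["href"])


def scan_page(html):
    """Return (meta tags, links) found in one HTML document."""
    scanner = PageScanner()
    scanner.feed(html)
    return scanner.meta, scanner.links


# Sample page invented for this sketch
SAMPLE_PAGE = """
<html><head>
<meta name="keywords" content="web design, robots">
</head><body>
<a href="/about.html">About</a>
<a href="http://www.example.com/">Partner site</a>
</body></html>
"""

meta, links = scan_page(SAMPLE_PAGE)
print(meta)
print(links)
```

A real spider would repeat this for every link it finds and store the results, which is the "snapshot" described above.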


Create a Robots.txt File:
Creating a robots.txt file will not improve your search engine positioning, but it does tell robots which files you do not want crawled and indexed by the search engines. When a robot crawls your site, it looks for the robots.txt file. If it doesn't find one, it automatically assumes that it may crawl and index the entire site. Not having a robots.txt file can also create unnecessary 404 errors in your server logs, because each request for the missing file is logged as an error, making it more difficult to track "real" 404 errors. If you want your entire site indexed and only want to stop the unnecessary 404 errors, upload a simple robots.txt file to the root directory of your domain. The simplest robots.txt file uses two rules:
* User-agent: (the robot the following rule applies to; * means all robots)
* Disallow: (the URL path you want to block; leave it empty to block nothing)

This code allows all robots to crawl all files:
User-agent: *
Disallow:
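
If you want to check how a set of rules will be read before uploading it, Python's standard urllib.robotparser module can parse the text for you. The sketch below compares the allow-all file above with a second file that blocks one folder. The /private/ folder, the robot name ExampleBot, and the example.com addresses are assumptions made up for this illustration.

```python
from urllib.robotparser import RobotFileParser


def build_parser(robots_txt):
    """Parse robots.txt text (a string) and return the parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# The allow-all file shown above
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Hypothetical variation: keep all robots out of one folder
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""

allow_all = build_parser(ALLOW_ALL)
block_private = build_parser(BLOCK_PRIVATE)

private_url = "http://www.example.com/private/notes.html"
home_url = "http://www.example.com/index.html"

# can_fetch() answers: may this robot crawl this address?
print(allow_all.can_fetch("ExampleBot", private_url))
print(block_private.can_fetch("ExampleBot", private_url))
print(block_private.can_fetch("ExampleBot", home_url))
```

An empty Disallow line blocks nothing, so the first file lets every robot reach every address, while the second file only keeps robots out of the one folder.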

Add a Robots.txt File:
Simply create a text document and save the new document as "robots.txt". Do not use an HTML editor to create the file unless it has the ability to create a plain text (ASCII) document. Most Windows computers will allow you to create a text document using Notepad.
  • Right-click on your desktop
  • Choose New
  • Choose Text Document
  • Open the document you just created
  • Insert your instructions to robots
  • Click on Save As
  • Save the document as robots.txt
  • Then simply upload the file to the root directory of your website. Since I use FrontPage Extensions, after I have made the file I click on "Save As" and save the robots.txt file directly into my website folder; the next time I publish, the file is published online with the rest of the folder.
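
If you would rather not create the file by hand, the steps above can also be sketched as a short Python script. This is only one possible approach: the function name write_robots_txt is made up for this example, and the script writes into a temporary folder so it can be tried safely. Point it at your own website folder when you are ready, and upload the result as usual.

```python
import tempfile
from pathlib import Path

# The allow-all rules from the section above
ALLOW_ALL_RULES = "User-agent: *\nDisallow:\n"


def write_robots_txt(folder, rules=ALLOW_ALL_RULES):
    """Save the rules as a plain ASCII text file named robots.txt in folder."""
    target = Path(folder) / "robots.txt"
    # encoding="ascii" keeps the file plain text; newline="\n" avoids
    # platform-specific line endings being written into the file
    with open(target, "w", encoding="ascii", newline="\n") as handle:
        handle.write(rules)
    return target


# Demonstration only: a temporary folder stands in for the website folder
with tempfile.TemporaryDirectory() as website_folder:
    saved = write_robots_txt(website_folder)
    print(saved.name)
    print(saved.read_text(encoding="ascii"))
```

The file name must be exactly robots.txt, all lowercase, because that is the name robots ask for.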
    Website Creations was developed to empower  online users in making quality choices on their personal and business websites.  Website Creations offers creative solutions to design,  enhance and improve your website. Website Creations
    11300 N Fuego Drive
    Dunnellon, Fl 34434
    Phone: 352 533-3210