Robots.txt

Search engines frequently visit your site to index your content, but there are often cases when indexing parts of your online content is not what you want. If you have sensitive data on your website, or files and folders that you do not want the world to see, you can advise search engines not to index those pages. If you want to keep some files, such as images, CSS and JavaScript, out of the index, you also need a way to tell the spiders to keep away from these items.
One way to tell search engines to avoid certain files and folders on your website is the robots meta tag. However, not all search engines read meta tags, so the robots meta tag can simply go unnoticed. A more dependable way to tell search engines what you want is to use a robots.txt file.

What Is Robots.txt?

Robots.txt is a plain text file, not an HTML file. By putting robots.txt on your site, you tell search robots which files and folders you would like them not to visit. It is important to understand that robots.txt is not a way of preventing search engines from crawling your site: putting up a robots.txt file is something like putting a note “Please, do not come inside” on an unlocked door. Well-behaved crawlers honor the request, but nothing forces them to.
The location of robots.txt is very important. Place the robots.txt file in the main directory, otherwise user agents (search engines) will be unable to find it. Search spiders do not search the whole site for a file named robots.txt. Instead, they look first in the main directory (i.e. http://mydomain.com/robots.txt), and if they don't find it there, they simply assume that the site doesn’t have a robots.txt file and index everything they find along the way. So, if you put robots.txt in the wrong place, do not be surprised when search engines index your whole site.
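The lookup rule described above can be sketched in a few lines. This is only an illustration, assuming Python's standard urllib.parse module; the function name robots_url and the sample page path are hypothetical:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host: crawlers look for robots.txt
    # in the main directory and nowhere else.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page deep inside the site.
print(robots_url("http://mydomain.com/shop/items/page.html"))
# prints http://mydomain.com/robots.txt
```

However deep the page sits in the site, the file a crawler asks for is always the one in the main directory.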

Structure of a Robots.txt File

The structure of a robots.txt file is very simple: it is a list, as long as you need it to be, of user agents and the files and directories that are disallowed for them. Basically, the syntax is as follows:
User-agent:
Disallow:
“User-agent:” names the search engine crawler that the record applies to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines by putting the # sign at the beginning of the line:
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
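To see what these three lines mean to a crawler, the record can be fed to a parser. A minimal sketch, assuming Python's standard urllib.robotparser module as a stand-in for a real crawler; the agent name AnyBot and the sample paths are hypothetical:

```python
from urllib.robotparser import RobotFileParser

RULES = """\
# All user agents are disallowed to see the /temp directory.
User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the text directly, no network access

# "AnyBot" is a made-up crawler name; the * record applies to every agent.
for path in ("/index.html", "/temp/draft.html"):
    verdict = "allowed" if parser.can_fetch("AnyBot", path) else "disallowed"
    print(path, verdict)
```

Only paths that start with /temp/ are reported as disallowed; everything else on the site stays open to the crawler.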

The Traps of a Robots.txt File

When you start writing more complicated files, for example to give different user agents access to different directories, problems can occur if you don’t pay attention to the traps of a robots.txt file. Common mistakes are typos and contradictory directives. Typos include misspelled user agents or directories, missing colons after User-agent and Disallow, and so on. Typos can be tricky to find, but in some cases validation tools help.
The more serious problems come from logical errors. For instance:
User-agent: *
Disallow: /temp/
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
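The example above is from a robots.txt file that lets all agents access everything on the site except the /temp/ directory, and then adds a more restrictive record for Googlebot. A crawler that stops reading at the first record that applies to it would see that all user agents are allowed into every folder except /temp/, treat that as enough, and go on to index /images/ and /cgi-bin/, which you thought you had told it not to touch. Crawlers that follow the current Robots Exclusion Protocol (RFC 9309), Googlebot among them, instead obey the most specific record that names them and ignore the “*” record entirely. That is the opposite trap: any rule from the “*” record that a named crawler should also follow, such as /temp/ here, has to be repeated in its own record.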
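How a particular parser resolves the two records can be checked directly rather than guessed. A minimal sketch, assuming Python's standard urllib.robotparser module; it shows how that one parser behaves, and real search engine crawlers may differ. The agent name SomeOtherBot and the file names are hypothetical:

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /temp/
User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Ask the same questions on behalf of a named agent and an unnamed one.
for agent in ("Googlebot", "SomeOtherBot"):
    for path in ("/images/logo.png", "/temp/draft.html", "/cgi-bin/form.cgi"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "disallowed"
        print(agent, path, verdict)
```

With this parser, Googlebot is held to its own record, while an agent that is not named falls back to the “*” record and is kept out of /temp/ only. Running a check like this against the parser you care about is a quick way to catch records that do not mean what you intended.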

Tools to Generate and Validate a Robots.txt File

Given the simple syntax of a robots.txt file, you can always read it yourself to check that everything is OK, but it is much easier to use a validator, like this one: http://tool.motoricerca.info/robots-checker.phtml. Such tools report common mistakes, like missing slashes or colons in the robots.txt file, which, if not detected, compromise your efforts. For instance, if you have typed:
User agent: *
Disallow: /temp/

The hyphen between “User” and “agent” is missing, so the syntax is incorrect.
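The kind of check a validator performs can be sketched as follows. This is a simplified illustration, not the logic of the tool linked above; the function name lint_robots and the list of accepted field names are assumptions made for the example:

```python
# Field names this sketch accepts (an assumption; validators may accept others).
KNOWN_FIELDS = {"user-agent", "disallow", "allow", "sitemap", "crawl-delay"}


def lint_robots(text):
    """Return a list of (line_number, message) for suspicious lines."""
    problems = []
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing colon"))
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if field.lower() not in KNOWN_FIELDS:
            problems.append((number, f"unknown field {field!r}"))
        elif (field.lower() in {"disallow", "allow"}
              and value and not value.startswith(("/", "*"))):
            problems.append((number, "path should start with '/'"))
    return problems


print(lint_robots("User agent: *\nDisallow: /temp/"))
# prints [(1, "unknown field 'User agent'")]
```

The misspelled “User agent” line is flagged because it matches none of the accepted field names; a missing colon or a path without its leading slash is reported the same way.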
