
How to exclude your website or certain pages from indexing using Robots.txt

 

Excluding the whole website from indexing

To exclude your website from all search engines and stop all robots from crawling it from now on, create a robots.txt file in the root directory of your web server with the following content:

User-agent: *
Disallow: /

To stop only the Quintura robot from crawling your website, create a robots.txt file in the root directory of your web server with the following content:


User-agent: Quintura-Crw
Disallow: /

 

Excluding certain parts of your website from indexing

To exclude folders and individual pages from indexing, create a robots.txt file in the root directory of your web server. The robots.txt file is organized in accordance with the Robots exclusion standard, so follow its rules when creating the file. The Quintura crawler obeys the restrictions in the record whose "User-agent" parameter equals "Quintura-Crw". If there is no such record, it obeys the restrictions in the record where "User-agent" equals "*". If there is no "*" record either, it obeys the restrictions in the record where "User-agent" equals "Googlebot".
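The fallback order described above can be sketched in Python. This is only an illustration, not the crawler's actual code: the function name select_group, the dictionary layout and the case-insensitive comparison of User-agent names are assumptions made for the example.

```python
def select_group(groups, preferred=("quintura-crw", "*", "googlebot")):
    """Return (agent, rules) for the first User-agent in the fallback order.

    `groups` maps a User-agent name to its list of Disallow values.
    """
    # Assumption of this sketch: User-agent names are compared case-insensitively.
    by_agent = {agent.lower(): rules for agent, rules in groups.items()}
    for agent in preferred:
        if agent in by_agent:
            return agent, by_agent[agent]
    # No applicable record at all: nothing is disallowed.
    return None, []


# No Quintura-Crw record here, so the "*" record is chosen over "Googlebot".
groups = {"*": ["/private/"], "Googlebot": ["/tmp/"]}
print(select_group(groups))
```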

To exclude all pages in a certain folder (e.g., "limurs") from the index, add the following record to your robots.txt file:

User-agent: Quintura-Crw
Disallow: /limurs/

If you instead add the following restriction:

User-agent: Quintura-Crw
Disallow: /limurs

then all sections starting with "limurs" are excluded, e.g. /limurs1, /limurs2, etc.
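The difference between the two records comes down to plain prefix matching on the URL path. A minimal Python sketch of that idea follows. The helper is_disallowed is a hypothetical name used only for this example, and the sketch assumes simple prefix matching with no wildcards.

```python
def is_disallowed(path, disallow_rules):
    """True if the URL path starts with any non-empty Disallow value."""
    return any(rule and path.startswith(rule) for rule in disallow_rules)


paths = ("/limurs/page.html", "/limurs1", "/limurs2/index.html", "/other")
for rule in ("/limurs/", "/limurs"):
    for path in paths:
        print(f"Disallow: {rule:<9} {path:<22} blocked={is_disallowed(path, [rule])}")
```

With "/limurs/" only paths inside the folder are blocked. With "/limurs" the paths /limurs1 and /limurs2/index.html are blocked as well.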

Here is an example of a more complex robots.txt file:

User-Agent: Quintura-Crw
Crawl-Delay: 5
Disallow: /users/
Disallow: /forum.php
Disallow: /login.php?action=login

With these robots.txt parameters, the Quintura crawler indexes your website at a rate of one page every 5 seconds, omitting the following sections: /users/, /forum.php and /login.php?action=login. Here, login.php pages are excluded only if the link contains the 'action=login' parameter (other parameters do not matter).
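One way to check a file like this before publishing it is Python's standard urllib.robotparser module. Note that this is a general-purpose parser, not the Quintura crawler, so its matching may differ from the crawler's in edge cases. The sample paths below are made up for the illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: Quintura-Crw
Crawl-Delay: 5
Disallow: /users/
Disallow: /forum.php
Disallow: /login.php?action=login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENT = "Quintura-Crw"
# Hypothetical paths chosen to exercise each rule.
for path in ("/index.html", "/users/alice", "/forum.php",
             "/login.php", "/login.php?action=login"):
    print(path, "allowed" if parser.can_fetch(AGENT, path) else "blocked")

print("crawl delay (seconds):", parser.crawl_delay(AGENT))
```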

 

Sitemap

A sitemap is a tool for indicating which pages of the website the crawler should index. When a sitemap is given, the robot does not scan your whole website but visits only the pages listed in the sitemap file.

For example:

User-Agent: Quintura-Crw
Crawl-Delay: 5
Disallow: /users
Disallow: /forum.php
Disallow: /login.php?action=login

Sitemap: /products.xml
Sitemap: /services.txt

In this case the Quintura crawler ignores all of the Disallow rules and indexes the website following only the rules written in the two files /products.xml and /services.txt. For details on sitemaps, please refer to the sitemap standard.

Note:
If the sitemap file is not available or not compatible with the standard, the Quintura crawler indexes your website in scan mode.
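The choice between sitemap mode and scan mode can be pictured with a small Python sketch. The function names, the is_usable callback and the rule "use sitemap mode if at least one listed file is usable" are assumptions of this example. The text above does not say what happens when only some of the listed files are available.

```python
def sitemap_entries(robots_txt):
    """Collect the values of all Sitemap lines, in file order."""
    entries = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            entries.append(value.strip())
    return entries


def crawl_mode(robots_txt, is_usable):
    """Return ('sitemap', usable_files) or ('scan', []) as the fallback.

    `is_usable` is a caller-supplied check (hypothetical) that says whether
    a sitemap file is available and compatible with the standard.
    """
    usable = [entry for entry in sitemap_entries(robots_txt) if is_usable(entry)]
    return ("sitemap", usable) if usable else ("scan", [])


SAMPLE = """\
User-Agent: Quintura-Crw
Disallow: /users

Sitemap: /products.xml
Sitemap: /services.txt
"""

print(crawl_mode(SAMPLE, lambda entry: True))   # every sitemap usable
print(crawl_mode(SAMPLE, lambda entry: False))  # none usable, fall back to scan mode
```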

 

Meta tags

The other standard, which is more convenient for individual web pages, uses the HTML <META> tag on a page to disallow robots from indexing that page. For details, refer to the description of this standard.

To stop robots from indexing a page of your website, add the following meta tag to the <HEAD> section of the page:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

To allow robots to index a page of your website but stop them from following the links on it, use the following meta tag:


<META NAME="ROBOTS" CONTENT="NOFOLLOW">
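To see how a robot might read these tags, here is a minimal Python sketch built on the standard html.parser module. It illustrates the general idea and is not the Quintura crawler's implementation. The class RobotsMetaParser and the function robots_policy are hypothetical names.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_policy(html):
    """Return whether the page may be indexed and its links followed."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


page = '<html><head><META NAME="ROBOTS" CONTENT="NOFOLLOW"></head><body></body></html>'
print(robots_policy(page))
```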

 

HTML tag <noindex>

The Quintura crawler supports the noindex tag, which excludes certain (auxiliary) parts of the text from indexing. Place the opening <noindex> tag at the beginning of such a part and the closing </noindex> tag at its end, and Quintura will not index that part.

Example:

... text ... <noindex>this text shall not be indexed</noindex> ... text ...
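The effect of the tag can be approximated with a short Python sketch that drops every <noindex> section before the text is indexed. This is a rough illustration using a regular expression, not the crawler's own parser. It assumes the tags are not nested, and strip_noindex is a hypothetical name.

```python
import re

# Non-greedy match so each <noindex>...</noindex> pair is removed separately.
NOINDEX_RE = re.compile(r"<noindex>.*?</noindex>", re.IGNORECASE | re.DOTALL)


def strip_noindex(html):
    """Remove every <noindex>...</noindex> section from the text."""
    return NOINDEX_RE.sub("", html)


sample = "... text ... <noindex>this text shall not be indexed</noindex> ... text ..."
print(strip_noindex(sample))
```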