Several types of data can serve as the source for indexing a website. You can use any one of the sources described below or a combination of them.
For example, a common practice is to index a whole site by downloading its HTML pages and then keep the index up to date by using the site's RSS feed.
We recommend that you read the sections below to learn the pros and cons of each data source and to choose the combination that suits you best.
Your website URL is in the Data Source box.
This is the simplest data source for indexing. The Quintura downloader first reads your robots.txt file, if there is one. Then, observing the restrictions in robots.txt, it scans the starting page for links, scans the newly found pages for further links, and so on. The downloader never downloads folders above the folder specified in the Data Source box, nor does it download pages from other sites.

After the found pages are downloaded, the Quintura indexer extracts their text and compiles the index used for searching. Currently, the number of downloaded pages is limited to 5,000. Quintura can lift this limit if you send a written request to support@quintura.com.
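The crawling rules above can be sketched as a breadth-first crawl. This is a minimal illustration in Python, not Quintura's downloader: the function names, the get_links callback (which stands in for downloading a page and extracting its links) and the default user agent "*" are assumptions; only the 5,000-page default comes from the text.

```python
from collections import deque
from typing import Callable, Iterable, List
from urllib.parse import urldefrag, urljoin, urlsplit
from urllib.robotparser import RobotFileParser


def in_scope(start_url: str, url: str) -> bool:
    """True if url is on the start host and at or below the start folder."""
    start, cand = urlsplit(start_url), urlsplit(url)
    if (cand.scheme, cand.netloc) != (start.scheme, start.netloc):
        return False  # a page from another site
    # The start folder is the start path up to and including its last "/",
    # so ".../docs/index.html" and ".../docs/" both give "/docs/".
    folder = start.path[: start.path.rfind("/") + 1] or "/"
    return (cand.path or "/").startswith(folder)


def crawl(start_url: str,
          robots_lines: Iterable[str],
          get_links: Callable[[str], List[str]],
          max_pages: int = 5000,
          user_agent: str = "*") -> List[str]:
    """Breadth-first crawl honouring robots.txt, folder scope and a page cap."""
    robots = RobotFileParser()
    robots.parse(robots_lines)  # robots.txt lines fetched beforehand
    seen = {start_url}
    queue = deque([start_url])
    downloaded: List[str] = []
    while queue and len(downloaded) < max_pages:
        url = queue.popleft()
        if not robots.can_fetch(user_agent, url):
            continue  # disallowed by robots.txt
        downloaded.append(url)  # a real crawler would fetch the page here
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" parts.
            link, _ = urldefrag(urljoin(url, href))
            if link not in seen and in_scope(start_url, link):
                seen.add(link)
                queue.append(link)
    return downloaded
```

For a start URL of http://example.com/docs/, links such as ../up.html or http://other.com/ are never queued, and pages under a folder disallowed in robots.txt are skipped.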
If your sitemap is listed in your robots.txt file (for more information on sitemaps, see below), the Quintura downloader uses it as the starting point for downloading your site. The other restrictions in robots.txt still apply.
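As a rough sketch of this step, Python's standard robots.txt parser can both list the sitemaps declared with "Sitemap:" lines and check the remaining rules. The helper names are hypothetical, this is not Quintura's code, and RobotFileParser.site_maps() requires Python 3.8 or later.

```python
from typing import Iterable, List
from urllib.robotparser import RobotFileParser


def read_robots(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def sitemap_locations(robots_txt: str) -> List[str]:
    """URLs from "Sitemap:" lines; empty if robots.txt lists none."""
    return read_robots(robots_txt).site_maps() or []


def allowed_pages(robots_txt: str, pages: Iterable[str], user_agent: str = "*") -> List[str]:
    """Keep only the sitemap pages that the other robots.txt rules still allow."""
    parser = read_robots(robots_txt)
    return [p for p in pages if parser.can_fetch(user_agent, p)]
```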
To improve the cloud, you can add special meta tags to your web pages. The most useful meta tags are those unique to each page or to a set of pages from the same section. They help your users find exactly what you want them to find for a given search query, because words from meta tags get higher priority than ordinary words from the page text. More details…
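The idea that meta tag words outrank ordinary text can be illustrated with a small weighting sketch. The text does not say which meta tags Quintura reads or how it weights them, so this example assumes the standard name="keywords" meta tag and a hypothetical boost factor meta_weight; it only shows the principle.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageTerms(HTMLParser):
    """Collects <meta name="keywords"> phrases and the words of the page text."""

    def __init__(self) -> None:
        super().__init__()
        self.meta_keywords = []
        self.text_words = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta":
            a = dict(attrs)
            if (a.get("name") or "").lower() == "keywords":
                content = a.get("content") or ""
                self.meta_keywords += [k.strip().lower() for k in content.split(",") if k.strip()]

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_words += re.findall(r"\w+", data.lower())


def term_weights(html: str, meta_weight: int = 5) -> Counter:
    """Word counts from the page text, with each meta keyword boosted by meta_weight."""
    parser = PageTerms()
    parser.feed(html)
    parser.close()
    weights = Counter(parser.text_words)
    for phrase in parser.meta_keywords:
        weights[phrase] += meta_weight
    return weights
```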
The advantage of this data source is its simplicity: users enter the URL of their site and wait until the cloud is built. However, this data source has several disadvantages.
The Data Source box contains a site or site section URL, and one or more sitemaps are listed in robots.txt.
Sitemaps are supported by most major search engines. A sitemap is an XML file that lists the pages of a site that are important and worth indexing for search. With this data source, the Quintura crawler still downloads HTML pages, but it filters them according to the sitemap.
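Filtering crawled pages by a sitemap can be sketched as reading the loc entries of a standard sitemap (namespace http://www.sitemaps.org/schemas/sitemap/0.9) and keeping only the crawled URLs that appear there. The function names are hypothetical and exact string matching of URLs is a simplifying assumption; Quintura's actual matching rules are not described.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, List, Set

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text: str) -> Set[str]:
    """All <url><loc> values from a sitemap document."""
    root = ET.fromstring(xml_text)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text}


def filter_by_sitemap(crawled: Iterable[str], xml_text: str) -> List[str]:
    """Keep crawled pages listed in the sitemap, preserving crawl order."""
    allowed = sitemap_urls(xml_text)
    return [u for u in crawled if u in allowed]
```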
You can make crawlers aware of your sitemap by adding a Sitemap: line with its URL to your …/robots.txt file.
If your site does not have a sitemap, you can create one by following the instructions at http://www.sitemaps.org/protocol.php.
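A basic sitemap in the sitemaps.org format can be generated with the standard library, as in the sketch below. The build_sitemap name and its (loc, lastmod) input are illustrative assumptions; lastmod is optional in the protocol, and ElementTree escapes characters such as "&" in URLs as the protocol requires.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Optional, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: Iterable[Tuple[str, Optional[str]]]) -> str:
    """Return sitemap XML for (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod  # W3C date, e.g. 2010-05-01
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")
```

The resulting file is then placed on the site, and its URL is added to robots.txt so that crawlers can find it.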
Your RSS feed is in the Data Source box.
An RSS feed usually contains only the latest content added to your site. The information is stored in a compact XML file and is structured as URLs, titles, and annotations. This format is best suited for updating your site index (and hence your cloud). Blog RSS feeds can also be used to build the initial cloud, because some blog platforms let a feed include an almost unlimited number of entries. For example, blogs hosted at yoursite.blogspot.com accept the max-results parameter: http://yoursite.blogspot.com/feeds/posts/default?max-results=1000.
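The URL, title, and annotation structure of an RSS 2.0 feed maps directly onto its link, title, and description elements. A minimal parsing sketch follows; FeedItem and parse_rss are hypothetical names, and the example covers plain RSS 2.0 only.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple


class FeedItem(NamedTuple):
    url: str
    title: str
    annotation: str


def parse_rss(xml_bytes: bytes) -> List[FeedItem]:
    """Extract (link, title, description) from every <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_bytes)
    items = []
    for item in root.iter("item"):
        items.append(FeedItem(
            url=(item.findtext("link") or "").strip(),
            title=(item.findtext("title") or "").strip(),
            annotation=(item.findtext("description") or "").strip(),
        ))
    return items
```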
As a source of text, RSS annotations usually contain only an abstract, not the details. As a result, many words from the full article that could have been indexed are missing from your cloud. To remedy this, add your full texts to your RSS feed in a <content> tag, or create QXML for your site.
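The effect of this tip can be sketched as an extractor that prefers full text over the annotation. This is a hypothetical illustration, not Quintura's parser: it looks for an un-namespaced <content> child, as the text describes, and also for content:encoded from the widely used RSS content module; whether Quintura reads the latter is an assumption.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Element name of <content:encoded> from the RSS content module.
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"


def item_texts(xml_bytes: bytes) -> Dict[str, str]:
    """Map each item link to its full text, falling back to the description."""
    root = ET.fromstring(xml_bytes)
    texts = {}
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        full = item.findtext("content") or item.findtext(CONTENT_ENCODED)
        texts[link] = (full or item.findtext("description") or "").strip()
    return texts
```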
In addition to the standard RSS protocol, we support several of the most popular RSS and Atom variants. Although the Quintura RSS downloader tolerates some non-standard tag names and deviations in tag placement, substantial deviations from the standard protocol may cause processing problems. We therefore recommend that you stick to the standard.
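Supporting more than one feed format usually means detecting the format from the root element and normalizing entries into one shape. The sketch below handles RSS 2.0 and Atom (namespace http://www.w3.org/2005/Atom) only; which other variants Quintura accepts is not listed, and the function name is hypothetical. Per the Atom specification, a link without a rel attribute counts as rel="alternate".

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

ATOM = "{http://www.w3.org/2005/Atom}"


def feed_entries(xml_bytes: bytes) -> List[Tuple[str, str, str]]:
    """Return (url, title, text) for each entry of an RSS 2.0 or Atom feed."""
    root = ET.fromstring(xml_bytes)
    entries = []
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            url = ""
            for link in entry.findall(ATOM + "link"):
                if link.get("rel", "alternate") == "alternate":
                    url = link.get("href", "")
                    break
            text = entry.findtext(ATOM + "content") or entry.findtext(ATOM + "summary") or ""
            title = entry.findtext(ATOM + "title") or ""
            entries.append((url, title.strip(), text.strip()))
    elif root.tag == "rss":
        for item in root.iter("item"):
            entries.append(((item.findtext("link") or "").strip(),
                            (item.findtext("title") or "").strip(),
                            (item.findtext("description") or "").strip()))
    else:
        # e.g. RSS 1.0 (rdf:RDF root) or a malformed feed
        raise ValueError(f"unsupported feed root element: {root.tag}")
    return entries
```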
You can also use QXML, a proprietary Quintura standard, instead of RSS.
QXML is an XML format compatible with Quintura. Converting your site content to QXML lets you build the highest-quality cloud.
Files in this format can be used both for initial indexing and for updating the index. More details…