A robots.txt file lists the parts of a website that a web robot should or should not access. Website owners can use robots.txt to control automated page requests by web robots or web crawlers.
By default, the Rigor Content Check obeys robots.txt when it crawls a site to check for link health. When this setting is enabled, Rigor Content Checks will not visit any URL that is disallowed by robots.txt.
Web crawlers are not required to obey the rules in robots.txt, so Rigor gives users the option to configure Content Checks to ignore them.
Should I disable ‘Obey Robots.txt’?
A site’s robots.txt file could prevent Rigor’s Content Check from crawling an entire site. If the starting URL for the Content Check is disallowed by robots.txt, the Content Check may not run at all.
If robots.txt is blocking or partially blocking a Content Check, you can:
- Edit robots.txt and add an allowance to let Rigor crawl the site (Recommended)
  User-agent: Rigor
  Allow: /
- Uncheck ‘Obey Robots.txt’ on the Advanced tab when creating or editing a Content Check to allow Rigor to ignore rules set in the robots.txt file
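If you are unsure whether a site’s robots.txt is what is blocking a Content Check, you can test it locally before editing anything. The sketch below uses Python’s standard `urllib.robotparser` to check whether a given user agent may fetch a URL; the "Rigor" user-agent string and the example URLs are assumptions for illustration, not Rigor’s documented crawler identity.

```python
# Sketch: test whether a robots.txt file allows a crawler to visit a URL.
# The "Rigor" user agent and example URLs are illustrative assumptions.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Rigor
Allow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# "Rigor" matches its own rule group, which allows everything.
print(parser.can_fetch("Rigor", "https://example.com/private/page"))     # True
# Any other crawler falls under the wildcard group and is blocked from /private/.
print(parser.can_fetch("OtherBot", "https://example.com/private/page"))  # False
```

If the check against your Content Check’s starting URL returns `False` for the crawler’s user agent, adding an allowance like the one above (or disabling ‘Obey Robots.txt’) should unblock the crawl.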