Web robots, spiders, and crawlers all mean the same thing: automated scripts that browse the internet in a systematic manner. Web spiders also allow marketing companies to conduct research, such as studying trends and performing competitive analysis.
Although a broad range of reputable sites use web spiders and crawlers to provide up-to-date data, some sites still block them.
Sensitive Corporate Information
Online companies want to reach their target customers without revealing sensitive or competitive data to their competitors.
If spiders and crawlers from a competitor's systems are detected, the site can block this incoming traffic or serve it false information.
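As a rough illustration of the detection step, a site might combine the visitor's IP address with the User-Agent header. The sketch below is only one possible approach, not how any particular site does it. The network range, the User-Agent hints, and the function names are all hypothetical, and real systems typically rely on many more signals.

```python
import ipaddress

# Hypothetical values for illustration only: 203.0.113.0/24 is a range
# reserved for documentation, and the hints are generic substrings.
BLOCKED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]
CRAWLER_AGENT_HINTS = ("bot", "spider", "crawler")


def looks_like_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a common crawler hint."""
    agent = user_agent.lower()
    return any(hint in agent for hint in CRAWLER_AGENT_HINTS)


def from_blocked_network(remote_ip: str) -> bool:
    """Return True if the address falls inside one of the listed networks."""
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in BLOCKED_NETWORKS)


def decide(remote_ip: str, user_agent: str) -> str:
    """Return 'block' for apparent crawlers from a listed network, else 'allow'."""
    if from_blocked_network(remote_ip) and looks_like_crawler(user_agent):
        return "block"
    return "allow"
```

A request handler could call `decide(remote_ip, user_agent)` before serving a page. Note that User-Agent headers are easy to forge, so a check like this catches only crawlers that identify themselves.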
Copyright Violation and Legal Issues
For legal reasons, the data on certain sites may be viewed only by a restricted audience.
With that in mind, these sites set limits that define when the information can be obtained, how frequently it can be accessed, from where it can be accessed, and more. Accessing these sites automatically is not advised, as it can violate their terms of service (TOS). Spiders and crawlers are automated systems, so these restrictions apply to them.
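One common way for a site to publish such limits to automated clients is a robots.txt file. The text above does not mention it, so treat it here as an assumption about how a given site communicates its rules. The sketch below uses Python's standard `urllib.robotparser` to check whether a path may be fetched and how long to wait between requests. The robots.txt content, the crawler name, and the URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download the
# file from the site it intends to visit.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""

AGENT = "example-crawler"  # hypothetical crawler name


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = load_rules(EXAMPLE_ROBOTS_TXT)

# Ask whether this crawler may fetch each URL under the parsed rules.
allowed_public = rules.can_fetch(AGENT, "https://example.com/index.html")
allowed_private = rules.can_fetch(AGENT, "https://example.com/private/report.html")

# Seconds the site asks crawlers to wait between requests, if specified.
delay = rules.crawl_delay(AGENT)
```

Passing a robots.txt check does not by itself satisfy a site's terms of service, which may impose further restrictions that the file does not express.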
Certain sites are common victims of abuse.
Web-based email systems are one example. To protect their users from spam, these websites block spiders and crawlers. They also take steps to restrict automated access, such as text message confirmation and CAPTCHAs, to verify that a real person is trying to access the site.