
Free-Range Spiderbots!

Boruta, Luc
Open Access, English
  • Published: 09 Oct 2018
  • Publisher: Zenodo
Abstract
Free-range what!? The robots exclusion standard, a.k.a. robots.txt, is used to give instructions as to which resources of a website may be scanned and crawled by bots. Invalid or overzealous robots.txt files can lead to a loss of important data, breaking archives, search engines, and any app that links to or remixes scholarly data.

Why should I care? You care about open access, don't you? This is about open access for bots, which fosters open access for humans.

Mind your manners. The standard is purely advisory; it relies on the politeness of the bots. Disallowing access to a page doesn't protect it: if it is re...
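For readers unfamiliar with the mechanics, here is a minimal sketch of the advisory check a polite crawler performs before fetching a page, using Python's standard urllib.robotparser. The domain, paths, and user-agent name are hypothetical; nothing in the standard technically prevents an impolite bot from ignoring the rules.

import urllib.robotparser

# A hypothetical robots.txt: /private/ is off-limits, /papers/ is crawlable.
robots_txt = """\
User-agent: *
Disallow: /private/
Allow: /papers/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(robots_txt.splitlines())

# A polite bot consults the rules before every fetch.
print(rp.can_fetch("MyScholarlyBot", "https://example.org/papers/123"))  # True
print(rp.can_fetch("MyScholarlyBot", "https://example.org/private/42"))  # False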
Subjects
free text keywords: crawling, robots.txt, digital preservation
Download from (3 versions):
  Zenodo — Other literature type, 2018 (Provider: Datacite)
  ZENODO — Conference object, 2018 (Provider: ZENODO)