Sunday, May 14, 2017

PolitiFact and robots.txt (updated)

We were surprised earlier this week when our attempt to archive a PolitiFact fact check at the Internet Archive failed.



Saving a page to the Internet Archive has served as one of the standard methods for keeping a record of changes at a website. PolitiFact Bias has often used the Internet Archive to document PolitiFact's mischief.

Webmasters have the option of instructing search engines and other automated crawlers to skip content at a website by placing a "robots.txt" file on the site. Historically, the Internet Archive has respected the presence of a robots.txt prohibition.

PolitiFact apparently started using a restrictive robots.txt recently. As a result, it's likely that none of the archived links to PolitiFact.com pages will work for a time, either at PolitiFact Bias or elsewhere.

The good news in all of this? The Internet Archive is likely to start ignoring the robots.txt instruction in the very near future. Once that happens, PolitiFact's sketchy Web history will return from the shadows into the light.

PolitiFact may have had a legitimate reason for the change, but our extension of the benefit of the doubt comes with a big caveat: The PolitiFact webmaster could have created an exception for the Internet Archive in its robots.txt instruction. That oversight creates an embarrassment for PolitiFact, at minimum.
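
For readers curious what such an exception might look like, here is a minimal sketch using Python's standard urllib.robotparser module. The robots.txt content below is hypothetical and is not PolitiFact's actual file. It assumes the Internet Archive's crawler identifies itself as "ia_archiver" (the user-agent name the Wayback Machine has historically honored), and the page path and the "SomeOtherBot" name are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt (NOT PolitiFact's real file): block every crawler
# except the one assumed to belong to the Internet Archive.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up page path, used only for illustration.
PAGE = "http://www.politifact.com/example-fact-check/"

if __name__ == "__main__":
    print("ia_archiver allowed:", may_crawl(ROBOTS_TXT, "ia_archiver", PAGE))
    print("SomeOtherBot allowed:", may_crawl(ROBOTS_TXT, "SomeOtherBot", PAGE))
```

An empty "Disallow:" line means nothing is off limits for that crawler, while "Disallow: /" under "User-agent: *" shuts out every crawler not named elsewhere in the file. Two short stanzas are all the exception would take.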


Update May 18, 2017:

This week the Internet Archive Wayback Machine once again succeeded in saving Web pages at PolitiFact.com. Links at PolitiFactBias.com to archived pages likewise function properly.

We do not know at this point whether PolitiFact created an exception for the Internet Archive (and others), or whether the Internet Archive has already started ignoring robots.txt. PolitiFact has made no announcement regarding any change, so far as we can determine.
