Whitelisting in your robots.txt file is simple to do and easily gets rid of hundreds of well-behaved, polite crawlers that actually read and obey the robots.txt directives.

Blacklist vs. Whitelist:
One Works, One Doesn’t.

Most people use robots.txt as a blacklist, which means you have to list every bot you don’t want on the site. Blacklisting is futile because there are literally thousands of bots, and dozens, if not hundreds, of new ones you don’t know about come online daily. Therefore, the only logical way to control your content distribution is to whitelist, unless you actually like spending the day chasing down every new bot name that appears in your log files.

Let’s make it simple:

  • WHITELISTING: Only allowed bots crawl the site
  • BLACKLISTING: Everything not listed crawls the site

Sample Robots.txt File:

Construct a robots.txt file as follows:

# allowed bots
User-agent: bingbot
User-agent: Googlebot
Disallow:

# tell everyone else to go away
User-agent: *
Disallow: /

Make sure the allowed list includes every bot you need crawling your site, because any bot that isn’t listed will go away. It won’t go away mad and never return; it will simply go away until it decides to check your robots.txt file again. Some bots denied access by robots.txt go bat crap crazy asking for it 20 or more times in a single day, but many do honor the directives to not crawl the site.
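If you want to sanity-check a whitelist before publishing it, here is a minimal sketch using Python’s standard urllib.robotparser module. It assumes the rules shown above with Googlebot as Google’s token, and "SomeRandomBot" is a made-up name standing in for any unlisted crawler. Python’s parser matches names in its own way, so treat this as a rough check and not proof of how any particular crawler will read the file.

```python
from urllib.robotparser import RobotFileParser

# Whitelist in the same shape as the sample file (assumed token names).
ROBOTS_TXT = """\
# allowed bots
User-agent: bingbot
User-agent: Googlebot
Disallow:

# tell everyone else to go away
User-agent: *
Disallow: /
"""


def is_allowed(bot_name, path="/", robots_txt=ROBOTS_TXT):
    """Return True if the robots.txt rules let bot_name fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot_name, path)


if __name__ == "__main__":
    # "SomeRandomBot" is hypothetical; it stands for any bot not on the list.
    for name in ("bingbot", "Googlebot", "SomeRandomBot"):
        print(name, is_allowed(name))
```

The two listed bots come back allowed and the unlisted one comes back denied, which is exactly the whitelist behavior described above.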

Robots.txt Tells Spiders To Go Away Nicely

While robots.txt is just a nice way of asking bots not to crawl your site, it has no teeth, and they can, and sometimes do, completely ignore it. Some people ask why bother with robots.txt since it can’t be enforced (not true, just a myth spread by less technical webmasters), but it’s the proper way to handle well-behaved bots. However, some bad bots read robots.txt to see which bots you allow and then assume those bot names to try to gain access. Basically, if you see a bot misbehaving, you cannot judge it on the user agent name alone; you also have to check the IP address to make sure it’s really who it claims to be, but that’s getting a bit advanced for this brief article.
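For the curious, one common way to do that IP check is a forward-confirmed reverse DNS lookup: resolve the IP to a hostname, make sure the hostname belongs to the crawler’s operator, then resolve the hostname back and confirm it returns the same IP. The sketch below is only an illustration under assumptions: the hostname suffixes are my understanding of what Google and Bing publish and should be checked against their current documentation, the function name is made up, and it only handles IPv4.

```python
import socket

# Assumed hostname suffixes for the whitelisted crawlers; verify these
# against each search engine's own documentation before relying on them.
TRUSTED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}


def verify_bot_ip(bot_name, ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check for a claimed bot name and IP.

    The two lookup arguments exist so the DNS calls can be swapped out
    (for example in tests); by default the socket module is used.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    suffixes = TRUSTED_SUFFIXES.get(bot_name.lower())
    if not suffixes:
        return False  # not a bot we know how to verify
    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        if not host.endswith(suffixes):
            return False  # hostname is not in the operator's domain
        # The hostname must resolve back to the same IP address.
        return ip in forward_lookup(host)
    except OSError:
        return False  # lookup failed, so treat the claim as unverified
```

A request claiming to be Googlebot from an IP that fails this check is a good candidate for blocking, whatever its user agent says.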

How To Enforce Robots.txt

To enforce the robots.txt file there are some surprisingly simple methods you can use, such as the .htaccess file, which we’ll discuss in a future post. Another option is installing a PHP robots.txt class library normally used by web crawlers and using it in reverse to validate access by the user agent requesting it. Both advanced robots.txt enforcement topics will be discussed in detail in future posts, so stay tuned!
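To give a rough idea of what "using a parser in reverse" could look like, here is a small sketch in Python instead of PHP, built on the standard urllib.robotparser module. It is not the method the future posts will cover; the function name, the token regex, and the choice of a 403 response are all my own assumptions. It also judges every request by the robots.txt rules, so a real setup would need to apply it only to traffic already identified as a bot, or ordinary browsers would be turned away too.

```python
import re
from urllib.robotparser import RobotFileParser

# Same whitelist shape as the sample file (assumed token names).
ROBOTS_TXT = """\
User-agent: bingbot
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

_parser = RobotFileParser()
_parser.parse(ROBOTS_TXT.splitlines())

# Product tokens appear as "name/version" inside a User-Agent header.
_TOKEN = re.compile(r"([A-Za-z][A-Za-z0-9_-]*)/")


def status_for_request(user_agent_header, path):
    """Return 200 if any token in the header is allowed for path, else 403."""
    # Fall back to the whole header when it contains no name/version pair.
    tokens = _TOKEN.findall(user_agent_header) or [user_agent_header]
    allowed = any(_parser.can_fetch(token, path) for token in tokens)
    return 200 if allowed else 403


if __name__ == "__main__":
    bing = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
    print(status_for_request(bing, "/index.html"))
    print(status_for_request("SomeRandomBot/1.0", "/index.html"))
```

Since the user agent string is trivial to fake, a check like this only has teeth when it is paired with verifying the IP address as well.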

Content Control Part 2: Coming Soon

I was sitting in my living room this evening watching the big screen TV, which is in every room, and reading the tablet, not a book, when it hit me like a ton of bricks. The prophetic Ray Bradbury book Fahrenheit 451 was rapidly coming true. Not to mention the surveillance camera of George Orwell’s 1984 was silently blinking at me from my PlayStation 3 across the room, plus the camera on my netbook, tablet, cell phone, etc., but I digress from the topic of censorship.

Just remember that Bradbury envisioned, or predicted if you will, that the government or corporations would be telling us what they wanted us to know on big glowing screens on our walls, and thanks to HDTV that’s now a reality. We’ve welcomed it into our homes, and with prices dropping and a wave of old first-generation hand-me-down HDTVs, they are now in everyone’s living rooms, with messages beamed right into the suggestible minds of millions by the minute.

Reading, on the other hand, is such old news that the real news organizations aren’t surviving the switch to the internet and HDTV, and newspapers are folding all over the country. There are other factors involved, including Walmart driving out small papers that used to thrive in small communities, but the end result is that paper is out, digital is in, and books are on the endangered species list.

The government didn’t start this technological transition from print to online, but they could ultimately benefit in ways we won’t necessarily like, and Ray’s book covers that. Go read it while you can still find it and Kindle doesn’t recall it.

Ray Bradbury’s tale of the future where books are burned by the Fire Dept. isn’t exactly what’s happening, but books are ironically all being burned into a Kindle Fire. The government isn’t confiscating or burning books, but the public is freely giving up the old printed material in favor of lighter, more resource-friendly digital media. That’s the rub, as there is no hard copy for all this digital media, which is easily filtered or erased.

The government didn’t have to take our books away from us like they did in Bradbury’s cautionary tale; we’ve given them up for their digital equivalents. However, that doesn’t mean that the ‘book burning’ censorship of Bradbury’s tome won’t come to pass, because blocking content on the internet is as easy as pressing the DELETE key or applying a filter on the backbone of the internet to block it from downloading.

Kindle Fire has already recalled books so we know they have the technology and in this case it was applied to correct an error.

What happens the day the recall is used to censor the first book, or more likely, an opposing political candidate?

Nobody raised hell about it and insisted that the ability be removed like they did about privacy and tracking cookies on the internet. Apparently privacy is important, but the ability to censor us at the click of a button or flip of a switch is OK. The lack of action and continued purchase of these devices by the public without this correction is, in my opinion, appalling.

Just ask China how easy it is to keep undesirable material away from their countrymen, as they have already applied massive censorship firewalls on the web to ensure only authorized content is read by all the residents. Dissidents who think otherwise are dealt with harshly.

Will that kind of Fahrenheit 451 censorship filter down to the United States?

It already has, but it’s candy-coated: so-called journalism biased to bend your opinion, while your books, which could be written by just anyone with any opinion or agenda <gasp!>, are being silently carried away to landfills.

Just beware that the devices you buy to read this news may not allow you to read it in the near future and can already recall what you’re reading today.

Also, when will the government start spying on you through all those cameras in every device in your house under the guise of security? But I digress again from the topic of censorship. 🙂

Yes, I drifted a little and I’m kind of all over the map but it’s late and this is a RANT!

Enjoy.