beggers
Jul 1 2003, 02:24 AM
I'm not sure if you are familiar with this new company called TurnItIn.com, but they spider sites and then compare the content so they can report copyright infringement to their clients. It's supposed to be a copyright enforcement company.
What it really is is a massive bandwidth sucker. When they spider a site they hammer it to death. They keep rescanning the same site, burning up hundreds of gigabytes of bandwidth. I checked my logs once and calculated that they must have spidered this small site 50 times to burn up that much bandwidth.
I'd like to ban them at the server level. How do I do it? Thanks.
Bryan
4PSA
Jul 1 2003, 04:39 AM
If the spiders come from turnitin.com hosts, you can block them at the server level with a firewall.
beggers
Jul 1 2003, 05:45 PM
Just out of curiosity, would it be possible to block them with a simple .htaccess file? Firewalls are a bit over my head at this early stage. Thanks.
Bryan
Gromit
Jul 1 2003, 06:04 PM
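An .htaccess file should work fine for just one site or page... it would look something like this:
# With "order allow,deny" the deny rules are evaluated last, so the deny below wins.
# (With "order deny,allow", the "allow from all" would override the deny.)
order allow,deny
allow from all
deny from cr1.turnitin.com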
There's a good howto here:
http://javascriptkit.com/howto/htaccess.shtml
Hope it helps...
Gromit
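A hostname rule like the one above only helps if the crawler's IP addresses resolve back to that hostname. As a hedged alternative, here is a minimal sketch that matches on the User-Agent header instead. The string "TurnitinBot" and the variable name block_crawler are assumptions for illustration, not something confirmed in this thread; check your access log for the string the spider actually sends before relying on it.

```apache
# Assumption: the spider identifies itself with a User-Agent containing
# "TurnitinBot". Verify this against your own access log first.
# block_crawler is a hypothetical variable name chosen for this example.
SetEnvIfNoCase User-Agent "TurnitinBot" block_crawler

# Allow everyone, then deny any request that set the variable above.
Order Allow,Deny
Allow from all
Deny from env=block_crawler
```

For this to work in an .htaccess file, the server has to permit these directives there (AllowOverride FileInfo for SetEnvIfNoCase and AllowOverride Limit for Order/Allow/Deny). A crawler that changes or fakes its User-Agent would get past this rule.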
4PSA
Jul 2 2003, 03:17 AM
Of course you can limit them with htaccess, but if you have more than one site on a server you have to set the rule for every domain.
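If you have access to the main server configuration, one way to avoid repeating the rule in every site's .htaccess is to put it in a single <Directory> section that covers all the document roots. This is only a sketch: the path /var/www is an assumption (use whatever parent directory your sites actually live under), and any more specific <Directory> section or .htaccess file that sets its own Order/Allow/Deny rules may override it.

```apache
# Goes in the main server config (e.g. httpd.conf), not in .htaccess.
# Assumption: every site's document root sits somewhere under /var/www.
<Directory "/var/www">
    # Deny rules are evaluated last with this order, so the deny wins.
    Order Allow,Deny
    Allow from all
    Deny from cr1.turnitin.com
</Directory>
```

Apache needs a restart or reload after the change. Denying by hostname also makes Apache do DNS lookups on incoming requests, so denying by IP address can be cheaper if you know the spider's addresses from your logs.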