Monday, 13 January 2014

How to limit bad bandwidth on your site

I think I'm a star among Chinese bots. Not on this blog, but on the site I pay hosting for, I get thousands and thousands of hits, which eat up my bandwidth and make me contact the hosting sysadmins far too often. So here is what I have found so far about protecting my precious bandwidth.
1. No hotlinks
I had never thought of this, but it turns out hotlinking can be a significant drain on your bandwidth and CPU time. Somebody links to your images and uses your hosting to display them on their own pages, so they steal your images and cost you money at the same time. So how do you stop it? I found the answer here and here:
You need to add the following to your .htaccess:
RewriteEngine on
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?your-site\.com [NC]
RewriteRule \.(jpg|jpeg|png|gif)$ http://i.imgur.com/g7ptdBB.png [NC,R,L]
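Replace your-site.com with your own domain.
This code redirects every image request (jpg, jpeg, png, gif) whose referer is neither empty nor your own site to the image linked in row number 4, the RewriteRule line. Requests with no referer at all, such as someone opening the image URL directly, are still served normally. This way, other sites can't hotlink your images.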
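To see which requests the four lines above actually catch, here is a small Python model of the same logic. It is only a sketch: it follows mod_rewrite's order (the RewriteRule pattern is tested first, then the RewriteCond lines), uses your-site.com as a placeholder domain, and the function name is_hotlink is made up for illustration.

```python
import re

# Patterns mirroring the RewriteRule and the two RewriteCond lines.
# your-site.com is a placeholder; dots are escaped so they match literally.
IMAGE = re.compile(r"\.(jpg|jpeg|png|gif)$", re.IGNORECASE)
EMPTY_REFERER = re.compile(r"^$")
OWN_SITE = re.compile(r"^http(s)?://(www\.)?your-site\.com", re.IGNORECASE)


def is_hotlink(path: str, referer: str) -> bool:
    """Return True if the rules would redirect this request to the replacement image."""
    if not IMAGE.search(path):
        return False  # RewriteRule pattern does not match: not an image
    if EMPTY_REFERER.match(referer):
        return False  # first condition fails: no referer (direct visit)
    if OWN_SITE.match(referer):
        return False  # second condition fails: the request comes from your own pages
    return True


print(is_hotlink("/img/cat.jpg", ""))                                # False
print(is_hotlink("/img/cat.jpg", "https://www.your-site.com/post"))  # False
print(is_hotlink("/img/cat.JPG", "http://forum.example.org/t/1"))    # True
print(is_hotlink("/index.html", "http://forum.example.org/t/1"))     # False
```

Only the third request is treated as a hotlink: an image, requested from a page on another site.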

2. Preventing bad bots from crawling your site
Yes, there are bad bots. In my case, an unidentified bot eats 300 MB of traffic per day from my site. A very good tutorial on what to do about it is the post Protect your site from spam bots, which I summarize below so that I don't forget what I did.
The steps are as follows:
2.1. Add the contents of this file to your .htaccess. It is a list of known bad bots.
2.2. Create 403.php in your top-level/home directory and paste this into it. This script logs forbidden requests.
2.3. Create forbidden.html in your top-level/home directory and use the contents of this file.
2.4. Create a trap directory and give it a name that would look interesting to humans; here we'll call it /your_trap_directory.
2.5. Create robots.txt and put this in it:
User-agent: *
Disallow: /your_trap_directory
2.6. Place this index.php in your_trap_directory.
2.7. Make sure both index.php and forbidden.html have 644 permissions.
2.8. Put some links to your_trap_directory on your site to lure the bots in. You can use something like:
<a href="http://your_site/your_trap_directory/" style="display: none;">check this hot offer</a>
in your site's HTML, or add it as an HTML plugin on your blog.
VERY IMPORTANT! DO NOT VISIT THIS DIRECTORY yourself, not even through your own site!
If you do, you will be added to the ban list and will have to go to cPanel, open .htaccess and remove your IP from it. These links are meant only for bots, not for humans.
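The trap's index.php is only linked from the tutorial, so its code isn't reproduced here. Judging from the warning above, its effect is to add the visitor's IP to a ban list in .htaccess. The Python sketch below models that idea on the file's text, assuming Apache 2.2-style "Deny from <ip>" lines (the real script's format may differ); ban_ip and unban_ip are hypothetical helpers, and the second one is the manual cPanel fix expressed in code.

```python
def ban_ip(htaccess: str, ip: str) -> str:
    """Append a 'Deny from <ip>' line (assumed format) unless the IP is already banned."""
    rule = f"Deny from {ip}"
    lines = htaccess.splitlines()
    if rule in (line.strip() for line in lines):
        return htaccess  # already banned: leave the file unchanged
    lines.append(rule)
    return "\n".join(lines) + "\n"


def unban_ip(htaccess: str, ip: str) -> str:
    """Remove the ban line for one IP, e.g. after you visited the trap by accident."""
    rule = f"Deny from {ip}"
    kept = [line for line in htaccess.splitlines() if line.strip() != rule]
    return "\n".join(kept) + "\n"


original = "Options -Indexes\n"
banned = ban_ip(original, "203.0.113.7")       # 203.0.113.7 is a documentation IP
assert banned == "Options -Indexes\nDeny from 203.0.113.7\n"
assert ban_ip(banned, "203.0.113.7") == banned  # banning twice changes nothing
assert unban_ip(banned, "203.0.113.7") == original
```

Making the ban idempotent keeps a bot that hits the trap repeatedly from growing .htaccess with duplicate lines.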
I recommend visiting the source post, Protect your site from spam bots, for a way to check whether your setup is working.
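Separately from the source post's own check, you can at least confirm that your robots.txt keeps well-behaved crawlers out of the trap, so that only bots ignoring it get banned. This sketch uses Python's standard urllib.robotparser; your_site and your_trap_directory are the placeholders used above, and Googlebot is just an example user agent. It does not test the .htaccess ban itself.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from step 2.5 (placeholder directory name).
ROBOTS_TXT = """\
User-agent: *
Disallow: /your_trap_directory
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A crawler that honours robots.txt stays out of the trap...
print(parser.can_fetch("Googlebot", "http://your_site/your_trap_directory/"))  # False
# ...but may still crawl the rest of the site.
print(parser.can_fetch("Googlebot", "http://your_site/index.html"))            # True
```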
3. Check your AWStats for suspicious IPs, then ban them with the IP Deny Manager or directly in your .htaccess. I know some people ban entire countries, but I find that far too restrictive. We'll see whether this solves my problem.
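AWStats builds its reports from the server's access log, so if your host lets you download the raw log you can also rank IPs by bandwidth yourself. A minimal sketch, assuming the log uses Apache's "combined" format; the sample lines, the SomeBot user agent and the IPs (documentation ranges) are invented for illustration, and bytes_per_ip is a hypothetical helper.

```python
import re
from collections import Counter

# Apache "combined" format: IP ident user [time] "request" status bytes "referer" "agent"
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (\d+|-)')


def bytes_per_ip(lines):
    """Sum response bytes per client IP; '-' (no body sent) counts as 0."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines in any other format
        ip, size = m.group(1), m.group(2)
        totals[ip] += 0 if size == "-" else int(size)
    return totals


sample = [
    '198.51.100.4 - - [13/Jan/2014:10:00:01 +0200] "GET /photo.jpg HTTP/1.1" 200 512000 "-" "SomeBot/1.0"',
    '198.51.100.4 - - [13/Jan/2014:10:00:02 +0200] "GET /photo2.jpg HTTP/1.1" 200 488000 "-" "SomeBot/1.0"',
    '192.0.2.10 - - [13/Jan/2014:10:05:00 +0200] "GET / HTTP/1.1" 304 - "-" "Mozilla/5.0"',
]
for ip, total in bytes_per_ip(sample).most_common(5):
    print(ip, total)
# 198.51.100.4 1000000
# 192.0.2.10 0
```

The IPs at the top of this list are the candidates to look up and, if they really are bots, to ban.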
4. OK, after some additional research (source1 and source2), I decided to use:
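# Turn off automatic directory listings
Options -Indexes
RewriteEngine on
# Forbid comment.php unless the referer is a .php page on mysite.com.
# HTTP_REFERER (sic) is Apache's name for the Referer header; "!^..." is a negated regex.
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/index\.php [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?mysite\.com/.*\.php [NC]
RewriteRule ^comment\.php$ - [F]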
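The intent of these lines is to refuse requests for comment.php unless they come from a .php page on mysite.com, which stops spam bots that post straight to the comment script. Here is a Python model of that intended logic, as a sketch only: mysite.com is a placeholder, response_status is a hypothetical helper, and 200 simply stands for "the request reaches the script".

```python
import re

# Referers allowed to reach comment.php (mysite.com is a placeholder domain).
ALLOWED_REFERERS = [
    re.compile(r"^https?://(www\.)?mysite\.com/index\.php", re.IGNORECASE),
    re.compile(r"^https?://(www\.)?mysite\.com/.*\.php", re.IGNORECASE),
]


def response_status(path: str, referer: str) -> int:
    """Return 403 for comment.php requests without an allowed referer, else 200."""
    if path.lstrip("/") != "comment.php":
        return 200  # the rule only applies to comment.php
    if any(p.match(referer) for p in ALLOWED_REFERERS):
        return 200  # submitted from one of your own pages
    return 403      # direct request from a bot, or a foreign/missing referer


print(response_status("/comment.php", "http://www.mysite.com/post.php?id=7"))  # 200
print(response_status("/comment.php", ""))                                     # 403
print(response_status("/comment.php", "http://spam.example.net/"))             # 403
print(response_status("/about.html", ""))                                      # 200
```

In this model, unlike the hotlink rule, an empty referer is refused, so visitors whose browser or proxy strips the Referer header would not be able to comment.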

That's what I have done so far. I'll add more when I change something else, but for the moment the bad traffic and CPU use seem to be under control.