How to Prevent Heavy Crawling by Known Bots (Googlebot, Bingbot, Facebook, Amazonbot, GPTBot, etc.)

We’re getting crawled heavily by known bots (Googlebot, Bingbot, Facebook, Amazonbot, GPTBot, etc.), which can slow down our server, especially on high-traffic or high-volume sites like ours.

Here are several effective strategies to make our site faster and reduce the load caused by crawlers:

🛡️ 1. Rate-Limit or Throttle Bots via Nginx

Since we use Nginx (via CentminMod), we can rate-limit requests by user agent or IP to slow down bots without blocking them completely.

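# In the http context (e.g. a file under conf.d)
# Bots get their client IP as the rate-limit key; an empty key means "not limited".
map $http_user_agent $limit_bots {
    default "";
    "~*googlebot" $binary_remote_addr;
    "~*bingbot" $binary_remote_addr;
    "~*meta-externalagent" $binary_remote_addr;
    "~*amazonbot" $binary_remote_addr;
    "~*gptbot" $binary_remote_addr;
}

limit_req_zone $limit_bots zone=botlimit:10m rate=1r/s;

server {
    ...
    # limit_req is not allowed inside "if"; requests with an empty key are not counted
    limit_req zone=botlimit burst=5 nodelay;
}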
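
To get a feel for what rate=1r/s with burst=5 nodelay means in practice, here is a simplified Python model of the per-key accounting. It is a sketch for intuition only, not Nginx's actual code: the function name simulate_limit_req and its parameters are made up for this example, and real Nginx does this with integer arithmetic in milliseconds.

```python
def simulate_limit_req(timestamps, rate=1.0, burst=5):
    """Return True (served) or False (rejected) for each request time in seconds.

    Simplified model of limit_req with nodelay for a single key (one bot IP).
    """
    excess = 0.0   # how far this key is currently above the steady rate
    last = None    # time of the last accepted request
    decisions = []
    for now in timestamps:
        if last is None:
            new_excess = 0.0  # the first request for a key is always served
        else:
            # The bucket drains at `rate` per second, then this request adds 1.
            new_excess = max(excess - rate * (now - last) + 1.0, 0.0)
        if new_excess > burst:
            decisions.append(False)  # over the burst allowance: rejected
            continue                 # a rejected request does not change the state
        decisions.append(True)
        excess, last = new_excess, now
    return decisions


if __name__ == "__main__":
    # Ten requests arriving at the same instant from one IP.
    print(simulate_limit_req([0.0] * 10))
```

With ten simultaneous requests from one IP, this model serves the first six (one plus the burst of five) and rejects the remaining four; Nginx answers rejected requests with 503 by default. A bot that keeps to one request per second is never rejected.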

🧠 2. Serve Bots a Cached Static Version (Bot Cache)

Instead of blocking bots, serve them static pre-cached files.
– Create a daily cron job to generate static versions for common URLs.
– Use try_files in Nginx to serve static .html to bots:

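# http context: bots get the /static prefix; other clients get a prefix that matches no file
# ("/no-static" is a placeholder and must not exist under the document root)
map $http_user_agent $bot_prefix {
    default "/no-static";
    "~*(googlebot|bingbot|amazonbot|gptbot|meta-externalagent)" "/static";
}

# try_files is not allowed inside "if", so the bot check lives in the variable
location / {
    try_files $bot_prefix$uri.html $uri $uri/ /index.php?$args;
}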
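
The daily cron job mentioned above could look roughly like the following Python sketch. It assumes the snapshots live in a static directory under the document root so that the path matches /static$uri.html; the function names, the base URL, and the list of URIs are all hypothetical and would need to be adapted to the real site.

```python
import os
import urllib.request


def snapshot_path(static_root, uri):
    """Map a request URI to the file that try_files /static$uri.html looks for."""
    # "/about" -> <static_root>/about.html
    return os.path.join(static_root, uri.lstrip("/") + ".html")


def write_snapshot(static_root, uri, html):
    """Write html via a temp file so Nginx never serves a half-written page."""
    target = snapshot_path(static_root, uri)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(html)
    os.replace(tmp, target)  # atomic rename on the same filesystem
    return target


def refresh(base_url, static_root, uris):
    """Fetch each URI from the live site and store it as a static snapshot."""
    for uri in uris:
        with urllib.request.urlopen(base_url + uri, timeout=30) as resp:
            html = resp.read().decode("utf-8", errors="replace")
        write_snapshot(static_root, uri, html)
```

A cron entry would then call something like refresh("https://example.com", "/var/www/html/static", ["/about", "/blog/hello"]) once a day, with those values replaced by the real ones. Note that a URI ending in a slash, such as /blog/, maps to /static/blog/.html, because that is the path the try_files pattern checks.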

🚫 3. Block or Delay Less Useful Bots

Some bots bring little value. We can block or delay them with Nginx or CSF/LFD (.htaccess rules only apply on Apache, not on Nginx).
– Block in Nginx:

if ($http_user_agent ~* (semrushbot|mj12bot|ahrefsbot|dotbot)) {
    return 403;
}

⚙️ 4. Improve Caching & CPU Handling

Make sure we’re using:
– ✅ NGINX FastCGI Cache or full-page caching
– ✅ Redis Object Cache
– ✅ OPcache enabled
– ✅ W3 Total Cache (configured smartly to avoid massive file counts)

🧮 5. Offload to a CDN

Use Cloudflare with “Cache Everything”, and send bots a Cache-Control header so their responses can be cached for a day (86400 seconds):

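# http context: only the listed crawlers get a Cache-Control value
map $http_user_agent $bot_cache_control {
    default "";
    "~*(googlebot|bingbot|gptbot|meta-externalagent)" "public, max-age=86400, s-maxage=86400";
}
add_header Cache-Control $bot_cache_control;  # server/location context; skipped when empty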

📈 6. Monitor & Analyze with Log Tools

Install GoAccess, or run tail -f on the Nginx access log, to watch bots in real time and respond accordingly, for example by adding abusive IPs to csf.deny.
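
For a quick count of which bots are hitting the site hardest, a small Python script can tally requests per bot from the access log. This is a minimal sketch that assumes the standard "combined" log format, where the User-Agent is the last double-quoted field on each line; a custom log format would need a different pattern. The names BOT_MARKERS and count_bot_hits are made up for this example.

```python
import re
from collections import Counter

# Lowercase substrings to look for in the User-Agent (the bots named in this post).
BOT_MARKERS = ["googlebot", "bingbot", "meta-externalagent", "amazonbot", "gptbot",
               "semrushbot", "mj12bot", "ahrefsbot", "dotbot"]

# Assumption: the User-Agent is the last double-quoted field on the line.
UA_RE = re.compile(r'"([^"]*)"\s*$')


def count_bot_hits(lines):
    """Count requests per bot marker from an iterable of access-log lines."""
    hits = Counter()
    for line in lines:
        match = UA_RE.search(line)
        if not match:
            continue  # line does not end with a quoted field
        agent = match.group(1).lower()
        for marker in BOT_MARKERS:
            if marker in agent:
                hits[marker] += 1
                break  # count each request once
    return hits


if __name__ == "__main__":
    import sys
    # Reads log lines from standard input and prints the busiest bots first.
    for bot, n in count_bot_hits(sys.stdin).most_common():
        print(f"{n:8d}  {bot}")
```

Feeding it the access log on standard input prints one line per bot, busiest first, which makes it easier to decide which bots to throttle and which to block.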

⚒ 7. Optimize PHP & Database Queries

If bots still hit uncached pages:
– Ensure fast PHP processing with php-fpm tuning.
– Optimize DB queries — especially on dynamic hashtag pages.
– Reduce plugin load on bot requests.
