How to prevent known bots (Googlebot, Bingbot, Facebook, Amazonbot, GPTBot, etc.) from crawling too heavily

Last updated on April 9th, 2025 at 03:47 am

We’re getting crawled heavily by known bots (Googlebot, Bingbot, Facebook, Amazonbot, GPTBot, etc.), which can slow down our server, especially on a high-traffic, high-volume site like ours.

Here are several strategies to keep our site fast and reduce the load caused by crawlers:

🛡️ 1. Rate-Limit or Throttle Bots via Nginx

Since we use Nginx (via CentminMod), we can rate-limit requests from selected bot user agents, counted per client IP, to slow the bots down without blocking them completely.

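# http context (e.g. a conf.d file included from http {}); map and limit_req_zone are not allowed in server {}
map $http_user_agent $limit_bots {
    default 0;
    "~*googlebot" 1;
    "~*bingbot" 1;
    "~*meta-externalagent" 1;
    "~*amazonbot" 1;
    "~*gptbot" 1;
}

# Requests with an empty key are not counted, so only flagged bots are limited (per IP)
map $limit_bots $bot_limit_key {
    0 "";
    1 $binary_remote_addr;
}

limit_req_zone $bot_limit_key zone=botlimit:10m rate=1r/s;

server {
    ...
    # limit_req is not valid inside "if"; applying it unconditionally is safe because non-bots have an empty key
    limit_req zone=botlimit burst=5 nodelay;
    # Reply 429 Too Many Requests instead of the default 503 when a bot exceeds the limit
    limit_req_status 429;
}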
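To see what `rate=1r/s burst=5 nodelay` means for a single bot IP, here is a small Python sketch of the "leaky bucket" accounting that the limit_req module documents. It is a simplified model rather than nginx's source code: the function name `simulate_limit_req` is our own, and the model ignores details such as zone memory limits and millisecond rounding.

```python
from typing import List, Optional

def simulate_limit_req(timestamps: List[float], rate: float, burst: int) -> List[bool]:
    """Simplified model of `limit_req ... burst=N nodelay` for ONE key (one bot IP).

    timestamps: request arrival times in seconds, ascending.
    rate:       allowed requests per second (1.0 for rate=1r/s).
    burst:      extra requests tolerated above the rate (5 for burst=5).
    Returns one bool per request: True = served, False = rejected by the limit.
    """
    excess = 0.0                  # requests "in the bucket" above the steady rate
    last: Optional[float] = None  # arrival time of the last accepted request
    results: List[bool] = []
    for t in timestamps:
        if last is None:
            # The first request for a key starts with an empty bucket
            last = t
            results.append(True)
            continue
        # The bucket drains at `rate` per second since the last accepted request; this one adds 1
        new_excess = max(excess - rate * (t - last) + 1.0, 0.0)
        if new_excess > burst:
            results.append(False)  # over the burst: rejected, bucket state left unchanged
            continue
        # With nodelay, an accepted request is served immediately instead of being queued
        excess, last = new_excess, t
        results.append(True)
    return results
```

In this model, a bot that fires 10 requests at once from one IP gets 6 through (1 plus the burst of 5), and the rest are rejected. A bot that slows down to one request per second is never rejected. Because the key is the client IP, a crawler that spreads its requests across many addresses is limited per address, not in total.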

🧠 2. Serve Bots a Cached Static Version (Bot Cache)

Instead of blocking bots, serve them static pre-cached files.

  • Create a daily cron job to generate static versions for common URLs.
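  • Use try_files in Nginx to serve the static .html to bots, falling back to normal handling when no snapshot exists:
# http {}: flag known crawlers (a map also avoids an unset $is_bot for humans)
map $http_user_agent $is_bot {
    default 0;
    "~*(googlebot|bingbot|amazonbot|gptbot|meta-externalagent)" 1;
}
# server {}: try_files is not allowed inside "if", so send bots to a named location
error_page 418 = @bot_cache;
location / {
    if ($is_bot) { return 418; }
    try_files $uri $uri/ /index.php?$args;
}
location @bot_cache { try_files /static$uri.html $uri $uri/ /index.php?$args; }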
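The daily cron job could look something like the Python sketch below. Everything in it is an assumption for illustration. The script, the `static_path_for` and `snapshot_pages` names, the origin URL and the document root are all hypothetical. The script fetches each page's HTML and saves it where the `try_files /static$uri.html` rule would look for it.

```python
import os
import urllib.request
from urllib.parse import unquote, urlsplit

def static_path_for(docroot: str, request_path: str) -> str:
    """Return the file Nginx checks for `/static$uri.html`.

    $uri excludes the query string and is percent-decoded, so this does the same:
    "/news/foo?page=2" -> "<docroot>/static/news/foo.html".
    """
    uri = unquote(urlsplit(request_path).path)
    return os.path.join(docroot, "static" + uri + ".html")

def snapshot_pages(origin: str, docroot: str, paths: list) -> None:
    """Fetch each path from `origin` (hypothetical, e.g. "https://example.com") and save its HTML."""
    for path in paths:
        req = urllib.request.Request(origin + path, headers={"User-Agent": "static-snapshot"})
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
        target = static_path_for(docroot, path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Write to a temporary file, then rename it, so Nginx never serves a half-written page
        tmp = target + ".tmp"
        with open(tmp, "wb") as fh:
            fh.write(body)
        os.replace(tmp, target)
```

A crontab line such as `0 3 * * * /usr/bin/python3 /path/to/snapshot.py` (hypothetical path) could refresh the snapshots daily. The script would need a small entry point that calls `snapshot_pages` with our list of common URLs. Note that the home page `/` maps to `static/.html`, because Nginx simply concatenates `/static`, `$uri` and `.html`.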

🚫 3. Block or Delay Less Useful Bots

Some bots, such as SEO crawlers, bring little value to us. We can block or delay them in Nginx (or in .htaccess if Apache is in the stack), or block their IP addresses with CSF/LFD.

  • Block in Nginx:
# inside server {}: these crawlers get 403 Forbidden ("return" is safe inside "if")
if ($http_user_agent ~* (semrushbot|mj12bot|ahrefsbot|dotbot)) {
    return 403;
}
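Before reloading Nginx, it helps to check which user agents each rule will catch. This Python sketch reuses the bot names from the snippets in sections 1 and 3. Nginx's `~*` is a case-insensitive regex match, which `re.IGNORECASE` mirrors for simple patterns like these. The `classify` function and the sample user-agent strings are our own, for illustration only.

```python
import re

# Bot names taken from the Nginx snippets in this post (sections 1 and 3)
LIMITED = re.compile(r"googlebot|bingbot|meta-externalagent|amazonbot|gptbot", re.IGNORECASE)
BLOCKED = re.compile(r"semrushbot|mj12bot|ahrefsbot|dotbot", re.IGNORECASE)

def classify(user_agent: str) -> str:
    """Return "block", "limit" or "allow" for a User-Agent header value."""
    # Like Nginx's ~* match, search() finds the pattern anywhere in the string
    if BLOCKED.search(user_agent):
        return "block"   # the 403 rule runs in the rewrite phase, before limit_req
    if LIMITED.search(user_agent):
        return "limit"
    return "allow"

for ua in [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
]:
    print(classify(ua), ua)
```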

⚙️ 4. Improve Caching & CPU Handling

Make sure we’re using:

  • ✅ NGINX FastCGI Cache or full-page caching
  • ✅ Redis Object Cache
  • ✅ OPcache enabled
  • ✅ W3 Total Cache (configured so its caches don't pile up massive numbers of files on disk)

🧮 5. Offload to a CDN

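Use Cloudflare with a “Cache Everything” rule so crawler requests can be answered from Cloudflare's edge instead of our server, and have Nginx send bots a long cache lifetime. Keep in mind that Cloudflare's default cache key is the URL, not the User-Agent. A page cached for a bot is therefore also served to human visitors, so this only suits pages that look the same to everyone.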

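# http {}: crawlers get a long cache lifetime; the empty default means add_header sends nothing to others
map $http_user_agent $bot_cache_control {
    default "";
    "~*(googlebot|bingbot|gptbot|meta-externalagent)" "public, max-age=86400, s-maxage=86400";
}
add_header Cache-Control $bot_cache_control;   # in server {} or location {}; not valid in a server-level "if"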
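To confirm that bot requests are actually answered from Cloudflare's edge, we can request a page with a crawler's User-Agent and inspect the response headers. Cloudflare reports its cache result in the `CF-Cache-Status` response header (values such as `HIT`, `MISS` or `DYNAMIC`). The helper names below are our own, and the URL in the usage note is a placeholder.

```python
import urllib.request

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

def fetch_cache_headers(url: str, user_agent: str = GOOGLEBOT_UA) -> dict:
    """GET `url` with a crawler User-Agent and return the caching-related response headers."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=15) as resp:
        return {
            "Cache-Control": resp.headers.get("Cache-Control"),
            "CF-Cache-Status": resp.headers.get("CF-Cache-Status"),
        }

def served_from_edge(headers: dict) -> bool:
    """True only for a fresh Cloudflare cache hit, i.e. our origin was not contacted."""
    return (headers.get("CF-Cache-Status") or "").upper() == "HIT"
```

For example, running `fetch_cache_headers("https://example.com/some-post/")` twice on a cacheable page would typically show `MISS` and then `HIT`. Because the rate limit in section 1 also matches this User-Agent, repeated checks from one IP may receive 429 responses, which `urlopen` raises as `HTTPError`.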

📈 6. Monitor & Analyze with Log Tools

Install GoAccess to see which bots hit us hardest, or run tail -f on the Nginx access log to watch them in real time. Then respond accordingly, for example by adding abusive IPs to csf.deny.
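Without GoAccess, a few lines of Python can give a quick per-bot tally from the access log. This sketch assumes Nginx's default `combined` log format, in which the client IP is the first field and the User-Agent is the last quoted field. A CentminMod server may use a different format, and the log path and the `BOT_PATTERN` list are assumptions to adjust.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# combined format: IP - user [time] "request" status bytes "referer" "user-agent"
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "([^"]*)"')
BOT_PATTERN = re.compile(
    r"googlebot|bingbot|meta-externalagent|amazonbot|gptbot|semrushbot|mj12bot|ahrefsbot|dotbot",
    re.IGNORECASE,
)

def tally_bots(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count requests per bot name and per client IP for lines whose UA matches BOT_PATTERN."""
    per_bot, per_ip = Counter(), Counter()
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that are not in the combined format
        ip, ua = m.groups()
        bot = BOT_PATTERN.search(ua)
        if bot:
            per_bot[bot.group(0).lower()] += 1
            per_ip[ip] += 1
    return per_bot, per_ip
```

Usage would be along the lines of opening `/var/log/nginx/access.log` (hypothetical path), passing the file object to `tally_bots`, and printing `most_common(10)` of each counter. The busiest IPs are candidates for csf.deny.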

⚒ 7. Optimize PHP & Database Queries

If bots still hit uncached pages:

  • Ensure fast PHP processing with php-fpm tuning.
  • Optimize DB queries, especially on dynamic hashtag pages.
  • Reduce plugin load on bot requests.
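For the php-fpm tuning, a common rule of thumb is to size `pm.max_children` as the memory we can give PHP divided by the average memory of one PHP-FPM worker. This is an estimate, not an official formula. The function below only does that arithmetic, and its name is our own. The real per-worker figure should come from measuring our own workers, for example their resident memory in `ps` or `top`.

```python
def estimate_max_children(total_ram_mb: int, reserved_mb: int, avg_worker_mb: float) -> int:
    """Rule-of-thumb pm.max_children: RAM left for PHP divided by the average worker size.

    total_ram_mb:  physical RAM on the server.
    reserved_mb:   RAM kept for everything else (database, Redis, Nginx, the OS).
    avg_worker_mb: measured average resident memory of one php-fpm worker.
    """
    if avg_worker_mb <= 0:
        raise ValueError("avg_worker_mb must be positive")
    available = max(total_ram_mb - reserved_mb, 0)
    # Round down so the workers fit in the memory set aside for them, but keep at least one
    return max(int(available // avg_worker_mb), 1)
```

With 8 GB of RAM, 3 GB reserved and workers averaging 80 MB, `estimate_max_children(8192, 3072, 80)` gives 64. Going well above the estimate risks swapping when a crawler burst keeps every worker busy.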

Discover more from Juzhax Technology
