facebookexternalhit/1.1 is the user agent Facebook uses to crawl and index web pages for its services, such as Facebook, Instagram, and WhatsApp. The crawler retrieves content, images, and other metadata to improve Facebook's search functionality and provide users with relevant results. The .htaccess rules below block it entirely with Apache's mod_rewrite:
# BLOCK Facebook Crawler
# https://endurtech.com/block-facebook-crawler-facebookexternalhit/
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit/1\.1 [NC]
RewriteRule ^ - [F,L]
# BLOCK Facebook Crawler END
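The pattern above matches only version 1.1 of the crawler. If your access logs show requests with other version strings, dropping the version number from the pattern catches them all. This variant is a sketch, not part of the original gist:

```apache
# BLOCK any facebookexternalhit version (note: no version suffix in the pattern)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^facebookexternalhit [NC]
RewriteRule ^ - [F,L]
```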
@bostiq commented May 29, 2025:

Hi, I was wondering, can it be limited rather than entirely blocked?

@endurtech (author) replied:

To the best of my knowledge, there isn't a reliable way to limit facebookexternalhit. In most cases, we're fortunate if it even respects the robots.txt file. However, I've read that some developers have successfully implemented throttle scripts to reduce how frequently Facebook's bot accesses their sites. These scripts monitor the timing between requests and deny access if they occur too rapidly. Read more here: https://stackoverflow.com/questions/11521798/excessive-traffic-from-facebookexternalhit-bot
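The throttle idea from that thread can be sketched as a small sliding-window check the request handler calls before serving a page; when it returns False, respond with a 429 or 403. This is a minimal single-process sketch, and the function name, window length, and request limit are illustrative assumptions, not from the gist:

```python
import time

# Sliding-window throttle sketch for facebookexternalhit. The thresholds
# below are illustrative assumptions; tune them against your own traffic.
WINDOW_SECONDS = 60   # length of the sliding window
MAX_REQUESTS = 10     # requests allowed per window before denying

_hits = {}  # user agent -> list of recent request timestamps

def allow_request(user_agent, now=None):
    """Return False once the Facebook crawler exceeds MAX_REQUESTS per window."""
    if not user_agent.startswith("facebookexternalhit"):
        return True  # throttle only the Facebook crawler
    if now is None:
        now = time.time()
    # Keep only timestamps still inside the window, then record this hit.
    recent = [t for t in _hits.get(user_agent, []) if now - t < WINDOW_SECONDS]
    recent.append(now)
    _hits[user_agent] = recent
    return len(recent) <= MAX_REQUESTS
```

A real deployment would need shared state (e.g. a cache or database) so the counts survive across worker processes.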
