Controlled crawler redirection, decoy endpoints, and synthetic dataset delivery at the Apache layer.
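This package steers selected crawlers, such as ClaudeBot, away from your real site content and serves them a controlled payload file, botpot.json, instead. The switch is an internal Apache rewrite rather than an HTTP redirect, so the crawler receives the payload at the URL it requested.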
botpot.json: the synthetic dataset payload served to targeted crawlers.
Decoy directories: bait paths such as /llm-index/ and /vector-db/ that attract crawler attention.
.htaccess honeypot block: matches crawler user agents and rewrites matching requests into the honeypot.
setup.sh: builds the decoy directory structure for upload or deployment.
From the directory containing setup.sh, make the script executable:
chmod +x setup.sh
Then run the script:
./setup.sh
This creates the decoy directories used as bait endpoints:
/ai-training-dataset/
/llm-index/
/vector-db/
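The contents of setup.sh are not reproduced here, so the following is only a minimal sketch of what a script of this kind might do, assuming it simply creates the three bait directories. The function name `build_decoys` and the `DECOY_ROOT` variable are hypothetical and not part of the package.

```shell
#!/bin/sh
# Hypothetical stand-in for setup.sh; the real script may do more than this.
# build_decoys DIR creates the three bait directories under DIR.
build_decoys() {
    target=${1:-.}
    for name in ai-training-dataset llm-index vector-db; do
        mkdir -p "$target/$name" || return 1
    done
}

# DECOY_ROOT is an assumed variable; it defaults to ./decoys.
build_decoys "${DECOY_ROOT:-./decoys}"
```

Building into a separate directory first keeps the generated tree easy to inspect before it is uploaded to the web root.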
Upload the generated decoy directories and your botpot.json file to your public web root.
Verify that the payload file is reachable:
http://www.example.com/botpot.json
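If you prefer to check reachability from a terminal rather than a browser, a small helper along these lines may be enough. It is a sketch that assumes curl is installed; `payload_ok` is a hypothetical name, and the URL is the placeholder used above.

```shell
#!/bin/sh
# payload_ok URL succeeds when the URL answers with HTTP 200.
payload_ok() {
    # -s: silent, -o /dev/null: discard the body, -w: print only the status code
    status=$(curl -s -o /dev/null -w '%{http_code}' "$1") || return 2
    [ "$status" = "200" ]
}

# Example call (needs network access):
#   payload_ok http://www.example.com/botpot.json && echo "payload reachable"
```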
Paste the honeypot block at the very beginning of your main .htaccess file, above any WordPress or other CMS rules, so that it is evaluated before the CMS rewrite rules.
# =========================
# BOTPOT REDIRECT (Claude / Anthropic)
# =========================
<IfModule mod_rewrite.c>
RewriteEngine On
# Prevent rewrite loop
RewriteCond %{REQUEST_URI} !^/botpot\.json$ [NC]
# Target bot user agents (case-insensitive substring match; "claude" also covers "claudebot")
RewriteCond %{HTTP_USER_AGENT} claudebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anthropic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} claude [NC]
# Internally serve botpot payload
RewriteRule ^ /botpot.json [L]
</IfModule>
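<IfModule mod_headers.c>
# Marker header. As written, it is added to every successful response, not only to rewritten bot requests.
Header set X-Botpot "HT-7F3A9"
</IfModule>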
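Before deploying, it can help to see which user-agent strings the three conditions would catch. The sketch below approximates the match offline with grep; it is not how Apache evaluates the rules, only a case-insensitive substring test on the same three words. The function name `matches_botpot` and the sample user-agent strings are illustrative assumptions.

```shell
#!/bin/sh
# matches_botpot UA succeeds when UA contains any of the targeted words,
# ignoring case, mirroring the three RewriteCond patterns.
matches_botpot() {
    printf '%s\n' "$1" | grep -Eiq 'claudebot|anthropic|claude'
}

# Sample strings are made up for illustration.
for ua in "ClaudeBot/1.0" "anthropic-ai" "Mozilla/5.0 (X11; Linux x86_64) Firefox/128.0"; do
    if matches_botpot "$ua"; then
        echo "$ua -> botpot.json"
    else
        echo "$ua -> real site"
    fi
done
```

Because the match is a plain substring test, any user agent that merely contains "claude" is also caught, which may or may not be what you want.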
Use the following command to simulate ClaudeBot:
curl -I -A "ClaudeBot/1.0" http://www.example.com/
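Expected response characteristics: a successful status, the X-Botpot: HT-7F3A9 header (if mod_headers is enabled), and a JSON content type instead of your site's HTML.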
To verify the actual payload contents:
curl -A "ClaudeBot/1.0" http://www.example.com/
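A stricter check is to compare the body served to a ClaudeBot-like user agent with the payload fetched directly. The helper below is a sketch under the assumption that curl is available; `bot_gets_payload` is a hypothetical name.

```shell
#!/bin/sh
# bot_gets_payload SITE_URL PAYLOAD_URL succeeds when the body served to a
# ClaudeBot-like user agent at SITE_URL equals the body at PAYLOAD_URL.
bot_gets_payload() {
    bot_body=$(curl -s -A "ClaudeBot/1.0" "$1") || return 2
    payload_body=$(curl -s "$2") || return 2
    [ "$bot_body" = "$payload_body" ]
}

# Example call (needs network access):
#   bot_gets_payload http://www.example.com/ http://www.example.com/botpot.json \
#       && echo "rewrite is active"
```

Comparing bodies avoids relying on response headers alone, which can be present on responses that were not rewritten.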
To help crawlers discover the decoy endpoints naturally, you may place the following hidden links in a global template such as a footer:
<div style="position:absolute; left:-9999px;">
  <a href="/llm-index/">.</a>
  <a href="/ai-training-dataset/">.</a>
  <a href="/vector-db/">.</a>
</div>
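Because the rewrite happens in Apache before the request reaches WordPress, plugins such as Wordfence never see it. Monitor the honeypot through the Apache access log instead (the path below is the Debian/Ubuntu default; adjust it for your host).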
tail -f /var/log/apache2/access.log
Look for user agents such as:
ClaudeBot/1.0
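For a summary rather than a live tail, the log can be reduced to a per-path hit count. This sketch assumes the access log uses Apache's combined format, where the request path is the seventh whitespace-separated field; `botpot_hits` is a hypothetical name.

```shell
#!/bin/sh
# botpot_hits reads combined-format log lines on stdin and prints, for lines
# whose text contains "claudebot" (any case), a count of hits per request path.
botpot_hits() {
    grep -i 'claudebot' | awk '{print $7}' | sort | uniq -c | sort -rn
}

# Example call:
#   botpot_hits < /var/log/apache2/access.log
```

A high count on the bait paths suggests the hidden links are being followed.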
| Visitor Type | Expected Result |
|---|---|
| Normal browser | Receives your real site |
| ClaudeBot | Receives botpot.json |
| Direct payload request | Returns the JSON payload normally |