No Guardrails, Full Auto with Claude’s Web Tools
I'm on the edge of my seat waiting for a non-beta version of Claude Security. Until then, non-enterprise users need to build their own guardrails. Claude currently has two built-in web tools, Web Fetch and Web Search, which are available in every Claude Code installation. With a fresh Claude install, the default permission mode asks for approval before each use. Most people approve each request or switch defaultMode to "auto", which lets Claude search the web without asking. Full auto is a reasonable choice, but consider what it means when an AI searches the internet: not every site has good intentions toward an AI crawler, and certain "stay off my lawn / website" web admins are punching back with nonsense data that leaves AI agents struggling through AI tarpits.
Searching the web without guardrails can expose Claude to a prompt-injection attack. It can be as simple as a page saying "ignore your previous instructions", and when it works, Claude follows along: no suspicion, no pushback, just a very capable assistant doing exactly what a random website suggested.
Curious? Ask Claude: "Are there built-in protections when using web fetch or web search tools?"
You'll get an answer, and the official docs back it up: no, there aren't any prompt-injection guardrails. That's exactly why Safe Web Research guardrails are necessary.
Enabling the web fetch tool in environments where Claude processes untrusted input alongside sensitive data poses data exfiltration risks. Only use this tool in trusted environments or when handling non-sensitive data.
To minimize exfiltration risks, Claude is not allowed to dynamically construct URLs. Claude can only fetch URLs that have been explicitly provided by the user or that come from previous web search or web fetch results. However, there is still residual risk that should be carefully considered when using this tool.
If data exfiltration is a concern, consider:
- Disabling the web fetch tool entirely
- Using the max_uses parameter to limit the number of requests
- Using the allowed_domains parameter to restrict to known safe domains
Source: Claude Docs — Web fetch tool
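For readers calling the API directly rather than through Claude Code, the two parameters in the quote are set on the tool definition. The sketch below is a minimal illustration, not a complete request: the version suffix in `type` is an assumption that should be checked against the current docs, and the domains and the `requestBody` name are made up for the example.

```typescript
// Shape of the web fetch tool entry as described in the quoted docs.
// The version suffix in `type` is an assumption; verify it before use.
interface WebFetchTool {
  type: string;
  name: "web_fetch";
  max_uses?: number;
  allowed_domains?: string[];
}

const webFetchTool: WebFetchTool = {
  type: "web_fetch_20250910",
  name: "web_fetch",
  max_uses: 5, // cap the number of fetches in one request
  allowed_domains: ["rfc-editor.org", "example.com"], // example allowlist only
};

// Hypothetical request fragment: only the `tools` entry matters here.
const requestBody = {
  tools: [webFetchTool],
};

console.log(JSON.stringify(requestBody, null, 2));
```

Neither parameter inspects page content; they only shrink how far a successful injection can reach.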
The Safe Web Research skill is a contingency layer of defense:
- Custom TypeScript hooks that catch and clean every web response before it reaches Claude.
- A clear judgment skill that helps Claude spot trouble and respond wisely.
No paranoia. Just everyday common sense, like making sure the teenager you sent out for milk doesn't stroll in at sunset with warm milk, Takis, and a basketball they "found."
Keep Claude helpful and fast while making sure it stays safe. Simple, reliable protection for real-world use. Ready to set it up? Grab a coffee and we will walk you through it.
TL;DR
- Two hooks (PreToolUse + PostToolUse) intercept every web call and wrap content in <untrusted_source> tags with computed risk signals.
- A skill file carries the judgment rules (the playbook): when to abort a source, how to classify signals, what to report.
- Everything runs locally via Bun. Five steps, under five minutes.
- Fail-open by design — a hook crash never blocks Claude.
What's in the box
| File | What it does |
|---|---|
| hooks/web-fetch-pre.ts | PreToolUse hook: checks robots.txt, rewrites Bash curl/wget through the sanitizer, fires the session-counter reminder on the 2nd+ web call |
| hooks/web-fetch-post.ts | PostToolUse hook: strips dangerous DOM, runs cloaking detection, computes risk signals, wraps the response in <untrusted_source> |
| hooks/lib/ | Shared library: sanitiser, signal classifier, refetch/cloaking detector, SQLite state, Bash command matcher |
| skills/safe-web-research/SKILL.md | The judgment layer: abort rules, corroboration discipline, reporting format |
| skills/safe-web-research/risk-tiers.json | Tunable thresholds and signal definitions (edit this, not the TypeScript) |
| bin/claude-sanitize | CLI binary for status checks, stdin sanitization, and historical replay analysis |
When to reach for this
You're doing security research. Claude fetches a threat intelligence page. The page helpfully includes a hidden <div> telling Claude it's now in "unrestricted mode", a textbook case of input manipulation. In enforce mode, the post-hook strips hidden elements before Claude reads a byte, then wraps the sanitized output with a hidden_content risk signal attached. Claude sees the signal, applies the abort rules, and treats the source with the skepticism it deserves.
You're fact-checking a claim from a source you don't recognize. The pre-hook checks robots.txt before fetching. The post-hook runs a parallel refetch with a different user-agent and compares the two responses — anomaly detection for a classic cloaking tell. If they diverge significantly, Claude gets a cloaking_suspected Critical signal and aborts the source entirely.
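The cloaking check can be pictured as a similarity comparison between the two responses. The sketch below is a stand-in, not the package's refetch.ts: it swaps in Jaccard distance over three-word shingles (the article only tells us that a simhash is logged), and `shingles`, `divergence`, `cloakingSuspected`, and the 0.5 threshold are all hypothetical.

```typescript
// Split text into a set of lowercase word shingles (runs of `size` words).
function shingles(text: string, size: number = 3): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const out = new Set<string>();
  for (let i = 0; i + size <= words.length; i++) {
    out.add(words.slice(i, i + size).join(" "));
  }
  return out;
}

// Jaccard distance: 0 means identical shingle sets, 1 means nothing in common.
function divergence(a: string, b: string): number {
  const sa = shingles(a);
  const sb = shingles(b);
  if (sa.size === 0 && sb.size === 0) return 0;
  let shared = 0;
  sa.forEach((s) => {
    if (sb.has(s)) shared++;
  });
  const union = sa.size + sb.size - shared;
  return 1 - shared / union;
}

// Hypothetical threshold; a real value would be tuned against logged fetches.
const CLOAKING_THRESHOLD = 0.5;

function cloakingSuspected(primary: string, refetch: string): boolean {
  return divergence(primary, refetch) > CLOAKING_THRESHOLD;
}
```

The idea is that a page serving one body to a browser user-agent and another to an AI crawler scores close to 1, while ordinary noise such as a rotating timestamp barely moves the score.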
Claude is running autonomously through a long session with many web calls, and overreliance on its judgment builds silently as the context grows. After the second web call in a session, the pre-hook injects a reminder that loads the full skill rules fresh. The rules do not rely on Claude remembering them from the first fetch — they get restated on every subsequent call. Prompt drift does not accumulate.
You share your Claude setup with a team and want a consistent hygiene baseline. One copy operation, one bun install, a JSON block in settings.json. Every team member gets identical behavior. The hooks fail-open, so a misconfigured Bun path never blocks Claude entirely — it just loses the wrapper, which the skill itself treats as an abort signal.
You want protection against zero-width character injection. In enforce mode, the sanitizer strips <script>, <style>, <iframe>, all event handler attributes (onclick=, onload=, etc.), HTML comments, and zero-width Unicode characters from every web response. That is signature matching at the source, before Claude reads a single byte. In the default log mode the signals are still computed and reported, but the original bytes pass through.
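To make the stripping step concrete, here is a rough regex-based sketch of that kind of sanitizer. It is an illustration under stated assumptions, not the package's sanitise.ts: `sanitizeHtml` and its return shape are hypothetical, and a production sanitizer would more likely use a real HTML parser than regular expressions.

```typescript
// Zero-width and invisible characters commonly used to hide text.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;
// Inline event handler attributes such as onclick="..." or onload=foo.
const HANDLER = /\s+on[a-z]+\s*=\s*("[^"]*"|'[^']*'|[^\s>]+)/gi;

function sanitizeHtml(html: string): { clean: string; zeroWidthCount: number } {
  // Count before stripping so the number can feed a risk signal.
  const zeroWidthCount = (html.match(ZERO_WIDTH) ?? []).length;
  const clean = html
    .replace(/<!--[\s\S]*?-->/g, "") // HTML comments
    .replace(/<(script|style|iframe)\b[\s\S]*?<\/\1\s*>/gi, "") // paired blocks with their content
    .replace(/<\/?(script|style|iframe)\b[^>]*>/gi, "") // stray opening or closing tags
    .replace(/<[^>]+>/g, (tag) => tag.replace(HANDLER, "")) // handlers, inside tags only
    .replace(ZERO_WIDTH, "");
  return { clean, zeroWidthCount };
}

const sample = '<p onclick="evil()">Hi\u200B</p><script>alert(1)</script><!-- secret -->';
console.log(sanitizeHtml(sample));
```

Returning the count alongside the cleaned text is a deliberate choice in this sketch: the same pass can both strip content and supply the number a threshold check needs.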
The research backs it up
No browser agent is immune to prompt injection, and we share these findings to demonstrate progress, not to claim the problem is solved.
— Anthropic, Mitigating the risk of prompt injections in browser use
Prompt injection is not a fringe concern someone invented to sell security tools. The Open Web Application Security Project listed prompt injection as the number one LLM security risk in its 2025 report — ahead of sensitive data exposure, insecure tool use, and everything else on the list. Anthropic agrees. They published dedicated research on mitigating browser-based prompt injection and tested Claude Opus 4.5 against live attacks to measure how often the model-layer defenses actually hold.
The result: a 1% attack success rate. That is a genuine improvement, and Anthropic is right to highlight it. But one-in-a-hundred still happens, and it happens more often the more pages Claude browses. An agentic session that touches ten pages has ten chances. A session that runs through a hundred has a hundred. The math compounds quietly in the background while your assistant cheerfully continues its work.
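The compounding can be put into numbers with a back-of-the-envelope calculation. This is a worst-case sketch, not a claim about real browsing: it assumes every page carries an attack and that attempts are independent, each succeeding with probability 0.01. The function name `atLeastOneSuccess` is made up for the example.

```typescript
// Probability that at least one of n independent attempts succeeds,
// given a per-attempt success rate p: 1 - (1 - p)^n.
function atLeastOneSuccess(p: number, n: number): number {
  return 1 - Math.pow(1 - p, n);
}

for (const pages of [1, 10, 100]) {
  const risk = atLeastOneSuccess(0.01, pages);
  console.log(`${pages} hostile pages: ${(risk * 100).toFixed(1)}% chance of at least one success`);
}
```

Under those assumptions the chance of at least one successful injection is about 9.6% across ten hostile pages and about 63.4% across a hundred. Most pages are not hostile, so real sessions sit well below this, but the shape of the curve is the point.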
The deeper issue is structural: prompt injection via fetched web content is not a bug with a patch. Any AI that reads external content faces it, because the model cannot perfectly separate content to summarize from instructions to follow. Model-layer classifiers and training push the rate down; infrastructure defenses push it further. The hooks in this package work at the infrastructure layer, stripping hostile content before Claude ever reads a byte and tagging everything else with its origin. That is the layer the model alone cannot cover.
How it works
User prompt
│
├─ Claude calls WebFetch / WebSearch / curl in Bash
│
▼
[ PreToolUse: web-fetch-pre.ts ]
│ checks robots.txt cache (advisory)
│ rewrites Bash curl/wget through claude-sanitize pipe
│ increments session counter → injects skill reminder on 2nd+ call
│ checks persistent + session domain blocklist
│
▼
[ Web request executes ]
│
▼
[ PostToolUse: web-fetch-post.ts ]
│ strips scripts, styles, iframes, event handlers,
│ hidden content, HTML comments, zero-width chars
│ runs parallel refetch → computes cloaking divergence
│ computes risk signals, classifies as Critical or Elevated
│ wraps output: <untrusted_source url="..." risk_signals="..." ...>
│ logs fetch to SQLite (url, signals, simhash, bytes, timestamp)
│
▼
[ Claude reads the wrapped output ]
│ checks wrapper presence (missing = Critical abort signal)
│ reads risk_signals attribute
│ applies abort rules: any Critical → abort; 3+ Elevated → abort
│ otherwise → proceed with normal research judgment
│
▼
User response
Both hooks run inside a 3-second budget. If anything goes wrong internally — Bun crash, missing dependency, disk full — the hooks fail open and Claude sees the raw response without a wrapper. The skill treats a missing wrapper as itself a Critical abort signal: a technical guardrail that holds even when everything else fails.
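The abort rules at the bottom of the diagram are simple enough to write down as a function. In the package they live in SKILL.md as prose for Claude to follow, so the code below is only a reading aid; the `Signal`, `Verdict`, and `decide` names are hypothetical.

```typescript
type Tier = "Critical" | "Elevated";

interface Signal {
  id: string;
  tier: Tier;
}

type Verdict =
  | { action: "abort"; reason: string }
  | { action: "proceed" };

function decide(wrapperPresent: boolean, signals: Signal[]): Verdict {
  // A missing wrapper means the hooks failed open, which counts as Critical.
  if (!wrapperPresent) {
    return { action: "abort", reason: "missing <untrusted_source> wrapper" };
  }
  const critical = signals.filter((s) => s.tier === "Critical");
  if (critical.length > 0) {
    return { action: "abort", reason: "critical: " + critical.map((s) => s.id).join(", ") };
  }
  const elevated = signals.filter((s) => s.tier === "Elevated");
  if (elevated.length >= 3) {
    return { action: "abort", reason: elevated.length + " elevated signals" };
  }
  return { action: "proceed" };
}

console.log(decide(true, [{ id: "content_type_mismatch", tier: "Elevated" }]));
```

Checking the wrapper first is what lets fail-open hooks coexist with a strict policy: a crashed hook produces no wrapper, and no wrapper produces an abort.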
Risk signal thresholds live in risk-tiers.json, which you can edit without touching any TypeScript. Values like oversized_response_bytes, zero_width_chars_max, and hidden_content_ratio_max are plain numbers in that file.
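As a rough picture of how plain-number thresholds turn into signals, consider the sketch below. Only the three key names come from risk-tiers.json as described above; the example values, the `PageMetrics` shape, the `computeSignals` function, and the `oversized_response` and `zero_width_chars` signal names are assumptions made for illustration.

```typescript
// Only these three key names are taken from the article.
interface RiskTiers {
  oversized_response_bytes: number;
  zero_width_chars_max: number;
  hidden_content_ratio_max: number;
}

// Hypothetical measurements taken from one fetched page.
interface PageMetrics {
  bytes: number;
  zeroWidthChars: number;
  hiddenContentRatio: number; // hidden text length divided by total text length
}

// Example values only, not the package defaults.
const tiers: RiskTiers = {
  oversized_response_bytes: 2000000,
  zero_width_chars_max: 10,
  hidden_content_ratio_max: 0.2,
};

function computeSignals(m: PageMetrics, t: RiskTiers): string[] {
  const signals: string[] = [];
  if (m.bytes > t.oversized_response_bytes) signals.push("oversized_response");
  if (m.zeroWidthChars > t.zero_width_chars_max) signals.push("zero_width_chars");
  if (m.hiddenContentRatio > t.hidden_content_ratio_max) signals.push("hidden_content");
  return signals;
}

console.log(computeSignals({ bytes: 4096, zeroWidthChars: 50, hiddenContentRatio: 0.05 }, tiers));
```

Keeping the comparison logic this dumb is what makes the JSON file safe to edit: changing a number changes when a signal fires, never how it is computed.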
Prerequisites
You need Bun — the hooks are TypeScript and run directly with bun run, no compile step required. Install it with:
curl -fsSL https://bun.sh/install | bash
Verify with bun --version. You need 1.x or higher. Open a new terminal after install so $PATH picks up ~/.bun/bin/. That's the full prerequisite list — no Node.js, no Docker, no database server. The hooks spin up their own SQLite file on first run.
Installation
All five steps should take about three minutes. Run these from inside the setup-web-search-deploy directory.
Step 1 — Copy the files
Create the destination directories and copy every file to its target location under ~/.claude/:
mkdir -p ~/.claude/hooks/lib
mkdir -p ~/.claude/skills/safe-web-research
mkdir -p ~/.claude/bin
cp hooks/package.json ~/.claude/hooks/package.json
cp hooks/web-fetch-pre.ts ~/.claude/hooks/web-fetch-pre.ts
cp hooks/web-fetch-post.ts ~/.claude/hooks/web-fetch-post.ts
cp hooks/lib/bash-matcher.ts ~/.claude/hooks/lib/bash-matcher.ts
cp hooks/lib/refetch.ts ~/.claude/hooks/lib/refetch.ts
cp hooks/lib/sanitise.ts ~/.claude/hooks/lib/sanitise.ts
cp hooks/lib/signals.ts ~/.claude/hooks/lib/signals.ts
cp hooks/lib/state.ts ~/.claude/hooks/lib/state.ts
cp skills/safe-web-research/SKILL.md ~/.claude/skills/safe-web-research/SKILL.md
cp skills/safe-web-research/risk-tiers.json ~/.claude/skills/safe-web-research/risk-tiers.json
cp bin/claude-sanitize ~/.claude/bin/claude-sanitize
chmod +x ~/.claude/bin/claude-sanitize
Or as a one-liner from the parent directory:
mkdir -p ~/.claude/hooks ~/.claude/skills ~/.claude/bin && \
cp -r setup-web-search-deploy/hooks/. ~/.claude/hooks/ && \
cp -r setup-web-search-deploy/skills/. ~/.claude/skills/ && \
cp setup-web-search-deploy/bin/claude-sanitize ~/.claude/bin/ && \
chmod +x ~/.claude/bin/claude-sanitize
Step 2 — Install the npm dependency
The hooks use shell-quote for safe Bash command parsing. Install it with Bun:
cd ~/.claude/hooks && bun install
Bun reads package.json and drops node_modules/shell-quote alongside the hook files. No build step, no transpilation — Bun resolves the import at runtime.
Step 3 — Register the hooks in settings.json
Open ~/.claude/settings.json (create it if it does not exist) and add the hooks block — the wiring for your prompt firewall. If you already have hooks defined, add these entries alongside your existing ones — do not replace the whole array.
{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "WebFetch|WebSearch|Bash|mcp__claude-in-chrome__(navigate|read_page|get_page_text|read_network_requests)|mcp__brightdata__.*",
        "hooks": [
          {
            "type": "command",
            "command": "$HOME/.bun/bin/bun run $HOME/.claude/hooks/web-fetch-pre.ts",
            "timeout": 5000
          }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "WebFetch|WebSearch|mcp__claude-in-chrome__(navigate|read_page|get_page_text|read_network_requests)|mcp__brightdata__.*",
        "hooks": [
          {
            "type": "command",
            "command": "$HOME/.bun/bin/bun run $HOME/.claude/hooks/web-fetch-post.ts",
            "timeout": 8000
          }
        ]
      }
    ]
  }
}
Pro Tip: Bash is PreToolUse only
PreToolUse includes Bash so that curl/wget calls inside shell commands get piped through claude-sanitize. The post-hook omits Bash — only structured tool responses (WebFetch, WebSearch, MCP) return content the post-hook can wrap.
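Because the Bash matcher fires on every shell command, the pre-hook has to decide quickly whether a given call touches the web at all. The sketch below shows one way that first check could look. It assumes the hook receives a JSON object on stdin with `tool_name` and `tool_input` fields, and it uses a simple regex as a stand-in for the shell-quote parsing the real hook relies on; `isWebCall` and `parseHookInput` are hypothetical names.

```typescript
// Subset of the JSON a hook receives on stdin (other fields omitted).
interface HookInput {
  tool_name: string;
  tool_input: { command?: string; url?: string; [key: string]: unknown };
}

const WEB_TOOLS = /^(WebFetch|WebSearch)$/;
// Simplified stand-in for real shell parsing: curl or wget as a command word.
const CURL_OR_WGET = /(^|[\s;&|(])(curl|wget)(\s|$)/;

function isWebCall(input: HookInput): boolean {
  if (WEB_TOOLS.test(input.tool_name)) return true;
  if (input.tool_name === "Bash") {
    return CURL_OR_WGET.test(input.tool_input.command ?? "");
  }
  return false;
}

// Fail-open parsing: malformed input yields null instead of throwing.
function parseHookInput(raw: string): HookInput | null {
  try {
    const parsed = JSON.parse(raw);
    if (typeof parsed?.tool_name !== "string") return null;
    return { tool_name: parsed.tool_name, tool_input: parsed.tool_input ?? {} };
  } catch {
    return null;
  }
}

const example = parseHookInput('{"tool_name":"Bash","tool_input":{"command":"ls && curl https://example.com"}}');
console.log(example !== null && isWebCall(example));
```

A regex like this misses plenty of cases (command substitution, aliases, quoted strings), which is presumably why the package parses commands properly instead.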
Step 4 — Add the skill reference to CLAUDE.md
Add this block to ~/.claude/CLAUDE.md (your global system prompt) or your project's CLAUDE.md (project-scoped). It tells Claude where the full rule set lives and ensures the skill loads automatically when the pre-hook injects its session reminder:
## Web Research Protocol
Web research safety is handled by the Safe Web Research skill
(`~/.claude/skills/safe-web-research/SKILL.md`). The hooks
(`~/.claude/hooks/web-fetch-pre.ts` + `web-fetch-post.ts`) wrap every
web fetch in `<untrusted_source>`; the skill carries the abort,
corroboration, and reporting rules.
Step 5 — Restart Claude Code
Settings are read at startup. Quit and reopen Claude Code — or reload the window in your IDE extension — to activate the hooks.
Verify it's working
Here is a real example from a live session. We asked:
fetch rfc 4949 and summarize it
The post-hook intercepted the response from rfc-editor.org and wrapped it before Claude read a byte:
<untrusted_source url="https://www.rfc-editor.org/rfc/rfc4949"
fetched_at="2026-05-14T15:25:48.172Z"
sanitiser_version="1.0.0"
risk_signals="content_type_mismatch"
content_sha256="faaf7bc58c4cdcc68393bd4882776a2c131276d290ef715bbb0ba6f1f427fd05">
...summarized page content...
</untrusted_source>
One Elevated signal fired — content_type_mismatch, likely because the RFC editor served HTML where plain text was expected. Not Critical, not enough to abort. Claude applied the skill rules, continued with the source, and emitted the required provenance block alongside its summary:
<safe_research_summary>
URL: https://www.rfc-editor.org/rfc/rfc4949
Sanitiser Version: 1.0.0
Risk Signals: content_type_mismatch
Verdict: Caution
Action: Continued
Recommendation: Single Elevated signal (likely HTML served where plain-text
was expected); content is the authoritative RFC editor source and is reliable.
</safe_research_summary>
That is the system working. Wrapper present, signal tagged, verdict justified, source cited with full provenance.
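For a feel of what building that wrapper involves, here is a minimal sketch. It is not the package's code: `wrapUntrusted` and `escapeAttr` are hypothetical, the comma separator for multiple signals is an assumption (the example above shows only one), and hashing the body as delivered is a guess at what content_sha256 covers.

```typescript
import { createHash } from "node:crypto";

// Escape characters that would break out of a double-quoted attribute.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function wrapUntrusted(url: string, body: string, signals: string[], fetchedAt: Date): string {
  const sha256 = createHash("sha256").update(body, "utf8").digest("hex");
  // Neutralise closing tags inside the body so content cannot end the wrapper early.
  const safeBody = body.replace(/<\/untrusted_source/gi, "&lt;/untrusted_source");
  return [
    `<untrusted_source url="${escapeAttr(url)}"`,
    `  fetched_at="${fetchedAt.toISOString()}"`,
    `  sanitiser_version="1.0.0"`,
    `  risk_signals="${escapeAttr(signals.join(","))}"`,
    `  content_sha256="${sha256}">`,
    safeBody,
    `</untrusted_source>`,
  ].join("\n");
}

console.log(wrapUntrusted("https://www.rfc-editor.org/rfc/rfc4949", "...page content...", ["content_type_mismatch"], new Date()));
```

The two escaping steps matter more than they look: without them, a hostile page could close the wrapper itself and place its own text outside the untrusted region.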
You will also see a log-mode advisory in Claude's response. That is expected — log mode is the default, and it means signals are computed and reported but the original bytes are passed through. Think of it as prompt monitoring mode: a window to watch signal frequency before you commit to stripping content.
To check the sanitizer's internal state directly:
~/.claude/bin/claude-sanitize status
Expected output after one fetch:
{
  "sanitiser_version": "1.0.0",
  "mode": "log",
  "fetch_log_rows": 1,
  "blocklist_size": 0,
  "sessions": 1
}
If fetch_log_rows is 0 after fetching a page, the post-hook is not firing — go back to Step 3 and verify your settings.json is valid JSON with no trailing commas.
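A quick way to rule out the JSON problem is to run the file's text through a strict parser and then confirm the hook entry is actually there. The helpers below are a hypothetical sketch (`jsonError` and `registersHook` are not part of the package) and operate on the file contents as a string, so reading ~/.claude/settings.json is left to the caller.

```typescript
// Returns null when the text parses as strict JSON, otherwise the parser's message.
function jsonError(text: string): string | null {
  try {
    JSON.parse(text);
    return null;
  } catch (err) {
    return err instanceof Error ? err.message : String(err);
  }
}

// True when some hook registered for `event` runs a command mentioning `script`.
function registersHook(text: string, event: string, script: string): boolean {
  const settings = JSON.parse(text);
  const groups: unknown = settings?.hooks?.[event];
  if (!Array.isArray(groups)) return false;
  return groups.some(
    (g) =>
      Array.isArray(g?.hooks) &&
      g.hooks.some((h: any) => typeof h?.command === "string" && h.command.includes(script)),
  );
}

const broken = '{ "hooks": {}, }'; // trailing comma
console.log(jsonError(broken) === null ? "valid JSON" : "invalid JSON");
```

Calling `registersHook(text, "PostToolUse", "web-fetch-post.ts")` on a file that passes the first check tells you whether the post-hook is wired in at all.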
Optional configuration
Enforce mode
By default the hooks run in log mode: signals are computed and reported, but the original unsanitized bytes pass through to Claude. Once you have monitored signal frequency for a while and are comfortable with the thresholds (a deliberate human-in-the-loop checkpoint), switch to enforce mode to actually strip dangerous content:
# In ~/.zshrc or ~/.bashrc
export CLAUDE_SANITISER_MODE=enforce
Or scope it to Claude Code only via settings.json:
{
  "env": {
    "CLAUDE_SANITISER_MODE": "enforce"
  }
}
In enforce mode, scripts, hidden elements, event handlers, iframes, HTML comments, and zero-width characters are stripped from every response before Claude sees them.
Debug mode
Logs full request and response bodies. Generates large files fast — use sparingly:
export CLAUDE_SANITISER_DEBUG=1
Output goes to ~/.claude/safe-web-research/fetch-log-debug.jsonl.
Persistent domain blocklist
Block specific domains permanently by editing ~/.claude/web-blocklist.json:
{
  "version": 1,
  "entries": [
    {
      "domain": "known-injection-site.com",
      "reason": "confirmed prompt-injection host",
      "added_at": "2026-01-01T00:00:00.000Z",
      "source": "user",
      "expires_at": null
    }
  ]
}
The pre-hook reconciles this file with its SQLite database on every invocation. User-sourced entries always take precedence over auto-detected ones.
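To show how entries in that shape could be matched against a URL, here is a small sketch. The field names follow the JSON above; the `isBlocked` function, the subdomain matching, and the treatment of expires_at as an ISO timestamp are assumptions rather than documented behavior of the pre-hook.

```typescript
interface BlocklistEntry {
  domain: string;
  reason: string;
  added_at: string;
  source: string;
  expires_at: string | null;
}

function isBlocked(url: string, entries: BlocklistEntry[], now: Date): boolean {
  let host: string;
  try {
    host = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // unparseable URL: nothing to match against
  }
  return entries.some((e) => {
    // Assumption: expires_at is an ISO timestamp, and null means "never expires".
    const expired = e.expires_at !== null && new Date(e.expires_at).getTime() <= now.getTime();
    if (expired) return false;
    const domain = e.domain.toLowerCase();
    // Assumption: an entry also covers its subdomains.
    return host === domain || host.endsWith("." + domain);
  });
}

const entries: BlocklistEntry[] = [
  {
    domain: "known-injection-site.com",
    reason: "confirmed prompt-injection host",
    added_at: "2026-01-01T00:00:00.000Z",
    source: "user",
    expires_at: null,
  },
];
console.log(isBlocked("https://cdn.known-injection-site.com/page", entries, new Date()));
```

Matching on a leading dot rather than a bare suffix keeps lookalike hosts from being caught by accident.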
Tuning signal thresholds
All thresholds live in ~/.claude/skills/safe-web-research/risk-tiers.json. Edit numbers there — no TypeScript required. You can change what counts as "oversized," how many zero-width characters trigger a signal, or how long robots.txt results are cached.
Troubleshooting
Hooks aren't firing — no <untrusted_source> wrapper in responses
Run which bun in your terminal. If the path is not ~/.bun/bin/bun, update the command values in your settings.json hook entries to use the absolute path that which bun printed. Also confirm settings.json is valid JSON — trailing commas will silently break it. Restart Claude Code after any settings change.
shell-quote import error when running a hook
The npm dependency was not installed. Run: cd ~/.claude/hooks && bun install
risk-tiers.json not found error
The hook hard-codes the path ~/.claude/skills/safe-web-research/risk-tiers.json. Confirm the file exists there. If you moved it, you need to either edit hooks/lib/signals.ts and re-copy, or symlink from the expected path.
Wrapper is present but Claude is not following the abort rules
The safe-web-research skill needs to be loaded. Confirm the CLAUDE.md reference from Step 4 is in place. You can also manually load it by starting your prompt with /safe-web-research.
Hook crashed silently
Check ~/.claude/safe-web-research/hook-errors.log. The hooks write any internal exception there before exiting. If the file has content, that's your root cause.
What it creates at runtime
On first use the hooks auto-create these files. You do not need to create them manually.
| Path | Purpose |
|---|---|
| ~/.claude/safe-web-research/state.db | SQLite database: sessions, blocklist, robots.txt cache, fetch log |
| ~/.claude/safe-web-research/fetch-log.jsonl | JSONL record of every web fetch: URL, signals, byte counts, simhash, timestamp |
| ~/.claude/safe-web-research/fetch-log-debug.jsonl | Full request/response bodies (only written when CLAUDE_SANITISER_DEBUG=1) |
| ~/.claude/safe-web-research/hook-errors.log | Hook crash log. Should stay empty; non-empty means a hook is erroring silently |
| ~/.claude/web-blocklist.json | Persistent domain blocklist: human-editable, reconciled on every pre-hook run |
Drift analysis
After the hooks have been running for a while, you can re-classify all historical fetches against the current signal tier table. Useful for regression testing your threshold changes — checking whether the new values would have altered any abort decisions before you commit to them:
~/.claude/bin/claude-sanitize replay --since=2026-01-01
It reads from state.db, runs the current risk-tiers.json thresholds against historical data, and reports which fetches would have been classified differently. A good sanity check before enabling enforce mode.
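Conceptually, a replay is a diff between the signals recorded at fetch time and the signals the current thresholds would produce. The sketch below illustrates that idea for a single threshold only. The `LoggedFetch` row shape, the `replayOversized` function, and the `oversized_response` signal name are hypothetical and do not reflect the actual state.db schema or the CLI's output.

```typescript
// Hypothetical row shape; the real state.db schema is not shown in the article.
interface LoggedFetch {
  url: string;
  bytes: number;
  signals: string[]; // signals recorded at fetch time
}

interface ReplayDiff {
  url: string;
  before: string[];
  after: string[];
}

// Re-evaluate one size threshold against logged rows and report what would change.
function replayOversized(rows: LoggedFetch[], oversizedResponseBytes: number): ReplayDiff[] {
  const diffs: ReplayDiff[] = [];
  rows.forEach((row) => {
    const others = row.signals.filter((s) => s !== "oversized_response");
    const after = row.bytes > oversizedResponseBytes ? [...others, "oversized_response"] : others;
    const before = [...row.signals].sort();
    const sortedAfter = [...after].sort();
    if (before.join(",") !== sortedAfter.join(",")) {
      diffs.push({ url: row.url, before, after: sortedAfter });
    }
  });
  return diffs;
}

const history: LoggedFetch[] = [
  { url: "https://example.com/small", bytes: 500, signals: [] },
  { url: "https://example.com/large", bytes: 1500, signals: [] },
];
console.log(replayOversized(history, 1000));
```

An empty result means the threshold change would not have altered any past classification, which is the reassurance you want before tightening a number.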
Part of a series
- Safe AI Search Path — five Critical signals, five Elevated signals, and the full abort playbook
- The Resilient AI Web Research Protocol — seven behavioral rules that keep agents out of tarpits
- AI Tarpits vs AI Citations — the site-owner side: Nepenthes, Iocaine, Anubis, and Cloudflare AI Labyrinth