Hero image for How to Install the Safe Web Research Skill for Claude Code

How to Install the Safe Web Research Skill for Claude Code

PC Drama
126 views

No Guardrails, Full Auto: with Claude’s Web Tools

I'm on the edge of my seat waiting for a non-beta version of Claude Security, till then, non-enterprise users need to build their own guardrails. Claude currently has two built-in web tools — Web Fetch and Web Search which are available to use in every Claude Code installation. With a fresh Claude install, the default permission mode asks for approval before each use. Most people approve or switch defaultMode to "auto", which allows web searching with full permission. Full auto is a reasonable choice, but consider what this means when using AI to search the internet: not every site out there has good intentions for an AI crawler, and certain "Stay off my lawn / website" web admins are punching back with non-sense data, causing AI to struggle through AI tarpits.

Searching the web without guardrails could lead to a prompt-injection attack — something as simple as a page saying "ignore your previous instructions" — and Claude will follow along. No suspicion, no pushback, just a very capable assistant doing exactly what a random website suggested.

Curious? Ask Claude: "Are there built-in protections when using web fetch or web search tools?"
You'll get an answer, and the official docs back up this response: No. there aren't any prompt injection guardrails. That's exactly why Safe Web Research guardrails are necessary.

Enabling the web fetch tool in environments where Claude processes untrusted input alongside sensitive data poses data exfiltration risks. Only use this tool in trusted environments or when handling non-sensitive data.

To minimize exfiltration risks, Claude is not allowed to dynamically construct URLs. Claude can only fetch URLs that have been explicitly provided by the user or that come from previous web search or web fetch results. However, there is still residual risk that should be carefully considered when using this tool.

If data exfiltration is a concern, consider:

  • Disabling the web fetch tool entirely
  • Using the max_uses parameter to limit the number of requests
  • Using the allowed_domains parameter to restrict to known safe domains

Source: Claude Docs — Web fetch tool

The Safe Web Research skill is a contingency layer of defense:

  • Custom TypeScript hooks that catch and clean every web response before it reaches Claude.
  • A clear judgment skill that helps Claude spot trouble and respond wisely.

No paranoia. Just everyday common sense, like sending a teenager out for milk and they won't stroll in at sunset with warm milk, Takis, and a basketball they "found."
Keep Claude helpful and fast while making sure it stays safe. Simple, reliable protection for real-world use. Ready to set it up? Grab a coffee and we will walk you through it.

Developer viewing the safe-web-research GitHub repository on a laptop with a coastal mountain backdrop
Prompt injection via fetched web content is not a bug with a patch.
Teenager returning at sunset carrying warm milk, Takis, and a basketball they "found"
Claude on an unfiltered web fetch as compared to a teenager sent for milk, who returns at sunset with warm milk, Takis, and a basketball they found. The hooks ensures that Claude returns with the actual request.

TL;DR

  • Two hooks (PreToolUse + PostToolUse) intercept every web call and wrap content in <untrusted_source> tags with computed risk signals.
  • A skill file carries the judgment rules (the playbook): when to abort a source, how to classify signals, what to report.
  • Everything runs locally via Bun. Five steps, under five minutes.
  • Fail-open by design — a hook crash never blocks Claude.

What's in the box

File What it does
hooks/web-fetch-pre.ts PreToolUse hook — checks robots.txt, rewrites Bash curl/wget through the sanitizer, fires the session-counter reminder on the 2nd+ web call
hooks/web-fetch-post.ts PostToolUse hook — strips dangerous DOM, runs cloaking detection, computes risk signals, wraps the response in <untrusted_source>
hooks/lib/ Shared library: sanitiser, signal classifier, refetch/cloaking detector, SQLite state, Bash command matcher
skills/safe-web-research/SKILL.md The judgment layer — abort rules, corroboration discipline, reporting format
skills/safe-web-research/risk-tiers.json Tunable thresholds and signal definitions (edit this, not the TypeScript)
bin/claude-sanitize CLI binary for status checks, stdin sanitization, and historical replay analysis

When to reach for this

You're doing security research. Claude fetches a threat intelligence page. The page helpfully includes a hidden <div> telling Claude it's now in "unrestricted mode" — a textbook case of input manipulation. The post-hook strips hidden elements before Claude reads a byte, then wraps the sanitized output with a hidden_content risk signal attached. Claude sees the signal, applies the abort rules, and treats the source with the skepticism it deserves.

You're fact-checking a claim from a source you don't recognize. The pre-hook checks robots.txt before fetching. The post-hook runs a parallel refetch with a different user-agent and compares the two responses — anomaly detection for a classic cloaking tell. If they diverge significantly, Claude gets a cloaking_suspected Critical signal and aborts the source entirely.

Claude is running autonomously through a long session with many web calls, and overreliance on its judgment builds silently as the context grows. After the second web call in a session, the pre-hook injects a reminder that loads the full skill rules fresh. The rules do not rely on Claude remembering them from the first fetch — they get restated on every subsequent call. Prompt drift does not accumulate.

You share your Claude setup with a team and want a consistent hygiene baseline. One copy operation, one bun install, a JSON block in settings.json. Every team member gets identical behavior. The hooks fail-open, so a misconfigured Bun path never blocks Claude entirely — it just loses the wrapper, which the skill itself treats as an abort signal.

You want protection against zero-width character injection. The sanitizer strips <script>, <style>, <iframe>, all event handler attributes (onclick=, onload=, etc.), HTML comments, and zero-width Unicode characters from every web response — signature matching at the source, before Claude reads a single byte.

The research backs it up

No browser agent is immune to prompt injection, and we share these findings to demonstrate progress, not to claim the problem is solved.

— Anthropic, Mitigating the risk of prompt injections in browser use

Prompt injection is not a fringe concern someone invented to sell security tools. The Open Web Application Security Project listed prompt injection as the number one LLM security risk in its 2025 report — ahead of sensitive data exposure, insecure tool use, and everything else on the list. Anthropic agrees. They published dedicated research on mitigating browser-based prompt injection and tested Claude Opus 4.5 against live attacks to measure how often the model-layer defenses actually hold.

The result: a 1% attack success rate. That is a genuine improvement, and Anthropic is right to highlight it. But one-in-a-hundred still happens, and it happens more often the more pages Claude browses. An agentic session that touches ten pages has ten chances. A session that runs through a hundred has a hundred. The math compounds quietly in the background while your assistant cheerfully continues its work.

The deeper issue is structural: prompt injection via fetched web content is not a bug with a patch. Any AI that reads external content faces it, because the model cannot perfectly separate content to summarize from instructions to follow. Model-layer classifiers and training push the rate down; infrastructure defenses push it further. The hooks in this package work at the infrastructure layer, stripping hostile content before Claude ever reads a byte and tagging everything else with its origin. That is the layer the model alone cannot cover.

How it works

User prompt
    │
    ├─ Claude calls WebFetch / WebSearch / curl in Bash
    │
    ▼
[ PreToolUse: web-fetch-pre.ts ]
    │  checks robots.txt cache (advisory)
    │  rewrites Bash curl/wget through claude-sanitize pipe
    │  increments session counter → injects skill reminder on 2nd+ call
    │  checks persistent + session domain blocklist
    │
    ▼
[ Web request executes ]
    │
    ▼
[ PostToolUse: web-fetch-post.ts ]
    │  strips scripts, styles, iframes, event handlers,
    │  hidden content, HTML comments, zero-width chars
    │  runs parallel refetch → computes cloaking divergence
    │  computes risk signals, classifies as Critical or Elevated
    │  wraps output: <untrusted_source url="..." risk_signals="..." ...>
    │  logs fetch to SQLite (url, signals, simhash, bytes, timestamp)
    │
    ▼
[ Claude reads the wrapped output ]
    │  checks wrapper presence (missing = Critical abort signal)
    │  reads risk_signals attribute
    │  applies abort rules: any Critical → abort; 3+ Elevated → abort
    │  otherwise → proceed with normal research judgment
    │
    ▼
User response

Both hooks run inside a 3-second budget. If anything goes wrong internally — Bun crash, missing dependency, disk full — the hooks fail open and Claude sees the raw response without a wrapper. The skill treats a missing wrapper as itself a Critical abort signal: a technical guardrail that holds even when everything else fails.

Risk signal thresholds live in risk-tiers.json, which you can edit without touching any TypeScript. Values like oversized_response_bytes, zero_width_chars_max, and hidden_content_ratio_max are plain numbers in that file.

Prerequisites

You need Bun — the hooks are TypeScript and run directly with bun run, no compile step required. Install it with:

curl -fsSL https://bun.sh/install | bash

Verify with bun --version. You need 1.x or higher. Open a new terminal after install so $PATH picks up ~/.bun/bin/. That's the full prerequisite list — no Node.js, no Docker, no database server. The hooks spin up their own SQLite file on first run.

Installation

All five steps should take about three minutes. Run these from inside the setup-web-search-deploy directory.

Step 1 — Copy the files

Create the destination directories and copy every file to its target location under ~/.claude/:

mkdir -p ~/.claude/hooks/lib
mkdir -p ~/.claude/skills/safe-web-research
mkdir -p ~/.claude/bin

cp hooks/package.json          ~/.claude/hooks/package.json
cp hooks/web-fetch-pre.ts      ~/.claude/hooks/web-fetch-pre.ts
cp hooks/web-fetch-post.ts     ~/.claude/hooks/web-fetch-post.ts
cp hooks/lib/bash-matcher.ts   ~/.claude/hooks/lib/bash-matcher.ts
cp hooks/lib/refetch.ts        ~/.claude/hooks/lib/refetch.ts
cp hooks/lib/sanitise.ts       ~/.claude/hooks/lib/sanitise.ts
cp hooks/lib/signals.ts        ~/.claude/hooks/lib/signals.ts
cp hooks/lib/state.ts          ~/.claude/hooks/lib/state.ts

cp skills/safe-web-research/SKILL.md        ~/.claude/skills/safe-web-research/SKILL.md
cp skills/safe-web-research/risk-tiers.json ~/.claude/skills/safe-web-research/risk-tiers.json

cp bin/claude-sanitize ~/.claude/bin/claude-sanitize
chmod +x ~/.claude/bin/claude-sanitize

Or as a one-liner from the parent directory:

cp -r setup-web-search-deploy/hooks/. ~/.claude/hooks/ && \
cp -r setup-web-search-deploy/skills/. ~/.claude/skills/ && \
cp setup-web-search-deploy/bin/claude-sanitize ~/.claude/bin/ && \
chmod +x ~/.claude/bin/claude-sanitize

Step 2 — Install the npm dependency

The hooks use shell-quote for safe Bash command parsing. Install it with Bun:

cd ~/.claude/hooks && bun install

Bun reads package.json and drops node_modules/shell-quote alongside the hook files. No build step, no transpilation — Bun resolves the import at runtime.

Step 3 — Register the hooks in settings.json

Open ~/.claude/settings.json (create it if it does not exist) and add the hooks block — the wiring for your prompt firewall. If you already have hooks defined, add these entries alongside your existing ones — do not replace the whole array.

{
    "hooks": {
        "PreToolUse": [
            {
                "matcher": "WebFetch|WebSearch|Bash|mcp__claude-in-chrome__(navigate|read_page|get_page_text|read_network_requests)|mcp__brightdata__.*",
                "hooks": [
                    {
                        "type": "command",
                        "command": "$HOME/.bun/bin/bun run $HOME/.claude/hooks/web-fetch-pre.ts",
                        "timeout": 5000
                    }
                ]
            }
        ],
        "PostToolUse": [
            {
                "matcher": "WebFetch|WebSearch|mcp__claude-in-chrome__(navigate|read_page|get_page_text|read_network_requests)|mcp__brightdata__.*",
                "hooks": [
                    {
                        "type": "command",
                        "command": "$HOME/.bun/bin/bun run $HOME/.claude/hooks/web-fetch-post.ts",
                        "timeout": 8000
                    }
                ]
            }
        ]
    }
}

Pro Tip: Bash is PreToolUse only

PreToolUse includes Bash so that curl/wget calls inside shell commands get piped through claude-sanitize. The post-hook omits Bash — only structured tool responses (WebFetch, WebSearch, MCP) return content the post-hook can wrap.

Step 4 — Add the skill reference to CLAUDE.md

Add this block to ~/.claude/CLAUDE.md (your global system prompt) or your project's CLAUDE.md (project-scoped). It tells Claude where the full rule set lives and ensures the skill loads automatically when the pre-hook injects its session reminder:

## Web Research Protocol

Web research safety is handled by the Safe Web Research skill
(`~/.claude/skills/safe-web-research/SKILL.md`). The hook
(`~/.claude/hooks/web-fetch-pre.ts` + `web-fetch-post.ts`) wraps every
web fetch in `<untrusted_source>`; the skill carries the abort,
corroboration, and reporting rules.

Step 5 — Restart Claude Code

Settings are read at startup. Quit and reopen Claude Code — or reload the window in your IDE extension — to activate the hooks.

Verify it's working

Here is a real example from a live session. We asked:

fetch rfc 4949 and summarize it

The post-hook intercepted the response from rfc-editor.org and wrapped it before Claude read a byte:

<untrusted_source url="https://www.rfc-editor.org/rfc/rfc4949"
    fetched_at="2026-05-14T15:25:48.172Z"
    sanitiser_version="1.0.0"
    risk_signals="content_type_mismatch"
    content_sha256="faaf7bc58c4cdcc68393bd4882776a2c131276d290ef715bbb0ba6f1f427fd05">
    ...summarized page content...
</untrusted_source>

One Elevated signal fired — content_type_mismatch, likely because the RFC editor served HTML where plain text was expected. Not Critical, not enough to abort. Claude applied the skill rules, continued with the source, and emitted the required provenance block alongside its summary:

<safe_research_summary>
  URL: https://www.rfc-editor.org/rfc/rfc4949
  Sanitiser Version: 1.0.0
  Risk Signals: content_type_mismatch
  Verdict: Caution
  Action: Continued
  Recommendation: Single Elevated signal (likely HTML served where plain-text
  was expected); content is the authoritative RFC editor source and is reliable.
</safe_research_summary>

That is the system working. Wrapper present, signal tagged, verdict justified, source cited with full provenance.

You will also see a log-mode advisory in Claude's response. That is expected — log mode is the default, and it means signals are computed and reported but the original bytes are passed through. Think of it as prompt monitoring mode: a window to watch signal frequency before you commit to stripping content.

To check the sanitizer's internal state directly:

~/.claude/bin/claude-sanitize status

Expected output after one fetch:

{
    "sanitiser_version": "1.0.0",
    "mode": "log",
    "fetch_log_rows": 1,
    "blocklist_size": 0,
    "sessions": 1
}

If fetch_log_rows is 0 after fetching a page, the post-hook is not firing — go back to Step 3 and verify your settings.json is valid JSON with no trailing commas.

Optional configuration

Enforce mode

By default the hooks run in log mode: signals are computed and reported, but the original unsanitized bytes pass through to Claude. Once you have monitored signal frequency for a while and are comfortable with the thresholds (a deliberate human-in-the-loop checkpoint), switch to enforce mode to actually strip dangerous content:

# In ~/.zshrc or ~/.bashrc
export CLAUDE_SANITISER_MODE=enforce

Or scope it to Claude Code only via settings.json:

{
    "env": {
        "CLAUDE_SANITISER_MODE": "enforce"
    }
}

In enforce mode, scripts, hidden elements, event handlers, iframes, HTML comments, and zero-width characters are stripped from every response before Claude sees them.

Debug mode

Logs full request and response bodies. Generates large files fast — use sparingly:

export CLAUDE_SANITISER_DEBUG=1

Output goes to ~/.claude/safe-web-research/fetch-log-debug.jsonl.

Persistent domain blocklist

Block specific domains permanently by editing ~/.claude/web-blocklist.json:

{
    "version": 1,
    "entries": [
        {
            "domain": "known-injection-site.com",
            "reason": "confirmed prompt-injection host",
            "added_at": "2026-01-01T00:00:00.000Z",
            "source": "user",
            "expires_at": null
        }
    ]
}

The pre-hook reconciles this file with its SQLite database on every invocation. User-sourced entries always take precedence over auto-detected ones.

Tuning signal thresholds

All thresholds live in ~/.claude/skills/safe-web-research/risk-tiers.json. Edit numbers there — no TypeScript required. You can change what counts as "oversized," how many zero-width characters trigger a signal, or how long robots.txt results are cached.

Troubleshooting

Hooks aren't firing — no <untrusted_source> wrapper in responses

Run which bun in your terminal. If the path is not ~/.bun/bin/bun, update the command values in your settings.json hook entries to use the absolute path that which bun printed. Also confirm settings.json is valid JSON — trailing commas will silently break it. Restart Claude Code after any settings change.

shell-quote import error when running a hook

The npm dependency was not installed. Run: cd ~/.claude/hooks && bun install

risk-tiers.json not found error

The hook hard-codes the path ~/.claude/skills/safe-web-research/risk-tiers.json. Confirm the file exists there. If you moved it, you need to either edit hooks/lib/signals.ts and re-copy, or symlink from the expected path.

Wrapper is present but Claude is not following the abort rules

The safe-web-research skill needs to be loaded. Confirm the CLAUDE.md reference from Step 4 is in place. You can also manually load it by starting your prompt with /safe-web-research.

Hook crashed silently

Check ~/.claude/safe-web-research/hook-errors.log. The hooks write any internal exception there before exiting. If the file has content, that's your root cause.

What it creates at runtime

On first use the hooks auto-create these files. You do not need to create them manually.

Path Purpose
~/.claude/safe-web-research/state.db SQLite database — sessions, blocklist, robots.txt cache, fetch log
~/.claude/safe-web-research/fetch-log.jsonl JSONL record of every web fetch: URL, signals, byte counts, simhash, timestamp
~/.claude/safe-web-research/fetch-log-debug.jsonl Full request/response bodies (only written when CLAUDE_SANITISER_DEBUG=1)
~/.claude/safe-web-research/hook-errors.log Hook crash log — should stay empty; non-empty means a hook is erroring silently
~/.claude/web-blocklist.json Persistent domain blocklist — human-editable, reconciled on every pre-hook run

Drift analysis

After the hooks have been running for a while, you can re-classify all historical fetches against the current signal tier table. Useful for regression testing your threshold changes — checking whether the new values would have altered any abort decisions before you commit to them:

~/.claude/bin/claude-sanitize replay --since=2026-01-01

It reads from state.db, runs the current risk-tiers.json thresholds against historical data, and reports which fetches would have been classified differently. A good sanity check before enabling enforce mode.

Part of a series

Related Articles