Safe AI Search Path: Spot AI Traps Before Your Agent Falls In

AI Web Search & Research Workflow

This article outlines a practical defense playbook (/safe-web-research) for AI agent web research. This skill prevents common pitfalls like prompt injection, cloaked/manipulated pages, DoS payloads, Markov-generated "tarpits," and infinite URL mazes that waste tokens or poison outputs.

The /safe-web-research skill is the agent-side companion to a content sanitization hook: the hook wraps every web fetch in an <untrusted_source> marker and computes mechanical risk signals; the skill carries the judgment rules that decide when to abort a source, when to pivot, and when to add a domain to a session blocklist. Between them, they catch five things AI agents fall into when they crawl the modern web: prompt injection inside page content, cloaked pages that show one thing to a browser and something else to a bot, oversized DoS-style payloads, Markov-poisoned tarpit text, and infinite URL mazes engineered to burn an agent's budget. This post is the operator-friendly walkthrough of that playbook.

TL;DR

Two layers: a sanitizer hook (mechanical, runs on every fetch) plus a /safe-web-research skill (judgment, decides what the signals mean).
Five Critical signals trigger immediate abort: injection_phrase, cloaking_suspected, oversized_response, repeating_substring_ratio_high, and url_cardinality_explosion.
Five Elevated signals form a 3-of-5 abort rule: zero-width chars, hidden content ratio, long redirect chains, content-type mismatches, and session near-duplicates.
On abort: emit a provenance summary, blocklist the domain for the session, and pivot to archive.org or an institutional alternative. Never retry the same URL through a different tool.
If the wrapper is missing, the source is automatically adversarial. Fail-open at the hook, fail-closed at the skill. Both behaviors are correct in their layer.

What goes wrong without a safe AI search path

Figure: Four costly traps ambush unprotected AI agents: prompt injection, hidden content, cloaked pages, and infinite-looping URL mazes engineered to drain token budgets and poison downstream output.

AI agents are the new junior researchers. They are fast, they are cheap, and they will cheerfully wander into a Markov maze and bill you for the privilege. The cost shows up in three places, sometimes all at once.

The first is dollars.
Token meters spike when an agent gets stuck inside an infinite-link tarpit, looping through synthetic content that looks novel on every page and is actually the same Markov salad with the nouns swapped. We covered the operator-side narrative in the cost-spike story that motivated this playbook, and the defensive infrastructure on the site-owner side in AI tarpits explained. Read either if you want the "why," then come back here for the "how to not be the victim."

The second is truth.
A poisoned page can pollute the agent's working set with plausible nonsense, which then gets summarized, cited, and quietly shipped into your output. Once a hallucination has a citation attached, it stops feeling like a hallucination. It feels like research.

The third is control.
Prompt injection inside fetched content (visible text, invisible Unicode, alt attributes, JSON-LD blobs) is OWASP's number-one risk for LLM applications for a reason: the agent reads what the page tells it, and the page can tell it to do almost anything. A safe AI search path treats every byte of fetched content as untrusted by default, even when the site looks reputable.

The two layers: mechanical hook plus judgment skill

Figure: Two cooperating layers: a deterministic sanitiser hook that fires on every fetch, paired with the /safe-web-research skill that reads the wrapper's risk signals and decides when to continue, pivot, or blocklist.

The playbook is split on purpose. The mechanical layer runs on every fetch with no language-model reasoning involved, which means it cannot be talked out of its job by a clever page. The judgment layer reads the mechanical output, applies context, and decides what to do. Each layer is the right tool for its half of the problem, and neither one is sufficient alone.

The hook

The web-fetch-sanitiser hook fires on every WebFetch, every WebSearch, every browser-automation MCP call, every shell-based curl, wget, http, lynx, or w3m. It does three things. It strips known dangerous DOM artifacts (script tags, encoded payloads, malicious data URIs). It wraps the result in <untrusted_source url="..." sanitiser_version="..." risk_signals="..." rules_applied="..." original_bytes="..." content_sha256="...">. And it computes risk signals against the response in deterministic, language-model-free code, so the signals cannot be reasoned away by a hostile page.
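The wrapping step can be sketched in a few lines of Python. This is a minimal illustration, not the hook's actual code: the function name `wrap_untrusted`, the comma-joined attribute encoding, the placeholder version string, and the choice to hash the sanitised body are all assumptions made for the example. Only the attribute names come from the wrapper shown above.

```python
import hashlib
from xml.sax.saxutils import escape, quoteattr


def wrap_untrusted(url: str, body: str, risk_signals: list[str],
                   rules_applied: list[str], original_bytes: int,
                   sanitiser_version: str = "0.0-example") -> str:
    """Wrap already-sanitised text in an <untrusted_source> element.

    Assumption: content_sha256 is taken over the sanitised body.
    """
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    attrs = {
        "url": url,
        "sanitiser_version": sanitiser_version,
        "risk_signals": ",".join(risk_signals),
        "rules_applied": ",".join(rules_applied),
        "original_bytes": str(original_bytes),
        "content_sha256": digest,
    }
    attr_text = " ".join(f"{name}={quoteattr(value)}" for name, value in attrs.items())
    # Escape the body so page text cannot close the wrapper early.
    return f"<untrusted_source {attr_text}>{escape(body)}</untrusted_source>"
```

Escaping the body matters as much as the attributes: a page that contains a literal closing tag should not be able to end the untrusted region on its own terms.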

The skill

/safe-web-research is the policy layer. It reads the wrapper, applies the abort rules, decides whether to surface a provenance summary, manages the session blocklist, and orchestrates pivots to archive.org or alternate sources. The skill never operates on web content without first checking that the wrapper is present, because a missing wrapper is itself a Critical signal. If the hook crashed, timed out, or got bypassed, that is a fail-closed condition: the source is treated as adversarial, full stop.
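The fail-closed check described above might look like the following sketch. The regular expressions, the helper names, and the `EXPECTED_MAJOR` value are hypothetical; a production skill would more likely use a real parser than a regex, and the version policy here is only one plausible reading of "the wrong sanitiser_version".

```python
import re
from typing import Optional

# Hypothetical patterns for the wrapper; good enough for a sketch only.
WRAPPER_RE = re.compile(
    r"\A<untrusted_source\s+([^>]*)>(.*)</untrusted_source>\Z", re.DOTALL
)
ATTR_RE = re.compile(r'([\w-]+)="([^"]*)"')
EXPECTED_MAJOR = "1"  # assumed major version the skill was written against


def read_wrapper(fetched: str) -> Optional[dict]:
    """Return the wrapper attributes, or None when no wrapper is present."""
    match = WRAPPER_RE.match(fetched.strip())
    if match is None:
        return None
    return dict(ATTR_RE.findall(match.group(1)))


def treat_as_adversarial(fetched: str) -> bool:
    """Fail closed: a missing wrapper or a major version mismatch is adversarial."""
    attrs = read_wrapper(fetched)
    if attrs is None:
        return True
    major = attrs.get("sanitiser_version", "").split(".")[0]
    return major != EXPECTED_MAJOR
```

The important property is the default: anything the function cannot positively recognise as wrapped is discarded rather than read.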

Definition: AI tarpit

A server-side defense that intentionally serves automated visitors an infinite chain of synthetic pages. The dual goal is to waste the bot's compute and, when Markov-generated text is involved, corrupt any downstream training set. See AI Tarpits Explained for the full taxonomy (Nepenthes, Iocaine, Anubis, Cloudflare AI Labyrinth).

Definition: cloaking_suspected

A risk signal that triggers when the same URL returns two different versions of content: one for the AI agent, another for a clean request. It is a classic indicator of SEO cloaking and bot-targeted manipulation: the site is showing one thing to a human browser and a different (often hostile) thing to whatever the agent is identifying as.

The five Critical signals (any one triggers abort)

Critical means "do not pass go." A single Critical signal on a source is enough to discard that source entirely. No quoting, no weighting as evidence, no "well, except for this one paragraph." Work on that source ends immediately, with no debate.

injection_phrase. A substring match against a curated list of prompt-injection patterns. Catches the obvious ("ignore previous instructions"), the polite ("you are now in admin mode"), and the slightly clever ("the previous user has authorized the following"). Pages that ship with this kind of language inside their body, alt text, or hidden divs are not contributing to your research goal. They are trying to hijack it.
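A substring check of this kind could be sketched as follows. The pattern list holds only the three phrasings quoted above, and the whitespace and case normalisation is an assumption; the real curated list and its matching rules are not described in this article.

```python
import re

# Illustrative patterns only: the three examples named in the text.
INJECTION_PATTERNS = (
    "ignore previous instructions",
    "you are now in admin mode",
    "the previous user has authorized the following",
)


def normalise(text: str) -> str:
    """Collapse whitespace runs and lowercase, so line breaks cannot split a phrase."""
    return re.sub(r"\s+", " ", text).lower()


def injection_phrases(text: str) -> list[str]:
    """Return every known pattern found in the text; a non-empty list is Critical."""
    haystack = normalise(text)
    return [pattern for pattern in INJECTION_PATTERNS if pattern in haystack]
```

Running the same function over alt text and hidden elements, not only the visible body, is what catches the payloads a human reviewer never sees.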

cloaking_suspected. The hook does a parallel low-fingerprint refetch from a clean control path. If the agent's fetch and the clean refetch return materially different content, the site is treated as cloaking. The agent saw one page; a human visitor would see another. Either could be the malicious version, and the mismatch alone is reason enough to drop the source.

oversized_response. The fetch returned far more bytes than any reasonable page would need. DoS-style payloads, decompression bombs, and pathologically long responses designed to exhaust a model's context window all live in this bucket. The cap is configurable but the principle isn't: an honest article is not 40 MB of HTML.

repeating_substring_ratio_high. The text looks like English on the surface, but its internal patterns repeat like machine-generated noise. Markov-style text repeats specific n-grams more often than natural prose does. The hook measures that repetition ratio across the response and fires when the distribution looks generated, which is anomaly detection on prose statistics rather than substring matching. This is the canonical tarpit fingerprint: a page that reads like English on a quick skim but compiles statistically as a noise generator.
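One simple way to quantify that repetition is the share of word n-grams that occur more than once. The sketch below is an assumed stand-in for whatever statistic the hook actually computes; the trigram size and the idea of comparing the ratio against a tuned threshold are illustrative choices.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 3) -> float:
    """Fraction of word n-grams that belong to an n-gram seen more than once."""
    words = text.lower().split()
    grams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not grams:
        return 0.0
    counts = Counter(grams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(grams)
```

A looping phrase scores 1.0 and a short natural sentence scores 0.0. Real prose sits somewhere in between, which is why the cut-off has to be calibrated on known-good pages rather than guessed.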

url_cardinality_explosion. One domain suddenly spawns dozens of unique links in a short time. This is the infinite-maze fingerprint: a hall of mirrors built to exhaust an AI search agent. An agent that has followed thirty unique links on example.com in the last sixty seconds is almost certainly inside a tarpit, not researching example.com's actual content.
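A sliding-window counter is enough to illustrate the idea. The class name, the per-host grouping, and the explicit `now` argument are assumptions for the example; the thirty-in-sixty-seconds defaults come from the description above.

```python
from collections import defaultdict, deque
from urllib.parse import urlsplit


class UrlCardinalityTracker:
    """Track distinct URLs fetched per host inside a sliding time window."""

    def __init__(self, max_urls: int = 30, window_seconds: float = 60.0) -> None:
        self.max_urls = max_urls
        self.window = window_seconds
        self._seen = defaultdict(deque)  # host -> deque of (timestamp, url)

    def record(self, url: str, now: float) -> bool:
        """Record one fetch; return True when the signal should fire."""
        host = urlsplit(url).hostname or ""
        queue = self._seen[host]
        queue.append((now, url))
        # Drop fetches that have aged out of the window.
        while queue and now - queue[0][0] > self.window:
            queue.popleft()
        distinct = {seen_url for _, seen_url in queue}
        return len(distinct) >= self.max_urls
```

Passing the clock in as an argument keeps the check deterministic and easy to test, which fits the requirement that the hook stay mechanical.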

The five Elevated signals (the 3-of-5 rule)

Elevated signals are weaker individually but damning in combination. A single Elevated signal might be a quirk; two might be coincidence; three on the same fetch is a pattern, and the playbook aborts.

zero_width_chars. Invisible Unicode characters (U+200B, U+200C, U+200D, U+FEFF) injected into otherwise normal text. Frequently used to hide prompt-injection payloads from human reviewers while leaving them legible to language models. One stray zero-width might be a copy-paste accident. A dozen scattered through the body is a payload.
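Counting those four code points is straightforward. In the sketch below the threshold of twelve is an assumption borrowed from the "a dozen" rule of thumb above, not a documented setting.

```python
# The four invisible code points listed above.
ZERO_WIDTH = frozenset({"\u200b", "\u200c", "\u200d", "\ufeff"})


def count_zero_width(text: str) -> int:
    """Count zero-width characters anywhere in the text."""
    return sum(1 for char in text if char in ZERO_WIDTH)


def zero_width_signal(text: str, threshold: int = 12) -> bool:
    """Fire when the count reaches the (assumed) threshold."""
    return count_zero_width(text) >= threshold
```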

hidden_content_ratio_high. A large share of the page's text is deliberately hidden: positioned off-screen, set to display:none, or disguised with color tricks. A human sees the tip of the iceberg; the agent reads everything below the waterline. Hidden-content techniques have been a black-hat SEO staple for two decades, and they now target machines most of all.

redirect_chain_long. The fetch went through more than five hops to arrive. The origin is being hidden for a reason. Long redirect chains are normal for some legitimate use cases (URL shorteners, single-sign-on, marketing tracking), but they are also the standard way malicious infrastructure obscures origin. Combined with any other Elevated signal, the redirects start to feel deliberate.

content_type_mismatch. The HTTP Content-Type header says one thing, yet the actual bytes tell a different story. A site claiming to return text/html while actually delivering a JavaScript blob, a streaming response, or worse, executable content, is engaged in textbook deception.

near_duplicate_to_session. The same content keeps appearing across queries in this session. Could be benign (the page genuinely is the canonical answer). Could be tarpit echo (one site is dominating results because it is structurally designed to). Three signals together make the pattern unmistakable.

Critical vs Elevated: the abort matrix

Signal	What it catches	Tier
injection_phrase	Prompt injection embedded in page content	Critical
cloaking_suspected	Agent and clean refetch see different pages	Critical
oversized_response	DoS-style payload, decompression bomb, context bomb	Critical
repeating_substring_ratio_high	Markov text, classic tarpit fingerprint	Critical
url_cardinality_explosion	Infinite-maze URL pattern, tarpit trip	Critical
zero_width_chars	Invisible Unicode used to hide prompt injections	Elevated
hidden_content_ratio_high	CSS-hidden text used to manipulate the model	Elevated
redirect_chain_long	More than five hops, common in evasion infrastructure	Elevated
content_type_mismatch	Declared MIME differs from sniffed MIME (cloaking)	Elevated
near_duplicate_to_session	Same content keeps re-appearing across queries	Elevated
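The matrix reduces to a small decision function. The sketch below assumes signals arrive as a set of names and returns a (verdict, action) pair using the labels from this article; the function name and return shape are illustrative.

```python
CRITICAL = frozenset({
    "injection_phrase", "cloaking_suspected", "oversized_response",
    "repeating_substring_ratio_high", "url_cardinality_explosion",
})
ELEVATED = frozenset({
    "zero_width_chars", "hidden_content_ratio_high", "redirect_chain_long",
    "content_type_mismatch", "near_duplicate_to_session",
})


def decide(signals: set[str]) -> tuple[str, str]:
    """Map the risk signals on one fetch to a (verdict, action) pair."""
    if signals & CRITICAL:
        return ("High_Risk", "Aborted")      # any single Critical aborts
    elevated = len(signals & ELEVATED)
    if elevated >= 3:
        return ("High_Risk", "Aborted")      # the 3-of-5 rule
    if elevated >= 1:
        return ("Caution", "Continued")      # one or two Elevated signals
    return ("Clean", "Continued")
```

For example, `decide({"zero_width_chars", "redirect_chain_long"})` yields a Caution verdict with the source continued, while adding a third Elevated signal tips it into an abort.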

What happens on abort

An abort in /safe-web-research is loud, traceable, and recoverable. The skill emits a <safe_research_summary> block alongside the affected source with full provenance: URL, sanitiser version, the exact risk signals that fired, the verdict (High_Risk, Caution, or Clean), the action taken (Continued, Aborted, or Blocklisted), and a one-line recommendation for the next step.

The domain that triggered the abort goes onto the session-local blocklist immediately. The blocklist is in-memory for the session and persisted via SQLite, so subsequent fetches in the same conversation route around the bad domain without needing the user to do anything. If the same domain trips the abort more than once across multiple fetches, the skill prompts the user to promote it to the persistent blocklist at ~/.claude/web-blocklist.json. The agent never writes to the persistent file without explicit user confirmation (human-in-the-loop by design), because that file is the user's permanent allow/deny ledger and the agent should not edit it on a hunch.
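The session-side bookkeeping can be sketched without touching any file. The class below is hypothetical and keeps state in memory only; it deliberately does not model the SQLite persistence or write to the persistent blocklist, and it reports when a promotion prompt is due instead of acting on it.

```python
from collections import Counter
from urllib.parse import urlsplit


def _host(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


class SessionBlocklist:
    """In-memory blocklist for one session (a sketch, not the real store)."""

    def __init__(self) -> None:
        self._abort_counts = Counter()

    def record_abort(self, url: str) -> bool:
        """Block the host; return True when the user should be asked about promotion."""
        host = _host(url)
        self._abort_counts[host] += 1
        # Promotion is only ever suggested; the user makes the final call.
        return self._abort_counts[host] > 1

    def is_blocked(self, url: str) -> bool:
        return _host(url) in self._abort_counts
```

Returning a flag rather than writing the persistent file keeps the human-in-the-loop rule structural: the code path that edits the permanent ledger simply does not exist in the agent-side class.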

The pivot is mandatory. After an abort, the playbook tries archive.org, archive.today, an institutional alternative (a peer-reviewed citation, an official statement page, a wire service version), or a different originating source. What it never does is retry the aborted URL through a different tool, because all that proves is that the bypass tool also got owned.

Even Clean sources get a summary. A <safe_research_summary> with Verdict Clean and Action Continued shows up next to every citation the agent uses, so the audit trail is symmetric. You always know which sources were flagged and which sailed through.
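As a rough sketch, a summary could be modelled as a small record rendered into the block. The field labels and line layout inside the block are assumptions, since this article names the fields but not their exact serialisation.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class ResearchSummary:
    url: str
    sanitiser_version: str
    risk_signals: list[str]
    verdict: str          # Clean, Caution, or High_Risk
    action: str           # Continued, Aborted, or Blocklisted
    recommendation: str

    def render(self) -> str:
        """Render the summary as a <safe_research_summary> block (assumed layout)."""
        signals = ", ".join(self.risk_signals) or "none"
        lines = [
            "<safe_research_summary>",
            f"URL: {escape(self.url)}",
            f"Sanitiser version: {escape(self.sanitiser_version)}",
            f"Risk signals: {escape(signals)}",
            f"Verdict: {self.verdict}",
            f"Action: {self.action}",
            f"Recommendation: {escape(self.recommendation)}",
            "</safe_research_summary>",
        ]
        return "\n".join(lines)
```

Because Clean sources go through the same record, the reader sees an identical shape for every citation and can scan for the verdict line.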

Pro Tip: Have a pivot URL ready before the abort happens

The cleanest research workflows decide on a backup citation lane before the first fetch fires. Two minutes of "if Reuters is unreachable, AP is the fallback; if AP is also flagged, archive.org of either" beats two minutes of frantic improvisation after a Critical signal already blew up your top-of-funnel source. archive.org is your friend. So is the cached search-engine version of the page, when the original is cloaked.

How it composes with /truthseeker

/safe-web-research and /truthseeker are two skills with explicit, non-overlapping ownership. /safe-web-research owns fetch hygiene, sanitisation, and abort decisions. /truthseeker owns corroboration depth, lateral reading, ACH (Analysis of Competing Hypotheses), and source authentication.

When both are invoked together (which is the default for any factual claim worth verifying), /safe-web-research runs first on every web result. /truthseeker reads the <untrusted_source> wrappers and the <safe_research_summary> blocks before applying SIFT, lateral reading, ACH, or tier-based source weighting. Sources that triggered abort-level signals are discarded by /truthseeker, not weighted as evidence, even if they happen to contain a fact that the verifier was looking for.

The corroboration rule worth memorizing is this: the same article surfaced by multiple search engines is one source, not three. Independent corroboration requires distinct originating organisations, not distinct retrieval paths. Three engines returning the same Reuters URL is one source seen three ways. Three engines returning Reuters plus AP plus BBC reporting independently is three sources.

Conflicting sources are weighted by tier multiplied by independence multiplied by recency multiplied by methodology. Not averaged. Not split-the-difference. A primary court filing and a content-farm summary disagreeing about a court case is not a draw.
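Both rules can be illustrated together. In the sketch below every numeric factor is a made-up value on a 0 to 1 scale, and the `Source` fields are hypothetical; only the two ideas come from the text: one originating organisation counts once, and weight is a product rather than an average.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Source:
    organisation: str     # originating organisation, not the search engine
    url: str
    tier: float           # illustrative 0..1 factors, not documented values
    independence: float
    recency: float
    methodology: float

    @property
    def weight(self) -> float:
        return prod((self.tier, self.independence, self.recency, self.methodology))


def independent_sources(hits: list[Source]) -> list[Source]:
    """Keep one entry per originating organisation, preferring the heaviest."""
    best = {}
    for hit in hits:
        current = best.get(hit.organisation)
        if current is None or hit.weight > current.weight:
            best[hit.organisation] = hit
    return list(best.values())
```

Multiplication is what makes a weak factor decisive: a content-farm summary with a low tier and low methodology score cannot be rescued by being recent.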

What SAIF says we missed, and what we added

Google's Secure AI Framework (SAIF) publishes a taxonomy of fifteen distinct AI security risks. We ran the playbook against that taxonomy and learned three useful things at once. Most of the fifteen are out of scope by design, because training-pipeline, supply-chain, and exfiltration concerns belong to other layers. The risks that are this skill's job (prompt injection, denial of ML service, rogue actions) were already well covered. But the audit surfaced four concrete gaps, and the playbook is now four rules wider.

FR-27: URL-level adversarial input

The first gap was at the front door. A URL itself can be hostile before any bytes are fetched. The string раypal.com with a Cyrillic а is visually indistinguishable from PayPal in a console, and an IDN host without an explicit xn-- opt-in is the canonical homoglyph attack on agents that take URLs from search results. The playbook now refuses to fetch URLs that contain non-ASCII host characters without explicit xn--, visually confusable homoglyphs, zero-width characters anywhere in host or path, embedded credentials (https://user:pass@host/), or more than one @ in the authority section. The gate fires before the request goes out, which is cheaper and safer than catching it after the fact.
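A pre-fetch gate along these lines could be sketched with the standard library. The function name and reason labels are hypothetical, and the sketch covers only the checks that need no external data: it does not implement confusable-homoglyph detection for ASCII lookalikes, which would require a Unicode confusables table. Punycode hosts written with an explicit xn-- prefix are plain ASCII and pass the host check.

```python
from urllib.parse import urlsplit

ZERO_WIDTH = ("\u200b", "\u200c", "\u200d", "\ufeff")


def url_gate(url: str) -> list[str]:
    """Return reasons to refuse the URL; an empty list means it may be fetched."""
    reasons = []
    parts = urlsplit(url)
    authority = parts.netloc
    host = parts.hostname or ""
    if any(ord(char) > 127 for char in host):
        reasons.append("non_ascii_host")
    if any(zw in authority or zw in parts.path for zw in ZERO_WIDTH):
        reasons.append("zero_width_char")
    if authority.count("@") > 1:
        reasons.append("multiple_at")
    if parts.username is not None or parts.password is not None:
        reasons.append("embedded_credentials")
    return reasons
```

Because the function only inspects the string, it runs before any network request, which is the point of putting the gate at the front door.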

FR-28: Meta-content allowlist

The second gap was a real, slightly embarrassing false positive. When we fetched Google's SAIF risks page for this very audit, the hook fired injection_phrase Critical and tried to abort. The page taxonomizes prompt injection definitionally, so it has to enumerate the canonical jailbreak phrasings as examples, and it always trips the signal. Without an exception, the playbook can never cite SAIF itself, or OWASP LLM Top 10, or MITRE ATLAS, or NIST AI RMF. The fix is a narrow allowlist. When the only Critical signal is injection_phrase and the host is on a short list of risk-taxonomy publishers, the verdict is downgraded to Caution and the source is continued. Any other Critical still aborts unconditionally. The allowlist is not a "trust this site" lever. It is a "this site's subject matter trips the signal as a guaranteed false positive, and only that signal" lever.
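The narrowness of that rule is easy to express in code. In this sketch the host names are placeholders rather than the real allowlist, and the function only answers whether the downgrade applies; everything else about the verdict is left to the normal abort rules.

```python
from urllib.parse import urlsplit

CRITICAL = frozenset({
    "injection_phrase", "cloaking_suspected", "oversized_response",
    "repeating_substring_ratio_high", "url_cardinality_explosion",
})
# Placeholder hosts; the real list of risk-taxonomy publishers is not shown here.
TAXONOMY_HOSTS = frozenset({"taxonomy.example", "risk-framework.example"})


def downgrade_applies(url: str, signals: set[str]) -> bool:
    """True only when injection_phrase is the sole Critical and the host is allowlisted."""
    critical = set(signals) & CRITICAL
    host = (urlsplit(url).hostname or "").lower()
    return critical == {"injection_phrase"} and host in TAXONOMY_HOSTS
```

Comparing the Critical set for exact equality, instead of checking membership, is what keeps an allowlisted host from riding through with a second Critical signal attached.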

FR-29: Output discipline

The third gap was on the output side. Sanitising input does not finish the job, because what leaves the model after a fetch matters too. The output-side failure mode has a name, improper output handling, and three rules now govern that boundary. First, the playbook never verbatim-quotes canonical injection phrasings, even from a Clean source, because re-emitting the phrase risks downstream tools, logs, or later context windows treating the response as the next round of input. Second, content from an aborted source has zero downstream gravity. It does not influence tool selection, subsequent URL choices, package or library recommendations, command-line suggestions, or generated code, in this turn or any later one. "I read it but I am not citing it" is not sufficient. Aborted content is fully quarantined. Third, any High_Risk Continued (a user override) requires an explicit caveat naming the signals that fired and the override that was given.

FR-30: SAIF risk mapping table

The fourth gap was documentation. There was no easy answer to "which SAIF risk does signal X cover?" The playbook now ships with an explicit coverage table that maps each SAIF risk to its mechanism inside /safe-web-research, and flags the risks that are intentionally out of scope. Honest coverage beats aspirational coverage. Auditors get a one-table answer; future signals get a column to extend.

SAIF risk	Coverage	Mechanism
Prompt Injection (PIJ)	Core	`injection_phrase` Critical, `<untrusted_source>` wrapper, self-reminder, missing-wrapper abort
Denial of ML Service (DMS)	Strong	`oversized_response`, `repeating_substring_ratio_high`, `url_cardinality_explosion` Critical signals
Rogue Actions (RA)	Partial	Abort plus blocklist plus per-source `safe_research_summary` plus FR-29 output discipline
Insecure Integrated Component (IIC)	Weak	Robots.txt and AI-UA disallow, `sanitiser_version` mismatch abort
Insecure Model Output (IMO)	Weak	Input-side sanitisation, with FR-29 output discipline as the partial backstop
Model Evasion (MEV)	Weak	`zero_width_chars` Elevated plus URL-level confusable check (FR-27)
Data Poisoning, Unauthorized Training Data, Model Source Tampering, Excessive Data Handling, Model Exfiltration, Model Deployment Tampering, Model Reverse Engineering, Sensitive Data Disclosure, Inferred Sensitive Data	Out of scope	Training-pipeline, supply chain, deployment, data governance, and output-side concerns. Other layers (MCP sandboxing, model-side guardrails, infra controls) own them.

Expert Tip: Pages about prompt injection will always trip prompt-injection detectors

The SAIF false positive is the canonical example of a category that any agent doing serious AI security research will hit constantly. Red-team write-ups, OWASP cheat sheets, MITRE ATLAS techniques, and NIST AI RMF playbooks all enumerate canonical jailbreak phrasings by necessity. A narrow allowlist keyed on the host, and bounded to the case where injection_phrase is the only Critical signal firing, is the right primitive. A broader "trust this domain" knob is wrong, because a SAIF page that also fires oversized_response is still aborted. The allowlist is scoped to one specific false positive, not to the domain as a whole.

Robots.txt as a defensive choice, not just an ethical one

The sanitizer hook fetches robots.txt per domain (cached for twenty-four hours) and warns when the requested path is disallowed. The hook does not block on its own; the skill decides what the warning means.

The three cases the playbook recognizes are simple. If the disallow targets specific general-purpose crawlers and the path stays open to other user agents (a normal SEO config), the skill proceeds and notes the situation in the summary. If the site explicitly disallows AI or LLM agents (entries like User-agent: GPTBot Disallow: /, User-agent: ClaudeBot, User-agent: anthropic-ai), or if the path is disallowed for the wildcard *, the skill does not fetch. It pivots to an archive or an alternate source instead. If the robots.txt is missing, malformed, served as HTML, or otherwise suspicious, the domain is treated as Caution.
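The three cases map onto Python's standard `urllib.robotparser`. The sketch below is an assumption about how the decision could be wired, with hypothetical return labels; it takes the robots.txt text as input instead of fetching it, and it detects only the missing and served-as-HTML cases, not every malformed file.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

AI_AGENTS = ("GPTBot", "ClaudeBot", "anthropic-ai")


def robots_decision(robots_txt: Optional[str], url: str) -> str:
    """Return 'proceed', 'do_not_fetch', or 'caution' for one URL."""
    if robots_txt is None:
        return "caution"                      # missing file
    if robots_txt.lstrip().lower().startswith(("<!doctype", "<html")):
        return "caution"                      # HTML served in place of robots.txt
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # An AI-specific disallow or a wildcard disallow is a hard stop.
    for agent in AI_AGENTS + ("*",):
        if not parser.can_fetch(agent, url):
            return "do_not_fetch"
    return "proceed"
```

A rule aimed only at another named crawler leaves both the AI agents and the wildcard allowed, so the function returns "proceed" in the normal SEO case.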

Expert Tip: Respect AI-disallow robots.txt because it filters tarpit operators out

Sites that explicitly disallow AI agents are signaling two things. Ethically: they do not want their content used for training. Defensively: a non-trivial share of operators who go to the trouble of writing those rules also deploy tarpits, prompt-injection payloads, or cloaking specifically targeting agents that ignore the disallow. The intersection of "site that opts out of AI" and "site that has booby-trapped its content against AI" is meaningful. Treating an AI-disallow robots.txt as a hard stop is the right ethical call and the right defensive one at the same time.

A real example: when the playbook saved this trilogy

Researching the AI tarpits article that opens this trilogy, /safe-web-research flagged two of eleven web sources as High_Risk. Both fired the Critical cloaking_suspected signal plus the Elevated content_type_mismatch signal, and both had robots.txt entries disallowing AI agents on the exact paths in question. Both were aborted. One of them was the original source for a statistic the article needed (a Rutgers and Wharton study on publisher traffic loss after AI blocking), so the playbook required the stat to be independently re-corroborated from at least one Tier 1 or Tier 2 source before it could be used. The number was eventually verified from four independent secondary sources reporting the same study, and the original High_Risk source was never cited. Without the abort, the article would have linked to a cloaked page on a site that publicly does not want AI agents on it. Embarrassing for the author. Worse, traceable in perpetuity.

Watch out: fail-open at the hook, fail-closed at the skill

The hook is designed to fail-open. If the sanitiser process crashes, the original web response still reaches the agent. That is an availability trade-off: you do not want every transient hook bug to block all research. The skill is designed to fail-closed. If the wrapper is missing, the sanitiser_version is wrong, or the hook signaled an integrity error, the skill discards the source. The combination is intentional: the hook can have a bad day without taking research offline, and the skill catches the cases where a hook bad day means the agent is reading unsanitised content.

What a safe AI search path actually looks like

Figure: The Safe Research Protocol shields an AI agent's research path from prompt injection, hidden content, cloaked pages, and infinite-looping mazes, and routes to archive.org or institutional sources when a fetch trips an abort rule.

For a typical research task, the planned path now goes: agent issues fetch, hook intercepts, hook wraps the response in <untrusted_source> with computed risk signals, /safe-web-research reads the wrapper, applies abort rules, either continues (with a Clean summary emitted alongside the citation) or aborts (with a High_Risk summary emitted and a pivot fired). On Clean sources, /truthseeker takes over to handle corroboration, lateral reading, and tier weighting. On Aborted sources, the domain hits the session blocklist and the pivot starts. Repeat for every fetch in the session. Every cited source ends the session with a paired summary that the user can audit after the fact.

What changes with the playbook in place is not the speed of research. It is the visibility. You can see, source by source, what the agent decided to trust, what it discarded, and why. Without the playbook, "the AI did the research" is a black box. With the playbook, the audit trail is on the page.

Key Takeaways

Two layers, two jobs. The sanitizer hook is mechanical and runs on every fetch. The /safe-web-research skill is judgment and decides what the signals mean.
Five Critical signals = instant abort. injection_phrase, cloaking_suspected, oversized_response, repeating_substring_ratio_high, url_cardinality_explosion. Any one of them is enough, and the source gets no second chance.
Five Elevated signals = a 3-of-5 abort. Defense in depth. No single trip-wire is enough; three together is a pattern worth respecting.
A missing wrapper is a Critical signal. Fail-open at the hook is fine because fail-closed at the skill catches the bypass.
Robots.txt that disallows AI is a hard stop. The intersection of "opt out of AI" and "booby-trapped against AI" is meaningful. Respect both for the same reason.
Pivot before retrying. Archive.org and an institutional alternative beat firing a second tool at the same compromised URL.
Every cited source gets a summary. Clean, Caution, or High_Risk, the audit trail is symmetric. Black-box research is the default; visible research is the upgrade.

FAQ

Will the safe-web-research skill make my AI research slower?

Per-fetch overhead is in the tens of milliseconds for the wrapper plus signal computation, which is invisible against the seconds-to-minutes timescales of actual web research. The skill becomes noticeable when it aborts a source, because then the pivot adds another fetch. That trade is the entire value proposition: the slower path that avoids a poisoned source is the faster path overall, because you do not pay later for fixing a citation that was wrong on first contact.

Can I run /safe-web-research without /truthseeker?

Yes. /safe-web-research is the per-fetch hygiene layer and stands alone for any single-source lookup or browsing task. /truthseeker is the corroboration layer for claims that need verification, and it relies on the wrappers and summaries /safe-web-research produces. If you only need to fetch one page safely, /safe-web-research is sufficient. If you need to verify a claim, run both.

What counts as a tarpit signature in practice?

Two signals do most of the work. url_cardinality_explosion catches infinite-link mazes: thirty distinct URLs against one domain in sixty seconds is a fingerprint, not a research pattern. repeating_substring_ratio_high catches Markov-poisoned content: text that reads like English on a skim but compiles as a noise generator. Together they identify tools like Nepenthes, Iocaine, and Cloudflare AI Labyrinth before the agent has wasted serious budget inside them.

How do I know my source was actually aborted vs just flagged?

Read the <safe_research_summary> emitted next to the citation. The Action field is one of Continued, Aborted, or Blocklisted. The Verdict field is one of Clean, Caution, or High_Risk. A Clean verdict with Continued action means the source passed every check. A High_Risk verdict with Aborted action means the source is in the answer's audit trail but its content was not used. A Caution verdict with Continued action means one or two Elevated signals fired, the content was used, but you should glance at the recommendation line.

Does the skill respect robots.txt that disallows AI agents?

Yes, as a hard stop. Sites that explicitly disallow GPTBot, ClaudeBot, anthropic-ai, or the relevant path for User-agent * are not fetched. The skill pivots to an archive or an alternate originating source instead. This is both an ethical and a defensive choice: a non-trivial share of operators who go to the trouble of writing those rules also deploy tarpits or prompt-injection content for agents that ignore the disallow.

Can a website hide its tarpit behavior from the sanitiser?

It can try, and some do. The two Critical tarpit signals (repeating_substring_ratio_high and url_cardinality_explosion) are statistical, not pattern-matched, so a sophisticated operator could in principle tune their generator to fly under the ratio threshold or distribute URLs across a larger window. The pragmatic ceiling is that any tarpit clever enough to evade those signals is also expensive to run, which is the goal. The economics of poisoning AI research only work at scale; pushing operators into bespoke evasion shifts the cost back to them.

What happens if the hook crashes or times out?

The sanitiser is designed to fail-open: if the hook process dies, the unsanitised response still reaches the agent so research is not blocked by transient hook bugs. The skill is designed to fail-closed: if the wrapper is missing, or the sanitiser_version is a major mismatch, or the hook signaled an integrity error, the source is treated as adversarial and discarded. The two failure modes compose: availability stays high, safety stays high.

Closing thought

Research used to be slow and traceable. AI made it fast and opaque. /safe-web-research is the attempt to make it fast and traceable: same speed gains the agent always gave you, plus an audit trail that says exactly which sources the agent trusted, which ones it discarded, and why. The next time an AI agent confidently cites something, the right question is no longer "is this true?" It is "what does the safe_research_summary say about the source?" Ask the audit trail before you ask the model. That habit alone is the simplest hedge against overreliance.

Sources: Install Guide on PCDrama, AI Tarpits Explained on PCDrama, AI Web Research Protocol on PCDrama, OWASP Top 10 for LLM Applications, Cloudflare: Trapping misbehaving bots in an AI Labyrinth, Internet Archive.