Anthropic Has a CLUE and It’s Running Their Entire SOC
Security analysts are drowning in alerts. The average SOC team receives nearly 3,000 daily security alerts, and according to the SANS 2025 SOC Survey, 40% of those alerts are never investigated at all. Of the ones that are, 90% turn out to be false positives. The humans trying to protect our infrastructure are spending most of their shift chasing phantoms.
Anthropic, the company building Claude, faced the same problem at scale. Their solution was CLUE (Claude Looks Up Evidence), an internal detection and response platform that wires agentic Claude directly into their security operations. Built by Technical Lead Jackie Bow and her Detection Platform Engineering team, CLUE went from whiteboard to proof of concept in a single day and reached full production deployment within a week. The results were striking: false positives dropped from 33% to 7%, and analysts reclaimed an estimated 1,870 hours in just 30 days.
This is the inside story of how Anthropic dogfoods its own AI for security, what the architecture looks like under the hood, and what security teams everywhere can learn from the experiment.
TL;DR
- Anthropic built CLUE (Claude Looks Up Evidence) as an internal AI-powered threat detection and response platform
- False positive rate dropped from 33% to 7%, a 79% reduction
- The team saved an estimated 1,870 analyst-hours (234 person-days) in 30 days
- PoC was complete in one day; full production deployment shipped in one week
- CLUE uses agentic Claude with tool use to query Slack, code repos, data warehouses, and internal docs in natural language
- CLUE is separate from Anthropic's commercial Claude Security product, which scans customer codebases for vulnerabilities
The Quiet Crisis Inside Every Security Operations Center
Before understanding what CLUE does, it helps to understand what it replaces. Modern SOC analysts live inside a context-switching nightmare. An alert fires. They pivot to one tool to enrich it with host data, switch to another to pull Slack history for context, crack open a third to run a SQL query against the data warehouse, and consult a fourth to check internal documentation. Each pivot costs time and cognitive bandwidth. By the time they have assembled enough context to make a decision, three more alerts have landed in the queue.
The industry has a name for what happens next: alert fatigue. Research shows between 40% and 70% of security alerts are false positives. A 2024 survey found that 62% of alerts are ignored entirely. Accuracy drops by 40% after extended shifts as analysts lose the mental sharpness needed to spot subtle signals in a sea of noise. The result is not just inefficiency; it is genuine risk. The alerts that matter most are the ones most likely to get buried — and anomaly detection only helps when someone is watching for it.
Jackie Bow, who built threat detection systems at Facebook and Patreon and for the US government before joining Anthropic, described the core constraint directly: "There's only so many alerts a human can look at in a day before they start to drop off" in how much investigative detail they can bring to each one. The problem is not effort. It is throughput.
What Is CLUE? Anthropic's Internal AI Detection Platform
Definition: CLUE (Claude Looks Up Evidence)
CLUE is Anthropic's internal security detection and response platform. It uses Claude with tool use capabilities to connect to internal systems including Slack, code repositories, internal documentation, and data warehouses. Security analysts investigate alerts and run queries in plain English rather than switching between specialized tools and query languages. CLUE is an internal platform, distinct from Anthropic's commercial Claude Security product, which is available to Enterprise customers.
CLUE has two main subsystems: CLUE Triage and CLUE Investigate. Together they handle the full alert lifecycle, from first signal to final disposition, without requiring analysts to leave a unified natural language interface.
The architecture reflects a deliberate design choice: use Claude's tool use capabilities to reach into existing internal systems rather than replacing those systems. The data warehouse stays. The Slack history stays. The code repos stay. What changes is how analysts interact with all of it at once.
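As a rough illustration of that design choice, the sketch below declares two tools over existing systems and routes requests to them. The `name` / `description` / `input_schema` shape follows the JSON Schema style used by Claude's tool use API. Everything else (the tool names, the stub handlers, and the `dispatch` helper) is hypothetical and is not taken from CLUE itself.

```python
from typing import Any, Callable

# Hypothetical tool definitions. The dict shape mirrors the tool format of
# Claude's tool use API; the specific tools are invented for illustration.
TOOLS: list[dict[str, Any]] = [
    {
        "name": "search_slack",
        "description": "Search Slack messages for context about a user or host.",
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "run_warehouse_sql",
        "description": "Run a read-only SQL query against the data warehouse.",
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
]


# Stub handlers standing in for the existing systems, which stay in place.
def search_slack(query: str) -> str:
    return f"(stub) 2 Slack messages matched {query!r}"


def run_warehouse_sql(sql: str) -> str:
    return f"(stub) 0 rows returned for: {sql}"


HANDLERS: dict[str, Callable[..., str]] = {
    "search_slack": search_slack,
    "run_warehouse_sql": run_warehouse_sql,
}


def dispatch(tool_name: str, tool_input: dict[str, Any]) -> str:
    """Route a tool request from the model to the matching existing system."""
    if tool_name not in HANDLERS:
        return f"error: unknown tool {tool_name!r}"
    return HANDLERS[tool_name](**tool_input)
```

In a setup like this, the model only ever sees the tool descriptions and the strings returned by `dispatch`, so the underlying systems do not need to change.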
How CLUE Triage Works: From Alert to Disposition in Minutes
When a security alert fires, CLUE Triage kicks in first. It automatically enriches the alert by pulling contextual data from multiple internal sources simultaneously: Slack conversations that might explain unusual access patterns, internal documentation covering known benign behaviors, code repository history, and data warehouse records. Armed with that context — real-time retrieval-augmented generation applied to security operations — CLUE runs through an automated playbook, assigning a disposition from four categories: false positive, true positive, malicious, or expected behavior.
Each disposition comes with a confidence score — the transparency layer that separates intelligent triage from noisy automation. Rather than presenting a binary verdict, CLUE tells analysts how certain it is, which means human review effort scales naturally to signal quality. High-confidence false positives get dismissed quickly. Ambiguous signals with moderate confidence scores get closer human scrutiny. Definite positives with high confidence scores escalate immediately.
The 5-10x speed improvement over manual triage comes directly from this enrichment step. In a traditional workflow, an analyst enriches alerts by hand, one tool at a time. CLUE runs those enrichment steps in parallel, synthesizes the results, and presents a reasoned disposition before a human has opened their first browser tab.
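The parallel enrichment step can be sketched in a few lines. This is a minimal illustration, not CLUE's implementation: the four source functions are stubs with made-up return values, and a thread pool is one assumed way to run the lookups concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


# Hypothetical enrichment sources; each stub stands in for a real lookup.
def slack_context(alert: dict) -> str:
    return f"no recent Slack mentions of {alert['host']}"


def docs_context(alert: dict) -> str:
    return f"no documented benign behavior for rule {alert['rule']}"


def repo_context(alert: dict) -> str:
    return f"no recent commits touching {alert['host']} config"


def warehouse_context(alert: dict) -> str:
    return f"3 prior logins from {alert['host']} in the last 7 days"


SOURCES: dict[str, Callable[[dict], str]] = {
    "slack": slack_context,
    "docs": docs_context,
    "repos": repo_context,
    "warehouse": warehouse_context,
}


def enrich(alert: dict) -> dict[str, str]:
    """Query every source concurrently and collect results by source name."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(fn, alert) for name, fn in SOURCES.items()}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    for source, context in enrich({"host": "build-01", "rule": "R42"}).items():
        print(f"{source}: {context}")
```

With real lookups, total enrichment time approaches that of the slowest source rather than the sum of all four, which is where the speedup over one-tool-at-a-time triage would come from.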
Expert Tip: Confidence Scores Beat Binary Verdicts
When designing AI-assisted triage systems, resist the temptation to output only true or false verdicts. Confidence scores let analysts allocate their attention where it matters most. High-confidence dismissals need no review; moderate-confidence flags need a human in the loop. This graduated approach captures most of the efficiency gains without removing humans from consequential decisions.
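A graduated review policy of this kind might look like the sketch below. The four dispositions come from the description of CLUE Triage above. The `0.9` threshold, the queue names, and the `route` function are assumptions chosen for illustration, not values Anthropic has published.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    FALSE_POSITIVE = "false positive"
    TRUE_POSITIVE = "true positive"
    MALICIOUS = "malicious"
    EXPECTED_BEHAVIOR = "expected behavior"


@dataclass(frozen=True)
class TriageResult:
    disposition: Disposition
    confidence: float  # 0.0 (no idea) to 1.0 (certain)


# Hypothetical cutoff; a real deployment would tune this against review data.
HIGH_CONFIDENCE = 0.9

BENIGN = {Disposition.FALSE_POSITIVE, Disposition.EXPECTED_BEHAVIOR}


def route(result: TriageResult) -> str:
    """Pick a review queue from the disposition and its confidence score."""
    if result.confidence < HIGH_CONFIDENCE:
        return "human_review"  # ambiguous signals get closer scrutiny
    if result.disposition in BENIGN:
        return "auto_close"  # high-confidence benign verdicts are dismissed
    return "escalate"  # high-confidence positives go straight to responders
```

For example, `route(TriageResult(Disposition.MALICIOUS, 0.95))` returns `"escalate"`, while the same disposition at confidence `0.6` returns `"human_review"`.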
How CLUE Investigate Works: Agentic Orchestration at Query Scale
CLUE Triage handles the first pass. CLUE Investigate handles the deep dive. When an alert warrants a full investigation, CLUE Investigate allows analysts to describe what they want to know in plain English, then executes the necessary SQL queries automatically. No query language fluency required. No manual join construction. Just a question and an answer — prompt engineering as the only interface skill required.
What makes this particularly powerful is the agentic orchestration layer. Rather than running queries sequentially, CLUE Investigate spins up sub-agents that execute queries in parallel — a security orchestration, automation, and response architecture without the six-figure platform cost. Multiple data sources get interrogated simultaneously. Results are synthesized into a coherent investigation summary within 3 to 4 minutes, covering territory that would take an experienced analyst an hour or more to cover manually.
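The fan-out pattern behind parallel sub-agents can be illustrated with `asyncio`. This is only a sketch of the concurrency shape: `sub_agent` is a hypothetical stand-in that sleeps instead of writing and running a query, and the source names are invented.

```python
import asyncio


async def sub_agent(source: str, question: str) -> str:
    """Hypothetical sub-agent: a real one would write and run a query."""
    await asyncio.sleep(0.01)  # stands in for query latency
    return f"{source}: no anomalies found for '{question}'"


async def investigate(question: str, sources: list[str]) -> str:
    """Fan the question out to one sub-agent per source, then join findings."""
    # gather() runs the coroutines concurrently and preserves input order.
    findings = await asyncio.gather(
        *(sub_agent(source, question) for source in sources)
    )
    return "\n".join(findings)


if __name__ == "__main__":
    print(asyncio.run(
        investigate("logins from build-01", ["auth_logs", "vpn_logs", "repo_audit"])
    ))
```

In a real system the joined findings would go back to the model for synthesis into a summary; here they are simply concatenated.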
The numbers from a 30-day production window tell the story: 12,000 queries executed and 27,000 tool calls made, averaging 25 tool calls and 11 queries per investigation session. That volume, handled manually, would require an army of analysts working around the clock.
"Claude is much better at writing precise queries than humans are." — Jackie Bow, Technical Lead, Anthropic Detection Platform Engineering
This quote lands differently when you consider that Bow's team includes experienced security engineers who are not novices at query construction. The claim is not that Claude outperforms beginners. It is that Claude outperforms professionals at the mechanical precision that database queries demand: exact syntax, no typos, correct table references, optimal join logic. Human analysts bring judgment. Claude brings precision. CLUE combines both.
The Numbers: What Cutting AI Threat Detection False Positives from 33% to 7% Actually Means
Abstract percentages hide concrete impacts. A false positive rate of 33% means roughly one in three alerts an analyst investigates turns out to be nothing. At scale, that represents a staggering tax on attention. Every phantom investigation is time not spent on real threats, not spent on improving detection logic, and not spent on the cognitive recovery that sustained focus requires.
CLUE brought that rate to 7%, a 79% reduction in wasted investigation cycles. For a team processing hundreds of alerts per week, the reclaimed time compounds quickly. Over 30 days, Anthropic's estimate came to 1,870 hours saved, equivalent to 234 eight-hour person-days, or roughly eleven months of work for one full-time analyst at about 21 working days per month.
| Metric | Before CLUE | After CLUE | Change |
|---|---|---|---|
| False positive rate | 33% | 7% | 79% reduction |
| Triage speed vs. manual | Baseline | 5-10x faster | Major improvement |
| Analyst hours saved (30 days) | 0 | 1,870 hours | 234 person-days |
| Queries per investigation session | Manual, one at a time | ~11 (parallel) | Dramatically higher coverage |
| Tool calls per session | Manual context switching | ~25 automated | Unified interface |
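The headline figures can be checked with simple arithmetic. The snippet below uses only numbers quoted in this article; the eight-hour working day is an assumption used to convert hours into person-days.

```python
# False positive rate before and after CLUE, as quoted in the article.
fp_before, fp_after = 0.33, 0.07
relative_reduction = (fp_before - fp_after) / fp_before  # about 0.788

# Hours saved over 30 days, converted assuming an 8-hour working day.
hours_saved = 1870
person_days = hours_saved / 8  # 233.75

# The 30-day volume figures imply roughly the same number of sessions
# whether you divide tool calls or queries by their per-session averages.
sessions_from_tool_calls = 27_000 / 25  # 1080.0
sessions_from_queries = 12_000 / 11  # about 1091

print(f"relative reduction: {relative_reduction:.0%}")
print(f"person-days saved: {person_days:.0f}")
print(f"implied sessions: {sessions_from_tool_calls:.0f} to {sessions_from_queries:.0f}")
```

The two session estimates land within about 1% of each other, which suggests the per-session averages and the 30-day totals are internally consistent at roughly 1,100 investigation sessions.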
The deeper benefit is harder to quantify. Analysts who are not buried in false positives make better decisions on real alerts. They retain more context over the course of a shift. They catch subtle patterns that exhausted attention misses — the behavioral signals that user and entity behavior analytics was purpose-built to surface. Alert fatigue is not just an efficiency problem; it is a detection quality problem. CLUE addresses both at once.
One Day to PoC, One Week to Production: What the Build Speed Signals
The performance numbers matter. But for security engineers evaluating what is possible today, the build timeline may matter more.
Bow's team had a working proof of concept within a single day. Full production deployment shipped within a week. This was not a multi-quarter platform project with a dedicated team of twenty engineers. It was a small team wiring Claude's tool use capabilities into existing internal systems via a natural language interface. The infrastructure was already there. Claude provided the connective tissue.
"We can't scale to meet Anthropic's needs without augmenting with something like Claude," Bow noted. The key word is augmenting. CLUE did not replace the data warehouse. It did not replace Slack. It did not replace the code repositories or the internal documentation. It replaced the cognitive overhead of context-switching between them.
Pro Tip: Start With Tool Use, Not Fine-Tuning
Teams evaluating AI for SOC automation often assume they need custom-trained models to see results. Anthropic's CLUE shows that tool use with a capable base model can deliver production-quality triage in days, not months. Connect Claude to your existing data sources first. Fine-tune later if you hit genuine capability gaps that tool use alone cannot close.
CLUE vs. the Commercial Claude Security Product
CLUE is an internal Anthropic tool. It is not available to purchase. This distinction matters because Anthropic also has a separate external product called Claude Security (formerly Claude Code Security), now in public beta for Enterprise customers.
Claude Security runs static application security testing against customer codebases, reading code the way a security researcher would, understanding how components interact, tracing data flows, catching logic flaws that rule-based signature-matching scanners miss. Using Claude Opus 4.6, Anthropic's team found over 500 vulnerabilities in production open-source codebases that had gone undetected for years despite expert review.
CLUE and Claude Security address different problems. CLUE is about detection and response operations: handling the alert stream in real time, enriching and triaging signals, and enabling rapid investigation. Claude Security is about vulnerability discovery: scanning codebases before deployment to find exploitable weaknesses. Both represent Anthropic applying Claude to security, but at entirely different layers of the stack.
What Security Teams Can Take Away From Anthropic's Agentic AI Security Approach
CLUE is an internal system that no outside team can directly replicate. But the architecture principles behind it are not proprietary. They are patterns any security team with API access to a capable LLM and existing internal tooling can explore today.
The core insight is that most SOC inefficiency comes from fragmentation across SIEM-adjacent tooling: multiple query languages, multiple mental contexts to maintain simultaneously. An LLM with tool use does not require building new infrastructure. It requires wiring existing infrastructure into a unified interface. The data stays where it is. The model becomes the interface layer.
Anthropic's investment in this area extends beyond CLUE. Project Glasswing and the Claude Mythos research program at red.anthropic.com represent frontier work in AI threat modeling, mapping both offensive and defensive security capabilities. The same team presenting CLUE's results at BSidesSF 2025 is the team pushing on what AI-enabled threat research looks like.
Expert Tip: Measure Both Speed and Quality
Teams evaluating AI-assisted triage often measure speed and stop there. Anthropic's results are compelling precisely because they tracked false positive rate alongside throughput. Fast wrong answers create different problems than slow wrong answers. Track both dimensions from day one of any AI triage pilot.
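A pilot can track both dimensions with very little code. The sketch below is one assumed way to do it: the `TriagedAlert` record and its field names are hypothetical, and the false positive rate follows the definition used earlier in this article (the share of investigated alerts that turn out to be nothing).

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class TriagedAlert:
    minutes_to_disposition: float
    investigated: bool  # triage sent the alert on for investigation
    confirmed_real: bool  # later review confirmed a genuine issue


def false_positive_rate(alerts: list[TriagedAlert]) -> float:
    """Share of investigated alerts that turned out to be nothing."""
    investigated = [a for a in alerts if a.investigated]
    if not investigated:
        return 0.0
    wasted = sum(1 for a in investigated if not a.confirmed_real)
    return wasted / len(investigated)


def median_minutes(alerts: list[TriagedAlert]) -> float:
    """Median time from alert to disposition, in minutes."""
    return median(a.minutes_to_disposition for a in alerts)


# Made-up sample data for illustration only.
sample = [
    TriagedAlert(4.0, investigated=True, confirmed_real=True),
    TriagedAlert(3.0, investigated=True, confirmed_real=False),
    TriagedAlert(5.0, investigated=True, confirmed_real=True),
    TriagedAlert(2.0, investigated=False, confirmed_real=False),
]
```

Reporting `false_positive_rate(sample)` next to `median_minutes(sample)` for each week of a pilot makes it visible when a speed gain is being bought with lower quality.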
Frequently Asked Questions About AI Threat Detection Automation
What does CLUE stand for in Anthropic's security platform?
CLUE stands for Claude Looks Up Evidence. It is Anthropic's internal security detection and response platform, built by Jackie Bow's Detection Platform Engineering team. CLUE uses Claude with tool use capabilities to connect to internal systems including Slack, code repositories, internal documentation, and data warehouses, allowing analysts to investigate security alerts using plain English rather than switching between specialized tools and query languages.
How much did CLUE reduce false positives at Anthropic?
CLUE reduced Anthropic's security alert false positive rate from 33% to 7%, a reduction of approximately 79%. Over a 30-day measurement window, this translated to an estimated 1,870 analyst-hours saved, equivalent to 234 person-days. The system also delivered a 5 to 10 times speed improvement over manual triage processes.
Is CLUE the same as Claude Security?
No. CLUE is an internal Anthropic tool used by their own security operations team. It focuses on alert triage and investigation in real time. Claude Security, formerly known as Claude Code Security, is a separate commercial product available to Anthropic Enterprise customers that scans customer codebases to find and patch software vulnerabilities. The two address different layers of security: CLUE handles operational detection and response, while Claude Security handles static vulnerability discovery in code.
How long did it take Anthropic to build CLUE?
Bow's team had a working proof of concept in a single day. Full production deployment was complete within one week. The fast timeline was possible because CLUE used Claude's tool use capabilities to connect to existing internal systems rather than building new data infrastructure from scratch. The key engineering work was connecting Claude to existing tools, not building the underlying data sources.
Can other companies build something similar to CLUE for AI threat detection?
The architecture patterns behind CLUE are not proprietary. Any security team with access to a capable LLM and existing internal tooling can explore similar approaches. The core pattern is using an LLM with tool use to create a unified natural language interface over fragmented existing systems. This does not require replacing existing infrastructure; it requires connecting that infrastructure to a model that can reason across it simultaneously. Claude's tool use API, available to any developer, is the same mechanism Anthropic's team used internally.
Key Takeaways
- CLUE is real and it works: Anthropic's internal detection platform cut false positives by 79% and saved 1,870 analyst-hours in 30 days using agentic Claude with tool use
- Speed of build matters: A PoC in one day and production deployment in one week demonstrates that LLM-powered SOC tooling is a build-now opportunity, not a future-roadmap item
- The problem is fragmentation, not data: CLUE succeeded by unifying access to existing systems, not by building new ones
- Confidence scores enable graduated review: Assigning disposition confidence lets analysts focus human judgment where it matters, rather than reviewing everything uniformly
- Parallel sub-agents multiply investigation coverage: CLUE Investigate's parallel query execution turns work that would take an experienced analyst an hour or more into a 3 to 4 minute synthesis
- CLUE and Claude Security are different things: CLUE is an internal tool; Claude Security is the commercial vulnerability scanning product now in public beta for Enterprise customers
Conclusion: Anthropic Eats Its Own Dog Food, and the Results Speak for Themselves
The most compelling argument for any technology is that its creators trust it enough to run their own operations on it. Anthropic's security team did not buy a vendor's AI-powered SOC solution. They built their own, using the same Claude tool use API that any developer can access today. The results, 79% fewer false positives, 1,870 hours reclaimed, 12,000 queries executed without a single analyst having to write SQL, are not a marketing deck projection. They are internal production metrics from a team whose job is to detect real threats to Anthropic's own infrastructure.
Jackie Bow put it simply: "I can finally build the tools I always wished I had." That line deserves a moment. This is a security engineer with experience at Facebook, Patreon, and the US government, someone who has spent a career wishing for better tools, finally feeling like she has them. That is a meaningful signal about what AI threat detection automation looks like when it actually works.
For security teams evaluating AI-assisted operations, the lesson from CLUE is not "wait for a commercial product to arrive." It is "connect what you already have to a model that can reason across it." The infrastructure is almost certainly already there. The interface layer is the part that changed.
Sources
- Anthropic: How Anthropic's cybersecurity team built a threat detection platform with Claude Code
- BSidesSF 2025: AI's Bitter Lesson for SOCs: Let Machines Be Machines
- SANS Institute: SANS 2025 SOC Survey
- Vectra AI: 2024 State of Threat Detection and Response Research Report announcement
- Anthropic: Making frontier cybersecurity capabilities available to defenders