
Anthropic Exposes Chinese AI Labs’ Data Extraction from Claude Amid US Chip Controls Debate

Anthropic published a detailed investigative report on February 23, 2026, documenting systematic exploitation of its Claude AI model by three Chinese companies: DeepSeek, Moonshot AI, and MiniMax. The revelation arrives at a critical juncture, as US policymakers intensify discussions on restricting exports of advanced AI chips to curb technological proliferation.

Such activities underscore tensions in the global AI landscape, where rapid innovation often clashes with national security priorities. Anthropic’s analysis details how the labs bypassed access restrictions to gather training data, potentially accelerating their own model development. The findings highlight vulnerabilities in publicly accessible AI systems and the urgent need for stronger defenses.

The Mechanics of Model Distillation Attacks

Anthropic documented a technique known as model distillation, in which operators query a sophisticated model like Claude with millions of targeted prompts and use the responses to train a cheaper imitator. Because the attacker needs only outputs, not weights, the method transfers the teacher model’s reasoning patterns and proprietary behavior without any access to its architecture, accelerating the development of competing models. Distillation is a legitimate technique when applied to one’s own models, but mass-querying a rival’s system for this purpose violates Anthropic’s terms of service.
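To make the mechanics concrete, here is a minimal sketch of distillation-style data collection. The `query_teacher` function is a hypothetical stand-in for a hosted-model API client; Anthropic has not published the attackers’ actual tooling.

```python
# Minimal sketch of distillation-style data harvesting: collect
# (prompt, response) pairs from a "teacher" model, then use them to
# fine-tune a smaller "student" model. query_teacher is a hypothetical
# placeholder, not a real API.
import json

def query_teacher(prompt: str) -> str:
    """Stand-in for a hosted-model API call; swap in a real client here."""
    return f"[teacher response to: {prompt}]"

def collect_pairs(prompts: list[str], out_path: str = "distill_pairs.jsonl") -> None:
    """Write prompt/response pairs: the raw material of distillation."""
    with open(out_path, "w", encoding="utf-8") as f:
        for prompt in prompts:
            pair = {"prompt": prompt, "response": query_teacher(prompt)}
            f.write(json.dumps(pair) + "\n")

collect_pairs(["Walk through a chain-of-thought solution to a logic puzzle."])
```

The resulting dataset would then feed a standard supervised fine-tuning run, transferring the teacher’s behavior without any access to its weights.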

The scale of the campaign sets it apart from earlier industrial espionage. The implicated firms created approximately 24,000 synthetic user accounts, generating over 16 million interactions with Claude. The accounts routed traffic through proxy networks to mask their origins, evading the geoblocking Anthropic enforces for export compliance and safety reasons. Queries concentrated on Claude’s most advanced capabilities: chain-of-thought reasoning for complex problem-solving, tool integration for multi-step tasks, and code generation for software development.

DeepSeek ran over 150,000 sessions focused on chain-of-thought logic and censorship-evasion techniques, while Moonshot AI generated 3.4 million exchanges centered on agentic behaviors and vision tasks. MiniMax issued 13 million prompts prioritizing technical coding utilities. Anthropic detected the campaign through anomalous patterns: repeated prompt sequences from clustered accounts, adaptation to new Claude versions within hours of release, and IP footprints linking back to known lab infrastructure.
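Signals like these lend themselves to simple heuristics. The sketch below, with illustrative field names and thresholds rather than Anthropic’s actual detector, flags groups of accounts that share network infrastructure and issue near-duplicate prompt streams.

```python
# Illustrative heuristic for coordinated-extraction detection: group query
# logs by shared network prefix, then flag clusters where many accounts
# send a high proportion of repeated prompts. Thresholds are assumptions.
from collections import defaultdict

def flag_clusters(logs, min_accounts=20, min_overlap=0.8):
    """logs: iterable of dicts with 'account', 'ip_prefix', 'prompt' keys."""
    clusters = defaultdict(lambda: {"accounts": set(), "prompts": []})
    for rec in logs:
        c = clusters[rec["ip_prefix"]]      # e.g. group by shared /24
        c["accounts"].add(rec["account"])
        c["prompts"].append(rec["prompt"])

    flagged = []
    for prefix, c in clusters.items():
        if len(c["accounts"]) < min_accounts:
            continue
        unique = len(set(c["prompts"]))
        overlap = 1 - unique / len(c["prompts"])   # share of repeated prompts
        if overlap >= min_overlap:
            flagged.append((prefix, len(c["accounts"]), round(overlap, 3)))
    return flagged
```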

Beyond the technical extraction, the labs aggressively probed safety boundaries to identify bypasses for dual-use restrictions, such as those covering chemical, biological, radiological, and nuclear (CBRN) risks and offensive cyber tooling.

Implications for AI Safety and Security

Distilled models inherit diminished safeguards, amplifying hazards when deployed in uncontrolled environments. Anthropic invests heavily in alignment techniques to prevent misuse, but distilled copies shed those layers, enabling the proliferation of unmitigated systems for surveillance, disinformation, or adversarial applications. Open-sourcing such derivatives compounds the problem, putting high-risk capabilities in the hands of non-state actors.

In the geopolitical context, this activity bolsters China’s AI ambitions, potentially enhancing military applications from autonomous drones to mass surveillance networks. It also sidesteps US export controls on high-performance chips: labs can approximate frontier-level capability through intellectual property extraction rather than hardware-intensive training. Historical precedents illustrate the pattern’s persistence, including 2025 incidents in which Chinese actors jailbroke Claude and used it to automate an estimated 80-90% of the attack workflow in traced cyberattacks on financial and tech-sector targets.

Intersection with US AI Chip Export Policies

Ongoing US deliberations on AI semiconductor restrictions align closely with these events. Anthropic CEO Dario Amodei advocated stringent measures in a recent Wall Street Journal op-ed, echoing the firm’s 2025 recommendations to tighten Biden-era Tier 2 quotas for countries like Mexico and Israel, strengthen government monitoring of transshipment routes, and fund export-control verification. Those controls classify chips into performance tiers, imposing near-total bans on China while capping purchases by allied countries.

Nvidia previously dismissed Anthropic’s claims about chip smuggling as exaggerated, yet this empirical evidence of distillation lends new urgency to the debate. By demonstrating a scalable alternative to direct compute access, the report strengthens calls for integrated policy responses that pair export limits with protections for the broader AI ecosystem.

Anthropic’s Technical Countermeasures and Policy Ramifications

To counter these threats, Anthropic moved quickly to deploy new detection systems. Algorithms now flag distillation signatures such as iterative self-grading loops and coordinated prompt floods from proxy clusters. Institutional accounts from educational or startup domains receive added scrutiny, with behavioral heuristics cross-referenced against known attack profiles. Model-level interventions embed subtle perturbations that degrade output utility for adversarial actors while preserving performance for legitimate users. These defenses carry real economic costs, estimated at millions of dollars annually for monitoring infrastructure, behavioral analytics, and iterative model hardening.
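Anthropic has not disclosed how its model-level perturbations work. One plausible reading, sketched below purely as an assumption, is risk-conditioned decoding: traffic from flagged accounts is served with noisier sampling parameters that blur the token-level distribution a distiller would want to copy, while individual replies stay readable.

```python
# Hypothetical illustration of risk-conditioned decoding, one way the
# described "model-level perturbation" could work. Parameter names and
# values are assumptions for illustration only.
import random

def decoding_params(account_risk_score: float) -> dict:
    """Map an abuse-risk score in [0, 1] to sampling parameters."""
    if account_risk_score < 0.5:
        return {"temperature": 0.7, "top_p": 0.95}   # normal serving
    # Flagged traffic: inject decoding jitter that degrades the value of
    # the outputs as clean distillation targets.
    jitter = random.uniform(0.0, 0.4) * account_risk_score
    return {"temperature": 0.7 + jitter,
            "top_p": 0.9 - 0.1 * account_risk_score}

print(decoding_params(0.9))   # noisier decoding for high-risk traffic
```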

Industry-wide collaboration forms a key pillar of the response. Anthropic urges shared threat intelligence via blocklists, standardized watermarking of model outputs, and collective advocacy for regulatory harmonization. Law enforcement agencies and regulators have received detailed dossiers, which could precipitate investigations or sanctions. None of the implicated labs has responded publicly, though the timing of MiniMax’s latest model release has drawn scrutiny.
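Output watermarking of the kind advocated here typically follows the statistical “green list” schemes from the research literature (e.g., Kirchenbauer et al., 2023). The sketch below illustrates the detection side under that assumption; it is not a disclosed Anthropic scheme.

```python
# Sketch of a "green list" watermark check: each word's hash, seeded by
# its predecessor, lands it in a green set with probability GAMMA.
# Watermarked text shows a green fraction well above GAMMA.
import hashlib
import math

GAMMA = 0.5  # expected green fraction for unwatermarked text

def is_green(prev_word: str, word: str) -> bool:
    digest = hashlib.sha256(f"{prev_word}|{word}".encode()).digest()
    return digest[0] < 256 * GAMMA

def watermark_z_score(text: str) -> float:
    """Standard score of the green-word count against the null hypothesis."""
    words = text.split()
    hits = sum(is_green(a, b) for a, b in zip(words, words[1:]))
    n = max(len(words) - 1, 1)
    return (hits - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))

# A z-score well above chance (say, > 4 over a long passage) is strong
# evidence the text came from a generator that favored green words.
```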

These events will likely reshape global AI governance as policymakers weigh innovation incentives against theft risks. Unchecked extraction erodes the incentive to make large capital investments in alignment and safety research, driving up costs passed to consumers and slowing product advances. For enterprises relying on Claude, fortified access protocols such as tiered multi-factor authentication, usage caps, and provenance tracking will likely become standard. Anthropic’s transparency here fosters trust and positions it as a leader in responsible scaling amid great-power competition.
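Usage caps of the kind described are commonly implemented as per-account token buckets whose size and refill rate depend on trust tier. The tier names and limits below are illustrative assumptions, not a published Claude policy.

```python
# Minimal sketch of tiered usage caps: a token bucket per account, with
# capacity and refill rate set by trust tier. Limits are illustrative.
import time

TIERS = {"unverified": (100, 0.05),
         "verified": (1_000, 0.5),
         "enterprise": (10_000, 5.0)}   # (bucket size, refill per second)

class UsageCap:
    def __init__(self, tier: str):
        self.capacity, self.rate = TIERS[tier]
        self.tokens = float(self.capacity)
        self.last = time.monotonic()

    def allow(self, cost: int = 1) -> bool:
        """Refill the bucket for elapsed time, then try to spend `cost`."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False   # request rejected; caller should back off

cap = UsageCap("unverified")
print(cap.allow(10))   # True until the small bucket drains
```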

By Kavishan Virojh