Researchers demonstrate how malicious actors can exploit metadata in AI-powered review systems, exposing weaknesses in trust-based automation and raising concerns over security in AI-assisted code development.

Anthropic’s Claude Code has been shown to approve malicious changes when attackers spoof a trusted maintainer’s Git identity, underlining how easily automated review systems can be misled when they treat metadata as proof of trust.

In a demonstration described by Manifold Security, a fake author name and email set in Git were enough to make a commit look as though it came from a respected contributor. The code was then passed through an AI review flow that accepted it, even though the apparent authorship was fabricated. The firm argued that the weakness is not in Git itself, but in the assumption that commit metadata says anything reliable about who actually wrote the code.
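A Git commit is a small text object. Its author and committer lines contain whatever the person creating it typed in. The Python sketch below builds a raw, unsigned commit object by hand and hashes it the way Git does: a header of the form `commit <size>`, a NUL byte, then the body. It is an illustration, not Manifold's demonstration code. The names, email and timestamp are hypothetical placeholders. The sketch shows that an attacker's commit can carry a byte-for-byte identical author line, and that nothing in an unsigned commit binds that line to a person or account.

```python
import hashlib

# SHA-1 of Git's empty tree; a real commit would reference a real tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_body(name: str, email: str, message: str,
                timestamp: int = 1700000000, tz: str = "+0000") -> bytes:
    """Build the body of an unsigned, parentless Git commit object.

    The author and committer lines are plain text supplied by whoever
    creates the commit; no key or account check is involved.
    """
    ident = f"{name} <{email}> {timestamp} {tz}"
    text = (
        f"tree {EMPTY_TREE}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    )
    return text.encode("utf-8")


def object_id(body: bytes, kind: str = "commit") -> str:
    """Hash an object as Git does: '<kind> <size>', a NUL byte, then the body."""
    header = f"{kind} {len(body)}\0".encode("utf-8")
    return hashlib.sha1(header + body).hexdigest()


def author_line(body: bytes) -> str:
    """Return the 'author ...' line of a commit body, or '' if absent."""
    for line in body.decode("utf-8").splitlines():
        if line.startswith("author "):
            return line
    return ""


# Hypothetical identities: the attacker simply types in the maintainer's details.
from_alice = commit_body("Alice Maintainer", "alice@example.org", "Fix parser")
from_attacker = commit_body("Alice Maintainer", "alice@example.org", "Fix parser (backdoored)")

print(author_line(from_alice) == author_line(from_attacker))  # True
print(object_id(from_alice) != object_id(from_attacker))      # True
```

A signed commit would add a `gpgsig` header that `git verify-commit` and hosting platforms can check. That signature, not the author line, is the only part of the object tied to a key.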

That distinction matters because trust-based automation is already common in open-source workflows. Manifold said the logic is understandable: maintainers are overwhelmed, so systems that fast-track well-known contributors can save time. But the same approach becomes risky when identity checks are reduced to organisation membership, contribution history or a maintainer list, none of which proves authorship. The company compared the issue with recent supply-chain compromises in which malicious code was treated as legitimate long enough to do damage.
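The fast-track logic Manifold describes can be reduced to a toy policy. The sketch below illustrates it under stated assumptions and is not Manifold's or Anthropic's code. `Commit`, `MAINTAINERS`, `MAINTAINER_KEYS` and the key labels are hypothetical. `signature_verified` is assumed to be filled in by an external check, such as `git verify-commit` or a hosting platform's verification. The sketch contrasts two gates. The first trusts the author string. The second fast-tracks a change only when a verified signature matches a key registered for that maintainer.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical allow-list of maintainer emails: the weak signal at issue.
MAINTAINERS = {"alice@example.org"}
# Hypothetical mapping from maintainer email to a registered signing-key label.
MAINTAINER_KEYS = {"alice@example.org": "KEY-ALICE"}


@dataclass(frozen=True)
class Commit:
    author_email: str          # self-asserted, copied from the commit object
    signature_verified: bool   # assumed result of an external signature check
    signer_key: Optional[str]  # hypothetical label for the key that signed it


def naive_fast_track(c: Commit) -> bool:
    """Trusts the author string alone, which a spoofed commit satisfies."""
    return c.author_email in MAINTAINERS


def stricter_fast_track(c: Commit) -> bool:
    """Fast-tracks only if a verified signature matches the maintainer's registered key."""
    expected = MAINTAINER_KEYS.get(c.author_email)
    return c.signature_verified and expected is not None and c.signer_key == expected


spoofed = Commit("alice@example.org", signature_verified=False, signer_key=None)
signed_by_alice = Commit("alice@example.org", signature_verified=True, signer_key="KEY-ALICE")
signed_by_mallory = Commit("alice@example.org", signature_verified=True, signer_key="KEY-MALLORY")

print(naive_fast_track(spoofed))               # True
print(stricter_fast_track(spoofed))            # False
print(stricter_fast_track(signed_by_alice))    # True
print(stricter_fast_track(signed_by_mallory))  # False
```

Even the stricter gate only decides whether a change skips the queue. A valid signature shows possession of a key, not good intent, so the change itself still needs review.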

The concern also lands against a wider backdrop of security problems in Anthropic’s code tooling. GitLab has flagged CVE-2025-59041, in which malicious Git email settings could lead to arbitrary code execution before a workspace-trust prompt appears, while SentinelOne has documented later flaws that could bypass trust dialogs or leak information from attacker-controlled repositories. Separately, The Atlantic reported this week that Anthropic is simultaneously promoting a far more powerful cybersecurity model, Claude Mythos Preview, which the company says is capable of autonomous exploitation work but is being kept from public release because of the risks.

Taken together, the episodes point to the same lesson: identity cues and repository settings should not be treated as security controls. Manifold’s conclusion was blunt: if the only thing standing between a bad change and a merge is the model’s impression of who sent it, the system is too trusting for its own good.

Source Reference Map

Source: Noah Wire Services

Noah Fact Check Pro

The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.

Freshness check

Score: 10

Notes:
The article was published on April 16, 2026, and presents new findings from Manifold Security regarding Anthropic’s Claude Code. No evidence of prior publication or recycled content was found. The information appears original and timely.

Quotes check

Score: 10

Notes:
The article includes direct quotes from Manifold Security’s blog post. These quotes are consistent with the original source and have not been found in earlier publications. No discrepancies or unverifiable quotes were identified.

Source reliability

Score: 8

Notes:
The primary source, Manifold Security, is a security-focused company with a public presence. The Register is a reputable technology news outlet. However, The Register is not a major news organisation like the BBC or Reuters, which slightly lowers the reliability score.

Plausibility check

Score: 9

Notes:
The claim that Anthropic’s Claude Code can be tricked into approving malicious code by spoofing a trusted developer’s identity is plausible. This aligns with known issues in automated code review systems and the importance of verifying commit metadata. No contradictory information was found.

Overall assessment

Verdict (FAIL, OPEN, PASS): PASS

Confidence (LOW, MEDIUM, HIGH): HIGH

Summary:
The article presents original, timely information with consistent and verifiable quotes. The sources are reliable, and the claims are plausible and supported by existing knowledge. No significant concerns were identified.
