About 245 news organisations across nine countries are moving to block AI firms from mining the Internet Archive’s vast web history, highlighting a growing clash between digital preservation and copyright as AI training raises new legal questions.

News publishers are drawing a line around the Internet Archive as they try to stop AI firms from mining old web pages for training data, turning a long-standing preservation tool into an unexpected front in the copyright fight. Euronews reported that about 245 news organisations in nine countries are now seeking to block at least one of the Archive’s crawlers, with many of the affected sites belonging to major publishers including USA Today’s parent company. The concern is no longer just about search or storage, but about whether archived journalism is being repurposed without permission or payment.

The scale of the Archive explains why the issue has become so sensitive. With more than a trillion web pages saved since 1996, the Wayback Machine has become a crucial record of disappearing or altered online material, including reporting from outlets such as CNN, The New York Times, The Guardian and USA Today. For historians, lawyers and editors, it can provide proof of what was published and when. For AI companies, the same trove offers structured, dated text and images that are attractive for training large language models.

That tension is now feeding into a wider legal and commercial struggle over journalism and artificial intelligence. Reuters has reported in recent months that major publishers, including The New York Times, are pursuing AI companies over copyright and licensing, while The Atlantic has noted that courts are still defining how copyright applies to AI-generated and AI-assisted work. In that environment, publishers see archived copies not as neutral history, but as another possible route for systems to ingest their work at scale.

The Internet Archive insists it is being caught in the middle. Mark Graham, the director of the Wayback Machine, has argued that the real problem is AI companies using archive interfaces as a shortcut to content they did not create, and the Archive itself has tried to curb large downloads and automated extraction in some cases. At the same time, it says preservation remains essential, because pages can be edited, removed or quietly rewritten after publication. Some publishers, including The Guardian, have opted for tighter limits rather than complete blocks, while digital rights campaigners and journalists are pushing back against broad restrictions that could erase pieces of the web’s public memory.

Source: Noah Wire Services

Noah Fact Check Pro

The draft above was created using the information available at the time the story first emerged. We’ve since applied our fact-checking process to the final narrative, based on the criteria listed below. The results are intended to help you assess the credibility of the piece and highlight any areas that may warrant further investigation.

Freshness check

Score: 10

Notes:
The article was published on 1 May 2026, making it highly current. No evidence of recycled content was found.

Quotes check

Score: 8

Notes:
Direct quotes from Graham James of The New York Times and Mark Graham of the Internet Archive are used. While these quotes are not independently verifiable online, they are attributed to reputable sources, suggesting authenticity. However, the lack of direct online verification lowers the score.

Source reliability

Score: 9

Notes:
Euronews is a well-established news organisation, lending credibility to the article. The article also references other reputable sources like Bloomberg and The Next Web, enhancing its reliability.

Plausibility check

Score: 9

Notes:
The claims about news publishers blocking AI access to the Internet Archive align with recent reports from other reputable outlets. The article provides specific examples and details, supporting the plausibility of the claims.

Overall assessment

Verdict (FAIL, OPEN, PASS): PASS

Confidence (LOW, MEDIUM, HIGH): HIGH

Summary:
The article is current, well-sourced, and presents plausible claims supported by reputable sources. The main concern is the lack of direct online verification for some quotes, but overall, the content meets verification standards.
