Publishers Sue OpenAI and Microsoft Over Alleged Scraping of Nearly 400 Local Newspapers
Several dozen U.S. publishers have sued OpenAI and Microsoft. They allege that the companies scraped both paywalled and freely available articles to train AI systems, taking local news without paying for it. The publishers, which operate almost 400 local newspapers, say the defendants copied and ingested their content to build products such as ChatGPT and Microsoft Copilot. Plaintiffs assert that the companies generated hundreds of billions of dollars in market value from that material while paying publishers nothing.
Dozens of U.S. Publishers Bring Consolidated Lawsuit
The complaint, filed in a U.S. court, brings together several dozen regional publishers that collectively run nearly 400 local newspapers. Plaintiffs allege the defendants systematically accessed publisher websites, including content behind paywalls, and copied articles for use in training large language models. The suit further alleges that the defendants removed bylines and copyright information to obscure original authorship and ownership.
Allegations of Secretive Scraping and Attribution Removal
According to the filing, the companies did more than automated indexing: they allegedly stripped author names, metadata and other markers that would link content back to its creators. The publishers argue this practice allowed AI developers to “ingest” editorial work without permission and to incorporate it into models that the companies now commercialize. The complaint also warns that the models may reproduce passages verbatim or nearly verbatim when prompted by users.
Publishers Warn of Dire Consequences for Local News
The plaintiffs frame the litigation as an existential battle for community journalism, saying unpaid use of local reporting undermines outlets that already operate on thin margins. They argue that if technology firms are permitted to exploit editorial content without compensation, the economic model that supports local reporting will be fatally weakened. Attorneys for the publishers describe the alleged conduct as a potential “death blow” to newsrooms that remain primary sources of local information.
OpenAI and Microsoft Respond with Fair Use Defense
A spokesperson for OpenAI told reporters the company trains models on publicly available data and relies on the legal doctrine of fair use. Microsoft has previously defended its investments in generative tools such as Copilot and framed its use of third-party content as legally permissible. The complaint references public remarks by OpenAI’s chief acknowledging that modern AI models often rely on copyrighted materials for training, a point that proponents of AI say reflects industry reality.
Legal Context: Mixed Court Rulings and Growing Caseload
The new case joins more than 120 pending copyright suits against AI developers in U.S. courts, many brought by media companies and authors. Courts have been divided: a 2025 federal court decision in Delaware held that fair use may not protect AI training that serves clear commercial aims, while other courts have reached different conclusions. That inconsistent case law has left publishers and technology firms awaiting clearer standards from appellate courts and, possibly, Congress.
Implications for AI Development, Licensing and Regulation
Lawyers and industry observers say the suit could accelerate licensing negotiations between publishers and AI companies or push regulators to impose stricter rules on dataset sourcing. A legal victory for the plaintiffs could require technology firms to obtain permissions or pay licensing fees for news content used in training models. Conversely, a defense win could affirm broader leeway for developers and complicate publishers’ efforts to monetize their journalism.
The case is likely to draw attention to how large language models are built and to the commercial relationships between data owners and AI platforms. Publishers say they seek accountability and compensation, while developers argue that accessible web content and fair use are essential to innovation. How judges reconcile those positions will shape the future economics of both journalism and AI.