AI Copyright Litigation Moved Past 2025 Toward a “Licensing Order”

The copyright war surrounding generative AI entered a new phase in 2025. Until 2024, the central question was relatively simple: is it fair use for AI companies to train models on books, articles, music, images, videos, and other works collected from the internet? Can companies legally copy vast numbers of copyrighted works and use them as training data without permission from rights holders?

But the developments of 2025 pushed that question further. The issue is no longer limited to whether AI training qualifies as fair use. It now extends to several further questions: whether companies used pirated datasets; whether AI systems generate outputs that substitute for original works; whether retrieval-augmented generation, or RAG, copies, summarizes, or replaces original content; whether technical protection measures in music and video were bypassed; and how AI companies and copyright owners will build a licensing market.

According to the Copyright Alliance’s review of 2025 AI copyright litigation, copyright owners have filed more than 70 infringement lawsuits against AI companies. That is more than double the roughly 30 cases pending at the end of 2024. The number shows that generative AI is no longer merely a technical experiment. It has become a conflict over industry structure that touches the entire market for creative works.

The 2025 Turning Point: Bartz v. Anthropic

The most important AI copyright case of 2025 was undoubtedly Bartz v. Anthropic. On June 23, the U.S. District Court for the Northern District of California ruled at the summary judgment stage that Anthropic’s use of works to train large language models was “highly transformative” and qualified as fair use. At first glance, the ruling appeared favorable to AI companies.

But the case did not end there. Although Anthropic obtained a partial fair use ruling on model training itself, it faced a separate problem over how it had obtained training data. According to the Copyright Alliance, Anthropic reached a $1.5 billion settlement with the plaintiffs in September 2025 over 482,460 books allegedly downloaded from pirated libraries such as Library Genesis and Pirate Library Mirror.

That settlement became a symbolic turning point in AI copyright litigation. It sent a clear message: even if a court recognizes the transformative nature of AI training, companies may still face enormous liability if the data used for that training came from pirated sources. AI companies can no longer rely solely on the argument that “training is fair use.” They must also answer the question of what data they used, where it came from, and how it was obtained.

The $1.5 billion figure also matters. It strengthened the argument from copyright owners that AI companies have the economic capacity to compensate rights holders. The Copyright Alliance described the settlement as evidence that AI companies can continue to innovate and compete while also paying copyright owners. That interpretation reflects the perspective of rights-holder groups, but the market signal was unmistakable. AI training data can no longer be treated as free raw material.
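
For a rough sense of scale, the two figures reported above can be divided to give a simple average per book. This is only back-of-the-envelope arithmetic: the function name is hypothetical, and the result ignores attorneys' fees, administrative costs, and whatever allocation rules the settlement actually applies.

```python
# Figures as reported in the article; the division is illustrative only.
SETTLEMENT_USD = 1_500_000_000
BOOKS_COVERED = 482_460


def average_per_work(total_usd: int, works: int) -> float:
    """Return the simple average amount per work, before any fees or costs."""
    if works <= 0:
        raise ValueError("works must be positive")
    return total_usd / works


per_work = average_per_work(SETTLEMENT_USD, BOOKS_COVERED)
print(f"Average per work: ${per_work:,.2f}")  # Average per work: $3,109.07
```

An average of roughly $3,100 per book is what gives the headline number its weight as a market signal.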

Kadrey v. Meta: A Narrow Fair Use Victory

Two days after the Bartz ruling, on June 25, the same federal court issued a summary judgment order in Kadrey v. Meta. In that case as well, the court found Meta’s LLM training to be “highly transformative” and held that it qualified as fair use.

Yet this ruling should not be seen as a complete victory for AI companies. The court made clear that the decision was narrow and based on deficiencies in the plaintiffs’ evidence. One especially important part of the opinion was the court’s extended discussion of generative AI’s possible indirect substitution effects on actual and potential markets for copyrighted works.

That discussion points to the strategy copyright owners may need in future litigation. It may not be enough simply to argue that a work was used in training. Plaintiffs will need to show concretely how an AI model substitutes for the market for original works, how copyright owners are harmed in the market for AI training data licenses, and how AI outputs may erode demand for creative works.

The Kadrey case also left another issue unresolved: whether Meta, while downloading books through BitTorrent, also uploaded or “seeded” copyrighted works. If large-scale distribution of pirated works is established, the case could lead to major damages or settlement exposure similar to the Anthropic case. Kadrey therefore remains a case marked by both a narrow fair use win and a potentially serious piracy-related risk.

Music AI Litigation Begins Moving Toward Settlement and Licensing

Another major trend in 2025 was settlement and licensing in the AI music sector. On October 29, Universal Music Group announced that it had settled its copyright infringement lawsuit with AI music generation company Udio. The settlement included not only compensation but also a licensing agreement covering UMG’s recording and publishing catalogs. The two sides agreed to work toward launching a subscription-based generative AI music service in 2026 trained on licensed music.

The important point is that the licensing structure was described as artist opt-in. Rights holders and creators would be able to decide whether their works may be used for AI training and generative services. This is the opposite of the opt-out model often preferred by AI companies, under which works are used unless rights holders explicitly object. From the rights-holder perspective, opt-in is the structure that best protects creator control.
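
The difference between the two consent models comes down to what happens when a rights holder has said nothing. The sketch below is a minimal illustration under assumed names (`ConsentModel`, `Work`, `may_use_for_training` are all hypothetical); it does not describe the terms of any actual agreement.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConsentModel(Enum):
    OPT_IN = "opt-in"    # use is allowed only after explicit permission
    OPT_OUT = "opt-out"  # use is allowed unless the rights holder objects


@dataclass
class Work:
    title: str
    # True = permission granted, False = objection, None = no statement either way.
    rights_holder_choice: Optional[bool] = None


def may_use_for_training(work: Work, model: ConsentModel) -> bool:
    """Decide whether a work may be used under the given consent model."""
    if model is ConsentModel.OPT_IN:
        return work.rights_holder_choice is True
    return work.rights_holder_choice is not False


silent = Work("Untitled Track")  # hypothetical work with no stated preference
print(may_use_for_training(silent, ConsentModel.OPT_IN))   # False
print(may_use_for_training(silent, ConsentModel.OPT_OUT))  # True
```

Silence blocks use under opt-in and permits it under opt-out, which is why the default matters so much to creators.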

Warner Music Group also reached a similar settlement with Udio. Warner later settled with Suno as well. Suno announced plans to launch a more advanced license-based model in 2026 and to phase out its older model. According to the announcement, artists and songwriters would be able to control how their names, images, likenesses, voices, and compositions are used in AI-generated music.

This trend marks an important shift in the music industry. Rather than fighting AI music companies to the end in court, major labels and AI startups are beginning to search for a path toward licensed services. Copyright litigation is no longer only about damages. It is also becoming a bargaining tool for shaping a new AI content market.

Still, not all cases have settled. Sony had not reached a settlement in the Udio case, and lawsuits by independent musicians continue. In 2025, independent musicians including Tony Justice sued Suno and Udio over unauthorized use of training data and alleged copying of original songs. Another group of musicians filed a similar suit against Udio, Suno, and Chinese company Kunlun Tech. These cases are especially notable because they include allegations of stream ripping from YouTube and violations of the DMCA, claims that may become a new litigation strategy in AI music cases in 2026.

OpenAI Multidistrict Litigation Becomes a Central Battlefield for 2026

In 2025, multiple literary and news media copyright cases against OpenAI were centralized in the U.S. District Court for the Southern District of New York. This became known as the In re OpenAI multidistrict litigation. Major cases such as New York Times v. OpenAI and Authors Guild v. OpenAI are part of this broader litigation structure.

These cases have not been completely merged in every respect, but they may influence one another in pretrial proceedings, discovery, and summary judgment issues. The Copyright Alliance noted reports of possible settlements in some OpenAI-related cases in 2026. If a major settlement or licensing agreement emerges between OpenAI and major publishers or news organizations, it could reshape copyright negotiation standards across the LLM industry.

The importance of the OpenAI litigation is not limited to one company’s legal risk. OpenAI is the symbol of generative AI’s mass adoption. Since ChatGPT triggered explosive growth in the LLM market, the court’s treatment of OpenAI-related disputes may shape legal standards for AI training data, use of news articles, book datasets, output substitution, and licensing markets.

Film Studios Target AI Image and Video Generators

In 2025, major Hollywood studios also entered the courtroom in force. Disney and Universal filed a complaint against Midjourney in the U.S. District Court for the Central District of California in June. They alleged that Midjourney generates, displays, and distributes copies and derivatives of famous franchise characters from properties such as Marvel and Star Wars.

In September, Warner Bros. Entertainment filed a similar lawsuit against Midjourney. Warner alleged that Midjourney knew its system could generate infringing outputs involving copyrighted characters but failed to implement sufficient safeguards. On November 4, the Warner case was consolidated with the Disney and Universal case.

The significance of the film studio lawsuits is substantial. Until then, AI copyright litigation had focused largely on books, news articles, and music. Film studios, however, own characters, worlds, visual images, and brand assets. If image and video generation AI can imitate specific characters and styles, it raises a market substitution problem different from text training disputes.

Disney, Universal, and Warner also sued Chinese AI company Minimax. The studios alleged that Minimax’s image and video generator, Hailuo AI, could produce outputs containing copyrighted characters from Star Wars, The Simpsons, Despicable Me, Shrek, Scooby-Doo, Looney Tunes, and other properties. Because this is an AI copyright lawsuit against a foreign company, service of process and jurisdiction are likely to become important issues in 2026.

RAG Lawsuits Expand the Debate From Training Data to Real-Time Search Responses

Another major development in 2025 was litigation over retrieval-augmented generation, or RAG. Perplexity and Cohere emerged as key defendants. RAG allows AI systems to answer questions not only from information already contained in model weights, but by retrieving documents from the internet or databases in real time and generating answers based on them.

This technology can improve accuracy and freshness. But from a copyright perspective, it creates new problems. A model may crawl websites, copy copyrighted content at the input stage in order to answer user questions, and then provide outputs that are substantially similar to the original text or that function as substitute summaries.
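
The two stages at issue, copying at input and reproduction at output, can be made concrete with a toy pipeline. Everything here is an assumption for illustration: the corpus, the URLs, the word-overlap retrieval, and the `generate` stand-in are hypothetical and do not reflect how any named product works. The overlap ratio is a crude word count, not a legal test for substantial similarity.

```python
import re

# Hypothetical in-memory "web": URL -> article text.
CORPUS = {
    "https://example.com/a": "The court ruled that training was fair use.",
    "https://example.com/b": "The label announced a licensing agreement for music.",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its distinct alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, corpus: dict[str, str]) -> tuple[str, str]:
    """Input stage: fetch the document sharing the most words with the query."""
    q = tokenize(query)
    url = max(corpus, key=lambda u: len(q & tokenize(corpus[u])))
    return url, corpus[url]  # the retrieved text is a copy of the source


def generate(query: str, source_text: str) -> str:
    """Output stage: a stand-in for a model; here it simply reuses the source."""
    return f"Answer to '{query}': {source_text}"


def overlap_ratio(answer: str, source_text: str) -> float:
    """Share of the source's distinct words that reappear in the answer."""
    src = tokenize(source_text)
    return len(src & tokenize(answer)) / len(src)


question = "What did the court rule about fair use?"
url, doc = retrieve(question, CORPUS)
answer = generate(question, doc)
print(url, overlap_ratio(answer, doc))  # https://example.com/a 1.0
```

In this worst case the answer reproduces the whole source, which is the kind of output plaintiffs describe as a substitute for visiting the original page.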

In September 2025, Encyclopaedia Britannica and Merriam-Webster sued Perplexity. They alleged that PerplexityBot crawled and scraped their websites and that Perplexity’s question-answering process infringed copyrighted articles at both the input and output stages. In December, the Chicago Tribune and The New York Times also filed lawsuits against Perplexity.

Cohere was sued by Advance. The plaintiff alleged that Cohere’s Command models were trained on unauthorized copies of news and magazine articles, and that Cohere’s RAG features provided full-text copies, substantial excerpts, and substitute summaries in response to user queries.

RAG litigation changes the focus of the AI copyright debate. Earlier disputes centered mainly on what data had been used to train models in the past. RAG lawsuits ask what content AI systems are retrieving and displaying right now in order to answer users’ questions. This issue could redraw the boundaries among search engines, news media, knowledge databases, and AI answer services.

Apple, Salesforce, Adobe, and ByteDance Join the List of Defendants

In 2025, large technology companies that had not been at the center of AI copyright litigation also became defendants. Apple was sued by authors who alleged that copyrighted books were used without authorization to train the company’s OpenELM model. The plaintiffs claimed that Apple used the RedPajama dataset, which allegedly included material derived from Books3 and pirated book repositories.

Salesforce also faced a class action from authors over the training of its CodeGen and XGen LLMs. The plaintiffs alleged that Salesforce used datasets such as RedPajama and The Pile. Adobe was also sued over its use of the SlimPajama dataset to train a small language model called SlimLM. SlimPajama is derived from RedPajama, and the plaintiffs claimed that it contained pirated books.
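
The derivation chain described in these complaints can be pictured as a small lineage graph. The sketch below is illustrative only: the `DERIVED_FROM` map and the `upstream_sources` helper are hypothetical, and the edges mirror the allegations summarized above rather than any finding of fact.

```python
# Hypothetical lineage map: dataset -> datasets it is said to be derived from.
# The edges reflect allegations in the lawsuits, not established facts.
DERIVED_FROM = {
    "SlimPajama": ["RedPajama"],
    "RedPajama": ["Books3"],
    "Books3": [],
}


def upstream_sources(dataset: str, graph: dict[str, list[str]]) -> list[str]:
    """Return every ancestor of `dataset`, nearest first, visiting each once."""
    seen: list[str] = []
    queue = list(graph.get(dataset, []))
    while queue:
        parent = queue.pop(0)  # breadth-first: closest ancestors come out first
        if parent in seen:
            continue
        seen.append(parent)
        queue.extend(graph.get(parent, []))
    return seen


print(upstream_sources("SlimPajama", DERIVED_FROM))  # ['RedPajama', 'Books3']
```

A problem alleged in an upstream dataset follows every dataset built on top of it, so a company two steps removed from the original source can still be named as a defendant.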

ByteDance faced a different type of lawsuit. A group of video creators called Ted Entertainment alleged that ByteDance scraped millions of copyrighted YouTube videos and bypassed technical protection measures to train its generative AI model, MagicVideo. The case is notable because it centers not only on direct copyright infringement, but also on Section 1201(a) of the DMCA, which prohibits circumvention of technological protection measures.

This trend shows that the range of AI copyright defendants is expanding. The lawsuits are no longer limited to AI-native companies such as OpenAI, Anthropic, Midjourney, and Stability AI. Large technology and platform companies such as Apple, Salesforce, Adobe, and ByteDance are also beginning to face legal scrutiny over training data and generative AI products.

Thomson Reuters v. Ross: Another Reference Point for Fair Use

One case to watch in 2026 is Thomson Reuters v. Ross Intelligence. The case is not identical to today’s generative AI disputes, but it has important implications for AI training and fair use. Thomson Reuters sued Ross, alleging that Ross scraped Westlaw legal content to develop a legal research service.

In early 2025, the district court accepted Thomson Reuters’ direct copyright infringement claim and rejected Ross’s fair use defense. The court held that Westlaw headnotes could be copyrightable, that Ross’s use was commercial and not transformative, and that it harmed the potential market for AI training data.

In April, however, the court allowed an interlocutory appeal. The Third Circuit’s treatment of copyrightability and fair use will therefore be important. Although the case differs from LLM litigation, it could provide an important benchmark for how courts consider the AI training data market under the fourth fair use factor: market harm.

The 2025 Conclusion: Markets Moved Before the Courts Did

The defining features of 2025 AI copyright litigation were twofold. First, the number of lawsuits exploded. Second, major settlements and licensing agreements began to emerge. Courts have not yet answered most of the central questions definitively. Fair use decisions remain case-specific, and outcomes may depend heavily on the evidence presented.

But the market is already moving. Anthropic’s $1.5 billion settlement showed the legal cost of using pirated datasets. The music settlements involving UMG, WMG, Udio, and Suno showed that the AI music market may move toward licensing and opt-in structures. The OpenAI multidistrict litigation, film studio lawsuits against Midjourney, and RAG cases against Perplexity show that almost every layer of the AI industry has entered the arena of copyright negotiation.

AI companies are approaching a moment when they may need to choose between two strategies. One path is to continue asserting fair use and wait for court rulings. The other is to negotiate licenses with copyright owners and build lawful data supply chains. The developments of 2025 suggest that the second path is becoming increasingly realistic.

What to Watch in 2026

The first major point to watch in 2026 will be fair use rulings. According to the Copyright Alliance, the next major fair use decisions may come in cases such as In re Google Generative AI, UMG v. Suno, Concord v. Anthropic, and In re Mosaic LLM Litigation. However, those decisions may not arrive until summer 2026 or later.

The second point is the spread of settlements. Licensing-based settlements became more visible in music in 2025, and similar structures may emerge in publishing, news, film, and RAG services in 2026. In particular, a major settlement in the OpenAI-related cases could fundamentally alter how the LLM industry obtains data.

The third point is pirated datasets. Allegations that open datasets such as Books3, RedPajama, The Pile, and SlimPajama included pirated books are recurring across multiple lawsuits. Going forward, AI companies may not be able to rely on simply saying, “We used a public dataset.” They may need to prove data provenance, licensing status, the presence or absence of pirated material, and the steps taken to remove or clean problematic data.
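
What proving provenance might look like in practice can be sketched as a per-source manifest with a simple audit pass. The field names, status labels, and rules below are assumptions chosen for illustration; they are not an industry standard or a legal requirement.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRecord:
    name: str
    origin: str                    # where the data was obtained
    license_status: str            # e.g. "licensed", "public-domain", "unknown"
    flagged_pirated: bool = False  # True if pirated material has been alleged or found
    cleaning_steps: list[str] = field(default_factory=list)


def audit(records: list[SourceRecord]) -> list[str]:
    """Return human-readable issues; an empty list means none were detected."""
    issues = []
    for r in records:
        if r.license_status == "unknown":
            issues.append(f"{r.name}: license status not documented")
        if r.flagged_pirated and not r.cleaning_steps:
            issues.append(f"{r.name}: flagged material with no removal step recorded")
    return issues


# Hypothetical manifest with one documented source and one problematic one.
manifest = [
    SourceRecord("licensed-news", "publisher agreement", "licensed"),
    SourceRecord("public-crawl", "third-party dataset", "unknown", flagged_pirated=True),
]
for issue in audit(manifest):
    print(issue)
```

A record like this only helps if it is kept at the time the data is acquired; reconstructing it after a lawsuit is filed is far harder.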

The fourth point is RAG. Unlike past copying at the model training stage, RAG creates issues of real-time copying and output in response to user queries. For news organizations and knowledge-content companies, there is deep concern that RAG-based AI answers could substitute for visits to original pages and paid subscriptions. RAG litigation could therefore shake the business model of AI search services.

The Real Battle Is Over the Price of Data

AI copyright litigation appears on the surface to be a legal conflict. But at its core, it is a fight over the price of data and creative works. Generative AI grew by absorbing books, articles, music, images, videos, code, and knowledge databases created by humans. Copyright owners are now demanding compensation for those inputs, while AI companies are seeking broad recognition of fair use for the sake of training and innovation.

The year 2025 was when this conflict became fully institutionalized. Anthropic’s settlement showed the danger of pirated data use. The music AI settlements opened the possibility of licensing markets. The lawsuits against Midjourney brought character and fictional-universe protection to the center of the debate. The RAG suits against Perplexity and Cohere brought to court the concern that AI answer services may substitute for original content markets.

There is still no final winner. Courts may rule differently depending on the facts of each case, and the boundaries of fair use will continue to be contested. But the direction is becoming clearer. The generative AI industry can no longer comfortably rely on the loose assumption that anything publicly available on the internet can be used for training. Data provenance, licensing, compensation, transparency, and creator control are becoming core infrastructure for the AI industry.

The year 2026 may be when this new order becomes more concrete. Fair use rulings will arrive. More settlements will emerge. AI companies and copyright owners will move back and forth between courtrooms and negotiation tables as they create new market rules. The future of generative AI will not be decided by model performance alone. It will also depend on how the rights of the creative works that trained those models are handled.