AI on Trial: ANI Media vs. OpenAI

The legal tussle between ANI Media Pvt Ltd. and OpenAI Inc. has taken center stage in the ongoing debate over the ethical and legal limits on training artificial intelligence. ANI, a leading news agency, has accused OpenAI of exploiting its content without permission to train its Large Language Models (LLMs), particularly ChatGPT, and alleges that the AI replicates ANI’s content verbatim in response to user queries. In the copyright infringement suit filed before the Hon’ble Delhi High Court, ANI has also claimed that OpenAI has attributed false news and events to ANI, further complicating the issue.

OpenAI already faces multiple lawsuits in the United States, Germany and Canada, brought by news agencies on similar grounds. The Court has issued notice on the application and intends to appoint an amicus curiae to assist in the case.

In response, OpenAI has informed the court that it had blocked the website aninews.in from its training data as of October 2024, in accordance with its opt-out policy. In addition, ANI has been allowed to request the inclusion of any other websites or sources in the blocklist, so that they are not used for training purposes.

The implications of the legal battle go well beyond the immediate parties. The core question is whether training AI models on copyrighted works infringes a creator’s rights. While generating responses that mirror ANI’s content could arguably be classified as direct infringement, the more profound concern appears to lie in the act of training on copyrighted datasets.

This is not solely an Indian problem; it is a global quandary on which there is no consensus yet. Generative AI depends on copying and storing enormous amounts of data to improve its outputs. Those in favour argue that this is fair use because the data is used strictly for training purposes, not for outright distribution. But critics argue that even this narrow use is a trespass against content creators because their expressive works are being used without permission and for free.

Some of the issues the court may consider in this matter:

  • Does merely storing datasets solely for the purpose of training constitute copyright infringement, even if the data is not distributed?
  • Do AI models expose these protected datasets to the users in a way that violates copyright?
  • Is training on datasets a “transformative use,” since the AI models themselves don’t access the protected content while creating new outputs?
  • Are “facts” and “information” extracted from copyrighted works protected, and does accessing the entire work to extract that data violate copyright?

The ruling in this case may bring some clarity on how artificial intelligence firms may use copyrighted material. If courts decide that training on copyrighted content without permission constitutes infringement, AI developers would be required to obtain licenses or enter into commercial deals with content creators, which could be an expensive proposition, especially for startups.