For a business with over 3 billion active users, and the stream of data that comes with them, it may seem strange that Meta would need to rely on so much external data to build its AI tools.
In any event, as the company faces significant legal challenges in the United States over the misuse of copyrighted material to train its Llama models, Meta has now been hit with another copyright challenge in France, where French publishers have launched legal action for copyright infringement.
As Bloomberg reported:
“French publishers and authors have sued Meta for copyright infringement, accusing the tech giant of using their books to train generative artificial intelligence models without approval. SNE, which counts Hachette and Editis among its members, along with the authors’ association SGDL and writers’ union SNAC, filed a complaint this week with a Paris court dedicated to intellectual property, the group said at a press conference on Wednesday.”
Just as U.S. publishers are seeking to hold Meta to account for allegedly using their work illegally, French publishers have discovered the same, reporting that Meta’s AI models can produce highly accurate replicas of their authors’ works, which suggests that their intellectual property was scraped and stolen.
This stems from the company’s aggressive push into AI development.
Reports suggest that following OpenAI’s rise in 2022, Meta CEO Mark Zuckerberg was eager to catch up, building a rival AI model to ensure that Meta remained a leader in the AI race.
In that push, Zuckerberg reportedly approved the use of material that Meta knew was copyrighted in order to build its language models.
As reported by The New York Times:
“Meta could not match ChatGPT unless it obtained more data. Some discussed paying $10 a book for the full licensing rights to new titles. They discussed buying Simon & Schuster, which publishes authors like Stephen King, according to the recordings. They also spoke about how they had summarized books, essays and other works from the internet without permission, and discussed sucking up more, even if that meant facing lawsuits. One lawyer warned of ‘ethical’ concerns about taking intellectual property from artists, according to the recordings, but was met with silence.”
Meta then reportedly sourced illegally obtained copyrighted material from a piracy platform known to be operating in violation of the law.
According to the NYT, the problem was that despite Meta’s apps having so many users, most of the content those users generate is not very useful for building AI models, because people generally don’t post long-form content in the apps, and their writing styles don’t align with the conversational nature of chatbots.
So for Meta to compete, it needed a new data source, and it found one in pirated books, which publishers are now detecting through their own methods.
Meta could now face a parade of litigation around the world, especially if these first cases result in compensation deals for the affected authors.
Certainly, if legal precedent is established, you can bet that every publisher in the world will be trawling whatever information they can find, sniffing out traces of their work and the payouts that could follow.
That could mean major penalties for Meta moving forward.
But how could OpenAI, a much smaller startup, build its own database the same way without facing the same copyright issues?
Well, it’s also facing a variety of legal challenges over the same thing.
Indeed, in many of these cases, OpenAI is being scrutinized for the exact same violations, with authors and publishers seeking recourse for the unauthorized use of their work.
Data is the lifeblood of large language models, and the company with the best data sources ultimately wins, as its systems produce better, more accurate, and more usable results based on their reference sets. Without that initial data source, the systems have nothing to go on. That’s why Meta, OpenAI, and others were willing to take such risks in building their LLMs.
At the same time, once the models are built, they exist, and from there they can be trained on supplemental data. It’s possible, then, that Meta viewed this as a necessary setup risk, which would enable it to subsequently use more of its own data logs to improve its models.
This is similar to xAI’s approach to its LLM: build the foundation, then use X posts to refine and update the model, providing real-time information.
So, this could cost Meta, but the cost may well be offset by the benefits it collects from commercializing its models.
In any case, it could take years for the courts to litigate each case, and by then there may be new legal frameworks governing LLM training and the use of such works.
You can bet that Meta is exploring every angle on this front.