AI: NVIDIA Sued for Scraping YouTube – Does Ethical Generative AI Really Exist?

Generative AI is once again at the heart of controversy. NVIDIA is alleged to have engaged in unethical practices to train its tools by downloading videos in bulk from YouTube and Netflix, among other sources, without authorization. These actions contradict the ethical approach the company publicly advocates. A class action lawsuit has been filed in response to these revelations.

Data scraped from various sources, including YouTube and Netflix

The story began with internal NVIDIA documents obtained by 404 Media. These documents focus particularly on the training data used for Cosmos, an NVIDIA AI model that is not publicly available. According to Slack messages presented by 404 Media, NVIDIA employees discussed downloading enormous quantities of video from sources such as YouTube and Netflix, using virtual machines with rotating IP addresses to avoid being blocked by YouTube. According to 404 Media, the volume of video downloaded was massive, amounting to “80 years of videos per day.”
Moreover, 404 Media noted that, according to these leaks, this practice of downloading tens of millions of videos was approved at the highest levels of the company.

Meanwhile, Google and Netflix confirmed to 404 Media that this kind of activity violates their terms of use. Additionally, according to 404 Media, NVIDIA also used HD-VG-130M, a dataset whose license does not permit commercial use.

These revelations are in stark contrast to NVIDIA’s publicly stated policy. The tech giant states that “AI must respect regulations regarding privacy and data protection,” and it must “operate transparently” to achieve “trustworthy AI.”

NVIDIA Claims Compliance with the Law

In response to this information, NVIDIA spoke to 404 Media and denied any illegal practices:

“We respect the rights of all content creators and are confident that our models and our research efforts are in full compliance with the letter and the spirit of copyright law. […] Copyright law protects particular expressions but not facts, ideas, data, or information. Anyone is free to learn facts, ideas, data, or information from another source and use it to make their own expressions. Fair use also protects the ability to use a work for a transformative purpose, such as model training.”

In other words, NVIDIA claims to be fully compliant with the law. However, questions about the ethics of these practices remain, especially since NVIDIA emphasizes the ethical side of its AI projects. Its partnership with Shutterstock is one example: there, the data is used with the consent of the creators.

A YouTuber launches a class action against NVIDIA

These revelations prompted YouTuber David Millette to file a lawsuit and launch a class action against NVIDIA, allowing others affected to join the suit. Interestingly, the lawsuit does not focus on copyright but on unfair competition. Millette argues that NVIDIA unfairly profited by using data created by YouTubers.

It remains to be seen if other content creators will join the lawsuit and what the legal outcomes will be.

Adobe, NVIDIA: Words vs. Actions

These developments naturally bring Adobe to mind. Since announcing its generative AI, Firefly, the company has emphasized the ethical side of the tool, which was trained on Adobe Stock images with artists’ consent. However, reporting by Bloomberg revealed that these training images also included content created with another generative AI, Midjourney, which was itself trained on images downloaded in bulk from the internet without artists’ permission.
In both cases, the lack of transparency by the companies is regrettable, to say the least, especially considering their public communication.

Meanwhile, it’s worth noting that some companies are adopting radically different models to avoid any legal or ethical issues. Golaem, recently acquired by Autodesk, had an interesting policy: training was done on the client’s side, using the client’s data. Golaem, therefore, did not use unauthorized data.

3DVF will closely monitor developments in the class action against NVIDIA.
