
Apple, NVIDIA and Anthropic allegedly used YouTube videos illegally to train their AI models

A recent investigation by Proof News has uncovered that some of the world’s largest tech companies used transcripts from over 173,000 YouTube videos to train their AI models without obtaining permission. The dataset, compiled by EleutherAI, a nonprofit organization, included transcripts from over 48,000 YouTube channels and was utilized by companies like Apple, NVIDIA, and Anthropic.

This investigation sheds light on a troubling aspect of AI technology: much of its development relies on data taken from content creators without their consent or compensation.

The dataset consists solely of video transcripts, not the videos or images themselves. It draws on notable creators such as Marques Brownlee and MrBeast, as well as major news outlets like The New York Times, BBC, and ABC News.

Marques Brownlee expressed concern on social media, noting that his data, along with that of other creators, had been scraped from YouTube videos without proper authorization.

According to a Google spokesperson, YouTube CEO Neal Mohan has previously stated that using YouTube data to train AI models violates the platform’s terms of service. Apple, NVIDIA, Anthropic, and EleutherAI declined to comment.

AI companies have offered little transparency about the sources of the data used to train their models. Recently, Apple has been criticized for not disclosing the origin of the training data used for Apple Intelligence, its upcoming generative AI platform set to launch on millions of devices this year.

YouTube, renowned as the world’s largest repository of videos, offers transcripts, audio, video, and images, making it a highly desirable dataset for training AI models.

Earlier this year, when questioned by The Wall Street Journal, Mira Murati, OpenAI’s chief technology officer, avoided discussing whether YouTube videos were used to train Sora, OpenAI’s upcoming AI video generation tool. Murati mentioned that the data used was publicly available or licensed.

Alphabet CEO Sundar Pichai reiterated that using YouTube data to train AI models violates the platform’s terms of service.
