AI companies have consumed the entire internet to train their models and are now running out of data

AI companies are facing a monumental challenge, one that could render pointless the billions of dollars Big Tech is investing in them: they need more internet.

In the race to develop ever-larger and more advanced language models, AI companies have consumed practically all of the open internet. They now face an imminent shortage of training data, the Wall Street Journal reported.

The shortage is pushing some firms to seek alternative sources of training data, such as publicly available video transcripts and AI-generated “synthetic data.” However, training AI models on AI-generated data is a problem in its own right: it raises the chance of the models hallucinating.

Discussions around synthetic data have also raised serious concerns about the consequences of training AI models on AI-generated output. Experts believe that relying too heavily on such data leads to digital “inbreeding,” which could eventually cause the model to collapse.

While companies like DatologyAI, founded by former Meta and Google DeepMind researcher Ari Morcos, are exploring ways to train large models with less data and fewer resources, most major players are experimenting with unconventional and contentious approaches to sourcing training data.

According to sources cited by the WSJ, OpenAI, for example, is considering training its GPT-5 model on transcriptions of publicly available YouTube videos. The company is already facing criticism for using such videos to train Sora, and it may face lawsuits from video creators.

Nevertheless, companies like OpenAI and Anthropic plan to address the data shortage by developing higher-quality synthetic data, although the specifics of their methods remain unclear.

Fears that AI companies will run out of data have been circulating for quite some time. Some, like Epoch researcher Pablo Villalobos, estimate that AI could exhaust its usable training data in the coming years, yet the prevailing sentiment is that significant breakthroughs could mitigate these concerns.

However, there is an alternative solution to this dilemma: AI companies could stop pursuing ever-larger and more advanced models, given the environmental toll of their development, including significant energy consumption and the reliance on rare-earth minerals for computing chips.
