Elon Musk pushes for self-learning synthetic data as human data is exhausted

Started by Drfun, Jan 11, 2025, 09:12 AM

Drfun

Elon Musk recently said that AI companies have run out of human data to train their models. According to Musk, the "sum of human knowledge" available for AI training was exhausted last year. He suggested that tech firms will now need to use "synthetic" data, meaning data created by AI itself, to keep improving their systems.



AI models like GPT-4, which powers ChatGPT, learn by analyzing huge amounts of data from the internet. This helps them recognize patterns and predict things like the next word in a sentence. But Musk believes that, with all human data used up, the only way forward is to turn to AI-created data for training.
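To make "predicting the next word" concrete, here is a minimal sketch that counts which word follows which in a tiny text and picks the most frequent follower. This is only a toy illustration: GPT-4 is a large neural network, not a word-pair counter, and the names `train_bigrams`, `predict_next`, and `sample_text` are made up for this example.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for each word, how often every other word follows it."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def predict_next(following, word):
    """Return the most frequent follower of `word`, or None if it was never seen with one."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# Hypothetical training text, far smaller than anything a real model sees.
sample_text = "the cat sat on the mat and the cat slept"
model = train_bigrams(sample_text)
print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

The more text the counter sees, the better its guesses get, which is the intuition behind why training data matters so much.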

He explained that synthetic data could allow AI to generate its own content, like writing essays or creating ideas, and then "grade" itself. This process of self-learning would help build and refine new models. Companies like Meta, Microsoft, Google, and OpenAI have already been using some synthetic data in their AI work.
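The generate-then-grade idea can be sketched as a loop that proposes examples and keeps only the ones that pass a check. This is a toy under heavy assumptions, not how any of the companies named above do it: the "generator" here proposes simple sums and deliberately gets every third one wrong, and the "grader" can verify answers exactly, which is not possible for open-ended text such as essays. All function names are hypothetical.

```python
def generate_candidates(n):
    """Hypothetical generator: proposes n addition problems, with a deliberate error in every third answer."""
    candidates = []
    for i in range(1, n + 1):
        a, b = i, i + 1
        answer = a + b + (1 if i % 3 == 0 else 0)  # injected mistake
        candidates.append({"question": f"{a} + {b}", "a": a, "b": b, "answer": answer})
    return candidates

def grade(candidate):
    """Hypothetical grader: accepts a candidate only if its answer is arithmetically correct."""
    return candidate["answer"] == candidate["a"] + candidate["b"]

def build_synthetic_set(n):
    """Generate n candidates and keep only those the grader accepts."""
    return [c for c in generate_candidates(n) if grade(c)]

kept = build_synthetic_set(9)
print(len(kept))  # 6 of the 9 candidates pass the check
```

The quality of the resulting data set depends entirely on how trustworthy the grading step is.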

However, Musk also raised concerns about AI "hallucinations"—when models produce wrong or nonsensical answers. He pointed out that if AI creates its own data, it might generate these errors, making it hard to tell if the information is reliable.

Andrew Duncan of the UK's Alan Turing Institute agreed with Musk's view that human data is running out. He also warned that relying too heavily on synthetic data could lead to "model collapse," where the quality of AI output degrades: feeding models too much synthetic material can produce biased results and less creative, less original output.
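One way to get a feel for the loss of originality is a toy resampling experiment: each "generation" is built only from random samples of the previous generation's output, so any idea that is not sampled is gone for good. This is a simplified analogy rather than a simulation of real model training, and `resample_generations` and the `idea0`, `idea1`, ... corpus are invented for this sketch.

```python
import random

def resample_generations(corpus, generations, seed=0):
    """Track how many distinct items survive when each generation copies only from the previous one."""
    rng = random.Random(seed)
    current = list(corpus)
    diversity = [len(set(current))]
    for _ in range(generations):
        # The next generation sees nothing but samples of the current one.
        current = [rng.choice(current) for _ in range(len(current))]
        diversity.append(len(set(current)))
    return diversity

# Hypothetical starting corpus of 50 distinct "ideas".
corpus = [f"idea{i}" for i in range(50)]
history = resample_generations(corpus, generations=20)
print(history)  # distinct-idea count per generation; it can never go back up
```

Because lost items cannot reappear, the count of distinct ideas never increases from one generation to the next, and in practice it shrinks quickly unless fresh outside data is mixed back in.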

With more AI-generated content appearing online, there is a risk that this material will get mixed into future training data and further degrade the quality of later models. Control over high-quality data is also becoming a major legal issue, with creators and publishers demanding compensation for the use of their work in AI training.