
Rapid AI development is outpacing our ability to assess it, as safety tests produce unreliable results

The rapid advancement of AI is creating major challenges for the traditional methods used to assess it. This presents a significant dilemma for businesses and public bodies trying to keep AI in check while navigating an evolving landscape.

According to a report by The Financial Times, some of the most commonly used benchmarks for assessing AI performance, such as accuracy and safety tests, are falling short as more sophisticated models enter the market.

Experts in AI development, testing, and investment say these traditional tools are easily manipulated and too narrow in scope to capture the complexity of the latest AI systems.

The intense competition in AI, fueled by investments from venture capitalists and tech giants like Microsoft, Google, and Amazon, has rendered many older benchmarks obsolete.

With new AI models and updates launching every month, existing evaluation standards are quickly falling out of date.

As generative AI becomes a top investment priority for most tech businesses, ensuring that AI products are trustworthy is becoming increasingly important.

Governments are also grappling with how to deploy and manage the risks associated with the latest AI models, prompting initiatives like bilateral arrangements on AI safety between countries.

Concerns are also arising over the integrity of public tests: an AI model's training data may inadvertently include the exact questions used in evaluations, undermining the reliability of those benchmarks.

Startups are now developing innovative approaches to evaluating emerging AI models. Some platforms offer bespoke tests set by individual users, providing a more direct reflection of user preferences. However, while such approaches may benefit individual users, they may offer limited utility for companies with specific AI model requirements.

Ultimately, businesses are advised to conduct internal testing and human evaluation to supplement traditional benchmarks, recognizing that selecting AI models is as much an art as a science. As AI continues to evolve, adapting evaluation methods to ensure accuracy and reliability remains paramount for effectively harnessing the potential of this transformative technology.
