The UAE government has introduced a new large language model (LLM) that is designed explicitly for Arabic and was developed in Abu Dhabi. The goal is to bring one of the world’s most widely spoken languages into the mainstream AI landscape.
Meet Jais – UAE’s LLM
Known as Jais, this open-source bilingual Arabic-English model was created through collaboration between Inception, a subsidiary of Abu Dhabi’s AI firm G42, Mohammed bin Zayed University of Artificial Intelligence, and Silicon Valley’s Cerebras Systems.
The developers assert that Jais is more accurate than existing large language models (LLMs) for Arabic. The model can be downloaded from the machine-learning platform Hugging Face.
The unveiling of Jais is a step toward encouraging the scientific and computing communities to pay more attention to non-English LLMs, much like initiatives seen in Japan and India, according to Andrew Jackson, the CEO of Inception.
In an interview with a local news agency, Jackson elaborated, “We envision Jais as highly valuable for generative applications, such as formulating responses to queries, generating documents, performing translations, composing emails, and even dispensing advice and recommendations.”
According to the collaborating companies, Jais captures the subtleties of various Arabic dialects and understands context and cultural allusions, which they say makes it more precise and contextually relevant than other models.
Named after the tallest peak in Ras Al Khaimah, UAE, the model was designed specifically for government use and for sectors spanning finance, energy, climate, and healthcare.
Mainly developed for the UAE government
Numerous public and private entities within the UAE have joined as launch partners for Jais, including the Ministry of Foreign Affairs, the Ministry of Industry and Advanced Technology, the Department of Health – Abu Dhabi, ADNOC, Etihad Airways, FAB, and e&, the technology conglomerate previously known as Etisalat.
Jais was trained on Condor Galaxy, billed as the “world’s largest AI supercomputer,” which G42 and Cerebras launched in July. The training used 116 billion Arabic tokens and 279 billion English tokens. The model continues to be expanded as more Arabic content is gathered to create new instruction sets.
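To put those token counts in proportion, the short sketch below works out the total and each language's share. It assumes the two reported figures cover the whole training set, which the article does not confirm; the function and variable names are illustrative only.

```python
# Token counts as reported in the article, in billions.
ARABIC_TOKENS_B = 116
ENGLISH_TOKENS_B = 279


def language_shares(arabic_b: float, english_b: float) -> dict[str, float]:
    """Return the combined token count (billions) and each language's share in percent."""
    total = arabic_b + english_b
    return {
        "total_b": total,
        "arabic_pct": round(100 * arabic_b / total, 1),
        "english_pct": round(100 * english_b / total, 1),
    }


# Assumption: Arabic and English are the only two components of the mix.
shares = language_shares(ARABIC_TOKENS_B, ENGLISH_TOKENS_B)
print(shares)
```

On these figures the combined total is 395 billion tokens, with Arabic making up a little under 30 percent of the mix.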
The need for local languages in LLMs and AI
Arabic is spoken by more than 400 million people worldwide, according to WorldData. It is the official language of 22 nations and is partially spoken in 11 others. However, its online presence remains limited: Arabic accounts for only around 1 percent of online content, according to data shared by the collaborating companies.
Jackson said that Jais would help raise this figure. He noted, “We’re initiating a project to collect more Arabic data from offline sources. This initiative has already been launched earnestly.”
He added, “We’re also exploring novel methods to synthesize Arabic content and translate existing English content into Arabic. Although we have a long way to go, optimism is crucial as we vigorously advance.”
Overall, the rise of generative AI has opened a new battleground within the tech sector, with companies striving to establish an early advantage and expand their presence in the field.
The availability of LLMs stands to help these companies, particularly as developers continue to enhance AI capabilities.
“Speed performance is a priority for developers, not only because it expedites the introduction of new models to the community, production, or market, but also because it empowers data scientists and machine learning researchers to implement and iterate various models promptly,” emphasized Jackson.