AI is powering the preservation of African languages

By Conrad Onyango

More African languages are finding their way onto Google’s online translation service, as the search giant uses artificial intelligence to learn closely related languages.

In 2024, the company made its largest foray into the translation of African languages, adding more new languages to the service than ever before.

“We’re using AI to expand the variety of languages we support. Thanks to our PaLM 2 large language model, we’re rolling out 110 new languages to Google Translate, our largest expansion ever,” said Google Translate Senior Software Engineer, Isaac Caswell.

This development marks a pivotal moment that promises not only to popularise indigenous languages but also to facilitate the development of a comprehensive local linguistic resource.

Nearly a quarter of the languages recently added to the platform are African, and Africa now has more than 50 languages on the translation service.

The new language additions include Dholuo, spoken by Kenya’s fourth largest ethnic group, the Luo, with more than 4.2 million speakers across several Nilotic ethnic groups found in Egypt, Sudan, South Sudan, Ethiopia, Northern Uganda, eastern DRC, and a part of Tanzania.

Another is Afar, a tonal language spoken by 2.3 million people in Djibouti, Eritrea, and Ethiopia. Google noted that of all the languages in this launch, Afar had the most volunteer community contributions.

Another addition is N’Ko, a standardized form of the West African Manding languages which unifies many dialects into a common language. Its unique alphabet was invented in 1949, and it has an active research community that develops resources and technology for it today.

Tamazight (Amazigh), a Berber language spoken across North Africa, is another important new addition. Although there are many dialects, the written form is generally mutually understandable. It is written in both the Latin and Tifinagh scripts, and Google Translate supports both.

“Google Translate breaks down language barriers to help people connect and better understand the world around them. We’re always applying the latest technologies so more people can access this tool,” Caswell explained.

Other African languages added this year include Fon, Kikongo, Ga, Swati, Venda and Wolof.

In 2022, Google added 24 new languages across the world using Zero-Shot Machine Translation, where a machine learning model learns to translate into another language without ever seeing an example.

Google said languages show an immense amount of variation, from regional varieties and dialects to different spelling standards, making it almost impossible to pick a “right” variety. Its approach prioritized the most commonly used varieties of each language.

“PaLM 2 was a key piece to the puzzle, helping Translate more efficiently learn languages that are closely related to each other. As technology advances, and as we continue to partner with expert linguists and native speakers, we’ll support even more language varieties and spelling conventions over time,” explained Caswell.

According to Google, these new languages represent more than 614 million speakers, opening up translations for around 8% of the world’s population. Some of these languages are major world languages with over 100 million speakers, while others are spoken by small Indigenous communities. A few of the languages have almost no native speakers but are undergoing active revitalization efforts.

Swahili is the most widely spoken African language, with the United Nations placing the number of speakers at over 200 million. In 2021, the UN designated July 7 as World Kiswahili Language Day.

This year’s event is hosted by Kenya under the theme “Kiswahili, Multilingual Education and the Enhancement of Peace.”

The organisers, the East African Community and the Kenyan government, said the annual event offers a platform for Kiswahili stakeholders to share knowledge, research-based evidence, best practices, experiences, and worldviews on the role of Kiswahili education in promoting a culture of peace.

The East African Community Deputy Secretary General (DSG) in charge of Infrastructure, Productive, Social and Political Sectors, Andrea Aguer Ariik, emphasized the significance of language diversity and unity in the EAC.

“Kiswahili, as a widely spoken language in East Africa, not only bridges communication gaps but also represents a common identity among the member states of the EAC,” said Ariik in a statement.

And it’s not Google alone playing in this field. Young African scholars studying abroad are also rising to the challenge with similar initiatives that leverage the power of AI.

Ife Adebara, a programmer and scholar at the University of British Columbia’s linguistics department, is among those leading initiatives to deploy AI in preserving local languages, with a focus on African languages.

Her project, Afrocentric Natural Language Processing, aims to raise awareness and develop tools and programs that are accessible to speakers of African languages such as Swahili and Zulu.

The project has already produced two tools that are available online: SERENGETI, a set of massively multilingual language models for Africa, and AfroLID, a neural language identification toolkit that covers 517 African languages and varieties, using a multi-domain web dataset manually curated from across 14 language families and five orthographic systems.

There are over 2,000 living languages in Africa. Nigeria is home to the most, with 522 languages, according to research firm Statista.

The research firm places Cameroon (with 275 languages) and the Democratic Republic of Congo (with 217) second and third among the countries on the continent with the most languages.

bird story agency

During 2024, Google made significant strides in integrating more African languages into its Google Translate service using artificial intelligence. This expansion, the largest ever for the platform, saw the addition of 110 new languages, with nearly a quarter being African languages, bringing the total to over 50. Notable additions include Dholuo, Afar, N'Ko, and Tamazight. AI, specifically the PaLM 2 model, played a crucial role in learning closely related languages efficiently.

This development not only popularizes indigenous languages but also helps create a comprehensive local linguistic resource. The new languages cover approximately 614 million speakers, accounting for about 8% of the world's population. Google aims to support even more language varieties and spelling conventions over time as technology advances. Google’s efforts align with broader cultural initiatives, such as the celebration of World Kiswahili Language Day in Kenya.

In parallel, young African scholars are also leveraging AI to preserve local languages. For instance, Ife Adebara’s Afrocentric Natural Language Processing project at the University of British Columbia focuses on developing tools and programs for African languages. The project has created significant resources such as the SERENGETI multilingual models and AfroLID, a neural language ID toolkit.

Africa is home to over 2,000 living languages, with Nigeria, Cameroon, and the Democratic Republic of Congo having the highest language diversity. These efforts by Google and others signify a powerful move towards preserving and promoting linguistic diversity on the continent.
