
Google has launched WAXAL, a groundbreaking open speech dataset aimed at expanding artificial intelligence (AI) voice technologies to African languages long excluded from the digital ecosystem.
The initiative, unveiled on February 2, supports 21 African languages, including Dholuo (Luo) and Kikuyu, and is expected to benefit more than 100 million people across the continent.
Developed over three years, WAXAL—named after the Wolof word for “speak”—is a collaborative project led by Google Research Africa in partnership with several African academic and research institutions.
The dataset represents one of the most comprehensive efforts yet to address Africa’s chronic lack of high-quality speech data, a key barrier to building effective voice-enabled technologies.
At its core, WAXAL contains 11,000 hours of speech data collected from nearly two million contributors. It also includes approximately 1,250 hours of transcribed audio for automatic speech recognition (ASR) and more than 20 hours of high-quality studio recordings for text-to-speech (TTS) systems.
To ensure real-world relevance, the data was gathered from everyday speech.
Participants were asked to describe images in their native languages, capturing natural patterns of expression rather than scripted or formal speech.
East African languages feature prominently in the dataset.
These include Dholuo, spoken by the Luo communities in Kenya, Tanzania and Uganda, and Kikuyu, the language of Kenya’s largest ethnic group.
They appear alongside Swahili and Luganda, as well as other languages such as Hausa, Yoruba, Igbo and Acholi, spanning large parts of Sub-Saharan Africa.
Google says the expansion responds to a persistent imbalance in AI development.
Although Africa is home to more than 2,000 languages, most AI voice systems have historically focused on English and a handful of global languages.
“The main barrier to creating helpful voice technologies for this region has been a lack of accessible, high-quality speech data,” Aisha Walcott-Bryant, head of Google Research Africa, said during the launch.
She described WAXAL as both a technological and cultural intervention.
“The ultimate impact of WAXAL is the empowerment of people in Africa. This dataset provides the critical foundation for students, researchers and entrepreneurs to build technology on their own terms, in their own languages, finally reaching over 100 million people,” she said.
Perry Nelson, Google Ghana’s site lead and co-author of the launch announcement, emphasised the community-driven nature of the project.
“WAXAL is a collaborative achievement, powered by the expertise of leading African organisations who were essential partners,” he noted.
Key collaborators include Makerere University in Uganda, the University of Ghana, and Digital Umuganda in Rwanda.
Additional partners contributed studio recordings, while the African Institute for Mathematical Sciences (AIMS) supported multilingual development for future expansions.
Importantly, the African institutions involved retain ownership of the data they collected, an arrangement intended to keep the project in line with ethical data practices and to support community empowerment.
For Kenya, where Luo and Kikuyu are spoken by millions, the implications are significant.
African languages have often been marginalised in digital platforms, limiting access to AI-powered tools such as voice assistants, speech-to-text services and digital public services.
Google believes WAXAL could help transform sectors including education, agriculture and healthcare, particularly in communities with limited English proficiency, by enabling localised and voice-based access to information.
The dataset has been released under a Creative Commons licence and is freely available to developers worldwide through Hugging Face, a leading AI collaboration platform.
The move aligns with Google’s broader commitments in Africa, including investments in public data infrastructure and training future AI developers.
While experts have widely welcomed WAXAL as a milestone for localised AI, some caution that challenges remain—particularly around data privacy and the need to represent Africa’s vast linguistic diversity more fully.
As Walcott-Bryant put it, “This project is about more than technology. It’s about giving Africans a voice in the AI future.”
With WAXAL, Google is not merely adding languages to its systems—it is opening new pathways toward equitable digital participation across Africa.