    Opinion: Localizing AI through languages is a 2025 imperative

    The “Predictions for Global Development” series offers insight from thought leaders for the year ahead. In the field of AI, expect to see large language models developed in languages other than English and Mandarin.

    By Uyi Stewart // 16 December 2024

    As we look toward 2025, linguistic inclusivity in artificial intelligence development will become increasingly urgent.

    As AI globally transforms life and society, two things are confounding. The first is that the few big technology companies driving AI advancement pay lip service to “AI serving the needs of everyone, everywhere.” The second is that the global development community is embracing an exclusive and unrepresentative AI technology that is driven predominantly by English and Mandarin, to the detriment of over 7,000 languages spoken by about 5 billion people across the community it seeks to serve.

    When a woman, who is part of a marginalized community, with low levels of education and living in poverty, wakes up to find a lump in her breast, she may not know what her next step should be. She is fluent in her native language but does not speak English or Mandarin. How can AI help her find her way through the daunting maze from medical diagnosis to treatment? Until AI can “speak” the languages of these vulnerable communities, its potential to advance the Sustainable Development Goals, reduce global disease burdens, and address global inequities will remain limited.

    Given this, I believe that in the coming year, we can expect a growing recognition of the critical gaps in current AI language approaches.

    Representation matters, and for global development professionals working in the field of AI, I would like to outline four key considerations for localizing AI through languages.

    1. There are no shortcuts

    We will likely see increased momentum by philanthropies toward a more nuanced approach to making AI work for marginalized and vulnerable communities. Some are proposing replicating the pharma vaccine development model by investing in Big Tech to create adaptable large language models, or LLMs.

    As a result, they are making investments in big tech companies that are driving AI systems in English or Mandarin to create modules in their large language model development pipelines that can be adapted to the languages across their geographical footprint. This is just the start, however. The resulting LLMs will remain insufficient to support the actual implementation of interventions (usable solutions) for the SDGs because they cannot truly “speak” the languages of billions of people in local communities.

    Moreover, a closer look at vaccine development models shows that they benefit marginalized and vulnerable communities because of initiatives like Gavi, the Vaccine Alliance, anchored by public-private partnership. A similar platform is needed to make AI developed by Big Tech work for SDG interventions.

    By de-risking the investments required for such a platform to localize and adapt these large language models, philanthropies will not only help to leverage their technical efforts but will, eventually, help to align the incentives of Big Tech wanting to make AI work for everyone, everywhere, with the aspirations of governments seeking to enable their communities to contribute to and benefit from AI technology.

    The saying “a word is enough for the wise” applies here, as more governments are forming AI committees that are prioritizing content in local languages tailored to their contexts and priorities; one example is the Nigerian Multilingual Large Language Model. This will only intensify as AI continues to trend.

    2. Reimagining data collection

    The coming year will likely highlight the need to scrutinize the quality, representativeness, and completeness of the underlying data on which the current LLMs are trained. The acquisition of massive amounts of textual or digitized data, primarily from the internet, leads to biases in AI models.

    This also creates a new kind of digital divide: a data divide whereby languages available online are termed high-resource, while those absent are called low-resource languages. This, invariably, exposes a gaping hole in the development of AI models to cater to languages that are considered low-resource.

    As I’ve previously stated in a Devex article, a revised approach is required, one that promotes the digitization of these languages (bringing them online) through data collection playbooks aimed at capturing speech data for these predominantly oral languages.

    For example, data.org is partnering with Karya Inc, with support from the Mastercard Center for Inclusive Growth, to create a playbook for the digitization of 10 languages in India (Bhojpuri, Konkani, Dogri, Kashmiri, Sindhi, Manipuri, Tripuri, Mizo, Bodo, and Santali) spoken by over 100 million people. Similarly, in Africa, data.org is partnering with Data Science Nigeria, University of Lagos, and University of Pretoria, starting with two Pan-African languages, Yoruba and Hausa, spoken by over 100 million people.

    For many of these undigitized oral languages, tones are meaning-bearing. On most keyboards, the keystrokes for these semantic markers are not present. In many African languages, you find words that are spelled the same way but have different meanings based on the tonal inflection. If these distinctions are not preserved in the corpus for training the LLMs, the resulting models will be inaccurate and insufficient. This underscores the need to collect speech data to capture these critical characteristics of languages, which are lost when relying on internet data and are missing altogether for languages that are not digitized in the first place.

    3. Viewing AI as a sociotechnical system

    Expect growing discourse on designing AI systems based on social and technical considerations to benefit society.

    When AI systems are developed mainly on text data, they miss critical social features of language including worldviews, beliefs, culture, and lived experiences. Currently, AI model developers use prompt engineering (i.e., identifying the variations in the questions that people can ask of the model) combined with text augmentation, human alignment, etc., to improve their models to encode the social features of language required for effective communication.

    Unfortunately, because of their poor understanding of idioms, proverbs, symbolisms, and nuanced communication that are crucial in many undigitized languages, these AI systems struggle with communicative performance, i.e., communication that effectively and appropriately meets the needs of stakeholders in each situation.

    As such, there needs to be a concerted and intentional effort on the development of language corpora for AI systems that goes beyond the current practice of text augmentation and human alignment, to ensure that these large language models capture and encode the variations in different contexts such as historical, regional, cultural, and sociolinguistic.

    Recently, I interviewed a few people to take care of a sick relative in my native country, Nigeria. One of the applicants opened with a special form of greeting that is part of the culture of the Edo Kingdom in southern Nigeria. Greetings are encoded into words based on kinship or lineage. This is not obvious to outsiders but when this applicant greeted me in my own special word form, I felt a kindred spirit and trust right away. This is communicative performance and brings me to my final consideration.

    4. Building local capacity to support the development of AI

    In the coming year and beyond, we will see an increased focus on developing AI knowledge bases from personal or lived experiences. Communities should have the capacity to develop AI solutions that reflect their specific contexts.

    Initiatives like data.org’s Capacity Accelerator Network can fill gaps in linguistic corpora for LLMs and help build trust with communities. We must all be intentional about democratizing AI development by empowering local communities to capture and contribute local datasets that honor and incorporate Indigenous knowledge, among other things.

    Localizing data, language, and skills is crucial to ensuring AI meets the needs of billions who don't speak English or Mandarin, while also creating more robust and accurate AI models.

    Localizing AI is not just a necessity, it's also a transformative step toward equity and inclusion for all. A word is enough for the wise.
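
The point in section 2 about tone marks being lost at the keyboard can be made concrete with a minimal Python sketch. It is not part of the author's work. It assumes that grave and acute accents serve as tone marks (as in common Yoruba orthography) and uses four frequently cited Yoruba spellings whose English glosses are illustrative only and should be checked with a native speaker. The names `TONE_MARKS`, `LEXICON`, `strip_tone_marks`, and `collapse` are hypothetical.

```python
import unicodedata
from collections import defaultdict

# Assumption of this sketch: combining grave (U+0300) and acute (U+0301)
# accents act as tone marks, as in common Yoruba orthography.
TONE_MARKS = {"\u0300", "\u0301"}


def strip_tone_marks(word: str) -> str:
    """Simulate text typed without tone marks, keeping other diacritics."""
    # NFD splits each letter into a base character plus combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recomposes what is left (for example, o + dot below).
    return unicodedata.normalize("NFC", kept)


# Illustrative lexicon: four spellings that differ only in tone marks.
# The glosses are commonly cited examples, not verified linguistic data.
LEXICON = {
    "\u1ecdk\u1ecd": "husband",
    "\u1ecdk\u1ecd\u0301": "hoe",
    "\u1ecdk\u1ecd\u0300": "vehicle",
    "\u1ecd\u0300k\u1ecd\u0300": "spear",
}


def collapse(lexicon: dict) -> dict:
    """Group glosses by the spelling that remains once tone marks are gone."""
    groups = defaultdict(list)
    for word, gloss in lexicon.items():
        groups[strip_tone_marks(word)].append(gloss)
    return dict(groups)


collapsed = collapse(LEXICON)
for form, glosses in collapsed.items():
    print(form, glosses)
```

Under these assumptions, four distinct entries collapse into a single written form, so a model trained on the stripped text has no way to tell the meanings apart. Speech data, or text collected with tone marks intact, keeps the distinction.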

    The views in this opinion piece do not necessarily reflect Devex's editorial views.

    About the author

      Uyi Stewart

      Uyi Stewart is the chief data and technology officer at data.org, where he oversees the delivery of programmatic initiatives to accelerate the power of data and AI to solve some of our pressing global challenges. He holds a doctorate in Linguistics, with about 25 years’ experience advancing data for social impact in both public and private sectors.
