A Wordle for the Worldle

By Omar Kamali • March 04, 2026 • In AI, Multilingual, NLP, Open Source, Games • Project @wikilangs

I built a word game for more than 300 languages, each drawing on its own Wikipedia as the source. Here's the thing nobody tells you: building a simple word game for most of these languages meant building things that didn't exist.

But first, let's take a step back. Did you know your language probably has a Wikipedia, with hundreds or thousands of articles written by your community and freely available? Most people have no idea it exists. But what if you want to search in your language? Autocomplete? Autocorrect? And don't even get me started on LLMs.

Until recently, your language most probably lacked the bare minimum needed for the digital age.

Not exotic research infrastructure, just basic things. A clean word list. A tokenizer that actually understands how that language breaks into meaningful units. A vocabulary with frequency data so you know which words people actually use versus which words only appear in one article about medieval history. These resources are surprisingly hard to come by. Sometimes the work simply hasn't been done.
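To make the frequency point concrete, here is a minimal sketch of how "words people actually use" can be told apart from "words that only appear in one article". It assumes a toy corpus of three made-up articles and a naive regex tokenizer; the names `articles`, `tokenize`, `term_freq`, `doc_freq` and `idf` are illustrative only and are not the Wikilangs API or pipeline.

```python
import math
import re
from collections import Counter

# Hypothetical toy corpus: one string per article (stand-in for a Wikipedia dump).
articles = [
    "the castle was built in the medieval period",
    "the river flows past the old castle",
    "medieval trebuchet trebuchet trebuchet siege",
]

def tokenize(text):
    # Naive tokenizer: lowercase, then grab runs of word characters.
    # Real languages usually need far more care than this.
    return re.findall(r"\w+", text.lower())

term_freq = Counter()  # total occurrences across all articles
doc_freq = Counter()   # number of articles that contain the word

for article in articles:
    tokens = tokenize(article)
    term_freq.update(tokens)
    doc_freq.update(set(tokens))  # count each word once per article

def idf(word):
    # Smoothed inverse document frequency: words found in few articles score higher.
    return math.log((1 + len(articles)) / (1 + doc_freq[word])) + 1

for word in ("the", "castle", "trebuchet"):
    print(word, term_freq[word], doc_freq[word], round(idf(word), 3))
```

In this toy data "trebuchet" occurs more often than "castle" in raw counts, but only in a single article, so its document frequency is lower and its IDF is higher. Looking at both numbers is one simple way to keep a word list from filling up with single-article vocabulary.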

I learned this the hard way in my journey to build Sawalni, the first LLM for Darija (Moroccan Arabic). I rediscovered and built the entire stack, from a basic inventory of the language's words and language identification to semantic and phonetic embeddings, as well as standardization, translation and transliteration models. What are all these obscure things, you might ask? I didn't know at the beginning either, and had to learn while on a tight schedule to ship.

Learn more about NLP basics with spaCy.

Interestingly, Darija's destiny was deeply intertwined with other languages, whether because they confuse the system and I had to train models to distinguish them, or because there are synergies I am trying to leverage. For the uninitiated, another interesting observation is that most underserved languages share similar characteristics and challenges. Data is limited, and most of it is informal text from social networks. These languages are also often influenced by foreign languages (many loanwords), written in a mix of local and foreign scripts (also called digraphia), and lack a standard or a formal dictionary.

Either way, I ended up with many assets for these languages even though I originally aimed only at Darija. And that made me think: what if I could build the same foundation for every language with a Wikipedia? Maybe even build a flywheel between the Wiki community and this AI initiative?

I brought that idea to the team at Featherless AI. They didn't just listen: they went ahead and generously backed it.

Wikilangs is the result: open NLP models for every Wikipedia language edition, freely available, installable in one line. It includes tokenizers in four vocabulary sizes, n-gram language models, word embeddings, full vocabularies with frequency and IDF data, and utilities for extending existing large language models with vocabulary for any of those 300+ languages so you can build your own community LLM. Everything runs without a GPU. Everything uses the same API regardless of which language you're working in.
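For readers who haven't met an n-gram language model before, here is a minimal sketch of the idea: a character bigram model with add-one smoothing, trained on a handful of made-up words. Everything in it (the word list, the `^`/`$` boundary markers, the `log_prob` helper) is an assumption for illustration. It is not how the Wikilangs models are built or exposed, only the general technique.

```python
import math
from collections import Counter

# Hypothetical toy word list; a real model would train on a full vocabulary.
words = ["salam", "salama", "kalam", "alam", "malak"]

BOS, EOS = "^", "$"  # word-boundary markers (illustrative choice)

bigrams = Counter()   # counts of (previous char, next char) pairs
contexts = Counter()  # counts of each previous char
alphabet = {EOS}      # symbols that can follow a character

for w in words:
    chars = [BOS] + list(w) + [EOS]
    alphabet.update(w)
    for a, b in zip(chars, chars[1:]):
        bigrams[(a, b)] += 1
        contexts[a] += 1

def log_prob(word):
    # Sum of log P(next char | previous char), with add-one smoothing
    # so unseen character pairs get a small non-zero probability.
    chars = [BOS] + list(word) + [EOS]
    total = 0.0
    for a, b in zip(chars, chars[1:]):
        total += math.log((bigrams[(a, b)] + 1) / (contexts[a] + len(alphabet)))
    return total

# A word-like string should score higher than a scrambled one.
print(log_prob("salam"), log_prob("mlksa"))
```

A model like this needs nothing but counting and a logarithm, which is why this kind of tooling can run without a GPU. Scores of this sort are one plausible way to judge whether a candidate string "looks like" a word in a given language.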

Wikilangs exists as an open-source initiative because of Featherless AI's continued support, and I want to be direct about that: this kind of work doesn't happen through individual heroics alone. It takes a village.

For Arpitan, Bashkir, Finnish, Somali, Tachelhit, Maltese, Mossi, Quechua, Xhosa and many others, it's the same interface across different languages, all with first-class treatment, and a fun playground to explore our global heritage.

The Wikipedia communities behind these 358 languages put extraordinary work into something they gave, and keep giving, freely to the world. Wikilangs is an attempt to close that loop: to take what they built and turn it into tools that flow back to their communities directly, not just into the training pipelines of companies they'll never hear from.

Oh, and don't forget the game itself, perhaps the first game some communities have had in their own language. Come beat me on the leaderboard!

Explore languages at Wikilangs.org, or experience the game in your own language at games.wikilangs.org.

© 2026 Omar Kamali. All rights reserved.