Sawalni
First LLM for Moroccan Darija
Sawalni is the first large language model built specifically for Moroccan Darija, supporting both Arabic and Latin scripts. It serves as an AI assistant that understands the linguistic subtleties and cultural context of Morocco, making AI accessible to 30+ million Darija speakers.
Related posts
Beyond Tokenization: The Four Taxes and the Path Forward
The compounding tax stack low-resource languages carry, why vision encoders might hold the key, and the open research questions.
The Hidden Tax Your LLM Pays for Bad Tokenization
How bad tokenization forces language models to waste capacity on reconstruction instead of reasoning.
Tokenization is Killing Our Multilingual LLM Dream
Why tokenization is the hidden bottleneck blocking truly multilingual AI — lessons from building Sawalni and Wikilangs.
Why I stopped trusting the official Wikipedia dataset, and what I did about it
It all started with a DM from a friend, a member of and contributor to the Moroccan Wikipedia community: "Are you using the current version of Wikipedia? The official dataset is severely outdated. We added so many cool articles nowhere on huggingface." He was right. I was running a 2023 snapshot in 2025.
A Wordle for the Worldle
I built a word game for more than 300 languages, each drawing on its own Wikipedia as the source. Here's the thing nobody tells you: building a simple word game for most of these languages meant building things that didn't exist.
Introducing Wikipedia Monthly: Fresh, Clean Wikipedia Dumps for NLP & AI Research
Announcing Wikipedia Monthly, an always-fresh dataset to support research for low-resource languages.
2024: A Year of Growth, Innovation, and Community
As we leave 2024 behind, I found myself reflecting over the holidays on a transformative year that reshaped my grasp of technology's role in human connection.
Shaping the Future with Sawalni: The Dawn of Moroccan AI
I've been asked multiple times, "Why are you creating a Moroccan AI?" Today I want to share the story behind Sawalni, the first AI in history to speak our beautiful Moroccan Darija, with all of you.