Omar Kamali

Independent AI Researcher & Builder

I build AI for the languages the industry ignores.

I grew up in Morocco, speaking Darija in a world where every piece of technology spoke back in someone else's language. That experience is the origin of everything I build.

In 2023, I built Sawalni, the first conversational AI for Moroccan Darija, supporting both Arabic and Latin script. Corpus, pipeline, model architecture: all built from scratch, with no prior art to lean on. Thousands of users. Presented at international conferences on Moroccan Arabic linguistics. Featured on Moroccan national television.

I run Omneity Labs in a personal capacity, as a private R&D lab focused on low-resource language AI. Current research: training base language models for underrepresented language families, along with the data pipelines, tokenization methods, and evaluation infrastructure that don't yet exist for these languages. I've published peer-reviewed work on multilingual phonetics, and I maintain open-source datasets and models used by researchers working on similar problems.

I also lead GenAI engineering at Blue Yonder, where I build and deploy LLM systems at enterprise scale. That work informs how I think about training infrastructure and production deployment, not just research prototypes.

My first line of code was BASIC on an Amstrad CPC at age six. I built my first website at nine (it's still online). My first serious research question: why can't I talk to a computer in the language I actually think in? I'm still working on the answer.

I'm interested in collaborating with researchers, communities, and organizations working on language equity, multilingual AI, and open-source NLP. If that's you, I'd like to hear from you.

Work

Sawalni

The first conversational AI for Moroccan Darija & Amazigh. Arabic, Latin, and Tifinagh script. Built from scratch.

sawalni.com

WikiLLM

In development

Open base models for low-resource language families, trained on Wikipedia data.

Coming 2026

wikilangs.org

NLP models derived from 340+ Wikipedia language editions to bootstrap LLM development.

HuggingFace

Open source

Tools, datasets, and infrastructure for multilingual NLP: tokenizers, embeddings, training frameworks, and data pipelines.

github.com/omarkamali

Interested in collaboration?

Or find me on X · HuggingFace · LinkedIn

© 2026 Omar Kamali. All rights reserved. · Contact
Made in 🇲🇦 & 🇩🇪