Research & Open Source

Omneity Labs is a private R&D lab focused on low-resource language AI. It produces open-source tools, datasets, and models covering 340+ languages, including the data pipelines, tokenization methods, and evaluation infrastructure that don't yet exist for underrepresented languages.

Publications

Multilingual NLP · Phonetics · Script Alignment · Low-Resource Languages

Sawtone: A universal framework for phonetic similarity and alignment across languages and scripts

Lingua Posnaniensis, Vol. 67, Issue 1, pp. 165-200 (2025)

Processing text across different scripts presents significant hurdles in natural language processing, especially when dealing with non-standardized orthographies and informal writing systems common in low-resource languages. To address this, we introduce Sawtone, an integrated framework designed to enable consistent cross-script phonetic alignment and text normalization. At its heart is an architecture built for interoperability, combining a unified phonological feature space rooted in linguistic principles with modular, language-specific adapters. This structure allows for robust mapping and comparison between any pair of scripts.
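To make the idea of a shared phonological feature space with per-script adapters more concrete, here is a toy sketch. It is not Sawtone's implementation. The seven binary features, the FEATURES, PHONEMES and ADAPTERS tables, and the function names are all hypothetical. Each adapter maps graphemes of one script onto phonemes in the shared space, and two strings in different scripts are then compared with a standard weighted edit distance, where the substitution cost is the fraction of features that differ.

```python
# Toy illustration only: NOT the Sawtone implementation.
# Assumption: a tiny, hand-picked set of binary phonological features.
FEATURES = ("consonantal", "voiced", "labial", "coronal", "dorsal", "nasal", "continuant")

# Hypothetical shared feature space: phoneme -> binary feature vector.
PHONEMES = {
    "b": (1, 1, 1, 0, 0, 0, 0),
    "m": (1, 1, 1, 0, 0, 1, 0),
    "t": (1, 0, 0, 1, 0, 0, 0),
    "d": (1, 1, 0, 1, 0, 0, 0),
    "k": (1, 0, 0, 0, 1, 0, 0),
    "s": (1, 0, 0, 1, 0, 0, 1),
}

# Hypothetical script-specific adapters: grapheme -> phoneme in the shared space.
ADAPTERS = {
    "latin": {"b": "b", "m": "m", "t": "t", "d": "d", "k": "k", "s": "s"},
    "arabic": {"ب": "b", "م": "m", "ت": "t", "د": "d", "ك": "k", "س": "s"},
}


def to_features(text, script):
    """Map a string to feature vectors via its script's adapter.

    Characters with no adapter entry (e.g. Latin vowels here) are skipped,
    a crude simplification chosen only to keep the example small.
    """
    adapter = ADAPTERS[script]
    return [PHONEMES[adapter[ch]] for ch in text if ch in adapter]


def segment_distance(a, b):
    """Fraction of phonological features on which two segments differ."""
    return sum(x != y for x, y in zip(a, b)) / len(FEATURES)


def phonetic_distance(s1, script1, s2, script2):
    """Weighted Levenshtein distance computed over feature vectors.

    Insertions and deletions cost 1; substitutions cost segment_distance.
    """
    a = to_features(s1, script1)
    b = to_features(s2, script2)
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)
    for j in range(1, m + 1):
        dp[0][j] = float(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1,  # deletion
                dp[i][j - 1] + 1,  # insertion
                dp[i - 1][j - 1] + segment_distance(a[i - 1], b[j - 1]),  # substitution
            )
    return dp[n][m]


print(phonetic_distance("ktb", "latin", "كتب", "arabic"))  # 0.0
print(phonetic_distance("b", "latin", "م", "arabic"))      # 1/7: only "nasal" differs
print(phonetic_distance("b", "latin", "ك", "arabic"))      # 3/7: more distant
```

Because every script is projected into the same feature space, adding a new script in this sketch only means writing one more adapter table, and near-misses such as /b/ vs /m/ score as closer than /b/ vs /k/ regardless of which scripts the strings were written in.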


Conference Presentations

2024

Moroccan Darija and Generative AI

7th International Congress for Moroccan Arabic

University of Navarra, Spain

2024

TIM'24 Presentation

TIM'24 Conference

University Hassan II, Morocco

Interested in collaboration?

Research partnerships, compute collaborations, or contributing to low-resource language AI.

© 2026 Omar Kamali. All rights reserved. · Contact
Made in 🇲🇦 & 🇩🇪