This project, titled “Open-Source Datasets and Models for Persian Text-to-Speech” and spearheaded by ZabanZad.ai, seeks to fill a significant gap in Persian speech technology: the lack of production-grade text-to-speech (TTS) models built specifically for Persian. Despite the language’s cultural richness and wide usage, Persian support is lacking in AI products and platforms such as Google Translate, Google Maps voice search, and Alexa Built-in devices.

The project begins by leveraging existing resources, such as datasets available on Kaggle (a popular data science community platform), to kickstart model development. Training on these datasets produces a base version of the Persian TTS model. This approach shortens the prototyping phase and enables rapid testing and fine-tuning to improve the model’s quality over time.
On the technical side, the project’s principal model is VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech), a neural architecture that combines a conditional variational autoencoder, normalizing flows, and adversarial training to generate speech waveforms directly from text. Its flexibility and its ability to produce expressive, distinctive voices make it well suited to synthesizing a wide range of Persian accents and dialects.
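For readers who want the mechanics behind this description, the training objective from the original VITS paper (Kim, Kong, and Son, ICML 2021) is summarized below. This is a sketch of the published formulation, not the project’s training configuration; the loss weights used in practice are omitted. Here $x_{mel}$ is the target mel spectrogram and $\hat{x}_{mel}$ the mel spectrogram of the generated waveform, $q_\phi$ is the posterior encoder over the linear spectrogram $x_{lin}$, $p_\theta$ is the text-conditioned prior given the text $c_{text}$ and alignment $A$, $\mathcal{L}_{dur}$ is the duration-predictor loss, $G$ is the decoder (generator), and $D^l$ are the $N_l$ features of the $l$-th of $T$ discriminator layers.

```latex
\begin{align}
\mathcal{L}_{vae} &= \mathcal{L}_{recon} + \mathcal{L}_{kl} + \mathcal{L}_{dur}
                    + \mathcal{L}_{adv}(G) + \mathcal{L}_{fm}(G) \\
\mathcal{L}_{recon} &= \lVert x_{mel} - \hat{x}_{mel} \rVert_1 \\
\mathcal{L}_{kl} &= \log q_\phi(z \mid x_{lin}) - \log p_\theta(z \mid c_{text}, A),
  \qquad z \sim q_\phi(z \mid x_{lin}) \\
\mathcal{L}_{adv}(D) &= \mathbb{E}_{(y,z)}\big[(D(y) - 1)^2 + D(G(z))^2\big] \\
\mathcal{L}_{adv}(G) &= \mathbb{E}_{z}\big[(D(G(z)) - 1)^2\big] \\
\mathcal{L}_{fm}(G) &= \mathbb{E}_{(y,z)}\Big[\sum_{l=1}^{T} \frac{1}{N_l}
  \lVert D^l(y) - D^l(G(z)) \rVert_1\Big]
\end{align}
```

The generator side minimizes $\mathcal{L}_{vae}$ while the discriminator minimizes $\mathcal{L}_{adv}(D)$; the adversarial and feature-matching terms are what push the output toward natural-sounding, expressive audio.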
Next, the project fine-tunes XTTS, an advanced multilingual TTS model. At the time of writing, XTTS supports 16 languages, and this project aims to add Persian as the 17th. Fine-tuning XTTS involves adapting the model to Persian’s unique linguistic characteristics so that the synthesized speech preserves the natural intonation of spoken Persian.
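Fine-tuning a model such as XTTS on Persian requires a transcribed dataset split into training and evaluation sets. The sketch below assumes an LJSpeech-style metadata layout (`id|text|normalized_text`, one utterance per line); Coqui TTS, which distributes XTTS, ships an `ljspeech` dataset formatter for this layout, but whether the project uses it is an assumption. The helper names, the 10% evaluation split, and the clip IDs are hypothetical.

```python
import random


def to_metadata_line(clip_id: str, text: str) -> str:
    """Render one utterance as an LJSpeech-style line: id|text|normalized_text."""
    if "|" in clip_id or "|" in text:
        raise ValueError("'|' is the column separator and cannot appear in fields")
    # Collapse runs of whitespace (including newlines) into single spaces.
    # str.split() does not split on the zero-width non-joiner (U+200C),
    # so Persian compounds joined with it stay intact.
    text = " ".join(text.split())
    # Hypothetical: the raw text is reused as the normalized column; a real
    # pipeline would put the output of its text normalizer here.
    return f"{clip_id}|{text}|{text}"


def split_train_eval(rows, eval_fraction=0.1, seed=0):
    """Shuffle (clip_id, text) pairs reproducibly and hold out an eval set."""
    rows = list(rows)  # copy so the caller's list is not reordered
    if len(rows) < 2:
        raise ValueError("need at least two utterances to make a split")
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    random.Random(seed).shuffle(rows)
    # Keep at least one utterance on each side of the split.
    n_eval = min(len(rows) - 1, max(1, round(len(rows) * eval_fraction)))
    return rows[n_eval:], rows[:n_eval]


def render_metadata(rows) -> str:
    """Join metadata lines into the contents of a metadata file."""
    return "".join(to_metadata_line(cid, txt) + "\n" for cid, txt in rows)


if __name__ == "__main__":
    utterances = [(f"fa_{i:04d}", "سلام دنیا") for i in range(20)]
    train, evaluation = split_train_eval(utterances, eval_fraction=0.1)
    print(len(train), len(evaluation))  # 18 2
    print(render_metadata(evaluation), end="")
```

A fixed seed keeps the split reproducible across runs, so evaluation scores from successive fine-tuning experiments are measured on the same held-out utterances.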
The development plan then proceeds in three phases: curating and processing Persian TTS datasets, collecting speech data that covers a variety of Persian accents, and building community-driven applications on top of the resulting models.
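Curating Persian text for TTS usually starts with normalization, because scraped Persian text often mixes in Arabic letter forms, several digit scripts, and inconsistent zero-width non-joiners (ZWNJ, U+200C). The following is a minimal sketch of such a step; the specific rules (mapping Arabic yeh, alef maksura, and kaf to their Persian forms, deleting tatweel, converting digits to ASCII, cleaning up ZWNJ) are our assumptions about a reasonable pipeline, not the project’s documented procedure, and `normalize_persian` is a hypothetical helper name.

```python
import re
import unicodedata

# Hypothetical normalization table (an assumption, not a project spec):
# Arabic letter forms that often appear in scraped Persian text are mapped
# to their Persian counterparts, and tatweel is deleted.
_CHAR_TABLE = {
    0x064A: "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    0x0649: "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
    0x0643: "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    0x0640: None,      # ARABIC TATWEEL (purely typographic elongation)
}
# Persian (U+06F0..U+06F9) and Arabic-Indic (U+0660..U+0669) digits -> ASCII,
# so a later number-to-words step only has to handle one digit script.
for _d in range(10):
    _CHAR_TABLE[0x06F0 + _d] = str(_d)
    _CHAR_TABLE[0x0660 + _d] = str(_d)

ZWNJ = "\u200c"  # zero-width non-joiner, used inside Persian words


def normalize_persian(text: str) -> str:
    """Apply a small, assumed set of Persian text normalization rules."""
    # NFKC folds Arabic presentation forms (e.g. ligatures) into base letters.
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(_CHAR_TABLE)
    # Any run of whitespace/ZWNJ that contains whitespace becomes one space:
    # a ZWNJ next to a space has no joining effect.
    text = re.sub(r"[\s\u200c]*\s[\s\u200c]*", " ", text)
    # Repeated ZWNJs inside a word collapse to a single one.
    text = re.sub("\u200c+", ZWNJ, text)
    # Drop leading/trailing spaces and stray ZWNJs.
    return text.strip(" " + ZWNJ)


if __name__ == "__main__":
    print(normalize_persian("\u06f1\u06f4\u06f0\u06f2"))  # 1402
```

Diacritics are deliberately left untouched, and digits still need a separate verbalization step (turning "1402" into words) before the text is suitable as TTS input.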
Collaboration with the University of New Haven’s Secure and Assured Intelligent Learning (SAIL) Lab gives the project a strong technical foundation. The lab’s expertise in natural language processing and innovative fine-tuning methodologies supports the project’s goal of developing high-quality Persian TTS models.
This project encapsulates a rigorous methodology, seasoned partnership, advanced tech stack, and strong community engagement. Its successful execution can pave the way for breakthrough advancements in Persian language technology.
The Problem We Aim to Solve:
- Research Gap: There is currently no centralized source of comprehensive information on Persian language and speech models, leaving a significant knowledge gap in this field.
- Limited Support: Major technology platforms provide insufficient support for the Persian language, impeding the development of Persian language-based products and services.
- Learning Tools: Existing Persian language learning tools are often hard to access and unengaging, especially for young Iranians living abroad who are looking for effective learning resources.
- Language Preservation: By improving the datasets used to train Persian Language Models, we contribute to the preservation of diverse Iranian languages, ensuring they remain vibrant and accessible.
- Entrepreneurial Challenges: Persian-speaking entrepreneurs encounter substantial language barriers, restricting their ability to communicate effectively in the global market, thus limiting their business opportunities.
Our Innovative Solutions:
- Multilingual Translator App: Our application offers speech-to-text and text-to-speech capabilities, real-time translation between Persian and English, a specialized business vocabulary, and sensitivity to cultural nuances.
- Iranian Storyteller: This interactive app generates captivating stories in various Iranian languages, facilitating vocabulary expansion and improving comprehension skills for users.
- Iranian Tutor: Our personalized language tutor application delivers tailor-made lessons, quizzes, exercises, and simulated conversations, making language practice engaging and highly effective.
- Iranian Culture Explorer: This virtual tour guide app provides users with deep insights into Iranian culture, history, and traditions through interactive content, fostering a deeper understanding of the cultural context of the Iranian languages.
- Strengthening Persian Language Models: Our project concentrates on enhancing the capabilities of existing language models, elevating their overall performance, and making Persian language learning and usage more accessible and efficient.
Open Source Initiative:
We are committed to the principles of open source. All our discoveries, models, and tools will be made publicly available and open-sourced. This commitment fosters collaboration and encourages further innovation in the realm of Persian language AI.
PI: Vahid Behzadan, Ph.D.
Current Team Members:
Davar Ardalan
Bahareh Arghavani Nobar
Mohammad MH. Rahmani
Tools and Datasets:
Code and Dataset: GitHub (access on request)
Website: Zabanzad.ai
GoFundMe: GoFundMe
Demo: Hugging Face
Publications:
N/A (forthcoming)