
This speech corpus was developed as part of PhD work carried out by Nawar Halabi at the University of Southampton. It was recorded in South Levantine Arabic (Damascene accent) in a professional studio. Speech synthesized from this corpus yields a high-quality, natural-sounding voice.

It is released here under the Creative Commons license specified below. If you need further rights, or consultancy on building Arabic speech corpora, please contact Nawar Halabi by email. Thank you for your interest.

Download Corpus Package

Please feel free to try my high-quality, Conditional Random Field-based diacritiser for Arabic, which can run on mobile phones.

High Quality Diacritiser Demo

More documentation will be added in the future; for now, please refer to Nawar Halabi's PhD thesis for details. Note that an apostrophe following a vowel phoneme in the corpus indicates that the vowel is in a stressed syllable. The Arabic Speech Corpus Wikipedia page has more information about the corpus. In this repo there is a Docker image for this TTS server, which runs easily on most platforms.
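As a rough illustration of the stress convention described above, the following Python sketch separates the stress apostrophe from each phoneme label. It rests on assumptions not stated on this page: that a transcription is available as whitespace-separated phoneme tokens, that the apostrophe is a plain ASCII `'` attached directly to the end of the vowel symbol, and that the sample symbols (`k`, `a`, `t`, `b`) resemble the corpus's phoneme set. The function name `split_stress` is hypothetical; check the thesis for the actual label format.

```python
from typing import List, Tuple

# Assumption: stress is marked with an ASCII apostrophe appended to the vowel.
STRESS_MARK = "'"


def split_stress(phonemes: List[str]) -> List[Tuple[str, bool]]:
    """Return (phoneme, is_stressed) pairs with the trailing apostrophe removed."""
    result = []
    for p in phonemes:
        # A lone apostrophe is left untouched rather than treated as a marker.
        if len(p) > 1 and p.endswith(STRESS_MARK):
            result.append((p[:-1], True))
        else:
            result.append((p, False))
    return result


# Hypothetical label sequence; real corpus symbols may differ.
labels = "k a' t a b".split()
for phoneme, stressed in split_stress(labels):
    print(phoneme, "stressed" if stressed else "unstressed")
```

Keeping the stress flag separate from the phoneme symbol makes it easier to map labels onto a phoneme inventory that does not encode stress.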

Many thanks to Taha Zerrouki, Ahmad Barqawi, Karim Hemina and Oussama Hemina, whose work made this TTS possible:

  1. Festival for Arabic
  2. Mishkal Diacritiser
  3. Shakkala Diacritiser

Thank you to Ali Hamdi, Ibrahim Tuffaha, Baraa' Al-Jawarneh and Mahmoud Al-Ayyoub for their work on Shakkelha, which is the best diacritiser as far as I know.

Creative Commons License.
Arabic Speech Corpus by Nawar Halabi is licensed under a Creative Commons Attribution 4.0 International License. Based on a work at