Arabic Language Technology Center "ALTEC"

Diacritized Corpus

As the world witnesses growing interest in corpus-based and statistical NLP tools and applications, Arabic faces a critical obstacle to building such tools: a lack of language resources.

It is well known that the absence of diacritics in Arabic text is one of the challenges facing Arabic NLP. Accordingly, this corpus was built by the Arabic Language Technology Centre (ALTEC) as a language resource for Arabic, to support research in Natural Language Processing.

Diacritized Corpus (Commercial: $1,500; Academic: $150)

The academic version contains the basic version of the databases; the commercial version contains the full version.

Egyptian companies listed with ITIDA and academic institutions interested in the full version receive a 20% discount.

ALTEC members receive a 10% discount.
