Lead Inventor: Mona T. Diab, PhD
Arabic Diacritization in Written Modern Standard Arabic
Arabic writing is underspecified for short vowels and consonantal gemination (letter doubling), which are typically expressed using diacritics, marks placed above or below the letter. These diacritics are extremely useful for readability and comprehension. However, current written materials, unless intended for elementary pedagogy or liturgical use, have only about 1.5% of the potential diacritics explicitly marked. On the other hand, fully specifying all diacritics also hinders readability. Compounding the problem is the diglossic situation: the native spoken dialects differ considerably from formal Modern Standard Arabic (MSA), the common written and academic language. Achieving an optimal level of diacritization in written MSA would render the script more readable and comprehensible, and could have a great impact on effective literacy in the Arab world.
Optimal Diacritization (OPTDIAC) Encodes Essential Level of Diacritization in Arabic Writing
The inventors set out to find an optimal diacritization system (OPTDIAC) that encodes the essential level of information to be explicitly marked in the script. Their research program investigates different hypothesized diacritization schemes from several scientific perspectives: psycholinguistic and neurolinguistic studies, coupled with computational modeling in the context of natural language processing. OPTDIAC produces the optimal diacritization automatically on existing text, and may be used by print media publishing houses as well as on the internet.
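As a rough illustration of what a "diacritization scheme" means at the character level, the Python sketch below treats a scheme as a filter over the Arabic diacritics encoded in Unicode (U+064B through U+0652). The function names and the gemination-only scheme are hypothetical and chosen only for illustration; they are not the OPTDIAC scheme, and the sketch only removes marks from text that is already diacritized. Adding marks to undiacritized text, which is what OPTDIAC does, requires a trained model and is not shown.

```python
# Arabic diacritics (tashkeel) occupy U+064B..U+0652 in Unicode.
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}
SHADDA = "\u0651"  # gemination (consonant doubling) mark


def diacritic_count(text):
    """Count the diacritic marks present in the text."""
    return sum(ch in DIACRITICS for ch in text)


def apply_scheme(text, keep):
    """Keep only the diacritics listed in `keep`; drop all others.

    `keep` is a set of diacritic characters. This is a hypothetical
    stand-in for a partial diacritization scheme.
    """
    return "".join(ch for ch in text if ch not in DIACRITICS or ch in keep)


# Fully diacritized "kattaba": kaf+fatha, ta+shadda+fatha, ba+fatha
full = "\u0643\u064E\u062A\u0651\u064E\u0628\u064E"

bare = apply_scheme(full, set())               # no diacritics at all
gemination_only = apply_scheme(full, {SHADDA})  # illustrative partial scheme

print(diacritic_count(full), diacritic_count(gemination_only), diacritic_count(bare))
```

Here the fully diacritized word carries four marks, the gemination-only variant carries one, and the bare form carries none, which mirrors the trade-off described above between full marking and the nearly unmarked text common in print.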
Applications:
- Automatic diacritization of existing Arabic text
- Word processing, document, and publishing tool
- Web browser integration for automatic diacritization
- Web server integration for automatic diacritization
- Extension to Arabic-script-based languages, including Urdu and Persian
- Pedagogical tool used in textbooks and teaching systems for students learning Arabic
Advantages:
- Optimal level of diacritization
- Automatic processing
- Standardization of written language
- Based on psycholinguistic and neurolinguistic studies
Patent Status: Patent Pending
Publications: Diab, Mona, Mahmoud Ghoneim, and Nizar Habash. Arabic Diacritization in the Context of Statistical Machine Translation. In Proceedings of the Machine Translation Summit (MT-Summit), Copenhagen, Denmark, 2007.
Licensing Status: Available for Licensing and Sponsored Research Support