Columbia Technology Ventures

Database and Annotation Tool for Computational Modeling of Arabic Nominal Gender, Number and Rationality Morphosyntax

To request an academic license to download and use this software, log in (or create an account if you do not have one), return to this page, and click Express Licensing. For a commercial license, please contact techventures@columbia.edu.

This technology is a linguistic database of Arabic functional gender, functional number, and rationality, features that are important for modeling Arabic morphosyntactic agreement. It also includes a tool for annotating the Linguistic Data Consortium (LDC) Arabic treebanks with this morphosyntactic information. Arabic has complex agreement patterns and irregular morphology, yet current LDC Arabic treebanks represent nominal gender and number by shallow (non-functional) forms and do not include nominal rationality. The database and annotation tool can improve computational modeling of Arabic for natural language processing and linguistics research applications.
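
To illustrate why functional features and rationality matter for agreement, the following minimal Python sketch encodes one well-known, simplified rule of Modern Standard Arabic: an attributive adjective agrees with its noun in functional gender and number, except that irrational (non-human) plural nouns take feminine singular agreement. The `Nominal` class, its field names, the feature codes, and the two transliterated example entries are hypothetical and chosen for illustration only; they are not taken from the database or the annotation tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nominal:
    """Hypothetical record; not the schema of the actual database."""
    lemma: str        # rough transliteration, illustrative only
    form_gender: str  # "M" or "F", as suggested by the surface form
    form_number: str  # "S", "D", or "P", as suggested by the surface form
    func_gender: str  # functional gender used in agreement
    func_number: str  # functional number used in agreement
    rational: bool    # True for human referents


def expected_adjective_agreement(noun: Nominal) -> tuple[str, str]:
    """Return the (gender, number) an attributive adjective should carry.

    Simplified rule: irrational plurals take feminine singular agreement;
    otherwise the adjective follows the noun's functional features.
    """
    if noun.func_number == "P" and not noun.rational:
        return ("F", "S")
    return (noun.func_gender, noun.func_number)


# 'books': a broken plural whose surface form looks masculine singular,
# but which is functionally plural and irrational.
kutub = Nominal("kutub", "M", "S", "M", "P", rational=False)

# 'teachers': a sound masculine plural referring to humans.
mudarrisun = Nominal("mudarrisun", "M", "P", "M", "P", rational=True)

print(expected_adjective_agreement(kutub))
print(expected_adjective_agreement(mudarrisun))
```

In this sketch the form-based features of the first entry would predict masculine singular agreement by coincidence of form, not because they capture the noun's plurality or rationality. This is the kind of gap that functional gender, functional number, and rationality annotations are meant to close.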

The annotation tool requires that researchers obtain Arabic corpora from the LDC.

Lead Inventor:

Nizar Habash, Ph.D., Sarah Alkuhlani

Applications:

  • Computationally annotate Arabic corpora with correct morphosyntactic agreement.
  • Build computational models of Arabic morphology and syntax.
  • Engineer Arabic language processing systems.
  • Study Arabic linguistic phenomena.
  • Translate Arabic with correct nominal gender, number, and rationality agreement.

Advantages:

  • Annotates LDC treebanks with missing information regarding nominal gender, number, and rationality agreement.
  • Improves computational modeling of Arabic morphosyntax for natural language processing applications.

Tech Ventures Reference: IR CU14137

Related Publications: