د. هدى بوعمر

Houda Bouamor, Ph.D.

Assistant Teaching Professor
Department of Information Systems
Carnegie Mellon University, Qatar

Research Interests

Natural Language Processing (NLP)
Arabic and Dialectal Arabic Processing
Paraphrase Acquisition
Gender bias in natural language processing
NLP in the financial domain
Paraphrase Validation in Context
Contextual Targeted Paraphrasing
Machine Learning, Deep Learning

Carnegie Mellon University
Qatar Foundation
Doha, State of Qatar

hbouamor [at] cmu.edu

I am an Assistant Teaching Professor in the Information Systems department at CMU-Q. I am also part of the Natural Language Processing Lab and work on Arabic NLP with Nizar Habash and Kemal Oflazer. I received my Ph.D. from Paris-Sud University, France, where I worked on paraphrasing within computational linguistics, advised by Anne Vilnat and Aurélien Max. I hold an M.Sc. in Computer Science from Paris-Est Marne-La-Vallée University and a bachelor's degree in Computer Science from the University of Manouba, Tunisia. I have worked on several projects addressing a range of NLP problems. My main research interest revolves around Statistical Machine Translation.

Curriculum Vitae » Research Statement »
Dissertation » Abstract »

Projects

QALB

QALB (Qatar Arabic Language Bank) is a joint project with Nizar Habash and colleagues at Columbia University. The project aims to build a large corpus of manually corrected Arabic text to support the development of automatic correction tools for Arabic. It also includes research on statistical techniques for automatic correction of Arabic text.

Kemal Oflazer, Behrang Mohit, Houda Bouamor, Wajdi Zaghouani, Ossama Obeid

Wiki TEA

Wiki TEA (Wikipedia Translation-English to Arabic) is a project focused on creating techniques, tools, and resources to enhance and expand the Arabic Wikipedia through statistical machine translation.

Behrang Mohit, Houda Bouamor, Mahmoud Azab, Kemal Oflazer, Ossama Obeid, Wajdi Zaghouani

OptDiac

OptDiac: An Optimal Diacritization Scheme for Arabic Orthographic Representation. The overarching objective is twofold: to improve Arabic NLP in general, and to improve readability and comprehension rates for Arabic text, thereby potentially having an impact on literacy in the Arab world and helping to create principled writing standards that extend to the dialects. We believe that some form of partial diacritization can achieve these two goals. We do not hypothesize that the same partial diacritization scheme will be maximally useful for both areas.

Mona Diab, Kemal Oflazer, Houda Bouamor, Wajdi Zaghouani, Zeinab Ibrahim

English-Arabic SMT

Learning from Comparable Corpora for Improved English-Arabic Statistical Machine Translation, funded by QNRF, 12/2010 – 11/2013.

Upcoming Project:
MADAR: Multi-Arabic Dialect Applications and Resources

Our main objective in this project is to improve Dialectal Arabic NLP in general. Specifically, we aim to develop a suite of four novel multi-dialectal resources, which will be used to conduct original research in two applications that are valuable enabling technologies for future research in Arabic NLP.

The question of whether machines can think is about as relevant as the question of whether submarines can swim.

— Edsger Dijkstra, 1984

Resources & Tools

AL-BLEU dataset and software for evaluation of Arabic Machine Translation »
The 2014 Automatic Arabic Error Correction Shared Task »
Arabic Named Entity Translation Lexicon »

Publications

Google Scholar »

...

Do not be ashamed to ask about something you do not know; it is better to be ignorant once than to remain ignorant for a lifetime.