Dr. Houda Bouamor (د. هدى بوعمر)
Assistant Teaching Professor
Department of Information Systems
Carnegie Mellon University, Qatar
Natural Language Processing (NLP)
Arabic and Dialectal Arabic Processing
Paraphrase Acquisition
Gender bias in natural language processing
NLP in the financial domain
Paraphrase Validation in Context
Contextual Targeted Paraphrasing
Machine Learning, Deep Learning
Carnegie Mellon University
Qatar Foundation
Doha, State of Qatar
I am an Assistant Teaching Professor in the Information Systems department at CMU-Q. I am also part of the Natural Language Processing Lab and work on Arabic NLP with Nizar Habash and Kemal Oflazer. My Ph.D. is from Paris-Sud University, France, where I worked on paraphrasing in the framework of Computational Linguistics, advised by Anne Vilnat and Aurélien Max. I hold an M.Sc. in Computer Science from the Paris-Est Marne-La-Vallée University and a bachelor's degree in Computer Science from the University of Manouba, Tunisia. I have worked on several projects addressing different NLP problems. My main research interest revolves around Statistical Machine Translation.
QALB (Qatar Arabic Language Bank) is a joint project between us and Nizar Habash and colleagues at Columbia University. The project aims to build a large corpus of manually corrected Arabic text for use in building automatic correction tools for Arabic. It also includes research on statistical techniques for automatic correction of Arabic text.
Wiki TEA (Wikipedia Translation-English to Arabic) is a project focused on creating techniques, tools, and resources for enhancing and expanding Arabic Wikipedia through statistical machine translation.
OptDiac: An Optimal Diacritization Scheme for Arabic Orthographic Representation. The overarching objective is to improve Arabic NLP in general and to improve readability and comprehension rates for Arabic text, thereby potentially having an impact on literacy in the Arab world, as well as creating principled writing standards that extend to the dialects. We believe that some form of partial diacritization can achieve these two goals. We do not hypothesize that the same partial diacritization scheme will be maximally useful for both areas.
Learning from Comparable Corpora for Improved English-Arabic Statistical Machine Translation, Funded by QNRF 12/2010 – 11/2013.
Our main objective in this project is to improve Dialectal Arabic NLP in general. Specifically, we aim to develop a suite of four novel multi-dialectal resources, which will be used to conduct original research in two applications that are valuable enabling technologies for future research in Arabic NLP.
Do not be ashamed to ask about something you do not know; it is better to be ignorant once than to remain ignorant for a lifetime.