Context

Non-native speakers of a language (called the target language here) who produce documents in that language (e.g. French authors writing in English) often encounter lexical, grammatical and stylistic difficulties that make their texts hard for native speakers to understand. As a result, the professionalism and credibility of these texts are often affected.

Overview

Our main aim is to develop procedures for correcting those errors that cannot be treated (and will not be in the near future) by the most advanced text processing systems, such as the Office Suite, OpenOffice and the like. We also aim to correct style and text-level errors in the user's native language, since those are very frequent.

  • In contrast with text editors, but in the spirit of tutoring systems, we want to leave decisions about the proper correction up to the writer, providing him/her with arguments for and against a given correction when several corrections are possible.
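
One way to picture this writer-in-control design is a record that attaches arguments for and against each candidate correction and leaves the choice open. The sketch below is purely illustrative: the class names, fields and the example sentence are hypothetical and are not taken from the project's actual annotation schema.

```python
from dataclasses import dataclass, field


@dataclass
class Correction:
    """A candidate correction with arguments for and against it (hypothetical schema)."""
    text: str
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)


@dataclass
class ErrorAnnotation:
    """An erroneous span plus the corrections offered to the writer (hypothetical schema)."""
    span: str
    category: str  # assumed labels: "lexical", "grammatical", "stylistic"
    candidates: list[Correction] = field(default_factory=list)


def present(annotation: ErrorAnnotation) -> str:
    """List every candidate with its arguments; no candidate is selected automatically."""
    lines = [f"[{annotation.category}] {annotation.span!r}"]
    for i, cand in enumerate(annotation.candidates, start=1):
        lines.append(f"  {i}. {cand.text}")
        lines.extend(f"     + {p}" for p in cand.pros)
        lines.extend(f"     - {c}" for c in cand.cons)
    return "\n".join(lines)


# Invented example of a French-to-English calque ("je suis d'accord").
example = ErrorAnnotation(
    span="I am agree",
    category="grammatical",
    candidates=[
        Correction("I agree", pros=["standard form"]),
        Correction(
            "I am in agreement",
            pros=["keeps the original structure"],
            cons=["more formal"],
        ),
    ],
)
print(present(example))
```

The function only displays the options; picking one is left to the writer, which is the point of the tutoring-style approach.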

To achieve these aims we need to model the cognitive strategies that human experts (e.g. translators correcting texts, teachers) deploy when they detect and correct errors. Our observations show that this is not a simple, straightforward strategy: error diagnosis and correction often rest on a complex analytical and decisional process.

This project involves several aspects of language, linguistics, didactics, artificial intelligence and computational linguistics. Its main items are the following:

  • Error analysis and categorization via corpus analysis and annotation. The focus is on humans writing in a foreign language or in their own language. A possible extension is the output of machine translation systems.
  • Analysis of the correction process and language didactics: arguments for and against a given correction, decision process modelling.
  • Linguistic modelling of error sources: calque effects, paraphrasing, overgeneralization, etc.
  • Linguistic modelling of the conceptual resources at stake in correction: lexical contents, grammar rules and language construction modes, stylistic rules and know-how, well-formed textual structures.
  • Computational linguistics aspects: model for resources, correction strategies, concurrency of correction rules, evaluation methods.
  • Reasoning aspects: argumentation theory applied to didactics, decision theory, explanation strategies, knowledge representation.
  • Human-computer interaction: user modelling, user profiling.
  • Main languages studied: French, English, Spanish; others: Thai.

Main publications

  • Albert, C., Buscail, L., Garnier, M., Rykner, A., Saint-Dizier, P., Annotating language errors in texts: investigating argumentation and decision schemas, ACL-LAWIII workshop, Singapore, August 2009.
  • Albert, C., Garnier, M., Rykner, A., Saint-Dizier, P., Analyzing a corpus of documents in English produced by French writers: annotating the lexical, grammatical and stylistic errors and their distribution, Corpus linguistics conference, Liverpool, July 2009.
  • Garnier, M., Saint-Dizier, P., An Analysis of the Calque Phenomena Based on Comparable Corpora, ACL-BUCC workshop, Singapore, August 2009.