CorA (Corpus Annotator)

About CorA

CorA is a web-based tool for word-level annotation of historical and other non-standard language data. During annotation, users can edit the primary data, e.g. to correct transcription mistakes or to modify token boundaries. CorA also supports retraining and reapplying external annotation tools, such as POS taggers.
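To make the idea of modifying token boundaries concrete, here is a minimal sketch in Python. It is an illustration only and does not reflect CorA's actual data model or code (CorA itself is written in PHP and JavaScript). The `Token` class, the `split_token` and `merge_tokens` helpers, the sample word forms, and the `pos` tag are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    """One word-level unit with its annotations (hypothetical model, not CorA's schema)."""
    form: str
    annotations: dict = field(default_factory=dict)


def split_token(tokens, index, position):
    """Return a new token list with tokens[index] split at character `position`.

    Annotations on the original token are dropped, since they may no
    longer apply to either half and would need to be re-annotated.
    """
    form = tokens[index].form
    if not 0 < position < len(form):
        raise ValueError("split position must fall inside the token")
    halves = [Token(form[:position]), Token(form[position:])]
    return tokens[:index] + halves + tokens[index + 1:]


def merge_tokens(tokens, index):
    """Return a new token list with tokens[index] and tokens[index + 1] joined."""
    if not 0 <= index < len(tokens) - 1:
        raise ValueError("no following token to merge with")
    merged = Token(tokens[index].form + tokens[index + 1].form)
    return tokens[:index] + [merged] + tokens[index + 2:]


# Made-up example: a transcription that wrongly joined two words.
tokens = [Token("inder", {"pos": "NN"}), Token("stat", {"pos": "NN"})]
tokens = split_token(tokens, 0, 2)
forms = [t.form for t in tokens]
```

In this sketch a boundary change discards the affected annotations, which is one plausible reason for wanting to reapply an automatic tagger after editing; how CorA actually handles this is described in its user documentation.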

Originally developed to annotate historical texts for the Anselm corpus and the Reference corpus Early New High German, the tool has since been used for a variety of other projects, including the annotation of social media data, the InterGramm project for grammatical analysis of historical texts, and the Reference corpus Middle Low German/Low Rhenish.

CorA is implemented in PHP and JavaScript, using a MySQL database back-end. The user interface is available in both English and German.

Availability

CorA is open source and available on GitHub.

Comprehensive user documentation, describing CorA’s functionality and usage, is available here:

CorA user documentation

Publications

Marcel Bollmann, Florian Petran, Stefanie Dipper, and Julia Krasselt (2014). CorA: A web-based annotation tool for historical and other non-standard language data. In: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), pp. 86–90. Gothenburg, Sweden.