Reference Corpora for Middle and Early New High German - Annotations

Annotations

The annotations of the two corpora REM and REF cover the following levels:

Lemma (REM): Each wordform is annotated with the lemma according to Lexer. Eventually the annotated lemmas will be linked to the online database “Wörterbuchnetz”, Mittelhochdeutsches Handwörterbuch von Matthias Lexer. There are, however, some differences between the lemmas in Lexer and REM/REF, e.g.:
- Morpheme boundaries are marked by a hyphen. Lexer marks only the very first prefix (un-geselleschaft), whereas REM/REF mark all prefixes (un-ge-sèlle-schaft).
- Lexer distinguishes between two types of “e”: the original Germanic “e”, represented as “ë”, and all other types, represented as “e” (e.g. her-bërge). REM/REF additionally distinguish between “e” as the result of Umlaut-a (“è”) and Schwa-e (“e”) (e.g. hèr-bërge); unclear cases are marked by “É”.
- The adjectival suffix “-ig” is represented as “-ec” in Lexer (e.g. manec, manec-valtec-heit) and as “-ig” in REM/REF (manig, manig-valtig-hèit).
- Devoiced consonants in final position: Lexer uses “-t”, “-c”, “-p” etc. to mark them, whereas REM/REF use “-d”, “-g”, “-b” etc. For instance, Lexer’s walt corresponds to REM/REF’s wald.
Lemma (REF): Each wordform is annotated with the lemma according to Grimm’s Deutsches Wörterbuch. Eventually, the annotated lemmas will be linked to the online database “Wörterbuchnetz”, Deutsches Wörterbuch von Jacob Grimm und Wilhelm Grimm.
Morphology (REM/REF): Morphological annotations follow the STTS guidelines and record morpho-syntactic features, such as gender, number, case, etc.
Part of speech (REM/REF): Parts of speech are annotated according to HiTS. HiTS distinguishes between annotations that relate to the wordforms as such (called “lemma-related”), and annotations that relate to the wordform in its current use (called “instance-related”). More information on the HiTS website…
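The differences between Lexer and REM/REF lemmas listed above can be illustrated with a small Python sketch that reduces both spellings to a shared comparison key. This is only an illustration under stated assumptions, not part of the corpora or their tooling: the function name comparison_key is hypothetical, only the four differences named above are handled, and the key is deliberately lossy, so it may also merge lemmas that are in fact distinct.

```python
import unicodedata

# REM/REF final consonants mapped to Lexer's devoiced spellings.
DEVOICE = {"d": "t", "g": "c", "b": "p"}

# Collapse the "e" variants used by Lexer (ë) and REM/REF (è, É) to plain "e".
E_VARIANTS = str.maketrans({"ë": "e", "è": "e", "É": "e"})


def comparison_key(lemma: str) -> str:
    """Return a lossy key that is equal for the Lexer and REM/REF spellings
    of a lemma, as far as the four listed differences are concerned.
    (Hypothetical helper, for illustration only.)"""
    lemma = unicodedata.normalize("NFC", lemma)
    segments = []
    for seg in lemma.split("-"):
        if seg.endswith("ig"):
            # REM/REF "-ig" corresponds to Lexer "-ec".
            seg = seg[:-2] + "ec"
        elif seg and seg[-1] in DEVOICE:
            # REM/REF "-d", "-g", "-b" correspond to Lexer "-t", "-c", "-p".
            seg = seg[:-1] + DEVOICE[seg[-1]]
        segments.append(seg.translate(E_VARIANTS))
    # Dropping the hyphens removes the difference in boundary marking.
    return "".join(segments)


# Lexer spelling on the left, REM/REF spelling on the right.
pairs = [
    ("un-geselleschaft", "un-ge-sèlle-schaft"),
    ("her-bërge", "hèr-bërge"),
    ("manec-valtec-heit", "manig-valtig-hèit"),
    ("walt", "wald"),
]
for lexer_form, rem_form in pairs:
    print(lexer_form, rem_form, comparison_key(lexer_form) == comparison_key(rem_form))
```

The key maps in the Lexer direction (voiced to voiceless, “-ig” to “-ec”) because that direction needs no lexical knowledge; the reverse mapping would require knowing which final “t”, “c” or “p” in Lexer goes back to a voiced consonant.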

The annotations can be searched with the corpus tool ANNIS. For instance, postposed attributive adjectives can be found with the query expression pos = "ADJN". A sample match is “Hich kelouben an got fater alemactigen unde an den heiligen sun”.