Stefanie Dipper

Resources

NEW: First version of the Corpus of Historical German, Bochum (HGB for short), currently comprising almost 29K tokens of ReM texts annotated with an extended UD scheme
Reference Corpus of Middle High German (ReM for short), a corpus of diplomatically transcribed and annotated Middle High German texts (1050-1350)
Anselm Corpus, a semi-parallel corpus of diplomatically transcribed and annotated Early New High German texts (14th-16th centuries)
HiTS: a tagset for historical German, described in Dipper et al. 2013
Guidelines for the Normalization of Historical Data, described in Krasselt et al. 2015

TIGERSearch templates for searching topological fields in treebanks annotated according to the TIGER scheme
A rule-based tokenizer for German
For tools that were developed in my projects, see here
Fun: a generator for Old High German text