Resources
Historical data: corpora and corpus-related resources
- NEW: First version of the Corpus of Historical German, Bochum (short: HGB), currently comprising almost 29K tokens of ReM texts annotated with an extended Universal Dependencies (UD) scheme.
- Reference Corpus of Middle High German (short: ReM), a corpus of diplomatically transcribed and annotated texts from Middle High German (1050-1350)
- Anselm Corpus, a semi-parallel corpus of diplomatically transcribed and annotated texts from Early New High German (14th-16th centuries)
- HiTS: a tagset for historical German, described in Dipper et al. 2013
- Guidelines for the Normalization of Historical Data, described in Krasselt et al. 2015
Other non-standard data
- A corpus of Social Media data annotated with normalized word forms and normalization categories, described in Laarmann-Quante & Dipper 2016
Tutorials on Corpus Linguistics for Linguists
- Tutorials focussing on online resources for corpus linguistics, with linguistic case studies, links, slides, etc. (in German)
Software
- TIGERSearch templates for searching topological fields in treebanks annotated according to the TIGER scheme
- For tools that were developed in my projects, see here