About Norma
Norma is a tool for automatic spelling normalization of non-standard language data. It was originally developed within the Anselm project for normalizing Early New High German to modern standard German; for example:
- vnse lybe vrouwe → unsere liebe Frau (“our beloved woman”)
To achieve this, Norma combines several normalization techniques, which typically require training data (i.e., a list of manually normalized wordforms) and a target dictionary (i.e., a list of valid wordforms in the target language).
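To make the combination of training data and target dictionary concrete, here is a minimal Python sketch. It is not Norma's code or API; it only illustrates the general idea. Two techniques are chained: a wordform mapping learned from manually normalized training pairs, and a fallback that picks the target-dictionary entry with the smallest plain (unweighted) Levenshtein distance. The function names `train_mapper`, `levenshtein`, and `normalize`, as well as the toy word lists, are hypothetical illustrations. Norma's own methods may weight and combine techniques differently.

```python
from collections import Counter, defaultdict


def train_mapper(pairs):
    """Hypothetical helper: learn historical -> modern mappings from training pairs."""
    counts = defaultdict(Counter)
    for historical, modern in pairs:
        counts[historical][modern] += 1
    # Keep the most frequent normalization per wordform (ties go to the first seen).
    return {hist: c.most_common(1)[0][0] for hist, c in counts.items()}


def levenshtein(a, b):
    """Plain edit distance: every insertion, deletion, or substitution costs 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete ca
                            curr[j - 1] + 1,           # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalize(word, mapping, dictionary):
    """Hypothetical normalizer: try the learned mapping, else the closest dictionary word."""
    if word in mapping:
        return mapping[word]
    # Sorting makes tie-breaking deterministic (alphabetical order wins ties).
    return min(sorted(dictionary), key=lambda cand: levenshtein(word, cand))


if __name__ == "__main__":
    # Toy training pairs and dictionary, made up for illustration only.
    training = [("vnse", "unsere"), ("lybe", "liebe"), ("vrouwe", "Frau")]
    dictionary = {"unsere", "unser", "liebe", "leben", "Frau"}
    mapper = train_mapper(training)
    for word in ["vrouwe", "vnser", "lyebe"]:
        print(word, "->", normalize(word, mapper, dictionary))
```

Here "vrouwe" is found in the learned mapping, while the unseen forms "vnser" and "lyebe" fall back to their nearest dictionary entries ("unser" and "liebe", each one edit away). A plain edit distance treats every character change alike; learning which changes are typical for the historical data is one obvious way to refine it.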
Availability
Norma is open source and available on GitHub:
Please note that Norma is only available as a command-line utility; there is no graphical interface. There are also no precompiled binaries at this point, so you need to compile the source code yourself. Norma is written in C++11, and bindings for Python 2 are provided as well.
Publications
The general architecture and the normalization methods currently implemented in Norma are described in the following paper:
- Marcel Bollmann (2012). (Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool. In: Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), pp. 3–14. Lisbon, Portugal.