About Norma

Norma is a tool for automatic spelling normalization of non-standard language data. It was originally developed within the Anselm project for normalizing Early New High German to modern standard German; for example:

  • vnse lybe vrouwe → unsere liebe Frau
    “our beloved woman”

To achieve this, Norma uses a combination of different normalization techniques that typically require training data (= a list of manually normalized wordforms) and a target dictionary (= a list of valid wordforms in the target language).
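
As a rough illustration of how training data and a target dictionary can work together, here is a minimal Python sketch. It is not Norma's actual code or API: the function names (`levenshtein`, `normalize`), the fallback order, and the tiny word lists are all assumptions made up for this example. It first looks a wordform up in the memorized training pairs and otherwise picks the closest dictionary entry by plain edit distance.

```python
def levenshtein(a, b):
    """Plain edit distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def normalize(word, training_pairs, dictionary):
    """Hypothetical two-step normalizer (illustration only)."""
    # Step 1: use a manually normalized wordform if we have seen this word.
    if word in training_pairs:
        return training_pairs[word]
    # Step 2: keep the word if it is already a valid target wordform.
    if word in dictionary:
        return word
    # Step 3: fall back to the closest dictionary entry by edit distance.
    # Sorting first makes tie-breaking deterministic.
    return min(sorted(dictionary), key=lambda cand: levenshtein(word, cand))


# Toy data, invented for this example.
training_pairs = {"vnse": "unsere", "vrouwe": "Frau"}
dictionary = {"unsere", "liebe", "Frau", "Leben"}

result = [normalize(w, training_pairs, dictionary)
          for w in "vnse lybe vrouwe".split()]
print(result)  # ['unsere', 'liebe', 'Frau']
```

Here "vnse" and "vrouwe" are resolved from the training pairs, while "lybe" is not in the training data and is mapped to "liebe", its nearest dictionary entry. A real system would likely use weighted or learned distances rather than uniform edit costs.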

Availability

Norma is open source and available on GitHub:

Please note that Norma is only available as a command-line utility – there is no graphical interface. There are also no pre-compiled binaries at this point; you need to compile the source code yourself. Norma is written in C++11, though bindings for Python 2 are provided as well.

Publications

The general architecture and the normalization methods currently implemented in Norma are described in the following paper:

  • Marcel Bollmann (2012). (Semi-)Automatic Normalization of Historical Texts using Distance Measures and the Norma tool. In: Proceedings of the Second Workshop on Annotation of Corpora for Research in the Humanities (ACRH-2), pp. 3–14. Lisbon, Portugal.