Aleph as a toolkit contains a number of Python libraries that can be used independently of the core tool for data parsing and normalisation.
All of the tools below are released regularly as packages and can be installed from the Python package registry using pip.
fingerprints is a Python library that heavily normalises names of companies and people before comparison. This includes transliteration, word order, and the normalisation of company type suffixes like Limited (Ltd) or Aktiengesellschaft (AG).
normality and works best when
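To make the idea of a name fingerprint more concrete, here is a minimal sketch that uses only the Python standard library. It is not the code of fingerprints itself: the function name `simple_fingerprint` and the tiny `COMPANY_TYPES` table are assumptions made up for illustration, and accent stripping stands in for real transliteration, so only Latin-script names are handled.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny suffix table for illustration only.
# A real normaliser needs a much larger, multilingual list.
COMPANY_TYPES = {
    "limited": "ltd",
    "aktiengesellschaft": "ag",
    "incorporated": "inc",
}


def simple_fingerprint(name):
    """Reduce a name to a rough comparison key (illustrative sketch)."""
    # Decompose accented characters and drop the combining marks,
    # a crude stand-in for proper transliteration.
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase, then turn punctuation and other symbols into spaces.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Map long company type names onto their short form.
    tokens = [COMPANY_TYPES.get(token, token) for token in text.split()]
    # Sort the tokens so that word order no longer matters.
    return " ".join(sorted(tokens))


print(simple_fingerprint("Siemens Aktiengesellschaft"))
print(simple_fingerprint("SIEMENS AG"))
```

Both calls produce the same key, so the two spellings would be treated as the same name when compared.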
msglite is a fork of msg-extractor, a parser for Microsoft Outlook MSG files. These binary email files are OLE containers (like old-style Word or Excel documents) and require some tickling before they will confess details about the contained email message.
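As a small illustration of what "OLE container" means in practice, the sketch below checks whether a blob of bytes starts with the eight-byte signature used by OLE compound files. It does not parse the message and is not part of msglite; the helper name `looks_like_ole` is an assumption made up for this example.

```python
# Signature found at the start of OLE compound files, the container
# format shared by MSG files and old-style Word or Excel documents.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def looks_like_ole(data):
    """Return True if the bytes begin with the OLE compound file signature."""
    return data[: len(OLE_MAGIC)] == OLE_MAGIC


# Usage: read the first bytes of a file and test them, for example
#   with open(path, "rb") as fh:
#       is_ole = looks_like_ole(fh.read(8))
sample = OLE_MAGIC + b"\x00" * 504
print(looks_like_ole(sample))
print(looks_like_ole(b"From: someone@example.org"))
```

A check like this only tells you that the file is some kind of OLE container. Telling an MSG file apart from a legacy Office document still requires reading the streams inside it, which is the job of a real parser.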
countrynames helps to turn country names into two-letter ISO codes representing that country. For example,
gb. Due to the work area of the OCCRP, this includes some exotic country designations, such as Yugoslavia, Transnistria and the Soviet Union (now deceased).
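The kind of lookup described here can be sketched with a plain dictionary. This is an illustration only, not the countrynames API: the function name `lookup_code`, the `COUNTRY_CODES` table and the codes chosen for former states (the withdrawn ISO codes `su` and `yu`) are assumptions, and the library's own data may use different values.

```python
# Hypothetical miniature lookup table. A real mapping needs many more
# spellings, languages and historical names.
COUNTRY_CODES = {
    "united kingdom": "gb",
    "great britain": "gb",
    "uk": "gb",
    "germany": "de",
    "deutschland": "de",
    "soviet union": "su",  # assumption: withdrawn ISO 3166-1 code
    "ussr": "su",
    "yugoslavia": "yu",  # assumption: withdrawn ISO 3166-1 code
}


def lookup_code(name):
    """Map a country name to a two-letter code, or None if unknown."""
    # Normalise case and collapse whitespace before the lookup.
    key = " ".join(name.casefold().split())
    return COUNTRY_CODES.get(key)


print(lookup_code("  United   Kingdom "))
print(lookup_code("USSR"))
print(lookup_code("Atlantis"))
```

Returning `None` for unknown names, rather than raising an error, keeps the function convenient for bulk data cleaning, where unmatched values are common.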