Aleph is an open source toolkit for investigative data analysis. It allows generating, searching and analysing large graphs of heterogeneous data, including public records, structured databases and leaked evidence. The system can integrate data from both unstructured formats (such as PDF, email and other file types) and structured sources such as CSV files or SQL databases. Data that has been loaded can be securely searched, cross-referenced with other datasets and exported to other systems.
At the core of Aleph's capabilities is Follow the Money (FtM), a shared data model that encapsulates core concepts such as
Contracts. Such data can be generated from tabular inputs, or via the
ingest-file system, which extracts data from dozens of input formats (including Word, PowerPoint, PDF, Access, email, ZIP archives and so on).
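
To make the idea of generating entities from tabular inputs more concrete, here is a minimal sketch in plain Python. It does not use the FtM library or Aleph's actual mapping mechanism; the column names, the `demo_contracts` dataset name, the property names and the helper functions `make_entity_id` and `rows_to_entities` are all assumptions made for illustration. It only shows the general shape of the transformation: each row becomes an entity with a schema, a stable ID derived from the row's key, and multi-valued properties.

```python
import csv
import hashlib
import io

# Hypothetical sample input; the column names are assumptions for illustration.
SAMPLE_CSV = """contract_id,title,amount,currency
C-001,Road maintenance,125000,EUR
C-002,School supplies,8400,EUR
"""


def make_entity_id(dataset, *key_parts):
    """Derive a stable ID by hashing the dataset name and the row's key values."""
    digest = hashlib.sha1(dataset.encode("utf-8"))
    for part in key_parts:
        digest.update(str(part).encode("utf-8"))
    return digest.hexdigest()


def rows_to_entities(csv_text, dataset="demo_contracts"):
    """Map each CSV row to an entity-like dict (schema, ID, multi-valued properties)."""
    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entities.append({
            "id": make_entity_id(dataset, row["contract_id"]),
            # Illustrative schema and property names, not checked against FtM.
            "schema": "Contract",
            "properties": {
                "title": [row["title"]],
                "amount": [row["amount"]],
                "currency": [row["currency"]],
            },
        })
    return entities


entities = rows_to_entities(SAMPLE_CSV)
for entity in entities:
    print(entity["schema"], entity["id"][:8], entity["properties"]["title"][0])
```

Deriving the ID from the source key rather than generating it randomly means that re-running the import over the same rows yields the same entities, which is what makes cross-referencing and incremental updates practical in a sketch like this.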
The Aleph system also includes Memorious, a crawler framework that lets you write, manage and control a fleet of scrapers to maintain up-to-date copies of public records from the web.
We're keen to consider pull requests for extensions or bug fixes in all components of the platform. An ideal submission would follow common coding standards, such as PEP 8, and include a test case when it significantly changes functionality.
Please also consider dropping by the Slack instance beforehand to discuss your idea.