alephclient is a command-line client for Aleph. It can be used to bulk import document sets via the API, without direct access to the server.
To install alephclient, you need to have Python 3 installed and working on your computer. You may also want to create a virtual environment using
pyenv. With that done, type:
pip install alephclient
alephclient --help
alephclient needs to know the URL of the Aleph instance that it will connect to. For privileged operations, it will also need to know the user's API key. The API key can be found on the user's profile page in the Aleph web application.
Both settings can be provided by setting the environment variables ALEPHCLIENT_HOST and ALEPHCLIENT_API_KEY respectively, or by passing them in with the --host and --api-key options directly for the command. The examples below assume you have set up the environment variables:
export ALEPHCLIENT_HOST=https://data.occrp.org/
export ALEPHCLIENT_API_KEY=7b08118ad9e19be0a82906064ac9c19c8eee4869
Really enthusiastic users of Aleph might want to add these settings to their shell login file.
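Before running a longer import, it can help to confirm that both settings are actually visible to the process. The following is a minimal sketch in Python, not part of alephclient itself: the helper name `missing_settings` is made up for this example, and only the two environment variable names come from the text above. Keep in mind that the API key is only needed for privileged operations.

```python
import os

# Variable names as described above; everything else here is illustrative.
REQUIRED = ("ALEPHCLIENT_HOST", "ALEPHCLIENT_API_KEY")


def missing_settings(env=None):
    """Return the names of settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings()
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("Settings found for " + os.environ["ALEPHCLIENT_HOST"])
```

Passing a plain dictionary instead of the real environment makes the check easy to try out without changing your shell configuration.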
The crawldir command crawls through a given directory recursively and uploads all the files and directories inside it to a collection. The external identifier (foreign ID) of the target collection needs to be passed to the command with the --foreign-id option. If a collection with the given foreign ID does not exist, it will be created. The path argument needs to be a valid path to a file or directory:
alephclient crawldir --foreign-id wikileaks-cable /Users/sunu/data/cable
When Aleph imports data, it performs optical character recognition (OCR) on images contained in the material. This works better when Aleph already has an idea of the language the documents might use. This can be specified with the
--language option, which expects a 3-letter ISO 639 language code. It can be specified multiple times, for when the directory contains files in more than one language.
alephclient crawldir --language rus --foreign-id russian_leak /Users/sunu/data/russian_leak
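When crawldir is driven from a script, the repeated language option can be assembled programmatically. The sketch below assumes alephclient is installed and on the PATH. It uses only the options mentioned above; the helper names `crawldir_args` and `crawl` are hypothetical.

```python
import subprocess


def crawldir_args(path, foreign_id, languages=()):
    """Build the argument list for an alephclient crawldir call."""
    args = ["alephclient", "crawldir", "--foreign-id", foreign_id]
    for code in languages:
        # --language is repeated once per language code
        args += ["--language", code]
    args.append(path)
    return args


def crawl(path, foreign_id, languages=()):
    """Run crawldir; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(crawldir_args(path, foreign_id, languages), check=True)
```

For a directory with material in two languages, a call might look like `crawl("/Users/sunu/data/russian_leak", "russian_leak", ["rus", "ukr"])`, where the second language code is only an illustration.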
With the write-entities command, users can load JSON entities formatted in the
followthemoney structure into an Aleph collection. This can be used in conjunction with the command-line tools for generating such data provided by
followthemoney. Data that is loaded this way should be aggregated before being written to the API, for example using the
ftm aggregate command-line utility, or the
followthemoney-store database layer.
A typical use might look like this:
curl -o md_companies.yml https://raw.githubusercontent.com/alephdata/aleph/master/mappings/md_companies.yml
ftm map md_companies.yml | ftm aggregate | alephclient write-entities -f md_companies
stream-entities is the inverse of
write-entities. It will stream entities from the given Aleph instance so that they can be written to a file or piped into another processing step:
alephclient stream-entities -f us_ofac >us_ofac.json
With the Follow the Money tools installed, you can build more complex commands, such as:
alephclient stream-entities -f us_ofac | ftm validate | ftm export-excel -o OFAC.xlsx
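The file written by the first stream-entities command above can also be inspected with a few lines of Python. This sketch assumes the output contains one JSON object per line and that each entity carries a `schema` key, as in the followthemoney entity format. The function name `count_schemata` is made up for this example.

```python
import json
from collections import Counter


def count_schemata(lines):
    """Count entities per schema in an iterable of JSON lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity = json.loads(line)
        # Assumption: each entity object has a "schema" key
        counts[entity.get("schema", "unknown")] += 1
    return counts
```

To use it, open the exported file and pass the file object directly, for example `count_schemata(open("us_ofac.json"))`, then call `most_common()` on the result to list the schemata by frequency.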
The bulkload command executes an entity mapping within the Aleph system. Its only argument is a YAML mapping file:
curl -o md_companies.yml https://raw.githubusercontent.com/alephdata/aleph/master/mappings/md_companies.yml
alephclient bulkload md_companies.yml