Installing Aleph

Below you will find the installation steps on how to install Aleph.

Prerequisites

Aleph requires multiple services to operate. It uses Docker containers to make it easier for development and deployments.

Before we continue, you will need to have Docker and docker-compose installed. Please refer to their manual to learn how to set up Docker and docker-compose.

Developer setup

This section describes how to set up Aleph for software development. Developer mode disables many security measures and it not meant to be used for internet-facing uses, Production deployment is needed instead.

What is developer mode?

Developer mode is a docker configuration for Aleph which makes it easy to do software development and debug the tool without having to install its dependencies on your host machine. These are the features of developer mode:

  • The code for the backend (api) server and the React frontend will automatically reload to reflect any changes you make in your working copy while the application is running.

  • Both backend and frontend will operate in debug mode and give more verbose error messages when a problem occurs.

  • The host machine's file system will be accessible from within Aleph's docker container at /host.

Developer mode overrides the configuration file for docker-compose, using docker-compose.dev.yml instead of docker-compose.yml. This is done by wrapping most developer mode commands using make.

Getting started

As a first step, check out the source code of Aleph from GitHub:

# Use the SSH URL if you have commit access:
git clone https://github.com/alephdata/aleph.git
cd aleph/

Once the code is downloaded, find the file called aleph.env.tmpl in the base directory. This is a template of the configuration file. Make a copy of this file named aleph.env and define settings for your local instance. Check the configuration section for more information regarding the available options.

Also, please execute the following command to allow ElasticSearch to map its memory:

sudo sysctl -w vm.max_map_count=262144

With the settings in place, you can use make all to set everything up and launch the web service. This is equivalent to the following steps:

  1. make build to build the docker images for the application and relevant services. You can run make docker-pull before to pull pre-build release images.

  2. make upgrade to run the latest database migrations and create/update the search index.

  3. make web to run the web-based API server and the user interface.

Open http://localhost:8080/ in your browser to visit the web frontend.

If any of the above steps fail, refer to the troubleshooting section for some common stumbling blocks and their fixes.

Working in a shell

During development, you will need to run command-line operations for certain tasks. In order to do so, you will first need to enter the docker container of Aleph. To do so, run:

make shell
# This will result in a root shell inside the container:
aleph --help

This will enter a docker container where the aleph shell command is available (see Usage for details). You can also access the host computers file system at /host. This means a file stored at /tmp/bla.txt on your computer can be found at /host/tmp/bla.txt inside the container.

Users

For development purposes, you can quickly create a new user with the aleph createuser command, inside a shell:

aleph createuser --email="user@example.com" \
--name="Alice" \
--is_admin \
--password=123abc \
alice

If you pass an email address in the ALEPH_ADMINS environment variable (in your configuration) it will automatically be made into an admin.

After running createuser, the newly created user's API key is printed, which you can use in the Authorization HTTP header of requests to the API. If you pass a password, you can use this email address and password to log into the web interface.

Loading Sample Data

If you want to quickly get some sample data in your Aleph instance you can use crawldir to index a small test data folder.

aleph crawldir /aleph/contrib/testdata

To also get a sample of (non-document) entity data, consider loading sanctions lists.

Running Tests

To run the tests, assuming you already have the docker-compose up and ready, run:

make test
make ingest-test

This will create a new database and run all the tests.

Building from a clean state

You can also build the Aleph images locally. This could be useful while working on the Dockerfile changes and new dependency upgrades.

To build the image you can run make build, which will build the alephdata/aleph image (this will generate a production ready image).

Production deployment

This section details how to set up Aleph in production mode. If you plan to change the source code or quickly test the software, you may wish to use the Developer setup instead.

Aleph is distributed as a set of Docker containers, which can be run on any server that meets the following criteria:

  • 8GB (or more) of RAM. While the software will start with much less, we advise providing ample main memory for ideal performance.

  • A working install of Docker and docker-compose. See the FAQ page for information on not using Docker.

  • An internet connection to download and install the package.

To begin a production deployment:

  • Obtain a copy of Aleph's docker-compose file and base configurations (named aleph.env.tmpl).

  • Make a copy of the configurations file named aleph.env and define settings for your production instance. Check the section on configuration for more information regarding the available options.

Aleph has support for multiple storage engines, including Amazon Web Services Simple Storage Service, Google storage buckets, and local file storage. Remember to configure this in your instance's configuration file.

  • Set ALEPH_TAG environment variable to specify the version of Aleph you want deploy. If ALEPH_TAG is not set, the stable version specified in the docker compose file is deployed.

  • Once you are happy with your configuration, execute the following command to allow ElasticSearch to map its memory:

sudo sysctl -w vm.max_map_count=262144
  • Finally, you can boot the docker-compose environment:

docker-compose up -d

This will run Aleph in detached mode. You can shut down the system at any time by issuing the following command:

docker-compose down

Before Aleph can process any requests or data, you need to make sure it sets up the database and search index correctly by executing an upgrade:

docker-compose run --rm shell aleph upgrade

While the system is running, it will bind to to port 8080 of its host machine and accept incoming connections. You can check that the system is functional with a curl request:

curl http://localhost:8080/api/2/statistics

If your servers firewall configuration allows it, you can now also open http://localhost:8080 in your browser and use the web interface to navigate the application.

Upgrading

Configuration

The main configuration file of Aleph is aleph.env, which is loaded by docker-compose and can modify many aspects of system behaviour. A template for the configuration with details regarding many of the options is available in the aleph.env.tmpl file.

Configuration options

  • You will need to provide a value for the ALEPH_SECRET_KEY. A good example of a value is the output of openssl rand -hex 24.

  • Aleph needs to know the URL under which the web interface is mounted. Make sure to set the correct public ALEPH_UI_URL.

OAuth Credentials

Using OAuth for login is optional. Skip this section (and leave the config commented out) if you don't want to use it.

Aleph supports a couple of OAuth providers out of the box: Google, Facebook and Microsoft Azure.

Using Google OAuth

To get the OAuth credentials please visit the Google Developers Console. There you will need to create an API key. In the Authorised redirect URIs section, use this URL:

http://localhost:8080/api/2/sessions/callback/google

Save the client ID and the client secret as ALEPH_OAUTH_* values.

Using Microsoft Azure

Create a new app over at https://apps.dev.microsoft.com, make a note of the KEY and secret you generate there. Callback URL should be as follows: https:///api/2/sessions/callback . Then add the following to aleph.env, remember to update KEY and SECRET with your values.

ALEPH_OAUTH=true
ALEPH_OAUTH_KEY=***************************
ALEPH_OAUTH_SECRET=***************************
ALEPH_OAUTH_NAME=Azure
ALEPH_OAUTH_BASE_URL=https://login.microsoftonline.com/organizations/oauth2/v2.0/
ALEPH_OAUTH_AUTHORIZE_URL=https://login.microsoftonline.com/organizations/oauth2/v2.0/authorize
ALEPH_OAUTH_TOKEN_URL=https://login.microsoftonline.com/common/oauth2/token
ALEPH_OAUTH_SCOPE=openid profile email user.read
ALEPH_OAUTH_TOKEN_METHOD=POST

Troubleshooting

Missing tables

If running aleph commands gives you warnings about missing tables, you probably need to migrate your database to the lastest schema. Try:

make upgrade

Debugging ElasticSearch

Most problems arise when the ElasticSearch container doesn't startup properly, or in time. If upgrade fails with errors like Connection refused this is what happened.

You can find out specifically what went wrong with ES by consulting the logs for that container:

docker-compose -f docker-compose.dev.yml logs elasticsearch

You will almost certainly need to run the following before you build:

sysctl -w vm.max_map_count=262144

Or to set this permanently, in /etc/sysctl.conf add:

vm.max_map_count=262144

Max file descriptors

If the error in your ES container contains:

elasticsearch_1 | [1]: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]

Please see the relevent ElasticSearch documentation for this issue.

ElasticSearch read-only mode

When the host machine disk is over 90% full, ElasticSearch can decide to stop writes to the index as an emergency measure. To fix this, try the following:

  • Run docker system prune on the host machine

  • Inside a make shell, run this CURL command: curl -XPUT -H "Content-Type: application/json" http://elasticsearch:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

Try turning it off and on again

If all else fails, you may just need to wait a little longer for the ES service to initialize before you run upgrade. Doing the following (after make build) should be sufficient:

  1. make shell

  2. Inside the aleph shell run aleph upgrade.

  3. If that succeeds, in a new terminal run make web to launch the UI and API.

Clearing parts of the redis cache

redis-cli --scan --pattern ocr:* | xargs redis-cli del
redis-cli --scan --pattern aleph:authz:* | xargs redis-cli del
redis-cli --scan --pattern aleph:metadata:* | xargs redis-cli del