Below you will find the steps to install Aleph.
Aleph requires multiple services to operate. It uses Docker containers to simplify development and deployment. Before we continue, you will need to have Docker and docker-compose installed. Please refer to their documentation to learn how to set up Docker and docker-compose.
This section describes how to set up Aleph for software development. Developer mode disables many security measures and is not meant for internet-facing use; use a production deployment instead.
Developer mode is a docker configuration for Aleph which makes it easy to do software development and debug the tool without having to install its dependencies on your host machine. These are the features of developer mode:
- The code for the backend (api) server and the React frontend will automatically reload to reflect any changes you make in your working copy while the application is running.
- Both backend and frontend will operate in debug mode and give more verbose error messages when a problem occurs.
- The host machine's file system will be accessible from within Aleph's docker container at /host.
Developer mode overrides the configuration file for
docker-compose.yml. This is done by wrapping most developer mode commands using
As a first step, check out the source code of Aleph from GitHub:
# Use the SSH URL if you have commit access:
git clone https://github.com/alephdata/aleph.git
Once the code is downloaded, find the file called aleph.env.tmpl in the base directory. This is a template of the configuration file. Make a copy of this file named aleph.env and define settings for your local instance. Check the configuration section for more information regarding the available options.
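If you script your setup, the copy step can also be done from Python. This is a minimal sketch; create_env_file is a hypothetical helper, not part of Aleph. It leaves an existing aleph.env untouched so your local settings are never overwritten.

```python
import shutil
from pathlib import Path


def create_env_file(base_dir: str) -> Path:
    """Copy aleph.env.tmpl to aleph.env in base_dir unless aleph.env already exists."""
    base = Path(base_dir)
    template = base / "aleph.env.tmpl"
    target = base / "aleph.env"
    if not target.exists():
        # Copy the template contents only once; later runs keep local edits
        shutil.copyfile(template, target)
    return target
```

Run it from the repository root as create_env_file("."), then edit the resulting aleph.env.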
Also, please execute the following command to allow ElasticSearch to map its memory (on a Linux host, prefix it with sudo):
sysctl -w vm.max_map_count=262144
When running Docker on macOS, Docker uses a virtual machine as the container host, so you need to run the command above inside that virtual machine. To log in to the virtual machine, execute:
docker run -it --rm --privileged --pid=host justincormack/nsenter1
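On Linux (or inside the macOS Docker VM) you can check whether the setting is already in place by reading /proc/sys/vm/max_map_count. A minimal sketch; max_map_count_ok is a hypothetical helper. Keep in mind that sysctl -w only changes the running kernel, so the value is lost on reboot unless you also persist it (for example in /etc/sysctl.conf).

```python
from pathlib import Path

# Minimum value used in the sysctl command above
REQUIRED_MAX_MAP_COUNT = 262144


def max_map_count_ok(proc_path: str = "/proc/sys/vm/max_map_count") -> bool:
    """Return True if the kernel's vm.max_map_count is at least the required value."""
    current = int(Path(proc_path).read_text().strip())
    return current >= REQUIRED_MAX_MAP_COUNT
```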
With the settings in place, you can use make all to set everything up and launch the web service. This is equivalent to the following steps:
- 1. Run make build to build the docker images for the application and relevant services. You can run make docker-pull beforehand to pull pre-built release images.
- 2. Run make upgrade to run the latest database migrations and create/update the search index.
- 3. Run make web to run the web-based API server and the user interface.
- 4. In a separate shell, run make worker to start a worker. If you do not start a worker, background jobs (for example, ingesting new documents) won’t be processed.
- 5. Open http://localhost:8080/ in your browser to visit the web frontend.
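The services can take a while to start, so a script that drives the steps above may need to wait until the frontend answers. The following sketch polls the URL with Python's standard library; wait_for_aleph and its timeout values are assumptions, not part of Aleph's tooling.

```python
import time
import urllib.request


def wait_for_aleph(url: str = "http://localhost:8080/", timeout: float = 120.0,
                   interval: float = 2.0) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # URLError, refused connections and socket timeouts all derive from OSError
            pass
        time.sleep(interval)
    return False
```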
If any of the above steps fail, refer to the troubleshooting section for some common stumbling blocks and their fixes.
During development, you will need to run command-line operations for certain tasks. To do so, you first need to enter Aleph's docker container by running:
# This will result in a root shell inside the container:
This will enter a docker container where the aleph shell command is available (see Usage for details). You can also access the host computer's file system at /host. This means a file stored at /tmp/bla.txt on your computer can be found at /host/tmp/bla.txt inside the container.
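The mapping is a plain prefix: every absolute host path appears under /host. A minimal sketch of the translation, assuming a POSIX host path (Windows drive paths are not handled); container_path is a hypothetical helper.

```python
from pathlib import PurePosixPath

HOST_MOUNT = PurePosixPath("/host")


def container_path(host_path: str) -> str:
    """Translate an absolute path on the host machine to its location inside the container."""
    path = PurePosixPath(host_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute host path, got {host_path!r}")
    # parts[0] is the root "/", so re-root the remaining components under /host
    return str(HOST_MOUNT.joinpath(*path.parts[1:]))
```

For example, container_path("/tmp/bla.txt") returns "/host/tmp/bla.txt".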
When you run Aleph in development mode, the default configuration will not run the worker component used to index documents and do other background work. You can start it either via make worker or inside an Aleph shell using aleph worker.
For development purposes, you can quickly create a new user with the aleph createuser command, inside a shell:
If the user's email address is listed in the ALEPH_ADMINS environment variable (in your configuration), the user will automatically be made an admin.
After you run aleph createuser, the newly created user's API key is printed. You can use it in the Authorization HTTP header of requests to the API. If you pass a password, you can use this email address and password to log in to the web interface.
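As a sketch of using the printed key from Python's standard library: this assumes the "ApiKey" authorization scheme used by the alephclient library and the /api/2/collections endpoint. build_api_request and fetch_json are hypothetical helpers; check the API documentation for your Aleph version before relying on either assumption.

```python
import json
import urllib.request


def build_api_request(base_url: str, path: str, api_key: str) -> urllib.request.Request:
    """Build a GET request to the Aleph API authenticated with an API key (assumed 'ApiKey' scheme)."""
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    return urllib.request.Request(url, headers={"Authorization": f"ApiKey {api_key}"})


def fetch_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage against a local development instance: fetch_json(build_api_request("http://localhost:8080", "/api/2/collections", "YOUR_API_KEY")).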
You can also run Aleph in single-user mode by setting
true. When you run Aleph in single-user mode, authentication is disabled and every user is automatically logged in as an admin user.
If you want to quickly get some sample data into your Aleph instance, you can use crawldir to index a small test data folder.
aleph crawldir /aleph/contrib/testdata
Make sure that a worker is running, otherwise your data won’t be processed. Run make worker to start a worker in your development environment. If you can’t see your sample data, make sure that you’re signed in, as your data won’t be public by default. See Users for instructions on how to create new user accounts.
To run the tests, assuming you already have the docker-compose environment up and ready, run:
This will create a new database and run all the tests.
If you're looking to debug changes that you've made to Aleph's Python code, there are a couple of options. By default, Aleph ships with the VS Code Python debugger (debugpy) enabled for the API and worker services in dev mode. This makes it easy to create a launch.json file and attach a debugger to a running instance of the software.
The API service's debugger listens on port 5678 (the conventional debugpy port), while the worker service's debugger listens on port 5679.
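A minimal sketch that writes a VS Code launch.json with attach configurations for both ports. It assumes the repository root is mounted at /aleph inside the containers (as the /aleph/contrib/testdata path used with crawldir suggests) and uses the "debugpy" type of the current VS Code Python Debugger extension; older versions of the extension expect "python" instead. write_launch_json is a hypothetical helper.

```python
import json
from pathlib import Path

# Assumption: the repository root is mounted at /aleph inside the containers
REMOTE_ROOT = "/aleph"


def attach_config(name: str, port: int) -> dict:
    """A VS Code 'attach' configuration for a debugpy server on localhost:port."""
    return {
        "name": name,
        "type": "debugpy",
        "request": "attach",
        "connect": {"host": "localhost", "port": port},
        "pathMappings": [{"localRoot": "${workspaceFolder}", "remoteRoot": REMOTE_ROOT}],
    }


def write_launch_json(workspace: str) -> Path:
    """Write .vscode/launch.json with attach targets for the API (5678) and worker (5679)."""
    config = {
        "version": "0.2.0",
        "configurations": [
            attach_config("Aleph API", 5678),
            attach_config("Aleph worker", 5679),
        ],
    }
    target = Path(workspace) / ".vscode" / "launch.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(config, indent=2))
    return target
```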
Additionally, you may wish to configure the debugger to wait for a client to attach. In that case, edit the docker-compose.dev.yml file and set the following as the command for the API service:
command: python3 -m debugpy --listen 0.0.0.0:5678 --wait-for-client -m flask run -h 0.0.0.0 -p 5000 --with-threads --reload --debugger
Or in the case of the worker, from the Makefile:
$(COMPOSE) run -p 127.0.0.1:5679:5679 --rm app python3 -m debugpy --listen 0.0.0.0:5679 --wait-for-client /usr/local/bin/aleph worker
If you'd prefer to use the pdb debugger, first add the following to the api section of docker-compose.dev.yml:
Once this is done, restart your docker containers and add a breakpoint() call to your code. Running docker attach aleph_api_1 should then let you interact with that breakpoint and use pdb's other features.
You can also build the Aleph images locally. This can be useful while working on Dockerfile changes or dependency upgrades.
To build the image, run make build, which builds the alephdata/aleph image (a production-ready image).
This section details how to set up Aleph in production mode. If you plan to change the source code or quickly test the software, you may wish to use the Developer setup instead.
Aleph is distributed as a set of Docker containers, which can be run on any server that meets the following criteria:
- 8GB (or more) of RAM. While the software will start with much less, we advise providing ample main memory for ideal performance.
- A working install of Docker and docker-compose. See the FAQ page for information on not using Docker.
- A domain name or IP address that can serve Aleph at the root path via HTTPS (i.e. Aleph doesn't support running at a sub-path like /aleph). You are welcome to contribute fixes for this scenario.
- An internet connection to download and install the package.
To begin a production deployment:
- Make a copy of the configuration template (aleph.env.tmpl) named aleph.env and define settings for your production instance. Check the section on configuration for more information regarding the available options.
Aleph supports multiple storage engines, including Amazon Web Services Simple Storage Service (S3), Google Cloud Storage buckets, and local file storage. Remember to configure this in your instance's configuration file.
- You can set the ALEPH_TAG environment variable to specify the version of Aleph you want to deploy. If ALEPH_TAG is not set, the stable version specified in the docker-compose file is deployed.
- Once you are happy with your configuration, execute the following command to allow ElasticSearch to map its memory:
sudo sysctl -w vm.max_map_count=262144
- Finally, you can boot the docker-compose environment:
docker-compose up -d
This will run Aleph in detached mode. You can shut down the system at any time by issuing the following command:
Before Aleph can process any requests or data, you need to make sure it sets up the database and search index correctly by executing an upgrade:
docker-compose run --rm shell aleph upgrade
While the system is running, it will bind to port 8080 of its host machine and accept incoming connections. You can check that the system is functional with a curl request:
If your server's firewall configuration allows it, you can now also open http://localhost:8080 in your browser and use the web interface to navigate the application.
Be careful with the ports your Docker system exposes publicly. Docker manipulates iptables directly and can bypass host firewall rules (for example, those managed by ufw), so double-check that you're only exposing the intended ports on your production system.
Aleph stores persistent data in 3 different systems:
- 1.Blob storage: This is where Aleph stores the uploaded files as blobs.
- 2. SQL database: Aleph uses a SQL database for two purposes:
  * A database to store application data like users, sessions, collection metadata etc. This database is defined by the ALEPH_DATABASE_URI setting.
  * A database to store FtM entities. This database is defined by the FTM_STORE_URI setting.
  These two databases can share the same SQL database instance or use a separate instance each.
- 3.ElasticSearch: ElasticSearch powers Aleph's search and stores the contents of all processed documents and entities.
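Whether ALEPH_DATABASE_URI and FTM_STORE_URI point at the same server matters for backups, because each database instance needs its own dump. A minimal sketch that compares the host and port of two PostgreSQL URIs; share_instance is a hypothetical helper, and it assumes 5432 (PostgreSQL's default port) when a URI omits the port.

```python
from urllib.parse import urlsplit


def server_of(uri: str) -> tuple:
    """Return (host, port) of a PostgreSQL URI, defaulting to port 5432."""
    parts = urlsplit(uri)
    return (parts.hostname, parts.port or 5432)


def share_instance(aleph_database_uri: str, ftm_store_uri: str) -> bool:
    """True if both database settings point at the same PostgreSQL server."""
    return server_of(aleph_database_uri) == server_of(ftm_store_uri)
```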
To have a functional Aleph instance, all 3 of these components need to be operational without data loss, and you need a restoration plan in case any of them experiences data corruption, data loss or any other failure.
For blob storage, Aleph can use cloud storage services like Google Cloud Storage or AWS S3, which provide redundancy and availability guarantees. If you're using the local file system for storage, the docker volume can be mapped to a host directory, and the host directory can be backed up in the usual ways (for example, running rsync in a cron job to copy the directory to a backup server).
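Where rsync is not available, a timestamped copy can also be made from Python. This minimal sketch copies the whole directory on every run (rsync would only transfer changes), so it suits small instances; snapshot_directory and the paths you pass to it are hypothetical.

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_directory(source: str, backup_root: str) -> Path:
    """Copy source into a new timestamped directory under backup_root."""
    stamp = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
    destination = Path(backup_root) / f"archive_{stamp}"
    # copytree raises FileExistsError instead of overwriting an existing snapshot
    shutil.copytree(source, destination)
    return destination
```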
For the SQL database, Aleph uses PostgreSQL, which can be backed up and restored with pg_dump, pg_restore and similar utilities.
For example, here's how you can dump the database:
docker-compose exec -T postgres pg_dumpall -c -U postgres > dump_`date +%Y-%m-%d"_"%H:%M:%S`.sql
and then restore from the dump:
cat dump_2021-11-03_18:03:54.sql | docker-compose exec -T postgres psql -U aleph -d aleph
The commands will be slightly different depending on your specific setup. If you're running a separate database instance for FtM-Store, you should back up both the Aleph application database and the FtM-Store database. The dump command should be run from a cron job to automate snapshot creation, and those snapshots should be copied to a separate backup server.
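A cron job like this will accumulate dump files, so you will probably want to prune old ones. A minimal sketch, assuming dumps are named as in the pg_dumpall example above; that timestamp format sorts chronologically as plain text, so the newest dump (the one you would restore) is simply the last in sorted order. prune_dumps and its default of keeping 7 dumps are hypothetical choices.

```python
from pathlib import Path


def list_dumps(directory: str) -> list:
    """Dump files named dump_YYYY-MM-DD_HH:MM:SS.sql, oldest first."""
    # The zero-padded timestamp sorts lexicographically in chronological order
    return sorted(Path(directory).glob("dump_*.sql"))


def prune_dumps(directory: str, keep: int = 7) -> list:
    """Delete all but the `keep` newest dumps and return the deleted paths."""
    dumps = list_dumps(directory)
    # Guard keep <= 0 explicitly: dumps[:-0] would be an empty slice
    to_delete = dumps[:-keep] if keep > 0 else dumps
    for path in to_delete:
        path.unlink()
    return to_delete
```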
The ElasticSearch index can be regenerated from the FtM-Store database, so a backup of that database also serves as a backup of your ElasticSearch index.
Note: This works only for Aleph versions greater than 3.8.0.
Here's how you can recreate the ElasticSearch index from the FtM-Store database:
- 1. Run docker-compose run --rm shell aleph upgrade to recreate the indices in Elasticsearch.
- 2. Once that finishes running, run docker-compose run --rm shell aleph reindex-full to write data into the Elasticsearch indices from the FtM-Store PostgreSQL database. This will take some time, depending on how much data you have.
The main configuration file of Aleph is aleph.env, which is loaded by docker-compose and can modify many aspects of system behaviour. A template for the configuration, with details regarding many of the options, is available in the aleph.env.tmpl file.
- You will need to provide a value for ALEPH_SECRET_KEY. A good example of a value is the output of openssl rand -hex 24.
- Aleph needs to know the URL under which the web interface is mounted. Make sure to set the correct public
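The secret key can also be generated and written into aleph.env from Python: secrets.token_hex(24) produces 48 hexadecimal characters, the same shape as the output of openssl rand -hex 24. A minimal sketch; set_env_value is a hypothetical helper, and it assumes the setting appears in aleph.env as a plain KEY=value line, possibly commented out.

```python
import re
import secrets
from pathlib import Path


def generate_secret_key(num_bytes: int = 24) -> str:
    """Random hex string, equivalent in shape to `openssl rand -hex 24`."""
    return secrets.token_hex(num_bytes)


def set_env_value(env_file: str, key: str, value: str) -> None:
    """Set KEY=value, replacing an existing (possibly commented) line or appending one."""
    path = Path(env_file)
    lines = path.read_text().splitlines() if path.exists() else []
    # Matches "KEY=", "#KEY=" and "# KEY=" at the start of a line
    pattern = re.compile(rf"^#?\s*{re.escape(key)}=")
    for i, line in enumerate(lines):
        if pattern.match(line):
            lines[i] = f"{key}={value}"
            break
    else:
        lines.append(f"{key}={value}")
    path.write_text("\n".join(lines) + "\n")
```

For example: set_env_value("aleph.env", "ALEPH_SECRET_KEY", generate_secret_key()).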
Some instance-specific information, e.g. 'About' page, is configured with pages mechanism. You can find default examples in
You can enable an app-wide message banner displayed at the top of every page for anonymous and authenticated users. This can be useful if you want to inform Aleph users about scheduled maintenance, degraded performance, or newly available datasets.
Use the message banner to inform users about scheduled maintenance.
There are two ways to configure the contents of the message banner.
The easiest way to enable the message banner is to set the ALEPH_APP_BANNER environment variable. For example, the following configuration will display the message “Planned downtime this Sunday”:
ALEPH_APP_BANNER="Planned downtime this Sunday"
Alternatively, you can also configure a JSON endpoint. Aleph will then load messages from that endpoint. Configuring a JSON endpoint allows you to update messages without changing the configuration or restarting your Aleph instance.
As long as the JSON data conforms to the simple schema described below, you can use any method that fits your requirements to generate it. You could manually create a JSON file and upload it to a web server or generate it automatically based on a monitoring tool you use.
We have also created a custom GitHub Action that creates such JSON data based on GitHub Issues and deploys it for free to GitHub Pages. Head over to the repository on GitHub to learn how to set it up for yourself.
In order to enable loading messages via a JSON endpoint, set the value of the
The JSON response returned by the endpoint should conform to the following schema:
"title": "Scheduled maintenance on Jul 10, 2022, 8–10am UTC",
"safeHtmlBody": "We will upgrade Aleph’s server infrastructure. Aleph will be unavailable during this time frame."
- level can be one of several values, such as success. The message banner is color-coded based on the level.
- If you set displayUntil to a valid date, the message banner will only be displayed until that date.
- You can include multiple messages in the list. However, only the first message with an empty or a future displayUntil date will be displayed.
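The display rule in the last point can be sketched as follows. This illustrates the rule as described here and is not Aleph's actual frontend code; it assumes displayUntil holds an ISO 8601 timestamp and treats timestamps without a UTC offset as UTC. message_to_display is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def message_to_display(messages: list, now: Optional[datetime] = None) -> Optional[dict]:
    """Return the first message whose displayUntil is empty or in the future."""
    now = now or datetime.now(timezone.utc)
    for message in messages:
        until = message.get("displayUntil")
        if not until:
            return message
        # Assumption: ISO 8601; "Z" is rewritten because older Pythons reject it
        parsed = datetime.fromisoformat(until.replace("Z", "+00:00"))
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        if parsed > now:
            return message
    return None
```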
Using OAuth for login is optional. Skip this section (and leave the config commented out) if you don't want to use it.
Aleph supports several OAuth providers out of the box: Google, Facebook and Microsoft Azure.
To get the OAuth credentials please visit the Google Developers Console. There you will need to create an API key. In the Authorised redirect URIs section, use this URL:
Save the client ID and the client secret as
Create a new app over at https://apps.dev.microsoft.com, make a note of the KEY and secret you generate there. Callback URL should be as follows: https:///api/2/sessions/callback . Then add the following to aleph.env, remember to update KEY and SECRET with your values.
ALEPH_OAUTH_SCOPE=openid profile email user.read