Look Ma, No Docker
This guide de-dockerifies Aleph. I use Ubuntu 19.10 because it is the distribution used by the Aleph Docker files. This howto assumes Aleph 3.4.4.
We don't recommend using Aleph without Docker. This guide was contributed by a community member, and it is documented here for completeness.
I start with a vanilla Ubuntu server installation and run all commands as the user account that I created during installation. Later in the installation, I switch to a newly created aleph user account.

Basic setup

sudo apt update && sudo apt upgrade
sudo apt install \
    apt-transport-https \
    build-essential \
    ca-certificates \
    curl \
    gnupg \
    locales \
    wget
Enable and generate the en_US.UTF-8 locale and make it the system default.
sudo sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gen
sudo dpkg-reconfigure locales
sudo update-locale LANG=en_US.UTF-8
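To see what the sed edit does without touching the system file, the same substitution can be tried on a scratch copy. This is only a sketch: the helper name enable_locale_line and the sample lines are made up for illustration, and -i is dropped so the result is printed instead of written back.

```shell
# Hypothetical helper: apply the locale.gen substitution to a file and print
# the result instead of editing the file in place.
enable_locale_line() {
    sed -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' "$1"
}

# Build a scratch file that mimics two commented-out entries of /etc/locale.gen.
scratch=$(mktemp)
printf '%s\n' '# de_DE.UTF-8 UTF-8' '# en_US.UTF-8 UTF-8' > "$scratch"

# Only the en_US line loses its leading "# "; the de_DE line stays commented.
enable_locale_line "$scratch"
rm -f "$scratch"
```

After running the real commands, `locale -a` should list en_US.utf8.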

PostgreSQL

The Aleph Dockerfile depends on PostgreSQL 10. While I believe that Aleph would work with newer versions of PostgreSQL as well (stock Ubuntu 19.10 comes with PostgreSQL 11), I chose to stick to the same version.
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt update
sudo apt install postgresql-10
sudo systemctl enable postgresql
Create the database for Aleph. Replace the password 'aleph' with a strong one of your own. The first command opens a psql prompt; enter the SQL statements there.
sudo -u postgres psql
create database aleph;
create user aleph with encrypted password 'aleph';
grant all privileges on database aleph to aleph;
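If you prefer to script this step, the statements can be generated for a password of your choice. The following is a minimal sketch, not part of Aleph: aleph_db_sql is a hypothetical helper, and it only prints the SQL. Single quotes in the password are doubled, as SQL string literals require.

```shell
# Hypothetical helper: print the SQL that creates the Aleph database and role.
# $1 is the password for the aleph role.
aleph_db_sql() {
    db_password=$1
    # Double any single quote so the password is a valid SQL string literal.
    escaped=$(printf '%s' "$db_password" | sed "s/'/''/g")
    printf 'create database aleph;\n'
    printf "create user aleph with encrypted password '%s';\n" "$escaped"
    printf 'grant all privileges on database aleph to aleph;\n'
}

# Print the statements for a placeholder password.
aleph_db_sql 'change-me'
```

The output can be piped into `sudo -u postgres psql`, which reads statements from standard input. Whatever password you choose also has to appear in the database URIs of the Aleph configuration.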

Elasticsearch

To install Elasticsearch I follow the instructions on elastic.co. Aleph depends on the latest Elasticsearch docker image which at the time of this writing is 7.5.1.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.list
sudo apt update
sudo apt install elasticsearch
We install additional plugins for Elasticsearch.
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch discovery-gce
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-gcs
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch analysis-icu
Edit /etc/elasticsearch/elasticsearch.yml so that the server listens only on localhost. Elasticsearch 7.x already binds to loopback addresses by default, but setting it explicitly makes the intent clear.
network.host: 127.0.0.1
Edit /etc/elasticsearch/jvm.options and set the minimum and maximum heap size to the same value that fits your machine, for example 1 GB.
-Xms1g
-Xmx1g
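As a rough way to pick a value, Elastic's general guidance is to give the heap no more than half of the machine's RAM and to stay below roughly 32 GB. The sketch below turns that into a small calculation; suggest_heap_mb is a hypothetical helper, and the 31 GB cap is an assumption used as a conservative stand-in for that limit.

```shell
# Hypothetical helper: suggest a heap size in MB for a machine with $1 MB of RAM.
# Rule of thumb: half of the RAM, capped at 31 GB (assumed conservative limit).
suggest_heap_mb() {
    ram_mb=$1
    heap_mb=$((ram_mb / 2))
    cap_mb=$((31 * 1024))
    if [ "$heap_mb" -gt "$cap_mb" ]; then
        heap_mb=$cap_mb
    fi
    echo "$heap_mb"
}

# Example: read the total RAM from /proc/meminfo (Linux only) and print
# matching jvm.options lines.
if [ -r /proc/meminfo ]; then
    total_mb=$(awk '/^MemTotal:/ { print int($2 / 1024) }' /proc/meminfo)
    echo "-Xms$(suggest_heap_mb "$total_mb")m"
    echo "-Xmx$(suggest_heap_mb "$total_mb")m"
fi
```

On a machine that also runs PostgreSQL, Redis and the Aleph services, a smaller value than this suggestion may be more appropriate.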
Raise the kernel's memory-map limit as Elasticsearch requires, then enable and start the service.
sudo sysctl -w vm.max_map_count=262144
sudo systemctl enable elasticsearch
sudo systemctl start elasticsearch

Redis

Redis is straightforward to install. It requires no additional configuration.
sudo apt install redis
sudo systemctl enable redis-server
sudo systemctl start redis-server

Aleph

Aleph itself consists of several services that run independently.
    convert-document
    ingest-file
    worker
    shell
    api
    ui
There is a dedicated repository for convert-document. The other services are part of the standard Aleph repository.
These are all the packages that the OS has to provide for the different parts of Aleph.
curl -sL https://deb.nodesource.com/setup_13.x | sudo -E bash -
sudo apt update
sudo apt install \
    cython3 \
    djvulibre-bin \
    fonts-dejavu \
    fonts-dejavu-core \
    fonts-dejavu-extra \
    fonts-droid-fallback \
    fonts-dustin \
    fonts-f500 \
    fonts-fanwood \
    fonts-freefont-ttf \
    fonts-liberation \
    fonts-lmodern \
    fonts-lyx \
    fonts-opensymbol \
    fonts-sil-gentium \
    fonts-texgyre \
    fonts-tlwg-purisa \
    ghostscript \
    hyphen-de \
    hyphen-en-us \
    hyphen-fr \
    hyphen-it \
    hyphen-ru \
    imagemagick \
    imagemagick-common \
    libfreetype6-dev \
    libicu-dev \
    libjpeg-dev \
    libldap2-dev \
    libleptonica-dev \
    libmagic-dev \
    libmediainfo-dev \
    libpq-dev \
    libreoffice \
    libreoffice-common \
    libreoffice-impress \
    libreoffice-writer \
    librsvg2-bin \
    libsasl2-dev \
    libtesseract-dev \
    libtiff-tools \
    libtiff5-dev \
    libwebp-dev \
    libxml2-dev \
    libxslt1-dev \
    mdbtools \
    nginx \
    nodejs \
    p7zip-full \
    pkg-config \
    poppler-data \
    poppler-utils \
    pst-utils \
    python3-crypto \
    python3-dev \
    python3-icu \
    python3-lxml \
    python3-pil \
    python3-pip \
    python3-psycopg2 \
    python3-uno \
    tesseract-ocr \
    tesseract-ocr-afr \
    tesseract-ocr-ara \
    tesseract-ocr-aze \
    tesseract-ocr-bel \
    tesseract-ocr-bul \
    tesseract-ocr-cat \
    tesseract-ocr-ces \
    tesseract-ocr-dan \
    tesseract-ocr-deu \
    tesseract-ocr-ell \
    tesseract-ocr-eng \
    tesseract-ocr-est \
    tesseract-ocr-fin \
    tesseract-ocr-fra \
    tesseract-ocr-frk \
    tesseract-ocr-heb \
    tesseract-ocr-hin \
    tesseract-ocr-hrv \
    tesseract-ocr-hun \
    tesseract-ocr-ind \
    tesseract-ocr-isl \
    tesseract-ocr-ita \
    tesseract-ocr-kan \
    tesseract-ocr-lav \
    tesseract-ocr-lit \
    tesseract-ocr-mkd \
    tesseract-ocr-mlt \
    tesseract-ocr-msa \
    tesseract-ocr-nld \
    tesseract-ocr-nor \
    tesseract-ocr-pol \
    tesseract-ocr-por \
    tesseract-ocr-ron \
    tesseract-ocr-rus \
    tesseract-ocr-slk \
    tesseract-ocr-slv \
    tesseract-ocr-spa \
    tesseract-ocr-sqi \
    tesseract-ocr-srp \
    tesseract-ocr-swa \
    tesseract-ocr-swe \
    tesseract-ocr-tur \
    tesseract-ocr-ukr \
    unoconv \
    unrar \
    zlib1g-dev \
    virtualenv
The convert-document service requires python to resolve correctly to python3.
sudo ln -s /usr/bin/python3 /usr/local/bin/python
convert-document also relies on a newer version of unoconv than the one that ships with Ubuntu 19.10. As your regular user, run:
curl -O https://raw.githubusercontent.com/unoconv/unoconv/0.8.2/unoconv
sudo mv unoconv /usr/local/bin/unoconv
sudo chmod +x /usr/local/bin/unoconv
We run Aleph as its own dedicated user.
sudo groupadd aleph
sudo useradd -g aleph aleph
We split all services into three different virtualenvs, one for convert-document, one for ingest-file and another one that runs aleph-api and aleph-worker. The UI is just static files, so there is no need to place them into a dedicated virtualenv.
The following steps are all done as the aleph user.
sudo -i -u aleph
mkdir {data,venvs}
git clone https://github.com/alephdata/aleph.git
git clone https://github.com/alephdata/convert-document.git
cd
virtualenv --python=python3 --system-site-packages ~/venvs/convert-document
virtualenv --python=python3 --system-site-packages ~/venvs/ingest-file
virtualenv --python=python3 --system-site-packages ~/venvs/aleph
exit
Once the Aleph source code is available, we can copy synonames.txt to a place where Elasticsearch finds it. Run this command as your regular user again.
sudo cp /home/aleph/aleph/services/elasticsearch/synonames.txt /etc/elasticsearch

convert-document

Run the following commands as the aleph user.
source venvs/convert-document/bin/activate
cd ~/convert-document
pip install -r requirements.txt
pip install .
Optionally, test the service manually. The first command starts it in the foreground; run the second one from another shell to send a document for conversion.
gunicorn --threads 3 --bind 127.0.0.1:3000 --max-requests 5000 --access-logfile - --error-logfile - --timeout 600 --graceful-timeout 500 convert.app:app
curl -o out.pdf -F format=pdf -F '[email protected]' http://localhost:3000/convert
Exit the virtualenv.
deactivate

ingest-file

Run the following commands as the aleph user.
source venvs/ingest-file/bin/activate
cd ~/aleph/services/ingest-file
pip install -r requirements.txt
pip install .
Exit the virtualenv.
deactivate

aleph-worker and aleph-api

The following commands are all run as the aleph user.
Generate a new secret key. This command outputs a long string of hexadecimal characters, which we use as the secret key in the Aleph configuration.
openssl rand -hex 64
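A quick sanity check on the generated key can catch copy-and-paste mistakes: 64 random bytes in hexadecimal should give exactly 128 characters. The sketch below wraps the command in a hypothetical helper, gen_secret_key; the fallback to /dev/urandom is an assumption for systems without openssl and is not part of the original instructions.

```shell
# Hypothetical helper: print a 64-byte random key as 128 hex characters.
gen_secret_key() {
    if command -v openssl >/dev/null 2>&1; then
        openssl rand -hex 64
    else
        # Assumed fallback: read 64 bytes from the kernel and hex-encode them.
        head -c 64 /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

key=$(gen_secret_key)

# Report the length; 128 is expected.
echo "key length: ${#key}"
```

Keep the key private: anyone who knows it can forge session data for your Aleph instance.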
Write the configuration to ~/aleph/aleph.env. Replace the sample ALEPH_SECRET_KEY below with the key you just generated, and adjust the hostname and the database password to your setup.
cat <<EOF > ~/aleph/aleph.env
ALEPH_SECRET_KEY=2503698e0d74f47bd87b41ce0978c3db7567d90544554189b72a06b3ef62b2c887768bba79cf812987397f55145bd903fe6c581302fdefe2c8584ce4a3ba8005
ALEPH_APP_TITLE=Aleph
ALEPH_APP_NAME=aleph
ALEPH_UI_URL=https://source.bird.tools
ALEPH_URL_SCHEME=https
ALEPH_SAMPLE_SEARCHES=Vladimir Putin:TeliaSonera
ALEPH_PASSWORD_LOGIN=true
ALEPH_OAUTH=false
ALEPH_OCR_DEFAULTS=eng
ALEPH_DEBUG=false
ALEPH_NER_MODELS=eng:deu:fra:spa:por
ALEPH_ELASTICSEARCH_URI=http://localhost:9200/
ALEPH_DATABASE_URI=postgresql://aleph:aleph@127.0.0.1/aleph
ALEPH_GEONAMES_DATA=/home/aleph/aleph/contrib/geonames.txt
FTM_STORE_URI=postgresql://aleph:aleph@127.0.0.1/aleph
REDIS_URL=redis://localhost:6379/0
ARCHIVE_TYPE=file
ARCHIVE_PATH=/home/aleph/data
UNOSERVICE_URL=http://localhost:3000/convert
ALEPH_LID_MODEL_PATH=/home/aleph/aleph/contrib/lid.176.ftz
EOF
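Because the services read all their settings from this one file, a missing line tends to show up only later as a confusing startup error. The following sketch checks an env file for a list of keys; missing_env_keys is a hypothetical helper, and which keys you treat as required is your own choice, not something Aleph defines.

```shell
# Hypothetical helper: print every key from the argument list that has no
# KEY=value line in the env file given as $1.
missing_env_keys() {
    env_file=$1
    shift
    for key in "$@"; do
        grep -q "^${key}=" "$env_file" || echo "$key"
    done
}

# Demonstrate on a scratch file that lacks ALEPH_DATABASE_URI.
sample=$(mktemp)
printf '%s\n' 'ALEPH_SECRET_KEY=abc' 'REDIS_URL=redis://localhost:6379/0' > "$sample"
missing_env_keys "$sample" ALEPH_SECRET_KEY ALEPH_DATABASE_URI REDIS_URL
rm -f "$sample"
```

Against the real file this would be called as `missing_env_keys ~/aleph/aleph.env ALEPH_SECRET_KEY ALEPH_DATABASE_URI REDIS_URL`. Since the file contains the secret key and the database password, it is also reasonable to restrict it with `chmod 600 ~/aleph/aleph.env`.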
We also add some of the Aleph configuration variables to the user's environment. This comes in handy later when working with Aleph interactively in the shell. We source the new profile right away.
cat <<EOF >> ~/.profile
export ALEPH_ELASTICSEARCH_URI=http://localhost:9200/
export ALEPH_DATABASE_URI=postgresql://aleph:aleph@127.0.0.1/aleph
export BALKHASH_BACKEND=postgresql
export BALKHASH_DATABASE_URI=postgresql://aleph:aleph@127.0.0.1/aleph
export REDIS_URL=redis://localhost:6379/0
export ARCHIVE_TYPE=file
export ARCHIVE_PATH=/home/aleph/data
export UNOSERVICE_URL=http://localhost:3000/convert
export ALEPHCLIENT_HOST=https://source.bird.tools
EOF
source ~/.profile
With the configuration in place, we can continue with the installation of Aleph itself.
source venvs/aleph/bin/activate
cd ~/aleph
pip install -r requirements-generic.txt
pip install -r requirements-toolkit.txt
pip install . alephclient spacy==2.2.3
python3 -m spacy download xx_ent_wiki_sm && python3 -m spacy link xx_ent_wiki_sm xx
python3 -m spacy download en_core_web_sm && python3 -m spacy link en_core_web_sm eng
python3 -m spacy download de_core_news_sm && python3 -m spacy link de_core_news_sm deu
python3 -m spacy download fr_core_news_sm && python3 -m spacy link fr_core_news_sm fra
python3 -m spacy download es_core_news_sm && python3 -m spacy link es_core_news_sm spa
python3 -m spacy download pt_core_news_sm && python3 -m spacy link pt_core_news_sm por
Initialize the databases for Aleph. If you changed the database username or password, make sure the configuration reflects that. With the aleph virtualenv still active, run:
aleph db init
aleph db migrate
aleph upgrade
Exit the virtualenv.
deactivate

aleph-ui

Run the following commands as the aleph user.
cd ~/aleph/ui
npm install
Create a .env file in this directory and set the REACT_APP_API_ENDPOINT variable.
REACT_APP_API_ENDPOINT=https://source.bird.tools/api/2
Then build the static files for the UI.
npm run build

Final touches

The following commands are all executed as the regular user.
To start all Aleph services automatically at boot, we need to install systemd service files for them and configure the web server.
Create a systemd service unit for the convert-document service in /etc/systemd/system/convert-document.service:
[Unit]
Description=convert-document daemon
After=network.target

[Service]
Type=notify
User=aleph
Group=aleph
RuntimeDirectory=convert-document
Environment="PATH=/home/aleph/venvs/convert-document/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
WorkingDirectory=/home/aleph/convert-document
ExecStart=/home/aleph/venvs/convert-document/bin/gunicorn --threads 3 --bind 127.0.0.1:3000 --max-requests 30 --access-logfile - --error-logfile - --timeout 300 --graceful-timeout 300 convert.app:app
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target
Create a systemd service unit for the ingest-file service in /etc/systemd/system/ingest-file.service:
[Unit]
Description=ingest-file daemon
Wants=redis-server.service postgresql.service convert-document.service
After=network.target redis-server.service postgresql.service convert-document.service

[Service]
Type=simple
User=aleph
Group=aleph
RuntimeDirectory=ingest-file
Environment="PATH=/home/aleph/venvs/ingest-file/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
EnvironmentFile=/home/aleph/aleph/aleph.env
WorkingDirectory=/home/aleph/aleph/services/ingest-file
ExecStart=/home/aleph/venvs/ingest-file/bin/ingestors process

[Install]
WantedBy=multi-user.target
Create a systemd service unit for the aleph-worker service in /etc/systemd/system/aleph-worker.service:
[Unit]
Description=aleph-worker daemon
Wants=redis-server.service postgresql.service elasticsearch.service ingest-file.service
After=network.target redis-server.service postgresql.service elasticsearch.service ingest-file.service

[Service]
Type=simple
User=aleph
Group=aleph
RuntimeDirectory=aleph-worker
Environment="PATH=/home/aleph/venvs/aleph/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
EnvironmentFile=/home/aleph/aleph/aleph.env
WorkingDirectory=/home/aleph/aleph
ExecStart=/home/aleph/venvs/aleph/bin/aleph worker

[Install]
WantedBy=multi-user.target
Create a systemd service unit for the aleph-api service in /etc/systemd/system/aleph-api.service:
[Unit]
Description=aleph-api daemon
Wants=redis-server.service postgresql.service elasticsearch.service convert-document.service ingest-file.service aleph-worker.service
After=network.target redis-server.service postgresql.service elasticsearch.service convert-document.service ingest-file.service aleph-worker.service

[Service]
Type=notify
User=aleph
Group=aleph
RuntimeDirectory=aleph-api
Environment="PATH=/home/aleph/venvs/aleph/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
EnvironmentFile=/home/aleph/aleph/aleph.env
WorkingDirectory=/home/aleph/aleph
ExecStart=/home/aleph/venvs/aleph/bin/gunicorn -w 6 -b 0.0.0.0:8000 --log-level debug --log-file - aleph.wsgi:app
ExecReload=/bin/kill -s HUP $MAINPID
KillMode=mixed
TimeoutStopSec=5
PrivateTmp=true

[Install]
WantedBy=multi-user.target
Reload systemd so that it picks up the new unit files, then enable and start the services.
sudo systemctl daemon-reload
sudo systemctl enable convert-document.service ingest-file.service aleph-worker.service aleph-api.service
sudo systemctl start convert-document.service ingest-file.service aleph-worker.service aleph-api.service
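To confirm that all four units actually came up, their state can be queried in a loop. This is a small sketch rather than part of the original setup: check_units is a hypothetical helper, and the SYSTEMCTL variable is an assumption added only so the function can be tried with a stand-in command on machines without systemd.

```shell
# Hypothetical helper: report whether each given systemd unit is active.
# SYSTEMCTL may name a replacement command; it defaults to systemctl.
check_units() {
    systemctl_cmd=${SYSTEMCTL:-systemctl}
    for unit in "$@"; do
        # "is-active --quiet" prints nothing and signals the state via exit code.
        if "$systemctl_cmd" is-active --quiet "$unit" 2>/dev/null; then
            echo "$unit: active"
        else
            echo "$unit: NOT active"
        fi
    done
}

check_units convert-document.service ingest-file.service aleph-worker.service aleph-api.service
```

For a unit that is reported as not active, `journalctl -u aleph-api.service` (with the unit name adjusted) shows its log output.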
As the last step, we configure the web server to proxy requests to Aleph. We use Let's Encrypt to obtain valid SSL certificates; many online resources describe this process.
Create a configuration file for the Nginx server in /etc/nginx/sites-available/source.bird.tools.
upstream aleph-api {
    server localhost:8000;
}

server {
    if ($host = source.bird.tools) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    listen 80 default_server;
    listen [::]:80 default_server;

    server_name source.bird.tools;
    return 404; # managed by Certbot
}

server {
    server_name source.bird.tools;

    ignore_invalid_headers off;
    add_header Referrer-Policy "same-origin";
    add_header X-Clacks-Overhead "GNU Terry Pratchett";
    add_header X-Content-Type-Options "nosniff";
    add_header X-Frame-Options "SAMEORIGIN";
    add_header X-XSS-Protection "1; mode=block";
    add_header Feature-Policy "accelerometer 'none'; camera 'none'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; payment 'none'; usb 'none'";

    client_max_body_size 1024M;

    listen [::]:443 ssl ipv6only=on; # managed by Certbot
    listen 443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/source.bird.tools/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/source.bird.tools/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
    ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot

    location / {
        root /home/aleph/aleph/ui/build;
        try_files $uri $uri/ /index.html;

        gzip_static on;
        gzip_types text/plain text/xml text/css
                   text/javascript application/x-javascript;
    }

    location /api {
        proxy_pass http://aleph-api;
        proxy_redirect off;
        proxy_set_header Host $http_host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
We enable the configuration, check it for syntax errors and restart the server.
cd /etc/nginx/sites-enabled
sudo ln -sf /etc/nginx/sites-available/source.bird.tools
sudo nginx -t
sudo systemctl restart nginx
Aleph should now be up and running. Next, create an initial user account as described in the following sections.

Aleph Configuration

Whenever you want to interact with Aleph, switch to the aleph user and activate the virtualenv in your current shell. I use the following commands:
sudo -i -u aleph
source venvs/aleph/bin/activate
export ALEPHCLIENT_API_KEY=<your api key>

Create Aleph users

aleph createuser -n crito -p SuperSecretPassword -a [email protected]
The command outputs an API key that you should save somewhere. You will need it in the future when interacting with Aleph.

Upgrade Aleph

Upgrading Aleph requires pulling the latest changes and reinstalling its dependencies. To upgrade from Aleph 3.4.4 to, for example, 3.4.5, run the following commands as the aleph user with the aleph virtualenv active:
cd aleph
git fetch --all
git checkout 3.4.5
pip install -r requirements-generic.txt
pip install -r requirements-toolkit.txt
pip install .
Unfortunately, some dependencies are missing from the requirements files. In particular, spaCy is installed separately. Check the Dockerfile in the aleph directory to see whether any additional steps are required.

Import data

alephclient crawldir -f test out.pdf