We don't recommend using Aleph without Docker. This guide was contributed by a community member, and it is documented here for completeness.
I start with a vanilla Ubuntu server installation and run all commands at the user account that I created during installation. Later on in the installation, I switch to a newly created aleph
user account.
sudo apt update && sudo apt upgradesudo apt install \apt-transport-https \build-essential \ca-certificates \curl \gnupg \locales \wget
sudo sed -i -e 's/# en_US.UTF-8 UTF-8/en_US.UTF-8 UTF-8/' /etc/locale.gensudo dpkg-reconfigure localessudo update-locale LANG=en_US.UTF-8
The Aleph Dockerfile
depend on Postgresql 10. While I believe that Aleph would work with newer versions of Postgresql as well (stock Ubuntu 19.10 comes with Postgresql 11) I choose to stick to the same version of Postgresql.
curl https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'sudo apt updatesudo apt install postgresql-10sudo systemctl enable postgresql
Create the database for Aleph. Replace the password with something following better password practices.
sudo -u postgres psqlcreate database aleph;create user aleph with encrypted password 'aleph';grant all privileges on database aleph to aleph;
To install Elasticsearch I follow the instructions on elastic.co. Aleph depends on the latest Elasticsearch docker image which at the time of this writing is 7.5.1.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -echo "deb https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-7.x.listsudo apt updatesudo apt install elasticsearch
We install additional plugins for Elasticsearch.
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch discovery-gcesudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-gcssudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch analysis-icu
Edit /etc/elasticsearch/elasticsearch.yml
and make the server listen only on localhost. As a default, Elasticsearch listens on all IP addresses.
network.host: 127.0.0.1
Edit /etc/elasticsearch/jvm.options
and set a proper heap size.
-Xms1g-Xmx1g
sudo sysctl -w vm.max_map_count=262144sudo systemctl enable elasticsearchsudo systemctl start elasticsearch
Redis is straightforward to install. It requires no additional configuration.
sudo apt install redissudo systemctl enable redis-serversudo systemctl start redis-server
Aleph itself consists of several services that run independently.
convert-document
ingest-file
worker
shell
api
ui
There is a dedicated repository for convert-document
. The other services are part of the standard Aleph repository.
These are all the packages that the OS has to provide for the different parts of Aleph.
curl -sL https://deb.nodesource.com/setup_13.x | sudo -E bash -sudo apt updatesudo apt install \cython3 \djvulibre-bin \fonts-dejavu \fonts-dejavu-core \fonts-dejavu-extra \fonts-droid-fallback \fonts-dustin \fonts-f500 \fonts-fanwood \fonts-freefont-ttf \fonts-liberation \fonts-lmodern \fonts-lyx \fonts-opensymbol \fonts-sil-gentium \fonts-texgyre \fonts-tlwg-purisa \ghostscript \hyphen-de \hyphen-en-us \hyphen-fr \hyphen-it \hyphen-ru \imagemagick \imagemagick-common \libfreetype6-dev \libicu-dev \libjpeg-dev \libldap2-dev \libleptonica-dev \libmagic-dev \libmediainfo-dev \libpq-dev \libreoffice \libreoffice-common \libreoffice-impress \libreoffice-writer \librsvg2-bin \libsasl2-dev \libtesseract-dev \libtiff-tools \libtiff5-dev \libwebp-dev \libxml2-dev \libxslt1-dev \mdbtools \nginx \nodejs \p7zip-full \pkg-config\poppler-data \poppler-utils \pst-utils \python3-crypto \python3-dev \python3-icu \python3-lxml \python3-pil \python3-pip \python3-psycopg2 \python3-uno \tesseract-ocr \tesseract-ocr-afr \tesseract-ocr-ara \tesseract-ocr-aze \tesseract-ocr-bel \tesseract-ocr-bul \tesseract-ocr-cat \tesseract-ocr-ces \tesseract-ocr-dan \tesseract-ocr-deu \tesseract-ocr-ell \tesseract-ocr-eng \tesseract-ocr-est \tesseract-ocr-fin \tesseract-ocr-fra \tesseract-ocr-frk \tesseract-ocr-heb \tesseract-ocr-hin \tesseract-ocr-hrv \tesseract-ocr-hun \tesseract-ocr-ind \tesseract-ocr-isl \tesseract-ocr-ita \tesseract-ocr-kan \tesseract-ocr-lav \tesseract-ocr-lit \tesseract-ocr-mkd \tesseract-ocr-mlt \tesseract-ocr-msa \tesseract-ocr-nld \tesseract-ocr-nor \tesseract-ocr-pol \tesseract-ocr-por \tesseract-ocr-ron \tesseract-ocr-rus \tesseract-ocr-slk \tesseract-ocr-slv \tesseract-ocr-spa \tesseract-ocr-sqi \tesseract-ocr-srp \tesseract-ocr-swa \tesseract-ocr-swe \tesseract-ocr-tur \tesseract-ocr-ukr \unoconv \unrar \zlib1g-dev \virtualenv
The convert-document
service requires python to resolve correctly to python3.
sudo ln -s /usr/bin/python3 /usr/local/bin/python
convert-document
further relies on a newer version of unoconv
than ships with Ubuntu 19.10. As your regular user
curl -O https://raw.githubusercontent.com/unoconv/unoconv/0.8.2/unoconvsudo mv unoconv /usr/local/bin/unoconvsudo chmod +x /usr/local/bin/unoconv
We run Aleph as it's own dedicated user.
sudo groupadd alephsudo useradd -g aleph aleph
We split all services into three different virtualenvs, one for convert-document
, one for ingest-file
and another one that runs aleph-api
and aleph-worker
. The UI is just static files, so there is no need to place them into a dedicated virtualenv.
The following steps are all done as the aleph
user.
sudo -i -u alephmkdir {data,venvs}git clone https://github.com/alephdata/aleph.gitgit clone https://github.com/alephdata/convert-document.gitcdvirtualenv --python=python3 --system-site-packages ~/venvs/convert-documentvirtualenv --python=python3 --system-site-packages ~/venvs/ingest-filevirtualenv --python=python3 --system-site-packages ~/venvs/alephexit
Once we have the Aleph source code available we can place the synonames.txt
for Elasticsearch to find. We execute this command again as our regular user.
sudo cp /home/aleph/aleph/services/elasticsearch/synonames.txt /etc/elasticsearch
Run the following commands as the aleph
user.
source venvs/convert-document/bin/activatecd ~/convert-documentpip install -r requirements.txtpip install .
Test that everything is working manually if you like:
gunicorn --threads 3 --bind 127.0.0.1:3000 --max-requests 5000 --access-logfile - --error-logfile - --timeout 600 --graceful-timeout 500 convert.app:appcurl -o out.pdf -F format=pdf -F 'file=@my-doc.odt' http://localhost:3000/convert
Exit the virtualenv.
deactivate
Run the following commands as the aleph
user.run the following commands:
source venvs/ingest-file/bin/activatecd ~/aleph/services/ingest-filepip install -r requirements.txtpip install .
Exit the virtualenv.
deactivate
The following commands are all run as the aleph
user.
Generate a new secret key. This command outputs a long string of characters which we use as our secret key in the Aleph configuration.
openssl rand -hex 64
cat <<EOF > ~/aleph/aleph.envALEPH_SECRET_KEY=2503698e0d74f47bd87b41ce0978c3db7567d90544554189b72a06b3ef62b2c887768bba79cf812987397f55145bd903fe6c581302fdefe2c8584ce4a3ba8005ALEPH_APP_TITLE=AlephALEPH_APP_NAME=alephALEPH_UI_URL=https://source.bird.toolsALEPH_URL_SCHEME=httpsALEPH_SAMPLE_SEARCHES=Vladimir Putin:TeliaSoneraALEPH_ADMINS=christo@cryptodrunks.netALEPH_PASSWORD_LOGIN=trueALEPH_OAUTH=falseALEPH_OCR_DEFAULTS=engALEPH_DEBUG=falseALEPH_NER_MODELS=eng:deu:fra:spa:porALEPH_ELASTICSEARCH_URI=http://localhost:9200/ALEPH_DATABASE_URI=postgresql://aleph:aleph@localhost/alephALEPH_GEONAMES_DATA=/home/aleph/aleph/contrib/geonames.txtFTM_STORE_URI=postgresql://aleph:aleph@localhost/alephREDIS_URL=redis://localhost:6379/0ARCHIVE_TYPE=fileARCHIVE_PATH=~/dataUNOSERVICE_URL=http://localhost:3000/convertALEPH_LID_MODEL_PATH=/home/aleph/aleph/contrib/lid.176.ftzEOF
We add some of the Aleph configuration variables into the user's environment as well. This becomes handy later on when interacting with Aleph interactively on the shell. We source the new environment profile right away as well.
cat <<EOF >> ~/.profileexport ALEPH_ELASTICSEARCH_URI=http://localhost:9200/export ALEPH_DATABASE_URI=postgresql://aleph:aleph@localhost/alephexport BALKHASH_BACKEND=postgresqlexport BALKHASH_DATABASE_URI=postgresql://aleph:aleph@localhost/alephexport REDIS_URL=redis://localhost:6379/0export ARCHIVE_TYPE=fileexport ARCHIVE_PATH=/home/aleph/dataexport UNOSERVICE_URL=http://localhost:3000/convertexport ALEPHCLIENT_HOST=https://source.bird.toolsEOFsource ~/.profile
With the configuration in place, we can continue with the installation of Aleph itself.
source venvs/aleph/bin/activatecd ~/alephpip install -r requirements-generic.txtpip install -r requirements-toolkit.txtpip install . alephclient spacy==2.2.3python3 -m spacy download xx_ent_wiki_sm && python3 -m spacy link xx_ent_wiki_sm xxpython3 -m spacy download en_core_web_sm && python3 -m spacy link en_core_web_sm engpython3 -m spacy download de_core_news_sm && python3 -m spacy link de_core_news_sm deupython3 -m spacy download fr_core_news_sm && python3 -m spacy link fr_core_news_sm frapython3 -m spacy download es_core_news_sm && python3 -m spacy link es_core_news_sm spapython3 -m spacy download pt_core_news_sm && python3 -m spacy link pt_core_news_sm por
Initialize the databases for Aleph. Replace the username and password of the database as needed. While connected to the aleph
virtualenv run:
aleph db initaleph db migratealeph upgrade
Exit the virtualenv.
deactivate
Run the following commands as the aleph
user.
cd ~/aleph/uinpm install
Create a .env
file and set the REACT_APP_API_ENDPOINT
variable.
REACT_APP_API_ENDPOINT=https://source.bird.tools/api/2
npm run build
The following commands are all executed as the regular user.
To start all Aleph automatically at boot, we need to place Systemd service files for the various services of Aleph and configure the webserver.
Create a systemd
service unit for the convert-document
service in /etc/systemd/system/convert-document.service
:
[Unit]Description=convert-document daemonAfter=network.target​[Service]Type=notifyUser=alephGroup=alephRuntimeDirectory=convert-documentEnvironment="PATH=/home/aleph/venvs/convert-document/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"WorkingDirectory=/home/aleph/convert-documentExecStart=/home/aleph/venvs/convert-document/bin/gunicorn --threads 3 --bind 127.0.0.1:3000 --max-requests 30 --access-logfile - --error-logfile - --timeout 300 --graceful-timeout 300 convert.app:appExecReload=/bin/kill -s HUP $MAINPIDKillMode=mixedTimeoutStopSec=5PrivateTmp=true​[Install]WantedBy=multi-user.target
Create a systemd
service unit for the ingest-file
service in /etc/systemd/system/ingest-file.service
:
[Unit]Description=ingest-file daemonWants=redis-server.service postgresql.service convert-document.serviceAfter=network.target redis-server.service postgresql.service convert-document.service​[Service]Type=simpleUser=alephGroup=alephRuntimeDirectory=ingest-fileEnvironment="PATH=/home/aleph/venvs/ingest-file/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"EnvironmentFile=/home/aleph/aleph/aleph.envWorkingDirectory=/home/aleph/aleph/services/ingest-fileExecStart=/home/aleph/venvs/ingest-file/bin/ingestors process​[Install]WantedBy=multi-user.target
Create a systemd
service unit for the aleph-worker
service /etc/systemd/system/aleph-worker.service
:
[Unit]Description=aleph-worker daemonWants=redis-server.service postgresql.service elasticsearch.service ingest-file.serviceAfter=network.target redis-server.service postgresql.service elasticsearch.service ingest-file.service​[Service]Type=simpleUser=alephGroup=alephRuntimeDirectory=aleph-workerEnvironment="PATH=/home/aleph/venvs/aleph/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"EnvironmentFile=/home/aleph/aleph/aleph.envWorkingDirectory=/home/aleph/alephExecStart=/home/aleph/venvs/aleph/bin/aleph worker​[Install]WantedBy=multi-user.target
Create a systemd
service unit for aleph-api
service /etc/systemd/system/aleph-api.service
:
[Unit]Description=aleph-api daemonWants=redis-server.service postgresql.service elasticsearch.service convert-document.service ingest-file.service aleph-worker.serviceAfter=network.target redis-server.service postgresql.service elasticsearch.service convert-document.service ingest-file.service aleph-worker.service​[Service]Type=notifyUser=alephGroup=alephRuntimeDirectory=aleph-apiEnvironment="PATH=/home/aleph/venvs/aleph/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"EnvironmentFile=/home/aleph/aleph/aleph.envWorkingDirectory=/home/aleph/alephExecStart=/home/aleph/venvs/aleph/bin/gunicorn -w 6 -b 0.0.0.0:8000 --log-level debug --log-file - aleph.wsgi:appExecReload=/bin/kill -s HUP $MAINPIDKillMode=mixedTimeoutStopSec=5PrivateTmp=true​[Install]WantedBy=multi-user.target
sudo systemctl daemon-reloadsudo systemctl enable convert-document.service ingest-file.service aleph-worker.service aleph-api.servicesudo systemctl start convert-document.service ingest-file.service aleph-worker.service aleph-api.service
As the last step, we configure the webserver to proxy requests to Aleph. We use Let's Encrypt to obtain valid SSL certificates. You can find many online resources describing this process, e.g. here.
Create a configuration file for the Nginx server in /etc/nginx/sites-available/source.bird.tools
.
upstream aleph-api {server localhost:8000;}​server {if ($host = source.bird.tools) {return 301 https://$host$request_uri;} # managed by Certbot​listen 80 default_server;listen [::]:80 default_server;​server_name source.bird.tools;return 404; # managed by Certbot}​server {server_name source.bird.tools;​ignore_invalid_headers off;add_header Referrer-Policy "same-origin";add_header X-Clacks-Overhead "GNU Terry Pratchett";add_header X-Content-Type-Options "nosniff";add_header X-Frame-Options "SAMEORIGIN";add_header X-XSS-Protection "1; mode=block";add_header Feature-Policy "accelerometer 'none'; camera 'none'; geolocation 'none'; gyroscope 'none'; magnetometer 'none'; microphone 'none'; payment 'none'; usb 'none'";​client_max_body_size 1024M;​listen [::]:443 ssl ipv6only=on; # managed by Certbotlisten 443 ssl; # managed by Certbotssl_certificate /etc/letsencrypt/live/source.bird.tools/fullchain.pem; # managed by Certbotssl_certificate_key /etc/letsencrypt/live/source.bird.tools/privkey.pem; # managed by Certbotinclude /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbotssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot​location / {root /home/aleph/aleph/ui/build;try_files $uri $uri/ /index.html;​gzip_static on;gzip_types text/plain text/xml text/csstext/javascript application/x-javascript;}​location /api {proxy_pass http://aleph-api;proxy_redirect off;proxy_set_header Host $http_host;proxy_set_header X-Real-IP $remote_addr;proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;}}
We enable the configuration and restart the server.
cd /etc/nginx/sites-enabledln -sf /etc/nginx/sites-availble/source.bird.toolsnginx -tsystemctl restart nginx
Aleph should be up and running. You want to create an initial user account. See the following section on how to do that,
Whenever you want to interact with Aleph you have to activate it for your running shell. I use the following commands when interacting with Aleph.
sudo -i -u alephsource venvs/aleph/bin/activateexport ALEPHCLIENT_API_KEY=<your api key>
aleph createuser -n crito -p SuperSecretPassword -a crito@cryptodrunks.net
The command outputs an API key that you should save somewhere. You will need it in the future when interacting with Aleph.
Upgrading Aleph requires to pull the latest changes and reinstall any dependencies required. To upgrade from Aleph 3.4.4 to e.g. 3.4.5 run the following commands:
cd alephgit fetch --allgit checkout 3.4.5pip install -r requirements-generic.txtpip install -r requirements-toolkit.txtpip install .
Unfortunately, some dependencies are missing from the requirement files. Particularly spacy
is installed separately. Check the Dockerfile
in the aleph
directory to see if any additional steps are required.
alephclient crawldir -f test out.pdf