# Unipept API Load Balancer configuration
Since Unipept has multiple API servers handling clients' requests, we have set up a separate load balancer that spreads all incoming requests over the different servers. HAProxy is the software package that handles this for us, and the full configuration of this load balancer can be found in this document.
The configuration file for HAProxy can be found in `/etc/haproxy/haproxy.cfg` and looks like this. An explanation of each of the non-standard configuration options is provided in the comments in this file.
```
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default SSL material locations
    ca-base /etc/ssl/certs
    crt-base /etc/ssl/private

    # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

# These values are defaults that are inherited by all frontend and backend sections below. They can
# still be overridden if they are specified again in one of the frontend or backend blocks.
defaults
    log global
    mode http
    option httplog
    option dontlognull
    # The time that HAProxy waits for a TCP connection to the backend to be established.
    timeout connect 5s
    # This setting measures inactivity during a period that we expect the client to be speaking.
    timeout client 5s
    # This setting measures inactivity during a period that we expect the backend server to be
    # speaking.
    timeout server 1800s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend stats
    mode http
    bind *:8084
    stats enable
    stats uri /stats
    stats refresh 10s
    stats admin if LOCALHOST

frontend handlers
    mode http
    # Allow HAProxy to load balance normal HTTP requests
    bind *:80
    # Allow HAProxy to load balance secure HTTPS requests.
    # HTTP/2 is enabled; the alpn order indicates that HTTP/2 is preferred over HTTP/1.1.
    bind *:443 ssl crt /etc/ssl/unipeptapi.ugent.be/unipeptapi.ugent.be.pem alpn h2,http1.1
    # Keep track of the last 100k HTTP requests and the IPv6 address they originated from.
    # The records in this table are automatically removed after 120s (expire 120s).
    # By setting http_req_rate(60s) we are telling HAProxy to count the number of requests made by
    # each IP address in the last 60s.
    stick-table type ipv6 size 100k expire 120s store http_req_rate(60s)
    # Track each client's source address in the stick table above (sticky counter 0).
    http-request track-sc0 src
    # Allow HAProxy to scan the body of requests
    option http-buffer-request
    # Allow 5000 requests per minute from a client. If more requests are made, respond with a status
    # 429 (Too Many Requests) error.
    # http-request deny deny_status 429 if { sc_http_req_rate(0) gt 5000 }
    # Automatically redirect traffic to https if it came from http. This is disabled for the API
    # for performance reasons since some clients don't want to use HTTPS.
    # redirect scheme https code 301 if !{ ssl_fc }
    acl letsencrypt-acl path_beg /.well-known/acme-challenge/
    use_backend letsencrypt if letsencrypt-acl
    acl is_pept2data path_beg /mpa/pept2data
    acl is_peptinfo path_beg /api/v2/peptinfo
    acl is_protinfo path_beg /api/v2/protinfo
    acl is_missed_cleavage req.body -m reg \"missed\":[^,]*true
    use_backend ssd_handlers if is_pept2data is_missed_cleavage
    use_backend ssd_handlers if is_peptinfo || is_protinfo
    default_backend all_handlers

backend letsencrypt
    server letsencrypt 127.0.0.1:8888

backend ssd_handlers
    # GZIP responses from the backend servers before sending them to clients.
    filter compression
    compression algo gzip
    # Always send new requests to the backend handler that is currently handling the fewest
    # connections.
    balance leastconn
    mode http
    # Check if a backend server is still healthy by periodically contacting a specific endpoint.
    option httpchk
    # The metadata endpoint does use the database and is very lightweight, making it an ideal
    # candidate for the HTTP check method. This way we check that both Apache and MySQL
    # are still functioning on the handler.
    http-check send meth GET uri /private_api/metadata.json
    # List of the different backend servers that are available for handling Unipept API requests.
    server patty patty.ugent.be:80 check maxconn 100
    server selma selma.ugent.be:80 check maxconn 100

backend all_handlers
    # GZIP responses from the backend servers before sending them to clients.
    filter compression
    compression algo gzip
    # Always send new requests to the backend handler that is currently handling the fewest
    # connections.
    balance leastconn
    mode http
    # Check if a backend server is still healthy by periodically contacting a specific endpoint.
    option httpchk
    # The metadata endpoint does use the database and is very lightweight, making it an ideal
    # candidate for the HTTP check method. This way we check that both Apache and MySQL
    # are still functioning on the handler.
    http-check send meth GET uri /private_api/metadata.json
    # List of the different backend servers that are available for handling Unipept API requests.
    server patty patty.ugent.be:80 check maxconn 100
    server selma selma.ugent.be:80 check maxconn 100
    server rick rick.ugent.be:80 check maxconn 100
    server sherlock sherlock.ugent.be:80 check maxconn 100
```
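Whenever this file changes, the new configuration can be checked and applied without dropping live connections. A quick sketch using standard HAProxy and systemd commands (assuming HAProxy is managed by systemd, as on Debian/Ubuntu):

```bash
# Validate the configuration file before applying it.
sudo haproxy -c -f /etc/haproxy/haproxy.cfg

# Gracefully reload HAProxy so existing connections are not dropped.
sudo systemctl reload haproxy
```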
To monitor how many requests are made to the Unipept API, which endpoints are the most popular and how many server resources are consumed, we have set up a system that automatically analyses and summarizes HAProxy's logs and keeps this summary in a local MySQL database that can be accessed by Grafana.
## 1. Install and configure logrotate
We want to keep HAProxy's log files from the last 30 days. To keep things structured, we install the logrotate package, which processes the log file once a day, stores it in a separate file and clears the original. Rotated files older than 30 days are removed automatically.
- So, first install logrotate by running `sudo apt install logrotate`.
- Then, create a new logrotate config file for HAProxy with `sudo nano -c /etc/logrotate.d/haproxy` and paste in the following configuration:
```
/var/log/haproxy.log {
    daily
    # Keep a backlog of the last 30 days
    rotate 30
    missingok
    notifempty
    compress
    # Delay compressing the current log file to the next day
    delaycompress
    postrotate
        [ ! -x /usr/lib/rsyslog/rsyslog-rotate ] || /usr/lib/rsyslog/rsyslog-rotate
    endscript
}
```
- Check that a logrotate timer has been created that automatically triggers at midnight. This timer lives in `/lib/systemd/system/logrotate.timer` and `/lib/systemd/system/logrotate.service`; these are normally installed on the system by default. Make sure that the timer is enabled and active, which you can verify with the commands below.
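A minimal sketch of how to verify both pieces, using standard logrotate and systemd tooling:

```bash
# Dry run: show which actions logrotate would take for this config,
# without actually rotating any files.
sudo logrotate --debug /etc/logrotate.d/haproxy

# Confirm that the logrotate timer exists, is active and is scheduled to fire.
systemctl list-timers logrotate.timer
```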
## 2. Download Unipept's monitoring script
We have developed our own script that parses and transforms the output of HAProxy's halog utility, and we are going to download it now:
- Navigate into `/usr/local/bin` and clone the repository: `git clone https://github.com/unipept/unipept-utilities.git`.
- Install NVM (Node Version Manager): `curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.1/install.sh | bash`.
- In order to start using NVM, we must first source our profile: `source ~/.bashrc`.
- Install Node 20 (or higher) and set it as the default: `nvm install 20 && nvm alias default 20`.
- Globally install yarn, which we need for the `halog-collector` script's dependencies: `npm install --global yarn`.
- Navigate into the `halog-collector` script's directory: `cd /usr/local/bin/unipept-utilities/scripts/halog-collector`.
- Install all required Node packages: `yarn install`. A quick sanity check of the result follows below.
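To confirm that the toolchain is in place before moving on, a quick check like the following can be run (plain shell; nothing Unipept-specific is assumed here):

```bash
# Node should report v20 or higher and yarn should be available globally.
node --version
yarn --version

# The collector's dependencies should now be present in node_modules.
ls /usr/local/bin/unipept-utilities/scripts/halog-collector/node_modules >/dev/null \
    && echo "halog-collector dependencies installed"
```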
## 3. Install and configure MySQL server
To keep track of the log summary that our script will produce, we need to install and properly configure a MySQL server.
- Start by following this guide from DigitalOcean on how to install and set up the MySQL server. Make sure to use the username root for the MySQL root user and run through the `mysql_secure_installation` script so that the server is only accessible from localhost and anonymous access is disabled.
- Create a new database called `load_balancer_stats` by running `mysql -uroot -p$PASSWORD < /usr/local/bin/unipept-utilities/scripts/halog-collector/schema/default_schema.sql`. Replace `$PASSWORD` with the password of your MySQL installation that you chose during the previous step.
- Create a new user for the MySQL database that can only read data. This user will be used later on by Grafana and protects our database against accidental data deletion. Therefore, open a new MySQL terminal: `mysql -uroot -p$PASSWORD` (replace `$PASSWORD` with the real deal) and execute the following SQL commands one by one:
```sql
# Replace $password with the real thing!
CREATE USER 'grafana'@'%' IDENTIFIED BY '$password';
GRANT SELECT ON load_balancer_stats.* TO 'grafana'@'%';
FLUSH PRIVILEGES;
```
- In order for this MySQL server to be accessible from the Grafana host, we need to expose the database on a specific port (4840 in this example). Open the server's configuration (`sudo nano -c /etc/mysql/mysql.conf.d/mysqld.cnf`), make the following changes, and then restart and verify as sketched below:

```
port = 4840
bind-address = xxx.xxx.xxx.xxx # the actual server's IP address
```
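After saving these changes, MySQL must be restarted before they take effect. A minimal sketch of the restart and two sanity checks, using standard systemd, iproute2 and mysql client tooling (run the last command from a host that is allowed to connect, substituting the server's actual IP address):

```bash
# Restart MySQL so the new port and bind-address take effect.
sudo systemctl restart mysql

# Verify that mysqld is now listening on port 4840.
sudo ss -tlnp | grep 4840

# Verify that the read-only grafana user can connect and read the stats database.
mysql -ugrafana -p -h xxx.xxx.xxx.xxx -P 4840 -e 'SHOW TABLES IN load_balancer_stats;'
```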
## 4. Automatically run halog-collector once a day
We are going to produce statistics for our load balancer once a day and therefore need to run the `halog-collector` script at a fixed time every day. We will set up a new systemd service and timer for this purpose.
- Create a new script that will automatically call `halog-collector` with the correct parameters: `sudo nano -c /usr/local/bin/halog-collector.sh` and paste the following contents in there (replace `PASSWORD` and `USER` with the correct credentials for your installation of MySQL):
```bash
#!/usr/bin/env bash
DB_NAME="load_balancer_stats"
DB_USER="USER"
DB_PASS="PASSWORD"
DB_PORT="4840"

# Always process the HAProxy log from yesterday (the most recently rotated file)
cat /var/log/haproxy.log.1 | halog -u -H | node /usr/local/bin/unipept-utilities/scripts/halog-collector/collect.js "$DB_USER" "$DB_PASS" "$DB_PORT" "$DB_NAME"
```
- Make the new script executable: `chmod u+x /usr/local/bin/halog-collector.sh`.
- Create a new systemd service: `sudo nano -c /lib/systemd/system/halog-collector.service` and add the following contents to this file:
```ini
[Unit]
Description=Collects and summarizes HALog-files
# This service should only be started once logrotate is finished
After=logrotate.service
RequiresMountsFor=/var/log

[Service]
Type=oneshot
ExecStart=/usr/local/bin/halog-collector.sh
```
- Now, create a systemd timer to accompany the service that we've just constructed. Create a new file (`sudo nano -c /lib/systemd/system/halog-collector.timer`) and add the following contents; a sketch of the commands to enable and test everything follows after this file:
```ini
[Unit]
Description=Daily summary of load balancing endpoints

[Timer]
OnCalendar=daily
AccuracySec=1h
Persistent=true

[Install]
WantedBy=timers.target
```
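Finally, the new units have to be registered with systemd and the timer enabled. A minimal sketch using standard systemd tooling; the one-off start lets you verify the whole pipeline before the first scheduled run:

```bash
# Make systemd aware of the new service and timer units.
sudo systemctl daemon-reload

# Enable the timer so it survives reboots and start it right away.
sudo systemctl enable --now halog-collector.timer

# Optionally trigger the service once by hand and inspect its output.
sudo systemctl start halog-collector.service
journalctl -u halog-collector.service
```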