How to Compile HAProxy From Source and Set Up a Basic Configuration

Wed 14 August 2013 Category: Networking

To learn more about HAProxy I decided to compile it from source and use it to load-balance traffic to louwrentius.com across two different web servers.

I run HAProxy on a VPS based on Ubuntu 12.04 LTS. Let's dive right in.

First, we need to download the source. Don't copy/paste the exact commands below; check for the latest version of HAProxy and download that instead.

cd /usr/src
wget "http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.24.tar.gz"
tar xzf haproxy-1.4.24.tar.gz
cd haproxy-1.4.24

Before you can compile software, you must make sure you have a working build-environment. With Ubuntu or Debian, you should run:

apt-get install build-essential

If you open the README file in the root directory of the source tree, you will find detailed instructions on how to compile HAProxy, which is really straightforward.

Compiling HAProxy

Best CPU performance

The manual states that by default, it will compile HAProxy with no CPU-specific optimisations. To enable CPU-specific optimisations, you need to use the 'native' option.

The extra argument we are supplying to 'make' will be:

CPU=native

Libpcre support

The README recommends compiling HAProxy with libpcre, as it performs much better than the regular-expression implementation in libc. You need to install libpcre like this:

apt-get install libpcre3-dev

The extra argument we are supplying to 'make' will be:

USE_PCRE=1

Splicing support

A Linux-specific feature is support for the splice() system call. This system call allows data to be moved between file descriptors within kernel space, not touching user space. It entirely depends on your setup if this feature will be of any use to you. As splicing can be disabled within the configuration file of HAProxy, I would recommend compiling HAProxy with support for splicing.

The extra argument we are supplying to 'make' will be:

USE_LINUX_SPLICE=1

Transparent mode support

I learned that HAProxy also supports a transparent mode where it seems to 'spoof' the client IP-address to the backend servers. This way, the backend servers see the actual client IP-address, not the IP-address of the HAProxy load-balancer(s).

For this setup to work, you need additional firewall rules and meet some routing requirements. I'm not sure why this would be important and the linked article also mentions a work-around where an additional HTTP-header is used: x-forwarded-for.

I found this article about how to configure lighttpd to log the x-forwarded-for header. Here are some instructions for Nginx.

The extra argument we are supplying to 'make' will be:

USE_LINUX_TPROXY=1
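
If transparent mode is more than you need, the x-forwarded-for work-around is enabled on the HAProxy side with the 'option forwardfor' directive, which is available in HTTP mode. The sketch below writes a small configuration fragment to a scratch file so it can be inspected before merging it into a real configuration; the scratch file is only there for illustration.

```shell
# Write a configuration fragment to a scratch file (illustrative only).
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
defaults
    mode http
    # Add an X-Forwarded-For header carrying the client IP-address
    # to every request that is sent to the backend servers.
    option forwardfor
EOF

# Show the result.
cat "$fragment"
```

The backend web servers still need to be told to log this header instead of the connecting IP-address, which is what the lighttpd and Nginx instructions mentioned above are about.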

Encrypted password support

It's possible to limit access to HAProxy features (like statistics) to specific users and their passwords. These passwords can be stored in plain-text or as a (more secure) hash of the password, using crypt.

The extra argument we are supplying to 'make' will be:

USE_LIBCRYPT=1

Compiling HAProxy

If we use all of the options discussed above, our make command looks like this:

make TARGET=custom CPU=native USE_PCRE=1 USE_LIBCRYPT=1 USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1
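
One way to confirm which options actually made it into the binary is to ask HAProxy itself: the -vv flag prints the version together with the build options. A minimal sketch, assuming the build has just finished and the binary sits in the current source directory; the function name show_build_options is made up for this example.

```shell
# Hypothetical helper: print the version and build options of a haproxy binary.
# Defaults to ./haproxy, where 'make' leaves the binary (an assumption about
# your current working directory).
show_build_options() {
    bin="${1:-./haproxy}"
    if [ ! -x "$bin" ]; then
        echo "no executable found at $bin (did the build succeed?)" >&2
        return 1
    fi
    # -vv prints the version plus the options the binary was compiled with.
    "$bin" -vv
}
```

Calling show_build_options from within the source directory right after 'make' should list the options that were passed on the command line.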

Installing HAProxy

By default, the following command installs the HAProxy binary in /usr/local/sbin:

make install

If you want to start HAProxy at boot time, you need a startup script. HAProxy does provide a startup script for Red Hat-based distros, but not for Debian-based distros.

HAProxy is also available pre-compiled as an Ubuntu or Debian package. These packages also contain a startup script. I used such a script and modified it to work with the HAProxy version I compiled from source. Basically, I only altered some paths, but you can find it here

Configuration

HAProxy is very versatile and the actual configuration will depend entirely on your specific needs. I will document some basic scenarios with some examples.

HAProxy has many configuration options, but don't worry, those are often well-documented.

Scenario 1: Load-balancing

In this scenario, we have one load balancer based on HAProxy and its goal is to load-balance traffic across two backend HTTP-servers.

global
    daemon
    user haproxy
    group haproxy
    chroot /home/haproxy
    maxconn 256

defaults
    mode http
    timeout connect 5000ms
    timeout client 50000ms
    timeout server 50000ms

frontend http-in
    bind *:80
    default_backend servers

backend servers
    balance roundrobin
    server ws01 1.1.1.1:80
    server ws02 1.1.1.2:80

Reading the global section, we learn that HAProxy should run as a daemon, that it should run as a specific system user and thus drop all privileges after startup. It also should chroot to /home/haproxy, a directory which should be empty and not writable by the HAProxy user or group. HAProxy will permit at most 256 simultaneous connections.
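
Before starting HAProxy with a new configuration, the file can be checked for errors with the -c flag. A minimal sketch; the binary location and the function name check_config are assumptions, so adjust them to your installation.

```shell
# Hypothetical helper: validate a configuration file without starting the proxy.
# HAPROXY_BIN defaults to /usr/local/sbin/haproxy (an assumed install location).
check_config() {
    cfg="$1"
    bin="${HAPROXY_BIN:-/usr/local/sbin/haproxy}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read configuration file: $cfg" >&2
        return 2
    fi
    # -c checks the configuration and exits; -f names the file to load.
    "$bin" -c -f "$cfg"
}
```

For example, 'check_config /etc/haproxy/haproxy.cfg' (the path is only an example) returns a non-zero status when HAProxy rejects the file.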

The defaults section tells us that we are running in HTTP mode. HAProxy can load-balance any TCP-traffic. In HTTP mode, it can understand and read HTTP header information and apply different actions, allowing for more control.

Now we encounter the interesting part. The default_backend keyword shows that all traffic entering on TCP-port 80 should be directed to the backend 'servers'. The 'backend' section contains the actual backend servers that will be able to handle traffic. The load-balancing algorithm used is round-robin: every web server is used in turn. Visitor 1 hits webserver 1. Visitor 2 hits webserver 2. Visitor 3 hits webserver 1, and so on.
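
The 'every web server is used in turn' behaviour can be illustrated with a toy model. This is only a sketch of the idea for two equally weighted servers that are both up; it is not how HAProxy implements round-robin internally, and the function name pick_server is made up.

```shell
# Toy model of round-robin across two backends:
# odd-numbered visitors reach ws01, even-numbered visitors reach ws02.
pick_server() {
    if [ $(( $1 % 2 )) -eq 1 ]; then
        echo ws01
    else
        echo ws02
    fi
}

# Show where the first four visitors end up.
for visitor in 1 2 3 4; do
    echo "visitor $visitor -> $(pick_server "$visitor")"
done
```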

Scenario 2: Fail-over

In scenario 1, we only discussed load-balancing. However, if one of the servers becomes unavailable, users will be facing error-messages generated by HAProxy. This is often undesirable: we want HAProxy to check the status of the backend servers and direct traffic only to servers that are available. HAProxy should not forward clients to backend servers that are not responsive.

This desired behaviour requires a few extra options within the 'backend' section.

backend servers
    balance roundrobin
    option httpchk
    server ws01 1.1.1.1:80 check inter 4000
    server ws02 1.1.1.2:80 check inter 4000

This configuration makes HAProxy check both backend webservers every 4000ms (4 seconds) for availability. By default, HAProxy only tests whether it can make a TCP-connection to the webserver. Of course, this will not always tell you if a webserver is properly operational. This is why 'option httpchk' is added to the configuration. HAProxy will then connect to the backend webserver and issue an HTTP OPTIONS-request, which is a better gauge of whether the web server service is active. With additional options you can make HAProxy request specific URIs.
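
As a sketch of such additional options: 'option httpchk' accepts a method and a URI, and the 'rise' and 'fall' server parameters control how many consecutive checks it takes to change a server's state. The URI /index.html is a placeholder, and the fragment is written to a scratch file purely for illustration.

```shell
# Write an example backend section to a scratch file (illustrative only).
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
backend servers
    balance roundrobin
    # Request a specific URI instead of sending the default OPTIONS request.
    # /index.html is a placeholder; use a page your servers really serve.
    option httpchk HEAD /index.html
    # fall 3: mark a server as down after 3 failed checks in a row
    # rise 2: mark it as up again after 2 successful checks in a row
    server ws01 1.1.1.1:80 check inter 4000 rise 2 fall 3
    server ws02 1.1.1.2:80 check inter 4000 rise 2 fall 3
EOF

# Show the result.
cat "$fragment"
```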

Additional configuration options

Logging

HAProxy supports logging to Syslog. You can configure it to log to the local syslog daemon, or to a centralised log server.

global
    log 127.0.0.1 local0 debug
    log-tag haproxy

All log messages are prefixed with 'haproxy'. They are sent to localhost and the verbosity is 'debug'.

defaults
    log global

frontend http-in
    log global
    option httplog clf

Option 'httplog clf' makes HAProxy log in a format similar to the one Apache uses. A tool like AWStats can then easily parse the log and generate some statistics.

backend servers
    log global

The 'backend' section will only log messages related to the availability of backend servers. Actual request-logging is performed through the 'frontend' section.
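
With 'log 127.0.0.1', HAProxy sends its messages to the local syslog daemon over UDP, so that daemon has to listen for them. On Ubuntu 12.04 the daemon is rsyslog, which usually does not accept UDP messages out of the box. A sketch of a matching rsyslog fragment, assuming the legacy directive syntax and a log file name (/var/log/haproxy.log) chosen for this example:

```shell
# Write an example rsyslog fragment to a scratch file (illustrative only).
fragment="$(mktemp)"
cat > "$fragment" <<'EOF'
# Accept syslog messages over UDP, on the loopback address only.
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514

# Write everything that arrives on facility local0 into its own file.
local0.* /var/log/haproxy.log
EOF

# Show the result.
cat "$fragment"
```

Such a fragment would typically be placed in a file under /etc/rsyslog.d/, followed by a restart of rsyslog.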

Prioritising backend servers

Some backend servers may have more performance and bandwidth available than others. Using the 'weight' parameter, you can make sure that certain servers get more traffic than others.

backend servers
    balance roundrobin
    option httpchk
    server ws01 1.1.1.1:80 check inter 4000 weight 10
    server ws02 1.1.1.2:80 check inter 4000 weight 20

In this example, webserver ws02 will receive twice as many requests as webserver ws01. But the load will still be balanced across both webservers.
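
The expected share of each server is its weight divided by the sum of all weights. A small sketch of that arithmetic, using the weights from the example; the function name share_percent is made up.

```shell
# Expected share of requests for one server: weight / sum of all weights.
share_percent() {
    weight="$1"
    total="$2"
    # Integer percentage, rounded down.
    echo $(( weight * 100 / total ))
}

total=$(( 10 + 20 ))
echo "ws01: $(share_percent 10 "$total")%"   # prints "ws01: 33%"
echo "ws02: $(share_percent 20 "$total")%"   # prints "ws02: 66%"
```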

Enabling statistics

HAProxy has a built-in webpage that shows performance metrics and the status of backend hosts. This webpage is not enabled by default.

defaults
    stats enable
    stats auth username:password
    stats uri /mystatspage
    stats refresh 5s

Please note that with this configuration, the statistics page may be accessible from the internet. As the page may reveal information about your environment that could benefit attackers, it's wise to configure a strong password and a URI that is not easy to guess. Beware that the password is transmitted in clear-text!

For security reasons, I would recommend making the statistics page accessible only from within your own network and not directly from the internet in any way.

In this next scenario I assume that the load balancer has two network interfaces and is connected to both the internet and an internal 'backend' network that uses IP-addresses in the 10.x.x.x range.

For security reasons, I would bind the statistics web page to the 'backend' interface, so it will never be accessible through the internet.

listen HAProxy-stats 10.0.10.10:81
    stats enable
    stats auth user:pass
    stats uri /stats
    stats refresh 5s
    stats show-legends

Final words

This basic tutorial should leave you with an up-and-running HAProxy. There are some topics I did not discuss, like handling of SSL-traffic. HAProxy 1.4 does not support SSL but version 1.5 will have native SSL-support. In the meantime, you will need to use Nginx or 'stud' for SSL-offloading.
