How to Compile HAProxy From Source and Set Up a Basic Configuration

    Wed 14 August 2013

    To learn more about HAProxy, I decided to compile it from source and use it to load-balance traffic to louwrentius.com across two different web servers.

    I run HAProxy on a VPS based on Ubuntu 12.04 LTS. Let's dive right in.

    First, we need to download the source. Don't copy/paste the exact commands below; download the latest version of HAProxy instead.

    cd /usr/src
    wget "http://haproxy.1wt.eu/download/1.4/src/haproxy-1.4.24.tar.gz"
    tar xzf haproxy-1.4.24.tar.gz
    cd haproxy-1.4.24
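
    To avoid hard-coding the version, the same steps can be wrapped in a small script that takes the version as an argument. This is a sketch: `haproxy_src_url` and `fetch_haproxy` are hypothetical helper names, and it assumes the download URL keeps the layout used in the commands above and that the version has the form X.Y.Z.

```shell
#!/bin/sh
# Sketch: download and unpack a given HAProxy release.
# Assumptions: the URL layout from the commands above still applies,
# and the version string looks like X.Y.Z (e.g. 1.4.24).

haproxy_src_url() {
    version=$1
    branch=${version%.*}    # strip the last component: "1.4.24" -> "1.4"
    printf 'http://haproxy.1wt.eu/download/%s/src/haproxy-%s.tar.gz\n' "$branch" "$version"
}

fetch_haproxy() {
    version=$1
    cd /usr/src || return 1
    wget "$(haproxy_src_url "$version")" || return 1
    tar xzf "haproxy-$version.tar.gz" || return 1
    cd "haproxy-$version"
}

# Usage (downloads over the network):
#   fetch_haproxy 1.4.24
```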
    

    Before you can compile software, you must make sure you have a working build environment. On Ubuntu or Debian, run:

    apt-get install build-essential
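
    Before running make, a quick check that the required tools are on the PATH can save you from a confusing build error. A minimal sketch; `missing_tools` is a hypothetical helper, and the tool list in the usage line (gcc, make, tar, wget) is an assumption based on the commands in this article.

```shell
#!/bin/sh
# Print every command name from the argument list that is not found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage:
#   missing=$(missing_tools gcc make tar wget)
#   [ -z "$missing" ] || echo "Please install: $missing"
```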
    

    The README file in the root of the source directory contains detailed instructions on how to compile HAProxy, which is really straightforward.

    Compiling HAProxy

    Best CPU performance

    The manual states that, by default, HAProxy is compiled without CPU-specific optimisations. To enable optimisations for the CPU of the build machine, use the 'native' option. Keep in mind that a binary built this way may not run on a machine with a different CPU.

    The extra argument we are supplying to 'make' will be:

    CPU=native
    

    Libpcre support

    The manual recommends compiling HAProxy with libpcre, as it performs much better than the regex implementation in libc. On Ubuntu or Debian, you can install the libpcre development package like this:

    apt-get install libpcre3-dev
    

    The extra argument we are supplying to 'make' will be:

    USE_PCRE=1
    

    Splicing support

    A Linux-specific feature is support for the splice() system call. This system call moves data between file descriptors entirely within kernel space, without copying it through user space. Whether this feature is of any use to you depends entirely on your setup. As splicing can be disabled in HAProxy's configuration file, I recommend compiling HAProxy with splicing support anyway.

    The extra argument we are supplying to 'make' will be:

    USE_LINUX_SPLICE=1
    

    Transparent mode support

    I learned that HAProxy also supports a transparent mode in which it 'spoofs' the client IP address towards the backend servers. This way, the backend servers see the actual client IP address, not the IP address of the HAProxy load balancer(s).

    For this setup to work, you need additional firewall rules and must meet some routing requirements. I'm not convinced this is necessary, especially since the linked article also mentions a workaround that uses an additional HTTP header: X-Forwarded-For.

    I found this article about how to configure lighttpd to log the X-Forwarded-For header. Here are some instructions for Nginx.

    The extra argument we are supplying to 'make' will be:

    USE_LINUX_TPROXY=1
    

    Encrypted password support

    It's possible to limit access to HAProxy features (like the statistics page) to specific users and their passwords. These passwords can be stored in plain text or, more securely, as a password hash created with crypt.

    The extra argument we are supplying to 'make' will be:

    USE_LIBCRYPT=1
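
    To get such a hash, you can use any tool that produces crypt-style strings. The sketch below assumes OpenSSL 1.1.1 or newer (for SHA-512 crypt via `openssl passwd -6`); `make_crypt_hash` is a hypothetical helper. The resulting string is what would go into the password field of a user entry in a `userlist` section; check the configuration manual of your HAProxy version for the exact syntax.

```shell
#!/bin/sh
# Print a SHA-512 crypt hash ("$6$<salt>$...") of a password.
# The password is fed to openssl on stdin so it does not appear on
# openssl's command line. Assumes OpenSSL 1.1.1+; the salt is optional.
make_crypt_hash() {
    password=$1
    salt=${2:-}
    if [ -n "$salt" ]; then
        printf '%s\n' "$password" | openssl passwd -6 -salt "$salt" -stdin
    else
        printf '%s\n' "$password" | openssl passwd -6 -stdin
    fi
}

# Usage:
#   make_crypt_hash 'my secret'
```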
    

    Compiling HAProxy

    If we use all of the options discussed above, our make command looks like this:

    make TARGET=custom CPU=native USE_PCRE=1 USE_LIBCRYPT=1 USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1
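
    If you don't need every feature, it can help to build the argument list from only the options you want. A sketch with a hypothetical `haproxy_make_args` helper: it keeps `TARGET=custom CPU=native` from the command above and turns each name you pass into a `USE_<name>=1` flag. The README also documents other TARGET values, so check which one fits your OS and kernel.

```shell
#!/bin/sh
# Build the make arguments used above from a list of feature names,
# e.g. PCRE -> USE_PCRE=1.
haproxy_make_args() {
    args="TARGET=custom CPU=native"
    for feature in "$@"; do
        args="$args USE_$feature=1"
    done
    printf '%s\n' "$args"
}

# Usage (the unquoted expansion is intentional: each flag becomes a separate word):
#   make $(haproxy_make_args PCRE LIBCRYPT LINUX_SPLICE LINUX_TPROXY)
#   ./haproxy -vv    # should list the build options that were used
```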
    

    Installing HAProxy

    By default, the following command installs HAProxy under /usr/local, with the binary at /usr/local/sbin/haproxy:

    make install
    

    If you want to start HAProxy at boot time, you need a startup script. HAProxy does provide a startup script for Red Hat-based distros, but not for Debian-based distros.

    HAProxy is also available pre-compiled as an Ubuntu or Debian package, and these packages contain a startup script. I took such a script and modified it to work with the HAProxy version I compiled from source. Basically, I only altered some paths; you can find it here.
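
    At its core, such a startup script runs the haproxy binary with a configuration file and a PID file, and on reload starts a new process that tells the old one to finish its work and exit. A minimal sketch, assuming the configuration lives in /etc/haproxy/haproxy.cfg and the PID file in /var/run/haproxy.pid (both paths are assumptions; `haproxy_reload_args` is a hypothetical helper). The `-f`, `-p`, `-D`, `-c` and `-sf` flags are HAProxy's own command-line options.

```shell
#!/bin/sh
# Sketch of the haproxy invocations an init script typically wraps.
HAPROXY=/usr/local/sbin/haproxy     # default location after 'make install'
CONFIG=/etc/haproxy/haproxy.cfg     # assumption: adjust to your layout
PIDFILE=/var/run/haproxy.pid        # assumption

# Arguments for a graceful reload: -sf tells the PIDs read from the PID file
# to stop listening and exit once their current connections are finished.
haproxy_reload_args() {
    printf '%s\n' "-f $1 -p $2 -sf $(cat "$2")"
}

# Usage:
#   start:   $HAPROXY -f "$CONFIG" -p "$PIDFILE" -D
#   check:   $HAPROXY -c -f "$CONFIG"
#   reload:  $HAPROXY $(haproxy_reload_args "$CONFIG" "$PIDFILE")
```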

    Configuration

    HAProxy is very versatile, and the actual configuration depends entirely on your specific needs. I will document some basic scenarios with examples.

    HAProxy has many configuration options, but don't worry: they are generally well documented.

    Scenario 1: Load-balancing

    In this scenario, we have one load balancer based on HAProxy, and its goal is to load-balance traffic across two backend HTTP servers.

    global
        daemon
        user haproxy
        group haproxy
        chroot /home/haproxy
        maxconn 256
    
    defaults
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms
    
    frontend http-in
        bind *:80
        default_backend servers
    
    backend servers
        balance roundrobin  
        server ws01 1.1.1.1:80 
        server ws02 1.1.1.2:80
    

    Reading the global section, we learn that HAProxy should run as a daemon and as a dedicated system user, so it drops its root privileges after startup. It should also chroot to /home/haproxy, a directory that should be empty and not writable by the haproxy user or group. Finally, maxconn limits HAProxy to at most 256 simultaneous connections.

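    The defaults section tells us that we are running in HTTP mode. HAProxy can load-balance any TCP traffic, but in HTTP mode it understands HTTP headers and can act on them, which allows for more control. The three timeouts limit how long HAProxy waits for a connection to a backend server to be established (5 seconds) and how long the client and server sides of a connection may stay inactive (50 seconds each).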

    Now for the interesting part. The frontend 'http-in' accepts traffic on TCP port 80 on all interfaces, and the default_backend keyword directs all of it to the backend 'servers'. The 'backend' section lists the actual web servers that handle the traffic. The load-balancing algorithm used is round-robin: every web server is used in turn. Visitor 1 hits web server 1, visitor 2 hits web server 2, visitor 3 hits web server 1 again, and so on.
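
    The round-robin rotation can be illustrated with a few lines of shell. This only models the order in which servers are picked for successive visitors; it is not HAProxy's implementation, and `round_robin` is a hypothetical helper.

```shell
#!/bin/sh
# Print which server handles each of the first N visitors when servers
# are picked in turn (plain round-robin, all servers equal).
round_robin() {
    n=$1; shift
    count=$#
    [ "$count" -gt 0 ] || return 1      # need at least one server
    i=0
    while [ "$i" -lt "$n" ]; do
        idx=$(( i % count + 1 ))        # 1-based position in "$@"
        eval "server=\${$idx}"
        printf 'visitor %d -> %s\n' "$(( i + 1 ))" "$server"
        i=$(( i + 1 ))
    done
}

# Usage:
#   round_robin 3 ws01 ws02
```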

    Scenario 2: Fail-over

    In scenario 1, we only discussed load balancing. However, if one of the servers becomes unavailable, users will see error messages generated by HAProxy. That is usually undesirable: we want HAProxy to check the status of the backend servers and forward clients only to servers that are available and responsive.

    This desired behaviour requires a few extra options within the 'backend' section.

    backend servers
        balance roundrobin
        option httpchk
        server ws01 1.1.1.1:80 check inter 4000
        server ws02 1.1.1.2:80 check inter 4000
    

    This configuration makes HAProxy check the availability of both backend web servers every 4000 ms (4 seconds). By default, HAProxy only tests whether it can open a TCP connection to the web server. Of course, that does not always tell you whether the web server is working properly. This is why 'option httpchk' is added to the configuration: HAProxy then sends an HTTP OPTIONS request to each web server, which is a better gauge of whether the web service is actually working. You can also make HAProxy request a specific URI, for example with 'option httpchk GET /index.html'.
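
    What such a check boils down to can be approximated from the command line with curl: send an HTTP/1.0 OPTIONS request and look at the status code. According to HAProxy's documentation for `option httpchk`, 2xx and 3xx responses count as a healthy server and anything else as a failure. This is a manual approximation for troubleshooting, not what HAProxy runs internally; `is_healthy_status` and `check_backend` are hypothetical helpers, and the address in the usage line is the example one from the configuration.

```shell
#!/bin/sh
# Return 0 if an HTTP status code would count as "up" (2xx or 3xx).
is_healthy_status() {
    case "$1" in
        2[0-9][0-9]|3[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# Send an HTTP/1.0 OPTIONS request for / to a backend and report its state.
# curl prints 000 as the status code when it cannot connect at all.
check_backend() {
    code=$(curl --http1.0 -s -o /dev/null -m 4 -X OPTIONS -w '%{http_code}' "http://$1/")
    if is_healthy_status "$code"; then
        echo "$1 UP ($code)"
    else
        echo "$1 DOWN ($code)"
    fi
}

# Usage:
#   check_backend 1.1.1.1:80
```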

    Additional configuration options

    Logging

    HAProxy supports logging to Syslog. You can configure it to log to the local syslog daemon, or to a centralised log server.

    global
        log 127.0.0.1 local0 debug
        log-tag haproxy
    

    All log messages are tagged with 'haproxy' and sent to the syslog daemon on localhost using facility local0, with the verbosity set to 'debug'.
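
    HAProxy sends these messages to the given address over UDP (port 514 by default), so the local syslog daemon has to accept UDP traffic. On Ubuntu 12.04 that daemon is rsyslog, whose default configuration does not listen on UDP. A sketch of a drop-in snippet in rsyslog's legacy syntax that enables the UDP input on 127.0.0.1 only and writes facility local0 to its own file; the file name /etc/rsyslog.d/49-haproxy.conf and the log path are assumptions, and `rsyslog_haproxy_conf` is a hypothetical helper that only prints the snippet.

```shell
#!/bin/sh
# Print an rsyslog snippet (legacy directives) that accepts HAProxy's UDP
# messages on 127.0.0.1:514 and writes facility local0 to a separate file.
rsyslog_haproxy_conf() {
    logfile=${1:-/var/log/haproxy.log}
    printf '%s\n' \
        '$ModLoad imudp' \
        '$UDPServerAddress 127.0.0.1' \
        '$UDPServerRun 514' \
        "local0.* $logfile"
}

# Usage (as root), then restart rsyslog:
#   rsyslog_haproxy_conf > /etc/rsyslog.d/49-haproxy.conf
#   service rsyslog restart
```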

    defaults
        log global
    
    frontend http-in
        log global
        option httplog clf
    

    'option httplog clf' makes HAProxy log requests in the Common Log Format (CLF), similar to Apache's access log. A tool like AWStats can then easily parse the log and generate statistics.
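
    Because each request becomes one line with the request in double quotes followed by the status code, even a short awk script can pull numbers out of such a log. A rough sketch that counts requests per HTTP status code; it assumes CLF-style lines, possibly with a syslog prefix in front, and `status_counts` and the log path are hypothetical.

```shell
#!/bin/sh
# Count requests per HTTP status code in CLF-style log lines.
# Splitting on double quotes puts the request line in field 2 and
# " <status> <bytes> ..." in field 3, regardless of any syslog prefix.
status_counts() {
    awk -F'"' 'NF >= 3 { split($3, rest, " "); count[rest[1]]++ }
               END { for (code in count) print code, count[code] }' "$@" | sort
}

# Usage:
#   status_counts /var/log/haproxy.log
```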

    backend servers
        log global
    

    The 'backend' section will only log messages related to the availability of backend servers. Actual request logging is performed through the 'frontend' section.

    Prioritising backend servers

    Some backend servers may have more performance and bandwidth available than others. Using the 'weight' parameter, you can make sure that certain servers receive more traffic than others.

    backend servers
        balance roundrobin
        option httpchk
        server ws01 1.1.1.1:80 check inter 4000 weight 10
        server ws02 1.1.1.2:80 check inter 4000 weight 20
    

    In this example, web server ws02 will receive twice as many requests as web server ws01, but the load will still be spread across both web servers.
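
    The effect of these weights can be simulated with the "smooth weighted round-robin" scheme used by Nginx. HAProxy's own scheduler is implemented differently, so treat this only as an illustration of the 2:1 ratio and of how requests stay interleaved; `wrr_sequence` is a hypothetical helper.

```shell
#!/bin/sh
# Print the server chosen for each of the first N requests, given
# name:weight pairs, using smooth weighted round-robin: every round each
# server's counter grows by its weight, the highest counter wins, and the
# winner's counter is reduced by the sum of all weights.
wrr_sequence() {
    n=$1; shift
    printf '%s\n' "$@" | awk -F: -v n="$n" '
        { name[NR] = $1; weight[NR] = $2; cur[NR] = 0; total += $2 }
        END {
            for (r = 1; r <= n; r++) {
                best = 1
                for (i = 1; i <= NR; i++) {
                    cur[i] += weight[i]
                    if (cur[i] > cur[best]) best = i
                }
                cur[best] -= total
                print name[best]
            }
        }'
}

# Usage:
#   wrr_sequence 30 ws01:10 ws02:20 | sort | uniq -c
```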

    Enabling statistics

    HAProxy has a built-in web page that shows performance metrics and the status of backend hosts. This page is not enabled by default.

    defaults
        stats enable
        stats auth username:password
        stats uri /mystatspage
        stats refresh 5s
    

    Please note that with this configuration, the statistics page may be accessible from the internet. As the page may reveal information about your environment that could benefit attackers, it's wise to configure a strong password and a URI that is not easy to guess. Beware that the password is transmitted in clear text!
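
    The statistics page uses HTTP Basic authentication, in which the browser sends `user:password` base64-encoded, not encrypted, with every request. The sketch below builds such a header and decodes it again to show that anyone who can see the traffic can recover the password without any key; `basic_auth_header` is a hypothetical helper.

```shell
#!/bin/sh
# Build the Authorization header a browser sends for HTTP Basic auth.
basic_auth_header() {
    printf 'Authorization: Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64)"
}

# Usage:
#   basic_auth_header username password
#   # Authorization: Basic dXNlcm5hbWU6cGFzc3dvcmQ=
# Reversing it needs no secret at all:
#   printf '%s' 'dXNlcm5hbWU6cGFzc3dvcmQ=' | base64 -d    # username:password
```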

    For security reasons, I recommend making the statistics page accessible only from within your own network, and not directly from the internet in any way.

    In the next scenario, I assume that the load balancer has two network interfaces and is connected to both the internet and an internal 'backend' network that uses IP addresses in the 10.x.x.x range.

    For security reasons, I would bind the statistics web page to the 'backend' interface, so it will never be accessible through the internet.

    listen HAProxy-stats 10.0.10.10:81
        stats enable
        stats auth user:pass
        stats uri /stats
        stats refresh 5s
        stats show-legends
    

    Final words

    This basic tutorial should leave you with a working HAProxy setup. There are some topics I did not discuss, like handling SSL traffic. HAProxy 1.4 does not support SSL, but version 1.5 will have native SSL support. In the meantime, you will need to use Nginx or 'stud' for SSL offloading.
