1. How to Setup a Local or Private Ubuntu Mirror

    Wed 18 January 2023


    In this article I provide instructions on how to set up a local Ubuntu mirror using debmirror. To set expectations: the mirror will work as intended and distribute packages and updates, but a do-release-upgrade from one major version of Ubuntu to the next won't work.


    By default, Ubuntu systems get their updates straight from the internet at archive.ubuntu.com. In an environment with lots of Ubuntu systems (servers and/or desktops) this can cause a lot of internet traffic as each system needs to download the same updates.

    In an environment like this, it is more efficient if one system downloads all Ubuntu updates just once and distributes them to the clients. Updates are then distributed over the local network, removing any strain on the internet link [1].


    We call such a system a local mirror and it's just a web server with sufficient storage to hold the Ubuntu archive (or part of it). A local mirror is especially relevant for sites with limited internet bandwidth, but there are some extra benefits.

    To sum up the main benefits:

    1. Reduced internet bandwidth usage
    2. Faster update process using the local network (often faster than the internet link)
    3. Update or install systems even during an internet or upstream outage

    The main drawbacks of a local mirror are:

    1. An extra service to maintain and monitor
    2. Storage requirement: starts at 1TB
    3. Initial sync can take a long time depending on internet speed

    Mirror solutions

    Ubuntu mirror script

    This solution is geared towards ISPs or companies who like to run their own regional mirror. It is meant to mirror the entire, unfiltered Ubuntu package archive.

    As of 2023 you should expect 2.5 TB for archive.ubuntu.com and also around 2.5 TB for ports.ubuntu.com (ARM/RISC-V and others).

    This is a lot of storage and likely not what most environments need. Even so, if this is what you want to run you can consult this web page and use the script mentioned here.


    Based on my own research, the tool debmirror seems to be the simplest and most straightforward way to create a local Ubuntu mirror with a reasonable data footprint of about 480 GB (2023) for both Jammy AMD64 (22.04) and Focal AMD64 (20.04).

    Based on your needs, you can further fine-tune debmirror to only download the packages that you need for your environment.


    The tool apt-cacher-ng acts as a caching proxy and only stores updates that are requested by clients. Missing or new updates are only downloaded once the first client requests them, although there seems to be an option to pre-download updates.

    Although I expect a significantly smaller footprint than debmirror, I could not find any information about actual real-life disk usage.

    Creating an Ubuntu mirror with debmirror

    Although apt-cacher-ng is quite a capable solution with many additional features, I feel that a simple mirror solution like debmirror is extremely simple to set up and maintain. This article will thus focus on debmirror.


    1 - Computer

    First of all we need a computer - which can be either physical or virtual - that can act as the local mirror. I've used a Raspberry Pi 4B+ as a mirror with an external USB hard drive and it can saturate a local 1 Gbit network with ease.

    2 - 1TB storage capacity (minimum)

    I'm mirroring Ubuntu 22.04 and 20.04 for the AMD64 architecture and that uses around 480 GB (2023). For ARM64, you should expect a similar storage footprint. There should be some space available for future growth, which is why I recommend having at least 1 TB of space available.

    Aside from capacity, you should also think about the importance of redundancy: what if the mirror storage device dies and you have to redownload all data? Would this impact be worth the investment in redundancy / RAID?

    It might even be interesting to use a filesystem (layer) like ZFS or LVM that supports snapshots, so you can quickly restore the mirror to a known good state if there has been an issue with a recent sync.

    3 - Select a local public Ubuntu archive

    It's best to sync your local mirror with a public Ubuntu archive close to your physical location. This provides the best internet performance and you also reduce the strain on the global archive. Use the linked mirror list to pick the best mirror for your location.

    In my case, I used nl.archive.ubuntu.com as I'm based in The Netherlands.

    Ubuntu Mirror configuration

    01 - Add the storage device / volume to the fstab

    If you haven't done so already, make sure you create a directory as a mountpoint for the storage we will use for the mirror.

    In my case I've created the /mirror directory...

    mkdir /mirror

    ... and updated the fstab like this (example!):

    /dev/disk/by-uuid/154d28fb-83d0-4848-ac1d-da1420252422 /mirror xfs noatime 0 0

    I recommend using the by-uuid or by-id path for mounting the storage device as it's the most stable, and don't forget to use the correct filesystem (xfs/ext4).

    Now we can issue:

    mount /mirror
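
    As a small sketch of the step above: the UUID of a filesystem can be read with blkid and turned into an fstab line. The helper name fstab_entry is made up for illustration and /dev/sdX1 is a placeholder for your actual partition; the mount options are simply the ones from my example.

```shell
# Sketch: build an fstab line that mounts a filesystem by UUID.
# fstab_entry is a hypothetical helper, not part of any package.
fstab_entry() {
    uuid=$1; mountpoint=$2; fstype=$3
    printf '/dev/disk/by-uuid/%s %s %s noatime 0 0\n' "$uuid" "$mountpoint" "$fstype"
}

# Example with the UUID used in this article:
fstab_entry 154d28fb-83d0-4848-ac1d-da1420252422 /mirror xfs

# On a real system the UUID could be looked up first (placeholder device):
#   uuid=$(blkid -s UUID -o value /dev/sdX1)
#   fstab_entry "$uuid" /mirror xfs >> /etc/fstab
```

    Printing the line first and appending it to /etc/fstab only after a visual check keeps a typo from breaking the next boot.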

    02 - Install required software

    We need a webserver installed on the mirror to serve the deb packages to the clients. Installation is straightforward and no further configuration is required. In this example I'm using Apache2 but you can use any webserver you're comfortable with.

    If you want to synchronise with the upstream mirror using regular HTTP you don't need additional software.

    apt-get update
    apt install apache2 debmirror gnupg xz-utils

    I think that using rsync for synchronisation is more efficient and faster, but you have to configure your firewall to allow outbound traffic to TCP port 873 (which is outside the scope of this tutorial).

    apt install rsync

    Tip: make sure you run debmirror on a 20.04 or 22.04 system as older versions don't support current Ubuntu mirrors and some required files won't be downloaded.

    03 - Creating file paths

    I've created this directory structure to host my local mirror repos.

    ├── debmirror
    │   ├── amd64
    │   │   ├── dists
    │   │   ├── pool
    │   │   └── project
    │   └── mirrorkeyring
    └── scripts

    mkdir /mirror/debmirror
    mkdir /mirror/debmirror/amd64
    mkdir /mirror/debmirror/mirrorkeyring
    mkdir /mirror/scripts

    The folders within the amd64 directory will be created by debmirror so they don't have to be created in advance.
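
    The four mkdir calls above can also be collapsed into one mkdir -p call. This is only a sketch: make_mirror_dirs is a made-up helper name, and the demonstration runs against a temporary directory rather than /mirror.

```shell
# Sketch: create the mirror directory layout in one go.
# make_mirror_dirs is a hypothetical helper; pass /mirror as the root
# to reproduce the layout from this article.
make_mirror_dirs() {
    root=$1
    mkdir -p "$root/debmirror/amd64" \
             "$root/debmirror/mirrorkeyring" \
             "$root/scripts"
}

# Demonstrate against a throw-away directory instead of /mirror.
demo_root=$(mktemp -d)
make_mirror_dirs "$demo_root"
ls "$demo_root/debmirror"
rm -rf "$demo_root"
```

    Because mkdir -p does not complain about existing directories, the helper can be run again without side effects.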

    04 - Install GPG keyring

    gpg --no-default-keyring --keyring /mirror/debmirror/mirrorkeyring/trustedkeys.gpg --import /usr/share/keyrings/ubuntu-archive-keyring.gpg

    05 - Create symlinks

    We need to create symlinks in the apache2 /var/www/html directory that point to our mirror like this:

    cd /var/www/html
    ln -s /mirror/debmirror/amd64 ubuntu
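
    To confirm that the symlink points where you expect, something like the following sketch could be used. The helper link_mirror is an invented name, and the demonstration uses a temporary directory so it doesn't touch /var/www/html.

```shell
# Sketch: create the web root symlink and print where it points.
# link_mirror is a hypothetical helper: link_mirror <target> <link>
link_mirror() {
    target=$1; link=$2
    ln -sfn "$target" "$link"
    readlink "$link"
}

# Demonstrate with a throw-away tree that mimics the real paths.
demo=$(mktemp -d)
mkdir -p "$demo/mirror/debmirror/amd64" "$demo/www"
link_mirror "$demo/mirror/debmirror/amd64" "$demo/www/ubuntu"
rm -rf "$demo"

# On the mirror itself the call would be:
#   link_mirror /mirror/debmirror/amd64 /var/www/html/ubuntu
```

    Once the first sync has finished, requesting a file such as /ubuntu/dists/jammy/Release from the web server should be a reasonable end-to-end check.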

    06 - Configure debmirror

    Debmirror is just a command-line tool that takes a lot of arguments. If we want to run this tool daily to keep our local mirror in sync, it's best to use a wrapper script that can be called by cron.

    Such a wrapper script is provided by this page and I have included my own customised version here.

    You can download this script and place it in /mirror/scripts like this:

    cd /mirror/scripts
    wget https://louwrentius.com/files/debmirroramd64.sh.txt -O debmirroramd64.sh 
    chmod +x debmirroramd64.sh

    Now we need to edit this script and change some parameters to your specific requirements. The changes I've made as compared to the example are:

    export GNUPGHOME=/mirror/debmirror/mirrorkeyring

    The Ubuntu installer ISOs for 20.04 and 22.04 seem to require the -backports releases too, so those are included.
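
    To illustrate what including the -backports releases means in practice: debmirror's --dist option takes a comma-separated list of suites. The sketch below builds such a list from plain release names. The helper expand_suites is my own invention and is not part of the wrapper script.

```shell
# Sketch: expand release names into the comma-separated suite list
# that debmirror's --dist option accepts.
# expand_suites is a hypothetical helper for illustration.
expand_suites() {
    list=""
    for rel in "$@"; do
        for suite in "$rel" "$rel-security" "$rel-updates" "$rel-backports"; do
            list="${list:+$list,}$suite"
        done
    done
    printf '%s\n' "$list"
}

expand_suites focal jammy
```

    Dropping a release from the arguments is then the easiest way to shrink the mirror.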

    Limitations

    I've not been able (yet) to make the do-release-upgrade process work to upgrade a system from focal to jammy. I found this old resource but those instructions don't seem to work for me.

    07 - Limiting bandwidth

    The script by default doesn't provide a way to limit rsync bandwidth usage. In my script, I've added some lines to make bandwidth limiting work as an option.

    A new variable is added that must be uncommented and can be set to the desired limit. In this case 1000 means 1000 Kilobytes per second.


    You also need to uncomment this line:

    --rsync-options "-aIL --partial --bwlimit=$bwlimit" \
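
    As a sketch of how the optional limit could be wired up: the function below returns the rsync options shown above and only adds --bwlimit when a value is set. The function name rsync_options is an assumption, not something from the wrapper script. Note that rsync treats a bare number as units of 1024 bytes per second.

```shell
# Sketch: build the value for debmirror's --rsync-options, adding a
# bandwidth cap only when a limit is given.
# rsync_options is a hypothetical helper: rsync_options <bwlimit-or-empty>
rsync_options() {
    bwlimit=$1
    opts="-aIL --partial"
    if [ -n "$bwlimit" ]; then
        opts="$opts --bwlimit=$bwlimit"
    fi
    printf '%s\n' "$opts"
}

rsync_options 1000   # with a cap
rsync_options ""     # without a cap
```

    With this approach the limit can be switched on and off by changing one variable instead of commenting lines in and out.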

    08 - Initial sync

    I advise running the initial sync by hand before configuring the periodic cron job. The first sync can take a long time and may interfere with the cron job, so only enable the cron job once the initial sync has completed.

    As the initial sync can take a while, I like to run this job with screen. If you accidentally close the terminal, the rsync process isn't interrupted (although it is not a big deal if that happens, it just continues where it left off).

    apt install screen
    screen /mirror/scripts/debmirroramd64.sh

    09 - Setup cron job

    When the initial sync is completed we can configure the cron job to sync periodically.

    0 1 * * * /mirror/scripts/debmirroramd64.sh

    In this case the sync runs daily at 1 AM.

    The mirror includes all security updates so depending on your environment, it's recommended to synchronise the mirror at least daily.
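
    Since a long-running sync and the cron job can get in each other's way, a simple lock could guard against overlapping runs. The following is a minimal sketch and not part of my script: run_exclusive and the lock path are assumed names, and a directory is used as the lock because mkdir either creates it or fails.

```shell
# Sketch: skip a sync run when another one is still in progress.
# run_exclusive is a hypothetical helper: run_exclusive <lockdir> <command...>
run_exclusive() {
    lockdir=$1; shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "another sync is still running, skipping" >&2
        return 1
    fi
    "$@"
    status=$?
    rmdir "$lockdir"
    return $status
}

# In a cron job this could wrap the sync script, for example:
#   run_exclusive /tmp/debmirror.lock /mirror/scripts/debmirroramd64.sh
```

    One caveat of this design: if the machine loses power mid-sync, the lock directory stays behind and has to be removed by hand. The flock utility from util-linux is an alternative that avoids this.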

    10 - Client configuration

    All clients should point to your local mirror in their /etc/apt/sources.list file. You can use the IP address of your mirror, but if you run a local DNS, it's not much effort to set up a DNS record like mirror.your.domain and have all clients reconfigured to connect to the domain name.

    This is the /etc/apt/sources.list for the client

    deb http://mirror.your.domain/ubuntu RELEASE main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-security main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-updates main restricted universe multiverse

    The RELEASE value should be changed to the appropriate Ubuntu release, like bionic, focal or jammy.

    If you have an environment with a lot of Ubuntu systems, this configuration is likely provisioned with tools like ansible.
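
    For a handful of machines without configuration management, the three lines can also be generated with a few lines of shell. This is only a sketch: client_sources is an invented helper and mirror.your.domain remains a placeholder.

```shell
# Sketch: print a sources.list for a given mirror host and release.
# client_sources is a hypothetical helper: client_sources <host> <release>
client_sources() {
    mirror=$1; release=$2
    for suite in "$release" "$release-security" "$release-updates"; do
        printf 'deb http://%s/ubuntu %s main restricted universe multiverse\n' \
            "$mirror" "$suite"
    done
}

client_sources mirror.your.domain jammy
```

    Redirecting the output to /etc/apt/sources.list (after keeping a backup of the original) and running apt update would complete the switch.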

    11 - Monitoring

    Although system monitoring is out-of-scope for this blog post, there are two topics to monitor:

    1. disk space usage (alert if space is running out)
    2. successful synchronisation script execution (alert if script fails)

    If you don't monitor the synchronisation process, the mirror will become outdated and will lack the latest security updates.
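
    Both checks can be reduced to small shell functions that a monitoring system calls. The sketch below rests on assumptions: the function names are made up, and the freshness check relies on the sync script touching a stamp file after a successful run, which my script does not do out of the box.

```shell
# Sketch: two checks a monitoring system could call.
# Both function names and the stamp-file idea are assumptions.

# Print the used percentage (0-100) of the filesystem holding a path.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed only if the stamp file exists and was modified within the
# last max_minutes minutes.
sync_is_fresh() {
    stamp=$1; max_minutes=$2
    [ -f "$stamp" ] && [ -z "$(find "$stamp" -mmin +"$max_minutes")" ]
}

# Example use on the mirror (hypothetical stamp file path):
#   [ "$(disk_used_percent /mirror)" -lt 90 ] || echo "mirror disk almost full"
#   sync_is_fresh /mirror/last-sync-ok 1500 || echo "mirror sync is stale"
```

    A threshold of 1500 minutes gives a daily job about an hour of slack before the alert fires.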

    Closing words

    As many environments are either cloud-native or moving towards a cloud environment, running a local mirror seems less and less relevant. Yet there may still be environments that could benefit from a local mirror setup. Maybe this instruction is helpful.

    1. You may notice that cloud providers also run their own Ubuntu archive mirror to reduce the load on their upstream and peering links. When you deploy a standard virtual machine based on Ubuntu, it is by default configured to use the local mirror.

    Tagged as : Linux
  2. Understanding the Ubuntu 20.04 LTS Server Autoinstaller

    Thu 11 February 2021


    Ubuntu Server version 18.04 LTS uses the debian-installer (d-i) for the installation process. This includes support for 'preseeding' to create unattended (automated) installations of Ubuntu.


    the debian installer

    With the introduction of Ubuntu Server 20.04 'Focal Fossa' LTS, back in April 2020, Canonical decided that the new 'subiquity server installer' was ready to take its place.

    After the new installer gained support for unattended installation, it was considered ready for release. The unattended installer feature is called 'Autoinstallation'.

    I mostly run Ubuntu 18.04 LTS installations but I decided in February 2021 that I should get more acquainted with 20.04 LTS, especially when I discovered that preseeding would no longer work.

    In this article I assume that the reader is familiar with PXE-based unattended installations.

    20.04 LTS and 22.04 LTS Can't install on USB drive (Update Aug 2022)

    I run a few HP Microservers that boot from a SATA SSD in a USB drive enclosure using the internal USB header. Automated installs for Ubuntu 18.04 LTS have no problem installing and booting from a USB device.

    Unfortunately, both Ubuntu 20.04 and 22.04 LTS install fine, but no matter what I do, they won't boot from USB. I've tried different enclosures and most of them won't work. Only one powered USB dock (that is way too big to fit inside the Microserver) does work and lets 22.04 boot from USB.

    Again: Ubuntu 18.04 and the latest Debian work fine, so this seems to be an issue specific to the new Autoinstall mechanism.

    Note: I didn't try this on other hardware (which I don't have) so it might be an issue specific to the HP Microserver Gen8.

    Why this new installer?

    Canonical's desire to unify the codebase for Ubuntu Desktop and Server installations seems to be the main driver for this change.

    From my personal perspective, there aren't any new features that benefit my use-cases, but that could be different for others. It's not a ding on the new Autoinstaller, it's just how I look at it.

    There is one conceptual difference between the new installer and preseeding. A preseed file must answer all questions that the installer needs answered. It will switch to interactive mode if a question is not answered, breaking the unattended installation process. From my experience, there is a bit of trial-and-error getting the preseed configuration right.

    The new Subiquity installer uses defaults for all installation steps. This means that you can fully automate the installation process with just a few lines of YAML. You don't need an answer for each step.

    The new installer has other features, such as the ability to SSH into an installer session. It works by generating an ad-hoc random password on the screen / console which you can use to log on remotely over SSH. I have not used it yet as I never found it necessary.

    Documentation is a bit fragmented

    As I was trying to learn more about the new Autoinstaller, I noticed that there isn't a central location with all relevant (links to) documentation. It took a bit of searching and following links to collect a set of useful information sources, which I share below.

    Link                      Description
    Reference Manual          Reference for each particular option of the user-data YAML
    Introduction              An overview of the new installer with examples
    Autoinstall quick start   Example of booting a VM using the installer using KVM
    Netbooting the installer  A brief instruction on how to set up PXE + TFTP with dnsmasq in order to PXE boot the new installer
    Call for testing          Topic in which people give feedback on their experience with the installer (still active as of February 2021) with a lot of responses
    Stack Exchange post       Detailed tutorial with some experiences
    Medium article            Contains an example and some experiences
    GitHub examples           GitHub repo with about twenty examples of more complex configurations

    The reference documentation only covers some default use-cases for the unattended installation process. You won't be able to build more complex configurations such as RAID-based installations using this reference.

    Under the hood, the installer uses curtin. The linked documentation can help you further build complex installations, such as those which use RAID.

    I think the curtin syntax is a bit tedious and fortunately it is probably not required to learn curtin and piece together more complex configurations by hand. There is a nice quality-of-life feature that takes care of this.

    More on that later.

    How does the new installer work?

    With a regular PXE-based installation, we use the 'netboot' installer, which consists of a Linux kernel and an initrd image (containing the actual installer).

    This package is about 64 megabytes for Ubuntu 18.04 LTS and it is all you need, assuming that you have already set up a DHCP + TFTP + HTTP environment for a PXE-based installation.

    The new Subiquity installer for Ubuntu 20.04 LTS deprecates this 'netboot' installer. It is not provided anymore. Instead, you have to boot a 'live installer' ISO file which is about 1.1 GB in size.

    The process looks like this:

    1. Download the Live installer ISO
    2. Mount the iso to acquire the 'vmlinuz' and 'initrd' files for the TFTP root
    3. Update your PXE menu (if any) with a stanza like this:

    LABEL focal2004-preseed
        MENU LABEL Focal 20.04 LTS x64 Manual Install
        KERNEL linux/ubuntu/focal/vmlinuz
        INITRD linux/ubuntu/focal/initrd
        APPEND root=/dev/ram0 ramdisk_size=1500000 ip=dhcp url=

    This process is documented here with detailed how-to steps and commands.
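
    Step 2 of the list above can be sketched as follows. On the 20.04 live-server ISO the kernel and initrd live in the casper/ directory. The helper name copy_boot_files, the ISO file name and the TFTP root /srv/tftp are assumptions for illustration, so adjust them to your environment.

```shell
# Sketch: copy the kernel and initrd from a mounted live-server ISO
# into the TFTP tree.
# copy_boot_files is a hypothetical helper: copy_boot_files <iso_mount> <tftp_dir>
copy_boot_files() {
    iso_mount=$1; tftp_dir=$2
    mkdir -p "$tftp_dir" &&
    cp "$iso_mount/casper/vmlinuz" "$iso_mount/casper/initrd" "$tftp_dir/"
}

# Typical use as root (paths are assumptions matching the stanza above):
#   mount -o loop,ro ubuntu-20.04-live-server-amd64.iso /mnt
#   copy_boot_files /mnt /srv/tftp/linux/ubuntu/focal
#   umount /mnt
```

    The kernel and initrd must come from the same ISO that the url= parameter points to, so it makes sense to repeat this step whenever you update the ISO.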

    We have not yet discussed the actual automation part, but first we must address a caveat.

    The new installer requires 3GB of RAM when PXE-booting

    UPDATE May 2022:

    The 20.04.4 update won't work on a machine with 4096 MB of memory when PXE booting. My testing shows that 20.04.4 requires at least 4300 MB of memory.

    So if we talk about a physical machine, based on regular memory DIMM sizes, it's likely that the machine must have 6 to 8 GB of memory. Just to install Ubuntu Linux over PXE. I think that's not right.

    Although it is not explicitly documented [1], the new mechanism of PXE-booting the new Ubuntu installer using the Live ISO requires a minimum of 3072 MB of memory. And I assure you, 3000 MB is not enough.

    It seems that the ISO file is copied into memory over the network and extracted on the RAM disk. With a RAM disk of 1500 MB and an ISO file of 1100 MB we are left with maybe 472 MB of RAM for the running kernel and initrd image.
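
    The arithmetic behind that estimate, written out as a tiny sketch. The three numbers are the approximate figures quoted above, and the model (total minus RAM disk minus ISO) is my own reading of what happens, not something taken from the installer documentation.

```shell
# Sketch: memory left for the kernel and installer once the RAM disk
# and the ISO copy are accounted for. All values are in MB.
total_mb=3072
ramdisk_mb=1500
iso_mb=1100
left_mb=$((total_mb - ramdisk_mb - iso_mb))
echo "left over: ${left_mb} MB"
```

    Plugging in a larger ISO size shows quickly why newer point releases need more memory than the original 20.04 image.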

    To put this into perspective: I could perform an unattended install of Ubuntu 18.04 LTS with only 512 MB of RAM [2].

    Due to this 'new' installation process, Ubuntu Server 20.04 has much more demanding minimum system requirements than Windows 2019 Server, which is fine with 'only' 512 MB of RAM, even during installation. I have to admit I find this observation a bit funny.

    It seems that this 3 GB memory requirement for the installation process is purely and solely because of the new installation process. Obviously Ubuntu 20.04 can run in a smaller memory footprint once installed.

    Under the hood, a tool called 'casper' is used to bootstrap the installation process and that tool only supports a local file system (in this case on a RAM disk). On paper, casper does support installations using NFS or CIFS but that is not supported nor tested. From what I read, some people tried to use it but it didn't work out.

    As I understand it, the current status is that you can't install Ubuntu Server 20.04 LTS on any hardware with less than 3 GB of memory using PXE boot. This probably affects older and less potent hardware, but I can remember a time when this was actually part of the point of running Linux.

    It just feels wrong conceptually, that a PXE-based server installer requires 3GB of memory.


    The 3GB memory requirement is only valid for PXE-based installations. If you boot from ISO / USB stick you can install Ubuntu Server on a system with less memory. I've verified this with a system with only 1 GB of memory.

    The Autoinstall configuration

    Now let's go back to the actual automation part of Autoinstaller. If we want to automate our installation, the APPEND line of our PXE menu item should be expanded like this:

    APPEND root=/dev/ram0 ramdisk_size=1500000 ip=dhcp url= autoinstall ds=nocloud-net;s=

    In this case, the cloud-init folder - as exposed through an HTTP-server - must contain two files:

    • meta-data
    • user-data

    The meta-data file contains just one line:

    instance-id: focal-autoinstall

    The user-data file is the equivalent of a preseed file, but it is YAML-based instead of just regular plain text.
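
    Preparing that folder can be sketched in a few lines. The helper name make_seed_dir is made up, and the directory you pass in has to be the one your HTTP server exposes.

```shell
# Sketch: create the two files the nocloud-net datasource fetches.
# make_seed_dir is a hypothetical helper: make_seed_dir <directory>
make_seed_dir() {
    dir=$1
    mkdir -p "$dir" || return 1
    printf 'instance-id: focal-autoinstall\n' > "$dir/meta-data"
    # user-data must exist; fill it with your autoinstall YAML.
    [ -f "$dir/user-data" ] || : > "$dir/user-data"
}

# Example (hypothetical web root path):
#   make_seed_dir /var/www/html/cloud-init
```

    An existing user-data file is left untouched, so the helper is safe to run again after you have written your configuration.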

    Minimum working configuration

    According to the documentation, this is a minimum viable configuration for the user-data file:

    #cloud-config
    autoinstall:
      version: 1
      identity:
        hostname: ubuntu-server
        password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
        username: ubuntu

    This works fine and performs a basic install + apt upgrade of the new system with a disk configuration based on an LVM layout.

    My preferred minimum configuration:

    Personally, I like to keep the unattended installation as simple as possible. I use Ansible to do the actual system configuration, so the unattended installation process only has to setup a minimum viable configuration.

    I like to configure the following parameters:

    • Inject a SSH public key into the authorized_keys file for Ansible
    • Configure the Apt settings to specify which repository to use during installation (I run a local debian/ubuntu mirror)
    • Update to the latest packages during installation

    I'll give examples of the required YAML to achieve this with the new installer.

    Injecting a public SSH key for the default user:

      ssh:
        authorized-keys: |
           ssh-rsa <PUB KEY>
        install-server: true
        allow-pw: no

    Notice that we also disable password-authentication for SSH access.

    Configure APT during installation

    I used this configuration to specify a particular mirror for both the installation process and for the system itself post-installation.

    apt:
      mirror: "http://mirror.mynetwork.loc"
      preserve_sources_list: false
      primary:
        - arches: [amd64]
          uri: "http://mirror.mynetwork.loc/ubuntu"

    Performing an apt upgrade

    By default, the installer seems to install security updates but it doesn't install the latest version of all software. This is a deviation from the d-i installer, which always ends up with a fully up-to-date system when done.

    The same end-result can be accomplished by running an apt update and apt upgrade at the end of the installation process.

    late-commands:
    - curtin in-target --target=/target -- apt update
    - curtin in-target --target=/target -- apt upgrade -y

    So all in all, this is not a big deal.

    Network configuration

    The network section can be configured using regular Netplan syntax. An example:

      version: 2
      renderer: networkd
          dhcp4: no
              - mynetwork.loc

    Storage configuration

    The installer only supports a 'direct' or 'lvm' layout. It also selects the largest drive in the system as the boot drive, for installation.

    If you want to set up anything more complex, such as RAID or a specific partition layout, you need to use the curtin syntax.

    It is not immediately clear how to setup a RAID configuration based on the available documentation.

    Fortunately, the new installer supports creating a RAID configuration or custom partition layout if you perform a manual install.

    It turns out that when the manual installation is done, you can find the cloud-init user-data YAML for this particular configuration in the following file:

    /var/log/installer/autoinstall-user-data

    I think this is extremely convenient.

    So to build a proper RAID1-based installation, I followed these instructions.

    So what does the YAML for this RAID setup look like?

    This is the storage section of my user-data file (brace yourself):

      storage:
        config:
        - {ptable: gpt, serial: VBOX_HARDDISK_VB50546281-4e4a6c24, path: /dev/sda, preserve: false,
          name: '', grub_device: true, type: disk, id: disk-sda}
        - {ptable: gpt, serial: VBOX_HARDDISK_VB84e5a275-89a2a956, path: /dev/sdb, preserve: false,
          name: '', grub_device: true, type: disk, id: disk-sdb}
        - {device: disk-sda, size: 1048576, flag: bios_grub, number: 1, preserve: false,
          grub_device: false, type: partition, id: partition-0}
        - {device: disk-sdb, size: 1048576, flag: bios_grub, number: 1, preserve: false,
          grub_device: false, type: partition, id: partition-1}
        - {device: disk-sda, size: 524288000, wipe: superblock, flag: '', number: 2, preserve: false,
          grub_device: false, type: partition, id: partition-2}
        - {device: disk-sdb, size: 524288000, wipe: superblock, flag: '', number: 2, preserve: false,
          grub_device: false, type: partition, id: partition-3}
        - {device: disk-sda, size: 1073741824, wipe: superblock, flag: '', number: 3,
          preserve: false, grub_device: false, type: partition, id: partition-4}
        - {device: disk-sdb, size: 1073741824, wipe: superblock, flag: '', number: 3,
          preserve: false, grub_device: false, type: partition, id: partition-5}
        - {device: disk-sda, size: 9136242688, wipe: superblock, flag: '', number: 4,
          preserve: false, grub_device: false, type: partition, id: partition-6}
        - {device: disk-sdb, size: 9136242688, wipe: superblock, flag: '', number: 4,
          preserve: false, grub_device: false, type: partition, id: partition-7}
        - name: md0
          raidlevel: raid1
          devices: [partition-2, partition-3]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-0
        - name: md1
          raidlevel: raid1
          devices: [partition-4, partition-5]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-1
        - name: md2
          raidlevel: raid1
          devices: [partition-6, partition-7]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-2
        - {fstype: ext4, volume: raid-0, preserve: false, type: format, id: format-0}
        - {fstype: swap, volume: raid-1, preserve: false, type: format, id: format-1}
        - {device: format-1, path: '', type: mount, id: mount-1}
        - {fstype: ext4, volume: raid-2, preserve: false, type: format, id: format-2}
        - {device: format-2, path: /, type: mount, id: mount-2}
        - {device: format-0, path: /boot, type: mount, id: mount-0}

    That's quite a long list of instructions to 'just' set up a RAID1. Although I understand all the steps involved, I'd never have come up with this by myself on short notice using just the documentation, so I think that the autoinstall-user-data file is a life saver.

    After the manual installation, creating a RAID1 mirror, I copied the configuration above into my own custom user-data YAML. Then I performed an unattended installation and it worked on the first try.

    So if you want to add LVM into the mix or make some other complex storage configuration, the easiest way to automate it, is to first do a manual install and then copy the relevant storage section from the autoinstall-user-data file to your custom user-data file.

    Example user-data file for download

    I've published a working user-data file that creates a RAID1 here.

    You can try it out by creating a virtual machine with two (virtual) hard drives. I'm assuming that you have a PXE boot environment set up.

    Obviously, you'll have to change the network settings for it to work.

    Generating user passwords

    If you do want to log on to the console with the default user, you must generate a salt+password hash and copy/paste that into the user-data file.

    I use the 'mkpasswd' command for this like so:

    mkpasswd -m sha-512

    The mkpasswd utility is part of the 'whois' package.

    Closing words

    For those who are still on Ubuntu Server 18.04 LTS, there is no need for immediate action as this version is supported until 2023. Only support for new hardware has come to an end for the 18.04 LTS release.

    At some point, some time and effort will be required to migrate towards the new Autoinstall solution. Maybe this blogpost helps you with this transition.

    It took me a few evenings to master the new user-data solution, but the fact that a manual installation basically results in a perfect pre-baked user-data file [3] is a tremendous help.

    I think I just miss the point of all this effort of revamping installers but maybe I'm not hurt by the limitations of the older existing solution. If you have any thoughts on this, feel free to let me know in the comments.

    1. I had to discover this information on StackExchange somewhere down below in this very good article after experiencing problems. 

    2. I tried 384 MB and it didn't finish, just got stuck. 

    3. /var/log/installer/autoinstall-user-data 

    Tagged as : Linux
  3. ZFS: Performance and Capacity Impact of Ashift=9 on 4K Sector Drives

    Thu 31 July 2014

    Update 2014-8-23: I was testing with ashift for my new NAS. The ashift=9 write performance deteriorated from 1.1 GB/s to 830 MB/s with just 16 TB of data on the pool. Also I noticed that resilvering was very slow. This is why I decided to abandon my 24 drive RAIDZ3 configuration.

    I'm aware that drives are faster at the outside of the platter and slower on the inside, but the performance deteriorated so dramatically that I did not want to continue further.

    My final setup will be a RAIDZ2 18 drive VDEV + RAIDZ2 6 drive VDEV which will give me 'only' 71 TiB of storage, but read performance is 2.6 GB/s and write performance is excellent at 1.9 GB/s. I've written about 40+ TiB to the array and after those 40 TiB, write performance was about 1.7 GB/s, so still very good and what I would expect as drives fill up.

    So actually, based on these results, I have learned not to deviate from the ZFS best practices too much. Use ashift=12 and put drives in VDEVs that adhere to the 2^n+parity rule.

    The uneven VDEVs (18 disk vs. 6 disks) are not according to best practice but ZFS is smart: it distributes data across the VDEVs based on their size. So they fill up equally.
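
    The 2^n+parity rule boils down to the number of data drives (total drives minus parity drives) being a power of two. A quick sketch to check a layout, where follows_rule and report are made-up helper names:

```shell
# Sketch: check whether a RAIDZ vdev follows the 2^n + parity rule,
# i.e. whether the number of data drives is a power of two.
# follows_rule is a hypothetical helper: follows_rule <drives> <parity>
follows_rule() {
    data=$(( $1 - $2 ))
    [ "$data" -gt 0 ] && [ $(( data & (data - 1) )) -eq 0 ]
}

report() {
    if follows_rule "$1" "$2"; then
        echo "$1 drives, parity $2: follows the rule"
    else
        echo "$1 drives, parity $2: does not follow the rule"
    fi
}

report 24 3   # the abandoned 24-drive RAIDZ3 (21 data drives)
report 18 2   # 18-drive RAIDZ2 (16 data drives)
report 6 2    # 6-drive RAIDZ2 (4 data drives)
```

    The bit trick works because a power of two has exactly one bit set, so clearing its lowest set bit leaves zero.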

    Choosing between ashift=9 and ashift=12 for 4K sector drives is not always a clear cut case. You have to choose between raw performance or storage capacity.

    My test platform is Debian Wheezy with ZFS on Linux. I'm using a system with 24 x 4 TB drives in a RAIDZ3. The drives have a native sector size of 4K, and the array is formatted with ashift=12.

    First we create the array like this:

    zpool create storage -o ashift=12 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]

    Note: NEVER use /dev/sd? drive names for an array, this is just for testing, always use /dev/disk/by-id/ names.

    Then we run a simple sequential transfer benchmark with dd:

    root@nano:/storage# dd if=/dev/zero of=ashift12.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 66.4922 s, 1.6 GB/s
    root@nano:/storage# dd if=ashift12.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.0371 s, 2.5 GB/s

    This is quite impressive. With these speeds, you can saturate 10GbE. But how much storage space do we get?

    df -h:

    Filesystem                            Size  Used Avail Use% Mounted on
    storage                                69T  512K   69T   1% /storage

    zfs list:

    storage  1.66M  68.4T   435K  /storage

    Only 68.4 TiB of storage? That's not good. With 24 drives minus 3 for parity, there should be 21 x 3.6 TiB = about 75 TiB of storage.

    So the performance is great, but somehow, we lost about 6 TiB of storage, more than a whole drive.
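
    The expected capacity can be recomputed with a short sketch, assuming drive sizes are given in decimal terabytes (1 TB = 10^12 bytes) and ignoring any ZFS overhead. The helper raidz_tib is an invented name.

```shell
# Sketch: expected raw data capacity of a RAIDZ vdev in TiB.
# raidz_tib is a hypothetical helper: raidz_tib <drives> <parity> <tb_per_drive>
raidz_tib() {
    awk -v d="$1" -v p="$2" -v tb="$3" \
        'BEGIN { printf "%.1f\n", (d - p) * tb * 1e12 / (2 ^ 40) }'
}

raidz_tib 24 3 4
```

    This prints 76.4, slightly more than my rough estimate above, because a 4 TB drive is closer to 3.64 TiB than to 3.6 TiB. Either way, the gap to the reported 68.4 TiB is larger than a single drive.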

    So what happens if you create the same array with ashift=9?

    zpool create storage -o ashift=9 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]

    These are the benchmarks:

    root@nano:/storage# dd if=/dev/zero of=ashift9.bin bs=1M count=100000 
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 97.4231 s, 1.1 GB/s
    root@nano:/storage# dd if=ashift9.bin of=/dev/null bs=1M
    100000+0 records in
    100000+0 records out
    104857600000 bytes (105 GB) copied, 42.3805 s, 2.5 GB/s

    So we lose about a third of our write performance, but the read performance is not affected, probably thanks to read-ahead caching, but I'm not sure.

    With ashift=9, we do lose some write performance, but we can still saturate 10GbE.
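
    The figures can be recomputed from the raw dd output (bytes copied and elapsed seconds). This is just a sketch, and the function names are my own.

```shell
# Sketch: recompute throughput and the relative write slowdown from
# the dd figures above.
# gb_per_s <bytes> <seconds> prints decimal gigabytes per second.
gb_per_s() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.1f\n", b / s / 1e9 }'
}
# slowdown_percent <fast_seconds> <slow_seconds> prints the throughput
# lost when the same amount of data takes slow_seconds instead.
slowdown_percent() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", (1 - fast / slow) * 100 }'
}

gb_per_s 104857600000 66.4922       # ashift=12 write
gb_per_s 104857600000 97.4231       # ashift=9 write
slowdown_percent 66.4922 97.4231    # write throughput lost with ashift=9
```

    The result is 1.6 GB/s versus 1.1 GB/s, a drop of roughly 32 percent, which matches the 'about a third' mentioned above.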

    Now look what happens to the available storage capacity:

    df -h:

    Filesystem                         Size  Used Avail Use% Mounted on
    storage                             74T   98G   74T   1% /storage

    zfs list:

    storage   271K  73.9T  89.8K  /storage

    Now we have a capacity of 74 TiB, so we just gained 5 TiB with ashift=9 over ashift=12, at the cost of some write performance.

    So if you really care about sequential write performance, ashift=12 is the better option. If storage capacity is more important, ashift=9 seems to be the best solution for 4K drives.

    The performance of ashift=9 on 4K drives is always described as 'horrible' but I think it's best to run your own benchmarks and decide for yourself.

    Caveat: I'm quite sure about the benchmark performance. I'm not 100% sure how reliable the reported free space is according to df -h or zfs list.

    Edit: I have added a bit of my own opinion on the results.

    Tagged as : ZFS Linux
