1. How to Setup a Local or Private Ubuntu Mirror

    Wed 18 January 2023

    Preface

    In this article I provide instructions on how to set up a local Ubuntu mirror using debmirror.

    Introduction

    By default, Ubuntu systems get their updates straight from the internet at archive.ubuntu.com. In an environment with lots of Ubuntu systems (servers and/or desktops) this can cause a lot of internet traffic as each system needs to download the same updates.

    In an environment like this, it would be more efficient if one system downloaded all Ubuntu updates just once and distributed them to the clients. In this case, updates are distributed over the local network, removing any strain on the internet link [1].

    diagram

    We call such a system a local mirror and it's just a web server with sufficient storage to hold the Ubuntu archive (or part of it). A local mirror is especially relevant for sites with limited internet bandwidth, but there are some extra benefits.

    To sum up the main benefits:

    1. Reduced internet bandwidth usage
    2. Faster update process using the local network (often faster than the internet link)
    3. Update or install systems even during internet or upstream outage

    The main drawbacks of a local mirror are:

    1. An extra service to maintain and monitor
    2. Storage requirement: starts at 1TB
    3. Initial sync can take a long time depending on internet speed

    Mirror solutions

    Ubuntu mirror script

    This solution is geared towards ISPs or companies who like to run their own regional mirror. It is meant to mirror the entire, unfiltered Ubuntu package archive.

    As of 2023 you should expect 2.5 TB for archive.ubuntu.com and also around 2.5 TB for ports.ubuntu.com (ARM/RISC-V and others).

    This is a lot of storage and likely not what most environments need. Even so, if this is what you want to run you can consult this web page and use the script mentioned here.

    debmirror

    Based on my own research, it seems that the tool debmirror is the simplest and most straightforward way to create a local Ubuntu mirror with a reasonable data footprint of about 480 GB (2023) for both Jammy AMD64 (22.04) and Focal AMD64 (20.04).

    Based on your needs, you can further fine-tune debmirror to only download the packages that you need for your environment.

    apt-cacher-ng

    The tool apt-cacher-ng acts as a caching proxy and only stores updates that are requested by clients. Missing or new updates are only downloaded once the first client requests them, although there seems to be an option to pre-download updates.

    Although I expect a significantly smaller footprint than debmirror, I could not find any information about actual real-life disk usage.

    Creating an Ubuntu mirror with debmirror

    Although apt-cacher-ng is quite a capable solution with many additional features, I feel that a plain mirror solution like debmirror is extremely simple to set up and maintain. This article will thus focus on debmirror.

    Preparation

    1 - Computer

    First of all we need a computer - which can be either physical or virtual - that can act as the local mirror. I've used a Raspberry Pi 4B+ as a mirror with an external USB hard drive and it can saturate a local 1 Gbit network with ease.

    2 - 1TB storage capacity (minimum)

    I'm mirroring Ubuntu 22.04 and 20.04 for the AMD64 architecture and that uses around 480 GB (2023). For ARM64, you should expect a similar storage footprint. There should be some space available for future growth, which is why I recommend having at least 1 TB of space available.

    Aside from capacity, you should also think about the importance of redundancy: what if the mirror storage device dies and you have to redownload all data? Would this impact be worth the investment in redundancy / RAID?

    It might even be interesting to use a filesystem (layer) like ZFS or LVM that support snapshots to quickly restore the mirror to a known good state if there has been an issue with a recent sync.

    3 - Select a local public Ubuntu archive

    It's best to sync your local mirror with a public Ubuntu archive close to your physical location. This provides the best internet performance and you also reduce the strain on the global archive. Use the linked mirror list to pick the best mirror for your location.

    In my case, I used nl.archive.ubuntu.com as I'm based in The Netherlands.

    Ubuntu Mirror configuration

    01 - Add the storage device / volume to the fstab

    If you haven't done so already, make sure you create a directory as a mountpoint for the storage we will use for the mirror.

    In my case I've created the /mirror directory...

    mkdir /mirror
    

    ... and updated the fstab like this (example!):

    /dev/disk/by-uuid/154d28fb-83d0-4848-ac1d-da1420252422 /mirror xfs noatime 0 0
    

    I recommend using the by-uuid or by-id path for mounting the storage device as it's the most stable, and don't forget to use the correct filesystem (xfs/ext4).

    Now we can issue:

    mount /mirror
    

    02 - Install required software

    We need a webserver installed on the mirror to serve the deb packages to the clients. Installation is straightforward and no further configuration is required. In this example I'm using Apache2 but you can use any webserver you're comfortable with.

    If you want to synchronise with the upstream mirror using regular HTTP you don't need additional software.

    apt-get update
    apt install apache2 debmirror gnupg xz-utils
    

    I think that using rsync for synchronisation is more efficient and faster but you have to configure your firewall to allow outbound traffic to TCP port 873 (which is outside the scope of this tutorial)

    apt install rsync
    

    Tip: make sure you run debmirror on a 20.04 or 22.04 system as older versions don't support current ubuntu mirrors and some required files won't be downloaded.

    03 - Creating file paths

    I've created this directory structure to host my local mirror repos.

    /mirror/
    ├── debmirror
    │   ├── amd64
    │   │   ├── dists
    │   │   ├── pool
    │   │   └── project
    │   └── mirrorkeyring
    └── scripts
    
    
    mkdir /mirror/debmirror
    mkdir /mirror/debmirror/amd64
    mkdir /mirror/debmirror/mirrorkeyring
    mkdir /mirror/scripts
    

    The folders within the amd64 directory will be created by debmirror so they don't have to be created in advance.

    04 - Install GPG keyring

    gpg --no-default-keyring --keyring /mirror/debmirror/mirrorkeyring/trustedkeys.gpg --import /usr/share/keyrings/ubuntu-archive-keyring.gpg
    

    05 - Create symlinks

    We need to create symlinks in the apache2 /var/www/html directory that point to our mirror like this:

    cd /var/www/html
    ln -s /mirror/debmirror/amd64 ubuntu
    

    06 - Configure debmirror

    Debmirror is just a command-line tool that takes a lot of arguments. If we want to run this tool daily to keep our local mirror in sync, it's best to use a wrapper script that can be called by cron.

    Such a wrapper script is provided by this page and I have included my own customised version here.

    You can download this script and place it in /mirror/scripts like this:

    cd /mirror/scripts
    wget https://louwrentius.com/files/debmirroramd64.sh.txt -O debmirroramd64.sh 
    chmod +x debmirroramd64.sh
    

    Now we need to edit this script and change some parameters to your specific requirements. The changes I've made as compared to the example are:

    export GNUPGHOME=/mirror/debmirror/mirrorkeyring
    release=focal,focal-security,focal-updates,focal-backports,jammy,jammy-security,jammy-updates,jammy-backports
    server=nl.archive.ubuntu.com
    proto=rsync
    outPath=/mirror/debmirror/amd64
    

    The Ubuntu installer ISOs for 20.04 and 22.04 seem to require the -backports releases too, so those are included.

    07 - Limiting bandwidth

    The script by default doesn't provide a way to limit rsync bandwidth usage. In my script, I've added some lines to make bandwidth limiting work as an option.

    A new variable is added that must be uncommented and can be set to the desired limit. In this case 1000 means 1000 Kilobytes per second.

    bwlimit=1000
    

    You also need to uncomment this line:

    --rsync-options "-aIL --partial --bwlimit=$bwlimit" \
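
    If you think about your uplink in Mbit/s rather than in kilobytes per second, a small helper can do the conversion. This is only a sketch: the function name mbit_to_bwlimit is made up for this example, and it assumes that rsync reads a plain --bwlimit number as units of 1024 bytes per second, which is how current rsync versions document it.

    ```shell
    #!/bin/sh
    # Hypothetical helper (not part of the wrapper script): convert a rate in
    # Mbit/s into the plain number passed to rsync's --bwlimit.
    # Assumption: a bare --bwlimit value means units of 1024 bytes per second.
    mbit_to_bwlimit() {
        # Mbit/s -> bits/s -> bytes/s -> KiB/s (integer division rounds down)
        echo $(( $1 * 1000000 / 8 / 1024 ))
    }

    # Example: cap the sync at roughly 8 Mbit/s
    bwlimit=$(mbit_to_bwlimit 8)
    echo "bwlimit=$bwlimit"
    ```

    For 8 Mbit/s this prints bwlimit=976, which is close to the 1000 used above.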
    

    08 - Initial sync

    It is best to run the initial sync before configuring the periodic cron job. The first sync can take a long time, and a run started by cron could interfere with it, so only enable the cron job once the initial sync has completed.

    As the initial sync can take a while, I like to run this job with screen. If you accidentally close the terminal, the rsync process isn't interrupted (although it is not a big deal if that happens, as it just continues where it left off).

    apt install screen
    screen /mirror/scripts/debmirroramd64.sh
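
    To get a feeling for how long "a while" is, you can estimate the transfer time from the mirror size and your link speed. The helper below is a rough sketch: the name sync_hours is made up, and it ignores protocol overhead and upstream server limits.

    ```shell
    #!/bin/sh
    # Hypothetical helper: estimate how many hours it takes to transfer
    # $1 gigabytes over a link of $2 megabits per second (no overhead).
    sync_hours() {
        # 1 GB = 8000 megabits; seconds = megabits / (Mbit/s); hours = s / 3600
        awk -v gb="$1" -v mbit="$2" \
            'BEGIN { printf "%.1f\n", (gb * 8 * 1000) / mbit / 3600 }'
    }

    sync_hours 480 100   # 480 GB over a 100 Mbit/s link
    sync_hours 480 8     # same data with bwlimit=1000 (about 8 Mbit/s)
    ```

    This prints 10.7 and 133.3: about half a day on a 100 Mbit/s link, but more than five days with the 1000 KB/s limit from the previous step.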
    

    09 - Setup cron job

    When the initial sync is completed we can configure the cron job to sync periodically.

    0 1 * * * /mirror/scripts/debmirroramd64.sh
    

    In this case the sync runs daily at 1 AM.
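
    If a sync ever takes longer than a day, the next cron run would start while the previous one is still busy. One way to guard against that is a lock file. The sketch below uses flock from util-linux; the function name run_locked and the lock file path are assumptions for this example and not part of the wrapper script.

    ```shell
    #!/bin/sh
    # Hypothetical helper: run a command only if nobody else holds the lock.
    # Relies on flock(1) from util-linux; -n means "fail instead of waiting".
    run_locked() {
        lockfile=$1
        shift
        flock -n "$lockfile" "$@"
    }

    # Intended use (the lock path is an assumption, any writable path works):
    #   run_locked /var/lock/debmirror.lock /mirror/scripts/debmirroramd64.sh
    # or directly in the crontab:
    #   0 1 * * * flock -n /var/lock/debmirror.lock /mirror/scripts/debmirroramd64.sh
    ```

    When the lock is already taken, flock exits with a non-zero status and the sync script is not started a second time.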

    The mirror includes all security updates so depending on your environment, it's recommended to synchronise the mirror at least daily.

    10 - Client configuration

    All clients should point to your local mirror in their /etc/apt/sources.list file. You can use the IP address of your mirror, but if you run a local DNS, it's not much effort to set up a DNS record like mirror.your.domain and have all clients reconfigured to connect to the domain name.

    This is the /etc/apt/sources.list for the client

    deb http://mirror.your.domain/ubuntu RELEASE main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-security main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-updates main restricted universe multiverse
    

    The RELEASE value should be changed to the appropriate Ubuntu release that your mirror carries, like focal or jammy.
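
    If you don't want to edit the file by hand, the three lines can be generated for a given release. This is a minimal sketch; the function name make_sources_list is made up, and mirror.your.domain is the same placeholder as above.

    ```shell
    #!/bin/sh
    # Hypothetical helper: print the three sources.list lines for one release.
    # $1 = mirror host name, $2 = release code name (e.g. focal or jammy)
    make_sources_list() {
        for suite in "$2" "$2-security" "$2-updates"; do
            echo "deb http://$1/ubuntu $suite main restricted universe multiverse"
        done
    }

    make_sources_list mirror.your.domain jammy
    ```

    On a client you could redirect the output to /etc/apt/sources.list (as root, after making a backup) and then run apt update.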

    If you have an environment with a lot of Ubuntu systems, this configuration is likely provisioned with tools like ansible.

    11 - Monitoring

    Although system monitoring is out-of-scope for this blog post, there are two topics to monitor:

    1. disk space usage (alert if space is running out)
    2. successful synchronisation script execution (alert if script fails)

    If you don't monitor the synchronisation process, the mirror will become out-dated and will lack the latest security updates.
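
    If you don't have a monitoring system yet, both checks can be approximated with a few lines of shell. The helpers below are a sketch only: the function names and the 90% threshold are assumptions, and the messages on stderr stand in for whatever alerting you actually use.

    ```shell
    #!/bin/sh
    # Hypothetical helpers for the two checks above; names and threshold are
    # assumptions, not part of debmirror or the wrapper script.

    # Print the used percentage (without the % sign) of the filesystem at $1.
    disk_usage_pct() {
        df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
    }

    # Run a command; if it fails, report on stderr and pass on its exit code.
    run_and_report() {
        "$@" || {
            rc=$?
            echo "ALERT: '$*' failed with exit code $rc" >&2
            return "$rc"
        }
    }

    # Example: warn when the filesystem is more than 90% full.
    # Uses / so the example runs anywhere; on the mirror you would check /mirror.
    if [ "$(disk_usage_pct /)" -gt 90 ]; then
        echo "ALERT: disk is more than 90% full" >&2
    fi
    ```

    The sync itself could then be started as run_and_report /mirror/scripts/debmirroramd64.sh, so a failed run at least leaves a trace in the cron mail or log.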

    Closing words

    As many environments are either cloud-native or moving towards a cloud environment, running a local mirror seems less and less relevant. Yet there may still be environments that could benefit from a local mirror setup. Maybe these instructions are helpful.


    1. You may notice that cloud providers also run their own Ubuntu archive mirrors to reduce the load on their upstream and peering links. When you deploy a standard virtual machine based on Ubuntu, it is by default configured to use the provider's local mirror.

    Tagged as : Linux
  2. I Resurrected My Dutch Movie Review Site From 2003

    Thu 09 June 2022

    Introduction

    Between 2003 and 2006, I ran a Dutch movie review site called moevie.nl [1]. I built the site and wrote the reviews. It never made any money. It cost me money to host, and it cost me a lot of time writing reviews, but I remember enjoying writing reviews about films I liked.

    The gimmick of the site was that the reviews had two parts. The first part is spoiler-free, just giving a recommendation with some context to make up your own mind. The second part contained a reflection of the movie, which included spoilers.

    moevie

    Even back then, the site didn't win any design awards (from archive.org - click to enlarge) [2]

    I started building the site a few months after finishing college (IT) in 2002 as I felt inept and had little confidence. Building something tangible felt like a good way to build up and demonstrate skills. And I had something to say about movies.

    Although moevie.nl did not help me gain employment as far as I know, it was fun while it lasted. At some point, I didn't get much joy out of writing movie reviews and I let the site die.

    I did keep backups of the database, the code and the pictures though. And now after 18+ years I decided to resurrect the site, including all (old) reviews.

    Why resurrect a dead website gone for 16+ years?

    Rebuilding the site was just a way to spend time, a small hobby project. Something to be busy with. The second reason is some kind of misplaced nostalgia. I sometimes regret shutting down the site, wondering what could have been if I had persevered.

    Losing and regaining the domain

    Back in 2006, my hosting provider (non-profit with just a few servers) abruptly stopped operating due to hardware failure [3] and I was forced to move my domain to another company. At that time, private citizens in The Netherlands could not register an .nl domain, only businesses could, so that was a bit of a hassle.

    Not long thereafter however, I decided to let the domain expire. It was quickly scooped up by 'domain resellers'. Years later I decided that I wanted moevie.nl back, but the sellers always asked insane amounts of money.

    In 2019, I visited moevie.nl on a whim. To my surprise it didn't resolve anymore, the domain was available! I quickly scooped it up, but I didn't do much with it for a long time, until now.

    Rebuilding the site

    I really wanted to preserve the aesthetic of moevie.nl as it was back then. Especially in the context of modern web design, it does stand out. Like a sore thumb - but still - I had a goal.

    Having the code and database dump is one thing, but it doesn't tell you what it actually looked like in 2003-2006. I could have tried to get the old (PHP4) code working, but I just didn't feel like it.

    Instead, I chose to visit Archive.org and indeed, it captured old snapshots of my site back in 2006. So those were of great help. The screenshots at the top of this blog post are lifted from this page on archive.org. This snapshot was taken just before I decided to close the site.

    The challenge of mobile device screens

    To set the stage a bit: the rise and fall of moevie.nl happened a year before the iPhone was first announced. Smartphones from BlackBerry were popular. I had a Palm Vx PDA and later an HP Compaq PDA.

    Most people didn't have mobile data connections so as far as I know, the mobile web really wasn't a thing yet.

    So moevie.nl was primarily developed for the desktop. When I thought I was finished rebuilding the site, I quickly discovered that the site was unusable on my iPhone and way too small and finicky to use on my iPad.

    For somebody who has no experience with modern web development, it was quite a steep learning curve discovering how to deal with the various screen sizes in CSS [4].

    A very large part of the entire effort of rebuilding the site was spent on making the site workable on all different device sizes. Fortunately, iOS device simulators were of great help on that front.

    Technology

    I've recreated moevie.nl with Python and Django. For the database, I chose PostgreSQL, although that is total overkill: I could have used SQLite without any issues.

    I chose Django because I'm quite familiar with Python so that was a straightforward choice. I selected PostgreSQL mostly just to regain some knowledge about it.

    Hosting

    I'm self-hosting moevie.nl on the same Raspberry Pi 4 that is hosting this blog. This Raspberry Pi is powered by the sun.

    So moevie.nl is solar-powered during the day and battery-powered during the night.

    Closing words

    I'm not sure if I really want to start writing movie reviews again, knowing full well how much effort it takes. Also I'm not sure I have anything to say about movies anymore, but we'll see.

    The overall experience of rebuilding the site was frustrating at times due to my severe lack of experience and knowledge. Now that the site is done and working, even on mobile devices, that feels good.


    1. The name is based on the phonetic pronunciation in Dutch of the English word 'movie'. 

    2. sorry for the language but I could not find a better screenshot. 

    3. I was neglecting the site at that time due to losing motivation. 

    4. I admit I only tested with iOS devices so Android-based smartphones could experience issues. 

    Tagged as : web
  3. An Ode to the 10,000 RPM Western Digital (Veloci)Raptor

    Sat 30 October 2021

    Introduction

    Back in 2004, I visited a now bankrupt Dutch computer store called MyCom [1], located at the Kinkerstraat in Amsterdam. I was there to buy a Western Digital Raptor model WD740, with 74 GB of capacity, running at 10,000 RPM.

    mywd

    When I bought this drive, we were still in the middle of the transition from the PATA interface to SATA [2]. My Raptor hard drive still had a Molex connector because older computer power supplies didn't have SATA power connectors.

    olds

    You may notice that I eventually managed to break off the plastic tab of the SATA power connector. Fortunately, I could still power the drive through the Molex connector.

    A later version of the same drive came with the Molex connector disabled, as you can see below.

    news

    Why did the Raptor matter so much?

    I was very eager to get this drive as it was quite a bit faster than any consumer drive on the market at that time.

    This drive not only made your computer start up faster, but it made it much more responsive. At least, it really felt like that to me at the time.

    The faster spinning drive wasn't so much about more throughput in MB/s - although that improved too - it was all about reduced latency.

    A drive that spins faster [3] can complete more I/O operations per second, or IOPs [4]. It can do more work in the same amount of time, because each operation takes less time compared to slower-turning drives.
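
    To put a number on that: on average the platter has to make half a revolution before the requested sector passes under the head, so the average rotational latency is 0.5 x 60 / RPM seconds. The snippet below is just this arithmetic. It covers only the rotational part (seek time comes on top), and the function name is made up for this example.

    ```shell
    #!/bin/sh
    # Average rotational latency in milliseconds: half a revolution on average,
    # so 0.5 * (60 / RPM) seconds = 30000 / RPM milliseconds.
    # Seek time is not included; the function name is made up for this example.
    avg_rotational_latency_ms() {
        awk -v rpm="$1" 'BEGIN { printf "%.2f\n", 30000 / rpm }'
    }

    for rpm in 5400 7200 10000 15000; do
        echo "$rpm RPM: $(avg_rotational_latency_ms "$rpm") ms"
    done
    ```

    A 10,000 RPM drive waits 3.00 ms on average for the platter, against 4.17 ms for a 7200 RPM drive.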

    The Raptor - mostly focussed on desktop applications [5] - brought a lot of relief for professionals and consumer enthusiasts alike. Hard disk performance, notably latency, was one of the big performance bottlenecks at the time.

    For the vast majority of consumers or employees this bottleneck would start to be alleviated only well after 2010 when SSDs slowly started to become standard in new computers.

    And that's mostly also the point of SSDs: their I/O operations are measured in microseconds instead of milliseconds. It's not that throughput (MB/s) doesn't matter, but for most interactive applications, you care about latency. That's what makes an old computer feel like new when you swap out the hard drive for an SSD.

    The Raptor as a boot drive

    For consumers and enthusiasts, the Raptor was an amazing boot drive. The 74 GB model was large enough to hold the operating system and applications. The bulk of the data would still be stored on a second hard drive, either also connected through SATA or even still through PATA.

    Running your computer with a Raptor as the boot drive resulted in lower boot times and application load times. But most of all, the system felt more responsive.

    And despite the 10,000 RPM speed of the platters, it wasn't that much louder than regular drives at the time [7].

    In the video above, a Raspberry Pi4 boots from a 74 GB Raptor hard drive.

    Alternatives to the Raptor at that time

    To put things into perspective, 10,000 RPM drives were quite common even in 2003/2004 for usage in servers. The server-oriented drives used the SCSI interface/protocol which was incompatible with the on-board IDE/SATA controllers.

    Some enthusiasts - who had the means to do so - did buy both the controller [8] and one or more SCSI 'server' drives to increase the performance of their computer. They could even get 15,000 RPM hard drives! These drives, however, were extremely loud and had even less capacity.

    The Raptor did perform remarkably well in almost all circumstances, especially those that mattered to consumers and enthusiasts. Suddenly you could get SCSI/server performance for consumer prices.

    The in-depth review of the WD740 by Techreport really shows how significant the Raptor was.

    The Velociraptor

    The Raptor eventually got replaced with the Velociraptor. The Velociraptor had a 2.5" form factor, but it was much thicker than a regular 2.5" laptop drive. Because it spun at 10,000 RPM, the drive would get hot and thus it was mounted in an 'icepack' to dissipate the generated heat. This gave the Velociraptor a 3.5" form factor, just like the older Raptor drives.

    velociraptor

    In the video below, a Raspberry Pi4 boots from a 500 GB Velociraptor hard drive.

    Benchmarking the (Veloci)raptor

    Hard drives do well with sequential read/write patterns, but their performance implodes when the data access pattern becomes random. This is due to the mechanical nature of the device. That random access pattern is where 10,000 RPM drives outperform their slower-turning siblings.

    Random 4K read performance showing both IOPs and latency. This is kind of a worst-case benchmark to understand the raw I/O and latency performance of a drive.

    fios

    Drive ID        | Form Factor  | RPM    | Size (GB) | Description
    ST9500423AS     | 2.5"         | 7200   | 500       | Seagate laptop hard drive
    WD740GD-75FLA1  | 3.5"         | 10,000 | 74        | Western Digital Raptor WD740
    SAMSUNG HD103UJ | 3.5"         | 7200   | 1000      | Samsung Spinpoint F1
    WDC WD5000HHTZ  | 2.5" in 3.5" | 10,000 | 500       | Western Digital Velociraptor
    ST2000DM008     | 3.5"         | 7200   | 2000      | Seagate 3.5" 2TB drive
    MB1000GCWCV     | 3.5"         | 7200   | 1000      | HP Branded Seagate 1 TB drive

    I've tested the drives on an IBM M1015 SATA RAID card flashed to IT mode (HBA mode, no RAID firmware). The image is generated with fio-plot, which also comes with a tool to run the fio benchmarks.

    It is quite clear that both 10,000 RPM drives outperform all 7200 rpm drives, as expected.

    If we compare the original 3.5" Raptor to the 2.5" Velociraptor, the performance increase is significant: 22% more IOPs and 18% lower latency. I think that performance increase is due to a combination of the higher data density, the smaller size (the r/w head reaches the spot it needs to be in sooner) and maybe better firmware.
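
    The two percentages are really the same measurement seen from two sides. If each request is served one at a time (queue depth 1, which is an assumption here as the benchmark settings are not listed), latency is roughly the inverse of IOPs. A quick check, with a made-up function name:

    ```shell
    #!/bin/sh
    # With one outstanding request at a time, latency ~ 1 / IOPs.
    # Given an IOPs gain in percent, print the matching latency reduction
    # in percent. Hypothetical helper, for illustration only.
    latency_reduction_pct() {
        awk -v gain="$1" \
            'BEGIN { printf "%.1f\n", (1 - 1 / (1 + gain / 100)) * 100 }'
    }

    latency_reduction_pct 22   # the Velociraptor's IOPs gain over the Raptor
    ```

    This prints 18.0, which matches the 18% lower latency in the graph.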

    Both the laptop and desktop Seagate drives seem to be a bit slower than they should be based on theory. The opposite is true for the HP (rebranded Seagate), which seems to perform better than expected for its capacity and rotational speed. I have no idea why that is. I can only speculate that, because the HP drive came out of a server, the firmware was tuned for server usage patterns.

    Closing words

    Although the performance increase of the (Veloci)raptor was quite significant, it never gained widespread adoption. Especially when the Raptor first came to market, its primary role was that of a boot drive because of its small capacity. You still needed a second drive for your data. So the increase in performance came at a significant extra cost.

    The Raptor and Velociraptor are now obsolete. You can get a solid state drive for $20 to $40 and even those budget-oriented SSDs will outperform a (Veloci)raptor many times over.

    If you are interested in more pictures and details, take a look at this article.

    This article was discussed on Hacker News here.

    Reddit thread about this article can be found here


    1. Mycom, a chain store with quite a few shops in all major cities in The Netherlands, went bankrupt twice, once in 2015 and finally in 2019. 

    2. We are talking about the first SATA version, with a maximum bandwidth capacity of 150 MB/s. Plenty enough for hard drives at that time. 

    3. https://en.wikipedia.org/wiki/Hard_disk_drive_performance_characteristics 

    4. https://louwrentius.com/understanding-storage-performance-iops-and-latency.html 

    5. I read that WD intended the first Raptor (37 GB version) to be used in low-end servers as a cheaper alternative to SCSI drives. After the adoption of the Raptor by computer enthusiasts and professionals, it seems that Western Digital pivoted, so the next version - the 74 GB I have - was geared more towards desktop usage. That also meant that this 74 GB model got fluid bearings, making it quieter [6].

    6. The 74 GB model is actually a rather quiet drive at idle. Drive activity sounds rather smooth and pleasant, no rattling. 

    7. Please note that the first model, the 37 GB version, used ball bearings instead of fluid bearings, and was reported to be significantly louder. 

    8. Low-end SCSI cards were often used to power flatbed scanners, Iomega ZIP drives, tape drives or other peripherals, but in order to benefit from the performance of those server hard drives, you needed a SCSI controller supporting higher bandwidth and those were more expensive. 

    Tagged as : Storage
