1. Benchmarking Cheap SSDs for Fun, No Profit (Be Warned)

    Sun 26 March 2023

    The price of solid-state drives (SSDs) has dropped significantly over the last few years. It's now possible to buy a 1 TB SSD for less than €60. However, at such low price points, there is a catch.

    Although cheap SSDs perform fine for reads, sustained write performance can be really atrocious. To demonstrate this, I bought a bunch of the cheapest SATA SSDs I could find - as listed below - and benchmarked them with Fio.

    Model                  Capacity   Price
    ADATA Ultimate SU650   240 GB     € 15,99
    PNY CS900              120 GB     € 14,56
    Kingston A400          120 GB     € 20,85
    Verbatim Vi550 S3      128 GB     € 14,99

    I didn't have the budget to buy a bunch of 1 TB or 2 TB SSDs, so these ultra-cheap, low-capacity SSDs are a bit of a stand-in. I've also added a Crucial MX500 1TB (CT1000MX500SSD1) SATA1 SSD - which I already owned - to the benchmarks to see how well those small-capacity SSDs stack up against a cheap SSD with a much larger capacity.

    Understanding SSD write performance

    To understand the benchmark results a bit better, we discuss some SSD concepts in this section. Feel free to skip to the actual benchmarks if you're already familiar with them.

    SLC Cache

    SSDs originally used single-level cell (SLC) flash memory, which can hold a single bit and is the fastest and most reliable flash memory available. Unfortunately, it's also the most expensive. To reduce cost, multi-level cell (MLC) flash was invented, which can hold two bits instead of one, at the cost of speed and longevity2. This is even more so for triple-level cell (TLC) and quad-level cell (QLC) flash memory. All 'cheap' SSDs I benchmark use 3D v-nand3 TLC flash memory.

    One technique to temporarily boost SSD performance is to use a (small) portion of (in our case) TLC flash memory as if it was SLC memory. This SLC memory then acts as a fast write cache4. When the SSD is idle, data is moved from the SLC cache to the TLC flash memory in the background. However, this process is limited by the speed of the 'slower' TLC flash memory and can take a while to complete.

    While this trick with SLC memory works well for brief, intermittent write loads, sustained write loads will fill up the SLC cache and cause a significant drop in performance as the SSD is forced to write data into slower TLC memory.
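
    To get a rough feel for the numbers (these figures are illustrative, not measurements of the drives below): if a drive uses 10 GB of its TLC flash as SLC cache and accepts writes at 450 MB/s, that cache is full after roughly 10,000 MB / 450 MB/s ≈ 22 seconds of sustained writing. From that point on, the drive can only accept data as fast as the TLC flash (and the ongoing cache flushing) allows.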

    DRAM cache

    As flash memory has a limited lifespan and can only take a limited number of writes, a wear-leveling mechanism is used to distribute writes evenly over all cells, regardless of where data is written logically. Keeping track of this mapping between logical and physical 'locations' can be sped up with a DRAM cache (chip), as DRAM tends to be faster than flash memory. In addition, the DRAM can also be used to cache writes, improving performance.

    Cheap SSDs omit the DRAM cache chip to reduce cost, so they have to update their data mapping tables in flash memory, which is slower. This can also impact (sustained) write performance. To be frank, I'm not sure how much the lack of DRAM impacts our benchmarks.

    Benchmark method

    Before I started benchmarking, I issued a trim command to clear each drive. Next, I performed a sequential write benchmark of the entire SSD with a block size of 1 megabyte and a queue depth of 32. The benchmark is performed on the 'raw' device; no filesystem is used. I used Fio for these benchmarks.
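
    For reference, a minimal sketch of this kind of benchmark with plain Fio could look like the following. The device name is an example, and the exact options generated by bench-fio (see the disclaimer at the end) may differ slightly. Be careful: this destroys all data on the target device.

    # trim the drive first so it starts out empty
    blkdiscard /dev/sdX

    # sequential write over the whole device, 1M blocks, queue depth 32
    fio --name=seqwrite \
        --filename=/dev/sdX \
        --rw=write \
        --bs=1M \
        --iodepth=32 \
        --ioengine=libaio \
        --direct=1 \
        --log_avg_msec=1000 \
        --write_bw_log=seqwrite \
        --write_lat_log=seqwrite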

    Benchmark results

    The chart below shows write bandwidth over time for all tested SSDs. Each drive has been benchmarked in full, but the data is truncated to the first 400 seconds for readability (performance didn't change after that). The raw Fio benchmark data can be found here (.tgz).

    [chart]

    It's funny to me that some of the cheap SSDs initially perform way better than the more expensive Crucial 1TB SSD5. But as soon as their SLC cache runs out, the Crucial 1TB has the last laugh: it shows the best sustained throughput, beating all cheaper drives, although the Kingston A400 comes close.

    Of all the cheap SSDs, only the Kingston shows decent sustained write speed, at around 100 MB/s and without intermittent drops in performance. The ADATA, PNY and Verbatim SSDs show flaky behaviour and basically terrible sustained write performance. But make no mistake: I would not call the performance of the Kingston SSD, nor of the Crucial SSD - added as a reference - 'good' by any definition of that word. Even the Kingston can't saturate gigabit Ethernet, which in practice tops out at roughly 117 MB/s.

    The bandwidth alone doesn't tell the whole story. The latency or responsiveness of the SSDs is also significantly impacted:

    [chart]

    The Crucial 1TB SSD shows the best latency overall, followed by the Kingston SSD. The rest of the cheap SSDs show quite high latency spikes and very high latency overall, even when some of the spikes settle down, as with the ADATA SSD. When latency is measured in seconds, things are bad.

    To put things a bit in perspective, let's compare these results to a Toshiba 8 TB 7200 RPM hard drive I had lying around.

    [chart]

    The hard drive shows better write throughput and latency6 compared to most of the tested SSDs. Yes, except for the initial few minutes where the cheap SSDs tend to be faster (and except for the Kingston & Crucial SSDs), but how much does that matter?

    Now that we've shown the performance of a hard drive to contrast the terrible write performance of the cheap SSDs, it's time to also compare them to a more expensive, higher-tier SSD.

    [chart]

    I bought this Samsung SSD in 2019 for €137, so that's quite a different price point. I think the graph speaks for itself, especially if you consider that this graph is not truncated: this is the full drive write.

    Evaluation & conclusion

    One of the funnier conclusions to draw is that it's better to use a hard drive than cheap SSDs if you need to ingest a lot of data. Even the Crucial 1TB SSD could not keep up with the HDD.

    A more interesting conclusion is that the 1TB SSD didn't perform that much better than the small, cheaper SSDs. Or to put it differently: although the performance of the small, cheap SSDs is not representative of the larger SSD, it is still in the same ballpark. I don't think it's a coincidence that the Kingston SSD came very close to the performance of the Crucial SSD, as it's the most 'expensive' of the cheap drives.

    In the end, my intent was to demonstrate with actual benchmarks how cheap SSDs show bad sustained write performance, and I think I succeeded. I hope it helps people understand that good SSD write performance is not a given, especially for cheaper drives.

    The Hacker News discussion of this blog post can be found here.

    Disclaimer

    I'm not sponsored in any way. All mentioned products have been bought with my own money.

    The graphs are created with fio-plot, a tool I've made and maintain. The benchmarks have been performed with bench-fio, a tool included with fio-plot, to automate benchmarking with Fio.


    1. As I don't have a test system with NVMe, I had to use SATA-based SSDs. The fact that the SATA interface was not the limiting factor in any of the tests is foreboding. 

    2. As a general note, I think the vast majority of users should not worry about SSD longevity. Only people with high-volume write workloads should keep an eye on the write endurance of SSDs and buy a suitable product. 

    3. Instead of packing the bits really densely together in a cell horizontally, the bits are stacked vertically, saving horizontal space. This allows for higher data densities in the same footprint. 

    4. Some SSDs have a static SLC cache, but others size the SLC cache according to how full the SSD is. When the SSD starts to fill up, the SLC cache size is reduced. 

    5. After around 45-50 minutes of testing, the performance of the Crucial MX500 also started to drop to around 40 MB/s and fluctuate up and down. Evidence

    6. it's so funny to me that a hard drive beats an SSD on latency. 

    Tagged as : storage
  2. How to Setup a Local or Private Ubuntu Mirror

    Wed 18 January 2023

    Preface

    In this article I provide instructions on how to set up a local Ubuntu mirror using debmirror. To set expectations: the mirror will work as intended and distribute packages and updates, but a do-release-upgrade from one major version of Ubuntu to the next won't work.

    Introduction

    By default, Ubuntu systems get their updates straight from the internet at archive.ubuntu.com. In an environment with lots of Ubuntu systems (servers and/or desktops) this can cause a lot of internet traffic as each system needs to download the same updates.

    In an environment like this, it would be more efficient if one system would download all Ubuntu updates just once and distribute them to the clients. In this case, updates are distributed using the local network, removing any strain on the internet link1.

    diagram

    We call such a system a local mirror and it's just a web server with sufficient storage to hold the Ubuntu archive (or part of it). A local mirror is especially relevant for sites with limited internet bandwidth, but there are some extra benefits.

    To sum up the main benefits:

    1. Reduced internet bandwidth usage
    2. Faster update process using the local network (often faster than the internet link)
    3. Update or install systems even during internet or upstream outage

    The main drawbacks of a local mirror are:

    1. An extra service to maintain and monitor
    2. Storage requirement: starts at 1TB
    3. Initial sync can take a long time depending on internet speed

    Mirror solutions

    Ubuntu mirror script

    This solution is geared towards ISPs or companies who like to run their own regional mirror. It is meant to mirror the entire, unfiltered Ubuntu package archive.

    As of 2023 you should expect around 2.5 TB for archive.ubuntu.com and another 2.5 TB for ports.ubuntu.com (ARM/RISC-V and others).

    This is a lot of storage and likely not what most environments need. Even so, if this is what you want to run you can consult this web page and use the script mentioned here.

    debmirror

    Based on my own research, the tool Debmirror seems to be the simplest and most straightforward way to create a local Ubuntu mirror, with a reasonable data footprint of about 480 GB (2023) for both Jammy AMD64 (22.04) and Focal AMD64 (20.04).

    Based on your needs, you can further fine-tune Debmirror to only download the packages that you need for your environment.

    apt-cacher-ng

    The tool apt-cacher-ng acts as a caching proxy and only stores updates that are requested by clients. Missing or new updates are only downloaded once the first client requests them, although there seems to be an option to pre-download updates.

    Although I expect a significantly smaller footprint than debmirror, I could not find any information about actual real-life disk usage.
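
    For completeness: if you go the apt-cacher-ng route instead, clients are typically pointed at the proxy with a one-line apt configuration file, roughly like this (3142 is apt-cacher-ng's default port; the hostname is just an example):

    # /etc/apt/apt.conf.d/02proxy
    Acquire::http::Proxy "http://aptcache.your.domain:3142";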

    Creating an Ubuntu mirror with debmirror

    Although apt-cacher-ng is quite a capable solution with many additional features, I feel that a simple mirror solution like debmirror is extremely simple to set up and maintain. This article will therefore focus on debmirror.

    Preparation

    1 - Computer

    First of all we need a computer - which can be either physical or virtual - that can act as the local mirror. I've used a Raspberry Pi 4B+ as a mirror with an external USB hard drive and it can saturate a local 1 Gbit network with ease.

    2 - 1TB storage capacity (minimum)

    I'm mirroring Ubuntu 22.04 and 20.04 for the AMD64 architecture and that uses around 480 GB (2023). For ARM64, you should expect a similar storage footprint. There should be some space available for future growth, so that's why I recommend having at least 1 TB of space available.

    Aside from capacity, you should also think about the importance of redundancy: what if the mirror storage device dies and you have to redownload all data? Would avoiding that be worth the investment in redundancy / RAID?

    It might even be interesting to use a filesystem (layer) like ZFS or LVM that supports snapshots, to quickly restore the mirror to a known good state if there has been an issue with a recent sync.
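
    With ZFS, for example, taking a snapshot before a sync and rolling back after a bad sync could look like this (the pool/dataset name and the snapshot name are just examples):

    # before the sync
    zfs snapshot tank/mirror@pre-sync

    # if the sync went wrong, roll back to the known good state
    zfs rollback tank/mirror@pre-sync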

    3 - Select a local public Ubuntu archive

    It's best to sync your local mirror with a public Ubuntu archive close to your physical location. This provides the best internet performance and you also reduce the strain on the global archive. Use the linked mirror list to pick the best mirror for your location.

    In my case, I used nl.archive.ubuntu.com as I'm based in The Netherlands.

    Ubuntu Mirror configuration

    01 - Add the storage device / volume to the fstab

    If you haven't done so already, make sure you create a directory as a mountpoint for the storage we will use for the mirror.

    In my case I've created the /mirror directory...

    mkdir /mirror
    

    ... and updated the fstab like this (example!):

    /dev/disk/by-uuid/154d28fb-83d0-4848-ac1d-da1420252422 /mirror xfs noatime 0 0
    

    I recommend using the by-uuid or by-id path for mounting the storage device as it's the most stable, and don't forget to use the correct filesystem (xfs/ext4).

    Now we can issue:

    mount /mirror
    

    02 - Install required software

    We need a webserver installed on the mirror to serve the deb packages to the clients. Installation is straightforward and no further configuration is required. In this example I'm using Apache2 but you can use any webserver you're comfortable with.

    If you want to synchronise with the upstream mirror using regular HTTP you don't need additional software.

    apt-get update
    apt install apache2 debmirror gnupg xz-utils
    

    I think that using rsync for synchronisation is more efficient and faster, but you have to configure your firewall to allow outbound traffic to TCP port 873 (which is outside the scope of this tutorial).

    apt install rsync
    

    Tip: make sure you run debmirror on a 20.04 or 22.04 system, as older versions don't support current Ubuntu mirrors and some required files won't be downloaded.

    03 - Creating file paths

    I've created this directory structure to host my local mirror repos.

    /mirror/
    ├── debmirror
    │   ├── amd64
    │   │   ├── dists
    │   │   ├── pool
    │   │   └── project
    │   └── mirrorkeyring
    └── scripts
    
    
    mkdir /mirror/debmirror
    mkdir /mirror/debmirror/amd64
    mkdir /mirror/debmirror/mirrorkeyring
    mkdir /mirror/scripts
    

    The folders within the amd64 directory will be created by debmirror so they don't have to be created in advance.

    04 - Install GPG keyring

    gpg --no-default-keyring --keyring /mirror/debmirror/mirrorkeyring/trustedkeys.gpg --import /usr/share/keyrings/ubuntu-archive-keyring.gpg
    

    05 - Create symlinks

    We need to create symlinks in the apache2 /var/www/html directory that point to our mirror like this:

    cd /var/www/html
    ln -s /mirror/debmirror/amd64 ubuntu
    

    06 - Configure debmirror

    Debmirror is just a command-line tool that takes a lot of arguments. If we want to run this tool daily to keep our local mirror in sync, it's best to use a wrapper script that can be called by cron.

    Such a wrapper script is provided by this page and I have included my own customised version here.

    You can download this script and place it in /mirror/scripts like this:

    cd /mirror/scripts
    wget https://louwrentius.com/files/debmirroramd64.sh.txt -O debmirroramd64.sh 
    chmod +x debmirroramd64.sh
    

    Now we need to edit this script and change some parameters to fit your specific requirements. The changes I've made compared to the example are:

    export GNUPGHOME=/mirror/debmirror/mirrorkeyring
    release=focal,focal-security,focal-updates,focal-backports,jammy,jammy-security,jammy-updates,jammy-backports
    server=nl.archive.ubuntu.com
    proto=rsync
    outPath=/mirror/debmirror/amd64
    

    The Ubuntu installer ISOs for 20.04 and 22.04 seem to require the -backports releases too, so those are included.
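
    For readers who want to know what the script boils down to: it essentially wraps a single debmirror call using the variables above. A simplified sketch, not the literal script contents (the architecture, sections and remote root path are set elsewhere in the script and shown here as literal values), could look like this:

    debmirror --arch=amd64 \
        --nosource \
        --section=main,restricted,universe,multiverse \
        --host=$server \
        --root=ubuntu \
        --dist=$release \
        --method=$proto \
        --progress \
        $outPath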

    Limitations

    I've not been able (yet) to make the do-release-upgrade process work to upgrade a system from focal to jammy. I found this old resource but those instructions don't seem to work for me.

    07 - Limiting bandwidth

    The script by default doesn't provide a way to limit rsync bandwidth usage. In my script, I've added some lines to make bandwidth limiting work as an option.

    A new variable is added that must be uncommented and can be set to the desired limit. In this case 1000 means 1000 Kilobytes per second.

    bwlimit=1000
    

    You also need to uncomment this line:

    --rsync-options "-aIL --partial --bwlimit=$bwlimit" \
    

    08 - Initial sync

    It's best to run the initial sync first, before configuring the periodic cron job that will do the daily sync. The first sync can take a long time and may interfere with the cron job, so only enable the cron job once the initial sync has completed.

    As the initial sync can take a while, I like to run this job with screen. If you accidentally close the terminal, the rsync process isn't interrupted (although even an interruption is not a big deal: the sync just continues where it left off the next time you run the script).

    apt install screen
    screen /mirror/scripts/debmirroramd64.sh
    

    09 - Setup cron job

    When the initial sync is completed we can configure the cron job to sync periodically.

    0 1 * * * /mirror/scripts/debmirroramd64.sh
    

    In this case the sync runs daily at 1 AM.

    The mirror includes all security updates so depending on your environment, it's recommended to synchronise the mirror at least daily.

    10 - Client configuration

    All clients should point to your local mirror in their /etc/apt/sources.list file. You can use the IP address of your mirror, but if you run a local DNS, it's not much effort to set up a DNS record like mirror.your.domain and have all clients connect to the domain name instead.

    This is the /etc/apt/sources.list for the client:

    deb http://mirror.your.domain/ubuntu RELEASE main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-security main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu RELEASE-updates main restricted universe multiverse
    

    The RELEASE value should be changed to the appropriate Ubuntu release, like bionic, focal or jammy.
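
    For example, on a 22.04 (jammy) client this becomes:

    deb http://mirror.your.domain/ubuntu jammy main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu jammy-security main restricted universe multiverse
    deb http://mirror.your.domain/ubuntu jammy-updates main restricted universe multiverse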

    If you have an environment with a lot of Ubuntu systems, this configuration is likely provisioned with tools like Ansible.

    11 - Monitoring

    Although system monitoring is out-of-scope for this blog post, there are two topics to monitor:

    1. disk space usage (alert if space is running out)
    2. successful synchronisation script execution (alert if the script fails)

    If you don't monitor the synchronisation process, the mirror will become outdated and will lack the latest security updates.
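
    A minimal way to cover both points, without setting up a full monitoring stack, is to let a small wrapper script complain by e-mail when something is off. This is just a sketch: the 90% threshold and the mail address are examples, and it assumes outgoing mail is configured on the mirror host.

    #!/bin/bash
    # Run the sync and alert if it fails
    /mirror/scripts/debmirroramd64.sh || echo "debmirror sync failed" | mail -s "mirror: sync failed" admin@your.domain

    # Alert if the mirror volume is filling up
    usage=$(df --output=pcent /mirror | tail -1 | tr -dc '0-9')
    if [ "$usage" -gt 90 ]; then
        echo "/mirror is ${usage}% full" | mail -s "mirror: disk space warning" admin@your.domain
    fi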

    Closing words

    As many environments are either cloud-native or moving towards a cloud environment, running a local mirror seems less and less relevant. Yet there may still be environments that could benefit from a local mirror setup. Maybe these instructions are helpful.


    1. You may notice that cloud providers actually also run their own Ubuntu archive mirrors to reduce the load on their upstream and peering links. When you deploy a standard virtual machine based on Ubuntu, it is by default configured to use the provider's local mirror. 

    Tagged as : Linux
  3. I Resurrected My Dutch Movie Review Site From 2003

    Thu 09 June 2022

    Introduction

    Between 2003 and 2006, I ran a Dutch movie review site called moevie.nl.1 I built the site and wrote the reviews. It never made any money. It cost me money to host and a lot of time to write for, but I remember enjoying writing reviews about films I liked.

    The gimmick of the site was that the reviews had two parts. The first part was spoiler-free, just giving a recommendation with some context so you could make up your own mind. The second part contained a reflection on the movie, which included spoilers.

    [screenshot of moevie.nl]

    Even back then, the site didn't win any design awards (from archive.org)2

    I started building the site a few months after finishing college (IT) in 2002 as I felt inept and had little confidence. Building something tangible felt like a good way to build up and demonstrate skills. And I had something to say about movies.

    Although moevie.nl did not help me gain employment as far as I know, it was fun while it lasted. At some point, I didn't get much joy out of writing movie reviews anymore and I let the site die.

    I did keep backups of the database, the code and the pictures though. And now, after 18+ years, I decided to resurrect the site, including all (old) reviews.

    Why resurrect a dead website gone for 16+ years?

    Rebuilding the site was just a way to spend time, a small hobby project. Something to be busy with. The second reason is some kind of misplaced nostalgia: I sometimes regret shutting down the site, wondering what could have been if I had persevered.

    Losing and regaining the domain

    Back in 2006, my hosting provider (a non-profit with just a few servers) abruptly stopped operating due to hardware failure 3 and I was forced to move my domain to another company. At that time, private citizens in The Netherlands could not register an .nl domain, only businesses could, so that was a bit of a hassle.

    Not long thereafter however, I decided to let the domain expire. It was quickly scooped up by 'domain resellers'. Years later I decided that I wanted moevie.nl back, but the sellers always asked insane amounts of money.

    In 2019, I visited moevie.nl on a whim. To my surprise it didn't resolve anymore: the domain was available! I quickly scooped it up, but I didn't do much with it for a long time, until now.

    Rebuilding the site

    I really wanted to preserve the aesthetic of moevie.nl as it was back then. Especially in the context of modern web design, it does stand out - like a sore thumb - but still, I had a goal.

    Having the code and database dump is one thing, but it doesn't tell you what it actually looked like in 2003-2006. I could have tried to get the old (PHP4) code working, but I just didn't feel like it.

    Instead, I chose to visit Archive.org and indeed, it captured old snapshots of my site back in 2006. So those were of great help. The screenshots at the top of this blog post are lifted from this page on archive.org. This snapshot was taken just before I decided to close the site.

    The challenge of mobile device screens

    To set the stage a bit: the rise and fall of moevie.nl happened a year before the iPhone was first announced. Smartphones from BlackBerry were popular. I had a Palm Vx PDA and later an HP Compaq PDA.

    Most people didn't have mobile data connections so as far as I know, the mobile web really wasn't a thing yet.

    So moevie.nl was primarily developed for the desktop. When I thought I was finished rebuilding the site, I quickly discovered that the site was unusable on my iPhone and way too small and finicky to use on my iPad.

    For somebody with no experience with modern web development, it was quite a steep learning curve discovering how to deal with the various screen sizes in CSS4.

    A very large part of the entire effort of rebuilding the site was spent on making the site workable on all the different device sizes. Fortunately, iOS device simulators were of great help on that front.

    Technology

    I've recreated moevie.nl with Python and Django. For the database, I chose PostgreSQL, although that is total overkill; I could have used SQLite without any issues.

    I chose Django because I'm quite familiar with Python, so that was a straightforward choice. I selected PostgreSQL mostly just to regain some knowledge about it.

    Hosting

    I'm self-hosting moevie.nl on the same Raspberry Pi 4 that is hosting this blog. This Raspberry Pi is powered by the sun.

    So moevie.nl is solar-powered during the day and battery-powered during the night.

    Closing words

    I'm not sure if I really want to start writing movie reviews again, knowing full well how much effort it takes. Also I'm not sure I have anything to say about movies anymore, but we'll see.

    The overall experience of rebuilding the site was frustrating at times due to my severe lack of experience and knowledge. Now that the site is done and working, even on mobile devices, it feels good.


    1. The name is based on the phonetic pronunciation in Dutch of the English word 'movie'. 

    2. sorry for the language but I could not find a better screenshot. 

    3. I was neglecting the site at that time due to losing motivation. 

    4. I admit I only tested with iOS devices so Android-based smartphones could experience issues. 

    Tagged as : web
