1. Understanding the Ubuntu 20.04 LTS Server Autoinstaller

    Thu 11 February 2021

    Introduction

    Ubuntu Server version 18.04 LTS uses the debian-installer (d-i) for the installation process. This includes support for 'preseeding' to create unattended (automated) installations of Ubuntu.

    d-i: the Debian installer

    With the introduction of Ubuntu Server 20.04 'Focal Fossa' LTS, back in April 2020, Canonical decided that the new 'subiquity server installer' was ready to take its place.

    After the new installer gained support for unattended installation, it was considered ready for release. The unattended installer feature is called 'Autoinstallation'.

    I mostly run Ubuntu 18.04 LTS installations but I decided in February 2021 that I should get more acquainted with 20.04 LTS, especially when I discovered that preseeding would no longer work.

    In this article I assume that the reader is familiar with PXE-based unattended installations.

    Why this new installer?

    Canonical's desire to unify the codebase for Ubuntu Desktop and Server installations seems to be the main driver for this change.

    From my personal perspective, there aren't any new features that benefit my use-cases, but that could be different for others. It's not a ding on the new Autoinstaller, it's just how I look at it.

    There is one conceptual difference between the new installer and preseeding. A preseed file must answer all questions that the installer needs answered. It will switch to interactive mode if a question is not answered, breaking the unattended installation process. From my experience, there is a bit of trial-and-error getting the preseed configuration right.

    The new Subiquity installer uses defaults for all installation steps. This means that you can fully automate the installation process with just a few lines of YAML. You don't need an answer for each step.

    The new installer has other features, such as the ability to SSH into an installer session. It works by displaying an ad-hoc random password on the screen / console, which you can use to log on remotely over SSH. I have not used it yet as I never found it necessary.

    Documentation is a bit fragmented

    As I was trying to learn more about the new Autoinstaller, I noticed that there isn't a central location with all relevant (links to) documentation. It took a bit of searching and following links to collect a set of useful information sources, which I share below.

    • Reference Manual: reference for each particular option of the user-data YAML
    • Introduction: an overview of the new installer with examples
    • Autoinstall quick start: example of booting a VM with the installer using KVM
    • Netbooting the installer: a brief instruction on how to set up PXE + TFTP with dnsmasq in order to PXE-boot the new installer
    • Call for testing: topic in which people give feedback on their experience with the installer (still active as of February 2021), with a lot of responses
    • Stack Exchange post: detailed tutorial with some experiences
    • Medium article: contains an example and some experiences
    • Github examples: GitHub repo with about twenty examples of more complex configurations

    The reference documentation only covers the default use-cases for the unattended installation process. You won't be able to build more complex configurations, such as RAID-based installations, using this reference alone.

    Under the hood, the installer uses curtin. The linked documentation can help you build more complex installations, such as those which use RAID.

    I think the curtin syntax is a bit tedious, but fortunately you probably won't need to learn curtin and piece together more complex configurations by hand. There is a nice quality-of-life feature that takes care of this.

    More on that later.

    How does the new installer work?

    With a regular PXE-based installation, we use the 'netboot' installer, which consists of a Linux kernel and an initrd image (containing the actual installer).

    This package is about 64 Megabytes for Ubuntu 18.04 LTS and it is all you need, assuming that you have already set up a DHCP + TFTP + HTTP environment for a PXE-based installation.

    The new Subiquity installer for Ubuntu 20.04 LTS deprecates this 'netboot' installer. It is not provided anymore. Instead, you have to boot a 'live installer' ISO file which is about 1.1 GB in size.

    The process looks like this:

    1. Download the Live installer ISO
    2. Mount the ISO to acquire the 'vmlinuz' and 'initrd' files for the TFTP root (see the sketch below)
    3. Update your PXE menu (if any) with a stanza like this:
    LABEL focal2004-preseed
                    MENU LABEL Focal 20.04 LTS x64 Manual Install
                    KERNEL linux/ubuntu/focal/vmlinuz
                    INITRD linux/ubuntu/focal/initrd
                    APPEND root=/dev/ram0 ramdisk_size=1500000 ip=dhcp url=http://10.10.11.1/ubuntu-20.04.1-live-server-amd64.iso
    

    This process is documented here with detailed how-to steps and commands.
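
    As a rough illustration, step 2 could look something like this on the machine that serves TFTP. The mount point and the TFTP root path (/srv/tftp) are assumptions and will differ per setup:

    # extract the kernel and initrd from the live-server ISO (paths are examples)
    mkdir -p /mnt/iso
    sudo mount -o loop,ro ubuntu-20.04.1-live-server-amd64.iso /mnt/iso
    sudo mkdir -p /srv/tftp/linux/ubuntu/focal
    sudo cp /mnt/iso/casper/vmlinuz /mnt/iso/casper/initrd /srv/tftp/linux/ubuntu/focal/
    sudo umount /mnt/iso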

    We have not yet discussed the actual automation part, but first we must address a caveat.

    The new installer requires 3GB of RAM when PXE-booting

    Although it is not explicitly documented1, the new mechanism of PXE-booting the new Ubuntu installer using the Live ISO requires a minimum of 3072 MB of memory. And I assure you, 3000 MB is not enough.

    It seems that the ISO file is copied into memory over the network and extracted on the RAM disk. With a RAM disk of 1500 MB and an ISO file of 1100 MB we are left with maybe 472 MB of RAM for the running kernel and initrd image.

    To put this into perspective: I could perform an unattended install of Ubuntu 18.04 LTS with only 512 MB of RAM2.

    Due to this 'new' installation process, Ubuntu Server 20.04 has much more demanding minimum system requirements than Windows 2019 Server, which is fine with 'only' 512 MB of RAM, even during installation. I have to admit I find this observation a bit funny.

    It seems that this 3 GB memory requirement exists purely because of the new installation method. Obviously Ubuntu 20.04 can run in a smaller memory footprint once installed.

    Under the hood, a tool called 'casper' is used to bootstrap the installation process and that tool only supports a local file system (in this case on a RAM disk). On paper, casper can also work from NFS or CIFS, but that path is neither supported nor tested. From what I read, some people tried it but it didn't work out.

    As I understand it, the current status is that you can't install Ubuntu Server 20.04 LTS on any hardware with less than 3 GB of memory using PXE boot. This probably affects older and less potent hardware, but I can remember a time when this was actually part of the point of running Linux.

    It just feels conceptually wrong that a PXE-based server installer requires 3 GB of memory.


    Important

    The 3GB memory requirement is only valid for PXE-based installations. If you boot from ISO / USB stick you can install Ubuntu Server on a system with less memory. I've verified this with a system with only 1 GB of memory.


    The Autoinstall configuration

    Now let's go back to the actual automation part of the Autoinstaller. If we want to automate our installation, our PXE menu entry should be expanded like this:

    APPEND root=/dev/ram0 ramdisk_size=1500000 ip=dhcp url=http://10.10.11.1/ubuntu-20.04.1-live-server-amd64.iso autoinstall ds=nocloud-net;s=http://10.10.11.1/preseed/cloud-init/
    

    In this case, the cloud-init folder - as exposed through an HTTP-server - must contain two files:

    • meta-data
    • user-data

    The meta-data file contains just one line:

    instance-id: focal-autoinstall
    

    The user-data file is the equivalent of a preseed file, but it is YAML-based instead of just regular plain text.
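
    For a quick test, both files can be served with any web server. A minimal sketch, assuming the HTTP document root is /var/www/html and matches the s= URL from the PXE stanza above:

    mkdir -p /var/www/html/preseed/cloud-init
    cd /var/www/html/preseed/cloud-init
    printf 'instance-id: focal-autoinstall\n' > meta-data
    # create the user-data file discussed below next to it, then serve the tree;
    # the live ISO referenced by url= must be reachable over HTTP as well.
    # python3's built-in web server is enough for a quick test:
    cd /var/www/html && sudo python3 -m http.server 80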

    Minimum working configuration

    According to the documentation, this is a minimum viable configuration for the user-data file:

    #cloud-config
    autoinstall:
      version: 1
      identity:
        hostname: ubuntu-server
        password: "$6$exDY1mhS4KUYCE/2$zmn9ToZwTKLhCw.b4/b.ZRTIZM30JZ4QrOQ2aOXJ8yk96xpcCof0kxKwuX1kqLG/ygbJ1f8wxED22bTL4F46P0"
        username: ubuntu
    

    This works fine and performs a basic install + apt upgrade of the new system with a disk configuration based on an LVM layout.

    My preferred minimum configuration:

    Personally, I like to keep the unattended installation as simple as possible. I use Ansible to do the actual system configuration, so the unattended installation process only has to set up a minimum viable configuration.

    I like to configure the following parameters:

    • Inject an SSH public key into the authorized_keys file for Ansible
    • Configure the APT settings to specify which repository to use during installation (I run a local Debian/Ubuntu mirror)
    • Update to the latest packages during installation

    I'll give examples of the required YAML to achieve this with the new installer.

    Injecting a public SSH key for the default user:

    ssh:
        authorized-keys: |
           ssh-rsa <PUB KEY>
        install-server: true
        allow-pw: no
    

    Notice that we also disable password authentication for SSH access.

    Configure APT during installation

    I used this configuration to specify a particular mirror for both the installation process and for the system itself post-installation.

    Mirror:
      mirror: "http://mirror.mynetwork.loc"
    apt:
      preserve_sources_list: false
      primary:
        - arches: [amd64]
          uri: "http://mirror.mynetwork.loc/ubuntu"
    

    Performing an apt upgrade

    By default, the installer seems to install security updates but it doesn't install the latest versions of software. This is a deviation from the d-i installer, which always ends up with a fully up-to-date system when done.

    The same end-result can be accomplished by running an apt update and apt upgrade at the end of the installation process.

    late-commands:
    - curtin in-target --target=/target -- apt update           
    - curtin in-target --target=/target -- apt upgrade -y
    

    So all in all, this is not a big deal.

    Network configuration

    The network section can be configured using regular Netplan syntax. An example:

    network:
      version: 2
      renderer: networkd
      ethernets:
        enp0s3:
          dhcp4: no
          addresses:
            - 10.10.50.200/24
          gateway4: 10.10.50.1
          nameservers:
            search:
              - mynetwork.loc
            addresses:
              - 10.10.50.53
              - 10.10.51.53
    

    Storage configuration

    The installer only supports a 'direct' or 'lvm' layout. It also selects the largest drive in the system as the boot drive, for installation.

    If you want to set up anything more complex, such as RAID or a specific partition layout, you need to use the curtin syntax.

    It is not immediately clear how to setup a RAID configuration based on the available documentation.

    Fortunately, the new installer supports creating a RAID configuration or custom partition layout if you perform a manual install.

    It turns out that when the manual installation is done you can find the cloud-init user-data YAML for this particular configuration in the following file:

        /var/log/installer/autoinstall-user-data
    

    I think this is extremely convenient.
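
    A minimal way to grab that file from the freshly installed machine, assuming you can SSH in with a sudo-capable user (the hostname and username below are placeholders):

    # the file is only readable by root, hence the sudo
    ssh ubuntu@freshly-installed-host \
        "sudo cat /var/log/installer/autoinstall-user-data" > autoinstall-user-data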

    So to build a proper RAID1-based installation, I followed these instructions.

    So what does the YAML for this RAID setup look like?

    This is the storage section of my user-data file (brace yourself):

    storage:
        config:
        - {ptable: gpt, serial: VBOX_HARDDISK_VB50546281-4e4a6c24, path: /dev/sda, preserve: false,
          name: '', grub_device: true, type: disk, id: disk-sda}
        - {ptable: gpt, serial: VBOX_HARDDISK_VB84e5a275-89a2a956, path: /dev/sdb, preserve: false,
          name: '', grub_device: true, type: disk, id: disk-sdb}
        - {device: disk-sda, size: 1048576, flag: bios_grub, number: 1, preserve: false,
          grub_device: false, type: partition, id: partition-0}
        - {device: disk-sdb, size: 1048576, flag: bios_grub, number: 1, preserve: false,
          grub_device: false, type: partition, id: partition-1}
        - {device: disk-sda, size: 524288000, wipe: superblock, flag: '', number: 2, preserve: false,
          grub_device: false, type: partition, id: partition-2}
        - {device: disk-sdb, size: 524288000, wipe: superblock, flag: '', number: 2, preserve: false,
          grub_device: false, type: partition, id: partition-3}
        - {device: disk-sda, size: 1073741824, wipe: superblock, flag: '', number: 3,
          preserve: false, grub_device: false, type: partition, id: partition-4}
        - {device: disk-sdb, size: 1073741824, wipe: superblock, flag: '', number: 3,
          preserve: false, grub_device: false, type: partition, id: partition-5}
        - {device: disk-sda, size: 9136242688, wipe: superblock, flag: '', number: 4,
          preserve: false, grub_device: false, type: partition, id: partition-6}
        - {device: disk-sdb, size: 9136242688, wipe: superblock, flag: '', number: 4,
          preserve: false, grub_device: false, type: partition, id: partition-7}
        - name: md0
          raidlevel: raid1
          devices: [partition-2, partition-3]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-0
        - name: md1
          raidlevel: raid1
          devices: [partition-4, partition-5]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-1
        - name: md2
          raidlevel: raid1
          devices: [partition-6, partition-7]
          spare_devices: []
          preserve: false
          type: raid
          id: raid-2
        - {fstype: ext4, volume: raid-0, preserve: false, type: format, id: format-0}
        - {fstype: swap, volume: raid-1, preserve: false, type: format, id: format-1}
        - {device: format-1, path: '', type: mount, id: mount-1}
        - {fstype: ext4, volume: raid-2, preserve: false, type: format, id: format-2}
        - {device: format-2, path: /, type: mount, id: mount-2}
        - {device: format-0, path: /boot, type: mount, id: mount-0}
    

    That's quite a long list of instructions to 'just' set up a RAID1. Although I understand all the steps involved, I'd never have come up with this by myself on short notice using just the documentation, so I think that the autoinstall-user-data file is a life saver.

    After the manual installation in which I created a RAID1 mirror, I copied the configuration above into my own custom user-data YAML. Then I performed an unattended installation, and it worked on the first try.

    So if you want to add LVM into the mix or build some other complex storage configuration, the easiest way to automate it is to first do a manual install and then copy the relevant storage section from the autoinstall-user-data file to your custom user-data file.

    Example user-data file for download

    I've published a working user-data file that creates a RAID1 here.

    You can try it out by creating a virtual machine with two (virtual) hard drives. I'm assuming that you have a PXE boot environment set up.

    Obviously, you'll have to change the network settings for it to work.
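
    If you want to test this in a virtual machine first, two empty virtual disks are enough to exercise the RAID1 part. A small sketch using qemu-img; the names and sizes are arbitrary:

    qemu-img create -f qcow2 raid-test-a.qcow2 10G
    qemu-img create -f qcow2 raid-test-b.qcow2 10G

    Attach both disks to the VM that PXE-boots into the installer.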

    Generating user passwords

    If you do want to log on to the console with the default user, you must generate a salted password hash and copy/paste it into the user-data file.

    I use the 'mkpasswd' command for this, like so:

    mkpasswd -m sha-512
    

    The mkpasswd utility is part of the 'whois' package.
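
    On Debian/Ubuntu, installing the package and generating a hash looks like this; the last comment shows where the result goes:

    sudo apt install whois    # provides mkpasswd
    mkpasswd -m sha-512       # type the desired password; copy the resulting $6$... hash
    # paste that hash (quoted) into the identity.password field of the user-data file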

    Closing words

    For those who are still on Ubuntu Server 18.04 LTS, there is no need for immediate action as this version is supported until 2023. Only support for new hardware has come to an end for the 18.04 LTS release.

    At some point, some time and effort will be required to migrate towards the new Autoinstall solution. Maybe this blogpost helps you with this transition.

    It took me a few evenings to master the new user-data solution, but the fact that a manual installation basically results in a perfect pre-baked user-data file3 is a tremendous help.

    I think I just miss the point of all this effort of revamping the installer, but maybe I'm simply not affected by the limitations of the older, existing solution. If you have any thoughts on this, feel free to let me know in the comments.


    1. I had to discover this information on StackExchange somewhere down below in this very good article after experiencing problems. 

    2. I tried 384 MB and it didn't finish, just got stuck. 

    3. /var/log/installer/autoinstall-user-data 

    Tagged as : Linux
  2. A 12.48 Inch (1304x984) Three-Color E-Paper Display by Waveshare

    Tue 22 December 2020

    Introduction

    I'm running a solar-powered blog and I wanted to add a low-power display to show the daily solar 'harvest'1 and maybe some additional information.

    So I decided to use an e-paper display. I wanted a display that would be readable from a distance, so bigger would be better. I therefore chose the Waveshare 12.48 inch e-paper display2.

    "e-paper display"

    an example based on data from the summer, generated with Graphite

    This particular display costs $179 excluding taxes and shipping at the time this article was written.

    Specifications

    Waveshare sells a two-color (black and white) and a three-color version (black, white, red). I bought the three-color version. The three-color version is the (B) model.

    Specifications:

    Screen size     :      12.48 inches
    Resolution      :      1304 x 984
    Colors          :      black, white, red
    Greyscale       :      2 levels
    Refresh rate    :      16 seconds
    Partial refresh :      Not supported
    Interfaces      :      Raspberry Pi, ESP32, STM32, Arduino
    

    The two-color variant of this display has a refresh rate of 8 seconds.

    This display is clearly quite slow. Furthermore, the lack of partial refresh support could make this display unsuitable for some applications. I was OK with this slow refresh rate.

    The image below demonstrates different fonts and sizes. I think DejaVuSansMono-Bold looks really good on the display, better than the font supplied by Waveshare.

    "e-paper display"

    The Interfaces

    The display includes a microcontroller that in turn can be driven through one of four interfaces:

    1. A Raspberry Pi (Worked)
    2. An ESP32 (Not tested)
    3. An Arduino (Didn't work)
    4. An STM32 (Not tested)

    I've tried the Arduino header with an Arduino Uno, but the supplied demo code didn't work. I did not investigate further why this was the case. It could be a problem with voltage regulation.

    "e-paper display"

    In the image above, the black plastic backplate is removed.

    Image quality

    These e-paper displays are mostly sold as product information displays for supermarkets and other businesses. However, the quality is good enough to display images. Especially the support for red can make an image stand out.

    Below is an example of an image that incorporates the third (red) color.

    "e-paper display"

    The display seems suitable to display art.

    "e-paper display"

    It looks quite good in real life (sorry for the glare).

    How the display shows three colors

    The display acts like it is actually two displays in one. A black and white display, and a red and white display.

    First, the black and white image is drawn. Next, the red and white image is put on top.

    Because the display has to draw two images in succession, it takes 16 seconds to refresh the screen. This explains why the black-and-white version of this screen does a refresh in eight seconds: it doesn't have to refresh the red color.

    Please note that the entire process of displaying content on the screen takes much longer.

    A demonstration:

    Displaying an image is cumbersome (On Raspberry Pi 3B+)

    At the time this article was written, I could not find any information or tools for this display3.

    Many Waveshare e-paper displays are popular and have decent community support. However, it seems that this display is rather unknown.

    Therefore, it seems that there are no tools available to display an arbitrary image on this display. You can use the example Python code to display an image but you have to follow these steps:

    1. Create a black-and-white version of the image
    2. Create a red-and-white version of the image, that contains only data for the red parts of the image
    3. If the source image doesn't match the required resolution, you have to resize, crop and fill the image where appropriate.

    Both 'black' and 'red' images need to exactly match the resolution of the display (1304x984) or the library will abort with an error.
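
    To give an idea of what this preparation involves, here is a rough ImageMagick sketch of the two layers. This is not the exact logic my tool uses; the file names and the 35% fuzz value are just examples, and the naive red extraction shown here would also treat pixels that are already pure black as 'red':

    # resize/pad the source image to the panel's native 1304x984
    convert source.jpg -resize 1304x984 -background white -gravity center -extent 1304x984 resized.png
    # 'black' layer: a plain black-and-white rendition of the whole image
    convert resized.png -monochrome black.bmp
    # 'red' layer: keep (near-)red pixels as black, everything else becomes white
    convert resized.png -fuzz 35% -fill black -opaque red -fuzz 0% -fill white +opaque black red.bmp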

    As I found this process tedious, I automated it.

    A new tool to make displaying an image easy

    I've used the Python library supplied by Waveshare and created a command-line tool (Github) on top of it that performs all the required steps described in the previous section. I'm using ImageMagick for all the image processing.

    The script works like this:

    ./display -i <image file> [--rotate 90] [--fuzz 35] [--color yellow]
    

    The --fuzz and --color parameters may require some clarification.

    The color red is extracted from an image but it's not always perfect. By applying the --fuzz parameter (the argument is a percentage), it is possible to capture more of the red (or selected color) of an image.

    The --color option specifies which color should be 'converted' to red. By default this color is 'red' (obviously). The 'solar chart' (at the start of this article) is an example where a yellow line was converted to red.

    Very slow: it takes about 55 seconds to display an image using the Raspberry Pi 3B+. Half of that minute is spent converting the images to the appropriate format using ImageMagick.

    Informational: The Python program includes a modified version of the Waveshare Python library. This library has been altered to prevent double conversion of images, which significantly degrades image quality.

    Slow performance

    If you use the provided Python library (Python3 compatible) it takes about 30+ seconds to draw an image on the screen. (This excludes the image processing performed with the 'display' tool.)

    Further testing showed that the Python library converts and dithers the image before it is sent to the display. And it does so for both black and red. Dithering is performed by looping in Python over each of the 1.3 million pixels.

    Each of these loops (for black and red) takes about 10 seconds on the Raspberry Pi 3B+, which explains why it takes so long to update the display. Therefore, I think the combination of Python + the Raspberry Pi 3B+ is not ideal in this case.

    Evaluation

    I wanted to share my experience with this display to make other people aware of its existence. The tool I created should make it simple to get up and running and display an image.

    It clearly has some drawbacks but due to the size, resolution and third color, it seems to be unique and may therefore be interesting.

    Although I never tried the display with an ESP32, I think it's ideal for the purpose of a low-power picture frame.

    This article was discussed on Hacker News (briefly). This resulted in about 9000 unique visitors for this article.

    Appendix A - Remark on other displays

    Please note: Waveshare also sells a smaller 10.3 inch black-and-white e-paper display for a similar price with some significant benefits:

    Screen size     :      10.3 inches
    Resolution      :      1872 x 1404
    Colors          :      black and white
    Greyscale       :      16 levels
    Refresh rate    :      450 milliseconds
    Partial refresh :      Supported
    

    This particular display is smaller but has a higher resolution, supports 16 grayscale levels and updates in half a second. This display may better suit your particular needs. For example, I believe that this display may have been used in this project, a solar-powered digital photo frame.

    Appendix B - How to make the display work on a Raspberry Pi

    This information is straight from the Waveshare site but I include it for completeness and ease of use.

    PART I: Enable the SPI interface with raspi-config

    1. sudo raspi-config
    2. select Interfacing Options
    3. Select SPI
    4. Select Yes
    5. Reboot the Raspberry Pi

    PART II: Install the required libraries

    Install BCM2835

    Website

    wget http://www.airspayce.com/mikem/bcm2835/bcm2835-1.60.tar.gz
    tar zxvf bcm2835-1.60.tar.gz 
    cd bcm2835-1.60/
    sudo ./configure
    sudo make
    sudo make check
    sudo make install
    

    Install wiringPi

    website

    sudo apt-get install wiringpi
    cd /tmp
    wget https://project-downloads.drogon.net/wiringpi-latest.deb
    sudo dpkg -i wiringpi-latest.deb
    gpio -v
    

    Caution: The library seems to be deprecated. The Raspberry Pi 4 is supported, but future versions of the Raspberry Pi may not be.

    The wiringPi library is used as part of a compiled library called "DEV_Config.so" as found in the ./lib directory.

    @raspberrypi:~/epaper_display $ ldd lib/DEV_Config.so 
        linux-vdso.so.1 (0x7ee0d000)
        /usr/lib/arm-linux-gnueabihf/libarmmem-${PLATFORM}.so => /usr/lib/arm-linux-gnueabihf/libarmmem-v7l.so (0x76f1e000)
        libwiringPi.so => /usr/lib/libwiringPi.so (0x76f00000)
        libm.so.6 => /lib/arm-linux-gnueabihf/libm.so.6 (0x76e7e000)
        libc.so.6 => /lib/arm-linux-gnueabihf/libc.so.6 (0x76d30000)
        libpthread.so.0 => /lib/arm-linux-gnueabihf/libpthread.so.0 (0x76d06000)
        librt.so.1 => /lib/arm-linux-gnueabihf/librt.so.1 (0x76cef000)
        libcrypt.so.1 => /lib/arm-linux-gnueabihf/libcrypt.so.1 (0x76caf000)
        /lib/ld-linux-armhf.so.3 (0x76f46000)
    

    Install Python3 and required libraries

    sudo apt-get update
    sudo apt-get install python3-pip
    sudo apt-get install python3-pil
    sudo pip3 install RPi.GPIO
    sudo pip3 install spidev
    

    Appendix C - E-paper hacking

    I see myself as a consumer and I don't have any desire to hack the display for lower refresh-rates or partial refresh support, with the risk of damaging the display in the process.

    However, one resource about this topic that I find very informative is a video from the YouTube channel "Applied Science" (by Ben Krasnow), called "E-paper hacking: fastest possible refresh rate".

    Appendix D - Available libraries

    Example code for all supported platforms can be found in this github location.

    I also found this github repository that may support this display. This code (also) didn't work for me on my Arduino Uno. This could be due to a voltage mismatch, but I'm not willing to solder and potentially destroy the display.

    Appendix E - Links to other e-paper projects

    Very large and expensive display (Medium paywall)

    E-paper calendar

    Solar-powered digital photo frame


    1. During fall and winter, there is almost no power generation due to the very sub-optimal location of my balcony. 

    2. You can go larger but at a cost

    3. My google skills may be at fault but that point is now moot. 

    Tagged as : Hardware
  3. Most Technical Debt Is Just Bullshit

    Fri 25 September 2020

    Introduction

    I made an offhand remark about technical debt to a friend and he interrupted me, saying: "technical debt is just bullshit". In his experience, people talking about technical debt were mostly trying to:

    • cover up bad code
    • cover up unfinished work

    mess

    source1

    Calling these issues 'technical debt' seems to be a tactic of distancing oneself from these problems. A nice way of avoiding responsibility. To sweep things under the rug.

    Intrigued, I decided to take a closer look at the metaphor of technical debt, to better understand what is actually meant.

    Tip: this article on Medium by David Vandegrift also tackles this topic.

    A definition of technical debt

    Right off the bat, I realised that my own understanding of technical debt was wrong. Most people seem to understand technical debt as:

    "cut a corner now, to capture short-term business value (taking on debt), and clean up later (repaying the debt)".

    I think that's wrong.

    Ward Cunningham, who coined the metaphor of technical debt, wrote:

    You know, if you want to be able to go into debt that way by developing software that you don't completely understand, you are wise to make that software reflect your understanding as best as you can, so that when it does come time to refactor, it's clear what you were thinking when you wrote it, making it easier to refactor it into what your current thinking is now.

    In some sense, this reads to me as a form of prototyping. To try out and test design/architecture to see if it fits the problem space at hand. But it also incorporates the willingness to spend extra time in the future to change the code to better reflect the current understanding of the problem at hand.

    ... if we failed to make our program align with what we then understood to be the proper way to think about our financial objects, then we were gonna continually stumble over that disagreement and that would slow us down which was like paying interest on a loan.

    The misalignment of the design/architecture and the problem domain creates a bottleneck, slowing down future development.

    So I think it's clearly not about taking shortcuts for a short-term business gain.

    It is more a constant reinvestment in the future. It may temporarily halt feature work, but it should result in more functionality and features in the long run. It doesn't seem short-term focussed at all to me. And you need to write 'clean' code and do your best because it is likely that you will have to rewrite parts of it.

    These two articles by Ron Jeffries already discuss this in great detail.

    A logical error

    Reading up on the topic, I noticed something very peculiar. Somehow along the way, everything that hinders software development has become 'technical debt'.

    Anything that creates a bottleneck is suddenly put into the basket of technical debt. I started to get a strong sense that a lot of people are somehow making a logical fallacy.

    If you have technical debt, you'll experience friction when trying to ignore it and just plow ahead. The technical debt creates a bottleneck.

    But then people reason the wrong way around: I notice a bottleneck in my software development process, so we have 'technical debt'.

    However, because technical debt creates a bottleneck, it doesn't follow that every bottleneck is thus technical debt.

    I think it's this flawed reasoning that turns every perceived obstacle into technical debt2.

    Maybe I'm creating a straw man argument, but I think I have some examples that show that people are thinking the wrong way around.

    If we look at the wikipedia page about technical debt, there is a long list of possible causes of technical debt.

    To cite some examples:

    • Insufficient up-front definition
    • Lack of clear requirements before the start of development
    • Lack of documentation
    • Lack of a test suite
    • Lack of collaboration / knowledge sharing
    • Lack of knowledge/skills resulting in bad or suboptimal code
    • Poor technical leadership
    • Last minute specification changes

    Notice that these issues are called 'technical debt' because they can have a similar outcome as technical debt. They can create a bottleneck.

    But why the hell would we call these issues technical debt?

    These issues are self-explanatory. Calling them technical debt not only seems inappropriate, it just obfuscates the cause of these problems and it doesn't provide any new insight. Even in conversations with laypeople.

    A mess is not a Technical Debt

    A blogpost by Uncle Bob with the same title3 also hits on the issue that a lot of problems are incorrectly labeled as 'technical debt'.

    Unfortunately there is another situation that is sometimes called “technical debt” but that is neither reasoned nor wise. A mess.

    ...

    A mess is not a technical debt. A mess is just a mess. Technical debt decisions are made based on real project constraints. They are risky, but they can be beneficial. The decision to make a mess is never rational, is always based on laziness and unprofessionalism, and has no chance of paying off in the future. A mess is always a loss.

    Cunningham's definition of technical debt shows that it's a very conscious and deliberate process. Creating a mess isn't. It's totally inappropriate to call that technical debt. It's just a mess.

    I think that nicely relates back to that earlier list from wikipedia. Just call things out for what they actually are.

    Is quibbling over 'technical debt' as a metaphor missing the point?

    In this blogpost, Martin Fowler addresses the blogpost by Uncle Bob and argues that technical debt as a metaphor is (still) very valuable when communicating with non-technical people.

    He even introduces a quadrant:

                   Reckless                          Prudent
    Deliberate     "We don't have time for design"   "We must ship now and deal with consequences (later)"
    Inadvertent    "What's Layering?"                "Now we know how we should have done it"

    This quadrant makes me extremely suspicious. Because in this quadrant, everything is technical debt. He just invents different flavours of technical debt. It's never not technical debt. It's technical debt all the way down.

    It seems to me that Martin Fowler twists the metaphor of technical debt into something that can never be falsified, like psychoanalysis.

    It's not 'bad code', a 'design flaw' or 'a mess', it's 'inadvertent & reckless technical debt'. What is really more descriptive of the problem?

    Maybe it's just my lack of understanding, but I fail to see why it is in any way helpful to call every kind of bottleneck 'technical debt'. I again fail to see how this conveys any meaning.

    In the end, what Fowler does is just pointing out that bottlenecks in software development can be due to the four stages of competence.

                   Incompetence                      Competence
    Conscious      "We don't have time for design"   "We must ship now and deal with consequences (later)"
    Unconscious    "What's Layering?"                "Now we know how we should have done it"

    I don't think we need new metaphors for things we (even laypeople) already understand.

    Does technical debt (even) exist?

    The HFT Guy goes as far as to argue that technical debt doesn't really exist; it isn't a 'real' concept.

    After decades of software engineering, I came to the professional conclusion that technical debt doesn’t exist.

    His argument boils down to the idea that what people call technical debt is actually mostly maintenance.

    So reincorporating a better understanding of the problem at hand into the code (design) is seen as an integral and natural part of software development, illustrated by the substitute metaphor of mining (alternating between digging and reinforcing). At least that's how I understand it.

    Substituting one metaphor with another, how useful is that really? But in this case it's at least less generic and more precise.

    Closing words

    Although Cunningham meant well, I think the metaphor of technical debt started to take on a life of its own. To a point where code that doesn't conform to some Platonic ideal is called technical debt4.

    Every mistake, every changing requirement, every tradeoff that becomes a bottleneck within the development process is labeled 'technical debt'. I don't think that this is constructive.

    I think my friend was right: the concept of technical debt has become bullshit. It doesn't convey any better insight or meaning. On the contrary, it seems to obfuscate the true cause of a bottleneck.

    At this point, when people talk about technical debt, I would be very sceptical and would want more details. Technical debt doesn't actually explain why we are where we are. It has become a hollow, hand-wavy 'explanation'.

    With all due respect to Cunningham, because the concept is so widely misunderstood and abused, it may be better to retire it.


    1. I discovered this image in this blogpost

    2. if you are not working on a new feature, you are working on technical debt. 

    3. I think that Uncle Bob's definition of technical debt in this article is not correct. He also defines it basically as cutting corners for short-term gain. 

    4. See again Martin Fowler's article about technical debt. 

    Tagged as : None
