1. Systemd Forward Secure Sealing of System Logs Makes Little Sense

    November 22, 2014

    Systemd is a more modern replacement for sysvinit and it's in the process of being integrated into most mainstream Linux distributions. I'm a bit troubled by one of its features.

    I'd like to discuss the Forward Secure Sealing (FSS) feature for log files that is part of systemd. FSS cryptographically signs the local system logs, so you can check if log files have been altered. This should make it more difficult for an attacker to hide his or her tracks.

    Regarding log files, an attacker can do two things:

    1. delete them
    2. alter them (remove / change incriminating lines)

    The FSS feature does not prevent either of these attacks. But it does help you detect that something fishy is going on, provided you verify the signatures regularly. So basically FSS acts a bit like Tripwire.

    FSS can only tell you whether or not a log file has been changed. It cannot tell you anything else. More specifically, it cannot tell you why. So I wonder how valuable this feature is.

    There is also something else. Signing (sealing) a log file is done every 15 minutes by default. This gives an attacker ample time to alter or delete the most recent log events, often exactly the events that need to be altered or deleted. Even lowering this interval to 10 seconds would allow an attacker to delete (some of) the initial traces of their activity using automation. So how useful is this?
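
    For what it's worth, the sealing keys and the sealing interval are set up through journalctl. A minimal sketch, assuming persistent journald storage; the 10-second interval is just an example value and, as argued above, still leaves a window:

    # generate the sealing/verification key pair with a shorter sealing interval
    journalctl --setup-keys --interval=10s
    # later, check the journal against the verification key printed during setup
    journalctl --verify --verify-key=<verification-key>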

    What may help in determining what happened to a system is the unaltered log contents themselves. What FSS cannot do by principle is protect the actual contents of the log file. If you want to preserve log events the only secure option is to send them to an external log host (assumed not accessible by an attacker).
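
    For comparison, pointing syslog at an external log host is a one-line change in most setups. A minimal sketch using rsyslog; 'loghost.example.com' is a placeholder for your own (hardened) log server:

    # /etc/rsyslog.d/remote.conf -- forward all messages over TCP (@@) to the log host
    *.* @@loghost.example.com:514

    Restart rsyslog and events leave the machine the moment they are logged, which is exactly what FSS cannot offer.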

    However, to my surprise, FSS is presented as an alternative to external logging. Quote from Lennart Poettering:

    Traditionally this problem has been dealt with by having an external secured log server 
    to instantly log to, or even a local line printer directly connected to the log system. 
    But these solutions are more complex to set up, require external infrastructure and have 
    certain scalability problems. With FSS we now have a simple alternative that works without 
    any external infrastructure.
    

    This quote is quite troubling because it fails to acknowledge one of the raisons d'être of external log hosts. It suggests that FSS provides an alternative to external logging, where in fact it does not and cannot do so in principle. It can never address the fact that an attacker can alter or delete logs, whereas external logging can mitigate this risk.

    It seems to me that systemd now also wants to play the role of some crude intrusion detection system. It feels a bit like scope creep to me.

    Personally I just wonder what more useful features could have been implemented instead of allowing you to transfer a log file verification key using a QR code to your smartphone (What the hell?).

    This whole observation is not original: the same argument was made by Andrew Wyatt in the comments on the systemd author's blog post, two years earlier. The response from the systemd author was to block him (see the comments of Lennart Poettering's blog post I linked to earlier).

    Update: Andrew Wyatt behaved a bit immaturely towards Lennart Poettering at first, so I understand some resentment on Lennart's part, but Andrew's criticism was valid and was never addressed by Lennart.

    If the systemd author had just implemented sending log events to an external log server, that would have been way more useful security-wise, I think. Until then, this may do...

    Tagged as : Logging
  2. Getting the Sitecom AC600 Wi-Fi Adapter Running on Linux

    November 01, 2014

    TL;DR Yes it works with some modifications of the driver source.

    A USB Wi-Fi adapter I used with a Raspberry Pi broke as I dropped it on the floor, so I had to replace it. I just went to a local shop and bought the Sitecom AC600 adapter as that's what they had available (with support for 5Ghz networking).

    I had some hope that I would just plug it in and it would 'just work™'. But no. Linux. In the end, the device cost me 30 euros including taxes, but the time spent getting it to work may have made this a very expensive USB Wi-Fi dongle. And it's funny to think that the Wi-Fi dongle costs almost the same as the Raspberry Pi board itself.

    But I did get it working and I'd like to show you how.

    It started with a Google search for 'sitecom ac600 linux', which landed me on this page. It told me the device uses a MediaTek chipset (MT7610U).

    So you need to download the driver from MediaTek. Here is a direct link

    So you may do something like this:

    cd /usr/src
    wget http://s3.amazonaws.com/mtk.cfs/Downloads/linux/mt7610u_wifi_sta_v3002_dpo_20130916.tar.bz2
    tar xjf mt7610u_wifi_sta_v3002_dpo_20130916.tar.bz2
    cd mt7610u_wifi_sta_v3002_dpo_20130916
    

    Now you would hope that it's just like this:

    make
    make install
    

    And we're happy right? Linux FTW! Well, NO! We're using Linux so we have to work for stuff that works right out of the box on Windows and Mac OS.

    So we start by editing "include/os/rt_linux.h". Go to line ~279 and make sure the struct looks like this:

    typedef struct _OS_FS_INFO_
    {
        kuid_t          fsuid;
        kgid_t          fsgid;
        mm_segment_t    fs;
    } OS_FS_INFO;
    

    Basically, the original int types for fsuid and fsgid are replaced by kuid_t and kgid_t; otherwise, compilation will abort with an error.

    Of course, the Sitecom AC600 has a USB identifier that is unknown to the driver, so after compilation, it still doesn't work.

    lsusb output:

    Bus 001 Device 004: ID 0df6:0075 Sitecom Europe B.V.
    

    So Google landed me on this nice thread by 'praseodym' that explained the remaining steps. I stole the info below from that thread.

    So while we are in the source directory of the module, we are going to edit "common/rtusb_dev_id.c" and add

    {USB_DEVICE(0x0DF6,0x0075)}, /* MT7610U */
    

    This will make the AC600 get recognised by the driver. Now we also need to edit "os/linux/config.mk" and change these lines like this:

    HAS_WPA_SUPPLICANT=y
    HAS_NATIVE_WPA_SUPPLICANT_SUPPORT=y
    

    So no, we are still not ready. I'm not 100 percent sure this step is still required, but I found this nice thread in Italian with a very small comment by 'shoe rat' tucked away at the end that may make the difference between a working and a non-working device.

    We need to edit the file "os/linux/config.mk" and go to line ~663. Then, around that line, change

    CHIPSET_DAT = 2860
    

    to:

    CHIPSET_DAT = 2870
    

    Yes. Finally! Now you can do:

    make
    make install
    

    Imagine that such a 'make' takes about 20 minutes on a Raspberry Pi. No joke.

    Now you can load the module:

    modprobe mt7650u_sta
    

    You should see something like this:

    root@raspberrypi:/usr/src# lsmod
    Module                  Size  Used by
    snd_bcm2835            16181  0 
    snd_pcm                63684  1 snd_bcm2835
    snd_page_alloc          3604  1 snd_pcm
    snd_seq                43926  0 
    snd_seq_device          4981  1 snd_seq
    snd_timer              15936  2 snd_pcm,snd_seq
    snd                    44915  5 snd_bcm2835,snd_timer,snd_pcm,snd_seq,snd_seq_device
    soundcore               4827  1 snd
    mt7650u_sta           895786  1 
    pl2303                  7951  0 
    usbserial              19536  1 pl2303
    

    You should see a 'ra0' device when running ifconfig -a or iwconfig, and you can configure it like any other wireless device (out of scope for this post).
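
    If you want the module to load automatically at boot on Raspbian/Debian, adding its name to /etc/modules should be enough (a small sketch, run as root; the name matches the lsmod output above):

    # load the Wi-Fi driver at every boot
    echo 'mt7650u_sta' >> /etc/modules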

    So once up and running, the Sitecom AC600 works fine under Linux and even sees and connects to 5 GHz networks. But not without a caveat, of course. I needed to configure a 5 GHz channel below 100 (I chose 48) on my Apple AirPort Extreme, or the Wi-Fi dongle would not see the 5 GHz network and would not be able to connect to it.

    So I hope somebody else is helped by this information.

    Tagged as : Wi-Fi
  3. The ZFS Event Daemon on Linux

    August 29, 2014

    If something goes wrong with my zpool, I'd like to be notified by email. On Linux with MDADM, the mdadm monitoring daemon took care of that.

    With the release of ZoL 0.6.3, a brand new 'ZFS Event Daemon' or ZED has been introduced.

    I could not find much information about it, so consider this article my notes on this new service.

    If you want to receive alerts there is only one requirement: you must set up an MTA on your machine, which is outside the scope of this article.

    When you install ZoL, the ZED daemon is installed automatically and will start on boot.

    The configuration file for ZED can be found here: /etc/zfs/zed.d/zed.rc. Just uncomment the "ZED_EMAIL=" line and fill in your email address. Don't forget to restart the service.
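
    A minimal sketch of that change; the address is a placeholder, so use your own:

    # /etc/zfs/zed.d/zed.rc
    ZED_EMAIL="admin@example.com"

    The exact name of the ZED service to restart depends on how ZoL is packaged for your distribution.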

    ZED seems to hook into the zpool event log that is kept in the kernel and monitors these events in real-time.

    You can see those events yourself:

    root@debian:/etc/zfs/zed.d# zpool events
    TIME                           CLASS
    Aug 29 2014 16:53:01.872269662 resource.fs.zfs.statechange
    Aug 29 2014 16:53:01.873291940 resource.fs.zfs.statechange
    Aug 29 2014 16:53:01.962528911 ereport.fs.zfs.config.sync
    Aug 29 2014 16:58:40.662619739 ereport.fs.zfs.scrub.start
    Aug 29 2014 16:58:40.670865689 ereport.fs.zfs.checksum
    Aug 29 2014 16:58:40.671888655 ereport.fs.zfs.checksum
    Aug 29 2014 16:58:40.671905612 ereport.fs.zfs.checksum
    ...
    

    You can see that a scrub was started and that incorrect checksums were discovered. A few seconds later I received an email:

    The first email:

    A ZFS checksum error has been detected:
    
      eid: 5
     host: debian
     time: 2014-08-29 16:58:40+0200
     pool: storage
     vdev: disk:/dev/sdc1
    

    And soon thereafter:

    A ZFS pool has finished scrubbing:
    
      eid: 908
     host: debian
     time: 2014-08-29 16:58:51+0200
     pool: storage
    state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
      see: http://zfsonlinux.org/msg/ZFS-8000-9P
     scan: scrub repaired 100M in 0h0m with 0 errors on Fri Aug 29 16:58:51 2014
    config:
    
        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdb     ONLINE       0     0     0
            sdc     ONLINE       0     0   903
    
    errors: No known data errors
    

    Awesome!

    The ZED daemon executes scripts based on the event class, so it can do more than just send emails: you can customise different actions per event class. The event class can be seen in the zpool events output.
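
    You can get an impression of what is hooked up by listing the zedlet directory; the scripts (or symlinks) living there are named after the event class they respond to. Just a way to explore, not a required step:

    ls -l /etc/zfs/zed.d/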

    One of the more interesting features is the automatic replacement of a defective drive with a hot spare, so full fault tolerance is restored as soon as possible.

    I've not been able to get this to work. The ZED scripts would not automatically replace a failed/faulted drive.

    There seem to be some known issues. The fixes seem to be in a pending pull request.

    Just to make sure I got alerted, I've simulated the ZED configuration for my production environment in a VM.

    I simulated a drive failure with dd, but the result was that I received one email for every checksum error. With thousands of checksum errors, I had to clear 1000+ emails from my inbox.
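
    For completeness, provoking checksum errors like this looks roughly as follows. A destructive sketch, assuming a throw-away mirror called 'storage' with /dev/sdc as one of its members; never do this on a pool you care about:

    # overwrite part of one mirror member to corrupt the data on it
    dd if=/dev/urandom of=/dev/sdc bs=1M count=100 seek=100
    # a scrub will then detect (and repair) the bad checksums
    zpool scrub storage
    zpool status storage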

    It seems that this option, which is not enabled by default, was the cause:

    ZED_EMAIL_INTERVAL_SECS="3600"
    

    This option implements a cool-down period: an event is reported once and then suppressed until the interval expires.

    It would be best if this option were enabled by default.

    The ZED authors acknowledge that ZED is a bit rough around the edges, but it sends out alerts consistently and that's what I was looking for, so I'm happy.

    Tagged as : ZFS event daemon
  4. Installation of ZFS on Linux Hangs on Debian Wheezy

    August 29, 2014

    After a fresh net-install of Debian Wheezy, I was unable to compile the ZFS on Linux kernel module. I had installed build-essential (apt-get install build-essential) but that wasn't enough.

    The apt-get install debian-zfs command would just hang.

    I noticed a 'configure' process and killed it; after a few seconds, the installer continued, spewing out this error:

    Building initial module for 3.2.0-4-amd64
    Error! Bad return status for module build on kernel: 3.2.0-4-amd64 (x86_64)
    Consult /var/lib/dkms/zfs/0.6.3/build/make.log for more information.
    

    So I ran ./configure manually inside the mentioned directory and then I got this error:

    checking for zlib.h... no
    configure: error: in `/var/lib/dkms/zfs/0.6.3/build':
    configure: error: 
        *** zlib.h missing, zlib-devel package required
    See `config.log' for more details
    

    So I ran apt-get install zlib1g-dev, but still no luck:

    checking for uuid/uuid.h... no
    configure: error: in `/var/lib/dkms/zfs/0.6.3/build':
    configure: error: 
        *** uuid/uuid.h missing, libuuid-devel package required
    See `config.log' for more details
    

    I searched a bit online and found this link, which listed some additional packages that may be missing, and I installed them all with:

    apt-get install zlib1g-dev uuid-dev libblkid-dev libselinux-dev parted \
        lsscsi wget
    

    This time ./configure ran fine, and I could manually build and install the kernel module (make install) and import my existing pool.
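
    For reference, the manual steps inside the DKMS build directory boil down to something like this (the version in the path matches the error messages above):

    cd /var/lib/dkms/zfs/0.6.3/build
    ./configure
    make
    make install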

    Tagged as : ZFS Wheezy
  5. Please Use ZFS With ECC Memory

    August 27, 2014

    Some people say that it's OK (acceptable risk) to run a ZFS NAS without ECC memory.

    I'd like to make the case that this is very bad advice and that these people are doing other people a disservice.

    Running ZFS without ECC memory gives you a false sense of security and it can lead to serious data corruption or even the loss of the whole zpool. You may lose all your data. Here is a nice paper about ZFS and how it handles corrupt memory (it doesn't!).

    ZFS was designed to be run on hardware with ECC memory and it trusts memory blindly. ZFS addresses data integrity on disk. ECC memory addresses the integrity of data in memory. Each tool has its own purpose; use the right tool for the job.

    ZFS combined with bad RAM may be a significantly bigger threat to the data on your NAS than using EXT4/XFS/UFS. Not only because the file system may get corrupted and can no longer be imported, but also because there are no file system recovery tools available for ZFS. With the older file systems, you at least stand some chance of saving some of your data.

    ZFS amplifies the impact of bad memory

    Aaron Toponce explains the danger of bad non-ECC memory with some examples. ZFS tries to repair data if it thinks it is corrupt. But since ZFS trusts RAM, it cannot distinguish between bad RAM and bad disk data, and it will start to 'repair' good data. This will cause further corruption and will further damage data on disk. Imagine what will happen if you perform regular scrubs of your data.

    Personally, I think that even for a home NAS, it's best to use ECC memory regardless of whether you use ZFS. It makes for more stable hardware. If money is a real constraint, it's better to take a look at AMD's offerings than to skimp on ECC memory for a bit more performance.

    ZFS is just one part of the data integrity/availability puzzle

    From a technical perspective, it is always a bad choice to buy non-ECC memory for your DIY NAS. But you may have non-technical reasons not to buy ECC memory, like 'monetary' reasons.

    ECC memory is a bit more expensive, but the question is: what is your goal?

    If you care about your data and would lose sleep over the risk of silent data corruption, you need to go all the way to be safe. ZFS covers the risk of drives spewing corrupt data, extra drives cover the risk of drive failures and ECC memory covers the risk of bad memory.

    ZFS itself is free. But data integrity and availability is not. We know that hardware can fail, in particular hard drives. So we buy some extra drives and sacrifice capacity in exchange for reliability. We pay real money to gain some safety. Why not with memory? Why is it suddenly not necessary to do exactly the same with memory what ZFS covers for hard disk drives?

    The ECC vs. non-ECC debate is about whether the likelihood and the impact of a RAM bit flip warrant the extra cost of ECC memory for home usage. But before we look at the numbers, let's just think about this for a moment.

    The only argument is that the likelihood of memory corruption is low. But there is no data on this for home environments. It's just anecdotes and hearsay. The trouble is that non-ECC machines never tell you to your face that you just encountered a memory bit error. The machine just crashes or reboots, some app crashes, or some file is suddenly lost. How do you know you've never experienced bad memory?

    The chance is low, but if it goes wrong, the impact could be very high.

    I like this argument from Andrew Galloway, who has an even stronger opinion in this debate:

    Would you press a button with a 100$ reward if there's a one in ten thousand
    chance that you will get zapped by a lightning strike and die instead of
    getting that 100$?
    

    Is the small risk of losing all your data worth the reward of 100$?

    Vendors like HP or Dell do not ship a single server or workstation with non-ECC RAM. Even the cheapest tower-model servers for small businesses contain ECC memory. Please let that sink in for a moment.

    On the FreeNAS forum, they've seen multiple people lose their data because of memory corruption rendering their zpool unusable. For a nice and very opinionated read, check this topic.

    non-ECC hardware will not warn you

    How long will it take for you to notice that your NAS has memory problems? By the very nature of non-ECC memory and related hardware (motherboard), there is no way to tell if memory has gone bad. By the time you notice, it may be too late. Just think what will happen if a scrub starts.

    ECC motherboards log memory events to the BIOS and those events can often be read through IPMI from within the operating system.
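
    For example, on a board with a BMC and ipmitool installed, the System Event Log can be dumped from within the OS (a sketch; the output format differs per vendor):

    # list BMC event log entries, which include correctable memory/ECC errors
    ipmitool sel list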

    The Google study

    Now let's take a look at some data. I'm using the Google study that some of you may already be familiar with.

    Our first observation is that memory errors are not rare events.
    About a third of all machines in the fleet experience at least one memory
    error per year [...]
    

    One in three machines faces at least one memory error per year. But a machine contains multiple memory modules.

    Around 20% of DIMMs in Platform A and B are affected by correctable errors
    per year, compared to less than 4% of DIMMs in Platform C and D.
    

    So let's assume that your hardware is of better design like platform C and D. In that case, each memory module has a four percent chance per year to see a correctable error. Remember that your NAS has at least two memory modules.

    So the chance of seeing no errors per module per year is 96%. With two modules, that is 0.96 x 0.96 ≈ 92% chance that everything will be fine that year. Or you could say: an 8% chance that some failure will occur. With four memory modules, the risk is about 15% per year that you will face at least one memory error.
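
    If you want to redo the arithmetic yourself, a quick sketch with bc, using the 4% per-DIMM figure from platforms C and D:

    # chance of at least one correctable error per year
    echo "scale=4; 1 - 0.96^2" | bc    # two DIMMs  -> .0784 (~8%)
    echo "scale=4; 1 - 0.96^4" | bc    # four DIMMs -> .1507 (~15%)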

    A memory error may not immediately lead to the total loss of your pool, but still. I find this number quite high.

    There are more interesting observations in this paper.

    Memory errors can be classified into soft errors, which randomly corrupt
    bits, but do not leave any physical damage; and hard errors, which corrupt
    bits in a repeatable manner because of a physical defect (e.g. “stuck bits”).
    
    [...]
    
    Conclusion 7: Error rates are unlikely to be dominated by soft errors.
    We observe that CE rates are highly correlated with system utilization,
    even when isolating utilization effects from the effects of temperature.
    

    So if I understand this correctly, memory errors are mostly correlated with high CPU and RAM usage, and cosmic radiation does not seem to be the cause that often.

    Please note that Google did not measure hard or soft errors directly as they can't distinguish between them.

    Brian Moses blogged about his reasons why he did not choose ECC memory for his NAS box. Although most of his arguments are not very strong in my opinion, he pointed out something interesting.

    Google found that there is a strong correlation between memory errors and the CPU/RAM usage of the machine.

    We observe clear trends of increasing correctable error rates with
    increasing CPU utilization and allocated memory. Averaging across all
    platforms, it seems that correctable error rates grow roughly
    logarithmically as a function of utilization levels (based on the roughly
    linear increase of error rates in the graphs, which have log scales on the
    X-axis).
    

    A major difference between Google's servers and your home NAS is that your home NAS won't see much memory and CPU usage in general, so if the relation is logarithmic in nature, the risk of seeing memory errors in a low-utilisation environment should be reduced. But what kind of number can we put on that? 1% per memory module per year? Or 0.1%?

    Are you the person who is going to find out?

    This information may be taken as an indication that in a home environment, memory problems are less likely than on high-usage systems in a data center, but are you going to bet your data on that assumption?

    Most people run their NAS 24/7. Often, it has other tasks besides storing files, and this may put a load on the system. Furthermore, ZFS tends to use as much memory as possible for caching purposes, increasing the risk of hitting bad memory. And ZFS users need to perform regular scrubs of their pool, which cause a lot of disk, CPU and RAM activity.

    Inform people and give them a choice

    When people seek advice on their NAS builds, ECC memory should always be recommended, and I think nobody should create the impression that there are technical reasons why it's OK to skip ECC RAM for home use.

    Even if it were true that home builds may be less susceptible to memory errors it would not be fair to create the impression that the likelihood of bad memory is so small that we can just ignore the impact and save a few bucks.

    People are free to choose not to go for ECC memory for monetary reasons, but that does not justify the choice from a technical perspective and they should be aware that they are taking a risk.

    Tagged as : ZFS ECC
