Many people have asked me why I do not use ZFS for my NAS storage box. This is a good question, and I have multiple reasons why I do not use ZFS and probably never will.


Updated October 18, 2012. I decided to test ZFS on my download server and created a blog post about that.


The demise of Solaris

ZFS was invented by Sun for the Solaris operating system. When I was building my NAS, the only full-featured and production-ready version of ZFS was the one implemented in Sun Solaris. The only usable version of Solaris was Open Solaris. I dismissed using Open Solaris because of the lack of hardware support and the small user base. The size of the user base is very important to me. More users means more testing and more support.

The FreeBSD implementation of ZFS only became stable in January 2010, 6 months after I built my NAS (summer 2009). So FreeBSD was not an option at that time.

I am glad that I didn't go for Open Solaris, as Sun's new owner Oracle killed this operating system in August 2010. Although ZFS is open source software, I think it is effectively closed source already. The only open source version was available through Open Solaris, and that software is now dead. Oracle will close the source of ZFS simply by not publishing the code of new features and updates. Only their proprietary closed source Solaris platform will obtain updates. I must say that I don't have proof of this. However, Oracle seems to have no interest in open source software at the very least, and almost seems hostile towards it.

FreeBSD and ZFS

So I built my NAS when ZFS was basically not around yet. But as of today you can build a NAS based on ZFS with FreeBSD, right? Sure, you can do that. I had no choice back then, but you do. But to be honest, I still would not use ZFS. As of March 1st, 2011, I would still go with Linux software RAID and XFS.

My reasons may not be that great, but I provide them anyway. It's up to you to decide.

I sincerely respect the FreeBSD community and platform, but it is not for me. It may be that I just have much more experience with Debian Linux and don't like changing platforms. I find the Debian installation process much more user friendly. I see year-over-year improvement in Debian, while I see none in the FreeBSD 8.2 release. Furthermore, I'm just thrilled with the really big APT repository. Last, I cannot foresee future requirements, but I'm sure that whatever I need later has a higher chance of supporting Linux than BSD.

Furthermore, although FreeBSD has a community, it is relatively small. Resources on Debian and Ubuntu are abundant. I consider Linux a safer bet, also when it comes to hardware support. My NAS must be simple to build and rock solid. I don't want a day job just getting my NAS to work and maintaining it.

If you are experienced with FreeBSD, by all means, build a ZFS setup if you want. If you have to learn either BSD or Linux, I consider knowledge about Linux more valuable in the long run.

ZFS is a hype

This is the part where people may strongly disagree with me. I admire ZFS, but I consider it total overkill for home usage. I have seen many people talking about ZFS like Apple users about Apple products. It is a hype. Don't get me wrong. As a long-time Mac user I'm also mocking myself here. I get the impression that ZFS is regarded as the second coming of Jesus Christ. It solves problems that I didn't know of in the first place. The only thing it can't do is beat Chuck Norris. But it does vacuum your house if you ask it to.

As a side note, one of the things I do not like about ZFS is the terminology. It is just RAID 0, RAID 1, RAID 5 or 6 but no, the ZFS people had to use different, more cool sounding terms like RAID Z or something. But it is basically the same thing.

Okay, now back to the point: nobody at home needs ZFS. You may argue that nobody needs 18 TB of storage space at home, but that's another story. Running ZFS means using FreeBSD or an out-of-the-box NAS solution based on FreeBSD. And there aren't any other relevant options.

Now, let's take a look at the requirements of most NAS builders. They want as much storage as possible at the lowest price possible. That's about it. Many people want to add disk drives as their demand for storage capacity increases. So people buy a solution with a capacity for, say, 10 drives, start out with 4 drives and add disks when they need them.

Linux allows you to 'grow' or 'expand' an array, just like most hardware RAID solutions. As far as I know, this feature is still not available in ZFS. Maybe this feature is not relevant in the enterprise world, but it is for most people who actually have to think about how they spend their money.
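To give an idea of what growing an array looks like with Linux software RAID and XFS, here is a minimal sketch. It only prints the commands instead of running them. The function name grow_plan, the device names, the disk count and the mount point are all placeholders made up for illustration, so adjust them to your own system before running anything.

```shell
# Print (do not run) the commands for growing an md array by one disk.
# All arguments are placeholders chosen for this example.
grow_plan() {
    array=$1        # e.g. /dev/md0
    new_disk=$2     # the disk being added
    total=$3        # number of active disks after growing
    mountpoint=$4   # where the XFS file system is mounted

    # 1. Add the new disk to the array (it starts out as a spare).
    echo "mdadm --add $array $new_disk"
    # 2. Reshape the array so it uses the new disk as an active member.
    echo "mdadm --grow $array --raid-devices=$total"
    # 3. After the reshape has finished, grow the file system to fill the array.
    echo "xfs_growfs $mountpoint"
}

grow_plan /dev/md0 /dev/sdf 5 /mnt/storage
```

The reshape in step 2 can take a long time on large disks, and xfs_growfs should only be run once it has completed.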

Furthermore, I don't understand why I can run any RAID array with decent performance with maybe 512 MB of RAM, while ZFS would just totally crash with so little memory installed. You seem to need at least 2 GB to prevent crashing your system. More is recommended if you want to prevent it from crashing under high load or something. I really can't wrap my mind around this. Honestly, I think this is insane.

ZFS does great things. Management is easy. Many features are cool: snapshots and other stuff. But most features are just not required for a home setup. ZFS seems to solve a lot of 'scares' that I've only heard about since ZFS came along, like the RAID 5/6 write hole. Where others just hook up a UPS in the first place (if you don't use a UPS on your NAS, you might as well try your luck running RAID 0), the ZFS people built a solution that prevents data loss when power fails. The most interesting feature to me, though, is that ZFS checksums all data and detects corruption. I like it because it sounds useful, but how high are the chances that you need this stuff?

If ZFS were available under Linux as a native option instead of through FUSE, I would probably consider using it, provided I knew in advance that I would not want to expand or grow my array in the future. But I am pessimistic about this scenario. It is not in Oracle's interest to change the license on ZFS in order to allow Linux to incorporate support for it in the kernel.

To build my 20 disk RAID array, I had to juggle my drives to keep all data while migrating to the new system. Some of the 20 disks came from my old NAS system, so I had to repeatedly grow the array and add disks, which I couldn't have done with ZFS.

Why I chose to build this setup

The array is just a single 20 disk RAID 6 volume created with a single mdadm command. The second command I issued to make my array operational was to format this new 'virtual' disk with XFS, which just takes seconds. A UPS protects the system against power failure, and I have been happy with it for 1.5 years now. Never had any problems. Never had a disk failure... A single RAID 6 array is simple and fast. XFS is old but reliable. My whole setup is just this: extremely simple. I just love simple.
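As a back-of-the-envelope check on capacity: RAID 6 spends two disks' worth of space on parity, so the usable space is the number of disks minus two, times the disk size. The sketch below assumes 20 equal disks of 1 TB each; the disk size is an assumption for illustration, chosen because it matches the 18 TB mentioned earlier. The mdadm and mkfs.xfs lines in the comments only show the general shape of the two commands with placeholder device names. They are not necessarily the exact commands used for this array.

```shell
# Usable capacity of a RAID 6 array made of equal-sized disks.
# Two disks' worth of space goes to parity.
raid6_usable_tb() {
    disks=$1      # number of disks in the array
    size_tb=$2    # size of each disk in TB (assumed equal)
    echo $(( (disks - 2) * size_tb ))
}

raid6_usable_tb 20 1

# General shape of the two commands (placeholders, not run here):
#   mdadm --create /dev/md0 --level=6 --raid-devices=20 <list of 20 disks>
#   mkfs.xfs /dev/md0
```

With 20 disks of 1 TB this prints 18, so two of the twenty disks are the price of surviving a double disk failure.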

My array does not use LVM, so I cannot create snapshots or stuff like that. But I don't need it. I just want so much storage that I don't have to think about it. And I think most people just want some storage share with lots of space. In that case, you don't need LVM or stuff like that. Just an array with a file system on top of it. If you can grow the array and the file system, you're set for the future. Speaking about the future: please note that on Linux, XFS is the only file system that is capable of addressing more than 16 TB of data. EXT4 is still limited to 16 TB.

For the future, my hopes are that BTRFS will become a modern viable alternative to ZFS.

Intel and Apple released Thunderbolt, a high-speed (10 Gigabit/s) interface that seems set to replace both USB and Firewire. It is mainly targeted at end-user systems, allowing you to connect peripherals to a computer with just a single cable. Thunderbolt devices, like external hard drives or displays, can be daisy chained, like Firewire. In short, Thunderbolt removes the cable clutter and adds a significant speed bonus.

For NAS owners and storage enthusiasts, this is also a very interesting technology. Just like Firewire, it seems to support computer-to-computer communication. So Thunderbolt could be used as a high-speed link between your homegrown NAS device and your PC workstation. Or between two storage / server systems.

The only downside to Thunderbolt is the maximum cable length of 3 meters between devices. Thunderbolt doesn't seem to be the ideal replacement for your Gigabit network, but if most of your computer systems are close to each other, it might be very interesting.

Update: I found this interesting story about getting two Infiniband cards plus a cable. These are 10 Gbit cards, and the author shows that transfer rates of 700+ MB/s (as in megabytes) are possible.

I wanted to see how difficult it is to set up an instant messaging server based on open source software. Now I know that it is very easy, unless you are stubborn and do things your own way. In this example, I'm setting up a small IM server that is only for internal company use, but there is no difference if you want to expose the server to the internet.

First, a bit of background information. There is an open IETF standard for instant messaging called "XMPP", which stands for "Extensible Messaging and Presence Protocol". This protocol originated as part of the open source Jabber IM server software.

Setting up ejabberd

I decided to use ejabberd, which is part of the Debian software archive. It is written in Erlang, but I can live with that. This blog post documents how to set up the IM server with two accounts that can chat with each other. The configuration I use also enforces the use of SSL/TLS, so authentication and all messages are encrypted.

Steps to get things running:

  • apt-get update
  • apt-get install ejabberd
  • cd /etc/ejabberd
  • edit ejabberd.cfg

Change the following line to your needs:

%% Hostname
{hosts, ["localhost", "jabber.domain.local"]}.

Also enforce the use of encryption like this:

starttls, {certfile, "/etc/ejabberd/ejabberd.pem"}

Must be changed to:

starttls_required, {certfile, "/etc/ejabberd/server.pem"}

Generating a custom SSL certificate

Security-wise, it is very wrong to use the default SSL certificate provided by the installation package as the server certificate. Anyone with access to this key material can decrypt encrypted communication. So you must generate your own server certificate. This is also required because IM clients may verify the certificate against the domain name used within the certificate. If there is no match, the client will refuse to connect or will at least complain.

openssl req -new -x509 -newkey rsa:2048 -days 365 -keyout privkey.pem \
-out server.pem

This creates a self-signed certificate (server.pem), which contains the public key, and a private key (privkey.pem). The certificate is valid for a year. Feel free to make the certificate valid for a longer period; this is just an example. You will have to answer a few questions, and the most important one is this:

Common Name (eg, YOUR name) []:jabber.domain.local

You are forced to set a password on the private key, but we want to remove this because otherwise the ejabberd service will not start automatically.

openssl rsa -in privkey.pem -out privkey.pem

Just enter the password you entered earlier and you're done. We now have separate files for the public and private key, but ejabberd expects them in one single file.

cat privkey.pem >> server.pem
rm privkey.pem

Set proper file system permissions:

chown ejabberd server.pem
chmod 600 server.pem

Now we are done. Restart ejabberd to use the new settings.

/etc/init.d/ejabberd restart
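For scripted setups, the certificate steps above can be collapsed into one non-interactive function. This is a sketch under a few assumptions: the function names make_cert and show_cert are made up for this example, the -subj option fills in the Common Name without prompting, and -nodes writes the private key without a passphrase, so the separate openssl rsa step is not needed.

```shell
# Create a combined certificate + private key file for the given host name.
# make_cert and show_cert are illustrative names, not part of ejabberd or OpenSSL.
make_cert() {
    cn=$1     # host name that goes into the Common Name field
    out=$2    # combined PEM file to write

    # Self-signed certificate, valid for one year, key without passphrase.
    openssl req -new -x509 -newkey rsa:2048 -nodes -days 365 \
        -subj "/CN=$cn" -keyout "$out.key" -out "$out" 2>/dev/null || return 1

    # ejabberd expects certificate and private key in one single file.
    cat "$out.key" >> "$out"
    rm -f "$out.key"
    chmod 600 "$out"
}

# Print the subject and expiry date of a certificate file.
show_cert() {
    openssl x509 -in "$1" -noout -subject -enddate
}

# Example (run in /etc/ejabberd, then chown the file to the ejabberd user):
#   make_cert jabber.domain.local server.pem && show_cert server.pem
```

The show_cert output is a quick way to check that the Common Name matches the host name your IM clients connect to.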

Security caveats

Please note that the ejabberd daemon provides a small built-in web interface for administration purposes on TCP port 5280. By default it is not protected by SSL or TLS, and it cannot be used unless you add users to this part of the configuration file:

{acl, admin, {user, "", "localhost"}}.

Example:

{acl, admin, {user, "admin", "localhost"}}.

The user must also be registered as a normal IM user as described in the next section.

Warning: it seems to me that this interface is not very secure, for example, there is no logout button.

Furthermore, you might consider disabling the following section:

ejabberd_s2s_in

This prevents your IM server from communicating with other IM servers. But we are not finished. When you install ejabberd, some other services are also started on the system. It is thus very important that you configure your firewall to block these ports. This small nmap port scan output shows some interesting services:

4369/tcp  open  epmd?
5222/tcp  open  jabber  ejabberd (Protocol 1.0)
5269/tcp  open  jabber  ejabberd
5280/tcp  open  http    ejabberd http admin
|_http-title: Site doesn't have a title (text/html; charset=utf-8).
|_http-methods: No Allow or Public header in OPTIONS response (status code 400)
36784/tcp open  unknown

Ports 4369, 36784 and 5280 should be blocked by your firewall and must not be accessible from the internet.
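As a starting point for the firewall, here is a minimal sketch that prints iptables rules for the ports from the scan instead of applying them, so you can review them first. The function name block_rules is made up for this example. The rules skip the loopback interface on the assumption that local tools such as ejabberdctl still need to reach these ports. Also treat the high port number as an assumption: it looks like a dynamically assigned port, so it may be different on your system.

```shell
# Print iptables rules that drop inbound TCP traffic to the given ports,
# except traffic arriving on the loopback interface.
block_rules() {
    for port in "$@"; do
        echo "iptables -A INPUT ! -i lo -p tcp --dport $port -j DROP"
    done
}

# Ports taken from the nmap output above.
block_rules 4369 5280 36784
```

Port 5222 is left alone because IM clients connect to it, and port 5269 only matters if you want your server to talk to other IM servers.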

Adding users

It is now time to create some IM users. A user account always looks like an email address, for example:

peter@jabber.domain.local

To add accounts, use the ejabberdctl utility:

ejabberdctl register peter jabber.domain.local <password>

Please note that passwords entered on the command line end up in your bash_history file, so beware. Also, users running ps aux may be able to see the command for a brief moment. So be careful.
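One way to keep the password out of the shell history is to prompt for it instead of typing it as part of the command. The sketch below is only an illustration: read_secret and register_user are made-up helper names, and the password is still passed to ejabberdctl as an argument, so the ps aux caveat remains.

```shell
# Read one line from standard input, without echoing it when typed on a terminal.
read_secret() {
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r secret
    if [ -t 0 ]; then stty echo; echo >&2; fi
    printf '%s\n' "$secret"
}

# Hypothetical wrapper around the register command shown above.
# The password is prompted for, so it never appears in the typed command line.
register_user() {
    user=$1
    host=$2
    printf 'Password for %s@%s: ' "$user" "$host" >&2
    ejabberdctl register "$user" "$host" "$(read_secret)"
}

# Example:
#   register_user peter jabber.domain.local
```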

By registering two accounts, you can test your new server.

Additional resources

Nice to know: the domain names used for your accounts can differ from the domain used for the IM server.

If you have a Windows Active Directory domain, you could consider authenticating your users against LDAP.

Other resources: - tutorial 1 - tutorial 2


This article has been updated to reflect the changes for John version 1.7.8 as released in June 2011. The most important change is the fact that MPI support is now integrated in the jumbo patch.


The original John the Ripper off-line password cracker only uses a single processor (core) when performing brute-force or dictionary attacks.

JtR does not use multiple cores (or machines). However, there is a patch available that enables support of MPI. MPI allows you to distribute the workload of a program across multiple instances, thus cores or even machines, but your application must support it.

The fun thing with MPI is that it is very easy to create a password cracking cluster. But for now let's just focus on using all these unused CPU cores to help us with cracking passwords.

I am using Ubuntu and Debian Linux as my platform, but Mac OS X also works perfectly.

install MPI support

Note: Mac users have MPI support installed by default and don't need to install this.

  • apt-get install libopenmpi-dev openmpi-bin

download John the Ripper with extra patches

  • Get the john-1.7.8-jumbo-2.tar.gz file.

extract John & edit the Makefile

  • tar xzf john-1.7.8-jumbo-2.tar.gz
  • cd john-1.7.8-jumbo-2/src
  • uncomment the following lines in the Makefile:

    CC = mpicc -DHAVE_MPI -DJOHN_MPI_BARRIER -DJOHN_MPI_ABORT
    MPIOBJ = john-mpi.o
    

Compile John the Ripper with MPI support

  • Run make and choose the most appropriate processor architecture. Example:

    make linux-x86-64 (for 64-bit x86)
    make linux-x86-sse2 (for 32-bit x86 with SSE2)
    make macosx-x86-64 (for 64-bit Mac OS X)
    

Test John the Ripper

  • cd ../run
  • ./john --test

Look at the benchmark values of the first test and remember them. Now let's see if MPI does any better:

  • mpirun -np [number of processor (virtual) cores] ./john --test

Let's assume that you have an iMac 27" with a Core i7 with 4 real cores and hyper-threading enabled. This will provide a total of 8 virtual cores.

  • mpirun -np 8 ./john --test

If you notice a significant increase in performance, you know that MPI is working properly.
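If you don't know the number of virtual cores by heart, you can let the system tell you. This is a small sketch, assuming that getconf or nproc is available on your system; core_count is a made-up helper name. It only prints the resulting mpirun command, so nothing is started.

```shell
# Print the number of online (virtual) processor cores.
# Tries getconf first and falls back to nproc.
core_count() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || nproc
}

# Show the mpirun command that would use every virtual core.
echo "mpirun -np $(core_count) ./john --test"
```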

Some benchmarks without and with MPI support (Traditional DES)

These are the benchmark test results when using a single core on an old Nehalem Core i7 920:

Many salts: 2579K c/s real, 2579K c/s virtual
Only one salt:  2266K c/s real, 2266K c/s virtual

These are the benchmark test results when using MPI and thus all 8 cores:

Many salts: 11015K c/s real, 11015K c/s virtual
Only one salt:  9834K c/s real, 9834K c/s virtual

And just look at the performance improvement when we overclock from 2.66 to 3.6 GHz:

Many salts: 15004K c/s real, 15004K c/s virtual
Only one salt:  13232K c/s real, 13232K c/s virtual

That is very significant. Now admire how the Core i7 920 @ 3.6 GHz is blown away by the Sandy Bridge based Core i7-2600 @ 3.4 GHz:

Many salts: 20007K c/s real, 20209K c/s virtual
Only one salt:  16881K c/s real, 16881K c/s virtual
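To put these numbers in perspective, the ratios between the "many salts" results above can be worked out with a few lines of shell. The helper name ratio is made up for this sketch; the figures are the benchmark values listed above.

```shell
# Print a / b rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 11015 2579    # MPI on 8 virtual cores vs. a single core
ratio 15004 11015   # Core i7 920 at 3.6 GHz vs. 2.66 GHz
ratio 3.6 2.66      # the clock speed ratio, for comparison
ratio 20007 15004   # Core i7-2600 vs. overclocked Core i7 920
```

This prints 4.27, 1.36, 1.35 and 1.33. So MPI gives a bit more than a fourfold speedup on 4 real cores with hyper-threading, the overclock scales almost exactly with the clock speed, and the Sandy Bridge chip is about a third faster than the overclocked 920.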

Setting up an MPI cluster

MPI clustering is based on using SSH keys. There is a single master that uses all nodes to perform the computation. The nodes are put into a text file nodes.txt like this:

node01  slots=2
node02  slots=2
node03  slots=4 
node04  slots=4

In this example, nodes 1 and 2 are dual-core systems, while nodes 3 and 4 have quad-core processors. You must create an account on all your nodes with the same name as the account that runs the master process. You must also generate an SSH key pair and distribute the public part as the authorized_keys file to all nodes. This is outside the scope of this post. Please note that the SSH private key should be loaded with ssh-agent if it is protected by a passphrase; otherwise, do not configure a passphrase on the key. If you do not use a passphrase, understand that anyone with access to the key can access all nodes.

You may also have to put the nodexx entries in your /etc/hosts file if the names cannot be resolved by DNS.

Now I'm assuming that you are able to ssh into all nodes without requiring a password, thus ssh is properly set up.

  • mpirun -np 12 -hostfile nodes.txt ./john --test
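The value 12 for -np is simply the sum of all slots in the host file. A minimal sketch for deriving it, assuming the host file uses the "name slots=N" layout shown above; total_slots is a made-up helper name, and the sketch writes the sample host file to a temporary file so it does not touch your real nodes.txt.

```shell
# Sum the slots=N values of all lines in a host file.
total_slots() {
    awk '{
        for (i = 2; i <= NF; i++)
            if ($i ~ /^slots=/) { sub(/^slots=/, "", $i); n += $i }
    } END { print n + 0 }' "$1"
}

# Recreate the example host file in a temporary location.
hostfile=$(mktemp)
cat > "$hostfile" <<'EOF'
node01  slots=2
node02  slots=2
node03  slots=4
node04  slots=4
EOF

# Show the matching mpirun command.
echo "mpirun -np $(total_slots "$hostfile") -hostfile nodes.txt ./john --test"
rm -f "$hostfile"
```

Keeping -np equal to the slot total gives one john process per core across the cluster.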

Now you should see increased performance, beyond the limit of a single host.

Some benchmarks

I ran a password cracking test on some data using a large dictionary. These are the performance differences when using all 8 cores of my Core i7 920 instead of just one:

single: 0:00:04:48      c/s: 11192K
mpi:    0:00:01:26      c/s: 46568K

The performance increase is significant.
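To turn the run times above into a speedup figure, convert John's D:HH:MM:SS notation to seconds and divide. A small sketch; to_seconds is a made-up helper name, and the two time stamps are the ones from the results above.

```shell
# Convert John's D:HH:MM:SS run time notation to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print (($1 * 24 + $2) * 60 + $3) * 60 + $4 }'
}

single=$(to_seconds 0:00:04:48)
mpi=$(to_seconds 0:00:01:26)

# Compare the wall clock time of both runs.
awk -v s="$single" -v m="$mpi" \
    'BEGIN { printf "wall clock: %d s vs %d s, %.2f times faster\n", s, m, s / m }'
```

The wall clock time improves by a factor of about 3.35, while the reported c/s rate improves by a factor of about 4.16 (46568K / 11192K). A plausible but unverified explanation for the gap is fixed overhead, such as loading the dictionary, that does not get faster with more processes.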
