Creating a Basic ZFS File System on Linux

February 01, 2014 Category: ZFS

Here are some notes on creating a basic ZFS file system on Linux, using ZFS on Linux.

I'm documenting the scenario where I just want to create a file system that can tolerate at least a single drive failure and can be shared over NFS.

Identify the drives you want to use for the ZFS pool

The ZFS on Linux project advises not to use plain /dev/sdx (/dev/sda, etc.) devices but to use /dev/disk/by-id/ or /dev/disk/by-path/ device names.

Device names for storage devices are not fixed, so a /dev/sdx device may not always point to the same physical disk. I was bitten by this when first experimenting with ZFS: I did not follow this advice, removed a drive from the system, and could no longer access my zpool after a reboot.
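You can see how these persistent names map to the current /dev/sdx devices by listing the symlinks; the exact identifiers will of course differ on your system:

```
ls -l /dev/disk/by-id/
ls -l /dev/disk/by-path/
```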

So you should pick the appropriate devices from the /dev/disk/by-id or /dev/disk/by-path directories. However, it's often difficult to determine which entry in those directories corresponds to which physical disk drive.

That's why I wrote a simple tool called showdisks, which helps you identify which identifiers you need to use to create your ZFS pool.

(screenshot: showdisks output listing disks and their /dev/disk/by-path identifiers)

You can install showdisks yourself by cloning the project:

git clone https://github.com/louwrentius/showtools.git

And then just use showdisks like this:

./showdisks -sp   (-s shows the size, -p shows the by-path identifier)

For this example, I'd like to use all the 500 GB disk drives for a six-drive RAIDZ1 vdev. Based on the information from showdisks, this is the command to create the vdev:

zpool create tank raidz1 pci-0000:03:00.0-scsi-0:0:21:0 pci-0000:03:00.0-scsi-0:0:19:0 pci-0000:02:00.0-scsi-0:0:9:0 pci-0000:02:00.0-scsi-0:0:11:0 pci-0000:03:00.0-scsi-0:0:22:0 pci-0000:03:00.0-scsi-0:0:18:0

The name 'tank' can be anything you want; it's just a name for the pool.

Please note that with newer, larger disk drives (which use 4K sectors), you should test whether the ashift=12 option gives you better performance.

zpool create -o ashift=12 tank raidz1 <devices>

I used this option on 2 TB disk drives and read performance improved twofold.
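The ashift value is simply the base-2 logarithm of the drive's physical sector size, so 512-byte sectors correspond to ashift=9 and 4K sectors to ashift=12. As a quick sketch (sector_to_ashift is just an illustrative helper, not a ZFS command):

```shell
# ashift is log2 of the physical sector size: 512 -> 9, 4096 -> 12.
# sector_to_ashift is a hypothetical helper for illustration only.
sector_to_ashift() {
    local size=$1 ashift=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        ashift=$((ashift + 1))
    done
    echo "$ashift"
}

# You can find a drive's physical sector size in sysfs, e.g.:
# cat /sys/block/sda/queue/physical_block_size
sector_to_ashift 512    # prints 9
sector_to_ashift 4096   # prints 12
```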

How to setup a RAID10 style pool

This is how to create the ZFS equivalent of a RAID10 setup:

zpool create tank mirror <device 1> <device 2> mirror <device 3> <device 4> mirror <device 5> <device 6>

How many drives should I use in a vdev

I've learned to use a 'power of two' number of data drives (2, 4, 8, 16) in a vdev, plus the appropriate number of drives for parity: RAIDZ1 = 1 extra disk, RAIDZ2 = 2 extra disks, and so on.

So the optimal number of drives for a RAIDZ1 vdev would be 3, 5, 9 or 17, and for a RAIDZ2 vdev 4, 6, 10 or 18, and so on. Clearly, in the example above with six drives in a RAIDZ1 configuration, I'm violating this rule of thumb.
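The rule of thumb above can be sketched in a couple of lines of shell; the numbers printed are the recommended total drive counts per vdev:

```shell
# A recommended vdev holds a power-of-two number of data drives
# plus the parity drives: RAIDZ1 adds 1, RAIDZ2 adds 2, RAIDZ3 adds 3.
for data in 2 4 8 16; do
    echo "data=$data RAIDZ1=$((data + 1)) RAIDZ2=$((data + 2)) RAIDZ3=$((data + 3))"
done
```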

How to disable the ZIL or disable sync writes

You can expect poor throughput performance if you use the ZIL / honour synchronous writes. For safety reasons, ZFS does honour sync writes by default; it's an important feature of ZFS that guarantees data integrity. For storage of virtual machines or databases, you should not turn off the ZIL, but use an SSD as a SLOG device to get performance to acceptable levels.
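For completeness, this is how you would add an SSD as a separate log (SLOG) device; the device name is a placeholder:

```
zpool add tank log <ssd device>
```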

For a simple (home) NAS box, the ZIL is not so important and can quite safely be disabled, as long as you have your server on a UPS and have it shut down cleanly before the UPS battery runs out.

This is how you turn off the ZIL / support for synchronous writes:

zfs set sync=disabled <pool name>

Disabling sync writes is especially relevant if you use NFS, which issues sync writes by default.

Example:

zfs set sync=disabled tank

How to add an L2ARC cache device

Use showdisks to look up the actual /dev/disk/by-path identifier and add it like this:

zpool add tank cache <device>

Example:

zpool add tank cache pci-0000:00:1f.2-scsi-2:0:0:0

This is the result (on another zpool called 'server'):

root@server:~# zpool status
  pool: server
 state: ONLINE
  scan: none requested
config:

    NAME                               STATE     READ WRITE CKSUM
    server                             ONLINE       0     0     0
      raidz1-0                         ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:0:0  ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:1:0  ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:2:0  ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:3:0  ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:4:0  ONLINE       0     0     0
        pci-0000:03:04.0-scsi-0:0:5:0  ONLINE       0     0     0
    cache
      pci-0000:00:1f.2-scsi-2:0:0:0    ONLINE       0     0     0

How to monitor performance / I/O statistics

A one-time sample:

zpool iostat

A sample every 2 seconds:

zpool iostat 2

More detailed information every 5 seconds:

zpool iostat -v 5

Example output:

                                      capacity     operations    bandwidth
pool                               alloc   free   read  write   read  write
---------------------------------  -----  -----  -----  -----  -----  -----
server                             3.54T  7.33T      4    577   470K  68.1M
  raidz1                           3.54T  7.33T      4    577   470K  68.1M
    pci-0000:03:04.0-scsi-0:0:0:0      -      -      1    143  92.7K  14.2M
    pci-0000:03:04.0-scsi-0:0:1:0      -      -      1    142  91.1K  14.2M
    pci-0000:03:04.0-scsi-0:0:2:0      -      -      1    143  92.8K  14.2M
    pci-0000:03:04.0-scsi-0:0:3:0      -      -      1    142  91.0K  14.2M
    pci-0000:03:04.0-scsi-0:0:4:0      -      -      1    143  92.5K  14.2M
    pci-0000:03:04.0-scsi-0:0:5:0      -      -      1    142  90.8K  14.2M
cache                                  -      -      -      -      -      -
  pci-0000:00:1f.2-scsi-2:0:0:0    55.9G     8M      0     70    349  8.69M
---------------------------------  -----  -----  -----  -----  -----  -----

How to start / stop a scrub

Start:

zpool scrub <pool>

Stop:

zpool scrub -s <pool>

Mount ZFS file systems on boot

Edit /etc/default/zfs and set this parameter:

ZFS_MOUNT='yes'

How to enable sharing a file system over NFS

zfs set sharenfs=on <poolname>
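The sharenfs property also accepts NFS export options instead of just 'on'. For example, to allow read-write access only from a particular subnet (the subnet below is just an illustration):

```
zfs set sharenfs="rw=@192.168.1.0/24" tank
```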

How to create a zvol for usage with iSCSI

zfs create -V 500G <poolname>/volume-name
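Once created, the zvol shows up as a block device under /dev/zvol/, which is what you point your iSCSI target at. Using the pool and volume names from the example above:

```
ls -l /dev/zvol/<poolname>/volume-name
```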

How to force ZFS to import the pool using disk/by-path

Edit /etc/default/zfs and add

ZPOOL_IMPORT_PATH=/dev/disk/by-path/

Links to important ZFS information sources

Tons of information on using ZFS on Linux by Aaron Toponce:

https://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/

Understanding the ZIL (ZFS Intent Log)

http://nex7.blogspot.nl/2013/04/zfs-intent-log.html

Information about 4K sector alignment problems

http://www.opendevs.org/ritk/zfs-4k-aligned-space-overhead.html

Important read about using the proper number of drives in a vdev

http://forums.freenas.org/threads/getting-the-most-out-of-zfs-pools.16/
