Choosing between ashift=9 and ashift=12 for 4K-sector drives is not always clear-cut. You have to trade raw performance against storage capacity.
My test platform is Debian Wheezy with ZFS on Linux. The system has 24 x 4 TB drives in a RAIDZ3. The drives have a native sector size of 4K, and the array is initially formatted with ashift=12.
First we create the array like this:
zpool create storage -o ashift=12 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]
Note: NEVER use /dev/sd? device names for a production array; they are not stable across reboots. This is just for testing. Always use /dev/disk/by-id/ names.
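For a real pool the command would look something like this (the by-id names below are made up for illustration; substitute the identifiers of your own drives):

zpool create storage -o ashift=12 raidz3 \
    /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL001 \
    /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL002 \
    /dev/disk/by-id/ata-EXAMPLE_MODEL_SERIAL003

(and so on, one by-id path per drive, for all 24 drives)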
Then we run a simple sequential transfer benchmark with dd:
root@nano:/storage# dd if=/dev/zero of=ashift12.bin bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 66.4922 s, 1.6 GB/s

root@nano:/storage# dd if=ashift12.bin of=/dev/null bs=1M
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 42.0371 s, 2.5 GB/s
This is quite impressive. At these speeds you can saturate 10 Gigabit Ethernet, which tops out at roughly 1.25 GB/s. But how much storage space do we get?
This is what df -h reports:

Filesystem      Size  Used Avail Use% Mounted on
storage          69T  512K   69T   1% /storage

And zfs list:

NAME      USED  AVAIL  REFER  MOUNTPOINT
storage  1.66M  68.4T   435K  /storage
Only 68.4 TiB of usable storage? That's not good. Twenty-four drives minus 3 for parity leaves 21 data drives of about 3.6 TiB each, which should give roughly 75 TiB of storage.
So the performance is great, but somehow we lost about 6 TiB of storage, more than an entire drive's worth; the usual explanation is RAID-Z allocation and padding overhead, which gets worse with 4K sectors.
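For reference, you can do the math yourself. Vendors rate drives in decimal terabytes (10^12 bytes), while df -h and zfs list report binary tebibytes (2^40 bytes), so a quick sanity check with bc looks like this:

# 21 data drives of 4 TB (decimal) each, expressed in TiB (binary):
echo 'scale=1; 21 * 4 * 10^12 / 2^40' | bc

This prints 76.3: each 4 TB drive is about 3.64 TiB, which the estimate above rounds down to 3.6 TiB per drive.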
So what happens if you create the same array with ashift=9?
zpool create storage -o ashift=9 raidz3 /dev/sd[abcdefghijklmnopqrstuvwx]
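If you want to verify which ashift a pool actually got, zdb can show it; on ZFS on Linux something like this should work (it prints one ashift line per vdev):

zdb -C storage | grep ashift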
These are the benchmarks:
root@nano:/storage# dd if=/dev/zero of=ashift9.bin bs=1M count=100000
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 97.4231 s, 1.1 GB/s

root@nano:/storage# dd if=ashift9.bin of=/dev/null bs=1M
100000+0 records in
100000+0 records out
104857600000 bytes (105 GB) copied, 42.3805 s, 2.5 GB/s
So we lose about a third of our write performance, while read performance is unaffected, probably thanks to read-ahead caching, though I'm not sure.
With ashift=9 we do lose some write performance, but at 1.1 GB/s we can still pretty much saturate 10 Gigabit Ethernet.
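If you suspect that caching is flattering the read numbers, one way to rule out the ARC is to export and re-import the pool between the write and the read, which empties the cache; a minimal sketch:

# Write the test file first, then cycle the pool to drop the ARC:
zpool export storage
zpool import storage
# This read now has to come from the disks:
dd if=/storage/ashift9.bin of=/dev/null bs=1M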
Now look what happens to the available storage capacity:
df -h now reports:

Filesystem      Size  Used Avail Use% Mounted on
storage          74T   98G   74T   1% /storage

And zfs list:

NAME      USED  AVAIL  REFER  MOUNTPOINT
storage   271K  73.9T  89.8K  /storage
Now we have a capacity of 74 TiB, so we gained about 5 TiB with ashift=9 over ashift=12, at the cost of some write performance.
So if you really care about sequential write performance, ashift=12 is the better option. If storage capacity is more important, ashift=9 seems to be the best solution for 4K drives.
The performance of ashift=9 on 4K drives is always described as 'horrible', but I think it's best to run your own benchmarks and decide for yourself.
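To make such a comparison easy to repeat, here is a minimal sketch of the whole test as a script. It assumes the same pool name and disks as above, and it DESTROYS the pool on every run, so only use it on a scratch system:

#!/bin/bash
# Compare sequential throughput and usable capacity for both ashift values.
# WARNING: destroys and recreates the pool 'storage' each iteration.
for ASHIFT in 9 12; do
    zpool create -f -o ashift=$ASHIFT storage raidz3 /dev/sd[a-x]
    dd if=/dev/zero of=/storage/test-$ASHIFT.bin bs=1M count=100000  # sequential write
    dd if=/storage/test-$ASHIFT.bin of=/dev/null bs=1M               # sequential read
    zfs list storage                                                 # usable capacity
    zpool destroy storage
done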
Caveat: I'm quite confident in the benchmark results, but I'm not 100% sure how reliable the free space reported by df -h and zfs list is.
Edit: I have added a bit of my own opinion on the results.