Some things about ZFS

Recently I stumbled across this blog entry. Its author, Marcelo Leal, wrote a ZFS internals book and blogged his thoughts about ZFS.

But his thoughts and opinions are misleading.

1. He says RAIDZ performs poorly and advises using mirrors instead of RAIDZ.
„It’s not useless, you can use it at home, even for backups in a datacenter solution, but you need to have a robust infrastructure to deal with long resilvers and really poor restore procedures. Everything is a whole stripe solves one problem and creates a bottleneck for performance and a nightmare for the resilver process (ZFS needs to traverse the whole filesystem to resiver a disk). If you care, i can give you an advice: if you want to use it, three discs in a set is the maximum you can obtain for it.“

Pardon me, but this is complete nonsense.
The bad performance of his „datacenter solution“ is not caused by ZFS but by his very cheap and slow SATA drives.
These drives are built for consumer use, not for real datacenters.
The duration of a resilver depends on the performance of the disks and the size of the filesystem (here: 2 TB SATA disks at 7200 rpm). An fsck takes its time, too. A sync executed by a volume manager reads block by block and syncs block by block.
To speed things up, use faster enterprise disks instead of consumer disks.
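
To put a number on that, here is a rough back-of-the-envelope sketch. It only divides the amount of data by a sustained throughput, so it is a lower bound at best. The two throughput figures are my own assumptions for illustration, not measurements from his system or mine.

```python
def resilver_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Lower-bound estimate: data to rebuild divided by sustained throughput."""
    data_mb = data_tb * 1_000_000  # decimal TB -> MB, the way disk vendors count
    return data_mb / throughput_mb_s / 3600  # seconds -> hours


# Assumed throughputs (hypothetical): a disk busy with scattered reads
# versus a disk that can stream mostly sequentially.
for label, mb_s in [("scattered I/O, assumed 20 MB/s", 20.0),
                    ("mostly sequential, assumed 100 MB/s", 100.0)]:
    print(f"2 TB, {label}: about {resilver_hours(2.0, mb_s):.1f} hours")
```

The point of the sketch is only that the result scales directly with what the disks can deliver: a drive that is five times faster cuts the estimate to a fifth.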

ZFS consumes somewhat more resources than other filesystems, that is true. Applications consume resources, and so do volume managers. So what?
Today's boxes usually have enough resources to handle this, so that is not the point. In return, ZFS provides better reliability.

Secondly, you do not need to buy any RAID controllers. Other vendors, like the one I recently met, boost performance through the number of disks and by putting a RAID controller on every single tray!!! Oops, this might boost performance since the RAID calculations are distributed over the trays, but every single RAID controller is a single point of failure.

His advice to use a MAXIMUM of three disks is nonsense. A RAIDZ with only two disks makes little sense 😉 so what’s the minimum?
Additionally, take into consideration that performance is much lower with only 3 disks than with 6 or 7 disks. The more disks, the better the performance, until the break-even point, where adding more disks flattens the performance.

He also forgot to mention that RAIDZ doesn’t work like a usual RAID5.
ZFS uses dynamic block sizes, so for each block ZFS calculates how many disks are used and how large the pieces must be to store the data redundantly. Take a look at my raid-z document; it has an image showing how ZFS allocates blocks and how RAIDZ differs from the common RAID levels.
So RAIDZ consumes less disk space than other RAID5 sets and, of course, mirroring.
If you want a detailed explanation of why mirroring performs better than RAIDZ in some cases, read the article „WHEN TO (AND NOT TO) USE RAID-Z“.
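
The space argument against mirroring can be sketched with simple arithmetic. This is a simplified model under my own assumptions: single-parity RAIDZ loses one disk's worth of capacity to parity, two-way mirrors lose half, and padding and metadata overhead are ignored. The function names are made up for this example.

```python
def raidz1_usable_tb(disks: int, disk_tb: float) -> float:
    """Single parity: roughly one disk's worth of space goes to parity."""
    return (disks - 1) * disk_tb


def mirror_usable_tb(disks: int, disk_tb: float) -> float:
    """Two-way mirrors: every pair of disks stores one disk's worth of data."""
    return (disks // 2) * disk_tb


for n in (3, 6):
    print(f"{n} x 2 TB: raidz1 ~{raidz1_usable_tb(n, 2.0):.0f} TB usable, "
          f"2-way mirrors ~{mirror_usable_tb(n, 2.0):.0f} TB usable")
```

With three disks, a third of the raw space goes to parity; with six, only a sixth. That is one more reason why capping a set at three disks is a strange rule.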

2. He complains about the lack of L2ARC redundancy and about performance.
„The data is on disk, we can access it from there in the case of loosing the SSD data. No, no, we can not get it from there! ZFS loves cheap discs, our discs is 7200 SATA discs… they do know how to store data, but do not know how to read it. These 7200 SATA drives should have a banner saying: Pay to write it and pray for read it.“

To compensate for the bad performance of SATA disks, the system needs a cache (usually NVRAM) for reads and writes. This is why people now use separate devices like SSDs.

And what is he talking about?
Read performance or write performance?

If data is written to an SSD used as a separate ZIL device, the data is on stable storage! Period.

If data is read, then it has already been written to stable storage and is now cached, wherever that is (RAM, SSD, or disk).
If you use slow disks, you get slow performance unless you read from faster devices like RAM or SSD.

It is true that the L2ARC needs a warm-up, since the L2ARC is the second-level cache, which means:
1. the data is cached in the ARC, and when data is evicted by ZFS it is
2. written to the L2ARC, for instance on an SSD. So there must be read requests and a certain amount of „load“ to fill up the L2ARC.
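
The warm-up effect described in the two steps above can be shown with a toy model. This is a deliberately simplified sketch, not how ZFS implements it: both tiers are plain LRU lists here (the real ARC is adaptive, and the real L2ARC is filled by a background feed), and the class name and sizes are hypothetical.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: a small LRU 'ARC' that spills evicted blocks into an LRU 'L2ARC'."""

    def __init__(self, arc_size: int, l2_size: int) -> None:
        self.arc: OrderedDict = OrderedDict()
        self.l2: OrderedDict = OrderedDict()
        self.arc_size = arc_size
        self.l2_size = l2_size

    def read(self, block: int) -> str:
        """Return which tier served the read: 'arc', 'l2arc' or 'disk'."""
        if block in self.arc:
            self.arc.move_to_end(block)  # refresh its LRU position
            return "arc"
        source = "l2arc" if block in self.l2 else "disk"
        self._insert_arc(block)
        return source

    def _insert_arc(self, block: int) -> None:
        self.arc[block] = True
        if len(self.arc) > self.arc_size:
            victim, _ = self.arc.popitem(last=False)  # evict the oldest ARC entry
            self.l2[victim] = True                    # ...and keep it in the second tier
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)           # second tier is bounded as well


cache = TwoLevelCache(arc_size=2, l2_size=4)
first_pass = [cache.read(b) for b in range(6)]   # cold caches
second_pass = [cache.read(b) for b in range(6)]  # same blocks again
print("first pass: ", first_pass)
print("second pass:", second_pass)
```

In this toy run the first pass over six blocks is served entirely from disk, because nothing has been evicted into the second tier yet. Only the second pass gets its blocks from there. Without read load there is nothing to evict, and so nothing to fill the L2ARC with.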

How the ARC and L2ARC work is described in my L2ARC document.
If you would like to read more about separate cache devices, take my advice and read this, written by experts.

BTW, using SATA disks with 7200 rpm is not a good way to speed up performance, neither for read nor for write requests. It is just 7200 rpm instead of 15000 rpm.
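
The difference between the two spindle speeds is easy to quantify for one component of access time. The sketch below only computes the average rotational latency (half a revolution); seek time and transfer time are left out, and the function name is my own.

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """On average the platter has to turn half a revolution to reach the sector."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2


for rpm in (7200, 15000):
    print(f"{rpm} rpm: about {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

So on every random access the 7200 rpm disk waits roughly twice as long for the platter as the 15000 rpm disk, before any seek time is counted.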

My advice: do not buy cheap. Buy faster, reliable disks, use a good storage architecture, and everything speeds up. 🙂

3. He talks about fragmentation caused by incremental snapshots in ZFS.
This is not a problem particular to ZFS.
Every filesystem suffers from fragmentation over time.

4. He writes about some other „myths“.
If you don’t want to read about myths, read the ZFS Best Practices Guide, written by ZFS experts.
