Some things about ZFS

Recently I stumbled across this blog entry. This guy, Marcelo Leal, wrote a ZFS internals book and blogged his thoughts about ZFS.

But his thoughts and opinions are misleading.

1. He says RAIDZ performs poorly and advises using mirrors instead of RAIDZ.
It’s not useless, you can use it at home, even for backups in a datacenter solution, but you need to have a robust infrastructure to deal with long resilvers and really poor restore procedures. Everything is a whole stripe solves one problem and creates a bottleneck for performance and a nightmare for the resilver process (ZFS needs to traverse the whole filesystem to resiver a disk). If you care, i can give you an advice: if you want to use it, three discs in a set is the maximum you can obtain for it.

Pardon, but this is complete nonsense.
The bad performance of his „datacenter solution“ is not caused by ZFS but by his very cheap and slow SATA drives.
These drives are made for consumer use, not for real datacenters.
The duration of a resilver depends on the performance of the disks and the size of the filesystem (here: 2 TB SATA disks at 7200 rpm). An fsck also takes its time. A sync executed by a volume manager reads block by block and syncs block by block.
So to speed things up, use faster enterprise disks instead of consumer disks.
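
To get a feeling for how much the disk speed matters here, a back-of-the-envelope estimate helps. The following Python sketch simply divides the amount of data by a sustained throughput. The function name and the throughput figures are assumptions for illustration, not measurements, and a real resilver on a busy or fragmented pool will take longer than this lower bound.

```python
def resilver_hours(data_tb: float, throughput_mb_s: float) -> float:
    """Lower bound in hours for rewriting data_tb terabytes at a sustained rate."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    data_mb = data_tb * 1_000_000  # decimal units: 1 TB = 1,000,000 MB
    return data_mb / throughput_mb_s / 3600  # seconds -> hours


# Both throughput values are illustrative assumptions, not benchmarks.
for label, mb_s in [("slower disk, assumed 50 MB/s", 50.0),
                    ("faster disk, assumed 150 MB/s", 150.0)]:
    print(f"{label}: {resilver_hours(2.0, mb_s):.1f} h for 2 TB")
```

With these assumed rates a full 2 TB disk needs roughly 11 hours on the slower drive and under 4 hours on the faster one, so the disk, not the filesystem, sets the floor.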

ZFS consumes somewhat more resources than other filesystems, that is true. Applications consume resources, and volume managers do, too. So what?
Usually today's boxes have enough resources to handle this, so that is not the point. In return ZFS provides better reliability.

Secondly, you do not need to buy any RAID controllers. Other vendors, like the one I recently met, boost performance by the number of disks and by having RAID controllers on every single tray!!! Oops, this might boost performance, since the calculation of the RAID algorithms is distributed over the trays, but every single RAID controller is a single point of failure.

His advice to use a MAXIMUM of three disks is nonsense. You would not build a RAIDZ out of two discs 😉 so what's the minimum?
Additionally, take into consideration that performance is much slower with only 3 disks than with 6 or 7 disks. The more disks, the better the performance, until the break-even point, where too many disks flatten the performance.

Then he forgot to mention that RAIDZ doesn't work like a usual RAID5.
ZFS uses dynamic block sizes, so ZFS calculates how many disks are used and the size of the blocks needed to store the data redundantly. Take a look at my raid-z document and you will see an image of how ZFS allocates blocks and how RAIDZ differs from the common RAID levels.
So RAIDZ consumes less disk space than other RAID5 sets and, of course, than mirroring.
If you want a detailed explanation of why mirroring performs better than RAIDZ in some cases, read the article „WHEN TO (AND NOT TO) USE RAID-Z“.
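
The space argument against mirroring can be sketched with some simple arithmetic. This is a rough model only: it assumes equally sized disks and ignores metadata, padding and the extra parity overhead of small blocks, and the function names are made up for this example.

```python
def raidz_usable_tb(disks: int, disk_tb: float, parity: int = 1) -> float:
    """Approximate usable space of one RAIDZ group: all disks minus the parity share."""
    if disks <= parity:
        raise ValueError("need more disks than parity devices")
    return (disks - parity) * disk_tb


def mirror_usable_tb(disks: int, disk_tb: float, copies: int = 2) -> float:
    """Approximate usable space when the disks are grouped into mirrors of `copies` disks."""
    if disks < copies or disks % copies != 0:
        raise ValueError("disk count must be a positive multiple of copies")
    return (disks // copies) * disk_tb


# Compare both layouts for a few disk counts, using 2 TB disks as in the text.
for n in (4, 6, 8):
    print(f"{n} disks: raidz1 ~{raidz_usable_tb(n, 2.0):.0f} TB, "
          f"2-way mirrors ~{mirror_usable_tb(n, 2.0):.0f} TB")
```

In this simplified model six 2 TB disks give about 10 TB as a single-parity RAIDZ but only 6 TB as three 2-way mirrors, which is the price mirrors pay for their better random read performance.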

2. He complains about the lack of L2ARC redundancy and performance.
The data is on disk, we can access it from there in the case of loosing the SSD data. No, no, we can not get it from there! ZFS loves cheap discs, our discs is 7200 SATA discs… they do know how to store data, but do not know how to read it. These 7200 SATA drives should have a banner saying: Pay to write it and pray for read it.

To compensate for the bad performance of SATA disks, the system needs cache (usually NVRAM) for reads and writes. This is why people now use separate devices like SSDs.

And what is he talking about?
Read performance or write performance?

If data is written to an SSD used as a separate ZIL device, the data is on stable storage! Period.

If data is read, then it has already been written to stable storage and is now cached, wherever that is (RAM, SSD or disk).
If you use slow disks, you get slow performance unless you read from faster devices like RAM or SSD.

It is true, the L2ARC needs a warm-up, since the L2ARC is the second-level cache, which means:
1. the data is cached in ARC, and when data is evicted by ZFS it is
2. written to L2ARC, for instance on SSD. So there must be read requests and some „load“ to fill up the L2ARC.
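
The warm-up effect described in these two steps can be shown with a toy model. The sketch below is a deliberately simplified illustration in Python, not how ZFS is implemented: it uses plain LRU for both levels (the real ARC is more sophisticated), and the class name and cache sizes are assumptions chosen for the example.

```python
from collections import OrderedDict


class TwoLevelCache:
    """Toy model: a small first-level cache whose evictions feed a second level."""

    def __init__(self, arc_size: int, l2arc_size: int) -> None:
        self.arc_size = arc_size
        self.l2arc_size = l2arc_size
        self.arc = OrderedDict()    # stands in for RAM
        self.l2arc = OrderedDict()  # stands in for the SSD cache device

    def read(self, block: int) -> str:
        """Return where the block was served from: 'arc', 'l2arc' or 'disk'."""
        if block in self.arc:
            self.arc.move_to_end(block)  # mark as most recently used
            return "arc"
        source = "l2arc" if block in self.l2arc else "disk"
        self._insert_into_arc(block)
        return source

    def _insert_into_arc(self, block: int) -> None:
        self.arc[block] = True
        if len(self.arc) > self.arc_size:
            evicted, _ = self.arc.popitem(last=False)  # drop least recently used
            self.l2arc[evicted] = True                 # eviction fills the second level
            if len(self.l2arc) > self.l2arc_size:
                self.l2arc.popitem(last=False)


cache = TwoLevelCache(arc_size=2, l2arc_size=4)
first_pass = [cache.read(b) for b in (1, 2, 3)]   # cold: nothing is cached yet
second_pass = [cache.read(b) for b in (1, 2, 3)]  # warm: evicted blocks sit in L2
print(first_pass, second_pass)
```

On the first pass every read has to go to disk. Only after the working set has pushed blocks out of the first level can the second level answer anything, which is exactly why an empty L2ARC needs load before it helps.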

How the ARC and L2ARC work is described in my L2ARC document.
If you would like to read more about separate cache devices, take my advice and read this, written by experts.

BTW, using SATA discs with 7200 rpm is not a good way to speed up performance, neither for read nor for write requests. It is just 7200 rpm instead of 15000 rpm.
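
What the difference in rpm means per request can be worked out directly: on average the platter has to turn half a revolution before the wanted sector passes under the head. The small Python sketch below computes only this rotational part; seek time and transfer time come on top, and the function name is just chosen for this example.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: the time for half a revolution."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    ms_per_revolution = 60_000 / rpm  # 60,000 ms per minute
    return ms_per_revolution / 2


for rpm in (7200, 15000):
    print(f"{rpm} rpm: {avg_rotational_latency_ms(rpm):.2f} ms average rotational latency")
```

That is about 4.17 ms at 7200 rpm against 2.00 ms at 15000 rpm, so the slower disk waits more than twice as long on every random access before any data moves at all.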

My advice: do not buy cheap, buy faster, reliable disks, use a good storage architecture, and everything speeds up. 🙂

Third, he talks about fragmentation caused by incremental snapshots in ZFS.
This is not a problem particular to ZFS.
Every filesystem suffers from fragmentation over time.

And fourth, he writes about some other „myths“.
If you don't want to read about myths, read the ZFS Best Practices Guide, written by ZFS experts.

One year has already passed since Sun officially ceased to exist

This article makes one very sad, but unfortunately it also contains a great deal of truth.

„Now it is up to the communities around distributions such as OpenIndiana/Illumos, Nexenta Core Platform, BeleniX or SchilliX to show whether and how the "open" Solaris system moves forward.“

Anyone who wants to run a modern, high-performance and enterprise-ready Unix on their systems should, as a customer, not forget IllumOS and the Nexenta products!!
Long live Solaris …. even without Oracle.

The last days as Sunnie

I am going to leave Sun/Oracle on New Year's Eve. It is just the right time now to leave and to say bye-bye to Sun.

I would like to say Thank You to all my colleagues! It was an amazing time and always fun to work with you great people.

I would like to thank all my customers and partners!! It was a pleasure to meet you all.

So, Thank You, Thank you, Thank you to you all!! You and you and you ….. and you , too.

I will miss good ol‘ Sun.

Time to move on…..

If you like live music and would like to meet me, it would be a great pleasure to welcome you on the 17th of December at 21.00 h at the Chesters-Inn in Berlin, Glogauerstr. 2, Berlin-Kreuzberg.

My colleagues and I will rock'n'roll. We are „Lager14“.



New book on Oracle Solaris 10 System Virtualization

The authors Jeff Victor, Jeff Savit, Gary Combs, Simon Hayler and Bob Netherton have written a book about Oracle's virtualization technologies.

Here is a description of the contents (stolen from the Amazon page ;o) ):

Oracle® Solaris™ 10 System Virtualization Essentials provides an accessible introduction to computer virtualization, specifically the system virtualization technologies that use the Oracle Solaris or OpenSolaris operating systems. This accessible guide covers the key concepts system administrators need to understand and explains how to

  • Use Dynamic Domains to maximize workload isolation on Sun SPARC systems
  • Use Oracle VM Server for SPARC to deploy different Oracle Solaris 10 and OpenSolaris environments on SPARC CMT (chip multithreading) systems
  • Use Oracle VM Server for x86 or xVM hypervisor to deploy a server with heterogeneous operating systems
  • Use Oracle VM VirtualBox to develop and test software in heterogeneous environments
  • Use Oracle Solaris Containers to maximize efficiency and scalability of workloads
  • Use Oracle Solaris Containers to migrate Solaris 8 and Solaris 9 workloads to new hardware systems
  • Mix virtualization technologies to maximize workload density

Oh, and in case you did not know it yet, here is another book tip:

Solaris 10 ZFS Essentials (Solaris System Administration) by Scott Watanabe.