In the commands below, we first give ourselves access to the apt-add-repository command, which makes it much simpler to safely add PPAs to our sources list. Then we add the PPA, update our sources list to reflect that, and install the package itself.
sudo apt-get update
sudo apt-get install python-software-properties
sudo apt-add-repository ppa:zfs-native/stable
sudo apt-get update
sudo apt-get install ubuntu-zfs
Load the ZFS module:
sudo modprobe zfs
Ubuntu 16.04 LTS, on the other hand, comes with built-in support for ZFS, so there it's just a matter of installing the packages and loading the module:
sudo apt-get install zfsutils-linux zfs-initramfs
sudo modprobe zfs
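Whichever route you took, a quick check like the one below should confirm the ZFS module is actually loaded (the version string will of course vary):
lsmod | grep zfs
cat /sys/module/zfs/version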
First, get a listing of all the disk device names you will be using by running this command:
sudo fdisk -l | more
You should get a listing like below:
Disk /dev/sda: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdb: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdd: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sde: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdf: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders, total 7814037168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/sdg: 32.0 GB, 32017047552 bytes
255 heads, 63 sectors/track, 3892 cylinders, total 62533296 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0005ae32
In my particular case, I will be using all of the 4000.8 GB drives in my zpool, so I will be using the following devices:
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf
I will not be using the 32GB /dev/sdg since it's my boot drive. Now that we have the device names, let's get a listing of all the drives in the system by WWN ID. This is the preferred way of adding drives to your zpool, just in case the /dev/sdX assignments ever change in your system. Additionally, the WWN ID is usually printed on the actual drive itself, so if you ever have to replace one, you will know exactly which drive it is:
ls -l /dev/disk/by-id
You should get a listing like below:
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15O8K1NGFSBA -> ../../sda
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15PDKBNIFSAA -> ../../sde
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15Q1KCFMFSAA -> ../../sdf
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15Q2KETKFSAA -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15Q2KETLFSAA -> ../../sdd
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TOSHIBA_MD04ACA400_15Q3KFGKFSAA -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-TSSTcorp_DVD+_-RW_TS-H653B -> ../../sr0
lrwxrwxrwx 1 root root  9 Jun  8 10:48 ata-V4-CT032V4SSD2_200118513 -> ../../sdg
lrwxrwxrwx 1 root root 10 Jun  8 10:48 ata-V4-CT032V4SSD2_200118513-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Jun  8 10:48 ata-V4-CT032V4SSD2_200118513-part2 -> ../../sdg2
lrwxrwxrwx 1 root root 10 Jun  8 10:48 ata-V4-CT032V4SSD2_200118513-part5 -> ../../sdg5
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960b704511 -> ../../sdf
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960b784775 -> ../../sdc
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960b784776 -> ../../sdd
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960b804868 -> ../../sdb
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960ba809f1 -> ../../sda
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500003960bd03569 -> ../../sde
lrwxrwxrwx 1 root root  9 Jun  8 10:48 wwn-0x500a07560bed90f1 -> ../../sdg
lrwxrwxrwx 1 root root 10 Jun  8 10:48 wwn-0x500a07560bed90f1-part1 -> ../../sdg1
lrwxrwxrwx 1 root root 10 Jun  8 10:48 wwn-0x500a07560bed90f1-part2 -> ../../sdg2
lrwxrwxrwx 1 root root 10 Jun  8 10:48 wwn-0x500a07560bed90f1-part5 -> ../../sdg5
Now we match the device names to the corresponding WWN IDs. In my particular case, I will be using the following WWN IDs:
sda --> wwn-0x500003960ba809f1
sdb --> wwn-0x500003960b804868
sdc --> wwn-0x500003960b784775
sdd --> wwn-0x500003960b784776
sde --> wwn-0x500003960bd03569
sdf --> wwn-0x500003960b704511
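If you have a lot of drives, a quick filter on the same listing (just the WWN links, partitions excluded) makes the matching easier to read:
ls -l /dev/disk/by-id/ | grep wwn- | grep -v part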
We will be creating a RAID6 ZFS pool. I prefer RAID6 over RAID5 since it has more resiliency than RAID5: it can withstand two drive failures before the array goes down. Just for reference, the following RAID levels can be created: striped (RAID0 equivalent), mirror (RAID1 equivalent), raidz (single parity, RAID5 equivalent), raidz2 (double parity, RAID6 equivalent) and raidz3 (triple parity).
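Just as a reference sketch, the create commands for the other levels look like the lines below; the pool name tank and the diskA/diskB/diskC device names are placeholders, not the drives used in this guide:
# sudo zpool create tank /dev/disk/by-id/diskA /dev/disk/by-id/diskB                              # striped, no redundancy (RAID0)
# sudo zpool create tank mirror /dev/disk/by-id/diskA /dev/disk/by-id/diskB                       # mirror (RAID1)
# sudo zpool create tank raidz /dev/disk/by-id/diskA /dev/disk/by-id/diskB /dev/disk/by-id/diskC  # single parity (RAID5)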
Let's create the RAID6 (raidz2) ZFS pool named array1 using 4K sectors (-o ashift=12) instead of the default 512 bytes:
sudo zpool create -o ashift=12 -f array1 raidz2 /dev/disk/by-id/wwn-0x500003960ba809f1 /dev/disk/by-id/wwn-0x500003960b804868 /dev/disk/by-id/wwn-0x500003960b784775 /dev/disk/by-id/wwn-0x500003960b784776 /dev/disk/by-id/wwn-0x500003960bd03569 /dev/disk/by-id/wwn-0x500003960b704511
Check the newly created zpool:
sudo zpool status
should output the following:
  pool: array1
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        array1                      ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x500003960ba809f1  ONLINE       0     0     0
            wwn-0x500003960b804868  ONLINE       0     0     0
            wwn-0x500003960b784775  ONLINE       0     0     0
            wwn-0x500003960b784776  ONLINE       0     0     0
            wwn-0x500003960bd03569  ONLINE       0     0     0
            wwn-0x500003960b704511  ONLINE       0     0     0
Show the zpool listing:
sudo zpool list
The command above will output the raw capacity of the zpool, NOT the usable capacity, since two of our drives are taken up by parity:
NAME     SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
array1  21.8T   756K  21.7T         -     0%     0%  1.00x  ONLINE  -
Running the zfs list command
sudo zfs list
will output the usable capacity of the zpool:
NAME     USED  AVAIL  REFER  MOUNTPOINT
array1   480K  14.3T   192K  /array1
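The difference between the two numbers is just the parity overhead, roughly:
# Raw capacity:    6 drives x ~3.6 TiB              = ~21.8 TiB  (what zpool list reports)
# Usable capacity: (6 - 2 parity drives) x ~3.6 TiB = ~14.5 TiB  (what zfs list reports, less a bit of metadata overhead)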
This cannot be stressed enough: if you intend on turning off the ZIL, you absolutely must have a UPS battery backup that will gracefully shut down your server when the battery runs out. If you don't, you WILL lose data!!
If you intend to use your ZFS pool to store virtual machines or databases, you should not turn off the ZIL, but instead use an SSD for the SLOG to boost performance (explained below).
If you intend to use your ZFS pool for NFS, which issues sync writes by default, then you should turn OFF the ZIL. What if you want to store virtual machines on NFS then? Then you simply set the async flag for your NFS share.
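For example, a hypothetical /etc/exports entry with the async flag could look like this (the share path and client subnet are placeholders for whatever you actually export):
/array1/nfsshare 192.168.1.0/24(rw,async,no_subtree_check)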
Turn off ZIL support for synchronous writes with the following command:
sudo zfs set sync=disabled array1
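To verify the setting took, or to go back to the default behavior later:
sudo zfs get sync array1
sudo zfs set sync=standard array1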
Let's create some filesystems on the newly created ZFS pool. In ZFS, filesystems look like folders under the ZFS pool. We could simply create folders, but then we would lose the ability to create snapshots or set properties such as compression, deduplication, quotas, etc.
In my particular case, I need some of the ZFS pool for an iSCSI target, so I'm going to create an iscsi filesystem:
sudo zfs create array1/iscsi
Running df -h will output the following. Notice the array1/iscsi filesystem that was created:
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdg1        14G  2.3G   11G  18% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
udev            7.9G  4.0K  7.9G   1% /dev
tmpfs           1.6G  684K  1.6G   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            7.9G     0  7.9G   0% /run/shm
none            100M     0  100M   0% /run/user
array1           15T  128K   15T   1% /array1
array1/iscsi     15T  128K   15T   1% /array1/iscsi
You can create as many filesystems as you need in the ZFS pool and set properties.
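For example (these names are purely hypothetical, use whatever fits your setup):
sudo zfs create array1/backups
sudo zfs create array1/media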
If I want to enable compression on the newly created filesystem, I would issue the following command:
sudo zfs set compression=on array1/iscsi
To turn off compression use the following command:
sudo zfs set compression=off array1/iscsi
Important note: simply setting compression=on defaults the compression algorithm to lzjb. It's recommended to use the lz4 algorithm instead.
That is easily set by issuing the following command:
sudo zfs set compression=lz4 array1/iscsi
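To confirm which algorithm is in effect and how well it is compressing your data, check the compression and compressratio properties:
sudo zfs get compression,compressratio array1/iscsi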
If I wanted to set a quota, I would issue the following command:
sudo zfs set quota=200G array1/iscsi
To remove the quota, use the following command:
sudo zfs set quota=none array1/iscsi
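You can check the current quota (or confirm it was removed) with:
sudo zfs get quota array1/iscsi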
If, for some reason, you ever want to destroy your zpool, you can issue the following command, which will force it:
sudo zpool destroy -f array1
If you happen to have an SSD drive, you can utilize it as a cache drive (L2ARC cache) for your Zpool. The idea behind it is that data served from the SSD has significantly faster access times than data read from traditional spinning disks. So, for instance, if you were to add a 250GB SSD drive, then up to 250GB of the most frequently accessed data will be kept in the cache. Keep in mind that the L2ARC is a read cache: it only holds copies of data that already lives on the spinning disks, so if the cache device dies or loses its contents after a power failure, you lose some performance while the cache warms back up, not data.
Identify your SSD drive's WWN ID as described above. Assuming the WWN ID for your SSD drive is wwn-0x50025388500f8522, we'll add it to our previously created Zpool like below:
sudo zpool add -f array1 cache /dev/disk/by-id/wwn-0x50025388500f8522
Check the Zpool status:
sudo zpool status
  pool: array1
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        array1                      ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x500003960ba809f1  ONLINE       0     0     0
            wwn-0x500003960b804868  ONLINE       0     0     0
            wwn-0x500003960b784775  ONLINE       0     0     0
            wwn-0x500003960b784776  ONLINE       0     0     0
            wwn-0x500003960bd03569  ONLINE       0     0     0
            wwn-0x500003960b704511  ONLINE       0     0     0
        cache
          wwn-0x50025388500f8522    ONLINE       0     0     0
As you can see the drive has been added as cache.
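To see how much the cache device is actually being used, zpool iostat -v breaks the statistics out per device (here refreshing every 5 seconds):
sudo zpool iostat -v array1 5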
ZIL (ZFS Intent Log) drives can be added to a ZFS pool to speed up the write capabilities of any level of ZFS RAID. The intent log records incoming synchronous writes on a very fast SSD drive so they can be acknowledged immediately, which increases the write throughput of the system. When the physical spindles have a moment, that data is then flushed to the spinning media and the process starts over. We have observed significant performance increases by adding ZIL drives to our ZFS configuration. One thing to keep in mind is that the ZIL should be mirrored to protect the speed of the ZFS system. If the ZIL is not mirrored and the drive being used as the ZIL drive fails, the system will revert to writing the data directly to the disks, severely hampering performance. Alternatively, you can always remove the bad drive and add another one as a ZIL drive.
If the ZIL drive fails, you will lose at most a few seconds of data. If that's acceptable to you, then a mirror is not necessary. If you are going to be storing MISSION CRITICAL data where even a few seconds of lost data will cost significant sums of money, adding ZIL drives in a mirror configuration is a MUST!
If you are going to be using two SSD drives in mirror mode, identify the SSD drives by wwn ID as described above and then add them to your array in mirror mode like below:
sudo zpool add -f array1 log mirror /dev/disk/by-id/wwn-0x50025388500f8668 /dev/disk/by-id/wwn-0x50025388500ffg12
If you are going to be using a single SSD drive, identify the SSD drive by wwn ID as described above and then add it to your array like below:
sudo zpool add -f array1 log /dev/disk/by-id/wwn-0x50025388500f5af8
Check the Zpool status:
sudo zpool status
  pool: array1
 state: ONLINE
  scan: none requested
config:

        NAME                        STATE     READ WRITE CKSUM
        array1                      ONLINE       0     0     0
          raidz2-0                  ONLINE       0     0     0
            wwn-0x500003960ba809f1  ONLINE       0     0     0
            wwn-0x500003960b804868  ONLINE       0     0     0
            wwn-0x500003960b784775  ONLINE       0     0     0
            wwn-0x500003960b784776  ONLINE       0     0     0
            wwn-0x500003960bd03569  ONLINE       0     0     0
            wwn-0x500003960b704511  ONLINE       0     0     0
        logs
          wwn-0x50025388500f5af8    ONLINE       0     0     0
        cache
          wwn-0x50025388500f8522    ONLINE       0     0     0
As you can see it has been added as a ZIL drive.
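If you ever need to take the log or cache device back out, zpool remove supports that for these device types (shown here with the same example IDs used above):
sudo zpool remove array1 wwn-0x50025388500f5af8
sudo zpool remove array1 wwn-0x50025388500f8522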
An issue I've run into is that after a reboot, if you do a zpool status, your zpool may show the device names instead of the device IDs, like below:
sudo zpool status
  pool: array1
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        array1        ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            sdb       ONLINE       0     0     0
            sdd       ONLINE       0     0     0
          mirror-1    ONLINE       0     0     0
            sdc       ONLINE       0     0     0
            sdg       ONLINE       0     0     0
          mirror-2    ONLINE       0     0     0
            sde       ONLINE       0     0     0
            sdi       ONLINE       0     0     0
        logs
          sdh         ONLINE       0     0     0
        cache
          sdf         ONLINE       0     0     0

errors: No known data errors
This only seems to be a cosmetic issue because issuing the zdb command shows the device IDs like it's supposed to:
sudo zdb
array1:
    version: 5000
    name: 'array1'
    state: 0
    txg: 200
    pool_guid: 12136950353410592998
    errata: 0
    hostid: 2831217162
    hostname: 'nas3'
    vdev_children: 4
    vdev_tree:
        type: 'root'
        id: 0
        guid: 12136950353410592998
        children[0]:
            type: 'mirror'
            id: 0
            guid: 7548278309220334221
            metaslab_array: 39
            metaslab_shift: 35
            ashift: 12
            asize: 4000771997696
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 2562845451665823060
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E5N8KZ7N-part1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 291777340882840666
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4EDYVYU2J-part1'
                whole_disk: 1
                create_txg: 4
        children[1]:
            type: 'mirror'
            id: 1
            guid: 8578547322301695916
            metaslab_array: 37
            metaslab_shift: 35
            ashift: 12
            asize: 4000771997696
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 2041375668167635066
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4ENPN3V47-part1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 15162795176142751617
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E6YRLCHH-part1'
                whole_disk: 1
                create_txg: 4
        children[2]:
            type: 'mirror'
            id: 2
            guid: 302043060234775242
            metaslab_array: 35
            metaslab_shift: 35
            ashift: 12
            asize: 4000771997696
            is_log: 0
            create_txg: 4
            children[0]:
                type: 'disk'
                id: 0
                guid: 5285723468079384932
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4EDYVY6HT-part1'
                whole_disk: 1
                create_txg: 4
            children[1]:
                type: 'disk'
                id: 1
                guid: 5203540854438335529
                path: '/dev/disk/by-id/ata-WDC_WD40EFRX-68WT0N0_WD-WCC4E1VUAYCJ-part1'
                whole_disk: 1
                create_txg: 4
        children[3]:
            type: 'disk'
            id: 3
            guid: 1510858814325079212
            path: '/dev/disk/by-id/ata-Samsung_SSD_840_EVO_250GB_S1DDNEAF407950E-part1'
            whole_disk: 1
            metaslab_array: 49
            metaslab_shift: 31
            ashift: 13
            asize: 250045005824
            is_log: 1
            create_txg: 62
    features_for_read:
        com.delphix:hole_birth
        com.delphix:embedded_data
You MAY be able to fix the issue by issuing the following commands:
sudo zpool export array1
sudo zpool import -d /dev/disk/by-id/ array1
sudo zpool set cachefile= array1
sudo update-initramfs -k all -u
Reboot the machine and do a zpool status. Again, this is only a cosmetic issue and it shouldn't affect anything.
The following script will run every hour to check every Zpool's status and it will notify you in case a Zpool encounters a problem such as a failed drive.
First of all, install the mailutils package if it's not already installed:
sudo apt-get install mailutils
Next, create a script in /etc/cron.hourly/ named zpoolstatus
sudo vi /etc/cron.hourly/zpoolstatus
Paste the following, adjust someone@domain.tld to the email address you want the notifications sent to, and save the file:
#!/bin/bash

# Email address that will receive the notifications
EMAIL_ADD=someone@domain.tld

# "zpool status -x" prints "all pools are healthy" when everything is fine
zpool status -x | grep -q 'all pools are healthy'
if [ $? -ne 0 ]; then
    /bin/date > /tmp/zfs.stat
    echo >> /tmp/zfs.stat
    /bin/hostname >> /tmp/zfs.stat
    echo >> /tmp/zfs.stat
    /sbin/zpool status -x >> /tmp/zfs.stat
    cat /tmp/zfs.stat | /usr/bin/mail -s "Disk failure in server : `hostname`" $EMAIL_ADD
fi
Make the file executable:
sudo chmod +x /etc/cron.hourly/zpoolstatus
Verify that the file will run every hour by running the following command:
sudo run-parts --report --test /etc/cron.hourly
This should give you the following output; if it's blank, check your script again. Also ensure the script does not have a .sh extension on it, or run-parts will not execute it:
/etc/cron.hourly/zpoolstatus
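You can also run the script once by hand and send yourself a quick test message to make sure mail delivery actually works (adjust the address):
sudo /etc/cron.hourly/zpoolstatus
echo "mail test from `hostname`" | mail -s "ZFS notification test" someone@domain.tld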
That's it!