This document is a WORK IN PROGRESS.
This is just a quick personal cheat sheet: treat its contents with caution!


ZFS

ZFS is a combined file system and logical volume manager (designed by Sun Microsystems). It is scalable and offers extensive protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of file system and volume management, snapshots and copy-on-write (CoW) clones, continuous integrity checking with automatic repair, RAID-Z, and native NFSv4 ACLs, and it can be configured very precisely.


Install

TODO
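
Until this section is written, a minimal sketch for a couple of common distributions (the package names are assumptions based on the usual repositories; check your distribution's documentation, since ZFS often ships as an out-of-tree kernel module):

    # apt install zfsutils-linux   # Debian/Ubuntu

    # apk add zfs                  # Alpine (plus the zfs module package matching your kernel)

    # xbps-install -S zfs          # Void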


Config

TODO
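
Until this section is written, a minimal sketch of creating and tuning a pool (the pool name tank, the raidz2 layout and the /dev/disk/by-id/... names are hypothetical; pick an ashift matching your disks' physical sector size):

    $ zpool create -o ashift=12 tank raidz2 \
          /dev/disk/by-id/diskA /dev/disk/by-id/diskB \
          /dev/disk/by-id/diskC /dev/disk/by-id/diskD

    $ zfs set compression=lz4 tank   # enable cheap transparent compression pool-wide

    $ zfs create tank/data           # datasets inherit properties from their parent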


Use

TODO

  • Show disk space utilization info:

    $ zfs list
    

  • Show ZFS storage pool debugging and consistency information:

    $ zdb
    

  • Show all properties for $POOLNAME or $DATASET_NAME:

    $ zfs get all $POOLNAME
    $ zfs get all $DATASET_NAME
    

  • Check zpool status of all pools with extra verbose information:

    $ zpool status -v
    

  • Check zpool status of specific pool $POOLNAME with extra verbose information:

    $ zpool status -v $POOLNAME
    

  • Show verbose pool capacity and usage statistics, broken down per vdev:

    $ zpool list -v
    

  • Show verbose IO statistics for all pools:

    $ zpool iostat -v
    

  • Show verbose IO statistics for a specific pool $POOLNAME:

    $ zpool iostat -v $POOLNAME
    

  • Show useful and advanced information on how ZFS's ARC cache is being used:

    $ arcstat
    
    $ arc_summary
    

  • Show the serial number of a disk (e.g. /dev/sdx):

    $ smartctl -a /dev/sdx | grep Serial
    

How to replace a disk

  • Get the name of the disk you want to replace:
$ zpool status -v $POOLNAME

    pool: <POOLNAME>
   state: ONLINE
  status: One or more devices are configured to use a non-native block size.
          Expect reduced performance.
  action: Replace affected devices with devices that support the
          configured block size, or migrate data to a properly configured
          pool.
    scan: resilvered 186G in 1 days 17:17:59 with 0 errors on Wed Sep 11 04:10:14 2024
  config:

          NAME                                STATE     READ WRITE CKSUM
          <POOLNAME>                          ONLINE       0     0     0
            raidz2-0                          ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:3:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:4:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:5:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:6:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:7:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:8:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:9:0   ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:10:0  ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:11:0  ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:12:0  ONLINE       0     0     0  block size: 512B configured, 4096B native
              pci-0000:03:00.0-scsi-0:0:13:0  ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:14:0  ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:15:0  ONLINE       0     0     0

  errors: No known data errors

E.g. let's say pci-0000:03:00.0-scsi-0:0:12:0 has to be replaced.

  • Find the path to that disk:
$ zdb | grep "pci-0000:03:00.0-scsi-0:0:12:0"

  path: '/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0-part1'
  phys_path: 'pci-0000:03:00.0-scsi-0:0:12:0'
  • Put the disk offline:
$ zpool offline $POOLNAME /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0

Note that if you put the wrong disk offline, you can bring it back online with:

$ zpool online $POOLNAME /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0

  • Make sure that the last command took effect before moving on:
$ zpool status -v $POOLNAME

  pool: <POOLNAME>
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 186G in 1 days 17:17:59 with 0 errors on Wed Sep 11 04:10:14 2024
config:

        NAME                                STATE     READ WRITE CKSUM
        <POOLNAME>                          DEGRADED     0     0     0
          raidz2-0                          DEGRADED     0     0     0
            pci-0000:03:00.0-scsi-0:0:3:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:4:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:5:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:6:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:7:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:8:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:9:0   ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:10:0  ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:11:0  ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:12:0  OFFLINE      0     0     0  block size: 512B configured, 4096B native
            pci-0000:03:00.0-scsi-0:0:13:0  ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:14:0  ONLINE       0     0     0
            pci-0000:03:00.0-scsi-0:0:15:0  ONLINE       0     0     0

errors: No known data errors
  • Locate the disk to remove, e.g. by blinking its LED with ledctl:
$ ledctl locate=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0

  • Physically remove the disk, and replace it with a new one.

  • You can stop the LED blinking, e.g. with ledctl:
$ ledctl off=/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0

  • Tell ZFS to replace/resilver the disk:
$ zpool replace $POOLNAME /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:12:0

  • Now we wait! You can keep an eye on the status with the below command:
$ watch zpool status -v $POOLNAME

    pool: <POOLNAME>
   state: DEGRADED
  status: One or more devices is currently being resilvered.  The pool will
          continue to function, possibly in a degraded state.
  action: Wait for the resilver to complete.
    scan: resilver in progress since Wed Oct  9 09:47:36 2024
          590G scanned at 18.4G/s, 7.84M issued at 251K/s, 2.46T total
          0B resilvered, 0.00% done, no estimated completion time
  config:

          NAME                                  STATE     READ WRITE CKSUM
          <POOLNAME>                            DEGRADED     0     0     0
            raidz2-0                            DEGRADED     0     0     0
              pci-0000:03:00.0-scsi-0:0:3:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:4:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:5:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:6:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:7:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:8:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:9:0     ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:10:0    ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:11:0    ONLINE       0     0     0
              replacing-9                       DEGRADED     0     0     0
                old                             OFFLINE      0     0     0  block size: 512B configured, 4096B native
                pci-0000:03:00.0-scsi-0:0:12:0  ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:13:0    ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:14:0    ONLINE       0     0     0
              pci-0000:03:00.0-scsi-0:0:15:0    ONLINE       0     0     0

  errors: No known data errors

sanoid

Sanoid is a policy-driven snapshot management tool for ZFS filesystems.

  • How to install (see https://repology.org/project/sanoid/versions):

    # apk add sanoid
    
    # apt install sanoid
    

    TODO

    TODO

    # nix-env -iA nixos.sanoid
    
    # nix-env -iA nixpkgs.sanoid
    

    Install with AUR:

    $ mkdir -p ~/apps/aur-apps
    $ cd ~/apps/aur-apps
    $ git clone https://aur.archlinux.org/sanoid.git
    $ cd sanoid
    $ makepkg -is # --syncdeps to auto-install deps, --install to install after building
    

    For Artix users

    If you are not using systemd, you might have to translate the sanoid systemd services and timer into your init system's equivalents yourself (see the cron sketch after this list).

    TODO

    # xbps-install -S sanoid
    

    TODO
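
    On a system without systemd (see the Artix note above), one possible approach is to run sanoid from cron instead; this is a sketch (the binary path may differ on your system), relying on sanoid's --cron flag, which takes the configured snapshots and prunes expired ones in one go:

    # crontab -e
    */15 * * * * /usr/sbin/sanoid --cron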

  • How to use / configure:

    A more detailed example can be found in /etc/sanoid/sanoid.defaults.conf or /usr/share/sanoid/sanoid.defaults.conf; consult it to see all possible options for the template part of the configuration file.

    • Now let's list the available pool(s) for ZFS snapshots:

      $ zfs list
      
          NAME                USED  AVAIL     REFER  MOUNTPOINT
          production_data    1.11T  11.1T     1.00T  /prod
          archiving_data     3.33T  33.3T     3.00T  /arch
          test_data             1G     1T        1G  /test
          ...
      
    • Let's say you want to snapshot the production_data pool. To do so, create your own configuration file in /etc/sanoid/sanoid.conf, e.g.:

      [production_data]
          use_template = production
      
      [template_production]
          frequently = 0
          hourly = 0
          daily = 0
          weekly = 1
          monthly = 1
          yearly = 1
          autosnap = yes
          autoprune = yes
      

    This configuration will keep one weekly, one monthly and one yearly snapshot.

    You will have to wait up to 15 minutes before sanoid.timer applies your configuration and starts taking snapshots (or trigger a run by hand, as sketched after this list).

    • You can list your snapshots like so:

      $ zfs list -t snapshot
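
    If you prefer not to wait for the timer, a run can also be triggered by hand; a sketch using sanoid's own flags (run as root):

      # sanoid --take-snapshots --verbose   # take whatever snapshots the policy calls for right now
      # sanoid --prune-snapshots --verbose  # remove snapshots that have expired under the policy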
      
  • ℹ️ Note about systemd: by default, sanoid integrates with systemd, i.e. after installation, a systemd timer and two systemd services will be up and running:

    $ systemctl status sanoid.service
      ...
    
    $ systemctl status sanoid-prune.service
      ...
    
    $ systemctl status sanoid.timer
      ...
    

The sanoid.timer timer unit runs sanoid-prune.service followed by sanoid.service every 15 minutes. To edit any of the command-line options, you can edit these service files.
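
For example, with standard systemd tooling (a sketch; these are not sanoid-specific commands) you can create a drop-in override for a service and check when the timer will fire next:

    # systemctl edit sanoid.service       # opens an override file where you can adjust the unit, e.g. its ExecStart line
    $ systemctl list-timers sanoid.timer  # shows the last and next activation of the timer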

  • ℹ️ Note about snapshot storage: by default sanoid uses /var/cache/sanoid as its cache directory; the snapshots themselves are regular ZFS snapshots stored inside the pool, not files under that path.
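
    Each mounted dataset also exposes its snapshots through a hidden .zfs/snapshot directory under its mountpoint (visibility is controlled by the snapdir property). For example, using the /prod mountpoint from the zfs list output above:

    $ ls /prod/.zfs/snapshot/
    $ zfs get snapdir production_data    # 'hidden' by default; set to 'visible' to make .zfs show up in directory listings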

RAIDZ expansion

TODO
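
In the meantime, a rough sketch: with OpenZFS 2.3+ (raidz_expansion feature flag enabled on the pool), a raidz vdev can be grown by attaching one new disk at a time to the existing vdev, after which the data is reflowed across all member disks. Using the raidz2-0 vdev from the example above and a hypothetical new device:

    $ zpool attach $POOLNAME raidz2-0 /dev/disk/by-path/<new-disk>
    $ zpool status -v $POOLNAME    # shows the expansion/reflow progress until it completes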


If this cheat sheet has been useful to you, then please consider leaving a star here.