This document is a WORK IN PROGRESS.
This is just a quick personal cheat sheet: treat its contents with caution!
Smartmontools
Smartmontools is a set of tools to monitor storage systems and to provide advance warning of disk degradation.
Install
Config
Reference(s)
$ man smartd.conf
Configure the SMART daemon:
# vi /etc/smartd.conf
> ...
> # DEVICESCAN # ⚠️ Comment this line ⚠️
> ...
> # Monitors...
> # ...SMART health status ("-H")
> # ...SMART log errors and selftest ("-l error" and "-l selftest")
> # ...failure of any 'usage' attributes ("-f")
> # &
> # Start a self-test ("-s") of type 'short' every week on
> # Friday at 3:00 a.m. ("S/../../5/03").
> # &
> # Send a test mail every time the SMART daemon starts up and a mail for every
> # report if there is a problem (with a daily reminder in this case).
>
> /dev/sda -m alerts@example.com -M test -H -l error -l selftest -f -s S/../../5/03
> /dev/sdb -m alerts@example.com -M test -H -l error -l selftest -f -s S/../../5/03
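The "-s" directive takes a regular expression that smartd compares with a string of the form T/MM/DD/d/HH (test type, month, day of month, day of week with 1 = Monday, hour). As a rough sketch, the helpers below rebuild that string for a given timestamp and check it against a schedule. The names sched_string and sched_matches are made up for illustration, GNU date is assumed for the "-d" option, and whole-string matching is an assumption:

```shell
#!/bin/sh
# Hypothetical helpers (not part of smartmontools) to sanity-check a "-s" schedule.

# Print the "T/MM/DD/d/HH" string for a test type and a timestamp.
#   $1: test type letter (S = short, L = long, C = conveyance, O = offline)
#   $2: timestamp in a format understood by "date -d" (GNU date assumed)
sched_string() {
    printf '%s/%s\n' "$1" "$(date -d "$2" '+%m/%d/%u/%H')"
}

# Succeed if the schedule regex $1 matches test type $2 at timestamp $3.
# Assumption: the regex has to match the whole string (hence grep -x).
sched_matches() {
    sched_string "$2" "$3" | grep -Eqx -- "$1"
}

# 2024-01-05 was a Friday, so the weekly short test would be due at 03:xx.
if sched_matches 'S/../../5/03' S '2024-01-05 03:15'; then
    echo "short self-test scheduled for this hour"
fi
```

This only checks the pattern offline; smartd itself decides when a test actually starts.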
The following entry:
will also execute /usr/share/smartmontools/smartd_warning.d/script before sending any mail to usr1@add1.
The following entry:
will execute /path/to/script instead of sending a mail. From the script you can access the smartd environment variables:
- STDIN
- SMARTD_MAILER: set to the argument of -M exec if present, or else to 'mail'.
- SMARTD_DEVICE: set to the device path (e.g.: /dev/sda).
- SMARTD_DEVICETYPE: set to the device type specified by the -d directive, or 'auto' if none.
- SMARTD_DEVICESTRING: set to the device description.
- SMARTD_DEVICEINFO: set to device identify information (most of the info in smartctl -i).
- SMARTD_FAILTYPE: set to the reason for the warning or message email. Possible values are:
    - EmailTest: this is an email test message.
    - Health: the SMART health status indicates imminent failure.
    - Usage: a usage Attribute has failed.
    - SelfTest: the number of self-test failures has increased.
    - ErrorCount: the number of errors in the ATA error log has increased.
    - CurrentPendingSector: one or more disk sectors could not be read and are marked to be reallocated (replaced with spare sectors).
    - OfflineUncorrectableSector: during off-line testing, or self-testing, one or more disk sectors could not be read.
    - Temperature: temperature reached critical limit (see the -W directive).
    - FailedHealthCheck: the SMART health status command failed.
    - FailedReadSmartData: the command to read SMART Attribute data failed.
    - FailedReadSmartErrorLog: the command to read the SMART error log failed.
    - FailedReadSmartSelfTestLog: the command to read the SMART self-test log failed.
    - FailedOpenDevice: the open() command to the device failed.
- SMARTD_ADDRESS: set to the address argument ADD of the -m directive.
- SMARTD_MESSAGE: set to the one-sentence summary warning email message string from smartd.
- SMARTD_FULLMESSAGE: set to the contents of the entire email warning message string from smartd.
- SMARTD_TFIRST: set to the time and date at which the first problem of this type was reported.
- SMARTD_TFIRSTEPOCH: set to an integer, the Unix epoch for SMARTD_TFIRST.
- SMARTD_PREVCNT: set to an integer specifying the number of previous messages sent.
- SMARTD_NEXTDAYS: set to an integer specifying the number of days until the next message will be sent.
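To illustrate how a script given to "-M exec" might use these variables, here is a minimal sketch. The function names format_smartd_event and is_test_event are hypothetical, as is the one-line layout; only the SMARTD_* variable names come from the list above:

```shell
#!/bin/sh
# Hypothetical handler for "-M exec /path/to/script" (names and layout are illustrative).

# Print a one-line summary built from the variables smartd exports.
# Unset variables fall back to "unknown" so the line keeps its shape.
format_smartd_event() {
    printf 'failtype=%s device=%s type=%s first=%s message=%s\n' \
        "${SMARTD_FAILTYPE:-unknown}" \
        "${SMARTD_DEVICE:-unknown}" \
        "${SMARTD_DEVICETYPE:-unknown}" \
        "${SMARTD_TFIRST:-unknown}" \
        "${SMARTD_MESSAGE:-unknown}"
}

# Succeed if the event is only the start-up test message ("-M test").
is_test_event() {
    [ "${SMARTD_FAILTYPE:-}" = "EmailTest" ]
}
```

A real script could then end with a line that skips test events and appends the summary to a log file of your choice; that path is left out here because it depends on the system.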
Start the SMART daemon and add it to the init system:
If you would rather use another mailer than the default one ("mail") with the SMART daemon, for example Neomutt:
Use
Use smartctl:
# smartctl -i /dev/sda # print some drive information
# smartctl -a /dev/sda # print all drive information
# smartctl --info /dev/sda | grep 'SMART support is:'
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
# smartctl -s on /dev/sda # enable SMART if not already enabled
# smartctl -c /dev/sda # print SMART capabilities
# smartctl -t short /dev/sda # run SMART short test, other possible tests are:
# offline (default test, w/o logs: just visible with '-l error' opt)
# long (extended test)
# conveyance (test intended to identify damage after transportation)
# smartctl -X /dev/sda # abort the running test
# smartctl -l selftest /dev/sda # print test results
# smartctl -l error /dev/sda # print errors if any
# smartctl -H /dev/sda # print SMART health status
# smartctl -A /dev/sda # print SMART attributes (vendor specific)
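When smartctl is called from a script, its exit status is a bit mask rather than a plain success/failure code. The sketch below decodes it according to the bit meanings in the smartctl man page; the function name explain_smartctl_status is hypothetical and the wording of each message is abbreviated:

```shell
#!/bin/sh
# Hypothetical helper: explain the exit status (bit mask) of a smartctl call.
#   $1: exit status of smartctl
explain_smartctl_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo "no problem reported"
        return 0
    fi
    bit=0
    # One description per bit, from bit 0 to bit 7.
    for text in \
        "command line did not parse" \
        "device open failed" \
        "a SMART or ATA command failed" \
        "SMART status check returned DISK FAILING" \
        "prefail attributes found at or below threshold" \
        "attributes were at or below threshold in the past" \
        "the device error log contains records of errors" \
        "the device self-test log contains records of errors"
    do
        if [ $(( (status >> bit) & 1 )) -eq 1 ]; then
            echo "bit $bit: $text"
        fi
        bit=$((bit + 1))
    done
}

# Typical use (needs root and a real device):
#   smartctl -H /dev/sda >/dev/null; explain_smartctl_status $?
explain_smartctl_status 72
```

The value 72 is only a sample: it has bits 3 and 6 set, so two lines are printed.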
- Print messages only if there are errors:
- Print in a neat, individual, predictably separate way (for script parsing):
  Note: see https://blog.inf.ed.ac.uk/chris/smartctl-and-megaraid/
- Get disk temperature:
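As one possible way to read the temperature from a script, the sketch below filters the attribute table printed by "smartctl -A". It assumes an ATA drive that reports the temperature as attribute 194 with the raw value in the 10th column (some drives use attribute 190 instead, and NVMe or SCSI output looks different). The function name disk_temperature and the sample values are made up:

```shell
#!/bin/sh
# Hypothetical helper: extract the temperature from "smartctl -A" output on stdin.
# Assumes the ATA attribute table, where attribute 194 holds the temperature
# and the raw value is the 10th whitespace-separated column.
disk_temperature() {
    awk '$1 == 194 { print $10; exit }'
}

# Typical use (needs root and a real device):
#   smartctl -A /dev/sda | disk_temperature
# Demonstration on sample lines in the same layout (values are made up):
disk_temperature <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   095   095   000    Old_age   Always       -       4321
194 Temperature_Celsius     0x0022   036   045   000    Old_age   Always       -       36 (Min/Max 20/45)
EOF
```

With the sample lines above, the helper prints the raw value of attribute 194 and ignores the rest of the table.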
Smartmontools with hardware RAID
- Get all info on hardware LSI RAID (on physical disk 4):
- For other types of hardware RAID, see https://www.smartmontools.org/wiki/Supported_RAID-Controllers
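For controllers handled by the megaraid driver, smartctl addresses a physical disk with "-d megaraid,N". The dry-run sketch below only prints the commands, so they can be reviewed before being run as root; the function name megaraid_commands is hypothetical, and both the block device and the disk numbers are assumptions that depend on the controller:

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (not run) one smartctl command per physical
# disk behind a MegaRAID controller. "-d megaraid,N" selects physical disk N.
#   $1: block device exposed by the controller (e.g. /dev/sda)
#   $2 and following: physical disk numbers
megaraid_commands() {
    dev=$1
    shift
    for n in "$@"; do
        printf 'smartctl -a -d megaraid,%s %s\n' "$n" "$dev"
    done
}

# Physical disk 4 behind the controller that exposes /dev/sda (sample values).
megaraid_commands /dev/sda 4
```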
Troubleshooting
- "Read failure" status. If you get this kind of result after a test: