This document is a WORK IN PROGRESS.
This is just a quick personal cheat sheet: treat its contents with caution!
Smartmontools¶
Smartmontools is a set of tools to monitor storage systems and to provide advanced warning of disk degradation.
Install¶
Config¶
Reference(s)
$ man smartd.conf
Configure the SMART daemon:
# vi /etc/smartd.conf
> ...
> # DEVICESCAN # ⚠️ Comment this line ⚠️
> ...
> # Monitors...
> # ...SMART health status ("-H")
> # ...SMART log errors and selftest ("-l error" and "-l selftest")
> # ...failure of any 'usage' attributes ("-f")
> # &
> # Start self-test ("-s") of type 'short' ("S") every week on
> # Friday at 3:00 a.m. ("S/../../5/03").
> # &
> # Send a test mail every time the SMART daemon starts up, and a mail for
> # every report if there is a problem (with a daily reminder in this case).
>
> /dev/sda -m alerts@example.com -M test -H -l error -l selftest -f -s S/../../5/03
> /dev/sdb -m alerts@example.com -M test -H -l error -l selftest -f -s S/../../5/03
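When several drives share the same directives, the entries above can be generated rather than typed by hand. The following is a minimal sketch, not part of smartmontools: the function name `smartd_entries` and the `MAILTO` and `SCHEDULE` variables are hypothetical, and the defaults simply mirror the example entries.

```shell
#!/bin/sh
# Print one smartd.conf entry per device given as argument.
# MAILTO and SCHEDULE are hypothetical variables used only in this sketch.
smartd_entries() {
    mailto="${MAILTO:-alerts@example.com}"
    # Schedule format is T/MM/DD/d/HH: short test, any month, any day,
    # day of week 5 (Friday), hour 03.
    schedule="${SCHEDULE:-S/../../5/03}"
    for dev in "$@"; do
        printf '%s -m %s -M test -H -l error -l selftest -f -s %s\n' \
            "$dev" "$mailto" "$schedule"
    done
}
```

For example, `smartd_entries /dev/sda /dev/sdb` prints two lines that can be reviewed and then pasted into /etc/smartd.conf.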
The following entry:
will also execute /usr/share/smartmontools/smartd_warning.d/script before sending any mail to usr1@add1.
The following entry:
will execute /path/to/script instead of sending a mail. From the script you can access smartd environment variables:
- STDIN
- SMARTD_MAILER: set to the argument of -M exec if present, or else to 'mail'.
- SMARTD_DEVICE: set to the device path (e.g. /dev/sda).
- SMARTD_DEVICETYPE: set to the device type specified by the -d directive, or 'auto' if none.
- SMARTD_DEVICESTRING: set to the device description.
- SMARTD_DEVICEINFO: set to device identify information (most of the info in smartctl -i).
- SMARTD_FAILTYPE: set to the reason for the warning or message email. Possible values are:
    - EmailTest: this is an email test message.
    - Health: the SMART health status indicates imminent failure.
    - Usage: a usage Attribute has failed.
    - SelfTest: the number of self-test failures has increased.
    - ErrorCount: the number of errors in the ATA error log has increased.
    - CurrentPendingSector: one or more disk sectors could not be read and are marked to be reallocated (replaced with spare sectors).
    - OfflineUncorrectableSector: during off-line testing, or self-testing, one or more disk sectors could not be read.
    - Temperature: the temperature reached the critical limit (see the -W directive).
    - FailedHealthCheck: the SMART health status command failed.
    - FailedReadSmartData: the command to read SMART Attribute data failed.
    - FailedReadSmartErrorLog: the command to read the SMART error log failed.
    - FailedReadSmartSelfTestLog: the command to read the SMART self-test log failed.
    - FailedOpenDevice: the open() command to the device failed.
- SMARTD_ADDRESS: set to the address argument ADD of the -m directive.
- SMARTD_MESSAGE: set to the one-sentence summary warning email message string from smartd.
- SMARTD_FULLMESSAGE: set to the contents of the entire email warning message string from smartd.
- SMARTD_TFIRST: set to the time and date at which the first problem of this type was reported.
- SMARTD_TFIRSTEPOCH: set to an integer, the Unix epoch for SMARTD_TFIRST.
- SMARTD_PREVCNT: set to an integer specifying the number of previous messages sent.
- SMARTD_NEXTDAYS: set to an integer specifying the number of days until the next message will be sent.
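As an illustration of how a script run through -M exec might use these variables, here is a minimal sketch that formats a one-line log record. The function name `smartd_log_line` and the record layout are assumptions made for this example and are not part of smartmontools; the fallback values only cover the case where a variable is unset.

```shell
#!/bin/sh
# Format a one-line record from the SMARTD_* variables set by smartd.
# The function name and the record layout are illustrative only.
smartd_log_line() {
    printf '%s: %s on %s (previous messages: %s)\n' \
        "${SMARTD_FAILTYPE:-unknown}" \
        "${SMARTD_MESSAGE:-no message}" \
        "${SMARTD_DEVICE:-unknown device}" \
        "${SMARTD_PREVCNT:-0}"
}

# A real script might then forward the record to syslog, e.g.:
#   smartd_log_line | logger -t smartd-alert
```

Keeping the formatting in a function makes it easy to try the script by hand: export a few SMARTD_* variables in a shell and call the function, without waiting for smartd to raise a real warning.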
Start the SMART daemon and add it to the init system:
If you don't want to use the default mailer ("mail") with the SMART daemon and would rather use another one, Neomutt for example:
Use¶
- Print S.M.A.R.T. information:
# smartctl -i /dev/sda # print some drive information
# smartctl -a /dev/sda # print all drive information
See S.M.A.R.T. attributes details for more details about drive information.
- Use smartctl:
# smartctl --info /dev/sda | grep 'SMART support is:'
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
# smartctl -s on /dev/sda # enable SMART if not already enabled
# smartctl -c /dev/sda # print SMART capabilities
# smartctl -t short /dev/sda # run SMART short test, other possible tests are:
#     offline (default test, w/o logs: just visible with '-l error' opt)
#     long (extended test)
#     conveyance (test intended to identify damage after transportation)
# smartctl -X /dev/sda # abort test
# smartctl -l selftest /dev/sda # print test results
# smartctl -l error /dev/sda # print errors if any
# smartctl -H /dev/sda # print SMART health status
# smartctl -A /dev/sda # print SMART attributes
- Print messages only if error(s):
- Print in a neat, individual, predictably separate way (for script parsing):
Note: see https://blog.inf.ed.ac.uk/chris/smartctl-and-megaraid/
- Get disk temperature:
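For scripting, the health status and the temperature can be pulled out of smartctl output with standard tools. The sketch below is one possible approach, not the only one: the function names are hypothetical, the health check assumes the usual "test result: PASSED" (ATA) or "SMART Health Status: OK" (SCSI/SAS) wording, and the temperature parser assumes an ATA attribute table where the drive names the attribute Temperature_Celsius (some drives use another name).

```shell
#!/bin/sh
# Read `smartctl -H` output on stdin; succeed only if the drive reports healthy.
smart_health_ok() {
    grep -Eq 'test result: PASSED|SMART Health Status: OK'
}

# Read `smartctl -A` output on stdin; print the raw value (10th column of the
# ATA attribute table) of the Temperature_Celsius attribute, if present.
smart_temperature() {
    awk '$2 == "Temperature_Celsius" { print $10; exit }'
}
```

Typical use would be `smartctl -H /dev/sda | smart_health_ok || echo "check /dev/sda"` and `smartctl -A /dev/sda | smart_temperature`.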
Smartmontools with hardware RAID¶
- Get all info on hardware LSI RAID (on physical disk 4):
- For other types of hardware RAID, see https://www.smartmontools.org/wiki/Supported_RAID-Controllers
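For LSI MegaRAID controllers, smartctl addresses a physical disk with the `-d megaraid,N` device type. Below is a small sketch that queries several physical disks in a row; the function name and the `SMARTCTL` override variable are hypothetical (the override is only there so the loop can be dry-run with `SMARTCTL=echo`), and it assumes the controller is reachable through a block device such as /dev/sda.

```shell
#!/bin/sh
# Run `smartctl -a` for several physical disks behind a MegaRAID controller.
# Usage: smart_megaraid DEVICE DISK_ID...
# SMARTCTL is a hypothetical override, handy for dry runs (SMARTCTL=echo).
smart_megaraid() {
    dev="$1"
    shift
    for id in "$@"; do
        "${SMARTCTL:-smartctl}" -a -d "megaraid,$id" "$dev"
    done
}
```

For example, `smart_megaraid /dev/sda 4` runs `smartctl -a -d megaraid,4 /dev/sda`, and `SMARTCTL=echo smart_megaraid /dev/sda 0 1 2 3` only prints the arguments that would be passed.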
Troubleshooting¶
- "Read failure" status. If, after a test, you get this kind of result:
# smartctl -l selftest /dev/sda # print test results
> ...
> == START OF READ SMART DATA SECTION ==
> SMART Self-test log structure revision number 1
> Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
> # 1 Extended offline Completed: read failure 40% 32017 1374022944
> ...
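The last column of such a row (LBA_of_first_error) is usually the value needed for further investigation. A minimal sketch for extracting it from the self-test log follows; the function name is hypothetical, and it assumes the row layout shown above, where the LBA is the last whitespace-separated field.

```shell
#!/bin/sh
# Read `smartctl -l selftest` output on stdin and print the LBA of the first
# error for every row whose status contains "read failure".
selftest_failed_lbas() {
    awk '/read failure/ && $NF ~ /^[0-9]+$/ { print $NF }'
}
```

Typical use would be `smartctl -l selftest /dev/sda | selftest_failed_lbas`.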
🚧 WIP 🚧