Self-Monitoring, Analysis, and Reporting Technology (SMART) is built into hard drives and is designed to alert you to a failed or failing drive. It is not a foolproof technology, because a drive can be working fine one day and not be detected the next.
From December 2005 to August 2006, Google performed a field study that covered 100,000 consumer-grade drives. The study found that a drive was 39 times more likely to fail within the 60 days after its first uncorrectable error (attribute 198) than a drive that had no errors. First errors in reallocations, offline reallocations, and pending sectors (attributes 5, 196, and 197, respectively) were also strongly correlated with a higher probability of failure. Even so, 56% of failed drives didn't record any counts in attributes 5, 196, 197, or 198, and 36% didn't record any SMART error at all.
Most home users don't know or worry about this. The only time they'll see a SMART message is at boot, when the computer warns them that a drive is about to fail. For mission-critical systems, it is important to monitor the drives' attributes to avoid possible issues down the line.
If you have a RAID array or heaps of disk drives, knowing when there is an issue matters even more. I have three servers with a total of 36 hard drives between them. Each of those servers has a RAID array, so it is critical that I know when a drive is failing. The sooner I know, the quicker I can get the drive replaced, which reduces the chance that other drives fail during the rebuild and put the entire array at risk.
For this, I am running Linux and smartmontools. Beyond that, you need an email provider that you can use to send out alerts. You can either use Gmail for this or go a different route and use a mail delivery service such as Mailgun.
I decided to go with Mailgun because I didn't want to risk alerts being marked as spam. Other services do the same thing; SendGrid and Postmark are two of them, and both offer a free tier of 100 emails/month at the time of writing. Mailgun offers a free trial, and its entry-level tier, called Foundation, costs $35/mo. In full disclosure, I use Mailgun but have been grandfathered in on the old free tier that they offered a while ago. I really like Mailgun's interface and ease of use, but $35/mo is a bit steep for what I use it for. I did read something on their site that sounds like you may be able to contact sales and set up a "basic" account for $5/mo for 1000 emails.
In short, you will need the following:
- curl if using APIs
- mail delivery service (API and SMTP)
- msmtp if sending SMTP
- Email account (SMTP only)
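Before going further, it can help to confirm which of these tools are already present. The helper below is a minimal sketch; the function name `missing_tools` is made up for this example, and which tools you pass depends on whether you go the API route (curl) or the SMTP route (msmtp).

```shell
#!/bin/sh
# Print the name of every listed command that is not found on PATH.
# missing_tools is a hypothetical helper name used for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example: missing_tools smartctl smartd curl msmtp
```

An empty result means everything listed is installed; any name printed still needs to be installed using the steps below.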
Setup and Configuration
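On Gentoo, enable the update-drivedb USE flag and install the package:
euse -p sys-apps/smartmontools -E update-drivedb
emerge sys-apps/smartmontools
For other distros (Debian, Fedora, Red Hat, etc.), use the matching package manager:
# Debian/Ubuntu
apt-get install -y smartmontools
# Fedora
dnf install smartmontools
# Red Hat/CentOS
yum install smartmontools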
The curl program should already be installed. If not, install the curl package.
Installing msmtp works the same way as installing smartmontools.
# Gentoo
emerge mail-mta/msmtp
# Debian/Ubuntu
apt-get install msmtp
# Fedora
dnf install msmtp
# Red Hat/CentOS
yum install msmtp
The smartd configuration file, found at /etc/smartd.conf, has a lot of helpful information with examples. The default configuration uses the DEVICESCAN directive, which monitors all drives in the system. From here there are two options: you can append various options to the DEVICESCAN directive, or you can add each of the devices you want to monitor to the file.
# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
#   -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
#   -T TYPE set the tolerance to one of: normal, permissive
#   -o VAL  Enable/disable automatic offline tests (on/off)
#   -S VAL  Enable/disable attribute autosave (on/off)
#   -n MODE No check. MODE is one of: never, sleep, standby, idle
#   -H      Monitor SMART Health Status, report if failed
#   -l TYPE Monitor SMART log.  Type is one of: error, selftest
#   -f      Monitor for failure of any 'Usage' Attributes
#   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#   -M TYPE Modify email warning behavior (see man page)
#   -s REGE Start self-test when type/date matches regular expression (see man page)
#   -p      Report changes in 'Prefailure' Normalized Attributes
#   -u      Report changes in 'Usage' Normalized Attributes
#   -t      Equivalent to -p and -u Directives
#   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
#   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
#   -i ID   Ignore Attribute ID for -f Directive
#   -I ID   Ignore Attribute ID for -p, -u or -t Directive
#   -C ID   Report if Current Pending Sector count non-zero
#   -U ID   Report if Offline Uncorrectable count non-zero
#   -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
#   -v N,ST Modifies labeling of Attribute N (see man page)
#   -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
#   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#   -P TYPE Drive-specific presets: use, ignore, show, showall
#
#   Comment: text after a hash sign is ignored
#   \       Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices.
# DEVICESCAN may be followed by any desired Directives.
DEVICESCAN -m root
Below are some examples of setting up individual scanning rules.
# DEVICESCAN must be commented out if you want to set up individual monitoring rules.
# DEVICESCAN -m root

/dev/sda -a -m root

# Same as above
/dev/sdb -H -f -t -l error -l selftest -C 197 -U 198 -m root

# Monitoring the error and selftest logs
/dev/sdc -a -l error -l selftest

# Perform short self-test every day at 2 A.M. and long-test every Sunday at 3 A.M.
/dev/sdd -a -s (S/../.././02|L/../../7/03)

# Run the /etc/smartd_warning.d/email-notify.sh script in place of the default
# mail command; the address given with -m (root) is passed to the script
/dev/sda -a -m root -M exec /etc/smartd_warning.d/email-notify.sh
The -s directive takes a regular expression that is matched against strings of the form T/MM/DD/d/HH:
- T is the type of test:
  - L - Long self-test
  - S - Short self-test
  - C - Conveyance test
  - O - Offline immediate test
- MM is the month of the year, from 01 (January) to 12 (December).
- DD is the day of the month, from 01 to 31.
- d is the day of the week, where 1 is Monday and 7 is Sunday.
- HH is the hour of the day in 24-hour format, from 00 to 23.
The example above uses dots ( . ), each of which is a regular expression wildcard that matches any single character.
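To get a feel for when a schedule expression fires, the matching can be approximated in the shell. This is only a sketch: `schedule_matches` is a name invented here, and it assumes, as the smartd.conf man page describes, that the whole T/MM/DD/d/HH string has to match the expression. It uses grep rather than smartd itself, so treat it as an approximation.

```shell
#!/bin/sh
# schedule_matches REGEX TYPE [MM/DD/d/HH]
# Succeeds when REGEX matches the whole string TYPE/MM/DD/d/HH.
# The time part defaults to now; date's %u prints 1 (Monday) to 7 (Sunday).
# schedule_matches is a hypothetical helper, not part of smartmontools.
schedule_matches() {
    regex=$1
    type=$2
    when=${3:-$(date +%m/%d/%u/%H)}
    # -E: extended regex, -x: match the entire line, -q: no output
    printf '%s/%s\n' "$type" "$when" | grep -Eqx -- "$regex"
}

# Example: would a long test start on Sunday, October 27 at 03:00?
# schedule_matches '(S/../.././02|L/../../7/03)' L 10/27/7/03 && echo yes
```

For the authoritative answer, smartd can print its own upcoming test schedule with `smartd -q showtests`.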
The next step is to enable the smartd service that will continually watch the SMART attributes of the specified drives and run tests.
systemctl enable --now smartd
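With the daemon running, a manual spot check of the four attributes from the Google study (5, 196, 197, and 198) can still be useful. The filter below is a rough sketch: the name `nonzero_warning_attrs` is made up, and it assumes the usual ATA attribute table printed by `smartctl -A`, where the raw value is the tenth column. Drives that report attributes differently (NVMe, SAS) will not match.

```shell
#!/bin/sh
# Read `smartctl -A` output on stdin and print "ID NAME RAW" for
# attributes 5, 196, 197 and 198 whose raw value is greater than zero.
# nonzero_warning_attrs is a hypothetical helper name.
nonzero_warning_attrs() {
    awk '$1 ~ /^(5|196|197|198)$/ && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Example (needs root): smartctl -A /dev/sda | nonzero_warning_attrs
```

No output means none of the four counters has moved off zero, which, per the study, is still no guarantee that the drive is healthy.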
msmtp can be configured per user, but since it will be used by the system, all configuration changes go in the global configuration file located at /etc/msmtprc.
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile /var/log/msmtp.log

# Gmail configuration
account gmail
host smtp.gmail.com
port 587
from email@example.com
user your-username
password app-specific-password

# Mailgun configuration
account mailgun
host smtp.mailgun.org
port 587
from firstname.lastname@example.org
user email@example.com
password SMTP_PASSWORD

account default : gmail
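It is worth sending a test message before relying on this configuration for alerts. The snippet below is a sketch under a few assumptions: `test_message` is a helper name invented here, `you@example.com` is a placeholder recipient, and the account name `gmail` matches the one defined in the file above.

```shell
#!/bin/sh
# test_message TO SUBJECT BODY
# Print a minimal email: headers, a blank line, then the body.
# test_message is a hypothetical helper name.
test_message() {
    printf 'To: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3"
}

# Example: send through the "gmail" account; -t tells msmtp to take
# the recipient from the To: header.
# test_message you@example.com "msmtp test" "Hello from $(uname -n)" |
#     msmtp -a gmail -t
```

If nothing arrives, the log file named in the configuration (/var/log/msmtp.log) is the first place to look.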
As you can see, configuring msmtp is very easy. I have added a Mailgun configuration for completeness, and other mail delivery services that support SMTP can be added here as well. For each of those, refer to the provider's documentation on how to generate SMTP passwords and other credentials.
When using SMTP, /etc/smartd_warning.sh is the script that generates the default email. To customize the email, create your own script and add -M exec /etc/smartd_warning.d/email-notify.sh to the smartd.conf file.
To: firstname.lastname@example.org
Subject: $SMARTD_SUBJECT

$SMARTD_FULLMESSAGE
EOM
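The fragment above is the body of a here-document. One way a complete SMTP script could look is sketched below. This is an assumption about the overall shape, not a copy of the original script: feeding the message to `msmtp -t` is a guess, and the `MAILER` and `NOTIFY_TO` overrides are hypothetical additions that make the function easy to try without sending mail.

```shell
#!/bin/sh
# Sketch of /etc/smartd_warning.d/email-notify.sh for SMTP delivery.
# smartd exports SMARTD_SUBJECT and SMARTD_FULLMESSAGE before running it.
# MAILER and NOTIFY_TO are hypothetical overrides added for this sketch.
notify_via_smtp() {
    # Left unquoted on purpose so that "msmtp -t" splits into command and flag.
    ${MAILER:-msmtp -t} <<EOM
To: ${NOTIFY_TO:-firstname.lastname@example.org}
Subject: $SMARTD_SUBJECT

$SMARTD_FULLMESSAGE
EOM
}

# In the real script, finish with a call to: notify_via_smtp
```

Setting `MAILER=cat` prints the message instead of sending it, which is a quick way to check the headers.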
To use the API of a mail delivery service you will have to create a script that calls a curl command. Here we will create the file /etc/smartd_warning.d/email-notify.sh
#!/bin/sh
curl -s --user "api:key-YOUR_API_KEY" \
    https://api.mailgun.net/v3/notifications.paulus.io/messages \
    -F from="SMART ALERTS <$(hostname -s)@domain.tld>" \
    -F to="email@example.com" \
    -F to="firstname.lastname@example.org" \
    -F subject="$SMARTD_SUBJECT" \
    -F text="$SMARTD_FULLMESSAGE"
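The script has to be executable (for example `chmod 755 /etc/smartd_warning.d/email-notify.sh`), and it can be tried by hand without waiting for a drive to misbehave. The wrapper below is a small sketch: `run_notify_test` is a made-up name, and the sample subject and message are placeholders standing in for what smartd would normally export.

```shell
#!/bin/sh
# run_notify_test COMMAND [ARGS...]
# Run COMMAND with sample values in the two variables the script reads.
# run_notify_test is a hypothetical helper name.
run_notify_test() {
    SMARTD_SUBJECT="SMART test alert from $(uname -n)" \
    SMARTD_FULLMESSAGE="This is a manual test; no drive has failed." \
        "$@"
}

# Example: run_notify_test /etc/smartd_warning.d/email-notify.sh
```

smartd can also do this itself: adding `-M test` to a device line sends a single test message when the daemon starts.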
You can create a custom email message, but I decided to use the default SMARTD_FULLMESSAGE since it includes everything I want. Other variables that you can use in your own message include SMARTD_MESSAGE. There are a few others that provide the date and time when the first failure occurred. For a full list and an explanation of each variable, see the man page for smartd.conf(5).
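As an illustration of a custom message, the sketch below assembles a short body from a few of the variables documented in smartd.conf(5): SMARTD_DEVICE, SMARTD_FAILTYPE, SMARTD_TFIRST, and SMARTD_MESSAGE. The function name `custom_message` and the layout are arbitrary choices for this example.

```shell
#!/bin/sh
# Build a short custom body from variables that smartd exports.
# custom_message is a hypothetical helper name; the layout is arbitrary.
custom_message() {
    printf 'Host: %s\nDevice: %s\nFailure type: %s\nFirst reported: %s\n\n%s\n' \
        "$(uname -n)" "$SMARTD_DEVICE" "$SMARTD_FAILTYPE" \
        "$SMARTD_TFIRST" "$SMARTD_MESSAGE"
}

# Example, inside the notify script:
#   -F text="$(custom_message)"
```

Swapping `$SMARTD_FULLMESSAGE` for `$(custom_message)` in either script is enough to switch to the shorter format.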
That is how to set up and configure smartmontools to email you about failing drives.