Smartd.v7
smartd
YUM
yum install smartmontools
systemctl
systemctl status smartd
systemctl start smartd
systemctl enable smartd
Usage
smartctl -H /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.10.0-123.9.3.el7.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
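For use in a script, the health line shown above can be checked mechanically. This is only a minimal sketch, assuming ATA-style output like the example; the function name smart_health_ok is made up for illustration, and the sample text stands in for a real smartctl run.

```shell
# Hypothetical helper: reads `smartctl -H` output on stdin and succeeds
# only if the overall-health line reports PASSED.
smart_health_ok() {
    grep -q 'SMART overall-health self-assessment test result: PASSED'
}

# Sample text copied from the output above, so no disk access is needed.
sample='=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED'

if printf '%s\n' "$sample" | smart_health_ok; then
    echo "healthy"
else
    echo "check this drive"
fi
```

On a real system the same function could be fed directly, for example: smartctl -H /dev/sda | smart_health_ok || echo "check /dev/sda"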
Try also
smartctl -i /dev/sda
Testing
smartctl --test=short /dev/sda
This takes about a minute. Then run
smartctl -a /dev/sda
smartctl --test=long /dev/sda
This can take several hours. While it is running you can run
smartctl -a /dev/sda
And near the top you will see:
Self-test execution status: ( 244) Self-test routine in progress...
40% of test remaining.
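Once the test has completed, run smartctl -a /dev/sda again; the result is recorded in the self-test log section of the output.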
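If you would rather not read the full report each time, the progress line can be pulled out on its own. A rough sketch, assuming the "of test remaining" wording shown above; the helper name selftest_remaining is hypothetical, and the sample text replaces a real smartctl call.

```shell
# Hypothetical helper: print the percentage of the self-test still to run,
# reading `smartctl -a` output on stdin. Prints nothing if no test is running.
selftest_remaining() {
    awk '/% of test remaining/ { sub(/%/, "", $1); print $1 }'
}

# Sample text in the layout shown above.
sample='Self-test execution status:      ( 244) Self-test routine in progress...
                                        40% of test remaining.'

printf '%s\n' "$sample" | selftest_remaining   # prints 40
```

On a live system this would be used as: smartctl -a /dev/sda | selftest_remaining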
Reporting
emacs /etc/smartmontools/smartd.conf
Standard
DEVICESCAN -H -m root -M exec /usr/libexec/smartmontools/smartdnotify -n standby,10,q
-d TYPE    Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
-T TYPE    set the tolerance to one of: normal, permissive
-o VAL     Enable/disable automatic offline tests (on/off)
-S VAL     Enable/disable attribute autosave (on/off)
-n MODE    No check. MODE is one of: never, sleep, standby, idle
-H         Monitor SMART Health Status, report if failed
-l TYPE    Monitor SMART log. Type is one of: error, selftest
-f         Monitor for failure of any 'Usage' Attributes
-m ADD     Send warning email to ADD for -H, -l error, -l selftest, and -f
-M TYPE    Modify email warning behavior (see man page)
-s REGE    Start self-test when type/date matches regular expression (see man page)
-p         Report changes in 'Prefailure' Normalized Attributes
-u         Report changes in 'Usage' Normalized Attributes
-t         Equivalent to -p and -u Directives
-r ID      Also report Raw values of Attribute ID with -p, -u or -t
-R ID      Track changes in Attribute ID Raw value with -p, -u or -t
-i ID      Ignore Attribute ID for -f Directive
-I ID      Ignore Attribute ID for -p, -u or -t Directive
-C ID      Report if Current Pending Sector count non-zero
-U ID      Report if Offline Uncorrectable count non-zero
-W D,I,C   Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
-v N,ST    Modifies labeling of Attribute N (see man page)
-a         Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
-F TYPE    Use firmware bug workaround. Type is one of: none, samsung
-P TYPE    Drive-specific presets: use, ignore, show, showall
#          Comment: text after a hash sign is ignored
\          Line continuation character
Attribute ID is a decimal integer 1 <= ID <= 255 except for -C and -U, where ID = 0 turns them off.
All but -d, -m and -M Directives are only implemented for ATA devices
If the test string DEVICESCAN is the first uncommented text then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z]
DEVICESCAN may be followed by any desired Directives.
Comment out (hash) the previous DEVICESCAN line and add
/dev/sda -a -d sat -m root -M test
/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify
/dev/sdb -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify
/dev/sdc -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify
Add a line like this for each hard drive you have or want to monitor. After restarting smartd (systemctl restart smartd) you should also get a test email.
Here’s what each option does:
- /dev/sda: Replace this with the device file you’ve been using in smartctl commands.
- -a: This enables some common options. You almost certainly want to use it.
- -d sat: On my system, smartctl correctly guesses that I have a Serial ATA drive. smartd, on the other hand, does not. If you had to add a “-d TYPE” parameter to the smartctl commands, you’ll almost certainly have to do the same here. If you didn’t, try leaving it out initially. You can add it later if smartd fails to start.
- -o on, -S on: These enable automatic offline tests and attribute autosave, with the same meaning as the smartctl equivalents.
- -s (S/../.././02|L/../../6/03): This schedules the short and long self-tests. In this example, the short self-test will run daily at 2:00 A.M. The long test will run on Saturdays at 3:00 A.M. For more information, see the smartd.conf man page.
- -m root: If any errors occur, smartd will send email to root. On my system, mail for root is forwarded to my normal email account. If you don’t have a similar setup, replace root with your normal email address. This option also requires a working email setup. Most Linux distributions automatically have working outbound email.
- -M exec /usr/libexec/smartmontools/smartdnotify: Instead of sending email directly, “-M exec” makes smartd run a different command when errors occur. The path of that command is distribution-specific. The CentOS 7 package uses /usr/libexec/smartmontools/smartdnotify, as in the standard DEVICESCAN line above. The Debian and Ubuntu smartmontools packages use /usr/share/smartmontools/smartd-runner instead, which runs each script in /etc/smartmontools/run.d/, one of which emails the user specified by the “-m” option. Check that the script exists on your system. If it doesn’t, remove this option.
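To see why the -s expression means "short test daily at 02:00, long test Saturdays at 03:00", it helps to know that smartd matches it against a string of the form T/MM/DD/d/HH, where T is the test type and d is the day of the week (1 is Monday, 7 is Sunday). The sketch below only imitates that matching with grep -Ex as a stand-in for smartd's own matcher; the helper name due_tests is made up for illustration.

```shell
# The schedule from the example above.
schedule='(S/../.././02|L/../../6/03)'

# Hypothetical helper: list which test types (S=short, L=long) the schedule
# selects for a given month, day, weekday (1=Monday .. 7=Sunday) and hour.
due_tests() {
    for t in S L; do
        if printf '%s/%s/%s/%s/%s\n' "$t" "$1" "$2" "$3" "$4" |
            grep -Eqx "$schedule"; then
            echo "$t"
        fi
    done
}

due_tests 07 26 6 02   # weekday 6 (Saturday), 02:00: prints S
due_tests 07 26 6 03   # weekday 6 (Saturday), 03:00: prints L
due_tests 07 25 5 03   # weekday 5 (Friday), 03:00: prints nothing
```

To check the current hour, the arguments could be filled in from date, for example: due_tests $(date '+%m %d %u %H')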
What you can expect.
This email was generated by the smartd daemon running on:

host name: vmsrv.somewhere.co.nz
DNS domain: somewhere.co.nz
NIS domain: (none)

The following warning/error was logged by the smartd daemon:

TEST EMAIL from smartd for device: /dev/sdc

For details see host's SYSLOG (default: /var/log/messages).
Reporting
You will also have to set up Sendmail to enable reporting.
http://ai.net.nz/wiki/index.php?title=Sendmail.v7
Set up reporting back to our server address
nano /etc/aliases
Add
root: servers@xx.net.nz
Save and run
newaliases
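To confirm where root's mail will go, the alias can be read back from the file. A small sketch, run here against a temporary sample file so nothing on the system is touched; the helper name root_alias is hypothetical.

```shell
# Hypothetical helper: print the destination of the root alias in an
# aliases-style file (defaults to /etc/aliases).
root_alias() {
    awk -F: '$1 == "root" { gsub(/^[ \t]+|[ \t]+$/, "", $2); print $2 }' \
        "${1:-/etc/aliases}"
}

# Demonstration on a temporary sample file.
tmp=$(mktemp)
printf '%s\n' '# sample aliases file' 'postmaster: root' \
    'root: servers@xx.net.nz' > "$tmp"
root_alias "$tmp"   # prints servers@xx.net.nz
rm -f "$tmp"
```

Called without an argument, root_alias reads /etc/aliases itself.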
What is bad?
Here are a couple of articles that try to make sense of what constitutes a bad error and when a hard drive needs to be replaced: http://www.computerworld.com/article/2846009/the-5-smart-stats-that-actually-predict-hard-drive-failure.html and https://www.backblaze.com/blog/hard-drive-smart-stats/
In summary, the SMART attributes to watch are:
- SMART 5 - Reallocated_Sector_Count
- SMART 187 - Reported_Uncorrectable_Errors
- SMART 188 - Command_Timeout
- SMART 197 - Current_Pending_Sector_Count
- SMART 198 - Offline_Uncorrectable
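These five attributes can be picked out of the attribute table that smartctl -A prints. The following is a sketch only: it assumes the usual ten-column table layout with the raw value in the last column, the helper name smart_warnings is made up, and the sample rows are invented for illustration rather than taken from a real drive. Some drives pack several counters into one raw value, so a non-zero result is a prompt to look closer, not a verdict.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print any of the
# five attributes above whose raw value (10th column) is non-zero.
smart_warnings() {
    awk '$1 ~ /^(5|187|188|197|198)$/ && $10 + 0 > 0 { print $1, $2, $10 }'
}

# Made-up sample rows in the layout of the smartctl attribute table.
sample='ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       8
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9120
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0'

printf '%s\n' "$sample" | smart_warnings   # prints: 5 Reallocated_Sector_Ct 8
```

On a real drive: smartctl -A /dev/sda | smart_warnings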
CentOS 7
CentOS 6
CentOS 5
For CentOS 5
emacs /etc/aliases
Forward root to your server monitor email address.
newaliases
Let's test
mail -s "Testing from my new server" root </root/.bashrc
Check that you received the email.
smartd.conf
If your smartd.conf has
DEVICESCAN -H -m root
Add -M test to it
DEVICESCAN -H -m root -M test
Restart smartd
/etc/init.d/smartd restart
and you should get a test email. Now try setting up extensive testing: hash out the previous DEVICESCAN testing line and add the following.
/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M test
You should also get a test email. Finally, to set up ongoing monitoring, I would use something like this
/dev/sda -a -d sat -m root -M test
/dev/sda -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify
/dev/sdb -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify
Add a similar line for any other drives you want to test.
The first line gives me an email every time the daemon is restarted, which I like, and the following lines do the testing.
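With more than a couple of drives, the repeated monitoring lines can be generated rather than typed. A minimal sketch that just prints lines in the form used above; the helper name smartd_lines is hypothetical, and the output still has to be pasted into smartd.conf by hand.

```shell
# Hypothetical helper: print one smartd.conf monitoring line per drive name.
smartd_lines() {
    for dev in "$@"; do
        printf '/dev/%s -a -d sat -o on -S on -s (S/../.././02|L/../../6/03) -m root -M exec /usr/libexec/smartmontools/smartdnotify\n' "$dev"
    done
}

# Print the lines for two drives.
smartd_lines sda sdb
```

Review the printed lines before adding them, since not every drive needs -d sat.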
HP RAID Controller
https://www.thegeekstuff.com/2014/07/hpacucli-examples/
Download and install: https://support.hpe.com/hpsc/swd/public/detail?swItemId=MTX_04bffb688a73438598fef81ddd