Simplified SMART and Btrfs Reporting
Maintaining one data storage device is trivial. Maintaining twenty is tedious. This post outlines my solution for getting a simple view of SMART and Btrfs attributes from many disks in many servers with one command.
The Report
Source: github.com/markmcb/storage-scripts
Let's start with the result. Here's what I get emailed to me twice a week, or on-demand if I'm ever investigating something storage-related.
CORE up 2 days on 5.4.12-200.fc31.x86_64
DEV CAPAC AT TMP AGE SLFTEST PREFAIL
sda 466Gi 8 24C 1.8 6/043/. ....
sdh 466Gi 15 27C 1.8 6/104/. ....
sdd 932Gi 3 33C 1.6 3/012/. ....
sdb 7.3Ti 0 30C 2.7 5/072/. ........
sdc 7.3Ti 2 33C 1.3 4/103/. .......
sde 7.3Ti 4 29C 2.7 1/010/. ........
sdf 7.3Ti 6 29C 2.7 1/042/. ........
sdg 7.3Ti 10 29C 2.7 7/072/. ........
BTRFS_PATH SIZE AVAIL USE% SCB ERR
/ 111G 75G 32% 3 .
/mnt/ops 932G 356G 54% 28 .
/mnt/store 19T 4.4T 76% 18 .
BTRFS_PATH MAP DEVICE WRFCG
/ luks-8506 nvme0n1 .....
/mnt/ops luks-a4aa sda .....
/mnt/ops luks-fd8b sdh .....
/mnt/ops luks-0b9c sdd .....
/mnt/store luks-6e45 sde .....
/mnt/store luks-f5bd sdf .....
/mnt/store luks-fe18 sdb .....
/mnt/store luks-d375 sdg .....
/mnt/store luks-cb43 sdc .....
In the actual report that block of text would get repeated several times, once for each multi-disk server I monitor.
Plain Text
As you probably noticed, the report is all text. This is important for two reasons:
- Mobile: With text only, if I keep the output 45'ish characters wide, I can use a non-microscopic fixed-width font in an html email report that I look at on my phone. This avoids lines that wrap and break the column alignment.
- Headless: In addition to getting a mobile report, I'd like to be able to run the reports whenever I like. As I have no graphical desktop environment (i.e., headless) on my servers, they must produce only text output such that I can run them from my shell.
With that in mind, let's break down what we're looking at.
Recycled Content
This report is a bash script built on recycled command output. It's 100% generated by manipulating the output of common commands like lsblk, smartctl, btrfs, and others, and then using some simple logic and formatting to get them into a denser view. The sections below walk through how the data is captured and assembled.
Header
The first line is basic server info. No real magic here.
CORE up 2 days on 5.4.12-200.fc31.x86_64
The value of the $HOST environment variable with the domain removed is shown first and identifies which server the tables below belong to. The uptime shown after the server's name is the result of uptime | sed -E 's/.*(up[^,]*),.*/\1/'. No real utility, just interesting to know. The last part is the Linux kernel version, the result of uname -r. This is very important in the context of Btrfs: the pace of development on that file system is fast and furious, so the kernel version clarifies which features are and aren't available on that server.
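As a sketch, the header line can be assembled like this. The uptime text here is a canned sample so the sed extraction is reproducible; the real script pipes uptime directly.

```shell
# Build the one-line header: short hostname, uptime, kernel version.
host="core.example.com"                     # stand-in for $HOST
short="${host%%.*}"                         # drop the domain -> "core"
# Canned `uptime` output so the sed call is easy to follow:
sample=" 10:14:02 up 2 days,  3:40,  1 user,  load average: 0.09, 0.07, 0.05"
up=$(echo "$sample" | sed -E 's/.*(up[^,]*),.*/\1/')   # -> "up 2 days"
echo "$(echo "$short" | tr a-z A-Z) ${up} on $(uname -r)"
```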
Table 1: Block Device & SMART Information
The first table focuses on block devices (and not file systems).
DEV CAPAC AT TMP AGE SLFTEST PREFAIL
sda 466Gi 8 24C 1.8 6/043/. ....
sdh 466Gi 15 27C 1.8 6/104/. ....
sdd 932Gi 3 33C 1.6 3/012/. ....
sdb 7.3Ti 0 30C 2.7 5/072/. ........
sdc 7.3Ti 2 33C 1.3 4/103/. .......
sde 7.3Ti 4 29C 2.7 1/010/. ........
sdf 7.3Ti 6 29C 2.7 1/042/. ........
sdg 7.3Ti 10 29C 2.7 7/072/. ........
DEV
The device (DEV) column is gathered from lsblk -l -o NAME,TYPE -n | grep disk. This gives us the values we'll loop through in the script to collect the remaining information.
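A minimal sketch of that loop, using canned lsblk output so it runs anywhere; the real script substitutes the live command:

```shell
# Canned `lsblk -l -o NAME,TYPE -n` output: one NAME TYPE pair per line.
sample=$'sda  disk\nsda1 part\nsdb  disk\ndm-0 crypt'
# Keep only whole disks, then take the first column as the device name.
disks=$(printf '%s\n' "$sample" | grep disk | awk '{print $1}')
for d in $disks; do
    echo "collecting data for /dev/${d}"
done
```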
CAPAC, AT
The capacity of the device (CAPAC) and path to the device (AT) are accessible from Linux in the /dev and /sys locations.
To get the capacity, I use this one-liner: echo "($(cat /sys/block/sda/size)*512)" | bc | numfmt --to=iec-i, which grabs the raw value from /sys, then uses bc to multiply by 512, and finally numfmt to give a nice Gi or Ti suffix depending on the value.
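The same math works with bash arithmetic in place of bc. The sector count here is a hypothetical sample rather than a live read from /sys:

```shell
sectors=976773168                    # hypothetical /sys/block/sda/size value
bytes=$(( sectors * 512 ))           # the /sys interface reports 512-byte sectors
capac=$(numfmt --to=iec-i "$bytes")  # human-readable suffix, e.g. Gi or Ti
echo "$capac"
```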
To get the path, there is a little bit of complexity. I collect the value at /dev/disk/by-path and then use a regex (passed via an argument so it can be server specific) to reduce it to a single number: find -L /dev/disk/by-path/ -samefile /dev/sda | sed -E "s/.*\///" | sed -E "s${pathsubst}". For example, the find command will return /dev/disk/by-path/pci-0000:02:00.0-sas-phy8-lun-0, which has a lot of info that's not useful. The first sed reduces it to pci-0000:02:00.0-sas-phy8-lun-0, and the second, when given /.*phy([[:digit:]]+).*/\1/ as an arg, will result in 8.
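Here's that regex step in isolation, runnable with a canned by-path name in place of the find output:

```shell
raw="pci-0000:02:00.0-sas-phy8-lun-0"     # sample basename from /dev/disk/by-path
pathsubst='/.*phy([[:digit:]]+).*/\1/'    # server-specific substitution argument
at=$(echo "$raw" | sed -E "s${pathsubst}")
echo "$at"   # the phy number identifying the physical slot
```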
Why is this useful? When something fails, you usually want to act with certainty, especially if you're pulling it "hot," i.e., while the system is operational. If I wanted to pull sda for whatever reason, I could reference this report, physically locate the disk, and pull it.
TMP, AGE, PREFAIL
The temperature (TMP), years of power-on time (AGE), and failure flags (PREFAIL) are all derived from the smartctl -A /dev/sda command, which produces this output:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.12-200.fc31.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
9 Power_On_Hours 0x0032 096 096 000 Old_age Always - 15645
12 Power_Cycle_Count 0x0032 099 099 000 Old_age Always - 44
177 Wear_Leveling_Count 0x0013 097 097 000 Pre-fail Always - 38
179 Used_Rsvd_Blk_Cnt_Tot 0x0013 100 100 010 Pre-fail Always - 0
181 Program_Fail_Cnt_Total 0x0032 100 100 010 Old_age Always - 0
182 Erase_Fail_Count_Total 0x0032 100 100 010 Old_age Always - 0
183 Runtime_Bad_Block 0x0013 100 100 010 Pre-fail Always - 0
187 Uncorrectable_Error_Cnt 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0032 076 050 000 Old_age Always - 24
195 ECC_Error_Rate 0x001a 200 200 000 Old_age Always - 0
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
235 POR_Recovery_Count 0x0012 099 099 000 Old_age Always - 36
241 Total_LBAs_Written 0x0032 099 099 000 Old_age Always - 48496243602
The temperature is gathered simply by looking for ID 190 or 194 and grabbing the 10th column with awk: smartctl -A /dev/sda | grep -m1 -E "^19(0|4)" | awk '{print $10}', then appending a "C".
A similar approach is taken for the age, but with some division to go from hours to years.
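Both extractions, sketched against canned attribute rows so they're reproducible; the live script pipes smartctl instead:

```shell
# Canned `smartctl -A` rows for temperature and power-on hours.
temp_row="190 Airflow_Temperature_Cel 0x0032 076 050 000 Old_age Always - 24"
hours_row="  9 Power_On_Hours         0x0032 096 096 000 Old_age Always - 15645"
tmp="$(printf '%s\n' "$temp_row" | grep -m1 -E '^19(0|4)' | awk '{print $10}')C"
# Hours -> years with one decimal place (24 h/day * 365.25 days/year).
age=$(printf '%s\n' "$hours_row" | awk '{printf "%.1f", $10 / (24 * 365.25)}')
echo "$tmp $age"   # -> 24C 1.8
```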
The Pre-fail flags work like this: for each "Pre-fail" SMART attribute, take VALUE and subtract THRESH. If the result is 0 or less, display an "x", otherwise display a ".". So in the report we see "....", which corresponds to the 4 pre-fail attributes we see above. They're all "." because all values are well above the pre-failure threshold. If the report showed "x..." then I'd know to go check the SMART attributes and see what's below the threshold.
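A sketch of that flag logic, fed two canned Pre-fail rows and one Old_age row (which is skipped):

```shell
rows='  5 Reallocated_Sector_Ct  0x0033 100 100 010 Pre-fail Always - 0
177 Wear_Leveling_Count     0x0013 097 097 000 Pre-fail Always - 38
190 Airflow_Temperature_Cel 0x0032 076 050 000 Old_age  Always - 24'
# $4 = VALUE, $6 = THRESH, $7 = TYPE; flag "x" when VALUE-THRESH <= 0.
flags=$(printf '%s\n' "$rows" | awk '$7 == "Pre-fail" { printf "%s", ($4 - $6 <= 0) ? "x" : "." }')
echo "$flags"   # two dots: both sampled pre-fail attributes are above threshold
```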
Alternatively, I can run the report and pass an argument to show SMART pre-fail attributes for all disks side-by-side.
$ storage_report -d -C -F
DEV CAPAC RRE TP SUT RSC SER STP SRC HL WLC URB RBB
sda 466Gi --- --- --- 90 --- --- --- --- 97 90 90
sdh 466Gi --- --- --- 90 --- --- --- --- 97 90 90
sdd 932Gi --- --- --- 90 --- --- --- --- 99 90 90
sdb 7.3Ti 84 76 127 95 33 108 40 75 --- --- ---
sdc 7.3Ti 84 73 160 95 33 108 40 --- --- --- ---
sde 7.3Ti 84 78 129 95 33 108 40 75 --- --- ---
sdf 7.3Ti 84 77 128 95 33 108 40 75 --- --- ---
sdg 7.3Ti 84 77 130 95 33 108 40 75 --- --- ---
I like this view. The numbers shown are VALUE minus THRESH, i.e., 0 or less is bad. --- means the attribute isn't relevant for the device. I sort by storage capacity as it puts similar drives near each other. So if the pre-fail values for my 7.3Ti drives were 80, 80, 80, 80, 20, I might investigate the last disk as it would appear to be trending toward failure faster than the others.
Since those headers can be hard to remember, I added an easy way to transpose the table and use long names for each row with -y. In this example I've also included the Old_age attributes with -O.
$ storage_report -d -C -F -O -y
DEVICE sda sdh sdd sdb sdc sde sdf sdg
CAPACITY 466Gi 466Gi 932Gi 7.3Ti 7.3Ti 7.3Ti 7.3Ti 7.3Ti
Raw_Read_Error_Rate --- --- --- 84 84 84 84 84
Throughput_Performance --- --- --- 76 73 78 77 77
Spin_Up_Time --- --- --- 127 160 129 128 130
Reallocated_Sector_Ct 90 90 90 95 95 95 95 95
Seek_Error_Rate --- --- --- 33 33 33 33 33
Seek_Time_Performance --- --- --- 108 108 108 108 108
Spin_Retry_Count --- --- --- 40 40 40 40 40
Helium_Level --- --- --- 75 --- 75 75 75
Wear_Leveling_Count 97 97 99 --- --- --- --- ---
Used_Rsvd_Blk_Cnt_Tot 90 90 90 --- --- --- --- ---
Runtime_Bad_Block 90 90 90 --- --- --- --- ---
Start_Stop_Count --- --- --- 100 100 100 100 100
Power_On_Hours 96 96 97 97 99 97 97 97
Power_Cycle_Count 99 99 99 100 100 100 100 100
Program_Fail_Cnt_Total 90 90 90 --- --- --- --- ---
Erase_Fail_Count_Total 90 90 90 --- --- --- --- ---
Uncorrectable_Error_Cnt 100 100 100 --- --- --- --- ---
Airflow_Temperature_Cel 76 73 68 --- --- --- --- ---
Power-Off_Retract_Count --- --- --- 62 70 63 62 62
Load_Cycle_Count --- --- --- 62 70 63 62 62
Temperature_Celsius --- --- --- 206 203 206 200 214
ECC_Error_Rate 200 200 200 --- --- --- --- ---
Reallocated_Event_Count --- --- --- 100 100 100 100 100
Current_Pending_Sector --- --- --- 100 100 100 100 100
Offline_Uncorrectable --- --- --- 100 100 100 100 100
CRC_Error_Count 100 100 100 200 200 200 200 200
POR_Recovery_Count 99 99 99 --- --- --- --- ---
Total_LBAs_Written 99 99 99 --- --- --- --- ---
This detailed view is great while investigating, but to answer "should I investigate anything?" I came up with the . or x approach.
SLFTEST
The self-test (SLFTEST) column uses the command smartctl -l selftest /dev/sda, which produces:
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.12-200.fc31.x86_64] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 15487 -
# 2 Short offline Completed without error 00% 15319 -
# 3 Short offline Completed without error 00% 15151 -
# 4 Short offline Completed without error 00% 14983 -
# 5 Short offline Completed without error 00% 14815 -
# 6 Short offline Completed without error 00% 14647 -
# 7 Extended offline Completed without error 00% 14605 -
# 8 Short offline Completed without error 00% 14479 -
# 9 Short offline Completed without error 00% 14310 -
#10 Short offline Completed without error 00% 14142 -
I have my short tests scheduled to run once a week and extended tests scheduled every 3 months, such that every day and every month some disk is running its self-tests.
The first two values of 6/043/. are the days since the last short and extended tests, respectively. The days are calculated by taking the Power_On_Hours SMART attribute (shown in the previous section), subtracting the LifeTime(hours) of the last short and extended test in the table, and converting the result from hours to days. If no test has been run, - or --- is shown.
As with the pre-fail flags, the last character is a . if the last test result is Completed without error and an x otherwise.
So 6/043/. means a short test ran 6 days ago, an extended test ran 43 days ago, and the last test completed without error.
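The arithmetic behind 6/043/., using the Power_On_Hours value and self-test log hours shown above:

```shell
power_on=15645    # SMART attribute 9 for this disk
last_short=15487  # LifeTime(hours) of the newest "Short offline" entry
last_ext=14605    # LifeTime(hours) of the newest "Extended offline" entry
slftest=$(printf '%d/%03d/.' \
    $(( (power_on - last_short) / 24 )) \
    $(( (power_on - last_ext)   / 24 )))
echo "$slftest"   # -> 6/043/.
```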
Table 2: Btrfs Filesystem Information
This table focuses on the Btrfs file systems, reporting where they're mounted, the size and available space, the last time they were scrubbed and if any errors were reported during the last scrub.
BTRFS_PATH SIZE AVAIL USE% SCB ERR
/ 111G 75G 32% 3 .
/mnt/ops 932G 356G 54% 28 .
/mnt/store 19T 4.4T 76% 18 .
For this report we loop through all Btrfs mount points. We get them all into an array with paths=( $(df -T | grep btrfs | sed -E "s/.*% //" | tr '\n' ' ') ).
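The parsing step with a canned df -T listing (note the array assignment word-splits on whitespace, so it assumes mount points without spaces):

```shell
# Sample rows in `df -T` shape; only the btrfs ones survive the grep.
sample='/dev/nvme0n1p3        btrfs 116G 37G  75G  32% /
/dev/mapper/luks-a4aa btrfs 932G 577G 356G 54% /mnt/ops
/dev/sda1             ext4  476M 98M  349M 22% /boot'
# Strip everything through the "% " column, leaving just the mount point.
paths=( $(printf '%s\n' "$sample" | grep btrfs | sed -E "s/.*% //") )
echo "${paths[@]}"   # -> / /mnt/ops
```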
BTRFS_PATH, SIZE, AVAIL, USE%
The first four columns all come from a single command, df -h -t btrfs --output=size,avail,pcent ${path} | tail -n1. That returns the size of the mount point, the remaining available space, and usage as a percentage; the path itself is the ${path} we're looping over.
SCB, ERR
The last two columns come from variations of btrfs scrub status. The days since the last scrub (SCB) is obtained with "$(( ($(date +%s) - $(date --date="$(btrfs scrub status ${path} | grep "started" | sed -E "s/Scrub started: *//")" +%s) )/(60*60*24) ))", which converts today's date to seconds since the epoch, converts the last scrub start to seconds since the epoch, subtracts the two, and then converts the resulting seconds to days. The errors (ERR) column simply checks whether "no errors found" appears in the scrub status, reporting a . if so and an x otherwise: [[ $(btrfs scrub status ${path} | grep "summary") == *"no errors found"* ]] && echo -n "." || echo -n "x".
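The epoch arithmetic in isolation, with two fixed timestamps (GNU date; UTC pinned so the result is deterministic):

```shell
started="2020-01-01 03:00:00 UTC"   # hypothetical "Scrub started" value
now="2020-01-19 12:00:00 UTC"       # stand-in for the current time
# Difference in seconds, integer-divided down to whole days.
days=$(( ( $(date --date="$now" +%s) - $(date --date="$started" +%s) ) / (60*60*24) ))
echo "$days"   # 18 full days elapsed
```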
Table 3: Btrfs Device Information
This table focuses on a few per-device statuses Btrfs tracks for each physical device that is part of the file system.
BTRFS_PATH MAP DEVICE WRFCG
/ luks-8506 nvme0n1 .....
/mnt/ops luks-a4aa sda .....
/mnt/ops luks-fd8b sdh .....
/mnt/ops luks-0b9c sdd .....
/mnt/store luks-6e45 sde .....
/mnt/store luks-f5bd sdf .....
/mnt/store luks-fe18 sdb .....
/mnt/store luks-d375 sdg .....
/mnt/store luks-cb43 sdc .....
To better understand this report table, let's look at the raw output of btrfs device stats. This command expects one argument, so let's consider our first multi-disk path:
btrfs device stats /mnt/ops
[/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632].write_io_errs 0
[/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632].read_io_errs 0
[/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632].flush_io_errs 0
[/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632].corruption_errs 0
[/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632].generation_errs 0
[/dev/mapper/luks-fd8b8b99-7171-4c66-a8b9-bffe96e2c8af].write_io_errs 0
[/dev/mapper/luks-fd8b8b99-7171-4c66-a8b9-bffe96e2c8af].read_io_errs 0
[/dev/mapper/luks-fd8b8b99-7171-4c66-a8b9-bffe96e2c8af].flush_io_errs 0
[/dev/mapper/luks-fd8b8b99-7171-4c66-a8b9-bffe96e2c8af].corruption_errs 0
[/dev/mapper/luks-fd8b8b99-7171-4c66-a8b9-bffe96e2c8af].generation_errs 0
[/dev/mapper/luks-0b9c674e-1520-4c6c-aa8d-291f3b304a1f].write_io_errs 0
[/dev/mapper/luks-0b9c674e-1520-4c6c-aa8d-291f3b304a1f].read_io_errs 0
[/dev/mapper/luks-0b9c674e-1520-4c6c-aa8d-291f3b304a1f].flush_io_errs 0
[/dev/mapper/luks-0b9c674e-1520-4c6c-aa8d-291f3b304a1f].corruption_errs 0
[/dev/mapper/luks-0b9c674e-1520-4c6c-aa8d-291f3b304a1f].generation_errs 0
This output is simple, but very verbose when several disks are involved. Five counters are shown for each device. In my case, the devices are all /dev/mapper devices since I use full-disk encryption with LUKS, so it's not immediately obvious which /dev/sdX devices they map to. As far as the numbers go, 0 is good; anything else is bad.
Just like in the previous table, we use df to get a list of all the Btrfs mount points.
BTRFS_PATH, MAP
As we loop through the mount points, we output them as BTRFS_PATH. We then collect all the devices listed in [square brackets] from the btrfs device stats command and remove duplicates. To keep output minimal, I decided to keep only the first 9 characters, e.g., luks-a4aa, in the report, but a -v option exists to show the full path.
DEVICE
In order to show sda alongside luks-a4aa, we need something that associates the two. Once again lsblk has what we need, though it requires some critical manipulation. Let's consider just one device to illustrate the challenge.
lsblk -n -i -p -o NAME,TYPE,UUID /dev/sda
/dev/sda disk a4aa4856-393b-4620-8c76-88c883cdb632
`-/dev/mapper/luks-a4aa4856-393b-4620-8c76-88c883cdb632 crypt 709f7252-4bdb-43a2-9601-7ee94c15d501
You can see the graphical connection, but it's much easier to match and look up things when it's all on one line. lsblk -n -i -p -o NAME,TYPE,UUID | while read line; do printf "%s" "$([[ "${line}" == *"disk"* ]] && printf $'\n'"%s" "$line" || printf "%s " "$line")"; done does exactly that. With that output, we can grep a line for luks-a4aa... and then pipe the result through awk '{printf $1}' to get the first column of the results.
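The flattening and lookup, sketched against a canned two-line lsblk tree (UUIDs shortened for readability):

```shell
tree=$'/dev/sda disk a4aa4856\n`-/dev/mapper/luks-a4aa4856 crypt 709f7252'
# Start a new output line at each "disk" row; append child rows to it.
flat=$(printf '%s\n' "$tree" | while read -r line; do
    [[ "$line" == *disk* ]] && printf '\n%s ' "$line" || printf '%s ' "$line"
done)
# Now the disk and its crypt child share a line, so one grep + awk finds the disk.
dev=$(printf '%s\n' "$flat" | grep "luks-a4aa" | awk '{print $1}')
echo "$dev"   # -> /dev/sda
```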
WRFCG
This column is just the first letter of write_io_errs, read_io_errs, flush_io_errs, corruption_errs, and generation_errs. As with other tables in this report, I check the value of each and report . if the value is 0 and x if it's greater than 0.
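That check, sketched against one device's canned stats (with a nonzero corruption counter to show an x):

```shell
stats='[/dev/mapper/luks-a4aa].write_io_errs 0
[/dev/mapper/luks-a4aa].read_io_errs 0
[/dev/mapper/luks-a4aa].flush_io_errs 0
[/dev/mapper/luks-a4aa].corruption_errs 1
[/dev/mapper/luks-a4aa].generation_errs 0'
# One flag per counter, in W R F C G order: "." for zero, "x" otherwise.
wrfcg=$(printf '%s\n' "$stats" | awk '{ printf "%s", ($2 == 0) ? "." : "x" }')
echo "$wrfcg"   # -> ...x.
```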
More Than One Server In The Report
{% include image.html file="ios_report.png" link_to_self="true" alt="The report as viewed on an iPhone." figure_style="float: right; max-width: 40%; margin: 10px 0 10px 30px" caption="The report as viewed on an iPhone." %}
So everything above describes the steps to get a report locally on a single server. But I have three. One solution would be to have each server run its own report and email it to me, but then I'd get three emails. While not terrible, I'm far more likely to scan over the report if it's just one email.
To accomplish this single email approach, I have one server scheduled via a cronjob to run a script that runs the report locally, and then ssh into each server I'm interested in, run the report, and pull the results into a single email. I also leverage this script to make a multi-part email as I've found most email clients will render plain text email with a non-fixed-width font. I use the html portion of the email to explicitly declare a monospace font-family, which ensures the report columns stay aligned and readable.
To ssh into all my servers in the cron'ed script, I use the keychain command, which interfaces with ssh-agent and allows me to ssh to my other servers without entering my ssh key's password each time.
Consider this snippet. The source calls ensure I'm using an existing keychain instance. The first plain_text_report assignment is a local one. The second is over ssh.
source "/home/${USER}/.keychain/${HOSTNAME}-sh"
plain_text_report="CORE $(uptime -p) on $(uname -r)"$'\n\n'
plain_text_report+="ECHO $(source "/home/${USER}/.keychain/${HOSTNAME}-sh";
ssh -l ${USER} -i /home/${USER}/.ssh/id_ed25519 10.0.1.202 "uptime -p")"
Final Notes
This isn't a particularly fast report. For my purposes, running it every few days or on demand, it works just fine, and it's very hackable. There are some obvious areas that could be optimized (e.g., caching smartctl calls) that maybe I'll get to someday. If you'd like to improve it, it's on GitHub.