FreeBSD's SES-2 Utility
If you have a headless server, you probably don't physically look at it often. As such, it's easy to forget the physical layout of things like storage devices. When a disk fails, there's a bit of panic as you scramble to deal with it before another disk fails. (Side note: there are interesting statistics showing that HDDs from the same batch tend to fail together. I've worked through this scenario. If you buy several disks together, have a plan to act fast should one of them fail.)
You don't want to make any mistakes when resolving it. Probably the easiest mistake to make is pulling the wrong drive and causing other problems. You want to know the physical location of the failed disk, ideally without shutting down the whole system.
Luckily, the SES-2 (SCSI Enclosure Services) standard exists for enclosure management, and FreeBSD supports it in the base OS.
Using sesutil
Running sesutil(8) with the show subcommand gives output like this:
# sesutil show
ses0: <AHCI SGPIO Enclosure 2.00>; ID: 3061686369656d30
Desc     Dev     Model                     Ident                Size/Status
Slot 00  ada0    WDC WD100EFAX-68LHPN0     JEJXXXXX             10T
Slot 01  ada1    WDC WD100EFAX-68LHPN0     JEK9XXXX             10T
Slot 02  ada2    WDC WD140EFFX-68VBXN0     9RH5XXXX             14T
Slot 03  ada3    WDC WD140EFFX-68VBXN0     X0GEXXXX             14T
Slot 04  ada4    WDC WD140EFFX-68VBXN0     Z2JKXXXX             14T
Slot 05  ada5    WDC WD140EFFX-68VBXN0     9RK0XXXX             14T
Slot 06  ada6    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
Slot 07  ada7    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
Notice the slot numbers. Those correspond to physical locations, which is a good thing to know when pulling a drive. The extra info helps to confirm you're addressing the correct drive.
sesutil will also let you make the light blink on the drive bay, which adds yet another safeguard to ensure you don't pull the wrong drive.
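The blinking is done with the locate subcommand described in sesutil(8). Here is a minimal sketch of how I'd wrap it so the light turns itself off again. The function name blink_bay, the 60 second default, and the device name in the usage line are my own placeholders, not anything sesutil provides.

```shell
# Hypothetical wrapper: blink the locate LED for one disk, then switch it off.
# Usage: blink_bay <device> [seconds]
blink_bay() {
    dev=$1
    secs=${2:-60}                            # assumed default; pick what suits you
    sesutil locate "$dev" on || return 1     # turn the locate LED on
    sleep "$secs"                            # time to walk over to the chassis
    sesutil locate "$dev" off                # turn it off again
}
```

For example, `blink_bay ada3 120` should blink the bay holding ada3 for two minutes, assuming your enclosure honors locate requests.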
SES-2 in Linux
There is no SES-2 utility installed on Debian by default. You can install sg3-utils (it's also available on FreeBSD), but it's a bit like bringing an atomic bomb to a fist fight.
You'll likely need to visit the developer's site to get some hints on how to start. Of note:
Without any options sg_ses reports the names of the supported diagnostic pages. The SAS disk does support some pages but none that belong to the SES standard. On Linux it helps to list all SCSI devices with the lsscsi utility (including showing the generic device names with the '-g' option; 'modprobe sg' may be needed if hyphens appear in the last column)
Ok, so we need to install lsscsi because it's also not installed by default. On my Debian system, I get the following.
# lsscsi -g
[0:0:0:0] disk ATA WDC WD201KFGX-68 0A83 /dev/sdb /dev/sg0
[0:0:1:0] disk ATA WDC WD201KFGX-68 0A83 /dev/sda /dev/sg1
[0:0:2:0] disk ATA Samsung SSD 870 2B6Q /dev/sdc /dev/sg2
[0:0:3:0] disk ATA WDC WD201KFGX-68 0A83 /dev/sdf /dev/sg3
[0:0:4:0] disk ATA Samsung SSD 870 2B6Q /dev/sde /dev/sg4
[0:0:5:0] disk ATA WDC WD201KFGX-68 0A83 /dev/sdd /dev/sg5
[0:0:6:0] enclosu SMC SC826-P 100b - /dev/sg6
[N:0:4:1] disk Samsung SSD 970 EVO Plus 1TB__1 /dev/nvme0n1 -
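The line that matters for SES is the one whose type column reads "enclosu". As a rough sketch, assuming the column layout shown above, a small filter can pull out the generic device name of each enclosure. The function name enclosure_sg is hypothetical.

```shell
# Hypothetical helper: read `lsscsi -g` output on stdin and print the
# generic (sg) device of every enclosure line.
# Assumes the second column is the device type and the last is the sg node.
enclosure_sg() {
    awk '$2 == "enclosu" { print $NF }'
}
```

Used as `lsscsi -g | enclosure_sg`, this should print /dev/sg6 for the listing above.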
You might be thinking those numbers correspond to physical locations, but you'd be wrong. For comparison, here's the output of a custom script that I wrote to identify drive locations.
# report_physical.sh
╭─02────────────┬─05────────────┬─08────────────┬─11────────────╮
│               │               │               │               │
│               │               │               │               │
│               │               │               │               │
├─01:sdb[LB]────┼─04:sda[LB]────┼─07:sdf[LB]────┼─10:sdd[LB]────┤
│ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │
│ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │
│ 37C   1.2 yrs │ 37C   1.2 yrs │ 37C   1.1 yrs │ 39C   1.1 yrs │
├─00────────────┼─03────────────┼─06:sdc[LB]────┼─09:sde[LB]────┤
│               │               │ SAM 2TB SSD   │ SAM 2TB SSD   │
│               │               │ SATA 3.3 6Gbs │ SATA 3.3 6Gbs │
│               │               │ 34C   0.4 yrs │ 34C   0.4 yrs │
╰───────────────┴───────────────┴───────────────┴───────────────╯
As we see, slots 1, 4, 6, 7, 9, and 10 are populated.
Now things get a lot more complicated. Running sg_ses leaves you with hundreds of lines of hard-to-read info. I'm sure it makes sense to someone, but it's pretty intense. You won't find a direct answer in the output either. What you find is a trail of bread crumbs that you'll need to follow. For example, in the output below I know Slot 0 is empty because "target port for:" is empty, whereas Slot 1 shows a SATA device.
Correlating all this will be tedious.
# sg_ses -a /dev/bsg/0:0:6:0
[...lines omitted...]
Slot00 [0,0] Element type: Array device slot
Enclosure Status:
Predicted failure=0, Disabled=0, Swap=0, status: Not installed
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
Additional Element Status:
Transport protocol: SAS
number of phys: 1, not all phys: 0, device slot number: 0
phy index: 0
SAS device type: no SAS device attached
initiator port for:
target port for:
attached SAS address: 0x0
SAS address: 0x0
phy identifier: 0x0
Slot01 [0,1] Element type: Array device slot
Enclosure Status:
Predicted failure=0, Disabled=0, Swap=0, status: OK
OK=0, Reserved device=0, Hot spare=0, Cons check=0
In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
Ready to insert=0, RMV=0, Ident=0, Report=0
App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
Additional Element Status:
Transport protocol: SAS
number of phys: 1, not all phys: 0, device slot number: 1
phy index: 0
SAS device type: no SAS device attached
initiator port for:
target port for: SATA_device
attached SAS address: 0x5003048020cc31bf
SAS address: 0x5003048020cc3181
phy identifier: 0x0
[... hundreds of lines omitted ...]
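If you do want to follow the bread crumbs, a filter can at least shrink the output to one line per slot. This is only a sketch built around the excerpt above: it assumes each slot starts with an "Element type: Array device slot" line, that the status appears after "status:", and that an empty "target port for:" means nothing is attached. The function name summarize_slots is hypothetical, and other enclosures may format things differently.

```shell
# Hypothetical helper: condense `sg_ses -a` output (on stdin) to
# "slot <TAB> status <TAB> target port" per array device slot.
summarize_slots() {
    awk '
        /Element type:/ {
            # Track only array device slots; skip fans, sensors, and so on.
            in_slot = ($0 ~ /Element type: Array device slot/)
            if (in_slot) slot = $1
            next
        }
        in_slot && /status:/ {
            sub(/.*status: */, "")
            if (!(slot in status)) order[++n] = slot
            status[slot] = $0
            next
        }
        in_slot && /target port for:/ {
            sub(/.*target port for: */, "")
            target[slot] = $0
        }
        END {
            for (i = 1; i <= n; i++) {
                s = order[i]
                printf "%s\t%s\t%s\n", s, status[s], (target[s] == "" ? "-" : target[s])
            }
        }
    '
}
```

Something like `sg_ses -a /dev/bsg/0:0:6:0 | summarize_slots` would then show Slot00 as "Not installed" with no target and Slot01 as "OK" with SATA_device. You still have to map each slot to a device name yourself.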
As an alternative, you might be able to read the physical location straight from the symlink names under /dev/disk/by-path. For example, if your disks are plugged into a SAS expander:
# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy10-lun-0 -> ../../sdd
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy1-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 13 18:05 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy4-lun-0 -> ../../sda
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy6-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy7-lun-0 -> ../../sdf
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy9-lun-0 -> ../../sde
Note the "phyX" bit near the end of each name. On my system that number matches the physical slot: phy1 points to sdb in slot 01, and phy10 points to sdd in slot 10. In my experience, whether or not this works will depend on your distribution.
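To save some squinting, the phy number and device name can be pulled out of those link names. This is a minimal sketch that assumes the SAS expander naming pattern shown in the listing. The function name phy_map is hypothetical, and it prints nothing if your system names its links differently.

```shell
# Hypothetical helper: print "<phy number> <device>" for each whole-disk
# SAS expander link, sorted by phy number.
# Usage: phy_map [directory]   (defaults to /dev/disk/by-path)
phy_map() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*-sas-exp*-phy*-lun-0; do
        [ -L "$link" ] || continue        # skip the unexpanded glob and non-links
        name=${link##*/}                  # strip the directory
        phy=${name##*-phy}                # "10-lun-0"
        phy=${phy%%-*}                    # "10"
        dev=$(readlink "$link")           # "../../sdd"
        printf '%s %s\n' "$phy" "${dev##*/}"
    done | sort -n
}
```

On the listing above, running `phy_map` should start with "1 sdb" and end with "10 sdd".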
It's not the easiest approach. And when disaster strikes, it's not likely that you're going to be in the mood to learn something complex or parse verbose output.
Simplicity = I'll Use It
I use sesutil because it's simple. It does a few things really well, and in a pinch the manual is concise enough to re-educate yourself quickly on its use.