If you have a headless server, you probably don't physically look at it often. As such, it's easy to forget the physical layout of things like storage devices. When a disk fails, there's a bit of panic as you scramble to deal with it before another disk fails. (Side note: there are interesting statistics showing that HDDs from the same batch tend to fail together. I've worked through this scenario. If you buy several disks together, have a plan to act fast should one of them fail.)
You don't want to make any mistakes while resolving it. The easiest mistake to make is pulling the wrong drive: if the array is already degraded, removing a healthy disk can leave it with no redundancy at all, or take it offline entirely. You want to know the physical location of the failed disk, ideally without shutting down the whole system.
Luckily, the SCSI Enclosure Services (SES-2) standard exists for enclosure management, and FreeBSD supports it in the base OS.
Using sesutil(8) will give you output like this.
```
# sesutil show
ses0: <AHCI SGPIO Enclosure 2.00>; ID: 3061686369656d30
Desc     Dev     Model                     Ident                Size/Status
Slot 00  ada0    WDC WD100EFAX-68LHPN0     JEJXXXXX             10T
Slot 01  ada1    WDC WD100EFAX-68LHPN0     JEK9XXXX             10T
Slot 02  ada2    WDC WD140EFFX-68VBXN0     9RH5XXXX             14T
Slot 03  ada3    WDC WD140EFFX-68VBXN0     X0GEXXXX             14T
Slot 04  ada4    WDC WD140EFFX-68VBXN0     Z2JKXXXX             14T
Slot 05  ada5    WDC WD140EFFX-68VBXN0     9RK0XXXX             14T
Slot 06  ada6    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
Slot 07  ada7    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
```
Notice the slot numbers: they correspond to physical locations, which is exactly what you need to know when pulling a drive. The model and serial number (Ident) columns help confirm you're addressing the correct drive.
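If you'd rather go from a device name to a bay without eyeballing the table (say, when something reports ada3 as failing), the text is easy to parse. Here is a minimal Python sketch, assuming the column layout shown above: the Model column may contain spaces, but Ident and Size/Status are always the last two fields. The name `parse_sesutil_show` is hypothetical, not anything shipped with FreeBSD.

```python
import re

# One populated row of `sesutil show`, e.g.
# "Slot 03  ada3  WDC WD140EFFX-68VBXN0  X0GEXXXX  14T".
# Assumption: Model may contain spaces; Ident and Size/Status never do.
ROW = re.compile(
    r"^\s*(?P<slot>Slot\s+\d+)\s+(?P<dev>\S+)\s+"
    r"(?P<model>.+?)\s+(?P<ident>\S+)\s+(?P<size>\S+)\s*$"
)


def parse_sesutil_show(text):
    """Return {device: (slot, model, ident)} for each populated slot row."""
    slots = {}
    for line in text.splitlines():
        match = ROW.match(line)
        # Skip the ses0 banner, the header, and rows without a device.
        if match is None or match.group("dev") == "-":
            continue
        slot = " ".join(match.group("slot").split())  # "Slot   03" -> "Slot 03"
        slots[match.group("dev")] = (slot, match.group("model"), match.group("ident"))
    return slots


if __name__ == "__main__":
    sample = """\
ses0: <AHCI SGPIO Enclosure 2.00>; ID: 3061686369656d30
Desc     Dev     Model                     Ident                Size/Status
Slot 03  ada3    WDC WD140EFFX-68VBXN0     X0GEXXXX             14T
Slot 06  ada6    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
"""
    print(parse_sesutil_show(sample)["ada3"])
    # ('Slot 03', 'WDC WD140EFFX-68VBXN0', 'X0GEXXXX')
```

Keying on the device name makes the lookup one step, and keeping Ident alongside the slot lets you double-check the serial against the label on the drive before you pull it.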
sesutil can also blink the locate LED on the drive bay, which adds yet another safeguard against pulling the wrong drive.
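The LED is toggled with sesutil's locate subcommand (`sesutil locate [-u ses_device] disk on|off`, per sesutil(8); check the manual on your release). Since the easy thing to forget is turning the LED back off, here is a small Python sketch that wraps it in a context manager. `locate_cmd` and `locate_led` are hypothetical helper names, the real command needs root, and the `run` parameter exists only so the command runner can be swapped out for testing.

```python
import subprocess
from contextlib import contextmanager


def locate_cmd(disk, on, ses_device=None):
    """Build the argv for `sesutil locate [-u ses_device] disk on|off`."""
    cmd = ["sesutil", "locate"]
    if ses_device is not None:
        cmd += ["-u", ses_device]  # limit the command to one SES device
    cmd += [disk, "on" if on else "off"]
    return cmd


@contextmanager
def locate_led(disk, ses_device=None, run=subprocess.run):
    """Blink the bay LED for `disk`, and always switch it off afterwards."""
    run(locate_cmd(disk, True, ses_device), check=True)
    try:
        yield
    finally:
        # Runs even if the body raises (including Ctrl-C).
        run(locate_cmd(disk, False, ses_device), check=True)
```

For example, `with locate_led("ada3"): input("Swap the blinking drive, then press Enter")` lights the bay and turns the LED off again even if you bail out with Ctrl-C.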
SES-2 in Linux
There is no SES-2 utility installed on Debian by default. You can install sg3-utils (it's also available on FreeBSD), but it's a bit like bringing an atomic bomb to a fist fight.
You'll likely need to visit the developer's site for hints on where to start. Of note:
Without any options sg_ses reports the names of the supported diagnostic pages. The SAS disk does support some pages but none that belong to the SES standard. On Linux it helps to list all SCSI devices with the lsscsi utility (including showing the generic device names with the '-g' option; 'modprobe sg' may be needed if hyphens appear in the last column)
OK, so we need to install lsscsi, because it isn't installed by default either. On my Debian system, I get the following:
```
# lsscsi -g
[0:0:0:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdb   /dev/sg0
[0:0:1:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sda   /dev/sg1
[0:0:2:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sdc   /dev/sg2
[0:0:3:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdf   /dev/sg3
[0:0:4:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sde   /dev/sg4
[0:0:5:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdd   /dev/sg5
[0:0:6:0]    enclosu SMC      SC826-P          100b  -          /dev/sg6
[N:0:4:1]    disk    Samsung SSD 970 EVO Plus 1TB__1            /dev/nvme0n1  -
```
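Before you can point sg_ses at anything, you need the enclosure itself, which lsscsi lists with the (truncated) type `enclosu`. Here is a minimal Python sketch that pulls it out of `lsscsi -g` output, assuming the column layout shown above and assuming the kernel's bsg node for a device is named `/dev/bsg/H:C:T:L`, which matches the path used with sg_ses on this system. `find_enclosures` is a hypothetical name.

```python
def find_enclosures(text):
    """Return (hctl, sg_device, bsg_path) for each enclosure in `lsscsi -g` output."""
    found = []
    for line in text.splitlines():
        fields = line.split()
        # Every lsscsi line starts with an "[H:C:T:L]" address.
        if len(fields) < 3 or not fields[0].startswith("["):
            continue
        hctl, dev_type = fields[0].strip("[]"), fields[1]
        # lsscsi truncates the peripheral type, hence "enclosu" in its output.
        if dev_type.startswith("enclosu"):
            # With -g the last column is the SCSI generic (sg) node.
            # Assumption: the matching bsg node is /dev/bsg/<H:C:T:L>.
            found.append((hctl, fields[-1], "/dev/bsg/" + hctl))
    return found
```

Fed the output above, it returns `[("0:0:6:0", "/dev/sg6", "/dev/bsg/0:0:6:0")]`; the disks and the NVMe line are skipped because their type column is `disk`.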
You might think the numbers in those [H:C:T:L] addresses correspond to physical locations, but you'd be wrong. For comparison, here's the output of a custom script I wrote to identify drive locations:
```
# report_physical.sh
╭─02────────────┬─05────────────┬─08────────────┬─11────────────╮
│               │               │               │               │
│               │               │               │               │
│               │               │               │               │
├─01:sdb[LB]────┼─04:sda[LB]────┼─07:sdf[LB]────┼─10:sdd[LB]────┤
│ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │
│ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │
│ 37C 1.2 yrs   │ 37C 1.2 yrs   │ 37C 1.1 yrs   │ 39C 1.1 yrs   │
├─00────────────┼─03────────────┼─06:sdc[LB]────┼─09:sde[LB]────┤
│               │               │ SAM 2TB SSD   │ SAM 2TB SSD   │
│               │               │ SATA 3.3 6Gbs │ SATA 3.3 6Gbs │
│               │               │ 34C 0.4 yrs   │ 34C 0.4 yrs   │
╰───────────────┴───────────────┴───────────────┴───────────────╯
```
As you can see, slots 1, 4, 6, 7, 9, and 10 are populated. Note also that the slots are numbered from the bottom up within each column.
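The diagram shows how this chassis numbers its bays: column by column, from the bottom up, three bays per column (00 is bottom-left, 11 is top-right). That arithmetic is easy to capture. This is a hypothetical Python sketch of just the numbering, not the author's report_physical.sh, and it assumes this 3-row by 4-column layout; other enclosures number their bays differently.

```python
def slot_position(slot, rows=3):
    """Map a slot number to (row_from_top, column) for bottom-up, column-major bays."""
    column, from_bottom = divmod(slot, rows)
    return rows - 1 - from_bottom, column


def render_bays(populated, rows=3, columns=4, width=10):
    """Draw the bay grid as plain text; `populated` maps slot number -> device name."""
    grid = [[""] * columns for _ in range(rows)]
    for slot in range(rows * columns):
        row, column = slot_position(slot, rows)
        label = f"{slot:02d}"
        if slot in populated:
            label += ":" + populated[slot]
        grid[row][column] = label.ljust(width)
    return "\n".join("|" + "|".join(cells) + "|" for cells in grid)
```

With the devices from the diagram, `print(render_bays({1: "sdb", 4: "sda", 7: "sdf", 10: "sdd", 6: "sdc", 9: "sde"}))` reproduces the same arrangement in a much plainer form.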
Now things get a lot more complicated. Running sg_ses leaves you with hundreds of lines of hard-to-read information. I'm sure it makes sense to someone, but it's pretty intense. You won't find a direct answer in the output either; what you find is a trail of breadcrumbs that you'll need to follow. For example, in the output below I know Slot 0 is empty because "target port for:" is empty (and its status reads "Not installed"), whereas Slot 1 shows a SATA device.
Correlating all this will be tedious.
```
# sg_ses -a /dev/bsg/0:0:6:0
[...lines omitted...]
Slot00 [0,0]  Element type: Array device slot
  Enclosure Status:
    Predicted failure=0, Disabled=0, Swap=0, status: Not installed
    OK=0, Reserved device=0, Hot spare=0, Cons check=0
    In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
    App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
    Ready to insert=0, RMV=0, Ident=0, Report=0
    App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
    Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
    Transport protocol: SAS
    number of phys: 1, not all phys: 0, device slot number: 0
    phy index: 0
      SAS device type: no SAS device attached
      initiator port for:
      target port for:
      attached SAS address: 0x0
      SAS address: 0x0
      phy identifier: 0x0
Slot01 [0,1]  Element type: Array device slot
  Enclosure Status:
    Predicted failure=0, Disabled=0, Swap=0, status: OK
    OK=0, Reserved device=0, Hot spare=0, Cons check=0
    In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
    App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
    Ready to insert=0, RMV=0, Ident=0, Report=0
    App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
    Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
    Transport protocol: SAS
    number of phys: 1, not all phys: 0, device slot number: 1
    phy index: 0
      SAS device type: no SAS device attached
      initiator port for:
      target port for: SATA_device
      attached SAS address: 0x5003048020cc31bf
      SAS address: 0x5003048020cc3181
      phy identifier: 0x0
[... hundreds of lines omitted ...]
```
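Following the breadcrumbs can at least be automated. Here is a minimal Python sketch that groups `sg_ses -a` output by SlotNN element and keeps the status, "target port for:", and SAS address fields, assuming the layout shown above (each element starts with a line containing "Element type:"). `parse_sg_ses_slots` and `populated_slots` are hypothetical names; the parser only reads text, so it does not tell you which kernel device sits behind each SAS address.

```python
import re

# Element headers in `sg_ses -a` output look like
# "Slot01 [0,1]  Element type: Array device slot".
SLOT_HEADER = re.compile(r"^\s*(Slot\d+)\s+\[\d+,\d+\]")
FIELDS = {
    # "Predicted failure=0, Disabled=0, Swap=0, status: OK"
    "status": re.compile(r"\bstatus:\s*(.*)$"),
    # Empty for a vacant slot, "SATA_device" when a SATA disk is present.
    "target": re.compile(r"^\s*target port for:\s*(.*)$"),
    # Anchored so "attached SAS address:" is not picked up by mistake.
    "sas_address": re.compile(r"^\s*SAS address:\s*(\S+)"),
}


def parse_sg_ses_slots(text):
    """Summarise every SlotNN element found in `sg_ses -a` output."""
    slots = {}
    current = None
    for line in text.splitlines():
        if "Element type:" in line:
            header = SLOT_HEADER.match(line)
            # Non-slot elements (fans, sensors, ...) reset the cursor.
            current = slots.setdefault(header.group(1), {}) if header else None
            continue
        if current is None:
            continue
        for key, pattern in FIELDS.items():
            match = pattern.search(line)
            if match and key not in current:  # keep the first hit per slot
                current[key] = match.group(1).strip()
    return slots


def populated_slots(slots):
    """Slots whose 'target port for:' field names an attached device."""
    return sorted(name for name, info in slots.items() if info.get("target"))
```

On the excerpt above, `populated_slots(parse_sg_ses_slots(text))` returns `["Slot01"]`, since Slot00's target port field is empty.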
As an alternative, you might be able to visually parse the contents of /dev/disk/by-path. For example, if your disks are plugged into a SAS expander:
```
# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy10-lun-0 -> ../../sdd
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy1-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root 9 Aug 13 18:05 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy4-lun-0 -> ../../sda
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy6-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy7-lun-0 -> ../../sdf
lrwxrwxrwx 1 root root 9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy9-lun-0 -> ../../sde
```
Note the "phyN" part near the end of each name. On this system it matches the slot number: phy1 is sdb in slot 01, phy4 is sda in slot 04, and so on. In my experience, whether this works depends on your distribution.
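The phy numbers can be pulled out of those link names programmatically, too. Here is a minimal Python sketch, assuming the `sas-exp<address>-phy<N>-lun-<L>` naming shown above; other controllers and distributions produce different by-path names, and this sketch simply ignores them. `parse_by_path` and `phy_to_device` are hypothetical names. Incidentally, the expander address in these names, 0x5003048020cc31bf, is the same value sg_ses reports as the "attached SAS address" for Slot01.

```python
import os
import re

# Whole-disk by-path names behind a SAS expander, e.g.
# "pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy10-lun-0".
# Partition links end in "-partN" and therefore do not match.
BY_PATH = re.compile(r"-sas-exp0x(?P<expander>[0-9a-fA-F]+)-phy(?P<phy>\d+)-lun-\d+$")


def parse_by_path(name):
    """Return (expander_address, phy) for a by-path link name, or None."""
    match = BY_PATH.search(name)
    if match is None:
        return None
    return "0x" + match.group("expander").lower(), int(match.group("phy"))


def phy_to_device(directory="/dev/disk/by-path"):
    """Map (expander_address, phy) -> kernel device name such as 'sdd'."""
    mapping = {}
    for name in os.listdir(directory):
        key = parse_by_path(name)
        if key is not None:
            # Resolve "../../sdd" to "/dev/sdd" and keep just "sdd".
            target = os.path.realpath(os.path.join(directory, name))
            mapping[key] = os.path.basename(target)
    return mapping
```

On the system above, `phy_to_device()` would map `("0x5003048020cc31bf", 10)` to `"sdd"`, `("0x5003048020cc31bf", 1)` to `"sdb"`, and so on. Keying on the expander address as well as the phy keeps the result unambiguous if a machine has more than one expander.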
It's not the easiest approach. And when disaster strikes, it's not likely that you're going to be in the mood to learn something complex or parse verbose output.
Simplicity = I'll Use It
I prefer sesutil because it's simple. It does a few things really well, and in a pinch its manual page is concise enough to re-educate yourself on its use quickly.