FreeBSD's SES-2 Utility

If you have a headless server, you probably don't physically look at it often. As such, it's easy to forget the physical layout of things like storage devices. When a disk fails, there's a bit of panic as you scramble to deal with it before another disk fails. (Side note: there are interesting statistics showing that HDDs from the same batch tend to fail together. I've worked through this scenario. If you buy several disks together, have a plan to act fast should one of them fail.)

You don't want to make any mistakes when resolving it. Probably the easiest mistake to make is pulling the wrong drive and causing other problems. You want to know the physical location of the failed disk, ideally without shutting down the whole system.

Luckily, the SES-2 (SCSI Enclosure Services) standard exists for enclosure management, and FreeBSD supports it in the base OS.

Using sesutil

Running sesutil(8) with the show subcommand gives you output like this.

# sesutil show
ses0: <AHCI SGPIO Enclosure 2.00>; ID: 3061686369656d30
Desc     Dev     Model                     Ident                Size/Status
Slot 00  ada0    WDC WD100EFAX-68LHPN0     JEJXXXXX             10T
Slot 01  ada1    WDC WD100EFAX-68LHPN0     JEK9XXXX             10T
Slot 02  ada2    WDC WD140EFFX-68VBXN0     9RH5XXXX             14T
Slot 03  ada3    WDC WD140EFFX-68VBXN0     X0GEXXXX             14T
Slot 04  ada4    WDC WD140EFFX-68VBXN0     Z2JKXXXX             14T
Slot 05  ada5    WDC WD140EFFX-68VBXN0     9RK0XXXX             14T
Slot 06  ada6    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T
Slot 07  ada7    Samsung SSD 870 EVO 2TB   S6PNNJ0W413XXXX      2T

Notice the slot numbers. Those correspond to physical locations, which is a good thing to know when pulling a drive. The model, serial number (Ident), and size columns help confirm you're addressing the correct drive.
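If you already know the device name of the failed disk (from a ZFS or kernel message, say), a small filter can pull the matching slot out of that listing. This is only a sketch: slot_of is a hypothetical helper name, and it assumes the column layout shown above, where the description is two words ("Slot NN") followed by the device name. Other enclosures may label their slots differently.

```shell
# Hypothetical helper: print the slot label for one device.
# Assumes `sesutil show` lines look like: "Slot 03  ada3  <model> ..."
# Reads the listing on stdin so it can be tried against saved output.
slot_of() {
    awk -v dev="$1" '$3 == dev { print $1, $2 }'
}
```

Piping the listing above through it, as in "sesutil show | slot_of ada3", prints "Slot 03". It prints nothing if the device isn't in the enclosure.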

sesutil can also blink the locate LED on a drive bay (via its locate subcommand), which adds yet another safeguard to ensure you don't pull the wrong drive.
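For the blinking light, sesutil(8) documents a locate subcommand that takes a disk name and on or off. Here's a minimal wrapper sketch around it. The blink_bay name and the DRY_RUN variable are my own inventions for illustration, and whether an LED actually lights up will depend on your enclosure and backplane.

```shell
# Hypothetical wrapper around `sesutil locate <disk> on|off`.
# Set DRY_RUN=1 to print the command instead of running it.
blink_bay() {
    disk=$1
    state=${2:-on}
    case $state in
        on|off) ;;
        *) echo "usage: blink_bay disk [on|off]" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "sesutil locate $disk $state"
    else
        sesutil locate "$disk" "$state"
    fi
}
```

You would run something like "blink_bay ada3 on" before walking to the rack, and "blink_bay ada3 off" once the drive is swapped. The dry-run mode is there so you can double-check the device name before touching anything.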

SES-2 in Linux

There is no SES-2 utility installed on Debian by default. You can install sg3-utils (it's also available on FreeBSD), but it's a bit like bringing an atomic bomb to a fist fight.

You'll likely need to visit the developer's site to get some hints on how to start. Of note:

"Without any options sg_ses reports the names of the supported diagnostic pages. The SAS disk does support some pages but none that belong to the SES standard. On Linux it helps to list all SCSI devices with the lsscsi utility (including showing the generic device names with the '-g' option; 'modprobe sg' may be needed if hyphens appear in the last column)."

OK, so we need to install lsscsi too, because it's not installed by default either. On my Debian system, I get the following.

# lsscsi -g
[0:0:0:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdb   /dev/sg0
[0:0:1:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sda   /dev/sg1
[0:0:2:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sdc   /dev/sg2
[0:0:3:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdf   /dev/sg3
[0:0:4:0]    disk    ATA      Samsung SSD 870  2B6Q  /dev/sde   /dev/sg4
[0:0:5:0]    disk    ATA      WDC WD201KFGX-68 0A83  /dev/sdd   /dev/sg5
[0:0:6:0]    enclosu SMC      SC826-P          100b  -          /dev/sg6
[N:0:4:1]    disk    Samsung SSD 970 EVO Plus 1TB__1            /dev/nvme0n1  -

You might be thinking the numbers in the first column (host:channel:target:lun) correspond to physical locations, but you'd be wrong. For comparison, here's the output of a custom script that I wrote to identify drive locations.

# report_physical.sh
╭─02────────────┬─05────────────┬─08────────────┬─11────────────╮
│               │               │               │               │
│               │               │               │               │
│               │               │               │               │
├─01:sdb[LB]────┼─04:sda[LB]────┼─07:sdf[LB]────┼─10:sdd[LB]────┤
│ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │ WDC 20TB 7200 │
│ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │ SATA 3.5 6Gbs │
│ 37C  1.2 yrs  │ 37C  1.2 yrs  │ 37C  1.1 yrs  │ 39C  1.1 yrs  │
├─00────────────┼─03────────────┼─06:sdc[LB]────┼─09:sde[LB]────┤
│               │               │ SAM  2TB SSD  │ SAM  2TB SSD  │
│               │               │ SATA 3.3 6Gbs │ SATA 3.3 6Gbs │
│               │               │ 34C  0.4 yrs  │ 34C  0.4 yrs  │
╰───────────────┴───────────────┴───────────────┴───────────────╯

As we see, slots 1, 4, 6, 7, 9, and 10 are populated.

Now things get a lot more complicated. Running sg_ses produces hundreds of lines of hard-to-read output. I'm sure it makes sense to someone, but it's pretty intense. You won't find a direct answer in the output either. What you find is a trail of bread crumbs that you'll need to follow. For example, in the output below I know Slot 0 is empty because its status is "Not installed" and "target port for:" is blank, whereas Slot 1 shows a SATA device.

Correlating all this will be tedious.

# sg_ses -a /dev/bsg/0:0:6:0
[...lines omitted...]
Slot00 [0,0]  Element type: Array device slot
  Enclosure Status:
    Predicted failure=0, Disabled=0, Swap=0, status: Not installed
    OK=0, Reserved device=0, Hot spare=0, Cons check=0
    In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
    App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
    Ready to insert=0, RMV=0, Ident=0, Report=0
    App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
    Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
    Transport protocol: SAS
    number of phys: 1, not all phys: 0, device slot number: 0
    phy index: 0
      SAS device type: no SAS device attached
      initiator port for:
      target port for:
      attached SAS address: 0x0
      SAS address: 0x0
      phy identifier: 0x0
Slot01 [0,1]  Element type: Array device slot
  Enclosure Status:
    Predicted failure=0, Disabled=0, Swap=0, status: OK
    OK=0, Reserved device=0, Hot spare=0, Cons check=0
    In crit array=0, In failed array=0, Rebuild/remap=0, R/R abort=0
    App client bypass A=0, Do not remove=0, Enc bypass A=0, Enc bypass B=0
    Ready to insert=0, RMV=0, Ident=0, Report=0
    App client bypass B=0, Fault sensed=0, Fault reqstd=0, Device off=0
    Bypassed A=0, Bypassed B=0, Dev bypassed A=0, Dev bypassed B=0
  Additional Element Status:
    Transport protocol: SAS
    number of phys: 1, not all phys: 0, device slot number: 1
    phy index: 0
      SAS device type: no SAS device attached
      initiator port for:
      target port for: SATA_device
      attached SAS address: 0x5003048020cc31bf
      SAS address: 0x5003048020cc3181
      phy identifier: 0x0
[... hundreds of lines omitted ...]
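The bread crumb trail above can be followed mechanically. Here's a rough sketch that remembers the most recent "device slot number:" and prints it whenever "target port for:" is followed by a value. The populated_slots name is hypothetical, and the filter assumes the sg_ses output format shown above, which may differ between sg3-utils versions and enclosures.

```shell
# Hypothetical filter: list slots whose "target port for:" field is not empty.
# Reads `sg_ses -a` style output on stdin.
populated_slots() {
    awk '
        /device slot number:/ { slot = $NF }              # last field is the slot number
        /target port for:/ && NF > 3 { print slot, $NF }  # a 4th field means a device is attached
    '
}
```

Usage would be along the lines of "sg_ses -a /dev/bsg/0:0:6:0 | populated_slots". For the excerpt above it prints "1 SATA_device" and skips slot 0. It still doesn't tell you which /dev/sdX lives in that slot, so it only solves half the problem.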

As an alternative, you might be able to visually parse a listing of /dev/disk/by-path. For example, if your disks are plugged into a SAS expander:

# ls -l /dev/disk/by-path
lrwxrwxrwx 1 root root   9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy10-lun-0 -> ../../sdd
lrwxrwxrwx 1 root root   9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy1-lun-0 -> ../../sdb
lrwxrwxrwx 1 root root   9 Aug 13 18:05 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy4-lun-0 -> ../../sda
lrwxrwxrwx 1 root root   9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy6-lun-0 -> ../../sdc
lrwxrwxrwx 1 root root   9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy7-lun-0 -> ../../sdf
lrwxrwxrwx 1 root root   9 Aug 11 21:42 pci-0000:01:00.0-sas-exp0x5003048020cc31bf-phy9-lun-0 -> ../../sde

Note the "phyX" bit near the end. It corresponds to the physical slot: phy10 points at sdd, which sits in slot 10 in the diagram above. In my experience, whether or not this works will depend on your distribution.
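If your system does produce by-path names like these, the phy number and device name can be pulled out with plain shell. This is a sketch and not the script I use: phy_map is a hypothetical name, and it assumes the "-sas-exp...-phyN-lun-0" naming shown above. Partition links (ending in -partN) are deliberately skipped.

```shell
# Hypothetical helper: print "phy device" pairs, sorted by phy number.
# Takes the directory as an optional argument so it can be tried on a copy.
phy_map() {
    dir=${1:-/dev/disk/by-path}
    for link in "$dir"/*-sas-exp*-phy*-lun-0; do
        [ -L "$link" ] || continue      # skips the literal pattern when nothing matches
        name=${link##*/}                # strip the directory
        phy=${name##*-phy}              # e.g. "10-lun-0"
        phy=${phy%%-*}                  # e.g. "10"
        dev=$(readlink "$link")         # e.g. "../../sdd"
        printf '%s %s\n' "$phy" "${dev##*/}"
    done | sort -n
}
```

With the links listed above, running phy_map prints one line per disk, starting with "1 sdb" and ending with "10 sdd".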

It's not the easiest approach. And when disaster strikes, it's not likely that you're going to be in the mood to learn something complex or parse verbose output.

Simplicity = I'll Use It

I use sesutil because it's simple. It does a few things really well, and in a pinch the manual is concise enough to re-educate yourself quickly on its use.