Ata: libata-scsi: Sense data errors breaking hdparm with WD drives

After the latest kernel update I have a problem issuing commands with hdparm, and the regression has been confirmed upstream. I am on kernel v6.6.44.

$ hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]:  f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 drive state is:  unknown

While the expected output is the following:

$ hdparm -C /dev/sda
/dev/sda:
 drive state is:  active/idle

According to Christian Heusel, a commit that landed in v6.6.44 was also backported to
v6.10.3 and v6.1.103, and it breaks hdparm's handling of the returned sense data.


#regzbot introduced: 28ab9769117c
#regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives


Please check this, because it affects not only WD drives but a whole range of SATA devices (my HDD is not a WD, by the way).

Full post from kernel org is here

Full message from **Christian Heusel**

Hello Igor, hello Niklas,

on my NAS I am encountering the following issue since v6.6.44 (LTS),
when executing the hdparm command for my WD-WCC7K4NLX884 drives to get
the active or standby state:

$ hdparm -C /dev/sda
/dev/sda:
SG_IO: bad/missing sense data, sb[]:  f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 drive state is:  unknown

While the expected output is the following:

$ hdparm -C /dev/sda
/dev/sda:
 drive state is:  active/idle

I did a bisection within the stable series and found the following
commit to be the first bad one:

28ab9769117c ("ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and no error")

According to kernel.dance the same commit was also backported to the
v6.10.3 and v6.1.103 stable kernels and I could not find any commit or
pending patch with a “Fixes:” tag for the offending commit.

So far I have not been able to test with the mainline kernel as this is
a remote device which I couldn’t rescue in case of a boot failure. Also
just for transparency it does have the out of tree ZFS module loaded,
but AFAIU this shouldn’t be an issue here, as the commit seems clearly
related to the error. If needed I can test with an untainted mainline
kernel on Friday when I’m near the device.

I have attached the output of hdparm -I below and would be happy to
provide further debug information or test patches.

Cheers,
Christian


#regzbot introduced: 28ab9769117c
#regzbot title: ata: libata-scsi: Sense data errors breaking hdparm with WD drives


I have always gotten this kind of message for some drives when applying a spin-down timeout with hdparm.
I decided to ignore those…
…that was in fact when I tested the commands before I put them into the udev rules.


Why is it that widgets installed via Get New… cannot be updated in Get New? This has been the case since Plasma 6.0.3.
I can remove and reinstall them there, which trains your muscle memory! I hope I don’t have to do this each time or use Discover.

(Updates are listed as a different/new version, the updater is supposed to simply remove the old and install the new version, and not tell me it exists already.)

Installation of /tmp/iInvnZ-org.kde.plasma.catwalk.tar.gz failed: /home....local/share/plasma/plasmoids/org.kde.plasma.catwalk already exists

Well, I never had any errors of this kind before applying the last update with the latest kernel. After some digging I found information about the regression.

Well, from the mailing list: one developer suggests reverting that one commit, while the other points out that the user space program is broken and that the whole thing has been a Frankenstein issue from the start. Let’s see if the “never break userspace” rule will apply here …

Message from Niklas

You mean: the user space application is using the sense buffer without first
checking if the returned sense buffer is in descriptor or fixed format.

This seems like a fundamentally flawed assumption by the user space program.
If it doesn’t even bother checking the first field in the sense buffer, sb[0],
perhaps it shouldn’t bother trying to use the sense buffer at all.

(Yes, the D_SENSE bit can be configured by the user, but that doesn’t change
the fact that a user space program must check the format of the returned buffer
before trying to use it.)
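
A minimal sketch of the check described above, written in Python for brevity. It reads the response code in sb[0] before touching any other field. The byte offsets follow the generic SPC fixed and descriptor sense layouts; the function name `parse_sense` and the returned dictionary are hypothetical, and this is not hdparm's actual code.

```python
def parse_sense(sb: bytes) -> dict:
    """Return the format, sense key, ASC and ASCQ of a SCSI sense buffer."""
    if len(sb) < 8:
        raise ValueError("sense buffer too short")
    # Bit 7 of byte 0 is the VALID bit in fixed format, so mask it off.
    response_code = sb[0] & 0x7F
    if response_code in (0x70, 0x71):
        # Fixed format: key in byte 2, ASC/ASCQ in bytes 12 and 13.
        if len(sb) < 14:
            raise ValueError("fixed format sense buffer too short")
        return {"format": "fixed", "sense_key": sb[2] & 0x0F,
                "asc": sb[12], "ascq": sb[13]}
    if response_code in (0x72, 0x73):
        # Descriptor format: key in byte 1, ASC/ASCQ in bytes 2 and 3.
        return {"format": "descriptor", "sense_key": sb[1] & 0x0F,
                "asc": sb[2], "ascq": sb[3]}
    raise ValueError(f"unknown response code 0x{response_code:02x}")


# The buffer hdparm printed in this thread, padded to 32 bytes.
SB = bytes.fromhex("f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d" + " 00" * 18)
info = parse_sense(SB)
print(info["format"], hex(info["sense_key"]), hex(info["asc"]), hex(info["ascq"]))
```

For the dump in this thread the result is fixed format, sense key 0x1 (RECOVERED ERROR) and ASC/ASCQ 0x00/0x1D, which SPC lists as "ATA pass-through information available". In other words the buffer looks valid; it is simply not in the descriptor layout.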

Hmm… This is annoying. The kernel is fixed to be spec compliant but that
breaks old/non-compliant applications… We definitely should fix hdparm code,
but I think we still need to revert 28ab9769117c…

Well… if we look at commit:
11093cb1ef56 (“libata-scsi: generate correct ATA pass-through sense”)
libata-scsi: generate correct ATA pass-through sense · torvalds/linux@11093cb · GitHub

We can see that before that commit, the kernel used to call
ata_scsi_set_sense().

Back then ata_scsi_set_sense() was defined as:
linux/drivers/ata/libata-scsi.c at 11093cb1ef56147fe33f5750b1eab347bdef30db · torvalds/linux · GitHub
scsi_build_sense_buffer(0, cmd->sense_buffer, sk, asc, ascq);

Where the first argument to scsi_build_sense_buffer() is if the generated sense
buffer should be fixed or desc format (0 == fixed format), so we used to
generate the sense buffer in fixed format:
linux/drivers/scsi/scsi_common.c at 11093cb1ef56147fe33f5750b1eab347bdef30db · torvalds/linux · GitHub

However, as we can see, the kernel then used to incorrectly just
change sb[0] to say that the buffer was in desc format,
without updating the other fields, e.g. sb[2]:
linux/drivers/ata/libata-scsi.c at 3852e37382664a06cd006bb389a8223e32cedf45 · torvalds/linux · GitHub
so the format was really in some franken format…
following neither fixed nor descriptor format.

11093cb1ef56 (“libata-scsi: generate correct ATA pass-through sense”)
did change so that successful ATA-passthrough commands always generated
the sense data in descriptor format. However, that commit also managed to
mess up the offsets for fixed format sense…

The commit that later changed ata_scsi_set_sense() to honor D_SENSE
was commit: 06dbde5f3a44 (“libata: Implement control mode page to select
sense format”)

So basically:
Before commit 11093cb1ef56 (“libata-scsi: generate correct ATA pass-through
sense”), we generated sense data in some franken format for both successful
and failed ATA-passthrough commands.

After commit 11093cb1ef56 (“libata-scsi: generate correct ATA pass-through
sense”) we generate sense data for successful ATA-passthrough commands in
descriptor format unconditionally, but still in franken format for failed
ATA-passthrough commands.

After commit 06dbde5f3a44 (“libata: Implement control mode page to select
sense format”) we generate sense data for successful ATA-passthrough commands
in descriptor format unconditionally, but for failed commands we actually
honored D_SENSE to generate it either in fixed format or descriptor format.
(However, because of a bug in 11093cb1ef56, if using fixed format, the
offsets were wrong…)

The incorrect offsets for fixed format were fixed recently, in commit
38dab832c3f4 (“ata: libata-scsi: Fix offsets for the fixed format sense data”)

Commit 28ab9769117c (“ata: libata-scsi: Honor the D_SENSE bit for CK_COND=1 and
no error”) fixed so that we actually honor D_SENSE not only for failed
ATA-passthrough commands, but also for successful ATA-passthrough commands.

TL;DR: it is very hard to say that we have introduced a regression, because
this crap has basically been broken in one way or another since it was
introduced… Personally, I would definitely want all the patches that are in
mainline in the kernel running on my machine, since that is the only thing
that is consistent.

However, that assumes that user space programs that are trying to parse the
sense data actually bother to check the first field in the sense data,
to see which format the returned sense data is in… Applications that
do not even bother with that will have problems on a lot of (historic) kernel
versions.

Kind regards,
Niklas
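
To make the last point of the message concrete, here is a hedged sketch of a parser that copes with both layouts. The offsets are assumptions taken from the SAT layouts as commonly described (error, status, device and count in bytes 3 to 6 for fixed format; an ATA Status Return descriptor with code 0x09 for descriptor format) and should be verified against the specification. `ata_registers` and `POWER_MODES` are hypothetical names, and only the low 8 bits of the count are decoded.

```python
# Assumed CHECK POWER MODE count values; hdparm knows a few more.
POWER_MODES = {0x00: "standby", 0x80: "idle", 0xFF: "active/idle"}


def ata_registers(sb: bytes) -> dict:
    """Extract ATA error/status/device/count from either sense format."""
    if len(sb) < 8:
        raise ValueError("sense buffer too short")
    code = sb[0] & 0x7F
    if code in (0x70, 0x71):
        # Assumed fixed-format layout: INFORMATION field, bytes 3..6.
        return {"error": sb[3], "status": sb[4],
                "device": sb[5], "count": sb[6]}
    if code in (0x72, 0x73):
        # Walk the descriptor list that starts at byte 8.
        end = min(len(sb), 8 + sb[7])
        pos = 8
        while pos + 2 <= end:
            desc_type, desc_len = sb[pos], sb[pos + 1]
            if desc_type == 0x09 and desc_len >= 0x0C and pos + 14 <= len(sb):
                d = sb[pos:pos + 14]
                return {"error": d[3], "count": d[5],
                        "device": d[12], "status": d[13]}
            pos += 2 + desc_len
        raise ValueError("no ATA Status Return descriptor found")
    raise ValueError(f"unknown response code 0x{code:02x}")


# The buffer from the hdparm output quoted in this thread.
SB = bytes.fromhex("f0 00 01 00 50 40 ff 0a 00 00 78 00 00 1d" + " 00" * 18)
regs = ata_registers(SB)
print(POWER_MODES.get(regs["count"], "unknown"))
```

Under these assumptions the buffer from this thread decodes to a count of 0xff, i.e. the drive did report active/idle. Note that, per the message above, kernels without 38dab832c3f4 placed the fixed-format fields at different offsets, so this sketch only applies to kernels that carry that fix.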

True
I wanted to share my experience with the message I received, in case others encounter the same issue. Just to clarify, this isn’t specific to WD drives: my SATA drive is not a WD.

By the way, the user in this post might be facing issues because of the same regression. I might be wrong, though;
if so, please delete my post there, and thank you for your time!
I just wanted to help.

@Rionateam all good. We will monitor the situation and see how to resolve it in time.


Hm, sorry, I am sharing info for Debian systems here: I just updated 15 or so “firmware” packages and I no longer get the “bad/missing sense data” message.

Maybe these contained the fix?

firmware-linux-nonfree (20240709-1~mx23ahs) …
firmware-linux (20240709-1~mx23ahs) …
firmware-misc-nonfree_20240709-1~mx23ahs_all

updates for MX Linux:

Unpacking mx-system (24.08.01mx23) over (24.05.01mx23) …
Setting up firmware-netxen (20240709-1~mx23ahs) …
Setting up firmware-intel-graphics (20240709-1~mx23ahs) …
Setting up firmware-iwlwifi (20240709-1~mx23ahs) …
Setting up firmware-bnx2x (20240709-1~mx23ahs) …
Setting up firmware-marvell-prestera (20240709-1~mx23ahs) …
Setting up firmware-atheros (20240709-1~mx23ahs) …
Setting up firmware-misc-nonfree (20240709-1~mx23ahs) …
Setting up firmware-nvidia-graphics (20240709-1~mx23ahs) …
Setting up firmware-intel-misc (20240709-1~mx23ahs) …
Setting up firmware-myricom (20240709-1~mx23ahs) …
Setting up firmware-brcm80211 (20240709-1~mx23ahs) …
Setting up firmware-mediatek (20240709-1~mx23ahs) …
Setting up firmware-realtek (20240709-1~mx23ahs) …
Setting up firmware-qlogic (20240709-1~mx23ahs) …
Setting up firmware-ipw2x00 (20240709-1~mx23ahs) …
Setting up firmware-libertas (20240709-1~mx23ahs) …
Setting up firmware-bnx2 (20240709-1~mx23ahs) …
Setting up firmware-intel-sound (20240709-1~mx23ahs) …
Setting up firmware-amd-graphics (20240709-1~mx23ahs) …
Setting up firmware-linux-nonfree (20240709-1~mx23ahs) …
Setting up firmware-linux (20240709-1~mx23ahs) …
Setting up mx-debian-firmware (24.08.10mx23) …
Setting up mx-system (24.08.01mx23) …

EDITED see below

Yes, it’s a kernel thing… Old kernels work, newer ones don’t.
And not only WD drives: Samsung and Seagate drives are affected too (on different motherboards).
Btw.: NVMe drives (M.2) are not affected…
Next:
It may also be an issue with QEMU (udisks2 complains):
Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0 · Issue #732 · storaged-project/udisks · GitHub
:nauseated_face:
EDIT:
Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sda': Unexpected sense data returned since kernel 6.10.3 update · Issue #1305 · storaged-project/udisks · GitHub

My above assumption was wrong, sorry, error still occurs!

It will be fixed soon
https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?id=fa0db8e5

The patch is already queued for the next point releases. Example 6.10.6: queue-6.10 - kernel/git/stable/stable-queue.git - Linux kernel stable patch queue

Hurray PhilM is the greatest – thank You so much :smiling_face_with_three_hearts:
EDIT: it works with linux6.10.6-1.
