Timeouts of about 1 minute with new SSD BX500, HD led on, kernel > 4.14

winnie · 20 July 2021 10:28

Have you tried a different SATA cable and/or a different SATA port, and/or checked for a loose connection? (These three steps are not mutually exclusive; you can do all of them at the same time.)

Are you saying with kernel 4.14, you get none of the above errors when using the drive?

winnie · 20 July 2021 10:29

If it works on KDE Neon kernel 5.8, what makes you believe the drive only behaves on kernel 4.14 and earlier?

gsm · 20 July 2021 11:19

I have already tried several cables and another SATA port, with no difference. I will try some more ports.
Here is some dmesg output from kernel 4.14. There is no lag at all with kernel 4.14.
It does show some errors: softreset failed (device not ready).

[ 1.277707] libata version 3.00 loaded.
[ 1.290029] ehci-pci 0000:00:13.5: USB 2.0 started, EHCI 1.00
[ 1.290283] hub 1-0:1.0: USB hub found
[ 1.290290] hub 1-0:1.0: 10 ports detected
[ 1.290787] ohci-pci: OHCI PCI platform driver
[ 1.291418] scsi host0: pata_atiixp
[ 1.291514] scsi host1: pata_atiixp
[ 1.291546] ata1: PATA max UDMA/100 cmd 0x1f0 ctl 0x3f6 bmdma 0xf900 irq 14
[ 1.291547] ata2: PATA max UDMA/100 cmd 0x170 ctl 0x376 bmdma 0xf908 irq 15
[ 1.291605] ahci 0000:00:12.0: version 3.0
[ 1.291719] ahci 0000:00:12.0: controller can’t do 64bit DMA, forcing 32bit
[ 1.291808] ahci 0000:00:12.0: AHCI 0001.0100 32 slots 4 ports 3 Gbps 0xf impl SATA mode
[ 1.291810] ahci 0000:00:12.0: flags: ncq sntf ilck pm led clo pmp pio slum part ccc
[ 1.292332] scsi host2: ahci
[ 1.295784] scsi host3: ahci
[ 1.299266] scsi host4: ahci
[ 1.299394] scsi host5: ahci
[ 1.299431] ata3: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f100 irq 22
[ 1.299434] ata4: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f180 irq 22
[ 1.299436] ata5: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f200 irq 22
[ 1.299438] ata6: SATA max UDMA/133 abar m1024@0xfe02f000 port 0xfe02f280 irq 22
[ 1.299692] ohci-pci 0000:00:13.0: OHCI PCI host controller
[ 1.299699] ohci-pci 0000:00:13.0: new USB bus registered, assigned bus number 2
[ 1.299734] ohci-pci 0000:00:13.0: irq 16, io mem 0xfe02e000
[ 1.307452] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0
[ 1.357585] hub 2-0:1.0: USB hub found
[ 1.357594] hub 2-0:1.0: 2 ports detected
[ 1.357884] ohci-pci 0000:00:13.1: OHCI PCI host controller
[ 1.357889] ohci-pci 0000:00:13.1: new USB bus registered, assigned bus number 3
[ 1.357923] ohci-pci 0000:00:13.1: irq 17, io mem 0xfe02d000
[ 1.417553] hub 3-0:1.0: USB hub found
[ 1.417562] hub 3-0:1.0: 2 ports detected
[ 1.418105] ohci-pci 0000:00:13.2: OHCI PCI host controller
[ 1.418111] ohci-pci 0000:00:13.2: new USB bus registered, assigned bus number 4
[ 1.418144] ohci-pci 0000:00:13.2: irq 18, io mem 0xfe02c000
[ 1.477547] hub 4-0:1.0: USB hub found
[ 1.477555] hub 4-0:1.0: 2 ports detected
[ 1.477770] ohci-pci 0000:00:13.3: OHCI PCI host controller
[ 1.477775] ohci-pci 0000:00:13.3: new USB bus registered, assigned bus number 5
[ 1.477793] ohci-pci 0000:00:13.3: irq 17, io mem 0xfe02b000
[ 1.537528] hub 5-0:1.0: USB hub found
[ 1.537537] hub 5-0:1.0: 2 ports detected
[ 1.537843] ohci-pci 0000:00:13.4: OHCI PCI host controller
[ 1.537851] ohci-pci 0000:00:13.4: new USB bus registered, assigned bus number 6
[ 1.537869] ohci-pci 0000:00:13.4: irq 18, io mem 0xfe02a000
[ 1.597536] hub 6-0:1.0: USB hub found
[ 1.597545] hub 6-0:1.0: 2 ports detected
[ 1.613458] ata6: SATA link down (SStatus 0 SControl 300)
[ 1.770048] ata4: softreset failed (device not ready)
[ 1.770139] ata4: applying PMP SRST workaround and retrying
[ 1.770158] ata3: softreset failed (device not ready)
[ 1.770250] ata3: applying PMP SRST workaround and retrying
[ 1.770269] ata5: softreset failed (device not ready)
[ 1.770362] ata5: applying PMP SRST workaround and retrying
[ 1.800038] usb 5-1: new full-speed USB device number 2 using ohci-pci
[ 1.920058] tsc: Refined TSC clocksource calibration: 2812.809 MHz
[ 1.920068] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x288b84f9b90, max_idle_ns: 440795272445 ns
[ 1.926731] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.926759] ata5: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.926786] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 1.927462] ata5.00: HPA detected: current 625140335, native 625142448
[ 1.927541] ata5.00: ATA-8: WDC WD3200AAJS-56M0A0, 01.03E01, max UDMA/133
[ 1.927543] ata5.00: 625140335 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.927546] ata5.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.928357] ata5.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.928359] ata5.00: configured for UDMA/133
[ 1.928923] ata3.00: ATA-10: CT480BX500SSD1, M6CR041, max UDMA/133
[ 1.928925] ata3.00: 937703088 sectors, multi 1: LBA48 NCQ (depth 31/32), AA
[ 1.930350] ata3.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.935648] ata3.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.935650] ata3.00: configured for UDMA/133
[ 1.935836] scsi 2:0:0:0: Direct-Access ATA CT480BX500SSD1 041 PQ: 0 ANSI: 5
[ 1.937246] sd 2:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
[ 1.937270] sd 2:0:0:0: [sda] Write Protect is off
[ 1.937272] sd 2:0:0:0: [sda] Mode Sense: 00 3a 00 00
[ 1.937321] sd 2:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn’t support DPO or FUA
[ 1.938043] sda: sda1 sda2 sda3
[ 1.938494] sd 2:0:0:0: [sda] Attached SCSI disk
[ 1.940338] ata4.00: ATA-8: WDC WD3200AAJS-22B4A0, 01.03A01, max UDMA/133
[ 1.940339] ata4.00: 625142448 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.940343] ata4.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.941243] ata4.00: SB600 AHCI: limiting to 255 sectors per cmd
[ 1.941245] ata4.00: configured for UDMA/133
[ 1.941391] scsi 3:0:0:0: Direct-Access ATA WDC WD3200AAJS-2 3A01 PQ: 0 ANSI: 5
[ 1.941601] sd 3:0:0:0: [sdb] 625142448 512-byte logical blocks: (320 GB/298 GiB)
[ 1.941612] sd 3:0:0:0: [sdb] Write Protect is off
[ 1.941613] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[ 1.941629] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn’t support DPO or FUA
[ 1.941688] scsi 4:0:0:0: Direct-Access ATA WDC WD3200AAJS-5 3E01 PQ: 0 ANSI: 5
[ 1.941939] sd 4:0:0:0: [sdc] 625140335 512-byte logical blocks: (320 GB/298 GiB)
[ 1.941959] sd 4:0:0:0: [sdc] Write Protect is off
[ 1.941962] sd 4:0:0:0: [sdc] Mode Sense: 00 3a 00 00
[ 1.941984] sd 4:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn’t support DPO or FUA
[ 1.949290] sdb: sdb1 sdb2
[ 1.949694] sd 3:0:0:0: [sdb] Attached SCSI disk
[ 1.957303] sdc: sdc1 sdc2
[ 1.957727] sd 4:0:0:0: [sdc] Attached SCSI disk
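
When comparing kernels, it may help to count the libata complaints per port instead of reading the whole log. The following is only a sketch: the function name softreset_counts is made up for illustration, and it assumes the message format shown in the output above.

```shell
#!/bin/sh
# Count "softreset failed" messages per ATA port in dmesg text read from stdin.
# softreset_counts is a hypothetical helper name, not part of any tool.
softreset_counts() {
    awk '/softreset failed/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^ata[0-9]+:$/) { sub(/:$/, "", $i); n[$i]++ }
         }
         END { for (p in n) print p, n[p] }' | sort
}

# Usage on a live system (reading dmesg may need root on some setups):
#   dmesg | softreset_counts
```

Running the same pipeline under each kernel gives a quick way to see whether one port stands out.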

gsm · 20 July 2021 11:23

Other distributions have the same problems too. I also saw remarks in other forums advising the use of kernel 4.14. After all, this is old hardware, except for the BX500 SSD.

winnie · 20 July 2021 11:28

That might be your only safe option. I honestly would not trust saving any data to that drive with the above errors, such as “failed command: READ FPDMA QUEUED” and “device reported invalid CHS sector”

That can lead to data loss and corruption.

gsm · 20 July 2021 11:29

I finally found an interesting topic on this, that might actually describe the same problem.

https://archived.forum.manjaro.org/t/solved-freezes-randomly/140917

winnie · 20 July 2021 11:40

Interesting. Check which scheduler is in use, and see whether changing it (e.g., “deadline”, “mq-deadline”, or “none”) and then rebooting resolves the issue.

Make sure the “discard” option is not enabled in your fstab for the SSD. Systemd weekly trims are the preferable route.
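
A quick way to check the fstab for continuous discard could look like the sketch below. The helper name has_discard is hypothetical; it assumes the mount options are in the fourth column, and it deliberately does not treat nodiscard as a match.

```shell
#!/bin/sh
# Succeed if the options field (4th column) of an fstab line contains "discard".
# Matches "discard" and "discard=..." but not "nodiscard".
# has_discard is a hypothetical helper name used only for this example.
has_discard() {
    opts=$(printf '%s\n' "$1" | awk '!/^[[:space:]]*#/ && NF >= 4 { print $4 }')
    case ",$opts," in
        *,discard,*|*,discard=*) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every matching entry in the real fstab, if it is readable.
if [ -r /etc/fstab ]; then
    while IFS= read -r line; do
        if has_discard "$line"; then
            printf 'continuous discard enabled: %s\n' "$line"
        fi
    done < /etc/fstab
fi
```

If nothing is printed, no entry uses the discard mount option and trimming is left to the timer.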

D.Dave · 20 July 2021 11:53

Kernels 4.14 and below use only single-queue (sq) schedulers by default, not multi-queue (mq) ones, but it is possible to get mq schedulers by adding scsi_mod.use_blk_mq=1 to the kernel command line in GRUB. You probably know this; I just pointed it out for the OP.

winnie · 20 July 2021 11:57

I was referring to their issues when using kernels 5.4 and higher (since they said they do not experience such issues on 4.14). That applies if the goal is to be able to use the latest stable or LTS kernel; if they are happy to stick with 4.14, they needn’t change anything.

Good point though, and clarification never hurts.

gsm · 20 July 2021 15:59

No difference when discard is added, but things get even worse with nodiscard.

UUID=b8a5edaf-c612-41dd-9e28-f1339e8c31e3 /              ext4    defaults,noatime 0 1
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0

Not sure how to check or configure the scheduler.

gsm · 20 July 2021 16:28

/etc/udev/rules.d/60-ioschedulers.rules
It looks like this one gives good results with kernel 5.10.49-1-MANJARO

# set scheduler for NVMe
ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="none"
# set scheduler for SSD and eMMC
ACTION=="add|change", KERNEL=="sd[a-z]|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="bfq"
# set scheduler for rotating disks
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="cfq"
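
One thing worth checking with rules like these: as far as I know, cfq was removed together with the legacy block layer in kernel 5.0, so on 5.10 the rule for rotating disks may simply not take effect. The sketch below lists the scheduler names a rules file assigns and compares them with what the running kernel offers. The helper name rule_schedulers, the rules path, and the choice of sda are assumptions for illustration.

```shell
#!/bin/sh
# Print each scheduler name assigned by udev rules read from stdin.
# rule_schedulers is a hypothetical helper name.
rule_schedulers() {
    sed -n 's/.*ATTR{queue\/scheduler}="\([^"]*\)".*/\1/p'
}

# Assumption: adjust the path; the file name differs between systems.
rules=/etc/udev/rules.d/60-ioschedulers.rules
# Assumption: sda is the disk of interest.
offered=/sys/block/sda/queue/scheduler

if [ -r "$rules" ] && [ -r "$offered" ]; then
    # Strip the brackets that mark the active scheduler.
    list=$(sed 's/[][]//g' "$offered")
    for name in $(rule_schedulers < "$rules" | sort -u); do
        case " $list " in
            *" $name "*) echo "$name: offered by this kernel" ;;
            *)           echo "$name: NOT offered by this kernel" ;;
        esac
    done
fi
```

Any name reported as not offered is a candidate for replacement with one of the names the kernel does list.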

Also, I find that discard in fstab works better for the BX500 than nodiscard.

winnie · 20 July 2021 16:42

To check the scheduler, substitute the device you’re interested in:
cat /sys/block/<device>/queue/scheduler

So for example,

$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

My SSD (/dev/sda, Samsung EVO) is using the multiqueue deadline scheduler (mq-deadline), which I believe is the default out of the box, because my un-edited 60-ioscheduler.rules file looks like so:

ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="deadline"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq-sq"

You can make a backup of the file /etc/udev/rules.d/60-ioscheduler.rules, and then modify it to override the scheduler based on a new rule.

EDIT: If you override a scheduler, make sure to use the exact name as shown in the list of supported schedulers. For example, if you change the scheduler to magical-berries, the kernel will fall back to the default for that device, such as mq-deadline.
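
To avoid typos in scheduler names, something like the following sketch could be used before editing the rules file. The function names active_scheduler and is_supported are made up for this example, and sda and bfq are placeholders to adjust.

```shell
#!/bin/sh
# Print the active scheduler: the name shown in square brackets.
# (hypothetical helper name)
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Succeed if name $1 appears in the scheduler list $2 (brackets ignored).
# (hypothetical helper name)
is_supported() {
    for s in $(printf '%s\n' "$2" | sed 's/[][]//g'); do
        [ "$s" = "$1" ] && return 0
    done
    return 1
}

dev=sda     # assumption: adjust to your device
want=bfq    # assumption: the scheduler you want to try
file=/sys/block/$dev/queue/scheduler

if [ -r "$file" ]; then
    line=$(cat "$file")
    echo "active: $(active_scheduler "$line")"
    if is_supported "$want" "$line"; then
        echo "to try it until the next reboot: echo $want | sudo tee $file"
    else
        echo "$want is not offered for $dev; choose one of: $line"
    fi
fi
```

Writing the name into the sysfs file changes the scheduler only for the running session, which makes it convenient for testing before committing a udev rule.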

That’s odd, since discard can put more stress on the SSD to constantly trim on-the-fly. I think it’s safer to leave discard/nodiscard omitted from the fstab entry, and simply rely on the default weekly fstrim timer.

To make sure the timer is active:
systemctl status fstrim.timer

To make sure the service is being triggered by the above weekly timer:
systemctl status fstrim.service

Otherwise, unmask, enable, and start the timer:
systemctl unmask fstrim.timer
systemctl enable fstrim.timer
systemctl start fstrim.timer
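
The checks above can be rolled into one small script. This is only a sketch: fstrim_advice is a hypothetical helper that maps the output of systemctl is-enabled and systemctl is-active to a suggestion. Note that systemctl enable --now combines the enable and start steps.

```shell
#!/bin/sh
# Map the outputs of `systemctl is-enabled` and `systemctl is-active`
# to a one-line suggestion. fstrim_advice is a hypothetical helper name.
fstrim_advice() {
    case "$1/$2" in
        enabled/active) echo "ok: weekly trim is scheduled" ;;
        masked/*)       echo "run: systemctl unmask fstrim.timer && systemctl enable --now fstrim.timer" ;;
        *)              echo "run: systemctl enable --now fstrim.timer" ;;
    esac
}

# Only query systemd when systemctl is actually available.
if command -v systemctl >/dev/null 2>&1; then
    # Both commands exit non-zero for states such as "disabled" or "inactive".
    enabled=$(systemctl is-enabled fstrim.timer 2>/dev/null) || true
    active=$(systemctl is-active fstrim.timer 2>/dev/null) || true
    fstrim_advice "$enabled" "$active"
fi
```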

gsm · 20 July 2021 18:03

@Winnie. OK, I will also try your rules, since I still have very rare timeouts. The system is already running much better with kernel 5.10. I assume that you mean 60-ioschedulers.rules and not 60-ioscheduler.rules. Note that KDE neon, which runs without issues, uses discard in fstab!
I have another system (HP m8000) where, for Manjaro XFCE, fstab is also configured with discard for the Samsung SSD 860 EVO 250GB (/ and swap). It seems that this was the default at installation time, some years ago.

timer and service info:

● fstrim.timer - Discard unused blocks once a week
     Loaded: loaded (/usr/lib/systemd/system/fstrim.timer; enabled; vendor preset: disabled)
     Active: active (waiting) since Tue 2021-07-20 19:38:02 CEST; 13min ago
    Trigger: Mon 2021-07-26 01:14:03 CEST; 5 days left
   Triggers: ● fstrim.service
       Docs: man:fstrim

jul 20 19:38:02 gsm-man-kde systemd[1]: Started Discard unused blocks once a week.

○ fstrim.service - Discard unused blocks on filesystems from /etc/fstab
     Loaded: loaded (/usr/lib/systemd/system/fstrim.service; static)
     Active: inactive (dead)
TriggeredBy: ● fstrim.timer
       Docs: man:fstrim(8)

Edit: Sorry to say, but your rules do not seem to work well for the BX500: it frequently freezes with timeouts. So bfq seems to be the best.

winnie · 20 July 2021 19:15

No need to apologize, but I never said to use one scheduler over another. I gave you an idea and some instructions on how to check the current scheduler and change it on your own (to see which one gives you the best results.)

On my system it is 60-ioscheduler.rules, and I never created it myself. Either it went by a different name in the past, or a recent update uses a new name?

That may or may not be the sole reason for the strange behaviors with your SSD. Remember, we’re talking two completely different distros. In theory, using “discards” in the fstab will create more stress for the drive and can slow things down. In your case, you might experience something different. Either way, it’s good to rule things out, so that if one method solves your problem you can safely omit discards from your fstab (since it’s not needed, and could possibly create issues down the line.) The weekly timer will handle routine trimming, which in turn is used by the SSD’s internal garbage collection.

gsm · 20 July 2021 19:38

You are right, and it was a very good suggestion. So either discard in fstab or bfq shows the best results for my BX500, with only a very occasional short freeze. I am also testing whether removing gvfs has any effect.
I think that discard was the default in older Manjaro releases. For testing purposes only, I removed discard from fstab in KDE neon, and that did not make any difference. With KDE neon the trim timer is running.

Thank you very much for your very useful ideas and quick reactions.

gsm · 21 July 2021 13:01

Actually, after removing discard from fstab, Manjaro Linux KDE works better today. However, there is still some occasional freezing. Now testing with mq-deadline.

gsm · 22 July 2021 11:45

Final conclusion for the time being:
The I/O scheduler rules below work best, with discard removed from fstab. A few delays still happen, however.
Manjaro is the fastest system (by startup time) in comparison with ArcoLinux, Kubuntu and KDE neon, but it is harder to install Manjaro on the BX500. That is because of the many timeouts (HDD LED on), which do not occur in this way or as often during the installation of the other distributions.
I consider this topic solved.

ACTION=="add|change", KERNEL=="nvme[0-9]*", ATTR{queue/scheduler}="mq-deadline"
ACTION=="add|change", KERNEL=="sd[a-z]|mmcblk[0-9]*", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="mq-deadline"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq-sq"

El_Brujo · 22 July 2021 16:54

I’m just marking this topic as solved for you based on the above.

gsm · 27 July 2021 19:57

It seems that firmware F8F (beta!) for the GA-MA770-DS3 (rev 1.0) may have solved the biggest SATA timeout problem with newer kernels. Still testing. In the end, F8F unfortunately is not a solution.
Edit: The BX500 works fine without any errors in an HP Pavilion PC m8000. There must be something wrong with the Gigabyte GA-MA770-DS3 (rev 1.0) board.

system · 11 August 2021 19:57

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.