I have two hard disks, each with an 18 TB partition (GPT). One is formatted exFAT and shows no problems. The other is formatted ext4; that one spins up and down very quickly after use, and each time it’s accessed I get a 10-second time-out, producing the dmesg entries shown below.
System info:
Cinnamon version : 6.0.2
Linux Kernel : 6.6.5-1-rt16-MANJARO
Processor : AMD Ryzen 7 5700G with Radeon Graphics x 8
Memory : 15.4 GiB
Hard drives : 39137.1 GB
Graphics card : NVIDIA Corporation TU106 [GeForce RTX 2070]
Display server : X11
Fragment of dmesg:
[ 5834.277610] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[ 5834.277612] ata1.00: revalidation failed (errno=-5)
[ 5838.597295] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 5838.896890] ata1.00: configured for UDMA/133
[ 5838.897002] ata1.00: Entering active power mode
[ 5838.904803] sd 0:0:0:0: [sda] Starting disk
[ 5855.005221] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 5855.005931] sd 0:0:0:0: [sda] Stopping disk
[ 5855.689215] ata1.00: Entering standby power mode
Might be a faulty cable. Check:
sudo smartctl -A /dev/sda
If UDMA_CRC_Error_Count is not zero, then the cable is (or was) very likely the problem.
Or maybe a bugged or outdated UEFI firmware (BIOS). Update it.
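For anyone who wants to script the cable check above, here is a minimal sketch that pulls the raw value of attribute 199 out of the smartctl -A output. The function name crc_error_count is made up for illustration, and the sketch assumes the usual ten-column attribute table with the raw value in the tenth column.

```shell
# Print the raw UDMA_CRC_Error_Count value from `smartctl -A` output on stdin.
# Assumption: standard attribute table, raw value in column 10.
crc_error_count() {
    awk '$2 == "UDMA_CRC_Error_Count" { print $10 }'
}

# Usage (needs root and smartmontools):
#   sudo smartctl -A /dev/sda | crc_error_count
```

A result of 0 means the drive has not logged any CRC errors on the link; it does not rule out every possible cable or connector problem.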
This is just anecdotal:
I have had (I think I still have it …) a laptop hdd which would spin down after a short time.
I remember using hdparm with the -S parameter (capital -S) to alter the time until spin-down.
This can be made permanent, I think through the -K option (the capital -K, not the lower-case -k), but the setting can also be applied each time the system is booted.
See man hdparm.
Entry #199 shows that there are no CRC errors:
1   Raw_Read_Error_Rate     0x000b   100   100   050    Pre-fail  Always       -       0
2   Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       0
3   Spin_Up_Time            0x0027   100   100   001    Pre-fail  Always       -       8679
4   Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       58120
5   Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
7   Seek_Error_Rate         0x000b   100   100   050    Pre-fail  Always       -       0
8   Seek_Time_Performance   0x0005   100   100   050    Pre-fail  Offline      -       0
9   Power_On_Hours          0x0032   087   087   000    Old_age   Always       -       5287
10  Spin_Retry_Count        0x0033   100   100   030    Pre-fail  Always       -       0
12  Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       875
23  Helium_Condition_Lower  0x0023   100   100   075    Pre-fail  Always       -       0
24  Helium_Condition_Upper  0x0023   100   100   075    Pre-fail  Always       -       0
27  MAMR_Health_Monitor     0x0023   100   100   030    Pre-fail  Always       -       919371
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       839
193 Load_Cycle_Count        0x0032   095   095   000    Old_age   Always       -       58252
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       30 (Min/Max 20/55)
196 Reallocated_Event_Count 0x0033   100   100   010    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
220 Disk_Shift              0x0002   100   100   000    Old_age   Always       -       17563651
222 Loaded_Hours            0x0032   092   092   000    Old_age   Always       -       3534
223 Load_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
224 Load_Friction           0x0022   100   100   000    Old_age   Always       -       0
226 Load-in_Time            0x0026   100   100   000    Old_age   Always       -       630
240 Head_Flying_Hours       0x0001   100   100   001    Pre-fail  Offline      -       0
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       23134669401
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       13374160550
… there are disks like this - they will spin down after a very short amount of time
If yours is one of those, hdparm (or perhaps sdparm) can change that.
You should try - IMO.
It’s also not very healthy for a spinning hdd if it has to spin up from zero every few minutes …
Alright then. SMART values are in order. So try what @Nachlese mentioned…
sudo hdparm -S 255 /dev/sda
That should put the drive into standby after about 21 minutes idle (255 × 5 s). It is the highest value, but not the longest timeout: according to man hdparm, values 241 to 251 select 30 minutes to 5.5 hours.
Little rant: I had a WD HDD that came in an enclosure with a USB connection. Even when connected directly via SATA, it would spin down after 5 minutes idle. That was a fixed value and even hdparm couldn’t change it. Only the Windows software was able to do that.
That being said, the default value might be too short, like:
sudo hdparm -S 2 /dev/sda
which is 10 seconds. You need to increase it.
The -B option to hdparm might also be worth checking.
Again: man hdparm
Read carefully! The tool can be a dangerous one, depending on the options you give it.
I think my current drive is the one I used this particular setting on.
Current result:
sudo hdparm -B /dev/sda
/dev/sda:
APM_level = 254
(highest I/O performance)
From memory, I think its default value was somewhere between 1 and 127 (permitting spin-down).
… but that is just from memory - it has been too long since I fiddled with this
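To make the meaning of an APM_level value easier to read off, here is a small sketch that classifies a level using the ranges documented in man hdparm for -B (1 to 127 permit spin-down, 128 to 254 do not, 255 disables APM). The function name apm_class is hypothetical and only for illustration.

```shell
# Classify an hdparm -B (APM) level according to the ranges in man hdparm.
apm_class() {
    level=$1
    if [ "$level" -ge 1 ] && [ "$level" -le 127 ]; then
        echo "spin-down permitted"
    elif [ "$level" -ge 128 ] && [ "$level" -le 254 ]; then
        echo "spin-down not permitted"
    elif [ "$level" -eq 255 ]; then
        echo "APM disabled"
    else
        echo "invalid"
    fi
}

apm_class 254
apm_class 128
```

Both 254 and 128 fall into the range that is documented as not permitting spin-down, so by the man page alone neither value should let the drive spin down on its own.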
I set the spin-down time to 1 min now, which makes the issue less tedious, but doesn’t solve it yet ofc. What are typical values used by other Manjaro users, I wonder?
hdparm -B /dev/sda
gives:
/dev/sda:
APM_level = 128
In the man pages of hdparm I found the options --dco-freeze, --dco-identify and --dco-restore. I wonder if those could have something to do with the issue?
Perhaps APM_level = 128 is not … enough, although it should prevent spin-down.
That is just what it currently is - you can set a different value.
There is also the -S parameter (253 or 254 or 255); refer to man hdparm.
I don’t think that -d or -c or -o … would do anything useful for you.
Why just 1 minute?
That’s way too low.
10+ minutes would be more appropriate.
Which would actually mean: no spindown at all, because ext4 regularly writes any changes in the meantime to disk … which will force a spin up.
--dco-restore could work if your attempts to change settings on the drive via -B or -S don’t have any effect (drive features are locked)
… that’s what the manual seems to say anyway.
But that does not seem to be the case here.
Copy/paste is quick and cheap, so here are the two relevant sections from man hdparm:
-B Get/set Advanced Power Management feature, if the drive supports it. A low value means aggressive power management and a high value means better performance. Possible settings range from values 1 through 127 (which permit spin-down), and values 128 through 254 (which do not permit spin-down). The highest degree of power management is attained with a setting of 1, and the highest I/O performance with a setting of 254. A value of 255 tells hdparm to disable Advanced Power Management altogether on the drive (not all drives support disabling it, but most do).
-S Put the drive into idle (low-power) mode, and also set the standby (spindown) timeout for the drive. This timeout value is used by the drive to determine how long to wait (with no disk activity) before turning off the spindle motor to save power. Under such circumstances, the drive may take as long as 30 seconds to respond to a subsequent disk access, though most drives are much quicker. The encoding of the timeout value is somewhat peculiar. A value of zero means "timeouts are disabled": the device will not automatically enter standby mode. Values from 1 to 240 specify multiples of 5 seconds, yielding timeouts from 5 seconds to 20 minutes. Values from 241 to 251 specify from 1 to 11 units of 30 minutes, yielding timeouts from 30 minutes to 5.5 hours. A value of 252 signifies a timeout of 21 minutes. A value of 253 sets a vendor-defined timeout period between 8 and 12 hours, and the value 254 is reserved. 255 is interpreted as 21 minutes plus 15 seconds. Note that some older drives may have very different interpretations of these values.
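Since the -S encoding is, as the man page says, somewhat peculiar, here is a small sketch that decodes a -S value into a timeout in seconds, following the man page text quoted above. The function name standby_seconds is made up for illustration, and as the man page warns, individual drives may interpret the values differently.

```shell
# Decode an hdparm -S value into a standby timeout, per the man page text.
# Prints seconds for numeric timeouts, or a short description otherwise.
standby_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $(( v * 5 ))                 # multiples of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(( (v - 240) * 1800 ))      # 1 to 11 units of 30 minutes
    elif [ "$v" -eq 252 ]; then
        echo 1260                         # 21 minutes
    elif [ "$v" -eq 253 ]; then
        echo "vendor-defined (8 to 12 hours)"
    elif [ "$v" -eq 254 ]; then
        echo "reserved"
    elif [ "$v" -eq 255 ]; then
        echo 1275                         # 21 minutes plus 15 seconds
    else
        echo "invalid"
    fi
}

standby_seconds 2      # the "too short" example from earlier in the thread
standby_seconds 120    # 10 minutes
standby_seconds 255
```

For a 10-minute spin-down the value would therefore be 120, and anything longer than 20 minutes needs one of the values from 241 upward.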
I set the spin-down to 10 mins now. Let’s see what it does…
The --dco-* options that I mentioned are not composed of separate -d, -c and -o options. As the man page states for --dco-freeze:
DCO stands for Device Configuration Overlay, a way for vendors to selectively disable certain features of a drive. The --dco-freeze option will freeze/lock the current drive configuration, thereby preventing software (or malware) from changing any DCO settings until after the next power-on reset.
I wonder if the provider of my hard disk, i.e. Toshiba, optimized the device overlay for Windows.
I’m also curious if other Manjaro users would have the same issues as I when they set their spin-downs that low; I’m still baffled and worried about the “Identification failed, verification failed” thingie.
And why would my Manjaro specifically be configured in a way that causes this issue? Why doesn’t everybody who uses very large ext4 partitions have it?
In this case, it is not the OS (Manjaro).
It is how the device itself is configured (probably by the manufacturer - or by some previous owner).
… I added to the previous post, btw
The drive itself came brand new from the store. According to my computer guy (I myself am more software-savvy), it is an enterprise model, actually meant to be used in business settings.
Reading the addendum to your post, I think that I should set the -B option to 254, and not use the -S option then. Except for increasing the overall power usage of my computer, it should have no consequences, I think?
So I should do something like:
# hdparm -B254 -K /dev/sda
… that was suggested - and that is what I did, to ensure it would not spin down for a long time.
There will always be write attempts coming from the file system itself in the meantime.
System logs are always generated and written to disk (vast generalization here - this, too, can be configured and prevented …)
… and (apparently) no one knows how great the difference is in power consumption between the different levels
I personally don’t care - I have a laptop and the disk runs constantly, to avoid having to wait for it to spin up every time …
I would omit (leave out) the -K option unless you are very sure.
The hdparm settings can also be applied via script (udev, I think) on every boot, without that option.
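As a rough sketch of the udev approach mentioned above: the helper below only prints a udev rule that would run hdparm when the disk appears. The function name hdparm_udev_rule, the rule file name 69-hdparm.rules and the /usr/bin/hdparm path are assumptions for illustration; check them against your own system before using anything like this.

```shell
# Print a udev rule that runs hdparm with the given APM and standby values
# when the named disk is added or changed.
# Assumption: hdparm is installed at /usr/bin/hdparm.
hdparm_udev_rule() {
    dev=$1
    apm=$2
    standby=$3
    printf 'ACTION=="add|change", KERNEL=="%s", RUN+="/usr/bin/hdparm -B %s -S %s /dev/%s"\n' \
        "$dev" "$apm" "$standby" "$dev"
}

# Show the rule for sda with APM 254 and a 10-minute standby timeout.
hdparm_udev_rule sda 254 120

# To install it (file name is an assumption), then reload the rules:
#   hdparm_udev_rule sda 254 120 | sudo tee /etc/udev/rules.d/69-hdparm.rules
#   sudo udevadm control --reload
```

Keep in mind that kernel names such as sda are not guaranteed to stay the same across boots when several disks are attached, so a rule that matches on the name alone could end up applying to a different drive.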
I added a bit to my previous post. I’ll try that then.
Power consumption would be a bit of an issue for me though, since I often have my computer running for days in a row. It’s never off, actually, and as I do a lot of gaming on it, that wouldn’t really help make it any cheaper…
Wish I knew how to diagnose the issue deeper.
Could be checked relatively easily:
hook up your PC through a power meter.
Have the disk spin down (hdparm …)
Then have it spin up …
I’ll guess that there is hardly a difference … except for the few seconds of spin up
especially compared to what a gaming PC with a decent graphics requirement will pull at any given time.
That would be my easy solution to enable you to actually know instead of just guessing
if it is that much - which I doubt …
What does 1 W of extra draw cost?
1 kWh costs ~ € 0,35 (this is Germany 2024 …)
That is 1 W for 1000 hours.
A month has only about 730 hours, so that is less than € 0,35 more per month.
It’s not even relevant in Germany
… with all the cheap green energy from wind and sun - who doesn’t send you a bill
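The arithmetic above can be written down as a small calculation. This is only a sketch: the function name extra_cost is made up, the price of 0.35 per kWh is the German figure quoted above, and 730 hours is an approximation for one month.

```shell
# Cost of a constant extra load: watts * hours / 1000 gives kWh, times price.
extra_cost() {
    awk -v w="$1" -v h="$2" -v p="$3" 'BEGIN { printf "%.2f\n", w * h / 1000 * p }'
}

extra_cost 1 1000 0.35   # 1 W for 1000 hours is exactly 1 kWh
extra_cost 1 730 0.35    # 1 W for roughly one month
```

With other numbers plugged in (for example the few watts a spinning disk draws over a standby one, or a different electricity price) the same function gives a quick estimate of whether the spin-down is worth it.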
Here in the Netherlands it’s over 1 Euro
I can’t find anything on the internet about how to learn more about the actual issue. Even though I adjusted my drive’s parameters now:
# hdparm -B254 -S253 -K1 /dev/sda
I still have those time-outs, and this in my dmesg:
[319837.477620] ata1.00: Entering active power mode
[319847.973504] ata1.00: qc timeout after 10000 msecs (cmd 0x40)
[319847.973517] ata1.00: VERIFY failed (err_mask=0x4)
[319847.973522] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x40)
[319847.973524] ata1.00: revalidation failed (errno=-5)
[319851.877522] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[319852.170625] ata1.00: configured for UDMA/133
[319852.170745] ata1.00: Entering active power mode
[319852.176938] sd 0:0:0:0: [sda] Starting disk
[319870.875908] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[319870.876091] sd 0:0:0:0: [sda] Stopping disk
[319871.492566] ata1.00: Entering standby power mode
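To put a number on how quickly the disk is stopped again after each start, the timestamps in a dmesg fragment like the one above can be compared. The following is a minimal sketch; the function name spin_intervals is hypothetical, and it assumes the bracketed timestamp format and the "Starting disk" / "Stopping disk" messages shown in this thread.

```shell
# Read dmesg lines on stdin and print the seconds between each
# "Starting disk" message and the next "Stopping disk" message.
spin_intervals() {
    awk '
        {
            ts = $0
            sub(/^\[ */, "", ts)   # drop the leading bracket and padding
            sub(/\].*/, "", ts)    # drop everything from the first closing bracket
        }
        /Starting disk/ { start = ts; have = 1 }
        /Stopping disk/ && have { printf "%.1f\n", ts - start; have = 0 }
    '
}

# Usage on a live system:
#   sudo dmesg | spin_intervals
```

For the fragment above this comes to a bit under 19 seconds between start and stop, far shorter than any of the hdparm timeouts that were set.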
What? Really?
more than € 1 per kWh ?
That is the first time I heard of it.
I thought that Germany was pretty much the price champion.
I just changed my provider and they bill me ~ € 0,27 per kWh + a monthly fee of ~ € 12
anyway
… with regards to your actual problem:
I have no other ideas - perhaps it is faulty cables after all?
Hmm, I found this site that seemed to have the solution for me, including a detailed explanation. My preliminary experiences imply that the issue is solved now.
I added this to my grub boot options:
libata.force=1.00:nodmalog
with “1.00” being the ATA port mentioned in the dmesg reports.
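For reference, here is a sketch of how such a parameter could be appended to the kernel command line in the GRUB defaults file. The helper name add_kernel_param is made up, and it assumes the GRUB_CMDLINE_LINUX_DEFAULT line uses double quotes, as it typically does in /etc/default/grub. It only prints the edited text and does not modify any file.

```shell
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in text read
# from stdin and print the result. Assumes a double-quoted value.
add_kernel_param() {
    sed -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 $1\"|"
}

# Preview the change:
#   add_kernel_param 'libata.force=1.00:nodmalog' < /etc/default/grub
# After editing /etc/default/grub accordingly, regenerate the config:
#   sudo grub-mkconfig -o /boot/grub/grub.cfg
# After a reboot, check that the parameter is active:
#   grep -o 'libata.force=[^ ]*' /proc/cmdline
```

Checking /proc/cmdline after the reboot is worthwhile, because a parameter that never reached the kernel looks exactly like a parameter that had no effect.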
I’ll wait a day to see if the issue is really solved now. If so, I’ll mark this thread as such.
Wish me luck, guys!
EDIT: This solution didn’t work for me. Neither did restoring a back-up of that partition’s superblock. The dmesg reports and the time-outs persist.
It’s also noteworthy that the disk keeps spinning down instantly after each communication, despite the hdparm command that I issued, and the fact that hdparm reports the settings as I set them.
It was suggested but I didn’t see that you tried to set the spin down time via the -S option (from 240 to 251 - or 255)
I don’t know from the logs if the drive is commanded by the OS to go into standby so quickly or whether this is just the logged behavior of the drive itself.
Check power saving settings?
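One power-saving setting that might be worth a look is the kernel's runtime power management for the disk, which is exposed under sysfs. This is only a guess at a possible cause, not something the thread confirms. The sketch below prints the relevant attributes from a given directory; the function name show_runtime_pm is made up, and the path /sys/block/sda/device/power assumes the disk is sda.

```shell
# Print runtime power-management attributes found in a sysfs power directory.
# Attributes that are missing or unreadable are skipped or shown empty.
show_runtime_pm() {
    dir=$1
    for f in control runtime_status autosuspend_delay_ms; do
        if [ -r "$dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$dir/$f" 2>/dev/null)"
        fi
    done
}

# Usage on a live system (path assumed for /dev/sda):
#   show_runtime_pm /sys/block/sda/device/power
```

If control reads auto, the kernel is allowed to suspend the device by itself when idle; if it reads on, runtime suspend is switched off and the quick stops would have to come from somewhere else.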
If it is connected via a SATA cable, then try replacing the cable, or swap the cables between the two drives. Or switch the ports. Would be worth a try…
In my view such errors can be the result of a broken cable, a loose contact or a defective hard drive. It is very rare, but it could also be a kernel bug.
However… a closer look at
For whatever reason, the driver cannot identify your HDD. Usually the kernel sends a request, and in your case the HDD sends back crap, hence the I/O error: the response cannot be read. The kernel usually reads data such as device model, firmware version, supported features etc. If it cannot, it will assume generic settings and resort to trial and error, which is not very precise.