BTRFS error while copying a zip file

Zesko · 20 September 2021 13:42

Hi everyone,

I got a BTRFS error while copying a zip file on my disk, which is formatted with BTRFS. The copy is aborted immediately.

First I tried cp:

❯ cp ~/MEGA1.zip ~/Downloads 
cp: error reading '/home/zesko/MEGA1.zip': Input/output error

Since cp did not work, I tried scp:

❯ scp ~/MEGA1.zip ~/Downloads
cp: error reading '/home/zesko/MEGA1.zip': Input/output error

Since scp did not work either, I tried rsync:

❯ rsync -c ~/MEGA1.zip ~/Downloads    
rsync: [sender] read errors mapping "/home/zesko/MEGA1.zip": Input/output error (5)
rsync: [sender] read errors mapping "/home/zesko/MEGA1.zip": Input/output error (5)
ERROR: MEGA1.zip failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1330) [sender=v3.2.3]

Output of journalctl -b -p 3:

Sep 20 15:18:35 zesko kernel: BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 70, gen 0
Sep 20 15:18:35 zesko kernel: BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 71, gen 0
Sep 20 15:18:35 zesko kernel: BTRFS error (device nvme0n1p2): bdev /dev/nvme0n1p2 errs: wr 0, rd 0, flush 0, corrupt 72, gen 0
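
The "corrupt" value in these lines is a running per-device counter of checksum failures. A minimal sketch, assuming the log line format shown above, of how the counter could be pulled out of the journal; the function name corrupt_count is only illustrative:

```shell
#!/bin/sh
# Print the "corrupt" counter from each BTRFS "errs:" kernel log line on stdin.
# ASSUMPTION: lines look like the journal excerpt above
# ("... errs: wr 0, rd 0, flush 0, corrupt 72, gen 0").
corrupt_count() {
    sed -n 's/.*errs:.* corrupt \([0-9][0-9]*\),.*/\1/p'
}

# Usage (latest value seen during the current boot):
#   journalctl -k -b -p 3 | corrupt_count | tail -n 1
# The same counters can also be read directly from the filesystem:
#   sudo btrfs device stats /
```

The counter only ever grows until it is reset, so a rising value while reading one file points at that file's blocks rather than at new damage elsewhere.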

I thought this zip file might be damaged, so I compressed the same files to zip again, and copying failed with the same error. That cannot be a coincidence. Does BTRFS have a bug?

Copying other zip files works fine; only this particular zip file fails on BTRFS.

Linux kernels 5.14 and 5.10 LTS both show the same issue.

Zesko · 20 September 2021 14:45

I compressed the same files to .tar.gz and .7z and then tested copying both archives. They copy without any issue; only the .zip fails.

It looks like BTRFS has a bug that is triggered by certain content in the zip file, which it then cannot read.
I don’t know which part of the zip file BTRFS trips over. I’ll check it out later.

winnie · 20 September 2021 14:54

Does the .zip file in question pass its own internal integrity test?

But even if it does, rsync is unable to verify it after the transfer completes.

Ideally, the contents of a .zip archive, or any file for that matter, should be agnostic to the filesystem.

linux-aarhus · 20 September 2021 16:08

The file is corrupt or the filesystem is.

Usually, I/O errors in a filesystem occur because the pointers that record where the next block should be are corrupt.

I have had btrfs puke on me so I am not convinced of the excellence of this specific filesystem.

Zesko · 20 September 2021 16:12

I tried to create 3 zip archives with the same files and the same default algorithm (in KDE Dolphin).

1 of the 3 zip archives has the same copy issue on BTRFS; the other 2 have no issue (after a reboot).

All 3 have the same size but different checksums (because of different timestamps?).

Output of ls -ali

1439422 -rw-r--r-- 1 zesko zesko 15544388484 20. Sep 15:06 MEGA1.zip // <-- issue
1445600 -rw-r--r-- 1 zesko zesko 15544388484 20. Sep 17:35 MEGA2.zip // <- no issue
1446242 -rw-r--r-- 1 zesko zesko 15544388484 20. Sep 18:03 MEGA3.zip // <- no issue
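
To see whether two same-sized archives really differ only in a few header bytes such as timestamps, the position of the first differing byte can be checked. A minimal sketch, assuming both files are readable (so it applies to MEGA2.zip and MEGA3.zip, not the failing MEGA1.zip); the function name first_diff_offset is only illustrative:

```shell
#!/bin/sh
# Print the 1-based offset of the first byte at which two files differ,
# or nothing if they are identical. Uses POSIX `cmp -l`, which lists one
# differing byte per line as "offset octal octal".
first_diff_offset() {
    cmp -l "$1" "$2" 2>/dev/null | head -n 1 | awk '{ print $1 }'
}

# Usage:
#   first_diff_offset ~/MEGA2.zip ~/MEGA3.zip
#   sha256sum ~/MEGA2.zip ~/MEGA3.zip
```

A small offset would be consistent with a difference in the first entry's header, but it does not prove that timestamps are the only difference.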

AkhIL · 20 September 2021 16:14

Check the SSD's health with smartctl. Your zip file might be sitting on a bad block.

winnie · 20 September 2021 16:30

Without any other context, I find that concerning…

As @linux-aarhus pointed out (no pun intended!), it could be an underlying issue with the drive (or a bug in the filesystem, in this case being Btrfs.)

Zesko · 20 September 2021 16:35

Yes, that is possible.

The SSD passed the health check:

❯ sudo smartctl --all /dev/nvme0n1
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.14.2-1-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Corsair MP600 PRO
Serial Number:                      212079150001305720A2
Firmware Version:                   EIFM21.1
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 2.000.398.934.016 [2,00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2.000.398.934.016 [2,00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            6479a7 4ea0201b00
Local Time is:                      Mon Sep 20 18:24:01 2021 CEST
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d):     Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x08):         Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     70 Celsius
Critical Comp. Temp. Threshold:     110 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.80W       -        -    0  0  0  0        0       0
 1 +     7.10W       -        -    1  1  1  1        0       0
 2 +     5.20W       -        -    2  2  2  2        0       0
 3 -   0.0620W       -        -    3  3  3  3     2000    2000
 4 -   0.0440W       -        -    4  4  4  4    25000   25000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        51 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    1%
Data Units Read:                    6.311.739 [3,23 TB]
Data Units Written:                 4.937.247 [2,52 TB]
Host Read Commands:                 19.489.148
Host Write Commands:                34.059.984
Controller Busy Time:               160
Power Cycles:                       230
Power On Hours:                     268
Unsafe Shutdowns:                   3
Media and Data Integrity Errors:    0
Error Information Log Entries:      336
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        336     0  0x0001  0x4004  0x028            0     0     -

So I think the bug is in the filesystem.

winnie · 20 September 2021 16:37

If it weren’t for the legal issues, I’m certain ZFS would have been the de facto CoW filesystem used in Linux.

winnie · 20 September 2021 16:38

What about ruling something out?

Can you run an internal integrity check of the culprit zip file? Don’t copy or anything, just check it in place.
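
An in-place check along these lines could look like the following. It is a minimal sketch, assuming Info-ZIP's unzip is installed; the helper names reads_cleanly and zip_passes_test are only illustrative:

```shell
#!/bin/sh
# Return 0 if every byte of the file can be read. Nothing is copied:
# the data is discarded, so a read error here comes from the source alone.
reads_cleanly() {
    cat "$1" > /dev/null 2>&1
}

# Return 0 if the archive passes unzip's own CRC test
# (-t tests the archive, -q keeps the output quiet).
zip_passes_test() {
    unzip -tq "$1" > /dev/null 2>&1
}

# Usage:
#   reads_cleanly ~/MEGA1.zip   || echo "filesystem cannot read the file"
#   zip_passes_test ~/MEGA1.zip || echo "archive fails its integrity test"
```

If the plain read already fails, the unzip test cannot succeed either, so a failure in both says more about the storage layer than about the zip format.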

Zesko · 20 September 2021 16:53

Oh, I got a zipfile error on an mp4 inside the archive when running unzip -t MEGA1.zip:

...
testing: MEGA/..../Reading GPIO Inputs.mp4  error:  zipfile read error

That means the .mp4 entry in the zip file is corrupted.
The other zip files show no errors.

winnie · 20 September 2021 16:53

Error Information Log Entries:      336

Error Information (NVMe Log 0x01, 16 of 63 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        336     0  0x0001  0x4004  0x028            0     0     -

Is anyone else familiar with the above “errors”? From what I could find out, they’re only logging a language mismatch between controller and client PC.

Zesko · 20 September 2021 16:57

I do not know; I just ignored this error in the log.

winnie · 20 September 2021 16:57

Something’s amiss!

It's either your drive, Btrfs, or a combination of the two.

Run a scrub when you get the chance, and keep the terminal window open on the side (or minimized) to view the progress,

btrfs-scrub start -B -r [btrfs-filesystem]

Shouldn’t take too long with an SSD.

EDIT: Not sure if you’re only using Btrfs for “/home”, for “/”, or both.

Zesko · 20 September 2021 17:12

Do you mean btrfs scrub start -B -r btrfs-filesystem?

❯ sudo btrfs scrub start -B -r /home
scrub done for 3d47a753-cd50-4c35-934a-b19c5ed130ad
Scrub started:    Mon Sep 20 19:08:00 2021
Status:           finished
Duration:         0:03:28
Total to scrub:   713.02GiB
Rate:             3.38GiB/s
Error summary:    csum=2
  Corrected:      0
  Uncorrectable:  0
  Unverified:     0

I am using Btrfs for everything under / except /boot/efi. The zip file with the error is in the directory /home/zesko.

winnie · 20 September 2021 17:15

For completeness’ sake, scrub everything:

sudo btrfs scrub start -B -r /

I believe “btrfs-scrub” and “btrfs scrub” do the same thing, just an alias for the same command.

Did it output which files/blocks?

Zesko · 20 September 2021 17:22

❯ sudo btrfs scrub start -B -r /
scrub done for 3d47a753-cd50-4c35-934a-b19c5ed130ad
Scrub started:    Mon Sep 20 19:16:10 2021
Status:           finished
Duration:         0:03:23
Total to scrub:   713.02GiB
Rate:             3.46GiB/s
Error summary:    csum=2
  Corrected:      0
  Uncorrectable:  0
  Unverified:     0

How can I find out where these two errors are?

winnie · 20 September 2021 17:27

I don’t use Btrfs, and I thought it would output or log the blocks/files associated with those two checksum errors. Perhaps in journalctl the scrub task logs more details?
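
One way to look for those details is to filter the kernel messages written during the scrub. This is a minimal sketch under the assumption that the kernel reports each checksum failure on a line containing "checksum error at" and ending in "(path: <file>)"; the function name csum_error_paths is only illustrative:

```shell
#!/bin/sh
# Print the unique file paths named in Btrfs checksum-error kernel messages
# read from stdin.
# ASSUMPTION: affected files are reported on lines containing
# "checksum error at" and ending in "(path: <file>)"; adjust the pattern
# if your kernel words the message differently.
csum_error_paths() {
    sed -n 's/.*checksum error at.*(path: \(.*\))$/\1/p' | sort -u
}

# Usage after a scrub (kernel messages of the current boot only):
#   journalctl -k -b | csum_error_paths
# An inode number from `ls -ali` can also be mapped back to a path,
# assuming /home is the subvolume that holds the file:
#   sudo btrfs inspect-internal inode-resolve 1439422 /home
```

Two checksum errors that resolve to the same path would match the single failing archive seen earlier in the thread.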

Not even entirely sure what’s going on. You obviously have checksum errors (as seen in your first post), but is that because of an underlying hardware issue, or because of a bug in Btrfs?

You could reboot into a live USB session, and run a non-destructive read-write badblocks pass on the SSD to force it to re-map and use reserved blocks. But it should have already done that by now.

With the quirks and hiccups I read about Btrfs, and even @linux-aarhus having it puke, it’s really hard to pin-point if this is an exclusive hardware issue, or if Btrfs is misbehaving. I would hope the former.

Zesko · 20 September 2021 17:41

I do not think BTRFS or the SSD caused the problem. I think the zip process on the CPU is to blame for occasionally compressing the files incorrectly. I tried to unzip this zip file with Ark, but it does not work:

Failed to read data for entry: MEGA/.../Reading GPIO Inputs.mp4

Interestingly, BTRFS correctly detected that the zip file was faulty and stopped the copy. That is new to me.

winnie · 20 September 2021 17:44

Which means a CPU and/or RAM issue, if that’s the case. It would be risky to trust any saved data from that point forwards if you cannot trust the CPU or RAM.

I still believe the more likely culprit is either the SSD or Btrfs.

For future reference, you can use nvme instead of smartctl:

nvme smart-log /dev/nvmeX

Followed by,

nvme smart-log-add /dev/nvmeX