Various programs crash when saving, opening or downloading files

tharangalion · 12 August 2021 15:48

Hello,
I’m having persistent problems with programs crashing, seemingly at random, when I save or open files.
When I save a file in Inkscape, it occasionally becomes unresponsive and has to be killed. The same thing happens when I add a folder to Atom. Downloading files in Firefox does the same.

journalctl -f does not log anything.
Here’s inxi -Fazy:

System:
  Kernel: 5.10.56-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.10-x86_64 
  root=UUID=e3142dad-e257-46d6-835b-733a9073fd46 rw quiet apparmor=1 
  security=apparmor udev.log_priority=3 sysrq_always_enabled=1 
  Desktop: Xfce 4.16.0 tk: Gtk 3.24.29 info: xfce4-panel wm: xfwm 4.16.1 vt: 7 
  dm: LightDM 1.30.0 Distro: Manjaro Linux base: Arch Linux 
Machine:
  Type: Desktop Mobo: ASUSTeK model: PRIME B450M-K v: Rev X.0x 
  serial: <filter> UEFI: American Megatrends v: 3202 date: 06/15/2021 
CPU:
  Info: 6-Core model: AMD Ryzen 5 2600 bits: 64 type: MT MCP arch: Zen+ 
  family: 17 (23) model-id: 8 stepping: 2 microcode: 800820D cache: L2: 3 MiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm 
  bogomips: 92446 
  Speed: 1376 MHz min/max: 1550/3850 MHz boost: disabled Core speeds (MHz): 
  1: 1376 2: 1378 3: 1378 4: 1374 5: 1376 6: 1377 7: 1377 8: 1377 9: 1377 
  10: 1374 11: 1373 12: 1375 
  Vulnerabilities: Type: itlb_multihit status: Not affected 
  Type: l1tf status: Not affected 
  Type: mds status: Not affected 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: 
  disabled, RSB filling 
  Type: srbds status: Not affected 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: NVIDIA GP107 [GeForce GTX 1050 Ti] vendor: ASUSTeK PH-GTX1050TI-4G 
  driver: nvidia v: 470.57.02 alternate: nouveau,nvidia_drm bus-ID: 08:00.0 
  chip-ID: 10de:1c82 class-ID: 0300 
  Device-2: MacroSilicon USB Video type: USB 
  driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus-ID: 3-1.2:5 
  chip-ID: 534d:2109 class-ID: 0300 
  Display: x11 server: X.Org 1.20.13 compositor: xfwm4 v: 4.16.1 driver: 
  loaded: nvidia display-ID: :0.0 screens: 1 
  Screen-1: 0 s-res: 1920x1080 s-dpi: 92 s-size: 530x301mm (20.9x11.9") 
  s-diag: 610mm (24") 
  Monitor-1: HDMI-0 res: 1920x1080 hz: 60 dpi: 93 size: 527x296mm (20.7x11.7") 
  diag: 604mm (23.8") 
  OpenGL: renderer: NVIDIA GeForce GTX 1050 Ti/PCIe/SSE2 
  v: 4.6.0 NVIDIA 470.57.02 direct render: Yes 
Audio:
  Device-1: NVIDIA GP107GL High Definition Audio vendor: ASUSTeK 
  driver: snd_hda_intel v: kernel bus-ID: 08:00.1 chip-ID: 10de:0fb9 
  class-ID: 0403 
  Device-2: AMD Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel 
  v: kernel bus-ID: 0a:00.3 chip-ID: 1022:1457 class-ID: 0403 
  Device-3: MacroSilicon USB Video type: USB 
  driver: hid-generic,snd-usb-audio,usbhid,uvcvideo bus-ID: 3-1.2:5 
  chip-ID: 534d:2109 class-ID: 0300 
  Sound Server-1: ALSA v: k5.10.56-1-MANJARO running: yes 
  Sound Server-2: JACK v: 1.9.19 running: no 
  Sound Server-3: PulseAudio v: 15.0 running: yes 
  Sound Server-4: PipeWire v: 0.3.33 running: no 
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: ASUSTeK PRIME B450M-A driver: r8169 v: kernel modules: r8168 
  port: f000 bus-ID: 07:00.0 chip-ID: 10ec:8168 class-ID: 0200 
  IF: enp7s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
  IF-ID-1: uk-london5 state: unknown speed: N/A duplex: N/A mac: N/A 
Bluetooth:
  Device-1: Realtek Bluetooth Radio type: USB driver: btusb v: 0.8 
  bus-ID: 1-3:3 chip-ID: 0bda:8771 class-ID: e001 serial: <filter> 
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends 
Drives:
  Local Storage: total: 5.46 TiB used: 614.48 GiB (11.0%) 
  SMART Message: Required tool smartctl not installed. Check --recommends 
  ID-1: /dev/sda maj-min: 8:0 vendor: Seagate model: ST1000DM003-1ER162 
  size: 931.51 GiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  type: HDD rpm: 7200 serial: <filter> rev: CC45 scheme: GPT 
  ID-2: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST2000DM005-2CW102 
  size: 1.82 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s 
  type: HDD rpm: 5425 serial: <filter> rev: 0001 scheme: GPT 
  ID-3: /dev/sdc maj-min: 8:32 vendor: Samsung model: SSD 870 QVO 1TB 
  size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s 
  type: SSD serial: <filter> rev: 2B6Q scheme: GPT 
  ID-4: /dev/sdd maj-min: 8:48 type: USB vendor: Seagate model: BUP Slim RD 
  size: 931.51 GiB block-size: physical: 4096 B logical: 512 B type: N/A 
  serial: <filter> rev: 0304 scheme: MBR 
  ID-5: /dev/sde maj-min: 8:64 type: USB vendor: Seagate model: BUP Slim BK 
  size: 931.51 GiB block-size: physical: 4096 B logical: 512 B type: N/A 
  serial: <filter> rev: 0304 scheme: MBR 
Partition:
  ID-1: / raw-size: 100 GiB size: 97.87 GiB (97.87%) used: 22.8 GiB (23.3%) 
  fs: ext4 dev: /dev/sdc1 maj-min: 8:33 
  ID-2: /boot/efi raw-size: 513 MiB size: 512 MiB (99.80%) 
  used: 288 KiB (0.1%) fs: vfat dev: /dev/sdc3 maj-min: 8:35 
  ID-3: /home raw-size: 100 GiB size: 97.87 GiB (97.87%) 
  used: 18.97 GiB (19.4%) fs: ext4 dev: /dev/sdc2 maj-min: 8:34 
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) 
  ID-1: swap-1 type: partition size: 10 GiB used: 0 KiB (0.0%) priority: -2 
  dev: /dev/sdc4 maj-min: 8:36 
Sensors:
  System Temperatures: cpu: 36.0 C mobo: N/A gpu: nvidia temp: 43 C 
  Fan Speeds (RPM): N/A gpu: nvidia fan: 31% 
Info:
  Processes: 298 Uptime: 5h 27m wakeups: 0 Memory: 15.6 GiB 
  used: 2.6 GiB (16.7%) Init: systemd v: 248 tool: systemctl Compilers: 
  gcc: 11.1.0 alt: 10 Packages: pacman: 1331 lib: 346 flatpak: 0 Shell: Bash 
  v: 5.1.8 running-in: xfce4-terminal inxi: 3.3.06

Thanks.

alven · 12 August 2021 16:16

Hi!
Maybe check for hardware problems?

Start with RAM test.

Try MemTest86 - Official Site of the x86 and ARM Memory Testing Tool (MemTest86 Free (Version 9.2 Build 2000))
To make bootable USB media you can use the Ventoy app: install it on the stick, then copy the image file to the partition named Ventoy (the default name).

By default MemTest86 runs 4 passes of the same tests. Let it complete at least 2. You will see either 0 errors or many errors. If there are many, stop testing and leave only one RAM module in the PC at a time to determine which one is faulty.
Depending on your memory size and speed, a single pass can take from 15 to 45-60 minutes, so be prepared for a long test.
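As a rough worked example of that timing, assuming every pass takes between 15 and 60 minutes (the estimate given above; the real figure depends on RAM size and speed), the total run time simply scales with the number of passes:

```python
def total_test_time(passes: int, min_per_pass: float = 15, max_per_pass: float = 60):
    """Return the (shortest, longest) expected MemTest86 run time in minutes.

    Assumes every pass takes between min_per_pass and max_per_pass minutes;
    the defaults are the rough estimates from this thread, not measurements.
    """
    return passes * min_per_pass, passes * max_per_pass


for passes in (2, 4):
    low, high = total_test_time(passes)
    print(f"{passes} passes: {low / 60:.1f} to {high / 60:.1f} hours")
```

Under these assumptions, 2 passes take 0.5 to 2 hours and the default 4 passes take 1 to 4 hours, so a multi-hour overnight run is not unusual.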

If MemTest’s error count is 0, then look for tests that check whether the storage and CPU are failing.

PS
If either the RAM or the CPU is failing, I don’t know how to determine exactly which one is the source: a faulty CPU could cause the RAM-checking algorithm itself to produce erroneous results.
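To illustrate the principle behind a RAM test (and why it relies on the CPU writing and comparing correctly), here is a minimal sketch of a walking-ones pattern test run against a simulated memory with one stuck bit. This is not MemTest86’s actual algorithm; `FaultyMemory` and `walking_ones_test` are hypothetical names, and a real tester works on physical RAM rather than a Python `bytearray`.

```python
class FaultyMemory:
    """Hypothetical simulated RAM in which one address has a bit stuck at 0."""

    def __init__(self, size, stuck_addr=None, stuck_bit=None):
        self.cells = bytearray(size)
        self.stuck_addr = stuck_addr
        self.stuck_bit = stuck_bit

    def write(self, addr, value):
        if addr == self.stuck_addr:
            # The stuck bit can never be set, whatever the CPU writes.
            value &= ~(1 << self.stuck_bit) & 0xFF
        self.cells[addr] = value

    def read(self, addr):
        return self.cells[addr]


def walking_ones_test(mem, size):
    """Write each single-bit pattern to every address, then read it back.

    Returns a list of (address, expected, got) mismatches.
    """
    errors = []
    for bit in range(8):
        pattern = 1 << bit
        for addr in range(size):
            mem.write(addr, pattern)
        for addr in range(size):
            got = mem.read(addr)
            if got != pattern:
                errors.append((addr, pattern, got))
    return errors


print(walking_ones_test(FaultyMemory(64), 64))        # healthy memory
print(walking_ones_test(FaultyMemory(64, 10, 3), 64)) # bit 3 stuck at address 10
```

The healthy simulation reports no mismatches, while the faulty one reports exactly one: the pass that writes `0b1000` to address 10 reads back 0. Note that the comparison itself runs on the CPU, which is why a faulty CPU could also produce misleading results.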

tharangalion · 12 August 2021 16:25

Thanks for that, @alven. I’ll do a RAM check.

The only hardware change is that I’m now using a Samsung 870 QVO SSD. I never experienced these issues when running from my USB HDD, so I wonder if it’s related to the SSD. Samsung’s up-to-date SSD utilities are not available for Linux.
I ran a clean Manjaro installation on that SSD.

alven · 12 August 2021 16:45

It seems to be a desktop PC.

Oh, it’s SATA, so it could be the data or power cables.
If you don’t use an adapter, try replacing the SATA data cable: it could be faulty.
I remember my mistake from around 2011, when I suggested that a friend replace his HDD (it was an HDD at that time) because sometimes the BIOS recognized it and the PC booted, and sometimes it didn’t and the BIOS showed an “insert bootable media” message. He bought a new HDD, and it behaved the same way. Then he replaced the SATA data cable, and both HDDs worked reliably from then on. It was a big mistake on my part to suggest he spend money on an expensive device.
So try replacing the SSD cable as well (unless you use a PCIe or other cable-less adapter).

Also, before testing: the contact pins of electronic components can get dirty (for example, with an oxide film; all connected metals degrade over time, and an oxide film forms as current flows). So try unplugging and reseating them 2-3 times to lightly scrape the pins back to the clean gold layer and remove the oxide film from the connector surfaces. Do this at least with the easy-to-replug components: the RAM modules and the SSD(s).

alven · 12 August 2021 17:14

Samsung’s SSD firmware tools are available as OS-independent bootable image files; check SSD Tools & Software | Download | Samsung Semiconductor
and compare the latest version with the one you have:

$ inxi -Dazy1

Be aware: their images may lack a wireless keyboard driver, so a corded keyboard may be required to use them, even at the user agreement stage (with my PC configuration, I was completely unable to “press any key”).

mithrial · 12 August 2021 17:16

Do you have anything else mounted, like NFS, Samba or another network share?

tharangalion · 13 August 2021 02:22

I am marking this as the solution for now, as Memtest86 revealed an error in each pass.
I’ll have a look at those bootable tools as well. Thanks. I’ll have to get more USBs or CDs.

tharangalion · 13 August 2021 02:24

No. Nothing like that. Thanks for asking, though.

tharangalion · 13 August 2021 10:26

I took the memory module back to the store. The tech guy tested it with the Windows memory tester and said it was okay. Hmm, their test took about 15 minutes; my tests took 3 hours each.
They were good enough to lend me (with a full-price deposit) a replacement module. I ran Memtest86 again, without any errors. Now it’s on to usage testing.
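One plausible reason a 15-minute test can miss a fault that a 3-hour run catches is that the fault is intermittent. As a rough model, assume each 15-minute window independently has probability p of triggering the error; the chance of seeing at least one error in n windows is then 1 - (1 - p)^n. The value p = 0.3 below is purely hypothetical, not measured from this module, and real RAM faults need not be independent from one window to the next.

```python
def detection_probability(p_per_window: float, windows: int) -> float:
    """Probability of at least one error in `windows` independent test windows.

    Assumes each window triggers the intermittent fault independently with
    probability p_per_window (a simplifying assumption, not a RAM fault model).
    """
    return 1 - (1 - p_per_window) ** windows


p = 0.3  # hypothetical chance that one 15-minute window catches the fault
print(f"15 minutes (1 window):  {detection_probability(p, 1):.2f}")
print(f"3 hours (12 windows):   {detection_probability(p, 12):.2f}")
```

Under this assumption the short test catches the fault about 30% of the time, while twelve windows catch it roughly 99% of the time.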

Thanks for the ideas and advice, contributors.

tharangalion · 13 August 2021 11:01

The problem has returned. I tried saving a file with Firefox. The save dialogue box opened, but when I clicked to save, it became unresponsive. Then saving in Inkscape crashed. The only thing I can do is reboot.

I tried clearing the caches with: sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

UPDATE: I’ve uninstalled Inkscape, and now downloading with Firefox and opening and closing files is working.

alven · 13 August 2021 14:09

@tharangalion, I don’t want to make things more difficult, but if you keep using the system after replacing the broken RAM, you may run into more and more errors that were introduced while the previous RAM module was installed. So, although I’m not sure, if possible I would reinstall the whole OS from scratch, either without migrating settings or with a manual review of the settings before bringing them into the new OS copy (config files could be damaged too). Treat the current OS copy like one that has been cleaned after a virus: you don’t know where the damaged parts were, or whether some still remain.
Please don’t act on my opinion alone that it is better to reinstall everything without auto-migration (to prevent possibly damaged configs and files from surviving the reinstallation).

Please let others review that suggestion (to reinstall everything).

A partial solution on the current OS copy would be to reinstall all packages: pacman/Tips and tricks - ArchWiki
I don’t know whether config files will be replaced or not.
User configs and user files also remain questionable: are they damaged or not? I would create a folder for not-yet-reviewed files (from the old installation) and, from time to time, sort through them, check them for damage, and move them into the normal folders of the reinstalled OS copy.

Imagine what could happen if you updated a device’s firmware with the damaged RAM module installed (you would be lucky if only undamaged parts of the module were used during the firmware upgrade).

I suggest investing in the future (to prevent possible storage subsystem malfunctions): keep your current SATA data cable as a spare and buy a new one, at minimum a SATA3 cable with latching clips that click when plugged in (are there more reliable specs than that?).
I remember that 8 years ago there were ordinary SATA cables and SATA3 ones. The latter could have purer copper wires, include shielding, and be shorter, since a high-frequency signal weakens over longer lengths.

Maybe replace the cable first, and then reinstall if someone else can review my suggestion (whether it is overkill or not), because it is a lot of work, including sorting and reviewing every user config and user data file from the previous installation.

Remember that there are several (at least two) tests in which the RAM is left idle for 300 seconds? Besides that, their test engine could lack some tests and so not test as comprehensively, OR the MemTest version has a bug.

Also, please share your experience: were there not many errors (tens, hundreds, thousands), but only one in each pass?

Here’s mine: as just a user, I have tested about 7-8 RAM modules on about 4-5 different PCs in my whole life.
Every time I got Errors: 0 after 2 full passes of the 4 total (I stopped the test after that).
But one time a friend told me he was analyzing memory dumps because he was getting crashes in different apps from time to time (he was using a Windows-family OS). We swapped RAM modules to test them in different machines, and that’s when I learned about dedicated RAM testing apps.
For a week he was completely fine with my RAM modules, while I checked his modules with MemTest: for about 3-5 minutes it showed Errors: 0, but after that screen after screen of errors scrolled by. The counter reached several hundred in about 10 seconds, and after several tens of seconds it passed a thousand. I stopped the test and tested each module separately: only one of the two was faulty.

3 hours! Wow. I wouldn’t have the patience for that if I had only one PC in the house. Did you run all 4 passes, or even all 4 passes twice? Or maybe you even have 64 GB of RAM at a not very high clock frequency.
I meant 2 passes of the 4 total (4 being the default). Sorry for my English.

tharangalion · 13 August 2021 15:53

Thank you for your detailed reply.

I ran the test just before I went to bed, so I didn’t need any patience. It’s just one 16GB module.

Ah. So the Inkscape problem could have been caused by the RAM?

I ran fsck on my root and home partitions, which showed no errors, so I’m not sure that a reinstall is necessary.

I’ll monitor how the system is performing over the next few days.
Thanks again.

alven · 13 August 2021 16:06

Of course, that’s possible:
Imagine an app stores the data 0101 in memory.
Later the app reads the data back, but gets a corrupted value, for example 1101.
That value could be data for the app to process, or instruction data (what to do and how: part of the app itself, such as a module or function).
So the erroneous data or instruction comes into play, and what will the result be? I can’t predict it: anything from no effect, or wrong metadata in the payload data or output, up to a crash of that module or the whole app, because an error at some stage produced an unexpected instruction or input for the next module to process, which in turn changes the behavior of the chain of modules involved.
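A toy illustration of that reasoning, assuming a made-up "instruction table" (the `ACTIONS` dictionary and its entries are hypothetical, not how Inkscape or any real app works): one flipped bit can select a valid but wrong action, or an action that does not exist at all, which here stands in for a crash.

```python
# Hypothetical illustration: one flipped bit changes what the app reads back.
ACTIONS = {0b0101: "save file", 0b0100: "open file"}  # made-up "instruction" table


def flip_bit(value: int, bit: int) -> int:
    """Invert one bit (bit 0 = least significant), as a faulty RAM cell might."""
    return value ^ (1 << bit)


def run(opcode: int) -> str:
    """Look up what to do; an unknown opcode raises KeyError, standing in for a crash."""
    return ACTIONS[opcode]


stored = 0b0101                   # what the app wrote to memory
read_back = flip_bit(stored, 3)   # bit 3 flips: 0101 becomes 1101
print(f"stored {stored:04b}, read back {read_back:04b}")
print(run(stored))                # the intended action
print(run(flip_bit(stored, 0)))   # bit 0 flips: 0100 is valid but the wrong action
try:
    print(run(read_back))
except KeyError:
    print(f"opcode {read_back:04b} is unknown: the app 'crashes'")
```

The same single-bit fault therefore produces anything from a silently wrong result to a crash, depending on which bit flips and how the value is used.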

The same kind of problem can occur when storage or a CPU fails.

The key clue I saw in your initial post, even in the title, is that random apps were affected, so the cause is probably not the specific user apps you were using, but something used widely: OS components or hardware involved in all operations. You described your problem well.

system · 14 August 2021 16:07

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.