System permanently sluggish after scraping YouTube

wtechgo · 28 February 2022 12:21

My Manjaro system has become sluggish and I’d like to fix it, but I’m scratching my head over what happened and how to troubleshoot it.
I’ll share the sequence of (imo amusing) events that led up to this.

I’ve been putting together scripts that scrape video channels for metadata with yt-dlp.
Happy with my work, I decided to unleash it at full throttle, meaning “scrape metadata of 140 YouTube channels concurrently” xD
We’re talking about a loop that calls a channel scrape command with the & operator to put each process in the background.
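
For comparison, here is a minimal sketch of the same kind of loop with a cap on how many scrapes run at once, instead of 140 background processes started with &. It is not the script from this thread: the yt-dlp flags in build_command, the MAX_WORKERS value and all function names are assumptions chosen for illustration.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 4  # assumption: a small cap instead of 140 simultaneous processes


def build_command(channel_url):
    # Assumed yt-dlp invocation; the command actually used in the thread is not shown.
    return ["yt-dlp", "--flat-playlist", "--dump-single-json", channel_url]


def scrape_channel(channel_url, make_command=build_command):
    """Run one scrape and return (url, exit code, stdout)."""
    result = subprocess.run(make_command(channel_url), capture_output=True, text=True)
    return channel_url, result.returncode, result.stdout


def scrape_all(channel_urls, max_workers=MAX_WORKERS, make_command=build_command):
    """Scrape every channel, but never more than max_workers at a time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps the results in the same order as channel_urls.
        return list(pool.map(lambda url: scrape_channel(url, make_command), channel_urls))
```

Calling scrape_all(channel_urls) returns one (url, exit code, stdout) tuple per channel, in input order. Raising or lowering max_workers trades speed against load on the machine.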

I came back 15 minutes later to a black screen and had to do a hard shutdown,
only to find on reboot that my beloved Manjaro felt like a virus-infected Windows bloat boat.

I then reverted my system with Timeshift to an earlier snapshot, surprisingly to no real avail.
My system did improve somewhat, from ‘pulling hair’ unworkable to workable with occasional frustrating slowness.

I also unleashed ClamTK on my home dir; it found 80 positives, 30-ish false positives and a bunch of JPGs with malformed JFIF structures.
The false positives were from a popular Node package, imurmurhash, which I left untouched.
I deleted all the JPGs; not sure if they really were an issue, but hey.

I don’t quite understand what happened here.
I don’t think it’s possible that my hardware could have been damaged. If it is, kindly elaborate.
Could Google have gotten angry and hacked or infected my system? That seems far fetched tbh.
Before the event though, I did notice my mouse moving an inch on its own on occasions.
I did think then “wuut RAT hacker” or just a Linux bug? dunno

It’s really the whole system that became slower, e.g. the IDE (JetBrains) is also noticeably slower now, even with no scraping processes running.
Browsing is slower…

I don’t have much experience in troubleshooting Linux systems.
What do you guys think?

linux-aarhus · 28 February 2022 12:41

I sincerely hope you didn’t run this with root privs. If you are certain you didn’t use escalated privs then the damage should be contained within your home.

You have the answer right there - I would hone my Linux troubleshooting skills immediately.

This is not a flaw of Manjaro or Arch but a flaw in your code.

I think you have been sloppy in handling the responses - assuming the response to be of a certain structure or to have specific content.

Without validation you may have opened your system to the execution of PUPs (potentially unwanted programs) - of course, without your code it is impossible to judge.

If you want to hone your skills - I suggest you use dd to create an image of your disk - save it on some external media with enough space.

Then reinstall your system.

Then you can use forensic tools to deduce what happened to your system after 15 minutes of YouTube scraping.

ishaan2479 · 28 February 2022 12:47

@wtechgo

Never heard of that distribution

Don't crosspost!

https://bbs.archlinux.org/viewtopic.php?id=274534

Manjaro is not Arch Linux and vice versa

linux-aarhus · 28 February 2022 12:49

Never post Manjaro issues to the Arch BBS. Prepare to be banned on the Arch BBS.

These forums are for Arch Linux x86_64 ONLY.

Not Artix, or Apricity, or Manjaro, or any of the “easy Arch installers”, nor Arch-ARM; nothing other than vanilla 64-bit Arch Linux. Ask those communities for support.
If you have installed Arch, please read the rules before posting. README: Forum Rules.

wtechgo · 28 February 2022 13:23

They already deleted this post on the Arch forum xD
Manjaro is not Arch !
they said

linux-aarhus · 28 February 2022 13:24

Yup - we know that - you didn’t - although you should have - it should be deducible by reading

[Need-To-Know] About Manjaro and AUR

wtechgo · 28 February 2022 13:26

I’d argue this is not a cross post as it was deleted in Arch.
Just a case of ‘my bad, let me put it where it belongs then’.

linux-aarhus · 28 February 2022 13:28

Crossposting is only prohibited within categories of this forum.

But posting an issue with a system running Manjaro to the Arch forum is definitely a no-go.

wtechgo · 28 February 2022 13:29

Fair enough.

Thanks for your input on handling responses. I thought about it.

I’m using yt-dlp, a Python library, to do the scraping.
The result is a JSON file, so I tend to think I’m quite safe on the response side.

Though I don’t know what yt-dlp does before it gets to its JSON result.

linux-aarhus · 28 February 2022 13:32

From years of coding - one rule to rule them all - especially when it comes to data fetched through an API

validate - validate and validate

only then can you be sure the data you are handling is actually the data you want.

linux-aarhus · 28 February 2022 13:36

Just a thought to let sink in - the rule to rule them all

If the data you get is the data you want, fine - but what if the data you get is not the data you want? So validate, validate, validate

Think - what could possibly go wrong - what do I expect - then validate, validate, validate

validate should be your mantra when processing input … there is no other way!
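
As a rough illustration of that mantra applied to a JSON result, the sketch below parses the text and checks its shape before anything else touches it. The field names in EXPECTED_FIELDS and the function name are assumptions for the example, not something confirmed in this thread; adapt them to whatever the scraper actually relies on.

```python
import json

# Assumption: the fields this scraper cares about and the types it expects.
EXPECTED_FIELDS = {"id": str, "title": str, "entries": list}


def validate_channel_json(raw_text):
    """Parse raw_text and return the dict, or raise ValueError describing the problem."""
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"expected a JSON object, got {type(data).__name__}")
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data
```

Anything that fails the check raises ValueError, so a malformed or unexpected response stops at the door instead of flowing into the rest of the script.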

omano · 28 February 2022 14:24

Is the system in the same state under a newly created user?

wtechgo · 28 February 2022 18:27

That was a good idea to try, but it is slow with a new user too.

omano · 28 February 2022 18:40

And on a live environment from fresh ISO?

leledumbo · 1 March 2022 08:52

Have you checked how all resources are doing, i.e. CPU, RAM, disk? Are any of those struggling?
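
One quick way to get a first reading on all three is a small script like the sketch below. It is Linux-only because it reads /proc/meminfo, and the function names and the choice of metrics are assumptions for illustration; tools such as top or htop show the same figures interactively.

```python
import os
import shutil


def read_meminfo(path="/proc/meminfo"):
    """Return /proc/meminfo as a dict of {name: value}; most values are in kB."""
    values = {}
    with open(path) as handle:
        for line in handle:
            name, _, rest = line.partition(":")
            if rest.strip():
                values[name] = int(rest.split()[0])
    return values


def resource_snapshot(mount_point="/", meminfo_path="/proc/meminfo"):
    """Collect one rough number each for CPU, RAM and disk."""
    load_1min, _, _ = os.getloadavg()
    mem = read_meminfo(meminfo_path)
    disk = shutil.disk_usage(mount_point)
    return {
        # Sustained values near or above 1.0 suggest the CPU is saturated.
        "load_per_cpu": load_1min / (os.cpu_count() or 1),
        "mem_available_pct": 100 * mem["MemAvailable"] / mem["MemTotal"],
        "disk_used_pct": 100 * disk.used / disk.total,
    }
```

Printing resource_snapshot() while the system feels slow gives a starting point; it does not measure disk I/O latency, so a slow SSD can still hide behind healthy-looking numbers.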

Lol, they won’t care. They have a proper rate limiter implemented on the server side.

jaroMAN · 2 March 2022 14:24

Yes, compare system responsiveness on a live USB.

Please check disk space on the root partition.
Try:

for a SATA disk:
df -h | grep /dev/sd
and for NVMe, replace 'sd' with 'nv':
df -h | grep /dev/nv

Also, could you provide a link to your code?

wtechgo · 3 March 2022 09:54

good call

/dev/sda5 217G 84G 123G 41% /
/dev/sdb3 97G 84G 8,1G 92% /backup
/dev/sdb2 1,8T 436G 1,3T 25% /data
/dev/sda2 95M 26M 70M 27% /boot/efi

Looks like I’m OK; sda5 is the system partition and /data is where the scraped stuff goes.
It’s a dual boot system.

wtechgo · 4 March 2022 10:14

I think I’ve found the culprit. My system was sluggish af on Windows (dual boot) too, so I started to think my 6-month-old SSD had suffered a hardware failure.

I did tests but they all came back like “all good bro”…

Then I stumbled on a thread talking about SSD TRIM and the fstrim.timer service. Turns out it wasn’t running.

fstrim.timer trims the SSD once a week to keep it performant, but that wasn’t happening, and my scrape operations just fueled the fire.

sudo systemctl status fstrim.timer
sudo systemctl enable --now fstrim.timer
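
To keep an eye on this from a script rather than by hand, something like the sketch below could work. It relies on the systemctl is-enabled and is-active subcommands; the function names and the injectable run parameter are assumptions added so the logic can be exercised without a real systemd.

```python
import subprocess


def systemctl_query(verb, unit, run=subprocess.run):
    """Return the one-word answer of `systemctl <verb> <unit>`, e.g. 'enabled'."""
    result = run(["systemctl", verb, unit], capture_output=True, text=True)
    return result.stdout.strip()


def timer_is_on(unit="fstrim.timer", run=subprocess.run):
    """True only when the timer is both enabled at boot and currently active."""
    return (
        systemctl_query("is-enabled", unit, run) == "enabled"
        and systemctl_query("is-active", unit, run) == "active"
    )
```

If timer_is_on() returns False, the enable --now command above is the fix; neither query needs sudo.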

Seems to be ok now but I need to observe the situation a little longer.

Thanks again for thinking along, everybody.
