System permanently sluggish after scraping YouTube

My Manjaro system has become sluggish and I’d like to fix it, but I’m scratching my head over what happened and how to troubleshoot it.
I’ll share the sequence of (imo) amusing events that led up to this.

I’ve been putting together scripts that scrape video channels for metadata with yt-dlp.
Happy with my work, I decided to unleash it at full throttle, meaning “scrape metadata of 140 YouTube channels concurrently” xD
We’re talking about a loop that calls a channel scrape command with the & operator to put each process in the background.
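Roughly like this (a simplified sketch; channels.txt and the exact yt-dlp flags are placeholders for what my script actually does):

while read -r channel; do
    yt-dlp --skip-download --write-info-json "$channel" &   # background each scrape
done < channels.txt
wait   # block until all ~140 background jobs are done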

I came back 15 minutes later, my screen was black, and I had to do a hard shutdown.
Only to find on reboot that my beloved Manjaro felt like a virus-infected Windows bloat boat.

I then reverted my system with Timeshift to an earlier snapshot, surprisingly to no real avail.
My system did improve somewhat, from hair-pullingly unworkable to workable with occasional frustrating slowness.

I also unleashed ClamTK on my home dir. It found 80 positives: 30-ish false positives and a bunch of JPGs with malformed JFIF structures.
The false positives were from a popular Node package, imurmurhash, which I left untouched.
I deleted all the JPGs; not sure if they really were an issue, but hey.

I don’t quite understand what happened here.
I don’t think it’s possible that my hardware could have been damaged. If it can be, kindly elaborate.
Could Google have gotten angry and hacked or infected my system? That seems far-fetched tbh.
Before the event, though, I did notice my mouse moving an inch on its own on occasion.
I did think then “wuut, RAT hacker?” or just a Linux bug? dunno

It’s really the whole system that became slower, e.g. the IDE (JetBrains) is also notably slower now even though no scraping processes are running.
Browsing is slower…

I don’t have much experience in troubleshooting Linux systems.
What do you guys think?

:rofl:

I sincerely hope you didn’t run this with root privs. If you are certain you didn’t use escalated privs then the damage should be contained within your home.

You have the answer right there - I would hone my Linux troubleshooting skills immediately.

This is not a flaw of Manjaro or Arch but a flaw in your code.

I think you have been sloppy in handling the responses - assuming a response to be of a certain structure or to have specific content.

Without validation you may have opened your system to execution of a PUP - of course, without your code it is impossible to judge.

If you want to hone your skills - I suggest you use dd to create an image of your disk - save it on some external media with enough space.
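Something like this (double-check the device name first; /dev/sdX and the target path below are just placeholders):

sudo dd if=/dev/sdX of=/mnt/external/manjaro-disk.img bs=4M status=progress conv=noerror,sync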

Then reinstall your system.

Then you can use forensic tools to deduce what happened to your system after 15 minutes of YouTube scraping.

@wtechgo

Never heard of that distribution :thinking:

Don't crosspost!

https://bbs.archlinux.org/viewtopic.php?id=274534

Manjaro is not Arch Linux and vice versa


:no_entry_sign:

Never post Manjaro issues to the Arch BBS. Prepare to be banned on the Arch BBS.

These forums are for Arch Linux x86_64 ONLY.

Not Artix, or Apricity, or Manjaro, or any of the “easy Arch installers”, nor Arch-ARM; nothing other than vanilla 64-bit Arch Linux. Ask those communities for support.
If you have installed Arch, please read the rules before posting. README: Forum Rules.


They already deleted this post on the Arch forum xD
Manjaro is not Arch !
they said

Yup - we know that - you didn’t - although you should have - it should be deducible by reading.


I’d argue this is not a cross post as it was deleted in Arch.
Just a case of ‘my bad, let me put it where it belongs then’.

Crossposting is only prohibited within categories of this forum.

But posting an issue with a system running Manjaro to the Arch forum is definitely a no-go.

Fair enough.

Thanks for your input on handling responses. I thought about it.

I’m using yt-dlp, a Python library, to do the scraping.
The result is a JSON file so I tend to think I’m quite safe on the response side.

Though I don’t know what yt-dlp does before it gets to its JSON result.

From years of coding - one rule to rule them all - especially when it comes to data fetched through an API

  • validate - validate and validate

Only then can you be sure the data you are handling is actually the data you want.

Just a thought for you to let sink in - the rule to rule them all.

You may think the data you get is the data you want - but what if it is not? Then validate, validate, validate.

Think - what could possibly go wrong - what do I expect - then validate, validate, validate

validate should be your mantra when processing input … there is no other way!
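A trivial example with the JSON yt-dlp hands back (the file name and field names here are just an assumption - check for whatever your own code actually relies on):

jq -e 'has("id") and has("title") and has("entries")' channel.json > /dev/null \
    || { echo "channel.json does not have the expected structure" >&2; exit 1; }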


Is the system in the same state under a newly created user?
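For example (testuser is just a throwaway name):

sudo useradd -m testuser && sudo passwd testuser   # create a test account, then log in with it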

That was a good idea to try, but it is slow with a new user too.

And on a live environment from a fresh ISO?

Have you checked how all the resources are doing, i.e. CPU, RAM, disk - are any of those struggling?
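For example (iostat comes from the sysstat package):

top           # load average and any runaway processes
free -h       # RAM and swap usage
iostat -x 2   # per-disk utilisation and wait times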

Lol, they won’t care. They have a proper rate limiter implemented server-side.

Yes, compare system responsiveness on a live USB.

Please check the disk space on the root partition.
Try:

for a SATA disk:
df -h | grep /dev/sd

and for NVMe, replace 'sd' with 'nv':
df -h | grep /dev/nv

Also, could you provide a link to your code?

good call

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda5       217G   84G  123G  41% /
/dev/sdb3        97G   84G  8,1G  92% /backup
/dev/sdb2       1,8T  436G  1,3T  25% /data
/dev/sda2        95M   26M   70M  27% /boot/efi

Looks like I’m OK; sda5 is the system partition and /data is where the scraped stuff goes.
It’s a dual boot system.

I think I’ve found the culprit. My system was sluggish af on Windows too (dual boot), so I started to think my 6-month-old SSD had suffered a hardware failure.

I did tests but they all came back like “all good bro”…

Then I bumped into a thread talking about SSD trim and the fstrim.timer service. Turns out this thing wasn’t running.

fstrim.timer trims the SSD once a week to keep it performant, but that wasn’t happening, and my scrape operations just fueled the fire.

sudo systemctl status fstrim.timer          # check whether the timer is active
sudo systemctl enable --now fstrim.timer    # enable it and start it right away
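A one-off manual trim can also be run right away instead of waiting for the weekly timer:

sudo fstrim -av   # trim all mounted, trim-capable filesystems, verbosely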

Seems to be ok now but I need to observe the situation a little longer.

Thanks again for thinking along everybody.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.