Forum 500 errors and non-loading topics


#1

You will no doubt have seen an increasing number of 500 errors over the past few hours.

This seems to be an issue with Redis, and it affects even Discourse’s own paid-for hosted instances.

It’s obviously not ideal. Please be patient until we can get it fixed. :slightly_frowning_face:


#2

How long did you have to wait before successfully posting? :thinking:


#3

About an hour of spamming the “Post” and “OK” buttons. :slight_smile:


#4

Phil fixed this earlier today. I’m keeping an eye on it.


#5

All seems to be working well now. Thanks for all the work, guys!


#6

Well obviously you’re posting comments here… :yum:


#7

When I posted that previous comment I got this… lol :laughing:
@kmartin you jinxed it.


#8

Oh, for …

What is Discourse playing at?

OK, this is silly.

PostgreSQL recommendation (https://www.postgresql.org/docs/current/static/kernel-resources.html#LINUX-MEMORY-OVERCOMMIT) is to disable overcommit:

vm.overcommit_memory = 2

Redis recommendation (https://redis.io/topics/faq#background-saving-fails-with-a-fork-error-under-linux-even-if-i-have-a-lot-of-free-ram) is to always enable overcommit:

vm.overcommit_memory = 1

These are entirely opposite - we can’t do both on the same server…

And OOM (out-of-memory) kills would explain the issues.
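
For anyone who wants to check which of the two settings a box is actually running with, here is a minimal sketch in Python. It assumes a Linux server, where the live value is exposed at /proc/sys/vm/overcommit_memory; the function names are made up for illustration and are not part of any tool mentioned here.

```python
from pathlib import Path

# Meanings of the vm.overcommit_memory values on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit (the Redis recommendation)",
    2: "strict accounting, no overcommit (the PostgreSQL recommendation)",
}


def describe_overcommit(mode: int) -> str:
    """Return a human-readable description of a vm.overcommit_memory value."""
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> int:
    """Read the live setting from procfs (Linux only)."""
    return int(Path(path).read_text().strip())
```

On the server itself, `print(describe_overcommit(current_overcommit()))` should report which mode is active.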


#9

I just rebuilt the software and restarted the server. Hope this helps for now. Maybe we should think about a swapfile just in case …


#10

I’ve just put a 4GB swapfile in place. If we’re hitting spikes of RAM usage, and there’s no leeway, that will cause background processes to be killed.

Looking at the logs, it seems the 500 errors (preventing users from posting or interacting generally) also happen when a PostgreSQL process is killed and its transaction is rolled back.
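
In case it helps anyone checking the same thing, below is a rough Python sketch for counting which processes the kernel's OOM killer has hit. It assumes the kernel log contains lines of the form "Killed process 1234 (postgres) ..."; the exact wording varies between kernel versions, so the pattern is a starting point only, and `count_oom_kills` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed line shape, e.g.:
#   Out of memory: Killed process 1234 (postgres) total-vm:...
# Kernel versions differ in wording, so adjust the pattern to your logs.
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def count_oom_kills(log_lines):
    """Count OOM-killer victims per process name in an iterable of log lines."""
    victims = Counter()
    for line in log_lines:
        match = KILLED_RE.search(line)
        if match:
            victims[match.group(2)] += 1
    return victims
```

It takes any iterable of lines, so it can be fed the output of `dmesg` or an open kernel log file (the location of that file depends on the distribution).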


#11

OK, we will see. Is there any possibility of adding an alert system to it?


#12

I think you need to request cloud server access for your Bytemark control panel.


#13

Since we have installed a completely new version, those might not work. Maybe we will have to implement something on our own. Let’s explore this, and we will find a proper solution for sure. I haven’t used their Debian-based system. Prometheus would be one such tool, and we could use it in a wide variety of ways.
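
If we do end up rolling something ourselves, the core check could be as small as the Python sketch below. It assumes a Linux kernel recent enough to report `MemAvailable` in /proc/meminfo; the 10% threshold and both function names are placeholders I picked for illustration, not anything Prometheus or Bytemark provides.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def low_memory(meminfo: dict, min_available_fraction: float = 0.10) -> bool:
    """Return True when MemAvailable is below the given fraction of MemTotal."""
    return meminfo["MemAvailable"] < meminfo["MemTotal"] * min_available_fraction
```

A cron job could read /proc/meminfo, pass the text through `parse_meminfo`, and send a mail or message whenever `low_memory` returns True.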


#14

OK, though as a decent provider this should already be taken care of. If DO have alerting, Bytemark will have alerting. Plus with the correct access set in the panel you get a web console for the server. :slight_smile:


#15

We will figure it out.


#16

No more issues here. The forum seems faster now than it was before the switch to Bytemark.