I have this Debian server that won’t stop crashing. It crashes once every 2 or 3 days. Everything’s up to date, the CPU seems good and Prime95 never finds any problems. The RAM seems good too: I’ve run every RAM test I could find and none of them found anything wrong. I just can’t get it to stop crashing and it’s driving me insane.
I used to have an Arduino connected to the motherboard’s reset jumper, with a bash script set up as a systemd service that sent a signal to the Arduino every 10 seconds. If the Arduino didn’t receive a signal for 30 seconds, it forced a reboot. This doesn’t fully automate restarting after a crash, though, because too often the server crashes just lightly enough that everything except that auto-restart script stops working, so it never reboots. It does double the amount of time the server runs without manual intervention, which is better than nothing but not good enough.
Other than just randomly installing different distros until I find one that doesn’t do this (reinstalling an OS and then setting all the server stuff back up is very time consuming), what can I do to troubleshoot, fix, or otherwise do anything about these crashes?
Yeah, unfortunately it’s damn near impossible to pin down the failing part exactly without a bunch of spare parts to swap in.
You could look around your mobo for bulging capacitors, but that could be a long shot.
You could also try sifting through your journalctl output for warnings and errors, especially from the boot that ended in a crash.
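
For the journalctl part, something like the following might be a starting point. It's only a rough sketch: the function names (`list_boots`, `crash_log`, `last_kernel_lines`) are made up for illustration, and it assumes the journal is persistent (i.e. `/var/log/journal` exists or `Storage=persistent` is set in `/etc/systemd/journald.conf`), otherwise logs from before the reboot won't be there to look at.

```shell
# Sketch: helpers for reading what the journal recorded before a crash.
# Assumption: systemd-journald keeps logs across reboots (persistent storage).
# The function names are hypothetical; only the journalctl flags are real.

# List the boots the journal knows about, with their offsets (0, -1, -2, ...).
list_boots() {
    journalctl --list-boots --no-pager
}

# Show messages of priority "warning" or worse from one boot.
# $1 is a boot offset: 0 = current boot, -1 = previous boot (the default).
crash_log() {
    local boot="${1:--1}"
    journalctl --boot="$boot" --priority=warning --no-pager
}

# Show the last 50 kernel messages of one boot, i.e. whatever the kernel
# managed to log right before that boot ended.
last_kernel_lines() {
    local boot="${1:--1}"
    journalctl --boot="$boot" --dmesg --lines=50 --no-pager
}
```

After a crash and reboot, you could source the file and run `crash_log` and `last_kernel_lines` (both default to the previous boot), or pass an older offset such as `crash_log -2`. If the last lines of every crashed boot look similar, that's at least a hint about where to look next; if the log just stops with nothing unusual, that leans more toward hardware or power.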