What’s everyone using for status monitoring and/or status pages either in their lab or at work?
I set up a status page for my fediverse instances using Uptime Robot (I have an existing subscription), and the features are kinda lacking. I feel like they haven’t really updated anything in the last 5 years, which is unfortunate.
Uptime Kuma seems really cool. I set it up last week and it seems to work quite well.
I prefer Gatus because you can pass it a configuration file, which lets me manage my setup declaratively.
We’re looking at that at work and it seems pretty good. I’d probably want to host it somewhere outside my lab cluster though, otherwise it’s kind of pointless, eh?
That’s true. I thought of hosting it on a VPS, but then the VPN is another moving part that can fail. I ended up putting it on a mini PC on the same stack as the firewall and modem so that it is relatively stable.
This left me with one problem: I don’t want to expose the Docker socket from each host, so I have to use the network-based checks rather than the built-in Docker monitoring. If you host it in the cluster itself, that shouldn’t be a problem.
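For anyone wondering what a network-based check boils down to, here’s a rough Python sketch. It is not how Uptime Kuma or Gatus implement it, just the general idea: try to open a TCP connection to each service and treat a failure as "down", with no Docker socket involved. The hostnames and ports in `TARGETS` are made up, so swap in your own.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def run_checks(targets: dict[str, tuple[str, int]]) -> dict[str, bool]:
    """Probe every target and map its name to an up/down boolean."""
    return {name: tcp_check(host, port) for name, (host, port) in targets.items()}


# Hypothetical services, for illustration only.
TARGETS = {
    "reverse-proxy": ("proxy.lab.example", 443),
    "database": ("db.lab.example", 5432),
}

if __name__ == "__main__":
    for name, is_up in run_checks(TARGETS).items():
        print(f"{name}: {'up' if is_up else 'DOWN'}")
```

The trade-off is that this only tells you a port is reachable, not whether the container behind it is healthy, which is the part you lose by not using the Docker socket.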
Yeah, hosting it yourself certainly has various potential issues unfortunately :/
Maybe it’s been implemented by now, but when I set it up I was disappointed to realize there’s no API. Previously I was using Statping and had a Slack bot that employees could use to check the status of everything. I saw that there was a project you could install alongside Uptime Kuma to add API endpoints, but I didn’t take the time to set it up.
You should look into Uptime Kuma. It’s really powerful and has amazing tools for creating maintenance windows and status pages like this one: https://fncy.ca/status
GitHub: https://github.com/louislam/uptime-kuma
EDIT: Sorry, looks like lemmy.cloudhub.social is taking a long time to federate to FancyLemmy - as the other user said, kuma is great
Odd, is that federation issue on my end?
Not sure, honestly! I’ll try to troubleshoot soon. That’s a first on my instance.
I switched to Upptime a while ago: https://github.com/upptime/upptime
It runs fully on GitHub Actions and fits my requirements. Very much depends on what you need, of course.
Ooooh, I’ll have to check that out, my whole code base for the cluster is in GitHub using FluxCD