macgregor@lemmy.world

I run a baremetal Kubernetes cluster on a couple of Raspberry Pis (though that detail isn't super important to this question). I'm familiar with Kubernetes metrics/alerting tools such as Grafana, Prometheus, Loki, the ELK stack, etc. I'm also familiar with the node metrics exporter for gathering node-level resource metrics like CPU, memory, file system, temperatures, etc. All that's great and gets me like 99% of the way there. The last 1% I'm looking for is things like available updates (e.g. 56 packages with available updates), whether a reboot is required, system component status, etc., and for whatever reason I struggle to find good search results for this specific problem area.

I can and do use things like dnf-automatic/unattended-upgrades and systemd to maintain minimal system-level health (so 99% -> 99.8%), but I haven't been able to find a solution that gives more insight into underlying system health, probably because that's usually handled by cloud providers/hypervisors. I'm sure I could come up with a custom, not-too-hacky solution for myself (off the top of my head: a pod/job with access to the underlying system that runs whatever commands I want to gather state and makes it available to the general monitoring solution in Kube space; that feels dirty, though), but this feels like an obvious hole and I'm just missing the right Google incantation to find it.
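
A minimal sketch of the pod/job idea above, in Python. It assumes a Debian-family host (apt, plus the `/var/run/reboot-required` flag file that Debian/Ubuntu tooling creates when a reboot is pending) and a privileged pod with `hostPID: true`, so that `nsenter --target 1 --mount` can run commands in the host's mount namespace. The metric names (`host_packages_upgradable`, `host_reboot_required`, `host_systemd_failed_units`) are hypothetical. It only prints the three signals in the Prometheus text exposition format; how that output gets scraped or pushed is left open.

```python
import os
import subprocess

# Hypothetical metric names and help strings; rename to suit your dashboards.
HELP = {
    "host_packages_upgradable": "Number of packages with available updates.",
    "host_reboot_required": "1 if the host reports that a reboot is required.",
    "host_systemd_failed_units": "Number of systemd units in the failed state.",
}


def count_apt_upgradable(apt_output: str) -> int:
    """Count package lines in `apt list --upgradable` output (skips the 'Listing...' banner)."""
    return sum(1 for line in apt_output.splitlines() if "[upgradable from:" in line)


def count_failed_units(systemctl_output: str) -> int:
    """Count non-empty lines from `systemctl --failed --no-legend --plain`."""
    return sum(1 for line in systemctl_output.splitlines() if line.strip())


def render_metrics(values: dict) -> str:
    """Render each value as a gauge in the Prometheus text exposition format."""
    lines = []
    for name, value in values.items():
        lines.append(f"# HELP {name} {HELP.get(name, name)}")
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {int(value)}")  # bools become 0/1
    return "\n".join(lines) + "\n"


def host_cmd(args) -> list:
    """Wrap args so they run in PID 1's (the host's) mount namespace.

    Assumption: the pod is privileged and has hostPID: true.
    """
    return ["nsenter", "--target", "1", "--mount", "--"] + list(args)


def run_on_host(args) -> subprocess.CompletedProcess:
    # check=False: some of these commands exit non-zero in normal operation.
    return subprocess.run(host_cmd(args), capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # apt only reports what the last `apt update` fetched.
    apt = run_on_host(["apt", "list", "--upgradable"]).stdout
    failed = run_on_host(["systemctl", "--failed", "--no-legend", "--plain"]).stdout
    reboot = run_on_host(["test", "-e", "/var/run/reboot-required"]).returncode == 0
    print(render_metrics({
        "host_packages_upgradable": count_apt_upgradable(apt),
        "host_reboot_required": reboot,
        "host_systemd_failed_units": count_failed_units(failed),
    }), end="")
```

The parsing and rendering functions are kept separate from command execution so they can be tested without a host. On a dnf-based host, `dnf check-update` (which exits with status 100 when updates are available) would take the place of the apt call.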

Any ideas or experience you can share? Please don't suggest the kube metrics node-exporter; unless I'm missing something, it doesn't provide what I'm asking about.

Baremetal Kubernetes - LF host level metrics/monitoring/reporting solution
