Monitor time drift with nagios and snmp

The other day I threw together a script that simply checks a remote host’s time against the monitor host’s time. It’s fairly straightforward, and will alert you when a host’s time drifts too much, indicating that your ntpd configuration is bad. I had to do some customizations to it, because not all hosts have the HOST-RESOURCES-MIB::hrSystemDate.0 option. For hosts that don’t, it falls back to the UCD-SNMP-MIB::versionCDate.0 option.
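To give a feel for the comparison, here is a minimal Python sketch. It is not the script from the post, just an illustration under some assumptions: the function names (parse_date_and_time, parse_ctime, check_drift) and the 60s/300s thresholds are made up for the example, the hrSystemDate value is assumed to arrive as the raw 8 or 11 octets of an SNMP DateAndTime, and any value that carries no UTC offset (including the ctime-style versionCDate string) is assumed to be in UTC.

```python
from datetime import datetime, timedelta, timezone

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def parse_date_and_time(raw: bytes) -> datetime:
    """Decode an SNMP DateAndTime value (8 or 11 octets) into an aware datetime."""
    if len(raw) not in (8, 11):
        raise ValueError(f"unexpected DateAndTime length: {len(raw)}")
    year = int.from_bytes(raw[0:2], "big")
    month, day, hour, minute, second, deci = raw[2:8]
    if len(raw) == 11:
        # Octet 9 is '+' or '-', octets 10 and 11 are hours and minutes from UTC.
        sign = -1 if raw[8:9] == b"-" else 1
        offset = sign * timedelta(hours=raw[9], minutes=raw[10])
    else:
        # Assumption: no offset was sent, so treat the value as UTC.
        offset = timedelta(0)
    # The field allows 60 for a leap second; datetime does not, so clamp it.
    return datetime(year, month, day, hour, minute, min(second, 59),
                    deci * 100_000, tzinfo=timezone(offset))


def parse_ctime(text: str) -> datetime:
    """Parse a ctime()-style string such as 'Tue Mar  3 11:00:00 2015'.

    Assumptions: English day/month names (C locale) and UTC, since the
    string itself carries no timezone.
    """
    parsed = datetime.strptime(text.strip(), "%a %b %d %H:%M:%S %Y")
    return parsed.replace(tzinfo=timezone.utc)


def check_drift(remote: datetime, local: datetime,
                warn: float = 60.0, crit: float = 300.0):
    """Return (exit_code, message) for the absolute drift in seconds.

    The warn/crit defaults are hypothetical; tune them to taste.
    """
    drift = abs((remote - local).total_seconds())
    if drift >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif drift >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"TIME {label} - drift is {drift:.1f}s"
```

In a real check, the raw value would come from whatever SNMP client you use, the local side would be datetime.now(timezone.utc), and the plugin would print the message and exit with the returned code.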

An agent, auditor, and bodyguard walk into a bar…

This evening I wasted a bunch of time on what turned out to be a simple problem. I really hate it when that happens.

I fixed a bug in tpe-lkm where users weren’t seeing all of their processes, and updated my servers with the new module. Suddenly, my phone started buzzing off the desk; nagios was complaining that some daemons were down. This data is retrieved via snmp, and upon further investigation, I noticed that the daemons were in fact up.

So it was an snmp problem.

