Sunday, July 22, 2018

ZABBIX: replacing fping with nmap and "distributed" pinging

At some point I decided that nmap would be better than fping in terms of speed.

A test on a file of 3.5k hosts showed that nmap finishes the job about 2.5 times faster: 40 seconds versus 100 seconds.

I made the changes in /lib/icmpping.c, so it should probably work for the proxy as well.

The only shortcoming of nmap is that it doesn't actually check for packet loss; it only checks accessibility and calculates the round trip time.

I haven't found a way to tell nmap to send 10 packets and report the loss percentage along with min/max/avg statistics. This kept me stuck for some time, but then the idea of "time distributed" monitoring came up.

The idea: instead of sending N packets in a row within a short time span during the test period P, it's better to send 1 packet every P/N seconds so that the pings are distributed evenly. Loss is then calculated in triggers.
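To make the idea concrete, here is a minimal Python sketch (not the actual Zabbix code). It assumes each check yields a single up/down result and that loss is computed over the last N results; the names `probe_interval` and `LossWindow` are made up for illustration.

```python
from collections import deque


def probe_interval(period_s: float, packets: int) -> float:
    """Spacing between single-packet probes: P / N seconds."""
    return period_s / packets


class LossWindow:
    """Keeps the last `size` probe results and reports loss over them."""

    def __init__(self, size: int) -> None:
        self.results = deque(maxlen=size)  # old results fall off automatically

    def record(self, ok: bool) -> None:
        self.results.append(ok)

    def loss_percent(self) -> float:
        if not self.results:
            return 0.0
        lost = sum(1 for ok in self.results if not ok)
        return 100.0 * lost / len(self.results)


# Example: P = 100 s and N = 10 packets gives one probe every 10 s.
window = LossWindow(size=10)
for ok in [True] * 7 + [False] * 3:
    window.record(ok)
```

With these sample results the window holds 3 lost probes out of 10, so the loss is computed from evenly spaced samples rather than from one short burst.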

The idea was crazy with the old system. Its limit was 120 seconds, and that was with the load distributed across 10 proxies. But with the new enhancements in code and processing, the system on its own achieved an interval of less than 1.5 seconds between checks of a single host.

That's quite remarkable. In total it's about 28k NVPS.

The checks are set up to test host accessibility once every 10 seconds, which is what runs right now, and the triggers are set up to react to 3 consecutive ping losses. So in the worst case the reaction time is 40 seconds. That's OK for most outages and reliable enough not to bother us over some sudden packet loss.
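The "3 consecutive losses" rule can be sketched in Python as below. This is only an illustration, not the real trigger expression; the function names are hypothetical. The delay function models only the probe schedule (probes at 0, 10, 20, ... seconds) and ignores probe timeout and processing time, which is an assumption.

```python
import math


def first_trigger_index(results, threshold=3):
    """Index of the probe at which `threshold` consecutive losses is reached."""
    streak = 0
    for index, ok in enumerate(results):
        streak = 0 if ok else streak + 1  # any success resets the streak
        if streak >= threshold:
            return index
    return None  # never fired


def loss_streak_delay(outage_start_s, interval_s=10.0, threshold=3):
    """Seconds from outage start until the `threshold`-th failed probe is sent.

    Assumes probes are sent at 0, interval, 2*interval, ... and that every
    probe sent at or after the outage start fails.
    """
    first_failed = math.ceil(outage_start_s / interval_s) * interval_s
    fire_time = first_failed + (threshold - 1) * interval_s
    return fire_time - outage_start_s


# A single lost packet never fires; three losses in a row do.
single_loss = first_trigger_index([True, False, True, True])
outage = first_trigger_index([True, False, False, True, False, False, False])
```

Under these assumptions, an outage that begins right after a successful probe waits almost a full interval for the first failed probe and two more intervals for the third, so the schedule alone accounts for just under 30 seconds of the reaction time.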


The last thing to note: I left the possibility for the system to use fping, and it's quite simple. For simple ICMP checks, if there is 1 packet in the ICMP parameters then nmap is used; if there are more, then fping. This could potentially be misleading for administrators, and they might forget about it. For our production system there is no fping option.
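The selection rule is roughly the following, written as a Python sketch rather than the actual C change in icmpping.c. The function name and the returned strings are hypothetical.

```python
def choose_ping_tool(packet_count: int) -> str:
    """Pick the utility for a simple ICMP check from its packet count."""
    if packet_count < 1:
        raise ValueError("packet count must be at least 1")
    # One packet: accessibility plus round trip time is enough, so nmap.
    # More than one packet: loss and min/max/avg are needed, so fping.
    return "nmap" if packet_count == 1 else "fping"


tools = [choose_ping_tool(n) for n in (1, 3, 10)]
```

Because the switch depends only on the packet count, changing that one parameter on an item silently changes which utility runs, which is the part that could mislead administrators.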


