Tuesday, June 19, 2018

ZABBIX: the preprocessing manager



All went fine until asynchronous SNMP processing was put on a test server with about 10k hosts to poll.

At the point when the server was polling about 5k SNMP values per second, the pre-processing manager queue started to grow.

At the same time I saw a significant CPU increase in the SNMP polling processes, and timings became much worse.

I've tried to play a bit with the number of threads doing polling, pre-processing and DB syncing, and realized the following:
  • the problem doesn't appear when I have only one or two SNMP pollers. I believe the reason is that two pollers can hardly reach 5k SNMP NVPS
  • it is the same problem I saw on my laptop. At that time I thought the slow CPU was the reason
  • it doesn't depend on any other threads and their count
  • it doesn't depend on history database read speeds. Switching reading off doesn't change anything
  • The more values the system is able to gather, the worse it gets. Starting 6-10 SNMP pollers makes the preprocessing manager queue grow fast, and the bigger the queue gets, the fewer items get processed.
  • After 100k items in the queue Zabbix processes only 0-200 items every 5 seconds
  • Typically after 2-60 minutes Zabbix crashes with SIGSEGV (signal 11) in the preprocessing manager queue
Picture: 1M+ items in the queue, pollers consuming up to 15% CPU and working 3-10 times longer than normal
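
To put the numbers above in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the pollers keep delivering a constant 5k values per second and the degraded manager processes at most 200 items per 5 seconds. Both are simplifications, since in practice the pollers also slow down, and the function name is just made up for illustration.

```python
def seconds_to_backlog(arrival_per_s, processed_per_s, target_backlog):
    """Time for an empty queue to reach target_backlog at constant rates.

    Returns None when processing keeps up and the queue never grows.
    """
    net_growth = arrival_per_s - processed_per_s
    if net_growth <= 0:
        return None
    return target_backlog / net_growth


arrival = 5000        # values per second, as observed on the test server
processed = 200 / 5   # best case once degraded: 200 items per 5 seconds
backlog = 1_000_000   # queue size seen in the picture

print(seconds_to_backlog(arrival, processed, backlog))
```

Under these assumptions the queue reaches a million items in a bit over three minutes, so once the manager falls behind it has practically no chance to catch up.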

Overall it seems I have two alternative ways to solve this:
  • figure out what the problem is and fix it
  • downgrade to the 3.0 line (I'll probably have to rewrite some fixes), since the preprocessing manager was only introduced in 3.4. With this approach all the API scripts might also need to be fixed.

Sure, I'll try the first one first. It's nice that the problem is quite easy to reproduce, and there is helpful debug output.

Update: the second way doesn't fit, as 3.0 doesn't have the history interface or the ability to offload history data to a BigData storage.

Update: the issue is fixed. It was another time waster caused by lack of experience.
