subj. ZABBIX does something quite opposite.
I had successfully changed the code to run as many preprocessor managers as I want.
But whatever I did, the server was stuck at 60k NVPS.
My initial thought was that the preprocessor manager was slowing things down, so I rewrote it to run 4 managers.
But then I saw that running 4 managers spreads the CPU load between them but gives no NVPS increase.
Whatever I tried, I couldn't find a reason for such behavior. But today it struck me: the bottleneck is the worker threads, and the workers themselves are waiting.
So I straced a worker thread.
.. semop....
OK, is it the GLOBAL config LOCK again?
NO, it's mutex 0, the mutex dedicated to the log.
But I don't see anything from the workers in the log, and the log level is 3.
And guess what?
Let me show you the main loop of worker code:
I commented it out. It does log rotation. With locking. Inside a loop which possibly runs in 100 threads and whose task is to process an item as fast as possible.
So all the worker threads kept trying to take the log lock and rotate the log file if it was bigger than the configured size. I guess it's a developer mistake.
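To make the contention pattern concrete, here is a toy model in Python. It is not Zabbix code: `RotatingLog`, `handle_log`, `worker`, `run` and the `check_every` parameter are all made-up names for illustration. It only counts how often the shared log lock is taken when every loop iteration performs the rotation check, compared with checking once per `check_every` items. Throttling the check is just one possible alternative; what I actually did was simply comment the call out.

```python
import threading


class RotatingLog:
    """Toy stand-in for a shared log guarded by a single lock (hypothetical)."""

    def __init__(self, max_size):
        self.lock = threading.Lock()
        self.max_size = max_size
        self.size = 0
        self.lock_acquisitions = 0
        self.rotations = 0

    def handle_log(self):
        # The rotation check runs under the shared lock, so every caller
        # serializes here even when there is nothing to rotate.
        with self.lock:
            self.lock_acquisitions += 1
            if self.size > self.max_size:
                self.size = 0
                self.rotations += 1


def worker(log, items, check_every):
    """Process `items` items, checking rotation once per `check_every` items."""
    for i in range(items):
        if i % check_every == 0:
            log.handle_log()
        # ... item preprocessing would happen here, without the log lock ...


def run(threads, items, check_every):
    """Start `threads` workers and return the total number of lock acquisitions."""
    log = RotatingLog(max_size=1024)
    pool = [
        threading.Thread(target=worker, args=(log, items, check_every))
        for _ in range(threads)
    ]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return log.lock_acquisitions


print(run(threads=4, items=1000, check_every=1))    # 4000 lock acquisitions
print(run(threads=4, items=1000, check_every=100))  # 40 lock acquisitions
```

The model says nothing about real timings; it only shows that a per-item check makes the number of trips through the shared lock grow with threads times items, which is the kind of load that shows up as `semop` in strace.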
So, without log handling, the workers could run a bit faster, and the server achieved approximately 85-90k NVPS with only 25% CPU idle.
So perhaps that's the limit for a dual E5645 system with 12 physical cores (24 in total if we count the hyper-threaded ones).
During all these tests I also found a preprocessor manager priority flaw: it was more likely to gather queues before the buffer was full. And when the buffer was full it would work OK, but it would slow down the pollers.
Now it's just fine. The queue is close to zero. Each of the managers writes about 110k items every 5 seconds. So in total that is 110k * 4 managers / 5 seconds = 88k NVPS.
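The same arithmetic as a tiny Python helper, in case you want to plug in your own numbers. The function name `total_nvps` and its parameters are mine, not anything from Zabbix.

```python
def total_nvps(items_per_manager, managers, interval_s):
    """New values per second across all managers for one reporting interval."""
    return items_per_manager * managers / interval_s


print(total_nvps(110_000, 4, 5))  # 88000.0
```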