The problem
At full load we noticed 'strange' system CPU usage on the systems running ng_state, and the support team started to get complaints about delays.
This started in the middle of December.
It took a few days to recognize the problem: the ng_queue process starts to consume more processor time and even takes the first positions in 'top'.
So the queueing means the traffic cannot be processed by ng_state directly.
But wait, let's look at the graphs again - we have more than 50% of CPU free. (pic)
Here is where the story begins.
OK, I did some tests on a working system and clearly identified the trigger - removing the control traffic (all ngctl calls) led to an immediate drop in ng_queue load.
So I asked a question on the FreeBSD forums about this netgraph behavior, but didn't get an answer.
My idea was that some kind of data blocking occurs while ngctl calls (netgraph control messages) are processed; I thought this was a general problem with control messages.
I did more tests and could show that messages to any of the modules involved in traffic processing led to growth of ng_queue processor time.
OK, if control messages have such a bad nature, then I had to get rid of them - so I added an extra 'command' hook to receive commands as data packets and started to use the NgSocket C library to talk to it.
In tests this showed no ng_queue growth under heavy control traffic.
Fine! 2-3 days later we had that in production.
Guess what?
Yes, the problem with ng_queue disappeared, even under high-intensity control traffic.
But... a few hours later I saw that queueing was still there and taking some processor time.
By evening, queueing was back to about the same level as before, with the same problems.
WTF????
OK, this time I HAD to know why queueing occurs - it took another week of tests, profiling, and digging through kernel sources before the picture was clear.
There are 3 reasons for queueing:
- we asked for the data to be queued (at the module, hook, or packet level)
- the module is out of stack
- traffic arrives at an inbound hook while it is being processed by an outbound thread (a loop situation)
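The three conditions can be sketched as a tiny userland model. This is only an illustration with field and function names of my own invention, not the actual dispatch code; the real logic lives in sys/netgraph/ng_base.c and is considerably more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the choice netgraph makes when an item arrives
 * at a node: process it with a direct function call, or hand it to the
 * ng_queue thread. */
struct delivery {
    bool   queue_requested; /* module/hook/packet asked for queueing */
    bool   loop_detected;   /* inbound hook hit while the outbound path runs */
    size_t stack_total;     /* thread stack size, bytes */
    size_t stack_used;      /* bytes already consumed by the call chain */
};

bool must_queue(const struct delivery *d)
{
    if (d->queue_requested)  /* reason 1: queueing was explicitly asked for */
        return true;
    /* reason 2: more than 3/4 of the stack is already in use */
    if (d->stack_used * 4 > d->stack_total * 3)
        return true;
    if (d->loop_detected)    /* reason 3: loop situation */
        return true;
    return false;
}
```

Any one of the three conditions is enough to push the packet to ng_queue, which is why all three had to be eliminated separately.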
And some good news - I found the answer to why queueing occurred: command messages have to carry a special flag to be processed without blocking. I put the flag into the module and returned to ngctl command traffic.
This was nowhere near the end of the story.
Queueing didn't disappear.
And it wasn't easy to catch - it appeared only some time (5-30 minutes) after putting control traffic on the module and loading it with a few gigabits of payload.
I was switching parts of the processing off and back on, one by one. I was getting false positives, and the next day I would decide the problem was somewhere else.
At some point I decided the reason was MALLOC - great, so I switched to UMA. No success.
After one more week of this I had 2 results that were 100% proven: after a restart, without control traffic, the module lived fine; as soon as the first IP was put on a service, ng_queue started to grow.
Stuck.
Then I switched to netgraph kernel profiling.
First of all, I added some printfs to log the reason why a packet had been queued.
And this was the first light at the end of the tunnel.
I realized I was hitting all 3 reasons.
Unbelievable.
OK. Reason one - loop detection - well, this was easy: we had a loop there. It was eliminated quickly.
Reason 2 - queueing because we asked for it - this was happening because control messages were 'WRITERS'; in other words, the module was blocked while processing them. Add the flag - gone.
Reason 3 - stack. I am not enough of a programmer to know offhand why we would run out of stack. But wait, I still remember something: the stack holds the variables defined inside procedures.
Netgraph is a chain of procedure calls, sometimes up to 10 of them deep. Not that many to worry about. All I knew at that point was that a module can allocate 64 megs via MALLOC - need more? Use UMA.
I expected to have a few megs of stack, but I was wrong.
Profiling showed that only 16 kilobytes (yes, only 16384 bytes) were allocated for the stack.
If more than 3/4 of the stack is used, netgraph puts the packet into the queue instead of calling the next procedure. So as soon as stack consumption got close to 12 kB - we're in the queue (toast, dead).
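The budget works out as a back-of-the-envelope calculation. The constants (16384-byte stack, 3/4 threshold) are from the profiling above; the function name is mine.

```c
#include <stddef.h>

enum {
    STACK_TOTAL = 16384,              /* thread stack size seen in profiling */
    QUEUE_LIMIT = STACK_TOTAL / 4 * 3 /* 12288: netgraph queues past 3/4 */
};

/* Bytes left before the 3/4 queueing threshold, or 0 if already past it. */
unsigned stack_headroom(unsigned used)
{
    return used >= QUEUE_LIMIT ? 0 : QUEUE_LIMIT - used;
}
```

With 5-6 kB already consumed when a packet enters a module, every further call frame and local buffer eats into a budget of barely 6 kB, so a couple of large on-stack buffers are enough to cross the threshold on a regular basis.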
When a packet reached ng_state, 5-6 kB of stack were already used. And ng_state - wait, what's in there... yes: a 2 kB buffer in one place, and another 2 kB buffer in another place... do I need to continue?
Now things were clear. One 2 kB buffer had appeared when I added message processing via the separate hook - that one I could simply remove, since it was no longer needed. The other 2 kB buffer was... in dead code.
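The general fix pattern, sketched in userland C (an illustration, not the actual ng_state code; in a kernel module the allocation would come from malloc(9) or a preallocated UMA zone rather than libc malloc):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_LEN 2048  /* the size of the offending buffers */

/* Before: 2 kB of automatic storage, taken from the 16 kB stack
 * on every call. */
size_t upcase_stack(const char *src, char *dst, size_t cap)
{
    char scratch[SCRATCH_LEN];
    size_t i, n = strnlen(src, sizeof(scratch) - 1);

    for (i = 0; i < n; i++)
        scratch[i] = (char)toupper((unsigned char)src[i]);
    scratch[n] = '\0';
    if (n + 1 > cap)
        return 0;
    memcpy(dst, scratch, n + 1);
    return n;
}

/* After: the scratch space lives off the stack, which now only pays
 * for a pointer and a few scalars. */
size_t upcase_heap(const char *src, char *dst, size_t cap)
{
    char *scratch = malloc(SCRATCH_LEN);
    size_t i, n;

    if (scratch == NULL)
        return 0;
    n = strnlen(src, SCRATCH_LEN - 1);
    for (i = 0; i < n; i++)
        scratch[i] = (char)toupper((unsigned char)src[i]);
    scratch[n] = '\0';
    if (n + 1 <= cap)
        memcpy(dst, scratch, n + 1);
    else
        n = 0;
    free(scratch);
    return n;
}
```

The behavior is identical; the only difference is where the 2 kB lives, which is exactly what the 3/4-of-stack rule cares about.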
OK, let's recap.
1. There was queueing because of control message blocking.
2. There was queueing of classified packets because of looping.
3. After fixing (2), there was still queueing of classified packets because of the two extra 2 kB buffers on the path.
4. Because of (3), fixing (2) didn't seem to help.
5. When (3) was finally fixed, things got back to normal.
and... no, this wasn't all...
6. In the hurry, an old SVN copy was deployed on 3 systems out of 4. The next day I was staring at strange graphs, confused: the support team reported great improvements and no complaints, the people who had had problems said the network was great now, but I could still see the problems. Maybe I was so sure and glad that the problem had been solved the day before that I just couldn't accept there were still problems... who knows.
So now one interesting question remains: how much traffic can we put on a single system? Is it the planned 4-5 gigabits (7-8 in duplex), or will we hit the next barrier somewhere?
Next week we'll probably combine the loads of 2 machines onto one, which will be about 3-5 gigabits of downstream traffic.