Saturday, April 16, 2011

Problems with BRAS high load ... fixed


About two weeks ago we started to experience problems with our NAS servers, as I wrote a week ago.

At peak load hours they were serving almost 300 Mbit/s and suffering from more than 6000 pipes. At some point dummynet caused throughput to drop by up to a factor of 10 (both in traffic and in packet count). The pipes were used for customer traffic policing.

To avoid the 'pipe' problem and get rid of dummynet, I wrote an ng_dummynet module. The reason we don't want to use IPFW + tables + ng_ipfw is that we do not do any L3 processing at all. ng_dummynet remembers which IP belongs to which class and sends all traffic of the same class (which might be several IPs) to the same ng_car module.
Two days in production showed a 20-30% degradation in comparison to dummynet, which was very strange and unexpected. Over those two days I spent a lot of time trying to figure out where the problem was, and it could have ended in profiling the kernel taskq process.
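To make the "same class goes to the same ng_car" idea concrete, here is a tiny user-space sketch in Python. It is not the module's code: the addresses, the class names and the byte counter standing in for an ng_car instance are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical table: which customer IP belongs to which traffic class.
ip_to_class = {
    "10.20.0.5": "cust-17",
    "10.20.0.6": "cust-17",  # same customer, second IP, same class
    "10.20.3.9": "cust-42",
}

# One byte counter per class stands in for one ng_car instance.
bytes_per_class = defaultdict(int)

def dispatch(src_ip, length):
    """Account the packet to its class; return the class or None if unknown."""
    cls = ip_to_class.get(src_ip)
    if cls is not None:
        bytes_per_class[cls] += length
    return cls

dispatch("10.20.0.5", 1500)
dispatch("10.20.0.6", 500)
dispatch("10.20.3.9", 64)
print(dict(bytes_per_class))
```

Both IPs of the first customer end up in one counter, so they share one limit, which is the whole point of classes.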

But today on the test stand we finally found that the degradation in the new ng_dummynet module appears once there are more than 1000 customers.

By cutting functionality out of the module block by block, I finally found the problem: it was the difference between network and host byte order. The module hashes on the last two bytes of the address, but because of the byte order these turned out to be the first two bytes of the IP, and those are the same for all customers. So the hash degenerated into a simple one-by-one enumeration of IPs, which meant walking 2-3k IPs on average for each packet.
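The effect is easy to reproduce outside the kernel. The sketch below is a Python illustration, not the module's code: the 10.20.x.y customer range, the 3000 addresses and the 16-bit bucket index are assumptions. Reading the address bytes as a little-endian integer imitates an x86 host that skips ntohl().

```python
import socket
from collections import Counter

def bucket_buggy(ip):
    # Address bytes arrive in network (big-endian) order. Interpreting them
    # as a little-endian integer puts the FIRST two octets in the low 16 bits.
    raw = socket.inet_aton(ip)
    return int.from_bytes(raw, "little") & 0xFFFF

def bucket_fixed(ip):
    # Converting to host order first keeps the LAST two octets in the low bits.
    raw = socket.inet_aton(ip)
    return int.from_bytes(raw, "big") & 0xFFFF

# Hypothetical customer pool: 3000 addresses sharing the same first two octets.
customers = [f"10.20.{i // 256}.{i % 256}" for i in range(3000)]

buggy = Counter(bucket_buggy(ip) for ip in customers)
fixed = Counter(bucket_fixed(ip) for ip in customers)

print("buggy: buckets used =", len(buggy), "longest chain =", max(buggy.values()))
print("fixed: buckets used =", len(fixed), "longest chain =", max(fixed.values()))
```

With the buggy variant every customer lands in a single bucket, so each lookup is a linear scan of the whole list. With the fixed one every address in this pool gets its own bucket.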

The fix worked: the CPU time of the taskq process immediately dropped several-fold.
Sure you want the picture:


The interesting thing: traffic from one NIC is always processed on one core.

Conclusions: with two NICs it's dangerous to run on a 2-core system, as traffic starvation may overload the whole system. On a 4-core system (i5) with two NICs, CPU utilization will not go higher than 50%, and on an 8-core i7 not higher than 25%. Of course, on an i5 or i7 the free cores can be loaded with something extra, for example another NIC.
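The numbers above are just "busy cores divided by total cores". A small Python sketch of that arithmetic, assuming each NIC keeps exactly one core busy and nothing else runs on the box (the function name is mine):

```python
def max_cpu_utilization(nics, cores):
    """Upper bound on total CPU utilization when each NIC is pinned to one core."""
    busy = min(nics, cores)  # cannot keep more cores busy than exist
    return busy / cores

for cores in (2, 4, 8):
    print(f"{cores} cores, 2 NICs: up to {max_cpu_utilization(2, cores):.0%}")
```

On two cores the bound is 100%, which is exactly why that case is the dangerous one.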

Take a look: C2Duo 8500 -> Core i7 (some higher model with 4 cores, 3 GHz) upgrade




And a few last words: I had even started to write an all-in-one module to avoid multiple IP lookups and rely only on the netflow engine.
But now it seems to be useless, unless I do P2P recognition by behaviour.

I think that after the final optimizations we can free up another 10-15% of CPU time on the same system.
