Saturday, March 26, 2011

some news, good and not so good

So far we have transferred almost 20k happy customers to the new technology, with 40k pending. The first outcome: about 20% of customers prefer the old access methods.

Now we have problems that appeared as a result of service simplification in the past. The problem is DUMMYNET. After 6k PIPES the system dies. That's it, no matter how much traffic there is.

So now we are adding boxes to process traffic, but this is a temporary solution. I am looking forward to getting rid of dummynet and using ng_car for policing.

When we switched from dummynet to ng_car on the pptp servers, we could double the user capacity per system. Hopefully this will be the case again this time.

Some task analysis: there are 2 primary methods to do the policing. One is to put a separate traffic multiplexer in front of thousands of ng_cars, the other is to use the well-working ng_state for that.

The advantage of a separate multiplexer is that it will just work in the current setup. The disadvantage: every packet has to be switched that way.
The implementation with ng_state allows processing only the first packet in the flow, BUT the following problem arises: traffic forwarding and class policing would then have to be done differently. Instead of using different traffic paths for different classes, I would have to add class labels to the packet.

I will do the separate multiplexer for now: it is faster to implement, and it might not be much slower than the second solution, since that one would need a label lookup for each packet anyway.
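
To make the multiplexer idea concrete, here is a minimal sketch in Python. It is only a model of the logic, not netgraph code: a plain token bucket stands in for ng_car (which is not how ng_car is actually implemented), and the names TokenBucket, Multiplexer, add_customer and process are hypothetical. The point it shows is the cost mentioned above: every packet pays for one lookup before it reaches its policer.

```python
import ipaddress


class TokenBucket:
    """Single-rate policer (stand-in for ng_car): drop traffic above the rate."""

    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps / 8.0        # refill rate in bytes per second
        self.burst = burst_bytes          # bucket depth in bytes
        self.tokens = float(burst_bytes)  # start with a full bucket
        self.last = 0.0                   # timestamp of the previous packet

    def allow(self, size, now):
        # Refill for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False


class Multiplexer:
    """Per-packet switch from customer address to that customer's policer."""

    def __init__(self):
        self.policers = {}

    def add_customer(self, ip, rate_bps, burst_bytes):
        self.policers[ipaddress.ip_address(ip)] = TokenBucket(rate_bps, burst_bytes)

    def process(self, src_ip, size, now):
        # This lookup runs for every single packet.
        policer = self.policers.get(ipaddress.ip_address(src_ip))
        if policer is None:
            return True  # assumption: unknown sources pass unpoliced
        return policer.allow(size, now)
```

With a flow-state approach the lookup would happen once per flow instead, but the class label attached to the packet would still have to be read for each packet, which is why the gain may be small.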

Thursday, March 10, 2011

DNS story, part2

Tests revealed that the old djbdns seems to be too old: it processes requests 10 times slower than even slow BIND does.

The final decision and contest winner is the PowerDNS recursor for the recursor part and BIND for the authoritative part. And yes, we've finally split them.

The idea of having 12 caches and a split-horizon auth DNS server still continues to be just an idea.

Instead, it turned out that the PowerDNS recursor has a very nice feature: it can pass all incoming requests to a lua script, and all requests that ended with an NX answer to another lua script. This is enough to fulfill all our split-horizon demands.

The only exception: the script does not know the destination IP of the DNS query when the nice "packet cache" feature is on. That's why two copies of the PowerDNS recursor are on duty at the moment. Not a big problem, two != twelve.
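
The decision logic behind the two scripts can be modelled roughly like this. It is a Python sketch of the idea only, not the PowerDNS Lua interface: the function names on_query and on_nxdomain, the tables and all addresses are hypothetical examples. It assumes the view is picked by the address the query arrived on, which is exactly the piece of information the script needs to have.

```python
# Hypothetical view table: local address the query arrived on -> name overrides.
VIEWS = {
    "192.0.2.53": {"portal.example.": "10.0.0.10"},
    "198.51.100.53": {},
}

# Hypothetical per-view replacement for names that did not resolve.
NX_FALLBACK = {"192.0.2.53": "10.0.0.20"}


def on_query(dest_ip, qname):
    """Return an override address, or None to let normal recursion proceed."""
    return VIEWS.get(dest_ip, {}).get(qname.lower())


def on_nxdomain(dest_ip, qname):
    """Return a replacement address, or None to keep the NX answer."""
    return NX_FALLBACK.get(dest_ip)
```

Both functions take the destination address as their first argument, so without it neither can choose a view.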

It took about 10 working days to transfer all the recursive load from named to PowerDNS. Everything went smoothly, without losing a server.

Some things to notice:
  • the most recent version of the PowerDNS recursor doesn't do round robin correctly when answering queries with many records. I didn't have time to figure out why, so I just downgraded by one minor version, and it's ok.
  • to have all the private things functional (.local addresses and PTR resolving for "grey" networks), the recursor has to be told specifically where the auth server for them is, as the root servers will not answer for or know about such resources.
  • it restarts instantly.
  • memory consumption is 200-300 megs, ten times less than named.

At the moment we are leaving named as the authoritative server, mostly because the NOC duty guys are comfortable working with it, and because there is no high traffic on it now: most mission-critical requests for customers are processed inside the recursor without asking the auth server.

The CPU usage diagram:

It doesn't look impressive, but please note that the user load (middle line) with BIND couldn't get above 50% (one CPU). The user load remaining after the upgrade is there because named is cleaning its empty cache (the recursive part is still on, but no real requests go to BIND).