Today, instead of polishing my SCE solution to put it into production at 4 am on Monday, I had to dig into some DNS stuff.
In short, BIND sucks. To be more specific: actually it doesn't, it's a very good piece of software that can do a lot. But when used in so-called split-horizon setups it becomes a very heavy, slow-starting, memory-hungry behemoth.
Today I spent 3 hours trying to understand WTF was going on with one of our DNS servers. The task was even more complicated because load balancers are in the way. I noticed the problem through very unpleasant browsing; a test with 'host' showed that some queries, even cached ones, took up to two seconds to resolve. That explained a lot.
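For the record, that kind of timing test can be scripted. This is only a rough sketch, not what I actually ran: the function names and the 0.5 second threshold are made up for illustration, and the default resolver here is Python's socket.gethostbyname, which goes through the system resolver rather than querying one specific DNS server the way 'host name server' does.

```python
import socket
import time


def time_lookups(names, resolve=socket.gethostbyname, repeats=3):
    """Return {name: [seconds, ...]} for repeated lookups of each name.

    `resolve` is any callable taking a hostname; the default uses the
    system resolver, so it measures whatever DNS the machine is set up with.
    """
    timings = {}
    for name in names:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            try:
                resolve(name)
            except OSError:
                # A failed lookup still costs the client time, so keep the sample.
                pass
            samples.append(time.perf_counter() - start)
        timings[name] = samples
    return timings


def slow_names(timings, threshold=0.5):
    """Names whose slowest lookup took longer than `threshold` seconds."""
    return sorted(name for name, samples in timings.items()
                  if max(samples) > threshold)
```

Repeating each lookup matters here: the second and third samples should hit the cache, so if those are still slow, the delay is in the server itself and not in upstream resolution.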
So, when I looked at the stats of the first ns (say, ns1), I saw this:
That's right, almost 100% of the resources go to named. (See the green IDLE time.)
Even at night time.
After doing some profiling (strace -c) I discovered that named spends most of its time in the futex syscall, and about 10-20% of those calls are unsuccessful.
A futex problem means a concurrency problem. Switching off one CPU in the config did the magic thing: two out of three threads dropped their futex time to less than 1%, with no more unsuccessful futex calls.
Interestingly, the third one does almost nothing but futex calls, and half of them fail.
This setup does the job much better: no visible delays occur even when I put all the load on the server. And the load dropped significantly.
Here is the whole picture.
In the center is the result of a named restart. It's easy to see that the problem is not immediate; it takes 4-5 hours to appear.
On the right, the result of switching to one CPU and getting rid of the locks is visible: the system becomes 60% free, with no visible signs of degradation, at least not yet.
Troubleshooting is done, and somehow it's working. But I need to go further. First of all, a hardware upgrade is needed, as it's only a C2Duo 6300 CPU; will do on Monday.
But stop: any software can bring even the most sophisticated hardware to its knees by doing stupid things like BIND does.
It looks like I have to look at something very efficient, for example djbdns, something in the spirit of nginx.
The problem is that I need views support (split-horizon DNS). Djbdns is the only free alternative able to do that. Others suggest setting up an extra instance of the server per view, which would be a pretty complicated setup in my case.
The other reason to get rid of BIND is the handling of load-balanced pools that consist of 40-50 servers: BIND returns them all, and I haven't found a way to reduce the number to, say, 4-5 records. And this part is really needed, because there are plenty of dumb devices that fail to resolve big RR pools.
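What I want from the server is roughly the following. This is just a sketch of the idea in Python, not how any particular DNS server implements it; pick_subset and its limit parameter are names I made up, and the addresses in the usage note are placeholders.

```python
import random


def pick_subset(addresses, limit=5, rng=random):
    """Return at most `limit` addresses chosen at random from the pool.

    Small pools are returned whole; large pools are sampled without
    repetition, so each answer carries a different slice of the pool
    and the load still spreads over all servers across many queries.
    """
    pool = list(addresses)
    if len(pool) <= limit:
        return pool
    return rng.sample(pool, limit)
```

For a pool like ["10.0.0.%d" % i for i in range(1, 46)], each call returns 5 distinct addresses out of the 45, so a dumb device only ever sees a small answer.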
Djbdns looks to be a pretty efficient and really nice solution.
It also splits the caching and authoritative parts of the server into two daemons, which is a nice industry-standard feature, but it is another problem for me.
The authoritative server will not do caching or forwarding, but it does do split-horizon.
If I put the caching server first (how it's supposed to be), then the authoritative server will have no info about user IPs, which means that split horizon will not work.
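To make the problem concrete, here is a minimal model of how a view gets chosen from the source address of a query. It is only an illustration: the view names, the networks and the cache address are invented, and a real server does this matching internally.

```python
import ipaddress

# Hypothetical views; the first matching network wins.
VIEWS = [
    ("internal", ipaddress.ip_network("10.0.0.0/8")),
    ("partners", ipaddress.ip_network("192.0.2.0/24")),
]
DEFAULT_VIEW = "external"

# Hypothetical address of a cache sitting in front of the authoritative server.
CACHE_IP = "10.0.0.53"


def select_view(source_ip):
    """Pick a view by the source address the authoritative server sees."""
    addr = ipaddress.ip_address(source_ip)
    for name, network in VIEWS:
        if addr in network:
            return name
    return DEFAULT_VIEW


def view_seen_via_cache(client_ip):
    """With a cache in front, the query arrives from the cache, not the client."""
    return select_view(CACHE_IP)
```

Asked directly, 10.1.2.3 lands in "internal" and 198.51.100.9 in "external". Through the cache both land in whatever view the cache's own address matches, which is exactly why one shared cache in front breaks split horizon.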
Next I got the idea to substitute the root servers on the authoritative server and use a patch to answer with the list of root servers when the authoritative part receives something it's not authoritative for. But it means every client will do two requests instead of one: the first will return the IP of the cache and the second will go to the cache. Not good.
What will definitely work is making one cache for each view. At the moment I have 5 views, and one view doesn't need caching.
So many thoughts that I am stopping at this point. Let's rest a bit.
While looking through the features of the servers I found some interesting utilities that come with the ProDNS server: they can replay captured DNS traffic against a selected DNS server. This is an easy way to simulate a load close to the real one on the new server.