Wednesday, June 22, 2011

DPI traffic recognition

How it works
To survive periods of congestion and prioritize traffic, we use a mix of DPI and behavior analysis.

DPI looks for signatures and marks a flow as belonging to a certain class if it matches a pattern.

If a flow doesn't match any pattern after its first 10 packets, it is marked as P2P-possible.
Every 5 seconds a decision is made about each client: is it a P2P user or not.

For clients detected as P2P users, P2P-possible traffic is subject to additional policing.
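The per-flow part of this can be sketched roughly as below. This is only an illustration, not our production code: the `Flow` class, `classify_packet` and the `toy_signatures` matcher are hypothetical names, and the real pattern set is much larger. Only the 10-packet limit comes from the description above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

PACKET_LIMIT = 10              # from the text: stop waiting for a signature after 10 packets
P2P_POSSIBLE = "p2p-possible"  # label for flows that matched nothing


@dataclass
class Flow:
    packets_seen: int = 0
    label: Optional[str] = None  # None means "not decided yet"


def classify_packet(flow: Flow, payload: bytes,
                    match_signature: Callable[[bytes], Optional[str]]) -> Optional[str]:
    """Feed one more packet of a flow to the classifier and return the flow's label."""
    if flow.label is not None:
        return flow.label          # already classified, nothing more to do
    flow.packets_seen += 1
    hit = match_signature(payload)
    if hit is not None:
        flow.label = hit           # a DPI pattern matched
    elif flow.packets_seen >= PACKET_LIMIT:
        flow.label = P2P_POSSIBLE  # nothing matched within the first 10 packets
    return flow.label


def toy_signatures(payload: bytes) -> Optional[str]:
    # Hypothetical stand-in for the real DPI patterns: recognizes plain HTTP requests only.
    if payload.startswith((b"GET ", b"POST ")):
        return "http"
    return None
```

A flow whose first packet starts with `GET ` is labelled `http` immediately, while a flow of unrecognized payloads stays undecided for 9 packets and becomes `p2p-possible` on the 10th.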

Why behavior?
 - P2P traffic mutates frequently and its developers do everything to hide it, so DPI signatures are difficult to maintain and not very effective
 - such traffic is easily detected by its behavior: P2P applications tend to create hundreds of `connections` (TCP or UDP), and they are greedy for bandwidth

So, to detect P2P it is enough to count flows and the amount of data transferred over those flows.
We also apply a simple optimization: we only consider P2P-possible flows in which traffic has been seen in the last 3 seconds.
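A minimal sketch of such a per-client check might look like this. The 3-second activity window is taken from the text; the two thresholds (`MIN_FLOWS`, `MIN_BYTES`) are made-up placeholders, not the numbers we actually use, and `FlowStats` / `is_p2p_client` are hypothetical names. It also assumes `byte_count` holds the bytes counted for a flow in the current decision interval.

```python
from dataclasses import dataclass
from typing import Iterable

ACTIVE_WINDOW = 3.0    # seconds, from the text: ignore flows idle for longer than this
MIN_FLOWS = 100        # hypothetical threshold, not the real value
MIN_BYTES = 5_000_000  # hypothetical threshold, not the real value


@dataclass
class FlowStats:
    byte_count: int   # assumed: bytes seen on this P2P-possible flow in the current interval
    last_seen: float  # timestamp of the last packet, in seconds


def is_p2p_client(flows: Iterable[FlowStats], now: float,
                  min_flows: int = MIN_FLOWS, min_bytes: int = MIN_BYTES) -> bool:
    """Decide whether a client looks like a P2P user from its P2P-possible flows."""
    # Only flows that carried traffic within the activity window are counted.
    active = [f for f in flows if now - f.last_seen <= ACTIVE_WINDOW]
    total_bytes = sum(f.byte_count for f in active)
    # A client is flagged only when it has both many active flows and a lot of data in them.
    return len(active) >= min_flows and total_bytes >= min_bytes
```

Requiring both conditions at once is what separates P2P from a busy web page (many flows, few bytes) and from a single large download (few flows, many bytes). The check would be run for every client once per decision interval.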

The numbers were picked after 2 days of capturing and analyzing flow statistics.

One of the methods we used was plotting graphs from real-life traffic.
This is what some kinds of traffic look like:


Web pages generate up to 100 flows, but these contain a relatively small number of bytes.

Downloads from one host (say, an FTP or HTTP file download) give a relatively small number of flows with a large amount of bytes in them.

P2P traffic generates a few hundred flows, with lots of bytes in the active ones.

P2P traffic is very flexible: depending on network conditions it can pull big flows from a single peer, or download only a few hundred kilobytes from each peer but do so from thousands of peers. We tried per-flow policing while detecting P2P, and it gave no result. P2P software adapts very well to such conditions: if large flows are restricted, it opens lots of small ones.

Policing traffic by its behavior is a "feedback" system.
So the policing parameters must be chosen so that they do not allow the "feedback" system to start flapping between border states (RED); instead it should settle at a stable level.
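To illustrate the difference between flapping and settling, here is a toy simulation. It is not our policer: `DEMAND`, `TARGET`, the 10.0 clamp and the `gain` value are all invented numbers, and the client is modelled simply as something that always fills whatever limit it is given. It contrasts a harsh on/off rule with a rule that moves the limit only part of the way towards the target on each step.

```python
DEMAND = 100.0  # hypothetical: what the P2P client would use if unrestricted
TARGET = 40.0   # hypothetical: the level we want the policed traffic to settle at


def simulate(update, steps=20, limit=DEMAND):
    """Run the loop: the client fills its limit, we observe the rate, then adjust the limit."""
    history = []
    for _ in range(steps):
        observed = min(DEMAND, limit)  # the client uses as much as it is allowed
        history.append(observed)
        limit = update(limit, observed)
    return history


def on_off(limit, observed):
    # Harsh rule: clamp hard when above the target, release completely otherwise.
    return 10.0 if observed > TARGET else DEMAND


def proportional(limit, observed, gain=0.5):
    # Gentle rule: move the limit a fraction of the way towards the target each step.
    return limit + gain * (TARGET - observed)


flapping = simulate(on_off)      # jumps between the two border states
settled = simulate(proportional)  # approaches TARGET step by step
```

With these numbers the on/off rule keeps alternating between the full rate and the clamp, while the proportional rule halves the distance to the target on every step. In the real system the timings and the amount of regulation play the role of `gain` here.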

So, by playing with the timings and the amount of regulation each time, we got the BLUE picture below.



By now the system is pretty effective.
It's possible that P2P will later mutate to look like one of the white-listed protocols. This is very simple to do, but I think it won't be difficult to detect.

At the moment we haven't been able to find such things in real traffic.
We did discover that it's rather popular to use HTTP in games and SSL in applications, but we haven't detected any P2P-like traffic on top of them.

And lastly, the economic part:

We'd like to buy a ready-made DPI solution, because they come with so many patterns that are updated and maintained, BUT all of them COST money.

Prices are about $10k per Gig. Our combined BRAS, DPI and SCE device on top of PC hardware costs $0.5k per Gig. Yes, our time comes on top of that, but it is still more than a 10x difference, and the system is easy to upgrade and expand.