Extended Independent Comparison of Popular Deep Packet Inspection (DPI) Tools for Traffic Classification

Publication: Research - peer-review, Report

Abstract

Network traffic classification has become an essential input for many network-related tasks. However, the continuous evolution of Internet applications and of their techniques for avoiding detection (such as dynamic port numbers, encryption, or protocol obfuscation) has considerably complicated their classification. We start the report by introducing and briefly describing several well-known DPI tools, which are evaluated later: PACE, OpenDPI, L7-filter, NDPI, Libprotoident, and NBAR. We tried to use the most recent versions of the classifiers. However, the OpenDPI project was closed in June 2011, and no new version of the software has been released since then. L7-filter, which is widely described in the scientific literature, also appears to be no longer developed: the most recent version of the classification engine is from January 2011, and the classification rules are from 2009.

This report makes several major contributions. First, using VBS, we created 3 datasets covering 17 application protocols, 19 applications (including various configurations of the same application), and 34 web services; the datasets are available to the research community. The first dataset contains full flows with entire packets, the second dataset contains truncated packets (the Ethernet frames were overwritten with 0s after the 70th byte), and the third dataset contains truncated flows (we took only the first 10 packets of each flow). The datasets contain 767 690 flows labeled on multiple levels (application protocol, application, and web service). The included application protocols are: DNS, HTTP, ICMP, IMAP (STARTTLS and TLS), NETBIOS (name service and session service), SAMBA, NTP, POP3 (plain and TLS), RTMP, SMTP (plain and TLS), SOCKSv5, SSH, and Webdav. The included applications (and their configurations) are: 4Shared, America's Army, BitTorrent clients (using plain and encrypted BitTorrent protocol), Dropbox, eDonkey clients (using plain and obfuscated eDonkey protocol), Freenet, FTP clients (in active and passive modes), iTunes, League of Legends, Pando Media Booster, PPLive, PPStream, RDP clients, Skype (including audio conversations, file transfers, video conversations), Sopcast, Spotify, Steam, TOR, and World of Warcraft. The included web services are: 4Shared, Amazon, Apple, Ask, Bing, Blogspot, CNN, Craigslist, Cyworld, Doubleclick, eBay, Facebook, Go.com, Google, Instagram, Justin.tv, LinkedIn, Mediafire, MSN, Myspace, Pinterest, Putlocker, QQ.com, Taobao, The Huffington Post, Tumblr, Twitter, Vimeo, VK.com, Wikipedia, Windows Live, Wordpress, Yahoo, and YouTube.
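The two truncation schemes used for the second and third datasets (zeroing each Ethernet frame after the 70th byte, and keeping only the first 10 packets of each flow) can be sketched as follows. This is a minimal illustration, not the tooling used in the report; the function names and the representation of a flow as a list of raw frames are assumptions made for the example.

```python
from typing import List

TRUNCATE_AFTER = 70        # bytes of each Ethernet frame left intact (from the text)
MAX_PACKETS_PER_FLOW = 10  # packets kept per flow (from the text)


def truncate_packet(frame: bytes, keep: int = TRUNCATE_AFTER) -> bytes:
    """Overwrite every byte after the first `keep` bytes with zeros.

    The frame length is preserved, so header length fields stay consistent.
    """
    if len(frame) <= keep:
        return frame
    return frame[:keep] + bytes(len(frame) - keep)


def truncate_flow(packets: List[bytes], limit: int = MAX_PACKETS_PER_FLOW) -> List[bytes]:
    """Keep only the first `limit` packets of a flow (hypothetical flow representation)."""
    return packets[:limit]
```

Zeroing rather than cutting the frame keeps packet sizes unchanged, so size-based features remain available to classifiers that rely on them.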

These datasets are available as a set of PCAP files containing full flows, including the packet payload, together with corresponding text files that describe the flows in the order in which they were originally captured and stored in the PCAP files. The description files contain the start and end timestamps of the flows, based on the opening and closing of the system sockets, which is useful for reproducing the original behavior when many short flows are generated between the same hosts during a short period of time. The application name taken from the system sockets is included as well. Furthermore, each flow is described by one or more labels defining the application protocol, the application itself, or the web service. These datasets can be used directly to test various traffic classifiers: port-based, DPI, statistical, etc.

Second, we developed a method for labeling non-HTTP flows that belong to web services (such as YouTube). Labeling based on the corresponding domain names taken from the HTTP header can identify only the HTTP flows; other flows (such as encrypted SSL / HTTPS flows or RTMP flows) are left unlabeled. Therefore, we implemented a heuristic method for detecting non-HTTP flows that belong to specific services.
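The abstract does not describe the heuristic itself. Purely as an illustration of the problem, the sketch below shows one plausible approach: label HTTP flows from the Host header, then let non-HTTP flows inherit the label of an HTTP flow that reached the same remote IP address. The `Flow` class, the `DOMAIN_LABELS` table, and the IP-matching rule are all assumptions for this example and should not be read as the report's method.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Flow:
    """Hypothetical flow record used only for this sketch."""
    remote_ip: str
    is_http: bool
    host: Optional[str] = None   # HTTP Host header, present only for HTTP flows
    label: Optional[str] = None  # web-service label, filled in by label_flows()


# Hypothetical mapping from domain suffix to web-service label.
DOMAIN_LABELS: Dict[str, str] = {
    "youtube.com": "YouTube",
    "facebook.com": "Facebook",
}


def label_from_host(host: str) -> Optional[str]:
    """Return the service label whose domain suffix matches the Host header."""
    for suffix, label in DOMAIN_LABELS.items():
        if host == suffix or host.endswith("." + suffix):
            return label
    return None


def label_flows(flows: List[Flow]) -> List[Flow]:
    """Label HTTP flows by domain, then propagate labels to non-HTTP flows by remote IP."""
    ip_to_label: Dict[str, str] = {}
    # Pass 1: HTTP flows are labeled directly from the Host header.
    for flow in flows:
        if flow.is_http and flow.host:
            flow.label = label_from_host(flow.host)
            if flow.label:
                ip_to_label.setdefault(flow.remote_ip, flow.label)
    # Pass 2: non-HTTP flows inherit the label seen for the same remote IP, if any.
    for flow in flows:
        if not flow.is_http and flow.label is None:
            flow.label = ip_to_label.get(flow.remote_ip)
    return flows
```

A rule of this kind can mislabel flows when several services share one address (for example behind a content delivery network), which is why any real heuristic would need additional checks.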

We then examined the ability of the DPI tools to accurately label the flows included in our datasets. All the classifiers except NBAR were tested with a special benchmark tool, which read the PCAP files together with their descriptions, assembled the packets into the original flows, and provided the flows to the DPI tools, which are organized as libraries. To test the accuracy of NBAR, we needed to emulate a Cisco router by using Dynamips together with an original Cisco Internetwork Operating System image. The packets had to be replayed to the virtual interface where the Cisco router resided in order to be classified by NBAR. This imposed two additional requirements. First, the destination MAC address of each packet had to be changed to the MAC address of the virtual Cisco router interface, as Cisco routers do not accept packets that are not directed to their interfaces. Second, the source MAC addresses were changed to contain the identifiers of the original flows, so that the router could reconstruct and assess the flows as they originally occurred. Then, the Flexible NetFlow feature of Cisco routers was used to obtain the per-flow application label assigned by NBAR. The NetFlow records were captured on the host machine, where they were analyzed.
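The two MAC address rewrites described above can be sketched on raw Ethernet frames as follows. The encoding of the flow identifier is an assumption made for this example (a fixed first octet of 0x02, marking a locally administered unicast address, followed by a 40-bit identifier); the abstract does not say how the identifiers were packed, and the function names are hypothetical.

```python
def encode_flow_id(flow_id: int) -> bytes:
    """Pack a flow identifier into a 6-byte source MAC (assumed encoding).

    The first octet is 0x02 so the address is unicast and locally administered;
    the remaining 5 octets carry the identifier in big-endian order.
    """
    if not 0 <= flow_id < (1 << 40):
        raise ValueError("flow_id must fit in 40 bits")
    return b"\x02" + flow_id.to_bytes(5, "big")


def decode_flow_id(mac: bytes) -> int:
    """Recover the flow identifier from a MAC produced by encode_flow_id()."""
    return int.from_bytes(mac[1:6], "big")


def rewrite_frame(frame: bytes, router_mac: bytes, flow_id: int) -> bytes:
    """Return a copy of an Ethernet frame with both MAC addresses replaced.

    Bytes 0-5 (destination) become the router interface MAC and bytes 6-11
    (source) carry the flow identifier; the EtherType and payload are untouched.
    """
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    if len(router_mac) != 6:
        raise ValueError("router_mac must be 6 bytes")
    return router_mac + encode_flow_id(flow_id) + frame[12:]
```

With the identifier carried in the source MAC, the NetFlow records exported by the router can be matched back to the original flows on the host machine, provided the source MAC is included among the exported fields.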

The detection rate was shown to be almost identical on the set containing full flows with entire packets and on the set with truncated flows, while it decreases sharply on the set with truncated packets. Libprotoident is an exception: it provides the same results regardless of the set, because it uses only 4 B of packet payload. We showed that, in most cases, NBAR was (apart from Libprotoident) the tool most resistant to the impact of packet truncation on the detection rate.
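To give a feel for how little payload survives truncation after the 70th byte, the sketch below computes the remaining application payload for a few common header layouts. It assumes untagged Ethernet II framing and IPv4; the header sizes are the standard ones, while the helper name and the choice of cases are illustrative and not taken from the report.

```python
ETHERNET_HEADER = 14  # untagged Ethernet II header, in bytes
IPV4_MIN_HEADER = 20  # IPv4 header without options
TCP_MIN_HEADER = 20   # TCP header without options
UDP_HEADER = 8
TRUNCATE_AFTER = 70   # bytes of each frame left intact (from the text)


def surviving_payload(l4_header_len: int,
                      ip_header_len: int = IPV4_MIN_HEADER,
                      keep: int = TRUNCATE_AFTER) -> int:
    """Bytes of application payload left intact in a frame truncated after `keep` bytes."""
    return max(0, keep - ETHERNET_HEADER - ip_header_len - l4_header_len)


print(surviving_payload(TCP_MIN_HEADER))       # TCP, no options: 16
print(surviving_payload(TCP_MIN_HEADER + 12))  # TCP with 12 bytes of options: 4
print(surviving_payload(UDP_HEADER))           # UDP: 28
```

Under these assumptions between 4 and 28 bytes of payload remain, which is consistent with a tool that needs only 4 B being unaffected while tools that match longer patterns lose accuracy.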

We showed that PACE is able to identify the highest number of web services among all the studied classifiers. PACE detected 16 web services, OpenDPI 2, L7-filter in its default version only 1, NDPI 7, Libprotoident 1, and NBAR none. We also showed that L7-filter is characterized by a very high proportion of misclassified flows belonging to web services (usually 80-99%); the vast majority of these flows were recognized as Finger and Skype.

We evaluated the impact of protocol encryption and obfuscation on the detection rate of the particular classifiers. Protocol encryption lowered the detection rate in all cases, whereas we did not observe such a dependency for the obfuscated eDonkey protocol: in this case, PACE even showed an increase in detection rate, from 16.50% (for plain traffic) to 36%. We showed that only PACE is able to accurately identify some applications that are supposed to be hard to detect, such as Freenet or TOR.

Details

Original language: English
Publisher: Universitat Politècnica de Catalunya
Number of pages: 440
State: Published - 7 Jan 2014

    Keywords

  • Computer networks, traffic classification, accuracy, PACE, OpenDPI, NDPI, Libprotoident, NBAR, L7-filter

ID: 179043084