Is our Ground-Truth for Traffic Classification Reliable?

Valentín Carela-Español, Tomasz Bujlow, Pere Barlet-Ros

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

50 Citations (Scopus)

Abstract

The validation of the different proposals in the traffic classification literature is a controversial issue. Usually, these works base their results on a ground-truth built from private datasets and labeled by techniques of unknown reliability. This makes the validation and comparison with other solutions an extremely difficult task.

This paper aims to be a first step towards addressing the validation and trustworthiness problem of network traffic classifiers. We compare six well-known DPI-based techniques that are frequently used in the literature for ground-truth generation. To evaluate these tools, we carefully built a labeled dataset of more than 500 000 flows containing traffic from popular applications. Our results show PACE, a commercial tool, to be the most reliable solution for ground-truth generation. However, among the available open-source tools, NDPI and especially Libprotoident also achieve very high precision, while other, more frequently used tools (e.g., L7-filter) are not reliable enough and should not be used for ground-truth generation in their current form.

Original language: English
Title of host publication: Passive and Active Measurement: 15th International Conference, PAM 2014, Los Angeles, USA, March 10-11, 2014, Proceedings
Number of pages: 11
Volume: 8362
Publisher: Springer Science+Business Media
Publication date: 11 Mar 2014
Pages: 98-108
Publication status: Published - 11 Mar 2014
Series: Lecture Notes in Computer Science
ISSN: 0302-9743
