Classification of HTTP traffic based on C5.0 Machine Learning Algorithm

Tomasz Bujlow; Tahir Riaz; Jens Myrup Pedersen

doi:10.1109/ISCC.2012.6249413

Department of Electronic Systems

Research output: Contribution to book/anthology/report/conference proceeding › Article in proceeding › Research › peer-review

Abstract

Our previous work demonstrated that several groups of traffic can be distinguished with an accuracy of over 99%. Today, most traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguishing various types of HTTP traffic based on the content: a distributed approach running on volunteers' machines and a centralized approach running in the core of the network. We also assess the accuracy of the centralized classifier for both HTTP traffic and mixed HTTP/non-HTTP traffic. In the latter case, we achieved an accuracy of 94%. Finally, we provide graphical characteristics of different kinds of HTTP traffic.

Original language	English
Title of host publication	IEEE Symposium on Computers and Communications (ISCC), 2012
Number of pages	6
Place of Publication	Cappadocia
Publisher	IEEE
Publication date	1 Jul 2012
Pages	882-887
ISBN (Print)	978-1-4673-2712-1
ISBN (Electronic)	978-1-4673-2711-4
DOIs	https://doi.org/10.1109/ISCC.2012.6249413
Publication status	Published - 1 Jul 2012
Event	The Seventeenth IEEE Symposium on Computers and Communications, Cappadocia, Turkey (1 Jul 2012 → 4 Jul 2012)

Conference

Conference	The Seventeenth IEEE Symposium on Computers and Communications
Country/Territory	Turkey
City	Cappadocia
Period	01/07/2012 → 04/07/2012

Series	I E E E International Symposium on Computers and Communications
ISSN	1530-1346

Keywords

traffic classification
computer networks
HTTP traffic
browser traffic
C5.0
Machine Learning Algorithms (MLAs)
performance monitoring

Access to Document

10.1109/ISCC.2012.6249413

Classification of HTTP traffic based on C5.0 Machine Learning Algorithm (accepted author manuscript, 533 KB)

Cite this

@inproceedings{276fdb95701149268a4e724aabdcb02b,
title = "Classification of HTTP traffic based on C5.0 Machine Learning Algorithm",
abstract = "Our previous work demonstrated the possibility of distinguishing several groups of traffic with accuracy of over 99\%. Today, most of the traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguish various types of HTTP traffic based on the content: distributed among volunteers' machines and centralized running in the core of the network. We also assess the accuracy of the centralized classifier for both the HTTP traffic and mixed HTTP/non-HTTP traffic. In the latter case, we achieved the accuracy of 94\%. Finally, we provide graphical characteristics of different kinds of HTTP traffic.",
keywords = "traffic classification, computer networks, HTTP traffic, browser traffic, C5.0, Machine Learning Algorithms (MLAs), performance monitoring",
author = "Bujlow, Tomasz and Riaz, Tahir and Pedersen, {Jens Myrup}",
year = "2012",
month = jul,
day = "1",
doi = "10.1109/ISCC.2012.6249413",
language = "English",
isbn = "978-1-4673-2712-1",
series = "I E E E International Symposium on Computers and Communications",
publisher = "IEEE",
pages = "882--887",
booktitle = "IEEE Symposium on Computers and Communications (ISCC), 2012",
address = "United States",
note = "The Seventeenth IEEE Symposium on Computers and Communications, ISCC{\textquoteright}12; Conference date: 01-07-2012 Through 04-07-2012",
}

Bujlow, T, Riaz, T & Pedersen, JM 2012, Classification of HTTP traffic based on C5.0 Machine Learning Algorithm. in IEEE Symposium on Computers and Communications (ISCC), 2012. IEEE, Cappadocia, I E E E International Symposium on Computers and Communications, pp. 882-887, The Seventeenth IEEE Symposium on Computers and Communications, Cappadocia, Turkey, 01/07/2012. https://doi.org/10.1109/ISCC.2012.6249413

Classification of HTTP traffic based on C5.0 Machine Learning Algorithm. / Bujlow, Tomasz; Riaz, Tahir; Pedersen, Jens Myrup.
IEEE Symposium on Computers and Communications (ISCC), 2012. Cappadocia: IEEE, 2012. p. 882-887 (I E E E International Symposium on Computers and Communications).

TY  - GEN
T1  - Classification of HTTP traffic based on C5.0 Machine Learning Algorithm
AU  - Bujlow, Tomasz
AU  - Riaz, Tahir
AU  - Pedersen, Jens Myrup
PY  - 2012/7/1
Y1  - 2012/7/1
AB  - Our previous work demonstrated the possibility of distinguishing several groups of traffic with accuracy of over 99%. Today, most of the traffic is generated by web browsers, which provide different kinds of services based on the HTTP protocol: web browsing, file downloads, audio and voice streaming through third-party plugins, etc. This paper suggests and evaluates two approaches to distinguish various types of HTTP traffic based on the content: distributed among volunteers' machines and centralized running in the core of the network. We also assess the accuracy of the centralized classifier for both the HTTP traffic and mixed HTTP/non-HTTP traffic. In the latter case, we achieved the accuracy of 94%. Finally, we provide graphical characteristics of different kinds of HTTP traffic.
KW  - traffic classification
KW  - computer networks
KW  - HTTP traffic
KW  - browser traffic
KW  - C5.0
KW  - Machine Learning Algorithms (MLAs)
KW  - performance monitoring
UR  - http://www.scopus.com/inward/record.url?scp=84866614441&partnerID=8YFLogxK
U2  - 10.1109/ISCC.2012.6249413
DO  - 10.1109/ISCC.2012.6249413
M3  - Article in proceeding
SN  - 978-1-4673-2712-1
T3  - I E E E International Symposium on Computers and Communications
SP  - 882
EP  - 887
BT  - IEEE Symposium on Computers and Communications (ISCC), 2012
PB  - IEEE
CY  - Cappadocia
T2  - The Seventeenth IEEE Symposium on Computers and Communications
Y2  - 1 July 2012 through 4 July 2012
ER  -
