Nowadays, many tools are available that can classify traffic in computer networks. Each of these tools claims a certain accuracy, but it is hard to assess which tool is better, because they are tested on different datasets. We therefore set out to create a dataset that can be used to test all types of traffic classifiers. To build it, we used our system to collect complete packets from the network interfaces. The packets are grouped into flows, and each flow is stored together with the process name obtained from the Windows / Linux sockets, so researchers have not only the full payloads but also the information about which application created each flow. The dataset is therefore suitable for testing Deep Packet Inspection (DPI) tools as well as statistical and port-based classifiers. The dataset was created in a fully manual way, which ensures that all the time parameters inside the dataset are comparable with those of ordinary network data of the same type. Both the data-collection system and the dataset are publicly available. We then compared the classification accuracy on our dataset of PACE, OpenDPI, nDPI, Libprotoident, NBAR, four different variants of L7-filter, and a statistics-based tool developed at UPC. We performed a comprehensive evaluation of the classifiers at different levels of granularity: the application level, the content level, and the service-provider level. We found that the best-performing classifier on our dataset is PACE. Among the non-commercial tools, nDPI and Libprotoident provided the most accurate results, while the worst accuracy was obtained from all four variants of L7-filter.