TY - GEN
T1 - TABOO
T2 - 33rd IEEE International Conference on Data Engineering, ICDE 2017
AU - Neerbek, Jan
AU - Assent, Ira
AU - Dolog, Peter
PY - 2017/5/16
Y1 - 2017/5/16
N2 - © 2017 IEEE. Leak of sensitive information from unstructured text documents is a costly problem both for government and for industrial institutions. Traditional approaches for data leak prevention are commonly based on the hypothesis that sensitive information is reflected in the presence of distinct sensitive words. However, for complex sensitive information, this hypothesis may not hold. Our TABOO system detects complex sensitive information in text documents by learning the semantic and syntactic structure of text documents. Our approach is based on natural language processing methods for paraphrase detection, and uses recursive neural networks to assign sensitivity scores to the semantic components of the sentence structure. The demonstration of TABOO focuses on interactive detection of sensitive information with the TABOO system. Users may work with real documents, alter documents or prepare free text, and subject it to information detection. TABOO allows users to work with our TABOO engine or with traditional approaches, and to compare results. Users may verify that single words can change sensitivity according to context, thereby giving hands-on experience with complex cases of sensitive information.
UR - http://www.scopus.com/inward/record.url?scp=85021226281&partnerID=8YFLogxK
U2 - 10.1109/ICDE.2017.195
DO - 10.1109/ICDE.2017.195
M3 - Article in proceeding
AN - SCOPUS:85021226281
T3 - Proceedings of the International Conference on Data Engineering
SP - 1399
EP - 1400
BT - IEEE 33rd International Conference on Data Engineering (ICDE), 2017
PB - IEEE
Y2 - 19 April 2017 through 22 April 2017
ER -