Embedding differential privacy in decision tree algorithm with different depths

Xuanyu Bai; Jianguo Yao; Mingxuan Yuan; Ke Deng; Xike Xie; Haibing Guan

doi:10.1007/s11432-016-0442-1


Xuanyu Bai, Jianguo Yao*, Mingxuan Yuan, Ke Deng, Xike Xie, Haibing Guan

*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

10 Citations (Scopus)

Abstract

Differential privacy (DP) has become one of the most important approaches to privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is taken into account in the DP implementation. However, although embedding DP in a one-step DM computation for the decision tree (DT) model has been investigated, existing research has not studied embedding DP in two-step or three-step DM computations, up to the DM computation of the whole model. Embedding DP in more than two steps of DM computation is very challenging because the solution space grows exponentially with the computational complexity. In this work, we propose algorithms based on the Markov Chain Monte Carlo (MCMC) method, which can efficiently search a space that is computationally infeasible to enumerate, in order to embed DP into the DT generation algorithm. We compare performance when DP is embedded in DT at different depths: one-step DM computation (previous work), two-step, three-step, and the whole model. We find that a deeper combination of DP and DT does help to increase prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), the budget may overwhelm the complexity of the DT model, and the upward trend in accuracy is not obvious. We also find that prediction accuracy decreases as model complexity increases.
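The abstract's central idea, choosing a whole tree structure privately and falling back on MCMC when the candidate space is too large for exhaustive search, can be illustrated with a small sketch. This is not the authors' algorithm. It assumes binary attributes, a complete tree of fixed depth stored in heap order, a hypothetical utility (records correctly classified by majority-label leaves, which has sensitivity 1), and the standard exponential mechanism, which samples a structure with probability proportional to exp(ε·u/2). The functions `leaf_of`, `utility`, `exhaustive_em` and `mcmc_em` are illustrative names, and the single-node Metropolis-Hastings proposal is an assumed design choice.

```python
import itertools
import math
import random

# Hypothetical encoding: a record is (binary feature tuple, class label).
Record = tuple[tuple[int, ...], int]


def leaf_of(tree: tuple[int, ...], depth: int, x: tuple[int, ...]) -> int:
    """Route x through a complete binary tree stored in heap order.

    tree[i] is the attribute tested at internal node i; its children are
    2*i + 1 (attribute value 0) and 2*i + 2 (attribute value 1).
    """
    node = 0
    for _ in range(depth):
        node = 2 * node + 1 + x[tree[node]]
    return node - (2 ** depth - 1)  # leaf index in [0, 2**depth)


def utility(tree: tuple[int, ...], depth: int, data: list[Record]) -> int:
    """Number of records classified correctly when each leaf predicts its
    majority label. Adding or removing one record changes it by at most 1,
    so its sensitivity is 1."""
    counts: dict[tuple[int, int], int] = {}
    for x, y in data:
        key = (leaf_of(tree, depth, x), y)
        counts[key] = counts.get(key, 0) + 1
    best: dict[int, int] = {}
    for (leaf, _), c in counts.items():
        best[leaf] = max(best.get(leaf, 0), c)
    return sum(best.values())


def exhaustive_em(data, n_attrs, depth, eps, rng):
    """Exponential mechanism over all n_attrs ** (2**depth - 1) structures."""
    trees = list(itertools.product(range(n_attrs), repeat=2 ** depth - 1))
    scores = [utility(t, depth, data) for t in trees]
    top = max(scores)
    # Pr[t] is proportional to exp(eps * u(t) / 2); shifting by the max avoids overflow.
    weights = [math.exp(eps * (s - top) / 2) for s in scores]
    return rng.choices(trees, weights=weights, k=1)[0]


def mcmc_em(data, n_attrs, depth, eps, rng, steps=2000):
    """Metropolis-Hastings chain whose stationary distribution is the same
    exponential-mechanism distribution, without enumerating all structures."""
    n_internal = 2 ** depth - 1
    tree = [rng.randrange(n_attrs) for _ in range(n_internal)]
    score = utility(tuple(tree), depth, data)
    for _ in range(steps):
        # Symmetric proposal: redraw the attribute at one random internal node.
        proposal = tree.copy()
        proposal[rng.randrange(n_internal)] = rng.randrange(n_attrs)
        new_score = utility(tuple(proposal), depth, data)
        # Accept with probability min(1, exp(eps * (u' - u) / 2)).
        if new_score >= score or rng.random() < math.exp(eps * (new_score - score) / 2):
            tree, score = proposal, new_score
    return tuple(tree)


if __name__ == "__main__":
    # Toy data: the label equals attribute 0; attributes 1 and 2 are noise.
    toy = [((a, b, c), a) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 5
    rng = random.Random(0)
    print(exhaustive_em(toy, n_attrs=3, depth=2, eps=1.0, rng=rng))
    print(mcmc_em(toy, n_attrs=3, depth=2, eps=1.0, rng=rng))
```

The exhaustive version enumerates n_attrs^(2^depth − 1) structures, so it is only usable for very small depths; this is the blow-up the abstract describes. The MCMC version follows the exponential-mechanism distribution only once the chain has mixed, so a finite run gives an approximate rather than exact guarantee. The sketch also leaves leaf labels unprotected; a complete private tree would release them through a separate mechanism (for example, noisy counts).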

Original language	English
Article number	082104
Journal	Science China Information Sciences
Volume	60
Issue number	8
Number of pages	15
ISSN	1674-733X
DOIs	https://doi.org/10.1007/s11432-016-0442-1
Publication status	Published - 1 Aug 2017

Keywords

decision tree
exhaustive search
exponential mechanism
MCMC
differential privacy

Cite this

@article{4e5b8d8b15754b7bafec7ffc23034aca,

title = "Embedding differential privacy in decision tree algorithm with different depths",

abstract = "Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for decision tree (DT) model has been investigated, existing research has not studied the scenarios when the DP is embedded in two-step DM computation, three-step DM computation until the whole model DM computation. It is very challenging to embed DP in more than two steps of DM computation since the solution space exponentially increases with the increase of computational complexity. In this work, we propose algorithms by making use of Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space to embed DP into DT generation algorithm. We compare the performance when embedding DP in DT with different depths, i.e., one-step DM computation (previous work), two-step, three-step and the whole model. We find that the deep combination of DP and DT does help to increase the prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), this may overwhelm the complexity of DT model, and the increasing trend is not obvious. We also find that the prediction accuracy decreases with the increase of model complexity.",

keywords = "decision tree, exhaustive search, exponential mechanism, MCMC, differential privacy",

author = "Xuanyu Bai and Jianguo Yao and Mingxuan Yuan and Ke Deng and Xike Xie and Haibing Guan",

year = "2017",

month = aug,

day = "1",

doi = "10.1007/s11432-016-0442-1",

language = "English",

volume = "60",

journal = "Science China Information Sciences",

issn = "1674-733X",

publisher = "Zhongguo Kexue Zazhishe",

number = "8",

}

TY - JOUR

T1 - Embedding differential privacy in decision tree algorithm with different depths

AU - Bai, Xuanyu

AU - Yao, Jianguo

AU - Yuan, Mingxuan

AU - Deng, Ke

AU - Xie, Xike

AU - Guan, Haibing

PY - 2017/8/1

Y1 - 2017/8/1

N2 - Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for decision tree (DT) model has been investigated, existing research has not studied the scenarios when the DP is embedded in two-step DM computation, three-step DM computation until the whole model DM computation. It is very challenging to embed DP in more than two steps of DM computation since the solution space exponentially increases with the increase of computational complexity. In this work, we propose algorithms by making use of Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space to embed DP into DT generation algorithm. We compare the performance when embedding DP in DT with different depths, i.e., one-step DM computation (previous work), two-step, three-step and the whole model. We find that the deep combination of DP and DT does help to increase the prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), this may overwhelm the complexity of DT model, and the increasing trend is not obvious. We also find that the prediction accuracy decreases with the increase of model complexity.

AB - Differential privacy (DP) has become one of the most important solutions for privacy protection in recent years. Previous studies have shown that prediction accuracy usually increases as more data mining (DM) logic is considered in the DP implementation. However, although one-step DM computation for decision tree (DT) model has been investigated, existing research has not studied the scenarios when the DP is embedded in two-step DM computation, three-step DM computation until the whole model DM computation. It is very challenging to embed DP in more than two steps of DM computation since the solution space exponentially increases with the increase of computational complexity. In this work, we propose algorithms by making use of Markov Chain Monte Carlo (MCMC) method, which can efficiently search a computationally infeasible space to embed DP into DT generation algorithm. We compare the performance when embedding DP in DT with different depths, i.e., one-step DM computation (previous work), two-step, three-step and the whole model. We find that the deep combination of DP and DT does help to increase the prediction accuracy. However, when the privacy budget is very large (e.g., ϵ = 10), this may overwhelm the complexity of DT model, and the increasing trend is not obvious. We also find that the prediction accuracy decreases with the increase of model complexity.

KW - decision tree

KW - exhaustive search

KW - exponential mechanism

KW - MCMC

KW - differential privacy

UR - http://www.scopus.com/inward/record.url?scp=85022035848&partnerID=8YFLogxK

U2 - 10.1007/s11432-016-0442-1

DO - 10.1007/s11432-016-0442-1

M3 - Journal article

AN - SCOPUS:85022035848

SN - 1674-733X

VL - 60

JO - Science China Information Sciences

JF - Science China Information Sciences

IS - 8

M1 - 082104

ER -
