Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing

René Bødker Christensen; Shashi Raj Pandey; Petar Popovski

doi:10.1109/TIFS.2023.3259879

René Bødker Christensen^*, Shashi Raj Pandey, Petar Popovski

^*Corresponding author for this work

Research output: Contribution to journal › Journal article › Research › peer-review

Abstract

Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multiparty computation-protocols to compute similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation case, and analyse the resulting errors for practical parameter choices.

Original language	English
Journal	IEEE Transactions on Information Forensics and Security
Volume	18
Pages (from-to)	1978-1988
Number of pages	11
ISSN	1556-6013
DOIs	https://doi.org/10.1109/TIFS.2023.3259879
Publication status	Published - 5 Apr 2023

Keywords

data similarity
information leakage
multiparty computation
sample correlation
secure protocols

Access to Document

10.1109/TIFS.2023.3259879

https://arxiv.org/pdf/2206.06650 (Licence: Other)

Cite this

@article{b916888bd35f438da1101adf8f3facb6,

title = "Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing",

abstract = "Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multiparty computation-protocols to compute similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation case, and analyse the resulting errors for practical parameter choices.",

keywords = "data similarity, information leakage, multiparty computation, sample correlation, secure protocols",

author = "Christensen, {Ren{\'e} B{\o}dker} and Pandey, {Shashi Raj} and Popovski, Petar",

year = "2023",

month = apr,

day = "5",

doi = "10.1109/TIFS.2023.3259879",

language = "English",

volume = "18",

pages = "1978--1988",

journal = "IEEE Transactions on Information Forensics and Security",

issn = "1556-6013",

publisher = "IEEE",

}

TY - JOUR

T1 - Semi-Private Computation of Data Similarity with Applications to Data Valuation and Pricing

AU - Christensen, René Bødker

AU - Pandey, Shashi Raj

AU - Popovski, Petar

PY - 2023/4/5

Y1 - 2023/4/5

N2 - Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multiparty computation-protocols to compute similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation case, and analyse the resulting errors for practical parameter choices.

AB - Consider two data providers that want to contribute data to a certain learning model. Recent works have shown that the value of the data of one of the providers is dependent on the similarity with the data owned by the other provider. It would thus be beneficial if the two providers can calculate the similarity of their data, while keeping the actual data private. In this work, we devise multiparty computation-protocols to compute similarity of two data sets based on correlation, while offering controllable privacy guarantees. We consider a simple model with two participating providers and develop methods to compute exact and approximate correlation, respectively, with controlled information leakage. Both protocols have computational and communication complexities that are linear in the number of data samples. We also provide general bounds on the maximal error in the approximation case, and analyse the resulting errors for practical parameter choices.

KW - data similarity

KW - information leakage

KW - multiparty computation

KW - sample correlation

KW - secure protocols

UR - http://www.scopus.com/inward/record.url?scp=85151504650&partnerID=8YFLogxK

U2 - 10.1109/TIFS.2023.3259879

DO - 10.1109/TIFS.2023.3259879

M3 - Journal article

SN - 1556-6013

VL - 18

SP - 1978

EP - 1988

JO - IEEE Transactions on Information Forensics and Security

JF - IEEE Transactions on Information Forensics and Security

ER -
