How Private Set Operations Secure Data Collaboration in the Big Data Era
Private Set Operations (PSO) enable multiple parties to perform set intersections, unions, and related computations on encrypted data, preserving privacy through cryptographic techniques such as public‑key encryption, oblivious transfer, and garbled circuits, and are applied across advertising, finance, healthcare, and government for secure data collaboration.
Abstract
Private Set Operations (PSO) are key technologies for protecting user privacy in the big‑data era, supporting multi‑party set computations such as intersection, intersection‑cardinality, and union through public‑key cryptography, garbled circuits, and oblivious transfer, each offering different trade‑offs in efficiency, communication, and generality. Alibaba’s Secure Data Hub (SDH) has integrated multiple PSO capabilities into advertising services.
1. Introduction
1.1 Background
With the rapid growth of data generated by user clicks, stays, shares and purchases, enterprises gain fine‑grained operational insights while traditional marketing models are reshaped. Digital advertising’s growth relies on precise audience targeting, golden‑time slots and scenario matching, involving three core participants:
Advertisers who own CRM or app user data
Publishers/Ad networks with large audience reach
Third‑party data providers responsible for measurement and anti‑fraud
1.2 What is PSO?
Private Set Operations (PSO) allow multiple parties to jointly perform set operations while keeping each party’s input set private, commonly used for privacy‑preserving database queries. According to the number of participants, PSO can be two‑party or multi‑party. In two‑party scenarios, a receiver R obtains the protocol output while the sender S provides its set.
PSO protocols are classified into three functional categories:
1) Private Set Intersection (PSI) : computes the intersection of two sets.
2) Private Computation on Set Intersection (PCSI) : performs fine‑grained private computation on intersection elements.
3) Private Set Union (PSU) : computes the union of two sets.
2. Private Set Intersection (PSI)
2.1 Basics
In a two‑party PSI protocol, the receiver and sender hold sets X and Y respectively, execute the protocol, and the receiver obtains the intersection without learning any other information about the sender’s set. PSI enables privacy‑preserving inner joins and is widely used in advertising, insurance, and healthcare.
2.2 Technical Development
Based on Public‑Key Cryptography Research on public‑key‑based PSI dates back to 1986 when Meadows proposed a Diffie‑Hellman two‑party protocol. In 2004 Freedman et al. introduced homomorphic encryption and polynomial fitting, providing the first provably secure PSI scheme. Subsequent work improved efficiency, but high computational cost limited large‑scale deployment. In 2022 Vos et al. presented a high‑performance ECC‑based multi‑party PSI protocol, achieving the best efficiency for massive datasets.
Based on Garbled Circuits Garbled circuits (GC) transform any function into a Boolean circuit, suitable for “multi‑functional” PSI (intersection, union, threshold, cardinality). Since Huang et al. introduced the first semi‑honest GC‑PSI in 2012, research has focused on reducing circuit size and depth. Pinkas (2019) leveraged programmable OPRF to achieve linear communication complexity, but circuit design complexity still hampers performance.
Based on Oblivious Transfer (OT) Oblivious Transfer is a fundamental MPC primitive. The receiver selects a bit, the sender provides two elements, and after the protocol the receiver learns one element while the sender learns nothing. Ishai et al. (2003) proposed OT extension, greatly improving performance. In 2013 Dong introduced a Bloom‑filter‑based PSI using OT, and Kolesnikov (2016) combined PRF with OT extension for efficient PSI. Compared with public‑key and GC approaches, OT balances computation and communication overhead.
2.3 Implementation Example
Alibaba’s Secure Data Hub implements an ECC‑based Diffie‑Hellman PSI protocol. The protocol steps are:
Receiver R and sender S agree on an elliptic‑curve group and hash‑to‑curve algorithm, then generate private keys.
Both map their set elements to curve points via hash‑to‑curve.
Each encrypts its points with its private key (scalar multiplication).
R sends its encrypted points to S.
S encrypts the received points again with its private key and sends them back to R.
R performs a second encryption on the returned points with its private key.
R compares the doubly‑encrypted points with its own encrypted set to derive the intersection and outputs the result.
3. Other PSO Techniques
3.1 Private Set Intersection Computation (PCSI)
PCSI extends PSI by allowing private computation on the intersection, such as cardinality (PSI‑Card) or sum (PSI‑Card‑Sum). In a two‑party PCSI protocol, the parties agree on a target function, input their sets, and obtain the function result while learning nothing else about each other’s sets.
PSI‑Card outputs only the size of the intersection, useful for ad deduplication and overlap measurement. PSI‑Card‑Sum additionally sums numeric labels attached to intersecting elements, supporting ROI statistics and joint credit risk calculations.
Relevant work includes Vaidya (2005) on PSI‑Card, Hohenberger (2006) on sub‑quadratic complexity, Miao (2020) on PSI‑Sum with homomorphic encryption, and Garimella (2021) on a PSO framework supporting arbitrary functions.
3.2 Private Set Union (PSU)
PSU enables parties to compute the union of their sets without revealing individual elements. Compared to PSI, PSU implements a full join for privacy‑preserving databases. Early PSU schemes relied on homomorphic encryption (Kissner 2005) with high overhead. Later works introduced Bloom filters and Paillier encryption (Davidson 2017) to achieve linear complexity, and OT‑based approaches (Kolesnikov 2019) that significantly improve efficiency. Recent OT‑based linear‑complexity protocols further advance PSU performance.
3.3 Construction Method
Chen et al. (2024) proposed a unified construction framework that integrates multiple PSO protocols using a multi‑query Reverse Private Membership Test (mq‑RPMT) combined with OT. The framework supports PSI‑Card, PSI‑Card‑Sum, and PSU by interpreting the RPMT output vector: counting “1” bits yields PSI‑Card, summing elements corresponding to “1” yields PSI‑Card‑Sum, and extracting “0” positions yields PSU.
4. Application Scenarios
Financial risk control : combine intersection with differential privacy to share fraud blacklists between banks and e‑commerce platforms.
Healthcare : enable hospitals and research institutes to perform case data attribution and genomic association analysis while preserving patient privacy.
Digital advertising : use intersection deduplication and frequency control for precise audience matching and cold‑start optimization.
Government services : support cross‑department identity verification and risk prevention via intersection and data differencing.
5. Conclusion
PSI: ECC‑based implementation supporting Curve25519 and FourQ curves.
PCSI: ECDH‑PSI with permutation to hide element positions while counting intersections.
PSU: Proprietary OPRF‑based order‑preserving encryption enabling secure multi‑party union.
These capabilities are integrated in SDH for audience retargeting, cross‑domain consumer insights, cross‑domain measurement, and frequency control, providing advertisers with secure, consistent data‑driven decision making.
Future directions include lightweight protocol design, hardware acceleration, and integration with federated learning, which will further unlock the potential of PSO in AI and beyond.
6. References
Garimella G, Mohassel P, Rosulek M, et al. Private set operations from oblivious switching. IACR International Conference on Public‑Key Cryptography, 2021.
Freedman M J, Nissim K, Pinkas B. Efficient private matching and set intersection. International conference on the theory and applications of cryptographic techniques, 2004.
Vaidya J, Clifton C. Secure set intersection cardinality with application to association rule mining. Journal of Computer Security, 2005.
Kissner L, Song D. Privacy‑preserving set operations. Advances in Cryptology‑CRYPTO 2005.
Chen Y, Zhang M, Zhang C, et al. Private set operations from multi‑query reverse private membership test. IACR International Conference on Public‑Key Cryptography, 2024.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
