Privacy and Reliability in Big Data Collaboration: Trusted Execution Environments and Blockchain Coordination
This article presents a technical overview of the security challenges in multi‑party big‑data collaboration and explains how Trusted Execution Environments (TEE) and blockchain can be combined to protect data privacy, ensure computation integrity, and enable traceable data usage in distributed systems.
The talk, delivered by Dr. Zhang Tao from Tencent, focuses on the privacy and reliability problems that arise when multiple parties collaborate on big‑data analytics, and compares several mainstream TEE hardware solutions.
Security issues in data collaboration
Traditional multi‑party collaboration often exchanges de‑identified data in cleartext. This can still expose sensitive features and may conflict with emerging privacy regulations.
Purely local processing, on the other hand, rules out joint analysis. Insurance risk assessment, for example, needs both financial and health data, which are usually held by different parties.
Privacy‑preserving approaches such as secure multi‑party computation (MPC), federated learning, and homomorphic encryption protect privacy but suffer from high performance overhead and complex protocol design.
Outsourcing computation to an external platform raises its own privacy concerns, because the platform operator may gain access to the data.
Trusted Execution Environment (TEE) overview
TEE offers a practical trade‑off: it protects both data confidentiality and the execution logic, at a modest performance overhead that the talk puts at about 5‑10%. Memory isolation and encryption protect data while it is being processed, and remote attestation lets a remote party verify that the expected, untampered code is running inside a genuine enclave.
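The attestation check can be sketched as follows. This is a simplified model, not how SGX or TrustZone actually implement it: the enclave's "measurement" is taken to be a SHA‑256 hash of its code, and a shared HMAC key (`HARDWARE_KEY`, hypothetical) stands in for the hardware‑rooted signing key and the vendor verification service (such as IAS or DCAP for SGX). The verifier accepts a quote only if it carries a valid signature, echoes the verifier's fresh nonce, and reports the expected measurement.

```python
import hashlib
import hmac

# Hypothetical stand-in for the CPU-rooted attestation key. Real TEEs sign
# quotes with hardware keys verified through the vendor's attestation service;
# an HMAC is used here only to keep the sketch self-contained.
HARDWARE_KEY = b"hypothetical-hardware-root-key"


def measure(enclave_code: bytes) -> str:
    """Measurement = cryptographic hash of the code loaded into the enclave."""
    return hashlib.sha256(enclave_code).hexdigest()


def make_quote(enclave_code: bytes, nonce: bytes) -> dict:
    """Produced inside the TEE: binds the measurement to a verifier-chosen nonce."""
    m = measure(enclave_code)
    sig = hmac.new(HARDWARE_KEY, m.encode() + nonce, hashlib.sha256).hexdigest()
    return {"measurement": m, "nonce": nonce, "signature": sig}


def verify_quote(quote: dict, expected_measurement: str, nonce: bytes) -> bool:
    """Run by the remote party before trusting the enclave with data or keys."""
    expected_sig = hmac.new(
        HARDWARE_KEY, quote["measurement"].encode() + nonce, hashlib.sha256
    ).hexdigest()
    return (
        hmac.compare_digest(quote["signature"], expected_sig)  # genuine "hardware"
        and quote["nonce"] == nonce                             # fresh, not replayed
        and quote["measurement"] == expected_measurement        # untampered code
    )
```

A data owner would compute `expected_measurement` from the code it has audited, send a new nonce with each request, and release data only when `verify_quote` returns `True`.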
TEE hardware options
Intel SGX – mature ecosystem, robust remote attestation (IAS, DCAP), and wide adoption in server‑grade CPUs; SGX2 improves enclave memory and compute resources.
ARM TrustZone – runs trusted applications on a Trusted OS; requires trusting the firmware and offers only limited remote attestation, although Huawei Kunpeng provides a proprietary attestation mechanism.
RISC‑V Keystone/Sanctum – SGX‑like designs with comparable security, but not yet commercially mature.
AMD SEV – isolates entire virtual machines, which eases integration, but has a larger attack surface and known vulnerabilities in remote attestation and I/O.
TEE integration methods
TEE SDK – applications are written directly against the native SDK (SGX, TrustZone); this is invasive for developers but keeps the attack surface minimal.
libOS – a library OS abstracts the TEE‑specific instructions, so applications need little awareness of the underlying hardware.
Virtualization – WebAssembly, Docker, or VM wrappers; suitable for AMD SEV, this approach improves usability at the cost of a larger attack surface.
Distributed computing on TEE
To compute collaboratively, nodes must mutually authenticate, establish encrypted channels, and negotiate shared keys. The design adopts a blockchain‑style key‑sharing scheme, so participants do not need to know which specific hardware they are communicating with.
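The key‑negotiation step can be illustrated with a plain Diffie-Hellman exchange. The talk does not specify the scheme, so this is a generic sketch, not the DataChain protocol. The modulus 2**127 - 1 and generator 3 are toy, insecure parameters chosen for readability. In a real deployment each node's public value would also be bound to its attestation report, so that the exchange is mutually authenticated.

```python
import hashlib
import secrets

# Toy parameters for illustration only: P is the Mersenne prime 2**127 - 1.
# Real systems would use a standardized group (e.g. X25519 or an RFC 3526
# MODP group) and authenticate the exchanged public values.
P = 2**127 - 1
G = 3


def keypair() -> tuple[int, int]:
    """Return (private exponent, public value) for one node."""
    private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
    public = pow(G, private, P)
    return private, public


def shared_key(my_private: int, peer_public: int) -> bytes:
    """Both sides compute G^(a*b) mod P and hash it into a 256-bit session key."""
    secret = pow(peer_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()
```

Each node publishes only its public value; both sides arrive at the same 32‑byte key, which can then protect the encrypted channel.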
For Spark, SGX‑extended instructions replace the standard x86 instructions, so the Driver and Executors run inside enclaves. Remote attestation is extended to the Executors: if an Executor's code version does not match the expected one, key sharing is aborted.
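The rule that a version mismatch aborts key sharing can be sketched as a Driver‑side gate. `KeyDistributor` and `KeySharingAborted` are hypothetical names. The sketch assumes each Executor's measurement has already been extracted from a verified attestation quote. The job key is released only when that measurement matches the expected Executor build.

```python
import hashlib
import secrets


class KeySharingAborted(Exception):
    """Hypothetical error raised when an Executor fails the version check."""


class KeyDistributor:
    """Hypothetical Driver-side component: hands the job's data key only to
    Executors whose attested code measurement matches the expected build."""

    def __init__(self, expected_executor_code: bytes):
        self.expected = hashlib.sha256(expected_executor_code).hexdigest()
        self.data_key = secrets.token_bytes(32)  # key protecting the job's data
        self.admitted: set[str] = set()

    def request_key(self, executor_id: str, attested_measurement: str) -> bytes:
        if attested_measurement != self.expected:
            # Version mismatch: refuse to share the key with this Executor.
            raise KeySharingAborted(f"{executor_id}: measurement mismatch")
        self.admitted.add(executor_id)
        return self.data_key
```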
All input and output data remain encrypted, and field‑level encryption allows individual fields to be exposed selectively.
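Field‑level encryption with selective exposure can be sketched by deriving an independent key for each field from a master key. Handing out one field's key then reveals that field and nothing else. The helper names are hypothetical. The HMAC‑SHA256 counter‑mode keystream is a standard‑library‑only illustration without integrity protection; a production system would use an authenticated cipher such as AES‑GCM.

```python
import hashlib
import hmac
import secrets


def field_key(master_key: bytes, field: str) -> bytes:
    """Derive an independent key per field so fields can be disclosed separately."""
    return hmac.new(master_key, b"field:" + field.encode(), hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode (illustrative only; not authenticated)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt_record(master_key: bytes, record: dict[str, str]) -> dict[str, tuple[bytes, bytes]]:
    """Encrypt every field under its own derived key; returns field -> (nonce, ciphertext)."""
    encrypted = {}
    for field, value in record.items():
        nonce = secrets.token_bytes(16)
        data = value.encode()
        ks = _keystream(field_key(master_key, field), nonce, len(data))
        encrypted[field] = (nonce, bytes(a ^ b for a, b in zip(data, ks)))
    return encrypted


def decrypt_field(key: bytes, nonce: bytes, ciphertext: bytes) -> str:
    """Decrypt one field given only that field's derived key."""
    ks = _keystream(key, nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, ks)).decode()
```

To expose only a credit score, for instance, the data owner would share `field_key(master, "credit_score")` and keep the master key and all other field keys private.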
Blockchain for data usage tracking
While TEE protects privacy and integrity, it cannot provide an immutable audit trail of data usage. A blockchain supplies a tamper‑resistant ledger that records data hashes, signatures, and usage metrics, supporting billing, auditing, and accountability.
In the system, the compute cluster runs inside TEEs, whereas the blockchain operates outside, storing only task states and usage records, thus avoiding exposure of sensitive data.
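This split between TEE computation and on‑chain bookkeeping can be sketched with a minimal hash‑chained ledger. `UsageLedger` is a hypothetical, single‑node stand‑in for a real blockchain such as Hyperledger Fabric. It has no consensus, and the per‑record signatures mentioned above are omitted. Each entry holds only a task state, usage metrics, and the SHA‑256 digest of the data, never the data itself. Altering any past entry breaks the chain, and the per‑party totals could feed billing.

```python
import hashlib
import json
import time


class UsageLedger:
    """Hypothetical append-only, hash-chained log standing in for the blockchain.

    Entries store task state and usage metadata plus a digest of the data,
    so nothing sensitive leaves the TEE.
    """

    GENESIS = "0" * 64

    def __init__(self):
        self.entries: list[dict] = []

    def append(self, task_id: str, state: str, data: bytes, party: str, rows_used: int) -> dict:
        prev = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "task_id": task_id,
            "state": state,
            "data_sha256": hashlib.sha256(data).hexdigest(),  # digest only
            "party": party,
            "rows_used": rows_used,  # usage metric, e.g. for billing
            "timestamp": time.time(),
            "prev": prev,  # link to the previous entry
        }
        body["hash"] = self._digest(body)
        self.entries.append(body)
        return body

    @staticmethod
    def _digest(body: dict) -> str:
        """Hash every field except the entry's own hash, in a canonical order."""
        payload = {k: v for k, v in body.items() if k != "hash"}
        return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()

    def verify(self) -> bool:
        """Recompute the chain; any edited entry breaks a hash or a link."""
        prev = self.GENESIS
        for e in self.entries:
            if e["prev"] != prev or e["hash"] != self._digest(e):
                return False
            prev = e["hash"]
        return True

    def usage_by_party(self) -> dict[str, int]:
        """Aggregate usage per party for billing or auditing."""
        totals: dict[str, int] = {}
        for e in self.entries:
            totals[e["party"]] = totals.get(e["party"], 0) + e["rows_used"]
        return totals
```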
Application in Tencent Cloud DataChain
The described architecture is implemented in Tencent Cloud’s DataChain product, leveraging Intel SGX and ARM TrustZone as enclave hardware and supporting Hyperledger Fabric, FISCO BCOS, or Chang’an Chain as the blockchain layer. Distributed workloads such as TensorFlow and Spark are orchestrated via Kubernetes, with key management and synchronization across nodes.
Typical use cases include government data sharing, financial credit scoring, and risk control, where privacy‑preserving, trustworthy, and auditable collaboration is essential.
Thank you for listening.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.