
ScaleOT: Privacy‑Utility‑Scalable Offsite‑Tuning with Dynamic LayerReplace and Selective Rank Compression

The ScaleOT framework introduces a privacy‑preserving offsite‑tuning pipeline for large language models that combines importance‑aware dynamic layer replacement with selective rank compression, enabling flexible model compression, near‑lossless fine‑tuning, and strong privacy guarantees across diverse downstream tasks.


Fine‑tuning large language models (LLMs) for downstream tasks typically requires either uploading private data to the model owner or sharing model weights with the data owner, both of which expose sensitive information and increase attack surfaces. To address this, the authors propose ScaleOT, a cross‑domain offsite‑tuning framework that protects both data and model privacy.

ScaleOT consists of two stages. In the first stage, an importance‑aware dynamic layer‑replace algorithm (Dynamic LayerReplace) estimates the importance of each LLM layer using reinforcement learning and trains lightweight coordinators to replace less important layers. The second stage generates a compressed simulator (emulator) by applying selective rank compression (SRC) to the remaining layers, particularly the multi‑head self‑attention (MHSA) components, while preserving the rest of the model.

Dynamic LayerReplace samples a subset of layers based on learned importance scores, groups layers to avoid instability, and replaces roughly half of the layers in each group with coordinators. Importance scores are updated jointly with coordinator parameters using a combination of deep‑learning gradient updates and reinforcement‑learning rewards.
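The grouped replacement idea can be illustrated with a small sketch. This is a simplified, deterministic stand-in for the paper's RL-driven sampling: the function names, group size, and replacement fraction below are illustrative assumptions, and real coordinators would be trainable lightweight modules rather than index bookkeeping.

```python
def sample_layers_to_replace(importance, group_size=4, replace_frac=0.5):
    """Hypothetical sketch of grouped layer replacement: within each group
    of consecutive layers, mark the least-important ~half for replacement
    by coordinators, mirroring Dynamic LayerReplace's grouping strategy
    for stability. The actual method samples stochastically from learned
    importance scores updated via reinforcement learning."""
    replaced = []
    n = len(importance)
    for start in range(0, n, group_size):
        group = list(range(start, min(start + group_size, n)))
        k = int(len(group) * replace_frac)
        # pick the k lowest-importance layers in this group
        group_sorted = sorted(group, key=lambda i: importance[i])
        replaced.extend(group_sorted[:k])
    return sorted(replaced)

# Toy importance scores for an 8-layer model: layers 1, 2, 4, 6 score
# lowest within their groups and would be handed to coordinators.
importance = [0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6]
print(sample_layers_to_replace(importance))  # [1, 2, 4, 6]
```

Grouping keeps replaced layers spread across the network instead of clustered, which the paper reports is important for training stability.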

Selective Rank Compression exploits the redundancy in LLM parameters by performing low-rank approximation only on MHSA weights. Shrinking the simulator's capacity in this way strengthens privacy protection while keeping the performance drop minimal, and the compression ratio can be tuned to trade off privacy against downstream performance.
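The core operation behind this kind of compression is a truncated SVD of a weight matrix. The sketch below is a generic illustration, not the paper's exact procedure; the matrix size and `keep_ratio` are arbitrary assumptions.

```python
import numpy as np

def low_rank_compress(W, keep_ratio=0.25):
    """Sketch of rank compression via truncated SVD: keep only the top
    singular directions of a weight matrix (e.g., an MHSA projection),
    reducing its effective rank and hence the model's capacity."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    r = max(1, int(len(s) * keep_ratio))
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))       # stand-in for an attention weight
W_c = low_rank_compress(W, keep_ratio=0.25)
print(np.linalg.matrix_rank(W_c))       # 16: only a quarter of the rank remains
```

Applying this selectively to MHSA weights, rather than all weights, is what lets ScaleOT cut capacity sharply while leaving the plugged-in model's accuracy largely intact.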

Extensive experiments on medium‑size models (GPT‑2‑XL, OPT‑1.3B) and larger models (OPT‑6.7B, LLaMA‑7B) demonstrate that ScaleOT achieves near‑full‑model fine‑tuning performance, outperforms baseline offsite‑tuning methods, and substantially improves privacy. Incorporating SRC further degrades the standalone simulator's performance, which is desirable for privacy, while having only a small impact on the final plugged‑in model. The framework also integrates seamlessly with parameter‑efficient fine‑tuning techniques such as Adapter‑tuning and LoRA, reducing trainable parameters by over 90% while maintaining strong downstream results.
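The parameter savings from LoRA-style adaptation follow directly from its low-rank structure. The numbers below are a generic back-of-the-envelope illustration (the dimensions and rank are assumptions, not ScaleOT's reported configuration):

```python
# Generic LoRA parameter count: a frozen d_out x d_in weight is adapted
# by a low-rank update B @ A, so only r * (d_in + d_out) parameters train
# instead of d_in * d_out. Dimensions here are illustrative.
d_out, d_in, r = 1024, 1024, 8
full_params = d_out * d_in
lora_params = r * (d_in + d_out)
print(f"trainable fraction: {lora_params / full_params:.2%}")  # trainable fraction: 1.56%
```

Even modest ranks push the trainable fraction far below 10%, consistent with the >90% reduction the article cites.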

Overall, ScaleOT provides a scalable, flexible, and privacy‑preserving solution for LLM fine‑tuning, and the work has been accepted to AAAI 2025.

LLM, model compression, privacy, reinforcement learning, adapter, offsite tuning
Written by

AntTech

Technology is the core driver of Ant's future.
