NetShare: An End-to-End System for GAN-Based IP Header Trace Packet Generation
This article presents NetShare, an end-to-end framework that uses time‑series GANs combined with domain‑specific encoding to synthesize privacy‑preserving IP header and flow traces, achieving up to 46% higher accuracy than prior generative baselines while improving the fidelity‑privacy trade‑off.
Preface
Packet‑ and flow‑level IP header tracing is crucial for many network‑management workflows, such as telemetry, anomaly detection, and benchmark testing. Direct access to real traces is often blocked by privacy constraints, so researchers resort to synthetic traces.
Research Background
Existing synthetic‑trace approaches fall into three categories: simulation‑driven, model‑driven, and machine‑learning‑driven. Simulation and model‑driven methods require extensive domain knowledge and manual parameter tuning and do not generalize well across applications. Pure ML methods generalize better but fail to capture domain‑specific attributes of network headers.
NetShare Design
1. Time‑Series Formulation – NetShare reframes header‑trace generation as a time‑series generation problem. For PCAP data each sequence element (packet) contains a timestamp, packet size, and IP header fields; for NetFlow each element contains flow start time, duration, packet/byte counts, and protocol type. A time‑series GAN models these sequences.
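To make the time-series framing concrete, here is a minimal sketch of how a flow's packets might be arranged into a sequence that a time-series model can consume. The class and field names are illustrative assumptions, not NetShare's actual data structures; only a few representative header fields are shown.

```python
from dataclasses import dataclass, field
from typing import List, Tuple
import numpy as np

@dataclass
class PacketRecord:
    """One element of a flow's time series (PCAP-style fields)."""
    timestamp: float  # seconds since trace start
    size: int         # packet length in bytes
    ttl: int          # an example IP header field

@dataclass
class Flow:
    """A flow keyed by its five-tuple; its packets form the time series."""
    five_tuple: Tuple[str, str, int, int, str]
    packets: List[PacketRecord] = field(default_factory=list)

    def to_array(self) -> np.ndarray:
        """Stack per-packet features into a (T, F) matrix for a sequence model."""
        return np.array([[p.timestamp, p.size, p.ttl] for p in self.packets])

flow = Flow(("10.0.0.1", "10.0.0.2", 443, 51000, "TCP"))
flow.packets.append(PacketRecord(0.000, 1500, 64))
flow.packets.append(PacketRecord(0.012, 40, 64))
print(flow.to_array().shape)  # (2, 3)
```

A NetFlow variant would simply swap the per-packet fields for per-flow ones (start time, duration, packet/byte counts, protocol).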
2. Domain Knowledge + ML Encoding – Instead of training the GAN on raw headers, NetShare transforms fields into GAN‑friendly representations. Numeric fields (e.g., packet/byte counts) undergo logarithmic scaling. Categorical fields such as IP addresses are bit‑wise encoded, while ports and protocols are embedded with IP2Vec. A qualitative analysis table (shown in the paper) compares these embeddings in terms of fidelity, scalability, and privacy.
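The log-scaling and bit-wise IP encoding steps can be sketched as below. This is a simplified illustration of the two transformations described above, with hypothetical function names; the IP2Vec embedding of ports and protocols (a learned embedding, not shown here) would replace one-hot or raw categorical inputs.

```python
import math

def log_scale(x: int) -> float:
    """Compress heavy-tailed numeric fields (packet/byte counts) with log(1 + x)."""
    return math.log1p(x)

def ip_to_bits(ip: str) -> list:
    """Bit-wise encoding: one IPv4 address becomes 32 binary features."""
    octets = [int(o) for o in ip.split(".")]
    value = (octets[0] << 24) | (octets[1] << 16) | (octets[2] << 8) | octets[3]
    return [(value >> (31 - i)) & 1 for i in range(32)]

print(round(log_scale(1023), 2))       # 6.93 (log of 1024)
print(ip_to_bits("192.168.0.1")[:8])   # [1, 1, 0, 0, 0, 0, 0, 0]
```

Bit-level encoding keeps address structure (e.g., shared prefixes) visible to the GAN, while log scaling prevents a few very large counts from dominating the numeric range.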
3. Fine‑Tuning and Parallel Training – NetShare adopts a “hot‑start” strategy: the first GAN block is trained as a seed model, and subsequent blocks are fine‑tuned from this seed, enabling parallel training across blocks. Each flow header receives a “flowtag” – a 0‑1 flag indicating whether the flow originates in the current block, followed by a binary vector whose length equals the total number of blocks, marking the presence of the flow in each block.
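The flowtag described above can be sketched as a small helper. The function name and exact layout are my own reading of the description (origin flag first, then one presence bit per block), not code from the paper.

```python
def make_flowtag(origin_block: int, present_blocks: set,
                 n_blocks: int, current_block: int) -> list:
    """Build a flowtag: [starts-in-this-block flag] + per-block presence bits.

    origin_block:   block index where the flow first appears
    present_blocks: set of block indices that contain packets of this flow
    """
    starts_here = 1 if origin_block == current_block else 0
    presence = [1 if b in present_blocks else 0 for b in range(n_blocks)]
    return [starts_here] + presence

# A flow that starts in block 1 and spans blocks 1-2 of a 4-block split,
# tagged from the perspective of block 1:
print(make_flowtag(1, {1, 2}, 4, 1))  # [1, 0, 1, 1, 0]
```

Because each block's GAN is fine-tuned from the same seed model, the flowtag is what lets independently trained blocks agree on which cross-block flows they share.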
4. Public‑Dataset Pre‑Training for Privacy‑Fidelity Balance – By pre‑training NetShare on relevant public datasets, the number of DP‑SGD (differential‑privacy stochastic gradient descent) rounds needed to reach a target fidelity on private data is reduced. After pre‑training, the model is fine‑tuned on the private dataset with DP‑SGD, further lowering the required iteration count.
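For readers unfamiliar with DP-SGD, a single update round performs per-example gradient clipping and Gaussian noise addition, as sketched below in plain NumPy. This is a generic textbook-style illustration of the mechanism, not NetShare's training code; fewer such rounds are needed when the model starts from a public-data pre-trained checkpoint.

```python
import numpy as np

def dp_sgd_step(per_example_grads: np.ndarray, clip_norm: float,
                noise_multiplier: float, lr: float, params: np.ndarray,
                rng: np.random.Generator) -> np.ndarray:
    """One DP-SGD update: clip each example's gradient, sum, add noise, average.

    per_example_grads has shape (batch, dim); params has shape (dim,).
    """
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale          # each row now has norm <= clip_norm
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / len(per_example_grads)
    return params - lr * noisy_mean

rng = np.random.default_rng(0)
params = np.zeros(3)
grads = np.array([[3.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
params = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.5,
                     lr=0.1, params=params, rng=rng)
print(params.shape)  # (3,)
```

The privacy cost grows with the number of noisy rounds, which is why cutting the required iteration count via pre-training directly improves the fidelity-privacy trade-off.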
System Evaluation
1. Accuracy Improvement – Across all distribution metrics, NetShare is 46% more accurate than baseline generative methods. The baselines cannot generate multiple packets for the same flow because they treat each packet as an independent table record with no timestamp.
2. Downstream Task Preservation – NetShare maintains algorithmic accuracy and ordering for downstream tasks. Log‑scaled training enables the model to learn the wide value range of fields, and IP2Vec embeddings allow accurate capture of port‑number distributions.
3. Scalability‑Fidelity Trade‑off – Compared with existing GAN‑based frameworks, NetShare delivers a superior balance between scalability and fidelity, as demonstrated by the experimental results.
4. Private Trace Quality – NetShare produces higher‑quality differentially private traces than baseline methods, confirming its effectiveness for privacy‑preserving network‑trace synthesis.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Network Intelligence Research Center (NIRC)
NIRC is based on the National Key Laboratory of Network and Switching Technology at Beijing University of Posts and Telecommunications. It has built a technology matrix across four AI domains—intelligent cloud networking, natural language processing, computer vision, and machine learning systems—dedicated to solving real‑world problems, creating top‑tier systems, publishing high‑impact papers, and contributing significantly to the rapid advancement of China's network technology.
