How TensorNet Supercharges Sparse Feature Training on TensorFlow
TensorNet is a TensorFlow‑based distributed training framework optimized for models with massive sparse features in advertising and recommendation. It dramatically reduces parameter‑synchronization overhead, supports near‑infinite feature dimensions, cut training time from hours to minutes in a production workload, and boosts inference performance by up to 35%.
What is TensorNet?
TensorNet is a distributed training framework built on TensorFlow, optimized for large‑scale sparse‑feature scenarios such as advertising recommendation. Its goal is to let TensorFlow users quickly train models with billions of sparse parameters.
Challenges of Training Large Sparse Feature Models
In advertising, search, and recommendation, deep models contain massive high‑dimensional discrete sparse features, leading to two main problems:
Huge training data (e.g., over 100 TB in a 360 advertising scenario).
Enormous model parameters (e.g., over 100 billion parameters).
Single‑machine training is slow; distributed training has become the industry standard.
Problems Using TensorFlow for Sparse Feature Models
TensorFlow, while popular, is not friendly to large sparse models because:
The supported feature dimension is limited by single‑machine memory.
Distributed training synchronizes all parameters, causing huge communication overhead for sparse models.
TensorNet Overview
TensorNet reuses all TensorFlow capabilities while adding specific support for massive sparse features.
Key improvements:
Enables near‑infinite sparse feature dimensions.
Reduces the volume of synchronized parameters to between one ten‑thousandth and one hundred‑thousandth of the original, cutting training time from 3.5 hours to 25 minutes in a real 360 ad workload.
When combined with split‑graph inference, yields about 35 % online performance gain.
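A quick back‑of‑envelope calculation shows why the reduction is so large. The numbers below are illustrative assumptions (embedding width, unique IDs per batch), not measurements from the 360 workload; only the ~100 billion total parameter count comes from the article:

```python
# Estimate the sync-traffic reduction from synchronizing only the
# sparse features that appear in a batch, instead of the full table.
# EMBEDDING_DIM and UNIQUE_IDS_PER_BATCH are assumed values.

EMBEDDING_DIM = 8              # floats per sparse feature (assumed)
TOTAL_SPARSE_PARAMS = 100e9    # ~100 billion parameters (from the article)
UNIQUE_IDS_PER_BATCH = 100_000 # a batch touches a tiny slice (assumed)

full_sync = TOTAL_SPARSE_PARAMS                    # naive: sync everything
batch_sync = UNIQUE_IDS_PER_BATCH * EMBEDDING_DIM  # only touched rows

reduction = full_sync / batch_sync
print(f"synced parameters shrink by ~1/{reduction:,.0f}")
```

With these assumptions the traffic shrinks by roughly a factor of 10⁵, consistent with the range quoted above.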
TensorNet Distributed Training Architecture
Supports both asynchronous and synchronous modes.
Asynchronous Architecture
In CPU‑only clusters, TensorNet uses separate parameter servers for sparse and dense parameters. A sparse parameter server is embedded within each worker, and sparse parameters are spread across workers via a distributed hash table. Dense parameters are merged into a single distributed array, which reduces the number of network requests.
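The distributed‑hash‑table idea can be sketched as follows. This is an illustration of the concept, not TensorNet's actual code; the shard count, hashing scheme, and lazy initialization are all assumptions:

```python
# Sketch: shard embedding rows across workers by hashing the feature ID.
import numpy as np

NUM_SHARDS = 4
EMBEDDING_DIM = 8

# Each worker hosts one shard: a plain dict from feature ID -> vector.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_of(feature_id: int) -> int:
    """Route a feature ID to its owning shard by hash."""
    return feature_id % NUM_SHARDS

def lookup(feature_id: int) -> np.ndarray:
    """Fetch (or lazily create) the embedding row for an ID."""
    table = shards[shard_of(feature_id)]
    if feature_id not in table:
        # New IDs are initialized on first touch, so the feature
        # space is effectively unbounded ("near-infinite").
        table[feature_id] = np.random.default_rng(feature_id).normal(
            scale=0.01, size=EMBEDDING_DIM)
    return table[feature_id]

vec = lookup(123456789)
print(vec.shape)  # (8,)
```

Because rows are created on demand, no single machine ever has to hold a full embedding matrix over the entire ID space.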
Synchronous Architecture
Similar to TensorFlow’s MultiWorkerMirroredStrategy, but with a dedicated sparse parameter server and synchronization only for the sparse features present in the current batch, reducing communication to between one ten‑thousandth and one hundred‑thousandth of the original.
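Batch‑restricted synchronization can be illustrated with a small pure‑Python aggregation: workers exchange gradients only for the feature IDs their batches actually touched. The shapes and values below are assumptions; TensorNet's real implementation runs the equivalent inside its sparse parameter server:

```python
# Sketch: merge sparse gradients per feature ID across workers,
# instead of allreducing the full embedding table.
import numpy as np

EMBEDDING_DIM = 4

# Per-worker sparse gradients: {feature_id: gradient_vector}.
worker_grads = [
    {7: np.ones(EMBEDDING_DIM), 42: np.full(EMBEDDING_DIM, 2.0)},
    {42: np.full(EMBEDDING_DIM, 3.0), 99: np.ones(EMBEDDING_DIM)},
]

def sparse_allreduce(grads_per_worker):
    """Average gradients per feature ID over all workers."""
    merged = {}
    for grads in grads_per_worker:
        for fid, g in grads.items():
            merged[fid] = merged.get(fid, 0) + g
    n = len(grads_per_worker)
    return {fid: g / n for fid, g in merged.items()}

reduced = sparse_allreduce(worker_grads)
print(sorted(reduced))  # only the 3 IDs seen this step: [7, 42, 99]
print(reduced[42][0])   # (2.0 + 3.0) / 2 workers = 2.5
```

Only three rows are exchanged here, regardless of how many billions of rows the full table contains.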
Core Optimizations
The main optimization is minimizing the embedding tensor size. Instead of a gigantic embedding matrix covering all possible IDs, TensorNet builds a small embedding matrix sized to the batch, using a virtual sparse feature to map IDs to indices.
During training, the embedding_lookup workflow is:
Define the embedding matrix dimension as the maximum number of unique IDs in a batch.
Collect all IDs in the current batch.
Sort IDs and assign a continuous index (virtual sparse feature).
Fetch embedding vectors from the parameter server and place them into the batch‑sized matrix.
Use the virtual sparse feature as model input.
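The steps above can be sketched with NumPy as a stand‑in for the real lookup path. The per‑ID initializer mocks the parameter‑server fetch; everything else follows the workflow as described:

```python
# Sketch of the "virtual sparse feature" trick: size the embedding
# matrix to the batch and remap raw feature IDs to compact row indices.
import numpy as np

EMBEDDING_DIM = 4
raw_ids = np.array([900000007, 12, 900000007, 55])  # IDs in this batch

# Steps 1-3: collect, sort, and deduplicate IDs. np.unique also
# returns, for every raw ID, its index into the sorted unique array --
# that index IS the virtual sparse feature.
unique_ids, virtual_ids = np.unique(raw_ids, return_inverse=True)

# Step 4: fetch only the touched rows from the parameter server
# (mocked by a deterministic per-ID initializer) into a small matrix.
batch_embeddings = np.stack([
    np.random.default_rng(int(fid)).normal(scale=0.01, size=EMBEDDING_DIM)
    for fid in unique_ids
])

# Step 5: the model consumes virtual_ids, so a standard embedding
# lookup over the small batch-sized matrix works unchanged.
vectors = batch_embeddings[virtual_ids]
print(batch_embeddings.shape)  # (3, 4): 3 unique IDs in the batch
print(vectors.shape)           # (4, 4): one row per input position
```

The matrix holds only three rows here, no matter how large the raw ID (900000007) is, which is exactly why the feature dimension becomes effectively unbounded.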
Inference Optimization
TensorNet changes only the first layer, so inference remains simple. The model is split into an offline training part (embedding_lookup_graph) and an online inference part (inference_graph) that consumes a pre‑exported sparse embedding dictionary.
Using split‑graph together with XLA AOT can improve online performance by about 35 %.
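The online half of the split can be sketched as a dictionary read plus the dense layers. The export format, the zero‑vector fallback for unseen IDs, and the tiny dense head are all assumptions for illustration, not TensorNet's actual serving API:

```python
# Sketch of split-graph serving: the offline embedding_lookup_graph
# exports a {feature_id: vector} dictionary; the online inference
# path only does dict lookups plus the dense computation.
import numpy as np

EMBEDDING_DIM = 4

# Pre-exported by the offline graph (mocked with fixed vectors).
exported_embeddings = {
    12: np.arange(EMBEDDING_DIM, dtype=float),
    55: np.ones(EMBEDDING_DIM),
}
DEFAULT = np.zeros(EMBEDDING_DIM)  # unseen IDs fall back to zeros (assumed)

def serve(feature_ids, weights, bias):
    """Online path: dictionary lookups, then a tiny dense head."""
    x = np.concatenate([exported_embeddings.get(fid, DEFAULT)
                        for fid in feature_ids])
    return float(x @ weights + bias)

w = np.ones(2 * EMBEDDING_DIM)
score = serve([12, 55], w, bias=0.5)
print(score)  # (0+1+2+3) + (1+1+1+1) + 0.5 = 10.5
```

Keeping the online graph free of the huge embedding table is what makes it small enough to compile with XLA AOT.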
Open Source and Getting Started
TensorNet is open‑source and has been deployed in 360’s ad CTR prediction pipelines with significant results. The code, documentation, and tutorials are available at:
GitHub repository: https://github.com/Qihoo360/TensorNet
Quick start tutorial: https://github.com/Qihoo360/TensorNet/doc/tutorial/01-begin-with-wide-deep.ipynb
Additional docs: https://github.com/Qihoo360/TensorNet/README.md
Contact: Zhang Yansheng ([email protected]), Yao Lei ([email protected]).
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.