Galileo: An Open‑Source Scalable Graph Deep Learning Framework for Industrial‑Scale Applications

Galileo is an open‑source, distributed graph deep‑learning framework that supports ultra‑large heterogeneous graphs, dual TensorFlow/PyTorch back‑ends, and a flexible API, enabling fast prototyping of graph neural networks such as HeteSAGE for real‑world recommendation and other AI scenarios.

JD Retail Technology

Jan 24, 2022

Galileo (https://github.com/JDGalileo/galileo) is an open‑source graph deep‑learning framework released by JD Retail, designed to handle ultra‑large heterogeneous graphs with high performance and easy extensibility. It supports both TensorFlow and PyTorch back‑ends, distributed training, and provides graph embedding and GNN capabilities.

The framework’s architecture consists of three layers: a distributed graph engine with compact in‑memory structures and ZeroCopy for low‑memory graph storage and fast sampling; a multi‑backend distributed training layer that abstracts TensorFlow and PyTorch, offering configurable single‑machine and multi‑machine training; and a model layer that decouples data from models, supports message‑passing APIs, and allows direct Python access to training back‑ends.
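To make the idea of compact in‑memory storage with fast sampling concrete, here is a minimal sketch using a compressed sparse row (CSR) layout in NumPy. It is not Galileo's actual data structure or API: the class name CSRGraph, its methods, and the self‑loop padding rule are assumptions made for illustration. The slice‑based neighbor lookup returns a view rather than a copy, which is only loosely analogous to the ZeroCopy mechanism the article mentions.

```python
import numpy as np


class CSRGraph:
    """Hypothetical compact adjacency store: two flat arrays, no per-vertex lists."""

    def __init__(self, num_vertices, edges):
        # edges: iterable of (src, dst) pairs with ids in [0, num_vertices)
        edges = np.asarray(edges, dtype=np.int64).reshape(-1, 2)
        order = np.argsort(edges[:, 0], kind="stable")
        src, dst = edges[order, 0], edges[order, 1]
        # indptr[v]:indptr[v + 1] delimits the neighbors of v inside `indices`
        counts = np.bincount(src, minlength=num_vertices)
        self.indptr = np.concatenate(([0], np.cumsum(counts)))
        self.indices = dst

    def neighbors(self, v):
        # Basic slicing returns a view, so the neighbor list is not copied.
        return self.indices[self.indptr[v]:self.indptr[v + 1]]

    def sample_neighbors(self, v, k, rng):
        """Draw k neighbors of v uniformly, with replacement."""
        nbrs = self.neighbors(v)
        if nbrs.size == 0:
            # Assumption: isolated vertices are padded with themselves.
            return np.full(k, v, dtype=np.int64)
        return rng.choice(nbrs, size=k, replace=True)
```

With a graph built as CSRGraph(4, [(0, 1), (0, 2), (2, 3), (1, 0)]), calling sample_neighbors(0, 5, np.random.default_rng(0)) returns five ids drawn from vertex 0's neighbors. Storing all edges in two arrays keeps memory proportional to the edge count and makes each lookup a constant‑time slice.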

The quick start offers three installation options: pip/conda packages, compilation from source, or the recommended Docker image (jdgalileo/galileo). After launching the container with docker run -it --rm jdgalileo/galileo:latest bash, users can try an example such as the Node2vec model by running bash examples/start_zk.sh followed by python3 examples/tf/node2vec/simple.py.

At runtime, Galileo loads graph data (converted to a binary format by galileo_convertor), starts the RPC graph engine, and registers its services with ZooKeeper. TensorFlow/PyTorch datasets then query and sample the graph to build mini‑batches for training. The system supports multi‑threaded prefetching, and training outputs model checkpoints or embeddings.
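The multi‑threaded prefetching step can be sketched with the Python standard library alone. This is an illustration of the general pattern, not Galileo's implementation: the function prefetch, its parameters, and the batch_fn callable (which stands in for a call that samples the graph service) are all hypothetical.

```python
import queue
import threading


def prefetch(batch_fn, num_batches, num_workers=2, capacity=4):
    """Yield (index, batch) pairs built by background threads.

    batch_fn(i) is a hypothetical callable that builds batch i, for example
    by sampling a remote graph service. Batches may arrive out of order.
    """
    tasks = queue.Queue()
    for i in range(num_batches):
        tasks.put(i)
    # Bounded queue: workers block once the trainer falls `capacity` batches behind.
    results = queue.Queue(maxsize=capacity)

    def worker():
        while True:
            try:
                i = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            try:
                item = (i, batch_fn(i), None)
            except Exception as exc:  # hand the error to the consumer
                item = (i, None, exc)
            results.put(item)

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for _ in range(num_batches):
        i, batch, exc = results.get()
        if exc is not None:
            raise exc
        yield i, batch
    for t in threads:
        t.join()
```

A training loop would iterate over prefetch(make_batch, n), where make_batch is whatever function samples and packs one mini‑batch. While the model consumes one batch, the workers are already building the next ones, which hides sampling latency behind computation.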

For recommendation scenarios, the article presents a heterogeneous graph model called HeteSAGE, which encodes multiple meta‑paths and both sparse and dense features, aggregates them through multi‑layer graph convolutions, and fuses the results with self‑attention to produce vertex embeddings. The original article illustrates HeteSAGE with pseudo‑code covering vertex type mapping, neighbor sampling, feature encoding, graph convolution, and attention‑based fusion.
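The forward pass described above can be approximated in a few lines of NumPy. This is a simplified sketch under stated assumptions, not the authors' HeteSAGE code: it uses a single mean‑aggregator convolution per meta‑path instead of multiple layers, assumes neighbors have already been sampled and that all vertex types are encoded to the same feature width, and mean‑pools the attention output. All function names, parameter layouts, and the meta‑path labels path_a and path_b are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def encode(sparse_ids, dense, table):
    """Feature encoding: embed sparse ids via a lookup table, then append dense features."""
    return np.concatenate([table[sparse_ids], dense], axis=-1)


def sage_layer(self_feat, nbr_feat, w_self, w_nbr):
    """One mean-aggregator convolution. self_feat: (B, D), nbr_feat: (B, K, D)."""
    agg = nbr_feat.mean(axis=1)  # average the K sampled neighbors
    return np.maximum(self_feat @ w_self + agg @ w_nbr, 0.0)  # ReLU


def attention_fuse(path_embs, w_q, w_k, w_v):
    """Scaled dot-product self-attention over meta-paths. path_embs: (B, P, H)."""
    q, k, v = path_embs @ w_q, path_embs @ w_k, path_embs @ w_v
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])  # (B, P, P)
    return (softmax(scores) @ v).mean(axis=1)  # pool over meta-paths -> (B, H)


def hetesage_forward(self_feat, nbr_feats_by_path, params):
    """Convolve each meta-path separately, then fuse the results with attention."""
    path_embs = [
        sage_layer(self_feat, nbr, *params["conv"][name])
        for name, nbr in nbr_feats_by_path.items()
    ]
    return attention_fuse(np.stack(path_embs, axis=1), *params["attn"])


# Toy run with random inputs: batch of 4 vertices, 3 sampled neighbors per meta-path.
rng = np.random.default_rng(0)
B, K, V, De, Dd, H = 4, 3, 10, 6, 2, 5
table = rng.normal(size=(V, De))
self_feat = encode(rng.integers(0, V, size=B), rng.normal(size=(B, Dd)), table)
nbrs = {
    name: encode(rng.integers(0, V, size=(B, K)), rng.normal(size=(B, K, Dd)), table)
    for name in ("path_a", "path_b")
}
params = {
    "conv": {n: (rng.normal(size=(De + Dd, H)), rng.normal(size=(De + Dd, H))) for n in nbrs},
    "attn": tuple(rng.normal(size=(H, H)) for _ in range(3)),
}
out = hetesage_forward(self_feat, nbrs, params)  # one H-dimensional embedding per vertex
```

Keeping a separate weight pair per meta‑path lets each relation type learn its own aggregation, while the attention step decides how much each meta‑path contributes to the final embedding. A trainable version would express the same steps in TensorFlow or PyTorch.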

Future plans for Galileo include building a unified platform that supports graph creation, storage, query, computation, and usage; adding real‑time and dynamic graph processing; and continuously integrating cutting‑edge GNN models while exploring new hardware accelerators.

The Galileo team is part of JD Retail’s Data Intelligence R&D department and comprises experienced architects and engineers from leading internet companies and top universities. The team welcomes contributions and new talent.
