Galileo: An Open‑Source Scalable Graph Deep Learning Framework for Industrial‑Scale Applications
Galileo is an open‑source, distributed graph deep‑learning framework that supports ultra‑large heterogeneous graphs, dual TensorFlow/PyTorch back‑ends, and a flexible API, enabling fast prototyping of graph neural networks such as HeteSAGE for real‑world recommendation and other AI scenarios.
Galileo (https://github.com/JDGalileo/galileo) is an open‑source graph deep‑learning framework released by JD Retail, designed to handle ultra‑large heterogeneous graphs with high performance and easy extensibility. It supports both TensorFlow and PyTorch back‑ends as well as distributed training, and provides graph embedding and GNN capabilities.
The framework’s architecture consists of three layers: a distributed graph engine with compact in‑memory structures and ZeroCopy for low‑memory graph storage and fast sampling; a multi‑backend distributed training layer that abstracts TensorFlow and PyTorch, offering configurable single‑machine and multi‑machine training; and a model layer that decouples data from models, supports message‑passing APIs, and allows direct Python access to training back‑ends.
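To make the idea of a compact in‑memory structure concrete, here is a minimal sketch of a CSR‑style adjacency layout with neighbor sampling. It is not Galileo's graph engine: the class CSRGraph and its methods are hypothetical names chosen for illustration, and the memoryview slice only illustrates reading neighbors without copying them, which is an assumption about what a no‑copy read path might look like rather than a description of Galileo's ZeroCopy mechanism.

```python
import random
from array import array


class CSRGraph:
    """Hypothetical compact adjacency storage: two flat integer arrays."""

    def __init__(self, num_vertices, edges):
        # Count the out-degree of every vertex, then prefix-sum into offsets.
        degree = [0] * num_vertices
        for src, _ in edges:
            degree[src] += 1
        self.offsets = array("q", [0] * (num_vertices + 1))
        for v in range(num_vertices):
            self.offsets[v + 1] = self.offsets[v] + degree[v]
        # Fill the flat neighbor array using one write cursor per vertex.
        self.neighbors = array("q", [0] * len(edges))
        cursor = list(self.offsets[:num_vertices])
        for src, dst in edges:
            self.neighbors[cursor[src]] = dst
            cursor[src] += 1

    def neighbors_of(self, v):
        # A memoryview slice exposes the neighbors without copying them.
        start, end = self.offsets[v], self.offsets[v + 1]
        return memoryview(self.neighbors)[start:end]

    def sample_neighbors(self, v, k, rng=random):
        # Sample k neighbors with replacement; isolated vertices yield [].
        view = self.neighbors_of(v)
        if len(view) == 0:
            return []
        return [view[rng.randrange(len(view))] for _ in range(k)]


if __name__ == "__main__":
    graph = CSRGraph(4, [(0, 1), (0, 2), (1, 2), (2, 3)])
    print(list(graph.neighbors_of(0)))
    print(graph.sample_neighbors(0, 3, random.Random(7)))
```

Storing all neighbors in one flat array keeps per‑vertex overhead to a single offset, which is the usual reason a layout of this kind is chosen for very large graphs.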
Quick start instructions include three installation options: pip/conda packages, source compilation, or the recommended Docker image (jdgalileo/galileo). After launching the container with docker run -it --rm jdgalileo/galileo:latest bash, users can run example scripts such as the Node2vec model via bash examples/start_zk.sh and python3 examples/tf/node2vec/simple.py.
Galileo’s runtime workflow loads graph data (converted to a binary format by galileo_convertor), starts the RPC graph engine, registers services with Zookeeper, and lets TensorFlow/PyTorch datasets query and sample the graph for mini‑batch training. The system supports multi‑threaded prefetching and outputs model checkpoints or embeddings.
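The multi‑threaded prefetching step can be sketched in plain Python as follows. This is an illustration of the general pattern only, not Galileo's dataset API: the function prefetching_batches and its parameters are hypothetical, and sample_fn stands in for whatever call would query the remote graph engine.

```python
import queue
import threading


def prefetching_batches(vertex_ids, batch_size, sample_fn, num_workers=2, capacity=4):
    """Yield (batch, samples) pairs while worker threads prepare later batches."""
    batches = [vertex_ids[i:i + batch_size]
               for i in range(0, len(vertex_ids), batch_size)]
    tasks = queue.Queue()
    for index, batch in enumerate(batches):
        tasks.put((index, batch))
    # Bounded queue: workers block on put() once `capacity` results are waiting.
    results = queue.Queue(maxsize=capacity)

    def worker():
        while True:
            try:
                index, batch = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            # sample_fn is a placeholder for a graph-engine sampling query.
            results.put((index, batch, [sample_fn(v) for v in batch]))

    threads = [threading.Thread(target=worker, daemon=True)
               for _ in range(num_workers)]
    for thread in threads:
        thread.start()

    # Buffer out-of-order results so batches are yielded in submission order.
    pending = {}
    for expected in range(len(batches)):
        while expected not in pending:
            index, batch, samples = results.get()
            pending[index] = (batch, samples)
        yield pending.pop(expected)


if __name__ == "__main__":
    adjacency = {0: [1, 2], 1: [2], 2: [0], 3: [0, 1], 4: [3]}
    for batch, samples in prefetching_batches(list(adjacency), 2, lambda v: adjacency[v]):
        print(batch, samples)
```

The training loop consumes one batch while the workers are already sampling the next ones, which hides sampling latency behind computation.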
For recommendation scenarios, the article presents a heterogeneous graph model called HeteSAGE, which encodes multiple meta‑paths and both sparse and dense features, aggregates them through multi‑layer graph convolutions, and fuses the results with self‑attention to produce vertex embeddings. Pseudo‑code for HeteSAGE is illustrated, detailing vertex type mapping, neighbor sampling, feature encoding, graph convolution, and attention‑based fusion.
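The steps listed above (meta‑path sampling, aggregation, attention‑based fusion) can be sketched in plain Python. This is not the article's pseudo‑code and not Galileo's implementation: every function name here is hypothetical, features are assumed to be already encoded into dense vectors of equal length, a simple mean replaces the learned graph‑convolution layers, and the self‑attention has no learned projections.

```python
import math
import random


def mean_vectors(vectors):
    # Element-wise mean of equally sized vectors.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def sample_metapath(adj, start, metapath, fanout, rng):
    """Follow one meta-path (a sequence of edge types), sampling `fanout` neighbors per hop."""
    frontier = [start]
    for edge_type in metapath:
        next_frontier = []
        for v in frontier:
            candidates = adj.get((v, edge_type), [])
            if candidates:
                next_frontier.extend(rng.choice(candidates) for _ in range(fanout))
        frontier = next_frontier
    return frontier


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_fuse(embeddings):
    """Scaled dot-product self-attention over meta-path embeddings, then mean pooling."""
    dim = len(embeddings[0])
    attended = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        attended.append([sum(w * e[i] for w, e in zip(weights, embeddings))
                         for i in range(dim)])
    return mean_vectors(attended)


def hetesage_style_embedding(adj, features, vertex, metapaths, fanout=2, seed=0):
    rng = random.Random(seed)
    per_path = []
    for metapath in metapaths:
        reached = sample_metapath(adj, vertex, metapath, fanout, rng)
        # Fall back to the vertex's own features if the meta-path reaches nothing.
        neighborhood = (mean_vectors([features[u] for u in reached])
                        if reached else features[vertex])
        # Combine self and neighborhood information (mean instead of a learned layer).
        per_path.append(mean_vectors([features[vertex], neighborhood]))
    return self_attention_fuse(per_path)


if __name__ == "__main__":
    # Toy user-item graph; keys are (vertex, edge type) pairs.
    adj = {("u1", "buys"): ["i1", "i2"], ("i1", "bought_by"): ["u1", "u2"],
           ("i2", "bought_by"): ["u1"], ("u1", "views"): ["i2"]}
    features = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "i1": [0.5, 0.5], "i2": [1.0, 1.0]}
    print(hetesage_style_embedding(adj, features, "u1",
                                   [["buys"], ["buys", "bought_by"], ["views"]]))
```

Each meta‑path yields one candidate embedding for the vertex, and the attention step lets the paths weight one another before they are pooled into a single vector.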
Future plans for Galileo include building a unified platform that supports graph creation, storage, query, computation, and usage; adding real‑time and dynamic graph processing; and continuously integrating cutting‑edge GNN models while exploring new hardware accelerators.
The Galileo team belongs to JD Retail’s Data Intelligence R&D department and comprises experienced architects and engineers from leading internet companies and top universities. The team welcomes contributions and new talent.
JD Retail Technology
Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.