Interview with Huawei Noah's Ark Lab Senior Researcher Zhou Min on Graph Machine Learning: Research, Deployment, Challenges, and Trends
In this DataFun interview, Huawei Noah's Ark Lab senior researcher Zhou Min discusses the state of graph machine learning in academia and industry, covering algorithmic foundations, model variants, practical applications, scalability challenges, and future directions for more universal feature extraction across domains.
Introduction – This article is a DataFun interview with Zhou Min, a senior researcher at Huawei Noah’s Ark Lab, exploring graph machine learning research, deployment status, challenges, trends, and differing focuses between academia and industry.
1. Algorithms – Graph data includes homogeneous, heterogeneous, and dynamic graphs; heterogeneous graphs are common in recommendation and search, while dynamic graphs appear in social networks. Transductive models (e.g., Graph Convolutional Networks) and inductive models (e.g., Graph Attention Networks) each have trade‑offs: transductive models are trained on a single fixed graph, whereas inductive models can generalize to unseen nodes and are therefore favored for scalability. Cutting‑edge research includes Graph Transformers with local attention and positional encoding, and equivariant GNNs that respect the natural symmetries of the data, such as the spatial structure of proteins.
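To make the Graph Convolutional Network mentioned above concrete, here is a minimal sketch of one GCN-style layer in plain Python. It assumes the widely used symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 X W); the interview does not spell out a formula, and the names `matmul`, `gcn_layer`, and the toy graph are illustrative assumptions.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gcn_layer(adj, features, weight):
    """One GCN-style layer: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization by the square roots of both endpoint degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features, apply the linear map, then ReLU.
    hidden = matmul(matmul(norm, features), weight)
    return [[max(0.0, x) for x in row] for row in hidden]

# Toy graph (assumed for illustration): a path 0 - 1 - 2,
# one scalar feature per node, projected to two output channels.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0], [2.0], [3.0]]
weight = [[1.0, -1.0]]
out = gcn_layer(adj, features, weight)
```

Because the layer multiplies by the full normalized adjacency matrix, it needs the whole graph at once, which is one reason this formulation is usually treated as transductive.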
In academic research, unsupervised methods like contrastive learning and large‑scale graph pre‑training are hot topics, especially amid the rise of large models.
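As a rough illustration of contrastive learning, the sketch below computes an InfoNCE-style loss for one anchor embedding. The interview does not name a specific loss; InfoNCE, the function name `info_nce`, the temperature value, and the toy vectors are all assumptions made for illustration. In graph contrastive learning, the anchor and positive would typically be embeddings of the same node under two augmented views of the graph.

```python
import math

def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def info_nce(anchor, positive, negatives, temperature=0.5):
    """InfoNCE-style loss: -log softmax score of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]

# The loss is small when the positive is aligned with the anchor ...
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# ... and large when a negative is closer to the anchor than the positive.
misaligned = info_nce([1.0, 0.0], [-1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing this loss pulls the two views of a node together and pushes other nodes away, without requiring any labels.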
Foundational AI concerns for graphs include model compression, causal inference, interpretability, robustness, privacy protection, and algorithmic bias correction. Model compression receives modest attention; causal inference is being combined with graph learning; interpretability work is scattered; robustness (e.g., out‑of‑distribution generalization), bias correction, and privacy (e.g., federated graph learning) have gained significant interest recently.
2. Applications – Graph machine learning serves basic tasks (node classification, link prediction, graph classification, clustering, information completion) with node classification, link prediction, and graph classification being the most used. Business scenarios include recommendation/search (facing offline‑online performance gaps and scalability), risk control (requiring extensive feature engineering and handling imbalanced anomalies), and emerging fields such as life sciences and physical simulation. Academic datasets are relatively small, whereas industry deals with billions of nodes, raising scalability and computational cost challenges.
Engineering practice relies on frameworks: AWS DGL is popular in industry, while PyTorch Geometric dominates academia; specialized frameworks exist for proteins, molecules, risk control, and heterogeneous graphs.
Large‑scale graph processing faces communication overhead, acceleration, accuracy, and distributed processing challenges, with task‑specific performance metrics.
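One common way to keep training cost bounded on very large graphs is neighbor sampling, where each mini-batch only touches a capped number of neighbors per hop. The interview does not name this technique, so the sketch below is an illustrative assumption rather than the speaker's method; `sample_neighborhood`, `fanouts`, and the star graph are hypothetical names and data.

```python
import random

def sample_neighborhood(adj, seeds, fanouts, rng):
    """Collect the nodes needed to embed `seeds`, sampling at most
    `fanouts[h]` neighbors per node at hop h.

    adj: dict mapping each node to a list of its neighbors.
    """
    needed = set(seeds)
    frontier = set(seeds)
    for fanout in fanouts:
        next_frontier = set()
        for node in frontier:
            neighbors = adj.get(node, [])
            if len(neighbors) <= fanout:
                picked = neighbors
            else:
                # Cap the work for high-degree nodes.
                picked = rng.sample(neighbors, fanout)
            next_frontier.update(picked)
        needed |= next_frontier
        frontier = next_frontier
    return needed

# A hub with 100 leaves: without sampling, one hop would pull in 101 nodes.
star = {0: list(range(1, 101))}
star.update({leaf: [0] for leaf in range(1, 101)})
needed = sample_neighborhood(star, seeds=[0], fanouts=[5], rng=random.Random(0))
```

With a fanout of 5, the hub's mini-batch contains 6 nodes instead of 101, trading some accuracy for a bounded memory and communication footprint.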
The expressive power of standard message‑passing GNNs is bounded by the 1‑WL (Weisfeiler‑Lehman) graph isomorphism test, a bound that models such as GIN reach, but engineering tricks can push practical performance. Academia emphasizes generality, while industry optimizes for specific scenarios and resource trade‑offs.
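The 1‑WL test itself is short enough to sketch. The version below is a minimal illustration for small graphs, with hypothetical names (`wl_color_histogram` and the example graphs); it uses nested tuples as colors, which grow quickly with the number of rounds, so it is not meant for large inputs.

```python
from collections import Counter

def wl_color_histogram(adj, rounds=3):
    """Run 1-WL color refinement and return the histogram of final colors.

    adj: dict mapping each node to a list of its neighbors (undirected).
    """
    colors = {node: 0 for node in adj}  # every node starts with the same color
    for _ in range(rounds):
        # New color = own color plus the sorted multiset of neighbor colors.
        colors = {
            node: (colors[node], tuple(sorted(colors[nbr] for nbr in adj[node])))
            for node in adj
        }
    return Counter(colors.values())

# Two disjoint triangles and a six-cycle are not isomorphic, but every node
# in both graphs has degree 2, so 1-WL assigns them identical histograms.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# A star and a path on four nodes differ in degree sequence, so 1-WL
# tells them apart.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

same = wl_color_histogram(two_triangles) == wl_color_histogram(six_cycle)
different = wl_color_histogram(star) != wl_color_histogram(path)
```

The first pair shows the limit in practice: a message-passing GNN bounded by 1‑WL cannot distinguish the two triangles from the six-cycle either, whatever its weights.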
Overall, the biggest challenge of graph machine learning is the diversity of graph data and application scenarios; unlike images, graph attributes vary widely across industries, making a one‑size‑fits‑all solution impossible. Industry focuses on data, sample, and feature construction, whereas academia concentrates on novel model architectures.
Future expectations include extracting more universal features for recommendation, risk control, and bio‑computing to enable knowledge reuse.
Expert Introduction – Zhou Min holds a bachelor’s degree from the University of Science and Technology of China and a Ph.D. in Industrial Systems Engineering from the National University of Singapore. Her research spans sequence and graph data mining, with multiple patents and publications in venues including KDD, ICDE, and Automatica.
DataFun Series – "Data Intelligence Expert Interview" is a new series by DataFun that interviews core technical staff from various companies to share insights on industry priorities, hot topics, and difficulties, helping readers deepen their understanding of data‑intelligent technologies.