Can LLMs Clean Noisy Graphs? Introducing GraphEdit for Robust Structure Learning
GraphEdit leverages large language models and a lightweight edge predictor to remove noisy connections and uncover hidden node dependencies, achieving state‑of‑the‑art performance on benchmark graph datasets such as Cora, Citeseer, and PubMed, while demonstrating strong robustness to noise and limited supervision.
1. Overview
Graph Structure Learning (GSL) aims to generate new graph structures that capture latent dependencies between nodes. Existing GSL methods rely heavily on explicit graph edges as supervision, making them vulnerable to noise and sparsity. GraphEdit proposes to use large language models (LLMs) to reason about node relationships and to edit graph structures.
2. Method
2.1 Instruction‑tuned LLM
The method assumes homophily: nodes with similar attributes tend to connect. The LLM is fine‑tuned with two instruction prompts: (1) evaluate the label consistency of a node pair, and (2) predict the category of the pair. Together these supervise both edge existence and edge type.
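As a rough sketch, the two instruction prompts might be constructed as below. The wording is hypothetical, chosen to match the citation-network setting (paper abstracts as node text); the paper's exact templates may differ.

```python
# Illustrative prompt builders for the two instruction-tuning tasks.
# Prompt wording is a hypothetical reconstruction, not the paper's exact template.

def consistency_prompt(text_i: str, text_j: str) -> str:
    """Task 1: ask the LLM whether the pair shares a label (edge existence)."""
    return (
        "Given two paper abstracts, decide whether they belong to the "
        "same category. Answer yes or no.\n"
        f"Abstract 1: {text_i}\n"
        f"Abstract 2: {text_j}"
    )

def category_prompt(text_i: str, text_j: str) -> str:
    """Task 2: ask the LLM to predict the category of the pair (edge type)."""
    return (
        "Given two paper abstracts, predict the research category "
        "shared by the pair.\n"
        f"Abstract 1: {text_i}\n"
        f"Abstract 2: {text_j}"
    )
```

Responses to these prompts form the instruction-tuning data that teaches the model when an edge should exist and what relation it encodes.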
2.2 LLM‑based Edge Predictor
A lightweight edge predictor estimates the probability η that an edge exists between two nodes using their LLM‑derived representations h_i and h_j. The predictor is trained with binary cross‑entropy loss against the ground‑truth adjacency.
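A minimal stand-in for this predictor, assuming it scores the concatenated pair representation with a logistic output (the actual architecture in the paper may be a deeper MLP):

```python
import math

def edge_probability(h_i, h_j, w):
    """Estimate eta = sigmoid(w . [h_i ; h_j]) for the edge (i, j).
    h_i, h_j: LLM-derived node representations (lists of floats);
    w: predictor weights. A single-layer stand-in for the paper's
    lightweight predictor, not its exact architecture."""
    z = sum(wk * xk for wk, xk in zip(w, h_i + h_j))
    return 1.0 / (1.0 + math.exp(-z))

def bce_loss(eta, y):
    """Binary cross-entropy against the ground-truth adjacency entry y in {0, 1}."""
    eps = 1e-12  # guard against log(0)
    return -(y * math.log(eta + eps) + (1 - y) * math.log(1 - eta + eps))
```

Training sums `bce_loss` over observed node pairs, with positive pairs drawn from the adjacency and negative pairs sampled from non-edges.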
2.3 LLM‑enhanced Structure Optimization
For each node we select the top‑k candidate edges from the predictor, combine them with the original adjacency, and feed the resulting graph to the LLM via prompts. The LLM decides which edges to add or delete, producing an optimized adjacency matrix that incorporates both learned predictions and LLM reasoning.
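The candidate-selection step can be sketched as follows; the final add/delete decision is delegated to the LLM and is not modeled here. Data layout (a score dict per node) is an assumption for illustration.

```python
def candidate_edges(scores, original_edges, k):
    """Select each node's top-k highest-scoring candidate edges and
    union them with the original adjacency. The merged edge set is
    then passed to the LLM, which decides which edges to keep.
    scores: dict mapping node -> {neighbor: predicted probability}."""
    candidates = set()
    for u, nbrs in scores.items():
        # top-k neighbors by predictor probability
        top = sorted(nbrs, key=nbrs.get, reverse=True)[:k]
        candidates.update((u, v) for v in top)
    return candidates | set(original_edges)
```

For example, with k=2 a node whose predictor scores are {1: 0.9, 3: 0.8, 2: 0.2} contributes candidate edges to nodes 1 and 3, which are merged with its original edges before the LLM's editing pass.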
3. Experiments
3.1 Overall Performance
GraphEdit outperforms existing GSL baselines on Cora, Citeseer, and PubMed, achieving higher node‑classification accuracy by removing noisy edges and revealing global dependencies.
Observations: (1) many baselines do not consistently beat a vanilla GCN; (2) PubMed shows the largest gain due to richer textual node features.
3.2 Ablation Study
Removing the instruction prompt for category prediction (“‑prompt‑w/o‑ca”) degrades accuracy, showing the importance of joint edge‑existence and type supervision. Variants that only delete edges (“GraphEdit w/o Add”) or only add edges (“GraphEdit w/o Del”) confirm that both addition and deletion are needed for maximal performance. Excluding the GNN encoder (“w/o GNN”) also hurts results, indicating that the downstream GNN still contributes essential structural encoding.
3.3 Effect of Candidate‑Edge Quantity
Increasing the number k of candidate edges improves performance up to a plateau (k≈3 for Cora/Citeseer, k≈4 for PubMed); beyond that, additional candidates yield no further gain.
3.4 Graph Construction without Original Edges
When the original adjacency is omitted, GraphEdit‑con (constructed solely by the model) still achieves competitive results, even surpassing the raw PubMed graph, demonstrating the model’s ability to infer meaningful structure from text alone.
3.5 Noise Robustness
Under injected random edge noise (5‑25 %), GraphEdit maintains stable accuracy, whereas baselines such as IDGL and WSGNN degrade sharply. Surprisingly, moderate noise sometimes improves PubMed performance by adding useful edges.
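A hypothetical re-creation of the perturbation protocol (random spurious edges at a given ratio of the original edge count); the paper's actual noise-injection script is not shown here:

```python
import random

def inject_edge_noise(edges, nodes, ratio, seed=0):
    """Add ratio * |E| random spurious undirected edges to the graph,
    matching the 5-25% noise settings described above. Edges are stored
    as sorted (u, v) tuples; a hypothetical sketch, not the paper's code."""
    rng = random.Random(seed)
    noisy = set(edges)
    target = len(noisy) + int(ratio * len(edges))
    while len(noisy) < target:
        u, v = rng.sample(nodes, 2)  # random distinct node pair
        noisy.add((min(u, v), max(u, v)))
    return noisy
```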
3.6 Comparison with Other LLMs
GraphEdit outperforms ERNIE‑Bot‑turbo, Vicuna‑7B, and BLOOMZ‑7B at denoising Cora and Citeseer, confirming the effectiveness of the instruction‑tuning strategy.
3.7 Visualization
Visual comparisons show that GraphEdit removes inter‑class edges around ambiguous nodes and splits mixed‑class regions, leading to clearer community structures for downstream GCN classification.
4. Conclusion
GraphEdit introduces a novel LLM‑centric framework for graph structure optimization that simultaneously removes noisy connections and uncovers hidden global dependencies, yielding robust improvements across multiple benchmark datasets.
