How Dual‑Granularity Prompting Boosts Graph‑Enhanced LLMs for Fraud Detection

The article analyzes the Dual Granularity Prompting (DGP) framework, which mitigates information overload in graph‑enhanced large language models for fraud detection by applying fine‑grained processing to target nodes and coarse‑grained summarization to neighbors, achieving superior accuracy and token efficiency across multiple public and industrial datasets.

Data Party THU

Background

Fraud detection in e-commerce, social networks, and finance requires reasoning over multi-hop graphs whose nodes carry long textual attributes. Graph Neural Networks (GNNs) capture topology but miss much of the text semantics, while approaches that feed every neighbor's text directly to a Large Language Model (LLM) suffer from token explosion and dilution of the target node's signal.

Limitations of Existing Graph‑to‑Prompt Methods

Vectorized encoding: neighbors are compressed into fixed-size vectors before being fed to the LLM, which limits prompt length but discards semantic detail.

Plain-text concatenation: concatenating all neighbor texts preserves semantics but quickly exceeds token budgets (e.g., two-hop neighborhoods can reach millions of tokens), drowning the target node's signal.
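To make the token-explosion claim concrete, here is a rough back-of-the-envelope sketch. The function name and the numbers (average degree 100, 200 tokens per node) are illustrative assumptions, not figures from the paper, and the count is an upper bound because it ignores neighbors shared between paths.

```python
def neighborhood_tokens(avg_degree: int, tokens_per_node: int, hops: int) -> int:
    """Upper bound on prompt tokens if every node within `hops` is concatenated.

    Assumes each node has `avg_degree` neighbors and that no neighbor is
    reached twice (so real graphs will usually come in below this bound).
    """
    # Nodes at hop h: avg_degree ** h (hop 0, the target itself, is excluded).
    nodes = sum(avg_degree ** h for h in range(1, hops + 1))
    return nodes * tokens_per_node


# Hypothetical graph: 100 neighbors per node, 200 tokens of text per node.
print(neighborhood_tokens(100, 200, hops=1))  # 20000
print(neighborhood_tokens(100, 200, hops=2))  # 2020000
```

Under these assumed numbers, moving from one hop to two hops takes the prompt from tens of thousands of tokens to about two million, which is the regime the plain-text approach runs into.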

Dual‑Granularity Prompting (DGP)

DGP introduces differentiated granularity for the target node and its neighbors.

Target node: retain the fine-grained text to preserve core semantics.

Neighbor nodes: compress into coarse-grained representations via two-layer semantic summarization of text, statistical aggregation of numeric features, and diffusion-based meta-path pruning (Markov Diffusion Kernel) to filter out irrelevant neighbors.
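The split between the two granularities can be sketched as follows. This is a minimal illustration, not the authors' implementation: `summarize` is a stand-in that simply truncates text (a real system would call an LLM summarizer), and the field names `text` and `value`, the meta-path labels, and the word budgets are all hypothetical.

```python
from statistics import mean, pstdev


def summarize(text: str, max_words: int) -> str:
    """Placeholder summarizer: keeps the first `max_words` words.
    A real pipeline would call an LLM here; truncation is only a stand-in."""
    return " ".join(text.split()[:max_words])


def aggregate_numeric(values):
    """Coarse statistics that replace the raw numeric features of neighbors."""
    return {"count": len(values), "mean": mean(values), "std": pstdev(values)}


def build_prompt(target_text, neighbors_by_metapath,
                 per_node_words=8, per_path_words=20):
    """Keep the target text in full; compress neighbors per meta-path."""
    lines = ["Target node (full text):", target_text, "",
             "Neighbor context (summarized):"]
    for path, nodes in neighbors_by_metapath.items():
        # Layer 1: summarize each neighbor's text individually.
        node_summaries = [summarize(n["text"], per_node_words) for n in nodes]
        # Layer 2: aggregate the per-node summaries along the meta-path.
        path_summary = summarize(" | ".join(node_summaries), per_path_words)
        stats = aggregate_numeric([n["value"] for n in nodes])
        lines.append(f"- {path}: {path_summary}")
        lines.append(f"  value stats: n={stats['count']}, "
                     f"mean={stats['mean']:.2f}, std={stats['std']:.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    neighbors = {
        "same-user": [
            {"text": "Five stars, best purchase ever, buy now", "value": 5.0},
            {"text": "Amazing amazing amazing, everyone should buy", "value": 5.0},
        ],
    }
    print(build_prompt("Long review text of the node being classified.", neighbors))
```

The point of the design is that the prompt length grows with the number of meta-paths rather than with the number of neighbors, while the target node's text is never shortened.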

Processing Pipeline

Textual neighbors → dual-layer semantic summary: first summarize each node's text, then aggregate the summaries along meta-paths.

Numeric neighbors → statistical aggregation: compute the mean and other distribution statistics, and pass only these aggregates to the LLM.

Neighbor pruning → diffusion-based meta-path pruning: use a Markov Diffusion Kernel to retain structurally and semantically related neighbors.
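The pruning step can be illustrated with one common formulation of the Markov diffusion kernel, K = Z Zᵀ with Z = (1/t) Σ_{τ=1..t} P^τ, where P is the row-normalized transition matrix. How the paper combines this kernel with meta-paths and semantic similarity is not spelled out here, so the scoring rule below (rank candidates by their kernel value with the target and keep the top few) is an assumption, as are the function names and the toy graph.

```python
import numpy as np


def markov_diffusion_kernel(adj, steps=2):
    """K = Z @ Z.T, where Z averages the first `steps` powers of the
    row-normalized transition matrix P."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1, keepdims=True)
    # Row-normalize; rows of isolated nodes stay all-zero.
    P = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    Z = np.zeros_like(P)
    P_tau = np.eye(adj.shape[0])
    for _ in range(steps):
        P_tau = P_tau @ P
        Z += P_tau
    Z /= steps
    return Z @ Z.T


def prune_neighbors(adj, target, candidates, keep, steps=2):
    """Keep the `keep` candidates with the largest kernel value to `target`."""
    K = markov_diffusion_kernel(adj, steps)
    ranked = sorted(candidates, key=lambda j: K[target, j], reverse=True)
    return ranked[:keep]


# Toy graph: nodes 0, 1, 2 form a triangle; node 3 hangs off node 0 and
# leads away to node 4.
adj = np.array([
    [0, 1, 1, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
])
kept = prune_neighbors(adj, target=0, candidates=[1, 2, 3], keep=2)
print(sorted(kept))  # [1, 2]
```

In this toy graph, nodes 1 and 2 share diffusion patterns with the target because they sit in the same triangle, so node 3 is the one that gets pruned.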

Experimental Evaluation

Benchmarks on public datasets (Yelp, Amazon Video Reviews) and industrial datasets (E-Commerce, LifeService) show that DGP consistently outperforms state-of-the-art GNN and LLM baselines, with AUPRC improvements of up to 6.8% (absolute). DGP also maintains strong performance with a token budget as low as 10 tokens, demonstrating an effective performance-cost trade-off.
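For readers unfamiliar with the metric, AUPRC (area under the precision-recall curve) is commonly estimated by average precision. The sketch below computes it by hand on made-up labels and scores. The data are hypothetical and unrelated to the reported results, and tied scores are not given special treatment.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k taken at each true positive,
    with items ranked by descending score. Labels are 1 (fraud) or 0."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("average precision is undefined without positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    ap = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            ap += hits / rank  # precision at this recall point
    return ap / total_pos


# Hypothetical scores: the two fraud cases are ranked 1st and 3rd.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(round(average_precision(labels, scores), 4))  # 0.8333
```

AUPRC is the usual choice for fraud detection because positives are rare. Unlike accuracy, it is not inflated by the large number of correctly classified benign nodes.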

Main Contributions

Introduced a dual‑granularity prompting framework that mitigates information overload in graph‑enhanced LLMs.

Designed fine‑grained text retention for target nodes and coarse‑grained semantic compression plus statistical aggregation for neighbors.

Provided extensive empirical validation across multiple datasets, showing superior accuracy and robustness.

Demonstrated extensibility toward future Graph Foundation Models (GFM).

Reference

Paper: https://arxiv.org/abs/2507.21653

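Source: Zhuanzhi (专知)
This article is about 1,500 words; suggested reading time is 5 minutes.
DGP mitigates information overload in graph-enhanced large models for fraud detection by applying differentiated granularity control to the target node and its neighbor nodes.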

Written by

Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.
