Apr 17, 2026 · Artificial Intelligence

Can LLMs Truly Mimic Human Shopping Behavior? The OPeRA Dataset and Evaluation

The paper introduces OPeRA, a step‑wise online‑shopping dataset capturing observations, personas, rationales, and actions from real users, and uses it to benchmark LLMs on next‑action prediction, revealing that even top models like GPT‑4.1 achieve only about 20 % accuracy on fine‑grained actions, with persona information offering limited benefit while rationales prove crucial.

AIEvaluationLLM

0 likes · 9 min read

Can LLMs Truly Mimic Human Shopping Behavior? The OPeRA Dataset and Evaluation

online shopping

Can LLMs Truly Mimic Human Shopping Behavior? The OPeRA Dataset and Evaluation