NewBeeNLP
Jul 10, 2024 · Artificial Intelligence
Can Large Language Models Master Co‑Temporal Reasoning? Introducing COTEMPQA
This article presents COTEMPQA, a benchmark for evaluating how well large language models reason about co‑temporal events (facts that hold at the same time). It describes the benchmark's four scenario types and its construction pipeline, reports experimental results across a range of models along with an error analysis, and proposes MR‑COT, a strategy that leverages mathematical reasoning to significantly improve performance.
LLM evaluation · MR-COT · benchmark dataset
