Uncovering 16 Limits of AI Search Engines and 16 Design Recommendations
A user study with 21 participants reveals sixteen critical limitations of generative AI search engines, maps them to eight quantitative metrics, proposes sixteen design recommendations, and evaluates You.com, Perplexity and BingChat against this framework to highlight current performance gaps.
Background
Generative search engines that use large language models (LLMs) are replacing traditional keyword‑based search. A user study with 21 participants compared AI‑augmented search to conventional search, identifying 16 limitations across answer text, citation, sources, and user interface.
Evaluation Framework (AEE)
The Answer Engine Evaluation (AEE) framework defines eight quantitative metrics: One‑Sided Answer, Overconfident Answer, Relevant Statements, Unsupported Statements, Citation Accuracy, Citation Thoroughness, Source Necessity, and Uncited Sources. Automated evaluation was applied to three popular engines: You.com, Perplexity.ai, and BingChat.
Identified Limitations
Answer Text
Insufficient objective detail – all participants noted shallow answers.
Lack of diverse viewpoints – many answers were biased.
Over‑confident language – statements were presented with unwarranted certainty.
Over‑simplified writing – limited creativity and critical reasoning.
Citation
Misattribution and misunderstanding of sources.
Context‑driven selective information – models cherry‑pick data.
Missing citations for key statements.
Opaque source selection – lack of transparency in ranking.
Sources
Low‑frequency source usage – few sources cited.
More retrieved than used sources – mismatch between retrieved set and those actually used.
Distrust of source types.
Redundant source content – duplicate information across sources.
User Interface
Missing source‑filtering controls.
Limited human input in generation.
Extra effort required to verify answers.
Non‑standard citation format.
Design Recommendations
Answer Text
Provide balanced answers that avoid reinforcing user bias.
Include objective details such as data and statistics.
Eliminate irrelevant filler; keep every sentence on‑topic.
Make source selection transparent to enhance trust.
Citation
Ensure every statement has a proper supporting reference.
Cross‑check citation accuracy against external sources.
Reference all relevant sources for multi‑point statements.
Match the number of listed sources to those actually used.
Sources
Prioritize expert and authoritative sources.
Retrieve and use only necessary sources for each answer.
Distinguish model‑generated content from source‑derived content.
Explicitly evaluate source types for credibility.
User Interface
Incorporate human feedback on both sources and generated text.
Implement interactive citations (e.g., hover pop‑ups).
Provide paragraph‑level local citations indicating exact provenance.
Avoid forced answers when information is insufficient.
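The recommendations on local citations, separating model‑generated from source‑derived content, and avoiding forced answers can be illustrated with a small data‑structure sketch. This is not taken from the paper or from any engine's implementation: the class names, the `min_supported_ratio` threshold, the abstention message, and the example URL are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provenance:
    source_url: str  # where the supporting passage lives (hypothetical example URLs below)
    quote: str       # exact passage in the source that backs the paragraph


@dataclass
class Paragraph:
    text: str
    provenance: List[Provenance] = field(default_factory=list)


ABSTAIN_MESSAGE = "Not enough supporting information was found to answer this question."


def render_answer(paragraphs: List[Paragraph], min_supported_ratio: float = 0.5) -> str:
    """Render an answer with paragraph-level citations, or abstain.

    min_supported_ratio is an assumed threshold: if fewer than this share of
    paragraphs carry provenance, the function declines to answer.
    """
    supported = [p for p in paragraphs if p.provenance]
    if not paragraphs or len(supported) / len(paragraphs) < min_supported_ratio:
        return ABSTAIN_MESSAGE

    lines = []
    for p in paragraphs:
        if p.provenance:
            # Local citation: show the exact quote next to the paragraph it supports.
            refs = "; ".join(f'{pv.source_url}: "{pv.quote}"' for pv in p.provenance)
            lines.append(f"{p.text} [{refs}]")
        else:
            # Make model-generated content visibly distinct from sourced content.
            lines.append(f"{p.text} [model-generated, no source]")
    return "\n".join(lines)


if __name__ == "__main__":
    answer = [
        Paragraph("Claim backed by a source.",
                  [Provenance("https://example.org/a", "supporting passage")]),
        Paragraph("Connecting sentence written by the model."),
    ]
    print(render_answer(answer))
```

In a real interface the quote would more likely appear in a hover pop‑up than inline; the point of the sketch is only that each paragraph carries its own provenance and that an answer with too little support is withheld.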
Quantitative Evaluation of Three Engines
Using the eight AEE metrics, the study measured performance of You.com, Perplexity.ai, and BingChat.
One‑Sided Answer: All engines frequently produce one‑sided answers (50‑80%); Perplexity performs worst.
Overconfident Answer: Perplexity shows the highest rate of overconfident responses on debate questions.
Relevant Statements: Similar rates across engines (≈75‑82%).
Unsupported Statements: A sizable portion of statements lack supporting citations.
Citation Accuracy: All engines struggle to correctly cite sources.
Citation Thoroughness: No engine cites all possible accurate sources.
Source Necessity: Engines often list more sources than needed.
Uncited Sources: You.com ensures most listed sources are used; BingChat has the highest proportion of uncited sources.
Overall, no engine excels across most metrics, indicating substantial room for improvement in handling hallucinations, unsupported statements, and citation fidelity. You.com shows modest advantages in confidence handling and source presentation, while Perplexity scores lowest due to overconfidence and citation issues. BingChat falls in the middle, listing many sources without consistent coverage improvement.
Eight Quantitative Metrics (AEE)
One‑Sided Answer
Overconfident Answer
Relevant Statements
Unsupported Statements
Citation Accuracy
Citation Thoroughness
Source Necessity
Uncited Sources
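To make the citation‑oriented metrics more concrete, the following is a minimal sketch of how four of them could be computed once an answer has been split into statements. The formulas are simplified assumptions for illustration, not the paper's exact definitions, and the `supported_by` sets (which sources actually back each statement) are assumed to come from a separate judging step that is not shown. The `Statement` class and `citation_metrics` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Statement:
    text: str
    cited: Set[int]         # indices of sources the engine cited for this statement
    supported_by: Set[int]  # indices of sources judged to support it (assumed given)


def _ratio(numerator: int, denominator: int) -> float:
    """Safe division that returns 0.0 when there is nothing to count."""
    return numerator / denominator if denominator else 0.0


def citation_metrics(statements: List[Statement], num_sources: int) -> Dict[str, float]:
    """Simplified, assumed versions of four citation-related metrics."""
    cited_pairs = sum(len(s.cited) for s in statements)
    correct_pairs = sum(len(s.cited & s.supported_by) for s in statements)
    supporting_pairs = sum(len(s.supported_by) for s in statements)
    unsupported = sum(1 for s in statements if not s.supported_by)
    cited_sources = set().union(*(s.cited for s in statements))

    return {
        # Share of statements that no listed source supports.
        "unsupported_statements": _ratio(unsupported, len(statements)),
        # Share of emitted citations that point to a source that supports the statement.
        "citation_accuracy": _ratio(correct_pairs, cited_pairs),
        # Share of all supporting (statement, source) pairs that were actually cited.
        "citation_thoroughness": _ratio(correct_pairs, supporting_pairs),
        # Share of listed sources that are never cited anywhere in the answer.
        "uncited_sources": _ratio(num_sources - len(cited_sources), num_sources),
    }


if __name__ == "__main__":
    example = [
        Statement("S1", cited={0}, supported_by={0, 1}),
        Statement("S2", cited={1}, supported_by=set()),
        Statement("S3", cited={0, 2}, supported_by={2}),
    ]
    for name, value in citation_metrics(example, num_sources=4).items():
        print(f"{name}: {value:.2f}")
```

In this toy example, two of the four citations point to a supporting source, one of three statements has no support at all, and one of the four listed sources is never cited. One‑Sided Answer and Overconfident Answer depend on judgments about the answer text itself and are not covered by this sketch.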
Reference
Search Engines in an AI Era: The False Promise of Factual and Verifiable Source‑Cited Responses. https://arxiv.org/pdf/2410.22349
Baobao Algorithm Notes
Author of the BaiMian large model, offering technology and industry insights.