How to Evaluate and Design Generative AI Assistants: Frameworks and Real‑World Cases
This article outlines practical methods for evaluating generative AI assistants, covering cost‑benefit metrics, existing evaluation models, three‑dimensional experience criteria, common assessment techniques, and two detailed business case studies that illustrate design decisions and implementation strategies.
In the first part of the Generative AI Assistant Design Guide we discussed key design elements and principles; this second part focuses on experience evaluation approaches and presents two business cases.
Measuring Cost and Benefit
Users invest time and money to obtain helpful answers from an AI assistant. The cost side is evaluated along dimensions such as understandability, guidance, usability, input effort, error tolerance, and efficiency; the benefit side covers answer usefulness and satisfaction.
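One way to make this cost-benefit framing measurable is to score each dimension from user surveys and compare the two sides. A minimal sketch, assuming hypothetical 1-5 survey items and an unweighted average (neither comes from the article):

```typescript
// Hypothetical 1-5 survey items; the dimension names follow the article, but
// the scoring scheme and equal weighting are illustrative assumptions.
interface CostBenefitRatings {
  // Cost side: each dimension rated as perceived friction, 1 = low, 5 = high.
  understandability: number;
  guidance: number;
  usability: number;
  inputEffort: number;
  errorTolerance: number;
  efficiency: number;
  // Benefit side: 1 = not helpful, 5 = extremely helpful.
  usefulness: number;
  satisfaction: number;
}

const avg = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Positive score: the assistant returns more value than the effort it demands.
function costBenefitScore(r: CostBenefitRatings): number {
  const cost = avg([r.understandability, r.guidance, r.usability,
                    r.inputEffort, r.errorTolerance, r.efficiency]);
  const benefit = avg([r.usefulness, r.satisfaction]);
  return benefit - cost;
}
```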
Borrowing Existing Evaluation Models
Established models can be adapted, each with a different focus: Google’s HEART (Happiness, Engagement, Adoption, Retention, Task success), the Goals‑Signals‑Metrics (GSM) method, and Alipay’s PTECH.
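To make the GSM idea concrete: each goal is decomposed into observable signals, and each signal into measurable metrics. A sketch of that mapping for an AI assistant, with entries that are illustrative rather than taken from any of these models:

```typescript
// Goals → Signals → Metrics, per the GSM decomposition.
// The specific entries below are illustrative examples only.
interface GsmEntry {
  goal: string;
  signals: string[];
  metrics: string[];
}

const gsm: GsmEntry[] = [
  {
    goal: "Users get helpful answers quickly",
    signals: ["answers accepted without regeneration", "short wait for an answer"],
    metrics: ["regeneration ratio", "median answer-generation time (s)"],
  },
  {
    goal: "Users return to the assistant",
    signals: ["repeat sessions within a week"],
    metrics: ["7-day retention rate", "weekly active users"],
  },
];
```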
Evaluating from Three Experience Dimensions
Assess three dimensions: product performance (usability, consistency, accessibility, answer‑generation speed, regeneration ratio, helpfulness), user behavior (usage frequency, session duration, retention, like/dislike ratio), and user perception (net promoter score, satisfaction, delight).
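The performance and behavior metrics above fall directly out of interaction logs. A minimal sketch, assuming a hypothetical event schema:

```typescript
// Hypothetical event log schema; the field and event names are illustrative.
type EventType = "answer" | "regenerate" | "like" | "dislike";
interface AssistantEvent { userId: string; type: EventType; ts: number; }

function count(events: AssistantEvent[], type: EventType): number {
  return events.filter(e => e.type === type).length;
}

// Regeneration ratio: regenerations requested per answer shown.
function regenerationRatio(events: AssistantEvent[]): number {
  const answers = count(events, "answer");
  return answers === 0 ? 0 : count(events, "regenerate") / answers;
}

// Like/dislike ratio expressed as an approval rate over explicit feedback.
function approvalRate(events: AssistantEvent[]): number {
  const likes = count(events, "like");
  const total = likes + count(events, "dislike");
  return total === 0 ? 0 : likes / total;
}
```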
Key Experience Indicators
Three indicators usually take priority: satisfaction, net promoter score, and user effort, representing design value, commercial value, and product value respectively.
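Net promoter score has a standard computation from 0-10 “would you recommend” responses: the share of promoters (9-10) minus the share of detractors (0-6). A minimal implementation:

```typescript
// NPS = %promoters (scores 9-10) − %detractors (scores 0-6).
// Result falls in [-100, 100]; passives (7-8) count only in the denominator.
function netPromoterScore(scores: number[]): number {
  if (scores.length === 0) return 0;
  const promoters = scores.filter(s => s >= 9).length;
  const detractors = scores.filter(s => s <= 6).length;
  return ((promoters - detractors) / scores.length) * 100;
}

// Example: two promoters, one passive, one detractor → NPS = 25.
console.log(netPromoterScore([10, 9, 8, 3]));
```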
Common Evaluation Methods
Combine backend data analysis with questionnaire surveys; expert scoring can be added when deeper qualitative judgment is needed.
Business Case 1: CRM AI Assistant
The Copilot AI assistant in an easy‑sale CRM helps salespeople improve efficiency and supports onboarding. It appears as a draggable floating window with a conversational interface that enables performance queries, product knowledge learning, and sales script practice.
Feature Recommendation Guidance
Early exposure of recommended functions builds user awareness; role‑based quick‑command prioritization provides personalized recommendations.
Recommendation Strategy Optimization
Roles are classified and quick commands are prioritized per role, achieving a “one‑size‑fits‑one” experience.
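A straightforward implementation of this strategy is a lookup from role to a ranked command list; the roles and commands below are hypothetical stand-ins for the CRM’s real taxonomy:

```typescript
// Hypothetical roles and quick commands; a real deployment would draw these
// from the CRM's role model and usage data.
type Role = "newSalesperson" | "seniorSalesperson" | "salesManager";

const quickCommandsByRole: Record<Role, string[]> = {
  newSalesperson:    ["Learn product knowledge", "Practice sales scripts", "Query my performance"],
  seniorSalesperson: ["Query my performance", "Practice sales scripts", "Learn product knowledge"],
  salesManager:      ["Query team performance", "Query my performance", "Learn product knowledge"],
};

// Return the top-N quick commands for a role.
function recommendCommands(role: Role, n = 3): string[] {
  return quickCommandsByRole[role].slice(0, n);
}

// recommendCommands("salesManager") → manager-first ordering.
```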
Recommendation Scenarios
Before asking, a welcome card suggests likely queries; after asking, a “you might ask” panel offers quick commands directly in the input field.
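Both surfaces can share one recommender that branches on conversation state. In this sketch the types, surface names, and keyword-based follow-up matching are all assumptions; a production system would likely use semantic matching:

```typescript
// Hypothetical conversation state: before the first question vs. after one.
interface ConversationState {
  lastQuery?: string;      // undefined until the user has asked something
  roleDefaults: string[];  // the role-ranked quick commands from above
}

function recommend(state: ConversationState): { surface: string; items: string[] } {
  if (!state.lastQuery) {
    // Welcome card: seed with the role's ranked quick commands.
    return { surface: "welcome-card", items: state.roleDefaults };
  }
  // "You might ask" panel: follow-ups related to the last query.
  // Keyword matching stands in for semantic similarity here.
  const items = state.lastQuery.includes("performance")
    ? ["Compare with last quarter", "Show my top deals"]
    : ["Practice a related sales script"];
  return { surface: "you-might-ask", items };
}
```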
Command Center
A dedicated command center page with tabbed categories organizes all recommended functions without disrupting the main sales view.
Business Case 2: AI‑Powered Formula Editor
In a commission‑calculation system, users build templates that combine data items, value sources, and calculation methods (fill‑in, import, compute).
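Such a template could be modeled as a small discriminated union over the three value-acquisition methods; the type and field names are assumptions based on the description above:

```typescript
// How each data item gets its value: entered by hand, imported from a data
// source, or computed from a formula over other items.
type ValueMethod =
  | { kind: "fill-in" }
  | { kind: "import"; source: string }     // e.g. a CRM table name
  | { kind: "compute"; formula: string };  // e.g. "sales * rate"

interface DataItem {
  name: string;
  method: ValueMethod;
}

interface CommissionTemplate {
  title: string;
  items: DataItem[];
}

// Illustrative example, not from the article.
const example: CommissionTemplate = {
  title: "Quarterly commission",
  items: [
    { name: "sales",      method: { kind: "import", source: "closed_deals" } },
    { name: "rate",       method: { kind: "fill-in" } },
    { name: "commission", method: { kind: "compute", formula: "sales * rate" } },
  ],
};
```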
Formula Generation, Formula Explanation, and Function Explanation
AI can generate formulas from natural language, explain existing formulas, and clarify function usage, addressing the complexity and error‑prone nature of manual formula creation.
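All three capabilities can sit behind a single model call with task-specific prompts; the `callModel` stub and prompt wording below are placeholders, not the product’s actual implementation:

```typescript
// Stub standing in for whatever model endpoint the product actually calls.
async function callModel(prompt: string): Promise<string> {
  return `(model output for: ${prompt.slice(0, 40)}...)`;
}

type FormulaTask = "generate" | "explain-formula" | "explain-function";

// One prompt template per capability; the wording is illustrative.
const prompts: Record<FormulaTask, (input: string, fields: string[]) => string> = {
  generate: (input, fields) =>
    `Available fields: ${fields.join(", ")}.\n` +
    `Write a formula for: "${input}". Return only the formula.`,
  "explain-formula": (input) =>
    `Explain step by step what this formula computes: ${input}`,
  "explain-function": (input) =>
    `Explain the purpose, arguments, and one short example for: ${input}`,
};

async function assist(task: FormulaTask, input: string, fields: string[] = []) {
  return callModel(prompts[task](input, fields));
}
```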
Design of the Editing Flow
The interface uses a single‑turn interaction rather than continuous dialogue, reducing the time spent on high‑frequency tasks in this business‑facing (B2B) product.
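In code, single-turn means the full context travels in one request and the result is rendered once, with no session state to manage; the contract below is an assumed shape, not the product’s API:

```typescript
// Single-turn contract: everything the model needs travels in one request,
// and the result is rendered once into the editor. There is no chat history
// to store, replay, or scroll through.
interface FormulaRequest {
  instruction: string;  // the user's natural-language description
  fields: string[];     // fields available in the current template
}

interface FormulaResponse {
  formula: string;
  explanation: string;
}

// One round trip per task; repeating the task is a fresh, independent call.
type GenerateFormula = (req: FormulaRequest) => Promise<FormulaResponse>;
```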
Encouraging User‑Provided Fields
Users can @‑select the fields a formula must use, and the AI treats them as fixed constraints during matching.
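Extracting @-selected fields can be as simple as one regex pass, assuming field mentions follow an `@name` convention (an assumption made here for illustration):

```typescript
// Pull out @-mentioned field names (assumed to be word characters after '@')
// so they can be passed to the model as hard constraints.
function extractMentionedFields(input: string): string[] {
  return [...input.matchAll(/@(\w+)/g)].map(m => m[1]);
}

// Example: the AI must use exactly these fields when matching.
const required = extractMentionedFields("Commission is @sales times @rate");
console.log(required); // ["sales", "rate"]
```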
Pre‑recognition and Confirmation of Intent
After the user submits input, the AI optimizes the wording, pre‑fills the fields it recognizes, and presents the refined query for confirmation, improving answer accuracy.
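The flow amounts to a rewrite-then-confirm pipeline. In this sketch the recognition step is stubbed out, since the article does not specify the underlying model call:

```typescript
interface RefinedIntent {
  refinedQuery: string;       // cleaned-up restatement of the user's text
  prefilledFields: string[];  // fields the AI recognized and pre-filled
}

// Stub: a real system would call a model to rewrite text and extract fields.
async function preRecognize(raw: string): Promise<RefinedIntent> {
  return { refinedQuery: raw.trim(), prefilledFields: [] };
}

// The user confirms (or edits) the refined intent before the formula is
// actually generated, which is where the accuracy gain comes from.
async function handleInput(
  raw: string,
  confirmIntent: (intent: RefinedIntent) => Promise<boolean>,
): Promise<RefinedIntent | null> {
  const intent = await preRecognize(raw);
  return (await confirmIntent(intent)) ? intent : null;
}
```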
Future Outlook
AI assistants will evolve toward multimodal interaction, emotional intelligence, proactive assistance, and deeper contextual understanding, becoming seamless, personalized partners in daily workflows.