How to Evaluate and Design Generative AI Assistants: Frameworks and Real‑World Cases

This article outlines practical methods for evaluating generative AI assistants, covering cost‑benefit metrics, existing evaluation models, three‑dimensional experience criteria, common assessment techniques, and two detailed business case studies that illustrate design decisions and implementation strategies.

58UXD

Jun 27, 2024

In the first part of the Generative AI Assistant Design Guide we discussed key design elements and principles; this second part focuses on experience evaluation approaches and presents two business cases.

Measuring Cost and Benefit

Users invest time and money to obtain helpful answers from an AI assistant. The cost side is judged on understandability, guidance, usability, input effort, error tolerance, and efficiency, since a weakness in any of these raises the effort a user must spend. The benefit side is judged on answer usefulness and satisfaction.

Borrowing Existing Evaluation Models

Established models such as Google’s HEART framework, its companion Goals‑Signals‑Metrics (GSM) process, and Alipay’s PTECH can be adapted; each has a different focus.

Evaluating from Three Experience Dimensions

Assess the assistant along three dimensions: product performance (usability, consistency, accessibility, answer generation speed, regeneration ratio, helpfulness), user behavior (frequency, duration, retention, like/dislike ratio), and user perception (net promoter score, satisfaction, delight).
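As a rough illustration of how two of these indicators could be derived from backend data, the sketch below computes a regeneration ratio and a like share from an event log. The event schema (a `type` field with the values `answer`, `regenerate`, `like`, `dislike`) and the exact definitions of both ratios are assumptions made for this example; the article does not specify them.

```python
from collections import Counter


def product_metrics(events):
    """Compute example indicators from a list of event dicts.

    Assumed (hypothetical) schema: each event has a "type" key whose value
    is one of "answer", "regenerate", "like", or "dislike".
    """
    counts = Counter(event["type"] for event in events)
    answers = counts["answer"]
    feedback = counts["like"] + counts["dislike"]
    return {
        # Assumed definition: regenerate clicks divided by answers shown.
        "regeneration_ratio": counts["regenerate"] / answers if answers else 0.0,
        # Assumed definition: likes divided by all like/dislike feedback.
        "like_share": counts["like"] / feedback if feedback else 0.0,
    }


sample_events = (
    [{"type": "answer"}] * 4
    + [{"type": "regenerate"}]
    + [{"type": "like"}] * 3
    + [{"type": "dislike"}]
)
metrics = product_metrics(sample_events)
```

A high regeneration ratio suggests that first answers often miss the mark, so it is best read together with the like share rather than on its own.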

Key Experience Indicators

The metrics usually given priority are satisfaction, net promoter score, and effort, which represent design value, commercial value, and product value respectively.
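For reference, net promoter score is conventionally computed from 0 to 10 "how likely are you to recommend" ratings as the percentage of promoters (9 or 10) minus the percentage of detractors (0 to 6). The sketch below applies that standard definition; the function name and the sample ratings are illustrative only and are not data from the article.

```python
def net_promoter_score(ratings):
    """Return NPS on a -100..100 scale from 0-10 recommendation ratings."""
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(1 for r in ratings if r >= 9)   # ratings of 9 or 10
    detractors = sum(1 for r in ratings if r <= 6)  # ratings of 0 through 6
    return 100.0 * (promoters - detractors) / len(ratings)


# Illustrative ratings: 5 promoters, 2 passives, 3 detractors.
sample_ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 10]
score = net_promoter_score(sample_ratings)
```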

Common Evaluation Methods

Use backend data analysis, questionnaire surveys, and optionally expert scoring.

Business Case 1: CRM AI Assistant

The Copilot AI assistant in an easy‑sale CRM helps salespeople improve efficiency and supports onboarding. It appears as a draggable floating window with a conversational interface that enables performance queries, product knowledge learning, and sales script practice.

Feature Recommendation Guidance

Early exposure of recommended functions builds user awareness; role‑based quick‑command prioritization provides personalized recommendations.

Recommendation Strategy Optimization

Roles are classified and quick commands are prioritized per role, achieving a “one‑size‑fits‑one” experience.
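One simple way to picture role-based prioritization is a per-role preference list laid over a default ordering. The sketch below is only an assumption about how such a rule could look: the role names and the priority table are hypothetical, and the command identifiers are borrowed from the three capabilities named above (performance queries, product knowledge, script practice).

```python
# Default order shown when nothing is known about the user's role.
DEFAULT_ORDER = ["performance_query", "product_knowledge", "script_practice"]

# Hypothetical role classification; the article does not list actual roles.
ROLE_PRIORITY = {
    "new_hire": ["product_knowledge", "script_practice"],
    "senior_sales": ["performance_query"],
}


def quick_commands(role, limit=3):
    """Return quick commands with the role's preferred ones first."""
    preferred = ROLE_PRIORITY.get(role, [])
    remaining = [cmd for cmd in DEFAULT_ORDER if cmd not in preferred]
    return (preferred + remaining)[:limit]


new_hire_commands = quick_commands("new_hire")
```

Unknown roles fall back to the default order, so the welcome card always has something to show.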

Recommendation Scenarios

Before asking, a welcome card suggests likely queries; after asking, a "you might ask" panel offers quick commands directly in the input field.

Command Center

A dedicated command center page with tabbed categories organizes all recommended functions without disrupting the main sales view.

Business Case 2: AI‑Powered Formula Editor

In a commission‑calculation system, users build templates that combine data items, value sources, and calculation methods (fill‑in, import, compute).

Formula Generation, Explanation, and Function Explanation

AI can generate formulas from natural language, explain existing formulas, and clarify function usage, addressing the complexity and error‑prone nature of manual formula creation.

Design of the Editing Flow

The interface uses a single‑turn interaction rather than a continuous dialogue, which saves time on high‑frequency business‑side (B2B) tasks.

Encouraging User‑Provided Fields

Users can @‑select required fields, which the AI then respects during matching.
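A minimal sketch of how @-selected fields might be pulled out of the prompt before it reaches the model is shown below. The mention syntax (`@name` or `@[name with spaces]`), the function name, and the sample field names are all assumptions for illustration; the article does not describe the actual implementation.

```python
import re

# Assumed syntax: "@[Field Name]" for names with spaces, "@field" otherwise.
MENTION = re.compile(r"@\[([^\]]+)\]|@(\w+)")


def extract_fields(prompt, known_fields):
    """Return the known fields the user @-mentioned, in order, without repeats."""
    mentioned = []
    for bracketed, bare in MENTION.findall(prompt):
        name = bracketed or bare
        # Ignore mentions that do not match a field in the template.
        if name in known_fields and name not in mentioned:
            mentioned.append(name)
    return mentioned


fields = extract_fields(
    "Commission is @[Monthly Sales] times @rate, capped by @unknown",
    {"Monthly Sales", "rate"},
)
```

The resulting list could then be passed to the model as a hard constraint, so that matching uses the fields the user chose instead of guessing from free text.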

Pre‑recognition and Confirmation of Intent

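After the user submits a request, the AI rewrites the text into a clearer query and pre‑fills the fields it expects to need, so the user can confirm the recognized intent before a formula is generated. This improves answer accuracy.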

Future Outlook

AI assistants will evolve toward multimodal interaction, emotional intelligence, proactive assistance, and deeper contextual understanding, becoming seamless, personalized partners in daily workflows.