What Are the Key UX Design Elements for Generative AI Assistants?

This article examines the essential UX design components of generative AI assistants—from core functions, command guidance, and smart agents to input methods, conversation bubbles, generation feedback, and voice call interactions—offering practical principles and real‑world case studies to help designers create intuitive, trustworthy AI experiences.


Key Design Elements of Generative AI Assistants

Since ChatGPT’s debut in November 2022, generative AI has reshaped the tech landscape. AI assistants, powered by advanced natural‑language processing, can understand user instructions, generate text, and even create content, expanding human‑computer interaction and delivering great convenience.

Tech giants and startups alike are racing to embed generative AI assistants into their products. This article breaks down the crucial design elements of AI assistants from a UX perspective, discusses design principles and evaluation methods, and presents concrete business case studies.

Function

Function refers to the specific tasks an AI assistant can perform, such as text understanding, speech recognition, translation, or information search.

Designers often confuse functions with commands and agents. A command is the user’s instruction that activates a function, while an agent is a software entity that carries out a task (e.g., a search agent or recommendation agent). In short: functions are the assistant’s capabilities; commands tell the assistant what to do; agents are the mini‑helpers that execute specific tasks.

Function Guidance

Welcome Card: provides onboarding and quick‑command recommendations for new users.

Above the Input Box: recommends shortcut commands directly above the input field.

Inside the Input Box: uses placeholder text to suggest available shortcuts.

Function Center

As AI assistants evolve, the number of functions grows, requiring a function center to aggregate and display them clearly. Three common presentation styles are:

Large Modal Dialog: used by Tencent; offers a spacious, filterable view without leaving the current page.

Standalone Page: used by DingTalk; provides tabbed categorisation and detailed graphic/text explanations.

Floating Layer: used by Cloud One‑Do; a lightweight overlay with simple tabs for fewer functions.

Quick Commands

Commands are user‑issued instructions that trigger corresponding functions (e.g., “Translate this paragraph”). Quick commands improve efficiency by offering concise, intuitive ways to interact with the AI.

Recommended Functions

Recommended functions often appear as quick commands on cards, giving users clear expectations.

“You Might Ask”

By analysing user habits and context, AI can predict likely follow‑up questions and proactively suggest them, reducing search time and increasing trust.

Smart Agents

Smart agents are specialised modules within an AI assistant (e.g., language‑understanding agents, recommendation agents) that handle specific tasks.

Agent Center

Similar to the function center, an agent center aggregates all agents, using tabs and search to help users locate the desired capability.

Agent Usage

Agents should integrate seamlessly with the main chat flow. For example, Doubao displays the agent’s avatar and name within the chat list, making the agent’s identity clear.

Input Box

The input box is the primary channel for users to convey their requests to the AI.

Input Methods

Text Input: the basic method for typing queries or commands.

Voice Input: a microphone button that records speech, requiring clear feedback.

Multimodal Input: combines text, voice, images, attachments, and slots for richer interaction.

Quick Input: context‑aware quick‑reply buttons or shortcuts (e.g., "@" or "/").
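As a minimal sketch, shortcut triggers like "@" and "/" can be detected by inspecting the token immediately before the caret. The trigger characters and mode names below are illustrative assumptions, not any particular product's API:

```typescript
// Hypothetical quick-input trigger detection: "/" opens a command
// palette, "@" opens an agent picker. A trigger only counts when the
// character starts a new token, so mid-word slashes are ignored.
function detectTrigger(text: string, caret: number): "command" | "agent" | null {
  const before = text.slice(0, caret);
  // Match a "/" or "@" at the start of the last token before the caret.
  const match = before.match(/(^|\s)([/@])\w*$/);
  if (match === null) return null;
  return match[2] === "/" ? "command" : "agent";
}
```

Keeping this as a pure function over the text and caret position makes it easy to unit-test, independent of any rendering framework.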

Core Elements

Basic input boxes contain only a text field. Advanced designs add voice input, file upload, quick‑function shortcuts, send button, new‑conversation button, stop‑generation button, slot fields, and text‑optimization tools.

Send Button

The send button's state signals whether the current input can be submitted. Disabled states should be visually distinct and explain why sending is unavailable.
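This enabled/disabled logic can be modeled as a pure function, so the UI can both disable the button and surface the reason. The state fields and reason strings below are illustrative assumptions:

```typescript
// Hypothetical send-button state model: returns whether sending is
// allowed and, when it is not, a human-readable explanation.
type InputState = {
  text: string;          // current input text
  isGenerating: boolean; // an answer is still streaming
  isUploading: boolean;  // a file upload is in progress
};

function sendButtonState(state: InputState): { enabled: boolean; reason?: string } {
  if (state.isGenerating) return { enabled: false, reason: "Wait for the current answer to finish" };
  if (state.isUploading) return { enabled: false, reason: "File upload in progress" };
  if (state.text.trim() === "") return { enabled: false, reason: "Type a message first" };
  return { enabled: true };
}
```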

Voice Button

Voice input is common on mobile; on desktop it is less frequent. Some products combine keyboard shortcuts (e.g., hold space to record) with UI buttons.

File Upload

Uploading files lets users provide local data for processing. Designers must show real‑time progress, success/failure messages, and size/format limits.
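A sketch of the validation side of this, assuming an illustrative 20 MB limit and format whitelist (real limits would come from the product's backend):

```typescript
// Hypothetical upload validation: size cap and extension whitelist are
// assumptions for illustration. The returned error string is what the
// UI would show next to the failed upload.
const MAX_SIZE_MB = 20;
const ALLOWED_EXT = ["pdf", "docx", "txt", "png", "jpg"];

function validateUpload(name: string, sizeBytes: number): { ok: boolean; error?: string } {
  const ext = name.split(".").pop()?.toLowerCase() ?? "";
  if (!ALLOWED_EXT.includes(ext)) return { ok: false, error: `Unsupported format: .${ext}` };
  if (sizeBytes > MAX_SIZE_MB * 1024 * 1024) return { ok: false, error: `File exceeds ${MAX_SIZE_MB} MB limit` };
  return { ok: true };
}
```

Validating before the upload starts lets the UI reject oversized or unsupported files immediately, rather than after a failed transfer.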

Slot

When the AI needs additional structured information, slots appear as inline input fields linked by natural‑language phrases. Slots should be simple (single‑choice or text) to avoid overwhelming users.
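Before submitting, the assistant needs to know which required slots are still empty. A minimal sketch, with illustrative slot names:

```typescript
// Hypothetical slot-filling check: returns the names of required slots
// that are still empty, so the UI knows which inline fields to highlight.
type Slot = { name: string; value: string | null };

function missingSlots(slots: Slot[]): string[] {
  return slots
    .filter((s) => s.value === null || s.value.trim() === "")
    .map((s) => s.name);
}
```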

Text Optimization

After a user types, an optimization feature can correct typos, simplify language, or enrich the query with missing context, acting as a “pre‑understanding” step.

Clear/New Conversation Button

Lets users start a fresh dialogue and clear history; this also helps keep responses relevant and fast when conversations grow long.

Answer Message Body

AI responses can be plain text, rich text, links, images, videos, buttons, forms, or interactive cards. Supporting multiple media types enriches the experience.

Text Message

Plain text is concise; rich text adds formatting, links, and emphasis for detailed guidance.

Multimedia Message

Images, videos, audio, or embedded code editors provide richer context.

Card Message

Cards combine title, image, and action buttons for structured information (e.g., news, product listings).

Interactive Form

Similar to slots but presented as a lightweight form, allowing users to supply structured data without leaving the chat.

Conversation Bubble

Messages appear in bubbles, each with a permanent or hover‑activated operation area.

Permanent: always visible, suitable for both web and mobile.

Hover: appears on mouse‑over, common on web but can waste space.

Feedback Operations

User commands can be copied, edited, or deleted. AI answers can be copied, regenerated, liked, disliked, or deleted. Regeneration is often the most prominent action.

Interrupt Operations

Users can stop generation via a “stop” button. On web it is usually placed below the bubble; on mobile it may replace the send button.

Generation Process Interaction

During Generation

Instant Feedback: progress indicators, "analyzing intent…", or reference counts keep users informed.

Interruptibility: users can halt generation at any time.

Avoid Distractions: disable input fields while generating to keep focus.

After Generation

Result Display: clear formatting, pagination, or folding for long outputs.

Operation Feedback: bubble action bar for copying, regenerating, etc.

User Feedback: thumbs‑up/down or surveys to improve the model.

Stop Generation

Save Progress: preserve partial results for later continuation.

Prompt Message: inform the user that generation stopped.

Next Steps: suggest "regenerate" or other actions.
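The stop flow above can be sketched with a deliberately simplified synchronous model (real generation streams asynchronously): a stop check runs between tokens, and stopping preserves the partial text so it can be shown and continued later.

```typescript
// Simplified model of interruptible generation. `stopAfter` stands in
// for the user pressing "stop" partway through; stopping keeps the
// partial text (the "save progress" behavior) rather than discarding it.
function consumeTokens(
  tokens: string[],
  stopAfter: number
): { text: string; stopped: boolean } {
  let text = "";
  let count = 0;
  for (const t of tokens) {
    if (count >= stopAfter) {
      return { text, stopped: true }; // partial result preserved
    }
    text += t;
    count += 1;
  }
  return { text, stopped: false };
}
```

The `stopped` flag is what drives the prompt message ("generation stopped") and the suggested next steps such as "regenerate".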

Voice Call

Beyond text, many AI assistants support voice calls, enabling hands‑free interaction.

Call Flow

Typical steps: start → connect → the user speaks → the AI recognizes speech → the AI responds → end.
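These steps can be sketched as a small state machine; the state names follow the flow above, and the transition table is an assumption (for instance, that responding loops back to listening for the next turn):

```typescript
// Hypothetical voice-call state machine. Every state may transition to
// "ended" (the user hangs up at any point); "responding" returns to
// "listening" so the conversation can continue turn by turn.
type CallState = "idle" | "connecting" | "listening" | "recognizing" | "responding" | "ended";

const transitions: Record<CallState, CallState[]> = {
  idle: ["connecting"],
  connecting: ["listening", "ended"],
  listening: ["recognizing", "ended"],
  recognizing: ["responding", "ended"],
  responding: ["listening", "ended"],
  ended: [],
};

function next(current: CallState, target: CallState): CallState {
  if (!transitions[current].includes(target)) {
    throw new Error(`Invalid transition: ${current} -> ${target}`);
  }
  return target;
}
```

Making illegal transitions throw keeps UI bugs (e.g., showing a listening indicator while the AI is speaking) visible during development.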

Key Differences from Human Calls

The AI never interrupts the user; instead, users must signal that they have finished speaking, either through automatic pause detection or a manual send action.

Users cannot talk over the AI to stop it; instead, an explicit "interrupt" button lets them cut the AI's response short.

UI Elements for Voice Calls

Status Prompts: show connection state, signal quality, etc.

Pre‑Speech Prompt: encourage the user to speak ("I'm listening").

Listening Indicator: visual waveforms while the user talks.

Auto‑Send Prompt: indicate when the system will send after a pause.

AI Speaking Indicator: show that the AI is speaking and offer an "interrupt" option.
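The auto-send prompt in the list above depends on a silence countdown. A minimal sketch, assuming an illustrative 1500 ms threshold (real products tune this value):

```typescript
// Hypothetical auto-send timing: once the user stops speaking, a short
// silence window counts down; when it reaches zero the utterance is
// sent automatically. `remainingMs` drives the countdown shown in the
// auto-send prompt.
const SILENCE_THRESHOLD_MS = 1500; // assumed value, not from any product

function autoSendStatus(
  lastSpeechAt: number, // timestamp of the last detected speech, in ms
  now: number
): { send: boolean; remainingMs: number } {
  const remainingMs = Math.max(0, SILENCE_THRESHOLD_MS - (now - lastSpeechAt));
  return { send: remainingMs === 0, remainingMs };
}
```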

Additional Controls

Hang‑up, pause/resume, and post‑call transcript are essential. Some products add avatar animation, real‑time subtitles, or role selection.

Design Principles for Generative AI Assistants

Visualization of NLP: show real‑time intent analysis or network queries.

Context Awareness & Coherence: display recent dialogue history or summarise key points.

Multimodal Interaction: allow seamless switching between text, voice, images, video, maps, etc.

Instant Feedback & Confirmation: echo voice‑to‑text conversion, highlight recognised keywords.

Personalisation & Customisation: suggest options based on user history.

Transparency & Explainability: let users view the reasoning behind suggestions.

Error Handling & Correction: provide "regenerate", "edit", or "copy" buttons for quick fixes.

Emotional Understanding & Feedback: recognise user emotions and respond with empathy.

Conclusion

The eight principles above summarise the core UX considerations for generative AI assistants and are applicable to any AI‑assistant product.

Written by 58UXD, the 58.com User Experience Design Center.