MemSlides: Memory‑Driven Slides Agent for Multi‑Round Precise PPT Edits (HuggingFace #1)

MemSlides introduces a hierarchical memory framework (persistent user profiles, active working memory, and tool experience) that turns AI slide generation into a stateful multi‑turn authoring process. It improves persona alignment and local edit efficiency: in the tool‑memory evaluation, closed‑loop completion rises from 0.815 to 0.963 and time‑to‑first‑correct‑edit drops from 609.5 s to 242.5 s.

Machine Learning Algorithms & Natural Language Processing

Problem

AI‑assisted PPT generation systems typically focus on producing a complete deck in a single pass. In real usage, users repeatedly edit slides and need the system to keep stable preferences across rounds while honoring temporary constraints such as “make all later titles blue”. Without an explicit memory mechanism, these preferences are either repeatedly injected into prompts or lost during multi‑turn editing.

Memory‑augmented Slides Agent

MemSlides models slide authoring as a stateful, multi‑turn problem and introduces a three‑tier hierarchical memory:

User‑profile memory stores long‑term preferences (theme, content density, visual style, layout) that persist across jobs.

Working memory holds the active deck state, temporary preferences, carry‑over instructions, and coverage status for the current session.

Tool memory records reusable execution experience at two granularities: round‑scope task experience and operation‑scope tool‑chain fragments.

When signals conflict, precedence runs from explicit user feedback, to the current deck, to the task template, and finally to the long‑term profile. After a job finishes, only stable, transferable signals are consolidated back into the persistent profile, so that one‑off requests are not persisted.
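The precedence and consolidation rules can be illustrated with a small sketch. It is not the authors' implementation: the source labels, the `resolve` and `consolidate` helpers, and the idea of passing an explicit set of stable keys are all assumptions made for illustration, and the priority order reflects one reading of the text (feedback, then deck, then template, then profile).

```python
# Priority order as read from the text: highest-priority source first.
# The source labels are hypothetical names chosen for this sketch.
PRIORITY = ("user_feedback", "current_deck", "task_template", "user_profile")


def resolve(key, signals):
    """Return (value, source) from the highest-priority source that sets `key`."""
    for source in PRIORITY:
        values = signals.get(source, {})
        if key in values:
            return values[key], source
    return None, None


def consolidate(profile, session_prefs, stable_keys):
    """Return a new profile with only the keys marked stable copied over.

    How stability is judged is not described in the summary, so the caller
    supplies `stable_keys` explicitly (an assumption of this sketch).
    """
    updated = dict(profile)
    for key in stable_keys:
        if key in session_prefs:
            updated[key] = session_prefs[key]
    return updated


signals = {
    "user_profile": {"title_color": "black", "density": "low"},
    "task_template": {"title_color": "navy"},
    "user_feedback": {"title_color": "blue"},
}
print(resolve("title_color", signals))  # ('blue', 'user_feedback')
print(resolve("density", signals))      # ('low', 'user_profile')
```

Returning the winning source alongside the value makes it easy to log why a given preference was applied, which fits the article's later call for transparent memory inspection.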

Working Memory Details

Working memory enables delayed preference carry‑over. For example, a user may state “later titles should be blue” early in the session; the rule is stored in working memory and applied when new slides are added, preventing the system from forgetting the delayed preference.
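A minimal sketch of delayed carry‑over, assuming a deck is a list of dictionaries and a rule records the slide index from which it applies. The `DeferredRule` and `WorkingMemory` classes and their fields are hypothetical and only illustrate the idea described above.

```python
from dataclasses import dataclass, field


@dataclass
class DeferredRule:
    """A session-scoped rule that applies to slides added after it was stated."""
    rule_id: str
    target: str        # which slide element the rule styles, e.g. "title"
    style: dict        # style overrides, e.g. {"color": "blue"}
    from_index: int    # first slide index the rule covers


@dataclass
class WorkingMemory:
    slides: list = field(default_factory=list)
    rules: list = field(default_factory=list)

    def add_rule(self, rule_id, target, style):
        # "Later" slides start at the current end of the deck.
        self.rules.append(DeferredRule(rule_id, target, style, len(self.slides)))

    def add_slide(self, title):
        index = len(self.slides)
        slide = {"title": {"text": title, "color": "black"}}
        # Apply every stored rule that covers this slide index.
        for rule in self.rules:
            if index >= rule.from_index and rule.target in slide:
                slide[rule.target].update(rule.style)
        self.slides.append(slide)
        return slide


memory = WorkingMemory()
memory.add_slide("Introduction")
memory.add_rule("r1", "title", {"color": "blue"})   # "later titles should be blue"
memory.add_slide("Method")
print([s["title"]["color"] for s in memory.slides])  # ['black', 'blue']
```

Because the rule lives in session state rather than in the prompt, it does not need to be restated each round, and the existing first slide is left unchanged.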

Tool Memory Details

Tool memory captures execution experience:

Round‑scope experience aggregates lessons from an entire modify job (e.g., error summaries, automatically extracted patterns).

Operation‑scope fragments break the reasoning‑tool‑observation chain into reusable snippets that can be retrieved before similar tool calls.

The goal is to reduce repeated tool errors, lower back‑tracking, and improve the reliability of local verification.
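Retrieval of operation‑scope fragments before a tool call might look like the sketch below. The summary does not say how MemSlides indexes or ranks fragments, so the word‑overlap scoring, the `Fragment` fields, and the tool names (`set_text_color`, `move_shape`) are stand‑ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One operation-scope record: the tool, the situation, and the lesson learned."""
    tool: str
    situation: str
    lesson: str


def tokens(text):
    return set(text.lower().split())


def retrieve(fragments, tool, situation, k=2):
    """Return up to k lessons for `tool`, ranked by word overlap with `situation`."""
    query = tokens(situation)
    scored = []
    for frag in fragments:
        if frag.tool != tool:
            continue
        overlap = len(query & tokens(frag.situation))
        if overlap > 0:
            scored.append((overlap, frag))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [frag.lesson for _, frag in scored[:k]]


store = [
    Fragment("set_text_color", "change title color on slide",
             "re-read the slide snapshot before patching"),
    Fragment("set_text_color", "change body font size",
             "check text box overflow after resizing"),
    Fragment("move_shape", "align image with title",
             "use slide coordinates, not pixel offsets"),
]
print(retrieve(store, "set_text_color", "make the title color blue", k=1))
# ['re-read the slide snapshot before patching']
```

A real system would likely use embeddings or structured keys instead of word overlap; the point here is only that lessons are filtered by tool and surfaced before the call is made.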

Scoped Slide‑Local Revision (Plan‑Act‑Guard)

MemSlides maps a user edit request to the minimal effective slide region and constrains the edit to that scope. The pipeline consists of three stages:

Plan: construct an execution contract that records the inferred region, target slide path, active rule identifiers, selector hints, and coverage requirements.

Act: select the appropriate editing tool based on the contract and apply the minimal patch to the target region.

Guard: verify that the patch binds to the correct snapshot, that coverage requirements are satisfied, and that finalization does not occur prematurely.

This contract‑driven process ensures that “completion” is not a model‑generated stop token but a verified local modification that respects non‑target areas.
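The three stages can be sketched as follows, assuming a deck is a dictionary keyed by slide path and a snapshot is a content hash. The `Contract` fields are a reduced, hypothetical version of the contract described above (rule identifiers and selector hints are omitted), and none of the function names come from the MemSlides code.

```python
import hashlib
import json
from dataclasses import dataclass


def snapshot_id(slide):
    """Hash the slide content so a patch can be tied to the state it was planned on."""
    payload = json.dumps(slide, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Contract:
    slide_path: str
    region: str      # the minimal region the edit may touch, e.g. "title"
    snapshot: str    # snapshot id at plan time
    required: dict   # coverage requirements: field -> expected value


def plan(deck, slide_path, region, required):
    return Contract(slide_path, region, snapshot_id(deck[slide_path]), required)


def act(deck, contract, patch):
    """Apply the patch to a copy of the target region; other regions are not modified."""
    slide = deck[contract.slide_path]
    edited = dict(slide)
    edited[contract.region] = {**slide[contract.region], **patch}
    return edited


def guard(deck, contract, edited):
    """Return a list of violations; an empty list means the edit may be finalized."""
    original = deck[contract.slide_path]
    problems = []
    if snapshot_id(original) != contract.snapshot:
        problems.append("stale snapshot")
    for region, value in original.items():
        if region != contract.region and edited.get(region) != value:
            problems.append(f"non-target region changed: {region}")
    for key, expected in contract.required.items():
        if edited[contract.region].get(key) != expected:
            problems.append(f"coverage not met: {key}")
    return problems


deck = {"slides/3": {"title": {"text": "Results", "color": "black"},
                     "body": {"text": "Accuracy improved."}}}
contract = plan(deck, "slides/3", "title", {"color": "blue"})
edited = act(deck, contract, {"color": "blue"})
violations = guard(deck, contract, edited)
if not violations:
    deck[contract.slide_path] = edited   # finalize only after the guard passes
print(violations)  # []
```

Finalization is a side effect of a passing guard rather than of the model declaring itself done, which is the property the contract is meant to enforce. Re‑running `guard` with the same contract after the deck has changed reports a stale snapshot.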

Experiments

Two evaluation dimensions were measured:

Personalized generation: a multi‑persona, multi‑intent profile bank was used. Persona‑alignment judgments showed that user‑profile memory improves round‑0 persona alignment across dimensions while maintaining competitive slide quality.

Multi‑round local editing: a diagnostic matched‑pair setup isolated the effect of tool memory. Injecting tool memory increased overall closed‑loop completion from 0.815 to 0.963, raised strict verification from 0.310 to 0.534, and reduced time‑to‑first‑correct‑edit from 609.5 s to 242.5 s. Pair‑level results varied across models and task difficulty, so tool memory does not help uniformly in every case.
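To put the reported tool‑memory numbers in relative terms, the short calculation below derives absolute and relative changes from the figures quoted above. The `change` helper and the dictionary keys are names introduced here; only the six numbers come from the article.

```python
def change(before, after):
    """Return (absolute change, relative change) between two measurements."""
    delta = after - before
    return delta, delta / before


# Figures reported in the summary: (without tool memory, with tool memory).
results = {
    "closed_loop_completion": (0.815, 0.963),
    "strict_verification": (0.310, 0.534),
    "time_to_first_correct_edit_s": (609.5, 242.5),
}

for name, (before, after) in results.items():
    delta, relative = change(before, after)
    print(f"{name}: {delta:+.3f} absolute, {relative:+.1%} relative")
```

This works out to relative gains of roughly 18% in closed‑loop completion and 72% in strict verification, and a reduction of about 60% in time‑to‑first‑correct‑edit. These are aggregate figures; as noted above, the effect varies across models and task difficulty.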

Analysis and Governance

The study demonstrates that a usable Slides Agent must:

Distinguish long‑term stable preferences from transient task‑level constraints.

Leverage past tool experience to avoid redundant errors.

Maintain a clear separation between persistent profile consolidation and temporary working memory updates.

Persisted profiles can encode sensitive user habits or organizational style constraints, and erroneous consolidation may carry outdated preferences forward. The authors therefore recommend that future agents provide transparent memory inspection, editing, and deletion mechanisms.

Resources

arXiv paper: https://arxiv.org/abs/2606.17162

Project site: https://memslides.github.io/

GitHub repository: https://github.com/huohua325/Memslides

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
