How ComfyUI Caches Work: CLASSIC vs LRU vs RAM_PRESSURE Explained
This article breaks down ComfyUI's two‑level cache system, explains the differences between the CLASSIC, LRU, and RAM_PRESSURE strategies for outputs and objects, and offers practical guidance on choosing the right cache mode when running multiple models, LoRAs, and workflows on a single machine.
ComfyUI maintains two parallel caches: outputs for intermediate results and UI output, and objects for node instances. Together they avoid redundant computation and repeated object construction. Both caches are managed by a unified CacheSet defined in execution.py.
Overall architecture and cache strategies
The cache type is selected via the CacheType enum (CLASSIC, LRU, RAM_PRESSURE, NONE). The CacheSet constructor creates the appropriate containers:
class CacheType(Enum):
    CLASSIC = 0
    LRU = 1
    NONE = 2
    RAM_PRESSURE = 3

class CacheSet:
    def __init__(self, cache_type=None, cache_args={}):
        if cache_type == CacheType.NONE:
            self.init_null_cache()
        elif cache_type == CacheType.RAM_PRESSURE:
            cache_ram = cache_args.get("ram", 16.0)
            self.init_ram_cache(cache_ram)
        elif cache_type == CacheType.LRU:
            cache_size = cache_args.get("lru", 0)
            self.init_lru_cache(cache_size)
        else:
            self.init_classic_cache()
        self.all = [self.outputs, self.objects]

Cache keys
Outputs use a key composed of the node’s input signature, an is_changed fingerprint, and optionally the node_id for non‑idempotent nodes. The signature is built by CacheKeySetInputSignature (see caching.py:100‑126) and includes the current node and all its ancestors.
async def get_node_signature(self, dynprompt, node_id):
    signature = []
    ancestors, order_mapping = self.get_ordered_ancestry(dynprompt, node_id)
    signature.append(await self.get_immediate_node_signature(dynprompt, node_id, order_mapping))
    for ancestor_id in ancestors:
        signature.append(await self.get_immediate_node_signature(dynprompt, ancestor_id, order_mapping))
    return to_hashable(signature)

The immediate signature packs class_type, the is_changed fingerprint, the optional node_id, and the ordered input values.
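The excerpt does not reproduce get_immediate_node_signature; based on the description above, a simplified sketch of what it assembles could look like this (the method and attribute names mirror the excerpt, but the body is illustrative, not ComfyUI's exact code):

async def get_immediate_node_signature(self, dynprompt, node_id, order_mapping):
    # Illustrative sketch: pack class_type, the is_changed fingerprint,
    # optionally the node_id, and the node's inputs in a stable order.
    node = dynprompt.get_node(node_id)
    signature = [node["class_type"], await self.is_changed_cache.get(node_id)]
    if self.include_node_id_in_input():
        # Non-idempotent nodes keep their id so results are never shared.
        signature.append(node_id)
    for name in sorted(node["inputs"].keys()):
        # Link inputs could be mapped through order_mapping so the signature
        # stays stable when node ids are renumbered between prompts.
        signature.append((name, node["inputs"][name]))
    return signature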
Objects use a simple tuple (node_id, class_type) as the key, managed by CacheKeySetID.
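A minimal sketch of what such an ID-based key set could look like (illustrative, not the actual CacheKeySetID code):

class SimpleIDKeySet:
    # Illustrative sketch of a CacheKeySetID-style key builder: the objects
    # cache keys on (node_id, class_type), so a node instance is reused as
    # long as the same node id keeps the same class across prompts.
    def __init__(self, dynprompt, node_ids):
        self.dynprompt = dynprompt
        self.keys = {}
        for node_id in node_ids:
            node = dynprompt.get_node(node_id)
            self.keys[node_id] = (node_id, node["class_type"])

    def get_data_key(self, node_id):
        return self.keys.get(node_id)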
Cache containers
BasicCache: provides set_prompt, clean_unused, get, and set.
HierarchicalCache: builds a parent‑to‑child cache hierarchy for sub‑graphs.
LRUCache: extends BasicCache with a generational LRU eviction policy.
RAMPressureCache: extends LRUCache and adds RAM‑pressure‑driven eviction.
NullCache: a no‑op implementation used when caching is disabled.
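To make the shared interface concrete, here is a minimal illustrative sketch of a BasicCache-style container (the get_used_keys helper and the overall structure are assumptions, not a copy of ComfyUI's implementation):

class MinimalCache:
    # Illustrative sketch of the common container interface listed above.
    def __init__(self, key_class):
        self.key_class = key_class
        self.cache = {}
        self.cache_key_set = None

    async def set_prompt(self, dynprompt, node_ids, is_changed_cache):
        # Build the key set covering every node in the current prompt.
        self.cache_key_set = self.key_class(dynprompt, node_ids, is_changed_cache)
        await self.cache_key_set.add_keys(node_ids)

    def clean_unused(self):
        # Drop entries whose keys are not referenced by the current prompt.
        used = set(self.cache_key_set.get_used_keys())  # assumed helper
        for key in [k for k in self.cache if k not in used]:
            del self.cache[key]

    def get(self, node_id):
        return self.cache.get(self.cache_key_set.get_data_key(node_id))

    def set(self, node_id, value):
        self.cache[self.cache_key_set.get_data_key(node_id)] = value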
Integration into the execution loop
During each execution (execution.py:681‑685), the system:
Creates an IsChangedCache for the current prompt.
Calls set_prompt on both outputs and objects (behavior varies by strategy).
Runs clean_unused() to purge stale entries.
When a node is about to run, the executor first checks outputs. On a cache hit the node's execution is skipped and the cached result is reused; on a miss the node executes and its result is stored as a CacheEntry.
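Conceptually, the per-node flow can be pictured like this (a simplified sketch; run_node, execute_node, and obj_classes are illustrative names, not ComfyUI's actual execute() code):

async def run_node(caches, dynprompt, obj_classes, node_id):
    # Simplified sketch of the lookup-then-execute flow described above.
    cached = caches.outputs.get(node_id)
    if cached is not None:
        return cached  # cache hit: skip execution, reuse the stored result

    # The objects cache keeps node instances keyed by (node_id, class_type).
    obj = caches.objects.get(node_id)
    if obj is None:
        obj = obj_classes[dynprompt.get_node(node_id)["class_type"]]()
        caches.objects.set(node_id, obj)

    result = await execute_node(obj, dynprompt, node_id)  # assumed helper
    caches.outputs.set(node_id, result)  # stored as a CacheEntry in ComfyUI
    return result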
CLASSIC strategy (default)
CLASSIC uses hierarchical caches for both outputs and objects without size or TTL limits. clean_unused() removes any entry whose key is not present in the current prompt’s key set, effectively resetting the cache on every prompt change.
def clean_unused(self):
    self._clean_cache()
    self._clean_subcaches()

Consequences:
Best for stable workflows where only intra‑prompt reuse is needed.
No cross‑prompt persistence; switching workflows discards previous caches.
LRU strategy
LRU applies only to outputs; objects remain hierarchical. It tracks a global generation counter that increments on each new prompt. Each cache entry records the generation in which it was last used.
class LRUCache(BasicCache):
    def __init__(self, key_class, max_size=100):
        super().__init__(key_class)
        self.max_size = max_size
        self.min_generation = 0
        self.generation = 0
        self.used_generation = {}
        self.children = {}

    async def set_prompt(self, ...):
        self.generation += 1
        for node_id in node_ids:
            self._mark_used(node_id)

    def _mark_used(self, node_id):
        cache_key = self.cache_key_set.get_data_key(node_id)
        if cache_key is not None:
            self.used_generation[cache_key] = self.generation

When len(cache) > max_size, the cache increments min_generation and evicts entries whose used_generation is older than this threshold.
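The eviction pass itself is not shown in the excerpt; a sketch of the generational logic described here could look like the following (illustrative, based only on the behavior stated above):

def _evict_to_size(self):
    # Illustrative sketch: raise min_generation until the cache fits within
    # max_size, dropping every entry last used before that threshold.
    while len(self.cache) > self.max_size and self.min_generation < self.generation:
        self.min_generation += 1
        stale = [key for key, gen in self.used_generation.items()
                 if gen < self.min_generation and key in self.cache]
        for key in stale:
            del self.cache[key]
            del self.used_generation[key]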
RAM_PRESSURE strategy
RAM_PRESSURE inherits from LRU but replaces size‑based eviction with RAM‑pressure‑driven eviction. The poll(ram_headroom) method checks available RAM (preferring cgroup limits, falling back to psutil) and, if below the headroom, computes an OOM score for each entry based on:
Age (generations since last use, exponential factor).
Estimated RAM usage (CPU tensor size, custom get_ram_usage() if available).
Timestamp for tie‑breaking.
Entries are sorted by this score and removed until enough headroom is restored, with a hysteresis factor to avoid rapid thrashing.
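The RAM probe itself is not shown below; a rough sketch of the "cgroup first, psutil fallback" idea might look like this (file paths and the function name are assumptions, not the actual _ram_gb() implementation):

import psutil

def available_ram_gb():
    # Illustrative sketch: prefer cgroup v2 limits (relevant inside containers),
    # then fall back to system-wide availability reported by psutil.
    try:
        with open("/sys/fs/cgroup/memory.max") as f:
            limit = f.read().strip()
        with open("/sys/fs/cgroup/memory.current") as f:
            current = int(f.read().strip())
        if limit != "max":
            return (int(limit) - current) / (1024 ** 3)
    except (OSError, ValueError):
        pass
    return psutil.virtual_memory().available / (1024 ** 3)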
def poll(self, ram_headroom):
    if _ram_gb() > ram_headroom:
        return
    gc.collect()
    if _ram_gb() > ram_headroom:
        return
    clean_list = []
    for key, (outputs, _) in self.cache.items():
        oom_score = RAM_CACHE_OLD_WORKFLOW_OOM_MULTIPLIER ** (self.generation - self.used_generation[key])
        ram_usage = RAM_CACHE_DEFAULT_RAM_USAGE
        # recursively sum tensor sizes …
        oom_score *= ram_usage
        bisect.insort(clean_list, (oom_score, self.timestamps[key], key))
    while _ram_gb() < ram_headroom * RAM_CACHE_HYSTERESIS and clean_list:
        _, _, key = clean_list.pop()
        del self.cache[key]
        gc.collect()

Unlike CLASSIC, clean_unused() in RAM mode only removes unused sub‑caches; outputs entries persist across prompts until RAM pressure forces eviction.
Practical recommendations
Stable single workflow: use the default CLASSIC for simplicity.
Multiple workflows with limited memory: enable LRU (--cache-lru N) to keep up to N recent results.
Container or cloud environments with strict RAM limits: prefer RAM_PRESSURE (--cache-ram <GB>) to let the system free memory automatically based on OOM scoring.
Disable caching entirely: use --cache-none (NullCache) for debugging or resource‑constrained runs.
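Tying these flags back to the constructor shown earlier, the selection could be sketched as follows (the args attribute names are assumptions; ComfyUI's actual wiring lives in its startup code):

# Hypothetical mapping from the CLI flags above to CacheSet construction.
if args.cache_none:
    caches = CacheSet(CacheType.NONE)
elif args.cache_ram is not None:
    caches = CacheSet(CacheType.RAM_PRESSURE, cache_args={"ram": args.cache_ram})
elif args.cache_lru > 0:
    caches = CacheSet(CacheType.LRU, cache_args={"lru": args.cache_lru})
else:
    caches = CacheSet(CacheType.CLASSIC)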
Summary
ComfyUI’s caching consists of two parallel stores (outputs and objects) governed by four selectable strategies. CLASSIC offers pure hierarchical caching with prompt‑driven eviction, LRU adds generational size limits for outputs, RAM_PRESSURE adds memory‑aware eviction, and NONE disables caching. Choosing the right mode depends on workflow stability, desired cross‑prompt reuse, and available RAM.