How Do Process Mining Algorithms Compare? A Deep Dive into Control‑Flow Techniques
This article examines the rapid growth of process‑mining techniques, explains the core concepts and modeling languages, evaluates model quality criteria, and provides a detailed comparison of direct, heuristic, genetic, and log‑classification algorithms from the perspective of control‑flow analysis.
Introduction
Enterprises increasingly focus on process governance, driving rapid development of process‑mining techniques and a proliferation of algorithms. Effective analysis requires understanding each algorithm’s strengths and weaknesses and comparing them using the control‑flow view.
Process‑Mining Overview
Process mining extracts knowledge from event logs recorded by information systems to discover, monitor, and improve actual business processes. It comprises three main tasks: process discovery, conformance checking, and process enhancement.
Control‑flow View and Modeling Languages
The control‑flow view captures the order of activities and the branching structure of a process. Common modeling formalisms include:
Petri nets – bipartite graphs of places, transitions and tokens, supporting concurrency.
Workflow nets (WF‑nets) – a subclass of Petri nets with a single source place and a single sink place, suitable for α‑algorithm discovery.
Causal nets (C‑nets) – activity‑centric representation using input and output bindings without explicit places.
BPMN – graphical notation with activities, events and gateways (AND, XOR, OR).
Process trees – hierarchical representation using four operators: sequence (→), exclusive choice (×), parallel (∧) and redo loop (↺).
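To make the process-tree idea concrete, here is a minimal sketch in Python, assuming a nested-tuple encoding of my own choosing (the operator names and activity labels are illustrative, not from the source):

```python
# A process tree as nested tuples: (operator, child1, child2, ...); leaves are
# activity names. Operators: "seq" (→), "xor" (×), "and" (∧), "loop" (↺).
tree = ("seq", "register",
        ("xor", ("and", "check_history", "check_ticket"), "skip_checks"),
        "decide")

def leaves(node):
    """Collect the activity labels of a process tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [a for child in node[1:] for a in leaves(child)]
```

The hierarchy makes block structure explicit: every subtree is a single-entry, single-exit fragment, which is why genetic miners often use this encoding.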
Event Log Characteristics
An event log consists of a set of traces, each trace being an ordered list of events. Each event records at least an activity name, a timestamp, a case identifier and optionally resources, costs, or data objects. Two practical issues affect mining:
Noise – low‑frequency events caused by logging errors or exceptional executions. Some algorithms (heuristic miner, genetic miner) incorporate noise handling.
Completeness – the degree to which the log contains all possible behavior. Formal completeness levels include global completeness (GC), direct succession (DS), short‑loop (DS+), long‑loop (DS++) and frequency completeness (FS).
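The trace structure described above can be sketched in a few lines of Python. The field names (`case`, `activity`, `timestamp`) and the helper `to_traces` are illustrative assumptions; real logs (e.g. XES files) carry richer attributes:

```python
from collections import defaultdict

# Hypothetical flat event records, as an information system might export them.
events = [
    {"case": "c1", "activity": "register", "timestamp": 1},
    {"case": "c2", "activity": "register", "timestamp": 2},
    {"case": "c1", "activity": "check", "timestamp": 3},
    {"case": "c2", "activity": "decide", "timestamp": 4},
    {"case": "c1", "activity": "decide", "timestamp": 5},
]

def to_traces(events):
    """Group events by case identifier and order each trace by timestamp."""
    cases = defaultdict(list)
    for e in sorted(events, key=lambda e: e["timestamp"]):
        cases[e["case"]].append(e["activity"])
    return dict(cases)

traces = to_traces(events)
```

Once events are grouped into traces, noise shows up as low-frequency trace variants and completeness as the fraction of possible orderings actually observed.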
Model‑quality Criteria
Four dimensions are used to evaluate discovered models:
Reproducibility (Fitness) – proportion of log traces that can be reproduced by the model.
Precision – extent to which the model does not allow behavior not seen in the log.
Generality – ability of the model to allow correct but unseen behavior (avoid over‑fitting).
Simplicity – preference for fewer elements while preserving the other criteria.
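As a toy illustration of the first two criteria, suppose a model is approximated by the finite set of traces it allows (real models permit infinite behavior, so practical metrics replay the log against a Petri net instead; this simplification and the function names are my own):

```python
def fitness(log, model_traces):
    """Trace-level fitness: fraction of log traces the model can reproduce."""
    return sum(tuple(t) in model_traces for t in log) / len(log)

def precision(log, model_traces):
    """Rough precision: fraction of modeled behavior actually seen in the log."""
    seen = {tuple(t) for t in log}
    return len(seen & model_traces) / len(model_traces)

log = [["a", "b"], ["a", "b"], ["a", "c"]]
model = {("a", "b"), ("a", "c"), ("a", "d")}  # allows one trace never observed
```

Here the model replays every log trace (perfect fitness) but also allows unseen behavior, so precision drops below 1; generality would reward exactly that extra behavior if it is plausible.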
Algorithm Families
1. Direct (α‑type) Algorithms
These algorithms derive four ordering relations (direct succession, causality, parallel, unrelated) from the log and construct five control‑flow patterns (sequence, exclusive choice, inclusive choice, parallel split, parallel join) to produce a WF‑net. They assume noise‑free, complete logs and cannot handle non‑free‑choice or short‑loop constructs.
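The four ordering relations can be derived directly from the direct-succession pairs in the log. A minimal sketch (relation symbols follow the usual α-algorithm notation; the function name is illustrative):

```python
def alpha_relations(log):
    """Classify every activity pair by the alpha-algorithm ordering relations."""
    activities = {a for trace in log for a in trace}
    # Direct succession: a > b iff a is immediately followed by b in some trace.
    direct = {(a, b) for trace in log for a, b in zip(trace, trace[1:])}
    rel = {}
    for a in activities:
        for b in activities:
            ab, ba = (a, b) in direct, (b, a) in direct
            if ab and not ba:
                rel[(a, b)] = "->"   # causality
            elif ab and ba:
                rel[(a, b)] = "||"   # parallel
            elif not ab and not ba:
                rel[(a, b)] = "#"    # unrelated
            else:
                rel[(a, b)] = "<-"   # reverse causality
    return rel
```

Because the classification hinges entirely on which successions appear, a single missing or spurious pair flips a relation, which is exactly why these algorithms need noise-free, complete logs.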
2. Heuristic Miner
The heuristic miner builds a dependency/frequency matrix. For each activity pair (a,b) it computes a dependency measure:
dep(a,b) = (|a→b| − |b→a|) / (|a→b| + |b→a| + 1)

Pairs with dep(a,b) above a user‑defined threshold are kept; low‑frequency relations are treated as noise. The resulting dependency graph is interpreted as sequence, choice or parallel structures, yielding a C‑net or Petri net.
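The dependency measure above translates directly into code, where |a→b| is the number of times a is immediately followed by b across all traces (the function name is illustrative):

```python
from collections import Counter

def dependency(log, a, b):
    """Heuristic-miner dependency measure dep(a,b); values lie in (-1, 1)."""
    # Count every direct succession (x, y) across all traces.
    direct = Counter((x, y) for trace in log for x, y in zip(trace, trace[1:]))
    ab, ba = direct[(a, b)], direct[(b, a)]
    return (ab - ba) / (ab + ba + 1)
```

The +1 in the denominator keeps rare observations from producing overconfident values: a single a→b occurrence yields dep = 0.5, not 1.0, so one-off noise stays below a typical threshold.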
3. Genetic Algorithms
Process models are encoded as individuals (often as process trees). A fitness function combines the four quality criteria:
fitness = w1·Reproducibility + w2·Precision + w3·Generality + w4·Simplicity

Typical weights (w1…w4) are set by the user. The algorithm iterates selection, crossover and mutation to evolve a population toward higher fitness. This approach is robust to noise and incomplete logs but requires more computation.
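The weighted fitness and the evolution loop can be sketched generically. The weight values below are illustrative defaults, not prescribed by the literature, and the `evolve` skeleton abstracts over the model encoding (crossover and mutation would operate on process trees in a real miner):

```python
import random

def weighted_fitness(scores, weights=(0.5, 0.2, 0.2, 0.1)):
    """Combine (reproducibility, precision, generality, simplicity) scores,
    each in [0, 1], into a single fitness value. Weights are user-chosen."""
    return sum(w * s for w, s in zip(weights, scores))

def evolve(population, score, crossover, mutate, generations=50):
    """Generic GA skeleton: keep the fitter half, refill by crossover + mutation."""
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: len(population) // 2]
        children = [
            mutate(crossover(random.choice(survivors), random.choice(survivors)))
            for _ in range(len(population) - len(survivors))
        ]
        population = survivors + children
    return max(population, key=score)
```

Because fitness is evaluated by replaying the whole log against every candidate in every generation, the runtime cost grows quickly with log size, which is the trade-off noted above.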
4. Log‑classification (Cluster‑based) Approaches
To reduce log diversity, each trace is transformed into a feature vector (e.g., binary activity presence, frequency counts, data‑object attributes). Standard similarity measures (Euclidean, Hamming, Jaccard) are used to cluster traces into sub‑logs. A chosen mining algorithm (α‑algorithm, heuristic, genetic) is applied to each sub‑log, producing local models that can be merged into a global view. This improves understandability for highly heterogeneous processes.
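As one concrete sketch of this idea, traces can be compared by Jaccard similarity on their activity sets (binary presence features) and grouped with a simple greedy pass; the threshold, the greedy scheme, and the function names are illustrative assumptions, where a real system would use a proper clustering algorithm such as k-means or hierarchical clustering:

```python
def jaccard(t1, t2):
    """Jaccard similarity of two traces on their activity sets."""
    s1, s2 = set(t1), set(t2)
    return len(s1 & s2) / len(s1 | s2)

def cluster_traces(log, threshold=0.5):
    """Greedy single-pass clustering: join the first cluster whose
    representative (first trace) is similar enough, else start a new one."""
    clusters = []
    for trace in log:
        for cluster in clusters:
            if jaccard(trace, cluster[0]) >= threshold:
                cluster.append(trace)
                break
        else:
            clusters.append([trace])
    return clusters
```

Each resulting cluster is a more homogeneous sub-log, so even a strict miner like the α-algorithm can produce a readable local model per cluster.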
Algorithm Comparison
Direct and heuristic miners are fast but struggle with noisy or highly diverse logs. Genetic algorithms handle noise and incompleteness at the cost of higher runtime. Log‑classification techniques mitigate diversity but depend on effective clustering and subsequent merging.
Technical Trends
Future work focuses on hybrid methods that combine the efficiency of direct/heuristic algorithms with the robustness of genetic or causal techniques, and on leveraging machine‑learning and increased computational power to improve model quality while scaling to large, noisy logs.
References
van der Aalst, W.M.P. Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer, 2011.
Weijters, A.J.M.M., van der Aalst, W.M.P. “Process mining with the heuristics miner-algorithm”, 2006.
Buijs, J.C.A.M., van Dongen, B.F., van der Aalst, W.M.P. “A genetic algorithm for discovering process trees”, 2012 IEEE Congress on Evolutionary Computation (CEC).
AsiaInfo Technology: New Tech Exploration