How to Mine Association Rules with discoverR: Apriori & FP‑Growth in R

This guide explains the fundamentals of association‑rule mining, introduces support, confidence and lift metrics, and demonstrates step‑by‑step how to use the discoverR R package with Apriori and FP‑Growth algorithms to extract and visualize recommendation rules from the classic groceries dataset.

StarRing Big Data Open Lab

Understanding Association Rules

Association rule mining, also known as market‑basket analysis, discovers items that frequently appear together in transaction records (e.g., the classic "diaper → beer" example). For a rule A → B, the key metrics are support (the probability that A and B occur in the same transaction), confidence (the conditional probability P(B|A)), and lift (the ratio of confidence to the overall probability of B). A lift above 1 means A and B co‑occur more often than they would if they were independent.
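
To make the three metrics concrete, here is a minimal base R sketch that computes them by hand for one rule. The five transactions are invented for illustration and are not taken from the groceries data; `support()` is a helper defined here, not a discoverR function.

```r
# Toy transactions (invented for illustration, not the groceries data)
transactions <- list(
  c("diapers", "beer", "milk"),
  c("diapers", "beer"),
  c("diapers", "bread"),
  c("beer", "chips"),
  c("milk", "bread")
)

# Fraction of transactions that contain every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

lhs <- "diapers"  # antecedent A
rhs <- "beer"     # consequent B

supp_rule  <- support(c(lhs, rhs), transactions)       # P(A and B)
confidence <- supp_rule / support(lhs, transactions)   # P(B | A)
lift       <- confidence / support(rhs, transactions)  # P(B | A) / P(B)

print(round(c(support = supp_rule, confidence = confidence, lift = lift), 3))
```

Here the rule appears in 2 of 5 transactions (support 0.4), holds in 2 of the 3 transactions containing diapers (confidence about 0.67), and has a lift of about 1.11, so the two items co‑occur slightly more often than independence would predict.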

For a rule A → B, the three metrics relate as follows: support = P(A and B); confidence = P(A and B) / P(A) = P(B|A); lift = confidence / P(B).

Loading Data with discoverR

First, load the discoverR library and the built‑in groceries dataset:

library(discoverR)

data(groceries)

The dataset contains 9,835 transactions and 169 items, with columns id (transaction number) and items (list of purchased products).
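
The two summary figures above (number of transactions, number of distinct items) are easy to check on any table of this shape. The sketch below uses a toy stand-in with the same two column names; it assumes, purely for illustration, that `items` is a list column holding one character vector per transaction, which may not match how discoverR actually stores the data.

```r
# Toy stand-in with the same two columns (values invented for illustration)
toy <- data.frame(id = 1:3)
toy$items <- list(
  c("whole milk", "yogurt"),
  c("rolls/buns", "whole milk", "soda"),
  c("soda")
)

n_transactions <- nrow(toy)                       # one row per transaction
n_items <- length(unique(unlist(toy$items)))      # distinct products overall
basket_size <- lengths(toy$items)                 # items per transaction

cat("transactions:", n_transactions, "\n")
cat("distinct items:", n_items, "\n")
cat("basket sizes:", basket_size, "\n")
```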

Initialize a Spark session (local or YARN) and convert the data frame to a distributed Spark DataFrame:

discover.init()

df_groceries <- as.DataFrame(groceries)

Training Models: Apriori and FP‑Growth

discoverR provides two algorithms for association‑rule mining: Apriori (via txApriori) and FP‑Growth (via txFPgrowth).

Apriori generates candidate itemsets iteratively, filtering by a minimum support threshold until no further frequent itemsets are found.
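
The level‑wise search can be sketched in base R as follows. This is a simplified illustration, not discoverR's implementation: the transactions and threshold are invented, candidates are joined by pairwise union rather than the usual prefix join, and `support()` is a helper defined here.

```r
# Toy transactions and threshold (invented for illustration)
transactions <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "beer")
)
min_support <- 0.5

# Fraction of transactions containing every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(txn) all(items %in% txn), logical(1)))
}

# Level 1: every single item is a candidate
candidates <- as.list(sort(unique(unlist(transactions))))
frequent <- list()

while (length(candidates) > 0) {
  # Keep only the candidates that meet the support threshold
  supp <- vapply(candidates, support, numeric(1), txns = transactions)
  level <- candidates[supp >= min_support]
  if (length(level) == 0) break
  frequent <- c(frequent, level)

  # Join frequent k-itemsets pairwise into (k+1)-item candidates
  k <- length(level[[1]])
  joined <- list()
  for (i in seq_along(level)) {
    for (j in seq_along(level)) {
      if (i < j) {
        u <- sort(union(level[[i]], level[[j]]))
        if (length(u) == k + 1) joined[[paste(u, collapse = ",")]] <- u
      }
    }
  }

  # Prune: a candidate survives only if all of its k-subsets are frequent
  level_keys <- vapply(level, paste, character(1), collapse = ",")
  keep <- vapply(joined, function(u) {
    subsets <- combn(u, k, simplify = FALSE)
    all(vapply(subsets, paste, character(1), collapse = ",") %in% level_keys)
  }, logical(1))
  candidates <- unname(joined[keep])
}

for (s in frequent) {
  cat(sprintf("{%s} support = %.1f\n",
              paste(s, collapse = ", "), support(s, transactions)))
}
```

On this toy data the loop stops at level 3, because the only three‑item candidate falls below the threshold. The discoverR calls for the same task follow.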

# Reshape data for mining
fp <- txReshape(data = df_groceries, column = c("id", "items"))

# Find frequent itemsets with support ≥ 0.05
set <- txApriori(data = fp, colName = "items", parameter = list(support = 0.05))

# Generate rules with support ≥ 0.05 and confidence ≥ 0.1
rule <- txApriori(data = fp, colName = "items", parameter = list(support = 0.05, confidence = 0.1, target = "rules"))
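
To show what the confidence threshold does when rules are derived from frequent itemsets, here is a small base R sketch. The support values are invented for illustration, only two‑item itemsets are considered, and none of the names below come from discoverR.

```r
# Supports of frequent items and pairs (toy values, invented for illustration)
item_support <- c(milk = 0.8, bread = 0.6, butter = 0.4)
pair_support <- data.frame(
  a = c("bread", "butter"),
  b = c("milk", "milk"),
  support = c(0.5, 0.3),
  stringsAsFactors = FALSE
)
min_confidence <- 0.7

# Each frequent pair {a, b} yields two candidate rules: a -> b and b -> a
rules <- data.frame(
  lhs = c(pair_support$a, pair_support$b),
  rhs = c(pair_support$b, pair_support$a),
  support = rep(pair_support$support, 2),
  stringsAsFactors = FALSE
)

# confidence = support(lhs and rhs) / support(lhs); lift = confidence / support(rhs)
rules$confidence <- rules$support / unname(item_support[rules$lhs])
rules$lift <- rules$confidence / unname(item_support[rules$rhs])

# Keep only the rules that meet the confidence threshold
rules <- rules[rules$confidence >= min_confidence, ]
print(rules)
```

Only bread → milk and butter → milk survive the filter. The reverse rules share the same support but have lower confidence, because milk appears in many transactions without bread or butter.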

FP‑Growth builds a compact FP‑tree and recursively extracts frequent patterns without candidate generation.
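
The compression idea behind the FP‑tree can be sketched in base R. This covers only the tree‑construction phase (the recursive mining step is omitted), uses invented transactions, and represents the tree as a named vector keyed by path rather than as linked nodes, so it is an illustration and not how discoverR stores the tree.

```r
# Toy transactions and minimum count (invented for illustration)
transactions <- list(
  c("milk", "bread", "butter"),
  c("milk", "bread"),
  c("milk", "butter"),
  c("bread", "butter"),
  c("milk", "bread", "butter", "beer")
)
min_count <- 2

# Pass 1: count items, drop infrequent ones, fix a global item order
counts <- table(unlist(transactions))
frequent <- counts[counts >= min_count]
# Descending count, ties broken alphabetically
item_order <- names(frequent)[order(-as.integer(frequent), names(frequent))]

# Pass 2: insert each transaction as a path of ordered frequent items.
# Each distinct path prefix is one tree node; its value is the node count.
tree <- integer(0)
for (txn in transactions) {
  path <- item_order[item_order %in% txn]
  for (k in seq_along(path)) {
    key <- paste(path[1:k], collapse = " > ")
    tree[key] <- if (key %in% names(tree)) tree[[key]] + 1L else 1L
  }
}

print(tree)
cat("tree nodes:", length(tree), "\n")
```

The five transactions contain 12 occurrences of frequent items, but shared prefixes collapse them into 6 nodes. Frequent patterns can then be read from the node counts without rescanning the transactions.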

# Reshape data (same step as in the Apriori example, repeated so this block runs on its own)
fp <- txReshape(data = df_groceries, column = c("id", "items"))

# Frequent itemsets with support ≥ 0.1
set <- txFPgrowth(data = fp, colName = "items", parameter = list(support = 0.1))

# Rules with support ≥ 0.035 and confidence ≥ 0.1
rule <- txFPgrowth(data = fp, colName = "items", parameter = list(support = 0.035, confidence = 0.1, target = "rules"))

Visualizing Rules

Use plot() to display rule metrics:

plot(rule)

For a graph‑based view that uses support as the plotted measure and lift as the shading:

plot(rule, method = "graph", measure = "support", shading = "lift")
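
To see how one chart can carry all three metrics, the following base R sketch plots a few toy rules with support and confidence on the axes and lift as the shading. The rule values are invented, and this is a base‑graphics stand‑in for illustration, not the output of discoverR's plot method.

```r
# Toy rule metrics (invented for illustration)
rules <- data.frame(
  rule = c("bread -> milk", "butter -> milk", "beer -> chips"),
  support = c(0.50, 0.30, 0.10),
  confidence = c(0.83, 0.75, 0.60),
  lift = c(1.04, 0.94, 2.40)
)

# Map lift onto a grey scale: higher lift gives a darker point
scaled <- (rules$lift - min(rules$lift)) / diff(range(rules$lift))
shade <- grey(0.8 * (1 - scaled))

plot(rules$support, rules$confidence, pch = 19, cex = 2, col = shade,
     xlab = "support", ylab = "confidence", main = "Toy rules shaded by lift")
text(rules$support, rules$confidence, labels = rules$rule, pos = 3)
```

Rules that are rare but strongly associated, such as the darkest point here, stand out even though they sit low on the support axis.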

Conclusion

By leveraging discoverR’s built‑in Apriori and FP‑Growth implementations, analysts can quickly perform association‑rule mining on large transaction datasets without writing low‑level code, enabling effective recommendation and cross‑selling strategies in retail, search and other domains.
