How to Mine Association Rules with discoverR: Apriori & FP‑Growth in R
This guide explains the fundamentals of association‑rule mining, introduces support, confidence and lift metrics, and demonstrates step‑by‑step how to use the discoverR R package with Apriori and FP‑Growth algorithms to extract and visualize recommendation rules from the classic groceries dataset.
Understanding Association Rules
Association rule mining, also known as market‑basket analysis, discovers items that frequently appear together in transaction records (e.g., the classic "diaper → beer" example). The key metrics are support (joint occurrence probability), confidence (conditional probability P(B|A)), and lift (ratio of confidence to overall probability of B).
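Written out for a rule A → B, the three metrics can be stated as below. The notation is an assumption added here for clarity: P(A) is the fraction of transactions containing every item in A, and P(A ∩ B) is the fraction containing every item in both A and B.

```latex
\begin{aligned}
\operatorname{support}(A \Rightarrow B) &= P(A \cap B) \\
\operatorname{confidence}(A \Rightarrow B) &= P(B \mid A) = \frac{P(A \cap B)}{P(A)} \\
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{confidence}(A \Rightarrow B)}{P(B)} = \frac{P(A \cap B)}{P(A)\,P(B)}
\end{aligned}
```

A lift above 1 means A and B occur together more often than they would if they were independent; a lift near 1 suggests the rule carries little information.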
[Figure: relationships between support, confidence and lift]
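To make the three metrics concrete, here is a minimal base R sketch that computes them by hand. The five toy transactions and the helper names `support` and `rule_metrics` are made up for illustration; they are not part of discoverR or of the groceries dataset.

```r
# Toy transactions (hypothetical data, not the groceries dataset)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Fraction of transactions that contain every item in `items`
support <- function(items, txns) {
  mean(vapply(txns, function(t) all(items %in% t), logical(1)))
}

# Support, confidence and lift for the rule lhs -> rhs
rule_metrics <- function(lhs, rhs, txns) {
  supp <- support(c(lhs, rhs), txns)   # P(lhs and rhs)
  conf <- supp / support(lhs, txns)    # P(rhs | lhs)
  lift <- conf / support(rhs, txns)    # confidence relative to P(rhs)
  c(support = supp, confidence = conf, lift = lift)
}

m <- rule_metrics("diapers", "beer", transactions)
print(round(m, 3))
```

In this toy data, diapers and beer appear together in 3 of 5 transactions, diapers appear in 4 of 5, and beer in 3 of 5, so the rule diapers → beer has support 0.6, confidence 0.75 and lift 1.25.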
Loading Data with discoverR
First, load the discoverR library and the built‑in groceries dataset:
library(discoverR)
data(groceries)
The dataset contains 9,835 transactions and 169 items, with columns id (transaction number) and items (list of purchased products).
Initialize a Spark session (local or YARN) and convert the data frame to a distributed Spark DataFrame:
discover.init()
df_groceries <- as.DataFrame(groceries)
Training Models: Apriori and FP‑Growth
discoverR provides two algorithms for association‑rule mining: Apriori (via txApriori) and FP‑Growth (via txFPgrowth).
Apriori generates candidate itemsets iteratively, filtering by a minimum support threshold until no further frequent itemsets are found.
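The level‑wise idea can be sketched in a few lines of base R. This is an illustrative toy, not discoverR's implementation: the function name `apriori_sketch` and the sample transactions are assumptions, and the classic subset‑pruning step is omitted because the support check alone is enough for correctness on small data.

```r
# Toy transactions (hypothetical data, not the groceries dataset)
transactions <- list(
  c("bread", "milk"),
  c("bread", "diapers", "beer", "eggs"),
  c("milk", "diapers", "beer", "cola"),
  c("bread", "milk", "diapers", "beer"),
  c("bread", "milk", "diapers", "cola")
)

# Level-wise frequent-itemset search (illustrative sketch only)
apriori_sketch <- function(txns, min_support) {
  # Fraction of transactions that contain every item in `items`
  support <- function(items) {
    mean(vapply(txns, function(t) all(items %in% t), logical(1)))
  }
  # Level 1: frequent single items
  items <- sort(unique(unlist(txns)))
  current <- Filter(function(s) support(s) >= min_support, as.list(items))
  frequent <- list()
  k <- 1
  while (length(current) > 0) {
    frequent <- c(frequent, current)
    k <- k + 1
    # Candidates: unions of two frequent (k-1)-itemsets that have exactly k items
    candidates <- list()
    n <- length(current)
    if (n >= 2) {
      for (i in 1:(n - 1)) {
        for (j in (i + 1):n) {
          cand <- sort(union(current[[i]], current[[j]]))
          if (length(cand) == k) {
            # Naming by content removes duplicate candidates
            candidates[[paste(cand, collapse = ",")]] <- cand
          }
        }
      }
    }
    # Keep only candidates that meet the minimum support threshold
    current <- Filter(function(s) support(s) >= min_support, unname(candidates))
  }
  frequent
}

res <- apriori_sketch(transactions, min_support = 0.5)
labels <- vapply(res, paste, character(1), collapse = ",")
print(labels)
```

The loop stops as soon as a level yields no frequent itemsets, which mirrors the stopping rule described above. Each level rescans all transactions for every candidate, and that repeated scanning is the cost FP‑Growth is designed to avoid.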
# Reshape data for mining
fp <- txReshape(data = df_groceries, column = c("id", "items"))
# Find frequent itemsets with support ≥ 0.05
set <- txApriori(data = fp, colName = "items", parameter = list(support = 0.05))
# Generate rules with support ≥ 0.05 and confidence ≥ 0.1
rule <- txApriori(data = fp, colName = "items", parameter = list(support = 0.05, confidence = 0.1, target = "rules"))
FP‑Growth builds a compact FP‑tree and recursively extracts frequent patterns without candidate generation.
# Reshape data
fp <- txReshape(data = df_groceries, column = c("id", "items"))
# Frequent itemsets with support ≥ 0.1
set <- txFPgrowth(data = fp, colName = "items", parameter = list(support = 0.1))
# Rules with support ≥ 0.035 and confidence ≥ 0.1
rule <- txFPgrowth(data = fp, colName = "items", parameter = list(support = 0.035, confidence = 0.1, target = "rules"))
Visualizing Rules
Use plot() to display rule metrics:
plot(rule)
For a graph‑based view of the rules, with measure set to support and shading set to lift:
plot(rule, method = "graph", measure = "support", shading = "lift")
Conclusion
By leveraging discoverR’s built‑in Apriori and FP‑Growth implementations, analysts can quickly perform association‑rule mining on large transaction datasets without writing low‑level code, enabling effective recommendation and cross‑selling strategies in retail, search and other domains.
StarRing Big Data Open Lab
Focused on big data technology research, exploring the Big Data era
