How to Use CHAID Decision Trees in SPSS for Market Segmentation

This article explains why simple single‑feature analysis can miss important user groups, introduces decision trees—especially the CHAID algorithm—as a way to uncover multi‑attribute segments, and provides step‑by‑step instructions for building descriptive and predictive trees in SPSS, including how to interpret tree visuals and benefit tables.

JD.com Experience Design Center

In recent projects, business teams reported that low content exposure left recommendation algorithms without enough baseline data, prompting a need for research‑guided cold‑start content placement: determining which content should be placed for users with which features.

Analyzing a single feature can hide important attribute combinations within specific populations. For example, if we only compare gender for cosmetics purchases, we might conclude men rarely buy cosmetics, but further splitting by age reveals that post‑95 male users have a relatively high purchase rate, a segment that would be missed by gender‑only analysis.
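This masking effect is easy to reproduce with a toy dataset; every group name and count below is invented purely for illustration:

```python
# Toy purchase records: (gender, age_group, bought_cosmetics).
# All counts are invented for illustration.
records = (
    [("female", "post-95", True)] * 60 + [("female", "post-95", False)] * 40 +
    [("female", "pre-95",  True)] * 50 + [("female", "pre-95",  False)] * 50 +
    [("male",   "post-95", True)] * 30 + [("male",   "post-95", False)] * 20 +
    [("male",   "pre-95",  True)] * 5  + [("male",   "pre-95",  False)] * 95
)

def purchase_rate(rows):
    """Share of records whose purchase flag is True."""
    return sum(1 for r in rows if r[2]) / len(rows)

men = [r for r in records if r[0] == "male"]
post95_men = [r for r in men if r[1] == "post-95"]
print(f"men overall: {purchase_rate(men):.0%}")        # low rate hides the subgroup
print(f"post-95 men: {purchase_rate(post95_men):.0%}")  # high-interest segment
```

Gender alone puts men at roughly 23%, which looks like a segment to skip, while the gender × age combination surfaces post‑95 men at 60%.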

Because many user attributes exist, manually combining them is cumbersome; decision‑tree analysis can automate this process.

What Is a Decision Tree?

A decision tree is a method for segmenting users that incorporates a target variable, unlike clustering which lacks such a focus. The goal is to iteratively split users based on explanatory variables (features) so that the resulting segments differ as much as possible on the target variable.

Explanatory variables include demographics, consumption traits, behavioral data, etc. The target variable is the core metric of interest and can serve two purposes:

Descriptive: typically a binary variable (e.g., whether a user prefers a given type of content). The tree reveals which feature combinations correspond to high or low preference.

Predictive: a categorical variable (e.g., brand choice A/B/C/D). The tree identifies which feature groups favor each category and can output decision rules for prediction.

Decision‑Tree Principle

The CHAID (Chi‑square Automatic Interaction Detection) algorithm is commonly used because it produces concise and well‑separated splits. Exhaustive CHAID evaluates more combinations for potentially better splits, but both share the same calculation method; this guide focuses on CHAID.

CHAID automatically cross‑analyzes explanatory and target variables, performs chi‑square tests, and selects the most significant split. The process repeats until no further significant splits are found or predefined limits are reached, producing a tree diagram.
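A single CHAID split step can be sketched in plain Python: cross‑tabulate each candidate feature against the target, compute Pearson's chi‑square statistic, and split on the most significant feature (with equal degrees of freedom, the larger statistic corresponds to the smaller p‑value). The data and feature names are invented, and the sketch omits parts of real CHAID such as category merging and Bonferroni‑adjusted p‑values:

```python
from collections import Counter

def chi_square(rows, feature_idx, target_idx):
    """Pearson chi-square statistic for a feature-vs-target contingency table."""
    table = Counter((r[feature_idx], r[target_idx]) for r in rows)
    feats = {r[feature_idx] for r in rows}
    targs = {r[target_idx] for r in rows}
    n = len(rows)
    row_tot = {f: sum(table[(f, t)] for t in targs) for f in feats}
    col_tot = {t: sum(table[(f, t)] for f in feats) for t in targs}
    stat = 0.0
    for f in feats:
        for t in targs:
            expected = row_tot[f] * col_tot[t] / n
            stat += (table[(f, t)] - expected) ** 2 / expected
    return stat

# rows: (gender, city_tier, bought) -- illustrative data only
rows = (
    [("male", "tier1", 1)] * 40 + [("male", "tier1", 0)] * 10 +
    [("male", "tier2", 1)] * 10 + [("male", "tier2", 0)] * 40 +
    [("female", "tier1", 1)] * 35 + [("female", "tier1", 0)] * 15 +
    [("female", "tier2", 1)] * 15 + [("female", "tier2", 0)] * 35
)
# CHAID-style step: score every explanatory variable against the target
# and split on the most significant one.
scores = {name: chi_square(rows, i, 2) for i, name in enumerate(["gender", "city_tier"])}
best = max(scores, key=scores.get)
print(best, scores)
```

Here gender has no association with purchase (statistic 0), so the first split is on city tier; the algorithm would then repeat the same scoring inside each child node.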

The resulting tree conveys two key pieces of information:

Structure and groups: parent nodes split into child nodes; leaf nodes represent final segments whose attributes are the intersection of all splits leading to them.

Target‑variable distribution: each node shows the proportion of the target outcome (e.g., 73.5% of first‑tier city users are interested in a specific benefit).

Large trees can be pruned by setting maximum depth, minimum parent‑node sample size, and minimum child‑node sample size. If a node fails these criteria, further splitting stops.
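Those stopping rules can be written as a small gate that a tree‑growing loop would consult before splitting a node; the threshold values here are arbitrary examples, not SPSS defaults:

```python
def can_split(depth, node_size, child_sizes,
              max_depth=3, min_parent=100, min_child=50):
    """Pruning rules: a node may split only if it is shallow enough, large
    enough to act as a parent, and every proposed child is large enough."""
    return (depth < max_depth
            and node_size >= min_parent
            and all(s >= min_child for s in child_sizes))

print(can_split(depth=1, node_size=400, child_sizes=[220, 180]))  # True
print(can_split(depth=1, node_size=400, child_sizes=[370, 30]))   # False: child too small
print(can_split(depth=3, node_size=400, child_sizes=[220, 180]))  # False: max depth reached
```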

Beyond the visual tree, SPSS can output a benefit table (shown as a gains table in SPSS output) that ranks leaf nodes by their contribution to the target variable. The table includes:

Node percentage (share of total sample).

Gain (the node's share of all target‑class cases in the sample).

Response (the percentage of cases within the node that belong to the target class).

Index = (Gain / Node) × 100%; values >100% indicate above‑average preference.

The benefit table ranks segments by preference, allowing quick identification of high‑interest groups. For instance, node 5 may have the highest interest but cover only 7.1% of the sample; combining nodes 5, 16, 1, and 12 can raise coverage to about 37%.
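The benefit‑table columns can be recomputed from per‑node counts. The node IDs and counts below are invented to loosely echo the example in the text (node 5 covers about 7.1% of the sample; the top four nodes together cover about 37%):

```python
# Per-node counts: (node_id, n_cases, n_target_cases). All numbers invented.
total_n, total_target = 1400, 350
nodes = [("5", 100, 80), ("16", 180, 90), ("1", 120, 55),
         ("12", 120, 50), ("7", 880, 75)]

rows = []
for node_id, n, hits in nodes:
    node_pct = n / total_n              # node's share of the whole sample
    gain_pct = hits / total_target      # node's share of all target-class cases
    response = hits / n                 # target rate inside the node
    index = gain_pct / node_pct * 100   # >100% means above-average preference
    rows.append((node_id, node_pct, gain_pct, response, index))

rows.sort(key=lambda r: r[4], reverse=True)   # best segments first
coverage = 0.0
for node_id, node_pct, gain_pct, response, index in rows:
    coverage += node_pct                      # cumulative share of the sample
    print(f"node {node_id}: node={node_pct:.1%}  gain={gain_pct:.1%}  "
          f"response={response:.1%}  index={index:.0f}%  coverage={coverage:.1%}")
```

With these counts, node 5 scores index 320% on 7.1% of the sample, and the cumulative coverage of the top four nodes is 37.1%.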

How to Operate in SPSS

Descriptive purpose:

Data preparation: each row represents a user sample with the target variable and all explanatory variables.

Choose Decision Tree: Analyze → Classify → Tree.

Input variables: set the target (e.g., content preference: prefer / not prefer) as the Dependent variable, and user attributes as Independent variables. In the classification dialog, select the target value to enable benefit‑table output.

Select growth method: default CHAID.

Set conditions: maximum tree depth, minimum parent‑node size, minimum child‑node size.

Output: request the benefit table and benefit chart.

Predictive purpose:

Validation: optionally enable cross‑validation to assess model risk; SPSS splits the sample into folds, builds trees, and averages risk scores.

Save: choose to output predicted class and prediction probabilities for each case.
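For readers who prefer syntax over the dialogs, the steps above map onto SPSS's TREE command. The sketch below is a best‑effort approximation: the variable names (preference, gender, age, city) are placeholders, and the exact subcommand spellings (/GROWTHLIMIT, /GAIN, /VALIDATION, /SAVE) should be verified against the Command Syntax Reference for your SPSS version:

```
TREE preference [n] BY gender [n] age [o] city [n]
  /TREE DISPLAY=TOPDOWN
  /METHOD TYPE=CHAID
  /GROWTHLIMIT MAXDEPTH=3 MINPARENTSIZE=100 MINCHILDSIZE=50
  /GAIN SUMMARYTABLE=YES
  /VALIDATION TYPE=CROSSVALIDATION(10)
  /SAVE PREDVAL PREDPROB.
```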

Applications and Limitations of Decision Trees

Decision trees excel when the goal is to find user segments with stark differences on a target metric, making them ideal for market‑product or brand positioning to pinpoint core audience characteristics. They are especially useful for initial business questions such as “Which users should receive specific content?”

However, limitations exist:

Only one target variable can be set, unlike clustering which can consider multiple dimensions simultaneously.

The number of resulting segments cannot be directly controlled; trees may produce many leaves, increasing interpretation effort. If comprehensive segmentation is needed, clustering may be more appropriate.
