
STRAPSim: A Component‑Level Portfolio Similarity Metric for ETF Alignment and Trade Execution

The paper introduces STRAPSim, a semantic, two-stage, residual-aware similarity measure that captures component-level semantics and weight distribution for ETFs. Extensive experiments on toy datasets and corporate-bond ETFs show that it consistently outperforms Jaccard, weighted Jaccard, and BERTScore variants on classification, regression, recommendation, and Spearman-correlation tasks.

Bighead's Algorithm Notes

Background

Accurate measurement of portfolio similarity is crucial for ETF recommendation, portfolio trading, and risk alignment. Existing similarity metrics rely on exact asset overlap or static distance measures, which ignore semantic relationships between components and cannot handle partially overlapping portfolios with heterogeneous weights.

Problem Definition

The authors identify four core shortcomings of current methods:

Missing component‑level semantic similarity – traditional metrics such as Jaccard only consider exact overlap.

Weight‑insensitivity – static distance indicators ignore the impact of asset weights.

Static residual handling – methods like BERTScore do not update remaining weights after matching, leading to double‑counting.

Execution alignment challenges – custom basket‑to‑benchmark alignment in portfolio trading relies on heuristic overlap or industry tags, lacking a systematic, weight‑aware measure.

Method

STRAPSim Core Framework

STRAPSim computes portfolio similarity through an iterative process of semantic matching, weight transfer, and residual update:

Initialization: All components of reference portfolio X and candidate portfolio Y are marked unmatched, preserving full weights w_X(i) and w_Y(j).

Iterative Matching: Select the unmatched pair (x_i, y_j) with the highest semantic similarity S_{ij}.

Weight Transfer: Add min(w_X^{t}(i), w_Y^{t}(j)) × S_{ij} to the total similarity.

Weight Update: Subtract the matched amount from w_X(i) and w_Y(j); components with zero remaining weight are excluded from further matching.

Termination: Once either portfolio's weight is exhausted, the total similarity is the sum of all matched contributions; any weight left unmatched is recorded as a residual term.
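The matching loop above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation; the names `strapsim`, `wx`, `wy`, and `sim` are my own, and the sketch simply lets unmatched residual weight contribute zero similarity.

```python
def strapsim(wx, wy, sim):
    """Greedy STRAPSim sketch.

    wx, wy: dicts mapping component -> portfolio weight.
    sim(i, j): component-level semantic similarity in [0, 1].
    """
    wx, wy = dict(wx), dict(wy)          # residual weights, consumed as we match
    # Rank every cross-portfolio pair by semantic similarity once; walking the
    # ranked list while skipping exhausted components reproduces the greedy
    # "pick the most similar unmatched pair" iteration.
    pairs = sorted(((sim(i, j), i, j) for i in wx for j in wy),
                   key=lambda t: t[0], reverse=True)
    total = 0.0
    for s, i, j in pairs:
        if wx[i] <= 0 or wy[j] <= 0:
            continue                     # component already fully matched
        m = min(wx[i], wy[j])            # transferable weight
        total += m * s                   # weight-scaled similarity contribution
        wx[i] -= m                       # residual update prevents double-counting
        wy[j] -= m
    return total
```

Sorting once and skipping exhausted components gives the same greedy order as repeatedly selecting the best unmatched pair, while avoiding a re-scan per iteration.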

The mathematical formulation is shown in the accompanying figures.

Component‑Level Similarity Calculation

For the corporate‑bond ETF experiments, the authors use a random‑forest proximity measure as the component‑level similarity S_{ij}, defined as the proportion of trees in which two components fall into the same leaf (higher proximity indicates stronger semantic similarity).
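A sketch of how such a proximity can be computed with scikit-learn, using `RandomForestRegressor.apply` to read each sample's per-tree leaf index. The function name `rf_proximity` and the toy features/target are my own stand-ins, not the paper's data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_proximity(model, A, B):
    """Proximity matrix: fraction of trees in which rows of A and B share a leaf."""
    la = model.apply(A)                  # shape (n_a, n_trees): leaf index per tree
    lb = model.apply(B)
    # Broadcast and compare leaf indices tree by tree, then average over trees.
    return (la[:, None, :] == lb[None, :, :]).mean(axis=2)

# Toy stand-in for bond features and a target such as OAS (illustrative only).
rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = X @ np.array([1.0, 2.0, 0.5, 0.0, -1.0])
rf = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
P = rf_proximity(rf, X[:3], X[:3])       # self-proximity is 1.0 on the diagonal
```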

Baseline Comparisons

Jaccard Index – considers only exact component intersection over union, ignoring weights.

Weighted Jaccard – the ratio of the summed minimum weights of shared components to the summed maximum weights over the union.

BERTScore Variant – adapts text‑similarity recall, precision and F1 to portfolios but does not update weights dynamically.
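For reference, the two Jaccard baselines reduce to a few lines, assuming portfolios are represented as dicts mapping component to weight (function names are illustrative):

```python
def jaccard(x, y):
    """Unweighted Jaccard: |X ∩ Y| / |X ∪ Y| over component identifiers."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y) if x | y else 0.0

def weighted_jaccard(wx, wy):
    """Weighted Jaccard: sum of per-component min weights over sum of max weights."""
    keys = set(wx) | set(wy)
    num = sum(min(wx.get(k, 0.0), wy.get(k, 0.0)) for k in keys)
    den = sum(max(wx.get(k, 0.0), wy.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0
```

Both scores are zero for disjoint portfolios regardless of how economically similar the components are, which is exactly the gap STRAPSim's semantic matching targets.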

Theoretical Advantages of STRAPSim

Component‑level semantic matching captures similarity between different assets that share risk characteristics.

Weight‑aware aggregation reflects the actual importance of assets in a portfolio.

Residual dynamic update prevents double‑counting and respects weight limits.

Greedy matching replaces costly optimal‑transport linear programming, improving computational efficiency.

Explicit match logs enhance interpretability of alignment differences.

Experiments

Datasets

Toy datasets: classification (Iris, Breast‑Cancer), regression (Big Mac), recommendation (Movie ratings).

Corporate‑bond ETF dataset: 20 ETFs covering 6,870 bonds (average 511 bonds per ETF) with March 2024 holdings and monthly returns from Feb 2022 to Mar 2024. Features include issuer, maturity, industry, rating, issue date, market, currency, coupon, country, 144A flag, issue size, and coupon frequency (one‑hot encoded).

Experimental Setup

Toy data: treat each sample as a weighted feature set; component similarity computed via cosine similarity (Iris) or TF‑IDF (movie ratings).

Classification: STRAPSim‑based K‑NN voting.

Regression: STRAPSim‑based K‑NN weighted average.

Recommendation: user similarity from STRAPSim, rating prediction as weighted average of 20 nearest neighbours (10‑fold CV).

Corporate‑bond ETF: component similarity from random‑forest proximity (target variables OAS and return, 90 % train / 10 % test, 5‑fold hyper‑parameter tuning). Model RMSE = 0.21, MAPE = 0.08 (train) and RMSE = 0.51, MAPE = 0.15 (test).
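The K-NN prediction step shared by the regression and recommendation setups can be sketched as a similarity-weighted average. The name `knn_predict` is my own; the `k=20` default mirrors the recommendation setup, but the rest is a generic illustration.

```python
import numpy as np

def knn_predict(similarities, targets, k=20):
    """Predict a target as the similarity-weighted average of the k most similar samples."""
    sims = np.asarray(similarities, dtype=float)
    t = np.asarray(targets, dtype=float)
    idx = np.argsort(sims)[-k:]          # indices of the k nearest neighbours
    w = sims[idx]
    # Fall back to a plain mean if all k similarities are zero.
    return float(np.dot(w, t[idx]) / w.sum()) if w.sum() else float(t[idx].mean())
```

For classification, the same neighbour selection feeds a majority vote instead of a weighted average.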

Results

Toy datasets – Classification: STRAPSim reaches accuracy/F1 of 0.90/0.90 on Iris and 0.68/0.72 on Breast-Cancer, surpassing the Jaccard, weighted Jaccard, and BERTScore variants. Regression (Big Mac): STRAPSim RMSE = 20.11 and MAPE = 25.21 %, versus RMSE = 30.91 for Jaccard, 21.37 for weighted Jaccard, and 37.94 for BERTScore. Recommendation (movie ratings): STRAPSim RMSE = 0.90, MAPE = 23.27 %, marginally better than the baselines.

Corporate‑bond ETF – The STRAPSim similarity heatmap aligns best with the monthly‑return correlation heatmap, clearly separating high‑ and low‑similarity ETF pairs. Spearman rank correlation: STRAPSim averages ρ = 0.6783 (p = 0.0081), significantly higher than Jaccard (0.5864, p = 0.0791), weighted Jaccard (0.5791, p = 0.0592), and BERTScore (0.5865, p = 0.0548). Statistical significance: 95 % of samples are significant at α = 5 % and 100 % at α = 10 %.
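The Spearman evaluation pairs each metric's similarity scores with the corresponding return correlations and ranks both. A small illustration with `scipy.stats.spearmanr`; the numbers below are invented for demonstration, not the paper's data:

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative only: similarity of one reference ETF to five peers, alongside
# those peers' monthly-return correlations with the reference ETF.
sim_scores = np.array([0.91, 0.74, 0.55, 0.42, 0.30])
ret_corrs = np.array([0.88, 0.60, 0.70, 0.35, 0.25])
rho, p = spearmanr(sim_scores, ret_corrs)  # one swapped pair of ranks -> rho = 0.9
```

A similarity metric that ranks peers in the same order as their realized return correlations yields ρ close to 1, which is the behaviour the paper reports STRAPSim approximating best.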

Overall, STRAPSim provides a scalable, interpretable, and weight‑sensitive framework for comparing structured asset baskets, delivering consistent performance gains across classification, regression, recommendation, and real‑world ETF alignment tasks.

Written by Bighead's Algorithm Notes, focused on AI applications in the fintech sector.