Discover 10 Underrated Machine Learning Algorithms That Can Supercharge Your Models
This article explores several powerful yet often overlooked machine‑learning techniques—including symbolic regression, isolation forest, Tsetlin machines, random kitchen sinks, field‑aware factorization machines, CRFs, ELMs, and VAEs—detailing their principles, code implementations, and real‑world application scenarios.
When we talk about machine learning, we usually mention linear regression, decision trees, and neural networks, but beyond these well‑known models there are lesser‑known yet highly effective algorithms that can solve unique challenges with impressive efficiency. This article introduces a collection of underrated but useful machine‑learning algorithms that deserve a place in your toolbox.
1. Symbolic Regression
Unlike traditional regression models that assume a predefined equation, symbolic regression discovers the mathematical expression that best fits the data. In simple terms, it does not assume a form but uses genetic programming to evolve models through mutation and crossover, similar to natural selection.
# !pip install gplearn
import numpy as np
import matplotlib.pyplot as plt
from gplearn.genetic import SymbolicRegressor
# Generate example data
X = np.linspace(-10, 10, 100).reshape(-1, 1)
y = 3 * np.sin(X).ravel() + 2 * X.ravel() ** 2 - 4
# Initialize symbolic regressor
sr = SymbolicRegressor(population_size=2000,
                       generations=20,
                       stopping_criteria=0.01,
                       function_set=('add', 'sub', 'mul', 'div', 'sin', 'cos', 'sqrt', 'log'),
                       p_crossover=0.7,
                       random_state=42)
# Fit model
sr.fit(X, y)
# Predict
y_pred = sr.predict(X)
plt.scatter(X, y, color='black', label='Real data')
plt.plot(X, y_pred, color='red', label='Discovered function')
plt.legend()
plt.show()
Applications of Symbolic Regression
Discovering physical laws: re-derive underlying equations from experimental data.
Stock market prediction: derive equations that model price movements.
Medical research: uncover relationships between drugs and patient recovery.
Data-science competitions: a hidden gem for Kaggle challenges.
2. Isolation Forest (iForest)
Isolation Forest is a tree‑based anomaly detection algorithm that isolates outliers faster than traditional clustering or density‑based methods (e.g., DBSCAN or One‑Class SVM). It does not model normal data; instead, it randomly partitions the feature space and isolates anomalies, performing well on high‑dimensional data without requiring labeled samples.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
# Generate synthetic normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2) # 100 normal points
# Add some outliers
X_outliers = rng.uniform(low=-4, high=4, size=(10, 2)) # 10 outliers
X = np.vstack([X, X_outliers])
iso_forest = IsolationForest(n_estimators=100, contamination=0.1, random_state=42)
y_pred = iso_forest.fit_predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='coolwarm', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Isolation Forest Anomaly Detection')
plt.show()
Suitable Scenarios
Fraud detection in credit‑card transactions.
Network intrusion or malware activity detection.
Quality‑control identification of defective products.
Rare disease or anomaly detection in health data.
Marking abnormal stock‑market activity for insider‑trading detection.
3. Tsetlin Machine (TM)
The Tsetlin Machine was introduced by Granmo in 2018 and is based on Tsetlin Automata. Unlike traditional models, it uses propositional logic to detect complex patterns, optimizing decisions through reward and penalty mechanisms. Its key advantages are low memory consumption and high learning speed, making it suitable for low-power hardware and energy-efficient AI applications.
Key Features
Low computational demand: far fewer resources than deep-learning models.
Easy to interpret: generates human-readable rules instead of opaque equations.
Ideal for small AI systems: fits well on embedded devices.
More details can be found in the authors' GitHub repository and the original research paper.
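To give a feel for the workflow, here is a minimal sketch that assumes the community pyTsetlinMachine package (pip install pyTsetlinMachine); the package choice and hyper-parameter values are illustrative assumptions, not tuned recommendations. It learns a noisy XOR function from binary inputs.
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine
# Tsetlin Machines operate on binary features, so we build a noisy XOR dataset
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(5000, 2)).astype(np.uint32)
y = np.logical_xor(X[:, 0], X[:, 1]).astype(np.uint32)
flip = rng.random(len(y)) < 0.1  # corrupt 10% of the labels
y[flip] = 1 - y[flip]
# 20 clauses, threshold T=15, specificity s=3.9 (illustrative values)
tm = MultiClassTsetlinMachine(20, 15, 3.9)
tm.fit(X, y, epochs=100)
X_test = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.uint32)
print(tm.predict(X_test))  # expected output: [0 1 1 0]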
4. Random Kitchen Sinks (RKS)
Kernel methods such as SVMs and Gaussian processes become costly on large datasets because of expensive kernel calculations. Random Kitchen Sinks provide an efficient trick to approximate kernel functions using random Fourier features, projecting data into a higher‑dimensional space without explicit kernel computation, thus enabling scalable learning.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
# Generate non‑linearly separable data
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
# Apply RKS for kernel approximation
rks = RBFSampler(gamma=1.0, n_components=500, random_state=42)
X_rks = rks.fit_transform(X)
# Visualize transformed space with PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_rks)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, edgecolors='k', alpha=0.6)
plt.title('RKS‑transformed data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
Applications
Accelerating SVM and kernel regression on large datasets.
Efficient approximation of RBF kernels for scalable learning.
Reducing memory and computation costs of nonlinear models.
5. Bayesian Optimization
Bayesian Optimization is a sequential, probabilistic optimization method for expensive functions such as hyper‑parameter tuning of deep‑learning or machine‑learning models. Instead of blind grid or random search, it builds a probabilistic model (e.g., Gaussian Process) of the objective and intelligently selects promising points.
Application Scenarios
Hyper-parameter tuning: more efficient than grid/random search.
A/B testing: finds the best variant without wasting resources.
AutoML: powers tools like Google AutoML.
import numpy as np
from bayes_opt import BayesianOptimization
# Define objective function (e.g., maximize -x**2 * sin(x))
def objective_function(x):
    return -(x**2 * np.sin(x))
param_bounds = {'x': (-5, 5)}
optimizer = BayesianOptimization(f=objective_function, pbounds=param_bounds, random_state=42)
optimizer.maximize(init_points=5, n_iter=20)
print('Best parameters:', optimizer.max)
6. Hopfield Network
The Hopfield Network is a type of recurrent neural network (RNN) designed for pattern recognition and error correction. It stores binary patterns in memory; when presented with a noisy or incomplete input, it retrieves the closest stored pattern through auto‑association.
Application Scenarios
Memory recall systems for restoring corrupted images or missing data.
Error correction in telecommunications.
Neuroscience simulations of human memory processes.
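To make the recall mechanism concrete, here is a minimal NumPy sketch (the patterns, network size, and noise level are arbitrary illustrations): two bipolar patterns are stored with the Hebbian rule, and a corrupted copy of one is driven back to the stored pattern by asynchronous updates.
import numpy as np
# Two orthogonal bipolar (+1/-1) patterns to memorize
patterns = np.array([
    [ 1, -1,  1, -1,  1, -1,  1, -1],
    [ 1,  1,  1,  1, -1, -1, -1, -1],
])
n = patterns.shape[1]
# Hebbian learning: W = (1/n) * sum of outer products, with zero diagonal
W = sum(np.outer(p, p) for p in patterns) / n
np.fill_diagonal(W, 0)
def recall(state, sweeps=5, seed=0):
    """Asynchronous updates: each neuron aligns with the field from the others."""
    state = state.copy()
    rng = np.random.default_rng(seed)
    for _ in range(sweeps):
        for i in rng.permutation(n):
            state[i] = 1 if W[i] @ state >= 0 else -1
    return state
# Corrupt the first pattern by flipping two bits, then let the network settle
noisy = patterns[0].copy()
noisy[[1, 2]] *= -1
print("noisy:   ", noisy)
print("recalled:", recall(noisy))
print("stored:  ", patterns[0])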
7. Self‑Organizing Maps (SOMs)
A Self‑Organizing Map is an unsupervised neural network that projects high‑dimensional data onto a low‑dimensional (usually 2‑D) grid using competitive learning. Unlike back‑propagation networks, SOMs preserve the topological relationships of the input data, making them suitable for clustering, pattern recognition, and data exploration.
Application Scenarios
Market segmentation: identifying distinct customer groups.
Medical diagnosis: clustering patient symptoms to detect diseases.
Anomaly detection in manufacturing.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from minisom import MiniSom
from tensorflow.keras.datasets import mnist
DATASET = "mnist"
if DATASET == "mnist":
    (X_train, y_train), _ = mnist.load_data()
    X_train = X_train.reshape(X_train.shape[0], -1) / 255.0
    X_train, y_train = X_train[:1000], y_train[:1000]
elif DATASET == "wine":
    # seaborn has no built-in wine dataset, so we use scikit-learn's wine data instead
    from sklearn.datasets import load_wine
    wine = load_wine()
    X_train = wine.data / np.linalg.norm(wine.data, axis=1, keepdims=True)
    y_train = wine.target
elif DATASET == "customers":
    url = "https://raw.githubusercontent.com/MachineLearningWithPython/datasets/main/Mall_Customers.csv"
    df = pd.read_csv(url)
    X_train = df[["Annual Income (k$)", "Spending Score (1-100)"]].values
    X_train = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    y_train = None
som_size = (10, 10)
som = MiniSom(som_size[0], som_size[1], X_train.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(X_train)
som.train_random(X_train, 1000)
activation_map = np.zeros(som_size)
for x in X_train:
    winner = som.winner(x)
    activation_map[winner] += 1
plt.figure(figsize=(10, 8))
plt.imshow(activation_map.T, cmap="coolwarm", origin="lower", alpha=0.7)
plt.colorbar(label="Neuron activation frequency")
for i, x in enumerate(X_train):
    winner = som.winner(x)
    label = str(y_train[i]) if y_train is not None else "•"
    plt.text(winner[0], winner[1], label, color="black", fontsize=8, ha="center", va="center",
             bbox=dict(facecolor="white", edgecolor="black", boxstyle="round,pad=0.3"))
plt.title(f"SOM clustering - {DATASET.upper()} dataset")
plt.xticks(range(som_size[0]))
plt.yticks(range(som_size[1]))
plt.grid(color="black", linestyle="--", linewidth=0.5)
plt.show()
8. Field-Aware Factorization Machines (FFMs)
FFMs extend traditional Factorization Machines (FMs) to handle high‑dimensional, sparse data common in recommendation systems and online advertising (CTR prediction). While standard FMs assign a single latent vector to each feature, FFMs allocate a separate latent vector for each feature‑field pair, improving the modeling of interactions between different feature groups.
Application Scenarios
Recommendation systems used by Netflix, YouTube, Amazon, etc.
Advertising: predicting click‑through rates.
E‑commerce: enhancing product suggestions based on user behavior.
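To make the field-aware interaction term concrete, below is a small NumPy sketch of the FFM scoring function only; the field layout, latent dimension, and random weights are illustrative assumptions, since in practice the parameters are learned from click data (for example with SGD on a logistic loss, using a library such as xLearn).
import numpy as np
rng = np.random.default_rng(42)
n_features = 6                            # one-hot encoded features
k = 4                                     # latent dimension
field_of = np.array([0, 0, 1, 1, 2, 2])   # field of each feature (e.g. user / item / context)
n_fields = field_of.max() + 1
# Parameters are random here; a real FFM learns them from training data
w0 = 0.0
w = rng.normal(scale=0.1, size=n_features)
V = rng.normal(scale=0.1, size=(n_features, n_fields, k))  # one latent vector per (feature, field)
def ffm_score(x):
    """Linear terms plus field-aware pairwise interactions."""
    score = w0 + w @ x
    active = np.nonzero(x)[0]
    for a, i in enumerate(active):
        for j in active[a + 1:]:
            # feature i uses its latent vector for j's field, and vice versa
            score += (V[i, field_of[j]] @ V[j, field_of[i]]) * x[i] * x[j]
    return score
x = np.array([1, 0, 0, 1, 1, 0], dtype=float)  # one active feature per field
print(ffm_score(x))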
9. Conditional Random Fields (CRFs)
Conditional Random Fields are probabilistic models for structured prediction. Unlike independent classifiers, CRFs consider context, making them suitable for sequence data such as natural‑language processing, bioinformatics, and computer‑vision tasks.
Application Scenarios
Natural Language Processing: named‑entity recognition, part‑of‑speech tagging.
Bioinformatics: protein‑structure prediction.
Image segmentation in computer vision.
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn_crfsuite import CRF, metrics
from sklearn.model_selection import train_test_split
# Example data (simplified NER format)
X = [[{'word': 'John'}, {'word': 'loves'}, {'word': 'Python'}],
     [{'word': 'Alice'}, {'word': 'codes'}, {'word': 'in'}, {'word': 'Java'}]]
y = [['B-PER', 'O', 'B-LANG'],
     ['B-PER', 'O', 'O', 'B-LANG']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
crf = CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X_train, y_train)
y_pred = crf.predict(X_test)
# Flatten for confusion matrix
y_test_flat = [label for seq in y_test for label in seq]
y_pred_flat = [label for seq in y_pred for label in seq]
labels = list(set(y_test_flat + y_pred_flat))
conf_matrix = confusion_matrix(y_test_flat, y_pred_flat, labels=labels)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)
plt.title('CRF prediction confusion matrix')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()
10. Extreme Learning Machines (ELMs)
Extreme Learning Machines are feed‑forward neural networks that randomly initialize hidden‑layer weights and only learn the output weights, achieving extremely fast training without back‑propagation. They are well‑suited for large‑scale classification and regression tasks where speed is critical.
Applicable Scenarios
When you need rapid training compared to deep learning.
Large datasets for classification or regression.
Shallow models (single hidden layer) are sufficient.
When fine‑tuning hidden‑layer weights is unnecessary.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score
import hpelm # high‑performance ELM
# Load Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Normalize features
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# One‑hot encode labels for ELM
y_train_onehot = np.eye(len(set(y)))[y_train]
y_test_onehot = np.eye(len(set(y)))[y_test]
# Define and train ELM
elm = hpelm.ELM(X_train.shape[1], y_train_onehot.shape[1])
elm.add_neurons(50, "sigm") # 50 hidden neurons with sigmoid activation
elm.train(X_train, y_train_onehot, "c") # 'c' denotes classification
# Predict
y_pred = elm.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
accuracy = accuracy_score(y_test, y_pred_classes)
print(f"Accuracy: {accuracy:.2f}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred_classes)
cm_percentage = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize=(7, 5))
sns.heatmap(cm_percentage, annot=True, fmt=".2%", cmap="coolwarm", linewidths=2,
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('ELM confusion matrix (normalized)')
plt.show()
Bonus: Variational Autoencoders (VAEs)
A Variational Autoencoder is a generative deep-learning model that learns a probabilistic latent representation of input data and can generate new samples similar to the training set. The encoder outputs a mean (μ) and variance (σ²) for each latent dimension, a latent vector is sampled from that distribution, and the decoder reconstructs the input from it, enabling applications such as image generation, data augmentation, anomaly detection, and latent-space exploration.
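As a minimal sketch of the idea, assuming PyTorch is available (the layer sizes below are arbitrary illustrations), the model encodes 784-dimensional inputs into a 2-D latent space, samples with the reparameterization trick, and is trained by minimizing a reconstruction term plus a KL divergence.
import torch
import torch.nn as nn
import torch.nn.functional as F
class VAE(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, latent_dim=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # mean of q(z|x)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # log-variance of q(z|x)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
                                     nn.Linear(hidden_dim, input_dim), nn.Sigmoid())
    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5 * logvar)   # standard deviation
        eps = torch.randn_like(std)     # noise ~ N(0, I)
        return mu + eps * std           # sampled latent vector
    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = self.reparameterize(mu, logvar)
        return self.decoder(z), mu, logvar
def vae_loss(x_hat, x, mu, logvar):
    # Reconstruction error plus KL divergence between q(z|x) and N(0, I)
    recon = F.binary_cross_entropy(x_hat, x, reduction='sum')
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
# Toy forward pass on random data standing in for flattened 28x28 images
x = torch.rand(64, 784)
model = VAE()
x_hat, mu, logvar = model(x)
print(vae_loss(x_hat, x, mu, logvar).item())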