Discover 10 Underrated Machine Learning Algorithms That Can Supercharge Your Models

This article explores powerful yet often overlooked machine‑learning techniques, including symbolic regression, isolation forests, Tsetlin machines, random kitchen sinks, Bayesian optimization, Hopfield networks, self‑organizing maps, field‑aware factorization machines, CRFs, ELMs, and VAEs, detailing their principles, code implementations, and real‑world application scenarios.

Code Mala Tang

When we talk about machine learning, we usually mention linear regression, decision trees, and neural networks, but beyond these well‑known models there are lesser‑known yet highly effective algorithms that can solve unique challenges with impressive efficiency. This article introduces a collection of underrated but useful machine‑learning algorithms that deserve a place in your toolbox.

1. Symbolic Regression

Unlike traditional regression models that assume a predefined equation, symbolic regression discovers the mathematical expression that best fits the data. In simple terms, it does not assume a form but uses genetic programming to evolve models through mutation and crossover, similar to natural selection.

# !pip install gplearn
import numpy as np
import matplotlib.pyplot as plt
from gplearn.genetic import SymbolicRegressor
# Generate example data
X = np.linspace(-10, 10, 100).reshape(-1, 1)
y = 3 * np.sin(X).ravel() + 2 * X.ravel() ** 2 - 4
# Initialize symbolic regressor
sr = SymbolicRegressor(population_size=2000,
                       generations=20,
                       stopping_criteria=0.01,
                       function_set=('add', 'sub', 'mul', 'div', 'sin', 'cos', 'sqrt', 'log'),
                       p_crossover=0.7,
                       random_state=42)
# Fit model
sr.fit(X, y)
# Predict
y_pred = sr.predict(X)
plt.scatter(X, y, color='black', label='Real data')
plt.plot(X, y_pred, color='red', label='Discovered function')
plt.legend()
plt.show()
Symbolic regression fitting result

Applications of Symbolic Regression

Discovering physical laws: Re‑derive underlying equations from experimental data.

Stock market prediction: Derive equations that model price movements.

Medical research: Uncover relationships between drugs and patient recovery.

Data‑science competitions: A hidden gem for Kaggle challenges.

2. Isolation Forest (iForest)

Isolation Forest is a tree‑based anomaly detection algorithm that isolates outliers faster than traditional clustering or density‑based methods (e.g., DBSCAN or One‑Class SVM). It does not model normal data; instead, it randomly partitions the feature space and isolates anomalies, performing well on high‑dimensional data without requiring labeled samples.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest
# Generate synthetic normal data
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)  # 100 normal points
# Add some outliers
X_outliers = rng.uniform(low=-4, high=4, size=(10, 2))  # 10 outliers
X = np.vstack([X, X_outliers])
iso_forest = IsolationForest(n_estimators=100, contamination=0.1, random_state=42)
y_pred = iso_forest.fit_predict(X)
plt.scatter(X[:, 0], X[:, 1], c=y_pred, cmap='coolwarm', edgecolors='k')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Isolation Forest Anomaly Detection')
plt.show()
Isolation forest anomaly detection

Suitable Scenarios

Fraud detection in credit‑card transactions.

Network intrusion or malware activity detection.

Quality‑control identification of defective products.

Rare disease or anomaly detection in health data.

Marking abnormal stock‑market activity for insider‑trading detection.

3. Tsetlin Machine (TM)

The Tsetlin Machine was introduced by Granmo in 2018 and is based on Tsetlin Automata. Unlike traditional models, it uses propositional logic to detect complex patterns, optimizing decisions through reward and penalty mechanisms. Its key advantages are low memory consumption and high learning speed, making it suitable for low‑power hardware and energy‑efficient AI applications.

Key Features

Low computational demand: Far fewer resources than deep‑learning models.

Easy to interpret: Generates human‑readable rules instead of opaque equations.

Ideal for small AI systems: Fits well on embedded devices.

More details can be found in the authors' GitHub repository and the original research paper.
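To make the reward/penalty mechanism concrete, the sketch below implements a single two‑action Tsetlin automaton, the basic building block of the full machine, in plain NumPy. The state layout, reward probabilities, and step count are illustrative assumptions for this demo, not taken from any official implementation:

```python
import numpy as np

rng = np.random.default_rng(42)

# A two-action Tsetlin automaton with 2N states:
# states 1..N choose action 0, states N+1..2N choose action 1.
N = 6
state = N  # start at the boundary, on the action-0 side

def action(s):
    return 0 if s <= N else 1

def update(s, rewarded):
    # Reward moves deeper into the current action's half (more confidence);
    # penalty moves toward the boundary and eventually flips the action.
    if action(s) == 0:
        return max(s - 1, 1) if rewarded else min(s + 1, 2 * N)
    return min(s + 1, 2 * N) if rewarded else max(s - 1, 1)

# Environment: action 1 is rewarded 90% of the time, action 0 only 20%.
reward_prob = [0.2, 0.9]
for _ in range(200):
    rewarded = rng.random() < reward_prob[action(state)]
    state = update(state, rewarded)

print("learned action:", action(state))  # settles on the better action
```

The full Tsetlin Machine wires many such automata together, each deciding whether a literal is included in a clause, which is why the learned model reads as human‑interpretable logic rules.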

4. Random Kitchen Sinks (RKS)

Kernel methods such as SVMs and Gaussian processes become costly on large datasets because of expensive kernel calculations. Random Kitchen Sinks provide an efficient trick to approximate kernel functions using random Fourier features, projecting data into a higher‑dimensional space without explicit kernel computation, thus enabling scalable learning.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
# Generate non‑linearly separable data
X, y = make_moons(n_samples=1000, noise=0.2, random_state=42)
# Apply RKS for kernel approximation
rks = RBFSampler(gamma=1.0, n_components=500, random_state=42)
X_rks = rks.fit_transform(X)
# Visualize transformed space with PCA
from sklearn.decomposition import PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_rks)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y, edgecolors='k', alpha=0.6)
plt.title('RKS‑transformed data')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
RKS clustering result

Applications

Accelerating SVM and kernel regression on large datasets.

Efficient approximation of RBF kernels for scalable learning.

Reducing memory and computation costs of nonlinear models.
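Under the hood, RBFSampler relies on the random Fourier feature construction z(x) = sqrt(2/D) * cos(Wx + b), whose inner product z(x)·z(y) approximates the RBF kernel exp(−γ‖x−y‖²). A minimal NumPy sketch of this trick, with dimensions and γ chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, D, d = 1.0, 2000, 3  # kernel width, number of random features, input dim

x1, x2 = rng.normal(size=d), rng.normal(size=d)

# Draw W ~ N(0, 2*gamma) and b ~ Uniform(0, 2*pi); then
# z(x) = sqrt(2/D) * cos(W @ x + b) gives features whose inner
# product approximates the RBF kernel in expectation.
W = rng.normal(scale=np.sqrt(2 * gamma), size=(D, d))
b = rng.uniform(0.0, 2 * np.pi, size=D)
z = lambda x: np.sqrt(2.0 / D) * np.cos(W @ x + b)

exact = np.exp(-gamma * np.sum((x1 - x2) ** 2))  # true RBF kernel value
approx = z(x1) @ z(x2)                           # random-feature estimate
print(f"exact={exact:.4f}  approx={approx:.4f}")
```

The approximation error shrinks roughly like 1/sqrt(D), so a few thousand features usually suffice, while downstream training stays linear in the number of samples.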

5. Bayesian Optimization

Bayesian Optimization is a sequential, probabilistic optimization method for expensive functions such as hyper‑parameter tuning of deep‑learning or machine‑learning models. Instead of blind grid or random search, it builds a probabilistic model (e.g., Gaussian Process) of the objective and intelligently selects promising points.

Application Scenarios

Hyper‑parameter tuning: More efficient than grid/random search.

A/B testing: Finds the best variant without wasting resources.

AutoML: Powers tools like Google AutoML.

import numpy as np
from bayes_opt import BayesianOptimization
# Define objective function (e.g., maximize -x**2 * sin(x))
def objective_function(x):
    return -(x**2 * np.sin(x))
param_bounds = {'x': (-5, 5)}
optimizer = BayesianOptimization(f=objective_function, pbounds=param_bounds, random_state=42)
optimizer.maximize(init_points=5, n_iter=20)
print('Best parameters:', optimizer.max)
Bayesian optimization result

6. Hopfield Network

The Hopfield Network is a type of recurrent neural network (RNN) designed for pattern recognition and error correction. It stores binary patterns in memory; when presented with a noisy or incomplete input, it retrieves the closest stored pattern through auto‑association.

Application Scenarios

Memory recall systems for restoring corrupted images or missing data.

Error correction in telecommunications.

Neuroscience simulations of human memory processes.
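A minimal Hopfield network fits in a few lines of NumPy. The sketch below, with illustrative patterns and sizes rather than any particular library's API, stores two orthogonal binary patterns via Hebbian learning and recovers one of them from a corrupted input:

```python
import numpy as np

# Two orthogonal binary (+1/-1) patterns to store
patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1,  1, 1,  1, -1, -1, -1, -1],
])
n = patterns.shape[1]

# Hebbian learning: W is the sum of outer products, with no self-connections
W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)

def recall(x, sweeps=10):
    # Asynchronous updates: each neuron snaps to the sign of its local field
    x = x.copy()
    for _ in range(sweeps):
        for i in range(n):
            x[i] = 1 if W[i] @ x >= 0 else -1
    return x

noisy = patterns[0].copy()
noisy[:2] *= -1                 # corrupt two bits
recovered = recall(noisy)
print(recovered)                # matches patterns[0] again
```

Each update can only lower the network's energy, so the state rolls downhill into the nearest stored attractor, which is exactly the auto‑association behavior described above.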

7. Self‑Organizing Maps (SOMs)

A Self‑Organizing Map is an unsupervised neural network that projects high‑dimensional data onto a low‑dimensional (usually 2‑D) grid using competitive learning. Unlike back‑propagation networks, SOMs preserve the topological relationships of the input data, making them suitable for clustering, pattern recognition, and data exploration.

Application Scenarios

Market segmentation: identifying distinct customer groups.

Medical diagnosis: clustering patient symptoms to detect diseases.

Anomaly detection in manufacturing.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from minisom import MiniSom
from tensorflow.keras.datasets import mnist
DATASET = "mnist"
if DATASET == "mnist":
    (X_train, y_train), _ = mnist.load_data()
    X_train = X_train.reshape(X_train.shape[0], -1) / 255.0
    X_train, y_train = X_train[:1000], y_train[:1000]
elif DATASET == "wine":
    from sklearn.datasets import load_wine  # seaborn has no "wine_quality" dataset
    wine = load_wine()
    X_train = wine.data / np.linalg.norm(wine.data, axis=1, keepdims=True)
    y_train = wine.target
elif DATASET == "customers":
    url = "https://raw.githubusercontent.com/MachineLearningWithPython/datasets/main/Mall_Customers.csv"
    df = pd.read_csv(url)
    X_train = df[["Annual Income (k$)", "Spending Score (1-100)"]].values
    X_train = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    y_train = None
som_size = (10, 10)
som = MiniSom(som_size[0], som_size[1], X_train.shape[1], sigma=1.0, learning_rate=0.5)
som.random_weights_init(X_train)
som.train_random(X_train, 1000)
activation_map = np.zeros(som_size)
for x in X_train:
    winner = som.winner(x)
    activation_map[winner] += 1
plt.figure(figsize=(10, 8))
plt.imshow(activation_map.T, cmap="coolwarm", origin="lower", alpha=0.7)
plt.colorbar(label="Neuron activation frequency")
for i, x in enumerate(X_train):
    winner = som.winner(x)
    label = str(y_train[i]) if y_train is not None else "•"
    plt.text(winner[0], winner[1], label, color="black", fontsize=8, ha="center", va="center",
             bbox=dict(facecolor="white", edgecolor="black", boxstyle="round,pad=0.3"))
plt.title(f"SOM clustering - {DATASET.upper()} dataset")
plt.xticks(range(som_size[0]))
plt.yticks(range(som_size[1]))
plt.grid(color="black", linestyle="--", linewidth=0.5)
plt.show()
SOM clustering result

8. Field‑Aware Factorization Machines (FFMs)

FFMs extend traditional Factorization Machines (FMs) to handle high‑dimensional, sparse data common in recommendation systems and online advertising (CTR prediction). While standard FMs assign a single latent vector to each feature, FFMs allocate a separate latent vector for each feature‑field pair, improving the modeling of interactions between different feature groups.

Application Scenarios

Recommendation systems used by Netflix, YouTube, Amazon, etc.

Advertising: predicting click‑through rates.

E‑commerce: enhancing product suggestions based on user behavior.
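The field‑aware interaction can be written out directly as φ(x) = w0 + Σᵢ wᵢxᵢ + Σ_{i<j} ⟨v(i, f_j), v(j, f_i)⟩ xᵢxⱼ: feature i interacts with feature j through the latent vector it keeps specifically for j's field. A toy NumPy sketch, where the fields, dimensions, and random weights are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy instance: 4 active features spread over 3 fields
# (say, field 0 = user, field 1 = item, field 2 = context)
n_features, n_fields, k = 4, 3, 5
field_of = [0, 1, 1, 2]               # field of each feature
x = np.array([1.0, 1.0, 0.5, 1.0])    # feature values

# FFM keeps one k-dimensional latent vector per (feature, field) pair
V = rng.normal(scale=0.1, size=(n_features, n_fields, k))
w0, w = 0.0, rng.normal(scale=0.1, size=n_features)

# score = bias + linear part + field-aware pairwise interactions
score = w0 + w @ x
for i in range(n_features):
    for j in range(i + 1, n_features):
        # feature i's vector for j's field, dotted with j's vector for i's field
        score += (V[i, field_of[j]] @ V[j, field_of[i]]) * x[i] * x[j]
print(f"FFM score: {score:.4f}")
```

In a standard FM, every pair would reuse the same V[i] regardless of field, which is why FFMs capture field‑specific interactions (user×item vs. user×context) that plain FMs blur together.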

9. Conditional Random Fields (CRFs)

Conditional Random Fields are probabilistic models for structured prediction. Unlike independent classifiers, CRFs consider context, making them suitable for sequence data such as natural‑language processing, bioinformatics, and computer‑vision tasks.

Application Scenarios

Natural Language Processing: named‑entity recognition, part‑of‑speech tagging.

Bioinformatics: protein‑structure prediction.

Image segmentation in computer vision.

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from sklearn_crfsuite import CRF
from sklearn.model_selection import train_test_split
# Example data (simplified NER format)
X = [[{'word': 'John'}, {'word': 'loves'}, {'word': 'Python'}],
     [{'word': 'Alice'}, {'word': 'codes'}, {'word': 'in'}, {'word': 'Java'}]]
y = [['B-PER', 'O', 'B-LANG'],
     ['B-PER', 'O', 'O', 'B-LANG']]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)
crf = CRF(algorithm='lbfgs', max_iterations=50)
crf.fit(X_train, y_train)
y_pred = crf.predict(X_test)
# Flatten for confusion matrix
y_test_flat = [label for seq in y_test for label in seq]
y_pred_flat = [label for seq in y_pred for label in seq]
labels = list(set(y_test_flat + y_pred_flat))
conf_matrix = confusion_matrix(y_test_flat, y_pred_flat, labels=labels)
plt.figure(figsize=(6, 5))
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=labels, yticklabels=labels)
plt.title('CRF prediction confusion matrix')
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.show()
CRF confusion matrix

10. Extreme Learning Machines (ELMs)

Extreme Learning Machines are feed‑forward neural networks that randomly initialize hidden‑layer weights and only learn the output weights, achieving extremely fast training without back‑propagation. They are well‑suited for large‑scale classification and regression tasks where speed is critical.

Applicable Scenarios

When you need much faster training than deep learning offers.

Large datasets for classification or regression.

When a shallow model (single hidden layer) is sufficient.

When fine‑tuning hidden‑layer weights is unnecessary.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix, accuracy_score
import hpelm  # high‑performance ELM
# Load Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Normalize features
scaler = StandardScaler()
X = scaler.fit_transform(X)
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# One‑hot encode labels for ELM
y_train_onehot = np.eye(len(set(y)))[y_train]
y_test_onehot = np.eye(len(set(y)))[y_test]
# Define and train ELM
elm = hpelm.ELM(X_train.shape[1], y_train_onehot.shape[1])
elm.add_neurons(50, "sigm")  # 50 hidden neurons with sigmoid activation
elm.train(X_train, y_train_onehot, "c")  # 'c' denotes classification
# Predict
y_pred = elm.predict(X_test)
y_pred_classes = np.argmax(y_pred, axis=1)
accuracy = accuracy_score(y_test, y_pred_classes)
print(f"Accuracy: {accuracy:.2f}")
# Confusion matrix
cm = confusion_matrix(y_test, y_pred_classes)
cm_percentage = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
plt.figure(figsize=(7, 5))
sns.heatmap(cm_percentage, annot=True, fmt=".2%", cmap="coolwarm", linewidths=2,
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.xlabel('Predicted label')
plt.ylabel('True label')
plt.title('ELM confusion matrix (normalized)')
plt.show()
ELM confusion matrix

Bonus: Variational Autoencoders (VAEs)

A Variational Autoencoder is a generative deep‑learning model that learns a probabilistic latent representation of the input data and can generate new samples similar to the training set. The encoder outputs a mean (μ) and a variance (σ²) for each latent dimension; samples drawn from these distributions are then decoded, enabling applications such as image generation, data augmentation, anomaly detection, and latent‑space exploration.
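The two mechanisms that make this work, the reparameterization trick and the KL regularizer, are easy to show in isolation. In the NumPy sketch below the encoder outputs are hypothetical hard‑coded values rather than the result of a trained network:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one input (hard-coded for illustration)
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, 0.2])

# Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I),
# keeps the sampling step differentiable with respect to mu and log_var
eps = rng.standard_normal(mu.shape)
z = mu + np.exp(0.5 * log_var) * eps

# KL divergence between N(mu, sigma^2) and the N(0, I) prior,
# the regularization term in the VAE loss
kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
print(f"sampled z = {z}, KL term = {kl:.4f}")  # KL ≈ 0.6357
```

The full training loss adds a reconstruction term (how well the decoder rebuilds the input from z) to this KL term, which keeps the latent space smooth enough to sample new data from.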

VAE architecture