Tagged articles
95 articles
Data Party THU
May 19, 2026 · Artificial Intelligence

Model Performance Lagging? Master Feature Engineering with a Complete Step‑by‑Step Guide

This article walks through the entire feature‑engineering pipeline—data cleaning, missing‑value imputation, encoding, outlier handling, scaling, feature construction, and selection—using Pandas and Scikit‑learn, and shows how to wrap the steps into a reproducible Scikit‑learn Pipeline.

Pipeline · data preprocessing · feature engineering
0 likes · 9 min read
IT Services Circle
May 15, 2026 · Artificial Intelligence

Why Your Validation Set Fails: Outliers Are Skewing Your Data

The article explains how outliers can dramatically distort training and validation results in machine learning, outlines practical detection methods such as business rules, Z‑Score, IQR and Isolation Forest, and demonstrates cleaning techniques with a complete house‑price prediction case study in Python.

Isolation Forest · Python · data cleaning
0 likes · 19 min read
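As a rough illustration of the IQR rule named in the summary above, the sketch below flags values outside the usual 1.5 × IQR fences. The price values and the helper name `iqr_outlier_mask` are assumptions made for this example; they are not taken from the article's case study.

```python
import numpy as np

def iqr_outlier_mask(values, k=1.5):
    """Return a boolean mask that is True where a value lies outside the IQR fences."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)

# Hypothetical house prices (in thousands); the last value is an obvious outlier.
prices = np.array([210, 225, 198, 240, 232, 215, 5000])
mask = iqr_outlier_mask(prices)
cleaned = prices[~mask]  # keep only the rows inside the fences
print(cleaned)
```

The multiplier `k=1.5` is the conventional default; a larger value keeps more of the tail.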
DeepHub IMBA
May 12, 2026 · Artificial Intelligence

Hands‑On Feature Engineering with Pandas and Scikit‑Learn: Complete Code Walkthrough

This article walks through a full feature‑engineering pipeline using Pandas and Scikit‑Learn, covering data inspection, missing‑value imputation, categorical encoding, outlier handling, scaling, feature construction, selection, and a final Pipeline that prepares clean, predictive features for a logistic‑regression model.

Pipeline · data preprocessing · feature engineering
0 likes · 9 min read
Data Party THU
Apr 9, 2026 · Fundamentals

Mastering Numeric Feature Scaling: 4 Techniques with Scikit‑Learn

This article explains why numeric feature engineering is essential for machine learning, outlines the challenges of differing scales and outliers, and demonstrates four preprocessing methods—Standardization, Robust Scaler, Power Transformer, and Normalization—using the California housing dataset with detailed code examples and visual analysis.

feature scaling · normalization · numeric preprocessing
0 likes · 11 min read
DeepHub IMBA
Apr 6, 2026 · Artificial Intelligence

Mastering Machine Learning Feature Engineering: Scaling, Encoding, Aggregation, Embedding, and Automation

The article explains why good features matter more than fancy algorithms and walks through practical techniques—scaling, log transforms, binning, interaction, various encoding schemes, datetime extraction, text statistics, geospatial distances, aggregation, feature selection, and automated feature generation—illustrated with concrete pandas and scikit‑learn code examples.

automation · encoding · feature engineering
0 likes · 16 min read
DeepHub IMBA
Mar 22, 2026 · Artificial Intelligence

Four Numeric Scaling Techniques: When to Use Standard, Robust, Power, and Min‑Max

This article explains why numeric feature engineering is essential for machine‑learning models, outlines the two main challenges of differing magnitudes and outliers, and demonstrates four scaling methods—StandardScaler, RobustScaler, PowerTransformer, and MinMaxScaler—using the California housing dataset, complete with code, visualizations, and guidance on when each method is appropriate.

feature scaling · min-max scaling · power transformer
0 likes · 13 min read
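A minimal sketch of the four scalers named in the summary above, applied to one skewed column. It uses synthetic log-normal data with a few injected extreme values rather than the California housing dataset, so the data and variable names here are assumptions for illustration only.

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler,
    PowerTransformer,
    RobustScaler,
    StandardScaler,
)

# Synthetic, right-skewed feature with a handful of extreme values (assumed data).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=(500, 1))
x[:5] *= 50

scalers = {
    "standard": StandardScaler(),   # zero mean, unit variance
    "robust": RobustScaler(),       # centers on the median, scales by the IQR
    "power": PowerTransformer(),    # Yeo-Johnson by default, then standardizes
    "minmax": MinMaxScaler(),       # rescales to the [0, 1] range
}
scaled = {name: s.fit_transform(x) for name, s in scalers.items()}

for name, z in scaled.items():
    print(f"{name:>8}: min={z.min():.2f} median={np.median(z):.2f} max={z.max():.2f}")
```

Comparing the printed ranges gives a quick feel for how strongly each method lets the extreme values stretch the scale.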
Data Party THU
Oct 30, 2025 · Artificial Intelligence

How to Generate Realistic Synthetic Data with Histograms and GMMs

This article explains two practical techniques—histogram‑based per‑column synthesis and Gaussian‑Mixture‑Model generation—for creating large, privacy‑preserving synthetic datasets that retain the statistical distributions and inter‑column relationships of the original data, and shows how to evaluate their quality.

Data Generation · Gaussian mixture model · Python
0 likes · 27 min read
Data STUDIO
Sep 15, 2025 · Artificial Intelligence

Understanding Linear and Logistic Regression: From MSE to Cross‑Entropy

The article explains linear regression and logistic regression fundamentals, covering loss functions such as mean‑squared error and cross‑entropy, analytic solutions, feature expansion for non‑linear separability, and provides Python code examples to illustrate the concepts.

Python · cross entropy · linear regression
0 likes · 7 min read
Data STUDIO
Sep 9, 2025 · Artificial Intelligence

10 Hidden Sklearn Features That Boost Your ML Pipelines

This article walks through ten lesser‑known Scikit‑learn utilities—including FunctionTransformer, custom estimators, TransformedTargetRegressor, HTML estimator visualisation, QuadraticDiscriminantAnalysis, Voting and Stacking ensembles, LocalOutlierFactor with UMAP, QuantileTransformer, and a PCA‑tSNE/UMAP workflow—showing concrete code examples, performance numbers and practical tips for more efficient and robust machine‑learning pipelines.

FunctionTransformer · LocalOutlierFactor · PCA
0 likes · 17 min read
Data STUDIO
Sep 5, 2025 · Artificial Intelligence

19 Elegant Sklearn Tricks for More Efficient Machine Learning

This article presents 19 practical Sklearn functions—ranging from outlier detection to hyper‑parameter search—that replace manual data‑science steps, each illustrated with concise code examples and performance comparisons.

Model Evaluation · Pipeline · data preprocessing
0 likes · 24 min read
Alibaba Cloud Developer
Jul 17, 2025 · Artificial Intelligence

How to Build a House Price Prediction Model with Python: A Step‑by‑Step Guide

This tutorial walks developers through the complete workflow of building a house‑price regression model—from problem definition, data collection and preprocessing, feature engineering, and model selection, to training, hyper‑parameter tuning, evaluation, optimization, deployment as a Flask service, and ongoing monitoring—using Python, pandas, scikit‑learn, and visualisation libraries.

Model Deployment · Python · feature engineering
0 likes · 29 min read
Python Programming Learning Circle
Jul 8, 2025 · Artificial Intelligence

10 One‑Line Python Tricks to Jump‑Start Your Machine Learning Projects

This article presents ten concise, practical one‑line Python code snippets—ranging from loading CSV data with Pandas to building sophisticated Scikit‑learn pipelines—that streamline common machine‑learning tasks such as data cleaning, encoding, splitting, scaling, model training, evaluation, cross‑validation, and prediction.

Pipeline · Python · data preprocessing
0 likes · 10 min read
Liangxu Linux
May 19, 2025 · Fundamentals

Why PCA Transforms High‑Dimensional Data into Simple Insights (with Python)

This article demystifies Principal Component Analysis by explaining its intuition, the role of variance, step‑by‑step visual analogies, the mathematical foundation, and a complete Python implementation using scikit‑learn, including data generation, scaling, fitting, scree plot visualization, component interpretation, and dimensionality reduction to two principal components.

Data visualization · PCA · Python
0 likes · 16 min read
php Courses
May 15, 2025 · Artificial Intelligence

Why Python Dominates Data Analysis and Machine Learning: Core Tools, Full‑Stack Solutions, and Learning Path

This article explains why Python has become the leading language for data analysis and machine learning, outlines the essential libraries and frameworks, provides practical code examples, describes typical application scenarios, suggests a staged learning roadmap, and forecasts future trends such as AutoML and federated learning.

AutoML · PyTorch · Python
0 likes · 6 min read
AI Code to Success
Mar 13, 2025 · Artificial Intelligence

Unlocking K-Nearest Neighbors: Theory, Implementation, and Real-World Tips

This article provides a comprehensive guide to the K‑Nearest Neighbors algorithm, covering its intuitive principle, step‑by‑step workflow, distance metrics, strategies for selecting the optimal K via cross‑validation, Python implementation with scikit‑learn, advantages, limitations, and diverse application scenarios.

Python · classification · cross-validation
0 likes · 24 min read
AI Code to Success
Mar 12, 2025 · Artificial Intelligence

Mastering K‑Means: Theory, Implementation, and Real‑World Applications

This comprehensive guide explores the K‑Means clustering algorithm, covering its mathematical foundation, step‑by‑step procedure, centroid initialization strategies, practical implementation with Python’s Scikit‑learn on the Iris dataset, evaluation metrics, optimization techniques, and diverse applications ranging from image segmentation to bioinformatics.

K-Means · Python · algorithm
0 likes · 31 min read
AI Code to Success
Feb 27, 2025 · Artificial Intelligence

Master Decision Trees: Theory, Construction, and Python Implementation

This article provides a comprehensive guide to decision tree algorithms, covering their theoretical foundations, key components, and construction workflow (data preprocessing, feature selection, tree growth, stopping criteria, and pruning), followed by an overview of popular variants such as ID3, C4.5, and CART, practical advantages, applications, and a complete Python implementation using scikit-learn.

Python · classification · data preprocessing
0 likes · 29 min read
AI Code to Success
Feb 25, 2025 · Artificial Intelligence

Master Logistic Regression: Theory, Practice, and Real‑World Tips

This comprehensive guide explains logistic regression fundamentals, the role of the Sigmoid function, loss and optimization methods, step‑by‑step Python implementation with data preparation, model training, evaluation, hyper‑parameter tuning, handling over‑ and under‑fitting, multi‑class extensions, and diverse application scenarios across medicine, finance, e‑commerce, and text analysis.

Model Evaluation · Python · classification
0 likes · 23 min read
Python Programming Learning Circle
Feb 8, 2025 · Artificial Intelligence

Random Forest Classification with PCA and Hyper‑Parameter Tuning on the Breast Cancer Dataset

This tutorial walks through loading the scikit‑learn breast‑cancer dataset, preprocessing it, building baseline and PCA‑reduced Random Forest models, applying RandomizedSearchCV and GridSearchCV for hyper‑parameter optimization, and evaluating the final models using recall as the primary metric.

Breast Cancer · PCA · Random Forest
0 likes · 17 min read
Test Development Learning Exchange
Dec 5, 2024 · Artificial Intelligence

End-to-End House Prices Prediction Project: Data Collection, Preprocessing, Modeling, Evaluation, and Deployment with Python

This tutorial walks through a complete house price prediction project, covering data collection from Kaggle, preprocessing with pandas and scikit‑learn, model training using RandomForestRegressor, evaluation, and deployment of a Flask API for real‑time predictions, providing full code examples.

Flask · Model Deployment · Python
0 likes · 9 min read
Test Development Learning Exchange
Nov 21, 2024 · Artificial Intelligence

Data Preprocessing: Standardization, Normalization, and Missing Value Imputation with Python

This tutorial demonstrates how to perform essential data preprocessing techniques—including standardization, min‑max normalization, and various missing‑value imputation methods—using pandas and scikit‑learn in Python, providing code examples and explanations to help you prepare datasets for machine‑learning models.

Python · missing value imputation · normalization
0 likes · 6 min read
IT Services Circle
Sep 8, 2024 · Artificial Intelligence

10 Essential Plots for Linear Regression with Python Code Examples

This tutorial explains ten crucial visualizations for linear regression—scatter plot, trend line, residual plot, normal probability plot, learning curve, bias‑variance tradeoff, residuals vs fitted, partial regression, leverage, and Cook's distance—each illustrated with clear Python code using scikit‑learn, matplotlib, seaborn, and statsmodels.

Data visualization · Matplotlib · Model Evaluation
0 likes · 21 min read
IT Services Circle
Jul 9, 2024 · Artificial Intelligence

Comparative Study of Classification Algorithms and Calibration Using Synthetic Data

This article presents a comprehensive case study that explains classification principles, shows the key formulas for logistic regression and SVM, and provides a full Python implementation that generates synthetic data, trains multiple classifiers, calibrates them, and visualizes calibration curves and probability histograms.

Calibration · Python · classification
0 likes · 6 min read
Python Programming Learning Circle
Jun 21, 2024 · Artificial Intelligence

Using scikit-learn for Data Mining: Feature Engineering, Parallel Processing, Pipelines, and Model Persistence

This article demonstrates how to perform data mining with scikit-learn by detailing the full workflow—from data acquisition and feature engineering, through parallel and pipeline processing, to automated hyper‑parameter tuning and model persistence—using the Iris dataset as an example.

Pipeline · data mining · feature engineering
0 likes · 13 min read
Test Development Learning Exchange
May 21, 2024 · Artificial Intelligence

Step-by-Step Data Analysis and Machine Learning Workflow with Pandas, Matplotlib, and Scikit-learn

This guide walks through loading CSV data with pandas, cleaning missing values, filtering, grouping, visualizing, performing correlation and time‑series analysis, detecting outliers, and applying linear and logistic regression models using scikit‑learn, all illustrated with complete Python code snippets.

data cleaning · machine learning · pandas
0 likes · 6 min read
Sohu Tech Products
Mar 6, 2024 · Artificial Intelligence

Mastering Regression: A Comprehensive Guide to Linear and Non‑Linear Models

This article provides an in‑depth overview of regression prediction, covering linear models like OLS, Lasso, Ridge, and Bayesian approaches, as well as non‑linear techniques such as tree ensembles, SVR, KNN, neural networks, and advanced deep learning frameworks for tabular data.

Deep Learning · gradient boosting · linear models
0 likes · 13 min read
IT Services Circle
Mar 6, 2024 · Artificial Intelligence

Comprehensive Overview of Ten Regression Algorithms with Core Concepts and Code Examples

This article provides a comprehensive summary of ten regression algorithms—including linear, ridge, Lasso, decision tree, random forest, gradient boosting, SVR, XGBoost, LightGBM, and neural network regression—detailing their principles, advantages, disadvantages, suitable scenarios, and offering core Python code examples for each.

Python · gradient boosting · machine learning
0 likes · 33 min read
Test Development Learning Exchange
Jan 23, 2024 · Fundamentals

Common Data Preprocessing Techniques with Python Code Examples

This article presents ten essential data preprocessing methods—including handling missing values, type conversion, standardization, encoding, smoothing, outlier treatment, text cleaning, word frequency counting, sentiment analysis, and topic modeling—each explained with clear Python code snippets.

Python · data cleaning · data preprocessing
0 likes · 9 min read
Test Development Learning Exchange
Oct 19, 2023 · Artificial Intelligence

Common Machine Learning Algorithms for Data Prediction with Python Code Examples

This article introduces ten widely used machine learning algorithms for data prediction, explains their core concepts, and provides complete Python code snippets using scikit‑learn and related libraries to help readers implement regression, classification, and time‑series forecasting tasks.

Python · classification · data prediction
0 likes · 12 min read
Model Perspective
Aug 23, 2023 · Artificial Intelligence

Master Logistic Regression: Binary, Multiclass, and Ordered Extensions with Python

This article explains logistic regression and its extensions—binary, multiclass (softmax), and ordered logistic regression—covering mathematical foundations, optimization objectives, real‑world applications, and Python implementations using scikit‑learn with code examples and visual illustrations.

Python · binary classification · logistic regression
0 likes · 15 min read
Model Perspective
Mar 22, 2023 · Artificial Intelligence

Master DBSCAN Clustering: Theory, Python Code, and Real-World Examples

This article explains DBSCAN, a density‑based clustering algorithm that automatically discovers arbitrarily shaped clusters and isolates noise. It covers core, border, and noise points, step‑by‑step examples, Python implementations using scikit‑learn, and guidance on the key parameters eps and min_samples.

DBSCAN · Python · clustering
0 likes · 10 min read
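To make the roles of eps and min_samples from the summary above concrete, here is a small sketch on hand-made points. The coordinates and parameter values are assumptions chosen for illustration, not figures from the article.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two tight groups of three points each, plus one isolated point (assumed data).
X = np.array([
    [0.0, 0.0], [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.0], [5.0, 5.1], [5.1, 5.0],
    [20.0, 20.0],
])

# eps is the neighborhood radius; min_samples counts the point itself.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)
print(labels)  # noise points are labeled -1
```

With these settings every point in the two groups has three neighbors within eps (itself included), so each group forms a cluster, while the isolated point has no dense neighborhood and is marked as noise.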
Model Perspective
Mar 21, 2023 · Artificial Intelligence

Master Linear Discriminant Analysis (LDA) with Python: Theory & Code

This article explains Linear Discriminant Analysis (LDA) as a pattern‑recognition technique that projects data onto a low‑dimensional space to maximize class separation, details its mathematical formulation with between‑class and within‑class scatter matrices, and provides a complete Python implementation using scikit‑learn on the Iris dataset, including visualization of the results.

LDA · Linear Discriminant Analysis · Python
0 likes · 6 min read
Model Perspective
Mar 20, 2023 · Artificial Intelligence

Master Feature Selection with Recursive Elimination (RFE) in Python

Recursive Feature Elimination (RFE) is a feature‑selection technique that iteratively trains a model, discards the weakest features, and repeats until a desired number of features remains, which can help reduce overfitting and improve model performance. The article illustrates it with a complete Python example using scikit‑learn.

Python · feature selection · recursive elimination
0 likes · 6 min read
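The train, drop, repeat loop described in the summary above can be sketched with scikit-learn's `RFE` class. The synthetic dataset, the choice of logistic regression as the base estimator, and the target of three features are all assumptions for this example.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, of which 3 carry signal (assumed setup).
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=3, n_redundant=0, random_state=0
)

# Each round fits the estimator and removes the single weakest feature (step=1)
# until only n_features_to_select remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    step=1,
)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the kept features
print(selector.ranking_)   # 1 = kept; larger numbers were eliminated earlier
```

`selector.transform(X)` would then return only the selected columns for downstream modeling.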
Model Perspective
Mar 19, 2023 · Artificial Intelligence

Master Data Sampling Techniques in Python for Machine Learning

This article explains common data sampling methods—random, stratified, oversampling, undersampling, and adaptive sampling—and provides Python code examples using scikit-learn and imbalanced-learn to implement each technique on the Iris dataset and synthetic data.

data sampling · oversampling · scikit-learn
0 likes · 11 min read
Python Programming Learning Circle
Dec 31, 2022 · Artificial Intelligence

A Beginner’s Guide to Data Preprocessing for Machine Learning in Python

This tutorial walks beginners through the essential steps of data preprocessing for any machine learning model, covering library imports, dataset loading, handling missing values, encoding categorical features, splitting into train‑test sets, and applying feature scaling using Python’s scikit‑learn.

Python · data preprocessing · feature scaling
0 likes · 11 min read
Model Perspective
Dec 30, 2022 · Fundamentals

How PCA Transforms Supplier Evaluation with Weighted Scores

This article explains the Principal Component Analysis (PCA) method, outlines its step‑by‑step weighting algorithm, and demonstrates a complete Python implementation that converts supplier metrics into objective scores using scikit‑learn.

PCA · Python · data analysis
0 likes · 9 min read
Python Programming Learning Circle
Oct 20, 2022 · Artificial Intelligence

Overview of Common Python AI Libraries with Code Examples

This article provides a concise introduction to a wide range of popular Python libraries for artificial intelligence and data science, such as NumPy, OpenCV, scikit-image, Pillow, Scikit-learn, TensorFlow, PyTorch, and many others, accompanied by practical code snippets and performance comparisons.

Artificial Intelligence · NumPy · OpenCV
0 likes · 33 min read
MaGe Linux Operations
Oct 1, 2022 · Artificial Intelligence

11 Powerful Feature Selection Techniques Every Data Scientist Should Master

This guide walks through a comprehensive set of feature‑selection strategies—from removing unused or missing columns to handling multicollinearity, low‑variance features, and using PCA—complete with Python code examples and visualizations to help you build leaner, more interpretable machine‑learning models.

Python · data preprocessing · dimensionality reduction
0 likes · 18 min read
MaGe Linux Operations
Sep 8, 2022 · Artificial Intelligence

Master 10 Popular Clustering Algorithms in Python with Scikit‑Learn

This tutorial introduces unsupervised clustering, explains its purpose, and walks through installing scikit‑learn and implementing ten popular clustering algorithms—including AffinityPropagation, Agglomerative, BIRCH, DBSCAN, K‑Means, Mini‑Batch K‑Means, MeanShift, OPTICS, Spectral Clustering, and Gaussian Mixture—complete with code examples and visualizations.

Unsupervised Learning · clustering · data mining
0 likes · 27 min read
MaGe Linux Operations
Jul 29, 2022 · Artificial Intelligence

Master 10 Popular Clustering Algorithms in Python with Scikit‑Learn

This tutorial introduces clustering, explains why no single algorithm fits all data, and provides step‑by‑step Python examples using scikit‑learn for ten popular unsupervised learning methods, complete with code snippets and visualizations to illustrate results.

Python · Unsupervised Learning · clustering
0 likes · 24 min read
Model Perspective
Jun 18, 2022 · Artificial Intelligence

Understanding Support Vector Machines: Theory, Example, and Python Code

This article explains the fundamentals of Support Vector Machines, describes how they separate data with optimal hyperplanes, provides a 2‑D example with visualizations, and includes Python code using scikit‑learn to generate synthetic data, plot points, and illustrate possible decision boundaries.

Support Vector Machine · classification · machine learning
0 likes · 4 min read
Python Programming Learning Circle
Apr 19, 2022 · Artificial Intelligence

Step‑by‑Step Guide to Building Machine Learning Models with Scikit‑learn Templates

This article introduces a practical, step‑by‑step tutorial on building machine learning models with scikit‑learn, covering problem types, dataset loading, splitting, and a series of reusable templates (V1.0, V2.0, V3.0) for classification, regression, clustering, cross‑validation, and hyper‑parameter tuning, complete with code examples.

Python · classification · cross-validation
0 likes · 17 min read
Python Programming Learning Circle
Apr 14, 2022 · Artificial Intelligence

Top Clustering Algorithms in Python with scikit-learn: A Comprehensive Tutorial

This tutorial explains clustering as an unsupervised learning task, outlines why no single algorithm fits all data, and provides step‑by‑step Python code using scikit‑learn to install the library, generate synthetic datasets, and apply ten popular clustering algorithms with visualizations.

Python · Unsupervised Learning · clustering
0 likes · 21 min read
IT Services Circle
Mar 23, 2022 · Artificial Intelligence

Local Outlier Factor (LOF) Algorithm: Theory, Workflow, Pros & Cons, and Python Implementation

This article introduces the classic density‑based anomaly detection method Local Outlier Factor (LOF), explains its underlying concepts such as k‑distance, reachability distance, and local reachability density, outlines the algorithm steps, discusses its advantages and limitations, and provides practical Python examples using PyOD and scikit‑learn.

LOF · Python · anomaly detection
0 likes · 10 min read
Code DAO
Jan 15, 2022 · Artificial Intelligence

Improving Class Imbalance in Machine Learning with Class Weights: A Python Logistic Regression Walkthrough

The article demonstrates, with Python code, how applying class_weight—first using the default logistic regression, then the balanced option, and finally manually tuned weights via grid search—can raise the F1 score from 0 to about 0.16 on imbalanced data, and discusses further techniques such as feature engineering and threshold adjustment.

F1 score · Python · class weight
0 likes · 7 min read
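A minimal sketch of the comparison described in the summary above: the same logistic regression fitted without class weights and with `class_weight="balanced"`. The synthetic 95/5 dataset is an assumption for illustration, so the scores it prints will differ from the figures quoted in the article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: roughly 95% negatives, 5% positives (assumed setup).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

scores = {}
for cw in (None, "balanced"):
    # "balanced" reweights each class inversely to its frequency in y_train.
    clf = LogisticRegression(class_weight=cw, max_iter=1000).fit(X_train, y_train)
    scores[str(cw)] = f1_score(y_test, clf.predict(X_test))

print(scores)
```

A dictionary such as `class_weight={0: 1, 1: 10}` can be passed instead of `"balanced"` when tuning the weights manually, for example inside a grid search.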
Code DAO
Jan 1, 2022 · Artificial Intelligence

Automating Machine Learning Workflows with Scikit‑Learn Pipelines

This article demonstrates how to build a reproducible fraud‑detection workflow using scikit‑learn's Pipeline class, comparing a manual script with a pipeline‑based approach on the IEEE‑CIS Kaggle dataset and showing the benefits of modular, repeatable ML code.

Pipeline · Python · fraud detection
0 likes · 8 min read
Code DAO
Dec 26, 2021 · Artificial Intelligence

Building a Vector‑Based Movie Recommendation System with Transformers

This tutorial walks through constructing a movie recommendation engine by downloading a dataset, cleaning and de‑duplicating entries, encoding plot summaries into vectors with transformer models, and performing nearest‑neighbor searches using scikit‑learn, while handling misspellings with Levenshtein distance.

Levenshtein distance · Transformers · movie recommendation
0 likes · 8 min read
Code DAO
Dec 18, 2021 · Artificial Intelligence

Implement Random Forest Regression in Python using Scikit-Learn

This article explains the fundamentals of random forest regression, describes why it outperforms single decision trees for nonlinear or noisy data, defines bootstrapping and bagging, and provides a step‑by‑step Python example using NumPy, Pandas, and Scikit‑Learn’s RandomForestRegressor with data loading, preprocessing, model training, prediction, and evaluation via MSE and R².

Bootstrapping · Python · Random Forest
0 likes · 6 min read
Code DAO
Dec 11, 2021 · Artificial Intelligence

How to Optimize Machine Learning Hyperparameters with GridSearchCV

This article explains how GridSearchCV automates hyperparameter tuning for machine‑learning models, demonstrates its use with a RandomForest classifier on the breast‑cancer dataset (including code, cross‑validation, and best‑parameter results), and discusses its advantages and scalability limits.

GridSearchCV · RandomForest · cross-validation
0 likes · 6 min read
Code DAO
Dec 7, 2021 · Artificial Intelligence

How to Cluster Text with TF‑IDF, KMeans and PCA in Python

This article walks through a complete Python workflow that loads the 20 Newsgroups dataset, preprocesses the documents, vectorizes them with TF‑IDF, groups them using KMeans, reduces dimensions with PCA, and visualizes the resulting clusters, illustrating each step with code and plots.

KMeans · NLP · PCA
0 likes · 13 min read
Code DAO
Dec 3, 2021 · Artificial Intelligence

SMOTE Techniques for Handling Imbalanced Classification in Machine Learning

This article explains the SMOTE oversampling method for imbalanced classification, demonstrates how to generate synthetic minority samples, evaluates models with and without SMOTE using scikit‑learn pipelines, and explores advanced variants such as Borderline‑SMOTE, SVMSMOTE and ADASYN with concrete code examples and benchmark results.

SMOTE · classification · imbalanced learning
0 likes · 24 min read
Code DAO
Nov 29, 2021 · Artificial Intelligence

Feature Selection: Reducing Input Variables for Predictive Modeling

This article explains the purpose and types of feature selection, compares supervised and unsupervised, wrapper, filter, and embedded methods, discusses choosing statistical metrics based on variable types, and provides scikit‑learn code examples for regression and classification tasks.

embedded methods · feature selection · filter methods
0 likes · 12 min read
Python Programming Learning Circle
Aug 24, 2021 · Artificial Intelligence

Top 10 Python Libraries for Machine Learning

An overview of ten widely used Python machine‑learning libraries—including TensorFlow, Scikit‑Learn, NumPy, Keras, PyTorch, LightGBM, Eli5, SciPy, Theano, and Pandas—detailing their core features, typical applications, and why they are essential tools for data scientists and AI developers.

Keras · NumPy · PyTorch
0 likes · 15 min read
Python Crawling & Data Mining
Jun 19, 2021 · Fundamentals

Essential Python Data Analysis Libraries You Must Know

This article provides a concise overview of key Python data‑analysis libraries—including NumPy, pandas, matplotlib, IPython/Jupyter, SciPy, scikit‑learn, and statsmodels—explaining their core features, typical use cases, and how they interoperate to form a powerful scientific computing ecosystem.

Matplotlib · NumPy · Python
0 likes · 12 min read
Python Programming Learning Circle
May 8, 2021 · Artificial Intelligence

Top 10 New Features in Scikit‑learn 0.24

The article reviews the most important additions in scikit‑learn 0.24, including faster hyper‑parameter search methods, ICE plots, histogram‑based boosting improvements, new feature‑selection tools, polynomial‑feature approximations, a semi‑supervised classifier, MAPE metric, enhanced OneHotEncoder and OrdinalEncoder handling, and a more flexible RFE interface.

Model Evaluation · Python · data preprocessing
0 likes · 8 min read
DataFunSummit
Mar 28, 2021 · Artificial Intelligence

Deploying Scikit‑learn and HMMlearn Models as High‑Performance Online Prediction Services Using ONNX

This article demonstrates how to convert traditional scikit‑learn and hmmlearn machine‑learning models into ONNX format and integrate them into a C++ gRPC service for fast online inference, covering environment setup, model conversion, custom operators, performance testing, and end‑to‑end pipeline construction.

Model Deployment · ONNX · Python
0 likes · 22 min read
DataFunTalk
Mar 9, 2021 · Artificial Intelligence

Introduction to Common Machine Learning Algorithms with Python Implementations

This article introduces the three main categories of machine learning—supervised, unsupervised, and reinforcement learning—detailing common algorithms such as Linear Regression, Logistic Regression, Naive Bayes, K‑Nearest Neighbors, Decision Trees, Random Forests, SVM, K‑Means, and PCA, and provides concise Python code examples using scikit‑learn for each.

Python · Reinforcement Learning · Unsupervised Learning
0 likes · 18 min read
Python Crawling & Data Mining
Jan 24, 2021 · Fundamentals

Master Python Data Analysis: From Reading Files to Visualization

This guide walks you through the complete Python data‑analysis workflow—reading and writing data, processing with NumPy and pandas, modeling with statsmodels and scikit‑learn, and visualizing results with Matplotlib—while highlighting the key tools and learning path for beginners and busy professionals alike.

NumPy · Python · data analysis
0 likes · 6 min read
Python Programming Learning Circle
Dec 16, 2020 · Artificial Intelligence

Linear Regression Theory and Python Implementation with Iris and Boston Datasets

This article explains the fundamentals of linear regression, including regression formulas, loss functions, and error metrics, and provides complete Python code using scikit‑learn to perform both simple and multiple linear regression on the Iris and Boston housing datasets, along with model evaluation and visualization.

Data Science · Python · linear regression
0 likes · 7 min read
Python Programming Learning Circle
Dec 9, 2020 · Artificial Intelligence

Introduction to Artificial Neural Networks and BP Neural Network Implementation with Keras and Scikit-learn

This article introduces artificial neural networks, explains various activation functions, describes common ANN models such as BP, RBF, FNN and LM, and provides step‑by‑step implementation of BP neural networks for classification and regression using Keras Sequential and scikit‑learn’s MLPClassifier/MLPRegressor.

BP Neural Network · Keras · activation functions
0 likes · 6 min read
Python Crawling & Data Mining
Jun 24, 2020 · Artificial Intelligence

How to Quickly Analyze and Predict Stock Prices with Python in 12 Minutes

This tutorial shows how to fetch historical stock data from Yahoo Finance using pandas, compute moving averages and returns, explore correlations among major tech stocks, engineer features, train linear, polynomial, and K‑Nearest‑Neighbour models with scikit‑learn, evaluate their accuracy, and visualize both historical prices and future forecasts, all in a concise, step‑by‑step guide.

Python · scikit-learn · stock analysis
0 likes · 17 min read
Python Programming Learning Circle
Apr 22, 2020 · Artificial Intelligence

Python Audio‑Based Parkinson’s Disease Detection Using Machine Learning

This tutorial demonstrates how to build a Python library that extracts acoustic measurements from healthy and Parkinson’s disease audio recordings, constructs a machine‑learning dataset, trains a logistic‑regression classifier with scikit‑learn, evaluates its accuracy, and provides functions to load and use the trained model in other applications.

Audio Processing · Parkinson's Disease · Parselmouth
0 likes · 12 min read
Alibaba Cloud Developer
Apr 16, 2020 · Artificial Intelligence

How Mars Supercharges Numpy, Pandas, and Scikit‑Learn with Parallel and GPU Acceleration

This article explains how the Mars framework enables parallel and distributed execution of core Python data‑science libraries (NumPy, pandas, and scikit‑learn) while integrating with RAPIDS for GPU acceleration, and demonstrates its performance advantages through code examples and benchmark results.

GPU Acceleration · Mars · NumPy
0 likes · 16 min read
Python Programming Learning Circle
Oct 15, 2019 · Artificial Intelligence

Why Python Beats Java for Data Science: Jupyter, Pandas, scikit-learn & Mapping

Python’s ecosystem—Jupyter notebooks, Pandas for data manipulation, scikit-learn for machine learning, and matplotlib/Basemap for powerful visualizations—offers a streamlined, scriptable environment that outperforms traditional Java or PHP workflows, enabling researchers to write, run, and document code seamlessly in a single web interface.

Data visualization · Jupyter · Matplotlib
0 likes · 8 min read
MaGe Linux Operations
Sep 27, 2019 · Artificial Intelligence

Top 10 Python Libraries Every AI Developer Should Master

This article introduces ten essential Python libraries—TensorFlow, Scikit‑Learn, NumPy, Keras, PyTorch, LightGBM, Eli5, SciPy, Theano, and Pandas—detailing their features, typical use cases, and adoption in machine‑learning and data‑science projects, while highlighting each library's performance advantages, community support, and integration capabilities to help developers choose the right tool for their AI workflows.

Keras · NumPy · PyTorch
0 likes · 15 min read
Qunar Tech Salon
Apr 17, 2019 · Artificial Intelligence

Understanding AdaBoost: Theory, Scikit‑learn Library, and Practical Implementation in Python

This article introduces the AdaBoost algorithm, explains its boosting principle, describes the AdaBoostClassifier and AdaBoostRegressor classes in scikit‑learn, provides a complete Python example with data loading, model training, prediction, evaluation, and visualisation, and discusses the algorithm’s advantages, disadvantages, and detailed iterative process.

AdaBoost · Python · boosting
0 likes · 12 min read
MaGe Linux Operations
Feb 4, 2019 · Artificial Intelligence

8 Python Linear Regression Techniques Compared for Speed and Complexity

This article reviews eight Python-based simple linear regression algorithms, examining their computational complexity and speed on datasets up to ten million points, highlighting trade‑offs between ease of use, flexibility, and performance to help data scientists choose the most efficient method.

Data Science · Python · linear regression
0 likes · 10 min read
MaGe Linux Operations
Nov 26, 2018 · Artificial Intelligence

Master Python Machine Learning in 14 Steps: From Zero to Expert

This comprehensive guide walks beginners through fourteen practical steps to learn Python machine learning, covering essential Python skills, core scientific libraries, fundamental algorithms, advanced techniques like SVM and ensemble methods, dimensionality reduction, and deep learning with TensorFlow, all using free online resources.

Data Science · Deep Learning · Python
0 likes · 22 min read
MaGe Linux Operations
Jun 22, 2018 · Artificial Intelligence

8 Fast Python Linear Regression Techniques Compared for Speed and Complexity

This article reviews eight Python-based simple linear regression methods, explains their underlying algorithms, compares their computational complexity and execution speed on datasets up to ten million points, and offers guidance on selecting the most efficient approach for data‑science tasks.

NumPy · linear regression · machine learning
0 likes · 10 min read
Qunar Tech Salon
Jun 15, 2018 · Artificial Intelligence

Predicting the 2018 FIFA World Cup Winners Using Machine Learning

This article demonstrates how to collect historical football data, perform exploratory analysis and feature engineering, and apply a logistic‑regression model in Python to predict the 2018 FIFA World Cup champion, group‑stage results, and knockout‑stage outcomes.

FIFA World Cup · Python · data analysis
0 likes · 8 min read
MaGe Linux Operations
Mar 29, 2018 · Artificial Intelligence

Master Python’s Top Data Analysis & AI Libraries with Hands‑On Code

This article introduces Python’s essential features for data analysis and mining, then reviews the most widely used libraries—NumPy, SciPy, Matplotlib, Pandas, Scikit‑Learn, Keras, and Gensim—each accompanied by concise code examples that demonstrate their core capabilities.

Keras · Python · data analysis
0 likes · 14 min read
360 Zhihui Cloud Developer
Mar 6, 2018 · Artificial Intelligence

Master Naive Bayes: From Theory to Python Text Classification

This article introduces the Naive Bayes classifier, explains its underlying probability formulas (conditional probability, total probability, and Bayes' theorem), covers the feature‑independence assumption and Laplace smoothing, and demonstrates both manual and scikit‑learn implementations for email and text classification with Python code.

Naive Bayes · probability · scikit-learn
0 likes · 11 min read
MaGe Linux Operations
Jan 14, 2018 · Artificial Intelligence

7 Essential Python Tools Every Data Scientist Must Master

This article introduces seven must‑know Python tools—including IPython, GraphLab Create, Pandas, PuLP, Matplotlib, Scikit‑Learn, and Spark—explaining their key features and how they empower data scientists to work efficiently in production environments.

Data Science · GraphLab · IPython
0 likes · 9 min read
MaGe Linux Operations
Aug 11, 2017 · Artificial Intelligence

Master Python Machine Learning in 14 Free Steps from Zero to Advanced

This comprehensive guide walks beginners through fourteen free steps to learn Python machine learning, covering installation, core scientific libraries, fundamental and advanced algorithms, ensemble methods, gradient boosting, dimensionality reduction, and deep learning frameworks with curated resources and practical examples.

Deep Learning · Python · scikit-learn
0 likes · 24 min read
21CTO
Jun 23, 2017 · Artificial Intelligence

Master Python Machine Learning: A Step‑by‑Step 0‑to‑100 Guide

This comprehensive tutorial walks beginners from zero to proficiency in Python‑based machine learning, covering essential Python skills, core ML concepts, key scientific libraries, fundamental algorithms, advanced techniques like SVM and ensemble methods, and an introduction to deep learning with practical resources and code examples.

Data Science · Python · scikit-learn
0 likes · 24 min read
ITPUB
May 29, 2017 · Fundamentals

Why R Users Should Learn Python for Data Science: A Hands‑On Guide

This tutorial explains why R programmers should add Python to their toolkit, compares core data types and structures between the two languages, introduces essential Python libraries for data analysis, and walks through a practical Boston housing dataset example to solidify the concepts.

Data Science · NumPy · Python
0 likes · 12 min read
MaGe Linux Operations
Apr 7, 2017 · Artificial Intelligence

Predict Diabetes with Linear Regression: A Step‑by‑Step Python Guide

This tutorial walks through using scikit‑learn's LinearRegression on the classic diabetes dataset, covering data description, model training with fit(), making predictions, evaluating performance, and code optimizations, all illustrated with clear output images and plots.

Diabetes Prediction · Python · linear regression
0 likes · 5 min read
MaGe Linux Operations
Apr 5, 2017 · Artificial Intelligence

Master Decision Trees with the Iris Dataset: A Hands‑On Guide

This article introduces classification and decision‑tree algorithms, explains the Iris dataset, and provides step‑by‑step Python code using scikit‑learn to build, train, evaluate, and visualize decision‑tree models, including optimizations and practical tips for accurate predictions.

classification · decision tree · iris dataset
0 likes · 10 min read
MaGe Linux Operations
Feb 28, 2017 · Artificial Intelligence

How to Build a Python Machine Learning Environment and Fit Your First Model

This tutorial walks through setting up a Python 2.7 machine learning environment with scikit-learn, installing required libraries, loading web traffic data, cleaning NaN entries, visualizing the data, performing a linear regression using SciPy's polyfit, and evaluating the model's fit.

Data visualization · Python · linear regression
0 likes · 9 min read