Tagged articles
29 articles
Data STUDIO
Sep 5, 2025 · Artificial Intelligence

19 Elegant Sklearn Tricks for More Efficient Machine Learning

This article presents 19 practical Sklearn functions—ranging from outlier detection to hyper‑parameter search—that replace manual data‑science steps, each illustrated with concise code examples and performance comparisons.

Model Evaluation · Pipeline · data preprocessing
0 likes · 24 min read
Python Programming Learning Circle
Mar 23, 2024 · Artificial Intelligence

Eight Python Libraries to Accelerate Data‑Science Workflows

This article introduces eight Python libraries—including Optuna, ITMO_FS, shap‑hypetune, PyCaret, floWeaver, Gradio, Terality, and Torch‑Handle—that streamline data‑science tasks such as hyperparameter optimization, feature selection, model building, visualization, and deployment, helping users save coding time and improve productivity.

Automation · Data Science · Python
0 likes · 12 min read
Model Perspective
Aug 31, 2023 · Artificial Intelligence

Master Feature Selection: From Filters to PCA with Python

This article explains why selecting the right features is essential for machine learning, outlines the general workflow, compares filter, wrapper, and embedded methods, demonstrates statistical tests and Python code examples, and shows how PCA can synthesize features for dimensionality reduction.

PCA · Python · chi-square
0 likes · 18 min read
Architects' Tech Alliance
Jul 11, 2023 · Artificial Intelligence

Wear-Updated Integrated Feature Ranking (WEFR) for Robust SSD Failure Prediction

The article presents a large‑scale study of SSD failure prediction using SMART logs from multiple vendors, introduces the Wear‑Updated Integrated Feature Ranking (WEFR) method to automatically and robustly select predictive features, and demonstrates its effectiveness through extensive experiments on real‑world data.

SSD · Storage Reliability · WEFR
0 likes · 10 min read
Python Crawling & Data Mining
May 8, 2023 · Artificial Intelligence

How to Choose the Right Features for Python Machine Learning Projects

This article explains Python machine‑learning basics, covering data splitting, feature and label concepts, key factors for feature selection, and practical tips for building predictive models, while also offering code snippets and visual illustrations to help readers apply these techniques effectively.

AI · data mining · feature selection
0 likes · 6 min read
Model Perspective
Mar 20, 2023 · Artificial Intelligence

Master Feature Selection with Recursive Elimination (RFE) in Python

Recursive Feature Elimination (RFE) is a feature‑selection technique that iteratively trains a model, discards the weakest features, and repeats until the desired number of features remains. The article explains how this helps reduce overfitting and improve model performance, and illustrates it with a complete Python example using scikit‑learn.
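
As a rough illustration of the loop this summary describes (train, drop the weakest feature, repeat), here is a minimal sketch using scikit‑learn's RFE class. The synthetic dataset, the logistic‑regression estimator, and the target of 3 features are assumptions chosen for the example; they are not taken from the article itself.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data (an assumption for illustration): 10 features, 3 informative.
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)

# RFE refits the estimator repeatedly and removes `step` features per round,
# ranked by the absolute size of the fitted coefficients, until 3 remain.
selector = RFE(
    estimator=LogisticRegression(max_iter=1000),
    n_features_to_select=3,
    step=1,
)
selector.fit(X, y)

selected_mask = selector.support_   # boolean mask of the kept features
ranking = selector.ranking_         # 1 = kept; larger values were dropped earlier
X_reduced = selector.transform(X)   # data restricted to the kept features

print("Kept feature indices:", [i for i, keep in enumerate(selected_mask) if keep])
print("Reduced shape:", X_reduced.shape)
```

Any estimator that exposes coef_ or feature_importances_ after fitting could be swapped in for the logistic regression here.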

Python · feature selection · recursive elimination
0 likes · 6 min read
Model Perspective
Feb 8, 2023 · Artificial Intelligence

Mastering Feature Selection: From Filters to Embedded Methods in Python

This article explains why feature selection is crucial for machine learning, outlines the general workflow, compares filter, wrapper, embedded, and synthesis approaches, and provides practical Python examples—including Pearson correlation, chi‑square tests, mutual information, variance selection, recursive elimination, L1 regularization, and PCA—complete with code snippets and visualizations.

Python · feature selection · statistics
0 likes · 20 min read
Python Programming Learning Circle
Oct 25, 2022 · Artificial Intelligence

Genetic Algorithms: Theory, Steps, and Practical Implementation with TPOT for Data Science

This article introduces genetic algorithms, explains their biological inspiration, details each step of the algorithm, demonstrates solving the knapsack problem, and provides a complete Python implementation using the TPOT library for feature selection and regression on the Big Mart Sales dataset.

Python · TPOT · feature selection
0 likes · 19 min read
MaGe Linux Operations
Oct 1, 2022 · Artificial Intelligence

11 Powerful Feature Selection Techniques Every Data Scientist Should Master

This guide walks through a comprehensive set of feature‑selection strategies—from removing unused or missing columns to handling multicollinearity, low‑variance features, and using PCA—complete with Python code examples and visualizations to help you build leaner, more interpretable machine‑learning models.

Python · data preprocessing · dimensionality reduction
0 likes · 18 min read
Python Programming Learning Circle
Feb 23, 2022 · Artificial Intelligence

A Survey of Python Libraries for Hyperparameter Optimization, Feature Selection, Model Explainability, and Rapid Machine Learning Development

This article introduces several Python libraries (Optuna, ITMO_FS, shap‑hypetune, PyCaret, floWeaver, Gradio, Terality, and torch‑handle) that simplify hyperparameter tuning, feature selection, model explainability, visualization, and low‑code ML workflows, providing code examples and key advantages for each tool.

Model Explainability · Python · feature selection
0 likes · 10 min read
Code DAO
Dec 18, 2021 · Artificial Intelligence

Essential Feature Selection Techniques for Machine Learning

This article explains why feature selection is crucial for building robust machine‑learning models and walks through popular filter, wrapper, and embedded methods—including information gain, chi‑square, LASSO, random‑forest importance, and PCA—providing code examples and practical guidance.

PCA · Regularization · embedded methods
0 likes · 18 min read
Code DAO
Nov 29, 2021 · Artificial Intelligence

Feature Selection: Reducing Input Variables for Predictive Modeling

This article explains the purpose and types of feature selection, compares supervised and unsupervised, wrapper, filter, and embedded methods, discusses choosing statistical metrics based on variable types, and provides scikit‑learn code examples for regression and classification tasks.

embedded methods · feature selection · filter methods
0 likes · 12 min read
Code DAO
Nov 29, 2021 · Artificial Intelligence

Dimensionality Reduction Algorithms: Why Too Many Features Hurt Machine Learning

The article explains how high‑dimensional data causes the curse of dimensionality, reduces model performance, and surveys feature‑selection, matrix‑decomposition, manifold‑learning, and auto‑encoder techniques while advising systematic experiments and proper data scaling.

PCA · autoencoders · dimensionality reduction
0 likes · 9 min read
Meituan Technology Team
Nov 18, 2021 · Artificial Intelligence

Multi‑Business Product Ranking in Meituan Search: Challenges, Modeling Approaches, and Practical Results

Meituan Search tackles the difficulty of ranking items from diverse business lines by introducing a five‑tower mixed architecture, group‑lasso and feature‑gate selection, a probabilistic graph model, and a joint block‑order/size predictor, achieving notable offline NDCG gains and online CTR and purchase‑rate improvements.

Deep Learning · e‑commerce · feature selection
0 likes · 19 min read
DataFunSummit
Sep 10, 2021 · Artificial Intelligence

Advances in Pre‑Ranking: The COLD System for Large‑Scale Advertising

This article reviews the evolution of coarse‑ranking in large‑scale ad systems, explains the two main technical routes—set selection and precise value estimation—introduces the Computing‑Power‑Cost‑Aware Online Lightweight Deep (COLD) pre‑ranking framework, and presents experimental results and future directions for deeper integration with fine‑ranking.

Advertising · COLD · feature selection
0 likes · 21 min read
DataFunTalk
Jun 14, 2021 · Artificial Intelligence

From Massive to Compact: Model Compression Strategies for Large‑Scale CTR Prediction in Alibaba Search Advertising

This article describes how Alibaba's search advertising team transformed trillion‑parameter CTR models into lightweight, high‑precision systems by compressing embedding layers through feature‑space reduction, dimension quantization, and multi‑hash techniques, while also introducing graph‑based pre‑training and dropout‑driven feature selection to maintain accuracy.

CTR prediction · embedding reduction · feature selection
0 likes · 15 min read
Alimama Tech
Jun 2, 2021 · Artificial Intelligence

Model Compression and Feature Optimization for Large-Scale CTR Prediction in Advertising

Alimama's advertising team shrank multi‑terabyte CTR models to tens of gigabytes by applying row‑dimension embedding compression, multi‑hash embeddings, graph‑based relationship networks, PCF‑GNN pre‑training, and droprank feature selection, preserving accuracy while halving training time, doubling online QPS, and retiring hundreds of servers.

Large-scale ML · embedding reduction · feature selection
0 likes · 14 min read
Alimama Tech
May 27, 2021 · Artificial Intelligence

Towards a Better Tradeoff between Effectiveness and Efficiency in Pre‑Ranking: A Learnable Feature‑Selection‑Based Approach

The authors introduce an interaction‑focused pre‑ranking model combined with a learnable, complexity‑aware feature‑selection technique (FSCD) that selects a compact feature set. With it, Alibaba's search advertising system raises offline AUC from 0.695 to 0.737, lifts recall to 95%, and improves CTR and RPM, while keeping CPU usage and latency comparable to traditional vector‑dot models.

effectiveness · feature selection · pre‑ranking
0 likes · 15 min read
Alimama Tech
May 27, 2021 · Artificial Intelligence

Advances in Click‑Through Rate (CTR) Modeling: Overview of Recent SIGIR Papers and Optimization Paths

The article reviews recent Alimama advances in click‑through‑rate modeling, classifying optimizations across the three‑layer CTR architecture and highlighting three SIGIR papers: GIN's graph‑based user intent modeling, PCF's pre‑trained GNN for explicit cross‑feature semantics, and FSCD's compute‑factor‑guided automatic feature selection, each of which improves prediction accuracy and system efficiency.

embedding layers · feature selection · search advertising
0 likes · 12 min read
Python Programming Learning Circle
May 8, 2021 · Artificial Intelligence

Top 10 New Features in Scikit‑learn 0.24

The article reviews the most important additions in scikit‑learn 0.24, including faster hyper‑parameter search methods, ICE plots, histogram‑based boosting improvements, new feature‑selection tools, polynomial‑feature approximations, a semi‑supervised classifier, MAPE metric, enhanced OneHotEncoder and OrdinalEncoder handling, and a more flexible RFE interface.

Model Evaluation · Python · data preprocessing
0 likes · 8 min read
TAL Education Technology
Sep 17, 2020 · Artificial Intelligence

Comprehensive Guide to Feature Engineering and Data Preprocessing for Machine Learning

This article provides an extensive overview of feature engineering, covering feature understanding, cleaning, construction, selection, transformation, and dimensionality reduction techniques, illustrated with Python code using the Titanic dataset, and offers practical guidelines for improving data quality and model performance in machine learning projects.

Python · Titanic dataset · data preprocessing
0 likes · 44 min read
DataFunTalk
Aug 18, 2020 · Artificial Intelligence

COLD: A Next‑Generation Pre‑Ranking System for Online Advertising

The article introduces COLD, a computing‑power‑aware online and lightweight deep pre‑ranking system for Alibaba's targeted ads, detailing its evolution from static CTR models to vector‑inner‑product models, its flexible network architecture with feature‑selection via SE blocks, engineering optimizations such as parallelism, column‑wise computation, Float16 and MPS, and demonstrates superior offline and online performance through extensive experiments.

COLD · Model Optimization · feature selection
0 likes · 11 min read
Alibaba Cloud Infrastructure
Nov 9, 2018 · Artificial Intelligence

Predicting Server Memory Failures with Machine Learning: Feature Selection, Data Preprocessing, and Model Evaluation

This article presents a machine‑learning approach to predict DRAM failures in large‑scale data centers by analyzing server logs, selecting state, log, and static features through statistical tests and mutual information, preprocessing the data, and employing a tree‑based ensemble classifier that outperforms industry baselines.

Predictive Maintenance · classification · feature selection
0 likes · 7 min read
Architecture Digest
Feb 14, 2018 · Artificial Intelligence

Comparative Analysis and Optimization of Machine Learning Models on the UCI Census Income Dataset

This article walks through a complete machine‑learning workflow on the UCI Census Income dataset, covering data exploration, preprocessing (including log‑transformation and scaling), model training with Naïve Bayes, Decision Tree and SVM, performance evaluation, hyper‑parameter tuning via grid search, feature importance analysis, and feature selection, providing code snippets and visualizations.

Model Evaluation · Python · data preprocessing
0 likes · 24 min read
Meituan Technology Team
Oct 12, 2017 · Artificial Intelligence

Machine Learning Q&A: Data Imputation, Feature Selection, Recommendation Systems and More

The article answers ten machine‑learning questions, explaining how to impute missing behavior data, extract and select features, describe Meituan‑Dianping’s recommendation pipeline, suggest a beginner learning path, clarify L1 sparsity, recommend TextCNN for text, discuss search‑ranking sample bias, label generation for wide‑deep models, the shift to deep‑learning video detection, and the use of factorization machines for CTR with open‑source examples.

Deep Learning · L1 Regularization · Recommendation Systems
0 likes · 7 min read
MaGe Linux Operations
Apr 17, 2017 · Artificial Intelligence

Essential Machine Learning Visuals: Test Error, Overfitting, and More

This article presents a curated collection of insightful machine‑learning diagrams that illustrate key concepts such as test versus training error, under‑ and over‑fitting, Occam’s razor, feature interactions, irrelevant features, basis functions, discriminative versus generative models, loss functions, least‑squares geometry, and sparsity.

Loss Functions · Occam's razor · feature selection
0 likes · 6 min read
Meituan Technology Team
Dec 18, 2014 · Artificial Intelligence

Auto-Label Missing POI Categories Using Naive Bayes and Feature Selection

This article details a step‑by‑step machine‑learning pipeline that transforms over one million calibrated POI records into feature vectors, selects discriminative terms via information‑gain and domain rules, trains a Naive Bayes classifier, and achieves 91% accuracy with 84% coverage on unseen POI data.

Chinese NLP · Naive Bayes · POI classification
0 likes · 12 min read