Tagged articles
83 articles
Data Party THU
May 19, 2026 · Artificial Intelligence

Model Performance Lagging? Master Feature Engineering with a Complete Step‑by‑Step Guide

This article walks through the entire feature‑engineering pipeline—data cleaning, missing‑value imputation, encoding, outlier handling, scaling, feature construction, and selection—using Pandas and Scikit‑learn, and shows how to wrap the steps into a reproducible Scikit‑learn Pipeline.

Pipeline · data preprocessing · feature engineering
0 likes · 9 min read
DeepHub IMBA
May 12, 2026 · Artificial Intelligence

Hands‑On Feature Engineering with Pandas and Scikit‑Learn: Complete Code Walkthrough

This article walks through a full feature‑engineering pipeline using Pandas and Scikit‑Learn, covering data inspection, missing‑value imputation, categorical encoding, outlier handling, scaling, feature construction, selection, and a final Pipeline that prepares clean, predictive features for a logistic‑regression model.

Pipeline · data preprocessing · feature engineering
0 likes · 9 min read
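The Pipeline described in this walkthrough can be sketched roughly as follows. This is a minimal sketch, not the article's code. The toy DataFrame, its column names (`age`, `income`, `city`, `churned`) and the imputation strategies are hypothetical. Because imputation, encoding and scaling sit inside one `Pipeline`, their statistics are learned only when `fit` is called on training data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with gaps in both numeric and categorical columns.
df = pd.DataFrame({
    "age": [25, 32, np.nan, 45, 51, 38, 29, np.nan],
    "income": [40_000, 52_000, 61_000, np.nan, 88_000, 57_000, 43_000, 70_000],
    "city": ["A", "B", "A", np.nan, "C", "B", "A", "C"],
    "churned": [0, 0, 1, 1, 1, 0, 0, 1],
})
X = df.drop(columns="churned")
y = df["churned"]

# Numeric branch: median imputation, then standardization.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical branch: most-frequent imputation, then one-hot encoding.
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric, ["age", "income"]),
    ("cat", categorical, ["city"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression()),
])

model.fit(X, y)
predictions = model.predict(X)
```

The whole chain is a single estimator. It can therefore be passed directly to `cross_val_score` or `GridSearchCV`, and each fold's preprocessing statistics then come only from that fold's training part.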
Data Party THU
Feb 2, 2026 · Fundamentals

Why Standardize Data to Mean 0 and Variance 1?

The article explains that shifting the mean to zero recenters the data around the origin, which helps gradient‑based optimizers converge faster. Scaling the variance to one puts all features on a common scale, so no single feature dominates. Examples and visualizations illustrate how standardization improves machine‑learning models.

data preprocessing · feature scaling · machine learning
0 likes · 5 min read
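The transformation itself is z = (x - μ) / σ. The minimal standard-library sketch below uses made-up sample values. Its output has mean 0 and population standard deviation 1. In scikit-learn the same operation is `StandardScaler`, which also uses the population (ddof=0) standard deviation.

```python
from statistics import fmean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the population std."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values (e.g. heights in cm).
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
z = standardize(heights_cm)
# The middle value equals the mean, so its z-score is exactly 0.
```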
AI Cyberspace
Jan 18, 2026 · Artificial Intelligence

Understanding Supervised, Unsupervised, Self‑Supervised, Semi‑Supervised, and Reinforcement Learning for Large Language Model Training

The article explains various learning paradigms (supervised, unsupervised, self‑supervised, semi‑supervised, and reinforcement), describes dataset types and quality considerations, outlines preprocessing steps like filtering, deduplication, and tokenization, and discusses scaling laws linking model size, data volume, and compute resources, with concrete examples and code.

Model Training · data preprocessing · machine learning
0 likes · 26 min read
Data STUDIO
Oct 28, 2025 · Artificial Intelligence

8 Proven Ways to Boost Machine Learning Model Accuracy

This article outlines eight practical techniques—including data augmentation, handling missing values, feature engineering, algorithm selection, hyperparameter tuning, ensemble methods, and cross‑validation—to systematically improve the accuracy of Python machine‑learning models, supported by explanations, examples, and code snippets.

cross-validation · data preprocessing · ensemble methods
0 likes · 16 min read
DataFunSummit
Sep 13, 2025 · Artificial Intelligence

How Pinterest Scaled LLM Data Pipelines with Ray: Boosting Throughput and Cutting Costs

This article details how Pinterest’s senior staff engineer Dr. Luo leveraged the open‑source Ray framework to overcome LLM data‑preprocessing bottlenecks, describing the system’s architecture, key features such as map_batches, Carry‑Over Columns and Accumulators, and the dramatic performance and cost improvements achieved.

LLM · Pinterest · Ray
0 likes · 12 min read
Data STUDIO
Sep 5, 2025 · Artificial Intelligence

19 Elegant Sklearn Tricks for More Efficient Machine Learning

This article presents 19 practical Sklearn functions—ranging from outlier detection to hyper‑parameter search—that replace manual data‑science steps, each illustrated with concise code examples and performance comparisons.

Model Evaluation · Pipeline · data preprocessing
0 likes · 24 min read
Python Programming Learning Circle
Jul 8, 2025 · Artificial Intelligence

10 One‑Line Python Tricks to Jump‑Start Your Machine Learning Projects

This article presents ten concise, practical one‑line Python code snippets—ranging from loading CSV data with Pandas to building sophisticated Scikit‑learn pipelines—that streamline common machine‑learning tasks such as data cleaning, encoding, splitting, scaling, model training, evaluation, cross‑validation, and prediction.

Pipeline · Python · data preprocessing
0 likes · 10 min read
AI Code to Success
Feb 27, 2025 · Artificial Intelligence

Master Decision Trees: Theory, Construction, and Python Implementation

This article provides a comprehensive guide to decision tree algorithms. It covers their theoretical foundations, key components and construction workflow (data preprocessing, feature selection, tree growth, stopping criteria and pruning). It then surveys popular variants such as ID3, C4.5 and CART, discusses their practical advantages and applications, and closes with a complete Python implementation using scikit-learn.

Python · classification · data preprocessing
0 likes · 29 min read
Test Development Learning Exchange
Dec 5, 2024 · Artificial Intelligence

End-to-End House Prices Prediction Project: Data Collection, Preprocessing, Modeling, Evaluation, and Deployment with Python

This tutorial walks through a complete house price prediction project, covering data collection from Kaggle, preprocessing with pandas and scikit‑learn, model training using RandomForestRegressor, evaluation, and deployment of a Flask API for real‑time predictions, providing full code examples.

Flask · Model Deployment · Python
0 likes · 9 min read
Test Development Learning Exchange
Nov 26, 2024 · Artificial Intelligence

Comprehensive Python Tutorial for Data Preprocessing, Feature Engineering, Model Training, Evaluation, and Deployment

This tutorial consolidates the first ten days of learning. It covers data preprocessing, feature engineering, and model training with linear regression, decision tree and random forest models. It then covers model evaluation using cross‑validation and finally saving and loading the best model, all illustrated with complete Python code examples.

Model Training · Python · data preprocessing
0 likes · 9 min read
DaTaobao Tech
Aug 21, 2024 · Artificial Intelligence

Mastering Custom Large‑Model Training: Data Strategies, LoRA Tricks, and Resource Planning

This article provides a comprehensive, step‑by‑step guide to training customized large language models, covering industry‑specific needs, data privacy, meticulous data cleaning, optimal data‑ratio balancing, token budgeting, GPU memory accounting, LoRA fine‑tuning techniques, and practical evaluation metrics for robust AI deployment.

AI training · Fine-tuning · GPU Memory
0 likes · 23 min read
Rare Earth Juejin Tech Community
Jul 7, 2024 · Artificial Intelligence

Daily and Sports Activities Dataset: Description, Preprocessing Pipeline, and CNN Classification Results

This article introduces the Daily_and_Sports_Activities sensor dataset, details its structure and characteristics, provides a Python preprocessing pipeline with sliding‑window segmentation and Z‑score normalization, and reports CNN training results achieving 87.93% accuracy on activity classification.

CNN · UCI · data preprocessing
0 likes · 9 min read
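Sliding-window segmentation combined with Z-score normalization can be sketched in NumPy. The window length (25 samples), the step (12 samples) and the random 3-channel signal are hypothetical placeholders, not the article's settings. The per-channel mean and standard deviation come from training data only, so the same values can be reused to normalize test data.

```python
import numpy as np

def sliding_windows(signal, window, step):
    """Cut a (time, channels) array into (n_windows, window, channels) segments."""
    starts = range(0, signal.shape[0] - window + 1, step)
    return np.stack([signal[s:s + window] for s in starts])

def zscore_fit(train_signal):
    """Per-channel mean and std, estimated on training data only."""
    mean = train_signal.mean(axis=0)
    std = train_signal.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero on flat channels
    return mean, std

rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # 100 time steps, 3 channels

mean, std = zscore_fit(train)
normalized = (train - mean) / std
windows = sliding_windows(normalized, window=25, step=12)
print(windows.shape)  # (7, 25, 3)
```

With a step smaller than the window, consecutive segments overlap. This yields more training samples from the same recording.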
php Courses
Jun 13, 2024 · Artificial Intelligence

Using PHP for Data Dimensionality Reduction and Feature Extraction

This article explains the importance of data dimensionality reduction and feature extraction in machine learning, and provides a step‑by‑step guide with PHP code examples—including library installation, data preprocessing, PCA‑based reduction, and feature selection techniques—demonstrating how to handle large datasets efficiently.

PCA · PHP · data preprocessing
0 likes · 6 min read
HelloTech
Mar 14, 2024 · Artificial Intelligence

Feature Engineering: Concepts, Methods, and Automation

Feature engineering transforms existing data into new predictive variables through manual analysis or automated pipelines, encompassing single‑variable encoding, pairwise arithmetic, group‑statistics, multi‑variable combinations, time‑series and text derivations, with tools like Deep Feature Synthesis and beam‑search to generate and select useful features.

Time Series · automated features · data preprocessing
0 likes · 17 min read
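The article lists several manual construction patterns. Two of them, pairwise arithmetic and group statistics, can be sketched in pandas. The transaction table and its column names are hypothetical. `groupby(...).transform` broadcasts each group's statistic back to the original rows, so the new columns stay aligned with the input.

```python
import pandas as pd

# Hypothetical transaction log.
df = pd.DataFrame({
    "user": ["u1", "u1", "u2", "u2", "u2"],
    "amount": [10.0, 30.0, 5.0, 5.0, 20.0],
    "quantity": [1, 3, 1, 1, 4],
})

# Pairwise arithmetic: combine two existing columns into a ratio feature.
df["unit_price"] = df["amount"] / df["quantity"]

# Group statistics: per-user aggregates broadcast back to every row.
grouped = df.groupby("user")["amount"]
df["user_mean_amount"] = grouped.transform("mean")
df["user_txn_count"] = grouped.transform("count")

# Deviation of each transaction from that user's typical spend.
df["amount_minus_user_mean"] = df["amount"] - df["user_mean_amount"]
```

Automated approaches such as Deep Feature Synthesis stack aggregation and transform primitives like these across related tables. Each generated feature must still be evaluated and selected.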
Test Development Learning Exchange
Jan 23, 2024 · Fundamentals

Common Data Preprocessing Techniques with Python Code Examples

This article presents ten essential data preprocessing methods—including handling missing values, type conversion, standardization, encoding, smoothing, outlier treatment, text cleaning, word frequency counting, sentiment analysis, and topic modeling—each explained with clear Python code snippets.

Python · data cleaning · data preprocessing
0 likes · 9 min read
Test Development Learning Exchange
Dec 4, 2023 · Fundamentals

Common Data Cleaning Techniques with Python Code Examples

This article presents a comprehensive collection of Python code snippets demonstrating essential data cleaning methods—including handling missing values, outlier detection, type conversion, formatting, duplicate removal, normalization, one‑hot encoding, text preprocessing, and dataset merging—providing practical guidance for preparing data for analysis or machine‑learning tasks.

data cleaning · data preprocessing · machine learning
0 likes · 7 min read
DaTaobao Tech
Sep 11, 2023 · Artificial Intelligence

Large Language Model Upgrade Paths and Architecture Selection

This article analyzes upgrade paths of major LLMs—ChatGLM, LLaMA, Baichuan—detailing performance, context length, and architectural changes, then examines essential capabilities, data cleaning, tokenizer and attention design, and offers practical guidance for balanced scaling and efficient model construction.

Baichuan · ChatGLM · LLM architecture
0 likes · 32 min read
Network Intelligence Research Center (NIRC)
May 10, 2023 · Artificial Intelligence

How LLaMA Preprocesses Training Data with CCNet Before Model Training

Before training large language models like LLaMA, Meta AI runs Common Crawl web data through the multi‑stage CCNet pipeline. The pipeline starts from the crawl's WET text files, deduplicates paragraphs, and identifies and filters languages using fastText. It then further refines content by similarity to Wikipedia and with citation‑based linear models.

CCNet · LLaMA · data preprocessing
0 likes · 7 min read
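The following standard-library sketch illustrates paragraph-level deduplication in the spirit of CCNet: normalize each paragraph, hash it, and drop paragraphs whose hash has already been seen. The normalization rules here are simplified assumptions: lowercasing, stripping accents, mapping digits to 0 and removing punctuation. Keeping the first occurrence is also a choice made for clarity and may not match CCNet's exact behavior.

```python
import hashlib
import re
import unicodedata

def normalize(paragraph):
    """Simplified normalization: lowercase, strip accents, digits -> 0, drop punctuation."""
    text = unicodedata.normalize("NFD", paragraph.lower())
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"\d", "0", text)
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    return " ".join(text.split())

def dedup_paragraphs(documents):
    """Drop paragraphs (newline-separated lines) whose normalized hash was already seen."""
    seen = set()
    cleaned = []
    for doc in documents:
        kept = []
        for para in doc.split("\n"):
            if not para.strip():
                continue
            digest = hashlib.sha1(normalize(para).encode("utf-8")).digest()
            if digest in seen:
                continue  # duplicate of an earlier paragraph
            seen.add(digest)
            kept.append(para)
        cleaned.append("\n".join(kept))
    return cleaned

docs = [
    "Welcome to our site!\nPrices start at 10 dollars.",
    "welcome to our site\nA completely different paragraph.",
]
result = dedup_paragraphs(docs)
print(result)
```

Hashing normalized text, rather than raw text, also catches near-identical boilerplate that differs only in case, punctuation or numbers.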
Model Perspective
Mar 3, 2023 · Fundamentals

Unlock Hidden Patterns: A Practical Guide to Factor Analysis with Python

Factor analysis, a statistical technique for uncovering underlying common factors among variables, is explained alongside its distinction from PCA, detailed procedural steps, adequacy tests, and a hands‑on Python implementation using the factor_analyzer library with visualizations and factor rotation methods.

Python · data preprocessing · factor analysis
0 likes · 10 min read
Python Programming Learning Circle
Dec 31, 2022 · Artificial Intelligence

A Beginner’s Guide to Data Preprocessing for Machine Learning in Python

This tutorial walks beginners through the essential steps of data preprocessing for any machine learning model, covering library imports, dataset loading, handling missing values, encoding categorical features, splitting into train‑test sets, and applying feature scaling using Python’s scikit‑learn.

Python · data preprocessing · feature scaling
0 likes · 11 min read
Python Programming Learning Circle
Dec 7, 2022 · Artificial Intelligence

Predicting the 2022 FIFA World Cup Champion Using Machine Learning Models

This article details a data‑mining project that uses historical World Cup match data, extensive feature engineering, and various machine‑learning algorithms—including neural networks, logistic regression, SVM, decision trees, and random forests—to predict the champion of the 2022 tournament, while analyzing model errors and proposing improvements.

Model Evaluation · World Cup · classification
0 likes · 7 min read
MaGe Linux Operations
Oct 1, 2022 · Artificial Intelligence

11 Powerful Feature Selection Techniques Every Data Scientist Should Master

This guide walks through a comprehensive set of feature‑selection strategies, including:
- removing unused columns and columns with too many missing values;
- handling multicollinearity and low‑variance features;
- applying PCA.
It provides Python code examples and visualizations to help you build leaner, more interpretable machine‑learning models.

Python · data preprocessing · dimensionality reduction
0 likes · 18 min read
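Two of the techniques in this list can be sketched with pandas: dropping low-variance features, and pruning one feature from each highly correlated pair. The thresholds (variance 0.01, absolute correlation 0.9) and the toy columns are hypothetical. In practice both thresholds need tuning, and a variance threshold is only meaningful when features are on comparable scales.

```python
import numpy as np
import pandas as pd

def prune_features(df, var_threshold=0.01, corr_threshold=0.9):
    """Drop near-constant columns, then the later column of each highly correlated pair."""
    # pandas .var() uses the sample variance (ddof=1) by default.
    kept = df.loc[:, df.var() > var_threshold]
    corr = kept.corr().abs()
    # Mask everything except the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_threshold).any()]
    return kept.drop(columns=to_drop)

# Hypothetical features: "c" is constant and "b" is almost exactly 2 * "a".
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.1],
    "c": [1, 1, 1, 1, 1],
    "d": [5, 3, 4, 1, 2],
})
pruned = prune_features(df)
print(list(pruned.columns))  # ['a', 'd']
```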
ITPUB
Sep 15, 2022 · Artificial Intelligence

Why Precise Feature Engineering Still Matters in Recommendation Systems

In the era of deep learning, feature engineering remains crucial for recommendation and search advertising because it bridges raw relational data and models, improves performance, reduces complexity, and handles high‑cardinality, large‑scale, and time‑sensitive scenarios with robust transformations and statistical encoding.

AI · Recommendation Systems · data preprocessing
0 likes · 20 min read
DataFunTalk
Aug 30, 2022 · Artificial Intelligence

Feature Engineering for Recommendation and Search Advertising

This article explains why meticulous feature engineering remains crucial in recommendation and search advertising, outlines what constitutes good features, describes common transformation techniques such as scaling, binning, and encoding, and provides practical examples and Q&A for practitioners.

AI · Recommendation Systems · data preprocessing
0 likes · 18 min read
Model Perspective
Aug 14, 2022 · Artificial Intelligence

Mastering Feature Binning with sklearn: Uniform, Quantile, and K‑Means Methods

This article explains why discretizing continuous variables improves model stability, introduces three common binning techniques—equal-width, equal-frequency, and clustering—and demonstrates how to implement each using scikit‑learn's KBinsDiscretizer with Python code examples on a synthetic score dataset.

KBinsDiscretizer · Python · data preprocessing
0 likes · 5 min read
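The three methods correspond to the `strategy` parameter of scikit-learn's `KBinsDiscretizer`, sketched here on a hypothetical score column:
- `uniform` gives equal-width bins;
- `quantile` gives roughly equal counts per bin;
- `kmeans` places edges midway between 1-D k-means cluster centers.

`encode="ordinal"` returns bin indices instead of one-hot columns. Recent scikit-learn versions may emit a FutureWarning for the `quantile` strategy; it does not affect the result.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Hypothetical exam scores, shaped as a single-feature column.
scores = np.array([35, 48, 52, 58, 61, 67, 72, 79, 85, 93, 97], dtype=float).reshape(-1, 1)

results = {}
for strategy in ("uniform", "quantile", "kmeans"):
    binner = KBinsDiscretizer(n_bins=3, encode="ordinal", strategy=strategy)
    bins = binner.fit_transform(scores).ravel().astype(int)
    results[strategy] = bins
    edges = np.round(binner.bin_edges_[0], 1)
    print(f"{strategy:>8}: edges={edges.tolist()}, bins={bins.tolist()}")
```

Equal-width bins are easy to explain but sensitive to outliers stretching the range. Quantile bins keep each bin populated. K-means bins follow natural clusters in the data.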
Python Programming Learning Circle
Feb 28, 2022 · Artificial Intelligence

Time Series Data Preprocessing: Missing Value Imputation, Denoising, and Outlier Detection

This article explains essential time series preprocessing techniques—including data sorting, handling missing values with interpolation methods, applying rolling averages, Fourier transform denoising, and detecting anomalies using rolling statistics, isolation forests, and K‑means clustering—illustrated with Python code on the AirPassengers and Google stock datasets.

Denoising · Python · Time Series
0 likes · 9 min read
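The interpolation, rolling-average and rolling-statistics steps can be sketched in pandas. This minimal sketch uses a hypothetical daily series with one gap and one spike, not the AirPassengers or Google stock data. The isolation-forest, K‑means and Fourier variants are not shown. The anomaly rule (more than 3 standard deviations from the mean of the previous 4 points) is an illustrative choice.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series with one missing value and one spike.
idx = pd.date_range("2024-01-01", periods=8, freq="D")
s = pd.Series([10.0, 11.0, np.nan, 13.0, 12.0, 50.0, 12.0, 11.0], index=idx)

# 1) Missing values: linear interpolation between neighbours.
filled = s.interpolate(method="linear")

# 2) Smoothing: centred 3-point rolling mean.
smoothed = filled.rolling(window=3, center=True).mean()

# 3) Anomalies: compare each point with the statistics of the previous 4 points
#    (shift(1)), so a spike cannot inflate its own baseline.
window = filled.rolling(window=4)
baseline_mean = window.mean().shift(1)
baseline_std = window.std().shift(1)
is_anomaly = (filled - baseline_mean).abs() > 3 * baseline_std

print(filled.iloc[2])  # 12.0
print(is_anomaly[is_anomaly].index.tolist())
```

Points without a full 4-point history yield NaN baselines, and NaN comparisons evaluate to False. As a result, the first few points are never flagged.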
Baobao Algorithm Notes
Feb 14, 2022 · Artificial Intelligence

Mastering Feature Engineering: From AutoML Dictionaries to Business‑Driven Insights

This article presents a comprehensive, practical methodology for feature engineering that combines brute‑force AutoML‑style dictionary searches, business‑logic‑driven feature creation, and feature‑importance‑guided refinement, illustrating each approach with real Kaggle competition examples and concrete code snippets.

AutoML · Kaggle · data preprocessing
0 likes · 12 min read
Python Programming Learning Circle
Aug 3, 2021 · Fundamentals

Practical Python Data Cleaning Functions

This article presents a collection of straightforward yet practical Python functions for data cleaning tasks—including dropping columns, changing data types, converting categorical variables, handling missing values, removing unwanted characters, trimming whitespace, conditional concatenation, and converting string timestamps—designed to streamline preprocessing in data analysis projects.

Data Science · data preprocessing
0 likes · 7 min read
JD Tech
Jul 30, 2021 · Databases

Practical Use of HBase in a Logistics HR Data Preprocessing Platform

This article details how the logistics HR data preprocessing platform processes around 20 million daily records by adopting HBase for high‑performance, scalable, column‑oriented storage, covering its architecture, read/write mechanisms, best practices, and performance considerations.

Big Data · HBase · NoSQL
0 likes · 10 min read
Architecture Digest
Jun 21, 2021 · Databases

Using HBase for HR Performance Data Preprocessing Platform: Architecture, Concepts, and Best Practices

This article introduces the HR performance data preprocessing platform’s requirements, explains why HBase was selected as the storage solution, details its core concepts, architecture, data write/read processes, best practices, limitations, and presents performance metrics demonstrating its suitability for large‑scale, high‑throughput workloads.

Big Data · Database Architecture · HBase
0 likes · 12 min read
ITFLY8 Architecture Home
Jun 20, 2021 · Big Data

Why HBase Is the Ideal Choice for Large‑Scale HR Data Preprocessing

This article explains how HBase’s distributed column‑oriented architecture, high‑performance read/write capabilities, and flexible schema make it a cost‑effective solution for handling massive, unstructured HR performance data, covering its core concepts, cluster operation, best practices, and performance metrics.

Big Data · HBase · data preprocessing
0 likes · 11 min read
Python Programming Learning Circle
May 8, 2021 · Artificial Intelligence

Top 10 New Features in Scikit‑learn 0.24

The article reviews the most important additions in scikit‑learn 0.24, including faster hyper‑parameter search methods, ICE plots, histogram‑based boosting improvements, new feature‑selection tools, polynomial‑feature approximations, a semi‑supervised classifier, MAPE metric, enhanced OneHotEncoder and OrdinalEncoder handling, and a more flexible RFE interface.

Model Evaluation · Python · data preprocessing
0 likes · 8 min read
DataFunTalk
Jan 23, 2021 · Artificial Intelligence

Feature Engineering: Mapping Raw Data to Machine‑Learning Features and Best Practices

This article explains how feature engineering transforms raw data into numerical representations for machine‑learning models, covering mapping of numeric and categorical values, one‑hot and multi‑hot encoding, sparse representations, scaling, handling outliers, binning, data quality checks, and feature interactions to capture non‑linear relationships.

data preprocessing · encoding · feature engineering
0 likes · 20 min read
Python Programming Learning Circle
Dec 18, 2020 · Fundamentals

Data Exploration and Cleaning: Core Concepts, Steps, and Example Workflow

This article explains the purpose of data exploration and cleaning, outlines core analysis tasks, details missing‑value and outlier handling techniques—including various imputation methods—and illustrates the complete workflow with example images and a histogram‑based distribution analysis.

data cleaning · data exploration · data preprocessing
0 likes · 3 min read
Taobao Frontend Technology
Oct 27, 2020 · Artificial Intelligence

Mastering Tensors in TensorFlow.js: From Scalars to Neural Networks

This guide explains the fundamentals of tensors in TensorFlow.js—including scalars, vectors, and higher‑dimensional tensors—demonstrates how to convert real‑world data such as the Titanic dataset into tensors, and shows how to build, compile, and train a simple neural network model using appropriate layers, loss functions, and optimizers.

JavaScript · Neural Network · TensorFlow.js
0 likes · 7 min read
TAL Education Technology
Sep 17, 2020 · Artificial Intelligence

Comprehensive Guide to Feature Engineering and Data Preprocessing for Machine Learning

This article provides an extensive overview of feature engineering, covering feature understanding, cleaning, construction, selection, transformation, and dimensionality reduction techniques, illustrated with Python code using the Titanic dataset, and offers practical guidelines for improving data quality and model performance in machine learning projects.

Python · Titanic dataset · data preprocessing
0 likes · 44 min read
Alibaba Cloud Developer
Aug 23, 2020 · Artificial Intelligence

Unlocking Powerful Features: A Deep Dive into Tianchi’s Repeat Purchase Prediction

This tutorial walks through the complete feature‑engineering pipeline for the Alibaba Tianchi “Tmall User Repeat Purchase Prediction” competition, covering data acquisition, memory‑efficient preprocessing, multi‑entity feature construction, statistical aggregations, text vectorisation, embedding generation and stacking‑based model features, all illustrated with Python code and diagrams.

Stacking · data preprocessing · feature engineering
0 likes · 16 min read
DataFunTalk
Aug 14, 2020 · Artificial Intelligence

Illustrated Guide to the Complete Machine Learning Workflow

This article presents a hand‑drawn, illustrated walkthrough of the entire machine‑learning pipeline—from dataset definition, exploratory data analysis, preprocessing, and data splitting to model building, algorithm selection, hyper‑parameter tuning, feature selection, and evaluation for both classification and regression tasks.

Model Evaluation · classification · cross-validation
0 likes · 17 min read
Python Programming Learning Circle
Jun 11, 2020 · Artificial Intelligence

Step-by-Step Guide to Building a Movie Recommendation System with TensorFlow

This tutorial walks through collecting and cleaning the MovieLens dataset, constructing rating and record matrices, and normalizing ratings. It then defines a collaborative‑filtering model in TensorFlow, trains it with the Adam optimizer, evaluates its performance, and finally generates personalized movie recommendations for a chosen user.

TensorFlow · collaborative filtering · data preprocessing
0 likes · 10 min read
JD Tech Talk
Jun 4, 2020 · Artificial Intelligence

The Art and Science of Feature Engineering: Importance, Methods, and Automation

Feature engineering, which occupies the majority of data scientists' time, is essential for building high‑performing machine‑learning models and involves careful data quality control, diverse construction techniques, rigorous selection, and emerging automation efforts, all of which demand domain expertise and systematic practice.

AI · data preprocessing · feature engineering
0 likes · 14 min read
JD Tech Talk
May 29, 2020 · Artificial Intelligence

The Black Art of Feature Engineering: Importance, Techniques, and Automation

This article explains why feature engineering consumes most of a data scientist's time, outlines its critical steps—including data observation, cleaning, transformation, selection, and reduction—covers practical issues such as missing‑value handling, data leakage, and feature stability, and discusses both manual and automated approaches for building effective machine‑learning models.

data preprocessing · feature engineering · machine learning
0 likes · 14 min read
Python Programming Learning Circle
May 21, 2020 · Artificial Intelligence

Time Series Forecasting and Anomaly Detection for API Traffic Using Seasonal Decomposition and ARIMA

The article presents a complete workflow for predicting next‑day API request volumes by exploring per‑minute traffic data, handling missing values, applying seasonal decomposition, training an ARIMA model on the trend component, and generating confidence intervals to flag anomalous spikes.

ARIMA · Time Series · anomaly detection
0 likes · 12 min read
Python Programming Learning Circle
Mar 6, 2020 · Artificial Intelligence

Introduction to Machine Learning Concepts: Data, Features, Labels, Training, and Common Algorithms

This article provides a beginner-friendly overview of machine learning fundamentals, covering the definition of data, the distinction between features and labels, types of features, dimensionality, training and test datasets, normalization, supervised and unsupervised learning methods, algorithm selection, development workflow, and recommended Python libraries such as NumPy.

Unsupervised Learning · data preprocessing · features
0 likes · 12 min read
360 Quality & Efficiency
Jan 17, 2020 · Artificial Intelligence

File Release Application Prediction Model Using GBDT

This article describes how a GBDT‑based model was built to predict the parameters of a file release application, such as volume ratio, target audience and gray‑release (canary) level. It covers data collection, feature engineering, model training, service deployment, and practical considerations for handling bad cases.

GBDT · data preprocessing · file release
0 likes · 8 min read
Tencent Cloud Developer
Dec 3, 2019 · Artificial Intelligence

Feature Engineering Practices for Short‑Video Recommendation Systems

Effective short‑video recommendation relies on meticulous feature engineering that transforms raw signals—numerical counts, categorical IDs, content and user embeddings, context and session data—through bucketization, scaling, crossing, and smoothing, then selects and evaluates them via filtering, wrapping, regularization, and importance analysis to mitigate business biases and improve multi‑objective ranking performance.

Embedding · bias mitigation · data preprocessing
0 likes · 32 min read
58 Tech
Nov 11, 2019 · Artificial Intelligence

Design and Implementation of the 58 Car Price Estimation System Using Machine Learning

The article describes the end‑to‑end architecture, data collection, preprocessing, feature engineering, model selection, training, and hyper‑parameter tuning of 58’s car price estimation platform, which leverages Spark, XGBoost, LightGBM and custom business rules to predict vehicle resale values.

LightGBM · XGBoost · car price estimation
0 likes · 11 min read
MaGe Linux Operations
Mar 1, 2019 · Artificial Intelligence

Master Python Data Mining & Machine Learning: From Preprocessing to Classification

This comprehensive guide introduces data mining and machine learning concepts, walks through Python data preprocessing techniques, reviews common classification algorithms, demonstrates an Iris flower classification case, and offers practical tips for selecting the most suitable algorithm for a given problem.

Classification Algorithms · Python · data mining
0 likes · 21 min read
Qunar Tech Salon
Sep 19, 2018 · Artificial Intelligence

Logistic Regression Tutorial with scikit-learn

This article introduces logistic regression, explains its theoretical basis, details key scikit-learn parameters, and provides a complete Python example for breast cancer classification, covering data preprocessing, model training, prediction, and evaluation with classification reports.

Python · classification · data preprocessing
0 likes · 7 min read
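Here is a minimal sketch of that workflow using scikit-learn's bundled breast cancer dataset, which needs no download. The split ratio and the values of `C`, `max_iter` and `random_state` are illustrative choices, not the article's. Placing the scaler in a pipeline ensures its statistics are fitted only on the training split.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)  # target 0 = malignant, 1 = benign
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Standardizing the features also helps the default lbfgs solver converge.
clf = make_pipeline(StandardScaler(), LogisticRegression(C=1.0, max_iter=1000))
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
print(classification_report(y_test, y_pred, target_names=["malignant", "benign"]))
```

`C` is the inverse regularization strength. Smaller values apply stronger L2 regularization, which is the default penalty.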
Architecture Digest
Jul 12, 2018 · Artificial Intelligence

How to Choose the Right Machine Learning Algorithm

This article explains that there is no universal solution for selecting machine learning algorithms and outlines practical factors—such as data characteristics, problem type, business constraints, and algorithm complexity—to help practitioners systematically narrow down and pick the most suitable models.

Model Evaluation · algorithm selection · data preprocessing
0 likes · 14 min read
Tencent Advertising Technology
Jun 12, 2018 · Artificial Intelligence

Insights on Data Preprocessing, Modeling, and Mindset from a Tencent Advertising Algorithm Competition Participant

A participant from Harbin Institute of Technology shares practical data‑preprocessing tricks, model choices, useful feature ideas, and a resilient mindset gained while competing in the Tencent Advertising Algorithm Contest, offering tips that can help other data scientists handle large‑scale ad data.

Mindset · competition · data preprocessing
0 likes · 5 min read
MaGe Linux Operations
Apr 8, 2018 · Artificial Intelligence

Master Python Data Mining & Machine Learning: From Preprocessing to Classification

This comprehensive tutorial walks you through Python data mining and machine learning fundamentals, covering data preprocessing techniques, common classification algorithms, an Iris flower classification case study, and practical tips for selecting the right algorithm, all illustrated with clear code examples and visualizations.

Classification Algorithms · Naive Bayes · Python
0 likes · 22 min read
Baobao Algorithm Notes
Mar 25, 2018 · Artificial Intelligence

How to Crush the Kaggle Toxic Comment Challenge: Data Prep, Models, and Ensemble Secrets

This article breaks down the Kaggle toxic comment classification competition, detailing thorough data cleaning, advanced word‑vector techniques, pseudo‑labeling, BPE tokenization, diverse neural models and ensemble strategies, and shares practical insights and pitfalls from the author's nine‑month competition journey.

BPE · Kaggle · NLP
0 likes · 9 min read
Architecture Digest
Feb 14, 2018 · Artificial Intelligence

Comparative Analysis and Optimization of Machine Learning Models on the UCI Census Income Dataset

This article walks through a complete machine‑learning workflow on the UCI Census Income dataset, covering data exploration, preprocessing (including log‑transformation and scaling), model training with Naïve Bayes, Decision Tree and SVM, performance evaluation, hyper‑parameter tuning via grid search, feature importance analysis, and feature selection, providing code snippets and visualizations.

Model Evaluation · Python · data preprocessing
0 likes · 24 min read
Architects' Tech Alliance
Dec 3, 2016 · Fundamentals

Effective Data Cleaning Practices and Tips

This article provides practical guidance on data cleaning, covering the importance of data wrangling, using assertions, handling incomplete records, checkpointing, testing on subsets, logging, optional raw data storage, and validating the cleaned dataset to ensure reliable downstream analysis.

Checkpoint · assertions · data cleaning
0 likes · 7 min read
Qunar Tech Salon
Aug 14, 2015 · Big Data

The Nine Laws of Data Mining: Principles, Processes, and Insights

This article presents nine fundamental laws of data mining—covering goals, knowledge, preparation, experimentation, patterns, insight, prediction, value, and change—explaining how business objectives and domain expertise drive each stage of the CRISP‑DM process and why technical metrics alone cannot guarantee success.

CRISP-DM · Predictive Modeling · business knowledge
0 likes · 19 min read