Tag: data cleaning

php中文网 Courses
May 7, 2025 · Fundamentals

Comprehensive Guide to Pandas Data Processing in Python

This tutorial provides a detailed overview of Pandas, covering its core data structures, data import/export, selection, cleaning, aggregation, merging, and a practical sales analysis example, with complete code snippets for each operation.

Python · data aggregation · data analysis
8 min read

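The summary above mentions merging and aggregation. Here is a minimal sketch of both. It uses two small hypothetical tables, `orders` and `customers`, not the tutorial's own data:

```python
import pandas as pd

# Hypothetical tables; column names are illustrative only.
orders = pd.DataFrame({"customer_id": [1, 1, 2, 3], "amount": [50, 30, 20, 40]})
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["North", "South"]})

# Left join keeps every order, even when the customer is unknown (id 3).
merged = orders.merge(customers, on="customer_id", how="left")

# Select rows with a known region, then aggregate total amount per region.
totals = merged[merged["region"].notna()].groupby("region")["amount"].sum()
print(totals)
```

`how="left"` keeps orders whose customer is missing from the lookup table and marks the region as NaN. `how="inner"` would silently drop those rows, so a left join makes the gap visible during cleaning.
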
Sohu Tech Products
Apr 16, 2025 · Artificial Intelligence

Comprehensive Guide to Building AI Datasets: From Source Collection to Data Augmentation and Validation

This guide walks readers through every stage of building high‑quality AI training datasets—from locating open‑source data and defining goals, through collection, annotation, cleaning, large‑scale processing, optional augmentation, and splitting, to validation—using a medical QA example for fine‑tuning DeepSeek‑R1.

AI fine‑tuning · Python · data augmentation
18 min read

DaTaobao Tech
Mar 31, 2025 · Artificial Intelligence

AI Audio Generation and Voice Synthesis Practices at Taobao

The article surveys Taobao’s AI‑generated audio pipeline, detailing eight technical papers on image‑to‑video, OpenAI o1, multimodal video, and large‑model voice synthesis, while highlighting advances like VALL‑E, CosyVoice, F5‑TTS, data‑cleaning methods, and e‑commerce applications such as voice‑cloned live streams, multilingual TTS, AI video‑audio integration, and audiobook production.

AI audio · TTS · data cleaning
11 min read

Python Programming Learning Circle
Mar 6, 2025 · Fundamentals

CSV Trimming: A Python Package for Cleaning Messy CSV Files

CSV Trimming is a lightweight Python library that transforms irregular, poorly formatted CSV files into clean, well‑structured tables with a single line of code, supporting basic trimming as well as advanced row‑correlation handling for complex datasets.

CSV · Data Processing · Python
5 min read

Python Programming Learning Circle
Dec 25, 2024 · Fundamentals

Using pandas‑profiling for Fast Exploratory Data Analysis in Python

This article introduces pandas‑profiling, a Python library that automates exploratory data analysis. It compares the library with R's skimr and with pandas' built‑in DataFrame.describe(), shows quick installation and usage examples, and explains how to customize reports either in code or through a YAML configuration file. The tool is best suited to small and medium datasets.

EDA · data analysis · data cleaning
8 min read

Test Development Learning Exchange
Nov 19, 2024 · Fundamentals

Sales Data Analysis Project: Reading, Cleaning, Aggregating, and Visualizing with Python

This tutorial guides you through a comprehensive sales data project that covers reading a CSV file, cleaning missing and duplicate entries, grouping by department to compute average sales, and visualizing the results with line and bar charts using pandas and matplotlib.

CSV · Matplotlib · Python
5 min read

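Below is a minimal sketch of the read, clean, and group steps this project describes. It assumes a tiny in‑memory CSV with hypothetical `department` and `sales` columns in place of the tutorial's file:

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the tutorial's CSV file.
RAW_CSV = """department,sales
A,100
A,100
A,
B,200
B,300
"""

df = pd.read_csv(io.StringIO(RAW_CSV))

# Clean: drop rows with a missing sales value, then exact duplicate rows.
clean = df.dropna(subset=["sales"]).drop_duplicates()

# Aggregate: average sales per department.
avg_sales = clean.groupby("department")["sales"].mean()
print(avg_sales)
```

The resulting Series can go straight to plotting. For example, `avg_sales.plot(kind="bar")` draws a bar chart when matplotlib is installed.
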
Test Development Learning Exchange
Nov 17, 2024 · Fundamentals

Basic Data Cleaning Techniques with Pandas

This tutorial teaches fundamental data cleaning with Pandas, covering how to handle missing values, rename columns, and remove duplicate rows through clear explanations and complete code examples.

data cleaning · duplicate rows · missing values
6 min read

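The three techniques named above are missing values, column renaming, and duplicate removal. Here is a short sketch combining them on a hypothetical messy table. Filling ages with the median is an illustrative choice, not necessarily the tutorial's:

```python
import numpy as np
import pandas as pd

# Hypothetical messy table with untidy headers, gaps, and a duplicate.
df = pd.DataFrame({
    "Name ": ["Ann", "Bob", "Bob", None],
    " Age": [28, np.nan, np.nan, 35],
})

# 1. Rename columns: strip stray whitespace and lower-case the header.
df = df.rename(columns=lambda c: c.strip().lower())

# 2. Handle missing values: fill ages with the column median,
#    then drop rows that have no name at all.
df["age"] = df["age"].fillna(df["age"].median())
df = df.dropna(subset=["name"])

# 3. Remove duplicate rows, keeping the first occurrence.
df = df.drop_duplicates(keep="first").reset_index(drop=True)
print(df)
```
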
Test Development Learning Exchange
Nov 10, 2024 · Fundamentals

20 Essential Pandas Data Processing Methods with Code Examples

This article provides a comprehensive overview of 20 essential Pandas data processing methods with detailed code examples covering statistics, data cleaning, transformation, filtering, merging, grouping, sorting, reshaping, aggregation, window functions, time series analysis, conditional selection, indexing, slicing, visualization, type conversion, data filling, filtering, renaming, and import/export operations.

Data Processing · Data Transformation · Python
16 min read

Test Development Learning Exchange
Oct 28, 2024 · Big Data

Data Preprocessing with Pandas: A Comprehensive Guide

This article provides a comprehensive guide to data preprocessing using Pandas, covering essential steps like data cleaning, feature engineering, and data transformation for machine learning projects.

Categorical Encoding · Dataset Splitting · data cleaning
5 min read

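Categorical encoding and dataset splitting, both listed in this guide's tags, can be sketched with pandas alone. The table, the 80/20 ratio, and the seed below are assumptions for illustration:

```python
import pandas as pd

# Hypothetical dataset with one categorical and one numeric feature.
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue",
              "red", "green", "blue", "red", "green"],
    "size": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# One-hot encode the categorical column as 0/1 integer columns.
encoded = pd.get_dummies(df, columns=["color"], dtype=int)

# Shuffle-and-split: 80% train, 20% test, with a fixed seed for reproducibility.
train_set = encoded.sample(frac=0.8, random_state=42)
test_set = encoded.drop(train_set.index)
```

scikit-learn's `train_test_split` does the same job and also supports stratified splits. The pandas-only version avoids the extra dependency.
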
Test Development Learning Exchange
Sep 1, 2024 · Fundamentals

Python Utility Scripts for Data Cleaning, Translation, File Sync, Cloud Backup, and More

This article presents a collection of Python utility scripts that demonstrate how to clean CSV data, translate text files, synchronize folders, upload files to S3, count directory contents, classify files by type, perform OCR on images, convert video to audio, extract images from webpages, and generate text summaries using modern libraries.

AI · Python · cloud storage
6 min read

Test Development Learning Exchange
Jul 14, 2024 · Fundamentals

Using pandas fillna() to Handle Missing Data: 10 Practical Examples

This article introduces pandas' fillna() method and demonstrates ten practical examples—including basic filling, column‑specific values, forward/backward filling, limiting fills, using other DataFrames, functions, conditional fills, dictionaries, and Series—to help developers effectively handle missing data in Python data analysis.

data cleaning · fillna · missing data
6 min read

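Several of the fill strategies listed above fit in a few lines: a constant, per-column values from a dict, a forward fill with a limit, and a backward fill. This sketch uses a hypothetical two-column frame. It calls `ffill()`/`bfill()` directly because pandas 2.1 and later deprecate `fillna(method=...)`. The article itself may still use the older form.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in both columns.
df = pd.DataFrame({
    "price": [10.0, np.nan, np.nan, 13.0],
    "qty": [1.0, np.nan, 3.0, np.nan],
})

# Constant fill for every column.
constant = df.fillna(0)

# Per-column fill values via a dict: mean price, zero quantity.
per_column = df.fillna({"price": df["price"].mean(), "qty": 0})

# Forward fill, propagating into at most one consecutive NaN.
forward_limited = df.ffill(limit=1)

# Backward fill from the next valid value; a trailing NaN stays NaN.
backward = df.bfill()
```
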
Test Development Learning Exchange
Jul 4, 2024 · Fundamentals

Parameterized Excel Data Processing with Python: Techniques and Code Examples

This article explains how to write parameterized, reusable Python functions that read, clean, transform, summarize, visualize, and export Excel data. Its code snippets show how these functions combine into flexible, automated data workflows.

Excel · data cleaning · data-processing
6 min read

Python Programming Learning Circle
Jun 5, 2024 · Big Data

Parallel Processing of Large CSV Files with multiprocessing, joblib, and tqdm in Python

This tutorial demonstrates how to accelerate processing of a multi‑million‑row CSV dataset by dividing the work into parallel tasks using Python's multiprocessing, joblib, and tqdm libraries, comparing serial, multi‑process, batch, and process‑map approaches with detailed timing results.

Parallel Processing · Python · big data
8 min read

Test Development Learning Exchange
May 21, 2024 · Artificial Intelligence

Step-by-Step Data Analysis and Machine Learning Workflow with Pandas, Matplotlib, and Scikit-learn

This guide walks through loading CSV data with pandas, cleaning missing values, filtering, grouping, visualizing, performing correlation and time‑series analysis, detecting outliers, and applying linear and logistic regression models using scikit‑learn, all illustrated with complete Python code snippets.

Python · data cleaning · machine learning
6 min read

Python Programming Learning Circle
May 18, 2024 · Fundamentals

Pandas Data Modification, Iteration, and Function Application Techniques

This article provides a comprehensive guide to using Pandas for data cleaning and transformation, covering value modification, replacement, filling missing data, renaming, column addition, row insertion, merging, deletion, advanced filtering, iteration methods, and applying functions such as pipe, apply, agg, and transform.

Functions · Iteration · data cleaning
9 min read

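The four function-application methods named in this summary differ mainly in the shape of what they return. Here is a small sketch contrasting them on a hypothetical scores table. The helper `add_deviation` is invented for illustration.

```python
import pandas as pd

# Hypothetical scores table.
df = pd.DataFrame({
    "team": ["x", "x", "y"],
    "score": [1, 3, 5],
})

# agg reduces each group to one row of summary statistics.
per_team = df.groupby("team")["score"].agg(["mean", "max"])

# transform returns a result aligned with the original rows,
# so it can be assigned back as a new column.
df["team_mean"] = df.groupby("team")["score"].transform("mean")

# apply on a Series calls the function once per element.
df["score_label"] = df["score"].apply(lambda s: "high" if s >= 3 else "low")

# pipe passes the whole DataFrame to a function that returns a DataFrame,
# which keeps multi-step pipelines readable.
def add_deviation(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: distance of each score from its team mean."""
    return frame.assign(deviation=frame["score"] - frame["team_mean"])

df = df.pipe(add_deviation)
```
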
Test Development Learning Exchange
Apr 30, 2024 · Fundamentals

Using Python's filter() Function: Scenarios and Example Code

The article explains Python’s built‑in filter() function, outlines various practical scenarios such as data cleaning, conditional selection, and preprocessing, and provides ten clear code examples ranging from filtering even numbers to extracting primes and non‑empty dictionaries.

Filter · Functional Programming · Python
6 min read

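Three of the cases this summary mentions are even numbers, primes, and non-empty dictionaries. This sketch covers them with the built-in `filter()`. The trial-division `is_prime` helper is a simple assumption, not the article's code.

```python
def is_prime(n: int) -> bool:
    """Return True if n is prime, using simple trial division."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

numbers = range(1, 21)

# filter() returns a lazy iterator; list() materializes it.
evens = list(filter(lambda n: n % 2 == 0, numbers))
primes = list(filter(is_prime, numbers))

# With None as the predicate, filter() keeps only truthy items,
# which drops empty dicts (and any other falsy values).
records = [{"id": 1}, {}, {"id": 2}, {}]
non_empty = list(filter(None, records))
```
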
Python Programming Learning Circle
Apr 28, 2024 · Fundamentals

Data Cleaning Techniques in Python: 21 Practical Examples and Code

This tutorial explains data cleaning concepts, key quality dimensions, and demonstrates 21 practical Python examples—including regex phone cleaning, temperature conversion, missing‑value detection, visualization with missingno, and record linkage using fuzzy matching—providing clear code snippets and step‑by‑step guidance for reliable data analysis.

Python · data cleaning · missing data
20 min read

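This summary mentions regex phone cleaning and temperature conversion. Here is a minimal sketch of both. The 10-digit rule and the `XXX-XXX-XXXX` output format are assumptions for illustration, not a universal phone-number standard:

```python
import re
from typing import Optional


def clean_phone(raw: str) -> Optional[str]:
    """Strip every non-digit; accept the result only if exactly 10 digits remain."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) != 10:
        return None  # flag for manual review instead of guessing
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def fahrenheit_to_celsius(f: float) -> float:
    """Convert a Fahrenheit reading to Celsius."""
    return (f - 32) * 5 / 9
```

Returning `None` for numbers that fail the rule makes rejected values easy to count later, together with the other missing values.
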
Python Programming Learning Circle
Apr 27, 2024 · Fundamentals

Data Cleaning Techniques in Python: 21 Practical Examples and Code

This article provides a comprehensive guide to data cleaning in Python, covering common data issues, methods for handling missing values, duplicates, categorical inconsistencies, and text normalization, illustrated with 21 detailed code examples using pandas and matplotlib.

Python · data analysis · data cleaning
16 min read

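Categorical inconsistencies and text normalization are two of the issues this article covers. One common approach is to normalize the raw text first and then map known variants to a canonical label. The sketch below assumes a hypothetical city-name mapping:

```python
import unicodedata

# Hypothetical mapping of known spelling variants to one canonical label.
CANONICAL = {
    "ny": "New York",
    "nyc": "New York",
    "new york": "New York",
    "sf": "San Francisco",
    "san francisco": "San Francisco",
}


def normalize_text(value: str) -> str:
    """Unicode-normalize (NFKC), trim, collapse inner whitespace, lower-case."""
    value = unicodedata.normalize("NFKC", value)
    return " ".join(value.split()).lower()


def normalize_category(value: str) -> str:
    """Map a raw label to its canonical form; unknown labels come back cleaned."""
    cleaned = normalize_text(value)
    return CANONICAL.get(cleaned, cleaned)


raw_labels = ["  NYC", "New   York", "ny", "SF ", "Boston"]
cleaned_labels = [normalize_category(v) for v in raw_labels]
```

NFKC also folds full-width characters such as `ＮＹＣ` into plain ASCII before the lookup.
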
Python Programming Learning Circle
Mar 22, 2024 · Fundamentals

Using FuzzyWuzzy for Fuzzy String Matching in Python

This article introduces the FuzzyWuzzy Python library, explains its underlying Levenshtein distance algorithm, demonstrates how to install it, describes the key functions in the fuzz and process modules, and provides practical examples for matching company names and province fields with complete code snippets.

FuzzyWuzzy · Levenshtein · Python
10 min read

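To show the idea behind the Levenshtein distance this article builds on, here is a pure-Python sketch of the standard dynamic-programming algorithm. It adds a 0-100 similarity score and a best-match helper. The scoring formula is an illustrative assumption and is not guaranteed to reproduce FuzzyWuzzy's `fuzz.ratio` or `process.extractOne` numbers.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            insert = current[j - 1] + 1
            delete = previous[j] + 1
            substitute = previous[j - 1] + (ca != cb)
            current.append(min(insert, delete, substitute))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> int:
    """Illustrative 0-100 score: 100 * (1 - distance / longer length)."""
    if not a and not b:
        return 100
    return round(100 * (1 - levenshtein(a, b) / max(len(a), len(b))))


def best_match(query: str, choices: list) -> tuple:
    """Return (choice, score) for the highest-scoring candidate."""
    return max(((c, similarity(query, c)) for c in choices), key=lambda t: t[1])
```

For example, `best_match("apple inc", ["Apple Inc.", "Microsoft", "apple"])` picks `"Apple Inc."`. That is the kind of company-name matching the article describes.
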
Model Perspective
Feb 13, 2024 · Big Data

Mastering Noisy Data: From Cleaning to Visualization and NLP with Python

This article reviews the key concepts from the Bad Data Handbook, covering noise identification, data validation, human readability, web data restructuring, special domain challenges, and data quality analysis, while also presenting practical data visualization techniques, popular analysis tools, Python web‑scraping libraries, and a basic NLP workflow with code examples.

NLP · Python · Web Scraping
20 min read