Tagged articles
128 articles
DataFunSummit
Sep 14, 2025 · Artificial Intelligence

How AI is Revolutionizing Chemistry and Drug Discovery: From Data to Breakthroughs

This article explores how AI-driven models and data pipelines are transforming the chemistry and pharmaceutical sectors by accelerating drug design, improving protein‑antibody predictions, automating patent data extraction, and outlining future goals for end‑to‑end AI‑enabled scientific discovery.

AI for Science · Chemistry AI · Large Language Models
0 likes · 13 min read
Kuaishou Tech
Jul 29, 2025 · Artificial Intelligence

How Kuaishou’s 8 Groundbreaking Papers Are Shaping AI at KDD 2025

Eight Kuaishou research papers covering recommendation systems, multi‑task learning, multimodal large models, large language models, and combinatorial optimization have been accepted to KDD 2025, a premier data‑mining conference, highlighting the company’s innovations and their potential impact on the field.

AI · Multimodal Learning · Recommendation Systems
0 likes · 18 min read
AI Code to Success
Mar 12, 2025 · Artificial Intelligence

Mastering K‑Means: Theory, Implementation, and Real‑World Applications

This comprehensive guide explores the K‑Means clustering algorithm, covering its mathematical foundation, step‑by‑step procedure, centroid initialization strategies, practical implementation with Python’s Scikit‑learn on the Iris dataset, evaluation metrics, optimization techniques, and diverse applications ranging from image segmentation to bioinformatics.

K-Means · Python · algorithm
0 likes · 31 min read
Python Programming Learning Circle
Jan 2, 2025 · Artificial Intelligence

A Comprehensive Guide to Dimensionality Reduction Algorithms with Python Implementations

This article introduces eleven classic dimensionality reduction techniques—including PCA, LDA, MDS, LLE, and t‑SNE—explains their principles, advantages, and limitations, and provides complete Python code examples and resources for each method, making it a valuable guide for beginners in machine learning and data mining.

PCA · data mining · dimensionality reduction
0 likes · 17 min read
JavaEdge
Oct 7, 2024 · Big Data

Master Data Analysis: From Collection to Visualization

This guide explains why data analysis is essential, breaks it into three core stages—data collection, data mining, and data visualization—offers practical tool recommendations, and presents principles for efficient learning and skill development.

Big Data · Data visualization · Python
0 likes · 10 min read
Software Development Quality
Oct 7, 2024 · Fundamentals

8 Essential Data Analysis Techniques Every Analyst Should Master

This article introduces eight core data analysis methods—including association, comparative, clustering, cross, Pareto, quadrant, funnel, and full‑path analysis—explaining their principles, typical use cases, key metrics, and visual examples to help professionals make data‑driven decisions.

data mining · statistical methods
0 likes · 11 min read
AntTech
Sep 5, 2024 · Artificial Intelligence

Ant InTech Technology Award Announces First Ten Young Scholars and Their Research Areas

On September 5 at the 2024 Inclusion·Bund Conference, Ant InTech announced its first ten award-winning young scholars from top Chinese universities, highlighting their research in artificial intelligence, data processing, cloud computing, security, and related fields, each receiving a 200,000‑RMB grant.

Ant Group · Artificial Intelligence · InTech Award
0 likes · 4 min read
AntTech
Aug 28, 2024 · Artificial Intelligence

Ant Group’s Selected Papers at KDD2024: Abstracts and Highlights

The article presents a curated collection of Ant Group's research papers accepted at KDD2024, summarizing each paper's title, type, link, source, relevant fields, and abstract, covering topics such as graph mining, large language models, fraud detection, recommendation systems, and multimodal medical AI.

AI research · Ant Group · KDD2024
0 likes · 31 min read
Meituan Technology Team
Aug 8, 2024 · Artificial Intelligence

BlackPearl Team Wins All Three Tracks of KDD 2024 OAG‑Challenge Cup with Large‑Model Solutions

The BlackPearl team from Meituan’s Dazhong Dianping division swept all three KDD 2024 OAG‑Challenge Cup tracks (WhoIsWho, PST, and AQA) by deploying large‑model techniques such as iterative text clustering, graft‑learning‑enhanced BERT RAG pipelines, and a Boosting LLM‑for‑Vector search. The team has released the code publicly on GitHub.

Academic Disambiguation · KDD Cup · Paper Retrieval
0 likes · 4 min read
Architect
Jul 19, 2024 · Artificial Intelligence

Can Machine Learning Beat the Odds? A Deep Dive into Football Match Prediction

This article presents a data‑driven football match prediction system that extracts match features, builds machine‑learning models—including linear, SVM, random forest, and deep neural networks—and evaluates their accuracy on European league data, then analyzes betting strategies, limitations, and extensions to stock forecasting.

Artificial Intelligence · Model Evaluation · data mining
0 likes · 24 min read
Tencent Cloud Developer
Jul 4, 2024 · Artificial Intelligence

Football Match Outcome Prediction and Betting Strategy Using Machine Learning

The study combines team statistics and bookmaker odds with machine‑learning models, including Poisson, regression, Bayesian, SVM, Random Forest, DNN, and LSTM, to predict football match outcomes and identify confidence‑based betting intervals that yield profit. It also suggests extensions to broader data, additional features, and financial trading.

Random Forest · data mining · football prediction
0 likes · 23 min read
Python Programming Learning Circle
Jun 21, 2024 · Artificial Intelligence

Using scikit-learn for Data Mining: Feature Engineering, Parallel Processing, Pipelines, and Model Persistence

This article demonstrates how to perform data mining with scikit-learn by detailing the full workflow—from data acquisition and feature engineering, through parallel and pipeline processing, to automated hyper‑parameter tuning and model persistence—using the Iris dataset as an example.

Pipeline · data mining · feature engineering
0 likes · 13 min read
DataFunSummit
Jun 2, 2024 · Artificial Intelligence

Construction and Application of a User Profile Tag System: Methods, Platforms, and Use Cases

This article presents a comprehensive overview of building a user profile tag system—including tag taxonomy, platform architecture, construction methods, update cycles, access patterns, common algorithmic tags, and real‑world applications such as marketing, metric attribution, and A/B testing—illustrated with examples and a detailed Q&A session from a data‑mining senior manager at Qunar.

AB testing · causal inference · data mining
0 likes · 21 min read
Model Perspective
May 13, 2024 · Fundamentals

How to Identify and Quantify Core Variables for Better Decision‑Making

The article explains why pinpointing core variables is crucial, outlines domain‑knowledge and technical methods such as sensitivity analysis and data mining to discover them, and describes practical ways to turn those variables into quantitative indicators like scoring systems, composite indices, and real‑world examples.

Metrics · core variables · data mining
0 likes · 10 min read
DataFunTalk
Mar 6, 2024 · Artificial Intelligence

Construction and Practical Application of a User Profile Tagging System

This article details the design, integration, and operational practices of a comprehensive user and item profiling tag system, covering tag taxonomy, construction methods, update cycles, access strategies, algorithmic implementations, and real‑world applications such as marketing, attribution analysis, and A/B testing.

AB testing · Tagging System · data mining
0 likes · 20 min read
Test Development Learning Exchange
Jan 26, 2024 · Artificial Intelligence

Data Mining Techniques for Marketing: Customer Segmentation, Purchase Prediction, Recommendation, and More with Python

This article introduces ten data‑mining applications for marketing—including customer segmentation, purchase forecasting, market‑basket analysis, churn prediction, sentiment analysis, response modeling, recommendation systems, brand reputation, competitive analysis, and public‑opinion monitoring—each illustrated with concise Python code examples.

Customer Segmentation · Prediction · Python
0 likes · 11 min read
Test Development Learning Exchange
Jan 7, 2024 · Big Data

Association Rule Mining Applications Across Various Business Scenarios with Python Code

This article demonstrates how to apply the Apriori algorithm and association rule mining using Python's mlxtend library across ten real‑world business scenarios, providing step‑by‑step code examples for retail, e‑commerce, marketing, healthcare, security, CRM, social networks, travel, market basket, and online advertising.

Apriori · Python · association rule mining
0 likes · 9 min read
DataFunSummit
Dec 20, 2023 · Artificial Intelligence

Building and Applying an Image Tagging System: Architecture, Tag Design, Algorithms, and Business Use Cases

This presentation by senior data mining manager Zhou Yuanwei of Qunar outlines the architecture of an image tagging platform, the construction of a comprehensive tagging system, common algorithmic tags, and real-world applications such as look‑alike marketing, A/B test efficiency analysis, and business attribution, helping audiences understand tag types, design considerations, and value‑driven use cases.

AB testing · Business Analytics · data mining
0 likes · 2 min read
Meituan Technology Team
Aug 10, 2023 · Artificial Intelligence

Selected Meituan Technical Papers from KDD 2023: Summaries of Seven Research Works

The article showcases seven Meituan research papers accepted at KDD 2023—spanning feed‑stream, cross‑domain, takeaway, bonus allocation, contour‑based segmentation, living‑needs prediction, and multilingual recommendation—detailing their novel methods, real‑world deployments, and concluding with an invitation for academic collaboration.

Artificial Intelligence · KDD 2023 · Meituan
0 likes · 17 min read
DataFunTalk
Jul 24, 2023 · Artificial Intelligence

Session Analytics: User Path Analysis, Data Processing, and Algorithm Mining

This article introduces user path analysis and the SessionAnalytics open‑source framework, covering business scenarios, technical architecture, data integration, session segmentation, data cleaning, sampling, graph structures, NLP‑based mining, clustering, and visualization techniques for extracting insights from large‑scale user behavior data.

NLP · data mining · session analytics
0 likes · 19 min read
Python Crawling & Data Mining
May 8, 2023 · Artificial Intelligence

How to Choose the Right Features for Python Machine Learning Projects

This article explains Python machine‑learning basics, covering data splitting, feature and label concepts, key factors for feature selection, and practical tips for building predictive models, while also offering code snippets and visual illustrations to help readers apply these techniques effectively.

AI · data mining · feature selection
0 likes · 6 min read
Big Data Technology & Architecture
Nov 28, 2022 · Big Data

Comprehensive Guide to Big Data Interview Topics: Log Collection, Data Synchronization, Offline Development, Real‑time Technology, Data Services, and Data Mining

This article provides an extensive overview of big‑data interview subjects, covering browser and mobile log collection methods, data synchronization techniques (batch, real‑time, sharding), offline data development platforms, streaming architectures, data service evolution, performance optimization, and data‑mining layers and applications.

Big Data · Streaming · data mining
0 likes · 17 min read
Baidu Intelligent Testing
Oct 19, 2022 · Artificial Intelligence

Intelligent Test Evaluation: Risk Dimension Mining, Admission Assessment, Multi‑Dimensional Activity Data Mining, and Model‑Based Risk Evaluation

This article presents an end‑to‑end intelligent testing framework that mines development‑stage risk dimensions, conducts admission risk assessment, extracts multi‑dimensional activity data such as coverage metrics, and applies model‑based risk evaluation to guide quality‑assurance decisions and improve release safety.

Artificial Intelligence · Modeling · Software Testing
0 likes · 11 min read
Baidu Geek Talk
Oct 18, 2022 · Artificial Intelligence

Intelligent Test Evaluation and Risk Assessment in Software Quality Assurance

The article describes an intelligent test‑evaluation framework that gathers performance data, quantifies project, personnel, and code risk dimensions, feeds them into rule‑based and logistic‑regression models to produce risk scores and risk‑driven testing plans, and demonstrates how this approach identified thousands of high‑risk projects, prevented hundreds of bugs, and saved thousands of person‑days.

Software Testing · data mining · risk assessment
0 likes · 9 min read
MaGe Linux Operations
Sep 8, 2022 · Artificial Intelligence

Master 10 Popular Clustering Algorithms in Python with Scikit‑Learn

This tutorial introduces unsupervised clustering, explains its purpose, and walks through installing scikit‑learn and implementing ten popular clustering algorithms—including AffinityPropagation, Agglomerative, BIRCH, DBSCAN, K‑Means, Mini‑Batch K‑Means, MeanShift, OPTICS, Spectral Clustering, and Gaussian Mixture—complete with code examples and visualizations.

Unsupervised Learning · clustering · data mining
0 likes · 27 min read
Model Perspective
Sep 4, 2022 · Fundamentals

Grey Relational Analysis: A Powerful Tool for Comprehensive Evaluation

The article explains the principles of grey system theory, introduces grey relational analysis as a method for handling sparse information, outlines its mathematical foundations, step‑by‑step modeling process, and demonstrates how the grey comprehensive evaluation method can rank and compare multiple alternatives without requiring large sample sizes or strict statistical assumptions.

Comprehensive Evaluation · Methodology · data mining
0 likes · 14 min read
MaGe Linux Operations
Jul 29, 2022 · Artificial Intelligence

Master 10 Popular Clustering Algorithms in Python with Scikit‑Learn

This tutorial introduces clustering, explains why no single algorithm fits all data, and provides step‑by‑step Python examples using scikit‑learn for ten popular unsupervised learning methods, complete with code snippets and visualizations to illustrate results.

Python · Unsupervised Learning · clustering
0 likes · 24 min read
Model Perspective
Jun 17, 2022 · Artificial Intelligence

What Is Classification in Data Mining? Types, Models, and Key Applications

The article explains classification as a data‑analysis task that builds models to assign new observations to predefined categories, outlines its implementation steps, describes various data types (boolean, nominal, ordinal, continuous, discrete), presents common machine‑learning classifiers such as decision trees and neural networks, and highlights practical applications like crime detection, disease risk prediction, and credit assessment.

Model Evaluation · classification · data mining
0 likes · 5 min read
Python Programming Learning Circle
Apr 14, 2022 · Artificial Intelligence

Top Clustering Algorithms in Python with scikit-learn: A Comprehensive Tutorial

This tutorial explains clustering as an unsupervised learning task, outlines why no single algorithm fits all data, and provides step‑by‑step Python code using scikit‑learn to install the library, generate synthetic datasets, and apply ten popular clustering algorithms with visualizations.

Python · Unsupervised Learning · clustering
0 likes · 21 min read
DataFunTalk
Mar 18, 2022 · Artificial Intelligence

Alternative Data Mining: From 19th‑Century Cholera Mapping to Modern AI‑Driven Risk Modeling

This talk reviews the concept of alternative data, illustrates its early use in John Snow's cholera map, explores contemporary AI‑powered systems such as IBM's Debater and satellite‑based poverty estimation, and presents the speaker's own research on using unconventional data for financial‑market risk detection and prediction.

Artificial Intelligence · Risk Modeling · Satellite Imagery
0 likes · 14 min read
NetEase LeiHuo UX Big Data Technology
Feb 26, 2022 · Fundamentals

Applying DMAIC and the Five‑Layer UX Model to Data Product Design

The article explains how the DMAIC framework from Six Sigma and the five‑layer user‑experience model can be combined to guide the definition, measurement, analysis, improvement, and control of data products, especially in gaming contexts, emphasizing systematic design, visualization, and iterative refinement.

Data Product Design · Six Sigma · data mining
0 likes · 8 min read
Meituan Technology Team
Jan 6, 2022 · Artificial Intelligence

Multi-domain Modeling and AutoML Techniques from Kaggle/KDD Cup Championships

Drawing on seven Kaggle and KDD Cup victories, the article outlines a multi‑domain modeling optimization strategy—covering recommendation, time‑series, and AutoML problems—alongside a three‑module AutoML pipeline and a three‑stage workflow that emphasize systematic evaluation, bias‑variance balance, and robust model‑fusion for competition and industry success.

AutoML · KDD Cup · Kaggle
0 likes · 37 min read
YunZhu Net Technology Team
Dec 17, 2021 · Artificial Intelligence

Understanding Recommendation Systems for B2B Construction E‑Commerce

This article explains why recommendation systems are essential for B2B construction e‑commerce, describes the types of data they rely on, outlines multi‑channel recall methods, details collaborative‑filtering algorithms with similarity calculations, and presents the four‑stage recommendation pipeline from recall to re‑ranking.

Artificial Intelligence · B2B e-commerce · collaborative filtering
0 likes · 11 min read
DataFunSummit
Sep 29, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

This article describes how Meituan builds a multi‑level, multi‑dimensional retail product knowledge graph to support new‑retail scenarios, detailing its architecture, data acquisition challenges, labeling pipelines, attribute extraction methods, efficiency improvements, human‑machine collaboration, and downstream search and recommendation applications.

AI · Meituan · Retail
0 likes · 25 min read
IT Architects Alliance
Sep 25, 2021 · Big Data

Top 10 Classic Data Mining Algorithms and Their Core Characteristics

This article introduces the ten classic data‑mining algorithms selected by IEEE ICDM—C4.5, k‑Means, SVM, Apriori, EM, PageRank, AdaBoost, k‑NN, Naive Bayes, and CART—explaining their main ideas, advantages, and typical applications for readers seeking a solid foundation in data analysis.

Algorithms · classification · clustering
0 likes · 8 min read
Meituan Technology Team
Sep 2, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

The paper describes Meituan’s retail product knowledge graph—a multi‑layered, multi‑modal system that structures billions of SKUs, attributes, and user insights using hierarchical categories, graph‑enhanced NER, semi‑supervised learning, and expert‑in‑the‑loop validation, enabling precise search, ranking, recommendation, and real‑time merchant optimization.

AI · Multimodal · Retail
0 likes · 25 min read
Python Crawling & Data Mining
Aug 1, 2021 · Artificial Intelligence

How to Unlock Restaurant Success with Data Mining: A Step‑by‑Step Guide

This article explains the complete data‑mining workflow for the restaurant industry—from defining business goals and sampling relevant data to exploring, preprocessing, modeling, evaluating results, and selecting suitable tools—enabling intelligent dish recommendation, customer segmentation, sales forecasting, and optimal store placement.

Business Intelligence · data mining · restaurant industry
0 likes · 13 min read
DataFunTalk
Jul 31, 2021 · Artificial Intelligence

Construction and Application of Retail Product Knowledge Graph at Meituan

This article details Meituan's development of a multi‑level, multi‑dimensional retail product knowledge graph, covering its background in new retail, hierarchical design, attribute modeling, challenges, efficiency improvements, human‑machine collaboration, and its impact on search, recommendation and both C‑ and B‑side services.

Artificial Intelligence · Meituan · Retail
0 likes · 25 min read
Python Crawling & Data Mining
Jun 14, 2021 · Big Data

Why Stanford’s Data Mining Tutorial Is the Ultimate Guide to Large‑Scale Data Mining

This article introduces the third edition of Stanford’s Data Mining Tutorial, highlighting its panoramic roadmap of data‑mining techniques for massive datasets, core features, comprehensive topic coverage, target audience, and supplementary resources while noting its popularity among students and professionals.

Algorithms · Stanford · data mining
0 likes · 11 min read
iQIYI Technical Product Team
May 21, 2021 · Big Data

Design and Implementation of iQIYI's User Feedback Analysis System

iQIYI built an in‑house user‑feedback analysis system that automatically ingests multi‑channel data, classifies and clusters issues, assesses feedback quality, localizes problems, and streamlines repair closure. Across business lines, it improved recall accuracy, alarm precision, and closure rates while shortening cycle time, which enhanced the user experience.

AI · Big Data · classification
0 likes · 15 min read
Python Crawling & Data Mining
Dec 31, 2020 · Backend Development

How to Scrape Thousands of New‑House Listings in Python: A Step‑by‑Step Guide

This article demonstrates how to use Python's requests, fake_useragent, and lxml libraries to batch‑scrape nearly a thousand new‑house listings from the 惠民之家 website, extracting 41 fields such as name, price, layout, opening date, plot ratio and green ratio, while handling pagination and anti‑scraping measures.

CSV · Python · Real Estate Data
0 likes · 9 min read
Zhengtong Technical Team
Oct 27, 2020 · Mobile Development

Implementing Mobile Data Collection and Analytics with Countly: Architecture, Customization, and Insights

This article outlines how to design and implement a comprehensive mobile data collection and analysis system using the open‑source Countly platform, covering background requirements, solution selection, architecture, customizations for client, server and dashboard, SDK integration for Android and H5, and practical data mining insights.

Android SDK · Countly · H5 SDK
0 likes · 11 min read
Meituan Technology Team
Sep 24, 2020 · Artificial Intelligence

Multimodal Recall Solution for KDD Cup 2020: ImageBERT and LXMERT Based Approach

The second‑place team tackled KDD Cup 2020’s Multimodal Recall challenge by fine‑tuning ImageBERT and LXMERT on query‑image pairs, generating negatives, applying AMSoftmax and multi‑similarity losses, ensembling weighted predictions, and using score‑based post‑processing, boosting NDCG@5 to 0.8352 and powering Meituan’s multimodal search pipeline.

ImageBERT · KDD Cup 2020 · LXMERT
0 likes · 23 min read
Xianyu Technology
Sep 10, 2020 · Artificial Intelligence

Interest Tagging System for Xianyu: Data‑Driven User Profiling

The Xianyu interest‑tagging system profiles post‑95 users by matching expert and hot‑search keywords to product text, weighting user actions with a TF‑IDF‑based behavior‑statistics pipeline, producing over twenty tags that cover more than half the target cohort and have already doubled click‑through rates for interest‑aligned live streams.

TF-IDF · behavior analytics · data mining
0 likes · 11 min read
Python Crawling & Data Mining
Aug 10, 2020 · Artificial Intelligence

Build Smart Product Recommendations with Python’s Apriori Algorithm

This article explains how intelligent recommendation differs from generic marketing, introduces association‑rule concepts such as support, confidence, and lift, and provides a step‑by‑step Python implementation using the Apriori algorithm to generate and interpret market‑basket recommendations.

Apriori algorithm · Market Basket Analysis · Python
0 likes · 13 min read
MaGe Linux Operations
Aug 5, 2020 · Big Data

8 Must‑Know Python Tools for Data Mining and Analysis

This article introduces eight essential Python libraries—Gensim, TensorFlow, SciPy, NumPy, Matplotlib, Pandas, Scikit‑Learn, and Keras—that empower developers to clean, prepare, merge, and accurately analyze data for effective data mining.

Python · data analysis · data mining
0 likes · 4 min read
Architects Research Society
Jul 10, 2020 · Artificial Intelligence

Core Concepts and Relationships in Data Science: Big Data, Machine Learning, Data Mining, Deep Learning, and AI

This article examines six core data‑science concepts—Big Data, Machine Learning, Data Mining, Deep Learning, Artificial Intelligence, and Data Science itself—explaining their definitions, interrelationships, and how they fit together as pieces of a larger analytical puzzle.

Artificial Intelligence · Data Science · Deep Learning
0 likes · 17 min read
Cloud Native Technology Community
Jun 5, 2020 · Artificial Intelligence

Automating a Data‑Science Workflow on Kubernetes: From GitHub Issue Mining to an MLP Bug Classifier

This article describes how to collect, clean, and analyze 90,000+ GitHub issues and pull requests from the Kubernetes repository using Kubeflow, TensorFlow, and a fully automated CI/CD pipeline, then build, train, and serve a simple MLP model that classifies release‑note texts as bugs or non‑bugs.

Kubeflow · Kubernetes · TensorFlow
0 likes · 19 min read
37 Interactive Technology Team
Feb 20, 2020 · Artificial Intelligence

Risk Control System for Detecting Game Account Fraud Using Feature Engineering and Graph Database

The article describes a risk‑control pipeline for detecting high‑volume fraudulent game accounts, detailing data collection from game logs, extensive feature engineering and statistical tests, enrichment via a Neo4j knowledge graph, and a hybrid RandomForest‑GBDT model combined with methods to filter personal accounts.

Graph Database · Neo4j · data mining
0 likes · 8 min read
Python Programming Learning Circle
Feb 19, 2020 · Backend Development

How to Earn Extra Income with Python: Freelance Crawling, Web Development, Data Services, and More

This article outlines practical ways for individuals, especially students, to generate side income using Python by taking on web‑scraping freelance projects, building data‑driven websites, creating simple automation tools, running blogs or media channels, and even modest stock‑analysis scripts.

Python · data mining · freelance
0 likes · 7 min read
Xianyu Technology
Nov 7, 2019 · Big Data

Sequence Pattern Mining for User Behavior Analysis in Xianyu

By applying sequence pattern mining and unsupervised clustering to Xianyu’s massive event logs, the study abstracts high‑level user behaviors, discovers frequent subsequences, uncovers unknown fraudulent account patterns, expands known fraud cohorts with 99% precision, and enables richer analyses such as PCA‑based cross‑group comparisons.

Big Data · clustering · data mining
0 likes · 8 min read
DataFunTalk
Aug 22, 2019 · Artificial Intelligence

End‑to‑End Group Risk Perception Modeling: From Requirement Mining to Deployment

This article presents a comprehensive workflow for group risk perception, covering business requirement mining, data acquisition and understanding, feature engineering, model training and evaluation, deployment, and practical user applications, with detailed objectives, methods, and deliverables for each stage.

Model Deployment · Risk Modeling · data mining
0 likes · 11 min read
JD Retail Technology
Apr 12, 2019 · R&D Management

Balancing Business Demands and Technical Advancement: Insights from JD’s Data Knowledge Leader Li Wei

In an interview, JD data platform leader Li Wei discusses how a dynamic balance between business demands and technical improvement, together with knowledge computing and AI‑driven product quality control, can drive innovation, enhance user experience, and shape future R&D management strategies.

Artificial Intelligence · Product Development · R&D management
0 likes · 8 min read
JD Tech Talk
Mar 22, 2019 · Artificial Intelligence

Data Mining Techniques for Telemarketing: Supervised Classification, Clustering, Optimization, Anomaly Detection, and Text Mining

The article examines how telemarketing, a data‑intensive industry, leverages various data‑mining methods—including supervised classification, clustering, operations research optimization, anomaly detection, and text mining—to improve lead selection, agent allocation, churn prediction, and voice analysis, while also outlining the key data‑talent roles needed for successful implementation.

Telemarketing · anomaly detection · clustering
0 likes · 7 min read
Python Crawling & Data Mining
Mar 17, 2019 · Artificial Intelligence

How Association Rules and Machine Learning Reveal Stock Market Industry Linkages

This report analyzes 2018 AMAC industry index data using association‑rule mining and several machine‑learning models (Apriori, KNN, Bayesian, decision tree, neural network) to uncover sector linkages, predict index and stock movements, compare model performance, and suggest future improvements.

Prediction · R language · association rules
0 likes · 11 min read
MaGe Linux Operations
Mar 1, 2019 · Artificial Intelligence

Master Python Data Mining & Machine Learning: From Preprocessing to Classification

This comprehensive guide introduces data mining and machine learning concepts, walks through Python data preprocessing techniques, reviews common classification algorithms, demonstrates an Iris flower classification case, and offers practical tips for selecting the most suitable algorithm for a given problem.

Classification Algorithms · Python · data mining
0 likes · 21 min read
Python Crawling & Data Mining
Jan 25, 2019 · Backend Development

Master Web Crawlers: How Python Scrapes the Web Efficiently

As online information explodes, traditional data‑collection methods fall short. This article introduces Python web crawlers, which fetch pages by URL using libraries such as urllib, urllib2, and re, and explains how crawler frameworks make the extraction of web data for analysis faster, more accurate, and more automated.

Data Extraction · data mining · web scraper
0 likes · 5 min read
Meituan Technology Team
Dec 13, 2018 · Artificial Intelligence

Advances in Machine Learning for Real‑Time Delivery at Meituan

Meituan’s AI‑driven “Superbrain” platform combines real‑time big‑data processing, fine‑grained location perception, high‑precision ETA forecasting, multi‑rider dispatch and dynamic pricing to cut instant food‑delivery times from about an hour to roughly thirty minutes while boosting efficiency, cost savings and user experience.

AI · ETA prediction · Logistics
0 likes · 19 min read
Big Data and Microservices
Sep 17, 2018 · Big Data

5 Essential Data Mining Techniques Every Analyst Should Know

This article outlines five widely used data‑mining methods—association rules, classification/tagging, clustering, decision trees, and sequential pattern mining—explaining their principles, real‑world examples, and how they help organizations extract actionable insights from massive datasets.

Big Data · Decision Trees · Sequential Pattern Mining
0 likes · 6 min read
Big Data and Microservices
Sep 3, 2018 · Big Data

From Raw Data to Business Impact: A Complete Data Analyst Skill Guide

The article outlines a comprehensive data‑analyst competency framework, covering data collection, storage, extraction, mining, analysis, visualization, and practical application, and provides concrete questions, techniques, and tool recommendations to help analysts turn raw data into actionable business insights.

Business Intelligence · Data visualization · data analysis
0 likes · 9 min read
Big Data and Microservices
Aug 16, 2018 · Big Data

Mastering Big Data Analysis: 5 Core Aspects and 4 Key Methods

This article outlines the five fundamental aspects of big data analysis—visualization, data‑mining algorithms, predictive analytics, semantic engines, and data quality management—and explains four primary analytical approaches: descriptive, diagnostic, predictive, and prescriptive analysis.

Big Data · data analysis · data mining
0 likes · 6 min read
Model Perspective
Jun 17, 2018 · Big Data

How Tablet Usage Data Can Transform Education: Insights and Strategies

By leveraging tablet-based learning platforms, schools can collect rich usage data that, when mined, reveals student habits, sentiment, and learning patterns, enabling educators to personalize instruction, improve curricula, and guide strategic decisions. The article also highlights the need for data protection and dedicated research centers.

Educational Technology · big data in education · data mining
0 likes · 5 min read
AntTech
Jun 14, 2018 · Artificial Intelligence

A Local Online Learning Approach for Non-linear Data (SCW-LOL)

This paper introduces the SCW-LOL algorithm, a local online learning method based on Soft Confidence‑Weighted (SCW) learning that extends a global model with multiple local classifiers, uses online K‑Means for sample assignment, provides theoretical error bounds, and demonstrates superior performance on ten benchmark datasets, especially for multi‑class classification.

Online Learning · SCW algorithm · data mining
0 likes · 9 min read
Xianyu Technology
May 16, 2018 · Artificial Intelligence

Geographic Alias Mining and Knowledge Base Construction Using Contextual Vectors and Address Similarity

The paper presents two inexpensive techniques for extracting geographic aliases of points of interest—comparing high‑dimensional contextual vectors of nearby shipping addresses and analyzing co‑occurring words in identical addresses—to construct a knowledge base that links official names with their synonyms, improving location‑based service accuracy.

Cosine Similarity · Geographic Alias · Knowledge Base
0 likes · 9 min read
AntTech
Apr 9, 2018 · Artificial Intelligence

Practical Guide to Modeling Stability: Feature PSI, Model PSI, and Monitoring Techniques

This article explains the importance of modeling stability, describes how to assess feature and model stability using the Population Stability Index (PSI), provides step‑by‑step calculation methods, and shares practical monitoring practices such as rank mapping and daily SQL‑based checks.

Model Monitoring · Modeling · PSI
0 likes · 9 min read
MaGe Linux Operations
Apr 8, 2018 · Artificial Intelligence

Master Python Data Mining & Machine Learning: From Preprocessing to Classification

This comprehensive tutorial walks you through Python data mining and machine learning fundamentals, covering data preprocessing techniques, common classification algorithms, an Iris flower classification case study, and practical tips for selecting the right algorithm, all illustrated with clear code examples and visualizations.

Classification Algorithms · Naive Bayes · Python
0 likes · 22 min read
JD Tech
Jan 26, 2018 · Artificial Intelligence

JD Big Data R&D Department Presents Three Accepted Papers at AAAI-2018

The JD Big Data R&D team announced that three of its research papers—covering cross‑domain human parsing, multi‑view outlier detection, and orthogonal weight normalization for deep neural networks—were accepted at the prestigious AAAI‑2018 conference, highlighting the department's contributions to computer vision, data mining, and deep learning.

Artificial Intelligence · Computer Vision · Cross‑domain Adaptation
0 likes · 8 min read
Hulu Beijing
Dec 1, 2017 · Artificial Intelligence

How to Evaluate Unsupervised Clustering Algorithms: Metrics, Scenarios, and Insights

This article explains how to assess unsupervised clustering algorithms by describing realistic user‑watching scenarios, outlining common cluster and algorithm types, presenting five key evaluation criteria, and introducing practical metrics such as RMSSTD, R‑Square, and the improved Hubert‑Gamma statistic.

Metrics · clustering evaluation · data mining
0 likes · 10 min read
Baixing.com Technical Team
Nov 30, 2017 · Artificial Intelligence

How User Profiling Powers Modern Recommendation Systems

This article explains what user profiling is, why it’s crucial for recommendation systems, outlines key dimensions such as personal attributes, status, and interests, describes algorithms like classification and autoregressive models, and details offline and real‑time computation methods, evaluation techniques, and practical examples.

Recommendation Systems · algorithm · data mining
0 likes · 11 min read
21CTO
Sep 27, 2017 · Artificial Intelligence

How Tagging and User Profiling Power Modern Recommendation Systems

This article explores how simple tagging and user profiling underpin modern recommendation systems, contrasting tag‑based, flexible representations with traditional hierarchical classifications, and examines practical applications such as personalized advertising, industry research, and product optimization.

Recommendation Systems · Tagging · data mining
0 likes · 13 min read
MaGe Linux Operations
Jul 30, 2017 · Artificial Intelligence

Why Python Dominates Data Mining: Clear Syntax, Rich Libraries, and Speed Trade‑offs

Python is favored for data‑mining algorithms because its clear syntax, built‑in advanced data structures, easy text handling, extensive libraries, and widespread community support outweigh its slower execution speed compared to Java or C, allowing rapid development and seamless integration with high‑performance code when needed.

Algorithm Development · Python · data mining
0 likes · 5 min read
21CTO
Jul 18, 2017 · Artificial Intelligence

What’s the Real Salary Value of Your Coding Skills? Insights from 1M Job Posts

By mining over a million computer‑related job postings with weakly supervised learning and a BiLSTM NER model, this article quantifies how programming languages, development tools, and hardware skills translate into salary value, offering data‑driven guidance for developers and new graduates.

AI · data mining · deep learning tools
0 likes · 8 min read
Tencent Advertising Technology
Jun 15, 2017 · Artificial Intelligence

Tencent Social Ads Data Mining Expert Q&A: Feature Engineering, Modeling, and Competition Insights

In a Q&A session, a Tencent social ads data mining expert addressed competition participants' questions on data delays, full‑set versus sliding‑window statistics, dataset authenticity, Bayesian smoothing, feature selection, handling missing values, large‑scale training, feature interactions, model stacking, online mini‑batch training, and provided reference resources.

Online Learning · Vowpal Wabbit · competition
0 likes · 11 min read