Tag

TF-IDF


Test Development Learning Exchange
Nov 27, 2024 · Artificial Intelligence

Basic Natural Language Processing: Text Preprocessing and TF‑IDF with Python

This tutorial introduces fundamental natural language processing techniques, covering text preprocessing steps such as tokenization and stop‑word removal, followed by TF‑IDF feature extraction, and provides complete Python code examples to practice these concepts on a sample dataset.

NLP · Python · TF-IDF
5 min read
Test Development Learning Exchange
Apr 20, 2024 · Artificial Intelligence

Implementing a Simple University Paper Plagiarism Detection System in Python

This article outlines the design and implementation of a basic university paper plagiarism detection system using Python, covering text preprocessing with NLTK, TF‑IDF weighting, cosine similarity calculation, and a sample in‑memory paper database, while also discussing scalability, UI, and legal considerations.

NLP · Python · TF-IDF
10 min read
政采云技术
Aug 23, 2022 · Backend Development

Understanding Elasticsearch Document Scoring and Aggregation Techniques

This article explains the principles behind Elasticsearch relevance scoring (the Boolean model, TF/IDF, field‑length normalization, and the vector space model) and walks through aggregation examples with code snippets that illustrate practical search and analytics usage.

Aggregation · Backend · Elasticsearch
19 min read
Yuewen Technology
Apr 1, 2022 · Artificial Intelligence

Detecting Emerging Terms in Web Novels: PMI, Entropy, and TF‑IDF Methods

This article explores how to automatically discover new words in Chinese web novels by combining n‑gram statistics, pointwise mutual information, information entropy, and TF‑IDF filtering, presenting a practical, unsupervised pipeline that improves tokenization and search recall without manual labeling.

Chinese text mining · NLP · PMI
14 min read
vivo Internet Technology
Oct 14, 2020 · Artificial Intelligence

Understanding Cosine Similarity: From Mathematical Foundations to Practical Applications

The article explains cosine similarity from basic geometry and vector math, derives its formula, and shows how it powers precision marketing, image classification, and text retrieval, while also detailing its industrial implementation in Lucene’s vector space model.

Lucene · TF-IDF · Vector Space Model
18 min read
Xianyu Technology
Sep 10, 2020 · Artificial Intelligence

Interest Tagging System for Xianyu: Data‑Driven User Profiling

The Xianyu interest‑tagging system profiles users born after 1995 (the "post‑95" cohort) by matching expert and hot‑search keywords against product text and weighting user actions through a TF‑IDF‑based behavior‑statistics pipeline. It produces more than twenty tags that cover over half of the target cohort, and these tags have already doubled click‑through rates for interest‑aligned live streams.

Data Mining · TF-IDF · behavior analytics
11 min read
JD Tech Talk
Apr 19, 2019 · Artificial Intelligence

Fundamentals and Practical Applications of Text Mining: Workflow, Methods, and a Sentiment Analysis Case Study

This article outlines the end‑to‑end text‑mining workflow—from data acquisition and preprocessing to feature extraction, algorithm selection, and model evaluation—while demonstrating a sentiment‑analysis case study that combines LDA topic modeling with deep‑learning classifiers.

LDA · Natural Language Processing · TF-IDF
11 min read
360 Quality & Efficiency
Nov 2, 2018 · Artificial Intelligence

Extracting Regression from Production Requests Using Clustering Algorithms

This article explains how to apply TF‑IDF weighting and the K‑means clustering algorithm in Python to identify a small set of representative regression cases from hundreds of thousands of production request records, including guidance on selecting the optimal number of clusters.

Clustering · TF-IDF · k-means
5 min read
Baidu Tech Salon
Jan 12, 2015 · Artificial Intelligence

Boolean Algebra and Search Engine Technology

The article outlines how search engines pair underlying principles (the "Tao": crawling, binary‑based Boolean indexing, PageRank matrix calculations, and TF‑IDF weighting) with concrete implementation techniques (the "Shu") to retrieve, rank, and present relevant web pages efficiently.

Indexing · PageRank · TF-IDF
7 min read