Tagged articles

TF-IDF

16 articles · Page 1 of 1

Apr 27, 2026 · Artificial Intelligence

Why Resumes Disappear: Decoding the AI Screening Logic and How to Adapt

The article explains how AI-powered applicant tracking systems have evolved from simple keyword filters to TF‑IDF, cosine similarity, and large‑model embeddings, reveals their biases and legal challenges, and offers concrete, technically grounded steps job seekers can take to improve their resume's chances of passing the AI filter.

AI recruiting · ATS · TF-IDF

0 likes · 12 min read

AI Engineer Programming

Apr 8, 2026 · Artificial Intelligence

TF‑IDF vs BM25: Statistical Foundations of Text Retrieval for RAG

The article explains how TF‑IDF and BM25 compute term importance, compares their strengths and weaknesses, and shows how these sparse retrieval methods integrate with dense retrieval techniques such as DPR, SPLADE, and ColBERT in Retrieval‑Augmented Generation systems, concluding with a hybrid retrieval decision matrix.

BM25 · Hybrid Retrieval · Information Retrieval

0 likes · 14 min read

Tech Freedom Circle

Nov 5, 2025 · Artificial Intelligence

Elasticsearch: BM25, TF‑IDF, Dense Vectors, kNN, L2 & Cosine Distances, RRF

This article provides a comprehensive technical guide to Elasticsearch’s core retrieval models—BM25 and TF‑IDF—while detailing modern vector‑based search using dense_vector, kNN, L2 and cosine distances, and demonstrates how to combine keyword and semantic results through hybrid search and Reciprocal Rank Fusion (RRF) with practical configuration examples.

BM25 · Elasticsearch · RRF

0 likes · 42 min read

Test Development Learning Exchange

Nov 27, 2024 · Artificial Intelligence

Basic Natural Language Processing: Text Preprocessing and TF‑IDF with Python

This tutorial introduces fundamental natural language processing techniques, covering text preprocessing steps such as tokenization and stop‑word removal, followed by TF‑IDF feature extraction, and provides complete Python code examples to practice these concepts on a sample dataset.

NLP · Python · Scikit-learn

0 likes · 5 min read

Test Development Learning Exchange

Apr 20, 2024 · Artificial Intelligence

Implementing a Simple University Paper Plagiarism Detection System in Python

This article outlines the design and implementation of a basic university paper plagiarism detection system using Python, covering text preprocessing with NLTK, TF‑IDF weighting, cosine similarity calculation, and a sample in‑memory paper database, while also discussing scalability, UI, and legal considerations.

Cosine Similarity · NLP · Python

0 likes · 10 min read

Zhengcaiyun Technology (政采云技术)

Aug 23, 2022 · Backend Development

Understanding Elasticsearch Document Scoring and Aggregation Techniques

This article explains the underlying principles of Elasticsearch scoring, covering Boolean model queries, TF/IDF, field length normalization, the vector space model, and detailed aggregation examples with code snippets to illustrate practical search and analytics usage.

Aggregation · Elasticsearch · Scoring

0 likes · 19 min read

Yuewen Technology

Apr 1, 2022 · Artificial Intelligence

Detecting Emerging Terms in Web Novels: PMI, Entropy, and TF‑IDF Methods

This article explores how to automatically discover new words in Chinese web novels by combining n‑gram statistics, pointwise mutual information, information entropy, and TF‑IDF filtering, presenting a practical, unsupervised pipeline that improves tokenization and search recall without manual labeling.

Chinese text mining · NLP · PMI

0 likes · 14 min read

Baobao Algorithm Notes

Jan 14, 2022 · Artificial Intelligence

Boosting BERT Text Classification with Label Embedding: How It Works

The paper proposes a simple yet effective method that fuses label embeddings into BERT, enhancing text‑classification performance without increasing computational cost, and validates the approach across six benchmark datasets, also exploring tf‑idf‑based label augmentation and the impact of using [SEP] versus no‑[SEP] inputs.

BERT · Deep Learning · NLP

0 likes · 8 min read

Code DAO

Dec 7, 2021 · Artificial Intelligence

How to Cluster Text with TF‑IDF, KMeans and PCA in Python

This article walks through a complete Python workflow that loads the 20 Newsgroups dataset, preprocesses the documents, vectorizes them with TF‑IDF, groups them using KMeans, reduces dimensions with PCA, and visualizes the resulting clusters, illustrating each step with code and plots.

KMeans · NLP · PCA

0 likes · 13 min read

ITPUB

Oct 23, 2020 · Fundamentals

How General Search Engines Work: From Crawlers to Ranking

This article provides a comprehensive overview of general search engines, covering their classification, core workflow, key modules such as web crawlers, content processing, storage, user query handling, ranking strategies like TF‑IDF and PageRank, as well as anti‑cheat measures and user intent understanding.

Information Retrieval · PageRank · Search Engine

0 likes · 16 min read

vivo Internet Technology

Oct 14, 2020 · Artificial Intelligence

Understanding Cosine Similarity: From Mathematical Foundations to Practical Applications

The article explains cosine similarity from basic geometry and vector math, derives its formula, and shows how it powers precision marketing, image classification, and text retrieval, while also detailing its industrial implementation in Lucene’s vector space model.

Cosine Similarity · Lucene · Search Engine

0 likes · 18 min read

Xianyu Technology

Sep 10, 2020 · Artificial Intelligence

Interest Tagging System for Xianyu: Data‑Driven User Profiling

The Xianyu interest‑tagging system profiles users born after 1995 by matching expert and hot‑search keywords to product text and weighting user actions with a TF‑IDF‑based behavior‑statistics pipeline. It produces over twenty tags that cover more than half the target cohort and have already doubled click‑through rates for interest‑aligned live streams.

TF-IDF · behavior analytics · data mining

0 likes · 11 min read

JD Tech Talk

Apr 19, 2019 · Artificial Intelligence

Fundamentals and Practical Applications of Text Mining: Workflow, Methods, and a Sentiment Analysis Case Study

This article outlines the end‑to‑end text‑mining workflow—from data acquisition and preprocessing to feature extraction, algorithm selection, and model evaluation—while demonstrating a sentiment‑analysis case study that combines LDA topic modeling with deep‑learning classifiers.

Deep Learning · LDA · Sentiment Analysis

0 likes · 11 min read

Alibaba Cloud Developer

Jan 16, 2019 · Artificial Intelligence

How Machine Learning Can Clean Up Low‑Quality E‑Commerce Product Materials

This article explains a machine‑learning‑driven system that automatically detects and classifies poor‑quality e‑commerce product materials—such as misleading titles, exaggerated benefits, and over‑promotion—to protect consumers, reduce platform risk, and improve conversion rates during major sales events.

AI · TF-IDF · content moderation

0 likes · 13 min read

360 Quality & Efficiency

Nov 2, 2018 · Artificial Intelligence

Extracting Regression Cases from Production Requests Using Clustering Algorithms

This article explains how to apply TF‑IDF weighting and the K‑means clustering algorithm in Python to identify a small set of representative regression cases from hundreds of thousands of production request records, including guidance on selecting the optimal number of clusters.

Clustering · K-Means · TF-IDF

0 likes · 5 min read

Baidu Tech Salon

Jan 12, 2015 · Artificial Intelligence

Boolean Algebra and Search Engine Technology

The article outlines how search engines combine underlying principles (the "Tao": crawling, Boolean indexing built on binary operations, PageRank matrix calculations, and TF‑IDF weighting) with specific implementation techniques (the "Shu") to efficiently retrieve, rank, and present relevant web pages using Boolean logic, link analysis, and term relevance metrics.

Indexing · PageRank · TF-IDF

0 likes · 7 min read