Tagged articles
16 articles
Model Perspective
Apr 27, 2026 · Artificial Intelligence

Why Resumes Disappear: Decoding the AI Screening Logic and How to Adapt

The article explains how AI-powered applicant tracking systems have evolved from simple keyword filters to TF‑IDF, cosine similarity, and large‑model embeddings, reveals their biases and legal challenges, and offers concrete, technically grounded steps job seekers can take to improve their resume's chances of passing the AI filter.

AI recruiting · ATS · TF-IDF
12 min read
AI Engineer Programming
Apr 8, 2026 · Artificial Intelligence

TF‑IDF vs BM25: Statistical Foundations of Text Retrieval for RAG

The article explains how TF‑IDF and BM25 compute term importance, compares their strengths and weaknesses, and shows how these sparse retrieval methods integrate with dense retrieval techniques such as DPR, SPLADE, and ColBERT in Retrieval‑Augmented Generation systems, concluding with a hybrid retrieval decision matrix.

BM25 · Hybrid Retrieval · RAG
14 min read
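To make the TF‑IDF/BM25 contrast concrete, here is a minimal sketch, not taken from the article, that scores a toy corpus both ways. It assumes whitespace tokenization, a plain tf × ln(N/df) weighting for TF‑IDF, the common BM25 defaults k1 = 1.5 and b = 0.75, and the non-negative IDF ln(1 + (N − df + 0.5)/(df + 0.5)) that Lucene's BM25 uses. The corpus is invented for illustration.

```python
import math
from collections import Counter

def tfidf_score(query, doc, corpus):
    """Sum of raw term frequency times ln(N / df) over the query terms."""
    n = len(corpus)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        if df:
            score += tf[term] * math.log(n / df)
    return score

def bm25_score(query, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25: saturating term frequency plus document-length normalization."""
    n = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        f = tf[term]
        score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

docs = [
    "rag retrieval augmented generation",
    "rag rag rag rag rag rag rag rag rag rag",
    "sparse retrieval with bm25",
    "dense retrieval with embeddings",
]
corpus = [d.split() for d in docs]
query = ["rag"]
for i, doc in enumerate(corpus):
    print(i, round(tfidf_score(query, doc, corpus), 3), round(bm25_score(query, doc, corpus), 3))
```

Repeating "rag" ten times multiplies the TF‑IDF score by ten, while the BM25 contribution of a single term can never exceed idf × (k1 + 1), which is the saturation behavior BM25 is designed to provide.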
Tech Freedom Circle
Nov 5, 2025 · Artificial Intelligence

Elasticsearch: BM25, TF‑IDF, Dense Vectors, kNN, L2 & Cosine Distances, RRF

This article is a comprehensive technical guide to Elasticsearch’s core retrieval models, BM25 and TF‑IDF. It also details vector‑based search with dense_vector fields, kNN, and L2 and cosine distances, and demonstrates how to combine keyword and semantic results through hybrid search and Reciprocal Rank Fusion (RRF), with practical configuration examples.

BM25 · Elasticsearch · RRF
42 min read
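Elasticsearch applies RRF server-side, so no client code is required. Purely to show the arithmetic, here is a minimal sketch of the fusion formula score(d) = Σ 1/(k + rank(d)). It assumes 1-based ranks and k = 60, the constant from the original RRF paper and the default value of Elasticsearch's rank_constant. The two hit lists are hypothetical.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a deterministic order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

bm25_hits = ["doc3", "doc1", "doc7"]  # hypothetical keyword (BM25) results
knn_hits = ["doc1", "doc5", "doc3"]   # hypothetical vector (kNN) results
fused = reciprocal_rank_fusion([bm25_hits, knn_hits])
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Here doc1 comes first because it ranks near the top of both lists (2nd and 1st), narrowly beating doc3 (1st and 3rd); documents returned by only one retriever trail behind. Because RRF uses only ranks, BM25 scores and vector similarities never need to be put on a common scale.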
Test Development Learning Exchange
Apr 20, 2024 · Artificial Intelligence

Implementing a Simple University Paper Plagiarism Detection System in Python

This article outlines the design and implementation of a basic university paper plagiarism detection system using Python, covering text preprocessing with NLTK, TF‑IDF weighting, cosine similarity calculation, and a sample in‑memory paper database, while also discussing scalability, UI, and legal considerations.

Cosine Similarity · NLP · Python
10 min read
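A stripped-down version of the TF‑IDF and cosine-similarity step might look like the sketch below. It is a stand-in under stated assumptions, not the article's code: it replaces NLTK preprocessing with a lowercase regex tokenizer (no stemming or stop-word removal), uses the smoothed IDF ln((1 + n)/(1 + df)) + 1 that scikit-learn applies by default, and treats a two-sentence list as a hypothetical in-memory paper database.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric runs (stand-in for NLTK preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with smoothed IDF ln((1 + n) / (1 + df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [
        {t: f / len(toks) * idf[t] for t, f in Counter(toks).items()}
        for toks in tokenized
    ]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

database = [  # hypothetical in-memory paper database
    "Plagiarism detection compares a submission against existing papers.",
    "Cosine similarity measures the angle between two TF-IDF vectors.",
]
submission = "Cosine similarity measures the angle between TF-IDF vectors."
vectors = tfidf_vectors(database + [submission])
query = vectors[-1]
scores = [cosine(query, paper) for paper in vectors[:-1]]
for i, score in enumerate(scores):
    print(f"paper {i}: similarity {score:.3f}")
```

A detector built this way would flag a submission whose highest similarity exceeds some threshold; the threshold is a design choice the sketch leaves open.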
Zhengcaiyun Technology (政采云技术)
Aug 23, 2022 · Backend Development

Understanding Elasticsearch Document Scoring and Aggregation Techniques

This article explains the principles behind Elasticsearch scoring, covering Boolean-model queries, TF/IDF, field-length normalization, and the vector space model, and then walks through detailed aggregation examples with code snippets that illustrate practical search and analytics usage.

Elasticsearch · Scoring · Search
19 min read
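For the TF/IDF part, Elasticsearch: The Definitive Guide describes three per-term factors: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), and a field-length norm of 1 / √numTerms. The sketch below multiplies them, roughly the product that Lucene's explain output labels fieldWeight under the older classic similarity. It is a simplification: query normalization, the coordination factor, and boosts are left out, and current Elasticsearch versions score with BM25 by default. The numbers in the usage lines are hypothetical.

```python
import math

def field_weight(freq, doc_freq, num_docs, field_len):
    """tf * idf * fieldNorm for one term in one field (classic TF/IDF, simplified)."""
    tf = math.sqrt(freq)                           # more occurrences help, with diminishing returns
    idf = 1 + math.log(num_docs / (doc_freq + 1))  # rarer terms weigh more
    field_norm = 1 / math.sqrt(field_len)          # shorter fields weigh more
    return tf * idf * field_norm

# The same single match in a 3-term title vs a 30-term body (hypothetical numbers).
title = field_weight(freq=1, doc_freq=9, num_docs=1000, field_len=3)
body = field_weight(freq=1, doc_freq=9, num_docs=1000, field_len=30)
print(round(title, 3), round(body, 3))
```

Field-length normalization is why a match in a short title field outweighs the same match buried in a long body field.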
Yuewen Technology
Apr 1, 2022 · Artificial Intelligence

Detecting Emerging Terms in Web Novels: PMI, Entropy, and TF‑IDF Methods

This article explores how to automatically discover new words in Chinese web novels by combining n‑gram statistics, pointwise mutual information, information entropy, and TF‑IDF filtering, presenting a practical, unsupervised pipeline that improves tokenization and search recall without manual labeling.

Chinese text mining · NLP · PMI
14 min read
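The two core statistics can be sketched for two-character candidates as follows. This is a minimal illustration, not the article's pipeline: it skips n-gram enumeration, thresholds, and the TF‑IDF filtering step, and the toy text is invented. PMI measures how much more often two characters co-occur than chance would predict, and left/right neighbor entropy measures how freely the candidate combines with surrounding characters.

```python
import math
from collections import Counter

def pmi(text, word):
    """PMI of a two-character candidate: log(p(xy) / (p(x) * p(y)))."""
    chars = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(len(text) - 1))
    p_xy = bigrams[word] / (len(text) - 1)
    if p_xy == 0:
        return float("-inf")
    p_x = chars[word[0]] / len(text)
    p_y = chars[word[1]] / len(text)
    return math.log(p_xy / (p_x * p_y))

def entropy(counter):
    """Shannon entropy (natural log) of a frequency counter; 0.0 if empty."""
    total = sum(counter.values())
    return sum(c / total * math.log(total / c) for c in counter.values()) if total else 0.0

def neighbor_entropy(text, word):
    """Left and right neighbor-character entropies over all occurrences of word."""
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)
    return entropy(left), entropy(right)

text = "吃苹果了买苹果吗削苹果呢"  # toy text, invented for illustration
for cand in ["苹果", "果了"]:
    print(cand, round(pmi(text, cand), 3), [round(h, 3) for h in neighbor_entropy(text, cand)])
```

In this toy text the word 苹果 ("apple") and the cross-boundary fragment 果了 receive the same PMI, ln(48/11), but only 苹果 has varied neighbors on both sides. That gap is why the two measures are combined rather than used alone.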
Baobao Algorithm Notes
Jan 14, 2022 · Artificial Intelligence

Boosting BERT Text Classification with Label Embedding: How It Works

The paper proposes a simple yet effective method that fuses label embeddings into BERT, improving text‑classification performance without increasing computational cost. The authors validate the approach on six benchmark datasets and also explore TF‑IDF‑based label augmentation and the effect of inputs with versus without the [SEP] token.

BERT · Deep Learning · NLP
8 min read
Code DAO
Dec 7, 2021 · Artificial Intelligence

How to Cluster Text with TF‑IDF, KMeans and PCA in Python

This article walks through a complete Python workflow that loads the 20 Newsgroups dataset, preprocesses the documents, vectorizes them with TF‑IDF, groups them using KMeans, reduces dimensions with PCA, and visualizes the resulting clusters, illustrating each step with code and plots.

KMeans · NLP · PCA
13 min read
ITPUB
Oct 23, 2020 · Fundamentals

How General Search Engines Work: From Crawlers to Ranking

This article gives a comprehensive overview of general-purpose search engines: how they are classified, their core workflow, and their key modules, including web crawling, content processing, storage, user query handling, and ranking strategies such as TF‑IDF and PageRank. It also covers anti-cheating measures and user intent understanding.

PageRank · TF-IDF · Web Crawling
16 min read
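PageRank is usually computed by power iteration over the link graph. The sketch below is a minimal version, not the article's implementation, assuming the conventional damping factor 0.85, a hypothetical three-page graph, and one common convention for pages without out-links (their rank is spread evenly over all pages).

```python
def pagerank(links, damping=0.85, iterations=200, tol=1e-10):
    """Power-iteration PageRank over a dict mapping each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the teleportation share (1 - d) / n.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # Split this page's damped rank evenly across its out-links.
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page (no out-links): spread its rank over all pages.
                for t in pages:
                    new[t] += damping * rank[p] / n
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank

graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}  # hypothetical link graph
ranks = pagerank(graph)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

C ranks highest because both A and B link to it, and A comes second because it receives all of C's rank through C's single out-link.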
Xianyu Technology
Sep 10, 2020 · Artificial Intelligence

Interest Tagging System for Xianyu: Data‑Driven User Profiling

The Xianyu interest‑tagging system profiles users born after 1995 by matching expert-curated and trending-search keywords to product text and weighting user actions with a TF‑IDF‑based behavior‑statistics pipeline. It produces more than twenty tags that cover over half of the target cohort and has already doubled click‑through rates for live streams aligned with users' interests.

TF-IDF · behavior analytics · data mining
11 min read
JD Tech Talk
Apr 19, 2019 · Artificial Intelligence

Fundamentals and Practical Applications of Text Mining: Workflow, Methods, and a Sentiment Analysis Case Study

This article outlines the end‑to‑end text‑mining workflow, from data acquisition and preprocessing through feature extraction, algorithm selection, and model evaluation, and illustrates it with a sentiment‑analysis case study that combines LDA topic modeling with deep‑learning classifiers.

Deep Learning · LDA · Sentiment Analysis
11 min read
Alibaba Cloud Developer
Jan 16, 2019 · Artificial Intelligence

How Machine Learning Can Clean Up Low‑Quality E‑Commerce Product Materials

This article explains a machine‑learning system that automatically detects and classifies low-quality e‑commerce product materials, such as misleading titles, exaggerated benefit claims, and excessive promotion, in order to protect consumers, reduce platform risk, and improve conversion rates during major sales events.

AI · TF-IDF · content moderation
13 min read
Baidu Tech Salon
Jan 12, 2015 · Artificial Intelligence

Boolean Algebra and Search Engine Technology

The article outlines how search engines combine the Tao (道, underlying principles) with the Shu (术, concrete techniques). The principles are crawling, binary-based Boolean indexing, PageRank matrix computation, and TF‑IDF weighting; the techniques are the specific implementations that apply Boolean logic, link analysis, and term-relevance metrics to retrieve, rank, and present relevant web pages efficiently.

PageRank · TF-IDF · algorithm
7 min read
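The binary view of Boolean retrieval can be sketched with Python integers as bit vectors: each term maps to a number whose bit i is set when document i contains the term, so AND, OR, and NOT become the bitwise operators &, |, and ~. The three documents and the helper names build_bitmap_index and doc_ids are hypothetical, and production engines typically rely on compressed posting lists rather than raw bitmaps.

```python
def build_bitmap_index(docs):
    """Map each term to an int whose bit i is set when docs[i] contains the term."""
    index = {}
    for i, doc in enumerate(docs):
        for term in set(doc.lower().split()):
            index[term] = index.get(term, 0) | (1 << i)
    return index

def doc_ids(bits):
    """Decode a bit vector back into a sorted list of document numbers."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

docs = [
    "boolean algebra and search",
    "pagerank link analysis",
    "search engine ranking with pagerank",
]
index = build_bitmap_index(docs)
all_docs = (1 << len(docs)) - 1  # bit vector with every document set
search = index.get("search", 0)
pagerank = index.get("pagerank", 0)

print(doc_ids(search & pagerank))   # search AND pagerank -> [2]
print(doc_ids(search | pagerank))   # search OR pagerank  -> [0, 1, 2]
print(doc_ids(all_docs & ~search))  # NOT search          -> [1]
```

Masking with all_docs keeps NOT inside the document universe, since ~ on a Python int would otherwise produce a negative number with infinitely many set bits.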