Tagged articles
21 articles
Geek Labs
May 6, 2026 · Artificial Intelligence

Build a GPT from Scratch and Decode AI Coding Jargon with Two Top GitHub Projects

The article introduces two GitHub repositories: how-to-train-your-gpt, a step‑by‑step guide that builds a LLaMA‑style GPT model across 12 chapters, and dictionary-of-ai-coding, a plain‑language glossary of AI‑coding terms. Together they cover both the fundamentals of modern LLMs and the terminology used to describe them.

GPT · GitHub · LLM
0 likes · 9 min read
AI Explorer
Apr 11, 2026 · Artificial Intelligence

How Kronos Redefines Quantitative Analysis with a Financial‑Market Language Model

Kronos, an open‑source large model trained on OHLCV data from over 45 exchanges, treats financial time‑series as a specialized language, using a custom tokenizer and a two‑stage Transformer to enable price prediction, market state detection, signal generation, and risk simulation, with easy Hugging Face integration and a live demo for BTC/USDT.

Kronos · Tokenizer · Transformer
0 likes · 6 min read
Weekly Large Model Application
Mar 22, 2026 · Artificial Intelligence

Inside MiMo-Audio: Dissecting the Large-Scale Audio Model

The article breaks down MiMo-Audio, a next‑token‑prediction‑style large‑scale audio model built on Qwen2, detailing its acoustic front‑end, RVQ tokenizer, patch‑based transformer architecture, streaming capabilities, performance advantages, engineering constraints, and recommended application scenarios.

Audio Modeling · Few-Shot · Qwen2
0 likes · 9 min read
Fun with Large Models
Jan 14, 2026 · Artificial Intelligence

Understanding Large Language Model Files: Structure, Tokens, and Inference with Qwen3

This article walks through the complete workflow of loading and running the open‑source Qwen3‑8B model, explaining each core file (weights, config, generation config, tokenizer), how the model tokenizes input, applies chat templates, generates responses, and decodes output, all illustrated with code and diagrams.

Inference · ModelScope · Python
0 likes · 16 min read
Tencent Technical Engineering
Dec 24, 2025 · Artificial Intelligence

Build a Mini LLM from Scratch: Step‑by‑Step Guide to Tokenizer, Attention, and Transformer

This article walks through building a mini LLM from the ground up, covering model architecture, tokenization methods, BPE vocabulary building, embedding, positional encoding, attention mechanisms, multi‑head attention, transformer blocks, training pipelines, inference, and sampling strategies, all with runnable Python code.

Deep Learning · LLM · Python
0 likes · 34 min read
AI Frontier Lectures
Jul 31, 2025 · Artificial Intelligence

Can a 32‑Token Compressor Generate Images Without Training?

This article reviews a recent study that demonstrates how a highly compressed one‑dimensional tokenizer, using only 32 discrete tokens and gradient‑based test‑time optimization, can generate high‑quality images without training a separate generative model, and explores its methodology, findings, applications, and limitations.

1D tokenizer · AI research · Image Generation
0 likes · 10 min read
Architect
May 14, 2025 · Artificial Intelligence

How Qwen3 Controls Hybrid Reasoning with the enable_thinking Parameter

This article explains how Qwen3 implements hybrid (fast/slow) reasoning by using the enable_thinking flag in the tokenizer's apply_chat_template method, detailing the underlying Jinja2 chat template, example prompts, the effect of toggling the flag, and design considerations for future autonomous thinking control.

AI model · ChatML · Hybrid Reasoning
0 likes · 13 min read
Open Source Tech Hub
Mar 31, 2025 · Backend Development

How to Implement Powerful Full‑Text Search in PHP with TNTSearch

This guide explains how to install, configure, and use the PHP‑based TNTSearch engine, covering its key features, required dependencies, index creation, various search modes, dynamic updates, custom tokenizers, geo‑search, and text classification with practical code examples.

Full‑Text Search · Tokenizer · geo-search
0 likes · 9 min read
Alibaba Cloud Developer
Nov 28, 2024 · Artificial Intelligence

Understanding Tokenizers and Embeddings in Large Language Models

This article introduces the core concepts of tokenizers and embeddings in large language models, explains how they convert text into numeric IDs and dense vectors, compares different tokenization strategies, and provides practical JavaScript and TensorFlow.js code examples for beginners.

AI fundamentals · JavaScript · LLM
0 likes · 10 min read
Baobao Algorithm Notes
Sep 24, 2024 · Artificial Intelligence

From Zero to One: A Practical Guide to Pretraining Large Language Models

This comprehensive guide walks you through every stage of LLM pretraining—from data sourcing, cleaning, and deduplication to tokenizer design, model architecture choices, training framework selection, optimization tricks, and evaluation methods—highlighting common pitfalls and practical solutions for building robust models.

LLM Pretraining · Tokenizer · Training Framework
0 likes · 34 min read
政采云技术
Dec 19, 2023 · Backend Development

Principles and Simple Implementation of a Search Engine in Go

This article explains the fundamental concepts of search engine technology—including forward and inverted indexes, tokenizers, stop words, synonym handling, ranking algorithms, and NLP integration—and provides a concise Go implementation with code examples and performance testing.

Go · NLP · Tokenizer
0 likes · 21 min read
Tencent Cloud Developer
Jul 19, 2023 · Artificial Intelligence

Build a Full‑Scale LLM from Scratch in 61 Lines of Python

This step‑by‑step tutorial shows how to set up a GPU environment, prepare custom text data, train a tokenizer, configure and train a GPT‑2‑based large language model, test its generation, and run the entire pipeline using only 61 lines of Python code.

Docker · GPT-2 · LLM
0 likes · 10 min read
Tencent Cloud Developer
Feb 20, 2023 · Mobile Development

iOS WeChat Full-Text Search Technology Upgrade: Selection and Optimization

iOS WeChat’s full‑text search was upgraded by selecting SQLite FTS5, creating a VerbatimTokenizer with multi‑level delimiter support, optimizing table formats to cut index size by 30%, and improving batch index updates and parallel search logic, making queries 40‑60% faster.

Database Optimization · Full‑Text Search · Index Optimization
0 likes · 26 min read
Code DAO
Dec 21, 2021 · Artificial Intelligence

Four Keras Techniques for Preprocessing Text for Deep Learning

This article explains four Keras utilities—text_to_word_sequence, hashing_trick, one_hot, and Tokenizer—showing how each converts raw text into token lists, hash indices, integer encodings, or document matrices, with code examples and sample outputs.

Keras · Tokenizer · hashing_trick
0 likes · 6 min read
Tencent IMWeb Frontend Team
Aug 26, 2021 · Frontend Development

How to Fix HTML Entity Bugs That Break Rich Text Rendering

This article explains why HTML entities like "&lt;" and "&gt;" can disappear in rich‑text fields, analyzes the underlying tokenizer state machine, and provides a lightweight hack that inserts empty comment nodes to preserve the original text without breaking legacy rendering logic.

Entity · HTML · JavaScript
0 likes · 12 min read
Efficient Ops
Jun 23, 2021 · Backend Development

Why Can’t Elasticsearch Find My Logs? Uncovering Full‑Text Search Pitfalls and Tokenizer Tweaks

This article explains why large‑scale Elasticsearch clusters may miss log entries during keyword searches, dives into the fundamentals of inverted indexes and tokenization, and demonstrates practical index‑time and query‑time tokenizer optimizations—including custom analyzers for English and Chinese—to dramatically improve search recall and precision.

Elasticsearch · Full‑Text Search · Tokenizer
0 likes · 13 min read
MaGe Linux Operations
Jun 1, 2020 · Backend Development

Mastering Elasticsearch Analyzers: A Deep Dive into Tokenizers and Filters

This article explains how Elasticsearch uses Analyzer components—character filters, tokenizers, and token filters—to perform text analysis, reviews the built‑in analyzers such as standard, simple, stop, whitespace, keyword, pattern, language, ICU and IK, and provides practical _analyze API examples with code snippets and result screenshots.

Elasticsearch · ICU Plugin · IK Analyzer
0 likes · 11 min read
System Architect Go
Sep 3, 2018 · Fundamentals

Understanding Elasticsearch Analyzer, Tokenizer, and Token Filters

This article explains the core components of Elasticsearch's full‑text search analysis—Analyzers, Tokenizers, and Token Filters—detailing their roles, building blocks, built‑in types, and how they combine to customize text processing for effective indexing and querying.

Elasticsearch · Full‑Text Search · Token Filter
0 likes · 5 min read