Tagged articles
64 articles
Architects' Tech Alliance
May 8, 2026 · Artificial Intelligence

Token Fundamentals: A Technical Panorama of AI Language Units

Tokens are the smallest language building blocks that AI models process, representing characters, words, subwords, punctuation or emojis; they determine context window size and generation speed, so tokenization directly impacts model understanding accuracy and efficiency, as explained in the 2026 Token Report.

AI fundamentals, Context Window, language models
0 likes · 4 min read
Ops Community
Apr 21, 2026 · Artificial Intelligence

How to Tame Unstable LLM Prompts: Causes and Fixes

This article explains why large‑model prompts can yield inconsistent answers, examines the roles of temperature, top‑p/top‑k, tokenization, context windows, position bias, and model randomness, and provides a step‑by‑step debugging workflow and production‑grade best‑practice checklist to achieve stable outputs.

Debugging, LLM stability, Prompt engineering
0 likes · 13 min read
Geek Labs
Apr 20, 2026 · Artificial Intelligence

A Complete Open‑Source Guide to LLM Internals: From Tokenization to Inference Optimization

This open‑source tutorial breaks down large language model internals into 11 detailed topics—covering BPE tokenization, attention mathematics, backpropagation, transformer architecture, KV‑Cache, Paged and Flash Attention, and frontier techniques—each with numeric derivations and Python code, making it ideal for developers and interview preparation.

Flash Attention, Inference Optimization, KV cache
0 likes · 5 min read
Architect's Tech Stack
Apr 18, 2026 · Artificial Intelligence

What’s New in Claude Opus 4.7? Deep Dive into Capabilities and Migration Tips

Anthropic’s Claude Opus 4.7 launches with enhanced handling of complex, long‑running tasks, higher‑resolution visual analysis, stricter instruction compliance, improved benchmark scores, expanded file‑system memory, new effort levels (xhigh), API task‑budget beta, reinforced security measures, and migration guidance on tokenization and prompt adjustments.

AI model, Anthropic, Claude Opus
0 likes · 4 min read
AgentGuide
Apr 12, 2026 · Artificial Intelligence

What Is a Token? A Deep Dive into Tokenization Algorithms for LLMs

The article defines tokens (now officially rendered in Chinese as “词元”), explains why large language models require numeric input, and details three main tokenization strategies (word‑based, character‑based, and subword) along with the subword methods BPE, WordPiece, and Unigram, highlighting their advantages and drawbacks.

BPE, LLM, Unigram
0 likes · 6 min read
Code Mala Tang
Apr 7, 2026 · Artificial Intelligence

Demystifying LLMs: From Tokens to Agents – An Engineer’s Deep Dive

This article provides a comprehensive, engineering‑focused breakdown of large language models, covering their Transformer roots, tokenization, context windows, prompt engineering, tool integration via MCP, and autonomous agents, while offering practical examples and actionable insights for developers.

AI fundamentals, Agent, LLM
0 likes · 10 min read
AI Info Trend
Apr 2, 2026 · Industry Insights

What Will Shape FinTech in 2026? 9 Key Predictions Unveiled

The 2026 CB Insights FinTech report forecasts nine major trends—including new‑bank entrants, BNPL giants entering banking, Robinhood’s super‑app shift, crypto firms targeting banks, stablecoin‑driven AI payments, and the rise of trusted‑data prediction markets—offering a data‑driven roadmap for industry players and users alike.

AI agents, BNPL, Banking
0 likes · 10 min read
Machine Learning Algorithms & Natural Language Processing
Mar 31, 2026 · Artificial Intelligence

Unified Multimodal Modeling: How LongCat-Next Bridges Understanding and Generation

The article analyzes why text models naturally combine understanding and generation, explains the fundamental conflicts that prevent images from sharing the same tokenization, and details LongCat-Next’s discrete autoregressive approach—using SAE visual encoders, residual vector quantization, and a unified LLM backbone—to achieve a single model that can both comprehend and create multimodal content.

LongCat-Next, RVQ, dNaViT
0 likes · 21 min read
PaperAgent
Mar 30, 2026 · Artificial Intelligence

How LongCat-Next Redefines Multimodal AI with Discrete Tokens

The LongCat-Next model from Meituan introduces a native multimodal architecture that uses discrete tokenization for vision and audio, achieving unified understanding and generation across modalities while delivering state‑of‑the‑art benchmark performance and simplifying training pipelines.

AI, Benchmark, Meituan
0 likes · 11 min read
Architecture Digest
Mar 24, 2026 · Databases

How to Perform Fuzzy Searches on Encrypted Data: Methods, Pros, and Cons

This article examines why encrypted data hampers fuzzy queries, categorizes three implementation approaches—from naïve in‑memory decryption to conventional token‑based indexing and advanced algorithmic schemes—evaluates their performance, storage overhead, and security trade‑offs, and provides practical references.

Security, fuzzy-search, tokenization
0 likes · 10 min read
Full-Stack Cultivation Path
Mar 23, 2026 · Artificial Intelligence

What Exactly Is a Token in LLMs? A First‑Principles Explanation

The article explains that a token is the smallest discrete text unit a large language model processes, detailing why tokenization is essential, how tokenizers work, how tokens flow through the transformer, and how token counts affect context windows, cost, latency, and overall model behavior.

Context Window, Cost Management, Embedding
0 likes · 20 min read
Weekly Large Model Application
Mar 20, 2026 · Artificial Intelligence

Inside GLM-4-Voice: An End-to-End Chinese-English Speech Dialogue Model

GLM-4-Voice is an end-to-end Chinese-English speech dialogue model that aligns discrete speech tokens with GLM-4-9B, uses VQ-based tokenization at 12.5 token/s, supports emotion, tone, speed and dialect control, and offers streaming inference with low latency, while detailing its architecture, advantages, limitations and suitable use cases.

GLM-4-Voice, Multimodal AI, flow matching
0 likes · 10 min read
Data STUDIO
Feb 25, 2026 · Artificial Intelligence

Build a Large Language Model from Scratch with PyTorch—No Libraries, No Shortcuts

This guide walks you through building, training, and fine‑tuning a Transformer‑based large language model entirely from scratch using PyTorch, covering tokenization, self‑attention, multi‑head attention, positional encoding, model architecture, data preparation, training loops, and fine‑tuning on custom lyrics.

Fine-tuning, GPT, LLM
0 likes · 43 min read
AI Architecture Hub
Dec 24, 2025 · Artificial Intelligence

From LLMs to Autonomous Agents: The Three Evolution Stages of AI

This article explains the three evolutionary stages of AI—from large language models that generate text, through workflow‑enhanced systems using retrieval‑augmented generation, to fully autonomous agents capable of self‑directed decision‑making—while detailing the four core technologies that power each stage.

AI evolution, Agent, Embedding
0 likes · 9 min read
Architect
Dec 15, 2025 · Artificial Intelligence

Demystifying LLM Architecture: From Transformers to Modern MoE Designs

This comprehensive guide explains the fundamentals of large language model (LLM) architectures, covering the original Transformer, tokenization, embeddings, positional encoding, attention mechanisms, feed‑forward networks, layer stacking, a step‑by‑step translation example, and the latest open‑source and hybrid LLM designs shaping the field.

Embedding, LLM, MoE
0 likes · 41 min read
Tencent Cloud Developer
Dec 9, 2025 · Artificial Intelligence

How Do Large Language Models Turn Text into Math? A Deep Dive into Transformers

This article walks through the complete workflow of AI large language models, from turning user queries into token matrices via tokenization and embedding, through the Transformer’s self‑attention and multi‑head mechanisms, to decoding logits into human‑readable text, while also covering position encoding, long‑context strategies, generation parameters, and practical engineering tips.

Inference Optimization, Self-Attention, Transformer
0 likes · 29 min read
ShiZhen AI
Dec 1, 2025 · Artificial Intelligence

AI Comic Episode 3: What Exactly Is a Token?

This episode explains that a token is the smallest text chunk an LLM processes—ranging from characters to subwords—covers why subword tokenization avoids vocabulary explosion, compares token counts across languages, describes the computational cost of sequential generation, and introduces visual tokens for multimodal models.

AI fundamentals, large language models, multimodal
0 likes · 7 min read
HyperAI Super Neural
Nov 24, 2025 · Artificial Intelligence

Introducing AION-1: The First Astronomical Multimodal Foundation Model Trained on 200M Targets

AION-1, developed by a consortium including UC Berkeley, Cambridge and Oxford, is the first large‑scale multimodal foundation model for astronomy that unifies images, spectra and catalog data via an early‑fusion backbone, achieving zero‑shot and linear‑probe performance that rivals or surpasses task‑specific models across diverse scientific tasks.

Multimodal AI, astronomy, cross‑modal generation
0 likes · 18 min read
Volcano Engine Developer Services
Sep 28, 2025 · Artificial Intelligence

Demystifying AI Jargon: A Beginner’s Guide to Large Language Models

This guide breaks down the complex terminology of large language models—explaining tokens, transformers, self‑attention, RAG, scaling laws, dense vs. sparse architectures, and training stages—using clear analogies and step‑by‑step explanations so readers can confidently understand and work with modern AI systems.

AI fundamentals, Model Training, RAG
0 likes · 35 min read
Bighead's Algorithm Notes
Sep 7, 2025 · Artificial Intelligence

Paper Review: Kronos – A Temporal Foundation Model for Financial Market Language

This article reviews Kronos, a unified and scalable pre‑training framework designed for financial K‑line data, detailing its tokenization approach, autoregressive architecture, large‑scale pre‑training on 12 billion records, and experimental results that show substantial gains in price prediction, volatility forecasting, synthetic data generation, and investment simulation.

Kronos, autoregressive pretraining, financial time series
0 likes · 9 min read
Qborfy AI
Aug 16, 2025 · Artificial Intelligence

Mastering LLM Tokens: How They Work, Cost, and Choose the Right Model

This article explains what tokens are in large language models, how they are counted and priced, compares tokenization methods across major models, and provides practical guidelines and code examples for optimizing token usage and selecting the appropriate model for different scenarios.

AI, Cost Optimization, LLM
0 likes · 8 min read
Qborfy AI
Aug 12, 2025 · Artificial Intelligence

What Powers Large Language Models? A Deep Dive into LLM Architecture and Scaling

This article explains how massive Transformer‑based large language models compress text data into mathematical representations, why scale, self‑attention, and training paradigms enable emergent general intelligence, and walks through tokenization, embedding, multi‑layer attention, architecture choices, energy costs, and hallucination mitigation.

AI, Embedding, LLM
0 likes · 6 min read
Data Party THU
Aug 5, 2025 · Artificial Intelligence

Why State Space Models May Outperform Transformers: A Deep Dive

The article provides a comprehensive technical analysis of state space models (SSM) versus Transformers, covering their core mechanisms, three essential design factors, training efficiency, scaling behavior, tokenization debates, and experimental evidence that highlights the trade‑offs and potential advantages of SSMs in modern AI systems.

Mamba, State Space Model, Transformer
0 likes · 21 min read
Ops Development & AI Practice
Aug 4, 2025 · Blockchain

Why ERC-1400 Is the Key to Compliant Security Tokens on Ethereum

The article explains how ERC-1400 extends ERC-20 with built‑in compliance features—such as KYC checks, transfer restrictions, tranche handling, on‑chain document storage, and forced transfer mechanisms—to enable legally compliant tokenization of real‑world assets like equity, bonds, and real‑estate.

Blockchain, ERC-1400, Ethereum
0 likes · 7 min read
AI Frontier Lectures
Jul 24, 2025 · Artificial Intelligence

State Space Models vs Transformers: Uncovering the Real Trade‑offs in Sequence Modeling

This article analyzes the fundamental differences between state space models (SSM) and Transformer architectures, highlighting their three core components, training efficiency, memory handling, tokenization impact, and empirical performance trade‑offs, and argues why SSMs can outperform Transformers on many sequence tasks.

AI Architecture, Sequence Modeling, Transformers
0 likes · 19 min read
Tencent Technical Engineering
May 12, 2025 · Artificial Intelligence

Comprehensive Summary and Expansion of Andrej Karpathy’s 7‑Hour LLM Lecture

This article provides a detailed Chinese‑to‑English summary of Andrej Karpathy’s 7‑hour LLM tutorial, covering chat process analysis, tokenization, pre‑training data pipelines, model architecture, training strategies, post‑training fine‑tuning, reinforcement learning, chain‑of‑thought reasoning, and current industry applications.

AI, LLM, Model architecture
0 likes · 25 min read
AI2ML AI to Machine Learning
Apr 17, 2025 · Artificial Intelligence

Inside Qwen: A Deep Dive into the Large Model’s Source Code

The article provides a comprehensive technical walkthrough of Qwen’s large‑model series, covering data preparation, tokenization, model tweaks, training settings, RLHF pipeline, Code‑Qwen specifics, Qwen2 and Qwen3 architectural changes, scaling‑law experiments, and detailed source‑code analysis with illustrative diagrams.

MoE, Model architecture, Qwen
0 likes · 7 min read
DevOps
Apr 13, 2025 · Artificial Intelligence

The Amazing Magic of GPT‑4o and a Speculative Technical Roadmap

This article reviews the breakthrough image‑generation capabilities of GPT‑4o, showcases diverse examples, and offers a detailed speculation on its underlying autoregressive architecture, tokenization methods, VQ‑VAE/GAN advances, and training strategies that could explain its performance.

AI research, GPT-4o, VQ-VAE
0 likes · 16 min read
58 Tech
Apr 11, 2025 · Artificial Intelligence

Optimization of Multimodal Visual Large Model Inference: Pre‑processing, ViT TensorRT, CUDA Graphs, Tokenization, Prefix Cache, and Quantization

This report details a comprehensive set of optimizations for multimodal visual large‑model (VLM) inference—including image pre‑processing acceleration, TensorRT integration for the ViT module, CUDA‑Graph replay, token‑count reduction, prefix‑cache handling, and weight quantization—demonstrating up to three‑fold throughput gains while maintaining accuracy.

CUDA Graph, TensorRT, inference-optimization
0 likes · 19 min read
AI Algorithm Path
Apr 10, 2025 · Artificial Intelligence

Beginner-Friendly Guide to Understanding Large Language Models

This article walks readers through the fundamentals of large language models, covering what tokens are, how tokenization works, the conversion of tokens to numeric IDs, the transformer architecture—including positional encoding, self‑attention, feed‑forward networks and softmax—and explains how these components enable next‑token prediction.

Embedding, LLM, Self-Attention
0 likes · 9 min read
Code Mala Tang
Mar 27, 2025 · Artificial Intelligence

How Do BPE, WordPiece, and SentencePiece Shape Modern NLP Tokenization?

This article explains the fundamentals, workflows, examples, and trade‑offs of three major subword tokenization algorithms—Byte Pair Encoding, WordPiece, and SentencePiece—helping practitioners choose the right method for their large language model pipelines.

BPE, NLP, SentencePiece
0 likes · 12 min read
NewBeeNLP
Jan 17, 2025 · Artificial Intelligence

Unlocking Multimodal Intelligence: A Deep Dive into Next Token Prediction

This comprehensive survey examines the foundations, tokenization techniques, model architectures, training paradigms, evaluation benchmarks, and open challenges of multimodal next‑token prediction (MMNTP), offering researchers a clear roadmap for future advances in multimodal AI.

Model architecture, Multimodal AI, Next Token Prediction
0 likes · 9 min read
Alibaba Cloud Developer
Dec 17, 2024 · Frontend Development

Choosing the Best LangChain Text Splitter for Frontend LLM Apps

This article compares five LangChain text splitters—CharacterTextSplitter, RecursiveCharacterTextSplitter, TokenTextSplitter, MarkdownTextSplitter, and LatexTextSplitter—by examining their principles, pros and cons, and ideal use cases, helping developers select the most suitable splitter for their frontend large‑model applications.

JavaScript, LLM, LangChain
0 likes · 10 min read
Infra Learning Club
Oct 31, 2024 · Artificial Intelligence

What Is a Token in Large Language Models?

The article explains that a token is the unit processed by large language models, describes three common tokenizer methods—word‑level, character‑level, and sub‑word level—with English and Chinese examples, discusses their advantages and limitations, and shows how OpenAI’s tokenizer varies across model versions.

NLP, Token, character-level
0 likes · 5 min read
Java Tech Enthusiast
Sep 15, 2024 · Fundamentals

How Source Code Is Transformed into Machine Instructions

A compiler transforms source code into executable machine instructions by first tokenizing the text into keywords, identifiers and literals, then parsing these tokens into an abstract syntax tree, generating and optimizing intermediate code, and finally assembling and linking the output for the target architecture or LLVM IR.

AST, LLVM, Machine Code
0 likes · 4 min read
21CTO
Aug 11, 2024 · Artificial Intelligence

Demystifying LLMs: How Tokens, Training, and Transformers Power Generative AI

This article explains the fundamentals of large language models, covering tokenization, probability prediction, Markov chain basics, training data limitations, context windows, and the transition to neural network architectures like Transformers, while providing Python examples and insights into model scaling and the illusion of intelligence.

AI, LLM, Neural Networks
0 likes · 18 min read
Architect
Aug 11, 2024 · Artificial Intelligence

Understanding Large Language Models: Tokens, Tokenization, and the Evolution from Markov Chains to Transformers

This article explains how generative AI models work by demystifying tokens, tokenization with tools like tiktoken, simple Markov‑chain training, the limitations of small context windows, and how modern LLMs use neural networks, transformers and attention mechanisms to predict the next token.

LLM, Markov chain, Transformer
0 likes · 20 min read
Java Tech Enthusiast
Jul 16, 2024 · Artificial Intelligence

LLMs Misjudge Simple Number Comparison: 9.11 vs 9.9

Recent tests reveal that popular large language models—including GPT‑4o, Gemini Advanced, and Claude 3.5—often claim 9.11 is larger than 9.9 because their tokenizers split the numbers, but rephrasing, zero‑shot chain‑of‑thought prompts, or treating the values as floating‑point numbers can correct the mistake, a pattern also seen variably in Chinese models.

AI Evaluation, LLM, Prompt engineering
0 likes · 7 min read
JD Cloud Developers
Jun 25, 2024 · Artificial Intelligence

Why Do Large Language Models Output Text Word‑by‑Word? Inside the Transformer Mechanics

This article explains the fundamental architecture of large language models, from the dual file nature of parameters and code, through neural network basics, perceptrons, and weight training, to the Transformer’s tokenization, positional encoding, self‑attention, and inference processes, illustrated with diagrams and examples.

Neural Network, Self-Attention, Transformer
0 likes · 22 min read
Python Programming Learning Circle
Nov 17, 2023 · Big Data

Building a Simple Search Engine with Bloom Filter, Tokenization, and Inverted Index in Python

This article demonstrates how to implement a basic big‑data search engine in Python by creating a Bloom filter for fast existence checks, designing tokenization functions for major and minor segmentation, building an inverted index, and supporting AND/OR queries with example code and execution results.

Big Data, Search, bloom-filter
0 likes · 12 min read
Meituan Technology Team
Sep 22, 2022 · Information Security

Tokenization for Data Security: Design, Implementation, and Engineering Practices

The article explains how tokenization transforms data security into a built‑in attribute that automatically scales with data growth, detailing its design principles, generation methods, architectural layers, security safeguards, and practical engineering experiences to address exposure risks in modern digital businesses.

Data Governance, PII, Security Architecture
0 likes · 24 min read
Programmer DD
Aug 30, 2022 · Artificial Intelligence

How to Build a Custom HanLP Analyzer Plugin for Elasticsearch with Nginx

This guide walks through setting up a Java GraalVM 17 environment, installing Nginx to serve static dictionary files, configuring a HanLP‑based Elasticsearch analyzer plugin, packaging and deploying it, and testing the analyzer with JUnit5 and curl commands.

Elasticsearch, HanLP, Java
0 likes · 14 min read
Baidu Geek Talk
Mar 21, 2022 · Frontend Development

How WebKit Parses HTML: Decoding, Tokenization, and DOM Tree Construction

The article details WebKit’s rendering pipeline in WKWebView, describing how the network process streams HTML bytes to the rendering process, which decodes them via TextResourceDecoder, tokenizes the characters with HTMLTokenizer’s state machine, and constructs an efficient DOM tree using HTMLTreeBuilder and queued insertion tasks.

DOM, WebKit, browser engine
0 likes · 33 min read
Python Crawling & Data Mining
Jun 16, 2021 · Artificial Intelligence

Master Chinese Text Segmentation with jieba: Installation, Modes, and Advanced Tricks

This tutorial walks you through installing the jieba Python library, explains its three segmentation modes—precise, full, and search—demonstrates how to add or delete words, manage custom dictionaries, handle stop words, perform weight analysis, adjust word frequencies, and retrieve token positions, all with clear code examples and visual output.

NLP, Python, chinese segmentation
0 likes · 10 min read
WecTeam
Oct 24, 2019 · Fundamentals

How to Build a JavaScript Lexer for Arithmetic Expressions Using a Finite State Machine

This article explains how to implement a lexical analyzer in JavaScript that tokenizes simple arithmetic expressions by using a finite state machine, covering the conversion from infix notation to an abstract syntax tree, token definitions, state transitions, and complete source code examples.

AST, Finite State Machine, JavaScript
0 likes · 9 min read
Java Captain
Feb 17, 2019 · Backend Development

Implementing a JSON Parser in Java: Structures, Tokenization, and Parsing

This article explains the fundamentals of JSON, its object and array structures, maps JSON types to Java equivalents, and provides a complete Java implementation of a JSON parser including token definitions, lexical analysis, and object/array construction with detailed code examples.

JSON, Java, Parser
0 likes · 14 min read
MaGe Linux Operations
Jan 23, 2019 · Big Data

How Bloom Filters Power Fast Big Data Searches with Python

This tutorial walks through building a simple Python search engine for big data, covering Bloom filter basics, tokenization with major and minor segmentation, inverted index creation, and implementing both simple and complex (AND/OR) queries, complete with code examples and visual illustrations.

AND/OR queries, Big Data, Python
0 likes · 15 min read
MaGe Linux Operations
Nov 27, 2018 · Big Data

How a Simple Python Bloom Filter Powers Fast Big Data Search

This article demonstrates how to implement a basic Bloom filter, tokenization, and inverted index in Python to illustrate the core principles of big‑data search, including fast negative lookups, term segmentation, and support for AND/OR queries.

AND/OR queries, big data search, bloom-filter
0 likes · 13 min read
High Availability Architecture
Jun 4, 2018 · Blockchain

Key Competitive Points of Public Chains and Insights from the GIAC Shenzhen Conference

The article summarizes the author’s takeaways from the GIAC Shenzhen conference, analyzing various public‑chain projects, their architectural choices, competitive focuses such as scalability, asset tokenization, DApp support, and the role of alliance chains in finance, traceability, and anti‑counterfeiting.

Scalability, UTXO, alliance chain
0 likes · 8 min read
Architects Research Society
Mar 31, 2018 · Blockchain

Understanding Blockchain Technology: Records, Identity, Smart Contracts, and Real‑World Applications

This article explains how blockchain, as a decentralized and trust‑less ledger, enables digital identity, tokenized assets, smart contracts, automated governance, and streamlined settlement across sectors such as finance, government, and healthcare, while also discussing its challenges and future potential.

decentralized ledger, digital identity, governance
0 likes · 7 min read
ITPUB
Dec 23, 2015 · Artificial Intelligence

How Computers Turn Words into Numbers: A Beginner’s Guide to Tokenization and Vector Similarity

This article explains how natural language processing stores word meanings as numeric vectors, builds token dictionaries, represents sentences as binary vectors, and uses dot‑product calculations to measure similarity, illustrating concepts with simple examples and highlighting current limitations and future directions.

NLP, artificial intelligence, tokenization
0 likes · 7 min read