Tag: deduplication


php中文网 Courses
Jun 11, 2025 · Backend Development

Master PHP’s array_unique(): Remove Duplicates and Sort Arrays Efficiently

Learn how to use PHP’s array_unique() function to eliminate duplicate values from arrays. The article covers its parameters, including the optional sort_flags argument (which controls how values are compared; the result keeps the original order and keys), and gives practical code examples of deduplication and sorting for both string and numeric arrays.

PHP · array_unique · arrays
4 min read
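
For readers comparing languages, here is a minimal Python sketch of the array_unique() behavior described above: it keeps the first occurrence of each value together with its original key. Comparing values by their string form only approximates PHP's default SORT_STRING flag for integers and plain strings (floats and booleans stringify differently in Python), and the helper name array_unique_like is hypothetical.

```python
def array_unique_like(arr: dict) -> dict:
    """Keep the first key/value pair for each distinct value, preserving keys."""
    seen = set()
    result = {}
    for key, value in arr.items():
        marker = str(value)  # rough stand-in for PHP's SORT_STRING comparison
        if marker not in seen:
            seen.add(marker)
            result[key] = value
    return result


print(array_unique_like({0: "a", 1: "b", 2: "a", 3: "c"}))  # {0: 'a', 1: 'b', 3: 'c'}
print(array_unique_like({0: 1, 1: "1", 2: 2}))              # {0: 1, 2: 2}
```

Note that key 2 is missing from the first result: like array_unique(), the sketch does not renumber keys.
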
macrozheng
Apr 7, 2025 · Big Data

How to Deduplicate 4 Billion QQ Numbers Using a Bitmap Under 1 GB

This article explains how to efficiently remove duplicates from 4 billion QQ numbers within a 1 GB memory limit by replacing the naïve HashSet approach with a memory‑saving Bitmap data structure, complete with calculations, algorithm steps, Java code, and a discussion of its pros and cons.

Big Data · Bitmap · Java
9 min read
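
The bitmap idea can be sketched in Python as follows, assuming non-negative integer IDs no larger than a known max_value. Each value maps to one bit (byte index n >> 3, bit n & 7), so covering the full unsigned 32-bit range means Bitmap(2**32), a 2**29-byte (512 MiB) bytearray. The class and function names are illustrative, not taken from the article's Java code.

```python
class Bitmap:
    """One bit per integer in [0, size_bits)."""

    def __init__(self, size_bits: int) -> None:
        self.size_bits = size_bits
        self.bits = bytearray((size_bits + 7) // 8)  # zero-initialised

    def add(self, n: int) -> None:
        self.bits[n >> 3] |= 1 << (n & 7)  # byte n // 8, bit n % 8

    def __contains__(self, n: int) -> bool:
        return bool(self.bits[n >> 3] & (1 << (n & 7)))


def dedup_with_bitmap(numbers, max_value: int) -> list:
    """Return numbers in first-seen order, skipping values whose bit is already set."""
    seen = Bitmap(max_value + 1)
    out = []
    for n in numbers:
        if n not in seen:
            seen.add(n)
            out.append(n)
    return out


print(dedup_with_bitmap([5, 3, 5, 0, 3, 7], max_value=7))  # [5, 3, 0, 7]
```

For billions of IDs you would stream the input from disk and, instead of collecting an output list, either emit values as they are first seen or scan the bitmap for set bits at the end (which also yields them in sorted order).
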
IT Services Circle
Mar 8, 2025 · Backend Development

Handling Duplicate Messages in Message Queues: Semantics, Producer and Broker Deduplication, and Consumer Strategies

Message queues can deliver the same message more than once, which breaks business processes that are not idempotent. This article explains the three delivery semantics (At Least Once, Exactly Once, At Most Once), the causes of duplication, and practical deduplication techniques for producers, brokers (Kafka, Pulsar), and consumers, with code examples.

Kafka · Pulsar · deduplication
8 min read
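
Broker-side deduplication is typically keyed on a producer identity plus an increasing sequence number, so a retried send carries the same sequence and can be recognized. The Python class below is a deliberately simplified model of that idea, not Kafka's or Pulsar's actual implementation (real brokers also handle producer restarts, partitions, and out-of-order batches); the class and method names are hypothetical.

```python
class SequenceDedupBroker:
    """Simplified model: accept a message only if its sequence number is
    higher than the last one accepted from the same producer."""

    def __init__(self) -> None:
        self.last_seq = {}  # producer id -> highest accepted sequence
        self.log = []       # accepted payloads, in append order

    def append(self, producer_id: str, seq: int, payload: str) -> bool:
        if seq <= self.last_seq.get(producer_id, -1):
            return False    # retry of something already persisted: drop it
        self.last_seq[producer_id] = seq
        self.log.append(payload)
        return True


broker = SequenceDedupBroker()
broker.append("p1", 0, "a")
broker.append("p1", 1, "b")
broker.append("p1", 1, "b")  # producer retried after a lost ack, dropped
broker.append("p2", 0, "x")  # sequences are tracked per producer
print(broker.log)            # ['a', 'b', 'x']
```
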
Java Architect Essentials
Nov 24, 2024 · Big Data

Using Bitmap and Bloom Filter for Large‑Scale Data Deduplication in Java

The article explains how to store and deduplicate billions of identifiers by using a bitmap to represent presence with a single bit per value, calculates memory requirements, shows Redis bitmap commands, and introduces Bloom filters as an extension with multiple hash functions for efficient large‑scale data handling.

Big Data · Bitmap · Bloom Filter
5 min read
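
A Bloom filter extends the bitmap by setting k bit positions per item, derived from hash functions, so it can hold arbitrary keys at the cost of occasional false positives (never false negatives). The Python sketch below derives its k positions from one SHA-256 digest by double hashing, which is a common technique but not necessarily the one the article uses. For n expected items and a target false-positive rate p, the usual sizing is m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.

```python
import hashlib


class BloomFilter:
    """m-bit Bloom filter using k positions derived by double hashing."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step, coprime with a power-of-two m
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means "probably present".
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))


bf = BloomFilter(num_bits=1 << 16, num_hashes=4)
for uid in ("1001", "1002", "1003"):
    bf.add(uid)
print(bf.might_contain("1002"))  # True (no false negatives)
```
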
php中文网 Courses
Nov 22, 2024 · Backend Development

Understanding PHP's array_unique() Function: Definition, Implementation, Usage, and Performance Optimization

This article explains PHP's array_unique() function, covering its definition, parameters, implementation, usage examples, and performance optimization techniques, while providing complete code snippets and practical guidance for developers, including discussion of alternative approaches such as array_flip and array_keys for faster deduplication.

Functions · PHP · array_unique
5 min read
Python Programming Learning Circle
Aug 30, 2024 · Fundamentals

Understanding Python Set Type: Definitions, Operations, and Applications

This article explains Python's set type, covering its definition, creation (including empty sets), key properties such as unordered uniqueness, conversion using set(), the eight basic and four augmented set operators, common set methods, and practical applications like data deduplication.

Python · data structures · deduplication
5 min read
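
A short illustration of the set features the summary lists, using only built-in Python syntax: converting a list with set() drops duplicates (and any ordering), the binary operators return new sets, and the augmented forms update the left operand in place. Note that {} creates an empty dict, so an empty set must be written set().

```python
readings = [3, 1, 3, 2, 1]
unique = set(readings)       # duplicates dropped; iteration order not guaranteed

a = {1, 2, 3}
b = {2, 3, 4}
union = a | b                # {1, 2, 3, 4}
intersection = a & b         # {2, 3}
difference = a - b           # {1}
symmetric = a ^ b            # {1, 4}
is_subset = {2, 3} <= a      # True

empty = set()                # {} would be an empty dict
a |= b                       # augmented union: a is now {1, 2, 3, 4}
```
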
DataFunSummit
Dec 16, 2023 · Databases

Optimizing Precise Deduplication with Doris Bitmap: Architecture, Performance Enhancements, and Practical Practices

This article presents a comprehensive overview of exact deduplication (exact distinct counting) in Meituan's Doris deployment, detailing the underlying bitmap data structures, aggregation bottlenecks, and a series of optimizations (memory management, fast union, orthogonal encoding, and vectorized engine integration) that together yield significant performance gains in high‑cardinality scenarios.

Bitmap · Database · Doris
20 min read
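
Why bitmaps suit exact distinct counting can be shown with a toy Python example: per-partition bitmaps can be merged with a bitwise OR and counted afterwards, whereas per-partition distinct counts cannot simply be added. Python integers stand in for bitmaps here; Doris's BITMAP type is built on compressed (Roaring) bitmaps, and the partition data below is made up for illustration.

```python
import operator
from functools import reduce


def to_bitmap(user_ids) -> int:
    """Encode a collection of small non-negative IDs as bits of a Python int."""
    bm = 0
    for uid in user_ids:
        bm |= 1 << uid
    return bm


def cardinality(bm: int) -> int:
    return bin(bm).count("1")  # number of set bits = number of distinct IDs


# Hypothetical per-partition user IDs (e.g. one list per day).
partitions = [[1, 5, 9], [5, 9, 12], [1, 40]]
bitmaps = [to_bitmap(p) for p in partitions]

naive_sum = sum(cardinality(b) for b in bitmaps)       # 8: double-counts 1, 5, 9
exact = cardinality(reduce(operator.or_, bitmaps, 0))  # 5: {1, 5, 9, 12, 40}
print(naive_sum, exact)
```
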
php中文网 Courses
Nov 14, 2023 · Backend Development

Using PHP’s array_unique() Function to Remove Duplicate Array Elements

This article explains PHP’s array_unique() function, detailing its syntax, parameters, return value, and optional sort_flags parameter (which controls how elements are compared), and provides code examples that demonstrate how to remove duplicate elements from arrays and how the chosen flag changes which elements count as duplicates.

PHP · array · array_unique
5 min read
政采云技术
Sep 19, 2023 · Big Data

Techniques for Processing Massive Data: Sorting, Querying, Top‑K, and Deduplication

This article explains core concepts and practical solutions for handling massive datasets that cannot fit into memory, covering batch processing, distributed sorting, bitmap indexing, hash‑based lookups, top‑K extraction, and deduplication techniques with code examples and multi‑machine strategies.

Big Data · bitmap indexing · deduplication
18 min read
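
One standard way to deduplicate data that does not fit in memory is to hash-partition it first: equal values always hash to the same bucket, so each bucket can be deduplicated on its own. The Python sketch below keeps the buckets as in-memory lists purely for illustration; in a real job each bucket would be a file on disk or a task on another machine, and the function name is hypothetical.

```python
import zlib


def partitioned_dedup(values, num_buckets: int = 4) -> list:
    """Deduplicate strings by hash-partitioning them, then deduplicating each bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for v in values:
        idx = zlib.crc32(v.encode("utf-8")) % num_buckets  # stable across runs
        buckets[idx].append(v)

    result = []
    for bucket in buckets:
        result.extend(dict.fromkeys(bucket))  # in-memory dedup of one small bucket
    return result


print(sorted(partitioned_dedup(["a", "b", "a", "c", "b", "d"], num_buckets=3)))
# ['a', 'b', 'c', 'd']
```

No cross-bucket check is needed, which is what lets the buckets be processed independently or in parallel.
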
Python Programming Learning Circle
Sep 14, 2023 · Fundamentals

Python Set Type Definition, Operations, and Applications

This article explains Python's set data type, covering its definition, unordered and unique characteristics, creation methods, basic and augmented operators, practical usage examples, handling techniques, and common application scenarios such as data deduplication.

Collection · data structures · deduplication
6 min read
Architect's Tech Stack
Aug 3, 2023 · Fundamentals

Performance Comparison of Different Java List Deduplication Methods

This article examines several Java deduplication techniques—including List.contains, HashSet, double-loop removal, and Stream.distinct—by providing sample code, measuring execution time on a 20,000‑element list, and analyzing their time complexities to guide developers toward efficient duplicate‑removal strategies.

HashSet · Java · Stream
7 min read
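
The complexity difference the article measures can be reproduced in Python terms: a List.contains-style check scans the output list for every element (O(n²) overall), while a hash-based approach, analogous to Java's LinkedHashSet, does one average O(1) lookup per element and still keeps first-occurrence order. These are Python analogues, not the article's Java code.

```python
def dedup_with_contains(items: list) -> list:
    """List.contains-style: linear scan of the output for every element, O(n^2)."""
    out = []
    for x in items:
        if x not in out:
            out.append(x)
    return out


def dedup_with_hash(items: list) -> list:
    """LinkedHashSet-style: hash lookups, O(n) on average, first-occurrence order kept."""
    return list(dict.fromkeys(items))


data = [4, 1, 4, 2, 1, 3]
print(dedup_with_contains(data))  # [4, 1, 2, 3]
print(dedup_with_hash(data))      # [4, 1, 2, 3]
```
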
Architect's Guide
Jul 24, 2023 · Big Data

Using Bitmap and Bloom Filter to De‑duplicate 4 Billion IDs Within 1 GB Memory

The article explains how to store and de‑duplicate 4 billion unsigned integers using a bitmap to reduce memory from 14.9 GB to under 500 MB, introduces the concept and benefits of bitmaps, describes Bloom filters, their principles, advantages, limitations, typical use cases, and provides Java and Redis implementation examples.

Big Data · Bitmap · Bloom Filter
10 min read
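
The memory figures in the summary follow from simple arithmetic, assuming each ID is stored as a plain 4-byte integer (a HashSet of boxed Java objects needs considerably more than this). Storing one bit per possible value instead gives the bitmap size; a bitmap spanning the full unsigned 32-bit range is 512 MiB, slightly more than one sized for exactly 4 billion values.

$$4\times10^{9}\ \text{IDs}\times 4\ \text{B} = 1.6\times10^{10}\ \text{B} \approx 14.9\ \text{GiB}$$

$$4\times10^{9}\ \text{bit} = 5\times10^{8}\ \text{B} \approx 476.8\ \text{MiB}$$

$$2^{32}\ \text{bit} = 2^{29}\ \text{B} = 512\ \text{MiB}$$
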
政采云技术
May 31, 2022 · Fundamentals

Efficient Parent‑Child Relationship Deduplication Using Hashset Caching

This article presents a deduplication algorithm for validating parent‑child relationships: nodes that have already been validated are cached in a HashSet, so later checks can reuse those results instead of repeating the same verification steps. Removing this redundant computation markedly improves performance and scalability on large datasets.

HashSet · algorithm · deduplication
11 min read
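
The caching idea can be sketched in Python under an assumed validation rule (the article's exact rule may differ): a node is valid if following its parent links reaches a root without hitting a cycle or a missing parent. Nodes proven valid or invalid are cached in sets, so a later walk stops as soon as it meets a cached node instead of re-checking the rest of the chain. The function name and data layout are hypothetical.

```python
from typing import Dict, List, Optional, Set


def validate_hierarchy(parent: Dict[str, Optional[str]]) -> Set[str]:
    """Return the nodes whose ancestor chain ends at a root (parent None)."""
    valid: Set[str] = set()    # cache: nodes already proven valid
    invalid: Set[str] = set()  # cache: nodes already proven invalid
    for start in parent:
        path: List[str] = []
        on_path: Set[str] = set()
        node = start
        ok = False
        while True:
            if node in valid:  # reuse earlier work
                ok = True
                break
            if node in invalid or node in on_path or node not in parent:
                break          # known bad, cycle, or dangling parent
            path.append(node)
            on_path.add(node)
            if parent[node] is None:  # reached a root
                ok = True
                break
            node = parent[node]
        (valid if ok else invalid).update(path)
    return valid


tree = {"a": None, "b": "a", "c": "b", "x": "y", "y": "x", "z": "missing"}
print(sorted(validate_hierarchy(tree)))  # ['a', 'b', 'c']
```

Each node is pushed onto a walk at most once before being cached, so the whole check is linear in the number of nodes.
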
Top Architect
May 21, 2022 · Backend Development

Handling Duplicate Requests in Backend Services with Redis and Java

This article explains several techniques for detecting and preventing duplicate backend requests (unique request IDs, hashing of business parameters, MD5 digests, and Redis SETNX with expiration) and provides Java code examples along with a complete deduplication utility.

Duplicate Request · Java · MD5
9 min read
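
A minimal Python sketch of the business-parameter approach, under assumptions: the parameters are JSON-serializable, volatile fields (a client timestamp here) are excluded before hashing, and an in-memory class stands in for Redis's `SET key value NX PX <ttl>`. The helper names are hypothetical. Unlike Redis, the stand-in is neither shared across servers nor safe under concurrent access.

```python
import hashlib
import json
import time


def request_dedup_key(user_id: str, method: str, params: dict, exclude=()) -> str:
    """Build a dedup key from user, method and an MD5 digest of the business params."""
    kept = {k: v for k, v in params.items() if k not in exclude}  # drop volatile fields
    canonical = json.dumps(kept, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"dedup:{user_id}:{method}:{digest}"


class InMemoryNX:
    """Stand-in for Redis SET ... NX PX: set only if the key is absent or expired."""

    def __init__(self) -> None:
        self._expires_at = {}

    def set_nx(self, key: str, ttl_seconds: float, now=None) -> bool:
        now = time.monotonic() if now is None else now
        expires_at = self._expires_at.get(key)
        if expires_at is not None and expires_at > now:
            return False  # an identical request is still inside its window
        self._expires_at[key] = now + ttl_seconds
        return True


store = InMemoryNX()
k1 = request_dedup_key("u1", "/order/create", {"sku": 42, "qty": 1, "ts": 111}, exclude=("ts",))
k2 = request_dedup_key("u1", "/order/create", {"qty": 1, "sku": 42, "ts": 222}, exclude=("ts",))
print(k1 == k2)                        # True: same business params
print(store.set_nx(k1, 1.0, now=0.0))  # True: first request proceeds
print(store.set_nx(k2, 1.0, now=0.5))  # False: duplicate within 1 s
```
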
DeWu Technology
Apr 29, 2022 · Frontend Development

Optimization of Tinode Message Processing in a Customer Service Dashboard

The article examines message‑processing bottlenecks in a Tinode‑based, multi‑channel customer‑service dashboard and proposes a set of optimizations: global deduplication maps, binary‑search insertion sorting, cache reclamation, targeted status updates, and asynchronous keyword interception. Together they cut the average first‑response time from 8.40 s to 6.82 s and the overall response time from 19.9 s to 18.22 s, showing that careful cache design and algorithmic refinements can noticeably improve real‑time IM performance.

Cache · Message Optimization · Performance
9 min read
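
A small Python sketch of two of the listed optimizations, assuming each message carries a unique ID and an integer sequence number (the names here are hypothetical, not Tinode's client API): a dictionary rejects duplicate deliveries in O(1) on average, and bisect.insort finds the insertion point by binary search, so out-of-order arrivals need no full re-sort. The list insertion itself still shifts elements, which is usually acceptable for a chat-sized window.

```python
import bisect


class MessageList:
    """Messages kept sorted by sequence number, with a global map for deduplication."""

    def __init__(self) -> None:
        self._by_id = {}  # message id -> body (the dedup map)
        self._order = []  # sorted list of (seq, message id)

    def add(self, msg_id: str, seq: int, body: str) -> bool:
        if msg_id in self._by_id:
            return False                           # duplicate delivery: reject
        self._by_id[msg_id] = body
        bisect.insort(self._order, (seq, msg_id))  # binary search for the slot
        return True

    def bodies(self) -> list:
        return [self._by_id[mid] for _, mid in self._order]


msgs = MessageList()
msgs.add("m2", 2, "second")
msgs.add("m1", 1, "first")   # arrives out of order
msgs.add("m2", 2, "second")  # duplicate delivery, ignored
msgs.add("m3", 3, "third")
print(msgs.bodies())         # ['first', 'second', 'third']
```
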
Laravel Tech Community
Apr 11, 2022 · Databases

Three Methods for SQL Deduplication: DISTINCT, GROUP BY, and ROW_NUMBER

This article explains three common SQL deduplication techniques—using DISTINCT, GROUP BY, and the ROW_NUMBER window function—illustrating their syntax, performance considerations, and example queries for counting unique task identifiers in relational databases.

DISTINCT · GROUP BY · ROW_NUMBER
4 min read
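
The three approaches can be compared side by side with Python's built-in sqlite3 module. The Task table and its columns (task_id, order_id, start_time) are assumptions for illustration, and the ROW_NUMBER() query needs SQLite 3.25 or later. One behavioral difference worth knowing: COUNT(DISTINCT task_id) ignores NULLs, while GROUP BY places all NULLs in a single group. ROW_NUMBER() is the heavier option, but it is the one to use when you need the entire first row per task rather than only a count.

```python
import sqlite3


def count_unique_tasks(conn: sqlite3.Connection):
    """Count distinct task_id values three ways; all agree when task_id has no NULLs."""
    by_distinct = conn.execute("SELECT COUNT(DISTINCT task_id) FROM Task").fetchone()[0]
    by_group = conn.execute(
        "SELECT COUNT(*) FROM (SELECT task_id FROM Task GROUP BY task_id) AS g"
    ).fetchone()[0]
    by_row_number = conn.execute(
        """
        SELECT COUNT(*) FROM (
            SELECT task_id,
                   ROW_NUMBER() OVER (PARTITION BY task_id ORDER BY start_time) AS rn
            FROM Task
        ) AS t
        WHERE rn = 1
        """
    ).fetchone()[0]
    return by_distinct, by_group, by_row_number


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Task (task_id INTEGER, order_id TEXT, start_time TEXT)")
conn.executemany(
    "INSERT INTO Task VALUES (?, ?, ?)",
    [(1, "a", "2021-01-01"), (1, "b", "2021-01-02"), (2, "c", "2021-01-01"),
     (3, "d", "2021-01-03"), (3, "e", "2021-01-04")],
)
print(count_unique_tasks(conn))  # (3, 3, 3)
```
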
vivo Internet Technology
Mar 30, 2022 · Backend Development

Design of a Bloom Filter‑Based Video Recommendation Deduplication Service for Short Video Platforms

The article proposes a Bloom‑filter‑based deduplication service for short‑video recommendation that moves three‑month playback histories to disk‑backed Bloom filters while keeping the latest 100 served IDs in Redis. It uses write batching, sharding, expiration policies, and an incremental migration strategy to replace memory‑intensive Redis ZSets and substantially reduce storage costs.

Bloom Filter · Redis · data migration
21 min read
Code Ape Tech Column
Jan 21, 2022 · Backend Development

Message Deduplication and Exactly-Once Semantics in RocketMQ: Strategies and Implementation

This article explains the challenges of at‑least‑once delivery in distributed message middleware like RocketMQ, examines simple and concurrent deduplication techniques, and presents both transaction‑based and non‑transactional exactly‑once solutions using database tables or Redis, along with practical Java code examples.

Redis · Transaction · deduplication
17 min read
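
A compact Python sketch of a non-transactional approach, assuming a record per message ID with a status (CONSUMING or CONSUMED) and an expiry; a plain dict stands in for the database table or Redis keys, and the names consume_once and Outcome are hypothetical. In real storage the initial claim has to be atomic (for example a unique-key insert or Redis SET with NX), which a dict in a single process does not demonstrate.

```python
from enum import Enum


class Outcome(Enum):
    PROCESSED = "processed"
    DUPLICATE = "duplicate"      # already consumed: ack and skip
    RETRY_LATER = "retry_later"  # another consumer is mid-way: redeliver later


def consume_once(store: dict, msg_id: str, handler, now: float,
                 processing_ttl: float = 60.0) -> Outcome:
    """Run handler at most once per msg_id using a CONSUMING/CONSUMED state record."""
    record = store.get(msg_id)
    if record is not None:
        status, expires_at = record
        if status == "CONSUMED":
            return Outcome.DUPLICATE
        if expires_at > now:
            return Outcome.RETRY_LATER
        # An expired CONSUMING record: assume that consumer died and take over.
    store[msg_id] = ("CONSUMING", now + processing_ttl)  # must be atomic in real storage
    try:
        handler()
    except Exception:
        del store[msg_id]  # let a redelivery try again
        raise
    store[msg_id] = ("CONSUMED", float("inf"))  # a real store would use a long TTL
    return Outcome.PROCESSED


store = {}
calls = []
print(consume_once(store, "m1", lambda: calls.append("m1"), now=0.0))  # Outcome.PROCESSED
print(consume_once(store, "m1", lambda: calls.append("m1"), now=5.0))  # Outcome.DUPLICATE
print(len(calls))                                                      # 1
```
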
Laravel Tech Community
Nov 11, 2021 · Databases

Using DISTINCT, GROUP BY, and ROW_NUMBER for De‑duplication in SQL

This article explains how to remove duplicate rows in SQL by using DISTINCT, GROUP BY, and the ROW_NUMBER window function, compares their syntax and performance, and provides concrete examples with a Task table and a Test helper table.

DISTINCT · GROUP BY · ROW_NUMBER
4 min read