Tagged articles
94 articles
AI Architect Hub
Apr 24, 2026 · Artificial Intelligence

RAG Level 1: Avoid Dirty Data Poisoning Your AI – A Data Cleaning Guide

This article explains why noisy documents cripple Retrieval‑Augmented Generation, enumerates common garbage data types, describes three typical data‑quality problems, warns against over‑cleaning, encoding, and regex pitfalls, and provides a configurable LangChain pipeline with deduplication and validation best practices.

AI, Embedding, LangChain
0 likes · 21 min read
Mike Chen's Internet Architecture
Dec 27, 2025 · Backend Development

Why Kafka Messages Duplicate and How to Prevent It

The article explains the main causes of duplicate Kafka messages—including producer retries, consumer offset handling, partition leader changes, and lack of idempotence—and provides practical configuration and design solutions to achieve exactly‑once delivery.

Consumer Offset, Message Duplication, Replication
0 likes · 5 min read
Sohu Tech Products
Oct 9, 2025 · Databases

When to Use SELECT DISTINCT vs GROUP BY in MySQL: Performance & Use Cases

This article compares MySQL’s SELECT DISTINCT and GROUP BY clauses, explaining their syntax, functional differences, performance implications, and ideal scenarios through detailed examples, index usage analysis, and a feature comparison table, helping developers choose the right approach for deduplication or aggregation tasks.

GROUP BY, SELECT DISTINCT, SQL Performance
0 likes · 10 min read
NiuNiu MaTe
Sep 22, 2025 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

Learn four practical techniques (simple sorting, hashmap deduplication, external merge sort, and bitmap bit‑set optimization) to efficiently remove duplicate QQ numbers from a 4‑billion‑record file while staying within a strict 1 GB memory limit, even under a tighter 100 MB constraint.

Big Data, Bitmap, algorithm
0 likes · 9 min read
ITPUB
Jul 29, 2025 · Big Data

How to Deduplicate 4 Billion QQ IDs Using a Bitmap Within 1 GB Memory

Learn how to efficiently remove duplicates from 4 billion QQ numbers using a memory‑friendly Bitmap approach that fits within a 1 GB limit, including calculations, step‑by‑step implementation, Java code, and a discussion of its advantages and drawbacks.

Big Data, Bitmap, Data Structures
0 likes · 9 min read
ITPUB
Jul 28, 2025 · Backend Development

How WeChat Guarantees No Lost Messages: The Secrets of Reliable IM Delivery

This article explains the three types of IM packets—Request, Acknowledge, and Notify—illustrates the basic message flow between client and server, identifies reliability gaps such as lost notifications, and proposes an application‑level solution using acknowledgments, timeout‑driven retransmission, and message deduplication to achieve dependable delivery.

Acknowledgment, Backend Architecture, IM
0 likes · 8 min read
Su San Talks Tech
Jul 17, 2025 · Big Data

How to De‑Duplicate 1 Billion QQ Numbers Using Under 1 GB of Memory

This article explores multiple techniques—including bitmap indexing, Bloom filters, external sorting, Spark, and layered bitmap structures—to efficiently remove duplicate QQ numbers from a dataset of up to one billion entries while keeping memory usage below a gigabyte and maintaining high accuracy.

Bitmap, Distributed Systems, Spark
0 likes · 12 min read
macrozheng
Apr 7, 2025 · Big Data

How to Deduplicate 4 Billion QQ Numbers Using a Bitmap Under 1 GB

This article explains how to efficiently remove duplicates from 4 billion QQ numbers within a 1 GB memory limit by replacing the naïve HashSet approach with a memory‑saving Bitmap data structure, complete with calculations, algorithm steps, Java code, and a discussion of its pros and cons.

Bitmap, Java, Memory Optimization
0 likes · 9 min read
IT Services Circle
Mar 8, 2025 · Backend Development

Handling Duplicate Messages in Message Queues: Semantics, Producer and Broker Deduplication, and Consumer Strategies

Message queues can deliver the same message more than once, which disrupts business processes that are not idempotent. This article explains the three delivery semantics (At Least Once, Exactly Once, At Most Once), the causes of duplication, and practical deduplication techniques for producers, brokers (Kafka, Pulsar), and consumers, with code examples.

Idempotence, Kafka, Pulsar
0 likes · 8 min read
Java Architect Essentials
Nov 24, 2024 · Big Data

Using Bitmap and Bloom Filter for Large‑Scale Data Deduplication in Java

The article explains how to store and deduplicate billions of identifiers by using a bitmap to represent presence with a single bit per value, calculates memory requirements, shows Redis bitmap commands, and introduces Bloom filters as an extension with multiple hash functions for efficient large‑scale data handling.

Bitmap, Data Structures, bloom-filter
0 likes · 5 min read
php Courses
Nov 22, 2024 · Backend Development

Understanding PHP's array_unique() Function: Definition, Implementation, Usage, and Performance Optimization

This article explains PHP's array_unique() function, covering its definition, parameters, implementation, usage examples, and performance optimization techniques, while providing complete code snippets and practical guidance for developers, including discussion of alternative approaches such as array_flip and array_keys for faster deduplication.

Backend, PHP, array_unique
0 likes · 5 min read
Alibaba Cloud Developer
Dec 25, 2023 · Big Data

How to Cut Data Cube Processing Time by 60% with Deduplication Optimization

This article explains how to dramatically reduce the cost of deduplication‑Cube calculations in large‑scale data pipelines by replacing costly data‑expansion steps with a UID‑level tagging approach, detailing the scenario, common methods, performance analysis, a new solution, implementation steps, and experimental results.

Big Data, SQL Optimization, data cube
0 likes · 15 min read
DataFunSummit
Dec 16, 2023 · Databases

Optimizing Precise Deduplication with Doris Bitmap: Architecture, Performance Enhancements, and Practical Practices

This article presents a comprehensive overview of precise deduplication in Meituan's Doris database, detailing the underlying bitmap data structures, aggregation bottlenecks, and a series of optimizations—including memory management, fast union, orthogonal encoding, and vectorized engine integration—that together achieve significant performance gains in high‑cardinality scenarios.

Bitmap, OLAP, database
0 likes · 20 min read
Ops Development Stories
Nov 10, 2023 · Backend Development

Master Java Stream API: Grouping, Sorting, Deduplication & More

This guide demonstrates essential Java Stream API techniques—including grouping collections into maps, extracting first entries, performing reductions to find max or min, converting streams to lists, sorting, removing duplicates, and a comprehensive list of frequently used stream operations—providing practical code examples for Java 11 and 17.

Java, Sorting, Stream API
0 likes · 7 min read
Architect's Tech Stack
Aug 3, 2023 · Fundamentals

Performance Comparison of Different Java List Deduplication Methods

This article examines several Java deduplication techniques—including List.contains, HashSet, double-loop removal, and Stream.distinct—by providing sample code, measuring execution time on a 20,000‑element list, and analyzing their time complexities to guide developers toward efficient duplicate‑removal strategies.

Collections, Java, Stream
0 likes · 7 min read
Architect's Guide
Jul 24, 2023 · Big Data

Using Bitmap and Bloom Filter to De‑duplicate 4 Billion IDs Within 1 GB Memory

The article explains how to store and de‑duplicate 4 billion unsigned integers using a bitmap to reduce memory from 14.9 GB to under 500 MB, introduces the concept and benefits of bitmaps, describes Bloom filters, their principles, advantages, limitations, typical use cases, and provides Java and Redis implementation examples.

Big Data, Bitmap, Data Structures
0 likes · 10 min read
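
As a rough illustration of the figures quoted in the summary above, this Python sketch redoes the memory arithmetic and shows a tiny bitmap deduplicator. It assumes 4 billion IDs stored as 4-byte integers and a bitmap with one bit per ID value below 4 billion. The `Bitmap` class and `dedupe` helper are hypothetical names chosen for illustration, not code from the article, which uses Java and Redis.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

n_ids = 4_000_000_000
raw_bytes = n_ids * 4            # assumption: one 4-byte integer per ID
bitmap_bytes = n_ids // 8        # one bit per possible ID value

raw_gib = raw_bytes / GIB        # about 14.9 GiB
bitmap_mib = bitmap_bytes / MIB  # about 476.8 MiB, i.e. under 500 MB


class Bitmap:
    """Fixed-size bit set: bit i is set once value i has been seen."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.bits = bytearray((size + 7) // 8)

    def add(self, value: int) -> bool:
        """Mark value as seen and return True if it was not seen before."""
        if not 0 <= value < self.size:
            raise ValueError(f"value {value} outside [0, {self.size})")
        byte_index, mask = value >> 3, 1 << (value & 7)
        is_new = not (self.bits[byte_index] & mask)
        self.bits[byte_index] |= mask
        return is_new


def dedupe(values, universe_size):
    """Return values in first-seen order with duplicates dropped."""
    seen = Bitmap(universe_size)
    return [v for v in values if seen.add(v)]


print(f"raw: {raw_gib:.1f} GiB, bitmap: {bitmap_mib:.1f} MiB")
print(dedupe([3, 1, 3, 2, 1], universe_size=8))
```

A bitmap covering the full unsigned 32-bit range would need 2^32 bits, which is 512 MiB, so the "under 500 MB" figure only holds if the ID values themselves stay below roughly 4 billion.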
Programmer DD
Jun 13, 2023 · Backend Development

Why list.contains Is So Slow: Java Deduplication Performance Showdown

This article compares several Java duplicate‑removal techniques—including list.contains, HashSet, double‑loop removal, and Stream.distinct—by generating a 20 000‑element test list, measuring execution time, and explaining the underlying algorithmic complexities that make some approaches dramatically faster than others.

List, Stream, deduplication
0 likes · 7 min read
dbaplus Community
May 19, 2023 · Backend Development

How Vivo Scaled Its E‑Commerce Inventory: Architecture & High‑Concurrency Solutions

Vivo’s e‑commerce inventory system evolved from a monolithic design into a multi‑layered service architecture that separates warehouse, scheduling, and sales layers, introduces distinct stock types, implements deduplication, anti‑oversell, high‑concurrency and hotspot mitigation strategies, and integrates sync mechanisms with warehouses and product systems.

System Design, deduplication, e‑commerce
0 likes · 16 min read
Network Intelligence Research Center (NIRC)
May 10, 2023 · Artificial Intelligence

How LLaMA Preprocesses Training Data with CCNet Before Model Training

Before training large language models like LLaMA, Meta AI applies a multi‑stage CCNet pipeline that crawls web data, stores it in WET format, deduplicates paragraphs, detects and filters languages using fastText, and further refines content by similarity to Wikipedia and citation‑based linear models.

CCNet, LLaMA, data preprocessing
0 likes · 7 min read
Huolala Tech
Apr 25, 2023 · Frontend Development

How to Solve Async Race Conditions and Caching in React with Custom Hooks

This article explains why React's synchronous component model struggles with async data fetching, demonstrates how to cancel stale requests, introduces a refetch‑only hook, and shows how to implement caching and deduplication to make queries pure functions, while recommending React Query and SWR for production use.

Async, Custom Hook, React
0 likes · 9 min read
Python Crawling & Data Mining
Nov 29, 2022 · Fundamentals

How to Simplify Image Filename Deduplication in Python

This article walks through a practical Python example for deduplicating image file names, compares an initial verbose implementation with a more concise solution, and demonstrates how to reduce redundant conditional checks for cleaner, more readable code.

Code Optimization, Python, deduplication
0 likes · 4 min read
Zhengcaiyun Technology (政采云技术)
May 31, 2022 · Fundamentals

Efficient Parent‑Child Relationship Deduplication Using Hashset Caching

This article presents an efficient deduplication algorithm for parent‑child relationship validation that caches intermediate results in a hashset to eliminate redundant computations, dramatically improving performance and scalability for large datasets by reducing verification steps through stored validated nodes.

Parent-Child, Performance Optimization, deduplication
0 likes · 11 min read
Top Architect
May 21, 2022 · Backend Development

Handling Duplicate Requests in Backend Services with Redis and Java

This article explains various techniques for detecting and preventing duplicate backend requests—using unique request IDs, business parameter hashing, MD5 summaries, and Redis SETNX with expiration—providing Java code examples and a complete deduplication utility.

Backend, MD5, Request ID
0 likes · 9 min read
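
The parameter-hashing and set-if-absent idea in the summary above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the article's Java utility: `request_fingerprint`, `InMemoryStore`, and `is_duplicate` are hypothetical names, the key layout is invented, and the in-memory store only stands in for a Redis `SET key value NX EX seconds` call so that the example runs without a server.

```python
import hashlib
import json
import time


def request_fingerprint(user_id, method, params, ignore=()):
    """Build a dedup key from the MD5 of the sorted JSON parameters."""
    filtered = {k: v for k, v in params.items() if k not in ignore}
    # sort_keys makes the serialization independent of key order
    canonical = json.dumps(filtered, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"dedup:U={user_id}:M={method}:P={digest}"


class InMemoryStore:
    """Stand-in for Redis: set a key only if absent, with an expiry."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}
        self._clock = clock

    def set_if_absent(self, key, ttl_seconds):
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False          # key still live: caller is a duplicate
        self._expiry[key] = now + ttl_seconds
        return True


def is_duplicate(store, user_id, method, params, ttl_seconds=1.0):
    """Return True if the same request was already seen within the TTL."""
    key = request_fingerprint(user_id, method, params)
    return not store.set_if_absent(key, ttl_seconds)


if __name__ == "__main__":
    store = InMemoryStore()
    print(is_duplicate(store, "u1", "pay", {"amount": 10, "to": "bob"}))
    print(is_duplicate(store, "u1", "pay", {"to": "bob", "amount": 10}))
```

Fields that legitimately differ between retries, such as a client timestamp, can be passed through `ignore` so they do not change the fingerprint.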
DeWu Technology
Apr 29, 2022 · Frontend Development

Optimization of Tinode Message Processing in a Customer Service Dashboard

The article examines Tinode‑based message‑processing bottlenecks in a multi‑channel customer‑service dashboard and proposes a suite of optimizations: global deduplication maps, binary‑search insertion sorting, cache reclamation, targeted status updates, and asynchronous keyword interception. Together these cut average first‑response time from 8.40 s to 6.82 s and overall response time from 19.9 s to 18.22 s, showing that careful cache design and algorithmic refinements can improve real‑time IM performance.

Message Optimization, Tinode, deduplication
0 likes · 9 min read
vivo Internet Technology
Mar 30, 2022 · Backend Development

Design of a Bloom Filter‑Based Video Recommendation Deduplication Service for Short Video Platforms

The paper proposes a Bloom‑filter‑based deduplication service for short‑video recommendation that moves three‑month playback histories to disk‑backed Bloom filters while keeping the latest 100 served IDs in Redis, employing write batching, sharding, expiration policies, and an incremental migration strategy to replace memory‑intensive Redis ZSets and dramatically reduce storage costs.

Data Migration, deduplication, disk KV
0 likes · 21 min read
Code Ape Tech Column
Jan 21, 2022 · Backend Development

Message Deduplication and Exactly-Once Semantics in RocketMQ: Strategies and Implementation

This article explains the challenges of at‑least‑once delivery in distributed message middleware like RocketMQ, examines simple and concurrent deduplication techniques, and presents both transaction‑based and non‑transactional exactly‑once solutions using database tables or Redis, along with practical Java code examples.

Exactly-Once, RocketMQ, deduplication
0 likes · 17 min read
Liangxu Linux
Dec 15, 2021 · Fundamentals

Cracking the 4‑Billion QQ Deduplication Challenge with 1 GB Memory

This article walks through four approaches—sorting, hashmap, file splitting, and a bitmap technique—to deduplicate 4 billion QQ numbers within a 1 GB memory limit, explains why the first three fail, and shows how a bitmap solves the problem efficiently.

Big Data, Bitmap, Memory Optimization
0 likes · 8 min read
Python Crawling & Data Mining
Dec 13, 2021 · Big Data

How to De‑duplicate 4 Billion QQ Numbers with Only 1 GB RAM

This article explains several algorithmic strategies—including sorting, hash maps, file splitting, and bitmap techniques—to remove duplicates from a file containing 4 billion QQ numbers while staying within a 1 GB memory limit, and it provides extension exercises for sorting, median, top‑K, and duplicate detection.

Big Data, Bitmap, Memory Optimization
0 likes · 8 min read
NiuNiu MaTe
Nov 26, 2021 · Big Data

How to Deduplicate 4 Billion QQ Numbers Using Only 1 GB of Memory

This article walks through four practical techniques—sorting, hashmap, file splitting, and bitmap—to remove duplicate QQ numbers from a 4‑billion‑record file within a 1 GB memory limit, and provides extended exercises for sorting, median, top‑K, and duplicate detection.

Big Data, Bitmap, algorithm
0 likes · 8 min read
Java Interview Crash Guide
Oct 13, 2021 · Backend Development

How to Achieve Exactly-Once Message Processing with RocketMQ Deduplication

Message middleware guarantees at-least-once delivery, but duplicate deliveries can cause issues; this article explains RocketMQ’s three duplication scenarios, explores simple and advanced deduplication strategies—including database-transaction and non-transactional approaches using Redis—and provides practical code samples for implementing reliable exactly-once processing.

Distributed Systems, Exactly-Once, Message Queue
0 likes · 21 min read
21CTO
Jun 3, 2021 · Backend Development

How to Prevent Duplicate Requests on the Server Using Redis and Request Hashing

This article explains how to handle duplicate user requests—especially write operations—by using unique request IDs with Redis, computing MD5 hashes of sorted JSON parameters, and providing a Java helper class to reliably deduplicate requests on the server side.

Java, MD5, deduplication
0 likes · 8 min read
Java Interview Crash Guide
May 31, 2021 · Backend Development

How to Prevent Duplicate Requests with Redis: A Complete Backend Deduplication Guide

This article explains why duplicate requests—especially write operations—can cause serious issues, outlines common duplication scenarios, and provides a comprehensive server‑side solution using unique request IDs, parameter hashing with MD5, and Redis SETNX with expiration to reliably detect and block repeats.

MD5, deduplication, duplicate request
0 likes · 10 min read
DataFunTalk
Apr 2, 2021 · Artificial Intelligence

Engineering Practices of the K‑Song Recommendation System at Tencent Music

This article presents a comprehensive technical overview of the K‑Song recommendation platform, covering its backend architecture, the evolution of recall strategies, feature management and ranking pipelines, large‑scale deduplication techniques, and the debugging and monitoring infrastructure that support high‑performance personalized music recommendations.

Debugging, K‑Song, Tencent Music
0 likes · 23 min read
21CTO
Feb 14, 2021 · Backend Development

How to Gracefully Prevent Duplicate Requests with Redis and Java

This article explains why duplicate requests—especially those that modify data—can be dangerous, outlines common causes, and provides a comprehensive server‑side solution using unique request IDs, parameter hashing, and a Java utility class with Redis to reliably deduplicate incoming calls.

Idempotency, Java, Request Handling
0 likes · 8 min read
Architect
Feb 14, 2021 · Backend Development

Message Idempotency and Exactly‑Once Processing in RocketMQ

This article explains why message middleware like RocketMQ guarantees at‑least‑once delivery, the resulting duplicate‑delivery problem, and presents both transaction‑based and non‑transactional idempotency solutions—including select‑for‑update, optimistic locking, and a Redis‑backed deduplication table—to achieve exactly‑once semantics in distributed systems.

Distributed Systems, Exactly-Once, RocketMQ
0 likes · 16 min read
Tencent Cloud Developer
Sep 9, 2020 · Big Data

Tencent Game Marketing Deduplication Service: Technical Evolution from TDW to ClickHouse

Tencent’s game marketing analysis system “EAS” evolved from inefficient TDW HiveSQL jobs and file‑heavy real‑time pipelines to a scalable ClickHouse‑based deduplication service that processes hundreds of thousands of daily activity counts in sub‑second time, offering fast, reliable, and maintainable participant deduplication for massive marketing campaigns.

ClickHouse, LevelDB, MPP
0 likes · 10 min read
IT Architects Alliance
Jul 27, 2020 · Operations

Why Tape Backup Is Failing and How Disk Backup Can Save Your Data

The article analyzes the growing limitations of tape backup, outlines a step‑by‑step migration to disk‑based backup using deduplication, compression and modern storage technologies, and explains how this transition improves reliability, cost efficiency and recovery speed for enterprises.

Backup, Data Protection, Operations
0 likes · 11 min read
Qunar Tech Salon
Aug 13, 2019 · Databases

Efficient Deduplication of Large MySQL Tables Using Indexes, Variables, and Window Functions

This article demonstrates how to efficiently remove duplicate rows from a million‑record MySQL table by comparing created_time and item_name, exploring various approaches such as correlated subqueries, joins, user‑defined variables, index optimization, window functions, and parallel execution with shell scripts and MySQL events to achieve significant performance gains.

Parallel Execution, SQL Performance, Window Functions
0 likes · 21 min read
Seewo Tech Circle
Aug 9, 2019 · Backend Development

Ensuring Reliable, Ordered, and Duplicate‑Free Messaging in IM Systems

This article explains the stringent reliability requirements of instant messaging—ordered delivery, low latency, no loss, and deduplication—analyzes causes of disorder such as multi‑process and multi‑thread architectures, and presents practical solutions including hash‑based routing, sequential IDs, push‑pull mechanisms, ACK optimization, and distributed ID generation.

Instant Messaging, Message Reliability, backend design
0 likes · 9 min read
DataFunTalk
May 17, 2019 · Big Data

Kuaishou Druid Platform Overview and Precise Deduplication Design

This article presents Kuaishou’s adoption of Apache Druid for massive real‑time analytics, explains why precise deduplication is required, details the platform’s architecture, the hashset and dictionary‑plus‑Bitmap deduplication designs, concurrency handling, performance optimizations, and outlines the future roadmap, providing practical insights for big‑data engineers.

Data Platform, Druid, Performance Optimization
0 likes · 18 min read
Python Crawling & Data Mining
Nov 30, 2018 · Backend Development

How to Eliminate Duplicate URLs in Large-Scale Python Crawlers

This article explains five practical techniques—list storage, in‑memory set, MD5 hashing, bitmap compression, and Bloom filter—to efficiently deduplicate URLs during large‑scale Python web crawling, highlighting their trade‑offs in speed, memory usage, and collision risk.

Data Structures, Python, bloom-filter
0 likes · 8 min read
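
Of the five techniques listed in the summary above, the Bloom filter is the least obvious, so here is a minimal Python sketch of one applied to URL deduplication. It is an illustration under assumptions rather than the article's code: the `BloomFilter` class, the `unique_urls` helper, the default sizes, and the choice of SHA-256 with double hashing are all hypothetical.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: derive all positions from two 64-bit slices
        # of a single SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force a non-zero step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7))
            for pos in self._positions(item)
        )


def unique_urls(urls, num_bits=1 << 20, num_hashes=4):
    """Return URLs not (probably) seen before, in first-seen order."""
    seen = BloomFilter(num_bits, num_hashes)
    fresh = []
    for url in urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh


if __name__ == "__main__":
    print(unique_urls([
        "https://example.com/a",
        "https://example.com/b",
        "https://example.com/a",
    ]))
```

The trade-off is the collision risk mentioned in the summary: a false positive makes the crawler skip a URL it has never visited, so the bit array and hash count need to be sized for the expected number of URLs.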
360 Quality & Efficiency
Jul 6, 2018 · Backend Development

Understanding Idempotency and How to Ensure It in Backend Systems

The article explains the mathematical definition of idempotency, its importance in preventing duplicate operations such as repeated payments or order creation, and presents practical strategies—including unique business IDs, optimistic locking, deduplication tables, distributed locks, token mechanisms, and payment buffering—to achieve reliable idempotent behavior in backend services.

Backend, Idempotency, Token
0 likes · 6 min read
Architects' Tech Alliance
Jun 29, 2017 · Operations

Comprehensive Guide to Data Backup Solutions and Architectures

This article compiles a series of detailed data backup resources, covering introductory concepts, backup software architectures, distributed indexing, key features like snapshots and deduplication, virtualization support, storage sizing, cloud optimization, and agent‑less solutions to help practitioners design reliable backup strategies.

Cloud Backup, Data Protection, Virtualization
0 likes · 9 min read
Architects' Tech Alliance
Jun 18, 2017 · Fundamentals

Differences and Implementation of Data Deduplication and Compression in Primary Storage and Flash Systems

This article explains the technical distinctions between data deduplication and compression, compares their use in backup versus primary storage environments, and details how major vendors implement these technologies in SSD and flash arrays, highlighting performance, architectural, and operational considerations.

FLASH, Primary Storage, compression
0 likes · 16 min read
Architects' Tech Alliance
Apr 10, 2017 · Information Security

Overview of Backup Technologies and Major Enterprise Backup Software

This article provides a comprehensive overview of backup concepts, various backup architectures such as Host, LAN, LAN‑free, Server‑free and Server‑less, evaluates leading enterprise backup solutions, and analyzes key features like deduplication, NDMP support, OS compatibility and maintainability.

Backup, Data Protection, Enterprise Software
0 likes · 15 min read
Architects' Tech Alliance
Aug 28, 2016 · Operations

Calculating Backup Storage Capacity and Performance Requirements

This article explains how to calculate the required capacity and performance for backup storage media, covering assumptions about data volume, retention policies, full and incremental backups, deduplication ratios, RAID configurations, and provides formulas to size storage and IOPS for reliable disaster recovery.

Backup, IOPS, RAID
0 likes · 8 min read
Architects' Tech Alliance
Aug 4, 2016 · Information Security

Reliability and High Availability of Backup Software Systems

This article examines how backup software ensures enterprise data reliability through media redundancy, server failover, load balancing, and both cold and high‑availability solutions for the management server, highlighting technologies such as GridStor, dual‑array clustering, and deduplication.

Backup, Data Protection, Reliability
0 likes · 11 min read
Architects' Tech Alliance
Jul 23, 2016 · Fundamentals

Understanding Data Deduplication and Compression in Backup Software

This article explains the core features of backup software, focusing on data deduplication and compression techniques, including source‑side, target‑side, and media‑side deduplication, parallel deduplication architectures, replication methods, and hardware snapshot integration, illustrated with Simpana and AnyBackup examples.

Backup, compression, deduplication
0 likes · 14 min read
21CTO
Jun 9, 2016 · Backend Development

Mastering Web Crawlers: From a 3‑Line Script to Scalable Distributed Scrapers

This article explains what a web crawler is, shows a minimal three‑line Python example, expands it into a functional crawler, identifies common shortcomings, and presents practical solutions such as parallelism, priority queues, DNS caching, Bloom‑filter deduplication, storage choices, and inter‑process communication for building robust distributed scrapers.

Parallelism, Web Crawling, deduplication
0 likes · 9 min read
Architect
Apr 23, 2016 · Artificial Intelligence

Architecture and Techniques of an E‑commerce Search Engine

The article explains the overall architecture of an e‑commerce search engine, covering indexing, static scoring, retrieval, title and store deduplication, query analysis and rewriting, and related big‑data and AI techniques used to improve relevance and diversity of search results.

Query Rewriting, deduplication, e‑commerce
0 likes · 14 min read
Efficient Ops
Mar 15, 2016 · Operations

How to Use Redis for Efficient Deduplication in Operations Data Analysis

This article explains practical methods for deduplicating and counting data in operational analytics using Redis, covering SET, ZSET, BITSET, HyperLogLog, and Bloom filter structures, their advantages, limitations, and suitable scenarios for real‑time and large‑scale metric calculations.

HyperLogLog, deduplication, redis
0 likes · 10 min read
Architects' Tech Alliance
Feb 12, 2016 · Industry Insights

Unlocking Massive Data Deduplication: PBBA Appliances vs Backup Software

Backup environments generate abundant duplicate data, making deduplication essential; this article examines how purpose‑built backup appliances (PBBA) and leading backup software implement variable‑length, global deduplication, compare scale‑out versus scale‑up architectures, and discuss performance trade‑offs and CPU bottlenecks.

Backup, PBBA, deduplication
0 likes · 7 min read
Architects' Tech Alliance
Jan 28, 2016 · Fundamentals

Deduplication and Compression Techniques in Primary Storage: Differences from Backup Scenarios

This article examines how deduplication and compression technologies, widely used in backup environments, are adapted for primary storage systems—particularly HDD arrays—by analyzing differences in I/O size, patterns, performance requirements, resource allocation, and implementation approaches of major vendors such as NetApp and EMC.

Backup, EMC, NetApp
0 likes · 8 min read
Architects' Tech Alliance
Jan 27, 2016 · Fundamentals

Deduplication and Compression Techniques in All‑Flash Arrays: Implementation Details and Vendor Comparisons

All‑flash storage arrays increasingly rely on deduplication combined with compression to extend SSD lifespan, and this article explains the underlying block‑level workflow, hash‑based fingerprinting, and key implementation differences among vendors such as EMC XtremIO, Pure Storage, and HP 3PAR.

Hashing, Pure Storage, SSD endurance
0 likes · 6 min read
Architects' Tech Alliance
Jan 6, 2016 · Fundamentals

Key Architectural Features and Technical Considerations of Flash Storage Systems

The article explains flash storage's low latency, high IOPS, and inline data protection features such as deduplication, compression, thin provisioning, while detailing scale‑out capabilities, symmetric A/A controller design, metadata management, global FTL functions, wear‑leveling, and power‑loss protection mechanisms.

SSD, deduplication, metadata management
0 likes · 8 min read
Architects' Tech Alliance
Sep 10, 2015 · Databases

How DD Boost Supercharges Oracle RMAN Backups and Deduplication

DD Boost integrates tightly with Oracle RMAN to provide a flexible, policy‑driven backup solution that allows DBAs to manage local and disaster‑recovery sites independently, simplifies deployment via a simple plugin, and dramatically improves performance by sending only unique data blocks to Data Domain for deduplication.

Backup, DD Boost, Data Domain
0 likes · 3 min read