Tag

BigData

DataFunTalk
Dec 28, 2024 · Big Data

Next‑Generation Data Analysis Platform: Integrating Chat BI and Headless BI

This article examines the current challenges of enterprise data analysis platforms, outlines three traditional analysis modes, and presents a next‑generation solution that combines Headless BI’s semantic modeling with Chat BI’s large‑language‑model interaction to deliver a more efficient, secure, and user‑friendly analytics experience.

BigData · ChatBI · DataGovernance
0 likes · 15 min read
Selected Java Interview Questions
Sep 28, 2024 · Big Data

Using Bitmap and Bloom Filter for Large-Scale Data Deduplication in Java

The article explains how to store and deduplicate billions of identifiers efficiently by using a bitmap backed by Redis and extending it with a Bloom filter implementation in Java, highlighting memory calculations, practical commands, and code examples.

BigData · Bitmap · BloomFilter
0 likes · 5 min read
360 Smart Cloud
May 28, 2024 · Big Data

HDFS Upgrade from 2.6.0‑cdh to 3.1.2 with DataNode Federation and Mixed Deployment

This article details the background, planning, step‑by‑step procedures, issues encountered, and rollback strategies for upgrading a Hadoop HDFS cluster from version 2.6.0‑cdh to 3.1.2, including the mixed deployment of DataNodes across different federations and the necessary configuration changes.

BigData · Cluster · DataNode
0 likes · 16 min read
JD Tech
Mar 14, 2024 · Databases

JD ElasticSearch Supports ZSTD Compression: Implementation, Performance Evaluation, and Usage Guide

This article explains how JD ElasticSearch has integrated the high‑performance ZSTD compression algorithm, details the motivations behind its adoption, presents benchmark results comparing it with LZ4 and best_compression, and provides step‑by‑step instructions and code snippets for configuring and using the new jd_zstd codec in Elasticsearch.

BigData · Java · Lucene
0 likes · 14 min read
DataFunSummit
Nov 15, 2023 · Big Data

Alibaba Cloud DataWorks Intelligent Data Modeling: Practices and Insights

This article introduces Alibaba Cloud DataWorks' intelligent data modeling tool, outlines the data demand flow, shares best practices and practical demonstrations of data warehouse modeling, discusses model application and data asset management, and answers common questions while highlighting its commercial availability.

AlibabaCloud · BigData · DataGovernance
0 likes · 12 min read
DataFunTalk
Oct 27, 2023 · Big Data

PrestoDB vs Trino: Testing, Selection, Alluxio Acceleration, and Deployment Practices at Zhihu

This article details Zhihu's evaluation of PrestoDB and Trino, the integration of Alluxio for query acceleration, the architectural choices and deployment modes, extensive TPC‑DS and production performance tests, encountered challenges, and future optimization directions for their OLAP platform.

Alluxio · BigData · Caching
0 likes · 28 min read
DataFunSummit
Mar 6, 2023 · Big Data

Building a Unified Scheduling Center with Apache DolphinScheduler: Lenovo’s Practice

This article details Lenovo’s implementation of a unified scheduling center using Apache DolphinScheduler, covering background requirements, reasons for choosing the platform, architectural evolution, feature enhancements, and practical deployments such as HTTP task parameter passing, Java task plugins, global parameters, and future roadmap.

BigData · DolphinScheduler · Lenovo
0 likes · 19 min read
Tencent Cloud Developer
Mar 1, 2023 · Big Data

We Analysis User Profiling System: Architecture and Technical Implementation

We Analysis, the official data‑analysis platform for WeChat mini‑program providers, offers a zero‑learning‑curve user‑profiling system that combines basic tag analysis with flexible, rule‑based segmentation. An ETL pipeline stores pre‑computed data in TDSQL, while online queries run in ClickHouse with RoaringBitmap‑based bitmap optimization, keeping analytics low‑latency, stable, and comprehensive.

BigData · ClickHouse · DataPipeline
0 likes · 20 min read
Big Data Technology Architecture
Feb 15, 2023 · Databases

ClickHouse Usage Guide: Table Engines, Best Practices, and Cluster Architecture

This comprehensive guide introduces ClickHouse as a high‑performance columnar DBMS, outlines its main application scenarios, details the various table engines and their creation syntax, and provides practical development, deployment, and operational recommendations for building reliable ClickHouse clusters.

BigData · ClickHouse · ClusterArchitecture
0 likes · 22 min read
Java Architect Essentials
Jan 31, 2023 · Big Data

Optimizing Large-Scale Data Retrieval: ClickHouse Pagination, Elasticsearch Scroll Scan, ES+HBase, and RediSearch + RedisJSON Solutions

This article examines a business requirement to filter and rank up to 100,000 records from a pool of tens of millions, presenting and evaluating four technical solutions—multithreaded ClickHouse pagination, Elasticsearch scroll‑scan deep paging, an ES‑HBase combined query, and a RediSearch + RedisJSON approach—along with performance data and code examples.

BigData · ClickHouse · HBase
0 likes · 12 min read
Sohu Tech Products
Jan 18, 2023 · Big Data

Root Cause Analysis of Flink TaskManager Failover Causing Data Reprocessing and Business Impact

An incident report details how a scheduled machine reboot on Alibaba Cloud triggered a Flink TaskManager failover, leading to excessive data replay, increased ES pressure, and significant business latency, and explains the root cause involving disabled checkpoints and timestamp‑based offset consumption.

BigData · Checkpoint · Failover
0 likes · 10 min read
Top Architect
Jan 7, 2023 · Big Data

Real‑time Data Processing with ElasticSearch, Kibana and Logstash: Installation, CRUD, Bulk Import, and Data Transformation

This tutorial walks through building a real‑time data processing pipeline using ElasticSearch, Kibana and Logstash, covering core big‑data concepts such as data volume, velocity, variety and veracity, detailed installation steps, CRUD operations, bulk data import, Java‑based data conversion, and Logstash pipeline configuration with filters and date parsing.

BigData · BulkImport · DataPipeline
0 likes · 31 min read
Architect
Dec 19, 2022 · Databases

Understanding Elasticsearch DSL Query Syntax (7.x)

This article provides a comprehensive guide to Elasticsearch 7.x DSL query syntax, explaining core keywords, field mappings, various query types such as match, term, range, fuzzy, and bool, and includes practical code examples for building effective search queries.

BigData · DSL · Elasticsearch7
0 likes · 8 min read
Zhengcaiyun Technology (政采云技术)
Jul 12, 2022 · Big Data

Understanding Spark SQL Physical Execution Plans and Optimization Techniques

This article explains Spark SQL's physical execution plan, detailing each operator, how to interpret the plan, and practical optimization tips for data warehouse developers to improve SQL performance and resource utilization.

BigData · DataWarehouse · ExecutionPlan
0 likes · 10 min read
AntTech
Jun 28, 2022 · Operations

AntMonitor: Evolution, Features, and Core Technologies of Ant Group’s Observability Platform

The article details Ant Group’s AntMonitor observability platform, covering its development timeline, holographic monitoring capabilities, integrated performance analysis, efficient data integration, built‑in AI‑driven analytics, Monitoring‑as‑a‑Service, and the underlying high‑performance time‑series database and cloud‑native architecture that support massive real‑time data processing.

AIOps · BigData · CloudNative
0 likes · 17 min read
Sohu Tech Products
Mar 23, 2022 · Big Data

Microservice Tracing with Zipkin and StarRocks: Architecture and Practice

This article describes how Sohu Intelligent Media built a microservice tracing system using Zipkin for data collection and StarRocks for storage and analysis, covering architecture, data model, ingestion pipeline, SQL analytics, performance monitoring, and future improvements.

BigData · Microservice · StarRocks
0 likes · 27 min read
Architecture Digest
Oct 16, 2021 · Backend Development

Reflections on Technology Choices: Efficiency, Environment, and Team in Backend and Big Data Development

The author shares a personal journey through Java backend development, big‑data frameworks, database evolution, and team decision‑making, analyzing efficiency, environmental influences, and the impact of community and leadership on technology selection, while emphasizing practical trade‑offs over theoretical performance gains.

BigData · Java · TeamManagement
0 likes · 31 min read
DataFunTalk
Aug 28, 2021 · Databases

ClickHouse Projection: Concepts, Use Cases, Implementation and Production Benefits

This article presents an in‑depth overview of ClickHouse's Projection feature, covering its background, definition, storage and query mechanisms, practical use‑case demonstrations, performance comparisons with competing OLAP systems, and real‑world production results that highlight its advantages and limitations.

BigData · ClickHouse · DataWarehouse
0 likes · 20 min read
Big Data Technology Architecture
May 6, 2021 · Databases

Elasticsearch Pagination: From+size, search_after, and Scroll – Differences, Advantages, and Use Cases

This article explains Elasticsearch’s three pagination methods—From + size, search_after, and Scroll—detailing their definitions, code examples, advantages, disadvantages, and suitable scenarios, while also discussing max_result_window limits, PIT views, and best practices for handling large result sets.

BigData · backend · elasticsearch
0 likes · 13 min read
Big Data Technology Architecture
Mar 10, 2021 · Big Data

Implementing a Spark DataSource for REST JSON Interfaces

This article explains how to create a custom Spark DataSource that reads JSON data from a standard REST API, covering the design of DefaultSource, schema inference, data fetching, and integration with Spark SQL for seamless downstream processing.

BigData · DataSource · JSON
0 likes · 12 min read