Tagged articles
138 articles
Page 1 of 2
ITPUB
Feb 13, 2026 · Big Data

Real‑Time Sync of New MySQL Tables to Doris Using Flink CDC

This article explains how to extend a Flink CDC job that already syncs an entire MySQL database to Doris so that newly created tables are automatically created in Doris in real time, using the CdcTools utility, side‑output streams, and asynchronous I/O.

CDC · CdcTools · Flink
0 likes · 9 min read
ITPUB
Jan 22, 2026 · Backend Development

Sync New MySQL Tables to Doris in Real‑Time with Flink CDC and CdcTools

This article explains how to use Flink CDC together with the CdcTools utility to automatically capture newly created MySQL tables and synchronize both their schema and data to a Doris database in real time, covering the required code, side‑output handling, async execution, and a special delete‑sign field.

Async IO · CDC · Flink
0 likes · 10 min read
Big Data Technology Tribe
Jan 20, 2026 · Big Data

Extending Spark SQL with LanceSparkSessionExtensions: A Complete Guide

This article explains how to inject the LanceSpark plugin into Spark, covering the core LanceSparkSessionExtensions class, various ways to register extensions, the custom parser and planner strategy implementations, and the underlying Spark mechanisms such as injectParser, injectPlannerStrategy, and PredicateHelper.

DataSourceV2 · LanceSpark · PlannerStrategy
0 likes · 14 min read
Java Architecture Diary
Nov 5, 2025 · Fundamentals

Why Scala Has Been Ahead of Java for Over a Decade – A Feature‑by‑Feature Comparison

The article examines the recent push for modern features in Java, contrasts them with Scala implementations that have existed for years, and walks through functional programming, pattern matching, immutable collections, type inference, string interpolation, sealed classes, and concurrency with side‑by‑side code examples in both languages.

Immutable Collections · Scala · concurrency
0 likes · 17 min read
Big Data Technology Tribe
Aug 5, 2025 · Big Data

How Spark’s Catalyst Optimizer Transforms SQL Queries: Trees, Rules, and Code Generation

This article explains Spark SQL’s Catalyst optimizer, describing its extensible design, tree‑based representation, rule‑driven transformations, batch execution to a fixed point, and how Scala’s pattern matching and quasiquotes enable efficient analysis, logical optimization, physical planning, and code generation.
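The idea of rule-driven tree transformations applied in a batch until a fixed point is reached can be illustrated with a small sketch. This is not Catalyst's API and is written in Python rather than Scala; the node classes (`Lit`, `Attr`, `Add`), the two rules, and `run_batch` are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lit:
    value: int


@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Add:
    left: object
    right: object


def fold_constants(node):
    # Rule: Add(Lit(a), Lit(b)) -> Lit(a + b)
    if isinstance(node, Add) and isinstance(node.left, Lit) and isinstance(node.right, Lit):
        return Lit(node.left.value + node.right.value)
    return node


def drop_zero(node):
    # Rule: x + 0 -> x and 0 + x -> x
    if isinstance(node, Add):
        if node.right == Lit(0):
            return node.left
        if node.left == Lit(0):
            return node.right
    return node


def transform_up(node, rule):
    # Rewrite the children first, then apply the rule to the rebuilt node.
    if isinstance(node, Add):
        node = Add(transform_up(node.left, rule), transform_up(node.right, rule))
    return rule(node)


def run_batch(tree, rules, max_iterations=100):
    # Apply every rule in order, and repeat until the tree stops changing.
    for _ in range(max_iterations):
        new_tree = tree
        for rule in rules:
            new_tree = transform_up(new_tree, rule)
        if new_tree == tree:
            return tree
        tree = new_tree
    return tree


tree = Add(Attr("x"), Add(Lit(1), Add(Lit(2), Lit(-3))))
simplified = run_batch(tree, [fold_constants, drop_zero])
```

Here the nested constants fold to `Lit(0)`, the second rule then removes the addition of zero, and a second pass confirms that nothing else changes.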

Big Data · Catalyst Optimizer · Scala
0 likes · 18 min read
Bitu Technology
Mar 21, 2025 · Backend Development

Optimizing Redis Latency for an Online Feature Store: A Batch Query Case Study

This article describes how Tubi improved the latency of its Redis‑backed online feature store for machine‑learning inference by analyzing query patterns, measuring client‑side bottlenecks, and applying optimizations such as binary Avro encoding, MGET usage, virtual partitioning, and parallel deserialization to meet a sub‑10 ms SLA.

Feature Store · Latency · MLOps
0 likes · 9 min read
21CTO
Feb 4, 2025 · Big Data

Why Python Beats Java and Scala for Modern Data Engineering

The article compares Java, Scala, SQL, and Python for data‑engineering tasks, arguing that Python’s versatility, rich ecosystem, and ease of use make it the preferred language for both small‑scale and massive Spark workloads despite its performance trade‑offs.

Big Data · Scala · Spark
0 likes · 7 min read
DataFunSummit
Nov 11, 2024 · Big Data

Understanding Spark SQL Parsing Layer and Its Optimizations

This talk, the third in a Spark series, introduces the Spark SQL parsing layer, explains its architecture and integration with ANTLR4, details core implementation classes, and presents a real‑world optimization case that reduces code complexity and improves maintainability.

Antlr4 · Big Data · Scala
0 likes · 15 min read
Java Architecture Stack
Oct 18, 2024 · Big Data

How to Fix Spark OOM Errors: Practical Memory & Performance Tuning

This guide analyzes common Spark Out‑Of‑Memory scenarios—such as massive data volumes, data skew, and improper resource allocation—and provides step‑by‑step configurations, memory‑management tweaks, partitioning strategies, and shuffle optimizations to prevent OOM failures in production.

Big Data · Memory Tuning · OOM
0 likes · 8 min read
MaGe Linux Operations
Feb 10, 2024 · Backend Development

Mastering the Sidecar Pattern: Log Collection, Request Forwarding, and Interception in Kubernetes

This article explains the sidecar concept, compares it with SDK approaches, and provides detailed Kubernetes examples—including a log‑collection sidecar, a request‑forwarding sidecar, and an HTTP‑intercepting sidecar—complete with YAML manifests and Rust and Scala code to demonstrate implementation and deployment.

Kubernetes · Rust · Scala
0 likes · 9 min read
Bitu Technology
Dec 8, 2023 · Backend Development

Why Every Java Developer Should Learn Scala – Key Advantages and Insights from the Scala Meetup

The article reviews a Scala meetup where experts compare Java and Scala, highlighting Scala's stronger expressiveness, type inference, pattern matching, safety, and concurrency features, and discusses real‑world adoption, developer experiences, and a recruitment opportunity for a Scala‑focused big‑data team.

Big Data · Scala · Type Inference
0 likes · 13 min read
Rare Earth Juejin Tech Community
Apr 28, 2023 · Artificial Intelligence

Exploring Alibaba’s Tongyi Qianwen AI Model, SWOT, Recipe Demo, and Code Samples for Spark Same‑Period Analysis and Java Bubble Sort

The article reviews Alibaba’s Tongyi Qianwen large‑language model, shares a cooking recipe generated by the AI, presents a SWOT analysis, and provides code examples—including a Spark Scala script for same‑period month‑over‑month calculations and a Java bubble‑sort implementation.

AI · SWOT · Scala
0 likes · 12 min read
Programmer DD
Mar 31, 2023 · Fundamentals

IntelliJ IDEA 2023.1: Key UI, Performance, and Language Updates

IntelliJ IDEA 2023.1 introduces a revamped UI with new split options, compact mode, and redesigned widgets, boosts performance through faster Maven imports and smarter indexing, adds background commit checks, enhances Java, Scala, and web development support, and improves overall user experience with scaling and AI‑powered search.

IDE · IntelliJ IDEA · Scala
0 likes · 7 min read
Bitu Technology
Jun 29, 2022 · Backend Development

Recap of Scala Meetup #7: Tubi Recommendation System Architecture, The Nature of Computation, and Reactive Streams in Large-Scale Scenarios

The seventh Scala Meetup gathered over 1400 online participants to share three technical talks covering Tubi's content recommendation system architecture, philosophical insights into the nature of computation, and practical experiences with reactive streams in large‑scale JVM environments, followed by a round‑table discussion and audience feedback.

Category Theory · Reactive Streams · Scala
0 likes · 15 min read
DataFunTalk
Apr 30, 2022 · Artificial Intelligence

Insights into BIDMach: An Unusual Machine Learning Framework and Thoughts on Building Industrial‑Grade ML Systems

The article introduces BIDMach, a compact Scala‑based machine‑learning framework built with JNI‑driven CUDA/MKL, explains its three‑layer architecture, and discusses broader considerations for designing usable, high‑performance, and extensible industrial AI frameworks, emphasizing co‑design, algorithm‑framework co‑evolution, and ecosystem factors.

BIDMach · Framework · Industrial AI
0 likes · 8 min read
JavaEdge
Apr 17, 2022 · Big Data

Why Spark Overtook MapReduce: Core Advantages and RDD Programming Model

The article explains how Spark, developed by UC Berkeley's AMP Lab, quickly surpassed MapReduce by offering faster execution, a simpler Scala‑based programming model, lazy RDD transformations, a rich ecosystem including SQL, Streaming, MLlib and GraphX, and practical code examples such as a three‑line WordCount.

Big Data · MapReduce · RDD
0 likes · 7 min read
Big Data Technology Architecture
Nov 2, 2021 · Big Data

Comprehensive Guide to FlinkSQL and Table API: Background, Dependencies, Planners, and Usage

This article provides a detailed introduction to FlinkSQL, covering its background, the Table API, required dependencies, differences between old and Blink planners, various API usage patterns, connector configurations for CSV, Kafka, Elasticsearch, MySQL, and how to convert between DataStream and Table in Flink's unified batch‑stream processing model.

Connector · DataStream · FlinkSQL
0 likes · 23 min read
Big Data Technology & Architecture
Sep 10, 2021 · Big Data

Understanding Flink Table API and SQL: Dependencies, Planners, and Practical Usage

This article provides a comprehensive guide to Apache Flink's Table API and SQL, covering required dependencies, the differences between old and Blink planners, program structure, table environment creation, catalog registration, query execution, conversion between DataStream and Table, update modes, and time attribute handling, with Scala code examples throughout.

Flink · Scala · Streaming
0 likes · 26 min read
Ops Development Stories
Aug 28, 2021 · Operations

Inside Kafka's Topic Deletion: Code Walkthrough & Process Explained

This article explains the complete Kafka topic deletion workflow, from the client’s deleteTopics request through Zookeeper node creation, controller coordination, broker StopReplica handling, log renaming, delayed file removal, and final cleanup, while providing code excerpts and practical Q&A for common pitfalls.

Broker · Kafka · Scala
0 likes · 17 min read
Big Data Technology Architecture
Jul 15, 2021 · Big Data

Resolving Spark Task Not Serializable Errors: Causes, Code Examples, and Best Practices

This article analyzes why Spark tasks fail with a "Task not serializable" exception when closures reference class members, demonstrates the issue with Scala code examples, and provides practical solutions such as using @transient annotations, moving functions to objects, and ensuring proper class serialization.

Scala · Spark · Task Not Serializable
0 likes · 12 min read
Architect
Apr 3, 2021 · Big Data

Advanced Spark Performance Optimization: Data Skew and Shuffle Tuning

This article explains advanced Spark performance tuning techniques, focusing on diagnosing and resolving data skew and shuffle bottlenecks through stage analysis, key distribution inspection, and a variety of practical solutions such as Hive pre‑processing, key filtering, parallelism increase, two‑stage aggregation, map‑join, and combined strategies, while also covering ShuffleManager internals and related configuration parameters.
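Of the skew remedies listed, two-stage aggregation is the easiest to show in isolation. The sketch below is plain Python, not Spark code and not taken from the article: `two_stage_count`, `num_salts`, and `seed` are hypothetical names. It salts each key with a random prefix so that a hot key is spread over several partial groups, aggregates per salted key, then strips the salt and merges the partial results.

```python
import random
from collections import Counter


def two_stage_count(keys, num_salts=4, seed=0):
    rng = random.Random(seed)

    # Stage 1: aggregate on (salt, key), so one hot key is split
    # across up to num_salts partial groups.
    partial = Counter()
    for key in keys:
        salt = rng.randrange(num_salts)
        partial[(salt, key)] += 1

    # Stage 2: drop the salt and merge the partial counts per original key.
    final = Counter()
    for (_salt, key), count in partial.items():
        final[key] += count
    return partial, final


keys = ["hot"] * 1000 + ["cold"] * 3
partial, final = two_stage_count(keys)
```

In a distributed job the first stage would run as one shuffle on the salted key and the second as a much smaller shuffle on the original key; this technique applies only to aggregations whose partial results can be merged, such as counts and sums.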

Big Data · Data Skew · Scala
0 likes · 47 min read
DataFunTalk
Mar 18, 2021 · Fundamentals

Building Popper: Tubi’s Scalable Experimentation Platform

Tubi’s Popper platform combines a Scala‑based experiment engine, reproducible JSON‑stored configurations, a React UI, and data pipelines using Spark and Akka to enable fast, cross‑team A/B testing, automated analysis, health checks, and data‑driven decision making across mobile and OTT services.

A/B testing · Akka · Experimentation platform
0 likes · 15 min read
Bitu Technology
Mar 12, 2021 · Backend Development

Building Popper: Tubi’s Scalable Experiment Platform for Data‑Driven Decision Making

At Tubi, the Popper experiment engine—a Scala‑based, Akka‑powered backend service—combined with a self‑serve UI, automated analysis pipelines, and rigorous health checks, enables teams across ML, mobile, and OTT to run scalable A/B tests, rapidly iterate, and make data‑driven product decisions.

A/B testing · Akka · Experiment Platform
0 likes · 14 min read
Bitu Technology
Feb 2, 2021 · Backend Development

Recap of the Online Scala Meetup: Reactive Ad Platform, Functional Programming, Scala 3 Typeclasses, and Spring‑Akka Microservices

The 2021 Online Scala Meetup organized by Tubi featured four technical talks covering a fully reactive ad‑serving platform built with Scala and Akka‑Streams, a pragmatic take on functional programming, Scala 3 typeclass implementation, and a Spring‑Akka microservice integration, followed by summaries and recruitment information.

Akka · Microservices · Scala
0 likes · 4 min read
Programmer DD
Dec 28, 2020 · Operations

How to Install and Use Cerebro for Easy Elasticsearch Cluster Management

This guide explains what Cerebro is, how to install it (including binary and Docker options), how to run it on Linux, macOS, and Windows, and how to use its UI to connect to an Elasticsearch node, view cluster overviews, manage shards, and execute DSL queries.

AngularJS · Cerebro · Cluster Management
0 likes · 5 min read
Programmer DD
Dec 2, 2020 · Backend Development

How Kafka Uses a Timing Wheel for Efficient Timeout Handling

Many Kafka requests are processed asynchronously or must wait for a condition, so each carries a timeout parameter; if the condition is not met within the timeout, Kafka returns a timeout response. Kafka implements this efficiently with a hierarchical Timing Wheel, a data structure that offers O(1) insertion and fast expiration checks.

Backend · Kafka · Scala
0 likes · 12 min read
Architecture Digest
Oct 22, 2020 · Backend Development

Kafka Timing Wheel: Design, Operation, and Code Walkthrough

The article explains how Kafka handles timeout‑based requests using a Timing Wheel data structure, detailing its design, parameters, operation principles, overflow handling, and providing Scala code examples that illustrate O(1) task insertion compared to traditional O(logN) delay queues.
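To make the O(1) insertion claim concrete, here is a minimal single-level timing wheel in Python. It is a simplified sketch and not Kafka's Scala implementation: the class and its method names are hypothetical, time is advanced manually one tick at a time, and a delay that does not fit in the wheel is only reported as an overflow (a hierarchical design would hand it to a coarser wheel).

```python
class TimingWheel:
    def __init__(self, tick_ms, wheel_size):
        self.tick_ms = tick_ms
        self.wheel_size = wheel_size
        self.interval = tick_ms * wheel_size  # time span covered by one rotation
        self.current_time = 0                 # always a multiple of tick_ms
        self.buckets = [[] for _ in range(wheel_size)]

    def add(self, task, delay_ms):
        expiration = self.current_time + delay_ms
        if expiration < self.current_time + self.tick_ms:
            return "expired"    # due within the current tick; caller runs it now
        if expiration >= self.current_time + self.interval:
            return "overflow"   # too far ahead for this wheel
        # O(1): the bucket index is computed directly from the expiration time.
        index = (expiration // self.tick_ms) % self.wheel_size
        self.buckets[index].append(task)
        return "scheduled"

    def advance(self):
        # Move the clock forward by one tick and return the tasks in that bucket.
        # Precision is one tick: a task may fire up to tick_ms early.
        self.current_time += self.tick_ms
        index = (self.current_time // self.tick_ms) % self.wheel_size
        due, self.buckets[index] = self.buckets[index], []
        return due


wheel = TimingWheel(tick_ms=1, wheel_size=8)
statuses = [wheel.add("a", 3), wheel.add("b", 3), wheel.add("c", 8), wheel.add("d", 0)]
fired = [wheel.advance() for _ in range(3)]
```

Insertion never compares the new task with existing ones, which is where the contrast with a heap-based delay queue comes from.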

Data Structures · Kafka · Scala
0 likes · 10 min read
Bitu Technology
Sep 25, 2020 · Backend Development

Scala Meetup Recap: Akka State Management, Parsec Combinators, and ZIO Overview

The article summarizes a 2020 Scala online meetup, covering Akka's distributed state‑management for ad‑serving, Parsec parser combinators for arithmetic expression evaluation, AST manipulation with ANTLR, and an introduction to the functional ZIO library, while reflecting on personal takeaways.

Akka · Meetup · Parser Combinators
0 likes · 12 min read
Big Data Technology & Architecture
Aug 23, 2020 · Big Data

Apache Hudi Overview, Core Concepts, and Quick‑Start Guide

This article introduces Apache Hudi, explaining its storage types, query views, timeline feature, typical use cases such as near‑real‑time ingestion and incremental pipelines, and provides a step‑by‑step Scala/Spark quick‑start guide with code examples for compiling, inserting, updating, querying, and syncing data to Hive.

Apache Hudi · Big Data · Data Lake
0 likes · 18 min read
Big Data Technology & Architecture
Aug 4, 2020 · Big Data

Manual Kafka Offset Management in Spark Streaming using createDirectStream (Java & Scala)

This article explains how to use Spark Streaming's Direct Approach with Kafka, manually manage offsets, and provides complete Java and Scala implementations—including a JavaKafkaManager class, a demo application, and a Scala KafkaManager—illustrating the creation of DirectKafkaInputDStream, offset handling, and integration with Spark.

Kafka · Offset Management · Scala
0 likes · 14 min read
Architect
Jun 10, 2020 · Big Data

Understanding Flink Time Notions: ProcessTime, EventTime, IngestionTime and Watermarks with Code Examples

This article explains the three time notions supported by Apache Flink—ProcessTime, EventTime, and IngestionTime—detailing their semantics, how Watermarks enable event‑time processing, and provides Scala code samples for configuring time characteristics, assigning timestamps, and generating Watermarks in a streaming job.

EventTime · Flink · Scala
0 likes · 16 min read
Big Data Technology & Architecture
Jun 9, 2020 · Big Data

Comprehensive Overview and Best Practices for Apache Spark Streaming

This article provides a detailed introduction to Spark Streaming, covering its architecture, DStream concepts, initialization, data sources, transformations, windowed aggregations, output operations, checkpointing, fault‑tolerance semantics, deployment, performance tuning, and monitoring for building reliable high‑throughput streaming applications.

Big Data · Dstream · Scala
0 likes · 17 min read
Top Architect
May 4, 2020 · Backend Development

Aloha: A Scala‑Based Distributed Task Scheduling Framework – Overview, Extensions, and Architecture

Aloha is a Scala‑implemented distributed task scheduling and management framework built on Spark that provides plug‑in extensions, high‑availability Master‑Worker architecture, custom event listeners, and a lightweight Scala‑based RPC layer for managing long‑running jobs such as Spark, Flink, and ETL tasks.

ALOHA · Backend · Distributed Scheduling
0 likes · 19 min read
Bitu Technology
Apr 27, 2020 · Fundamentals

Recap of the 2020 Online Scala Meetup: Design Patterns, ZIO STM, Scala‑Java Integration, and Shapeless

On April 18, the inaugural 2020 Online Scala Meetup organized by Tubi featured four speakers who explored Scala design patterns, demonstrated ZIO STM usage, shared best practices for integrating Scala into Java codebases, and introduced Shapeless’s type-level programming, offering valuable insights for developers.

Design Patterns · Scala · Shapeless
0 likes · 6 min read
Big Data Technology & Architecture
Jan 30, 2020 · Big Data

Comprehensive Guide to Spark Performance Optimization (Development, Resource, Data Skew, and Shuffle Tuning)

This article provides an in‑depth, step‑by‑step guide to optimizing Spark jobs, covering development‑time best practices, resource‑parameter tuning, data‑skew detection and mitigation techniques, and shuffle‑stage performance tweaks, complete with Scala code examples and practical recommendations.

Big Data · Data Skew · Resource Tuning
0 likes · 67 min read
Bitu Technology
Nov 15, 2019 · Backend Development

Recap of the Scala Meetup: Cats Introduction and Akka Applications

The November 3 Scala Meetup featured senior expert Deng Caoyuan’s whiteboard brainstorming and Tencent engineer Qu Guodong’s Cats tutorial, followed by a deep dive into Akka with real‑world examples such as a financial computing platform, real‑time article recommendation, web crawling, streaming, and practical tips for Scala developers.

Akka · Cats · Meetup
0 likes · 7 min read
Bitu Technology
Nov 13, 2019 · Backend Development

Rebuilding Tubi's Advertising System with Scala and Akka – Part 1: Request Parsing, Validation, and Filtering

This article explains why Tubi rewrote its legacy PHP ad platform, how it adopted Scala, Akka, and Reactive Streams to model the ad request lifecycle as a reactive stream, and details the first three processing steps—parsing, enrichment, and precise filtering—along with sample Scala filter code.

AdTech · Akka · Microservices
0 likes · 8 min read
Big Data Technology & Architecture
Oct 17, 2019 · Big Data

Delta Lake: Architecture, Features, and Hands‑On Tutorial

This article explains the origins and motivations of Delta Lake, details its ACID transaction support, schema enforcement, metadata handling, versioning, and unified batch‑and‑stream processing, and provides a step‑by‑step Maven and Spark code tutorial for creating, updating, and querying Delta tables.

ACID · Apache Spark · Big Data
0 likes · 10 min read
DataFunTalk
Sep 24, 2019 · Big Data

Collaborative Filtering: Fundamentals, Similarity Measures, and Distributed Implementation on Spark

This article introduces the basic concepts of collaborative filtering, explains user‑based and item‑based approaches, presents co‑occurrence, Euclidean, Pearson, and Cosine similarity formulas, and provides complete Scala implementations for these metrics and association‑rule mining on the Spark platform, along with practical scalability tips.
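For readers who want the three vector similarity measures at a glance, here is a small single-machine sketch in Python. It is not the article's Scala/Spark code: the function names are hypothetical, and mapping Euclidean distance to a similarity with `1 / (1 + distance)` is one common convention assumed here, which may differ from the formula the article uses.

```python
import math


def euclidean_similarity(a, b):
    # Assumed convention: convert the distance into a score in (0, 1].
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for a zero vector; treated as "no similarity" here
    return dot / (norm_a * norm_b)


def pearson_similarity(a, b):
    # Pearson correlation is the cosine similarity of the mean-centered vectors.
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centered_a = [x - mean_a for x in a]
    centered_b = [y - mean_b for y in b]
    return cosine_similarity(centered_a, centered_b)


same_direction = cosine_similarity([1, 2], [2, 4])
opposite_trend = pearson_similarity([1, 2, 3], [3, 2, 1])
```

Cosine similarity ignores vector length, while Pearson additionally removes each vector's mean, which helps when users rate on different baselines.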

Scala · big-data · collaborative-filtering
0 likes · 17 min read
Beike Product & Technology
Sep 20, 2019 · Big Data

Understanding DStream Construction and Execution in Spark Streaming

This article explains how Spark Streaming's DStream abstraction is built from InputDStream through successive transform operators, details the internal ForEachDStream implementation, describes the job generation and scheduling workflow, and outlines how Beike's real‑time platform leverages these mechanisms for large‑scale streaming tasks.

Big Data · Dstream · Real-time Processing
0 likes · 10 min read
vivo Internet Technology
Sep 4, 2019 · Fundamentals

Exploring Functional Programming: Concepts and Practical Applications

The article surveys functional programming fundamentals—contrasting it with imperative and declarative styles—and illustrates key concepts such as closures, currying, promises, partial functions, map/reduce, and divmod through Java, JavaScript, and Python examples, before highlighting Scala’s hybrid approach and the advantages of FP for writing elegant, maintainable, concurrent code.

Programming Paradigms · Scala · closures
0 likes · 12 min read
Big Data Technology & Architecture
Aug 3, 2019 · Big Data

Understanding SparkEnv Initialization: Components and Their Setup

This article walks through the SparkEnv initialization process in Apache Spark, detailing how the driver and executor environments are created, the key components such as SecurityManager, RpcEnv, SerializerManager, BroadcastManager, MapOutputTracker, ShuffleManager, MemoryManager, BlockManager, MetricsSystem, and OutputCommitCoordinator are instantiated, and how the final SparkEnv instance is assembled and stored.

Big Data · Scala · Spark
0 likes · 13 min read
Youzan Coder
Jul 3, 2019 · Operations

Gatling‑Dubbo 2.0: High‑Performance Dubbo Load‑Testing Plugin

Gatling‑Dubbo 2.0 is a Gatling‑based load‑testing plugin that replaces generic Dubbo invocations with real API calls, offering richer scenario orchestration, traffic models, native throttling, lower resource use, and higher concurrency, while providing Action, Check, and DSL components illustrated through a complete mixed‑scenario simulation.

DSL · Gatling · Scala
0 likes · 13 min read
Big Data Technology & Architecture
Jun 18, 2019 · Big Data

Understanding Watermarks, Event Time, and Processing Time in Apache Flink

This article explains the three time concepts in Flink—Process Time, Event Time, and Ingestion Time—illustrates their impact on windowed computations with examples, introduces watermarks and allowed lateness for handling out‑of‑order data, and provides complete Scala code for both processing‑time and event‑time streaming applications.
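How a watermark lets event-time windows tolerate out-of-order data can be sketched without Flink. The Python function below is a simplified model, not Flink's API and not the article's Scala code: the function name and parameters are hypothetical, allowed lateness is taken to be zero, and the watermark is modeled as the largest timestamp seen so far minus a fixed out-of-orderness bound.

```python
from collections import defaultdict


def tumbling_event_time_counts(timestamps, window_ms, max_out_of_orderness_ms):
    open_windows = defaultdict(int)  # window start -> number of events
    fired = []                       # (window start, count), in firing order
    dropped = []                     # events that arrived after their window fired
    max_ts = None

    for ts in timestamps:            # timestamps are given in arrival order
        start = ts - ts % window_ms
        if max_ts is not None and start + window_ms <= max_ts - max_out_of_orderness_ms:
            dropped.append(ts)       # the watermark already passed this window
            continue
        open_windows[start] += 1
        max_ts = ts if max_ts is None else max(max_ts, ts)
        watermark = max_ts - max_out_of_orderness_ms
        # Fire every window whose end is at or before the watermark.
        for w in sorted(s for s in open_windows if s + window_ms <= watermark):
            fired.append((w, open_windows.pop(w)))

    return fired, dropped, dict(open_windows)


fired, dropped, still_open = tumbling_event_time_counts(
    [1, 4, 12, 8, 13, 2, 25], window_ms=10, max_out_of_orderness_ms=3
)
```

In this run the event with timestamp 8 arrives after 12 but is still counted, because the watermark has not yet reached the end of the first window; the event with timestamp 2 arrives after that window has fired and is dropped.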

EventTime · Flink · Scala
0 likes · 13 min read
Big Data Technology & Architecture
Jun 17, 2019 · Big Data

Understanding Spark SQL: Concepts, Queries, Data Sources, and Practical Examples

This article introduces Spark SQL fundamentals, including its architecture, DataFrame and Dataset abstractions, query methods, interoperability with RDD, user-defined functions, integration with Hive, data source handling, and provides step‑by‑step Scala code examples for loading data, performing aggregations, and solving common analytical tasks.

DataFrames · Scala · SparkSQL
0 likes · 15 min read
Big Data Technology & Architecture
Jun 12, 2019 · Big Data

Comprehensive Guide to FlinkCEP: API Overview, Pattern Definitions, Quantifiers, Conditions, and Usage Examples

This article provides a detailed introduction to FlinkCEP, covering how to add the library, define simple and composite patterns, use quantifiers and conditions, handle skip strategies, time constraints, and select results, with complete Java and Scala code examples for complex event processing.

Big Data · CEP · Flink
0 likes · 27 min read
Big Data Technology & Architecture
Jun 5, 2019 · Big Data

Real-Time Advertising Click Counting with Spark Structured Streaming and Redis Streams

This article presents a complete solution for real‑time advertising click counting using Spark Structured Streaming combined with Redis Streams, detailing the business scenario, data flow, input/output formats, and step‑by‑step implementation including data extraction, processing, storage, and query via Spark‑SQL.

Big Data · Redis Stream · Scala
0 likes · 11 min read
Big Data Technology & Architecture
Apr 1, 2019 · Big Data

Comprehensive Overview of Hadoop: Core Modules, HDFS Architecture, MapReduce, YARN, and a Scala WordCount Example

This article provides a detailed introduction to Hadoop's ecosystem—including its core modules (Common, HDFS, YARN, MapReduce), the design of a high‑availability HDFS cluster, the principles of distributed file systems, and a complete Scala WordCount MapReduce program—offering a solid foundation for big‑data practitioners.

Big Data · HDFS · Hadoop
0 likes · 15 min read
Architecture Digest
Mar 28, 2019 · Backend Development

Aloha: A Scala‑Based Distributed Task Scheduling and Management Framework

Aloha is a Scala‑implemented distributed scheduling framework built on Spark that provides extensible plugins, high‑availability master/worker architecture, REST submission, custom application interfaces, event listeners, and a Scala‑based RPC system for managing long‑running tasks such as Spark, Flink, and ETL jobs.

Backend · Distributed Scheduling · RPC
0 likes · 17 min read
Big Data Technology & Architecture
Mar 21, 2019 · Big Data

Apache Flink Table API Tutorial and End‑to‑End Examples

This article provides a comprehensive tutorial on Apache Flink's Table API, explaining its concepts, core features, and a wide range of operators such as SELECT, WHERE, GROUP BY, UNION, JOIN, and various window functions, while offering complete Scala code examples, custom sources, sinks, and an end‑to‑end job that computes page‑view counts per region using event‑time tumbling windows.

Big Data · Flink · Scala
0 likes · 36 min read
Big Data Technology & Architecture
Mar 15, 2019 · Fundamentals

Scala Series Article Collection – Curated List of Tutorials

This resource provides a curated collection of links to a series of Scala tutorial articles covering installation, basic syntax, data types, variables, access modifiers, operators, control structures, functions, collections, traits, pattern matching, and more, offering a comprehensive learning path for developers.

Collections · Language · Scala
0 likes · 3 min read