
Real-Time UV and PV Analytics with Flink SQL on Tencent Cloud Oceanus

This guide shows how to build a real‑time UV and PV analytics pipeline on Tencent Cloud Oceanus by connecting a self‑hosted Kafka cluster to Flink SQL, using Redis for deduplicated visitor counts, page view logs, and conversion‑rate calculations via hop windows.

Tencent Cloud Developer

This article explains how to implement real-time UV (Unique Visitor) and PV (Page View) statistics with Apache Flink SQL on the Tencent Cloud Oceanus platform, combined with a self-built Kafka cluster and a Redis database.

Solution Overview:

The solution combines a local self-built Kafka cluster, Tencent Cloud Oceanus (Flink), and Cloud Redis to perform real-time visual analysis of UV, PV, and conversion-rate metrics for blogs and e-commerce websites.

Key Concepts:

UV (Unique Visitor): Number of unique visitors. If a user visits the same page 5 times, UV only increases by 1 as it counts deduplicated users.

PV (Page View): Number of page views. If a user visits the same page 5 times, PV increases by 5.

Conversion Rate: transactions divided by page views. For example, 5 purchases out of 100 page views gives a 5% conversion rate.

Architecture Components:

Self-built Kafka cluster in local IDC

Private Network (VPC)

Direct Connect/Cloud Connect/VPN/Peer Connection

Oceanus (Flink)

Cloud Redis

Implementation Steps:

1. Create VPC network

2. Create Oceanus cluster

3. Create Redis cluster

4. Configure self-built Kafka cluster - modify advertised.listeners to use IP instead of hostname

5. Establish network connectivity between IDC and Tencent Cloud via VPN
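Step 4 typically amounts to editing `server.properties` on each Kafka broker. The listener values below are illustrative, not taken from the article (the advertised IP matches the `bootstrap.servers` address used later in the source table):

```properties
# Advertise the broker by IP rather than hostname, since Oceanus
# cannot resolve IDC-internal hostnames across the VPN.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.1.0.10:9092
```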

Data Format:

Kafka topic stores data in JSON format:

{"record_type":0, "user_id": 6, "client_ip": "100.0.0.6", "product_id": 101, "create_time": "2021-09-06 16:00:00"}

Where `record_type` 0 denotes a browse record and `record_type` 1 a purchase record.
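For illustration, a corresponding purchase record (`record_type` 1; the field values here are assumed, not from the article) would look like:

```json
{"record_type": 1, "user_id": 6, "client_ip": "100.0.0.6", "product_id": 101, "create_time": "2021-09-06 16:05:00"}
```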

Flink SQL Implementation:

Source table definition:

CREATE TABLE `input_web_record` (
  `record_type` INT,
  `user_id` INT,
  `client_ip` VARCHAR,
  `product_id` INT,
  `create_time` TIMESTAMP(3),
  `times` AS create_time,
  WATERMARK FOR times AS times - INTERVAL '10' MINUTE
) WITH (
    'connector' = 'kafka',
    'topic' = 'uvpv-demo',
    'scan.startup.mode' = 'earliest-offset',
    'properties.bootstrap.servers' = '10.1.0.10:9092',
    'properties.group.id' = 'WebRecordGroup',
    'format' = 'json',
    'json.ignore-parse-errors' = 'true',
    'json.fail-on-missing-field' = 'false'
);

Sink tables are defined for UV (using a Redis SET), PV (using a Redis LIST), and conversion rate (using a Redis STRING).
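A minimal sketch of the three sink declarations. The connector name, option keys, Redis address, and the assumption that the first column maps to the Redis key are modeled on the Oceanus Redis connector but are not confirmed by the article:

```sql
-- UV sink: SADD into the set `userids` deduplicates user ids.
CREATE TABLE `output_uv` (
  `redis_key` STRING,   -- assumed to map to the Redis key
  `user_id`   INT
) WITH (
  'connector' = 'redis',           -- connector name assumed
  'command'   = 'sadd',            -- SET type for deduplicated UV
  'nodes'     = '10.1.0.20:6379'   -- illustrative Redis address
);

-- PV sink: RPUSH appends per-window view counts to the list `pagevisits`.
CREATE TABLE `output_pv` (
  `redis_key` STRING,
  `pv`        BIGINT
) WITH (
  'connector' = 'redis',
  'command'   = 'rpush',
  'nodes'     = '10.1.0.20:6379'
);

-- Conversion-rate sink: SET overwrites the string key `conversion_rate`.
CREATE TABLE `output_rate` (
  `redis_key` STRING,
  `rate`      DOUBLE
) WITH (
  'connector' = 'redis',
  'command'   = 'set',
  'nodes'     = '10.1.0.20:6379'
);
```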

The business logic aggregates over HOP (sliding) windows with a 10-minute window size.
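The hop-window queries might look like the following sketch. The sink table names (`output_uv`, `output_pv`, `output_rate`), the one-minute slide, and the constant-key columns are assumptions; the Redis keys match the `userids`/`pagevisits`/`conversion_rate` names used for result storage:

```sql
-- UV: one row per distinct user per window; SADD deduplicates in Redis.
INSERT INTO `output_uv`
SELECT 'userids', user_id
FROM input_web_record
WHERE record_type = 0
GROUP BY
  HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE),  -- (slide, size)
  user_id;

-- PV: total page views per 10-minute window, appended to the list.
INSERT INTO `output_pv`
SELECT 'pagevisits', COUNT(1)
FROM input_web_record
WHERE record_type = 0
GROUP BY HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE);

-- Conversion rate: purchases (record_type = 1) divided by page views.
INSERT INTO `output_rate`
SELECT
  'conversion_rate',
  CAST(SUM(CASE WHEN record_type = 1 THEN 1 ELSE 0 END) AS DOUBLE)
    / SUM(CASE WHEN record_type = 0 THEN 1 ELSE 0 END)
FROM input_web_record
GROUP BY HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE);
```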

Result Storage:

userids: Stores UV using Redis SET type for deduplication

pagevisits: Stores PV using Redis LIST type

conversion_rate: Stores conversion rate (purchases/page views)

The article notes that for large-scale UV deduplication, Redis HyperLogLog (via `PFADD`/`PFCOUNT`) can approximate the distinct count in at most 12 KB per key, at the cost of roughly 0.81% standard error.

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.