
Real-Time UV and PV Analytics with Flink SQL on Tencent Cloud Oceanus

This guide shows how to build a real‑time UV and PV analytics pipeline on Tencent Cloud Oceanus by connecting a self‑hosted Kafka cluster to Flink SQL, using Redis for deduplicated visitor counts, page view logs, and conversion‑rate calculations via hop windows.

Tencent Cloud Developer

This article explains how to implement real-time UV (Unique Visitor) and PV (Page View) statistics with Apache Flink SQL on the Tencent Cloud Oceanus platform, combined with a self-built Kafka cluster and a Redis database.

Solution Overview:

The solution combines a local self-built Kafka cluster, Tencent Cloud Oceanus (Flink), and Cloud Redis to perform real-time visual analysis of UV, PV, and conversion-rate metrics for blogs and e-commerce websites.

Key Concepts:

UV (Unique Visitor): Number of unique visitors. If a user visits the same page 5 times, UV only increases by 1 as it counts deduplicated users.

PV (Page View): Number of page views. If a user visits the same page 5 times, PV increases by 5.

Conversion Rate: transactions divided by page views. For example, 5 purchases out of 100 page views gives a 5% conversion rate.

Architecture Components:

Self-built Kafka cluster in local IDC

Private Network (VPC)

Direct Connect/Cloud Connect/VPN/Peer Connection

Oceanus (Flink)

Cloud Redis

Implementation Steps:

1. Create VPC network

2. Create Oceanus cluster

3. Create Redis cluster

4. Configure self-built Kafka cluster - modify advertised.listeners to use IP instead of hostname

5. Establish network connectivity between IDC and Tencent Cloud via VPN
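Step 4 typically amounts to editing `server.properties` on each Kafka broker. The listener values below are illustrative, not taken from the article (the advertised IP matches the `bootstrap.servers` address used later in the source table):

```properties
# Advertise the broker by IP rather than hostname, since Oceanus
# cannot resolve IDC-internal hostnames across the VPN.
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.1.0.10:9092
```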

Data Format:

Kafka topic stores data in JSON format:

{"record_type":0, "user_id": 6, "client_ip": "100.0.0.6", "product_id": 101, "create_time": "2021-09-06 16:00:00"}

Where `record_type` 0 denotes a browse record and `record_type` 1 a purchase record.
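For illustration, a corresponding purchase record (`record_type` 1; the field values here are assumed, not from the article) would look like:

```json
{"record_type": 1, "user_id": 6, "client_ip": "100.0.0.6", "product_id": 101, "create_time": "2021-09-06 16:05:00"}
```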

Flink SQL Implementation:

Source table definition:

CREATE TABLE `input_web_record` (
  `record_type` INT,
  `user_id` INT,
  `client_ip` VARCHAR,
  `product_id` INT,
  `create_time` TIMESTAMP(3),
  `times` AS create_time,
  WATERMARK FOR times AS times - INTERVAL '10' MINUTE
) WITH (
    'connector' = 'kafka',
    'topic' = 'uvpv-demo',
    'scan.startup.mode' = 'earliest-offset',
    'properties.bootstrap.servers' = '10.1.0.10:9092',
    'properties.group.id' = 'WebRecordGroup',
    'format' = 'json',
    'json.ignore-parse-errors' = 'true',
    'json.fail-on-missing-field' = 'false'
);

Sink tables are defined for UV (using a Redis SET), PV (using a Redis LIST), and conversion rate (using a Redis STRING).
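A minimal sketch of the three sink declarations. The connector name, option keys, Redis address, and the assumption that the first column maps to the Redis key are modeled on the Oceanus Redis connector but are not confirmed by the article:

```sql
-- UV sink: SADD into the set `userids` deduplicates user ids.
CREATE TABLE `output_uv` (
  `redis_key` STRING,   -- assumed to map to the Redis key
  `user_id`   INT
) WITH (
  'connector' = 'redis',           -- connector name assumed
  'command'   = 'sadd',            -- SET type for deduplicated UV
  'nodes'     = '10.1.0.20:6379'   -- illustrative Redis address
);

-- PV sink: RPUSH appends per-window view counts to the list `pagevisits`.
CREATE TABLE `output_pv` (
  `redis_key` STRING,
  `pv`        BIGINT
) WITH (
  'connector' = 'redis',
  'command'   = 'rpush',
  'nodes'     = '10.1.0.20:6379'
);

-- Conversion-rate sink: SET overwrites the string key `conversion_rate`.
CREATE TABLE `output_rate` (
  `redis_key` STRING,
  `rate`      DOUBLE
) WITH (
  'connector' = 'redis',
  'command'   = 'set',
  'nodes'     = '10.1.0.20:6379'
);
```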

The business logic aggregates over HOP (sliding) windows with a 10-minute window size.
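The hop-window queries might look like the following sketch. The sink table names (`output_uv`, `output_pv`, `output_rate`), the one-minute slide, and the constant-key columns are assumptions; the Redis keys match the `userids`/`pagevisits`/`conversion_rate` names used for result storage:

```sql
-- UV: one row per distinct user per window; SADD deduplicates in Redis.
INSERT INTO `output_uv`
SELECT 'userids', user_id
FROM input_web_record
WHERE record_type = 0
GROUP BY
  HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE),  -- (slide, size)
  user_id;

-- PV: total page views per 10-minute window, appended to the list.
INSERT INTO `output_pv`
SELECT 'pagevisits', COUNT(1)
FROM input_web_record
WHERE record_type = 0
GROUP BY HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE);

-- Conversion rate: purchases (record_type = 1) divided by page views.
INSERT INTO `output_rate`
SELECT
  'conversion_rate',
  CAST(SUM(CASE WHEN record_type = 1 THEN 1 ELSE 0 END) AS DOUBLE)
    / SUM(CASE WHEN record_type = 0 THEN 1 ELSE 0 END)
FROM input_web_record
GROUP BY HOP(times, INTERVAL '1' MINUTE, INTERVAL '10' MINUTE);
```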

Result Storage:

userids: Stores UV using Redis SET type for deduplication

pagevisits: Stores PV using Redis LIST type

conversion_rate: Stores conversion rate (purchases/page views)

The article notes that for large-scale UV deduplication, Redis HyperLogLog (via `PFADD`/`PFCOUNT`) can approximate the distinct count in at most 12 KB per key, at the cost of roughly 0.81% standard error.

Written by

Tencent Cloud Developer

Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.