Tag

data collection


Python Programming Learning Circle
May 29, 2025 · Big Data

Common Python Web Scraping Techniques for E‑commerce Data Collection

This article introduces ten practical Python-based web scraping methods—including requests, Selenium, Scrapy, Crawley, PySpider, aiohttp, asks, vibora, Pyppeteer, and Fiddler‑plus‑Node reverse engineering—explaining their use cases, advantages, and code examples for efficiently gathering e‑commerce and app data.

Python · Requests · Scrapy
0 likes · 8 min read
DataFunSummit
Feb 25, 2025 · Artificial Intelligence

Collecting High-Quality LLM Training Data and Custom Model Training Guide

This article explains what constitutes high‑quality LLM training data, why large datasets are essential, outlines the step‑by‑step process for collecting, preprocessing, and fine‑tuning models, and highlights the best data sources—including web content, books, code repositories, and news—while noting available free datasets.

AI · LLM · data collection
0 likes · 9 min read
Python Programming Learning Circle
Dec 10, 2024 · Big Data

23 Python Web Scraping Projects with GitHub Links

This article compiles twenty‑three Python web‑scraping projects, each described with its purpose, key features, and a direct GitHub repository link, offering developers a ready‑made toolbox for data collection across platforms such as WeChat, DouBan, Zhihu, Bilibili, and more.

GitHub · Requests · Scrapy
0 likes · 9 min read
360 Zhihui Cloud Developer
Nov 28, 2024 · Backend Development

How Arkit’s Go Plugin Architecture Boosts Data Collection and Monitoring

This article explains how Arkit, a Go‑based unified monitoring agent, collects and parses data from multiple sources, leverages Go plugins for flexible, high‑performance processing, provides custom plugin development guidelines, and discusses the performance benefits and limitations of the plugin system.

Arkit · Custom Plugin · Go
0 likes · 14 min read
High Availability Architecture
Nov 4, 2024 · Operations

Ctrip's Weak Network Identification Model: Design, Implementation, and Practice

This article details Ctrip's approach to weak network detection, covering background, data collection, processing, dynamic weighting algorithms, result output, deployment effects, and future plans, and provides practical code examples and threshold settings for improving mobile network performance.

Dynamic Weighting · Weak Network Detection · data collection
0 likes · 26 min read
Zhuanzhuan Tech
Aug 28, 2024 · Big Data

Quality Inspection Data Collection: Design, Architecture, and Applications

This article outlines the design, architecture, and practical applications of a quality inspection data collection system, covering data point structures, reporting mechanisms, compliance analysis, intelligent strategy iteration, and BI dashboards, illustrating how big‑data techniques enable digital transformation of inspection processes.

BI · Big Data · Compliance
0 likes · 10 min read
Wukong Talks Architecture
Aug 5, 2024 · Operations

Comprehensive Case Study of Large‑Scale Desktop IT Management and Automated Fault Detection at Ctrip

This article presents a detailed case study of Ctrip's large‑scale desktop IT management solution, describing the challenges of handling tens of thousands of office PCs, the full‑link architecture built with Rust, Tauri, SpringBoot and Django, automated health monitoring, fault detection, remediation workflows, security measures, performance optimizations, and the measurable operational improvements achieved.

Automation · Desktop Management · IT Operations
0 likes · 16 min read
Python Programming Learning Circle
Jun 5, 2024 · Backend Development

Various Python Methods for E‑commerce Data Collection and Web Scraping

This article introduces ten practical Python techniques—including requests, Selenium, Scrapy, Crawley, PySpider, aiohttp, asks, vibora, Pyppeteer, and Fiddler‑based reverse engineering—to efficiently collect e‑commerce and app data while addressing common challenges such as IP blocking, captchas, and authentication.

Scrapy · Selenium · aiohttp
0 likes · 8 min read
Beijing SF i-TECH City Technology Team
May 30, 2024 · Mobile Development

Android Mobile Client Data Collection Implementation

This article explains why a codeless ("no-buried-point", 无埋点) tracking approach is worth adopting, its key advantages over traditional manual instrumentation, how codeless collection is implemented on the Android client, and how the collected data is processed.

Bytecode Manipulation · Data Processing · Mobile Development
0 likes · 12 min read
Model Perspective
May 21, 2024 · Fundamentals

How to Turn Mathematical Modeling from Theory into Real‑World Solutions

This article outlines practical steps—understanding problem background, gathering quality data, selecting appropriate models, solving and analyzing them, and applying results—to ensure mathematical modeling moves beyond theory and effectively addresses real-world issues.

Case Study · data collection · mathematical modeling
0 likes · 9 min read
DeWu Technology
Mar 6, 2024 · Frontend Development

Visual Event Tracking Solution: Architecture, Implementation, and Practices

The visual event tracking solution replaces costly manual code instrumentation with a system based on Data-Trackid identifiers and relative XPath, a VS Code plugin that generates IDs automatically, and an SDK that captures clicks and exposures and dynamically loads third-party analytics. It also covers validation, monitoring, and planned decoupling work aimed at scalable, real-time product analytics.

SDK · data collection · event analytics
0 likes · 9 min read
ByteFE
Jan 26, 2024 · Frontend Development

A Comprehensive Guide to Frontend Event Tracking (埋点)

This article explains what frontend event tracking (埋点) is, why it is essential for product analytics, when and how to implement it, the different tracking models and reporting methods, as well as practical tips, iteration processes, and common pitfalls for developers and product teams.

Web · analytics · data collection
0 likes · 18 min read
Rare Earth Juejin Tech Community
Jan 25, 2024 · Frontend Development

Front-End Event Tracking (埋点) – Fundamentals, Types, and Best Practices

This article provides a comprehensive guide to front‑end event tracking, covering its definition, motivations, scenarios, various tracking types, data models, reporting mechanisms, implementation steps, security considerations, and practical tips for ensuring accurate and non‑blocking data collection in web applications.

analytics · data collection · event tracking
0 likes · 23 min read
Python Programming Learning Circle
Dec 4, 2023 · Backend Development

Scraping Zhihu "Beauty" Topic Images with Python and Baidu AI Face Detection

This article explains how to collect images from Zhihu's "beauty" topic using Python's Requests and lxml libraries, filter them with Baidu AI's AipFace face detection service, and store the qualified pictures locally, detailing the required environment, logic, and preparation steps.

Python · Zhihu · baidu-ai
0 likes · 5 min read
JD Retail Technology
Nov 21, 2023 · Fundamentals

Understanding Event Tracking (埋点): Definition, Purpose, Data Flow, Teams, Process, and Testing

This article explains what event tracking (埋点) is, its role in collecting user behavior data, the real‑time and offline data pipelines, the responsible teams, the end‑to‑end implementation process, common quality issues, detailed test cases, and the Track platform used for verification.

Mobile · analytics · data collection
0 likes · 11 min read
DataFunSummit
Oct 19, 2023 · Big Data

Design and Evolution of Zhihu's Event Tracking (埋点) System

This article presents a comprehensive overview of Zhihu's event‑tracking system, covering its motivation, toolset, demand‑management platform, verification workflow, data‑collection pipeline, query service architecture, cloud‑native data service design, and practical Q&A on best practices and optimization strategies.

Big Data · cloud-native · data collection
0 likes · 12 min read
FunTester
Sep 1, 2023 · Operations

Observability in the Cloud‑Native Era: Data Collection Strategies and Sampling Techniques

The article explains how cloud‑native observability systems gather massive telemetry from infrastructure, containers, middleware and services, compares direct push and file‑based collection approaches, and details head, tail and local sampling methods to optimize data completeness and performance.

Distributed Tracing · cloud-native · data collection
0 likes · 10 min read
DevOps Cloud Academy
Aug 29, 2023 · Cloud Native

Observability and Data Collection Strategies in Cloud‑Native Environments

The article explains that while observability is not new, cloud-native systems have driven the rapid development of observability platforms. It details data collection architectures, direct push versus file-based approaches, and head, tail, and local sampling techniques that balance completeness, real-time reporting, and performance impact.

Microservices · cloud-native · data collection
0 likes · 11 min read
vivo Internet Technology
Apr 26, 2023 · Backend Development

Design and Evolution of Vivo's Points Task System

Vivo’s Points Task System evolved from a simple configuration‑driven task model into a scalable, multi‑source behavior incentive platform that uses an AviatorScript engine, unified SDK, and three isolated services—event collection, computation, and task handling—to deliver configurable tasks, real‑time rewards, and flexible user notifications while ensuring stability and extensibility.

backend architecture · behavior SDK · data collection
0 likes · 14 min read
DataFunSummit
Jan 12, 2023 · Big Data

Industrial IoT Data Collection Platform: Neuron v2.0 Architecture, Design, and Case Studies

This article presents a comprehensive overview of EMQ's Neuron industrial IoT data collection platform, detailing the lessons learned from version 1.x, the redesigned v2.0 architecture, core modules, plugin mechanisms, data‑tag management, eKuiper integration, and two real‑world case studies in oil‑field and smart‑factory environments.

Big Data · Edge Computing · Industrial Automation
0 likes · 16 min read