AI‑Powered JD.com Review Collection, Indexing, and Kibana Visualization with Elasticsearch

The author builds a fully AI‑driven pipeline that scrapes JD.com comments about an Elasticsearch book, processes the data through cleaning and preprocessing, indexes it into Elasticsearch, and creates a series of Kibana visualizations, while reflecting on model selection and practical challenges.

Mingyi World Elasticsearch

Nov 9, 2024

Motivation

Manually collecting JD.com reader comments on an Elasticsearch book was cumbersome, which prompted the design of an automated crawler.

AI Tool Selection

Multiple large language models were evaluated. The workflow primarily used ChatGPT‑4o (the paid plan, $20/month) and Perplexity AI, chosen for their integration capabilities. The author recommends comparing at least three models rather than relying on a single provider.

Overall Architecture

The process consists of five steps:

1. Data collection from JD.com.

2. Data cleaning (optional).

3. Data preprocessing.

4. Writing data to Elasticsearch.

5. Kibana visualization.

Steps 3 and 4 can be swapped: preprocessing can run in an Elasticsearch ingest pipeline as documents are indexed, or afterwards through an update_by_query operation.
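The two orderings can be sketched as REST requests. This is a minimal illustration, not the author's DSL: the index name, pipeline id, and the province and city field names are assumptions. The sketch only builds the request paths and bodies and sends nothing.

```python
import json

# Hypothetical names: the index, pipeline id, and field names are assumptions.
INDEX = "jd_book_comments"
PIPELINE_ID = "province_to_city"


def build_pipeline(capitals):
    """Return an ingest pipeline body that adds a looked-up city to each document."""
    return {
        "description": "Derive a city field from the province field",
        "processors": [
            {
                "script": {
                    "lang": "painless",
                    # Only set the city when the province is present and known.
                    "source": (
                        "if (ctx.province != null"
                        " && params.capitals.containsKey(ctx.province)) {"
                        " ctx.city = params.capitals.get(ctx.province); }"
                    ),
                    "params": {"capitals": capitals},
                }
            }
        ],
    }


def requests_for_write_time():
    """Option A: make the pipeline the index default so it runs on every write."""
    return [
        ("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", build_pipeline({"河北": "石家庄"})),
        ("PUT", f"/{INDEX}", {"settings": {"index.default_pipeline": PIPELINE_ID}}),
    ]


def requests_for_backfill():
    """Option B: index first, then run the pipeline over documents lacking a city."""
    body = {"query": {"bool": {"must_not": {"exists": {"field": "city"}}}}}
    return [
        ("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", build_pipeline({"河北": "石家庄"})),
        ("POST", f"/{INDEX}/_update_by_query?pipeline={PIPELINE_ID}", body),
    ]


if __name__ == "__main__":
    for method, path, body in requests_for_write_time() + requests_for_backfill():
        print(method, path)
        print(json.dumps(body, ensure_ascii=False))
```

Option A keeps the enrichment logic next to the index definition. Option B suits data that is already indexed, at the cost of a second pass over the documents.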

Data Collection

JD.com provides an API; however, frequent calls trigger a 24‑hour ban, so request volume was limited. The API specification (shown in the original article) was used to retrieve comment data, which was saved as CSV files and verified for correctness before further processing.
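A throttled collection loop might look like the sketch below. The JD.com endpoint and its parameters are not reproduced here, so `fetch_page` is a hypothetical callable standing in for the real request, and the delay value and column names are likewise assumptions.

```python
import csv
import io
import time


def collect_comments(fetch_page, max_pages, delay_seconds=5.0, sleep=time.sleep):
    """Fetch up to max_pages pages, pausing between requests to keep volume low.

    fetch_page is a hypothetical callable: page number -> list of comment dicts.
    An empty page is treated as the end of the data.
    """
    rows = []
    for page in range(max_pages):
        if page > 0:
            sleep(delay_seconds)  # throttle every request after the first
        batch = fetch_page(page)
        if not batch:
            break
        rows.extend(batch)
    return rows


def write_csv(rows, fieldnames, out):
    """Write rows to a file-like object as CSV, ignoring unexpected keys."""
    writer = csv.DictWriter(out, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Fake two-page source used in place of the real API.
    pages = {
        0: [{"comment_id": "1", "content": "good", "score": 5}],
        1: [{"comment_id": "2", "content": "ok", "score": 4}],
    }
    rows = collect_comments(lambda p: pages.get(p, []), max_pages=10, delay_seconds=0)
    buffer = io.StringIO()
    write_csv(rows, ["comment_id", "content", "score"], buffer)
    print(buffer.getvalue())
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Inspecting the CSV before moving on matches the verification step described above.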

Data Writing and Storage

Before ingestion, an Elasticsearch index was created with explicit mappings and settings. Getting the mapping wrong would cause repeated rework at the visualization stage, since the type of an existing field cannot be changed without reindexing. The author prompted ChatGPT‑4o‑mini to suggest dimensions worth visualizing, which produced a more comprehensive schema than the author had initially imagined.
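As an illustration of what such a mapping could contain, the sketch below defines one field type per visualization dimension. Every field name, the date format, and the shard settings are assumptions; the author's actual schema is in the linked DSL and may differ.

```python
# Hypothetical index body; field names and settings are assumptions.
INDEX_BODY = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "comment_id": {"type": "keyword"},
            # fielddata lets terms aggregations (for a word cloud) run on the
            # analyzed tokens of a text field, at the cost of heap memory.
            "content": {"type": "text", "fielddata": True},
            "score": {"type": "integer"},            # rating pie chart
            "created_at": {                           # time-series trend
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss",
            },
            "province": {"type": "keyword"},         # province bar chart
            "city": {"type": "keyword"},
            "location": {"type": "geo_point"},       # city distribution map
        }
    },
}


def unmapped_fields(doc, index_body=INDEX_BODY):
    """Return the document keys that the mapping does not declare."""
    declared = index_body["mappings"]["properties"]
    return sorted(key for key in doc if key not in declared)


if __name__ == "__main__":
    sample = {"comment_id": "1", "score": 5, "nickname": "reader"}
    print(unmapped_fields(sample))
```

Checking sample documents against the declared fields before indexing is one cheap way to catch a field that would otherwise be mapped dynamically with an unintended type.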

Final DSL and code are available at: https://articles.zsxq.com/id_cidkaht43ssd.html
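For readers without access to the linked code, the write step can be approximated by converting the CSV into a payload for the `_bulk` endpoint. This is a sketch under assumptions, not the author's implementation: the column names `comment_id` and `score` are hypothetical.

```python
import csv
import io
import json


def csv_to_bulk(csv_text, index_name):
    """Convert CSV text into an NDJSON payload for the _bulk endpoint."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        action = {"index": {"_index": index_name}}
        if row.get("comment_id"):
            # A stable _id makes reruns overwrite instead of duplicating.
            action["index"]["_id"] = row["comment_id"]
        if row.get("score"):
            row["score"] = int(row["score"])  # CSV values arrive as strings
        lines.append(json.dumps(action, ensure_ascii=False))
        lines.append(json.dumps(row, ensure_ascii=False))
    # The bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


if __name__ == "__main__":
    sample = "comment_id,content,score\n1,很好,5\n2,不错,4\n"
    print(csv_to_bulk(sample, "jd_book_comments"), end="")
```

The payload would be sent as the body of `POST /_bulk` with the `application/x-ndjson` content type.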

Data Preprocessing

The raw geographic field contains only province names, which must be converted to city‑level geo coordinates before they can be plotted on a map. Two transformations are required:

Map each province to its capital city (e.g., 河北 → 石家庄).

Map each capital city to its latitude‑longitude coordinates.

These transformations were generated via AI prompts, and the resulting enriched data enabled accurate geo‑visualization.
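The two lookups can be sketched as plain dictionaries applied to each record. Only the 河北 → 石家庄 pair comes from the text above; the other provinces, the approximate coordinates, and the `province`, `city`, and `location` field names are illustrative assumptions.

```python
# Step 1: province -> capital city (a small illustrative subset).
PROVINCE_TO_CAPITAL = {
    "河北": "石家庄",
    "广东": "广州",
    "浙江": "杭州",
    "四川": "成都",
    "湖北": "武汉",
}

# Step 2: capital city -> approximate (latitude, longitude).
CAPITAL_TO_COORDS = {
    "石家庄": (38.04, 114.51),
    "广州": (23.13, 113.26),
    "杭州": (30.27, 120.15),
    "成都": (30.57, 104.07),
    "武汉": (30.59, 114.31),
}


def enrich(doc):
    """Return a copy of doc with city and geo_point-style location added.

    Documents whose province is missing or unknown are returned unchanged.
    """
    enriched = dict(doc)
    city = PROVINCE_TO_CAPITAL.get(doc.get("province"))
    if city is None:
        return enriched
    enriched["city"] = city
    coords = CAPITAL_TO_COORDS.get(city)
    if coords is not None:
        lat, lon = coords
        # Object form accepted by an Elasticsearch geo_point field.
        enriched["location"] = {"lat": lat, "lon": lon}
    return enriched


if __name__ == "__main__":
    print(enrich({"comment_id": "1", "province": "河北"}))
```

Because every reader in a province is placed at its capital, the resulting map shows province-level distribution drawn at city points rather than readers' true cities.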

Kibana Visualizations

The following visualizations were produced step‑by‑step:

Reader city distribution map (covers major cities nationwide).

Province distribution bar chart.

Rating pie chart (≈90.48% five‑star reviews).

Comment word‑cloud.

Time‑series trend chart (relatively stable).

Sales chart (added manually and importable in Kibana).
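Kibana builds its own queries, but several of the charts above correspond roughly to ordinary aggregations. The sketch below shows request bodies of that kind plus a helper that turns rating buckets into a percentage. The field names are assumptions and the bucket counts in the usage example are made up for illustration.

```python
def build_search_body():
    """Return a search body with one aggregation per chart (no hits returned)."""
    return {
        "size": 0,
        "aggs": {
            # Province distribution bar chart.
            "by_province": {"terms": {"field": "province", "size": 40}},
            # Rating pie chart.
            "by_score": {"terms": {"field": "score"}},
            # Time-series trend, one bucket per calendar month.
            "over_time": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": "month",
                }
            },
        },
    }


def share(buckets, key):
    """Return the percentage of documents whose bucket key equals key."""
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        return 0.0
    hits = sum(b["doc_count"] for b in buckets if b["key"] == key)
    return round(100.0 * hits / total, 2)


if __name__ == "__main__":
    # Illustrative buckets in the shape of a terms aggregation response.
    example = [{"key": 5, "doc_count": 9}, {"key": 4, "doc_count": 1}]
    print(share(example, 5))
```

Running the same aggregations directly against the index is a quick way to cross-check a figure shown in a chart.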

AI Usage Reflection

AI dramatically reduced manual coding effort. The most effective approach involved iterative discussion with the model and comparing multiple providers; relying on a single model was discouraged.
