AI‑Powered JD.com Review Collection, Indexing, and Kibana Visualization with Elasticsearch
The author builds a fully AI‑driven pipeline that scrapes JD.com comments about an Elasticsearch book, processes the data through cleaning and preprocessing, indexes it into Elasticsearch, and creates a series of Kibana visualizations, while reflecting on model selection and practical challenges.
Motivation
Manually collecting JD.com reader comments on an Elasticsearch book was cumbersome, which prompted the author to build an automated crawler.
AI Tool Selection
Multiple large language models were evaluated. The workflow relied mainly on ChatGPT‑4o (paid plan, $20/month) and Perplexity AI, chosen for their integration capabilities. The author recommends comparing at least three models rather than relying on a single provider.
Overall Architecture
The process consists of five steps:
1. Data collection from JD.com.
2. Optional data cleaning.
3. Data preprocessing.
4. Writing data to Elasticsearch.
5. Kibana visualization.
Steps 3 and 4 can be swapped: preprocessing can run in an Elasticsearch ingest pipeline while documents are being indexed, or it can be applied to already indexed documents afterwards with an update_by_query operation.
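The two orderings can be sketched as the REST requests each one would send. This is a minimal illustration, not the author's actual DSL: the index name, pipeline id, target field name, and the Painless script are all assumptions made for the example. It only builds and prints the request bodies, so it runs without a cluster.

```python
import json

# Hypothetical names, not taken from the original article.
INDEX_NAME = "jd_comments"
PIPELINE_ID = "jd_comments_enrich"

# Body for PUT _ingest/pipeline/<id>: a single script processor that looks up
# the provincial capital for each document's `province` field.
PIPELINE_BODY = {
    "description": "Add capital_city from province (illustrative)",
    "processors": [
        {
            "script": {
                "lang": "painless",
                "source": "ctx.capital_city = params.capitals.get(ctx.province);",
                "params": {"capitals": {"河北": "石家庄"}},
            }
        }
    ],
}


def preprocess_while_indexing():
    """Option A: register the pipeline, then make it the index default."""
    return [
        ("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", PIPELINE_BODY),
        ("PUT", f"/{INDEX_NAME}",
         {"settings": {"index.default_pipeline": PIPELINE_ID}}),
    ]


def preprocess_after_indexing():
    """Option B: documents already exist; rerun them through the pipeline."""
    return [
        ("PUT", f"/_ingest/pipeline/{PIPELINE_ID}", PIPELINE_BODY),
        ("POST", f"/{INDEX_NAME}/_update_by_query?pipeline={PIPELINE_ID}",
         {"query": {"match_all": {}}}),
    ]


if __name__ == "__main__":
    for method, path, body in preprocess_while_indexing() + preprocess_after_indexing():
        print(method, path)
        print(json.dumps(body, ensure_ascii=False, indent=2))
```

Option A keeps the loader script simple because every document is enriched on arrival; Option B is useful when the enrichment rules change after the data has already been written.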
Data Collection
JD.com provides an API; however, frequent calls trigger a 24‑hour ban, so request volume was limited. The API specification (shown in the original article) was used to retrieve comment data, which was saved as CSV files and verified for correctness before further processing.
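A throttled collection loop along these lines keeps the request rate low and writes the result to CSV. It is only a sketch under stated assumptions: the JD.com API itself is not reproduced here, so `fetch_page` is a hypothetical callable the caller supplies, and the column names and the five-second delay are invented for illustration.

```python
import csv
import io
import time

# Hypothetical column names; the real API fields are not given in this summary.
FIELDNAMES = ["id", "province", "score", "content", "created_at"]


def collect_comments(fetch_page, pages, out, delay_seconds=5.0, sleep=time.sleep):
    """Fetch pages one at a time, pausing between calls, and write rows as CSV.

    fetch_page(page) is a caller-supplied function returning a list of dicts.
    Collection stops at the first empty page. Returns the number of rows written.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    written = 0
    for i, page in enumerate(pages):
        if i > 0:
            sleep(delay_seconds)  # pause between requests to stay under the limit
        rows = fetch_page(page)
        if not rows:
            break
        for row in rows:
            writer.writerow(row)
            written += 1
    return written


def verify_csv(text, expected_rows):
    """Re-read the CSV and check the row count and that every row has an id."""
    rows = list(csv.DictReader(io.StringIO(text)))
    return len(rows) == expected_rows and all(r["id"] for r in rows)
```

Passing `sleep` as a parameter makes the loop easy to exercise with a stub fetcher and no real waiting before pointing it at a live endpoint.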
Data Writing and Storage
Before ingestion, an Elasticsearch index with appropriate mapping and settings was created. Incorrect mappings would cause repeated rework during visualization. The author prompted ChatGPT‑4o‑mini to suggest visualizable dimensions, resulting in a more comprehensive schema than initially imagined.
Final DSL and code are available at: https://articles.zsxq.com/id_cidkaht43ssd.html
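For orientation, an index body of the kind described might look like the sketch below. The author's final DSL is at the link above and is not reproduced here; every field name, the date format, and the shard settings in this example are assumptions chosen to match the visualizations listed later in the article.

```python
# Hypothetical index body for PUT /<index>; field names are illustrative only.
INDEX_BODY = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "comment_id": {"type": "keyword"},
            "content": {"type": "text"},          # full-text search on the comment
            "keywords": {"type": "keyword"},      # assumed extracted terms for a word cloud
            "score": {"type": "integer"},         # star rating, for the pie chart
            "province": {"type": "keyword"},      # for the province bar chart
            "capital_city": {"type": "keyword"},
            "location": {"type": "geo_point"},    # required for map visualizations
            "created_at": {
                "type": "date",
                "format": "yyyy-MM-dd HH:mm:ss||strict_date_optional_time",
            },
        }
    },
}

# Field types that can be used directly in Kibana bucket aggregations or maps.
AGGREGATABLE = {"keyword", "integer", "date", "geo_point"}


def non_aggregatable_fields(body):
    """List fields that a chart could not bucket on without extra configuration."""
    props = body["mappings"]["properties"]
    return sorted(name for name, spec in props.items()
                  if spec["type"] not in AGGREGATABLE)
```

Checking the mapping against the planned charts before loading data is one way to avoid the rework mentioned above: in this sketch only the `text` field is flagged, which is why a separate `keyword` field is assumed for the word cloud.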
Data Preprocessing
The raw geographic field contains province names, which must be converted to city‑level latitude and longitude coordinates before they can be plotted on a map. Two transformations are required:
Map each province to its capital city (e.g., 河北 → 石家庄).
Map each capital city to its latitude‑longitude coordinates.
These transformations were generated via AI prompts, and the resulting enriched data enabled accurate geo‑visualization.
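The two lookups can be chained in a few lines. This is a minimal sketch rather than the AI-generated code from the original article: the tables cover only a handful of provinces, the coordinates are approximate values included for illustration, and the output field names `capital_city` and `location` are assumptions.

```python
# Step 1: province -> capital city (partial table; 河北 -> 石家庄 is from the article).
PROVINCE_TO_CAPITAL = {
    "河北": "石家庄",
    "广东": "广州",
    "四川": "成都",
    "北京": "北京",  # a municipality maps to itself
}

# Step 2: capital city -> approximate (latitude, longitude).
CITY_TO_COORDS = {
    "石家庄": (38.04, 114.51),
    "广州": (23.13, 113.26),
    "成都": (30.57, 104.07),
    "北京": (39.90, 116.41),
}


def enrich(record):
    """Return a copy of the record with capital_city and location added.

    Records whose province is missing from either table are returned
    without the extra fields instead of raising an error.
    """
    enriched = dict(record)
    city = PROVINCE_TO_CAPITAL.get(record.get("province"))
    if city is None or city not in CITY_TO_COORDS:
        return enriched
    lat, lon = CITY_TO_COORDS[city]
    enriched["capital_city"] = city
    # Object form accepted by an Elasticsearch geo_point field.
    enriched["location"] = {"lat": lat, "lon": lon}
    return enriched
```

For example, `enrich({"province": "河北"})` adds `capital_city` 石家庄 and a `location` object, while a record with an unlisted province passes through unchanged.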
Kibana Visualizations
The following visualizations were produced step‑by‑step:
Reader city distribution map (covers major cities nationwide).
Province distribution bar chart.
Rating pie chart (≈90.48% five‑star reviews).
Comment word‑cloud.
Time‑series trend chart (relatively stable).
Sales chart (added manually and importable in Kibana).
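Several of these charts correspond to standard Elasticsearch aggregations, which Kibana builds behind the scenes. The request body below is a hedged sketch of what equivalent queries could look like; the field names are assumptions, and the bucket counts in the usage note are invented for illustration, not the article's data.

```python
# Hypothetical aggregation body for POST /<index>/_search; field names are assumed.
AGG_BODY = {
    "size": 0,  # only aggregation results are needed, not the documents
    "aggs": {
        "by_province": {"terms": {"field": "province", "size": 40}},
        "by_score": {"terms": {"field": "score"}},
        "over_time": {
            "date_histogram": {"field": "created_at", "calendar_interval": "month"}
        },
    },
}


def rating_shares(buckets):
    """Convert terms-aggregation buckets into percentage shares per rating.

    Each bucket is a dict with `key` and `doc_count`, as returned by Elasticsearch.
    """
    total = sum(b["doc_count"] for b in buckets)
    if total == 0:
        return {}
    return {b["key"]: round(100 * b["doc_count"] / total, 2) for b in buckets}
```

With made-up buckets such as `[{"key": 5, "doc_count": 8}, {"key": 4, "doc_count": 2}]`, `rating_shares` returns `{5: 80.0, 4: 20.0}`, the same kind of figure the rating pie chart displays.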
AI Usage Reflection
AI dramatically reduced manual coding effort. The most effective approach involved iterative discussion with the model and comparing multiple providers; relying on a single model was discouraged.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Mingyi World Elasticsearch
The leading WeChat public account for Elasticsearch fundamentals, advanced topics, and hands‑on practice. Join us to dive deep into the ELK Stack (Elasticsearch, Logstash, Kibana, Beats).
