How to Build a Watermelon Sweetness Dataset: From Field to Features
This article describes how the author collected a watermelon dataset, defined measurable features such as size, color, sugar content, seed count, and texture, and documented the process with photos, tables, and a brief discussion of data characteristics for future machine‑learning analysis.
Inspired by a TV scene about picking watermelons, the author decided to explore whether the sweetness of a watermelon can be predicted from observable features using data mining techniques.
The author collected personal data by buying a watermelon, measuring its diameter, color, and sweetness, and recording additional attributes such as seed counts, texture, shape ratio, skin thickness, and crispness. A handheld refractometer (sugar meter) was used to measure sugar content based on the refractive index of the juice.
When light passes from one medium into another, it changes direction; this is refraction, not reflection. For a given pair of media, the ratio of the sine of the angle of incidence to the sine of the angle of refraction is constant. This ratio, called the refractive index, rises with the soluble solids content of fruit juice under stable temperature and pressure, so sugar concentration can be inferred from it.
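The constant-ratio relationship described above is Snell's law. A minimal sketch of it follows; the function names and the value 1.333 (a commonly quoted refractive index for pure water) are assumptions for illustration and do not come from the author's meter. The sketch does not convert a refractive index into a sugar percentage, since the article does not give that calibration.

```python
import math


def relative_refractive_index(incident_deg: float, refracted_deg: float) -> float:
    """Snell's law: n = sin(angle of incidence) / sin(angle of refraction)."""
    return math.sin(math.radians(incident_deg)) / math.sin(math.radians(refracted_deg))


def refracted_angle_deg(incident_deg: float, n: float) -> float:
    """Angle of refraction for light entering a medium with relative index n."""
    return math.degrees(math.asin(math.sin(math.radians(incident_deg)) / n))


# Assumed illustrative value, not a measurement from the article.
N_WATER = 1.333

for incident in (20.0, 40.0, 60.0):
    refracted = refracted_angle_deg(incident, N_WATER)
    # The ratio of sines is the same for every incident angle.
    ratio = relative_refractive_index(incident, refracted)
    print(f"incident={incident:5.1f}  refracted={refracted:6.2f}  ratio={ratio:.3f}")
```

Whatever the incident angle, the printed ratio is the same, which is the property a refractometer relies on.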
After acquiring a more precise electronic sugar meter and a measuring tape, the author recorded 22 entries from July to August, noting date, ambient temperature, sugar percentage, total price, price per jin (1 jin = 0.5 kg), weight, seed counts, diameter, estimated skin thickness, length‑width ratio, color, and crispness. The first five entries are shown below.
| Date | Temperature(°C) | Sugar(%) | Total Price(¥) | Price(¥/jin) | Weight(jin) | Black Seeds | White Seeds | Cross‑section Seeds | Diameter(cm) | Estimated Skin Thickness(cm) | Length‑Width Ratio | Color | Crispness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-07-14 | 35.2 | 9.5 | 12.8 | 2 | 6.4 | 20 | 1 | 21 | nan | 1.2 | 1.2 | Deep | Crunchy |
| 2020-07-15 | 32.6 | 10 | 12.3 | 2 | 6.15 | 13 | 11 | 24 | 15 | 1 | 1 | Deep | Crunchy |
| 2020-07-17 | 33.1 | 8 | 19.4 | 2.5 | 7.76 | 16 | 1 | 17 | 17.5 | 1.2 | nan | nan | Crunchy |
| 2020-07-19 | 12 | 32 | 10.4 | 2 | 5.2 | 24 | 3 | 27 | 19 | 0.9 | nan | nan | Crunchy |
| 2020-07-20 | 32.5 | 10.5 | 19 | 2.5 | 7.6 | 19 | 4 | 23 | 20.3 | 0.8 | 1.1 | Medium | Crunchy |
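Hand-recorded data benefits from simple consistency checks before any modeling. The sketch below parses the numeric columns of the five sample rows and tests two relationships that hold in those rows: total price equals price per jin times weight, and the cross‑section seed count equals black plus white seeds. The column names, the tolerance, and the idea of running these checks are assumptions for illustration; the article does not say the author did this.

```python
import csv
import io
import math

# The five sample rows from the article; column names are hypothetical.
SAMPLE = """date,temp_c,sugar_pct,total_price,price_per_jin,weight_jin,black_seeds,white_seeds,cross_section_seeds
2020-07-14,35.2,9.5,12.8,2,6.4,20,1,21
2020-07-15,32.6,10,12.3,2,6.15,13,11,24
2020-07-17,33.1,8,19.4,2.5,7.76,16,1,17
2020-07-19,12,32,10.4,2,5.2,24,3,27
2020-07-20,32.5,10.5,19,2.5,7.6,19,4,23
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))


def check_row(row: dict) -> list:
    """Return a list of human-readable problems found in one row."""
    problems = []
    expected_total = float(row["price_per_jin"]) * float(row["weight_jin"])
    # Assumed tolerance of 0.05 yuan to absorb rounding at the till.
    if not math.isclose(float(row["total_price"]), expected_total, abs_tol=0.05):
        problems.append("total_price differs from price_per_jin * weight_jin")
    seed_sum = int(row["black_seeds"]) + int(row["white_seeds"])
    if seed_sum != int(row["cross_section_seeds"]):
        problems.append("cross_section_seeds differs from black + white")
    return problems


issues = {row["date"]: check_row(row) for row in rows}
for date, problems in issues.items():
    print(date, "OK" if not problems else problems)
```

These checks only test internal arithmetic; they say nothing about whether an individual measurement is plausible.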
Additional features such as seed distribution (black vs. white), texture classification (non‑crunchy, crunchy, very crunchy), length‑width ratio, and color depth were also recorded, each illustrated with photos.
Feature Definition
Sugar Meter Usage
The meter displays the sugar percentage and the measurement temperature after a drop of juice is placed on the sensor. Measurements were typically taken on hot afternoons, when the ambient temperature exceeded 30 °C.
Black and White Seed Counts
Seed patterns fall into four categories: mostly black, mostly white, mixed, or seedless.
Crispness
Three levels are defined: non‑crunchy (requires full cutting), crunchy (splits halfway), and very crunchy (splits without cutting).
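Because the three crispness levels have a natural order, one option for later modeling is an ordinal encoding. The sketch below is an assumption about how that might be done, not the author's method; the integer codes, the function name, and the treatment of "nan" as missing are all hypothetical.

```python
# Hypothetical ordinal codes: higher means the melon splits more easily.
CRISPNESS_ORDER = {
    "non-crunchy": 0,   # requires full cutting
    "crunchy": 1,       # splits halfway
    "very crunchy": 2,  # splits without cutting
}


def encode_crispness(label: str):
    """Map a crispness label to its ordinal code, or None if missing."""
    # Normalize case, whitespace, and the non-breaking hyphen used in the text.
    key = label.strip().lower().replace("\u2011", "-")
    if key in ("", "nan"):
        return None
    # An unexpected label raises KeyError rather than being silently coded.
    return CRISPNESS_ORDER[key]


labels = ["Crunchy", "Very crunchy", "Non-crunchy", "nan"]
print([encode_crispness(label) for label in labels])
```

An ordinal code keeps the ordering but also implies equal spacing between levels, which may not hold; one-hot encoding is the usual alternative when that matters.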
Length‑Width Ratio
Values greater than 1 indicate elongated shape; a ratio of 1 denotes a round watermelon.
Color Depth
Images illustrate deep versus light skin color.
Data Characteristics
The following table summarizes statistical properties of the collected dataset.
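| Statistic | Temperature(°C) | Sugar(%) | Total Price(¥) | Price(¥/jin) | Weight(jin) | Black Seeds | White Seeds | Cross‑section Seeds | Diameter(cm) | Skin Thickness(cm) | Length‑Width Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21 | 22 | 21 | 21 | 21 | 22 | 22 | 22 | 21 | 22 | 17 |
| mean | 31 | 12 | 16 | 2 | 7 | 15 | 6 | 21 | 19 | 1 | 1 |
| std | 4 | 5 | 4 | 0 | 1 | 8 | 5 | 7 | 1 | 0 | 0 |
| min | 12 | 8 | 10 | 2 | 5 | 0 | 1 | 6 | 15 | 0 | 1 |
| 25% | 31 | 10 | 13 | 2 | 6 | 9 | 3 | 16 | 18 | 1 | 1 |
| 50% | 32 | 10 | 16 | 2 | 7 | 13 | 5 | 22 | 19 | 1 | 1 |
| 75% | 32 | 12 | 19 | 2 | 8 | 20 | 8 | 26 | 19 | 1 | 1 |
| max | 35 | 32 | 24 | 2 | 10 | 29 | 23 | 36 | 21 | 1 | 1 |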
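The rows of this table (count, mean, std, min, quartiles, max) can be reproduced for any numeric column with the standard library. The sketch below is an assumption about how such a summary could be computed, not a record of the tool the author used. It runs on the diameter values from the five sample rows only, so its output will not match the 22-entry table above.

```python
import math
import statistics


def describe(values: list) -> dict:
    """Summary statistics for one numeric column, ignoring missing values."""
    clean = sorted(v for v in values if v is not None and not math.isnan(v))
    # "inclusive" interpolates linearly between the sorted observations.
    q1, q2, q3 = statistics.quantiles(clean, n=4, method="inclusive")
    return {
        "count": len(clean),
        "mean": statistics.mean(clean),
        "std": statistics.stdev(clean),  # sample standard deviation (n - 1)
        "min": clean[0],
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": clean[-1],
    }


# Diameter(cm) from the five sample rows; the first entry is missing.
diameter_cm = [float("nan"), 15.0, 17.5, 19.0, 20.3]
summary = describe(diameter_cm)
for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

Note that count excludes missing entries, which is why the counts in the table differ between columns.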
Average values for each attribute are also visualized below.
Some entries have missing values, as shown in the following diagram.
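Missing values can also be tallied per column in a few lines. The sketch below counts entries recorded as "nan" in three columns of the five sample rows; the column names and the rule that an empty string or "nan" means missing are assumptions for illustration.

```python
# Three columns from the five sample rows; column names are hypothetical.
SAMPLE_ROWS = [
    {"date": "2020-07-14", "diameter_cm": "nan", "length_width_ratio": "1.2", "color": "Deep"},
    {"date": "2020-07-15", "diameter_cm": "15", "length_width_ratio": "1", "color": "Deep"},
    {"date": "2020-07-17", "diameter_cm": "17.5", "length_width_ratio": "nan", "color": "nan"},
    {"date": "2020-07-19", "diameter_cm": "19", "length_width_ratio": "nan", "color": "nan"},
    {"date": "2020-07-20", "diameter_cm": "20.3", "length_width_ratio": "1.1", "color": "Medium"},
]


def is_missing(value) -> bool:
    """Treat None, empty strings, and the literal 'nan' as missing."""
    return value is None or value.strip().lower() in ("", "nan")


columns = list(SAMPLE_ROWS[0].keys())
missing_counts = {
    column: sum(is_missing(row[column]) for row in SAMPLE_ROWS)
    for column in columns
}
print(missing_counts)
```

On the sample rows, the gaps fall in diameter, length‑width ratio, and color, which agrees with the lower counts for diameter and length‑width ratio in the summary table.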
Conclusion
The author will continue to analyze and model this watermelon dataset to answer the question: what makes a watermelon truly sweet?
If you are interested in the dataset, follow the public account "Model Perspective" and send the keyword 西瓜 ("watermelon") to receive the download link.
References
Sugar meter principle: https://baijiahao.baidu.com/s?id=1715369131474544014&wfr=spider&for=pc
Zhou Zhihua. Machine Learning. Beijing: Tsinghua University Press, 2016.
Model Perspective
Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
