
How to Build a Watermelon Sweetness Dataset: From Field to Features

This article describes how the author collected a watermelon dataset, defined measurable features such as size, color, sugar content, seed count, and texture, and documented the process with photos, tables, and a brief discussion of data characteristics for future machine‑learning analysis.

Model Perspective

Inspired by a TV scene about picking watermelons, the author decided to explore whether the sweetness of a watermelon can be predicted from observable features using data mining techniques.

The author collected the data personally: buying a watermelon, measuring its diameter, color, and sweetness, and recording additional attributes such as seed counts, shape ratio, skin thickness, and crispness (texture). A handheld refractometer (sugar meter) measured the sugar content from the refractive index of the juice.

When light passes from one medium into another, it changes direction; this is refraction. For a given pair of media, the ratio of the sine of the angle of incidence to the sine of the angle of refraction is constant (Snell's law), and this ratio is called the refractive index. Under stable temperature and pressure, the refractive index of fruit juice rises with its soluble solids content, so the sugar concentration can be inferred from it.
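In symbols (θ₁, θ₂, n₁, n₂ are notation introduced here), Snell's law relates the angle of incidence θ₁ in the first medium to the angle of refraction θ₂ in the second, both measured from the normal:

$$
n_{21} = \frac{\sin\theta_1}{\sin\theta_2} = \frac{n_2}{n_1}
$$

where n₁ and n₂ are the absolute refractive indices of the two media. As a worked example, light entering water (n₂ ≈ 1.333) from air (n₁ ≈ 1.000) at θ₁ = 30° gives sin θ₂ = 0.5 / 1.333 ≈ 0.375, so θ₂ ≈ 22.0°. Dissolved sugar raises the juice's refractive index slightly above that of pure water, so the light bends a little more, and the refractometer translates that difference into a sugar reading.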

After acquiring a more precise electronic sugar meter and a measuring tape, the author recorded 22 entries from July to August, noting the date, ambient temperature, sugar percentage, total price, price per jin (1 jin = 0.5 kg), weight, seed counts, diameter, estimated skin thickness, length-width ratio, color, and crispness. The entries from 14 to 20 July are shown below; "nan" marks a value that was not recorded.

| Date | Temperature (°C) | Sugar (%) | Total Price (¥) | Price (¥/jin) | Weight (jin) | Black Seeds | White Seeds | Cross-section Seeds | Diameter (cm) | Est. Skin Thickness (cm) | Length-Width Ratio | Color | Crispness |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2020-07-14 | 35.2 | 9.5 | 12.8 | 2 | 6.4 | 20 | 1 | 21 | nan | 1.2 | 1.2 | Deep | Crunchy |
| 2020-07-15 | 32.6 | 10 | 12.3 | 2 | 6.15 | 13 | 11 | 24 | 15 | 1 | 1 | Deep | Crunchy |
| 2020-07-17 | 33.1 | 8 | 19.4 | 2.5 | 7.76 | 16 | 1 | 17 | 17.5 | 1.2 | nan | nan | Crunchy |
| 2020-07-19 | 12 | 32 | 10.4 | 2 | 5.2 | 24 | 3 | 27 | 19 | 0.9 | nan | nan | Crunchy |
| 2020-07-20 | 32.5 | 10.5 | 19 | 2.5 | 7.6 | 19 | 4 | 23 | 20.3 | 0.8 | 1.1 | Medium | Crunchy |

Several of these features, namely seed distribution (black vs. white), crispness (non-crunchy, crunchy, very crunchy), length-width ratio, and color depth, were also documented with photos.
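As a minimal sketch, assuming pandas is available, the five rows above can be loaded and checked for internal consistency. The snake_case column names are hypothetical stand-ins for the table headers, and "nan" is read as a missing value, matching the table. In these rows the total price equals price per jin times weight, and the cross-section seed count equals black plus white seeds; since 1 jin is 0.5 kg, metric columns follow directly.

```python
import io

import numpy as np
import pandas as pd

# The five entries shown in the table above; "nan" marks an unrecorded value.
# Column names are hypothetical snake_case versions of the table headers.
CSV = """date,temp_c,sugar_pct,total_price,price_per_jin,weight_jin,black_seeds,white_seeds,cross_section_seeds,diameter_cm,skin_cm,lw_ratio,color,crispness
2020-07-14,35.2,9.5,12.8,2,6.4,20,1,21,nan,1.2,1.2,Deep,Crunchy
2020-07-15,32.6,10,12.3,2,6.15,13,11,24,15,1,1,Deep,Crunchy
2020-07-17,33.1,8,19.4,2.5,7.76,16,1,17,17.5,1.2,nan,nan,Crunchy
2020-07-19,12,32,10.4,2,5.2,24,3,27,19,0.9,nan,nan,Crunchy
2020-07-20,32.5,10.5,19,2.5,7.6,19,4,23,20.3,0.8,1.1,Medium,Crunchy
"""
df = pd.read_csv(io.StringIO(CSV), parse_dates=["date"])

# Total price should equal unit price times weight (small tolerance for rounding).
price_ok = np.isclose(df["price_per_jin"] * df["weight_jin"], df["total_price"], atol=0.05)

# In these rows the cross-section count is the sum of black and white seeds.
seeds_ok = df["cross_section_seeds"] == df["black_seeds"] + df["white_seeds"]

# 1 jin = 0.5 kg, so the weight halves and the unit price doubles in metric units.
df["weight_kg"] = df["weight_jin"] * 0.5
df["price_per_kg"] = df["price_per_jin"] * 2

print(bool(price_ok.all()), bool(seeds_ok.all()))  # True True
print(df[["date", "weight_kg", "price_per_kg"]])
```

Checks like these are cheap to run after every new entry and catch typing mistakes before they reach a model.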

Feature Definition

Sugar Meter Usage

After a drop of juice is placed on the sensor, the meter displays the sugar percentage and the measurement temperature. Readings were typically taken in the hot afternoon, when the ambient temperature exceeded 30 °C.
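Because readings were usually taken above 30 °C, the temperature column doubles as a plausibility check. In the rows shown, the 2020-07-19 entry records 12 °C and 32 % sugar, values that would look ordinary if transposed; they also match the temperature minimum and sugar maximum in the summary statistics. A minimal sketch that flags such rows, assuming pandas; the 20 % sugar ceiling is a hypothetical threshold, not one from the source, and flagged rows are candidates for manual review rather than automatic correction.

```python
import io

import pandas as pd

# Date, temperature, and sugar for the five rows shown in the table.
CSV = """date,temp_c,sugar_pct
2020-07-14,35.2,9.5
2020-07-15,32.6,10
2020-07-17,33.1,8
2020-07-19,12,32
2020-07-20,32.5,10.5
"""
df = pd.read_csv(io.StringIO(CSV))

TYPICAL_MIN_TEMP_C = 30.0       # readings were usually taken above 30 °C
MAX_PLAUSIBLE_SUGAR_PCT = 20.0  # hypothetical ceiling for watermelon juice

# Flag rows that break either expectation.
suspect = df[(df["temp_c"] < TYPICAL_MIN_TEMP_C) | (df["sugar_pct"] > MAX_PLAUSIBLE_SUGAR_PCT)]

# A transposed pair becomes plausible again once the two values are swapped.
looks_swapped = (suspect["sugar_pct"] >= TYPICAL_MIN_TEMP_C) & (
    suspect["temp_c"] <= MAX_PLAUSIBLE_SUGAR_PCT
)
print(suspect.assign(looks_swapped=looks_swapped))
```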

Black and White Seed Counts

Seed patterns fall into four categories: mostly black, mostly white, mixed, or seedless.
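The source does not give numeric cut-offs for these categories. As a hypothetical sketch, a melon could be labeled by its share of black seeds, with the 0.8 majority threshold and the function name `seed_pattern` chosen purely for illustration:

```python
def seed_pattern(black: int, white: int, majority: float = 0.8) -> str:
    """Hypothetical rule: classify a melon by the share of black seeds.

    The 0.8 majority threshold is an assumption, not the author's definition.
    """
    total = black + white
    if total == 0:
        return "seedless"
    black_share = black / total
    if black_share >= majority:
        return "mostly black"
    if black_share <= 1 - majority:
        return "mostly white"
    return "mixed"


# Black/white counts from the five rows in the data table.
for black, white in [(20, 1), (13, 11), (16, 1), (24, 3), (19, 4)]:
    print(black, white, seed_pattern(black, white))
```

With this threshold, the 2020-07-15 melon (13 black, 11 white) is the only "mixed" one among the rows shown; the other four come out "mostly black".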

Crispness

Three levels are defined: non-crunchy (must be cut all the way through), crunchy (splits open once cut halfway), and very crunchy (splits open almost without cutting).
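Because the three levels are ordered, an integer (ordinal) encoding preserves that order for later modelling, which one-hot encoding would not. A minimal sketch; only "Crunchy" appears in the rows shown, so the other two label spellings and the helper name `encode_crispness` are assumptions:

```python
from typing import Optional

# Ordered crispness levels from the definition above (0 = least crunchy).
# Only "Crunchy" appears in the table; the other spellings are assumed.
CRISPNESS_LEVELS = {"non-crunchy": 0, "crunchy": 1, "very crunchy": 2}


def encode_crispness(label: Optional[str]) -> Optional[int]:
    """Return 0, 1, or 2 for a known label, or None if missing or unrecognized."""
    if label is None:
        return None
    return CRISPNESS_LEVELS.get(label.strip().lower())


print([encode_crispness(s) for s in ["Crunchy", "Very crunchy", "Non-crunchy", None]])
# [1, 2, 0, None]
```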

Length‑Width Ratio

Values greater than 1 indicate an elongated melon; a ratio of 1 denotes a round one.
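A minimal sketch, assuming the ratio is taken as the longer axis over the shorter one (so it never falls below 1) and using a hypothetical 0.05 tolerance for measurement error when deciding "round". The 24 cm × 20 cm melon is an invented example, not a row from the dataset.

```python
def length_width_ratio(axis_a_cm: float, axis_b_cm: float) -> float:
    """Longer axis divided by shorter axis, so the result is always >= 1."""
    if axis_a_cm <= 0 or axis_b_cm <= 0:
        raise ValueError("axis lengths must be positive")
    return max(axis_a_cm, axis_b_cm) / min(axis_a_cm, axis_b_cm)


def shape_label(ratio: float, tolerance: float = 0.05) -> str:
    """'round' if the ratio is within the (hypothetical) tolerance of 1, else 'elongated'."""
    return "round" if ratio <= 1 + tolerance else "elongated"


ratio = length_width_ratio(24.0, 20.0)
print(round(ratio, 2), shape_label(ratio))  # 1.2 elongated
print(shape_label(1.0))                     # round
```

A tolerance band avoids labeling a round melon "elongated" just because a tape-measure reading came out at 1.02.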

Color Depth

Images illustrate deep versus light skin color.
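The data table records color as "Deep" or "Medium" (with some entries missing), and the photos contrast deep with light skin. Treating these as an ordered scale, a pandas ordered categorical assigns each level an integer code; the "Light" label and the Light < Medium < Deep ordering are assumptions for this sketch.

```python
import numpy as np
import pandas as pd

# Color column from the five rows shown in the data table (NaN = not recorded).
colors = pd.Series(["Deep", "Deep", np.nan, np.nan, "Medium"])

# Ordered categories: "Light" is an assumed label for the lightest skins.
color_cat = pd.Categorical(colors, categories=["Light", "Medium", "Deep"], ordered=True)

# pandas uses code -1 for missing values; turn those back into NaN.
codes = pd.Series(color_cat.codes).astype(float).where(lambda s: s >= 0)
print(codes.tolist())  # [2.0, 2.0, nan, nan, 1.0]
```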

Data Characteristics

The following table summarizes the statistical properties of the collected dataset (values rounded to whole numbers).

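| Statistic | Temperature (°C) | Sugar (%) | Total Price (¥) | Price (¥/jin) | Weight (jin) | Black Seeds | White Seeds | Cross-section Seeds | Diameter (cm) | Est. Skin Thickness (cm) | Length-Width Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 21 | 22 | 21 | 21 | 21 | 22 | 22 | 22 | 21 | 22 | 17 |
| mean | 31 | 12 | 16 | 2 | 7 | 15 | 6 | 21 | 19 | 1 | 1 |
| std | 4 | 5 | 4 | 0 | 1 | 8 | 5 | 7 | 1 | 0 | 0 |
| min | 12 | 8 | 10 | 2 | 5 | 0 | 1 | 6 | 15 | 0 | 1 |
| 25% | 31 | 10 | 13 | 2 | 6 | 9 | 3 | 16 | 18 | 1 | 1 |
| 50% | 32 | 10 | 16 | 2 | 7 | 13 | 5 | 22 | 19 | 1 | 1 |
| 75% | 32 | 12 | 19 | 2 | 8 | 20 | 8 | 26 | 19 | 1 | 1 |
| max | 35 | 32 | 24 | 2 | 10 | 29 | 23 | 36 | 21 | 1 | 1 |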
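The count/mean/std/min/25%/50%/75%/max layout is the one produced by pandas' `DataFrame.describe()`. A minimal sketch, assuming pandas and hypothetical snake_case column names; it runs on the five rows in the data table, so its numbers differ from the full 22-entry summary, and `round(0)` mirrors the whole-number display.

```python
import io

import pandas as pd

# The five entries from the data table; "nan" marks an unrecorded value.
CSV = """date,temp_c,sugar_pct,total_price,price_per_jin,weight_jin,black_seeds,white_seeds,cross_section_seeds,diameter_cm,skin_cm,lw_ratio,color,crispness
2020-07-14,35.2,9.5,12.8,2,6.4,20,1,21,nan,1.2,1.2,Deep,Crunchy
2020-07-15,32.6,10,12.3,2,6.15,13,11,24,15,1,1,Deep,Crunchy
2020-07-17,33.1,8,19.4,2.5,7.76,16,1,17,17.5,1.2,nan,nan,Crunchy
2020-07-19,12,32,10.4,2,5.2,24,3,27,19,0.9,nan,nan,Crunchy
2020-07-20,32.5,10.5,19,2.5,7.6,19,4,23,20.3,0.8,1.1,Medium,Crunchy
"""
# Dates stay as strings so describe() summarizes only the numeric columns.
df = pd.read_csv(io.StringIO(CSV))

# count/mean/std/min/quartiles/max per numeric column; count excludes missing values.
summary = df.describe().round(0)
print(summary)
```

Note that `count` skips missing entries, which is why the counts in the table above vary from column to column.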

Average values for each attribute are also visualized below.

Some entries have missing values, as shown in the following diagram.
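Assuming the same hypothetical pandas layout, the per-column missing-value counts behind a diagram like this come from `isna()`. On the five rows in the data table, color and length-width ratio are each missing twice and diameter once.

```python
import io

import pandas as pd

# The five entries from the data table; "nan" marks an unrecorded value.
CSV = """date,temp_c,sugar_pct,total_price,price_per_jin,weight_jin,black_seeds,white_seeds,cross_section_seeds,diameter_cm,skin_cm,lw_ratio,color,crispness
2020-07-14,35.2,9.5,12.8,2,6.4,20,1,21,nan,1.2,1.2,Deep,Crunchy
2020-07-15,32.6,10,12.3,2,6.15,13,11,24,15,1,1,Deep,Crunchy
2020-07-17,33.1,8,19.4,2.5,7.76,16,1,17,17.5,1.2,nan,nan,Crunchy
2020-07-19,12,32,10.4,2,5.2,24,3,27,19,0.9,nan,nan,Crunchy
2020-07-20,32.5,10.5,19,2.5,7.6,19,4,23,20.3,0.8,1.1,Medium,Crunchy
"""
df = pd.read_csv(io.StringIO(CSV))

# Number and share of missing entries per column, keeping only columns with gaps.
missing = df.isna().sum()
missing = missing[missing > 0]
share = (missing / len(df)).round(2)
print(pd.DataFrame({"missing": missing, "share": share}))
```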

Conclusion

The author will continue to analyze and model this watermelon dataset to answer the question: what makes a watermelon truly sweet?

If you are interested in the dataset, follow the public account "Model Perspective" and send the keyword 西瓜 ("watermelon") to receive the download link.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Model Perspective

Insights, knowledge, and enjoyment from a mathematical modeling researcher and educator. Hosted by Haihua Wang, a modeling instructor and author of "Clever Use of Chat for Mathematical Modeling", "Modeling: The Mathematics of Thinking", "Mathematical Modeling Practice: A Hands‑On Guide to Competitions", and co‑author of "Mathematical Modeling: Teaching Design and Cases".
