Daily and Sports Activities Dataset: Description, Preprocessing Pipeline, and CNN Classification Results
This article introduces the Daily_and_Sports_Activities sensor dataset, details its structure and characteristics, provides a Python preprocessing pipeline with sliding‑window segmentation and Z‑score normalization, and reports CNN training results achieving 87.93% accuracy on activity classification.
The Daily_and_Sports_Activities dataset contains sensor recordings of 19 daily and sports activities performed by eight subjects (four female, four male) for five minutes each, captured with five Xsens‑MTx units at 25 Hz. The data are segmented into 5‑second windows, yielding 480 segments per activity (60 per subject × 8 subjects).
Each activity folder (a01‑a19) holds subfolders for each subject (p1‑p8), which contain 60 text files (s01‑s60). Every file includes 125 rows (5 seconds × 25 Hz) and 45 columns (5 units × 9 sensors: accelerometer, gyroscope, magnetometer for torso, right/left arms, right/left legs).
Dataset characteristics: multivariate time series, 9,120 instances, 5,625 features per instance, suitable for classification and clustering tasks.
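The instance and feature counts follow directly from the structure described above; a quick arithmetic check:

```python
# Instance count: 19 activities x 8 subjects x 60 five-second files each
instances = 19 * 8 * 60
# Feature count per segment: 125 samples (5 s x 25 Hz) x 45 sensor channels (5 units x 9 sensors)
features = 125 * 45
print(instances, features)  # 9120 5625
```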
Preprocessing pipeline:
1. Environment setup – the pipeline uses numpy, pandas, os, and sys; os and sys ship with Python's standard library, so only the third‑party packages need installing (pip install numpy pandas).
2. Dataset download – a function download_dataset downloads and extracts the zip file from the UCI repository.
import os
import zipfile
from urllib.request import urlretrieve

def download_dataset(dataset_name, file_url, dataset_dir):
    file_name = dataset_name + '.zip'
    download_path = os.path.join(dataset_dir, file_name)
    os.makedirs(dataset_dir, exist_ok=True)  # urlretrieve fails if the target dir is missing
    urlretrieve(file_url, download_path)     # fetch the archive from the UCI repository
    with zipfile.ZipFile(download_path, 'r') as zip_ref:
        zip_ref.extractall(dataset_dir)      # unpack into the dataset directory
    os.remove(download_path)                 # drop the zip once extracted
download_dataset(
    dataset_name='Daily_and_Sports_Activities',
    file_url='http://archive.ics.uci.edu/static/public/256/daily+and+sports+activities.zip',
    dataset_dir='./data'
)

3. Data preprocessing function (DASA) – parameters include WINDOW_SIZE (125), OVERLAP_RATE (0.4), SPLIT_RATE (8:2), VALIDATION_SUBJECTS, Z_SCORE, and SAVE_PATH. The function concatenates each subject's 60 files, applies a sliding window, splits the segments into training and test sets (leave‑one‑subject‑out or proportional split), optionally performs Z‑score standardization, and saves the processed arrays.
def DASA(dataset_dir='./data', WINDOW_SIZE=125, OVERLAP_RATE=0.4, SPLIT_RATE=(8, 2), VALIDATION_SUBJECTS={7, 8}, Z_SCORE=True, SAVE_PATH=os.path.abspath('../../HAR-datasets')):
    # ... omitted code ...
    pass

4. Sliding‑window segmentation and splitting – for each activity and participant, the files are read, stacked with np.vstack, and segmented by sliding_window. Depending on whether the participant is in VALIDATION_SUBJECTS, a leave‑one‑subject‑out or proportional split is applied.
for label_id, adl in enumerate(adls):
    for participant_idx, participant in enumerate(participants):
        # os.listdir returns bare file names, so join them back onto the participant directory
        files = sorted(os.listdir(participant))
        concat_data = np.vstack([
            pd.read_csv(os.path.join(participant, f), sep=',', header=None).to_numpy()
            for f in files
        ])
        cur_data = sliding_window(array=concat_data, windowsize=WINDOW_SIZE, overlaprate=OVERLAP_RATE)
        # split logic ...

If Z_SCORE=True, the training and test arrays are standardized using z_score_standard.
if Z_SCORE:
    xtrain, xtest = z_score_standard(xtrain=xtrain, xtest=xtest)

The processed data are saved to SAVE_PATH for downstream model training.
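Neither sliding_window nor z_score_standard is shown in the article; the sketch below is one plausible implementation, assuming the stride is derived from the overlap rate and that standardization statistics are fitted on the training set only:

```python
import numpy as np

def sliding_window(array, windowsize, overlaprate):
    """Segment a (time, channels) array into overlapping windows.

    The stride follows from the overlap rate, e.g. 125 * (1 - 0.4) = 75 samples.
    """
    stride = int(windowsize * (1 - overlaprate))
    starts = range(0, len(array) - windowsize + 1, stride)
    return np.stack([array[s:s + windowsize] for s in starts])

def z_score_standard(xtrain, xtest):
    """Standardize each sensor channel with mean/std fitted on the training set."""
    mean = xtrain.mean(axis=(0, 1), keepdims=True)
    std = xtrain.std(axis=(0, 1), keepdims=True) + 1e-8  # guard against zero variance
    return (xtrain - mean) / std, (xtest - mean) / std

# Shape check on synthetic data standing in for one subject's concatenated files
data = np.random.randn(750, 45)
windows = sliding_window(data, windowsize=125, overlaprate=0.4)
print(windows.shape)  # (9, 125, 45): 9 windows of 125 samples at stride 75
```

With WINDOW_SIZE=125 and OVERLAP_RATE=0.4 the stride is 75 samples, so consecutive windows share 50 samples of context.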
CNN training – a convolutional neural network is trained on the preprocessed dataset. The model achieves 87.93% accuracy, 91.60% precision, 87.93% recall, and an F1 score of 0.8973 on the test set, with inference times between 0.0010 s and 0.0013 s per sample.
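The article does not specify the CNN architecture or the averaging scheme behind the reported precision/recall/F1, so no model code is reproduced here. As a reference for how such metrics are derived, the sketch below computes accuracy and macro‑averaged precision, recall, and F1 from a confusion matrix (toy labels for illustration, not the article's predictions):

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                    # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)       # per-class, guard empty columns
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Toy 3-class example (hypothetical labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred, num_classes=3)
print(round(acc, 4))  # 0.6667: 4 of 6 samples correct
```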
These steps provide a ready‑to‑use pipeline for researchers to explore human activity recognition using the Daily_and_Sports_Activities dataset.