Daily and Sports Activities Dataset: Description, Preprocessing Pipeline, and CNN Classification Results
This article introduces the Daily_and_Sports_Activities sensor dataset, details its structure and characteristics, provides a Python preprocessing pipeline with sliding‑window segmentation and Z‑score normalization, and reports CNN training results achieving 87.93% accuracy on activity classification.
The Daily_and_Sports_Activities dataset contains sensor recordings of 19 daily and sports activities performed by eight subjects (four female, four male) for five minutes each, captured with five Xsens‑MTx units at 25 Hz. The data are segmented into 5‑second windows, yielding 480 segments per activity (60 per subject × 8 subjects).
Each activity folder (a01‑a19) holds subfolders for each subject (p1‑p8), which contain 60 text files (s01‑s60). Every file includes 125 rows (5 seconds × 25 Hz) and 45 columns (5 units × 9 sensors: accelerometer, gyroscope, magnetometer for torso, right/left arms, right/left legs).
Dataset characteristics: multivariate time series, 9,120 instances, 5,625 features per instance, suitable for classification and clustering tasks.
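The instance and feature counts follow directly from the structure described above; a quick arithmetic check:

```python
# Instance count: 19 activities x 8 subjects x 60 five-second files each
instances = 19 * 8 * 60
# Feature count per segment: 125 samples (5 s x 25 Hz) x 45 sensor channels (5 units x 9 sensors)
features = 125 * 45
print(instances, features)  # 9120 5625
```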
Preprocessing pipeline:
1. Environment setup – the pipeline uses numpy, pandas, os, and sys; os and sys ship with Python's standard library, so only the third‑party packages need installing (pip install numpy pandas).
2. Dataset download – a function download_dataset downloads and extracts the zip file from the UCI repository.
import os
import zipfile
from urllib.request import urlretrieve

def download_dataset(dataset_name, file_url, dataset_dir):
    file_name = dataset_name + '.zip'
    download_path = os.path.join(dataset_dir, file_name)
    os.makedirs(dataset_dir, exist_ok=True)  # urlretrieve fails if the target dir is missing
    urlretrieve(file_url, download_path)     # fetch the archive from the UCI repository
    with zipfile.ZipFile(download_path, 'r') as zip_ref:
        zip_ref.extractall(dataset_dir)      # unpack into the dataset directory
    os.remove(download_path)                 # drop the zip once extracted
download_dataset(
    dataset_name='Daily_and_Sports_Activities',
    file_url='http://archive.ics.uci.edu/static/public/256/daily+and+sports+activities.zip',
    dataset_dir='./data'
)

3. Data preprocessing function (DASA) – parameters include WINDOW_SIZE (125), OVERLAP_RATE (0.4), SPLIT_RATE (8:2), VALIDATION_SUBJECTS, Z_SCORE, and SAVE_PATH. The function concatenates each subject's 60 files, applies a sliding window, splits the segments into training and test sets (leave‑one‑subject‑out or proportional split), optionally performs Z‑score standardization, and saves the processed arrays.
def DASA(dataset_dir='./data', WINDOW_SIZE=125, OVERLAP_RATE=0.4, SPLIT_RATE=(8, 2), VALIDATION_SUBJECTS={7, 8}, Z_SCORE=True, SAVE_PATH=os.path.abspath('../../HAR-datasets')):
    # ... omitted code ...
    pass

4. Sliding‑window segmentation and splitting – for each activity and participant, the files are read, stacked with np.vstack, and segmented by sliding_window. Depending on whether the participant is in VALIDATION_SUBJECTS, a leave‑one‑subject‑out or proportional split is applied.
for label_id, adl in enumerate(adls):
    for participant_idx, participant in enumerate(participants):
        # os.listdir returns bare file names, so join them back onto the participant directory
        files = sorted(os.listdir(participant))
        concat_data = np.vstack([
            pd.read_csv(os.path.join(participant, f), sep=',', header=None).to_numpy()
            for f in files
        ])
        cur_data = sliding_window(array=concat_data, windowsize=WINDOW_SIZE, overlaprate=OVERLAP_RATE)
        # split logic ...

If Z_SCORE=True, the training and test arrays are standardized using z_score_standard.
if Z_SCORE:
    xtrain, xtest = z_score_standard(xtrain=xtrain, xtest=xtest)

The processed data are saved to SAVE_PATH for downstream model training.
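Neither sliding_window nor z_score_standard is shown in the article; the sketch below is one plausible implementation, assuming the stride is derived from the overlap rate and that standardization statistics are fitted on the training set only:

```python
import numpy as np

def sliding_window(array, windowsize, overlaprate):
    """Segment a (time, channels) array into overlapping windows.

    The stride follows from the overlap rate, e.g. 125 * (1 - 0.4) = 75 samples.
    """
    stride = int(windowsize * (1 - overlaprate))
    starts = range(0, len(array) - windowsize + 1, stride)
    return np.stack([array[s:s + windowsize] for s in starts])

def z_score_standard(xtrain, xtest):
    """Standardize each sensor channel with mean/std fitted on the training set."""
    mean = xtrain.mean(axis=(0, 1), keepdims=True)
    std = xtrain.std(axis=(0, 1), keepdims=True) + 1e-8  # guard against zero variance
    return (xtrain - mean) / std, (xtest - mean) / std

# Shape check on synthetic data standing in for one subject's concatenated files
data = np.random.randn(750, 45)
windows = sliding_window(data, windowsize=125, overlaprate=0.4)
print(windows.shape)  # (9, 125, 45): 9 windows of 125 samples at stride 75
```

With WINDOW_SIZE=125 and OVERLAP_RATE=0.4 the stride is 75 samples, so consecutive windows share 50 samples of context.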
CNN training – a convolutional neural network is trained on the preprocessed dataset. The model achieves 87.93% accuracy, 91.60% precision, 87.93% recall, and an F1 score of 0.8973 on the test set, with inference times between 0.0010 s and 0.0013 s per sample.
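The article does not specify the CNN architecture or the averaging scheme behind the reported precision/recall/F1, so no model code is reproduced here. As a reference for how such metrics are derived, the sketch below computes accuracy and macro‑averaged precision, recall, and F1 from a confusion matrix (toy labels for illustration, not the article's predictions):

```python
import numpy as np

def classification_metrics(y_true, y_pred, num_classes):
    """Accuracy plus macro-averaged precision, recall, and F1 from a confusion matrix."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                                    # rows: true class, cols: predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)       # per-class, guard empty columns
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision.mean(), recall.mean(), f1.mean()

# Toy 3-class example (hypothetical labels)
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
acc, prec, rec, f1 = classification_metrics(y_true, y_pred, num_classes=3)
print(round(acc, 4))  # 0.6667: 4 of 6 samples correct
```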
These steps provide a ready‑to‑use pipeline for researchers to explore human activity recognition using the Daily_and_Sports_Activities dataset.