
Time Series Analysis in Python: Visualization, FFT, Entropy, PCA and Autocorrelation

This article demonstrates how to analyze and visualize time‑series sensor data in Python using libraries such as NumPy, Pandas, Matplotlib, Seaborn and Scikit‑learn, covering data import, preprocessing, mean‑std plots, boxplots, Fourier transforms, entropy calculation, PCA dimensionality reduction and autocorrelation analysis.

Data visualization is a crucial step for extracting insight from time‑series data; the focus here is on analytical visualizations for sensor data.

What is a time series? A numeric time series is an ordered set of observations with timestamps, each representing a scalar measurement from the same process.

What is a timestamp? It is a representation of a point in time with required precision, e.g., a formatted date string or Unix epoch milliseconds.
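For example, with pandas the two representations convert back and forth (a minimal illustration, not from the article's code):

```python
import pandas as pd

# A formatted date string and Unix epoch milliseconds describe the same instant.
ts = pd.Timestamp('2018-04-12 00:00:00')
epoch_ms = ts.value // 10**6               # Timestamp.value is nanoseconds since the epoch
back = pd.to_datetime(epoch_ms, unit='ms') # round-trips to the same instant
assert back == ts
```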

The analysis runs in a Jupyter notebook and relies primarily on NumPy and Pandas, with additional imports for plotting and machine‑learning preprocessing:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm
from matplotlib.dates import date2num
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, chi2
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf
import warnings
warnings.filterwarnings("ignore")

Import the data from a CSV file, parse the datetime column, and restrict the frame to one week of observations:

df_orig = pd.read_csv('data/data.csv',
          usecols=['datetime', 'machine_status', 'sensor_00', 'sensor_10', 'sensor_20',
                   'sensor_30', 'sensor_40', 'sensor_50'])
df_orig['datetime'] = pd.to_datetime(df_orig['datetime'])
cond_1 = df_orig['datetime'] >= '2018-04-12 00:00:00'
cond_2 = df_orig['datetime'] <= '2018-04-19 00:00:00'
df_orig = df_orig[cond_1 & cond_2]

The dataset contains six sensor columns, a datetime column and a machine status label ("BROKEN", "NORMAL", "RECOVERING"). For visualization the status is simplified to binary (0 for normal/recovering, 1 for broken).

Data preprocessing includes duplicate removal and missing‑value filling:

def drop_duplicates(df: pd.DataFrame, subset: list = ['DATE_TIME']) -> pd.DataFrame:
    # Remove rows that share the same timestamp.
    return df.drop_duplicates(subset)

def fill_missing_date(df: pd.DataFrame, column_datetime: str = 'DATE_TIME'):
    print(f'Input shape: {df.shape}')
    data_s = df.drop([column_datetime], axis=1)
    datetime_s = df[column_datetime].astype(str)
    start_date = min(df[column_datetime])
    end_date = max(df[column_datetime])
    # Build a complete minute-level timeline over the observed range.
    date_s = pd.date_range(start_date, end_date, freq="min").strftime('%Y-%m-%d %H:%M:%S')
    data_processed_s = []
    for date_val in date_s:
        pos = np.where(date_val == datetime_s)[0]
        assert len(pos) in [0, 1]
        if len(pos) == 0:
            # Minute with no observation: fill the sensor values with zeros.
            data = [date_val] + [0] * data_s.shape[1]
        else:
            data = [date_val] + data_s.iloc[pos].values.tolist()[0]
        data_processed_s.append(data)
    df_processed = pd.DataFrame(data_processed_s, columns=[column_datetime] + data_s.columns.values.tolist())
    df_processed[column_datetime] = pd.to_datetime(df_processed[column_datetime])
    print(f'Output shape: {df_processed.shape}')
    return df_processed

The preprocessing pipeline is applied as follows:

df_processed = drop_duplicates(df_orig, subset=['datetime'])
df = fill_missing_date(df_processed, column_datetime='datetime')

After preprocessing the shapes are (10081, 7) for inputs and (10081, 2) for outputs.
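The plotting code below refers to df_data and df_labels, which the excerpt never defines. A minimal sketch of the presumed split, with the status mapped to the binary label described above (function and column names are assumptions from context):

```python
import pandas as pd

def split_data_labels(df: pd.DataFrame):
    """Hypothetical split of the preprocessed frame into sensor data and a binary label."""
    sensor_cols = [c for c in df.columns if c.startswith('sensor_')]
    df_data = df[['datetime'] + sensor_cols].copy()
    df_labels = df[['datetime']].copy()
    # 1 for BROKEN, 0 for NORMAL/RECOVERING, as described in the text.
    df_labels['machine_status'] = (df['machine_status'] == 'BROKEN').astype(int)
    return df_data, df_labels

# df_data, df_labels = split_data_labels(df)
```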

Data visualization plots each sensor series together with the binary machine‑status label using Matplotlib:

df_data_hour = df_data.groupby(pd.Grouper(key='datetime', freq='H')).mean()
df_labels_hour = df_labels.groupby(pd.Grouper(key='datetime', freq='H')).sum()
for name in df.columns:
    if name not in ['datetime', 'machine_status']:
        fig, axs = plt.subplots(1, 1, figsize=(15, 2))
        axs.plot(df_data_hour[name], color='blue')
        axs_twinx = axs.twinx()
        axs_twinx.plot(df_labels_hour['machine_status'], color='red')
        axs.set_title(name)
        plt.show()

Mean and standard deviation plots show the hourly series together with the daily mean and standard deviation (computed with resample('D'), so one value per day rather than a true rolling window, despite the legend labels):

df_rollmean = df_data_hour.resample('D').mean()
df_rollstd = df_data_hour.resample('D').std()
for name in df.columns:
    if name not in ['datetime', 'machine_status']:
        fig, axs = plt.subplots(1, 1, figsize=(15, 2))
        axs.plot(df_data_hour[name], color='blue', label='Original')
        axs.plot(df_rollmean[name], color='red', label='Rolling Mean')
        axs.plot(df_rollstd[name], color='black', label='Rolling Std')
        axs.set_title(name)
        axs.legend()
        plt.show()
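Note that resample('D') yields one aggregate per calendar day, whereas a true rolling statistic uses rolling. A quick contrast on synthetic hourly data (variable names hypothetical):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2018-04-12', periods=48, freq='h')
s = pd.Series(np.arange(48, dtype=float), index=idx)

daily = s.resample('D').mean()          # one value per calendar day
rolling = s.rolling(window=24).mean()   # one value per hour, over the trailing 24 hours

print(len(daily), len(rolling))
```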

Boxplot visualizations display distribution, quartiles and outliers for each sensor per day:

df_boxplot = df_data.copy()
df_boxplot['date'] = df_boxplot['datetime'].dt.strftime('%Y-%m-%d')
for name in df_boxplot.columns:
    if name not in ['datetime', 'date']:
        fig, axs = plt.subplots(1, 1, figsize=(15, 2))
        sns.boxplot(y=name, x='date', data=df_boxplot, ax=axs)
        axs.set_ylabel('Value')
        axs.set_title(name)
        plt.show()

A Fourier transform (FFT) extracts frequency features from the sensor signals. A sliding window of 64 samples with 50% overlap is applied, keeping the first 32 frequency bins of each window (with one sample per minute, the bins represent cycles per window rather than Hz):

def fft(data, nwindow=64, freq=32):
    # Sliding-window FFT: Hamming window, 50% overlap, keep the first `freq` bins.
    ffts = []
    for i in range(0, len(data) - nwindow, nwindow // 2):
        sliced = data[i:i + nwindow]
        spectrum = np.abs(np.fft.rfft(sliced * np.hamming(nwindow))[:freq])
        ffts.append(spectrum.tolist())
    return np.array(ffts)

def data_plot(date_time, data, labels, ax):
    ax.plot(date_time, data)
    ax.set_xlim(date2num(np.min(date_time)), date2num(np.max(date_time)))
    ax_twin = ax.twinx()
    ax_twin.plot(date_time, labels, color='red')
    ax_twin.set_ylabel('Label')  # label belongs on the twin axis carrying the status line

def fft_plot(ffts, ax):
    # Note: LogNorm requires vmin > 0; an all-zero window would need a small floor first.
    ax.imshow(np.flipud(np.rot90(ffts)), aspect='auto', cmap=matplotlib.cm.bwr,
              norm=LogNorm(vmin=np.min(ffts), vmax=np.max(ffts)))
    ax.set_xlabel('Timestamp')
    ax.set_ylabel('Freq')

df_fourier = df_data.copy()
for name in df_fourier.columns:
    if name not in ['datetime', 'date']:
        fig, axs = plt.subplots(2, 1, figsize=(15, 6))
        data = df_fourier[name].to_numpy()
        ffts = fft(data, nwindow=64, freq=32)
        data_plot(df_fourier['datetime'], data, df_labels['machine_status'], axs[0])
        fft_plot(ffts, axs[1])
        axs[0].set_title(name)
        plt.show()
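A quick sanity check on the window arithmetic: a 64-sample real FFT yields 33 bins, and slicing keeps the first 32. Toy signal below (not the pump data):

```python
import numpy as np

nwindow = 64
# Sine with exactly 4 cycles per window, so its energy concentrates in bin 4.
sliced = np.sin(2 * np.pi * 4 * np.arange(nwindow) / nwindow)
spectrum = np.abs(np.fft.rfft(sliced * np.hamming(nwindow)))
print(len(spectrum), np.argmax(spectrum))
```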

Entropy calculation measures the information content of each sliding window, which is useful for feature selection and decision‑tree models:

def entropy(data, nwindow=64, freq=32):
    # Spectral entropy per sliding window (50% overlap).
    entropy_s = []
    for i in range(0, len(data) - nwindow, nwindow // 2):
        sliced = data[i:i + nwindow]
        spectrum = np.abs(np.fft.rfft(sliced * np.hamming(nwindow))[:freq])
        p = spectrum / np.sum(spectrum)
        p = p[p > 0]  # drop zero bins to avoid log(0)
        entropy_s.append(-np.sum(p * np.log(p)))
    return np.array(entropy_s)

def entropy_plot(data, ax):
    ax.plot(data, c='k')
    ax.set_xlabel('Timestamp')
    ax.set_ylabel('Entropy')

df_entropy = df_data.copy()
for name in df_entropy.columns:
    if name not in ['datetime', 'date']:
        fig, axs = plt.subplots(2, 1, figsize=(15, 6))
        data = df_entropy[name].to_numpy()
        entropy_s = entropy(data, nwindow=64, freq=32)
        data_plot(df_entropy['datetime'], data, df_labels['machine_status'], axs[0])
        entropy_plot(entropy_s, axs[1])
        axs[0].set_title(name)
        plt.show()
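Intuition check: a pure tone concentrates spectral power in a few bins (low entropy), while white noise spreads it across the spectrum (high entropy). A toy comparison using the same windowed-spectrum idea, not the article's sensor data:

```python
import numpy as np

def spectral_entropy(x):
    # Entropy of the normalized windowed magnitude spectrum.
    spec = np.abs(np.fft.rfft(x * np.hamming(len(x))))
    p = spec / spec.sum()
    p = p[p > 0]  # drop zero bins to avoid log(0)
    return -np.sum(p * np.log(p))

rng = np.random.default_rng(0)
n = 64
tone = np.sin(2 * np.pi * 8 * np.arange(n) / n)   # concentrated spectrum -> low entropy
noise = rng.standard_normal(n)                    # spread-out spectrum -> high entropy
print(spectral_entropy(tone) < spectral_entropy(noise))
```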

Dimensionality reduction with PCA extracts the main components from the six sensor streams:

x = df_data.drop(columns=['datetime'])
scaler = StandardScaler()
pca = PCA()
pipeline = make_pipeline(scaler, pca)
pipeline.fit(x)
features = range(pca.n_components_)
plt.figure(figsize=(22, 5))
plt.bar(features, pca.explained_variance_ratio_)
plt.xlabel('PCA feature')
plt.ylabel('Variance')
plt.title('Explained variance ratio of the principal components')
plt.show()
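When deciding how many components to keep, the cumulative explained variance is often more actionable than the bar chart. A sketch on synthetic data standing in for the scaled sensor matrix (x_demo and the 95% threshold are assumptions, not from the article):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Synthetic stand-in for six correlated sensor streams.
x_demo = rng.standard_normal((500, 6)) @ rng.standard_normal((6, 6))

pca = PCA().fit(StandardScaler().fit_transform(x_demo))
cumvar = np.cumsum(pca.explained_variance_ratio_)
# Smallest number of components whose cumulative variance reaches 95%.
n_keep = int(np.searchsorted(cumvar, 0.95)) + 1
print(cumvar.round(3), n_keep)
```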

Using two components, the data are transformed and plotted together with the machine‑status label:

pca = PCA(n_components=2)
# Note: unlike the pipeline above, this fit uses the unscaled x; for consistent
# results, apply StandardScaler to x before transforming.
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents, columns=['pc1', 'pc2'])
df_pca = df_data.copy()
df_pca['pca1'] = principalDf['pc1']
df_pca['pca2'] = principalDf['pc2']

df_pca_hour = df_pca.groupby(pd.Grouper(key='datetime', axis=0, freq='H')).mean()
df_labels_hour = df_labels.groupby(pd.Grouper(key='datetime', axis=0, freq='H')).sum()
for name in ['pca1', 'pca2']:
    fig, axs = plt.subplots(1, 1, figsize=(15, 2))
    axs.plot(df_pca_hour[name], color='blue')
    axs_twin = axs.twinx()
    axs_twin.plot(df_labels_hour['machine_status'], color='red')
    axs.set_title(name)
    plt.show()

The lag‑1 autocorrelation of the first principal component's percentage change is computed (Series.autocorr defaults to lag 1), and a 20‑lag ACF is plotted to assess temporal dependence:

pca1 = principalDf['pc1'].pct_change()
autocorrelation = pca1.dropna().autocorr()
print('Autocorrelation is:', autocorrelation)
plot_acf(pca1.dropna(), lags=20, alpha=0.05)
plt.show()
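Series.autocorr() returns the lag-1 Pearson autocorrelation by default; it can be cross-checked against a shifted-series correlation (toy series, hypothetical names):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.sin(np.linspace(0, 10 * np.pi, 200)))
lag1 = s.autocorr()            # Pearson correlation of s with s.shift(1)
manual = s.corr(s.shift(1))    # the same computation, written out
print(abs(lag1 - manual) < 1e-12)
```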

The Augmented Dickey‑Fuller test is applied to each sensor and PCA component to check stationarity:

for name in df_pca.columns:
    if name not in ['datetime', 'date']:
        result = adfuller(df_pca[name])
        print(f'{name}: statistic={result[0]:.3f}, p-value={result[1]:.3g}')

Overall, the article provides a comprehensive, code‑driven workflow for preprocessing, visualizing, and extracting statistical and frequency‑domain features from multivariate time‑series data in Python.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
