Fundamentals 16 min read

Master Date Operations in pandas and SQL: Retrieval, Conversion, and Calculation

This tutorial walks through loading order data into pandas and SQL, then demonstrates how to retrieve current dates, extract date components, convert between readable dates and Unix timestamps, transform between 10‑digit and 8‑digit date formats, and perform date arithmetic using pandas, MySQL, and Hive.

ITPUB
ITPUB
ITPUB
Master Date Operations in pandas and SQL: Retrieval, Conversion, and Calculation

Introduction

The article continues a series comparing pandas and SQL for common data‑processing tasks, focusing on date handling. It uses an order dataset (available as order.xlsx) and shows how to load it into a pandas DataFrame and Hive/MySQL tables.

Loading Data

In pandas:

import pandas as pd
data = pd.read_excel('order.xlsx')
# data2 = pd.read_excel('order.xlsx', parse_dates=['ts'])
print(data.head())
print(data.dtypes)

Both read_excel and read_csv support the parse_dates parameter to convert columns to datetime64[ns]. If automatic conversion fails, specify the column explicitly as shown in the commented line.

In MySQL and Hive the data is stored as strings. The article provides a t_order.sql file to create the table and a Hive LOAD DATA statement to import the CSV.

Date Retrieval

Three categories of date operations are covered: Date Retrieval , Date Conversion , and Date Calculation .

1. Get Current Date and Time

pandas:

# add a column with the current timestamp
data['current_dt'] = pd.datetime.now()
# format as string if needed
data['current_dt'] = data['current_dt'].apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))

MySQL functions: NOW(), CURRENT_TIMESTAMP, SYSDATE(), LOCALTIME, etc.

Hive: CURRENT_TIMESTAMP() (returns milliseconds); use SUBSTR to trim to seconds.

2. Get Current Date Only

pandas does not have a direct function; the same now() approach with formatting can be used.

MySQL: CURDATE() Hive:

CURRENT_DATE()

3. Extract Date Components

pandas (assuming ts is already a datetime column):

data['dt_day']   = data['ts'].dt.date
data['year']    = data['ts'].dt.year
data['month']   = data['ts'].dt.month
data['day']     = data['ts'].dt.day
data['dt_time'] = data['ts'].dt.time
data['hour']    = data['ts'].dt.hour
data['minute']  = data['ts'].dt.minute
data['second']  = data['ts'].dt.second

MySQL/Hive: use SUBSTR on the string column ts to extract year, month, day, hour, minute, second (positions shown in the article).

Date Conversion

1. Convert Readable Date to Unix Timestamp

pandas:

# convert datetime to string, then to timestamp
data['str_timestamp'] = data['ts'].apply(lambda x: int(time.mktime(time.strptime(x.strftime('%Y-%m-%d %H:%M:%S'), '%Y-%m-%d %H:%M:%S'))))

MySQL: CAST(UNIX_TIMESTAMP(ts) AS INT) (returns a decimal that needs casting).

Hive: UNIX_TIMESTAMP(ts) (returns integer directly).

2. Convert Unix Timestamp to Readable Date

pandas:

# convert integer timestamp back to datetime
data['ori_dt'] = pd.to_datetime(data['str_timestamp'], unit='s', utc=True).tz_convert('Asia/Shanghai')

MySQL: FROM_UNIXTIME(CAST(UNIX_TIMESTAMP(ts) AS INT)) Hive:

FROM_UNIXTIME(UNIX_TIMESTAMP(ts))

3. Transform 10‑digit Date (YYYY‑MM‑DD) to 8‑digit Date (YYYYMMDD)

pandas:

data['str_ts_8'] = data['str_ts'].astype(str).str[:10].str.replace('-', '')

MySQL:

SELECT REPLACE(SUBSTR(ts,1,10), '-', '') AS dt8 FROM t_order;

Hive:

SELECT REGEXP_REPLACE(SUBSTR(ts,1,10), '-', '') AS dt8 FROM t_order;

4. Transform 8‑digit Date back to 10‑digit

Method 1 – string concatenation:

# pandas
data['str_ts_10'] = data['str_ts_8'].apply(lambda x: f"{x[:4]}-{x[4:6]}-{x[6:]}")

MySQL/Hive: use CONCAT with SUBSTR (or REGEXP_REPLACE in Hive) to insert hyphens.

Method 2 – use Unix timestamp conversion:

# pandas (datetime module)
def ts_to_ymd(x):
    from datetime import datetime
    return datetime.strptime(x, '%Y%m%d').strftime('%Y-%m-%d')

data['str_ts_10_2'] = data['str_ts_8'].apply(ts_to_ymd)

MySQL/Hive: apply UNIX_TIMESTAMP on the 8‑digit string (after removing hyphens) and then FROM_UNIXTIME(..., '%Y-%m-%d').

Date Calculation

1. Date Offsets (Add/Subtract Days)

pandas:

from datetime import timedelta
# add 5 days
data['ts_plus_5'] = data['ts'] + timedelta(days=5)
# subtract 3 days
data['ts_minus_3'] = data['ts'] - timedelta(days=3)

MySQL: DATE_ADD(ts, INTERVAL 5 DAY), DATE_SUB(ts, INTERVAL 3 DAY) Hive: DATE_ADD(ts, 5),

DATE_SUB(ts, 3)

2. Date Difference

pandas (datetime column):

data['diff_days'] = (data['ts'] - data['other_ts']).dt.days

If the column is a string, first convert:

data['dt_ts'] = pd.to_datetime(data['str_ts'], format='%Y-%m-%d %H:%M:%S')

MySQL/Hive: DATEDIFF(date1, date2). Hive requires 10‑digit dates; 8‑digit inputs return NULL, while MySQL can handle both.

Conclusion

The guide summarizes the most common date‑related operations that are available in both pandas and SQL (MySQL/Hive). While the examples cover typical scenarios—retrieving the current timestamp, extracting components, converting to/from Unix timestamps, reshaping date strings, and performing arithmetic—many additional functions exist in each ecosystem for more advanced use cases.

All code snippets and the sample order.xlsx dataset can be obtained from the original WeChat public account (reply with the keyword indicated in the article).

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

SQLdata-processingHivemysqldatetimepandasdate handling
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.