Master Date Operations in pandas and SQL: Retrieval, Conversion, and Calculation
This tutorial walks through loading order data into pandas and SQL, then demonstrates how to retrieve current dates, extract date components, convert between readable dates and Unix timestamps, transform between 10‑digit and 8‑digit date formats, and perform date arithmetic using pandas, MySQL, and Hive.
Introduction
The article continues a series comparing pandas and SQL for common data‑processing tasks, focusing on date handling. It uses an order dataset (available as order.xlsx) and shows how to load it into a pandas DataFrame and Hive/MySQL tables.
Loading Data
In pandas:
import pandas as pd
data = pd.read_excel('order.xlsx')
# data2 = pd.read_excel('order.xlsx', parse_dates=['ts'])
print(data.head())
print(data.dtypes)Both read_excel and read_csv support the parse_dates parameter to convert columns to datetime64[ns]. If automatic conversion fails, specify the column explicitly as shown in the commented line.
In MySQL and Hive the data is stored as strings. The article provides a t_order.sql file to create the table and a Hive LOAD DATA statement to import the CSV.
Date Retrieval
Three categories of date operations are covered: Date Retrieval , Date Conversion , and Date Calculation .
1. Get Current Date and Time
pandas:
# add a column with the current timestamp
data['current_dt'] = pd.datetime.now()
# format as string if needed
data['current_dt'] = data['current_dt'].apply(lambda x: x.strftime('%Y-%m-%d %H:%M:%S'))MySQL functions: NOW(), CURRENT_TIMESTAMP, SYSDATE(), LOCALTIME, etc.
Hive: CURRENT_TIMESTAMP() (returns milliseconds); use SUBSTR to trim to seconds.
2. Get Current Date Only
pandas does not have a direct function; the same now() approach with formatting can be used.
MySQL: CURDATE() Hive:
CURRENT_DATE()3. Extract Date Components
pandas (assuming ts is already a datetime column):
data['dt_day'] = data['ts'].dt.date
data['year'] = data['ts'].dt.year
data['month'] = data['ts'].dt.month
data['day'] = data['ts'].dt.day
data['dt_time'] = data['ts'].dt.time
data['hour'] = data['ts'].dt.hour
data['minute'] = data['ts'].dt.minute
data['second'] = data['ts'].dt.secondMySQL/Hive: use SUBSTR on the string column ts to extract year, month, day, hour, minute, second (positions shown in the article).
Date Conversion
1. Convert Readable Date to Unix Timestamp
pandas:
# convert datetime to string, then to timestamp
data['str_timestamp'] = data['ts'].apply(lambda x: int(time.mktime(time.strptime(x.strftime('%Y-%m-%d %H:%M:%S'), '%Y-%m-%d %H:%M:%S'))))MySQL: CAST(UNIX_TIMESTAMP(ts) AS INT) (returns a decimal that needs casting).
Hive: UNIX_TIMESTAMP(ts) (returns integer directly).
2. Convert Unix Timestamp to Readable Date
pandas:
# convert integer timestamp back to datetime
data['ori_dt'] = pd.to_datetime(data['str_timestamp'], unit='s', utc=True).tz_convert('Asia/Shanghai')MySQL: FROM_UNIXTIME(CAST(UNIX_TIMESTAMP(ts) AS INT)) Hive:
FROM_UNIXTIME(UNIX_TIMESTAMP(ts))3. Transform 10‑digit Date (YYYY‑MM‑DD) to 8‑digit Date (YYYYMMDD)
pandas:
data['str_ts_8'] = data['str_ts'].astype(str).str[:10].str.replace('-', '')MySQL:
SELECT REPLACE(SUBSTR(ts,1,10), '-', '') AS dt8 FROM t_order;Hive:
SELECT REGEXP_REPLACE(SUBSTR(ts,1,10), '-', '') AS dt8 FROM t_order;4. Transform 8‑digit Date back to 10‑digit
Method 1 – string concatenation:
# pandas
data['str_ts_10'] = data['str_ts_8'].apply(lambda x: f"{x[:4]}-{x[4:6]}-{x[6:]}")MySQL/Hive: use CONCAT with SUBSTR (or REGEXP_REPLACE in Hive) to insert hyphens.
Method 2 – use Unix timestamp conversion:
# pandas (datetime module)
def ts_to_ymd(x):
from datetime import datetime
return datetime.strptime(x, '%Y%m%d').strftime('%Y-%m-%d')
data['str_ts_10_2'] = data['str_ts_8'].apply(ts_to_ymd)MySQL/Hive: apply UNIX_TIMESTAMP on the 8‑digit string (after removing hyphens) and then FROM_UNIXTIME(..., '%Y-%m-%d').
Date Calculation
1. Date Offsets (Add/Subtract Days)
pandas:
from datetime import timedelta
# add 5 days
data['ts_plus_5'] = data['ts'] + timedelta(days=5)
# subtract 3 days
data['ts_minus_3'] = data['ts'] - timedelta(days=3)MySQL: DATE_ADD(ts, INTERVAL 5 DAY), DATE_SUB(ts, INTERVAL 3 DAY) Hive: DATE_ADD(ts, 5),
DATE_SUB(ts, 3)2. Date Difference
pandas (datetime column):
data['diff_days'] = (data['ts'] - data['other_ts']).dt.daysIf the column is a string, first convert:
data['dt_ts'] = pd.to_datetime(data['str_ts'], format='%Y-%m-%d %H:%M:%S')MySQL/Hive: DATEDIFF(date1, date2). Hive requires 10‑digit dates; 8‑digit inputs return NULL, while MySQL can handle both.
Conclusion
The guide summarizes the most common date‑related operations that are available in both pandas and SQL (MySQL/Hive). While the examples cover typical scenarios—retrieving the current timestamp, extracting components, converting to/from Unix timestamps, reshaping date strings, and performing arithmetic—many additional functions exist in each ecosystem for more advanced use cases.
All code snippets and the sample order.xlsx dataset can be obtained from the original WeChat public account (reply with the keyword indicated in the article).
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
