Master Pandas: From Installation to Advanced Data Manipulation and Stock Screening
This guide walks you through installing Pandas, understanding its core data structures (Series and DataFrame, plus the legacy Panel), accessing data, adding and removing rows and columns, filtering datasets, and applying these techniques to a real-world stock selection workflow using TuShare and TA-Lib.
Installation
Pandas comes pre-installed with Anaconda. Otherwise, install it with pip:
pip install pandas
Import the library before use:
import pandas as pd
Data Structures
Pandas historically provided three main data structures built on NumPy:
Series: a one-dimensional array with an index.
DataFrame: a two-dimensional table with rows and columns.
Panel: a three-dimensional container (rarely used; deprecated in pandas 0.20 and removed in 0.25).
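To make the relationship between the first two structures concrete, here is a minimal sketch (the column names and values are made up for illustration). It shows that selecting one DataFrame column yields a Series, and that a Series exposes its data as a NumPy array.

```python
import numpy as np
import pandas as pd

# Illustrative data; the column names are assumptions for this sketch.
df = pd.DataFrame({"height": [174, 160, 185], "weight": [80, 48, 70]})

col = df["height"]        # selecting one column yields a Series
values = col.to_numpy()   # the underlying data as a NumPy array

print(type(col).__name__)     # Series
print(type(values).__name__)  # ndarray
print(df.shape, col.shape)    # (3, 2) (3,)
```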
Series
Example of creating a Series with random numbers:
import pandas as pd
import numpy as np
s = pd.Series(np.random.randn(4))
print(s)
The default integer index (0 to 3) appears in the left column of the output.
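A Series does not have to use the default integer index. The sketch below, with hypothetical labels and values, passes an explicit index and then looks values up by label and by position.

```python
import pandas as pd

# Hypothetical labels and values, chosen only for illustration.
s2 = pd.Series([1.5, 2.5, 3.5], index=["a", "b", "c"])

print(s2["b"])         # label-based lookup
print(s2.iloc[0])      # position-based lookup
print(list(s2.index))  # the labels themselves
```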
DataFrame
Create a DataFrame from a dictionary of equal‑length lists:
data = {
    'name': ['张三', '李四', '王五'],
    'gender': ['M', 'F', 'M'],
    'height': [174, 160, 185],
    'weight': [80, 48, 70]
}
frame = pd.DataFrame(data)
print(frame)
DataFrames support column selection and ordering via the columns keyword and custom row labels via index:
frame2 = pd.DataFrame(data, columns=['name', 'gender', 'weight'], index=['one', 'two', 'three'])
print(frame2)
Data Access and Traversal
Access rows by integer location:
frame2.iloc[0]
# name      张三
# gender     M
# weight    80
Access rows by label:
frame2.loc['two']
# name      李四
# gender     F
# weight    48
Iterate over rows:
for i in range(len(frame2)):
    print(frame2.iloc[i])
for index, row in frame2.iterrows():
    print(row)
Adding and Deleting Columns
Add a new column (e.g., BMI) calculated from existing columns:
frame['BMI'] = frame['weight'] / (frame['height'] * frame['height']) * 10000
print(frame)
Delete a column:
del frame2['gender']
print(frame2)
Adding and Deleting Rows
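Add rows by creating a new DataFrame and concatenating it with the existing one. (DataFrame.append was removed in pandas 2.0; pd.concat works on both old and new versions.)
frame3 = pd.DataFrame([
    ['小红', 46],
    ['小明', 68]
], columns=['name', 'weight'], index=['four', 'five'])
frame4 = pd.concat([frame2, frame3])
print(frame4)
Delete rows by index label: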
frame4 = frame4.drop('four')
print(frame4)
Data Filtering
Select the first two records:
frame[:2]
Filter rows where a condition holds, e.g., BMI > 20:
mask = frame['BMI'] > 20
filtered = frame.loc[mask]
print(filtered)
Panel (Brief)
Panel was a three-dimensional structure similar to a dictionary of DataFrames. It was rarely used, was deprecated in pandas 0.20 and removed in 0.25, and is omitted here.
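On recent pandas versions, the same "dictionary of DataFrames" idea can be expressed with pd.concat, which stacks the frames under a two-level MultiIndex. The following is a minimal sketch: the stock codes, column names, and prices are hypothetical.

```python
import pandas as pd

# Hypothetical per-stock frames; codes and prices are made up.
frames = {
    "AAA": pd.DataFrame({"open": [10.0, 10.5], "close": [10.4, 10.8]}),
    "BBB": pd.DataFrame({"open": [20.0, 19.5], "close": [19.8, 19.9]}),
}

# Passing a dict to pd.concat uses its keys as the outer index level.
stacked = pd.concat(frames, names=["code", "row"])

print(stacked.loc["BBB"])                # the 2-D slice for one key
print(stacked.loc[("AAA", 1), "close"])  # a single cell
```

Selecting by the outer key returns an ordinary DataFrame, so the rest of the techniques in this guide apply unchanged.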
Practical Stock Screening with Pandas
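The following example uses Pandas together with TuShare and TA-Lib to check whether an A-share stock meets specific performance criteria. The original fragment used bare return statements, so it is wrapped here in a function; the name is_candidate and its parameters are illustrative.
import tushare as ts
import talib as tl

def is_candidate(code, threshold=60):
    # Forward-adjusted daily K-line data for one stock
    data = ts.get_k_data(code, autype='qfq')
    # Calculate daily rate of change, in percent
    data['p_change'] = tl.ROC(data['close'], 1)
    # Require at least `threshold` trading days of history
    if len(data) < threshold:
        return False
    data = data.tail(n=threshold)
    # Require a gain of at least 60% over the window
    ratio_increase = (data.iloc[-1]['close'] - data.iloc[0]['close']) / data.iloc[0]['close']
    if ratio_increase < 0.6:
        return False
    for i in range(1, len(data)):
        # Single-day drop >7%
        if data.iloc[i]['p_change'] < -7:
            return False
        # Two-day cumulative drop >10%
        if (data.iloc[i]['p_change'] + data.iloc[i-1]['p_change']) < -10:
            return False
    return True

print(is_candidate('300573'))
The function returns True or False for a single stock. Calling it for every code in a list yields the qualifying stocks, demonstrating how Pandas can power quantitative finance workflows.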
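The same three rules can be tried without network access or TA-Lib. The sketch below is a stand-in built on assumptions: the function name passes_screen, its parameters, and the synthetic price series are hypothetical, and pandas' pct_change replaces tl.ROC for the daily percent change. The row-by-row loop is also replaced with vectorized comparisons.

```python
import numpy as np
import pandas as pd

def passes_screen(close, window=60, min_gain=0.6,
                  max_day_drop=-7.0, max_two_day_drop=-10.0):
    """Apply the three screening rules to a Series of closing prices."""
    if len(close) < window:
        return False
    # Daily percent change over the full history, then keep the last `window` days
    p_change = (close.pct_change() * 100).tail(window)
    recent = close.tail(window)
    # Rule 1: total gain over the window must reach min_gain
    if (recent.iloc[-1] - recent.iloc[0]) / recent.iloc[0] < min_gain:
        return False
    # Rule 2: no single-day drop beyond the limit (the first day is skipped, as in the loop)
    if (p_change.iloc[1:] < max_day_drop).any():
        return False
    # Rule 3: no two-day cumulative drop beyond the limit
    two_day = p_change + p_change.shift(1)
    if (two_day.iloc[1:] < max_two_day_drop).any():
        return False
    return True

# Synthetic data: 80 days rising 1% per day (hypothetical, not real prices)
steady = pd.Series(100 * 1.01 ** np.arange(80))
crash = steady.copy()
crash.iloc[-10] *= 0.9   # inject a one-day drop of roughly 9%
short = steady.head(30)  # not enough history

print(passes_screen(steady), passes_screen(crash), passes_screen(short))
```

The steady series gains about 80% over its last 60 days with no large drops, so it passes; the other two fail on the single-day drop rule and the history-length check respectively.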
360 Zhihui Cloud Developer
360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
