Practical Data Analysis Code Samples for Business Decision Making
This article presents ten practical Python snippets for common data analysis tasks that support business decisions: handling missing values, sorting and ranking, pivot tables, distribution plots, association rules, outlier detection, time-series forecasting, clustering, feature selection, and cross-validation.
In the snippets, data is a pandas DataFrame (or a Series where a single column is expected), X and y are a feature matrix and a target vector, and names such as 'column', value, threshold, k, p, d, q, start_date, and end_date are placeholders to replace with your own.
1. Handling missing values:
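To make the three options concrete, here is a minimal sketch on a small made-up table. The column names and values are illustrative assumptions, not data from the article.

```python
import numpy as np
import pandas as pd

# Hypothetical sales table with one gap in each column
data = pd.DataFrame({
    "units": [10.0, np.nan, 30.0, 20.0],
    "price": [2.5, 3.0, np.nan, 3.5],
})

# Count the missing values per column before choosing a strategy
missing_per_column = data.isna().sum()

# Option 1: keep only the rows with no missing values
complete_rows = data.dropna()

# Option 2: fill every gap with one constant
filled_with_zero = data.fillna(0)

# Option 3: fill each gap with the mean of its own column
filled_with_mean = data.fillna(data.mean(numeric_only=True))

print(missing_per_column)
print(filled_with_mean)
```

None of these calls modifies data in place; each returns a new DataFrame, so the result has to be assigned to a name to be kept. The generic forms, with placeholder names, are: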
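# Drop rows that contain missing values (returns a new DataFrame)
data.dropna()
# Fill missing values with a specified value
data.fillna(value)
# Fill missing values with each column's mean (numeric columns only)
data.fillna(data.mean(numeric_only=True))
2. Sorting and ranking: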
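Sorting reorders rows, while ranking assigns each value a position and has to decide what to do with ties. The sketch below shows both on a made-up store table. The store and revenue columns are assumptions for illustration.

```python
import pandas as pd

# Hypothetical revenue per store; stores A and C are tied
data = pd.DataFrame({
    "store": ["A", "B", "C", "D"],
    "revenue": [250, 400, 250, 100],
})

# Sort from highest to lowest revenue
sorted_data = data.sort_values("revenue", ascending=False)

# Default rank: ascending order, tied values share the average of their positions
data["rank_avg"] = data["revenue"].rank()

# Leaderboard-style rank: highest revenue is 1, tied values share the best position
data["rank_top"] = data["revenue"].rank(ascending=False, method="min")

print(sorted_data)
print(data)
```

With the default settings the two tied stores both receive rank 2.5, which is often surprising in reports. Passing method="min" gives them the same whole-number rank instead. The generic forms are: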
# Sort the data by a column
sorted_data = data.sort_values('column')
# Compute the rank of each value in a column
data['rank'] = data['column'].rank()
3. Pivot table creation:
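A pivot table groups rows by one column, spreads a second column across the header, and aggregates a third. A minimal sketch, assuming a made-up table with region, product, and sales columns:

```python
import pandas as pd

# Hypothetical transactions: North sold product A twice
data = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "North"],
    "product": ["A", "B", "A", "B", "A"],
    "sales":   [100, 150, 200, 50, 300],
})

# Rows are regions, columns are products, cells hold the mean sale
pivot_table = data.pivot_table(
    values="sales", index="region", columns="product", aggfunc="mean"
)

print(pivot_table)
```

The North/A cell averages the two matching rows (100 and 300). Swapping aggfunc for 'sum' or 'count' changes how the matching rows are combined. The generic form is: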
# Create a pivot table
pivot_table = data.pivot_table(values='value', index='index_column', columns='columns_column', aggfunc='mean')
4. Data distribution visualization:
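import matplotlib.pyplot as plt
import seaborn as sns
# Plot a histogram
sns.histplot(data['column'])
plt.show()
# Plot a box plot (shown separately so the two plots do not share one figure)
sns.boxplot(x=data['column'])
plt.show()
5. Association analysis: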
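Before running a library routine, it helps to see what the support and confidence thresholds actually measure. The sketch below computes them by hand for a single rule on a made-up basket table. The baskets table and the rule_metrics helper are hypothetical names introduced for illustration, and this direct calculation is not how mlxtend works internally.

```python
import pandas as pd

# Hypothetical one-hot basket table: one row per transaction, one column per item
baskets = pd.DataFrame({
    "bread":  [True, True, False, True, True],
    "butter": [True, True, False, False, True],
    "milk":   [False, True, True, True, False],
})

def rule_metrics(df, antecedent, consequent):
    """Return support, confidence, and lift for the rule antecedent -> consequent."""
    support_both = (df[antecedent] & df[consequent]).mean()  # share of baskets with both items
    support_antecedent = df[antecedent].mean()
    support_consequent = df[consequent].mean()
    confidence = support_both / support_antecedent  # P(consequent | antecedent)
    lift = confidence / support_consequent          # above 1 means a positive association
    return support_both, confidence, lift

support, confidence, lift = rule_metrics(baskets, "bread", "butter")
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

Here bread and butter appear together in 3 of 5 baskets, and 3 of the 4 bread baskets also contain butter, so the rule would pass a minimum support of 0.1 and a minimum confidence of 0.5. The library version, which searches all itemsets rather than one rule, is: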
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules
# Find frequent itemsets (data must be a one-hot encoded DataFrame, one column per item)
frequent_itemsets = apriori(data, min_support=0.1, use_colnames=True)
# Derive association rules with a confidence of at least 0.5
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.5)
6. Outlier detection:
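A Z-score expresses how many standard deviations a value lies from the column mean. The sketch below spells out that calculation with pandas on made-up daily order counts. The orders column and the threshold of 3 are assumptions for illustration; it uses the population standard deviation (ddof=0), which matches the default of scipy.stats.zscore.

```python
import pandas as pd

# Hypothetical daily order counts; the last value is an obvious spike
data = pd.DataFrame({"orders": [50, 52, 49, 51, 48, 50, 53, 47, 51, 49, 50, 120]})

# Z-score: distance from the mean in units of the population standard deviation
mean = data["orders"].mean()
std = data["orders"].std(ddof=0)
z_scores = ((data["orders"] - mean) / std).abs()

# Flag rows whose absolute Z-score exceeds the chosen threshold
threshold = 3
outliers = data[z_scores > threshold]

print(outliers)
```

One caveat for small samples: with a population standard deviation, no absolute Z-score can exceed the square root of n - 1, so a threshold of 3 can never be crossed with 10 or fewer rows. A lower threshold or a different rule, such as the interquartile range, may suit small tables better. The SciPy version is: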
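import numpy as np
from scipy import stats
# Compute the absolute Z-score of each value in the column
z_scores = np.abs(stats.zscore(data['column']))
# Select the rows whose Z-score exceeds the threshold (3 is a common choice)
outliers = data[z_scores > threshold]
7. Time‑series forecasting: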
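from statsmodels.tsa.arima.model import ARIMA
# Create the ARIMA model (data: a time-indexed series; p, d, q: the model orders)
model = ARIMA(data, order=(p, d, q))
# Fit the model
model_fit = model.fit()
# Predict over the chosen date range
predictions = model_fit.predict(start=start_date, end=end_date)
8. Clustering analysis: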
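K-means groups rows by Euclidean distance, so features on very different scales should usually be standardized first. A minimal sketch, assuming made-up customer data with two clearly separated groups. The feature meanings, the choice of k=2, and the scaling step are illustrative assumptions rather than part of the original snippet.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [annual spend, visits per month]
X = np.array([
    [200, 1], [220, 2], [210, 1], [190, 2],   # low spend, few visits
    [900, 8], [950, 9], [920, 7], [880, 9],   # high spend, many visits
], dtype=float)

# Standardize so that spend does not dominate the distance calculation
X_scaled = StandardScaler().fit_transform(X)

# Fix random_state for repeatable labels; n_init restarts guard against a poor start
model = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = model.fit_predict(X_scaled)

print(clusters)
```

The label numbers themselves are arbitrary: which group is called 0 and which 1 can differ between runs or library versions, so downstream code should compare group membership rather than label values. The generic form is: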
from sklearn.cluster import KMeans
# Create the KMeans clustering model with k clusters
model = KMeans(n_clusters=k)
# Cluster the data and return one label per row
clusters = model.fit_predict(data)
9. Feature selection:
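The chi-squared score measures how unevenly a non-negative feature (such as a count) is distributed across the classes. The sketch below uses a tiny made-up matrix in which the first column clearly tracks the label; all values are assumptions for illustration.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical count features (chi2 requires non-negative values):
# column 0 tracks the label, column 1 is constant, column 2 is only weakly related
X = np.array([
    [9, 5, 1],
    [8, 5, 2],
    [9, 5, 1],
    [1, 5, 2],
    [0, 5, 1],
    [1, 5, 3],
])
y = np.array([1, 1, 1, 0, 0, 0])

# Keep the single highest-scoring feature
selector = SelectKBest(chi2, k=1)
selected_features = selector.fit_transform(X, y)

# Indices of the columns that were kept
kept_columns = selector.get_support(indices=True)

print(selector.scores_)
print(kept_columns)
```

fit_transform returns only the values of the kept columns, so get_support is the way to recover which original features survived. The generic form is: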
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
# Select the k best features using the chi-squared test (features must be non-negative)
selector = SelectKBest(chi2, k=k)
selected_features = selector.fit_transform(X, y)
10. Cross‑validation:
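Cross-validation splits the data into folds, trains on all but one, scores on the held-out fold, and repeats so that every fold is used once for scoring. A minimal sketch on synthetic data follows; the dataset from make_classification, the scaling step, and max_iter=1000 are assumptions added for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for a real business dataset
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)

# The scaler sits inside the pipeline so it is refit on each training fold only
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# One accuracy score per fold
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_score, std_score = scores.mean(), scores.std()

print(f"accuracy: {mean_score:.3f} +/- {std_score:.3f}")
```

Reporting the spread alongside the mean shows how much the score depends on which rows ended up in which fold. The generic form is: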
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
# Create the logistic regression model
model = LogisticRegression()
# Run 5-fold cross-validation and collect one score per fold
scores = cross_val_score(model, X, y, cv=5)
These examples cover data preprocessing, visualization, association analysis, outlier detection, forecasting, clustering, feature selection, and model evaluation, providing a solid foundation for applying data analysis to real‑world business problems.
By studying and adapting these techniques to specific business needs, you can extract deeper insights and create greater value from your data.
