
Python Data Analysis Project: US Presidential Election Contributions

This tutorial walks through a Python-based data analysis project that explores over 750,000 US voter donation records from the 2020 presidential election, covering data preparation, cleaning, exploratory analysis, and visualizations such as bar charts, pie charts, and word clouds.

Python Programming Learning Circle

This tutorial demonstrates a practical data analysis project using Python to explore US voter donation data for the 2020 presidential election.

Preparation: The required datasets (weball20.txt, ccl.txt, itcont_2020_20200722_20200820.txt) are downloaded from the FEC, and necessary Python packages (pandas, matplotlib, wordcloud) are installed.

<code># install packages
pip install wordcloud
pip install matplotlib
pip install pandas</code>

Data loading and merging: Candidates, committee linkage, and individual contribution files are read with pandas (specifying separators and column names) and merged on CAND_ID and CMTE_ID to create a combined table linking donors to candidates.

<code>import pandas as pd
candidates = pd.read_csv("weball20.txt", sep='|', names=[
    'CAND_ID','CAND_NAME','CAND_ICI','PTY_CD','CAND_PTY_AFFILIATION','TTL_RECEIPTS',
    'TRANS_FROM_AUTH','TTL_DISB','TRANS_TO_AUTH','COH_BOP','COH_COP','CAND_CONTRIB',
    'CAND_LOANS','OTHER_LOANS','CAND_LOAN_REPAY','OTHER_LOAN_REPAY','DEBTS_OWED_BY',
    'TTL_INDIV_CONTRIB','CAND_OFFICE_ST','CAND_OFFICE_DISTRICT','SPEC_ELECTION','PRIM_ELECTION',
    'RUN_ELECTION','GEN_ELECTION','GEN_ELECTION_PRECENT','OTHER_POL_CMTE_CONTRIB','POL_PTY_CONTRIB',
    'CVG_END_DT','INDIV_REFUNDS','CMTE_REFUNDS'])

ccl = pd.read_csv("ccl.txt", sep='|', names=[
    'CAND_ID','CAND_ELECTION_YR','FEC_ELECTION_YR','CMTE_ID','CMTE_TP','CMTE_DSGN','LINKAGE_ID'])

# Attach candidate details to each candidate-committee link (shared key: CAND_ID)
ccl = pd.merge(ccl, candidates, on='CAND_ID')

itcont = pd.read_csv("itcont_2020_20200722_20200820.txt", sep='|', names=[
    'CMTE_ID','AMNDT_IND','RPT_TP','TRANSACTION_PGI','IMAGE_NUM','TRANSACTION_TP','ENTITY_TP',
    'NAME','CITY','STATE','ZIP_CODE','EMPLOYER','OCCUPATION','TRANSACTION_DT','TRANSACTION_AMT',
    'OTHER_ID','TRAN_ID','FILE_NUM','MEMO_CD','MEMO_TEXT','SUB_ID'])

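# Link each contribution to a candidate through its committee (shared key: CMTE_ID),
# keeping only the eight columns that the cleaning and analysis steps use.
c_itcont = pd.merge(ccl, itcont, on='CMTE_ID')[[
    'CAND_NAME','NAME','STATE','EMPLOYER','OCCUPATION',
    'TRANSACTION_AMT','TRANSACTION_DT','CAND_PTY_AFFILIATION']]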
</code>
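
The two-step merge can be checked on a toy example before running it on the full files. The sketch below is only an illustration: every ID, name, and amount in it is made up, and the frames carry just a few of the real columns.

```python
import pandas as pd

# Toy stand-ins for the three FEC files (all values are invented).
candidates = pd.DataFrame({
    "CAND_ID": ["P001", "P002"],
    "CAND_NAME": ["CANDIDATE A", "CANDIDATE B"],
    "CAND_PTY_AFFILIATION": ["DEM", "REP"],
})
ccl = pd.DataFrame({
    "CAND_ID": ["P001", "P002"],
    "CMTE_ID": ["C100", "C200"],
})
itcont = pd.DataFrame({
    "CMTE_ID": ["C100", "C100", "C200", "C999"],
    "NAME": ["DONOR 1", "DONOR 2", "DONOR 3", "DONOR 4"],
    "TRANSACTION_AMT": [50, 25, 100, 10],
})

# Step 1: attach candidate details to each committee link.
linked = pd.merge(ccl, candidates, on="CAND_ID", how="inner")

# Step 2: attach each donation to its committee, and through it to a candidate.
donations = pd.merge(linked, itcont, on="CMTE_ID", how="inner")

print(donations[["CAND_NAME", "NAME", "TRANSACTION_AMT"]])
```

Because both merges are inner joins (the pandas default), the donation to committee C999, which has no linked candidate, drops out of the result. The same silent filtering applies to the real data, so it can be worth comparing row counts before and after each merge.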

Data cleaning: Missing values in STATE, EMPLOYER, OCCUPATION are filled with "NOT PROVIDED", and the transaction date column is converted to a readable string format.

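<code># Assign the result back: fillna(inplace=True) on a selected column may not update c_itcont in recent pandas versions
c_itcont['STATE'] = c_itcont['STATE'].fillna('NOT PROVIDED')
c_itcont['EMPLOYER'] = c_itcont['EMPLOYER'].fillna('NOT PROVIDED')
c_itcont['OCCUPATION'] = c_itcont['OCCUPATION'].fillna('NOT PROVIDED')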

# TRANSACTION_DT is stored as MMDDYYYY; read as an integer it loses the leading zero
# (e.g. 7222020), so pad it back to 8 digits before parsing into YYYY-MM-DD strings.
padded_dt = c_itcont['TRANSACTION_DT'].astype(str).str.zfill(8)
c_itcont['TRANSACTION_DT'] = pd.to_datetime(padded_dt, format='%m%d%Y', errors='coerce').dt.strftime('%Y-%m-%d')
</code>
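
To see what the cleaning step does to individual values, here is a minimal sketch on a two-row frame. The rows are invented, and it assumes the dates arrive as MMDDYYYY integers with the leading zero dropped. The helper name `clean_contributions` is hypothetical and not part of the original project.

```python
import pandas as pd

def clean_contributions(df):
    """Return a cleaned copy: fill missing text fields and normalize dates."""
    out = df.copy()
    for col in ["STATE", "EMPLOYER", "OCCUPATION"]:
        out[col] = out[col].fillna("NOT PROVIDED")
    # Restore the leading zero, then parse MMDDYYYY into YYYY-MM-DD strings.
    padded = out["TRANSACTION_DT"].astype(str).str.zfill(8)
    out["TRANSACTION_DT"] = pd.to_datetime(padded, format="%m%d%Y").dt.strftime("%Y-%m-%d")
    return out

# Invented sample rows with gaps in each text column.
toy = pd.DataFrame({
    "STATE": ["NY", None],
    "EMPLOYER": [None, "ACME"],
    "OCCUPATION": ["ENGINEER", None],
    "TRANSACTION_DT": [7222020, 8032020],
})

cleaned = clean_contributions(toy)
print(cleaned)
```

Working on a copy keeps the raw frame available for comparison, which helps when checking how many values were actually filled.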

Exploratory analysis: The dataset contains 756,205 rows and 8 columns; basic statistics are displayed with .shape, .info(), and .describe().
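
As a rough illustration of these three calls, the sketch below runs them on a small invented frame. On the real data the same calls would be made on `c_itcont`. The negative amount is included only to show how `min` in the summary can flag entries such as refunds, if the data contains any.

```python
import pandas as pd

# Invented sample; the real table is far larger.
toy = pd.DataFrame({
    "STATE": ["NY", "CA", "NY", "TX"],
    "OCCUPATION": ["ENGINEER", "RETIRED", "NOT PROVIDED", "RETIRED"],
    "TRANSACTION_AMT": [50, 200, -25, 100],
})

print(toy.shape)            # (number of rows, number of columns)
toy.info()                  # dtype and non-null count for each column
summary = toy.describe()    # count, mean, std, min, quartiles, max (numeric columns only)
print(summary)
```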

Key analyses include:

Total donation amount by party (DEM, REP, etc.)

Total donation amount by candidate (BIDEN, TRUMP, etc.)

Top occupations by donation amount and by donor count

Top states by donation amount and by donor count
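
Each of these analyses reduces to a group-and-aggregate over one column. The sketch below shows one possible shape for them on invented rows. The helper names `total_by` and `count_by` are hypothetical, and `count_by` counts contribution records, so it equals the number of distinct donors only if each donor appears once.

```python
import pandas as pd

def total_by(df, column, n=10):
    """Top n values of `column` ranked by summed TRANSACTION_AMT."""
    totals = df.groupby(column)["TRANSACTION_AMT"].sum()
    return totals.sort_values(ascending=False).head(n)

def count_by(df, column, n=10):
    """Top n values of `column` ranked by number of contribution records."""
    return df[column].value_counts().head(n)

# Invented sample rows using the column names from the merged table.
toy = pd.DataFrame({
    "CAND_NAME": ["CANDIDATE A", "CANDIDATE A", "CANDIDATE B", "CANDIDATE A", "CANDIDATE B"],
    "CAND_PTY_AFFILIATION": ["DEM", "DEM", "REP", "DEM", "REP"],
    "STATE": ["NY", "CA", "TX", "NY", "NY"],
    "OCCUPATION": ["ENGINEER", "RETIRED", "RETIRED", "TEACHER", "RETIRED"],
    "TRANSACTION_AMT": [50, 200, 100, 25, 10],
})

print(total_by(toy, "CAND_PTY_AFFILIATION"))  # amount by party
print(total_by(toy, "CAND_NAME"))             # amount by candidate
print(total_by(toy, "OCCUPATION"))            # amount by occupation
print(count_by(toy, "STATE"))                 # record count by state
```

Selecting `TRANSACTION_AMT` before calling `sum()` keeps the aggregation numeric and avoids summing text columns.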

Visualization: Bar charts for state donation totals and donor counts, a pie chart for Biden’s state‑wise donation share, and a word cloud of Biden’s donors are generated using matplotlib and wordcloud.

<code>import matplotlib.pyplot as plt

# Example: bar chart of the top 10 states by total donation amount.
# Select TRANSACTION_AMT before summing so that text columns are not aggregated.
st_amt = c_itcont.groupby('STATE')['TRANSACTION_AMT'].sum().sort_values(ascending=False)[:10]
st_amt.plot(kind='bar')
plt.show()

# Example: pie chart of Biden’s donations by state (top 10 states)
biden = c_itcont[c_itcont['CAND_NAME'] == 'BIDEN, JOSEPH R JR']
biden_state = biden.groupby('STATE')['TRANSACTION_AMT'].sum().sort_values(ascending=False)[:10]
biden_state.plot.pie(figsize=(10,10), autopct='%0.2f%%')
plt.show()
</code>
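
The word cloud step is not shown in code above, so the following is only a sketch of one way it could be done, assuming the cloud is built from word frequencies in the donor `NAME` column. The helper names and the output path `biden_donors.png` are hypothetical. `WordCloud` is imported inside `render_cloud` so the counting step runs even where the wordcloud package is not installed.

```python
from collections import Counter

def name_frequencies(names):
    """Count each word across donor names such as 'SMITH, JOHN'."""
    counts = Counter()
    for name in names:
        # Treat commas as separators, then split on whitespace.
        for token in str(name).replace(",", " ").split():
            counts[token] += 1
    return dict(counts)

def render_cloud(frequencies, path="biden_donors.png"):
    """Draw a word cloud from a {word: count} mapping and save it as an image."""
    from wordcloud import WordCloud  # lazy import; requires `pip install wordcloud`
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)

# Invented donor names; with the real data this would be biden['NAME'].
freqs = name_frequencies(["SMITH, JOHN", "SMITH, MARY", "DOE, JOHN"])
print(freqs)
# render_cloud(freqs)  # uncomment to write the image file
```

Counting words first and passing the result to `generate_from_frequencies` makes the input to the cloud easy to inspect before anything is rendered.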

All code, data sources, and results are available at the competition page https://tianchi.aliyun.com/competition/entrance/531837/introduction.
