How to Turn Real‑Estate Data into Insightful Reports and Charts with Pandas
This guide demonstrates how to use a second‑hand housing dataset to create various reports—frequency tables, cross‑tables, and summary statistics—and visualise them with bar and stacked bar charts using Python's pandas library.
Introduction
Reports and statistical charts are essential for presenting data insights. Using a second‑hand housing CSV file, this article shows step‑by‑step how to generate different types of reports and corresponding visualisations.
Report Types
Reports are classified by the variables they contain:
Dimension (categorical) indicators
Measure (continuous) indicators
A report that contains only dimension indicators is either a frequency table (one categorical variable) or a cross‑table (two or more categorical variables). A report that contains both dimensions and measures is a summary table, in which the measures appear as statistical aggregates such as mean, sum, or count.
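As a rough illustration of the three report types, the sketch below builds each one from a small made-up DataFrame. The column names mirror fields the article mentions; the values are invented and are not taken from the dataset.

```python
import pandas as pd

# Toy data: column names follow the article, values are made up.
toy = pd.DataFrame({
    "district": ["A", "A", "B", "B", "B"],
    "subway":   [1, 0, 1, 1, 0],
    "price":    [50.0, 40.0, 70.0, 80.0, 60.0],
})

# One dimension -> frequency table
freq = toy["district"].value_counts()

# Two dimensions -> cross-table
cross = pd.crosstab(toy["district"], toy["subway"])

# One dimension plus one measure -> summary table
summary = toy.groupby("district")["price"].mean()
```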
Bar Charts and Their Relation to Tables
Bar charts map directly to tables: a one‑dimensional bar chart corresponds to a single categorical variable, while a two‑dimensional bar chart represents two categorical variables. The bar length reflects a frequency or a statistical measure.
Dataset Overview
The example uses sndHsPr.csv, which contains fields like district, price, subway proximity, and school‑district status. The goal is to explore both the statistical characteristics of the houses and the factors influencing those characteristics.
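Before building any report it helps to load the file and check which columns behave as dimensions and which as measures. The helpers below are a minimal sketch: `load_housing` and `quick_overview` are hypothetical names, and the file location and encoding are assumptions that may need adjusting for your copy of the data.

```python
import pandas as pd

def load_housing(path="sndHsPr.csv", encoding=None):
    """Read the housing CSV into a DataFrame.

    The default path and encoding are assumptions; pass explicit values
    if the file lives elsewhere or is not UTF-8.
    """
    return pd.read_csv(path, encoding=encoding)

def quick_overview(df):
    """Return dtype, missing-value count and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
        "n_unique": df.nunique(),
    })
```

Calling `quick_overview(load_housing())` gives one row per column. Columns with only a handful of distinct values are usually dimensions, even when they are stored as numbers.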
1) Single‑Factor Frequency
Analyze the distribution of a single categorical variable, providing count, percentage, and cumulative values.
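A plain `value_counts()` call returns only the counts. One way to get the percentage and cumulative columns as well is sketched below; `frequency_table` is a hypothetical helper, and the sample districts are made up.

```python
import pandas as pd

def frequency_table(series):
    """Build count, percent and cumulative columns for one categorical variable."""
    counts = series.value_counts()           # sorted by descending count
    percent = counts / counts.sum() * 100
    return pd.DataFrame({
        "count": counts,
        "percent": percent,
        "cum_count": counts.cumsum(),
        "cum_percent": percent.cumsum(),
    })

# Made-up values standing in for the district column
districts = pd.Series(["A", "B", "A", "C", "A", "B"], name="district")
table = frequency_table(districts)
```

With the real data, the same helper would be called as `frequency_table(snd.district)`.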
Code example:
import pandas as pd
snd = pd.read_csv('sndHsPr.csv')
snd.district.value_counts()
To visualise the frequencies as a bar chart:
snd.district.value_counts().plot(kind='bar')
2) Table Analysis (Cross‑Table)
Examine the joint distribution of two categorical variables, showing cell frequencies, percentages, and marginal distributions.
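`pd.crosstab` can produce the marginal totals and the percentages directly through its `margins` and `normalize` arguments. The sketch below uses made-up 0/1 values for the two columns; how the real file encodes them is an assumption.

```python
import pandas as pd

# Made-up indicator values; the real encoding may differ.
toy = pd.DataFrame({
    "subway": [1, 1, 1, 0, 0, 1],
    "school": [1, 0, 1, 0, 0, 0],
})

# Cell frequencies plus row and column totals (the marginal distributions)
counts = pd.crosstab(toy["subway"], toy["school"], margins=True)

# Row shares: each row sums to 1
row_share = pd.crosstab(toy["subway"], toy["school"], normalize="index")

# Overall shares: all cells together sum to 1
cell_share = pd.crosstab(toy["subway"], toy["school"], normalize="all")
```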
Generate a cross‑table with pandas:
pd.crosstab(snd.subway, snd.school)
Standardise the rows so that each sums to 1, then plot a stacked bar chart:
sub_sch = pd.crosstab(snd.subway, snd.school)
row_perc = sub_sch.div(sub_sch.sum(axis=1), axis=0)
row_perc.plot(kind='bar', stacked=True)
The article recommends a custom stack2dim function for creating stacked bar charts. Its key parameters are:
raw: the pandas DataFrame
i, j: the names of the two categorical variables (as quoted strings, e.g., "school")
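The body of stack2dim is not shown, so the following is not that function. It is a hypothetical stand-in that accepts the same three parameters (`raw`, `i`, `j`) and wraps the row-standardised stacked bar chart from this section. The names `row_shares` and `stacked_share_plot` are invented for illustration.

```python
import pandas as pd

def row_shares(raw, i, j):
    """Cross-tabulate columns i and j of raw and scale each row to sum to 1."""
    table = pd.crosstab(raw[i], raw[j])
    return table.div(table.sum(axis=1), axis=0)

def stacked_share_plot(raw, i, j):
    """Draw the row shares as a stacked bar chart (requires matplotlib)."""
    ax = row_shares(raw, i, j).plot(kind="bar", stacked=True)
    ax.set_xlabel(i)
    ax.set_ylabel("share of " + j)
    return ax
```

With the housing data loaded as `snd`, a call might look like `stacked_share_plot(snd, "subway", "school")`.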
3) Summary Statistics
Group data by a categorical variable and compute descriptive statistics for a continuous variable.
Example for average, maximum, and minimum house prices per district:
snd.price.groupby(snd.district).agg(['mean', 'max', 'min'])
Visual Illustrations
The article includes several figures that illustrate each step, such as the two‑dimensional table template, single‑factor frequency chart, cross‑table visualisation, stacked bar chart, and summary statistics chart.
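A chart of the summary statistics could be drawn along the following lines. This is a sketch rather than the code behind the original figure: `price_summary` and `plot_mean_price` are hypothetical helpers, and sorting by mean price is a presentation choice, not something the article specifies.

```python
import pandas as pd

def price_summary(snd):
    """Mean, max and min price per district, highest mean first."""
    return (snd.groupby("district")["price"]
               .agg(["mean", "max", "min"])
               .sort_values("mean", ascending=False))

def plot_mean_price(snd):
    """Bar chart of the mean price per district (requires matplotlib)."""
    return price_summary(snd)["mean"].plot(kind="bar")
```

Each bar then corresponds to one row of the summary table, with the bar length equal to that district's mean price.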
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.