Parsing Complex JSON Structures with pandas json_normalize
This article explains how to use pandas' json_normalize function to transform different JSON formats—including simple objects, nested objects, lists, and deeply nested structures—into DataFrames, covering parameters such as record_path, meta, max_level, sep, errors, and prefix handling, with practical code examples.
Introduction
This tutorial introduces the json_normalize function from the pandas library and demonstrates how to convert various JSON data structures into tabular DataFrame objects for easier analysis.
JSON Data Overview
JSON is a widely used data interchange format that is machine‑readable but often difficult for humans to read directly. Converting JSON to a table format simplifies inspection and further analysis.
JSON Formats Covered
Two common JSON representations are discussed:
Standard JSON objects wrapped with {}.
Lists of JSON objects wrapped with [], which may contain nested objects or lists.
json_normalize() Parameters
The function offers several useful arguments, including record_path, meta, max_level, sep, errors, record_prefix, and meta_prefix. These control how nested structures are flattened, how column names are built, and how missing keys are handled.
1. Parsing a Simple JSON Object
from pandas import json_normalize
import pandas as pd
a_dict = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2
}
pd.json_normalize(a_dict)The resulting DataFrame displays the three key‑value pairs as columns.
2. Parsing a JSON List
json_list = [
{'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
{'class': 'Year 2', 'student number': 25, 'room': 'Blue'}
]
pd.json_normalize(json_list)This produces a DataFrame with one row per list element.
3. Parsing Multi‑Level JSON Objects
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {
'president': 'John Kasich',
'contacts': {
'email': {
'admission': '[email protected]',
'general': '[email protected]'
},
'tel': '123456789'
}
}
}
pd.json_normalize(json_obj)By default, nested keys are concatenated with a dot (e.g., info.contacts.email.admission). Setting max_level limits the depth of flattening.
4. Parsing JSON Objects with Nested Lists
json_obj = {
'school': 'ABC primary school',
'location': 'London',
'ranking': 2,
'info': {...},
'students': [
{'name': 'Tom'},
{'name': 'James'},
{'name': 'Jacqueline'}
]
}
pd.json_normalize(json_obj, record_path='students')Using record_path='students' extracts each student as a separate row. Additional metadata can be added via the meta argument.
5. Ignoring Missing Keys
data = [
{
'class': 'Year 1',
'student count': 20,
'room': 'Yellow',
'info': {'teachers': {'math': 'Rick Scott', 'physics': 'Elon Mask'}},
'students': [{'name': 'Tom', 'sex': 'M'}, {'name': 'James', 'sex': 'M'}]
},
{
'class': 'Year 2',
'student count': 25,
'room': 'Blue',
'info': {'teachers': {'physics': 'Albert Einstein'}},
'students': [{'name': 'Tony', 'sex': 'M'}, {'name': 'Jacqueline', 'sex': 'F'}]
}
]
pd.json_normalize(
data,
record_path=['students'],
meta=['class', 'room', ['info', 'teachers', 'math']],
errors='ignore'
)Setting errors='ignore' prevents the function from raising an exception when a specified key (e.g., math) is absent, filling the missing values with NaN.
6. Customizing Key Separators
json_obj = {...}
pd.json_normalize(json_obj, sep='->')The sep argument changes the delimiter used between nested keys, allowing a custom separator such as ->.
7. Adding Prefixes to Columns
pd.json_normalize(
json_obj,
record_path='students',
meta=['school', 'location', ['info', 'contacts', 'tel'], ['info', 'contacts', 'email', 'general']],
record_prefix='students->',
meta_prefix='meta->',
sep='->'
)The record_prefix and meta_prefix arguments prepend identifiers to the generated column names, making the origin of each column explicit.
8. Parsing JSON from a URL
import requests
from pandas import json_normalize
url = 'https://tianqiapi.com/free/week'
r = requests.get(url, params={
"appid": "59257444",
"appsecret": "uULlTGV9 ",
'city': '深圳'
})
result = r.json()
df = json_normalize(result, meta=['city', 'cityid', 'update_time'], record_path=['data'])This example fetches a weather API response, extracts the nested data list, and builds a DataFrame containing daily forecasts.
9. Exploring Multiple Nested Lists
When a JSON object contains more than one nested list, record_path can handle only a single key at a time. The solution is to normalize each list separately and then merge the resulting DataFrames on shared metadata.
# Normalize university list
df1 = pd.json_normalize(json_obj, record_path=['university'], meta=[...], record_prefix='university->', meta_prefix='meta->', sep='->')
# Normalize students list
df2 = pd.json_normalize(json_obj, record_path=['students'], meta=[...], record_prefix='students->', meta_prefix='meta->', sep='->')
# Merge the two results
merged = df1.merge(df2, how='left', left_index=True, right_index=True, suffixes=['->', '->']).T.drop_duplicates().TThe merged DataFrame combines information from both nested lists while removing duplicate columns.
Conclusion
The json_normalize() function is a powerful tool for converting virtually any JSON structure into a flat table, and its flexible parameters enable handling of complex, deeply nested, or irregular data scenarios.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
