Fundamentals 12 min read

Parsing Complex JSON Structures with pandas json_normalize

This article explains how to use pandas' json_normalize function to transform different JSON formats—including simple objects, nested objects, lists, and deeply nested structures—into DataFrames, covering parameters such as record_path, meta, max_level, sep, errors, and prefix handling, with practical code examples.

Python Programming Learning Circle
Python Programming Learning Circle
Python Programming Learning Circle
Parsing Complex JSON Structures with pandas json_normalize

Introduction

This tutorial introduces the json_normalize function from the pandas library and demonstrates how to convert various JSON data structures into tabular DataFrame objects for easier analysis.

JSON Data Overview

JSON is a widely used data interchange format that is machine‑readable but often difficult for humans to read directly. Converting JSON to a table format simplifies inspection and further analysis.

JSON Formats Covered

Two common JSON representations are discussed:

Standard JSON objects wrapped with {}.

Lists of JSON objects wrapped with [], which may contain nested objects or lists.

json_normalize() Parameters

The function offers several useful arguments, including record_path, meta, max_level, sep, errors, record_prefix, and meta_prefix. These control how nested structures are flattened, how column names are built, and how missing keys are handled.

1. Parsing a Simple JSON Object

from pandas import json_normalize
import pandas as pd

a_dict = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2
}
pd.json_normalize(a_dict)

The resulting DataFrame displays the three key‑value pairs as columns.

2. Parsing a JSON List

json_list = [
    {'class': 'Year 1', 'student number': 20, 'room': 'Yellow'},
    {'class': 'Year 2', 'student number': 25, 'room': 'Blue'}
]
pd.json_normalize(json_list)

This produces a DataFrame with one row per list element.

3. Parsing Multi‑Level JSON Objects

json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {
        'president': 'John Kasich',
        'contacts': {
            'email': {
                'admission': '[email protected]',
                'general': '[email protected]'
            },
            'tel': '123456789'
        }
    }
}
pd.json_normalize(json_obj)

By default, nested keys are concatenated with a dot (e.g., info.contacts.email.admission). Setting max_level limits the depth of flattening.

4. Parsing JSON Objects with Nested Lists

json_obj = {
    'school': 'ABC primary school',
    'location': 'London',
    'ranking': 2,
    'info': {...},
    'students': [
        {'name': 'Tom'},
        {'name': 'James'},
        {'name': 'Jacqueline'}
    ]
}
pd.json_normalize(json_obj, record_path='students')

Using record_path='students' extracts each student as a separate row. Additional metadata can be added via the meta argument.

5. Ignoring Missing Keys

data = [
    {
        'class': 'Year 1',
        'student count': 20,
        'room': 'Yellow',
        'info': {'teachers': {'math': 'Rick Scott', 'physics': 'Elon Mask'}},
        'students': [{'name': 'Tom', 'sex': 'M'}, {'name': 'James', 'sex': 'M'}]
    },
    {
        'class': 'Year 2',
        'student count': 25,
        'room': 'Blue',
        'info': {'teachers': {'physics': 'Albert Einstein'}},
        'students': [{'name': 'Tony', 'sex': 'M'}, {'name': 'Jacqueline', 'sex': 'F'}]
    }
]
pd.json_normalize(
    data,
    record_path=['students'],
    meta=['class', 'room', ['info', 'teachers', 'math']],
    errors='ignore'
)

Setting errors='ignore' prevents the function from raising an exception when a specified key (e.g., math) is absent, filling the missing values with NaN.

6. Customizing Key Separators

json_obj = {...}
pd.json_normalize(json_obj, sep='->')

The sep argument changes the delimiter used between nested keys, allowing a custom separator such as ->.

7. Adding Prefixes to Columns

pd.json_normalize(
    json_obj,
    record_path='students',
    meta=['school', 'location', ['info', 'contacts', 'tel'], ['info', 'contacts', 'email', 'general']],
    record_prefix='students->',
    meta_prefix='meta->',
    sep='->'
)

The record_prefix and meta_prefix arguments prepend identifiers to the generated column names, making the origin of each column explicit.

8. Parsing JSON from a URL

import requests
from pandas import json_normalize
url = 'https://tianqiapi.com/free/week'
r = requests.get(url, params={
    "appid": "59257444",
    "appsecret": "uULlTGV9 ",
    'city': '深圳'
})
result = r.json()
df = json_normalize(result, meta=['city', 'cityid', 'update_time'], record_path=['data'])

This example fetches a weather API response, extracts the nested data list, and builds a DataFrame containing daily forecasts.

9. Exploring Multiple Nested Lists

When a JSON object contains more than one nested list, record_path can handle only a single key at a time. The solution is to normalize each list separately and then merge the resulting DataFrames on shared metadata.

# Normalize university list
df1 = pd.json_normalize(json_obj, record_path=['university'], meta=[...], record_prefix='university->', meta_prefix='meta->', sep='->')
# Normalize students list
df2 = pd.json_normalize(json_obj, record_path=['students'], meta=[...], record_prefix='students->', meta_prefix='meta->', sep='->')
# Merge the two results
merged = df1.merge(df2, how='left', left_index=True, right_index=True, suffixes=['->', '->']).T.drop_duplicates().T

The merged DataFrame combines information from both nested lists while removing duplicate columns.

Conclusion

The json_normalize() function is a powerful tool for converting virtually any JSON structure into a flat table, and its flexible parameters enable handling of complex, deeply nested, or irregular data scenarios.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

JSONdataframepandasdata-processingjson_normalize
Python Programming Learning Circle
Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.