Fundamentals 6 min read

How to Parse Complex JSON with Pandas, jsonpath, and Python – A Step‑by‑Step Guide

This article walks through a real‑world Python problem of extracting nested soccer odds from a JSON file, comparing pandas read_json, jsonpath, and regex approaches, and ultimately presenting a complete pandas‑based solution that normalizes the data, cleans it with a helper function, and exports it to CSV.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
How to Parse Complex JSON with Pandas, jsonpath, and Python – A Step‑by‑Step Guide

1. Introduction

In a Python community a user asked how to process a JSON file containing soccer match odds. The author examines several approaches.

2. Attempts

First suggestion was to use pandas.read_json(), but the result did not meet the requirements.

Another contributor proposed using jsonpath or regular expressions.

import json
import jsonpath

obj = json.load(open('EN_soccer_straight.json', 'r', encoding='utf-8'))
follower = jsonpath.jsonpath(obj, '$..data')
print(follower)

Additional code snippets were shared, but they still required further data cleaning.

Finally the author presented a complete solution using pandas.io.json.json_normalize together with a helper function to extract the data field and reshape the odds into separate columns.

from pandas.io.json import json_normalize
import pandas as pd
import json

f = open('./clean/data/EN_soccer_straight.json', 'r')
data = json.loads(f.read())
f.close()

def dat(x):
    if x is None:
        return None
    else:
        return x['data']

# column list omitted for brevity

df = pd.DataFrame.from_dict(
    orient='index',
    columns=["home", "away", "matchup_id", "-3.5", "-3.25", "-3.0", "-2.75", "-2.5", "-2.25", "-2.0", "-1.75", "-1.5", "-1.25", "-1.0", "-0.75", "-0.5", "-0.25", "0.0", "0.25", "0.5", "0.75", "1.0", "1.25", "1.5", "1.75", "2.0", "2.25", "2.5", "2.75", "3.0", "3.25", "3.5", "3.75", "4.0", "4.25"],
    data=data)

for odds in ["-3.5", "-3.25", "-3.0", "-2.75", "-2.5", "-2.25", "-2.0", "-1.75", "-1.5", "-1.25", "-1.0", "-0.75", "-0.5", "-0.25", "0.0", "0.25", "0.5", "0.75", "1.0", "1.25", "1.5", "1.75", "2.0", "2.25", "2.5", "2.75", "3.0", "3.25", "3.5", "3.75", "4.0", "4.25"]:
    df[odds] = df[odds].apply(dat)

print(df.head(10))
df.to_csv('out.csv')

3. Summary

The article demonstrates how to handle nested JSON structures in Python, convert them to a tidy DataFrame, and export the result to CSV. It also reminds readers that json.dumps() converts a dictionary to a JSON string, while json.loads() parses a JSON string back to a dictionary, and that the json module provides dump() and load() for file I/O.

Pythondata processingJSONJSONPath
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.