How to Parse Complex JSON with Pandas, jsonpath, and Python – A Step‑by‑Step Guide
This article walks through a real‑world Python problem of extracting nested soccer odds from a JSON file, comparing pandas read_json, jsonpath, and regex approaches, and ultimately presenting a complete pandas‑based solution that normalizes the data, cleans it with a helper function, and exports it to CSV.
1. Introduction
In a Python community a user asked how to process a JSON file containing soccer match odds. The author examines several approaches.
2. Attempts
First suggestion was to use pandas.read_json(), but the result did not meet the requirements.
Another contributor proposed using jsonpath or regular expressions.
import json
import jsonpath
obj = json.load(open('EN_soccer_straight.json', 'r', encoding='utf-8'))
follower = jsonpath.jsonpath(obj, '$..data')
print(follower)Additional code snippets were shared, but they still required further data cleaning.
Finally the author presented a complete solution using pandas.io.json.json_normalize together with a helper function to extract the data field and reshape the odds into separate columns.
from pandas.io.json import json_normalize
import pandas as pd
import json
f = open('./clean/data/EN_soccer_straight.json', 'r')
data = json.loads(f.read())
f.close()
def dat(x):
if x is None:
return None
else:
return x['data']
# column list omitted for brevity
df = pd.DataFrame.from_dict(
orient='index',
columns=["home", "away", "matchup_id", "-3.5", "-3.25", "-3.0", "-2.75", "-2.5", "-2.25", "-2.0", "-1.75", "-1.5", "-1.25", "-1.0", "-0.75", "-0.5", "-0.25", "0.0", "0.25", "0.5", "0.75", "1.0", "1.25", "1.5", "1.75", "2.0", "2.25", "2.5", "2.75", "3.0", "3.25", "3.5", "3.75", "4.0", "4.25"],
data=data)
for odds in ["-3.5", "-3.25", "-3.0", "-2.75", "-2.5", "-2.25", "-2.0", "-1.75", "-1.5", "-1.25", "-1.0", "-0.75", "-0.5", "-0.25", "0.0", "0.25", "0.5", "0.75", "1.0", "1.25", "1.5", "1.75", "2.0", "2.25", "2.5", "2.75", "3.0", "3.25", "3.5", "3.75", "4.0", "4.25"]:
df[odds] = df[odds].apply(dat)
print(df.head(10))
df.to_csv('out.csv')3. Summary
The article demonstrates how to handle nested JSON structures in Python, convert them to a tidy DataFrame, and export the result to CSV. It also reminds readers that json.dumps() converts a dictionary to a JSON string, while json.loads() parses a JSON string back to a dictionary, and that the json module provides dump() and load() for file I/O.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
