
How to Efficiently Parse JSON with Pandas and jsonpath in Python

This article walks through a Python community member's JSON‑processing problem, explores several solutions using pandas, jsonpath, and regular expressions, and presents a final script that normalizes the data into a tidy CSV for further analysis.

Python Crawling & Data Mining

1. Introduction

The author received a question in a Python community about how to process a JSON file with Python. The original post included the request and a screenshot, which are not reproduced here.

2. Implementation Process

First, a suggestion was made to use pandas.read_json(). A code snippet (image) was provided but did not meet the requester's expectations.

Another suggestion used the jsonpath library together with regular expressions. The corresponding code is:

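import json
import jsonpath

# Open the file in a context manager so it is closed after loading.
with open('EN_soccer_straight.json', 'r', encoding='utf-8') as f:
    obj = json.load(f)

# '$..data' recursively matches every value stored under a 'data' key.
follower = jsonpath.jsonpath(obj, '$..data')
print(follower)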
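
For readers unfamiliar with JSONPath, the expression `$..data` uses the recursive-descent operator `..` to collect every value stored under a key named `data`, at any depth. The pure-Python sketch below mimics that behaviour on a small, made-up document. `find_key` is a hypothetical helper written for illustration; it is not part of the jsonpath library, and the order of its results is not guaranteed to match the library's.

```python
def find_key(node, key):
    """Collect every value stored under `key`, at any nesting depth."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            # Keep descending: matches may also be nested inside this value.
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical sample; the real file's structure is not shown in the article.
sample = {
    "m1": {"home": "A", "0.5": {"data": 1.95}},
    "m2": {"home": "C", "0.5": {"data": 2.10}, "1.0": None},
}

print(find_key(sample, "data"))
```

A flat list of values like this loses track of which match and which column each value came from, which helps explain why this route felt cumbersome for building a table.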

A further code example (image) was also shared, yet the requester was still unsatisfied because the data handling seemed too cumbersome.

Finally, the requester supplied his own script, which loads the JSON into a DataFrame with pandas.DataFrame.from_dict and then reduces each nested odds entry to its 'data' value. The full script is:

import json

import pandas

# Columns holding nested odds entries; each is reduced to its 'data' value below.
ODDS_COLUMNS = ['-3.5', '-3.25', '-3.0', '-2.75', '-2.5', '-2.25', '-2.0',
                '-1.75', '-1.5', '-1.25', '-1.0', '-0.75', '-0.5', '-0.25',
                '0.0', '0.25', '0.5', '0.75', '1.0', '1.25', '1.5', '1.75',
                '2.0', '2.25', '2.5', '2.75', '3.0', '3.25', '3.5', '3.75',
                '4.0', '4.25']

# Open the file in a context manager so it is closed after loading.
with open('./clean/data/EN_soccer_straight.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

def dat(x):
    # Missing cells arrive as None or NaN; only dict cells carry a 'data' field.
    if not isinstance(x, dict):
        return None
    return x['data']

# Each top-level key of the JSON becomes one row of the DataFrame.
df = pandas.DataFrame.from_dict(
    data,
    orient='index',
    columns=['home', 'away', 'matchup_id'] + ODDS_COLUMNS)

for odds in ODDS_COLUMNS:
    df[odds] = df[odds].apply(dat)

print(df.head(10))
df.to_csv('out.csv')

Running this script on the original JSON still raised errors because the data structure had been slightly modified.

The resulting CSV data was shown as a screenshot in the original post and is not reproduced here.

With the cleaned data, further site‑related analysis can proceed smoothly.
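
pandas also offers `pandas.json_normalize` for flattening nested records. The sketch below shows how it might be applied to data of this general shape. It is an alternative written for illustration, not the requester's method, and the sample data and the `match` column name are hypothetical.

```python
import pandas as pd

# Hypothetical sample; the real file's structure is not shown in the article.
data = {
    "m1": {"home": "A", "away": "B", "0.5": {"data": 1.95}},
    "m2": {"home": "C", "away": "D", "0.5": {"data": 2.10}},
}

# json_normalize expects a list of records, so fold each top-level key into its record.
records = [{"match": key, **fields} for key, fields in data.items()]

# Nested dicts are flattened into columns named "<outer><sep><inner>".
# sep="_" avoids confusion with the dot already present in names like "0.5".
df = pd.json_normalize(records, sep="_")

# Drop the "_data" suffix so the odds columns keep their original names.
suffix = "_data"
df = df.rename(
    columns=lambda name: name[: -len(suffix)] if name.endswith(suffix) else name
)

print(df)
```

Unlike the column-by-column `apply` in the script above, this flattens every nested field in one call. It assumes every odds entry is a dict, so rows with missing entries would need extra handling.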

3. Additional Notes

As a reminder, json.dumps() serializes a Python object such as a dictionary to a JSON‑formatted string, while json.loads() parses a JSON string back into a Python object. The json module also provides dump() and load(), which write to and read from file objects instead of strings.
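
A minimal sketch of the four functions, using a made-up record and a temporary file (both chosen for illustration only):

```python
import json
import os
import tempfile

# Hypothetical record, used only to demonstrate the round trip.
record = {"home": "A", "away": "B", "odds": {"0.5": {"data": 1.95}}}

# dumps(): Python object -> JSON string; loads(): JSON string -> Python object.
text = json.dumps(record)
print(json.loads(text) == record)

# dump() and load() do the same through a file object instead of a string.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f, ensure_ascii=False, indent=2)
    with open(path, "r", encoding="utf-8") as f:
        restored = json.load(f)

print(restored == record)
```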

4. Conclusion

The post demonstrated how to handle a specific JSON‑processing problem in Python, offering multiple code solutions and a final working script that transforms the data into a tidy CSV file.
