Fundamentals 7 min read

Master JSONPath in Python: Extract Complex JSON Data with Ease

This tutorial explains how to install and use the Python jsonpath library to efficiently query and extract values from deeply nested JSON structures, covering basic syntax, indexing, conditional filtering, and result formatting with clear code examples and illustrations.

Python Crawling & Data Mining
Python Crawling & Data Mining
Python Crawling & Data Mining
Master JSONPath in Python: Extract Complex JSON Data with Ease

1 Introduction

When working with Python, we often need to handle JSON data, especially complex nested structures, which can be tedious to extract manually.

For hierarchical data like XML, xpath can be used; similarly, JSONPath provides a way to query JSON data in Python via the jsonpath library.

2 Using JSONPath in Python

Install the jsonpath library with pip install jsonpath.

2.1 A Simple Example

The example JSON comes from the Gaode Map walking navigation API, containing a deep hierarchy of steps from Tiananmen Square to Xidan Joy City.

To extract the duration of each step, use the JSONPath expression $..steps[*].duration:

import json
from jsonpath import jsonpath

# Read example JSON data
with open('json示例.json', encoding='utf-8') as j:
    demo_json = json.loads(j.read())

# Extract data with JSONPath
jsonpath(demo_json, '$..steps[*].duration')

The expression $..steps[*].duration describes the location rule for the data.

2.2 Common JSONPath Syntax

JSONPath defines several operators to locate target values:

Root node : $ Current node : @ Child node : . or [] Any child node : * Any descendant node : .. Examples:

# Extract all duration values
jsonpath(demo_json, '$..duration')
# Extract instruction values of all steps
jsonpath(demo_json, '$..steps.*.instruction')

2.3 Indexing Nodes

JSONPath supports selecting multiple or specific indexed nodes:

# Get instruction and action of all steps
jsonpath(demo_json, '$..steps.*[instruction,action]')

# Get instruction and action of the first step
jsonpath(demo_json, '$..steps[0][instruction,action]')

# Get instruction and action of steps 1 to 2 (excluding 3)
jsonpath(demo_json, '$..steps[1:3][instruction,action]')

# Get instruction and action of the last step using @
jsonpath(demo_json, '$..steps[(@.length-1)][instruction,action]')

2.4 Conditional Filtering

Filter nodes based on key values using comparison operators:

# Find steps where orientation is "west"
jsonpath(demo_json, '$..steps[?(@.orientation == "西")]')

To select nodes that contain a specific key:

# Find nodes with a polyline key and retrieve polyline and road values
jsonpath(demo_json, '$..[?(@.polyline)][polyline,road]')

2.5 Result Formats

By default, jsonpath returns the extracted values. Setting result_type=None returns the JSONPath expressions of each result:

jsonpath(demo_json, '$..[?(@.polyline)][polyline,road]', result_type=None)

The jsonpath library covers basic JSON extraction needs; more advanced libraries offer richer features, which will be discussed in a future article.

End of article.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Data ExtractionWeb Scraping
Python Crawling & Data Mining
Written by

Python Crawling & Data Mining

Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.