How to Extract Text from JSON‑Like Strings in Python Using Regex and split()
This article shows how to pull specific fields from a JSON‑like string obtained during web crawling, using either Python regular expressions or the split() function, with a complete code example for each method.
Introduction
Hello, I'm a Python enthusiast. A fan asked how to extract text from a JSON‑like string obtained during web crawling.
Idea
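The string is not valid JSON (every line carries a "Top2 " prefix and there are no enclosing braces), so json.loads() cannot parse it directly. Regular expressions are a quick way to pull out the desired fields.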
Implementation
1. Using regex
# -*- coding: utf-8 -*-
import re
text = """
Top2 26
Top2 "word":"经纪人不得为假唱假演奏提供条件"
Top2 "query":"经纪人不得为假唱假演奏提供条件"
Top2 "show":[]
Top2 "desc":"18日,文旅部发布关于《演出经纪人员管理办法(征求意见稿)》公开征求意见的公告。"
Top2 "img":"https://fyb-1.cdn.bcebos.com/fyb-1//5b4bc1de60744e69f34225af1452a395"
Top2 "url":"https://www.baidu.com/s?wd=..."
Top2 "rawUrl":"https://www.baidu.com/s?wd=..."
Top2 "hotScore":"2325661"
Top2 "hotChange":"same"
Top2 "hotTag":"0"
Top2 "appUrl":"https://www.baidu.com/s?wd=..."
"""
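# Non-greedy capture of everything between `":"` and the next double quote.
values = re.findall(r'":"(.*?)"', text)
for value in values:
    print(value)

Running the script prints the ten quoted values, one per line. The bare 26 and "show":[] are skipped because neither contains the `":"` sequence the pattern looks for.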
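If you need to look a value up by its field name rather than print everything, the same idea can be extended with a second capture group. The following is a minimal sketch, not part of the original answer: the names `sample`, `PAIR` and `extract_fields` are hypothetical, and it assumes keys consist of word characters and values contain no embedded double quotes.

```python
import re

# Shortened sample in the same shape as the crawled text above.
sample = '''
Top2 26
Top2 "word":"经纪人不得为假唱假演奏提供条件"
Top2 "show":[]
Top2 "hotScore":"2325661"
Top2 "hotChange":"same"
'''

# Two capture groups: the key between quotes, then the quoted value.
PAIR = re.compile(r'"(\w+)":"(.*?)"')


def extract_fields(text):
    """Return a dict of every "key":"value" pair found in text."""
    return dict(PAIR.findall(text))


fields = extract_fields(sample)
print(fields["word"])      # 经纪人不得为假唱假演奏提供条件
print(fields["hotScore"])  # 2325661
```

Fields whose value is not a quoted string, such as "show":[], are still ignored by this pattern.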
2. Using split()
# -*- coding: utf-8 -*-
text = """
Top2 26
Top2 "word":"经纪人不得为假唱假演奏提供条件"
Top2 "query":"经纪人不得为假唱假演奏提供条件"
Top2 "show":[]
Top2 "desc":"18日,文旅部发布关于《演出经纪人员管理办法(征求意见稿)》公开征求意见的公告。"
Top2 "img":"https://fyb-1.cdn.bcebos.com/fyb-1//5b4bc1de60744e69f34225af1452a395"
Top2 "url":"https://www.baidu.com/s?wd=..."
Top2 "rawUrl":"https://www.baidu.com/s?wd=..."
Top2 "hotScore":"2325661"
Top2 "hotChange":"same"
Top2 "hotTag":"0"
Top2 "appUrl":"https://www.baidu.com/s?wd=..."
"""
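# Drop the surrounding newlines, then cut the text at every "Top2 " prefix.
raw_text = text.strip('\n').split('Top2 ')
for line in raw_text:
    # Keep whatever follows the last `":"`, then remove stray quotes and whitespace.
    print(line.split('":"')[-1].replace('"', '').strip())

This method works but is less flexible and more brittle: it depends on the exact "Top2 " prefix and `":"` separator, it prints an empty first line (the empty string before the first prefix), and lines without a quoted value come through as 26 and show:[] instead of being skipped.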
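A somewhat sturdier variant of the string-method approach handles one line at a time and skips anything that is not a "key":"value" pair. This is only a sketch under assumptions: `parse_lines` and its `prefix` parameter are hypothetical names, each field is assumed to sit on its own line, and values are assumed not to contain the `":"` sequence themselves.

```python
# Shortened sample in the same shape as the crawled text above.
sample = '''
Top2 26
Top2 "word":"经纪人不得为假唱假演奏提供条件"
Top2 "show":[]
Top2 "hotScore":"2325661"
Top2 "hotChange":"same"
'''


def parse_lines(text, prefix="Top2 "):
    """Build a dict from lines shaped like: <prefix>"key":"value"."""
    fields = {}
    for line in text.strip().splitlines():
        # Remove the leading prefix when it is present.
        if line.startswith(prefix):
            line = line[len(prefix):]
        # partition() splits at the first `":"` only; sep is '' if it is absent.
        key, sep, value = line.partition('":"')
        if not sep:
            continue  # not a "key":"value" line, e.g. 26 or "show":[]
        fields[key.strip('"')] = value.rstrip('"')
    return fields


result = parse_lines(sample)
print(result["hotChange"])  # same
```

Using partition() instead of split() keeps the key and the value together, so the result can be indexed by field name.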
Conclusion
The article demonstrates how to process a raw string from a Python web crawler using either regular expressions or the split() function, satisfying the fan's request. Readers are encouraged to experiment with other techniques and share their results.
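As one further technique to experiment with, the lines can be stitched back into a JSON object and handed to the standard json module. This is a rough sketch rather than the method used in the article: `parse_as_json` is a hypothetical helper, and it assumes every line after the prefix is either a complete `"key":value` member with a valid JSON value or something to discard.

```python
import json

# Shortened sample in the same shape as the crawled text above.
sample = '''
Top2 26
Top2 "word":"经纪人不得为假唱假演奏提供条件"
Top2 "show":[]
Top2 "hotScore":"2325661"
Top2 "hotChange":"same"
'''


def parse_as_json(text, prefix="Top2 "):
    """Wrap the "key":value lines in braces and parse them as one JSON object."""
    members = []
    for line in text.strip().splitlines():
        # Remove the leading prefix when it is present.
        if line.startswith(prefix):
            line = line[len(prefix):]
        # Keep only lines that look like a JSON object member.
        if line.startswith('"') and '":' in line:
            members.append(line)
    return json.loads("{" + ",".join(members) + "}")


data = parse_as_json(sample)
print(data["show"])      # []
print(data["hotScore"])  # 2325661
```

Unlike the two methods above, this keeps non-string values such as the empty list in "show", but json.loads() raises an error if any retained line is not valid JSON.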
Python Crawling & Data Mining
