How to Clean Malformed JSON Strings in Python: Practical Regex Solutions
This article walks through a real‑world Python JSON parsing issue where extra quotes break the data, presents multiple regex‑based and parsing approaches from community members, and provides ready‑to‑run code snippets to clean and convert the malformed strings into valid JSON.
Introduction
In a Python community chat a member posted a JSON string that cannot be parsed because some fields contain stray double‑quotes, e.g.
{
"taskType": 1,
"printPageHeight": 1459,
"exportTypeTemplate": "html",
"reportTitle": "信息科技"网络安全漏洞扫描系统 "安全评估报告-主机报表",
"companyName": "信息科技",
"createTime": "2024-08-09 10:03:48",
"curr_lang": "zh-CN"pt "漏洞"
}The goal is to clean the string so that json.loads can read it.
Implementation
One suggestion uses regular expressions to remove the extra characters and ensure every value is surrounded by double quotes.
import re
import json
# original JSON string
json_str = '''
{
"taskType": 1,
"printPageHeight": 1459,
"exportTypeTemplate": "html",
"reportTitle": "信息科技"网络安全漏洞扫描系统 "安全评估报告-主机报表",
"companyName": "信息科技",
"createTime": "2024-08-09 10:03:48",
"curr_lang": "zh-CN"pt "漏洞"
}
'''
# attempt to fix stray commas (example)
json_str = re.sub(r',\s*[^,}]*$', '', json_str)
# ensure each value is quoted
json_str = re.sub(r'("([^\"]+)"\s*:\s*)([^\"]+)(,?)', r'\1"\3"\4', json_str)
try:
data = json.loads(json_str)
print("JSON解析成功:", data)
except json.JSONDecodeError as e:
print("JSON解析失败:", e)This approach raised invalid group reference 5 at position 7, so an alternative method was proposed.
The second method extracts key‑value pairs with re.findall and rebuilds a dictionary, then optionally converts it back to a JSON string.
json_str = re.findall(r'"(.*?)":\s*(.*?)[,
]', json_str)
data = {i[0]: i[1].replace('"', ' ') for i in json_str}
# convert to JSON string if needed
json_str = json.dumps(data, ensure_ascii=False)A refined version keeps integer values unchanged:
data = {i[0]: i[1][1:-1] if i[1][0] == '"' else int(i[1]) for i in json_str}Summary
The discussion demonstrates several practical ways to clean malformed JSON strings in Python, from simple regex replacements to extracting pairs with re.findall and rebuilding the structure, enabling successful parsing with json.loads.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
