How to Parse Complex Python Strings with List Operations and Regex
This article walks through several Python techniques for parsing strings that mix text and URLs: list splitting, a combined list-processing approach, and regular-expression extraction. Each approach is illustrated with a code snippet.
In a recent Python community discussion, a user asked how to process a list of strings containing mixed text and URLs. The goal was to extract meaningful parts using list operations and regular expressions.
First, a simple approach splits each item at the ')' character and builds a new list:
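lst = [...]  # placeholder: the list of strings to process
# print(len(lst))
new_lst = [lst[0]]  # keep the first item unchanged
for item in lst[1:]:
    # assumes every remaining item contains at least one ')'
    new_item = item.split(')')
    new_lst.extend([new_item[0], new_item[1]])
print(len(new_lst))
print(new_lst)
This yields the expected result.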
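As a minimal sketch of the same idea, the loop can be wrapped in a function that also tolerates items without a ')'. The function name `split_on_paren` and the sample strings are hypothetical and not part of the original discussion. Unlike the snippet above, this variant keeps every non-empty piece rather than only the first two.

```python
def split_on_paren(items):
    """Keep the first item as-is, split the rest at ')' and flatten the pieces."""
    if not items:
        return []
    result = [items[0]]
    for item in items[1:]:
        # str.split always returns at least one piece, so an item without ')'
        # passes through unchanged instead of raising IndexError.
        pieces = item.split(')')
        # Drop the empty strings produced when ')' is the last character.
        result.extend(piece for piece in pieces if piece)
    return result


# Hypothetical sample data, assumed for illustration only.
sample = ['Title', '![](a.png)Some text', 'plain line']
print(split_on_paren(sample))
# ['Title', '![](a.png', 'Some text', 'plain line']
```

Filtering out empty pieces here plays the same role as the `if x != ''` clean-up used in the more compact solution.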
Another contributor provided a more compact solution combining two methods:
l1 = sum([*map((lambda x: x.split(')') if 'png)' in x else [x]), lists)], [])
l1 = [x for x in l1 if x != '']
l2 = []
nums = []
for j, item in enumerate(l1):
    if 'png' in item:
        if item[0] != '!':
            b = ' '.join(l1[j-1:j+1]).split('
            nums.append(j)
        else:
            b = item.split('
        b = [x for x in b if x != '']
        l2.extend(b)
    else:
        l2.append(item)
lists = [l2[j] for j in range(len(l2)) if j+1 not in nums]
Finally, a regular-expression solution quickly extracts the desired substrings:
import re
data = 'your string here'
temp = re.findall(r'>(.*?)<|src="(.*?)"', data)
result = [i.replace('\u3000', ' ') for j in temp for i in j if i != '']
print(result)
The article concludes that these snippets demonstrate how to handle complex string parsing in Python, and thanks the community members who contributed.
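To make the regular-expression approach concrete, here is a small sketch that runs the same pattern over a sample string. The helper name `extract_parts` and the HTML-like sample are assumptions for illustration; the original discussion does not show the actual input data.

```python
import re

# Same pattern as above: group 1 captures text between '>' and '<',
# group 2 captures the value of a src="..." attribute.
PATTERN = re.compile(r'>(.*?)<|src="(.*?)"')


def extract_parts(data):
    """Return the non-empty captures, with ideographic spaces normalized."""
    parts = []
    for text, src in PATTERN.findall(data):
        # Only one side of the alternation can match, so at most one group is non-empty.
        value = text or src
        if value:
            parts.append(value.replace('\u3000', ' '))
    return parts


# Hypothetical input, assumed to resemble the HTML the pattern targets.
sample = '<p>Hello\u3000world</p><img src="https://example.com/a.png"><p>Bye</p>'
print(extract_parts(sample))
# ['Hello world', 'https://example.com/a.png', 'Bye']
```

Because `findall` returns one tuple per match when a pattern has several groups, the empty strings must be filtered out, which is what the `if i != ''` clause does in the snippet above.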
Python Crawling & Data Mining