Extracting Courier Addresses with Python: A Step‑by‑Step Guide
This article walks through a practical Python solution for parsing a list of courier records: extracting the province from each address, grouping the records by location with dictionaries, and optionally using regular expressions and pandas for further processing.
Preface
Hello, I am a Python enthusiast. In a recent Python group chat a member shared a basic Python exercise involving courier information. The task is to read a list of name‑address pairs, extract the province or municipality from each address, and group the records by that location.
The desired output is a dictionary where each key is a province and the value is a list of the corresponding records.
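To make the target shape concrete, here is a small illustration built from three of the records that appear later in this article. The names `records` and `expected` are illustrative only, and the two-character keys assume the slicing rule described in the next section.

```python
# Input: a list of [name, address] pairs (three records taken from this article).
records = [
    ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
    ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
    ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
]

# Desired output: province key -> list of the records located there.
expected = {
    '北京': [records[0], records[2]],
    '河南': [records[1]],
}
```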
1. Idea
The approach is to read the list, slice the first two characters of each address to obtain the province abbreviation, deduplicate these abbreviations, and then iterate through the original list to collect records that match each province. Lists and dictionaries are used to store intermediate data.
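The steps above can also be collapsed into a single pass. The sketch below is not the contributor's code: it swaps the deduplicate-then-rescan approach for `collections.defaultdict`. The helper names `province_key` and `group_by_province` are hypothetical, and the three-character rule covers only 黑龙江 and 内蒙古, whose names a two-character slice would truncate (an assumption about which addresses need special handling).

```python
from collections import defaultdict

# Assumption: only these two province-level names need a three-character key;
# every other address is keyed by its first two characters.
THREE_CHAR_PREFIXES = ("黑龙江", "内蒙古")


def province_key(address):
    """Return the leading province key of an address string."""
    if address.startswith(THREE_CHAR_PREFIXES):
        return address[:3]
    return address[:2]


def group_by_province(records):
    """Group [name, address] records by province key in one pass."""
    groups = defaultdict(list)
    for record in records:
        # The address is the last element of each record.
        groups[province_key(record[-1])].append(record)
    return dict(groups)


records = [
    ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
    ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
    ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
]
grouped = group_by_province(records)
for key, items in grouped.items():
    print(key, items)
```

Because each record is visited once, the cost grows linearly with the number of records instead of with records times provinces.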
2. Solution
Below is the initial implementation provided by the contributor:
# coding: utf-8
def sp(s):
    citys = []
    dizhi = []
    dice = {}
    dic = {}
    for i in s:
        a = i[1]
        city = a[0:2]
        zlib = a[0:2]
        citys.append(city)
        dizhi.append(zlib)
    cityss = set(citys)  # deduplicate
    citysss = list(cityss)  # convert to list
    d = dice.fromkeys(citysss)
    for key in d:
        h = []
        for j in s:
            b = j[1]
            lgezi = b[0:2]
            if lgezi == key:
                h.append(j)
        dic[key] = h
    for key in dic:
        print(key, dic[key])

if __name__ == '__main__':
    sp([
        ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
        ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
        # ... (remaining records omitted in the source)
    ])
The code is straightforward and relies only on basic Python constructs such as lists, sets, and dictionaries.
Running the script prints each province key followed by the list of records that belong to it.
A more concise version with clearer variable names is presented next:
# coding: utf-8
def sp(text):
    city = []
    dice = {}
    dic = {}
    address = [info[-1] for info in text]
    for city_info in address:
        city.append(city_info[0:2])
    cities = list(set(city))  # deduplicate and convert to list
    dict_keys = dice.fromkeys(cities)
    for key in dict_keys:
        h = []
        for info in text:
            address = info[-1]
            city_info = address[0:2]
            if city_info == key:
                h.append(info)
        dic[key] = h
    for key in dic:
        print(key, dic[key])

if __name__ == '__main__':
    sp([
        ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
        ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
        # ... (remaining records omitted in the source)
    ])
3. Small Extras
Address extraction can also be performed with regular expressions:
import re

# Compile the pattern once, outside the loop
content = re.compile(r"\['(?P<name>.*?)', '(?P<address>.*?)'\]", re.S)
with open("地址信息.txt", 'r', encoding='utf-8') as f:
    for line in f:
        result = content.finditer(line)
        for i in result:
            name = i.group("name")
            address = i.group("address")
            print(name, address)
Each matched name-address pair is printed on its own line.
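To try the pattern without the input file, the same named-group regex can be run against an in-memory string. This is a minimal sketch: the sample `line` is assembled from two records shown earlier and only assumes the file stores records in that bracketed form.

```python
import re

# Assumed sample input, shaped like the records listed earlier in this article.
line = ("['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'], "
        "['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂']")

# Named groups make each captured field self-describing.
PAIR = re.compile(r"\['(?P<name>.*?)', '(?P<address>.*?)'\]")

# Collect every (name, address) match found in the string.
pairs = [(m.group("name"), m.group("address")) for m in PAIR.finditer(line)]
for name, address in pairs:
    print(name, address)
```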
These pairs can be saved to Excel and further processed with pandas. For example, extracting the province part can be done as follows:
# Keeps three characters when the value is exactly "黑龙江省" or "内蒙古自治区", otherwise two
df['地区2'] = df.地区.apply(lambda s: s[:(s in ("黑龙江省", "内蒙古自治区")) + 2])
4. Summary
This article demonstrates how to extract real‑world courier information using basic Python techniques such as list slicing, set deduplication, and dictionary aggregation. It also shows alternative approaches with regular expressions and pandas for further data manipulation.
Python Crawling & Data Mining
