
Extracting Courier Addresses with Python: A Step‑by‑Step Guide

This article walks through a practical Python solution for parsing a list of courier records, extracting province information, grouping entries by location using dictionaries, and optionally applying regular expressions and pandas for further analysis, all illustrated with clear code examples and output screenshots.

Python Crawling & Data Mining

Preface

Hello, I am a Python enthusiast. In a recent Python group chat, a member shared a basic exercise built around courier information. The task is to read a list of name‑address pairs, extract the province or municipality from each address, and group the records by that location.

Problem description

The desired output is a dictionary where each key is a province and the value is a list of the corresponding records.
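As a rough illustration of that shape, the sketch below writes the target dictionary by hand for three records that appear in this article. The name expected is only for illustration, and the keys are shown as two‑character prefixes because that is what the solution further down prints.

```python
# Hand-written illustration of the target structure (not program output).
# Keys are two-character province prefixes; values are lists of records.
expected = {
    '北京': [
        ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
        ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
    ],
    '河南': [
        ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
    ],
}

for province, records in expected.items():
    print(province, records)
```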

Desired result

1. Idea

The approach is to read the list, slice the first two characters of each address to obtain the province abbreviation, deduplicate these abbreviations, and then iterate through the original list to collect the records that match each one. Lists and dictionaries store the intermediate data. Note that a two‑character slice is correct for most provinces but truncates 黑龙江 and 内蒙古, whose short names have three characters.
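For names whose short form has three characters, a small helper can choose the slice length. This is a minimal sketch rather than part of the contributor's solution: the function name province_prefix and the constant THREE_CHAR_PREFIXES are hypothetical, and the two three‑character sample addresses are made up for illustration.

```python
# Hypothetical helper, not from the original exercise.
# Short province names that need three characters instead of two.
THREE_CHAR_PREFIXES = ("黑龙江", "内蒙古")


def province_prefix(address):
    """Return the short province name at the start of an address string."""
    # str.startswith accepts a tuple and checks each candidate in turn.
    if address.startswith(THREE_CHAR_PREFIXES):
        return address[:3]
    return address[:2]


print(province_prefix("北京市海淀区苏州街大恒科技大厦南座4层"))
print(province_prefix("黑龙江省哈尔滨市"))        # made-up sample address
print(province_prefix("内蒙古自治区呼和浩特市"))  # made-up sample address
```

Swapping this helper in wherever the code slices with [0:2] would keep the rest of the grouping logic unchanged.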

2. Solution

Below is the initial implementation provided by the contributor:

# coding: utf-8
def sp(s):
    citys = []
    dice = {}
    dic = {}
    # Collect the two-character prefix of every address.
    for i in s:
        a = i[1]
        city = a[0:2]
        citys.append(city)
    cityss = set(citys)  # deduplicate
    citysss = list(cityss)  # convert to list
    d = dice.fromkeys(citysss)  # one key per prefix
    # For each prefix, gather the records whose address starts with it.
    for key in d:
        h = []
        for j in s:
            b = j[1]
            lgezi = b[0:2]
            if lgezi == key:
                h.append(j)
        dic[key] = h  # assign once, after the inner loop finishes
    for key in dic:
        print(key, dic[key])

if __name__ == '__main__':
    sp([
        ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
        ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
        # ... remaining records omitted
    ])

The code is straightforward and relies only on basic Python constructs such as lists, sets, and dictionaries.

Running the script produces the expected grouping, as shown below:

Script output

A more concise version with clearer variable names is presented next:

# coding: utf-8
def sp(text):
    dic = {}
    # The address is the last field of every record.
    addresses = [info[-1] for info in text]
    # Two-character prefixes, deduplicated and converted back to a list.
    cities = list(set(address[0:2] for address in addresses))
    dict_keys = dict.fromkeys(cities)
    for key in dict_keys:
        h = []
        for info in text:
            city_info = info[-1][0:2]
            if city_info == key:
                h.append(info)
        dic[key] = h  # assign once per key
    for key in dic:
        print(key, dic[key])

if __name__ == '__main__':
    sp([
        ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
        ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
        # ... remaining records omitted
    ])
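
Both versions above scan the whole list once per province. As an alternative that is not the contributor's method, the grouping can be sketched in a single pass with collections.defaultdict. The function name group_by_province is hypothetical, and the three sample records are the ones shown in this article.

```python
from collections import defaultdict


def group_by_province(records):
    """Group [name, address] records by the first two characters of the address."""
    groups = defaultdict(list)
    for record in records:
        address = record[-1]
        # A missing key starts as an empty list, so no pre-built key set is needed.
        groups[address[:2]].append(record)
    return dict(groups)


sample = [
    ['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'],
    ['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂'],
    ['柴*虎', '北京市昌平区北七家镇顺玮阁小区'],
]

grouped = group_by_province(sample)
for province, rows in grouped.items():
    print(province, rows)
```

One side effect of this design is that keys appear in first‑seen order, whereas converting a set to a list gives no guaranteed order.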

3. Small Extras

Address extraction can also be performed with regular expressions:

import re

# Compile once, outside the loop. Each entry is expected to look like ['name', 'address'].
content = re.compile(r"\['(?P<name>.*?)', '(?P<address>.*?)'\]", re.S)

with open("地址信息.txt", 'r', encoding='utf-8') as f:
    for line in f:
        result = content.finditer(line)
        for i in result:
            name = i.group("name")
            address = i.group("address")
            print(name, address)

The resulting name‑address pairs are displayed in the following screenshot:

Regex output
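
To try the same pattern without the input file, the sketch below runs it on an in‑memory string. It assumes the text uses the ['name', 'address'] layout that the pattern expects. The names PAIR_PATTERN, parse_pairs, and sample_text are hypothetical.

```python
import re

# Same pattern as above: a quoted name, a comma and a space, then a quoted address.
PAIR_PATTERN = re.compile(r"\['(?P<name>.*?)', '(?P<address>.*?)'\]")


def parse_pairs(text):
    """Return a list of (name, address) tuples found in text."""
    return [
        (match.group("name"), match.group("address"))
        for match in PAIR_PATTERN.finditer(text)
    ]


# Assumed layout of the input, using two records from this article.
sample_text = (
    "['王*龙', '北京市海淀区苏州街大恒科技大厦南座4层'], "
    "['郭*峰', '河南省商丘市高新技术开发区恒宇食品厂']"
)

pairs = parse_pairs(sample_text)
for name, address in pairs:
    print(name, address)
```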

These pairs can be saved to Excel and further processed with pandas. For example, extracting the province part can be done as follows:

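# Assumes df has a 地区 column whose values start with the province name.
# 黑龙江 and 内蒙古 need three characters; every other short name needs two.
df['地区2'] = df['地区'].apply(lambda s: s[:3] if s.startswith(("黑龙江", "内蒙古")) else s[:2])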
Pandas result
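
For a version that runs end to end, the sketch below builds a small DataFrame directly from name‑address pairs and groups the names by province. The column names 姓名, 地址, and 省份 and the helper short_province are assumptions for illustration and do not come from the original spreadsheet.

```python
import pandas as pd

pairs = [
    ("王*龙", "北京市海淀区苏州街大恒科技大厦南座4层"),
    ("郭*峰", "河南省商丘市高新技术开发区恒宇食品厂"),
    ("柴*虎", "北京市昌平区北七家镇顺玮阁小区"),
]

# Column names are assumptions chosen for this example.
df = pd.DataFrame(pairs, columns=["姓名", "地址"])


def short_province(address):
    """Return a three-character prefix for 黑龙江 and 内蒙古, else two characters."""
    if address.startswith(("黑龙江", "内蒙古")):
        return address[:3]
    return address[:2]


df["省份"] = df["地址"].apply(short_province)

# Collect the names under each province as a plain dictionary.
grouped = df.groupby("省份")["姓名"].apply(list).to_dict()
print(grouped)
```

Writing the frame to Excel with DataFrame.to_excel is left out here because it needs an Excel engine such as openpyxl to be installed.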

4. Summary

This article demonstrates how to extract real‑world courier information using basic Python techniques such as list slicing, set deduplication, and dictionary aggregation. It also shows alternative approaches with regular expressions and pandas for further data manipulation.
