Find Nearest Geographic Points in Large Excel Files with Python
This article demonstrates a step‑by‑step Python workflow for reading two Excel sheets, calculating haversine distances between latitude‑longitude pairs, identifying the nearest point for each record, and exporting the results, complete with code snippets, progress display, and performance advantages over traditional GIS tools.
Introduction
Hello, I'm Cui Yanfei. In many work scenarios we need to match the nearest latitude‑longitude point from table B to each point in table A. Traditional GIS tools like MapInfo are slow on large datasets, but Python can handle it efficiently and show progress.
Project Goal
Implement nearest‑point calculation between two Excel tables using Python.
Setup
Software: PyCharm
Required libraries: pandas, xlrd, os
Analysis
1) Selecting and reading the Excel files
Use os and xlrd to locate and load the target Excel files.
2) Calculating distance between two latitude‑longitude points
Read both tables with pandas and define a function (e.g., haversine) to compute distances.
3) Looping to find and store the nearest point
Iterate over rows of the first table, compare distances to all rows of the second table, and keep the record with the smallest distance.
4) Saving the results
Use DataFrame.to_excel to write the nearest‑point data to a new Excel file.
Implementation
Step 1: Import libraries
import pandas as pd
import xlrd
import osStep 2: Choose and read the Excel file
path = "D:/a/"
# Get all Excel filenames in the folder
xlsx_names = [x for x in os.listdir(path) if x.endswith(".xlsx")]
xlsx_names1 = xlsx_names[0] # first Excel file
aa = path + xlsx_names1
first_file_fh = xlrd.open_workbook(aa)
first_file_sheet = first_file_fh.sheets()Step 3: Loop to compute nearest point and store it
for i in range(h1):
w1 = df1.loc[i, '纬度']
j1 = df1.loc[i, '经度']
d0 = 1e25
print("原小区第%d个。" % (i+1))
test_dict = {'距离': [d0]}
d3 = pd.DataFrame(test_dict)
for l in range(h2):
w2 = df2.loc[l, '纬度']
j2 = df2.loc[l, '经度']
d = haversine(j1, w1, j2, w2)
if d < d0:
d0 = d
d2 = df2.loc[l, :]
test_dict = {'距离': [d0]}
d3 = pd.DataFrame(test_dict)
else:
continueStep 4: Save the computed results
resultdata1.to_excel(excel_writer=writer, sheet_name='原小区', encoding="utf-8", index=False)
resultdata2.to_excel(excel_writer=writer, sheet_name='最近小区', encoding="utf-8", index=False)
resultdata3.to_excel(excel_writer=writer, sheet_name='距离', encoding="utf-8", index=False)
writer.save()
writer.close()Results
Before processing:
Processing progress display:
Final result:
Conclusion
This guide shows how to compute haversine distances between latitude‑longitude pairs and perform nearest‑point matching across two tables using Python. Compared with MapInfo, the Python solution handles large datasets quickly, offers progress monitoring, and can be further optimized (e.g., chunk processing) for even better performance.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Python Crawling & Data Mining
Life's short, I code in Python. This channel shares Python web crawling, data mining, analysis, processing, visualization, automated testing, DevOps, big data, AI, cloud computing, machine learning tools, resources, news, technical articles, tutorial videos and learning materials. Join us!
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
