How to Scrape Supplier Phone Numbers from Alibaba International with Selenium and Export Them to Excel
This tutorial walks through using Selenium to log in to Alibaba International, scrape supplier phone numbers and related company details across multiple search-result pages, save the data to CSV, download product images, and finally embed those images in an Excel workbook for easy reference.
Introduction
Alibaba International shows supplier phone numbers only to logged-in users, and the author needed a way to collect these numbers, along with related company information, into a single Excel file.
1. Launch WebDriver and log in
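Configure ChromeOptions to skip loading images and hide the Selenium automation flag, then start a Chrome WebDriver. After opening the Alibaba login page, the script pauses at an input() prompt so you can log in manually in the browser window; typing 1 and pressing Enter lets it continue. If the page reports that the account is temporarily unavailable, the script stops.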
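Because input() always returns a string, a pause like this has to compare the answer against '1' rather than the integer 1. A minimal sketch of such a wait loop, assuming you want to keep re-prompting until 1 is typed; wait_for_login and its read parameter are hypothetical names introduced here, and read exists only so the loop can be tested without a console:

```python
def wait_for_login(read=input, prompt="Log in in the browser, then enter 1: "):
    """Block until the user types 1; return how many prompts it took.

    input() always returns a str, so the check is against '1', not the int 1.
    `read` defaults to input() and is injectable only for testing.
    """
    attempts = 0
    while True:
        attempts += 1
        # Surrounding whitespace is ignored, so " 1 " also counts as confirmation
        if read(prompt).strip() == "1":
            return attempts
```

In get_login() you would call wait_for_login() right after self.browser.get(url) and before checking the page for the "temporarily unavailable" message.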
```python
import csv
import re
import time

from lxml import etree
from selenium import webdriver
from selenium.webdriver import ChromeOptions
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait


class Chrome_drive():
    def __init__(self):
        option = ChromeOptions()
        # Drop the enable-automation switch (removes the "controlled by automated software" banner)
        option.add_experimental_option('excludeSwitches', ['enable-automation'])
        option.add_experimental_option('useAutomationExtension', False)
        # Do not load images, which speeds up page loads
        no_image = {"profile.managed_default_content_settings.images": 2}
        option.add_experimental_option('prefs', no_image)
        # Selenium 4 takes the driver path through a Service object
        self.browser = webdriver.Chrome(service=Service('./chromedriver'), options=option)
        # Make navigator.webdriver undefined on every newly loaded document
        self.browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': 'Object.defineProperty(navigator,"webdriver",{get:()=>undefined})'})
        self.browser.set_window_size(1200, 768)
        self.wait = WebDriverWait(self.browser, 12)

    def get_login(self):
        url = 'https://passport.alibaba.com/icbu_login.htm'
        self.browser.get(url)
        # Log in manually in the browser window, then type 1 here to continue.
        # input() returns a str, so compare against '1', not the int 1.
        while input('Log in in the browser, then enter 1: ').strip() != '1':
            pass
        if 'Your Alibaba.com account is temporarily unavailable' in self.browser.page_source:
            self.browser.quit()
            raise RuntimeError('Alibaba.com account is temporarily unavailable')
        self.browser.refresh()
```
2. Extract page content
For each search-result page, the script builds the URL, opens it in a new tab, scrolls down so lazy-loaded items render, and parses the HTML with lxml.etree. From each result it extracts the company name, the link to the company's contact (phone-detail) page, the main product, country, total revenue, sales region, and product image URLs. It then opens the contact page, clicks the mask that hides the details, and pulls the telephone, mobile phone, and address out of the page source with regular expressions.
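The contact-field extraction can be isolated into a pure function so the regular expressions can be checked without a browser. A minimal sketch, assuming the contact page marks each field as Label:</th><td>value</td>, exactly as the tutorial's patterns expect; extract_contact_fields and CONTACT_LABELS are hypothetical names, the sample markup is dummy data, and unlike the original (which concatenates repeated matches) this joins them with ", ":

```python
import re

# Field labels as they appear in the contact table, copied from the
# tutorial's regular expressions.
CONTACT_LABELS = {
    "phone": "Telephone:",
    "mobile_phone": "Mobile Phone:",
    "address": "Address:",
}


def extract_contact_fields(page_source):
    """Return {field: value} for each label; '' when the label is missing.

    Assumes markup of the form  Label:</th><td>value</td>.  Several matches
    for the same label are joined with ', '.
    """
    fields = {}
    for key, label in CONTACT_LABELS.items():
        pattern = re.escape(label) + r"</th><td>(.*?)</td>"
        matches = re.findall(pattern, page_source, re.S)
        fields[key] = ", ".join(m.strip() for m in matches)
    return fields


# Dummy markup for illustration only
sample = ("<tr><th>Telephone:</th><td>86-571-0000</td></tr>"
          "<tr><th>Address:</th><td>Hangzhou, China</td></tr>")
print(extract_contact_fields(sample))
# {'phone': '86-571-0000', 'mobile_phone': '', 'address': 'Hangzhou, China'}
```

Keeping the parsing separate from the Selenium calls also means a markup change on the site only requires updating CONTACT_LABELS or the pattern in one place.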
```python
    # More Chrome_drive methods (continued from the class above).
    # buffer(), close_window() and save_csv() are not shown in the original article.
    def index_page(self, page, wd):
        url = f'https://www.alibaba.com/trade/search?page={page}&keyword={wd}&f1=y&indexArea=company_en&viewType=L&n=38'
        # Open the search-results page in a new tab and switch to it
        self.browser.execute_script(f"window.open('{url}')")
        self.browser.switch_to.window(self.browser.window_handles[-1])
        self.buffer()  # scroll down so lazy-loaded items render
        time.sleep(3)
        self.wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '#J-items-content')))
        html = self.browser.page_source
        self.get_products(wd, html)
        self.close_window()

    def get_products(self, wd, html_text):
        e = etree.HTML(html_text)
        items = e.xpath('//div[@id="J-items-content"]//div[@class="item-main"]')
        for li in items:
            company_name = ''.join(li.xpath('./div[@class="top"]//h2[@class="title ellipsis"]/a/text()'))
            company_phone_page = ''.join(li.xpath('./div[@class="top"]//a[@class="cd"]/@href'))
            product = ''.join(li.xpath('.//div[@class="value ellipsis ph"]/text()'))
            Attrs = li.xpath('.//div[@class="attrs"]//span[@class="ellipsis search"]/text()')
            # extract country, revenue, sales address from Attrs ... (elided in the original)
            country = total_revenue = sell_address = ''  # placeholders so the row can still be written
            product_img_list = li.xpath('.//div[@class="product"]/div/a/img/@src')
            product_img = ','.join(product_img_list)

            # Defaults so a company without contact details still gets a row
            phone = mobilePhone = address = ''
            self.browser.get(company_phone_page)
            if 'Your Alibaba.com account is temporarily unavailable' in self.browser.page_source:
                self.browser.quit()
                raise RuntimeError('Alibaba.com account is temporarily unavailable')
            try:
                # Click the mask that hides the contact details
                self.browser.find_element(By.XPATH, '//div[@class="sens-mask"]/a').click()
                source = self.browser.page_source
                phone = ''.join(re.findall('Telephone:</th><td>(.*?)</td>', source, re.S))
                mobilePhone = ''.join(re.findall('Mobile Phone:</th><td>(.*?)</td>', source, re.S))
                address = ''.join(re.findall('Address:</th><td>(.*?)</td>', source, re.S))
            except Exception:
                print('No phone number found for this company')
            all_down = [wd, company_name, company_phone_page, product, country, phone, mobilePhone, address, total_revenue, sell_address, product_img]
            save_csv(all_down)
```
3. Download product images
Once the CSV has been generated, the script reads its product_img column, takes the first of the comma-separated URLs in each row, prefixes it with https:, and saves the image to a local downloads_picture folder using requests.
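The download step names each file by dropping the first 24 characters of the URL (img[24:]), which only works if every URL starts with the same fixed-length prefix. A minimal sketch of a more tolerant alternative that uses the last segment of the URL path instead; split_image_urls, normalize_image_url and image_file_name are hypothetical helpers, and img.example.com is a placeholder host:

```python
import os
from urllib.parse import urlparse


def split_image_urls(cell_value):
    """Split a comma-joined product_img cell; pandas reads empty cells as NaN (a float)."""
    if not isinstance(cell_value, str):
        return []
    return [u.strip() for u in cell_value.split(",") if u.strip()]


def normalize_image_url(src):
    """Make a protocol-relative src ('//host/path') absolute with https."""
    src = src.strip()
    return "https:" + src if src.startswith("//") else src


def image_file_name(src):
    """Use the last segment of the URL path (query string ignored) as the file name."""
    return os.path.basename(urlparse(normalize_image_url(src)).path)


cell = "//img.example.com/kf/abc123.jpg,//img.example.com/kf/def456.jpg"
first = split_image_urls(cell)[0]
print(normalize_image_url(first))  # https://img.example.com/kf/abc123.jpg
print(image_file_name(first))      # abc123.jpg
```

If you adopt this naming, use the same function in the Excel step so the names match the files on disk. Two URLs that end in the same file name under different folders would overwrite each other, so the slicing approach and this one share that limitation.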
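```python
import os
import pandas as pd
import requests

def open_requests(img, img_name):
    # The scraped src values start with '//', so prepend the scheme
    img_url = 'https:' + img
    res = requests.get(img_url, timeout=30)
    res.raise_for_status()
    with open(f"./downloads_picture/{img_name}", 'wb') as fn:
        fn.write(res.content)

os.makedirs('./downloads_picture', exist_ok=True)
df1 = pd.read_csv('./alibaba_com_img.csv')
for imgs in df1["product_img"]:
    if pd.isna(imgs) or not str(imgs).strip():  # skip empty cells (read as NaN)
        continue
    img = str(imgs).split(',')[0].strip()  # only the first URL of each row
    img_name = img[24:]  # drop a 24-character prefix, assumed to be a fixed host/path
    try:
        open_requests(img, img_name)
    except requests.RequestException as exc:
        print('Download failed:', img, exc)
```
4. Insert images into Excel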
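Next, import the CSV into Excel (choose UTF-8 encoding and set the phone columns to Text so Excel does not reformat the numbers) and save it as alibaba_com.xlsx. Using xlwings and PIL, the script opens the workbook, reads the image URLs from column L, derives each file name the same way as the download step, scales each picture to 70 points wide while keeping its aspect ratio, and places it over the matching cell in column C.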
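The sizing and placement arithmetic in write_pic() can be checked on its own. A minimal sketch; scaled_size and picture_cell are hypothetical helper names, while the 70-point width, column C, and the header in row 1 come from the script:

```python
def scaled_size(width_px, height_px, target_width=70):
    """Scale an image to target_width while keeping its aspect ratio.

    Only the ratio height/width matters, so pixel dimensions from PIL can be
    turned directly into a (width, height) pair in Excel points.
    """
    if width_px <= 0:
        raise ValueError("image width must be positive")
    return target_width, height_px * target_width / width_px


def picture_cell(row_index, column="C", first_data_row=2):
    """Map the n-th image (0-based) to its cell: 0 -> C2, 1 -> C3, ..."""
    return f"{column}{row_index + first_data_row}"


for i, (w, h) in enumerate([(800, 600), (350, 350)]):
    print(picture_cell(i), scaled_size(w, h))
# C2 (70, 52.5)
# C3 (70, 70.0)
```

Setting sht.range(cell).row_height to the scaled height (both are in points) would also keep tall pictures from overlapping the next row; that is an optional tweak, not part of the original script.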
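```python
import os

import xlwings as xw
from PIL import Image

path = 'alibaba_com.xlsx'
app = xw.App(visible=True, add_book=False)
wb = app.books.open(path)
sht = wb.sheets['Sheet1']
# Column L holds the product_img URLs. Read down to the last used row so a blank
# cell does not cut the list short; ndim=1 always returns a list.
last_row = sht.used_range.last_cell.row
img_list = sht.range(f'L2:L{last_row}').options(ndim=1).value


def write_pic(cell, img_name):
    file_path = os.path.abspath(f'./downloads_picture/{img_name}')
    with Image.open(file_path) as img:
        w, h = img.size          # pixel size, used only for the aspect ratio
    x_s = 70                     # target width in points
    y_s = h * x_s / w            # proportional height
    sht.pictures.add(file_path, left=sht.range(cell).left, top=sht.range(cell).top,
                     width=x_s, height=y_s)


for index, imgs in enumerate(img_list):
    cell = 'C' + str(index + 2)  # row 1 is the header
    if not imgs:
        continue                 # empty cell, nothing to insert
    img_name = str(imgs).split(',')[0].strip()[24:]  # same naming as the download step
    try:
        write_pic(cell, img_name)
    except OSError:              # missing or unreadable image file
        print('No image found for img_name:', img_name)

wb.save()
wb.close()
app.quit()
```
Result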
The final Excel file contains each supplier’s name, phone numbers, product details, and the corresponding product image embedded in the sheet.
MaGe Linux Operations