Essential Python Libraries for Web Crawling and Web Development
This guide outlines the core steps of a web request, then presents a comprehensive catalog of Python libraries for crawling, parsing, text processing, automation, concurrency, cloud execution, and popular web frameworks, helping developers choose the right tools for backend projects.
When you enter a URL in a browser, four main steps occur: domain name resolution, sending a request to the server, receiving the page content, and parsing it in the browser.
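The four steps can be sketched with the standard library alone. This is a minimal illustration, not how a browser is implemented: the names TitleParser, extract_title, and fetch_title are hypothetical, and extracting the page title stands in for the much richer parsing a browser performs.

```python
import socket
from html.parser import HTMLParser
from urllib.parse import urlsplit
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Step 4: parse the returned markup (a browser would build a DOM here)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def fetch_title(url):
    """Run all four steps for one URL; this function needs network access."""
    host = urlsplit(url).hostname
    # Step 1: domain name resolution.
    ip_address = socket.gethostbyname(host)
    # Steps 2 and 3: send the request and receive the page content.
    with urlopen(url, timeout=10) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return ip_address, extract_title(html)
```

Calling fetch_title with a reachable URL returns the resolved IP address and the page title. The libraries listed below replace individual steps with more capable tools, for example requests for steps 2 and 3, or lxml for step 4.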
General Libraries
urllib (stdlib)
requests
grab (based on pycurl)
pycurl
urllib3
httplib2
RoboBrowser
MechanicalSoup
mechanize
socket (stdlib)
Unirest for Python
hyper (HTTP/2 client)
PySocks
Web Crawling Frameworks
grab – full‑featured crawler (pycurl/multicurl)
scrapy – based on Twisted (supports Python 3 since version 1.1)
pyspider – powerful crawler system
cola – distributed crawler framework
Other: portia (visual Scrapy), restkit, demiurge
HTML/XML Parsers
lxml – fast C‑based parser with XPath
cssselect
pyquery
BeautifulSoup – pure Python
html5lib – WHATWG‑compliant
feedparser
MarkupSafe
xmltodict
xhtml2pdf
untangle
Cleaning
Bleach (requires html5lib)
sanitize
Text Processing
difflib (stdlib)
Levenshtein
fuzzywuzzy
esmre
ftfy
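As a small illustration of what this category is for, the sketch below uses only difflib from the list above to score string similarity and pick the closest candidate, which can help when deduplicating near-identical crawled strings. The helper names similarity and closest are hypothetical, and the 0.6 cutoff is simply difflib's default, not a recommendation.

```python
import difflib


def similarity(a, b):
    """Return a ratio between 0.0 and 1.0 describing how alike two strings are."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def closest(name, candidates, cutoff=0.6):
    """Return the best match for name among candidates, or None if none is close enough."""
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

For example, closest("reqests", ["requests", "urllib3", "pycurl"]) picks "requests". Libraries such as Levenshtein or fuzzywuzzy cover the same ground with different metrics and are usually faster on large inputs.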
Natural Language Processing
NLTK
Pattern
TextBlob
jieba
SnowNLP
loso
Browser Automation & Simulation
selenium
Ghost.py (PyQt WebKit)
Spynner (PyQt WebKit)
Splinter
Multiprocessing
threading (stdlib)
multiprocessing (stdlib)
celery
concurrent.futures (stdlib since Python 3.2; available for Python 2 as the futures backport)
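A crawler typically spends most of its time waiting on the network, so a thread pool is often enough to speed it up. The sketch below uses concurrent.futures to apply a function to many items in parallel while keeping results in input order. The helper name map_concurrently and the default of four workers are assumptions for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def map_concurrently(func, items, max_workers=4):
    """Apply func to every item using a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of items, not completion order.
        return list(pool.map(func, items))
```

In a crawler, func would be a download function and items a list of URLs. For CPU-bound work such as heavy parsing, swapping in ProcessPoolExecutor from the same module is the usual alternative.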
Asynchronous
asyncio (Python 3.4+)
Twisted
Tornado
pulsar
diesel
gevent
eventlet
Tomorrow
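The asynchronous approach can be sketched with asyncio alone. In this illustration fake_fetch is a hypothetical stand-in that sleeps instead of making a real request, and a semaphore caps how many fetches run at once. It assumes Python 3.7 or later for asyncio.run.

```python
import asyncio


async def fake_fetch(url, delay=0.01):
    """Hypothetical stand-in for a network call: waits, then returns a result."""
    await asyncio.sleep(delay)
    return url, len(url)


async def crawl(urls, limit=2):
    """Fetch all urls concurrently, with at most `limit` in flight at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:
            return await fake_fetch(url)

    # gather returns results in the same order as the input urls.
    return await asyncio.gather(*(bounded(url) for url in urls))


def run_crawl(urls, limit=2):
    """Synchronous entry point that starts an event loop and runs the crawl."""
    return asyncio.run(crawl(urls, limit))
```

A real crawler would replace fake_fetch with an asynchronous HTTP client. The concurrency limit matters because unbounded parallel requests can overload the target site.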
Queues
celery
huey
mrq
RQ
simpleq (Amazon SQS)
python‑gearman
Cloud Computing
picloud
dominoup.com
Web Content Extraction
newspaper
html2text
python‑goose
lassie
WebSocket
Crossbar
AutobahnPython
WebSocket‑for‑Python
DNS Resolution
dnsyo
pycares
Computer Vision
OpenCV
SimpleCV
mahotas
Popular Python Web Frameworks
Django – full‑stack, supports many databases
Flask – lightweight microframework, extensible via extensions
Web2py – rapid development, browser‑based IDE
Tornado – asynchronous web server and framework (its handler style resembles web.py)
CherryPy – minimalistic, plugin‑friendly framework
Framework Selection Pitfalls
Many developers mistakenly look for "the best" framework; instead, choose the one that fits your team and project. Performance concerns are often overstated for small sites; focus on productivity and suitability.
MaGe Linux Operations
