Explore 10 Lesser-Known Python Libraries for Data Science & AI
This article introduces a curated selection of lesser‑known Python packages—such as wget, pendulum, imbalanced‑learn, FlashText, fuzzywuzzy, PyFlux, ipyvolume, Dash, and Gym—detailing their installation commands, core functionalities, and code examples to help data scientists expand their toolkit beyond the usual pandas, scikit‑learn, and matplotlib.
Python is a great language and one of the fastest‑growing programming languages in the world. Its extensive ecosystem of third‑party libraries makes it suitable for both beginners and advanced users across many domains, especially data science.
This article reviews several Python libraries useful for data‑science tasks that are less commonly mentioned than pandas, scikit‑learn, or matplotlib.
Wget
Downloading data from the web is a common task for data scientists. Wget is a free utility that can download files non‑interactively over HTTP, HTTPS, and FTP, even behind proxies.
$ pip install wget
Example:
import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
print(filename) # 'razorback.mp3'
Pendulum
Pendulum simplifies date‑time manipulation in Python, offering an easy alternative to the native datetime classes.
$ pip install pendulum
Example:
import pendulum
dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours()) # 3
imbalanced-learn
Real‑world datasets are often imbalanced, which hurts model performance. imbalanced‑learn, compatible with scikit‑learn, provides techniques to address this issue.
pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn
See the documentation for usage examples.
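One of the simplest techniques of this kind is random oversampling: minority-class rows are duplicated until every class is the same size. The following is a minimal pure-Python sketch of that idea, not imbalanced-learn's own implementation or API (the library ships its own version as RandomOverSampler). The function name random_oversample and the toy data are made up for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy data: five rows of class 0, one row of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
y = [0, 0, 0, 0, 0, 1]

X_res, y_res = random_oversample(X, y)
print(Counter(y_res))  # both classes now have five rows
```

Plain duplication adds no new information, so it is best seen as a baseline; the library's documentation covers the more elaborate resampling strategies.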
FlashText
FlashText offers a fast alternative to regular expressions for keyword extraction and replacement, with execution time independent of the number of search terms.
$ pip install flashtext
Keyword extraction example:
from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found) # ['New York', 'Bay Area']
Keyword replacement example:
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence) # 'I love New York and NCR region.'
fuzzywuzzy
fuzzywuzzy provides simple functions for measuring string similarity and performing fuzzy matching.
$ pip install fuzzywuzzy
Example:
from fuzzywuzzy import fuzz
print(fuzz.ratio('this is a test', 'this is a test!')) # 97
print(fuzz.partial_ratio('this is a test', 'this is a test!')) # 100
PyFlux
PyFlux is an open‑source Python library for time‑series analysis, offering modern models such as ARIMA, GARCH, and VAR with a probabilistic approach.
pip install pyflux
Refer to the official documentation for detailed usage.
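To make the model family concrete, here is a minimal sketch of its simplest member, an AR(1) process (ARIMA with a single autoregressive term and nothing else), fitted by ordinary least squares in plain Python. It does not use PyFlux and does not reflect its API or its probabilistic estimation. fit_ar1 and forecast_ar1 are hypothetical helper names, and the data is simulated.

```python
import random


def fit_ar1(series):
    """Least-squares estimate of phi in x[t] = phi * x[t-1] + noise."""
    prev, curr = series[:-1], series[1:]
    num = sum(p * c for p, c in zip(prev, curr))
    den = sum(p * p for p in prev)
    return num / den


def forecast_ar1(last_value, phi, steps):
    """Point forecasts: each step shrinks the previous value by phi."""
    out = []
    value = last_value
    for _ in range(steps):
        value = phi * value
        out.append(value)
    return out


# Simulate an AR(1) series with a known coefficient (assumed for the demo).
rng = random.Random(42)
true_phi = 0.7
series = [0.0]
for _ in range(2000):
    series.append(true_phi * series[-1] + rng.gauss(0.0, 1.0))

phi_hat = fit_ar1(series)
print("estimated phi:", round(phi_hat, 3))
print("next 3 forecasts:", forecast_ar1(series[-1], phi_hat, 3))
```

The estimate should land close to the simulated coefficient. A full time-series library adds differencing, moving-average terms, and uncertainty estimates on top of this basic idea.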
Ipyvolume
Ipyvolume enables 3‑D visualisation of volumes and scatter plots directly in Jupyter notebooks, similar to how matplotlib handles 2‑D images.
pip install ipyvolume
# or
conda install -c conda-forge ipyvolume
[Figure: animation example]
[Figure: volume rendering example]
Dash
Dash is a Python framework for building interactive web applications, built on Flask, Plotly.js, and React.js, allowing developers to create data‑visualisation dashboards without writing JavaScript.
pip install dash==0.29.0 # core dash backend
pip install dash-html-components==0.13.2
pip install dash-core-components==0.36.0
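pip install dash-table==3.1.3
These pins reflect the releases current when the original article was written. Since Dash 2.0, the HTML, core, and table components are bundled with the dash package itself, so a plain pip install dash is enough for new projects.
Example: a dropdown‑controlled chart that loads data from Google Finance into a pandas DataFrame.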
Gym
OpenAI Gym provides a collection of environments for developing and comparing reinforcement‑learning algorithms, compatible with libraries such as TensorFlow and Theano.
pip install gym
The example runs the CartPole-v0 environment for 1000 timesteps, rendering each step.
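The interaction loop behind such an example can be sketched without installing Gym. The LineWorld class below is a hypothetical toy environment, not part of Gym. It only imitates the reset()/step(action) pattern, with step returning an (observation, reward, done, info) tuple as in classic Gym releases (newer releases return a slightly different tuple). The agent picks random actions for 1000 timesteps and restarts the episode whenever it ends.

```python
import random


class LineWorld:
    """Hypothetical toy environment: walk along a line from 0 to `goal`."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0
        return self.position

    def step(self, action):
        """Action 1 moves right; action 0 moves left (never below 0)."""
        if action == 1:
            self.position += 1
        else:
            self.position = max(0, self.position - 1)
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_random_agent(env, timesteps=1000, seed=0):
    """Take random actions, resetting the environment after each episode."""
    rng = random.Random(seed)
    env.reset()
    episodes_finished = 0
    total_reward = 0.0
    for _ in range(timesteps):
        action = rng.choice([0, 1])
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            episodes_finished += 1
            env.reset()
    return episodes_finished, total_reward


episodes, total_reward = run_random_agent(LineWorld())
print("episodes finished:", episodes, "total reward:", total_reward)
```

With Gym itself, the environment would come from gym.make('CartPole-v0') and the random action from env.action_space.sample(), with a call to env.render() inside the loop to draw each step.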
Summary
The libraries listed above are hand‑picked, lesser‑known tools for data‑science workflows, complementing the more common packages like NumPy and pandas. Feel free to add any other useful libraries in the comments and try them out.
Original article: https://dwz.cn/FBj1Ktxv Translated version: https://dwz.cn/moEU7xzr
MaGe Linux Operations
