Beyond Pandas: 9 Lesser-Known Python Libraries Every Data Scientist Should Try
This article introduces a curated collection of lesser‑known Python libraries for data‑science tasks—including wget, pendulum, imbalanced‑learn, flashtext, fuzzywuzzy, pyflux, ipyvolume, dash, and gym—detailing their purpose, installation commands, and concise code examples to help practitioners expand their toolkit.
Python is a powerful, fast‑growing language widely used by developers and data‑science professionals. Its rich ecosystem of third‑party libraries makes it suitable for beginners and advanced users alike.
This article explores several Python libraries useful for data‑science tasks beyond the common pandas, scikit‑learn, and matplotlib.
Wget
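Wget is a free utility for non-interactive downloading of files over HTTP, HTTPS, and FTP. Because it is non-interactive, it can keep working in the background even when the user is not logged in, which makes it handy for bulk downloads of websites or images. The Python wget package brings this kind of download to Python code.
$ pip install wget
Example: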
import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
print(filename)  # 'razorback.mp3'
Pendulum
Pendulum simplifies date-time manipulation in Python, offering an easy-to-use alternative to the native datetime classes.
$ pip install pendulum
Example:
import pendulum
dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours())  # 3
imbalanced-learn
Imbalanced‑learn addresses class‑imbalance problems in datasets. It is compatible with scikit‑learn and part of the scikit‑learn‑contrib project.
pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn
Refer to the documentation for usage examples.
FlashText
FlashText provides fast keyword extraction and replacement. Its running time depends on the length of the text rather than on the number of search terms, making it ideal for large-scale NLP preprocessing.
$ pip install flashtext
Example (keyword extraction and replacement):
from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found) # ['New York', 'Bay Area']
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)  # 'I love New York and NCR region.'
fuzzywuzzy
Fuzzywuzzy offers simple functions for fuzzy string matching, useful for comparing and linking records across databases.
$ pip install fuzzywuzzy
Example:
from fuzzywuzzy import fuzz, process
print(fuzz.ratio('this is a test', 'this is a test!')) # 97
print(fuzz.partial_ratio('this is a test', 'this is a test!'))  # 100
PyFlux
PyFlux is an open-source Python library for time-series analysis, providing modern models such as ARIMA, GARCH, and VAR with a probabilistic approach.
$ pip install pyflux
See the official documentation for detailed usage.
ipyvolume
ipyvolume enables 3‑D visualisation of volumes and scatter plots directly in Jupyter notebooks, similar to how matplotlib handles 2‑D data.
$ pip install ipyvolume
# or
$ conda install -c conda-forge ipyvolume
Examples include animated visualisations and 3-D mesh rendering.
Dash
Dash is a Python framework for building interactive web applications, built on Flask, Plotly.js, and React.js, allowing data‑science analysts to create rich visualisations without writing JavaScript.
pip install dash==0.29.0
pip install dash-html-components==0.13.2
pip install dash-core-components==0.36.0
pip install dash-table==3.1.3
Example: a dropdown-controlled chart that fetches data from Google Finance into a pandas DataFrame.
Gym
OpenAI Gym is a toolkit for developing and comparing reinforcement-learning algorithms. It provides a standard interface for a variety of environments.
$ pip install gym
Example: running the CartPole-v0 environment for up to 1000 timesteps.
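The CartPole run can be sketched as below, using random actions in place of a learned policy. The helper name `run_random_episode` is hypothetical, and rendering is left out so the loop also works on a machine without a display. Because the return values of `step` differ between Gym releases, the sketch checks the tuple length instead of assuming one API version.

```python
import gym


def run_random_episode(env, max_steps=1000):
    """Take random actions until the episode ends or max_steps is reached."""
    env.reset()
    steps = 0
    for _ in range(max_steps):
        result = env.step(env.action_space.sample())
        steps += 1
        # Older Gym releases return (obs, reward, done, info); newer ones
        # return (obs, reward, terminated, truncated, info).
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        if done:
            break
    env.close()
    return steps


env = gym.make("CartPole-v0")
steps = run_random_episode(env)
print("Episode finished after", steps, "timesteps")
```

A random policy usually drops the pole within a few dozen steps, so the loop tends to exit well before the 1000-step cap.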
Summary
The libraries above are hand-picked, lesser-known tools for data-science work that complement the usual suspects like NumPy and pandas. Try them out, and feel free to share any other useful libraries you know in the comments.
MaGe Linux Operations
