Useful Python Libraries for Data Science (Beyond pandas and NumPy)
This article introduces a curated list of lesser‑known Python packages for data‑science tasks—including Wget, Pendulum, imbalanced‑learn, FlashText, fuzzywuzzy, PyFlux, Ipyvolume, Dash, and Gym—providing installation commands, brief usage examples, and explanations of when each library is useful.
Python is a versatile language that has become one of the fastest‑growing programming languages worldwide, especially in data‑science and machine‑learning roles. Its rich ecosystem of third‑party libraries makes it suitable for both beginners and advanced users.
1. Wget
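Wget is a free utility for non-interactive downloading of files over HTTP, HTTPS, and FTP. Because it is non-interactive, it can keep working in the background while the user is not logged in, which makes it handy for bulk downloads such as all the images on a website. The wget package on PyPI used below is a separate, pure-Python module that exposes a simple download function.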
Installation:
$ pip install wget
Example:
import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
print(filename) # 'razorback.mp3'
2. Pendulum
Pendulum simplifies date‑time manipulation in Python and serves as a more user‑friendly alternative to the built‑in datetime module.
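To see what Pendulum smooths over, here is a minimal standard-library sketch that computes the offset between Toronto and Vancouver at the start of 2012. It assumes Python 3.9 or later (for zoneinfo) and a system time-zone database; the variable names are illustrative only.

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

# Midnight on 1 January 2012 in each city, as time-zone-aware datetimes.
toronto = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Toronto"))
vancouver = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Vancouver"))

# Subtracting aware datetimes gives a timedelta; turning it into hours is manual.
offset_hours = (vancouver - toronto).total_seconds() / 3600
print(offset_hours)  # 3.0
```

With Pendulum, the time-zone lookup and the conversion to hours are folded into the datetime object's own methods.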
Installation:
$ pip install pendulum
Example:
import pendulum
dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours()) # 3
3. imbalanced‑learn
This library addresses class‑imbalance problems and is compatible with scikit‑learn, making it useful when datasets have uneven class distributions.
Installation:
pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn
Refer to the official documentation for usage examples.
4. FlashText
FlashText provides fast keyword extraction and replacement. Its running time scales with the length of the text rather than with the number of search terms, which is advantageous for large-scale NLP preprocessing.
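A trie (prefix tree) is one way to make a keyword scan insensitive to how many keywords are registered. The sketch below is a simplified standard-library illustration of that idea, not FlashText's actual code; build_trie, extract_keywords, and the END marker are hypothetical names.

```python
END = "_end_"  # hypothetical marker key; never collides with a single character

def build_trie(keywords):
    """Store each phrase character by character in a nested-dict trie."""
    trie = {}
    for phrase, clean_name in keywords.items():
        node = trie
        for ch in phrase.lower():
            node = node.setdefault(ch, {})
        node[END] = clean_name
    return trie

def extract_keywords(text, trie):
    """Scan the text once, following trie branches from each word start."""
    lowered = text.lower()
    n = len(lowered)
    found = []
    i = 0
    while i < n:
        # Only begin matching at the start of a word.
        if i > 0 and lowered[i - 1].isalnum():
            i += 1
            continue
        node, j = trie, i
        match, match_end = None, i
        while j < n and lowered[j] in node:
            node = node[lowered[j]]
            j += 1
            # Accept a phrase only if it ends on a word boundary.
            if END in node and (j == n or not lowered[j].isalnum()):
                match, match_end = node[END], j
        if match is not None:
            found.append(match)
            i = match_end
        else:
            i += 1
    return found

trie = build_trie({"Big Apple": "New York", "Bay Area": "Bay Area"})
print(extract_keywords("I love Big Apple and Bay Area.", trie))
# ['New York', 'Bay Area']
```

Each position in the text costs at most one dictionary lookup per character of the longest phrase, so registering more keywords grows the trie but not the work done per character.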
Installation:
$ pip install flashtext
Example (keyword extraction):
from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found) # ['New York', 'Bay Area']
Example (keyword replacement):
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence) # 'I love New York and NCR region.'
5. fuzzywuzzy
fuzzywuzzy offers simple string‑matching utilities such as ratio and partial ratio, which are handy for approximate matching across datasets.
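The two scores can be approximated with the standard library alone. The sketch below is a simplified illustration built on difflib, not fuzzywuzzy's own implementation; simple_ratio and simple_partial_ratio are hypothetical helper names.

```python
from difflib import SequenceMatcher

def simple_ratio(a: str, b: str) -> int:
    """Similarity of the two whole strings on a 0-100 scale."""
    return int(round(100 * SequenceMatcher(None, a, b).ratio()))

def simple_partial_ratio(a: str, b: str) -> int:
    """Best score of the shorter string against any same-length window of the longer."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if not shorter:
        return 0  # nothing to compare
    width = len(shorter)
    return max(
        simple_ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )

print(simple_ratio("this is a test", "this is a test!"))          # 97
print(simple_partial_ratio("this is a test", "this is a test!"))  # 100
```

The partial score ignores the trailing exclamation mark because one window of the longer string matches the shorter string exactly, which is why partial matching suits cases where one value is embedded in a longer one.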
Installation:
$ pip install fuzzywuzzy
Example:
from fuzzywuzzy import fuzz
print(fuzz.ratio('this is a test', 'this is a test!')) # 97
print(fuzz.partial_ratio('this is a test', 'this is a test!')) # 100
6. PyFlux
PyFlux is an open‑source Python library for time‑series analysis, offering models such as ARIMA, GARCH, and VAR with a probabilistic approach.
Installation:
pip install pyflux
See the official documentation for detailed usage.
7. Ipyvolume
Ipyvolume enables 3‑D volumetric and scatter‑plot visualisation directly inside Jupyter notebooks, similar to how matplotlib handles 2‑D images.
Installation (pip):
$ pip install ipyvolume
Installation (conda):
$ conda install -c conda-forge ipyvolume
8. Dash
Dash is a Python framework built on Flask, Plotly.js, and React.js for creating interactive web applications, especially data‑visualisation dashboards, without writing JavaScript.
Installation:
pip install dash==0.29.0
pip install dash-html-components==0.13.2
pip install dash-core-components==0.36.0
pip install dash-table==3.1.3
Example code demonstrates a dropdown‑controlled chart that pulls data from Google Finance into a pandas DataFrame.
9. Gym
OpenAI Gym is a toolkit for developing and comparing reinforcement‑learning algorithms. It provides a collection of environments with a unified interface, compatible with libraries like TensorFlow or Theano.
Installation:
pip install gym
Example: run the CartPole-v0 environment for 1000 timesteps, rendering the scene at each step.
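A loop of that shape can be sketched as follows. The sketch assumes the gym package with CartPole-v0 registered and copes with both step() return formats. Rendering is off by default because it needs a display, and run_random_policy is a hypothetical helper name.

```python
import gym

def run_random_policy(steps=1000, render=False):
    """Drive CartPole-v0 with random actions; return the number of finished episodes."""
    env = gym.make("CartPole-v0")
    env.reset()
    episodes = 0
    for _ in range(steps):
        if render:
            # Needs a display; newer Gym releases also expect a render_mode
            # argument to gym.make.
            env.render()
        result = env.step(env.action_space.sample())  # take a random action
        # Classic Gym returns (obs, reward, done, info);
        # Gym 0.26+ returns (obs, reward, terminated, truncated, info).
        done = result[2] if len(result) == 4 else (result[2] or result[3])
        if done:
            episodes += 1
            env.reset()  # start a fresh episode
    env.close()
    return episodes

print(run_random_policy(steps=1000))
```

A random policy drops the pole quickly, so many short episodes fit into 1000 steps. Swapping env.action_space.sample() for a learned policy is where a reinforcement-learning algorithm would plug in.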
Conclusion
The libraries listed above are hand-picked, less common Python packages for data-science tasks. If you know of others that belong on the list, mention them in the comments, and don't forget to try these out.
Python Programming Learning Circle
A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.