
Beyond Pandas: 10 Lesser‑Known Python Libraries Every Data Scientist Should Try

This article introduces a curated collection of lesser‑known Python libraries for data‑science tasks—including wget, pendulum, imbalanced‑learn, flashtext, fuzzywuzzy, pyflux, ipyvolume, dash, and gym—detailing their purpose, installation commands, and concise code examples to help practitioners expand their toolkit.

MaGe Linux Operations

Sep 13, 2020


Python is a powerful, fast‑growing language widely used by developers and data‑science professionals. Its rich ecosystem of third‑party libraries makes it suitable for beginners and advanced users alike.

This article explores several Python libraries useful for data‑science tasks beyond the common pandas, scikit‑learn, and matplotlib.

Wget

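Wget is a free utility for non‑interactive downloading of files over HTTP, HTTPS, and FTP. Because it needs no user interaction, it can keep running in the background even when the user is not logged in, which makes it handy for bulk downloads of websites or images. The Python wget package offers a similar one‑call download from Python code.

$ pip install wget

Example: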

import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
print(filename)  # 'razorback.mp3'

Pendulum

Pendulum simplifies date‑time manipulation in Python. Its classes are timezone‑aware, easy‑to‑use drop‑in replacements for the native datetime classes, which they inherit from.

$ pip install pendulum

Example:

import pendulum

dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours())  # 3

imbalanced-learn

Imbalanced‑learn provides resampling techniques for datasets with a strong imbalance between classes. It is compatible with scikit‑learn and is part of the scikit‑learn‑contrib projects.

pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn

Refer to the documentation for usage examples.

FlashText

FlashText provides fast keyword extraction and replacement. Its running time grows with the length of the text but stays essentially independent of the number of search terms, which makes it well suited to large‑scale NLP preprocessing with big keyword lists.

$ pip install flashtext

Example (keyword extraction and replacement):

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()  # matching is case-insensitive by default
# Map the phrase 'Big Apple' to the clean name 'New York'
keyword_processor.add_keyword('Big Apple', 'New York')
# Without a clean name, the keyword itself is reported
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found)  # ['New York', 'Bay Area']

# replace_keywords swaps every match (including 'Big Apple') for its clean name
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)  # 'I love New York and NCR region.'
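FlashText's speed comes from storing all keywords in a trie and scanning the text once, so each word costs roughly one dictionary lookup no matter how many keywords are loaded. The sketch below is a simplified, word‑level illustration of that idea in plain Python; it is not FlashText's implementation (the library works character by character and treats word boundaries more carefully), and `build_trie` and `extract_keywords` are hypothetical names.

```python
import re

_END = object()  # sentinel key marking "a keyword ends here"


def build_trie(keywords):
    """Store each keyword, split into lowercase words, in a nested dict."""
    trie = {}
    for keyword, clean_name in keywords.items():
        node = trie
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        node[_END] = clean_name
    return trie


def extract_keywords(sentence, trie):
    """Return clean names of keywords found, preferring the longest match."""
    words = re.findall(r"\w+", sentence.lower())
    found = []
    i = 0
    while i < len(words):
        node, j = trie, i
        match, match_end = None, i
        # Walk down the trie as long as the following words keep matching.
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if _END in node:
                match, match_end = node[_END], j
        if match is not None:
            found.append(match)
            i = match_end  # skip past the matched keyword
        else:
            i += 1
    return found


trie = build_trie({"big apple": "New York", "bay area": "Bay Area"})
print(extract_keywords("I love Big Apple and Bay Area.", trie))  # ['New York', 'Bay Area']
```

Adding more keywords only makes the trie wider; the scan over the sentence still visits each word a bounded number of times for typical short keywords.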

fuzzywuzzy

Fuzzywuzzy offers simple functions for fuzzy string matching based on Levenshtein distance, useful for comparing and linking records across databases.

$ pip install fuzzywuzzy

Example:

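from fuzzywuzzy import fuzz, process
print(fuzz.ratio('this is a test', 'this is a test!'))          # 97
print(fuzz.partial_ratio('this is a test', 'this is a test!'))  # 100
# process.extractOne returns the best (choice, score) pair from a list
choices = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys']
print(process.extractOne('cowboys', choices))  # best match: 'Dallas Cowboys'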
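To see what these scores measure, here is a simplified standard‑library re‑implementation built on `difflib.SequenceMatcher`, which is what fuzzywuzzy falls back to when the optional python‑Levenshtein package is not installed. `ratio` mirrors `fuzz.ratio`; `partial_ratio` is a simplified version that compares the shorter string against every equal‑length window of the longer one, whereas fuzzywuzzy only tries windows suggested by matching blocks, so scores may differ on some inputs.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Similarity of two whole strings on a 0-100 scale."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def partial_ratio(a, b):
    """Best ratio between the shorter string and any same-length slice of the longer."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0  # treat an empty string as having no similarity
    width = len(shorter)
    return max(
        ratio(shorter, longer[start:start + width])
        for start in range(len(longer) - width + 1)
    )


print(ratio("this is a test", "this is a test!"))          # 97
print(partial_ratio("this is a test", "this is a test!"))  # 100
```

The 97 comes from 2 × 14 matching characters divided by 29 total characters; the partial score is 100 because the shorter string appears verbatim inside the longer one.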

PyFlux

PyFlux is an open‑source Python library for time‑series analysis. It provides modern models such as ARIMA, GARCH, and VAR and takes a probabilistic approach to fitting them.

$ pip install pyflux

See the official documentation for detailed usage.
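PyFlux's own API is best taken from its documentation. To show the simplest member of the model family it covers, here is a deliberately small stand‑in written with NumPy rather than PyFlux: it simulates an AR(1) process, y[t] = phi * y[t-1] + noise, estimates phi by ordinary least squares, and makes a one‑step forecast. PyFlux itself estimates models such as ARIMA by maximum likelihood or Bayesian inference; `simulate_ar1` and `fit_ar1` are hypothetical helper names.

```python
import numpy as np


def simulate_ar1(phi, n, seed=0):
    """Generate n points of y[t] = phi * y[t-1] + standard normal noise."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + noise[t]
    return y


def fit_ar1(y):
    """Least-squares estimate of phi from regressing y[t] on y[t-1] (no intercept)."""
    lagged, current = y[:-1], y[1:]
    return float(lagged @ current / (lagged @ lagged))


series = simulate_ar1(phi=0.6, n=2000)
phi_hat = fit_ar1(series)
next_value = phi_hat * series[-1]  # one-step-ahead point forecast
print(f"estimated phi = {phi_hat:.3f}, forecast = {next_value:.3f}")
```

A library like PyFlux adds what this sketch lacks: moving‑average and differencing terms, uncertainty estimates for the parameters, and probabilistic forecasts.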

ipyvolume

ipyvolume enables 3‑D visualisation of volumes and scatter plots directly in Jupyter notebooks, similar to how matplotlib handles 2‑D data.

$ pip install ipyvolume

or

$ conda install -c conda-forge ipyvolume

Examples include animated visualisations and 3‑D mesh rendering.

Dash

Dash is a Python framework for building interactive analytical web applications. Built on Flask, Plotly.js, and React.js, it lets data analysts create rich visualisations without writing JavaScript.

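pip install dash==0.29.0
pip install dash-html-components==0.13.2
pip install dash-core-components==0.36.0
pip install dash-table==3.1.3

These pins match an early Dash release. Since Dash 2.0, the HTML components, core components, and DataTable ship inside the dash package itself, so a plain pip install dash is enough for current versions.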

Example: a dropdown‑controlled chart that fetches data from Google Finance into a pandas DataFrame.

Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement‑learning algorithms. It provides a standard interface to a variety of environments.

$ pip install gym

Example: running the CartPole-v0 environment for up to 1000 timesteps.

Summary

The libraries above are hand‑picked, lesser‑known tools for data‑science work that complement the usual suspects like NumPy and pandas. Feel free to add any other useful libraries you know in the comments and try them out.


