Useful Python Libraries for Data Science Beyond Pandas and NumPy

This article introduces a curated selection of lesser‑known Python libraries for data‑science tasks, including data acquisition, date‑time handling, imbalanced learning, fast keyword extraction, fuzzy string matching, time‑series modeling, 3‑D visualization, web‑app building, and reinforcement learning, with installation commands and concise usage examples.

Python Programming Learning Circle

May 24, 2021

Python is a versatile language that has become one of the fastest‑growing programming languages worldwide, widely used by developers and data‑science practitioners. Its rich ecosystem of third‑party libraries makes it suitable for both beginners and advanced users.

Wget

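Wget is a free utility for non‑interactive downloading of files over HTTP, HTTPS, and FTP. Because it needs no user interaction, it can keep running in the background after the user logs out, which makes it handy for bulk downloads of websites or images. The wget package on PyPI is a small pure‑Python module inspired by it, whose download() function fetches a URL to a local file.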

Installation

$ pip install wget

Example

import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)  # saves the file in the current directory and returns its name
print(filename)  # razorback.mp3

Pendulum

Pendulum simplifies date‑time manipulation in Python, offering an easy‑to‑use alternative to the native datetime classes.

Installation

$ pip install pendulum

Example

import pendulum

dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours())  # 3
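The 3‑hour result comes from the UTC offsets on that date: midnight in Toronto (UTC-5) is 05:00 UTC, while midnight in Vancouver (UTC-8) is 08:00 UTC. For comparison, here is a sketch of the same calculation using only the standard library's zoneinfo module (Python 3.9+); it assumes the IANA time‑zone database is available on the system or through the tzdata package.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Midnight on 1 January 2012 in each city, as timezone-aware datetimes
dt_toronto = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Toronto"))
dt_vancouver = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Vancouver"))

# Subtracting aware datetimes with different zones compares their UTC instants
delta = dt_vancouver - dt_toronto
hours = delta.total_seconds() / 3600
print(hours)  # 3.0
```

Unlike Pendulum's diff(), which reports an absolute difference by default, plain subtraction is signed: dt_toronto - dt_vancouver gives minus three hours.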

imbalanced-learn

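When one class vastly outnumbers the others, most classifiers become biased toward the majority class. The imbalanced‑learn library, which follows the scikit‑learn API, provides resampling techniques (over‑sampling the minority class, under‑sampling the majority class, or a combination of both) to address this issue.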

Installation

$ pip install -U imbalanced-learn
# or
$ conda install -c conda-forge imbalanced-learn

Example

Refer to the official documentation for usage examples.

FlashText

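FlashText offers fast keyword extraction and replacement. It stores the keywords in a trie and scans the text once, so its running time grows with the length of the text but not with the number of search terms. That makes it well suited to large‑scale NLP preprocessing with thousands of keywords.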

Installation

$ pip install flashtext

Example – Extract Keywords

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found)  # ['New York', 'Bay Area']

Example – Replace Keywords

# Reuses keyword_processor from the previous example; matching is case-insensitive by default
keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)  # I love New York and NCR region.
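To illustrate why the number of keywords does not drive FlashText's running time, here is a simplified, pure‑Python sketch of the underlying idea: keywords are stored in a word‑level trie, and the text is scanned once, so each position only follows trie edges instead of testing every keyword. The helpers build_trie and extract are hypothetical names for this illustration; this is not FlashText's actual implementation, which builds its trie over characters and handles word boundaries more carefully.

```python
import re

_END = object()  # sentinel key marking the end of a keyword; never equal to a word


def build_trie(keywords):
    """Build a word-level trie from {keyword: clean_name} (hypothetical helper)."""
    trie = {}
    for keyword, clean_name in keywords.items():
        node = trie
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        node[_END] = clean_name
    return trie


def extract(trie, text):
    """Return clean names of keywords found in text, preferring the longest match."""
    words = re.findall(r"\w+", text.lower())
    found = []
    i = 0
    while i < len(words):
        node, j, match, match_end = trie, i, None, i
        # Walk down the trie as far as the following words allow
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if _END in node:
                match, match_end = node[_END], j
        if match is not None:
            found.append(match)
            i = match_end  # continue after the matched phrase
        else:
            i += 1
    return found


trie = build_trie({"Big Apple": "New York", "Bay Area": "Bay Area"})
print(extract(trie, "I love Big Apple and Bay Area."))  # ['New York', 'Bay Area']
```

Each lookup is a dictionary access, so adding more keywords makes the trie wider but does not add work per word of text.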

Fuzzywuzzy

Fuzzywuzzy provides simple functions for fuzzy string matching, useful for record linkage across databases.

Installation

$ pip install fuzzywuzzy

Example

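from fuzzywuzzy import fuzz, process
print(fuzz.ratio('this is a test', 'this is a test!'))   # 97: similarity of the whole strings, 0-100
print(fuzz.partial_ratio('this is a test', 'this is a test!'))   # 100: best-matching substring
choices = ['Atlanta Falcons', 'New York Jets', 'New York Giants', 'Dallas Cowboys']
print(process.extractOne('cowboys', choices))   # (best match, score) tuple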
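The score of 97 follows from a simple formula: the ratio is 2 * M / T scaled to 0 to 100, where M is the number of matching characters and T is the combined length of both strings, so 2 * 14 / (14 + 15) is about 0.966, which rounds to 97. Below is a standard‑library sketch using difflib.SequenceMatcher, which fuzzywuzzy itself falls back on when python-Levenshtein is not installed. The partial ratio here is a simplified, hypothetical version that slides a window the length of the shorter string across the longer one; fuzzywuzzy's own partial_ratio only aligns windows at matching blocks, so scores can differ on other inputs.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two whole strings on a 0-100 scale: round(100 * 2M / T)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def sliding_partial_ratio(a, b):
    """Hypothetical simplified partial ratio: best score of the shorter string
    against every same-length window of the longer string."""
    shorter, longer = sorted((a, b), key=len)
    n = len(shorter)
    return max(
        simple_ratio(shorter, longer[i:i + n])
        for i in range(len(longer) - n + 1)
    )


print(simple_ratio("this is a test", "this is a test!"))           # 97
print(sliding_partial_ratio("this is a test", "this is a test!"))  # 100
```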

PyFlux

PyFlux is an open‑source Python library for time‑series analysis, offering models such as ARIMA, GARCH, and VAR with a probabilistic approach.

Installation

$ pip install pyflux

Example

See the official documentation for detailed usage.
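PyFlux's models do not fit in a short snippet, but the simplest member of the family it covers, an autoregressive AR(1) model x[t] = phi * x[t-1] + e[t], can be sketched with NumPy alone. This is not PyFlux's API: the example simulates a series with an assumed phi of 0.7 and recovers phi by ordinary least squares, which for Gaussian noise matches the conditional maximum‑likelihood estimate. PyFlux instead fits such models through its own model classes, using maximum likelihood or Bayesian inference.

```python
import numpy as np


def simulate_ar1(phi, n, seed=0):
    """Simulate x[t] = phi * x[t-1] + e[t] with standard normal noise e[t]."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


def fit_ar1(x):
    """Estimate phi by least squares: regress x[t] on x[t-1] (no intercept)."""
    lagged, current = x[:-1], x[1:]
    return float(lagged @ current / (lagged @ lagged))


series = simulate_ar1(phi=0.7, n=2000)
phi_hat = fit_ar1(series)
print(round(phi_hat, 2))  # close to 0.7; the exact value depends on the seed
```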

Ipyvolume

Ipyvolume enables 3‑D volume and scatter plot visualizations inside Jupyter notebooks, similar to how matplotlib handles 2‑D images.

Installation

# Using pip
$ pip install ipyvolume
# Using conda
$ conda install -c conda-forge ipyvolume

Example

Refer to the library’s examples for animated and volumetric rendering.
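A minimal sketch based on the quick‑plot helper shown in ipyvolume's README: it draws a random 3‑D point cloud with ipv.quickscatter(). The helper names random_point_cloud and show_point_cloud are hypothetical, and the plot only renders inside a Jupyter notebook, so the ipyvolume import is kept inside the plotting function.

```python
import numpy as np


def random_point_cloud(n=1000, seed=0):
    """Return x, y, z coordinates of n points drawn uniformly from the unit cube."""
    rng = np.random.default_rng(seed)
    x, y, z = rng.random((3, n))
    return x, y, z


def show_point_cloud(x, y, z):
    """Render an interactive 3-D scatter plot (works only inside Jupyter)."""
    import ipyvolume as ipv  # imported here so the data helper works without Jupyter

    return ipv.quickscatter(x, y, z, size=1, marker="sphere")


# In a notebook cell:
# show_point_cloud(*random_point_cloud())
```

For volumetric data, ipyvolume also provides ipv.quickvolshow(), which takes a 3‑D NumPy array.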

Dash

Dash is a Python framework for building interactive web applications, built on Flask, Plotly.js, and React.js, allowing developers to create data‑visualization dashboards without writing JavaScript.

Installation

$ pip install dash

In Dash 2.0 and later, the html, dcc (core components), and dash_table modules ship inside the dash package itself, so dash-html-components, dash-core-components, and dash-table no longer need separate installs.

Example

An example demonstrates a dropdown‑controlled chart that fetches data from Google Finance into a pandas DataFrame; the source code is available online.

Gym

OpenAI Gym is a toolkit for developing and comparing reinforcement‑learning algorithms, providing a standardized set of environments that work with any numerical backend such as TensorFlow or Theano.

Installation

$ pip install gym

Example

The example runs a CartPole‑v0 environment for 1,000 timesteps, rendering each frame.

Conclusion

The libraries listed above are hand‑picked data‑science tools that go beyond the usual NumPy and pandas stack. Readers are encouraged to try them out and suggest additional useful packages in the comments.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
