
Explore 10 Lesser-Known Python Libraries for Data Science & AI

This article introduces a curated selection of lesser-known Python packages, such as wget, Pendulum, imbalanced-learn, FlashText, fuzzywuzzy, PyFlux, ipyvolume, Dash, and Gym. For each one it gives the installation command, the core functionality, and a code example, to help data scientists expand their toolkit beyond the usual pandas, scikit-learn, and matplotlib.

MaGe Linux Operations

Aug 20, 2020

Python is a great language and one of the fastest‑growing programming languages in the world. Its extensive ecosystem of third‑party libraries makes it suitable for both beginners and advanced users across many domains, especially data science.

This article reviews several Python libraries useful for data‑science tasks that are less commonly mentioned than pandas, scikit‑learn, or matplotlib.

Wget

Downloading data from the web is a common task for data scientists. GNU Wget is a free command-line utility for non-interactive downloads over HTTP, HTTPS, and FTP, including through HTTP proxies. The wget package on PyPI brings a similar, much simpler download function to Python.

$ pip install wget

Example:

import wget
url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
filename = wget.download(url)
print(filename)  # 'razorback.mp3'
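
For comparison, here is a rough standard-library equivalent of wget.download. This is a sketch under stated assumptions, not the wget package's code: the helper name download_to_dir is hypothetical, and it simply names the saved file after the last segment of the URL path. wget.download also derives a default filename from the download, but its exact rules may differ.

```python
import os
import urllib.parse
import urllib.request


def download_to_dir(url, out_dir="."):
    """Hypothetical helper: save the resource at `url` into `out_dir`.

    The local filename is the last segment of the URL path (percent-escapes
    decoded), falling back to "download" if that segment is empty.
    """
    url_path = urllib.parse.urlparse(url).path
    filename = os.path.basename(urllib.parse.unquote(url_path)) or "download"
    target = os.path.join(out_dir, filename)
    # urlretrieve writes the response body to `target` (http(s) and file URLs, among others)
    urllib.request.urlretrieve(url, target)
    return target
```

Calling download_to_dir with the razorback.mp3 URL above would save razorback.mp3 in the current directory, provided the URL is still reachable.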

Pendulum

Pendulum simplifies date-time manipulation in Python. Its timezone-aware DateTime class is a more convenient drop-in replacement for the native datetime class.

$ pip install pendulum

Example:

import pendulum

dt_toronto = pendulum.datetime(2012, 1, 1, tz='America/Toronto')
dt_vancouver = pendulum.datetime(2012, 1, 1, tz='America/Vancouver')
print(dt_vancouver.diff(dt_toronto).in_hours())  # 3
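
The same calculation can be done with only the standard library's zoneinfo module. It needs Python 3.9+ and the system's IANA time-zone database or the tzdata package. This is shown for comparison, not as how Pendulum works internally. Subtracting two aware datetimes compares them on the UTC timeline, so midnight in Vancouver (UTC-8) falls three hours after midnight in Toronto (UTC-5). Pendulum's diff() returns the absolute difference by default, so the sketch wraps the subtraction in abs().

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Midnight on 1 January 2012 in two time zones (both on standard time)
dt_toronto = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Toronto"))
dt_vancouver = datetime(2012, 1, 1, tzinfo=ZoneInfo("America/Vancouver"))

# Aware datetimes are subtracted in UTC: 08:00 UTC minus 05:00 UTC
hours_apart = abs(dt_vancouver - dt_toronto).total_seconds() / 3600
print(hours_apart)  # 3.0
```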

imbalanced-learn

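Real-world classification datasets are often imbalanced: one class vastly outnumbers the others, which can bias a model toward the majority class. imbalanced-learn is compatible with scikit-learn and provides resampling techniques (over-sampling, under-sampling, and combinations of both) to address this issue.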

pip install -U imbalanced-learn
# or
conda install -c conda-forge imbalanced-learn

See the documentation for usage examples.
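
As a rough illustration of what resampling means, the sketch below implements plain random over-sampling in pure Python. Minority-class samples are duplicated at random until every class is as large as the majority class. The function name random_oversample is hypothetical. In imbalanced-learn itself, the corresponding tool is RandomOverSampler (for example RandomOverSampler(random_state=0).fit_resample(X, y)), which sits alongside smarter methods such as SMOTE.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: randomly duplicate minority-class samples.

    Returns new lists (X_res, y_res) in which every class has as many
    samples as the largest class. The original samples come first.
    """
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    rng = random.Random(seed)            # seeded for reproducible output
    counts = Counter(y)
    target = max(counts.values())        # size of the majority class
    X_res, y_res = list(X), list(y)
    for label, count in counts.items():
        # All samples that belong to this class
        members = [x for x, lab in zip(X, y) if lab == label]
        # Draw with replacement until the class reaches the target size
        for _ in range(target - count):
            X_res.append(rng.choice(members))
            y_res.append(label)
    return X_res, y_res


X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2                    # 8 majority vs 2 minority samples
X_res, y_res = random_oversample(X, y)
print(Counter(y_res))                    # Counter({0: 8, 1: 8})
```

Duplicating rows adds no new information and can encourage overfitting. This is why synthetic methods such as SMOTE exist.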

FlashText

FlashText offers a fast alternative to regular expressions for keyword extraction and replacement. Its execution time does not depend on the number of search terms.

$ pip install flashtext

Keyword extraction example:

from flashtext import KeywordProcessor
keyword_processor = KeywordProcessor()
keyword_processor.add_keyword('Big Apple', 'New York')
keyword_processor.add_keyword('Bay Area')
keywords_found = keyword_processor.extract_keywords('I love Big Apple and Bay Area.')
print(keywords_found)  # ['New York', 'Bay Area']

Keyword replacement example:

keyword_processor.add_keyword('New Delhi', 'NCR region')
new_sentence = keyword_processor.replace_keywords('I love Big Apple and new delhi.')
print(new_sentence)  # 'I love New York and NCR region.'
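
FlashText is fast because it stores the keywords in a trie and scans the text once, instead of trying each pattern separately. The sketch below is a simplified, word-level version of that idea, written for illustration. The class name KeywordTrie is hypothetical. The real library builds its trie over characters and has its own word-boundary rules. This sketch matches case-insensitively (FlashText's default) and keeps the longest keyword that starts at each position.

```python
import re

_END = object()  # marks "a keyword ends here"; cannot collide with a word token


class KeywordTrie:
    """Hypothetical word-level trie for multi-word keyword extraction."""

    def __init__(self):
        self.root = {}

    def add_keyword(self, keyword, clean_name=None):
        node = self.root
        for word in keyword.lower().split():
            node = node.setdefault(word, {})
        # Name to report when this keyword is found
        node[_END] = clean_name if clean_name is not None else keyword

    def extract_keywords(self, sentence):
        words = re.findall(r"\w+", sentence.lower())
        found = []
        i = 0
        while i < len(words):
            node, j = self.root, i
            match, match_end = None, i
            # Walk the trie as far as the words allow, remembering the longest match
            while j < len(words) and words[j] in node:
                node = node[words[j]]
                j += 1
                if _END in node:
                    match, match_end = node[_END], j
            if match is not None:
                found.append(match)
                i = match_end        # continue after the matched keyword
            else:
                i += 1               # no keyword starts here; try the next word
        return found


kp = KeywordTrie()
kp.add_keyword("Big Apple", "New York")
kp.add_keyword("Bay Area")
print(kp.extract_keywords("I love Big Apple and Bay Area."))  # ['New York', 'Bay Area']
```

At each word, the inner loop walks at most as many steps as the longest keyword has words. Adding more keywords therefore widens the trie without lengthening the scan.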

fuzzywuzzy

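fuzzywuzzy provides simple functions for measuring string similarity and performing fuzzy matching. ratio compares two whole strings, while partial_ratio scores the best-matching substring of the longer string.

$ pip install fuzzywuzzy

Example: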

from fuzzywuzzy import fuzz
print(fuzz.ratio('this is a test', 'this is a test!'))      # 97
print(fuzz.partial_ratio('this is a test', 'this is a test!'))  # 100
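
fuzz.ratio is built on the similarity ratio from Python's difflib.SequenceMatcher (or python-Levenshtein, when installed), scaled to 0-100. The ratio is twice the number of matching characters divided by the total length of both strings. In the example above, all 14 characters of 'this is a test' match, so the ratio is 2 × 14 / (14 + 15) ≈ 0.966, which rounds to 97. The sketch below reproduces both scores with difflib alone. The function names are hypothetical, and the partial version uses a simple sliding window rather than fuzzywuzzy's own block-matching logic.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two whole strings on a 0-100 scale (2 * matches / total length)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def simple_partial_ratio(a, b):
    """Best similarity between the shorter string and any equally long slice of the longer one."""
    shorter, longer = sorted((a, b), key=len)
    if not shorter:
        return 0
    n = len(shorter)
    best = max(
        SequenceMatcher(None, shorter, longer[i:i + n]).ratio()
        for i in range(len(longer) - n + 1)
    )
    return round(100 * best)


print(simple_ratio("this is a test", "this is a test!"))          # 97
print(simple_partial_ratio("this is a test", "this is a test!"))  # 100
```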

PyFlux

PyFlux is an open-source Python library for time-series analysis. It offers modern models such as ARIMA, GARCH, and VAR, and takes a probabilistic approach to inference.

pip install pyflux

Refer to the official documentation for detailed usage.
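
To make the model names concrete, the sketch below fits the simplest autoregressive model, AR(1): x_t = c + phi * x_{t-1} + noise, which is the "AR" part of ARIMA. It uses ordinary least squares in pure Python and is only an illustration. PyFlux estimates models like this with its own likelihood-based and Bayesian methods, and fit_ar1 and forecast_next are hypothetical names.

```python
def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + noise; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, curr = series[:-1], series[1:]          # pairs (x_{t-1}, x_t)
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    # Slope of the regression of x_t on x_{t-1}
    cov = sum((p - mean_prev) * (q - mean_curr) for p, q in zip(prev, curr))
    var = sum((p - mean_prev) ** 2 for p in prev)
    if var == 0:
        raise ValueError("series is constant; phi is not identifiable")
    phi = cov / var
    c = mean_curr - phi * mean_prev
    return c, phi


def forecast_next(series, c, phi):
    """One-step-ahead point forecast from the fitted AR(1) model."""
    return c + phi * series[-1]


# Noise-free series generated by x_t = 2 + 0.5 * x_{t-1}, starting at 10
series = [10.0]
for _ in range(14):
    series.append(2 + 0.5 * series[-1])

c, phi = fit_ar1(series)
print(round(c, 6), round(phi, 6))   # 2.0 0.5
```

On a noise-free series, the fit recovers the generating parameters. With real data, the estimates carry uncertainty, and PyFlux's probabilistic approach is designed to quantify that.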

Ipyvolume

Ipyvolume visualises 3-D volumes and glyphs (such as 3-D scatter plots) directly in Jupyter notebooks. It aims to be for 3-D plots what matplotlib is for 2-D plots.

pip install ipyvolume
# or
conda install -c conda-forge ipyvolume

Supported features include animation and volume rendering.

Dash

Dash is a Python framework for building interactive web applications, built on Flask, Plotly.js, and React.js, allowing developers to create data‑visualisation dashboards without writing JavaScript.

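pip install dash==0.29.0      # core dash backend
pip install dash-html-components==0.13.2
pip install dash-core-components==0.36.0
pip install dash-table==3.1.3
# Since Dash 2.0, the HTML components, core components, and DataTable ship inside the dash package, so pip install dash alone is enough.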

The example builds a dropdown-controlled chart that loads stock data from Google Finance into a pandas DataFrame.

Gym

OpenAI Gym provides a collection of environments for developing and comparing reinforcement-learning algorithms. It is compatible with numerical libraries such as TensorFlow and Theano.

pip install gym

The example runs the CartPole-v0 environment for 1000 timesteps, rendering each step.
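
The CartPole loop follows Gym's classic interface. reset() starts an episode, and step(action) returns (observation, reward, done, info). Gym 0.26 and later, and its successor Gymnasium, instead return a five-tuple that splits done into terminated and truncated. Running CartPole needs gym and a display, so the sketch below shows the same agent-environment loop against a tiny, hypothetical CorridorEnv written in pure Python. A seeded random policy stands in for env.action_space.sample(). This illustrates the interface and is not part of Gym.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment using the classic Gym-style reset/step interface.

    The agent starts at position 0 and must walk right to position `length`.
    """

    def __init__(self, length=5, max_steps=50):
        self.length = length
        self.max_steps = max_steps
        self.n_actions = 2          # 0 = move left, 1 = move right

    def reset(self):
        self.position = 0
        self.steps = 0
        return self.position        # initial observation

    def step(self, action):
        if action not in (0, 1):
            raise ValueError("action must be 0 or 1")
        # Move one cell; the left wall stops the agent at 0
        self.position = max(0, self.position + (1 if action == 1 else -1))
        self.steps += 1
        reached_goal = self.position == self.length
        done = reached_goal or self.steps >= self.max_steps
        reward = 1.0 if reached_goal else 0.0
        return self.position, reward, done, {}


def run_random_agent(env, timesteps=1000, seed=0):
    """Step `env` with random actions, resetting after each episode ends."""
    rng = random.Random(seed)
    env.reset()
    episodes, total_reward = 0, 0.0
    for _ in range(timesteps):
        action = rng.randrange(env.n_actions)   # random policy
        _, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            episodes += 1
            env.reset()
    return episodes, total_reward


episodes, total_reward = run_random_agent(CorridorEnv(), timesteps=1000, seed=0)
print(episodes, total_reward)
```

The loop resets the environment whenever an episode ends. Gym environments expect this before taking further steps.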

Summary

The libraries listed above are hand-picked, lesser-known tools for data-science workflows that complement more common packages such as NumPy and pandas. Try them out, and feel free to suggest other useful libraries in the comments.

Original article: https://dwz.cn/FBj1Ktxv
Translated version: https://dwz.cn/moEU7xzr

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

MaGe Linux Operations

Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
