12 Essential NumPy and Pandas Functions for Efficient Data Analysis

This article introduces twelve NumPy and Pandas functions, six from each library. It explains what each one does and when to use it, with ready‑to‑run code examples to help data analysts work faster.

Python Programming Learning Circle

Jun 12, 2021

NumPy is a Python package for scientific computing. It provides a powerful n‑dimensional array object, linear algebra routines, Fourier transforms, random number generation, and tools for integrating C/C++ and Fortran code. Beyond these scientific uses, NumPy arrays also work as an efficient multi‑dimensional container for generic data, which makes it easy to exchange data with a wide range of databases.

Six efficient NumPy functions

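1. argpartition() – Returns indices that partially sort the array, which makes it a fast way to find the indices of the N largest values. The selected indices come back in no guaranteed order, so sort them afterwards if order matters.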

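import numpy as np
x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
# Indices of the 4 largest values (their order is not guaranteed)
index_val = np.argpartition(x, -4)[-4:]
index_val
# e.g. array([1, 8, 2, 0]); the order and the dtype display can vary
np.sort(x[index_val])
# array([10, 12, 12, 16])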
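
Because argpartition() leaves the selected indices unordered, a common follow-up is to order just those N candidates by value. The following is a minimal sketch of that idea; the helper name `top_n_indices` is hypothetical and not part of NumPy.

```python
import numpy as np

def top_n_indices(values, n):
    """Return the indices of the n largest entries, largest first."""
    # Partition so the n largest values occupy the last n slots (unordered).
    candidates = np.argpartition(values, -n)[-n:]
    # Sort only those n candidates by value, then reverse for descending order.
    return candidates[np.argsort(values[candidates])[::-1]]

x = np.array([12, 10, 12, 0, 6, 8, 9, 1, 16, 4, 6, 0])
top4 = top_n_indices(x, 4)
print(x[top4])  # [16 12 12 10]
```

Only the N candidates are sorted, so this should stay cheaper than a full argsort when N is small relative to the array.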

2. allclose() – Checks whether two arrays are element‑wise equal within a tolerance, returning a Boolean.

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])
# The third positional argument is rtol, the relative tolerance
np.allclose(array1, array2, rtol=0.1)  # False
np.allclose(array1, array2, rtol=0.2)  # True
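
To see why the first call fails, it helps to spell out the test that NumPy documents for allclose(): every element must satisfy |a - b| <= atol + rtol * |b|. The sketch below re-implements that check by hand (the function name `manual_allclose` is made up for illustration) and uses np.isclose() to show which element breaks the tolerance.

```python
import numpy as np

array1 = np.array([0.12, 0.17, 0.24, 0.29])
array2 = np.array([0.13, 0.19, 0.26, 0.31])

def manual_allclose(a, b, rtol=1e-05, atol=1e-08):
    # Same element-wise test that the NumPy docs give for allclose/isclose.
    return bool(np.all(np.abs(a - b) <= atol + rtol * np.abs(b)))

strict = np.allclose(array1, array2, rtol=0.1)       # False
loose = np.allclose(array1, array2, rtol=0.2)        # True
per_element = np.isclose(array1, array2, rtol=0.1)   # element-wise verdicts
print(per_element)  # [ True False  True  True]
```

With rtol=0.1 the second pair differs by 0.02 while the allowed margin is about 0.019, which is enough to make the whole comparison False.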

3. clip() – Limits the values in an array to a specified interval.

x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])
np.clip(x, 2, 5)
# array([3, 5, 5, 5, 2, 2, 5, 5, 2, 2, 5, 2])
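
clip() can also bound values on one side only by passing None for the other limit. A short sketch with the same array:

```python
import numpy as np

x = np.array([3, 17, 14, 23, 2, 2, 6, 8, 1, 2, 16, 0])

# Raise anything below 2 up to 2 and leave larger values alone.
floor_only = np.clip(x, 2, None)
# Cap anything above 5 at 5 and leave smaller values alone.
ceiling_only = np.clip(x, None, 5)

print(floor_only)    # [ 3 17 14 23  2  2  6  8  2  2 16  2]
print(ceiling_only)  # [3 5 5 5 2 2 5 5 1 2 5 0]
```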

4. extract() – Retrieves elements that satisfy a given condition, supporting logical operators.

# Random integers in [0, 20)
array = np.random.randint(20, size=12)
# Keep the odd values
cond = np.mod(array, 2) == 1
np.extract(cond, array)
# Conditions can be combined with | (or) and & (and):
# np.extract(((array < 3) | (array > 15)), array)
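
Since the array above is random, its output changes on every run. The sketch below fixes a seed (chosen arbitrarily) so the result is repeatable, and checks that for a 1-D array extract() selects the same elements as plain Boolean indexing.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for repeatability
array = rng.integers(20, size=12)

odd = np.mod(array, 2) == 1
extremes = (array < 3) | (array > 15)

odd_values = np.extract(odd, array)
extreme_values = np.extract(extremes, array)

# For a 1-D array, Boolean indexing picks exactly the same elements.
print(np.array_equal(odd_values, array[odd]))            # True
print(np.array_equal(extreme_values, array[extremes]))   # True
```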

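5. where() – With only a condition, returns the indices where it holds, as a tuple with one index array per dimension. With two extra arguments, it picks element‑wise between them depending on the condition. The first form is loosely comparable to SQL's WHERE clause.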

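y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
np.where(y > 5)
# (array([2, 3, 5, 7, 8]),)
np.where(y > 5, "Hit", "Miss")
# array(['Miss', 'Miss', 'Hit', 'Hit', 'Miss', 'Hit', 'Miss', 'Hit', 'Hit'], dtype='<U4')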
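
The two calling styles of where() are easy to mix up, so here is a small sketch that puts them side by side. The third call, which caps large values while keeping the rest, is an added illustration rather than something from the original notebook.

```python
import numpy as np

y = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])

# One-argument form: a tuple of index arrays, one per dimension.
(hit_idx,) = np.where(y > 5)

# Three-argument form: choose between two alternatives element by element.
labels = np.where(y > 5, "Hit", "Miss")
capped = np.where(y > 5, 5, y)  # replace values above 5, keep the others

print(hit_idx)  # [2 3 5 7 8]
print(capped)   # [1 5 5 5 1 5 3 5 5]
```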

6. percentile() – Computes the n‑th percentile of array elements along a specified axis.

a = np.array([1,5,6,8,1,7,3,6,9])
print("50th Percentile of a, axis = 0 :", np.percentile(a, 50, axis=0))
# 6.0
b = np.array([[10, 7, 4], [3, 2, 1]])
print("30th Percentile of b, axis = 0 :", np.percentile(b, 30, axis=0))
# [5.1 3.5 1.9]
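
percentile() also accepts a list of percentiles, and the axis argument controls which direction is reduced. A brief sketch reusing the arrays above, with NumPy's default linear interpolation assumed:

```python
import numpy as np

a = np.array([1, 5, 6, 8, 1, 7, 3, 6, 9])
b = np.array([[10, 7, 4], [3, 2, 1]])

quartiles = np.percentile(a, [25, 50, 75])   # several percentiles at once
row_medians = np.percentile(b, 50, axis=1)   # one value per row
overall = np.percentile(b, 30)               # no axis: the array is flattened

print(quartiles)    # [3. 6. 7.]
print(row_medians)  # [7. 2.]
print(overall)      # 2.5
```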

These six NumPy functions simplify many common data‑processing tasks. The next six come from Pandas.

Pandas is a Python library that provides fast, flexible, and expressive data structures for handling structured (tabular, multidimensional, heterogeneous) and time‑series data.

Six efficient Pandas functions

1. read_csv(nrows=n) – Reads only the first *n* rows of a CSV file, saving memory and time for large datasets.

import io
import requests
import pandas as pd
url = "https://raw.github.com/vincentarelbundock/Rdatasets/master/csv/datasets/AirPassengers.csv"
# requests downloads the whole file; nrows only limits how many rows Pandas parses
s = requests.get(url).content
# read only the first 10 rows
df = pd.read_csv(io.StringIO(s.decode('utf-8')), nrows=10, index_col=0)
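
The example above needs network access. The sketch below shows the same idea without it, using a generated in-memory CSV as a stand-in for a large file (the column names `id` and `value` are made up). It also shows `chunksize`, a related read_csv() option for processing a file piece by piece instead of stopping after the first rows.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk.
csv_text = "id,value\n" + "\n".join(f"{i},{i * i}" for i in range(1000))

# Parse only the first 10 data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=10, index_col=0)

# chunksize makes read_csv yield DataFrames of at most 250 rows each,
# so the full table never has to sit in memory at once.
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=250):
    total += int(chunk["value"].sum())

print(head.shape)  # (10, 1)
```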

2. map() – Maps each value in a Series to another value using a function, dict, or another Series.

import numpy as np
dframe = pd.DataFrame(np.random.randn(4, 3), columns=list('ABC'), index=['India','USA','China','Russia'])
# Format each value in column A as a string with two decimals
changefn = lambda x: "%.2f" % x
dframe['A'].map(changefn)
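
map() works with a dict as well as a function. In the sketch below the lookup table is invented purely for illustration; the point is that any value without a matching key becomes NaN rather than raising an error.

```python
import pandas as pd

countries = pd.Series(["India", "USA", "China", "Russia"])

# Hypothetical lookup table, deliberately missing an entry for "Russia".
continent = {"India": "Asia", "China": "Asia", "USA": "North America"}

mapped = countries.map(continent)
print(mapped.isna().tolist())  # [False, False, False, True]
```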

3. apply() – Applies a user‑defined function along an axis of the DataFrame.

# Range (max - min) of each column; pass axis=1 to compute it per row
fn = lambda x: x.max() - x.min()
dframe.apply(fn)
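
Because dframe holds random numbers, the result above differs on each run. The following sketch uses a small fixed frame (the values are arbitrary) to show how the axis argument changes what the function receives: one column at a time by default, one row at a time with axis=1.

```python
import pandas as pd

frame = pd.DataFrame(
    {"A": [1.0, 4.0, 2.0], "B": [10.0, 7.0, 9.0]},
    index=["x", "y", "z"],
)

def value_range(series):
    # Receives one column (axis=0) or one row (axis=1) as a Series.
    return series.max() - series.min()

per_column = frame.apply(value_range)        # A: 3.0, B: 3.0
per_row = frame.apply(value_range, axis=1)   # x: 9.0, y: 3.0, z: 7.0
print(per_row.tolist())  # [9.0, 3.0, 7.0]
```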

4. isin() – Filters rows where column values belong to a specified set.

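# df is the AirPassengers frame loaded in the read_csv() example
filter1 = df["value"].isin([112])
filter2 = df["time"].isin([1949.0])
# Keep only the rows that satisfy both conditions
df[filter1 & filter2]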
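
isin() returns a Boolean mask, so it can be negated with ~ to exclude values instead of selecting them. A self-contained sketch on a toy frame (the numbers are made up, not taken from the AirPassengers data):

```python
import pandas as pd

# Toy data for illustration only.
df_demo = pd.DataFrame({
    "time": [1949.0, 1949.5, 1950.0, 1950.5],
    "value": [112, 148, 115, 170],
})

wanted = df_demo["value"].isin([112, 170])

selected = df_demo[wanted]    # rows whose value is in the list
excluded = df_demo[~wanted]   # every other row

print(selected["time"].tolist())  # [1949.0, 1950.5]
print(excluded["time"].tolist())  # [1949.5, 1950.0]
```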

5. copy() – Creates a deep copy of a Pandas object to avoid unintended side‑effects when modifying data.

data = pd.Series(['India','Pakistan','China','Mongolia'])
new = data.copy()
new[1] = 'Changed value'
print(new)   # row 1 now reads 'Changed value'
print(data)  # the original still reads 'Pakistan'
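
The value of copy() is clearest next to plain assignment, which only creates a second name for the same object. A minimal sketch of the contrast:

```python
import pandas as pd

data = pd.Series(["India", "Pakistan", "China", "Mongolia"])

alias = data               # no copy: both names refer to the same Series
independent = data.copy()  # deep copy by default

independent[1] = "Changed value"
print(data[1])  # Pakistan (the original is untouched)

alias[2] = "Also changed"
print(data[2])  # Also changed (the alias writes through to the original)
```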

6. select_dtypes() – Returns a subset of columns based on their data types.

# Keep only the float64 columns of df
framex = df.select_dtypes(include="float64")
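
select_dtypes() takes an exclude argument too, and accepts the generic "number" selector for all numeric columns. A sketch on a small mixed-type frame whose column names are invented for the example:

```python
import pandas as pd

mixed = pd.DataFrame({
    "name": ["a", "b"],
    "count": [1, 2],
    "ratio": [0.5, 1.5],
})

floats = mixed.select_dtypes(include="float64")
numeric = mixed.select_dtypes(include="number")
non_numeric = mixed.select_dtypes(exclude="number")

print(list(floats.columns))       # ['ratio']
print(list(numeric.columns))      # ['count', 'ratio']
print(list(non_numeric.columns))  # ['name']
```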

Finally, pivot_table() deserves a mention: it builds Excel‑style pivot tables by aggregating values across row and column keys.
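
The original gives no code for pivot_table(), so the following is only an illustrative sketch on made-up sales data. It sums `units` for each combination of `region` and `product`, filling empty cells with 0.

```python
import pandas as pd

# Made-up data for illustration only.
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South", "South"],
    "product": ["A", "B", "A", "A", "B"],
    "units": [10, 5, 7, 3, 8],
})

table = sales.pivot_table(
    index="region",      # one row per region
    columns="product",   # one column per product
    values="units",
    aggfunc="sum",       # combine duplicate (region, product) pairs by summing
    fill_value=0,
)

print(table.loc["South", "A"])  # 10
```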

All code examples are available in a Jupyter Notebook in the GitHub repository:

https://github.com/kunaldhariwal/12-Amazing-Pandas-NumPy-Functions
