
Introduction to NumPy and Pandas: Fundamentals, Operations, and Data Handling in Python

This article provides a comprehensive overview of NumPy and pandas, covering ndarray basics, multi‑dimensional array creation, core array attributes, broadcasting, random number generation, reshaping, as well as pandas Series and DataFrame structures, data import/export, grouping, merging, and advanced data manipulation techniques for scientific and data‑analysis tasks.

Python Programming Learning Circle

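NumPy is an open-source Python library built around the ndarray, an N-dimensional array whose elements all share one data type. Because of that uniform layout, ndarrays are generally more compact in memory and faster to read, write, and compute on than native Python lists, which enables vectorized numerical computation in a style similar to MATLAB.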

Arrays can have any number of dimensions; one, two, and three are the most common. A one-dimensional array is created with np.array([1, 2, 3]), while two-dimensional data is built by passing nested lists to np.array and can later be converted to a pandas DataFrame for tabular analysis.
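A minimal sketch of the array creation described above. The variable names and the column labels "a", "b", "c" are illustrative assumptions, not taken from the original article.

```python
import numpy as np
import pandas as pd

# One-dimensional array from a flat list
one_d = np.array([1, 2, 3])

# Two-dimensional array from nested lists (2 rows, 3 columns)
two_d = np.array([[1, 2, 3], [4, 5, 6]])

# Three-dimensional array of zeros with shape (2, 3, 4)
three_d = np.zeros((2, 3, 4))

# Wrap the 2-D array in a DataFrame; column labels are hypothetical
df = pd.DataFrame(two_d, columns=["a", "b", "c"])

print(one_d.ndim, two_d.ndim, three_d.ndim)
print(df)
```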

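Key ndarray attributes include shape, dtype, size, and ndim. Common creation functions are np.arange(), np.ones(), np.zeros(), np.full(), np.eye(), and np.ones_like(). Most of them accept a shape and an optional dtype; np.arange() takes start, stop, and step values, np.eye() always produces a two-dimensional array, and np.ones_like() copies the shape and dtype of an existing array.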
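The attributes and creation functions listed above can be sketched as follows. The shapes and fill values are arbitrary choices for illustration.

```python
import numpy as np

# arange produces 0..11; reshape arranges them as 3 rows by 4 columns
arr = np.arange(12, dtype=np.int64).reshape(3, 4)

print(arr.shape)  # tuple of dimension lengths
print(arr.dtype)  # element type
print(arr.size)   # total number of elements
print(arr.ndim)   # number of dimensions

ones = np.ones((2, 3))                   # float array filled with 1.0
zeros = np.zeros((2, 3), dtype=np.int32) # integer array filled with 0
full = np.full((2, 2), 7)                # every element set to 7
eye = np.eye(3)                          # 3x3 identity matrix
like = np.ones_like(arr)                 # same shape and dtype as arr, filled with 1
```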

NumPy supports broadcasting, which lets element-wise arithmetic combine arrays of different but compatible shapes, as well as statistical functions such as np.sum(array, axis=0). Random numbers are generated with np.random.rand() (uniform on [0, 1)) and np.random.randn() (standard normal), and np.random.seed() makes the sequence reproducible. Arrays can be reshaped with reshape(), which returns a view of the same data when possible and a copy otherwise, or with the ndarray method resize(), which modifies the array in place. Axis swapping (swapaxes(0, 1)) and flattening (flatten(), which always returns a copy) are also demonstrated.
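A short sketch of these operations, using small made-up arrays. It assumes the legacy np.random functions named in the text; the resize call keeps the element count unchanged so it only changes the shape.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([10, 20, 30])

# Broadcasting: the 1-D row is applied to every row of the 2-D matrix
broadcast_sum = matrix + row

# axis=0 sums down the columns, axis=1 sums across each row
col_totals = np.sum(matrix, axis=0)
row_totals = np.sum(matrix, axis=1)

# Re-seeding with the same value reproduces the same random draws
np.random.seed(0)
first = np.random.rand(3)      # uniform on [0, 1)
np.random.seed(0)
second = np.random.rand(3)
normal = np.random.randn(2, 2) # standard normal samples

# reshape on a contiguous array returns a view: writes reach the original
base = np.arange(6)
reshaped = base.reshape(2, 3)
reshaped[0, 0] = 99

# ndarray.resize changes the array itself and returns None
owned = np.arange(6)
owned.resize((3, 2))

swapped = reshaped.swapaxes(0, 1)  # shape (2, 3) becomes (3, 2)
flat = reshaped.flatten()          # 1-D copy, independent of reshaped
```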

pandas introduces two primary data structures: Series (one-dimensional, similar to a NumPy array but with a labeled index) and DataFrame (a two-dimensional table with labeled rows and columns). Series can be created from ndarrays, dictionaries, lists, or scalars, and support indexing, slicing, the get() method with default values, and vectorized operations. DataFrames are built from dictionaries, lists of dictionaries, or ndarrays, with optional custom column labels and index values.
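The construction paths mentioned above might look like this. All labels and values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Series from an ndarray, a dict, a list, and a scalar
s_arr = pd.Series(np.array([1.0, 2.0, 3.0]), index=["a", "b", "c"])
s_dict = pd.Series({"x": 10, "y": 20})
s_list = pd.Series([5, 6, 7])
s_scalar = pd.Series(4, index=["p", "q"])  # scalar repeated per label

by_label = s_arr["b"]          # label-based lookup
head = s_arr.iloc[:2]          # positional slice of the first two items
missing = s_arr.get("z", -1.0) # default returned when the label is absent
doubled = s_arr * 2            # vectorized arithmetic

# DataFrame from a dict of columns
df_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# DataFrame from a list of dicts; keys missing in a row become NaN
df_records = pd.DataFrame([{"a": 1, "b": 2}, {"a": 3}])

# DataFrame from an ndarray with custom column and index labels
df_array = pd.DataFrame(
    np.arange(6).reshape(2, 3),
    columns=["c1", "c2", "c3"],
    index=["r1", "r2"],
)
```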

Time-series functionality is provided by pd.date_range(), which generates a DatetimeIndex from a configurable start, end, number of periods, frequency, and timezone. Frequency conversion is performed with asfreq(), and a Series indexed by dates is selected and sliced like any other Series.
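A small sketch of date ranges and frequency conversion. The dates, the daily frequency, and the UTC timezone are arbitrary assumptions chosen for the example.

```python
import pandas as pd

# Six consecutive days starting on 2024-01-01
days = pd.date_range(start="2024-01-01", periods=6, freq="D")

# The same range, made timezone-aware
days_utc = pd.date_range(start="2024-01-01", periods=6, freq="D", tz="UTC")

# A Series indexed by dates
ts = pd.Series(range(6), index=days)

# asfreq keeps only the timestamps that fall on the new 2-day grid
every_other = ts.asfreq("2D")

# Date strings work as labels; slices include both endpoints
single = ts["2024-01-02"]
window = ts["2024-01-02":"2024-01-04"]
```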

Data import and export are illustrated using pandas methods: read_csv() , to_csv() , read_excel() , to_excel() , to_json() , to_hdf() , and to_sql() for SQLite databases. An example shows retrieving financial data with the TuShare library and saving reports, profit, and operation data to CSV, Excel, JSON, HDF, and a SQLite database.
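The round trip below is a hedged sketch of the import/export calls, not the article's TuShare example. The table contents and the names report, report.csv, and report.json are hypothetical. Excel and HDF output are left out because they need optional packages, and TuShare is omitted because it requires network access.

```python
import sqlite3
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for downloaded financial data
report = pd.DataFrame({"code": ["AAA", "BBB"], "profit": [1.5, 2.5]})

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # CSV round trip; index=False avoids writing the row index as a column
    csv_path = tmp_path / "report.csv"
    report.to_csv(csv_path, index=False)
    from_csv = pd.read_csv(csv_path)

    # JSON round trip using one object per row
    json_path = tmp_path / "report.json"
    report.to_json(json_path, orient="records")
    from_json = pd.read_json(json_path, orient="records")

# SQLite round trip using an in-memory database
conn = sqlite3.connect(":memory:")
try:
    report.to_sql("report", conn, index=False, if_exists="replace")
    from_sql = pd.read_sql("SELECT * FROM report", conn)
finally:
    conn.close()
```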

Data manipulation techniques include selecting columns, adding new columns via df['new_col'] = pd.Series(...) (values are aligned on the index), deleting columns with del df['col'], and combining DataFrames using pd.concat() or merge(). Grouping and aggregation are performed with groupby(), followed by count(), sum(), or describe() for descriptive statistics.
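These steps can be sketched on a toy table. The column names team, score, tmp, and bonus are made up for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["a", "a", "b"], "score": [1, 2, 3], "tmp": [0, 0, 0]}
)

# Select one column (a Series) or several (a DataFrame)
scores = df["score"]
subset = df[["team", "score"]]

# Add a column; the Series is aligned to df by index label
df["bonus"] = pd.Series([10, 20, 30])

# Delete a column in place
del df["tmp"]

# Stack another frame underneath and renumber the rows
extra = pd.DataFrame({"team": ["b"], "score": [4], "bonus": [40]})
combined = pd.concat([df, extra], ignore_index=True)

# Group by team, then aggregate the score column
grouped = combined.groupby("team")
counts = grouped["score"].count()
totals = grouped["score"].sum()
stats = grouped["score"].describe()
```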

Advanced pandas operations cover combining datasets, joining multiple tables, and using SQL-like queries for complex analyses. Together with NumPy, pandas provides a powerful toolkit for scientific computing and large-scale data processing.
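As a rough illustration of SQL-style joins and filters, the sketch below uses two hypothetical tables, orders and customers. DataFrame.query is used here as one way to express a WHERE-like filter; the original article may have used a different approach.

```python
import pandas as pd

orders = pd.DataFrame(
    {
        "order_id": [1, 2, 3],
        "customer_id": [10, 20, 30],
        "amount": [5.0, 7.5, 2.0],
    }
)
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ann", "Bob"]})

# INNER JOIN: only orders whose customer_id exists in customers
inner = orders.merge(customers, on="customer_id", how="inner")

# LEFT JOIN: every order is kept; unmatched names become NaN
left = orders.merge(customers, on="customer_id", how="left")

# WHERE-like filter written as an expression string
big = inner.query("amount > 6")

# GROUP BY name with SUM(amount)
per_customer = inner.groupby("name")["amount"].sum()
```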

Written by

Python Programming Learning Circle

A global community of Chinese Python developers offering technical articles, columns, original video tutorials, and problem sets. Topics include web full‑stack development, web scraping, data analysis, natural language processing, image processing, machine learning, automated testing, DevOps automation, and big data.
