
Export MongoDB Data to CSV, Excel, JSON, Parquet with mongo2file – A Complete Guide

This article introduces the mongo2file library for converting MongoDB collections into various table formats such as CSV, Excel, JSON, pickle, feather, and parquet, explains its PyArrow dependency, shows installation and quick‑start code, discusses performance bottlenecks, and provides a full reference API.

Python Crawling & Data Mining

Introduction

The mongo2file library converts MongoDB databases into table files. It simplifies bulk data export, eliminating the need to write custom scripts for each format.

Dependency on PyArrow

mongo2file relies on the PyArrow library, which provides the Python bindings for the C++ implementation of Apache Arrow. At the time of writing, PyArrow supported Python 3.7, 3.8, 3.9, and 3.10.

Installation

pip install mongo2file

Basic Usage

Quick Start

import os
from mongo2file import MongoEngine

# Connection settings are read from environment variables, with local defaults.
M = MongoEngine(
    host=os.getenv('MONGO_HOST', '127.0.0.1'),
    port=int(os.getenv('MONGO_PORT', 27017)),
    username=os.getenv('MONGO_USERNAME', None),
    password=os.getenv('MONGO_PASSWORD', None),
    database=os.getenv('MONGO_DATABASE', 'test_'),
    collection=os.getenv('MONGO_COLLECTION', 'test_')
)

def to_csv():
    result_ = M.to_csv()
    assert "successfully" in result_

def to_excel():
    result_ = M.to_excel()
    assert "successfully" in result_

def to_json():
    result_ = M.to_json()
    assert "successfully" in result_

def to_pickle():
    result_ = M.to_pickle()
    assert "successfully" in result_

def to_feather():
    result_ = M.to_feather()
    assert "successfully" in result_

def to_parquet():
    result_ = M.to_parquet()
    assert "successfully" in result_

to_csv()

The MongoEngine class can target a specific collection or, if no collection is specified, export all collections in the database.
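Because the six exporters share one calling pattern, they can be driven from a single loop. The sketch below assumes only what the quick start shows: each `to_<format>()` method can be called without arguments and returns a status string containing "successfully". The helper name `export_formats` is hypothetical and not part of mongo2file.

```python
def export_formats(engine, formats=("csv", "json", "parquet")):
    """Call engine.to_<fmt>() for each format and collect the status strings.

    `engine` is assumed to be a mongo2file MongoEngine instance, or any object
    exposing the same to_csv/to_json/... methods. Illustrative helper only.
    """
    results = {}
    for fmt in formats:
        exporter = getattr(engine, f"to_{fmt}", None)
        if exporter is None:
            raise ValueError(f"unsupported format: {fmt}")
        results[fmt] = exporter()
    return results
```

With the `M` object from the quick start, `export_formats(M, ("csv", "excel"))` would run both exports in turn and return their status strings keyed by format.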

Method Parameters

query: dictionary of query conditions; effective only for single‑table export.
folder_path: directory where files are saved.
filename: name of the exported file; defaults to collection_name + timestamp.
_id: whether to include the MongoDB _id field (default False).
limit: maximum number of rows to export (default -1 for no limit).
is_block: enable block‑wise export for large datasets.
block_size: size of each block when is_block is True.
mode: Excel export mode (sheet or xlsx) used when is_block is True.
ignore_error: ignore non‑serializable data; may affect performance.
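To make the parameters concrete, here is a sketch of assembling the arguments for a filtered, size-limited export. The keyword names (query, folder_path, _id, limit) come from the parameter description above. The helper `build_export_kwargs`, the field name `created_at`, and the folder name are illustrative assumptions, and the commented call at the end presumes the `M` object from the quick start.

```python
from datetime import datetime, timezone


def build_export_kwargs(since, folder_path="exports", limit=-1, include_id=False):
    """Assemble keyword arguments for a single-collection export.

    Hypothetical helper: only the keys are taken from the article;
    the "created_at" field is a made-up example.
    """
    if limit < -1:
        raise ValueError("limit must be -1 (no limit) or a non-negative row count")
    return {
        "query": {"created_at": {"$gte": since}},  # standard MongoDB filter syntax
        "folder_path": folder_path,
        "_id": include_id,
        "limit": limit,
    }


kwargs = build_export_kwargs(datetime(2023, 1, 1, tzinfo=timezone.utc), limit=10_000)
# M.to_csv(**kwargs)  # assumes M is the MongoEngine from the quick start
```

Keeping the arguments in a dictionary makes it easy to reuse the same filter across several export formats.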

Performance Considerations

MongoDB queries are generally fast; the main bottleneck is converting large result sets into in‑memory lists before writing to files. The default xlsxwriter writer loads all data into memory, which can cause severe slowdown for tables with >1 million rows. Using block export with an appropriate block_size and a thread pool improves throughput.
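The block-plus-thread-pool idea can be sketched with the standard library alone. This is not mongo2file's implementation: the function names are hypothetical, and a plain list of dicts stands in for documents already fetched from MongoDB. Each block of `block_size` rows is written to its own CSV file by a worker thread.

```python
import csv
import os
from concurrent.futures import ThreadPoolExecutor


def iter_blocks(rows, block_size):
    """Yield consecutive slices of `rows`, each at most `block_size` long."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, len(rows), block_size):
        yield rows[start:start + block_size]


def write_block(path, block, fieldnames):
    """Write one block of dict rows to a CSV file and return its path."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(block)
    return path


def export_in_blocks(rows, folder_path, basename, block_size, max_workers=4):
    """Split `rows` into blocks and write them concurrently, one file per block."""
    os.makedirs(folder_path, exist_ok=True)
    # Use the union of all keys so every file shares the same header.
    fieldnames = sorted({key for row in rows for key in row})
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(
                write_block,
                os.path.join(folder_path, f"{basename}_{index}.csv"),
                block,
                fieldnames,
            )
            for index, block in enumerate(iter_blocks(rows, block_size))
        ]
        # Collect results in submission order; result() re-raises worker errors.
        return [future.result() for future in futures]
```

Threads tend to help most when the time goes to file I/O; when serialization itself dominates, the gain may be smaller, so block_size and the worker count are worth measuring against real data.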

Recommendations

When writing with xlsxwriter, openpyxl, xlwings, or pandas, illegal characters are filtered out. Avoid storing empty lists [] or empty dicts {} in MongoDB documents.

Store only meaningful data in MongoDB to prevent serialization errors during export.
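One way to act on these recommendations is to clean each document before it reaches a writer. The sketch below is a hypothetical pre-processing step, not part of mongo2file: it drops top-level fields whose value is an empty list or dict and strips ASCII control characters, which Excel writers commonly reject. The function name and the exact character range are assumptions.

```python
import re

# ASCII control characters other than tab, newline, and carriage return.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_document(doc):
    """Return a copy of `doc` without empty containers or control characters."""
    cleaned = {}
    for key, value in doc.items():
        if isinstance(value, (list, dict)) and not value:
            continue  # skip empty [] and {}
        if isinstance(value, str):
            value = _CONTROL_CHARS.sub("", value)
        cleaned[key] = value
    return cleaned
```

The function only inspects top-level fields and leaves the input untouched; nested documents would need a recursive variant.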

Reference API

MongoEngine

MongoEngine(
    host='localhost',
    port=27017,
    username=None,
    password=None,
    database='test_db',
    collection='test_collection'
)

to_csv(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1 (no limit)
:param is_block: enable block export
:param block_size: block size when is_block is True

to_excel(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
:param is_block: enable block export
:param block_size: block size when is_block is True
:param mode: export mode (sheet or xlsx) when is_block is True
:param ignore_error: ignore non‑serializable data, may affect performance

to_json(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
:param is_block: enable block export
:param block_size: block size when is_block is True

to_pickle(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1

to_feather(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1

to_parquet(query, folder_path, filename, ...)

:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1

Conclusion

The mongo2file library provides a convenient way to convert MongoDB collections into CSV, Excel, JSON, pickle, feather, and parquet files, supporting both simple and block‑wise export for large datasets.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
