Export MongoDB Data to CSV, Excel, JSON, Parquet with mongo2file – A Complete Guide
This article introduces the mongo2file library for converting MongoDB collections into various table formats such as CSV, Excel, JSON, pickle, feather, and parquet, explains its PyArrow dependency, shows installation and quick‑start code, discusses performance bottlenecks, and provides a full reference API.
Introduction
The mongo2file library converts MongoDB databases into table files. It simplifies bulk data export, eliminating the need to write custom scripts for each format.
Dependency on PyArrow
mongo2file relies on the PyArrow library, which provides the Python bindings for the C++ Apache Arrow project. PyArrow supports Python 3.7, 3.8, 3.9, and 3.10.
Installation
pip install mongo2file
Basic Usage
Quick Start
import os
from mongo2file import MongoEngine

M = MongoEngine(
    host=os.getenv('MONGO_HOST', '127.0.0.1'),
    port=int(os.getenv('MONGO_PORT', 27017)),
    username=os.getenv('MONGO_USERNAME', None),
    password=os.getenv('MONGO_PASSWORD', None),
    database=os.getenv('MONGO_DATABASE', 'test_'),
    collection=os.getenv('MONGO_COLLECTION', 'test_')
)

def to_csv():
    result_ = M.to_csv()
    assert "successfully" in result_

def to_excel():
    result_ = M.to_excel()
    assert "successfully" in result_

def to_json():
    result_ = M.to_json()
    assert "successfully" in result_

def to_pickle():
    result_ = M.to_pickle()
    assert "successfully" in result_

def to_feather():
    result_ = M.to_feather()
    assert "successfully" in result_

def to_parquet():
    result_ = M.to_parquet()
    assert "successfully" in result_

to_csv()
The MongoEngine class can target a specific collection or, if no collection is specified, export all collections in the database.
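The choice between a single collection and the whole database can be pictured with a small stand-alone sketch. It does not use mongo2file itself: `resolve_targets` is a hypothetical helper, and the list of names stands in for whatever the database reports (for example via pymongo's `list_collection_names()`).

```python
def resolve_targets(available, collection=None):
    """Return the collections to export: the named one, or all of them."""
    if collection is None:
        # No collection given: every collection in the database is exported.
        return sorted(available)
    if collection not in available:
        raise ValueError(f"unknown collection: {collection!r}")
    return [collection]


names = ["orders", "users", "logs"]
all_targets = resolve_targets(names)
one_target = resolve_targets(names, "users")
print(all_targets)
print(one_target)
```

How mongo2file resolves its targets internally may differ; the sketch only illustrates the behaviour described above.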
Method Parameters
query: dictionary of query conditions, effective only for single‑table export.
folder_path: directory where files are saved.
filename: name of the exported file; defaults to collection_name + timestamp.
_id: whether to include the MongoDB _id field (default False).
limit: maximum number of rows to export (default -1 for no limit).
is_block: enable block‑wise export for large datasets.
block_size: size of each block when is_block is True.
mode: Excel export mode (sheet or xlsx) used when is_block is True.
ignore_error: ignore non‑serializable data; may affect performance.
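To make the `_id`, `limit`, and default `filename` semantics concrete, here is a minimal sketch that works on plain dictionaries rather than a live MongoDB connection. The helper names `default_filename` and `prepare_rows` are hypothetical, and the timestamp format is an assumption; the library's own format may differ.

```python
from datetime import datetime
from itertools import islice


def default_filename(collection, now=None):
    """Build '<collection>_<timestamp>'; the timestamp format is assumed."""
    now = now or datetime.now()
    return f"{collection}_{now.strftime('%Y%m%d%H%M%S')}"


def prepare_rows(docs, include_id=False, limit=-1):
    """Yield copies of the documents, applying the _id and limit rules."""
    # A negative limit means "no limit", matching the default of -1.
    selected = docs if limit < 0 else islice(docs, limit)
    for doc in selected:
        row = dict(doc)
        if not include_id:
            row.pop("_id", None)  # drop the MongoDB _id unless requested
        yield row


sample = [
    {"_id": 1, "name": "a"},
    {"_id": 2, "name": "b"},
    {"_id": 3, "name": "c"},
]
rows = list(prepare_rows(sample, include_id=False, limit=2))
name = default_filename("users", datetime(2024, 1, 2, 3, 4, 5))
print(rows)
print(name)
```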
Performance Considerations
MongoDB queries are generally fast; the main bottleneck is converting large result sets into in‑memory lists before writing to files. The default xlsxwriter writer loads all data into memory, which can cause severe slowdown for tables with >1 million rows. Using block export with an appropriate block_size and a thread pool improves throughput.
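The block-export idea can be sketched with the standard library alone. This is an illustration of the technique, not mongo2file's implementation: `iter_blocks`, `write_block`, and `export_in_blocks` are hypothetical names, each block is written to its own CSV file, and a generator stands in for a MongoDB cursor.

```python
import csv
import tempfile
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from pathlib import Path


def iter_blocks(rows, block_size):
    """Yield lists of at most block_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        block = list(islice(iterator, block_size))
        if not block:
            return
        yield block


def write_block(path, fieldnames, block):
    """Write one block to its own CSV file and return the row count."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(block)
    return len(block)


def export_in_blocks(rows, folder, fieldnames, block_size=1000, workers=4):
    """Export rows as numbered CSV files, one per block, via a thread pool."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(write_block, folder / f"part_{i}.csv", fieldnames, block)
            for i, block in enumerate(iter_blocks(rows, block_size))
        ]
        return sum(future.result() for future in futures)


rows = ({"id": i, "value": i * i} for i in range(10))
out_dir = tempfile.mkdtemp()
written = export_in_blocks(rows, out_dir, ["id", "value"], block_size=4)
files = sorted(Path(out_dir).glob("part_*.csv"))
print(written, len(files))
```

This sketch submits every block to the pool up front, so pending blocks can still accumulate in memory; a production version would likely cap the number of in-flight blocks.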
Recommendations
When writing with xlsxwriter, openpyxl, xlwings, or pandas, illegal characters are filtered; avoid storing empty lists [] or empty dicts {} in MongoDB documents.
Store only meaningful data in MongoDB to prevent serialization errors during export.
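One way to pre-clean documents before export is sketched below. It is an assumption-laden illustration rather than what mongo2file does: `clean_value` and `clean_document` are hypothetical helpers, the regular expression covers the ASCII control characters that spreadsheet writers commonly reject, and mapping empty containers to `None` is just one possible policy.

```python
import json
import re

# ASCII control characters, excluding tab, newline, and carriage return
# (assumed pattern for characters that spreadsheet cells commonly reject).
ILLEGAL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def clean_value(value):
    """Return a value that is safe to place in a flat table cell."""
    if isinstance(value, str):
        return ILLEGAL_CHARS.sub("", value)
    if isinstance(value, (list, dict)):
        if not value:
            return None  # treat [] and {} as missing data
        # Serialize nested data; default=str covers non-JSON types.
        return json.dumps(value, ensure_ascii=False, default=str)
    return value


def clean_document(doc):
    """Apply clean_value to every field of a document."""
    return {key: clean_value(val) for key, val in doc.items()}


doc = {"name": "ab\x00c", "tags": [], "meta": {}, "nested": {"k": 1}, "n": 5}
cleaned = clean_document(doc)
print(cleaned)
```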
Reference API
MongoEngine
MongoEngine(
    host='localhost',
    port=27017,
    username=None,
    password=None,
    database='test_db',
    collection='test_collection'
)
to_csv(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1 (no limit)
:param is_block: enable block export
:param block_size: block size when is_block is True
to_excel(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
:param is_block: enable block export
:param block_size: block size when is_block is True
:param mode: export mode (sheet or xlsx) when is_block is True
:param ignore_error: ignore non‑serializable data, may affect performance
to_json(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
:param is_block: enable block export
:param block_size: block size when is_block is True
to_pickle(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
to_feather(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
to_parquet(query, folder_path, filename, ...)
:param query: database query dict, only for single‑table export
:param folder_path: export directory
:param filename: export file name
:param _id: export _id, default False
:param limit: limit rows, default -1
Conclusion
The mongo2file library provides a convenient way to convert MongoDB collections into CSV, Excel, JSON, pickle, feather, and parquet files, supporting both simple and block‑wise export for large datasets.
Python Crawling & Data Mining