Export MongoDB Data to CSV, Excel, JSON and More with mongo2file
This article introduces the mongo2file Python library that converts MongoDB collections into various table formats such as CSV, Excel, JSON, Pickle, Feather, and Parquet, explains its PyArrow dependency, shows installation and usage examples, discusses performance bottlenecks, and provides API reference details.
Introduction
Hello, I am Wu. I am sharing a library I developed called mongo2file that converts data from a MongoDB database into table files. It can export to CSV, Excel, JSON, as well as binary formats like Pickle, Feather, and Parquet.
Dependency on PyArrow
mongo2file relies on the PyArrow library, the Python implementation of the C++ Arrow project. PyArrow currently supports Python 3.7, 3.8, 3.9, and 3.10. On Windows you may need to install Visual Studio 2015 for it to import properly.
Warning: PyArrow currently only supports the win64 (64‑bit) platform.
Supported Export Formats
Besides the common csv, excel, and json formats, mongo2file also supports exporting binary compressed files such as pickle, feather, and parquet, which reduce read time by serializing data.
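The advantage of a binary format like pickle over a text format like JSON can be seen with the standard library alone: pickled Python objects round-trip with their types intact, while JSON requires converting values to text and loses type information on reload. A minimal sketch (the sample rows are hypothetical, not mongo2file output):

```python
import json
import pickle
from datetime import datetime

# A few rows shaped like exported MongoDB documents (hypothetical data).
rows = [
    {"name": "alice", "score": 91, "created": datetime(2022, 5, 1)},
    {"name": "bob", "score": 78, "created": datetime(2022, 5, 2)},
]

# Pickle serializes Python objects directly: types round-trip intact,
# and reloading skips any text parsing.
blob = pickle.dumps(rows)
restored = pickle.loads(blob)
assert restored[0]["created"] == datetime(2022, 5, 1)

# JSON must render every value as text first; types are lost on reload.
text = json.dumps(rows, default=str)
reparsed = json.loads(text)
assert isinstance(reparsed[0]["created"], str)  # datetime became a string
```

Feather and Parquet extend the same idea to columnar storage via PyArrow, which is why they reload faster than re-parsing CSV or JSON.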
Installation
```shell
pip install mongo2file
```

Basic Usage – Quick Start
```python
import os

from mongo2file import MongoEngine

M = MongoEngine(
    host=os.getenv('MONGO_HOST', '127.0.0.1'),
    port=int(os.getenv('MONGO_PORT', 27017)),
    username=os.getenv('MONGO_USERNAME', None),
    password=os.getenv('MONGO_PASSWORD', None),
    database=os.getenv('MONGO_DATABASE', 'test_'),
    collection=os.getenv('MONGO_COLLECTION', 'test_')
)

def to_csv():
    result_ = M.to_csv()
    assert "successfully" in result_

def to_excel():
    result_ = M.to_excel()
    assert "successfully" in result_

def to_json():
    result_ = M.to_json()
    assert "successfully" in result_

def to_pickle():
    result_ = M.to_pickle()
    assert "successfully" in result_

def to_feather():
    result_ = M.to_feather()
    assert "successfully" in result_

def to_parquet():
    result_ = M.to_parquet()
    assert "successfully" in result_

to_csv()
```

When the MongoEngine instance specifies a collection name, the export methods operate on that collection. If only a database name is provided, all collections in the database are exported.
Performance Bottlenecks and Improvements
MongoDB queries are generally fast; the main bottleneck is converting the retrieved data into large in‑memory lists before writing to files. For very large datasets (e.g., 1 million rows), single‑threaded export can become slow.
Using xlsxwriter writes the entire Excel file eagerly, loading all data into memory.
Large‑scale inserts depend on the host machine’s resources.
Improvements made:
When data volume is huge, split the collection into blocks and export multiple files.
Increase the thread‑pool’s maximum concurrency and adjust block_size for optimal performance.
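The blocked, concurrent export strategy described above can be sketched with the standard library. This is an illustrative simplification, not mongo2file's actual internals; the `export_in_blocks` helper and its parameter names are invented for this example:

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def export_in_blocks(rows, folder, block_size=1000, max_workers=4):
    """Split rows into fixed-size blocks and write each block to its own
    CSV file using a thread pool (illustrative sketch only)."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

    def write_block(index_block):
        index, block = index_block
        path = os.path.join(folder, f"block_{index}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=block[0].keys())
            writer.writeheader()
            writer.writerows(block)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(write_block, enumerate(blocks)))

# 2500 rows with block_size=1000 produce 3 output files.
rows = [{"_id": i, "value": i * 2} for i in range(2500)]
folder = tempfile.mkdtemp()  # caller is responsible for cleanup
paths = export_in_blocks(rows, folder, block_size=1000)
print(len(paths))  # 3
```

Tuning `block_size` against the thread pool's worker count is the same trade-off the library exposes through its `is_block` and `block_size` parameters.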
Data Conversion Recommendations
Engines such as xlsxwriter, openpyxl, xlwings, and pandas filter illegal characters before writing, but empty lists [] or empty dicts {} stored in documents can still trigger "illegal type" errors.
Avoid storing meaningless empty objects in MongoDB documents; keep the data clean to prevent export failures.
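One way to keep exports clean is to strip empty objects from each document before writing. The `drop_empty_fields` helper below is an illustrative pre-processing step, not part of mongo2file's API:

```python
def drop_empty_fields(doc):
    """Remove keys whose values are empty lists or empty dicts,
    since such values can break Excel export engines."""
    return {k: v for k, v in doc.items() if v not in ([], {})}

doc = {"name": "item-1", "tags": [], "meta": {}, "price": 9.9}
cleaned = drop_empty_fields(doc)
print(cleaned)  # {'name': 'item-1', 'price': 9.9}
```

Values like 0, False, or the empty string are kept; only empty containers are dropped.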
Reference API
MongoEngine
```python
MongoEngine(
    host='localhost',
    port=27017,
    username=None,
    password=None,
    database='test_db',
    collection='test_collection_200000'
)
```

to_csv(query, folder_path, filename, ...)
:param query: database query condition (dict), applies to single‑collection export
:param folder_path: output directory
:param filename: output file name (default: collection name + timestamp)
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: size of each block when is_block is True

to_excel(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: block size when is_block is True
:param mode: export mode, either "sheet" or "xlsx" (effective when is_block is True)
:param ignore_error: ignore non‑serializable types (may affect performance)

to_json(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: block size when is_block is True

to_pickle(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

to_feather(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

to_parquet(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

Conclusion
The mongo2file library provides a convenient way to convert MongoDB data into various table formats, supporting CSV, Excel, JSON, as well as binary formats like Pickle, Feather, and Parquet. Users are encouraged to try it and reach out for any issues.
