Export MongoDB Data to CSV, Excel, JSON and More with mongo2file
This article introduces the mongo2file Python library that converts MongoDB collections into various table formats such as CSV, Excel, JSON, Pickle, Feather, and Parquet, explains its PyArrow dependency, shows installation and usage examples, discusses performance bottlenecks, and provides API reference details.
Introduction
Hello, I am Wu. I am sharing a library I developed called mongo2file that converts data from a MongoDB database into table files. It can export to CSV, Excel, JSON, as well as binary formats like Pickle, Feather, and Parquet.
Dependency on PyArrow
mongo2file relies on the PyArrow library, the Python implementation of the C++ Arrow project. PyArrow currently supports Python 3.7, 3.8, 3.9, and 3.10. On Windows you may need to install Visual Studio 2015 for it to import properly.
Warning: PyArrow currently only supports the win64 (64‑bit) platform.
Supported Export Formats
Besides the common csv, excel, and json formats, mongo2file also supports exporting binary compressed files such as pickle, feather, and parquet, which reduce read time by serializing data.
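The advantage of a binary format like pickle over a text format like JSON can be seen with the standard library alone: pickled Python objects round-trip with their types intact, while JSON requires converting values to text and loses type information on reload. A minimal sketch (the sample rows are hypothetical, not mongo2file output):

```python
import json
import pickle
from datetime import datetime

# A few rows shaped like exported MongoDB documents (hypothetical data).
rows = [
    {"name": "alice", "score": 91, "created": datetime(2022, 5, 1)},
    {"name": "bob", "score": 78, "created": datetime(2022, 5, 2)},
]

# Pickle serializes Python objects directly: types round-trip intact,
# and reloading skips any text parsing.
blob = pickle.dumps(rows)
restored = pickle.loads(blob)
assert restored[0]["created"] == datetime(2022, 5, 1)

# JSON must render every value as text first; types are lost on reload.
text = json.dumps(rows, default=str)
reparsed = json.loads(text)
assert isinstance(reparsed[0]["created"], str)  # datetime became a string
```

Feather and Parquet extend the same idea to columnar storage via PyArrow, which is why they reload faster than re-parsing CSV or JSON.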
Installation
```shell
pip install mongo2file
```

Basic Usage – Quick Start
```python
import os

from mongo2file import MongoEngine

M = MongoEngine(
    host=os.getenv('MONGO_HOST', '127.0.0.1'),
    port=int(os.getenv('MONGO_PORT', 27017)),
    username=os.getenv('MONGO_USERNAME', None),
    password=os.getenv('MONGO_PASSWORD', None),
    database=os.getenv('MONGO_DATABASE', 'test_'),
    collection=os.getenv('MONGO_COLLECTION', 'test_')
)

def to_csv():
    result_ = M.to_csv()
    assert "successfully" in result_

def to_excel():
    result_ = M.to_excel()
    assert "successfully" in result_

def to_json():
    result_ = M.to_json()
    assert "successfully" in result_

def to_pickle():
    result_ = M.to_pickle()
    assert "successfully" in result_

def to_feather():
    result_ = M.to_feather()
    assert "successfully" in result_

def to_parquet():
    result_ = M.to_parquet()
    assert "successfully" in result_

to_csv()
```

When the MongoEngine instance specifies a collection name, the export methods operate on that collection. If only a database name is provided, all collections in the database are exported.
Performance Bottlenecks and Improvements
MongoDB queries are generally fast; the main bottleneck is converting the retrieved data into large in‑memory lists before writing to files. For very large datasets (e.g., 1 million rows), single‑threaded export can become slow.
Using xlsxwriter writes the entire Excel file eagerly, loading all data into memory.
Large‑scale inserts depend on the host machine’s resources.
Improvements made:
When data volume is huge, split the collection into blocks and export multiple files.
Increase the thread‑pool’s maximum concurrency and adjust block_size for optimal performance.
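The blocked, concurrent export strategy described above can be sketched with the standard library. This is an illustrative simplification, not mongo2file's actual internals; the `export_in_blocks` helper and its parameter names are invented for this example:

```python
import csv
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def export_in_blocks(rows, folder, block_size=1000, max_workers=4):
    """Split rows into fixed-size blocks and write each block to its own
    CSV file using a thread pool (illustrative sketch only)."""
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]

    def write_block(index_block):
        index, block = index_block
        path = os.path.join(folder, f"block_{index}.csv")
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=block[0].keys())
            writer.writeheader()
            writer.writerows(block)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(write_block, enumerate(blocks)))

# 2500 rows with block_size=1000 produce 3 output files.
rows = [{"_id": i, "value": i * 2} for i in range(2500)]
folder = tempfile.mkdtemp()  # caller is responsible for cleanup
paths = export_in_blocks(rows, folder, block_size=1000)
print(len(paths))  # 3
```

Tuning `block_size` against the thread pool's worker count is the same trade-off the library exposes through its `is_block` and `block_size` parameters.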
Data Conversion Recommendations
Engines such as xlsxwriter, openpyxl, xlwings, and pandas filter illegal characters before writing, but empty lists [] or empty dicts {} stored in documents can still trigger "illegal type" errors.
Avoid storing meaningless empty objects in MongoDB documents; keep the data clean to prevent export failures.
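One way to keep exports clean is to strip empty objects from each document before writing. The `drop_empty_fields` helper below is an illustrative pre-processing step, not part of mongo2file's API:

```python
def drop_empty_fields(doc):
    """Remove keys whose values are empty lists or empty dicts,
    since such values can break Excel export engines."""
    return {k: v for k, v in doc.items() if v not in ([], {})}

doc = {"name": "item-1", "tags": [], "meta": {}, "price": 9.9}
cleaned = drop_empty_fields(doc)
print(cleaned)  # {'name': 'item-1', 'price': 9.9}
```

Values like 0, False, or the empty string are kept; only empty containers are dropped.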
Reference API
MongoEngine
```python
MongoEngine(
    host='localhost',
    port=27017,
    username=None,
    password=None,
    database='test_db',
    collection='test_collection_200000'
)
```

to_csv(query, folder_path, filename, ...)
:param query: database query condition (dict), applies to single‑collection export
:param folder_path: output directory
:param filename: output file name (default: collection name + timestamp)
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: size of each block when is_block is True

to_excel(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: block size when is_block is True
:param mode: export mode, either "sheet" or "xlsx" (effective when is_block is True)
:param ignore_error: ignore non‑serializable types (may affect performance)

to_json(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export
:param is_block: whether to export in blocks
:param block_size: block size when is_block is True

to_pickle(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

to_feather(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

to_parquet(query, folder_path, filename, ...)
:param query: database query condition (dict)
:param folder_path: output directory
:param filename: output file name
:param _id: whether to export the _id field (default False)
:param limit: maximum number of records to export

Conclusion
The mongo2file library provides a convenient way to convert MongoDB data into various table formats, supporting CSV, Excel, JSON, as well as binary formats like Pickle, Feather, and Parquet. Users are encouraged to try it and reach out for any issues.
