NodeJS: Exporting large MongoDB collections

Recently I implemented a REST endpoint that lets users kick off a background report generation task and download the report once it is ready. The base collection for the report is very large, so we can't simply load everything into memory and process it there; the memory footprint would be too high.

In this post we will see how to export/generate reports from a large MongoDB collection. Specifically, I will build a pipeline that generates a CSV report from a MongoDB collection.

My solution uses a MongoDB cursor. A cursor lets us read documents one by one instead of loading them all into memory at once. My entire report generation pipeline is based on streams.

The pipeline consists of three components:

  • a cursor on the collection
  • a CSV record generator
  • a file sink to write the report to the file system
(Figure: MongoDB report generation pipeline)

Query Cursor

The cursor() method of the Query interface returns a QueryCursor. A QueryCursor implements the Node.js streams3 interface, so we can pass the returned value directly to a stream pipeline.

model.find({ /* some query */ }).cursor()
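As a minimal sketch (the User model module and the { active: true } filter are assumptions for illustration, not from the original post), the cursor can also be consumed one document at a time with for await:

const User = require('./models/user'); // hypothetical Mongoose model

async function previewActiveUsers() {
  // The cursor streams matching documents one by one instead of buffering them all.
  const cursor = User.find({ active: true }).cursor();
  for await (const doc of cursor) {
    console.log(doc.email);
  }
}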

CSV Record Generator

To generate CSV records from JS objects, we can use csv-stringify. It has great support for the streaming API.

const { stringify } = require('csv-stringify');

// Transform stream: JS objects in, CSV lines out.
const csvStream = stringify({
  header: true,
  columns: {
    userId: 'USER_ID',
    email: 'EMAIL',
    gender: 'GENDER',
  },
});
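As a quick standalone check of the column mapping (not part of the final pipeline; the sample values are made up), writing a plain object through the stream produces a header row followed by one CSV record:

csvStream.pipe(process.stdout);
csvStream.write({ userId: 'u-001', email: 'jane@example.com', gender: 'F' });
csvStream.end();
// Output:
// USER_ID,EMAIL,GENDER
// u-001,jane@example.com,F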

File Sink

Here we are saving the report to a local CSV file, but the sink can be any writable stream. We will use the Node.js fs module's createWriteStream method to create a writable stream.

const fs = require('fs');
fs.createWriteStream(path, { autoClose: true });

Example Code Snippet

For example, let's assume we have a test_db database with a user collection. We can generate a report of all active users with the following snippet.
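Here is a minimal sketch of such a snippet that wires the three components together with stream.pipeline. The connection URL, the User model and schema, and the output file name are assumptions made for illustration; adapt them to your setup.

const fs = require('fs');
const mongoose = require('mongoose');
const { pipeline } = require('stream/promises');
const { stringify } = require('csv-stringify');

// Hypothetical model; adjust the schema to match your collection.
const User = mongoose.model('User', new mongoose.Schema({
  userId: String,
  email: String,
  gender: String,
  active: Boolean,
}));

async function generateActiveUsersReport() {
  await mongoose.connect('mongodb://localhost:27017/test_db');

  await pipeline(
    // 1. Cursor: streams matching documents one by one.
    User.find({ active: true }).cursor(),
    // 2. CSV record generator: maps document fields to CSV columns.
    stringify({
      header: true,
      columns: {
        userId: 'USER_ID',
        email: 'EMAIL',
        gender: 'GENDER',
      },
    }),
    // 3. File sink: writes the report to the file system.
    fs.createWriteStream('active-users-report.csv', { autoClose: true })
  );

  await mongoose.disconnect();
}

generateActiveUsersReport().catch(console.error);

Because pipeline() handles backpressure and closes the write stream when the cursor is exhausted, memory usage stays flat regardless of how many documents the query matches.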
