Parse large JSON file in Nodejs and handle each object independently

I need to read a large JSON file (around 630 MB) in Node.js and insert each object into MongoDB.

I've read the answers here: Parse large JSON file in Nodejs.

However, the answers there handle the JSON file line by line instead of object by object, so I still don't know how to get an object from this file and operate on it.

I have about 100,000 objects of this kind in my JSON file.

Data Format:

[
  {
    "id": "0000000",
    "name": "Donna Blak",
    "livingSuburb": "Tingalpa",
    "age": 53,
    "nearestHospital": "Royal Children's Hospital",
    "treatments": {
        "19890803": {
            "medicine": "Stomach flu B",
            "disease": "Stomach flu"
        },
        "19740112": {
            "medicine": "Progeria C",
            "disease": "Progeria"
        },
        "19830206": {
            "medicine": "Poliomyelitis B",
            "disease": "Poliomyelitis"
        }
    },
    "class": "patient"
  },
 ...
]

Cheers,

Alex

Best answer

There is a nice module named stream-json that does exactly what you want.

It can parse JSON files far exceeding available memory.

and

StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually taking care of assembling them automatically.

Here is a modern example that parses an arbitrarily large JSON array. It uses import statements and for await...of, which lets you return or break out of the loop early:

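import fs from 'fs';
import parser from 'stream-json';
// Under native Node.js ESM, this subpath import may need an explicit '.js' extension.
import StreamArray from 'stream-json/streamers/StreamArray';
import Chain from 'stream-chain';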

async function importJson(filePath) {
  const pipeline = new Chain([
    fs.createReadStream(filePath),
    parser(),
    new StreamArray(),
  ]);

  for await (const { value } of pipeline) {
    await doSomethingWith(value);  // the JSON array element
  }
}
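
Since the goal in the question is to insert roughly 100,000 objects into MongoDB, it may be worth grouping elements before writing them rather than issuing one write per element. The following is a minimal sketch and not part of the original answer: inBatches and importInBatches are hypothetical helpers, saveBatch is a placeholder for whatever bulk write you use, and the batch size of 1000 is an arbitrary assumption. It relies only on the fact that the pipeline above is async-iterable and yields { key, value } pairs.

```javascript
// Hypothetical helper (not part of stream-json): groups items from any
// sync or async iterable into arrays of at most `size` elements.
async function* inBatches(source, size) {
  let batch = [];
  for await (const item of source) {
    batch.push(item);
    if (batch.length === size) {
      yield batch;
      batch = [];
    }
  }
  // Emit the final, possibly smaller, batch.
  if (batch.length > 0) yield batch;
}

// `source` would be the Chain pipeline built above; `saveBatch` is a
// placeholder async function that receives an array of array elements.
async function importInBatches(source, saveBatch, size = 1000) {
  let total = 0;
  for await (const batch of inBatches(source, size)) {
    // Each streamed item is a { key, value } pair; keep only the values.
    await saveBatch(batch.map(({ value }) => value));
    total += batch.length;
  }
  return total; // number of elements handed to saveBatch
}
```

Because each saveBatch call is awaited before the next batch is pulled, the file is read no faster than the batches can be written.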

2018

Here is a very basic example using stream events:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

// You'll get JSON objects here
// The key is the array index
jsonStream.on('data', ({key, value}) => {
    console.log(key, value);
});

jsonStream.on('end', () => {
    console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);

If you'd like to do something more complex, e.g. process the objects one after another sequentially (keeping the order) and apply some async operation to each of them, you could use a custom Writable stream like this:

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');

const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
    write({key, value}, encoding, callback) {
        // Save to Mongo or do any other async action

        setTimeout(() => {
            console.log(value);
            // The next record is read only after the current one is fully processed
            callback();
        }, 1000);
    },
    // Don't skip this, as we need to operate on objects, not buffers
    objectMode: true
});

// Pipe the streams as follows
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

// Wait for the 'finish' event, which fires when everything is done.
processingStream.on('finish', () => console.log('All done'));
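
If the per-record work is a promise-returning function rather than a setTimeout, the same pattern might look like the sketch below. This is an illustration under assumptions, not the original author's code: sequentialWriter and demo are hypothetical names, handler stands in for something like a database insert, and Readable.from is used as an in-memory stand-in for jsonStream so the snippet runs without stream-json. It uses pipeline from stream/promises (available in recent Node.js versions) so that a failure in any stage rejects a single promise.

```javascript
const { Readable, Writable } = require('stream');
const { pipeline } = require('stream/promises');

// Builds an object-mode Writable that waits for `handler` to settle for
// each record before accepting the next one. `handler` is a hypothetical
// async function, e.g. a database insert.
function sequentialWriter(handler) {
  return new Writable({
    objectMode: true,
    write({ key, value }, _encoding, callback) {
      Promise.resolve()
        .then(() => handler(value, key))
        // Success: signal readiness for the next record.
        // Failure: pass the error on, which fails the stream.
        .then(() => callback(), callback);
    },
  });
}

// Demo with an in-memory source standing in for jsonStream.
async function demo() {
  const seen = [];
  const source = Readable.from([
    { key: 0, value: { id: 'a' } },
    { key: 1, value: { id: 'b' } },
  ]);
  await pipeline(source, sequentialWriter(async (value) => {
    seen.push(value.id);
  }));
  return seen;
}
```

With the real streams, the source would be jsonStream fed by fileStream.pipe(jsonStream.input), as in the example above.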

Please note: the examples above were tested with stream-json@1.1.3. For some earlier versions (presumably prior to 1.0.0) you might have to use:

const StreamArray = require('stream-json/utils/StreamArray');

and then

const jsonStream = StreamArray.make();
