uhop/stream-json

stream-json

stream-json is a micro-library of Node.js stream components for creating custom JSON processing pipelines with a minimal memory footprint. It can parse JSON files far exceeding available memory. Even individual data items (keys, strings, and numbers) can be streamed piece-wise. A SAX-inspired event-based API is included.
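
To make "streamed piece-wise" concrete, here is a minimal stdlib-only sketch of the idea, not the library's own code. It assumes tokens shaped as {name, value} with the names startString, stringChunk, endString, and stringValue; this README does not list token names, so treat them as assumptions. chunkString and packString are hypothetical helpers.

```javascript
// Hypothetical helper: emit a long string as a sequence of small tokens,
// so a consumer never has to hold the whole string at once.
function* chunkString(text, chunkSize) {
  yield {name: 'startString'};
  for (let i = 0; i < text.length; i += chunkSize) {
    yield {name: 'stringChunk', value: text.slice(i, i + chunkSize)};
  }
  yield {name: 'endString'};
}

// Hypothetical helper: "pack" the chunks back into a single value token.
// Packing trades memory for convenience: the whole string is buffered.
function packString(tokens) {
  let buffer = '';
  for (const token of tokens) {
    if (token.name === 'stringChunk') buffer += token.value;
  }
  return {name: 'stringValue', value: buffer};
}

const tokens = [...chunkString('hello world', 4)];
console.log(tokens.length); // 5: start, three chunks, end
console.log(packString(tokens));
```

A consumer that only needs, say, the length of a huge string could count characters per chunk and skip packing entirely.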

Components:

  • Parser — streaming JSON parser producing a SAX-like token stream.
    • Optionally packs keys, strings, and numbers (controlled separately).
    • The main module creates a parser decorated with emit().
  • Filters edit a token stream:
    • Pick — selects matching subobjects, ignoring the rest.
    • Replace — substitutes matching subobjects with a replacement.
    • Ignore — removes matching subobjects entirely.
    • Filter — filters subobjects while preserving the JSON shape.
  • Streamers assemble tokens into JavaScript objects:
    • StreamValues — streams successive JSON values (for JSON Streaming or after pick()).
    • StreamArray — streams elements of a top-level array.
    • StreamObject — streams top-level properties of an object.
  • Essentials:
    • Assembler — reconstructs JavaScript objects from tokens (EventEmitter).
    • Disassembler — converts JavaScript objects into a token stream.
    • Stringer — converts a token stream back into JSON text.
    • Emitter — re-emits tokens as named events.
  • Utilities:
    • emit() — attaches token events to any stream.
    • withParser() — creates parser + component pipelines.
    • Batch — groups items into arrays.
    • Verifier — validates JSON text, pinpoints errors.
    • FlexAssembler — Assembler with custom containers (Map, Set, etc.) at specific paths.
    • Utf8Stream — sanitizes multibyte UTF-8 input.
  • JSONL (JSON Lines / NDJSON):
    • jsonl/Parser — parses JSONL into {key, value} objects. Faster than parser({jsonStreaming: true}) + streamValues() when items fit in memory.
    • jsonl/Stringer — serializes objects to JSONL text. Faster than disassembler() + stringer().
  • JSONC (JSON with Comments):

All components are building blocks for custom data processing pipelines. They can be combined with each other and with custom code via stream-chain.
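
The relationship between a token stream and JavaScript objects (the job of Disassembler and Assembler in the list above) can be sketched without the library. This is an illustration under stated assumptions, not the library's implementation: the token names startObject, endObject, startArray, endArray, and keyValue are assumed, the single primitive token is a simplification made up for this sketch, and disassemble and assemble are hypothetical helpers.

```javascript
// Hypothetical sketch: turn a JavaScript value into a flat sequence of tokens.
function* disassemble(value) {
  if (Array.isArray(value)) {
    yield {name: 'startArray'};
    for (const item of value) yield* disassemble(item);
    yield {name: 'endArray'};
  } else if (value !== null && typeof value === 'object') {
    yield {name: 'startObject'};
    for (const [key, item] of Object.entries(value)) {
      yield {name: 'keyValue', value: key};
      yield* disassemble(item);
    }
    yield {name: 'endObject'};
  } else {
    // Simplification: strings, numbers, booleans, and null share one token type.
    yield {name: 'primitive', value};
  }
}

// Hypothetical sketch: rebuild the value, tracking open containers on a stack.
function assemble(tokens) {
  const stack = [];
  let current = null, key = null, result;
  const place = value => {
    if (current === null) result = value;
    else if (Array.isArray(current)) current.push(value);
    else current[key] = value;
  };
  for (const token of tokens) {
    switch (token.name) {
      case 'startObject':
      case 'startArray': {
        const container = token.name === 'startArray' ? [] : {};
        place(container);
        stack.push({current, key}); // remember the parent container and its key
        current = container;
        break;
      }
      case 'endObject':
      case 'endArray':
        ({current, key} = stack.pop()); // return to the parent container
        break;
      case 'keyValue':
        key = token.value;
        break;
      default:
        place(token.value);
    }
  }
  return result;
}

const original = {a: [1, {b: null}], c: 'x'};
const roundTrip = assemble(disassemble(original));
console.log(JSON.stringify(roundTrip));
```

Because both sides work token by token, a filter can sit between them and drop or rewrite subobjects without either side knowing.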

Distributed under the New BSD license.

Introduction

const {chain} = require('stream-chain');

const {parser} = require('stream-json');
const {pick} = require('stream-json/filters/pick.js');
const {ignore} = require('stream-json/filters/ignore.js');
const {streamValues} = require('stream-json/streamers/stream-values.js');

const fs = require('fs');
const zlib = require('zlib');

const pipeline = chain([
  fs.createReadStream('sample.json.gz'),
  zlib.createGunzip(),
  parser(),
  pick({filter: 'data'}),
  ignore({filter: /\b_meta\b/i}),
  streamValues(),
  data => {
    const value = data.value;
    // keep data only for the accounting department
    return value && value.department === 'accounting' ? data : null;
  }
]);

let counter = 0;
pipeline.on('data', () => ++counter);
pipeline.on('end', () => console.log(`The accounting department has ${counter} employees.`));
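
The last step of the pipeline above is a plain function: returning the item passes it downstream, and returning null drops it. The sketch below runs that same predicate on hand-made items without any streams. The {key, value} item shape and the sample records are assumptions for illustration, and keepAccounting is a name introduced here.

```javascript
// The same predicate as the last pipeline step, extracted so it can run on its own.
const keepAccounting = data => {
  const value = data.value;
  return value && value.department === 'accounting' ? data : null;
};

// Made-up items in an assumed {key, value} shape.
const items = [
  {key: 0, value: {name: 'Ann', department: 'accounting'}},
  {key: 1, value: {name: 'Bob', department: 'sales'}},
  {key: 2, value: null}
];

// Emulate the chain's behavior: items mapped to null are dropped.
const kept = items.map(keepAccounting).filter(item => item !== null);
console.log(kept.length); // 1
```

Keeping the predicate as a standalone function makes it easy to unit-test before wiring it into a real pipeline.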

See the full documentation in the wiki.

Companion projects:

  • stream-csv-as-json streams huge CSV files in a format compatible with stream-json: rows as arrays of string values. If a header row is used, it can stream rows as objects with named fields.
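
The difference between "rows as arrays" and "rows as objects with named fields" can be sketched as follows. This is not stream-csv-as-json's code; rowsToObjects is a hypothetical helper and the rows are made-up data.

```javascript
// Hypothetical helper: treat the first row as a header and
// turn every later row into an object with named fields.
function* rowsToObjects(rows) {
  let header = null;
  for (const row of rows) {
    if (header === null) {
      header = row; // the first row names the fields
      continue;
    }
    const record = {};
    header.forEach((name, index) => {
      record[name] = row[index];
    });
    yield record;
  }
}

// Rows as arrays of string values.
const rows = [
  ['name', 'department'],
  ['Ann', 'accounting'],
  ['Bob', 'sales']
];
const records = [...rowsToObjects(rows)];
console.log(records);
```

Since the helper is a generator, it handles one row at a time and never needs the whole file in memory.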

Installation

npm install --save stream-json
# or: yarn add stream-json

Use

The library is organized as small composable components based on Node.js streams and events. The source code is compact — read it to understand how things work and to build your own components.
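
As an illustration of what a small custom component might look like, here is a sketch built only on Node.js core modules. It turns JSONL text into {key, value} objects, mirroring the output shape described for jsonl/Parser, but it is not that component's implementation. splitLines and jsonlToObjects are hypothetical names.

```javascript
const {Transform} = require('node:stream');
const {StringDecoder} = require('node:string_decoder');

// Hypothetical pure helper: split buffered text into complete lines plus a leftover tail.
function splitLines(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // the last piece may be an incomplete line
  return {lines, rest};
}

// Hypothetical component: a Transform from JSONL text to {key, value} objects.
function jsonlToObjects() {
  const decoder = new StringDecoder('utf8'); // keeps split multibyte characters intact
  let buffer = '';
  let key = 0;
  const emit = (stream, line) => {
    if (line.trim()) stream.push({key: key++, value: JSON.parse(line)});
  };
  return new Transform({
    readableObjectMode: true, // bytes in, objects out
    transform(chunk, _encoding, callback) {
      try {
        const {lines, rest} = splitLines(buffer + decoder.write(chunk));
        buffer = rest;
        for (const line of lines) emit(this, line);
        callback();
      } catch (error) {
        callback(error); // report malformed JSON as a stream error
      }
    },
    flush(callback) {
      try {
        emit(this, buffer + decoder.end()); // a final line without a trailing newline
        callback();
      } catch (error) {
        callback(error);
      }
    }
  });
}

// Usage: the second object arrives split across two writes.
const stream = jsonlToObjects();
stream.on('data', item => console.log(item));
stream.write('{"a":1}\n{"b"');
stream.end(':2}\n');
```

Keeping the line-splitting logic in a pure function makes the chunk-boundary handling testable without creating any streams.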

Bug reports, simplifications, and new generic components are welcome — open a ticket or pull request.

Release History

  • 2.1.0 new: jsonc/Verifier — validates JSONC text with exact error locations. Parser performance improvements (pre-allocated token singletons).
  • 2.0.0 major rewrite: functional API based on stream-chain 3.x, bundled TypeScript definitions. New: JSONC parser/stringer, FlexAssembler. See Migrating from 1.x to 2.x.
  • 1.9.1 fixed a race condition in the Disassembler stream implementation. Thx, Noam Okman.
  • 1.9.0 fixed a slight deviation from the JSON standard. Thx, Peter Burns.
  • 1.8.0 added an option to indicate/ignore JSONL errors. Thx, AK.
  • 1.7.5 fixed a stringer bug with ASCII control symbols. Thx, Kraicheck.

The full history is in the wiki: Release history.
