Node.js Streams and File Handling: Process Large Files Without Running Out of Memory
One of the most important performance concepts in Node.js is streams. When you need to process a 2GB CSV file, read a large HTTP response, or transform data on the fly, streams let you handle it with constant memory usage instead of loading everything into RAM at once.
The Problem with Non-Streaming Approaches
```javascript
import fs from "fs/promises";

// Bad for large files -- loads entire file into memory
const data = await fs.readFile("huge-file.csv", "utf8");
const lines = data.split("\n");
// If file is 2GB, you need 2GB+ of RAM
```
For small files this is fine. For large files it crashes your process or causes severe memory pressure.
What Are Streams?
Streams process data in chunks as it arrives: you start processing before the entire input is available. Node.js has four stream types:
- Readable: source of data (file read, HTTP request body)
- Writable: destination for data (file write, HTTP response)
- Duplex: both readable and writable (TCP socket)
- Transform: a Duplex that modifies data (gzip, encryption)
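As a minimal sketch of the four types working together, using only built-in classes (PassThrough is the identity Transform, and the array sink here is purely illustrative):

```javascript
import { Readable, Writable, PassThrough } from "stream";
import { pipeline } from "stream/promises";

// Readable: any iterable can become a stream
const source = Readable.from(["alpha", "beta", "gamma"]);

// Transform/Duplex: PassThrough forwards data unchanged
const identity = new PassThrough({ objectMode: true });

// Writable: collect each chunk into an array
const collected = [];
const sink = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    collected.push(chunk);
    callback();
  },
});

await pipeline(source, identity, sink);
console.log(collected); // ["alpha", "beta", "gamma"]
```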
Reading Files with Streams
```javascript
import fs from "fs";
import readline from "readline";

// Process a large CSV line by line -- constant memory usage
async function processLargeCSV(filePath) {
  const fileStream = fs.createReadStream(filePath, { encoding: "utf8" });
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  let lineCount = 0;
  let headerRow = null;

  for await (const line of rl) {
    if (lineCount === 0) {
      headerRow = line.split(",");
    } else {
      const values = line.split(",");
      const record = Object.fromEntries(
        headerRow.map((col, i) => [col.trim(), values[i]?.trim()])
      );
      await processRecord(record); // processRecord is your own per-row handler
    }
    lineCount++;
  }

  console.log(`Processed ${lineCount - 1} records`);
}
```
This processes a 10GB file with the same ~50MB memory footprint as a 10KB file.
Piping Streams
The .pipe() method connects a readable stream to a writable stream; data flows through automatically:
```javascript
import fs from "fs";
import zlib from "zlib";

// Compress a large file -- no intermediate buffering
function compressFile(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(inputPath);
    const gzip = zlib.createGzip();
    const writeStream = fs.createWriteStream(outputPath);

    readStream
      .pipe(gzip)        // compress
      .pipe(writeStream) // write to disk
      .on("finish", resolve)
      .on("error", reject); // caveat: only writeStream errors reach this handler
  });
}

await compressFile("large-log.txt", "large-log.txt.gz");
```
stream.pipeline (preferred over .pipe)
.pipe() does not propagate errors well. Use stream.pipeline for production code:
```javascript
import { pipeline } from "stream/promises";
import fs from "fs";
import zlib from "zlib";
import crypto from "crypto";

// Compress and encrypt in one pipeline
const key = crypto.randomBytes(32); // aes-256 requires a 32-byte key
const iv = crypto.randomBytes(12);  // 12-byte IV is standard for GCM

await pipeline(
  fs.createReadStream("data.csv"),
  zlib.createGzip(),
  crypto.createCipheriv("aes-256-gcm", key, iv),
  fs.createWriteStream("data.csv.gz.enc")
);
// Errors from any stage are propagated and cleanup happens automatically
```
Transform Streams
Transform streams modify data as it passes through:
```javascript
import { Transform } from "stream";

// Custom transform: convert CSV rows to JSON objects
class CSVToJSON extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.headers = null;
    this.buffer = "";
  }

  _transform(chunk, encoding, callback) {
    this.buffer += chunk.toString();
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep incomplete last line

    for (const line of lines) {
      if (!line.trim()) continue;
      if (!this.headers) {
        this.headers = line.split(",").map(h => h.trim());
      } else {
        const values = line.split(",");
        const obj = Object.fromEntries(
          this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
        );
        this.push(obj);
      }
    }
    callback();
  }

  _flush(callback) {
    if (this.buffer.trim() && this.headers) {
      const values = this.buffer.split(",");
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
      );
      this.push(obj);
    }
    callback();
  }
}
```
Writable Streams
```javascript
import { Writable } from "stream";
import { pipeline } from "stream/promises";
import fs from "fs";

class DatabaseWriter extends Writable {
  constructor(db, options = {}) {
    super({ ...options, objectMode: true });
    this.db = db; // db is your database client with a batchInsert method
    this.batch = [];
    this.batchSize = 100;
  }

  async _write(record, encoding, callback) {
    this.batch.push(record);
    if (this.batch.length >= this.batchSize) {
      try {
        await this.db.batchInsert("records", this.batch);
        this.batch = [];
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }

  async _final(callback) {
    if (this.batch.length > 0) {
      try {
        await this.db.batchInsert("records", this.batch);
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }
}

// Use the pipeline
await pipeline(
  fs.createReadStream("data.csv"),
  new CSVToJSON(),
  new DatabaseWriter(db)
);
```
Backpressure
Backpressure happens when a writable stream cannot keep up with the readable stream. Without handling it, data accumulates in memory. Node.js streams handle this automatically when you use .pipe() or pipeline(): the readable pauses when the writable buffer is full.
If manually controlling flow:
```javascript
import fs from "fs";

const readable = fs.createReadStream("large.txt");
const writable = fs.createWriteStream("output.txt");

readable.on("data", (chunk) => {
  const canContinue = writable.write(chunk);
  if (!canContinue) {
    readable.pause(); // slow down the source
  }
});

writable.on("drain", () => {
  readable.resume(); // writable caught up, resume reading
});
```
Streaming HTTP Responses
Streams are natural for HTTP: you can start sending the response before you have all the data:
```javascript
import http from "http";
import fs from "fs";
import zlib from "zlib";
import { pipeline } from "stream/promises";

const server = http.createServer(async (req, res) => {
  if (req.url === "/download") {
    res.writeHead(200, {
      "Content-Type": "text/csv",
      "Content-Encoding": "gzip",
      "Content-Disposition": "attachment; filename=report.csv",
    });

    await pipeline(
      fs.createReadStream("large-report.csv"),
      zlib.createGzip(),
      res
    );
  }
});

server.listen(3000);
```
The client starts receiving data immediately, with no waiting for the full file to be read.
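The same idea applies in reverse on the client: the response body is itself a Readable, so it can be decompressed and written to disk as it arrives. A sketch, where the URL and destination path are placeholders:

```javascript
import http from "http";
import fs from "fs";
import zlib from "zlib";
import { pipeline } from "stream/promises";

// Stream an HTTP response through gunzip to a file -- no full buffering
async function downloadAndGunzip(url, destPath) {
  const res = await new Promise((resolve, reject) => {
    http.get(url, resolve).on("error", reject);
  });
  await pipeline(res, zlib.createGunzip(), fs.createWriteStream(destPath));
}
```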
Async Iteration over Streams
Modern Node.js supports for await...of on readable streams:
```javascript
import fs from "fs";

async function countLines(filePath) {
  const stream = fs.createReadStream(filePath, { encoding: "utf8" });
  let count = 0;
  let remainder = "";

  for await (const chunk of stream) {
    const text = remainder + chunk;
    const lines = text.split("\n");
    remainder = lines.pop();
    count += lines.length;
  }

  if (remainder) count++;
  return count;
}
```
Common Interview Questions
Q: What is the difference between fs.readFile and fs.createReadStream?
fs.readFile loads the entire file into a Buffer in memory before your callback runs. fs.createReadStream emits chunks of data as they are read from disk, keeping memory usage constant. Use readFile for small files (configuration, templates); use streams for large files or when you want to start processing before the full file is available.
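The chunked behavior is easy to observe: a file read stream delivers at most highWaterMark bytes per chunk (64 KiB by default for file streams). A small sketch with an illustrative helper:

```javascript
import fs from "fs";

// Read only the first chunk of a file, then stop
async function firstChunkSize(filePath) {
  const stream = fs.createReadStream(filePath, { highWaterMark: 16 * 1024 });
  for await (const chunk of stream) {
    stream.destroy(); // we only wanted one chunk
    return chunk.length; // at most 16384 bytes, however large the file is
  }
  return 0; // empty file
}
```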
Q: What is backpressure in Node.js streams?
Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. When a writable stream's internal buffer fills up, write() returns false, signaling the producer to pause. When the buffer drains, the drain event fires and the producer resumes. pipeline() handles this automatically.
Q: Why use stream.pipeline instead of .pipe()?
.pipe() does not forward errors from upstream to downstream and does not clean up streams on error, so you can end up with resource leaks. stream.pipeline propagates errors across all stages and calls destroy() on all streams when any stage fails.
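The difference is easy to demonstrate: with pipeline, a failure in the first stage rejects the promise and destroys the later stages. A sketch, with placeholder file names:

```javascript
import { pipeline } from "stream/promises";
import fs from "fs";
import zlib from "zlib";

// A missing input file fails the whole pipeline, not just the read stream
async function compressOrReport(inputPath, outputPath) {
  try {
    await pipeline(
      fs.createReadStream(inputPath),
      zlib.createGzip(),
      fs.createWriteStream(outputPath)
    );
    return "ok";
  } catch (err) {
    return err.code; // e.g. "ENOENT" when inputPath does not exist
  }
}
```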
Practice Node.js on Froquiz
Node.js internals including streams, the event loop, and async patterns are tested in backend interviews. Test your Node.js knowledge on Froquiz across all difficulty levels.
Summary
- Streams process data in chunks: constant memory regardless of input size
- Four types: Readable, Writable, Duplex, Transform
- Use stream.pipeline() instead of .pipe(): it handles errors and cleanup
- Transform streams modify data as it passes through, ideal for format conversion
- Backpressure prevents memory buildup when consumers are slower than producers
- for await...of on readable streams is the cleanest modern API
- Stream HTTP responses to start sending data before you have the full content