Node.js Streams and File Handling: Process Large Files Without Running Out of Memory
One of the most important performance concepts in Node.js is streams. When you need to process a 2GB CSV file, read a large HTTP response, or transform data on the fly, streams let you handle it with constant memory usage instead of loading everything into RAM at once.
The Problem with Non-Streaming Approaches
```javascript
import fs from "fs/promises";

// Bad for large files -- loads entire file into memory
const data = await fs.readFile("huge-file.csv", "utf8");
const lines = data.split("\n");
// If file is 2GB, you need 2GB+ of RAM
```
For small files this is fine. For large files it crashes your process or causes severe memory pressure.
What Are Streams?
Streams process data in chunks as it arrives: you start processing before the entire input is available. Node.js has four stream types:
- Readable: source of data (file read, HTTP request body)
- Writable: destination for data (file write, HTTP response)
- Duplex: both readable and writable (TCP socket)
- Transform: a Duplex that modifies data (gzip, encryption)
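As a minimal sketch of the four types working together, using only built-in classes (PassThrough is the identity Transform, and the array sink here is purely illustrative):

```javascript
import { Readable, Writable, PassThrough } from "stream";
import { pipeline } from "stream/promises";

// Readable: any iterable can become a stream
const source = Readable.from(["alpha", "beta", "gamma"]);

// Transform/Duplex: PassThrough forwards data unchanged
const identity = new PassThrough({ objectMode: true });

// Writable: collect each chunk into an array
const collected = [];
const sink = new Writable({
  objectMode: true,
  write(chunk, encoding, callback) {
    collected.push(chunk);
    callback();
  },
});

await pipeline(source, identity, sink);
console.log(collected); // ["alpha", "beta", "gamma"]
```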
Reading Files with Streams
```javascript
import fs from "fs";
import readline from "readline";

// Process a large CSV line by line -- constant memory usage
async function processLargeCSV(filePath) {
  const fileStream = fs.createReadStream(filePath, { encoding: "utf8" });
  const rl = readline.createInterface({
    input: fileStream,
    crlfDelay: Infinity,
  });

  let lineCount = 0;
  let headerRow = null;

  for await (const line of rl) {
    if (lineCount === 0) {
      headerRow = line.split(",");
    } else {
      const values = line.split(",");
      const record = Object.fromEntries(
        headerRow.map((col, i) => [col.trim(), values[i]?.trim()])
      );
      await processRecord(record); // processRecord is your own per-row handler
    }
    lineCount++;
  }

  console.log(`Processed ${lineCount - 1} records`);
}
```
This processes a 10GB file with the same ~50MB memory footprint as a 10KB file.
Piping Streams
The .pipe() method connects a readable stream to a writable stream; data flows through automatically:
```javascript
import fs from "fs";
import zlib from "zlib";

// Compress a large file -- no intermediate buffering
function compressFile(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const readStream = fs.createReadStream(inputPath);
    const gzip = zlib.createGzip();
    const writeStream = fs.createWriteStream(outputPath);

    readStream
      .pipe(gzip)        // compress
      .pipe(writeStream) // write to disk
      .on("finish", resolve)
      .on("error", reject); // caveat: only writeStream errors reach this handler
  });
}

await compressFile("large-log.txt", "large-log.txt.gz");
```
stream.pipeline (preferred over .pipe)
.pipe() does not propagate errors well. Use stream.pipeline for production code:
```javascript
import { pipeline } from "stream/promises";
import fs from "fs";
import zlib from "zlib";
import crypto from "crypto";

// Compress and encrypt in one pipeline
const key = crypto.randomBytes(32); // aes-256 requires a 32-byte key
const iv = crypto.randomBytes(12);  // 12-byte IV is standard for GCM

await pipeline(
  fs.createReadStream("data.csv"),
  zlib.createGzip(),
  crypto.createCipheriv("aes-256-gcm", key, iv),
  fs.createWriteStream("data.csv.gz.enc")
);
// Errors from any stage are propagated and cleanup happens automatically
```
Transform Streams
Transform streams modify data as it passes through:
```javascript
import { Transform } from "stream";

// Custom transform: convert CSV rows to JSON objects
class CSVToJSON extends Transform {
  constructor(options = {}) {
    super({ ...options, objectMode: true });
    this.headers = null;
    this.buffer = "";
  }

  _transform(chunk, encoding, callback) {
    this.buffer += chunk.toString();
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop(); // keep incomplete last line

    for (const line of lines) {
      if (!line.trim()) continue;
      if (!this.headers) {
        this.headers = line.split(",").map(h => h.trim());
      } else {
        const values = line.split(",");
        const obj = Object.fromEntries(
          this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
        );
        this.push(obj);
      }
    }
    callback();
  }

  _flush(callback) {
    if (this.buffer.trim() && this.headers) {
      const values = this.buffer.split(",");
      const obj = Object.fromEntries(
        this.headers.map((h, i) => [h, values[i]?.trim() ?? ""])
      );
      this.push(obj);
    }
    callback();
  }
}
```
Writable Streams
```javascript
import { Writable } from "stream";
import { pipeline } from "stream/promises";
import fs from "fs";

class DatabaseWriter extends Writable {
  constructor(db, options = {}) {
    super({ ...options, objectMode: true });
    this.db = db; // db is your database client with a batchInsert method
    this.batch = [];
    this.batchSize = 100;
  }

  async _write(record, encoding, callback) {
    this.batch.push(record);
    if (this.batch.length >= this.batchSize) {
      try {
        await this.db.batchInsert("records", this.batch);
        this.batch = [];
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }

  async _final(callback) {
    if (this.batch.length > 0) {
      try {
        await this.db.batchInsert("records", this.batch);
        callback();
      } catch (err) {
        callback(err);
      }
    } else {
      callback();
    }
  }
}

// Use the pipeline
await pipeline(
  fs.createReadStream("data.csv"),
  new CSVToJSON(),
  new DatabaseWriter(db)
);
```
Backpressure
Backpressure happens when a writable stream cannot keep up with the readable stream. Without handling it, data accumulates in memory. Node.js streams handle this automatically when you use .pipe() or pipeline(): the readable pauses when the writable buffer is full.
If manually controlling flow:
```javascript
import fs from "fs";

const readable = fs.createReadStream("large.txt");
const writable = fs.createWriteStream("output.txt");

readable.on("data", (chunk) => {
  const canContinue = writable.write(chunk);
  if (!canContinue) {
    readable.pause(); // slow down the source
  }
});

writable.on("drain", () => {
  readable.resume(); // writable caught up, resume reading
});
```
Streaming HTTP Responses
Streams are natural for HTTP: you can start sending the response before you have all the data:
```javascript
import http from "http";
import fs from "fs";
import zlib from "zlib";
import { pipeline } from "stream/promises";

const server = http.createServer(async (req, res) => {
  if (req.url === "/download") {
    res.writeHead(200, {
      "Content-Type": "text/csv",
      "Content-Encoding": "gzip",
      "Content-Disposition": "attachment; filename=report.csv",
    });

    await pipeline(
      fs.createReadStream("large-report.csv"),
      zlib.createGzip(),
      res
    );
  }
});

server.listen(3000);
```
The client starts receiving data immediately, with no waiting for the full file to be read.
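The same idea applies in reverse on the client: the response body is itself a Readable, so it can be decompressed and written to disk as it arrives. A sketch, where the URL and destination path are placeholders:

```javascript
import http from "http";
import fs from "fs";
import zlib from "zlib";
import { pipeline } from "stream/promises";

// Stream an HTTP response through gunzip to a file -- no full buffering
async function downloadAndGunzip(url, destPath) {
  const res = await new Promise((resolve, reject) => {
    http.get(url, resolve).on("error", reject);
  });
  await pipeline(res, zlib.createGunzip(), fs.createWriteStream(destPath));
}
```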
Async Iteration over Streams
Modern Node.js supports for await...of on readable streams:
```javascript
import fs from "fs";

async function countLines(filePath) {
  const stream = fs.createReadStream(filePath, { encoding: "utf8" });
  let count = 0;
  let remainder = "";

  for await (const chunk of stream) {
    const text = remainder + chunk;
    const lines = text.split("\n");
    remainder = lines.pop();
    count += lines.length;
  }

  if (remainder) count++;
  return count;
}
```
Common Interview Questions
Q: What is the difference between fs.readFile and fs.createReadStream?
fs.readFile loads the entire file into a Buffer in memory before your callback runs. fs.createReadStream emits chunks of data as they are read from disk, keeping memory usage constant. Use readFile for small files (configuration, templates); use streams for large files or when you want to start processing before the full file is available.
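The chunked behavior is easy to observe: a file read stream delivers at most highWaterMark bytes per chunk (64 KiB by default for file streams). A small sketch with an illustrative helper:

```javascript
import fs from "fs";

// Read only the first chunk of a file, then stop
async function firstChunkSize(filePath) {
  const stream = fs.createReadStream(filePath, { highWaterMark: 16 * 1024 });
  for await (const chunk of stream) {
    stream.destroy(); // we only wanted one chunk
    return chunk.length; // at most 16384 bytes, however large the file is
  }
  return 0; // empty file
}
```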
Q: What is backpressure in Node.js streams?
Backpressure is the mechanism that prevents a fast producer from overwhelming a slow consumer. When a writable stream's internal buffer fills up, write() returns false, signaling the producer to pause. When the buffer drains, the drain event fires and the producer resumes. pipeline() handles this automatically.
Q: Why use stream.pipeline instead of .pipe()?
.pipe() does not forward errors from upstream to downstream and does not clean up streams on error, so you can end up with resource leaks. stream.pipeline propagates errors across all stages and calls destroy() on all streams when any stage fails.
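The difference is easy to demonstrate: with pipeline, a failure in the first stage rejects the promise and destroys the later stages. A sketch, with placeholder file names:

```javascript
import { pipeline } from "stream/promises";
import fs from "fs";
import zlib from "zlib";

// A missing input file fails the whole pipeline, not just the read stream
async function compressOrReport(inputPath, outputPath) {
  try {
    await pipeline(
      fs.createReadStream(inputPath),
      zlib.createGzip(),
      fs.createWriteStream(outputPath)
    );
    return "ok";
  } catch (err) {
    return err.code; // e.g. "ENOENT" when inputPath does not exist
  }
}
```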
Practice Node.js on Froquiz
Node.js internals including streams, the event loop, and async patterns are tested in backend interviews. Test your Node.js knowledge on Froquiz across all difficulty levels.
Summary
- Streams process data in chunks: constant memory regardless of input size
- Four types: Readable, Writable, Duplex, Transform
- Use stream.pipeline() instead of .pipe(): it handles errors and cleanup
- Transform streams modify data as it passes through, ideal for format conversion
- Backpressure prevents memory buildup when consumers are slower than producers
- for await...of on readable streams is the cleanest modern API
- Stream HTTP responses to start sending data before you have the full content