Java Stream API: Filter, Map, Reduce, Collectors and Parallel Streams

The Java Stream API, introduced in Java 8, transformed how Java developers write data processing code. Before streams, processing a collection meant verbose imperative loops. Streams enable a declarative, pipeline-style approach that is concise, readable, and composable. They are also a common topic in Java interviews.

What Is a Stream?

A stream is a sequence of elements that supports sequential and parallel aggregate operations. Key properties:

Not a data structure — streams do not store data; they process it from a source
Lazy — intermediate operations are not executed until a terminal operation is called
Can only be consumed once — a stream cannot be reused after a terminal operation
Does not modify the source — streams produce new results without changing the original collection
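
The laziness and "does not modify the source" properties are easy to observe directly. The sketch below is only an illustration: the class name `StreamPropertiesDemo` and the helper `examinedCounts` are made up for this example, and the side effect inside `filter` exists purely to make evaluation visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamPropertiesDemo {

    // Returns how many elements the filter had examined
    // {before, after} the terminal operation ran.
    static int[] examinedCounts(List<String> source) {
        List<String> examined = new ArrayList<>();
        Stream<String> pipeline = source.stream()
            .filter(name -> {
                examined.add(name); // side effect only to make evaluation visible
                return name.length() > 3;
            });
        int before = examined.size();          // pipeline built, nothing has run
        pipeline.collect(Collectors.toList()); // terminal operation runs the filter
        int after = examined.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        List<String> source = new ArrayList<>(List.of("Alice", "Bob", "Carol"));
        int[] counts = examinedCounts(source);
        System.out.println(counts[0] + " examined before, " + counts[1] + " after"); // 0 examined before, 3 after

        List<String> longNames = source.stream()
            .filter(name -> name.length() > 3)
            .collect(Collectors.toList());
        System.out.println(longNames); // [Alice, Carol]
        System.out.println(source);    // [Alice, Bob, Carol], the source is unchanged
    }
}
```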

Creating Streams

java
import java.util.stream.*;
import java.util.*;

// From a Collection
List<String> names = List.of("Alice", "Bob", "Carol", "Dave");
Stream<String> stream = names.stream();

// From an array
int[] numbers = {1, 2, 3, 4, 5};
IntStream intStream = Arrays.stream(numbers);

// Stream.of
Stream<String> s = Stream.of("a", "b", "c");

// Infinite streams
Stream<Integer> naturals   = Stream.iterate(1, n -> n + 1);
Stream<Double>  randoms    = Stream.generate(Math::random);

// Range streams: range excludes the end, rangeClosed includes it
IntStream.range(1, 6).forEach(System.out::println); // 1 2 3 4 5
IntStream.rangeClosed(1, 5).forEach(System.out::println); // 1 2 3 4 5

// From String
"hello world".chars()                    // IntStream of char codes
    .filter(c -> c != ' ')
    .mapToObj(c -> String.valueOf((char) c))
    .forEach(System.out::print);         // helloworld

Intermediate Operations (Lazy)

Intermediate operations return a new Stream — they form the pipeline.

java
List<Employee> employees = getEmployees();

// filter: keep elements matching predicate
employees.stream()
    .filter(e -> e.getSalary() > 50000)
    .filter(e -> e.getDepartment().equals("Engineering"));

// map: transform each element
employees.stream()
    .map(Employee::getName)           // Stream<String>
    .map(String::toUpperCase);

// mapToInt / mapToDouble: specialized streams for primitives (avoids boxing)
employees.stream()
    .mapToDouble(Employee::getSalary) // DoubleStream
    .average();

// flatMap: flatten one level of nested streams
List<List<String>> nested = List.of(
    List.of("a", "b"),
    List.of("c", "d")
);
nested.stream()
    .flatMap(Collection::stream);     // Stream<String>: a, b, c, d

// distinct: remove duplicates (uses equals)
Stream.of(1, 2, 2, 3, 3, 3).distinct(); // 1, 2, 3

// sorted: natural order or custom comparator
employees.stream()
    .sorted(Comparator.comparing(Employee::getSalary).reversed());

// peek: for debugging -- inspect elements without consuming
employees.stream()
    .filter(e -> e.getSalary() > 80000)
    .peek(e -> System.out.println("High earner: " + e.getName()))
    .map(Employee::getName)
    .collect(Collectors.toList());

// limit / skip
Stream.iterate(1, n -> n + 1)
    .skip(10)   // skip first 10
    .limit(5)   // take next 5: 11, 12, 13, 14, 15
    .forEach(System.out::println);

// takeWhile / dropWhile (Java 9+)
Stream.of(1, 2, 3, 4, 5, 4, 3)
    .takeWhile(n -> n < 4)    // 1, 2, 3 (stops at first non-matching)
    .forEach(System.out::println);
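
The heading above mentions dropWhile without showing it, so here is a small side-by-side sketch (Java 9+). The class and method names are invented for illustration. It contrasts takeWhile and dropWhile with filter on the same input, since the three are easy to confuse.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TakeDropWhileDemo {

    // Keeps the leading run of matching elements, then stops.
    static List<Integer> taken() {
        return Stream.of(1, 2, 3, 4, 5, 4, 3)
            .takeWhile(n -> n < 4)
            .collect(Collectors.toList());
    }

    // Discards the leading run of matching elements, then keeps everything else.
    static List<Integer> dropped() {
        return Stream.of(1, 2, 3, 4, 5, 4, 3)
            .dropWhile(n -> n < 4)
            .collect(Collectors.toList());
    }

    // filter tests every element, so the trailing 3 is kept as well.
    static List<Integer> filtered() {
        return Stream.of(1, 2, 3, 4, 5, 4, 3)
            .filter(n -> n < 4)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(taken());    // [1, 2, 3]
        System.out.println(dropped());  // [4, 5, 4, 3]
        System.out.println(filtered()); // [1, 2, 3, 3]
    }
}
```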

Terminal Operations (Eager)

Terminal operations trigger evaluation of the pipeline.

java
List<Employee> employees = getEmployees();

// forEach
employees.stream().forEach(System.out::println);

// collect -- most versatile terminal operation
List<String> names = employees.stream()
    .map(Employee::getName)
    .collect(Collectors.toList());

// count
long count = employees.stream()
    .filter(e -> e.getDepartment().equals("Engineering"))
    .count();

// findFirst / findAny
Optional<Employee> first = employees.stream()
    .filter(e -> e.getSalary() > 100000)
    .findFirst();

// anyMatch / allMatch / noneMatch
boolean anyHighEarner = employees.stream()
    .anyMatch(e -> e.getSalary() > 200000);

boolean allActive = employees.stream()
    .allMatch(Employee::isActive);

// reduce
int sum = IntStream.rangeClosed(1, 100).reduce(0, Integer::sum); // 5050

Optional<Employee> highestPaid = employees.stream()
    .reduce((a, b) -> a.getSalary() > b.getSalary() ? a : b);

// min / max
Optional<Employee> youngest = employees.stream()
    .min(Comparator.comparing(Employee::getAge));

// toArray
Employee[] arr = employees.stream()
    .filter(Employee::isActive)
    .toArray(Employee[]::new);
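
Because reduce comes in three overloads, a compact runnable sketch may help tell them apart. The class `ReduceDemo` and its method names are hypothetical; the inputs are plain integers and strings so the example does not depend on the Employee type.

```java
import java.util.List;
import java.util.Optional;

class ReduceDemo {

    // identity + accumulator: an empty stream yields the identity (0)
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // accumulator only: the result is an Optional because the stream may be empty
    static Optional<Integer> max(List<Integer> values) {
        return values.stream().reduce(Integer::max);
    }

    // identity + accumulator + combiner: the result type (Integer)
    // differs from the element type (String)
    static int totalLength(List<String> words) {
        return words.stream().reduce(0, (len, w) -> len + w.length(), Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(1, 2, 3, 4)));              // 10
        System.out.println(sum(List.of()));                        // 0
        System.out.println(max(List.of(3, 9, 4)));                 // Optional[9]
        System.out.println(max(List.of()));                        // Optional.empty
        System.out.println(totalLength(List.of("hello", "world"))); // 10
    }
}
```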

Collectors

The Collectors class provides powerful aggregations:

java
import java.util.stream.Collectors;

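// toList, toSet, toUnmodifiableList (Java 10+)
// Each collect is a terminal operation, so every line starts from a fresh stream
List<String> names         = employees.stream().map(Employee::getName).collect(Collectors.toList());
Set<String> uniqueNames    = employees.stream().map(Employee::getName).collect(Collectors.toSet());
List<String> immutableList = employees.stream().map(Employee::getName).collect(Collectors.toUnmodifiableList());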

// joining
String csv = employees.stream()
    .map(Employee::getName)
    .collect(Collectors.joining(", ", "[", "]")); // [Alice, Bob, Carol]

// groupingBy
Map<String, List<Employee>> byDepartment = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment));

// groupingBy with downstream collector
Map<String, Double> avgSalaryByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.averagingDouble(Employee::getSalary)
    ));

Map<String, Long> countByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.counting()
    ));

// partitioningBy: splits into two groups (true/false)
Map<Boolean, List<Employee>> activeInactive = employees.stream()
    .collect(Collectors.partitioningBy(Employee::isActive));

List<Employee> active   = activeInactive.get(true);
List<Employee> inactive = activeInactive.get(false);

// toMap
Map<Integer, Employee> byId = employees.stream()
    .collect(Collectors.toMap(
        Employee::getId,
        e -> e,
        (existing, replacement) -> existing // merge function for duplicate keys
    ));

// summarizingDouble
DoubleSummaryStatistics stats = employees.stream()
    .collect(Collectors.summarizingDouble(Employee::getSalary));
// stats.getAverage(), stats.getMax(), stats.getMin(), stats.getSum(), stats.getCount()
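
The snippets above assume an Employee type and an employees list defined elsewhere. To make the grouping collectors runnable end to end, here is a minimal self-contained sketch. The `Employee` class below is a hypothetical stand-in with only three fields, the sample data is invented, and passing `TreeMap::new` is a choice made here solely so the printed key order is predictable.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class CollectorsDemo {

    // Hypothetical stand-in for the article's Employee type.
    static final class Employee {
        private final String name;
        private final String department;
        private final double salary;

        Employee(String name, String department, double salary) {
            this.name = name;
            this.department = department;
            this.salary = salary;
        }

        String getName() { return name; }
        String getDepartment() { return department; }
        double getSalary() { return salary; }
    }

    // Invented sample data.
    static List<Employee> sampleEmployees() {
        return List.of(
            new Employee("Alice", "Engineering", 120000),
            new Employee("Bob", "Engineering", 80000),
            new Employee("Carol", "Sales", 60000));
    }

    static Map<String, Long> countByDept(List<Employee> employees) {
        return employees.stream().collect(Collectors.groupingBy(
            Employee::getDepartment, TreeMap::new, Collectors.counting()));
    }

    static Map<String, Double> avgSalaryByDept(List<Employee> employees) {
        return employees.stream().collect(Collectors.groupingBy(
            Employee::getDepartment, TreeMap::new,
            Collectors.averagingDouble(Employee::getSalary)));
    }

    // mapping() transforms each element before the downstream collector sees it
    static Map<String, List<String>> namesByDept(List<Employee> employees) {
        return employees.stream().collect(Collectors.groupingBy(
            Employee::getDepartment, TreeMap::new,
            Collectors.mapping(Employee::getName, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<Employee> employees = sampleEmployees();
        System.out.println(countByDept(employees));     // {Engineering=2, Sales=1}
        System.out.println(avgSalaryByDept(employees)); // {Engineering=100000.0, Sales=60000.0}
        System.out.println(namesByDept(employees));     // {Engineering=[Alice, Bob], Sales=[Carol]}
    }
}
```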

flatMap in Depth

flatMap is used when each element maps to multiple elements:

java
// Each employee has multiple skills
List<String> allSkills = employees.stream()
    .flatMap(e -> e.getSkills().stream())
    .distinct()
    .sorted()
    .collect(Collectors.toList());

// Parse words from multiple sentences
List<String> sentences = List.of("hello world", "foo bar baz");
List<String> words = sentences.stream()
    .flatMap(s -> Arrays.stream(s.split(" ")))
    .collect(Collectors.toList());
// [hello, world, foo, bar, baz]

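// flatMap with Optional: the mapper itself returns an Optional,
// so flatMap avoids a nested Optional<Optional<String>>
Optional<String> result = Optional.of("  hello  ")
    .map(String::trim)
    .flatMap(s -> s.isEmpty() ? Optional.<String>empty() : Optional.of(s));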

Optional

Optional wraps a value that may or may not be present. It makes absence explicit in the return type, which helps callers avoid NullPointerException:

java
Optional<Employee> found = employees.stream()
    .filter(e -> e.getId() == 42)
    .findFirst();

// Verbose: checking isPresent() and then calling get()
if (found.isPresent()) {
    System.out.println(found.get().getName());
}

// Good: use functional methods
found.ifPresent(e -> System.out.println(e.getName()));

String name = found
    .map(Employee::getName)
    .orElse("Unknown");

Employee emp = found
    .orElseThrow(() -> new NotFoundException("Employee not found"));

// orElseGet: lazy evaluation (only calls supplier if empty)
Employee emp2 = found.orElseGet(() -> createDefaultEmployee());
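
The difference between orElse and orElseGet is easiest to see by counting how often the fallback is computed. This is a small sketch with made-up names (`OrElseDemo`, `expensiveDefault`); the counter stands in for any costly default such as a database lookup.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OrElseDemo {

    static final AtomicInteger fallbackCalls = new AtomicInteger();

    // Stand-in for a costly default value.
    static String expensiveDefault() {
        fallbackCalls.incrementAndGet();
        return "Unknown";
    }

    // orElse takes a value, so its argument is evaluated before orElse runs,
    // even when the Optional already holds a value.
    static int callsWithOrElse(Optional<String> value) {
        fallbackCalls.set(0);
        value.orElse(expensiveDefault());
        return fallbackCalls.get();
    }

    // orElseGet takes a Supplier, which is invoked only when the Optional is empty.
    static int callsWithOrElseGet(Optional<String> value) {
        fallbackCalls.set(0);
        value.orElseGet(OrElseDemo::expensiveDefault);
        return fallbackCalls.get();
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("Alice");
        System.out.println(callsWithOrElse(present));            // 1
        System.out.println(callsWithOrElseGet(present));         // 0
        System.out.println(callsWithOrElseGet(Optional.empty())); // 1
    }
}
```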

Parallel Streams

java
// Switch to parallel processing
long count = employees.parallelStream()
    .filter(e -> e.getSalary() > 50000)
    .count();

// Or convert an existing stream
employees.stream()
    .parallel()
    .filter(...)
    .collect(...);

When parallel streams help: Large datasets (tens of thousands of elements or more), CPU-bound operations, independent elements (no shared mutable state).

When they hurt: small collections (the overhead of splitting and merging exceeds the benefit), I/O-bound operations (blocking calls tie up the shared common pool), pipelines that depend on encounter order, and lambdas that mutate shared, non-thread-safe state.

java
// Bad: parallel stream with shared mutable state -- race condition
List<String> result = new ArrayList<>(); // not thread-safe
employees.parallelStream()
    .map(Employee::getName)
    .forEach(result::add); // WRONG

// Good: let collect build the result
List<String> safeResult = employees.parallelStream()
    .map(Employee::getName)
    .collect(Collectors.toList()); // each thread fills its own list; the lists are merged at the end
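
A quick way to convince yourself that collecting from a parallel stream is safe is to compare it against the sequential result. The sketch below uses invented names (`ParallelCollectDemo`) and plain integers rather than employees. It deliberately does not run the unsafe forEach variant, because its outcome varies from run to run.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.LongStream;

class ParallelCollectDemo {

    // Parallel pipeline; collect merges per-thread partial lists
    // and preserves encounter order for an ordered source.
    static List<Integer> squaresParallel(int n) {
        return IntStream.rangeClosed(1, n)
            .parallel()
            .map(i -> i * i)
            .boxed()
            .collect(Collectors.toList());
    }

    // Same pipeline without parallel(), used as the reference result.
    static List<Integer> squaresSequential(int n) {
        return IntStream.rangeClosed(1, n)
            .map(i -> i * i)
            .boxed()
            .collect(Collectors.toList());
    }

    // A stateless reduction: no shared mutable state is involved.
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(squaresParallel(10_000).equals(squaresSequential(10_000))); // true
        System.out.println(parallelSum(1_000_000)); // 500000500000
    }
}
```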

Common Interview Questions

Q: What is the difference between map and flatMap?

map transforms each element into exactly one element — one-to-one. flatMap transforms each element into zero or more elements (a Stream), then flattens all those streams into a single stream — one-to-many. Use flatMap when each element contains a collection you want to process as individual elements.
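
As a concrete illustration of the one-to-one versus zero-or-more distinction, the following sketch applies both operations to the same nested list. The class name and the sample skill sets are made up for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

class MapVsFlatMapDemo {

    // map: exactly one output per input (here, the size of each inner list)
    static List<Integer> sizes(List<List<String>> skillSets) {
        return skillSets.stream()
            .map(List::size)
            .collect(Collectors.toList());
    }

    // flatMap: each inner list contributes zero or more elements to one flat stream
    static List<String> allSkills(List<List<String>> skillSets) {
        return skillSets.stream()
            .flatMap(List::stream)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> skillSets = List.of(
            List.of("java", "sql"),
            List.of(),          // contributes nothing to the flattened stream
            List.of("go"));
        System.out.println(sizes(skillSets));     // [2, 0, 1]
        System.out.println(allSkills(skillSets)); // [java, sql, go]
    }
}
```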

Q: Are streams lazy? Explain.

Intermediate operations (filter, map, sorted, etc.) are lazy: they do not execute until a terminal operation is invoked. When you chain stream().filter(...).map(...), nothing happens yet. When you call .collect() or .count(), the pipeline is evaluated, and each element passes through the entire chain before the next one is processed. Stateful operations such as sorted are the exception, because they must see every element before emitting any. This element-at-a-time evaluation enables short-circuiting: findFirst() stops as soon as one element matches.
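
Element-at-a-time evaluation and short-circuiting can be made visible by logging inside the pipeline. This is a sketch for a sequential stream only; the class and method names are invented, and the logging lambdas exist purely for demonstration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitDemo {

    // Records which elements were pulled from the source before findFirst returned.
    static List<Integer> visitedUntilFirstMatch() {
        List<Integer> visited = new ArrayList<>();
        Stream.of(1, 2, 3, 4, 5)
            .peek(visited::add)
            .filter(n -> n > 2)
            .findFirst();       // stops at the first match
        return visited;
    }

    // Records the order in which filter and map run for each element.
    static List<String> trace() {
        List<String> log = new ArrayList<>();
        Stream.of(1, 2)
            .filter(n -> { log.add("filter " + n); return true; })
            .map(n -> { log.add("map " + n); return n; })
            .collect(Collectors.toList());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(visitedUntilFirstMatch()); // [1, 2, 3]
        System.out.println(trace());                  // [filter 1, map 1, filter 2, map 2]
    }
}
```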

Q: Can you reuse a stream?

No. Once a terminal operation has been called on a stream, the stream is consumed. Calling any operation on it afterwards throws IllegalStateException: stream has already been operated upon or closed. If you need to process the same data twice, create two streams from the source collection.
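
The following sketch shows the exception being triggered and one common workaround. The names are hypothetical, and wrapping the source in a `Supplier<Stream<String>>` is just one convenient way to get a fresh stream per use; calling `source.stream()` twice works equally well.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class StreamReuseDemo {

    // Returns true if a second terminal operation on the same stream is rejected.
    static boolean secondUseThrows() {
        Stream<String> names = Stream.of("Alice", "Bob");
        names.count();          // first terminal operation consumes the stream
        try {
            names.count();      // second use of the same stream object
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Each call to get() builds a new stream from the source collection.
    static long[] countTwice(List<String> source) {
        Supplier<Stream<String>> fresh = source::stream;
        long all = fresh.get().count();
        long startingWithA = fresh.get().filter(n -> n.startsWith("A")).count();
        return new long[] {all, startingWithA};
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows()); // true
        long[] counts = countTwice(List.of("Alice", "Bob"));
        System.out.println(counts[0] + " total, " + counts[1] + " starting with A"); // 2 total, 1 starting with A
    }
}
```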

Summary

Streams are lazy pipelines — intermediate operations execute only when a terminal operation is called
filter keeps elements matching a predicate; map transforms; flatMap transforms and flattens
collect(Collectors.groupingBy(...)) covers most aggregation needs: group, count, or average by key
Optional wraps nullable values — use map, orElse, ifPresent instead of isPresent + get
Parallel streams help for large CPU-bound workloads; avoid them for small collections and I/O
Never mutate shared state from a parallel stream; let collect(...) assemble the result instead
Streams cannot be reused after a terminal operation