Java Stream API: Filter, Map, Reduce, Collectors and Parallel Streams
The Java Stream API, introduced in Java 8, transformed how Java developers write data processing code. Before streams, processing a collection meant verbose imperative loops. Streams enable a declarative, pipeline-style approach that is concise, readable, and composable. Streams are tested in virtually every Java interview.
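To make that contrast concrete, here is a minimal sketch that builds the same result twice, once with a loop and once with a stream. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class LoopVsStream {
    // Imperative style: collect names longer than 3 characters, upper-cased, into a new list
    static List<String> withLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.length() > 3) {
                result.add(name.toUpperCase());
            }
        }
        return result;
    }

    // Declarative style: the same logic expressed as a stream pipeline
    static List<String> withStream(List<String> names) {
        return names.stream()
                .filter(name -> name.length() > 3)
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Bob", "Carol", "Dave");
        System.out.println(withLoop(names));   // [ALICE, CAROL, DAVE]
        System.out.println(withStream(names)); // [ALICE, CAROL, DAVE]
    }
}
```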
What Is a Stream?
A stream is a sequence of elements that supports sequential and parallel aggregate operations. Key properties:
- Not a data structure: streams do not store data; they process it from a source
- Lazy: intermediate operations are not executed until a terminal operation is called
- Can only be consumed once: a stream cannot be reused after a terminal operation
- Does not modify the source: streams produce new results without changing the original collection
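Two of these properties can be checked directly. The sketch below filters a list and leaves the list unchanged, then shows that a second terminal operation on the same stream is rejected. The class and method names are illustrative and do not come from any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamProperties {
    // Filtering produces a new list; the source list is left untouched
    static List<Integer> evens(List<Integer> source) {
        return source.stream()
                .filter(n -> n % 2 == 0)
                .collect(Collectors.toList());
    }

    // Returns true if a second terminal operation on the same stream is rejected
    static boolean reuseFails() {
        Stream<String> stream = Stream.of("a", "b", "c");
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count(); // second terminal operation on the same stream
            return false;
        } catch (IllegalStateException e) {
            return true; // "stream has already been operated upon or closed"
        }
    }

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>(List.of(1, 2, 3, 4, 5, 6));
        System.out.println(evens(source)); // [2, 4, 6]
        System.out.println(source);        // [1, 2, 3, 4, 5, 6] (unchanged)
        System.out.println(reuseFails());  // true
    }
}
```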
Creating Streams
```java
import java.util.stream.*;
import java.util.*;

// From a Collection
List<String> names = List.of("Alice", "Bob", "Carol", "Dave");
Stream<String> stream = names.stream();

// From an array
int[] numbers = {1, 2, 3, 4, 5};
IntStream intStream = Arrays.stream(numbers);

// Stream.of
Stream<String> s = Stream.of("a", "b", "c");

// Infinite streams: bound them (e.g. with limit) before a terminal operation
Stream<Integer> naturals = Stream.iterate(1, n -> n + 1);
Stream<Double> randoms = Stream.generate(Math::random);

// Range streams: range excludes the end, rangeClosed includes it
IntStream.range(1, 6).forEach(System.out::println);       // 1 2 3 4 5
IntStream.rangeClosed(1, 5).forEach(System.out::println); // 1 2 3 4 5

// From String
"hello world".chars()                          // IntStream of char codes
    .filter(c -> c != ' ')
    .mapToObj(c -> String.valueOf((char) c))
    .forEach(System.out::print);               // helloworld
```
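Infinite sources such as Stream.iterate and Stream.generate never end on their own. In practice you pair them with a short-circuiting operation like limit before the terminal operation runs. Here is a minimal sketch with hypothetical helper names:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class InfiniteStreams {
    // First n powers of two: 1, 2, 4, 8, ...
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2) // infinite on its own
                .limit(n)                    // bounds the stream so collect can finish
                .collect(Collectors.toList());
    }

    // n random doubles; Math.random() returns values in [0.0, 1.0)
    static List<Double> randoms(int n) {
        return Stream.generate(Math::random)
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(powersOfTwo(6));    // [1, 2, 4, 8, 16, 32]
        System.out.println(randoms(3).size()); // 3
    }
}
```

If you drop the limit call, collect never returns, because the source keeps producing elements.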
Intermediate Operations (Lazy)
Intermediate operations return a new Stream; chained together, they form the pipeline.
```java
List<Employee> employees = getEmployees();

// filter: keep elements matching a predicate
// (no terminal operation yet, so nothing executes)
employees.stream()
    .filter(e -> e.getSalary() > 50000)
    .filter(e -> e.getDepartment().equals("Engineering"));

// map: transform each element
employees.stream()
    .map(Employee::getName)     // Stream<String>
    .map(String::toUpperCase);

// mapToInt / mapToDouble: specialized streams for primitives (avoids boxing)
employees.stream()
    .mapToDouble(Employee::getSalary) // DoubleStream
    .average();                       // terminal: returns OptionalDouble

// flatMap: flatten one level of nested streams
List<List<String>> nested = List.of(
    List.of("a", "b"),
    List.of("c", "d")
);
nested.stream()
    .flatMap(Collection::stream); // Stream<String>: a, b, c, d

// distinct: remove duplicates (uses equals and hashCode)
Stream.of(1, 2, 2, 3, 3, 3).distinct(); // 1, 2, 3

// sorted: natural order or custom comparator
employees.stream()
    .sorted(Comparator.comparing(Employee::getSalary).reversed());

// peek: for debugging -- inspect elements as they flow past
employees.stream()
    .filter(e -> e.getSalary() > 80000)
    .peek(e -> System.out.println("High earner: " + e.getName()))
    .map(Employee::getName)
    .collect(Collectors.toList());

// limit / skip
Stream.iterate(1, n -> n + 1)
    .skip(10)  // skip first 10
    .limit(5)  // take next 5: 11, 12, 13, 14, 15
    .forEach(System.out::println);

// takeWhile / dropWhile (Java 9+)
Stream.of(1, 2, 3, 4, 5, 4, 3)
    .takeWhile(n -> n < 4) // 1, 2, 3 (stops at first non-matching)
    .forEach(System.out::println);
Stream.of(1, 2, 3, 4, 5, 4, 3)
    .dropWhile(n -> n < 4) // 4, 5, 4, 3 (drops only the leading matches)
    .forEach(System.out::println);
```
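In practice you chain several of these operations into one pipeline. This sketch assumes Java 16+. The article's Employee class is not shown, so the sketch uses a stripped-down, hypothetical Employee record whose accessors are name(), department() and salary() rather than the getters used above.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsDemo {
    // Hypothetical minimal Employee type for this sketch only
    record Employee(String name, String department, double salary) {}

    // Names of Engineering staff earning over 50,000, highest salary first
    static List<String> topEngineers(List<Employee> employees) {
        return employees.stream()
                .filter(e -> e.salary() > 50_000)
                .filter(e -> e.department().equals("Engineering"))
                .sorted(Comparator.comparingDouble(Employee::salary).reversed())
                .map(Employee::name)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Employee> employees = List.of(
                new Employee("Alice", "Engineering", 95_000),
                new Employee("Bob", "Sales", 60_000),
                new Employee("Carol", "Engineering", 120_000),
                new Employee("Dave", "Engineering", 45_000));
        System.out.println(topEngineers(employees)); // [Carol, Alice]
    }
}
```

Comparator.comparingDouble is used here instead of Comparator.comparing so the salary is compared as a primitive double without boxing.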
Terminal Operations (Eager)
Terminal operations trigger evaluation of the pipeline.
```java
List<Employee> employees = getEmployees();

// forEach
employees.stream().forEach(System.out::println);

// collect -- most versatile terminal operation
List<String> names = employees.stream()
    .map(Employee::getName)
    .collect(Collectors.toList());

// count
long count = employees.stream()
    .filter(e -> e.getDepartment().equals("Engineering"))
    .count();

// findFirst / findAny
Optional<Employee> first = employees.stream()
    .filter(e -> e.getSalary() > 100000)
    .findFirst();

// anyMatch / allMatch / noneMatch
boolean anyHighEarner = employees.stream()
    .anyMatch(e -> e.getSalary() > 200000);
boolean allActive = employees.stream()
    .allMatch(Employee::isActive);

// reduce
int sum = IntStream.rangeClosed(1, 100).reduce(0, Integer::sum); // 5050
Optional<Employee> highestPaid = employees.stream()
    .reduce((a, b) -> a.getSalary() > b.getSalary() ? a : b);

// min / max
Optional<Employee> youngest = employees.stream()
    .min(Comparator.comparing(Employee::getAge));

// toArray
Employee[] arr = employees.stream()
    .filter(Employee::isActive)
    .toArray(Employee[]::new);
```
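Some of these terminal operations return Optional and others do not. The difference comes down to whether the operation has a sensible result for an empty stream. The sketch below, which uses a hypothetical class name, compares reduce with an identity, reduce without one, and the short-circuiting anyMatch:

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.IntStream;

class TerminalOpsDemo {
    // reduce with an identity: an empty stream simply yields the identity
    static int sum(List<Integer> values) {
        return values.stream().reduce(0, Integer::sum);
    }

    // reduce without an identity: there may be no result, hence Optional
    static Optional<Integer> largest(List<Integer> values) {
        return values.stream().reduce((a, b) -> a >= b ? a : b);
    }

    // anyMatch short-circuits: it can return as soon as one element matches
    static boolean containsNegative(List<Integer> values) {
        return values.stream().anyMatch(v -> v < 0);
    }

    public static void main(String[] args) {
        System.out.println(sum(List.of(3, 1, 4)));                // 8
        System.out.println(sum(List.of()));                       // 0
        System.out.println(largest(List.of(3, 1, 4)));            // Optional[4]
        System.out.println(largest(List.of()));                   // Optional.empty
        System.out.println(containsNegative(List.of(2, -1, 5)));  // true
        System.out.println(IntStream.rangeClosed(1, 100).sum());  // 5050
    }
}
```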
Collectors
The Collectors class provides powerful aggregations:
```java
import java.util.stream.Collectors;

// toList, toSet, toUnmodifiableList (Java 10+)
// Each line builds a fresh stream: a stream cannot be reused after collect()
List<String> names = employees.stream().map(Employee::getName).collect(Collectors.toList());
Set<String> uniqueNames = employees.stream().map(Employee::getName).collect(Collectors.toSet());
List<String> immutableList = employees.stream().map(Employee::getName)
    .collect(Collectors.toUnmodifiableList());

// joining
String csv = employees.stream()
    .map(Employee::getName)
    .collect(Collectors.joining(", ", "[", "]")); // [Alice, Bob, Carol]

// groupingBy
Map<String, List<Employee>> byDepartment = employees.stream()
    .collect(Collectors.groupingBy(Employee::getDepartment));

// groupingBy with downstream collector
Map<String, Double> avgSalaryByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.averagingDouble(Employee::getSalary)
    ));

Map<String, Long> countByDept = employees.stream()
    .collect(Collectors.groupingBy(
        Employee::getDepartment,
        Collectors.counting()
    ));

// partitioningBy: splits into two groups (true/false)
Map<Boolean, List<Employee>> activeInactive = employees.stream()
    .collect(Collectors.partitioningBy(Employee::isActive));
List<Employee> active = activeInactive.get(true);
List<Employee> inactive = activeInactive.get(false);

// toMap (without a merge function, a duplicate key throws IllegalStateException)
Map<Integer, Employee> byId = employees.stream()
    .collect(Collectors.toMap(
        Employee::getId,
        e -> e,
        (existing, replacement) -> existing // merge function for duplicate keys
    ));

// summarizingDouble
DoubleSummaryStatistics stats = employees.stream()
    .collect(Collectors.summarizingDouble(Employee::getSalary));
// stats.getAverage(), stats.getMax(), stats.getMin(), stats.getSum(), stats.getCount()
```
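Two collector details are worth seeing in action. First, groupingBy accepts a map factory when you want a particular Map implementation; a TreeMap is used here so the keys come out sorted. Second, toMap without a merge function throws IllegalStateException when it meets a duplicate key. The sketch uses plain strings instead of Employee objects so that it runs on its own, and its class and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class CollectorsDemo {
    // Count words by their first letter; TreeMap keeps keys sorted for stable output
    static Map<Character, Long> countByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(
                        w -> w.charAt(0),
                        TreeMap::new,
                        Collectors.counting()));
    }

    // Without a merge function, toMap rejects duplicate keys
    static boolean toMapRejectsDuplicates(List<String> words) {
        try {
            words.stream().collect(Collectors.toMap(w -> w.charAt(0), w -> w));
            return false;
        } catch (IllegalStateException e) {
            return true; // the exception message reports the duplicate key
        }
    }

    // With a merge function, the first value seen for each key is kept
    static Map<Character, String> firstByInitial(List<String> words) {
        return words.stream()
                .collect(Collectors.toMap(
                        w -> w.charAt(0),
                        w -> w,
                        (existing, replacement) -> existing));
    }

    public static void main(String[] args) {
        List<String> words = List.of("apple", "avocado", "banana", "cherry", "blueberry");
        System.out.println(countByInitial(words));          // {a=2, b=2, c=1}
        System.out.println(toMapRejectsDuplicates(words));  // true
        System.out.println(firstByInitial(words).get('b')); // banana
    }
}
```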
flatMap in Depth
flatMap is used when each element maps to zero or more elements that should end up in a single, flat stream:
```java
// Each employee has multiple skills
List<String> allSkills = employees.stream()
    .flatMap(e -> e.getSkills().stream())
    .distinct()
    .sorted()
    .collect(Collectors.toList());

// Parse words from multiple sentences
List<String> sentences = List.of("hello world", "foo bar baz");
List<String> words = sentences.stream()
    .flatMap(s -> Arrays.stream(s.split(" ")))
    .collect(Collectors.toList()); // [hello, world, foo, bar, baz]

// Optional.flatMap: use it when the mapping function itself returns an Optional
// (findFirst() here); map would give Optional<Optional<Employee>>
Optional<Employee> match = Optional.of(" Alice ")
    .map(String::trim)
    .flatMap(n -> employees.stream()
        .filter(e -> e.getName().equals(n))
        .findFirst());
```
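Combining flatMap with groupingBy gives the classic word count. Below is a minimal sketch; the class name and input sentences are made up. Splitting on the regex \s+ is an assumption about what counts as a word separator, and lower-casing makes the count case-insensitive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class WordFrequency {
    // Split every sentence into words, flatten, then count occurrences of each word
    static Map<String, Long> count(List<String> sentences) {
        return sentences.stream()
                .flatMap(s -> Arrays.stream(s.split("\\s+"))) // one sentence -> many words
                .filter(w -> !w.isEmpty())                      // split yields "" for leading whitespace
                .map(String::toLowerCase)
                .collect(Collectors.groupingBy(
                        Function.identity(),
                        TreeMap::new,
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("hello world", "Hello streams", "world of streams");
        System.out.println(count(sentences)); // {hello=2, of=1, streams=2, world=2}
    }
}
```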
Optional
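Optional wraps a value that may or may not be present, so absence is explicit in the type instead of being signalled by null. This reduces the risk of NullPointerException, though it does not eliminate it: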
```java
Optional<Employee> found = employees.stream()
    .filter(e -> e.getId() == 42)
    .findFirst();

// Avoid: isPresent() + get() is just a null check in another form
if (found.isPresent()) {
    System.out.println(found.get().getName());
}

// Prefer: functional methods
found.ifPresent(e -> System.out.println(e.getName()));

String name = found
    .map(Employee::getName)
    .orElse("Unknown");

// NotFoundException is assumed to be an application-defined exception type
Employee emp = found
    .orElseThrow(() -> new NotFoundException("Employee not found"));

// orElseGet: lazy evaluation (only calls the supplier if empty);
// orElse(createDefaultEmployee()) would call createDefaultEmployee() every time
Employee emp2 = found.orElseGet(() -> createDefaultEmployee());
```
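A counter makes the difference between orElse and orElseGet easy to see. In this sketch, createDefault is a hypothetical stand-in for an expensive fallback such as createDefaultEmployee.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OrElseVsOrElseGet {
    static final AtomicInteger defaultsCreated = new AtomicInteger();

    // Hypothetical stand-in for an expensive fallback
    static String createDefault() {
        defaultsCreated.incrementAndGet();
        return "Unknown";
    }

    // orElse: the argument is evaluated before the call, even if a value is present
    static String withOrElse(Optional<String> name) {
        return name.orElse(createDefault());
    }

    // orElseGet: the supplier runs only when the Optional is empty
    static String withOrElseGet(Optional<String> name) {
        return name.orElseGet(OrElseVsOrElseGet::createDefault);
    }

    public static void main(String[] args) {
        Optional<String> present = Optional.of("Alice");
        withOrElse(present);
        System.out.println(defaultsCreated.get()); // 1: fallback built but unused
        withOrElseGet(present);
        System.out.println(defaultsCreated.get()); // still 1: supplier never called
        System.out.println(withOrElseGet(Optional.empty())); // Unknown
        System.out.println(defaultsCreated.get()); // 2
    }
}
```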
Parallel Streams
```java
// Switch to parallel processing
long count = employees.parallelStream()
    .filter(e -> e.getSalary() > 50000)
    .count();

// Or convert an existing stream
employees.stream()
    .parallel()
    .filter(...)
    .collect(...);
```
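When parallel streams help: large datasets (as a rough rule of thumb, tens of thousands of elements or more), CPU-bound work per element, sources that split cheaply (arrays, ArrayList, IntStream.range), and independent elements with no shared mutable state.
When they hurt: small collections (the fork/join overhead exceeds the benefit); I/O-bound operations, because parallel streams run on the shared common ForkJoinPool and blocking calls tie up its threads; operations that depend on encounter order, such as findFirst, limit or forEachOrdered; and lambdas with side effects on shared, non-thread-safe state.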
```java
// Bad: parallel stream with shared mutable state -- race condition
List<String> unsafe = new ArrayList<>(); // ArrayList is not thread-safe
employees.parallelStream()
    .map(Employee::getName)
    .forEach(unsafe::add); // WRONG: may lose elements or throw

// Good: let collect() handle the accumulation
List<String> names = employees.parallelStream()
    .map(Employee::getName)
    .collect(Collectors.toList()); // each thread fills its own list; partial lists are merged
```
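In contrast to the race condition above, the sketch below shows two patterns that are safe to run in parallel. The first is a CPU-bound reduction over an IntStream range. The second is collect() on an ordered List source, which keeps the original encounter order even in parallel. The names are hypothetical. The sketch only checks correctness and says nothing about speedup, which depends on the machine and the workload.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {
    // CPU-bound work on independent elements: a reasonable parallel candidate
    static long sumOfSquares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }

    // collect() gives each worker its own container and merges them,
    // preserving encounter order for an ordered source like a List
    static List<String> upperParallel(List<String> names) {
        return names.parallelStream()
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000_000, false) == sumOfSquares(1_000_000, true)); // true
        System.out.println(upperParallel(List.of("alice", "bob", "carol"))); // [ALICE, BOB, CAROL]
    }
}
```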
Common Interview Questions
Q: What is the difference between map and flatMap?
map transforms each element into exactly one element (one-to-one). flatMap transforms each element into zero or more elements (a Stream), then flattens all those streams into a single stream (one-to-many). Use flatMap when each element contains a collection you want to process as individual elements.
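One compact way to see one-to-one versus one-to-many is to map each number n to n copies of itself. An input of 0 then produces an empty group under map and disappears entirely under flatMap. Here is a sketch with illustrative names:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

class MapVsFlatMap {
    // map: exactly one output per input (here, a list holding n copies of n)
    static List<List<Integer>> withMap(List<Integer> counts) {
        return counts.stream()
                .map(n -> Collections.nCopies(n, n))
                .collect(Collectors.toList());
    }

    // flatMap: zero or more outputs per input, flattened into one stream
    static List<Integer> withFlatMap(List<Integer> counts) {
        return counts.stream()
                .flatMap(n -> Collections.nCopies(n, n).stream())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> counts = List.of(2, 0, 3);
        System.out.println(withMap(counts));     // [[2, 2], [], [3, 3, 3]]
        System.out.println(withFlatMap(counts)); // [2, 2, 3, 3, 3]
    }
}
```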
Q: Are streams lazy? Explain.
Intermediate operations (filter, map, sorted, etc.) are lazy: they do not execute until a terminal operation is invoked. When you chain stream().filter(...).map(...), nothing happens yet. When you call .collect() or .count(), the pipeline is evaluated, and each element passes through the entire chain before the next one is processed. The exception is stateful operations such as sorted(), which must see every element before passing anything downstream. Laziness is also what makes short-circuiting possible: findFirst() stops as soon as one element matches.
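You can observe laziness and short-circuiting by logging inside the lambdas. In this sketch the names are illustrative, and the side effects in the lambdas are there only for tracing. Nothing is logged when there is no terminal operation, and findFirst stops after the first even number:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

class LazinessDemo {
    // Records each filter/map call so the evaluation order becomes visible
    static List<String> trace(boolean runTerminal) {
        List<String> log = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3, 4, 5)
                .filter(n -> { log.add("filter " + n); return n % 2 == 0; })
                .map(n -> { log.add("map " + n); return n * 10; });
        if (runTerminal) {
            Optional<Integer> first = pipeline.findFirst(); // short-circuits after the first match
            log.add("result " + first.orElseThrow());
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(trace(false)); // [] (no terminal operation, nothing ran)
        System.out.println(trace(true));  // [filter 1, filter 2, map 2, result 20]
    }
}
```

Elements 3, 4 and 5 never reach filter, because findFirst already has its answer after element 2.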
Q: Can you reuse a stream?
No. Once a terminal operation has been called on a stream, the stream is consumed. Calling any operation on it afterwards throws `IllegalStateException: stream has already been operated upon or closed`.
Practice Java on Froquiz
The Stream API is tested in almost every Java interview from mid-level upwards. Test your Java knowledge on Froquiz, with questions covering streams, collections, concurrency, and more.
Summary
- Streams are lazy pipelines: intermediate operations execute only when a terminal operation is called
- `filter` keeps elements matching a predicate; `map` transforms; `flatMap` transforms and flattens
- `collect(Collectors.groupingBy(...))` is the most powerful aggregation: group, count, and average by key
- `Optional` wraps nullable values: use `map`, `orElse`, `ifPresent` instead of `isPresent` + `get`
- Parallel streams help for large CPU-bound workloads; avoid them for small collections and I/O
- Never use shared mutable state with parallel streams; accumulate results with `collect()` instead
- Streams cannot be reused after a terminal operation