A programming model for processing large datasets with a parallel, distributed algorithm on a cluster, expressed as two functions: a map function that emits key-value pairs and a reduce function that aggregates the values for each key.
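A minimal single-machine sketch of the map/shuffle/reduce phases, using word count as the canonical example (the function names here are illustrative, not part of any real framework's API):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would
    # between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["the"])  # 2
```

In a real deployment the map and reduce calls run in parallel across many machines, with the shuffle moving data between them over the network.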
Batch processing engines that optimize workflows by keeping intermediate state in memory rather than writing to disk.
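The in-memory idea can be sketched with chained generators, where each stage consumes the previous stage's output directly instead of materializing intermediate results to disk between jobs (the stage functions are hypothetical, standing in for an engine's operators):

```python
def stage_tokenize(lines):
    # First stage: split lines into words, streamed lazily.
    for line in lines:
        yield from line.split()

def stage_filter_short(words):
    # Second stage: keep only words longer than 3 characters.
    # It reads the previous stage's output straight from memory;
    # nothing is written to disk in between.
    for word in words:
        if len(word) > 3:
            yield word

lines = ["spark keeps data in memory", "mapreduce writes to disk"]
result = list(stage_filter_short(stage_tokenize(lines)))
print(result)
```

Engines like Spark generalize this to distributed datasets, recomputing lost partitions from the operator lineage instead of relying on on-disk intermediates for fault tolerance.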
MapReduce does not modify the input data: it treats input files as immutable and writes its output to new files, which makes failed tasks safe to retry.