Data Systems

🧩 Dataflow Engines (Spark/Tez)

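Batch processing engines that run an entire multi-stage workflow as a single job, passing intermediate state between operators in memory or on local disk rather than writing it to the distributed filesystem between stages.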

🎯 Mastery Criteria

Student can calculate the I/O savings of a dataflow job versus an equivalent chain of MapReduce jobs.
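
A minimal sketch of that calculation in Python, under a deliberately simplified model. It assumes each MapReduce job in the chain writes its output to a replicated distributed filesystem and the next job reads it back once, while the dataflow job touches the distributed filesystem only for the initial input and the final output. The function names, the dataset sizes, and the replication factor of 3 are illustrative assumptions, not measurements from a real cluster. Shuffle traffic and local spills are ignored because both approaches incur them.

```python
def chained_mapreduce_dfs_io(input_gb, intermediate_gb, output_gb, replication=3):
    """Distributed-filesystem I/O (GB) when every stage is its own MapReduce job.

    Each intermediate dataset is written `replication` times and then
    read back once by the next job in the chain.
    """
    reads = input_gb + sum(intermediate_gb)
    writes = replication * (sum(intermediate_gb) + output_gb)
    return reads + writes


def dataflow_dfs_io(input_gb, output_gb, replication=3):
    """Distributed-filesystem I/O (GB) for one dataflow job (idealized).

    Intermediate state stays in memory or on local disk, so only the
    initial input and the final output touch the distributed filesystem.
    """
    return input_gb + replication * output_gb


# Hypothetical three-stage workflow: 100 GB in, two intermediates, 10 GB out.
input_gb, intermediate_gb, output_gb = 100, [80, 40], 10

mr_io = chained_mapreduce_dfs_io(input_gb, intermediate_gb, output_gb)
df_io = dataflow_dfs_io(input_gb, output_gb)
savings = mr_io - df_io

print(f"Chained MapReduce: {mr_io} GB")  # 610 GB
print(f"Dataflow job:      {df_io} GB")  # 130 GB
print(f"I/O saved:         {savings} GB")  # 480 GB
```

In this model the savings equal (1 + replication) times the total intermediate data: each intermediate gigabyte avoids one read and `replication` writes.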

⚠️ Common Trap

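"Spark is just faster MapReduce." The gain is not a quicker implementation of the same model. It comes from a different execution model: the whole workflow runs as one job, so intermediate results between stages are not written to the distributed filesystem, and an operator can start as soon as its input is ready.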