13.5.15

How was YARN designed to address the limitations of MRv1?

All from the book Hadoop: The Definitive Guide.
  • Scalability
    • MRv1:
      • because the jobtracker has to manage both jobs and tasks, MRv1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
    • YARN/MRv2 overcomes these limitations by virtue of its split resource manager/application master architecture, which is designed to scale up to 10,000 nodes and 100,000 tasks (see the application-submission sketch after this list).
  • Availability
    • With the jobtracker's responsibilities split between the resource manager and the application master in YARN, making the service highly available became a divide-and-conquer problem: provide HA for the resource manager, then for YARN applications (on a per-application basis). And indeed, Hadoop 2 supports HA both for the resource manager and for the application master of MapReduce jobs (see the HA configuration sketch after this list), which is similar to my own product.
  • Utilization
    • In MRv1:
      • each tasktracker is configured with a static allocation of fixed-size "slots", which are divided into map slots and reduce slots at configuration time.
      • A map slot can only be used to run a map task, and a reduce slot can only be used for a reduce task.
    • In YARN/MRv2:
      • a node manager manages a pool of resources, rather than a fixed number of designated slots. 
      • MapReduce running on YARN will not hit the situation where a reduce task has to wait because only map slots are available on the cluster.
      • If the resources to run the task are available, then the application will be eligible for them.
      • Furthermore, resources in YARN are fine-grained, so an application can make a request for what it needs, rather than for an indivisible slot (see the container-request sketch after this list).
  • Multitenancy
    • YARN opens up Hadoop to other types of distributed application beyond MapReduce; MapReduce becomes just one YARN application among many.
    • Users can even run different versions of MapReduce on the same cluster, which makes the process of upgrading MapReduce more manageable.
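
To make the scalability point concrete, here is a minimal sketch (mine, not from the book) of submitting an application to YARN through the YarnClient API. The resource manager's involvement ends at handing out an application id and scheduling a container for the application master; the application master launched by the command below (my.AppMaster and the application name are hypothetical placeholders) is what manages the application's own tasks, i.e. the half of the old jobtracker's workload that no longer burdens the central daemon.

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitToYarn {
  public static void main(String[] args) throws Exception {
    // Talk to the resource manager, whose only job is arbitrating cluster resources.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the resource manager for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-app"); // placeholder name

    // Describe how to launch the application master; my.AppMaster is hypothetical.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("java my.AppMaster"));
    appContext.setAMContainerSpec(amContainer);

    // Resources for the application master's own container: 1 GB, 1 vcore.
    appContext.setResource(Resource.newInstance(1024, 1));

    // From here on, per-task management is the application master's problem,
    // not the resource manager's; this is the split that lifts the 4,000-node ceiling.
    yarnClient.submitApplication(appContext);
  }
}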
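
For the availability point, the sketch below shows what enabling resource manager HA can look like in Hadoop 2. These properties normally live in yarn-site.xml; setting them programmatically is just a compact way to show them here, and the rm ids, hostnames, cluster id, and ZooKeeper addresses are placeholder assumptions.

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Run two resource managers as an active/standby pair.
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.cluster-id", "cluster1");   // placeholder
    conf.set("yarn.resourcemanager.hostname.rm1", "master1");  // placeholder
    conf.set("yarn.resourcemanager.hostname.rm2", "master2");  // placeholder
    // ZooKeeper ensemble used for leader election and for storing RM state,
    // so the standby can take over running applications on failover.
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    return conf;
  }
}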
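
And for the utilization point, a minimal sketch of a fine-grained container request as an application master might issue it with the AMRMClient API: the application asks for exactly the memory and vcores a task needs, not for a map or reduce slot. Registration with the resource manager and the allocate() heartbeat loop are omitted for brevity.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FineGrainedRequest {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Exactly what one task needs: 2 GB and 1 vcore, on any node, any rack.
    Resource capability = Resource.newInstance(2048, 1);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    rmClient.addContainerRequest(request);

    // The node manager that ends up hosting the granted container carves it out
    // of its general resource pool; there is no map or reduce slot to wait for.
    // (A real application master would first call registerApplicationMaster and
    // receive containers via allocate() heartbeats; omitted for brevity.)
  }
}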
