- Scalability
- MRv1:
- Because the jobtracker has to manage both jobs and tasks, MRv1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
- YARN/MRv2 overcomes these limitations by virtue of its split resource manager/application master architecture, and is designed to scale up to 10,000 nodes and 100,000 tasks.
- Availability
- With the jobtracker's responsibilities split between the resource manager and the application master in YARN, making the service highly available became a divide-and-conquer problem: provide HA for the resource manager, then for YARN applications on a per-application basis. Indeed, Hadoop 2 supports HA both for the resource manager and for the application master for MapReduce jobs.
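As a sketch, resource manager HA in Hadoop 2 is configured in `yarn-site.xml` with an active/standby pair; the cluster id and hostnames below are placeholders, not values from the book:

```xml
<!-- yarn-site.xml: active/standby resource manager pair (hostnames are placeholders) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>cluster1</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
```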
- Utilization
- In MRv1:
- Each tasktracker is configured with a static allocation of fixed-size "slots", which are divided into map slots and reduce slots at configuration time.
- A map slot can only be used to run a map task, and a reduce slot can only be used for a reduce task.
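For illustration, the per-tasktracker slot split in MRv1 is fixed in `mapred-site.xml`; the counts below are example values, not recommendations:

```xml
<!-- mapred-site.xml: static slot allocation per tasktracker (example values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>7</value> <!-- map slots only; idle ones cannot run reduce tasks -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value> <!-- reduce slots only; idle ones cannot run map tasks -->
</property>
```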
- In YARN/MRv2:
- a node manager manages a pool of resources, rather than a fixed number of designated slots.
- MapReduce running on YARN will not hit the situation where a reduce task has to wait because only map slots are available on the cluster.
- If the resources to run the task are available, then the application will be eligible for them.
- Furthermore, resources in YARN are fine-grained, so an application can make a request for what it needs, rather than for an indivisible slot.
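As a rough sketch of this fine-grained model, each node manager advertises a pool of memory and cores, and each MapReduce task requests only what it needs rather than a whole slot; the numbers here are illustrative:

```xml
<!-- yarn-site.xml: per-node resource pool managed by the node manager (illustrative values) -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>

<!-- mapred-site.xml: fine-grained per-task requests instead of indivisible slots -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.reduce.memory.mb</name>
  <value>2048</value>
</property>
```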
- Multitenancy
- YARN opens up Hadoop to other types of distributed application beyond MapReduce. MapReduce becomes just one YARN application among many, and it is even possible to run different versions of MapReduce on the same cluster.
13.5.15
How was YARN designed to address the limitations of MRv1?
All from the book *Hadoop: The Definitive Guide*.