13.5.15

How was YARN designed to address the limitations of MRv1?

All from the book Hadoop: The Definitive Guide.
  • Scalability
    • MRv1:
      • because the jobtracker has to manage both jobs and tasks, MRv1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks.
    • YARN/MRv2 overcomes these limitations by virtue of its split resource manager/application master architecture, which is designed to scale up to 10,000 nodes and 100,000 tasks (see the application-submission sketch after this list).
  • Availability
    • With the jobtracker's responsibilities split between the resource manager and the application master in YARN, making the service highly available became a divide-and-conquer problem: provide HA for the resource manager, then for YARN applications (on a per-application basis). And indeed, Hadoop 2 supports HA both for the resource manager and for the application master of MapReduce jobs (see the HA configuration sketch after this list), which is similar to my own product.
  • Utilization
    • In MRv1:
      • each tasktracker is configured with a static allocation of fixed-size "slots", which are divided into map slots and reduce slots at configuration time.
      • A map slot can only be used to run a map task, and a reduce slot can only be used for a reduce task.
    • In YARN/MRv2:
      • a node manager manages a pool of resources, rather than a fixed number of designated slots. 
      • MapReduce running on YARN will not hit the situation where a reduce task has to wait because only map slots are available on the cluster.
      • If the resources to run the task are available, then the application will be eligible for them.
      • Furthermore, resources in YARN are fine-grained, so an application can make a request for what it needs, rather than for an indivisible slot (see the container-request sketch after this list).
  • Multitenancy
    • YARN opens up Hadoop to other types of distributed application beyond MapReduce; MapReduce becomes just one YARN application among many.
    • Users can even run different versions of MapReduce on the same cluster, which makes the process of upgrading MapReduce more manageable.
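
To make the scalability point concrete, here is a minimal sketch (mine, not from the book) of submitting an application to YARN through the YarnClient API. The resource manager's involvement ends at handing out an application id and scheduling a container for the application master; the application master launched by the command below (my.AppMaster and the application name are hypothetical placeholders) is what manages the application's own tasks, i.e. the half of the old jobtracker's workload that no longer burdens the central daemon.

import java.util.Collections;

import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class SubmitToYarn {
  public static void main(String[] args) throws Exception {
    // Talk to the resource manager, whose only job is arbitrating cluster resources.
    YarnClient yarnClient = YarnClient.createYarnClient();
    yarnClient.init(new YarnConfiguration());
    yarnClient.start();

    // Ask the resource manager for a new application id.
    YarnClientApplication app = yarnClient.createApplication();
    ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
    appContext.setApplicationName("my-app"); // placeholder name

    // Describe how to launch the application master; my.AppMaster is hypothetical.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("java my.AppMaster"));
    appContext.setAMContainerSpec(amContainer);

    // Resources for the application master's own container: 1 GB, 1 vcore.
    appContext.setResource(Resource.newInstance(1024, 1));

    // From here on, per-task management is the application master's problem,
    // not the resource manager's; this is the split that lifts the 4,000-node ceiling.
    yarnClient.submitApplication(appContext);
  }
}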
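
For the availability point, the sketch below shows what enabling resource manager HA can look like in Hadoop 2. These properties normally live in yarn-site.xml; setting them programmatically is just a compact way to show them here, and the rm ids, hostnames, cluster id, and ZooKeeper addresses are placeholder assumptions.

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class RmHaConf {
  public static YarnConfiguration create() {
    YarnConfiguration conf = new YarnConfiguration();
    // Run two resource managers as an active/standby pair.
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.cluster-id", "cluster1");   // placeholder
    conf.set("yarn.resourcemanager.hostname.rm1", "master1");  // placeholder
    conf.set("yarn.resourcemanager.hostname.rm2", "master2");  // placeholder
    // ZooKeeper ensemble used for leader election and for storing RM state,
    // so the standby can take over running applications on failover.
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    return conf;
  }
}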
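
And for the utilization point, a minimal sketch of a fine-grained container request as an application master might issue it with the AMRMClient API: the application asks for exactly the memory and vcores a task needs, not for a map or reduce slot. Registration with the resource manager and the allocate() heartbeat loop are omitted for brevity.

import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class FineGrainedRequest {
  public static void main(String[] args) throws Exception {
    AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
    rmClient.init(new YarnConfiguration());
    rmClient.start();

    // Exactly what one task needs: 2 GB and 1 vcore, on any node, any rack.
    Resource capability = Resource.newInstance(2048, 1);
    ContainerRequest request =
        new ContainerRequest(capability, null, null, Priority.newInstance(0));
    rmClient.addContainerRequest(request);

    // The node manager that ends up hosting the granted container carves it out
    // of its general resource pool; there is no map or reduce slot to wait for.
    // (A real application master would first call registerApplicationMaster and
    // receive containers via allocate() heartbeats; omitted for brevity.)
  }
}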
