13.5.15

Apache YARN Scheduler


  • The FIFO Scheduler
    • Places applications in a queue and runs them in the order of submission.
    • Requests for the first application in the queue are allocated first; once its requests have been satisfied, the next application in the queue is served, and so on.
    • The good part:
      • simple to understand
      • requires no configuration
    • The bad side:
      • not suitable for shared clusters.
        • A large application can use all the resources in the cluster,
          • so every other application has to wait its turn.
      • On a shared cluster it is better to use the Capacity Scheduler or the Fair Scheduler.
        • Both of these allow long-running jobs to complete in a timely manner,
          • while still allowing users who are running concurrent smaller ad hoc queries to get results back in a reasonable time.
  • The Capacity Scheduler
    • A separate dedicated queue allows a small job to start as soon as it is submitted,
      • although this comes at the cost of overall cluster utilization, since the queue's capacity is reserved for jobs in that queue.
        • As a result, a large job finishes later than it would under the FIFO Scheduler.
  • The Fair Scheduler
    • There is no need to reserve a fixed amount of capacity, since the scheduler dynamically balances resources between all running jobs.
    • For example, if one large job is already running, a second (small) job is allocated half of the cluster's resources when it starts, so that each job gets its fair share.
  • Delay Scheduling
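To make the FIFO, Capacity, and Fair trade-offs above concrete, here is a minimal discrete-time sketch in Python. It is not YARN's implementation and does not model delay scheduling. Several details are hypothetical assumptions: the cluster size (100 containers), the queue names `prod` and `adhoc` with a 70/30 split, and the job sizes. It also assumes that a job can use as many containers as it has work left, that capacity is strictly reserved per queue (as described above, with no lending of idle capacity), and that the fair policy is a simple max-min (water-filling) split between jobs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

CLUSTER = 100  # hypothetical cluster size, in containers
QUEUE_PCT = {"prod": 70, "adhoc": 30}  # hypothetical Capacity Scheduler queues (% of cluster)


@dataclass
class Job:
    name: str
    arrival: int   # time step at which the job is submitted
    work: int      # total container-steps the job needs
    queue: str
    remaining: int = field(init=False, default=0)
    finish: Optional[int] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.remaining = self.work


Policy = Callable[[List[Job]], Dict[str, int]]


def fifo(active: List[Job]) -> Dict[str, int]:
    """Serve jobs strictly in submission order; later jobs get only leftovers."""
    alloc, free = {}, CLUSTER
    for j in sorted(active, key=lambda j: j.arrival):
        alloc[j.name] = min(j.remaining, free)
        free -= alloc[j.name]
    return alloc


def capacity(active: List[Job]) -> Dict[str, int]:
    """Give each queue a fixed, reserved slice; FIFO order inside each queue."""
    alloc = {}
    for queue, pct in QUEUE_PCT.items():
        free = CLUSTER * pct // 100  # reserved for this queue even while it is idle
        for j in sorted((j for j in active if j.queue == queue), key=lambda j: j.arrival):
            alloc[j.name] = min(j.remaining, free)
            free -= alloc[j.name]
    return alloc


def fair(active: List[Job]) -> Dict[str, int]:
    """Max-min fair split: equal shares, with any unused share passed on to bigger jobs."""
    alloc, free = {}, CLUSTER
    pending = sorted(active, key=lambda j: j.remaining)  # smallest demand first
    while pending:
        share = free // len(pending)
        j = pending.pop(0)
        alloc[j.name] = min(j.remaining, share)
        free -= alloc[j.name]
    return alloc


def simulate(jobs: List[Job], policy: Policy) -> Dict[str, int]:
    """Allocate once per time step until every job finishes; return finish times."""
    t = 0
    while any(j.finish is None for j in jobs):
        active = [j for j in jobs if j.arrival <= t and j.finish is None]
        alloc = policy(active)
        for j in active:
            j.remaining -= alloc.get(j.name, 0)
            if j.remaining == 0:
                j.finish = t + 1
        t += 1
    return {j.name: j.finish for j in jobs}


def make_jobs() -> List[Job]:
    # Hypothetical workload: a large job, then a small ad hoc job two steps later.
    return [Job("large", arrival=0, work=1000, queue="prod"),
            Job("small", arrival=2, work=100, queue="adhoc")]


for name, policy in [("FIFO", fifo), ("Capacity", capacity), ("Fair", fair)]:
    print(name, simulate(make_jobs(), policy))
```

With these assumptions, the (large, small) finish times are (10, 11) under FIFO, (15, 6) under the capacity policy, and (11, 4) under the fair policy. FIFO makes the small job wait for the large one. Reserved capacity lets the small job finish early but stretches the large job from 10 to 15 steps. The fair split gives the small job half the cluster as soon as it arrives and then returns everything to the large job.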