MapReduce / Spark Scheduling

In the era of big data and cloud computing, large amounts of data are generated from user applications and need to be processed in the datacenter. High-performance and scalable frameworks have become the future trending for data-intensive processing and analytics in both industry and academia. For example, Hadoop MapReduce is a typical two-stage process for parallel data processing; and Apache Spark leverages distributed memory to cache the intermediate results and nowadays reveals the potential to become the new standard distributed framework due to its favorable efficiency and expandability. When more and more applications are adopting these parallel-data computing frameworks, how to maximize resource utilization and minimize big data processing time becomes a focus of research and development. However, given the limited resources in the cluster and a complex dependency in data flow, it is challenging to decide how many resources should be allocated to each job/stage. Therefore, we put significant efforts on developing new schemes for job scheduling and resource allocation for these parallel-data computing frameworks.

TuMM: Self-Adjusting Slot Configurations for MapReduce Jobs in Hadoop Clusters - [IEEE CLOUD’13] [IEEE TCC 2017]

One of the primary concerns in MapReduce Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. The current Hadoop only allows static slot configuration, i.e., fixed numbers of map slots and reduce slots throughout the lifetime of a cluster. However, we found that such a static configuration may lead to low system resource utilizations as well as long completion length. Motivated by this, we propose TuMM, a simple yet effective scheme that uses slot ratio between map and reduce tasks as a tunable knob for reducing the makespan of a given set. By leveraging the workload information of recently completed jobs, our schemes dynamically allocates resources (or slots) to map and reduce tasks. We implement the presented scheme in Hadoop v0.20.2 and evaluate them with representative MapReduce benchmarks at Amazon EC2.  The experimental results demonstrate the effectiveness and robustness of our schemes under both simple workloads and more complex mixed workloads. 

HaSTE: Hadoop YARN Scheduling Based on Task-Dependency and Resource-Demand - [IEEE CLOUD’14] [IEEE TCC 2019]

The Hadoop ecosystem has evolved into its second generation, Hadoop YARN, which adopts fine-grained resource management schemes for job scheduling. However, the precedence constraint or fairness constraint in current widely used scheduling policies in YARN, such as FIFO and Fair, can both lead to inefficient resource allocation in the Hadoop YARN cluster. They also omit the dependency between tasks which is crucial for the efficiency of resource utilization. We thus propose a new YARN scheduler, named HaSTE, which can effectively reduce the makespan of MapReduce jobs in YARN by leveraging the information of requested resources, resource capacities, and dependency between tasks. We implement HaSTE as a pluggable scheduler in the most recent version of Hadoop YARN, and evaluate it with classic MapReduce benchmarks. The experimental results demonstrate that our YARN scheduler effectively reduces the makespan and improves resource utilization compare to the current scheduling policies.

OPERA: YARN Non-Exclusive Resource Management Scheme through Opportunistic Idle Resource Assignment - [ICCCN’16] [IEEE TCC 2018]

Efficiently managing resources and improving throughput in a large-scale cluster has become a crucial problem with the explosion of data processing applications in recent years. However, in the existing resource management (e.g., Hadoop YARN and Mesos), a certain amount of resources are exclusively allocated to a running task and can only be re-assigned after that task is completed. This exclusive mode unfortunately leads to a potential problem that may under-utilize the cluster resources and degrade system performance. To address this issue, we propose a novel opportunistic and efficient resource allocation scheme, named OPERA, which breaks the barriers among the encapsulated resource containers by leveraging the knowledge of actual runtime resource utilizations to re-assign opportunistic available resources to the pending tasks. OPERA avoids incurring severe performance interference to active tasks by further using two approaches to efficiently balances the starvations of reserved tasks and normal queued tasks. We implement and evaluate OPERA in Hadoop YARN v2.5. Our experimental results show that OPERA significantly reduces the average job execution time and increases the resource (CPU and memory) utilizations. 

AutoPath: Harnessing Parallel Execution Paths for Efficient Resource Allocation in Big Data Frameworks - [ICCCN’17]

Due to the flexibility of data operations and scalability of in-memory cache, Spark has revealed the potential to become the standard distributed framework to replace Hadoop for data-intensive processing in both industry and academia. However, we observe that the built-in scheduling algorithms in Spark (i.e., FIFO and FAIR) are not optimized for the applications with multiple parallel and independent branches in stages. Specifically, the child stage needs to wait and collect data from all its parent branches, but this wait has no guaranteed upper bound since it is tightly coupled with each branch’s workload characteristic, stage order, and their corresponding allocated computing resource. To address this challenge, we investigate a superior solution which ensures all branches acquire suitable resources according to their workload demand in order to let the finish time of each branch be as close as possible. Based on this, we propose a novel scheduling policy, named AutoPath, which can effectively reduce the overall makespan of such kind of applications by detecting and leveraging the parallel path, and adaptively assigning computing resources based on the estimated workload demands during runtime. We implement the new scheduling scheme in Spark v1.5.0 and evaluate it with selected representative workloads. The experiments demonstrate that our new scheduler effectively reduces the makespan and improves resource utilizations for these applications, compared to the current FIFO and FAIR schedulers.