Datacenter Resource Management
The world has entered the era of “Big Data”. To handle the resulting I/O traffic, datacenter servers are being equipped with the best available hardware across the compute, memory, networking, and storage domains. The traditional hard disk drives (HDDs) deployed in datacenters suffer from throughput limitations. To counter these limitations and the resulting I/O bottleneck, solid state drives (SSDs) have emerged as a viable storage alternative. However, deploying Flash-based SSDs in datacenters raises major management concerns, such as write amplification, load balancing, fault tolerance, and shared resource partitioning. We aim to address these challenges by developing techniques to manage Flash-based SSD resources and minimize the cost of ownership and maintenance. Additionally, we investigate methods to balance the fault tolerance of the storage system using data replication.
GReM: Dynamic SSD Resource Allocation In Virtualized Storage Systems With Heterogeneous IO Workloads - [IPCCC'17]
In a shared virtualized storage system that runs VMs with heterogeneous IO demands, it becomes a problem for the hypervisor to cost-effectively partition and allocate SSD resources among multiple VMs. There are two straightforward approaches to this problem: equally assigning SSDs to each VM, or managing SSD resources in a fair-competition mode. Unfortunately, neither approach can fully exploit the benefits of SSD resources, particularly when workloads change frequently and bursty IOs occur from time to time. In this work, we design a Global SSD Resource Management solution - GReM, which aims to fully utilize SSD resources as a second-level cache while maintaining performance isolation. In particular, GReM takes the dynamic IO demands of all VMs into consideration to split the entire SSD space into a long-term zone and a short-term zone, and cost-effectively updates the content of the SSDs in these two zones. GReM adaptively adjusts the reservation of each VM inside the long-term zone based on its IO changes, and further dynamically repartitions the SSDs between the long- and short-term zones at runtime by leveraging feedback from both cache performance and bursty workloads. Experimental results show that GReM captures cross-VM IO changes to make correct resource allocation decisions, and thus obtains a high IO hit ratio and low IO management cost compared with both traditional and state-of-the-art caching algorithms.
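The sketch below illustrates the zone-splitting idea described above in simplified form: the SSD cache is divided into a short-term zone whose size tracks burstiness and a long-term zone carved into per-VM reservations proportional to recent IO demand. The function name, the proportional heuristic, and all parameters are illustrative assumptions, not GReM's actual algorithm.

```python
# Minimal sketch (assumed heuristics, not GReM's implementation): split an SSD
# cache into a long-term zone with per-VM reservations and a short-term zone
# that absorbs bursty IOs.

def partition_ssd(total_blocks, vm_io_demand, burst_intensity, min_short_frac=0.1):
    """Return (per-VM long-term reservations, short-term zone size in blocks)."""
    # Grow the short-term zone when recent IOs are bursty, but keep a floor and a cap.
    short_frac = min(0.5, max(min_short_frac, burst_intensity))
    short_zone = int(total_blocks * short_frac)
    long_zone = total_blocks - short_zone

    # Reserve long-term space for each VM proportionally to its recent IO demand.
    total_demand = sum(vm_io_demand.values()) or 1
    reservations = {
        vm: int(long_zone * demand / total_demand)
        for vm, demand in vm_io_demand.items()
    }
    return reservations, short_zone

# Example: three VMs with different IO demands during a moderately bursty window.
res, short = partition_ssd(
    total_blocks=100_000,
    vm_io_demand={"vm1": 500, "vm2": 300, "vm3": 200},
    burst_intensity=0.2,
)
print(res, short)  # {'vm1': 40000, 'vm2': 24000, 'vm3': 16000} 20000
```

In GReM both the per-VM reservations and the long/short split are re-evaluated at runtime from observed cache performance and burst behavior; the fixed fractions above only stand in for that feedback loop.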
vFRM: Flash Resource Manager in VMware ESX Server - [TCC'17], [NOMS'14]
One popular approach to leveraging Flash technology in virtual machine environments today is to use it as a second-level, host-side cache. Although this approach delivers I/O acceleration for a single VM workload, it may not fully exploit the outstanding performance of Flash or justify its high cost per GB. In this work, we present the design of the VMware Flash Resource Manager (vFRM), which aims to maximize the utilization of Flash resources with minimal CPU, memory, and I/O cost for managing and operating Flash. vFRM borrows the ideas of heating and cooling from thermodynamics to identify the data blocks that benefit most from being placed on Flash, and lazily and asynchronously migrates data blocks between Flash and spinning disks. Experimental evaluation of our prototype shows that vFRM achieves better cost-effectiveness than traditional caching solutions, and consumes orders of magnitude less memory and I/O bandwidth.
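To make the heating/cooling analogy concrete, here is a toy sketch of temperature-based block selection: accesses heat a block, every epoch all blocks cool by a decay factor, and only at epoch boundaries are the hottest blocks chosen for (lazy, asynchronous) placement on Flash. The decay factor, capacity, and function names are assumptions for illustration, not vFRM's actual formulas.

```python
# Toy sketch of the heating/cooling idea (assumed parameters, not vFRM's model):
# block "temperature" rises on access and decays each epoch; the hottest blocks
# are promoted to Flash at epoch boundaries rather than on every IO.

from collections import defaultdict

DECAY = 0.5          # cooling factor per epoch (assumed)
FLASH_CAPACITY = 2   # number of blocks Flash can hold (toy value)

temperature = defaultdict(float)

def record_access(block_id, heat=1.0):
    temperature[block_id] += heat          # heating on access

def end_of_epoch():
    # Cooling: decay all temperatures, then pick the hottest blocks for Flash.
    for b in temperature:
        temperature[b] *= DECAY
    hottest = sorted(temperature, key=temperature.get, reverse=True)
    return set(hottest[:FLASH_CAPACITY])   # migrate these lazily/asynchronously

for b in ["A", "A", "B", "C", "A", "C"]:
    record_access(b)
print(end_of_epoch())  # {'A', 'C'} (set order may vary)
```

Batching placement decisions per epoch, rather than reacting to every IO, is what keeps the CPU, memory, and I/O overhead of managing the Flash tier low.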
minTCO: A Fresh Perspective on Total Cost of Ownership Models for Flash Storage in Datacenters - [IEEE CLOUD'16]
Recently, the adoption of Flash-based devices has become increasingly common in all forms of computing devices. Flash devices have started to become economically viable for large storage installations such as datacenters, where metrics like Total Cost of Ownership (TCO) are of paramount importance. Flash devices suffer from write amplification (WA), which, if unaccounted for, can substantially increase the TCO of a storage system. In this work, we develop a TCO model for Flash storage devices and plug into it a Write Amplification (WA) model of NVMe SSDs that we build from empirical data. Our WA model accounts for workload characteristics such as write rate and the percentage of sequential writes. Furthermore, using both the TCO and WA models as the optimization criterion, we design a new Flash resource management scheme (minTCO) that guides datacenter managers in making workload allocation decisions with the TCO of SSDs in mind. Experimental results show that minTCO reduces the TCO while keeping the throughput and space utilization of the entire datacenter storage relatively high.
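The sketch below shows, under deliberately simplified assumptions, how a WA estimate can feed a TCO figure: write amplification inflates the physical write volume, which shortens device lifetime and raises the effective cost per year. The linear WA formula, the parameter names, and the numbers are hypothetical and are not the models built in the paper.

```python
# Minimal sketch (assumed, simplified models, not minTCO's) of WA feeding TCO.

def write_amplification(seq_write_frac):
    # Hypothetical model: purely sequential writes approach WA ~= 1,
    # purely random writes approach an assumed worst case of 4.0.
    return 4.0 - 3.0 * seq_write_frac

def yearly_tco(device_price, capacity_gb, endurance_dwpd, warranty_years,
               host_write_gb_per_day, seq_write_frac, opex_per_year):
    wa = write_amplification(seq_write_frac)
    physical_gb_per_day = host_write_gb_per_day * wa
    # Total data the drive is rated to absorb over its warranty period.
    rated_total_gb = endurance_dwpd * capacity_gb * 365 * warranty_years
    lifetime_years = rated_total_gb / (physical_gb_per_day * 365)
    # Amortized device cost over its (workload-dependent) lifetime, plus yearly opex.
    return device_price / lifetime_years + opex_per_year

# Example: a 1 TB SSD rated at 1 DWPD for 5 years, under a half-sequential workload.
print(round(yearly_tco(400, 1000, 1.0, 5, 200, 0.5, 50), 2))  # 90.0
```

A minTCO-style allocator would evaluate such a cost for each candidate SSD and place the workload where the incremental TCO is smallest, subject to capacity and throughput constraints.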
AutoReplica: Automatic Data Replica Manager in Distributed Caching and Data Processing Systems - [CCNCPS'16]
Nowadays, replication is widely used in datacenter storage systems for large-scale Cyber-Physical Systems (CPS) to prevent data loss. However, the main side effect of replication is the overhead of extra network and I/O traffic, which inevitably degrades the overall I/O performance of the cluster. To effectively balance the trade-off between I/O performance and fault tolerance, we propose a complete solution called “AutoReplica” - a replica manager for distributed caching and data processing systems with SSD-HDD tiered storage. In detail, AutoReplica utilizes remote SSDs (connected by high-speed fiber) to replicate local SSD caches and thereby protect data. To balance load among nodes and reduce network overhead, we propose three approaches (i.e., ring, network, and multiple-SLA network) that automatically set up the cross-node replica structure while taking network traffic, I/O speed, and SLAs into consideration. To improve performance during migrations triggered by load balancing and failure recovery, we propose a migrate-on-write technique called “fusion cache” that seamlessly migrates and prefetches among local and remote replicas without pausing the subsystem. Moreover, AutoReplica can recover from different failure scenarios while limiting the degree of performance degradation. Lastly, AutoReplica supports parallel prefetching from multiple nodes with a new dynamic streaming optimization technique to improve I/O performance. We are currently implementing AutoReplica as a module that can be easily plugged into commonly used distributed caching systems, and solidifying our design and implementation details.
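As a concrete (and intentionally toy) illustration of the ring approach mentioned above, the sketch below maps each node's local SSD cache to replicas on the next r nodes along a logical ring, which spreads replica traffic evenly across the cluster. Node names and the replication factor are assumptions; the real system also weighs network traffic, I/O speed, and SLAs when forming the structure.

```python
# Toy sketch of ring-based replica placement (assumed details, not AutoReplica's
# full policy): each node's local SSD cache is replicated to the next r nodes
# on a logical ring.

def ring_replicas(nodes, replication_factor=2):
    """Map each node to the nodes that hold replicas of its SSD cache."""
    n = len(nodes)
    return {
        nodes[i]: [nodes[(i + k) % n] for k in range(1, replication_factor + 1)]
        for i in range(n)
    }

placement = ring_replicas(["node1", "node2", "node3", "node4"])
print(placement["node1"])  # ['node2', 'node3']
```

The network and multiple-SLA network variants replace this fixed neighbor rule with placement choices driven by measured link and device conditions and by per-tenant SLAs.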
AutoTiering: Automatic Data Placement Manager in Multi-Tier All-Flash Datacenter - [IPCCC'17]
As of 2017, the cost of Flash-based Solid State Drives (SSDs) keeps declining while their storage capacity keeps increasing. As a result, the "selling point" of traditional spinning Hard Disk Drives (HDDs) as backend storage - low cost and large capacity - is no longer unique, and they will eventually be replaced by low-end SSDs, which offer comparably large capacity but perform orders of magnitude better than HDDs. Thus, it is widely believed that all-flash multi-tier storage systems will be adopted in enterprise datacenters in the near future. However, existing caching or tiering solutions for SSD-HDD hybrid storage systems are not suitable for all-flash storage systems, because all-flash storage systems do not have a large speed difference (e.g., 10x) between tiers. Instead, the different specialties of each tier (such as high performance or high capacity) should be taken into consideration. Motivated by this, we develop an automatic data placement manager called "AutoTiering" to handle virtual machine disk file (VMDK) allocation and migration in an all-flash multi-tier datacenter, so as to best utilize the storage resources, optimize performance, and reduce migration overhead. AutoTiering is based on an optimization framework whose core technique is to predict a VM's performance change on tiers with different specialties without conducting real migration. To the best of our knowledge, AutoTiering is the first optimization solution designed for all-flash multi-tier datacenters. We implement AutoTiering on VMware ESXi, and experimental results show that it significantly improves I/O performance compared to existing solutions.
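The sketch below gives a simplified flavor of specialty-aware placement: each VMDK scores every candidate tier by its predicted performance there minus a penalty for migrating off its current tier, and the best tier with enough free space wins. The scoring rule, penalty, tier names, and data layout are all illustrative assumptions, not AutoTiering's actual optimization framework.

```python
# Toy sketch of tier selection (assumed scoring, not AutoTiering's framework):
# pick, per VMDK, the tier with the best predicted performance net of a
# migration penalty, subject to tier capacity.

def place_vmdks(vmdks, tiers, migration_penalty=0.2):
    """vmdks: {name: {"size_gb", "current", "predicted_perf": {tier: score}}}
       tiers: {tier: free_gb}; returns {name: chosen tier}."""
    placement = {}
    for name, info in vmdks.items():
        candidates = []
        for tier, perf in info["predicted_perf"].items():
            if tiers[tier] < info["size_gb"]:
                continue                                  # tier is full
            penalty = 0 if tier == info["current"] else migration_penalty
            candidates.append((perf - penalty, tier))
        score, best = max(candidates)                     # greedy choice
        tiers[best] -= info["size_gb"]
        placement[name] = best
    return placement

vmdks = {
    "vm1.vmdk": {"size_gb": 100, "current": "capacity",
                 "predicted_perf": {"performance": 0.9, "capacity": 0.4}},
    "vm2.vmdk": {"size_gb": 200, "current": "performance",
                 "predicted_perf": {"performance": 0.6, "capacity": 0.5}},
}
print(place_vmdks(vmdks, {"performance": 250, "capacity": 500}))
# {'vm1.vmdk': 'performance', 'vm2.vmdk': 'capacity'}
```

The hard part that AutoTiering addresses, and that this sketch simply assumes as given input, is obtaining those per-tier performance predictions without actually migrating the VMDKs.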