Cost Optimization for Data Management in the Cloud
In this work we explore how to optimize traditional DBMS configurations and data stream processing resource allocations on a cloud computing platform. By analyzing workload processing costs and monetary resource usage costs in a cloud environment such as Amazon EC2, we can determine how to make profitable trade-offs between monetary cost and quality of service, guiding users to better system configurations and resource allocations.
Economic factors are leading to the rise of infrastructures providing software and computing facilities as a service, typically known as cloud services or cloud computing. Cloud services can provide efficiencies for application providers, both by limiting up-front capital expenses and by reducing the cost of ownership over time. Such services are typically delivered by data centers and built on shared commodity hardware for computation and storage using different levels of virtualization technologies.
The most popular utility service today is Amazon Web Services (AWS). Amazon provides basic resources such as processing power, storage, and network bandwidth as commodities. Utilizing resources offered by such cloud services brings numerous advantages: scalability, reliability, availability, reduced monetary cost, and customizable computing instances.
Web-based applications, and especially data and stream management applications, can be deployed very easily under this new application development model. However, identifying the most profitable and performance-effective resource configuration remains a challenging task, especially in the presence of dynamic workloads. Application developers are offered numerous processing and data storage services of different performance levels and costs. Currently, they rely on custom-built benchmarks and trial-and-error techniques to decide on the best resources to use, or even the best configuration settings for their own data management applications.
A different approach is required that can automatically identify not only the best cloud resources but also any out-of-the-box configurations that data management applications can employ in order to maximize the benefits of utilizing cloud-based infrastructures.
The purpose of this work is to explore how a cloud-based data management system (i.e., a traditional OLTP RDBMS or a stream processing system running in the cloud) can be tuned on top of a cloud computing infrastructure such as Amazon's or any similar utility provider's. Our goal is to provide a framework that allows application providers to automatically identify the in-cloud resource configuration, workload allocations, and database (or processing engine) configurations that would minimize monetary cost for a given workload while meeting application-defined QoS expectations. By the term in-cloud configuration we refer to the types and amounts of computing resources we need to utilize. A database configuration refers to certain database/stream-based optimizations (e.g., indexing, materialized views, stream processing distribution), parameters (e.g., buffer pool size), and general database settings (e.g., data compression).
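The two configuration notions above could be represented concretely as follows; this is a minimal, hypothetical sketch, and the field names and example values are illustrative assumptions rather than part of the proposed framework.

```python
from dataclasses import dataclass, field

@dataclass
class InCloudConfig:
    # In-cloud configuration: types and amounts of computing resources.
    instance_type: str       # e.g., an EC2 instance class (illustrative)
    instance_count: int
    storage_gb: int

@dataclass
class DatabaseConfig:
    # Database configuration: optimizations, parameters, general settings.
    indexes: list = field(default_factory=list)            # e.g., ["orders(customer_id)"]
    materialized_views: list = field(default_factory=list)
    buffer_pool_mb: int = 512                              # tuning parameter
    compression_enabled: bool = False                      # general setting

# A candidate configuration pair the framework would search over:
config = (InCloudConfig("m1.large", 4, 100),
          DatabaseConfig(indexes=["orders(customer_id)"], buffer_pool_mb=1024))
```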
Our framework should strive to achieve the following: (i) meet user-defined SLAs (latency of results, transaction throughput, etc.) for each application, and (ii) minimize the monetary cost.
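The two goals amount to constrained optimization: among candidate configurations, choose the cheapest one whose predicted performance satisfies the SLAs. A minimal sketch, assuming a hypothetical candidate list with per-hour costs and predicted latency/throughput figures (all numbers are made up for illustration):

```python
# Each candidate: (name, monetary cost $/hour, predicted latency ms, predicted txn/s)
candidates = [
    ("1x small", 0.10, 950, 120),
    ("2x small", 0.20, 480, 260),
    ("1x large", 0.40, 300, 500),
    ("4x small", 0.40, 250, 520),
]

def cheapest_meeting_sla(candidates, max_latency_ms, min_throughput):
    """Return the lowest-cost configuration that meets both SLA constraints."""
    feasible = [c for c in candidates
                if c[2] <= max_latency_ms and c[3] >= min_throughput]
    if not feasible:
        return None  # no configuration satisfies the SLA
    return min(feasible, key=lambda c: c[1])  # (ii) minimize monetary cost

best = cheapest_meeting_sla(candidates, max_latency_ms=500, min_throughput=200)
# best -> ("2x small", 0.20, 480, 260)
```

In practice the candidate space is large and the performance predictions must come from profiling or models, but the selection criterion stays the same.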
We will create an infrastructure to monitor both cloud performance and database/stream processing performance continuously, assessing system performance at regular intervals. A profiler will collect query statistics, while our cloud manager will monitor all concurrently running instances. These measurements will be aggregated in the profiler and sent to the meta-optimizer, which will infer where the bottlenecks are and which parts of the configuration are under-utilized. The meta-optimizer will then produce a series of recommendations to be passed on to our configuration manager, which will deploy the changes.
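The feedback loop above (profiler, meta-optimizer, configuration manager) can be sketched as follows; all component interfaces, metric names, and thresholds here are hypothetical stand-ins, not the system's actual design.

```python
def profile(query_stats, instance_cpu_utils):
    """Aggregate per-query statistics with per-instance utilization."""
    return {"avg_latency_ms": query_stats["avg_latency_ms"],
            "cpu_util": sum(instance_cpu_utils) / len(instance_cpu_utils)}

def meta_optimize(report, sla_latency_ms, low_util_threshold=0.2):
    """Infer bottlenecks or under-utilization and emit recommendations."""
    recs = []
    if report["avg_latency_ms"] > sla_latency_ms:
        recs.append("add-instance")       # SLA violated: scale out
    elif report["cpu_util"] < low_util_threshold:
        recs.append("remove-instance")    # under-utilized: scale in to cut cost
    return recs

# One monitoring interval: profiler output feeds the meta-optimizer,
# whose recommendations a configuration manager would then deploy.
report = profile({"avg_latency_ms": 640}, [0.90, 0.85])
recommendations = meta_optimize(report, sla_latency_ms=500)
# recommendations -> ["add-instance"]
```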
- Brown University
- Nathan Backman
- Jennie Rogers