Karasu-Collective-Cloud-Profiling

Prototypical implementation of "Karasu" for collective and thus efficient cloud configuration profiling. The whole approach is implemented in Python. Please consider reaching out if you have questions or encounter problems. Artifact DOI: 10.5281/zenodo.6624921

Technical Details

Striving for understandable code that can be reused and developed further, we use pydantic and Python type annotations wherever feasible.

Key Packages

  • PyTorch 1.13.1, machine learning framework based on the Torch library
  • BoTorch 0.6.0, a framework for Bayesian Optimization in PyTorch
  • pandas 1.3.5, open source data analysis and manipulation tool
  • Ax 0.2.3, a platform for managing and optimizing experiments
  • Hummingbird 0.4.3, a library for compiling traditional ML models into tensor computations
  • scikit-learn 1.0.2, a Python module for machine learning built on top of SciPy
  • NumPy 1.21.6, the primary array programming library for the Python language
  • SciPy 1.7.3, an open-source software for mathematics, science, and engineering

These and all other required packages are specified in requirements.txt and can be installed via pip3 install --user -r requirements.txt if a regular (non-containerized) installation is desired. However, we recommend the containerized approach described below.

Hardware Characteristics

For the experiments described in the paper, we had access to a machine equipped with a GPU, which shortened the time needed to conduct the experiments. It had the following characteristics:

| Resource | Details |
| --- | --- |
| CPU | Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz |
| vCores | 8 |
| Memory | 45 GB RAM |
| GPU | 1 x NVIDIA Quadro RTX 5000 (16 GB memory) |

Containerization

To ease deployment and execution of this prototype, we also provide a Dockerfile for building a container image. It can be built manually via the command docker build -t karasu-container:dev . (including the trailing dot, which denotes the build context). Note that, for your convenience, this step is already handled internally when using our bash functions.

By default, a container started with this image will simply execute a ping command. Further below, we describe how it can be used for actual experiments, i.e. by overriding the default command with specific experiment / evaluation tasks.
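As a minimal sketch of what overriding the default command can look like, the following Python snippet assembles a docker run argument list. The image tag comes from the build command above, and the two mounts reflect the data and artifacts folders required in the Prerequisites section. The container working directory /workspace and the entry point run_experiment.py are assumptions made for illustration only; the provided bash functions remain the supported way to start experiments.

```python
from pathlib import Path
from typing import List, Sequence


def build_docker_run_command(
    image: str,
    host_data_dir: Path,
    host_artifacts_dir: Path,
    experiment_command: Sequence[str],
    container_workdir: str = "/workspace",  # assumption, not read from the Dockerfile
) -> List[str]:
    """Assemble a `docker run` call that replaces the image's default command."""
    return [
        "docker", "run", "--rm",
        # Bind-mount the host folders into the container.
        "-v", f"{host_data_dir.resolve()}:{container_workdir}/data",
        "-v", f"{host_artifacts_dir.resolve()}:{container_workdir}/artifacts",
        image,
        # Everything after the image name overrides the default command.
        *experiment_command,
    ]


command = build_docker_run_command(
    image="karasu-container:dev",
    host_data_dir=Path("data"),
    host_artifacts_dir=Path("artifacts"),
    experiment_command=["python3", "run_experiment.py"],  # hypothetical entry point
)
print(" ".join(command))
```

The resulting list could be passed to subprocess.run; building it as a list rather than a single string avoids shell quoting issues with paths that contain spaces.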

Karasu in Action

We present Karasu, a collective and privacy-aware approach for efficient cloud configuration profiling. It trains lightweight performance models using only high-level information of shared workload profilings and combines them into an ensemble method for better exploiting inherent knowledge of the cloud configuration search space. This way, users are able to collaboratively improve their individual prediction capabilities, while obscuring sensitive information. Furthermore, Karasu enables the optimization of multiple objectives or constraints at the same time, like runtime, cost, and carbon footprint.
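To make the ensemble idea more concrete, here is a deliberately simplified sketch of how predictions of several support models could be combined for a target workload. It weights each model by how well it reproduces the ordering of the target's own observations. This is an illustrative stand-in under our own assumptions, not Karasu's actual ensemble; all names and the toy models are hypothetical, and the paper describes the real method.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Features = Sequence[float]
Model = Callable[[Features], float]          # maps a configuration to a predicted runtime
Observation = Tuple[Features, float]         # (configuration, measured runtime)


def ranking_agreement(model: Model, observations: List[Observation]) -> float:
    """Fraction of observation pairs whose order the model predicts correctly."""
    pairs = list(combinations(observations, 2))
    if not pairs:
        return 0.0
    correct = sum(
        1
        for (xa, ya), (xb, yb) in pairs
        if (model(xa) < model(xb)) == (ya < yb)
    )
    return correct / len(pairs)


def ensemble_predict(models: List[Model], observations: List[Observation], x: Features) -> float:
    """Weighted average of model predictions, weighted by ranking agreement."""
    scores = [ranking_agreement(m, observations) for m in models]
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(models)] * len(models)  # fall back to uniform weights
    else:
        weights = [s / total for s in scores]
    return sum(w * m(x) for w, m in zip(weights, models))


# Toy data (made up): runtime grows with the single feature.
target_observations = [([1.0], 10.0), ([2.0], 20.0), ([3.0], 30.0)]
helpful_model = lambda x: 10.0 * x[0]     # orders all pairs correctly
misleading_model = lambda x: -10.0 * x[0]  # orders all pairs incorrectly

prediction = ensemble_predict([helpful_model, misleading_model], target_observations, [4.0])
print(prediction)  # 40.0
```

In this toy case the misleading model receives zero weight, so the ensemble follows the helpful model alone. Note that only predictions, not raw profiling data, are needed from the support models.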

In the following, we provide instructions for reproducing our results.

Prerequisites

We evaluate our approach on a publicly available dataset consisting of performance data from diverse workloads and their various executions in a cloud environment. Specifically, we use this dataset created in the context of proposed cloud configuration approaches. Among other things, it encompasses data obtained from 18 workloads running 69 configurations (scaleout, VM type) in a multi-node setting (one run per configuration). Workloads were implemented in Hadoop and different Spark versions, realized with various algorithms, and tasked with processing diverse datasets.

For our evaluation, clone the dataset repository and copy the folder scout/dataset/osr_multiple_nodes to data/scout_multiple in our repository. The initial processing of this dataset takes a few minutes, depending on the machine used. Furthermore, the folder data and the to-be-created folder artifacts must be mounted into any container you start. All of this is handled internally by the minimalistic bash functions we provide, so you can proceed directly with the next steps!
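If you prefer to script the copy step, the following sketch performs it with the Python standard library. The source and target paths are the ones named above; the function name and its two parameters (the directory containing the scout clone, and the root of this repository) are assumptions introduced for illustration.

```python
import shutil
from pathlib import Path


def copy_scout_dataset(clone_parent: Path, karasu_repo: Path) -> Path:
    """Copy scout/dataset/osr_multiple_nodes to data/scout_multiple."""
    source = clone_parent / "scout" / "dataset" / "osr_multiple_nodes"
    target = karasu_repo / "data" / "scout_multiple"
    if not source.is_dir():
        raise FileNotFoundError(f"Dataset folder not found: {source}")
    if not target.exists():
        # Create data/ if needed, then copy the whole folder once.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copytree(source, target)
    # The artifacts folder is mounted alongside data, so create it up front.
    (karasu_repo / "artifacts").mkdir(parents=True, exist_ok=True)
    return target
```

A call such as copy_scout_dataset(Path(".."), Path(".")) would apply when the dataset repository was cloned next to this one; adjust both arguments to your own layout.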

Emulating a Shared Performance Data Repository

To start with, we emulate a shared performance data repository, which requires generating appropriate data with our baselines. The data generated this way is used in the subsequent examples both for visualizing the capabilities of the individual baselines and for offering Karasu a data source to draw from for its ensemble approach. For single-objective optimization (SOO):

./docker_scripts.sh create_soo_data

Likewise, for multi-objective optimization (MOO):

./docker_scripts.sh create_moo_data

In our experiments, each of the executed scripts ran for approx. 10 hours. The generated data is saved to the artifacts directory and used within the next steps where we investigate our research questions (RQs).

RQ1: General Performance Boost

What is the general potential of exploiting existing models to boost a target one? We evaluate a scenario where support models are available that originate from the same workload, yet were initialized differently and trained with other runtime targets. To run the experiments and generate the data for analysis, run:

./docker_scripts.sh run_rq1_experiment

In our experiments, the script ran for approx. 2 days.

RQ2: Collaborative Applicability

How well does the introduced approach work in a collaborative scenario, with potentially diverse workloads and limited available data? We evaluate a scenario where all the data in the repository originates from different workloads, with individual characteristics, resource needs, and constraints. To run the experiments and generate the data for analysis, run:

./docker_scripts.sh run_rq2_experiment

In our experiments, the script ran for approx. 2 days.

To evaluate the scenario of heterogeneous data that we discussed and reported in the paper, run:

./docker_scripts.sh run_rq2_hetero_experiment

RQ3: Multi-Objective Support

To evaluate Karasu in an MOO setting, we consider two objectives, namely cost and energy consumption, to be minimized under formulated runtime constraints. To run the experiments and generate the data for analysis, run:

./docker_scripts.sh run_rq3_experiment

In our experiments, the script ran for more than 1 day.
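To illustrate what minimizing two objectives under a runtime constraint means, the sketch below filters a set of observations down to the feasible, non-dominated ones (the constrained Pareto front). It only demonstrates the problem formulation, not Karasu's optimization procedure; the observation values, configuration labels, and the runtime limit are made up.

```python
from typing import List, NamedTuple


class Observation(NamedTuple):
    config: str     # hypothetical configuration label
    runtime: float  # constrained quantity
    cost: float     # objective 1, to be minimized
    energy: float   # objective 2, to be minimized


def dominates(a: Observation, b: Observation) -> bool:
    """True if a is no worse than b in both objectives and better in at least one."""
    no_worse = a.cost <= b.cost and a.energy <= b.energy
    strictly_better = a.cost < b.cost or a.energy < b.energy
    return no_worse and strictly_better


def constrained_pareto_front(observations: List[Observation], runtime_limit: float) -> List[Observation]:
    """Keep feasible observations that no other feasible observation dominates."""
    feasible = [o for o in observations if o.runtime <= runtime_limit]
    return [c for c in feasible if not any(dominates(o, c) for o in feasible)]


observations = [
    Observation("A", runtime=100.0, cost=4.0, energy=9.0),
    Observation("B", runtime=120.0, cost=5.0, energy=6.0),
    Observation("C", runtime=110.0, cost=6.0, energy=7.0),  # dominated by B
    Observation("D", runtime=300.0, cost=1.0, energy=1.0),  # violates the runtime limit
]
front = constrained_pareto_front(observations, runtime_limit=200.0)
print([o.config for o in front])  # ['A', 'B']
```

Configuration D would be the best choice in both objectives, but it is excluded because it exceeds the runtime limit; A and B remain as a genuine trade-off between cost and energy.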

Finally: Data Analysis

With the generated data in place, one can analyze the results and produce insightful plots (as in our paper). The plots can be created simply by running:

./docker_scripts.sh analysis

Note that running the analysis requires the completion of all aforementioned steps.

Concluding Remarks

In our experiments, we had access to a rather modern machine equipped with a GPU, and the experiment execution still required some time (see the sections above). When running in a Docker container, possibly on less powerful hardware, execution may take longer than the times indicated.

Note that it is possible to abort and resume the execution of specific experiments (Emulation, RQ1, RQ2, RQ3): on every container restart, we inspect the experiment data already written and skip the associated configurations to prevent data duplication.
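The resume behavior can be pictured with the following minimal sketch. It assumes, purely for illustration, that results are appended as JSON lines and that a configuration is identified by a (workload, seed) pair; the actual storage format and keys used in this repository may differ.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set, Tuple

ConfigKey = Tuple[str, int]  # hypothetical key: (workload, seed)


def load_completed(results_file: Path) -> Set[ConfigKey]:
    """Collect the keys of all configurations that already have written results."""
    if not results_file.exists():
        return set()
    completed: Set[ConfigKey] = set()
    with results_file.open() as handle:
        for line in handle:
            if line.strip():
                record = json.loads(line)
                completed.add((record["workload"], record["seed"]))
    return completed


def pending_configurations(planned: Iterable[ConfigKey], results_file: Path) -> List[ConfigKey]:
    """Return the planned configurations that still need to be executed."""
    completed = load_completed(results_file)
    return [key for key in planned if key not in completed]
```

Calling pending_configurations at startup and iterating only over its result means that an aborted run continues where it stopped and never writes a configuration twice.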

No time for data generation on your own? Consider extracting the artifacts.tar.gz to directly reuse our generated experiment data.

Questions? Something does not work or remains unclear?

Please get in touch, we are happy to help!

How to Cite

@inproceedings{scheinert2023karasu,
  title={Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics},
  author={Dominik Scheinert and Philipp Wiesner and Thorsten Wittkopp and Lauritz Thamsen and Jonathan Will and Odej Kao},
  booktitle={{IEEE} International Performance, Computing, and Communications Conference, {IPCCC} 2023, Anaheim, CA, USA, November 17-19, 2023},
  year={2023}
}

You can also find a preprint here.