Autoscaling Scientific Workflows on the Cloud by Combining On-demand and Spot Instances

David A. Monge • Yisel Garí • Cristian Mateos • Carlos García Garino

D.A. Monge
ITIC-CONICET & Facultad de Ciencias Exactas y Naturales, Universidad Nacional de Cuyo (UNCuyo), Mendoza, Argentina
dmonge@uncu.edu.ar
Tel.: +542614291000

Y. Garí
ITIC-CONICET, Universidad Nacional de Cuyo (UNCuyo), Mendoza, Argentina

C. Mateos
ISISTAN-CONICET, UNICEN, Tandil, Buenos Aires, Argentina

C. García Garino
ITIC & Facultad de Ingeniería, Universidad Nacional de Cuyo (UNCuyo), Mendoza, Argentina

Abstract

Autoscaling strategies achieve efficient and cheap executions of scientific workflows running in the cloud by determining the appropriate type and number of virtual machine instances to use while scheduling the tasks/data. Current strategies consider only on-demand instances, ignoring the advantages of a mixed cloud infrastructure that also comprises spot instances. Although the latter type of instance is subject to failures and therefore provides an unreliable infrastructure, it potentially offers significant cost and time improvements if used wisely. This paper discusses a novel autoscaling strategy with two features. First, it combines both types of instances to achieve a better cost-performance balance in the infrastructure. Second, it uses heuristic scheduling to deal with the unreliability of spot instances. Simulated experiments based on four scientific workflows showed substantial makespan and cost reductions of our strategy when compared with a reference strategy from the state of the art called Scaling First. These promising results represent a step towards new and better strategies for workflow autoscaling in the cloud.

Workflow Tasks

Figure 1 shows the task profile for the studied workflow applications. The figure presents the median duration in minutes and the number of tasks grouped by task type. Each application comprises 1000 tasks, except for SIPHT, which comprises 968 tasks.

[Figure: workflow-tasks]
Figure 1  Task profile of the scientific workflows. Bars represent the median task duration of each type in minutes, using a logarithmic scale. The label on each bar indicates the number of tasks of that type in the corresponding workflow application.
From the figure it can be seen that the applications present very different workload patterns resulting from the different types and numbers of tasks. CyberShake and Montage comprise many short-duration tasks, and just a few of them (4 and 3, respectively) have durations exceeding one hour of computation. On the other hand, LIGO's Inspiral and SIPHT comprise many long-duration tasks exceeding one hour of computation.
The structures of task dependencies in each workflow application are also very dissimilar, as can be appreciated in Figure 2.

[Figure panels: (a) CyberShake, (b) LIGO's Inspiral, (c) Montage, (d) SIPHT]
Figure 2  Examples of dependency structures of small versions of the studied workflows. Images have been adapted from [1].

Infrastructure

Table 1 presents the characteristics of each type of on-demand instance considered during the experimentation. Values in the table correspond to actual characteristics of Amazon Elastic Compute Cloud (EC2) instances, where vCPU is the number of (virtual) CPUs available for the instance type, ECUtot (ECU is the acronym for EC2 compute unit) is the total relative computing power of the instance considering all its virtual CPUs (one ECU is equivalent to the CPU capacity of a 1.0-1.2 GHz 2007 Opteron), ECU is the relative performance of a single CPU, and the last column denotes the price in US dollars (USD) of an hour of computation. Note that each VM-type name is accompanied by a category label that stands for the type of processes for which the instance is best suited:
  • general purpose (GP): balanced characteristics suitable for most tasks,
  • compute optimized (CO): best suited for compute-intensive tasks, and
  • memory optimized (MO): best suited for memory-intensive tasks.

Table 1 Characteristics of the on-demand instances. The data corresponds to instances belonging to the us-west (Oregon) region.
VM type vCPU ECUtot ECU Price [USD]
t2.micro (GP) 1 1 1 0.013
m3.medium (GP) 1 3 2 0.07
c3.2xlarge (CO) 8 28 3.5 0.42
r3.xlarge (MO) 4 13 3.25 0.35
m3.2xlarge (GP) 8 26 3.25 0.56
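
To make the infrastructure model concrete, the following sketch encodes Table 1 as a small in-memory catalog. The class and field names are illustrative choices of ours, not part of the original implementation.

    # Hypothetical encoding of the on-demand instance catalog of Table 1
    # (us-west, Oregon). Names and fields are illustrative only.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class VMType:
        name: str          # EC2 instance type
        category: str      # GP, CO or MO
        vcpu: int          # number of virtual CPUs
        ecu_total: float   # total EC2 compute units (all vCPUs)
        ecu: float         # EC2 compute units per vCPU
        price: float       # on-demand price in USD per hour

    CATALOG = [
        VMType("t2.micro",   "GP", 1, 1,  1.0,  0.013),
        VMType("m3.medium",  "GP", 1, 3,  2.0,  0.07),
        VMType("c3.2xlarge", "CO", 8, 28, 3.5,  0.42),
        VMType("r3.xlarge",  "MO", 4, 13, 3.25, 0.35),
        VMType("m3.2xlarge", "GP", 8, 26, 3.25, 0.56),
    ]

    # Example query: the cheapest instance per total ECU-hour.
    best = min(CATALOG, key=lambda vm: vm.price / vm.ecu_total)
    print(best.name)  # t2.micro under the prices above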

Budget

For each application, we computed a fit budget value, Bfit, according to Eq. (). To generate a larger variety of experimentation scenarios, from each fit budget value we derived reduced and wide budgets, computed as Breduced = 0.8⋅Bfit and Bwide = 1.2⋅Bfit, respectively. Table 2 presents the estimated budgets used for each workflow application.

Table 2 Estimated budgets for each workflow application.
Workflow Breduced Bfit Bwide
CyberShake 4.04 5.04 6.05
LIGO’s Inspiral 7.28 9.09 10.91
Montage 1.39 1.74 2.09
SIPHT 1.48 1.85 2.21
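
As a minimal sketch, the two derived budgets follow directly from the scaling factors above. The fit values below are the rounded ones from Table 2, so the printed figures can differ from the table in the last digit (the table was presumably computed from unrounded Bfit values).

    # Derive reduced and wide budgets from the fit budget:
    # B_reduced = 0.8 * B_fit and B_wide = 1.2 * B_fit.
    FIT_BUDGETS = {
        "CyberShake": 5.04,
        "LIGO's Inspiral": 9.09,
        "Montage": 1.74,
        "SIPHT": 1.85,
    }

    for workflow, b_fit in FIT_BUDGETS.items():
        print(f"{workflow}: reduced={0.8 * b_fit:.2f}, "
              f"fit={b_fit:.2f}, wide={1.2 * b_fit:.2f}")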

Autoscaling Strategies

To perform a complete comparison between Scaling First and SIAA, we defined different settings of SIAA to evaluate the improvements resulting from the use of (i) spot instances, (ii) the heuristic scheduling algorithm, and (iii) both features together. Table 3 provides a summary of all the autoscaling strategies evaluated and their settings, including whether spot instances were considered, which bidding strategy was used, and whether the heuristic scheduling algorithm was used. As can be seen, four families of strategies can be identified, namely Scaling First, SIAA-no-spots, SIAA-greedy, and SIAA-full.

Table 3 Summary of the autoscaling strategies and their configuration. Text in parentheses (current, P<0.1, P<0.05 and P<0.01) stands for the current and the probabilistic bidding methods.

Name Spot? Bidding Scheduling
Scaling First no N/A greedy
SIAA-no-spots no N/A heuristic
SIAA-greedy (current) yes current greedy
SIAA-greedy (P<0.1) yes prob. greedy
SIAA-greedy (P<0.05) yes prob. greedy
SIAA-greedy (P<0.01) yes prob. greedy
SIAA-full (current) yes current heuristic
SIAA-full (P<0.1) yes prob. heuristic
SIAA-full (P<0.05) yes prob. heuristic
SIAA-full (P<0.01) yes prob. heuristic
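
The two bidding methods can be illustrated as follows. This is a hedged sketch, since the exact probabilistic model of SIAA is not reproduced in this section; as an assumption, the probabilistic method is shown here as bidding at the empirical (1 - alpha) quantile of a spot price history, so that the probability of the next observed price exceeding the bid (an out-of-bid failure) stays below alpha.

    # Hedged sketch of the two bidding methods of Table 3. The 'current'
    # method bids the last observed spot price; the probabilistic method
    # is assumed here to bid the empirical (1 - alpha) quantile of the
    # price history, so that approximately P(price > bid) < alpha.
    import statistics

    def bid_current(price_history):
        return price_history[-1]

    def bid_probabilistic(price_history, alpha):
        percentiles = statistics.quantiles(price_history, n=100)
        index = min(98, int(round((1 - alpha) * 100)) - 1)
        return percentiles[index]

    history = [0.021, 0.019, 0.025, 0.018, 0.030, 0.022, 0.020, 0.027]
    print(bid_current(history))             # 0.027
    print(bid_probabilistic(history, 0.1))  # ~90th percentile of history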

Experimental Scenarios

The evaluation of the autoscaling strategies was performed according to a series of scenarios described as follows. Each scenario is defined by (i) the autoscaling strategy used (Scaling First, or SIAA in any of its possible configurations), (ii) the workflow at hand (CyberShake, LIGO's Inspiral, Montage, SIPHT), and (iii) the budget available (fit, reduced, wide). For statistical significance of results, each strategy-workflow combination was simulated 30 times (10 for each type of budget) using the CloudSim simulator [2]. In all cases, task durations were affected by a 20% error (uniformly distributed, as described in "Duration of Tasks") to provide a more realistic environment in line with the performance variability of the cloud. This 20% performance variability was selected in concordance with the settings used in related works [4, 5]. Table 4 summarizes the parameters used in the experiments.

Table 4 Experimental settings. SIAA was configured without spot instances, and with spot instances using either the current-price bidding method or the probabilistic bidding method with probability of failure P<α, with α ∈ {0.1, 0.05, 0.01}. These settings give a total of 1200 simulations.
Setting Value
Strategy Scaling First, SIAA (all 9 configurations)
Application CyberShake, LIGO, Montage, SIPHT
Budget fit, reduced, wide
Perf. var. 20%
Repetitions 30 (10 per type of budget)
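
As an illustrative sketch, assuming the 20% uniformly distributed error acts as a multiplicative factor drawn from U(0.8, 1.2) (the exact noise model is given in the paper's "Duration of Tasks" section), the perturbation can be written as:

    # Perturb a task's expected duration by a uniformly distributed
    # error of up to 20%, assumed multiplicative: factor ~ U(0.8, 1.2).
    import random

    def perturb_duration(expected_minutes, variability=0.20):
        factor = random.uniform(1.0 - variability, 1.0 + variability)
        return expected_minutes * factor

    random.seed(42)
    print(perturb_duration(30.0))  # a 30-minute task runs 24-36 minutes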

Global Results

Table 5 presents the global results per strategy, including the mean values of the number of instances, the percentage of spot instances, the number of out-of-bid errors and task failures, the speedup with respect to the sequential execution of the workflows, and the cost of execution. The best performances are highlighted in bold font.
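For reference, the speedup reported below follows the standard definition, which we assume matches the paper's usage: the ratio between the sequential execution time of the workflow (the sum of the durations d_t of its tasks) and the makespan M achieved by the strategy.

    \mathrm{speedup} = \frac{T_{\mathrm{seq}}}{M},
    \qquad T_{\mathrm{seq}} = \sum_{t \in \mathrm{tasks}} d_t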

Table 5 Summary of results. Each row presents the mean values per strategy considering all the studied applications.
Strategy Instances % spot Out-of-bid errors Task Failures Speedup Cost
Scaling First 15.5 0.0% N/A N/A 57.6 174.9
SIAA-no-spots 15.6 0.0% N/A N/A 68.8 139.4
SIAA-greedy (current) 150.0 91.6% 135.8 755.7 49.2 129.0
SIAA-greedy (P<0.1) 28.8 71.4% 5.9 23.4 70.6 111.7
SIAA-greedy (P<0.05) 26.9 70.2% 3.3 13.2 72.5 105.4
SIAA-greedy (P<0.01) 23.8 65.8% 0.6 3.6 73.5 101.6
SIAA-full (current) 158.2 91.7% 143.9 825.1 52.1 120.2
SIAA-full (P<0.1) 28.6 71.8% 5.1 24.7 73.7 107.4
SIAA-full (P<0.05) 26.9 70.4% 2.5 13.1 76.5 99.2
SIAA-full (P<0.01) 23.8 66.0% 0.6 3.7 77.8 96.1

Speedup Results

Figure 3 presents the average speedups and standard deviations per strategy for each workflow application. Strategies are sorted in increasing order of speedup.

[Figure: speedups]
Figure 3 Speedup comparison for each workflow application. Words in parentheses (current), (P<0.1), (P<0.05) and (P<0.01) stand for the current and the probabilistic bidding methods.

Statistical Significance

Table 6 shows, for each application, the speedup improvements resulting from the comparison with Scaling First. Only the best-performing configurations of SIAA-greedy and SIAA-full are presented. The table presents the difference between speedups and the percentage of improvement with respect to the average Scaling First speedup. To check the statistical significance of results we used the Mann–Whitney U test [3] with a significance level of α = 0.01.
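
As a sketch of this check, the test can be run with SciPy as follows; the speedup samples below are hypothetical placeholders standing in for the 30 simulation runs of each strategy.

    # Mann-Whitney U test as used for Tables 6 and 7. Sample values are
    # hypothetical placeholders for the 30 simulation runs per strategy.
    from scipy.stats import mannwhitneyu

    siaa = [82.1, 79.4, 85.0, 80.7, 83.9]            # hypothetical
    scaling_first = [61.4, 58.2, 63.0, 60.1, 59.7]   # hypothetical

    u_stat, p_value = mannwhitneyu(siaa, scaling_first,
                                   alternative="greater")
    print(u_stat, p_value, p_value < 0.01)  # significant at alpha = 0.01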

Table 6 Speedup improvements of SIAA (with and without using spot instances) compared with Scaling First. Results are aggregated by workflow application.
Workflow Strategy Improvement Percentage U statistic p-value significant?
CyberShake SIAA-no-spots 4.91 7.98% 564 0.092 no
SIAA-greedy (P<0.1) 20.71 33.70% 892 < 0.001 yes
SIAA-full (P<0.05) 22.44 36.52% 900 < 0.001 yes
LIGO’s Inspiral SIAA-no-spots 34.70 27.24% 898 < 0.001 yes
SIAA-greedy (P<0.1) 39.03 30.64% 691 < 0.001 yes
SIAA-full (P<0.01) 43.48 34.13% 900 < 0.001 yes
Montage SIAA-no-spots -0.02 -0.10% 471 0.756 no
SIAA-greedy (P<0.05) 5.87 32.52% 888 < 0.001 yes
SIAA-full (P<0.05) 5.84 32.35% 890 < 0.001 yes
SIPHT SIAA-no-spots 8.17 31.81% 900 < 0.001 yes
SIAA-greedy (P<0.01) 9.94 38.68% 893 < 0.001 yes
SIAA-full (P<0.01) 10.28 40.02% 900 < 0.001 yes

Execution Cost Results

Figure 4 presents the average execution costs and standard deviations per strategy for each workflow application. Strategies are sorted in decreasing order of cost.

[Figure: costs]
Figure 4 Execution cost comparison for each workflow application. Words in parentheses (current), (P<0.1), (P<0.05) and (P<0.01) stand for the current and the probabilistic bidding methods.

Statistical Significance

Table 7 presents the absolute and percentage cost savings of SIAA with respect to Scaling First, as well as the results of the Mann–Whitney U test.

Table 7 Cost savings of SIAA (with and without using spot instances) compared with Scaling First. Results are aggregated by workflow application.
Workflow Strategy Reduction Percentage U statistic p-value significant?
CyberShake SIAA-no-spots 1.6 5.99% 713 < 0.001 yes
SIAA-greedy (P<0.01) 10.7 40.13% 900 < 0.001 yes
SIAA-full (P<0.05) 11.4 42.79% 900 < 0.001 yes
LIGO’s Inspiral SIAA-no-spots 71.5 21.58% 900 < 0.001 yes
SIAA-greedy (P<0.01) 148.7 44.85% 900 < 0.001 yes
SIAA-full (P<0.01) 158.9 47.93% 900 < 0.001 yes
Montage SIAA-no-spots 0.8 3.23% 528.5 0.246 no
SIAA-greedy (P<0.1) 8.8 37.80% 900 < 0.001 yes
SIAA-full (P<0.1) 8.8 37.80% 900 < 0.001 yes
SIPHT SIAA-no-spots 87.1 26.40% 894 < 0.001 yes
SIAA-greedy (P<0.01) 142.9 43.31% 900 < 0.001 yes
SIAA-full (P<0.01) 148.1 44.89% 900 < 0.001 yes

Effect of Task Failures

Figure 5 shows the effect of task failures on the speedup (top) and cost-of-execution (bottom) metrics for the mentioned autoscaling strategies.

[Figure: failure-trends]
Figure 5 Effect of task failures on the speedup of the applications and on the total execution cost. Curves correspond to strategies using all the bidding methods.

Detailed Results

This section contains the tables of results corresponding to the experiments performed in this study. In particular, results corresponding to the number of instances, the speedup, and the cost of execution are presented.
The first column of Table 8 reports the average total number of instances, considering both on-demand and spot instances; the two following columns discriminate between both types, and the last column shows the percentage of spot instances.

Table 8 Comparison of the number of instances for the autoscaling strategies per workflow. Strategies not using spot instances (top) and strategies using spot instances (bottom) are separated by a horizontal line.
CyberShake
Strategy total on-demand spot % of spot
Scaling First 12.9 12.9 0.0 0.00%
SIAA-no-spots 12.8 12.8 0.0 0.00%
SIAA-greedy (current) 77.1 9.0 68.2 88.39%
SIAA-greedy (P<0.1) 26.5 7.1 19.4 73.25%
SIAA-greedy (P<0.05) 25.1 7.0 18.1 72.03%
SIAA-greedy (P<0.01) 22.5 7.0 15.5 68.87%
SIAA-full (current) 80.1 9.1 71.1 88.67%
SIAA-full (P<0.1) 26.6 7.1 19.5 73.44%
SIAA-full (P<0.05) 25.1 7.0 18.1 72.04%
SIAA-full (P<0.01) 22.6 7.0 15.6 69.08%
LIGO’s Inspiral
Strategy total on-demand spot % of spot
Scaling First 32.9 32.9 0.0 0.00%
SIAA-no-spots 33.0 33.0 0.0 0.00%
SIAA-greedy (current) 245.9 24.4 221.6 90.09%
SIAA-greedy (P<0.1) 54.6 18.2 36.4 66.66%
SIAA-greedy (P<0.05) 51.4 18.0 33.4 65.02%
SIAA-greedy (P<0.01) 47.0 17.4 29.6 62.97%
SIAA-full (current) 261.8 25.5 236.3 90.26%
SIAA-full (P<0.1) 53.7 17.8 35.8 66.78%
SIAA-full (P<0.05) 51.8 17.9 33.9 65.50%
SIAA-full (P<0.01) 47.2 17.3 29.8 63.24%
Montage
Strategy total on-demand spot % of spot
Scaling First 8.2 8.2 0.0 0.00%
SIAA-no-spots 8.3 8.3 0.0 0.00%
SIAA-greedy (current) 57.8 4.9 52.8 91.44%
SIAA-greedy (P<0.1) 16.5 4.3 12.2 73.81%
SIAA-greedy (P<0.05) 15.7 4.3 11.4 72.53%
SIAA-greedy (P<0.01) 13.8 4.4 9.3 67.85%
SIAA-full (current) 57.5 4.9 52.5 91.41%
SIAA-full (P<0.1) 16.4 4.3 12.1 73.65%
SIAA-full (P<0.05) 15.5 4.2 11.3 72.69%
SIAA-full (P<0.01) 13.6 4.3 9.3 68.51%
SIPHT
Strategy total on-demand spot % of spot
Scaling First 8.0 8.0 0.0 0.00%
SIAA-no-spots 8.1 8.1 0.0 0.00%
SIAA-greedy (current) 219.0 8.1 210.8 96.29%
SIAA-greedy (P<0.1) 17.8 5.0 12.8 72.04%
SIAA-greedy (P<0.05) 15.2 4.4 10.8 71.20%
SIAA-greedy (P<0.01) 11.8 4.3 7.5 63.33%
SIAA-full (current) 233.5 8.1 225.3 96.52%
SIAA-full (P<0.1) 17.8 4.8 13.0 73.27%
SIAA-full (P<0.05) 15.1 4.3 10.8 71.37%
SIAA-full (P<0.01) 11.8 4.3 7.4 63.10%
Table 9 reports the mean, median, minimum (min) and maximum (max) speedups registered, as well as the standard deviation (SD), per strategy for each workflow application. The highest mean and median speedups are highlighted in bold font.

Table 9 Speedup comparison of the autoscaling strategies per workflow. Strategies not using spot instances (top) and strategies using spot instances (bottom) are separated by a horizontal line.
CyberShake
Strategy mean median min max SD
Scaling First 60.0 61.4 48.0 69.2 6.4
SIAA-no-spots 63.8 66.3 47.7 73.9 7.7
SIAA-greedy (current) 55.4 51.8 36.8 89.6 15.2
SIAA-greedy (P<0.1) 80.7 82.1 68.3 91.8 6.2
SIAA-greedy (P<0.05) 79.6 80.1 67.5 91.7 6.7
SIAA-greedy (P<0.01) 77.1 76.4 64.4 89.6 6.5
SIAA-full (current) 59.4 54.4 33.5 90.0 16.7
SIAA-full (P<0.1) 83.5 83.5 69.2 92.3 4.7
SIAA-full (P<0.05) 82.7 83.9 69.3 92.3 6.0
SIAA-full (P<0.01) 80.7 82.9 66.2 91.2 6.2
LIGO’s Inspiral
Strategy mean median min max SD
Scaling First 126.3 127.4 109.2 142.7 11.8
SIAA-no-spots 159.7 162.1 141.8 176.3 10.4
SIAA-greedy (current) 101.1 94.4 68.6 149.1 20.3
SIAA-greedy (P<0.1) 150.8 166.4 107.7 170.6 24.1
SIAA-greedy (P<0.05) 153.8 166.4 114.7 170.6 20.5
SIAA-greedy (P<0.01) 159.0 166.4 134.1 170.6 11.6
SIAA-full (current) 105.9 105.0 70.6 166.2 22.1
SIAA-full (P<0.1) 160.1 170.8 109.1 175.6 23.9
SIAA-full (P<0.05) 166.8 170.8 120.0 175.6 15.5
SIAA-full (P<0.01) 170.9 170.9 160.9 175.6 3.1
Montage
Strategy mean median min max SD
Scaling First 18.2 18.1 16.6 20.1 1.0
SIAA-no-spots 18.3 18.0 17.3 20.0 0.9
SIAA-greedy (current) 16.0 16.2 12.3 19.9 2.3
SIAA-greedy (P<0.1) 22.9 23.4 19.5 27.1 2.4
SIAA-greedy (P<0.05) 23.5 23.9 19.5 27.3 1.9
SIAA-greedy (P<0.01) 23.3 23.9 21.1 24.9 1.2
SIAA-full (current) 16.0 16.3 12.3 19.8 2.2
SIAA-full (P<0.1) 22.9 23.4 19.5 27.1 2.3
SIAA-full (P<0.05) 23.6 23.9 19.5 27.2 1.8
SIAA-full (P<0.01) 23.4 23.9 21.2 25.0 1.2
SIPHT
Strategy mean median min max SD
Scaling First 26.1 25.7 22.8 29.5 1.8
SIAA-no-spots 33.4 33.9 29.9 36.1 2.0
SIAA-greedy (current) 24.2 24.2 16.0 34.7 4.9
SIAA-greedy (P<0.1) 27.9 27.6 20.2 38.3 4.6
SIAA-greedy (P<0.05) 32.9 35.6 20.1 38.2 5.5
SIAA-greedy (P<0.01) 34.6 35.6 27.8 37.4 2.3
SIAA-full (current) 26.9 27.3 16.6 36.7 5.8
SIAA-full (P<0.1) 28.2 28.0 21.5 36.8 4.1
SIAA-full (P<0.05) 33.0 35.4 23.9 37.1 4.4
SIAA-full (P<0.01) 36.1 36.0 34.6 37.6 0.9
Table 10 reports the results with respect to the cost of execution. Values represent the price paid for the execution of each workflow application. For each combination of application and autoscaling strategy, the table shows the mean, median, minimum (min), maximum (max) and standard deviation (SD) of the registered execution costs. The best (lowest) minimum, mean and median execution costs are highlighted in bold font.

Table 10 Execution cost comparison of the autoscaling strategies per workflow, measured in USD. Strategies not using spot instances (top) and strategies using spot instances (bottom) are separated by a horizontal line.
CyberShake
Strategy mean median min max SD
Scaling First 27.2 26.6 25.9 30.8 1.3
SIAA-no-spots 25.4 25.0 23.6 26.9 1.1
SIAA-greedy (current) 20.7 21.0 14.5 27.5 3.6
SIAA-greedy (P<0.1) 16.4 16.6 11.6 21.7 2.2
SIAA-greedy (P<0.05) 16.0 16.2 11.6 19.8 1.7
SIAA-greedy (P<0.01) 16.2 15.9 12.8 20.6 1.9
SIAA-full (current) 19.3 19.0 12.5 25.2 3.3
SIAA-full (P<0.1) 15.6 15.6 11.7 21.3 2.2
SIAA-full (P<0.05) 15.3 15.2 11.4 21.4 2.2
SIAA-full (P<0.01) 15.5 15.5 10.8 20.6 1.9
LIGO’s Inspiral
Strategy mean median min max SD
Scaling First 326.6 331.5 297.4 354.8 16.8
SIAA-no-spots 264.0 259.9 236.3 293.9 17.0
SIAA-greedy (current) 230.9 235.4 184.2 274.2 22.8
SIAA-greedy (P<0.1) 190.4 182.9 145.9 260.2 27.7
SIAA-greedy (P<0.05) 188.7 184.4 145.9 265.4 27.9
SIAA-greedy (P<0.01) 183.0 182.8 162.8 203.6 10.9
SIAA-full (current) 221.3 227.8 157.8 255.6 21.6
SIAA-full (P<0.1) 181.7 176.1 148.4 252.5 25.7
SIAA-full (P<0.05) 171.8 172.7 143.5 195.1 14.8
SIAA-full (P<0.01) 169.8 172.6 146.3 191.6 15.0
Montage
Strategy mean median min max SD
Scaling First 23.7 23.3 20.4 26.0 1.7
SIAA-no-spots 23.2 22.5 21.3 26.0 1.7
SIAA-greedy (current) 19.6 19.9 12.3 23.6 2.7
SIAA-greedy (P<0.1) 14.6 14.5 11.0 20.4 2.4
SIAA-greedy (P<0.05) 14.8 14.8 11.9 19.0 1.6
SIAA-greedy (P<0.01) 15.2 15.1 12.3 18.1 1.3
SIAA-full (current) 19.3 19.3 12.3 23.6 2.9
SIAA-full (P<0.1) 14.4 14.5 10.7 20.4 2.5
SIAA-full (P<0.05) 14.6 15.0 11.5 19.0 1.7
SIAA-full (P<0.01) 15.1 15.1 11.3 18.1 1.5
SIPHT
Strategy mean median min max SD
Scaling First 322.1 329.9 264.3 373.4 30.8
SIAA-no-spots 245.2 242.8 226.2 283.7 14.0
SIAA-greedy (current) 244.8 250.2 162.9 315.2 39.7
SIAA-greedy (P<0.1) 225.4 228.6 155.1 303.9 37.0
SIAA-greedy (P<0.05) 202.3 190.9 149.4 307.3 42.1
SIAA-greedy (P<0.01) 192.0 187.1 159.0 225.5 19.3
SIAA-full (current) 220.8 219.4 149.0 285.1 36.2
SIAA-full (P<0.1) 217.8 219.2 168.9 301.1 34.0
SIAA-full (P<0.05) 195.1 190.8 146.1 288.1 35.1
SIAA-full (P<0.01) 183.8 181.8 147.9 226.8 24.4

References

[1] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, K. Vahi: "Characterization of scientific workflows", Third Workshop on Workflows in Support of Large-Scale Science (WORKS 2008), pp. 1–10, 2008.

[2] R. N. Calheiros, R. Ranjan, A. Beloglazov, C. A. F. De Rose, R. Buyya: "CloudSim: a toolkit for modeling and simulation of cloud computing environments and evaluation of resource provisioning algorithms", Software: Practice and Experience, pp. 23–50, 2011.

[3] H. B. Mann, D. R. Whitney: "On a Test of Whether one of Two Random Variables is Stochastically Larger than the Other", The Annals of Mathematical Statistics, pp. 50–60, 1947.

[4] M. Mao, M. Humphrey: "Auto-scaling to minimize cost and meet application deadlines in cloud workflows", Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, p. 49, 2011.

[5] M. Mao, M. Humphrey: "Scaling and scheduling to maximize application performance within budget constraints in cloud workflows", 2013 IEEE 27th International Symposium on Parallel & Distributed Processing (IPDPS), pp. 67–78, 2013.