Ensemble Learning of Run-time Prediction Models for Gene-Expression Analysis Workflows

Abstract. One of the core issues in the efficient management of workflow applications is predicting task performance. This paper proposes a novel approach for constructing models that predict the running times of tasks in data-intensive scientific workflows. Ensemble machine learning techniques are used to produce robust combined models with high predictive accuracy. Information provided by workflow systems (e.g. benchmarks), together with the attributes and provenance of the data, is exploited to ensure the accuracy of the models. The proposed approach has been tested on bioinformatics workflows for gene-expression analysis in homogeneous and heterogeneous computing environments. The results show the advantage of ensemble models over single (standalone) prediction models: ensemble learning techniques reduced the prediction error by up to 14.8% (homogeneous) and 8.7% (heterogeneous) compared with single-model strategies.

 

Datasets


For this study we used performance data for three workflow tasks: the Random and SVM-RFE rankers and the GELF classifier. The performance data corresponds to homogeneous and heterogeneous computing environments. [get datasets]

Results


Relative Absolute Errors

Relative Absolute Errors for the Homogeneous and Heterogeneous environments. Smallest errors are highlighted.

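Prediction error [%] in the homogeneous (Hom.) and heterogeneous (Het.) environments.

| Task | Strategy | Hom. mean | Hom. median | Hom. min | Hom. max | Hom. stdev | Het. mean | Het. median | Het. min | Het. max | Het. stdev |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Random | k-NN | 47.96 | 48.13 | 44.23 | 51.31 | 1.72 | 20.96 | 20.95 | 19.23 | 22.62 | 1.07 |
| Random | M5P | 46.88 | 47.05 | 43.75 | 50.11 | 1.26 | 22.05 | 21.98 | 20.68 | 24.53 | 1.03 |
| Random | ANN | 55.83 | 52.42 | 45.40 | 102.42 | 11.89 | 29.28 | 25.36 | 19.86 | 54.39 | 10.09 |
| Random | SVR | 49.58 | 49.77 | 46.37 | 51.53 | 1.21 | 34.70 | 34.73 | 32.16 | 38.00 | 1.45 |
| Random | Bagging | 46.63 | 46.41 | 43.92 | 53.72 | 1.86 | 19.96 | 19.94 | 18.08 | 21.63 | 0.86 |
| SVM-RFE | k-NN | 28.08 | 28.01 | 25.77 | 30.66 | 1.12 | 11.08 | 11.11 | 10.11 | 12.48 | 0.53 |
| SVM-RFE | M5P | 27.62 | 27.70 | 25.46 | 30.02 | 0.95 | 15.44 | 15.68 | 13.11 | 17.41 | 0.97 |
| SVM-RFE | ANN | 37.48 | 34.24 | 27.04 | 74.97 | 11.24 | 18.50 | 16.26 | 12.52 | 55.45 | 8.07 |
| SVM-RFE | SVR | 31.11 | 30.97 | 29.22 | 32.89 | 1.01 | 27.63 | 27.59 | 24.92 | 29.78 | 1.17 |
| SVM-RFE | Bagging | 26.54 | 26.68 | 24.80 | 28.26 | 0.95 | 10.31 | 10.33 | 9.46 | 11.23 | 0.50 |
| GELF | k-NN | 44.55 | 44.44 | 42.49 | 46.87 | 1.12 | 31.24 | 31.49 | 28.91 | 33.22 | 1.14 |
| GELF | M5P | 43.32 | 43.50 | 40.77 | 45.15 | 1.07 | 30.70 | 30.78 | 28.47 | 32.62 | 0.98 |
| GELF | ANN | 57.21 | 54.18 | 46.59 | 83.49 | 10.38 | 38.19 | 35.97 | 30.74 | 65.44 | 6.64 |
| GELF | SVR | 52.34 | 52.51 | 49.82 | 55.91 | 1.40 | 39.54 | 39.55 | 37.61 | 41.98 | 1.17 |
| GELF | Bagging | 28.85 | 28.72 | 27.13 | 31.36 | 1.16 | 22.36 | 22.06 | 20.57 | 31.48 | 1.86 |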
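
The gain from Bagging can be read directly off the median errors reported above. The following is a minimal sketch, assuming that "reduction" means the difference, in percentage points, between the median error of the best single model and the median error of Bagging; that reading is an interpretation and is not stated in the source. The names `MEDIAN_ERROR` and `reduction_vs_best_single` are illustrative only.

```python
# Median relative absolute errors [%] copied from the results above,
# keyed by (task, environment) and then by learning strategy.
MEDIAN_ERROR = {
    ("Random", "homogeneous"): {"k-NN": 48.13, "M5P": 47.05, "ANN": 52.42, "SVR": 49.77, "Bagging": 46.41},
    ("Random", "heterogeneous"): {"k-NN": 20.95, "M5P": 21.98, "ANN": 25.36, "SVR": 34.73, "Bagging": 19.94},
    ("SVM-RFE", "homogeneous"): {"k-NN": 28.01, "M5P": 27.70, "ANN": 34.24, "SVR": 30.97, "Bagging": 26.68},
    ("SVM-RFE", "heterogeneous"): {"k-NN": 11.11, "M5P": 15.68, "ANN": 16.26, "SVR": 27.59, "Bagging": 10.33},
    ("GELF", "homogeneous"): {"k-NN": 44.44, "M5P": 43.50, "ANN": 54.18, "SVR": 52.51, "Bagging": 28.72},
    ("GELF", "heterogeneous"): {"k-NN": 31.49, "M5P": 30.78, "ANN": 35.97, "SVR": 39.55, "Bagging": 22.06},
}


def reduction_vs_best_single(errors):
    """Return (best single strategy, its median error minus Bagging's).

    Assumption: the reduction is measured in percentage points
    against the single model with the lowest median error.
    """
    single = {name: err for name, err in errors.items() if name != "Bagging"}
    best = min(single, key=single.get)
    return best, round(single[best] - errors["Bagging"], 2)


if __name__ == "__main__":
    for (task, env), errors in MEDIAN_ERROR.items():
        best, gain = reduction_vs_best_single(errors)
        print(f"{task:8s} {env:14s} best single: {best:5s} reduction: {gain:.2f}")
```

Under this reading, the largest gaps occur for GELF: 14.78 points in the homogeneous environment and 8.72 points in the heterogeneous one, both against M5P. These are close to the 14.8% and 8.7% quoted in the abstract, which suggests, but does not confirm, that the abstract refers to median errors.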

Mann-Whitney U test

The following list presents all the results of the Mann-Whitney U test. Each line gives (1) the task and environment, (2) the learning strategy compared with Bagging, and (3) the result of the test. Here, false indicates that the results of the two strategies came from different populations, and therefore that the difference in performance is significant. Conversely, true indicates that the results come from the same population, and therefore that the difference in performance is not significant. The labels NN3, MLP and SMOregRBF presumably correspond to the k-NN, ANN and SVR rows of the error table.

gelf-homogeneous,NN3,false
gelf-homogeneous,M5P,false
gelf-homogeneous,MLP,false
gelf-homogeneous,SMOregRBF,false
gelf-heterogeneous,NN3,false
gelf-heterogeneous,M5P,false
gelf-heterogeneous,MLP,false
gelf-heterogeneous,SMOregRBF,false
svmrfe-homogeneous,NN3,false
svmrfe-homogeneous,M5P,false
svmrfe-homogeneous,MLP,false
svmrfe-homogeneous,SMOregRBF,false
svmrfe-heterogeneous,NN3,false
svmrfe-heterogeneous,M5P,false
svmrfe-heterogeneous,MLP,false
svmrfe-heterogeneous,SMOregRBF,false
random-homogeneous,NN3,false
random-homogeneous,M5P,true (the performance improvement from Bagging is not significant in this case)
random-homogeneous,MLP,false
random-homogeneous,SMOregRBF,false
random-heterogeneous,NN3,false
random-heterogeneous,M5P,false
random-heterogeneous,MLP,false
random-heterogeneous,SMOregRBF,false
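
To illustrate how a true/false entry like those above could be produced, here is a minimal sketch of a two-sided Mann-Whitney U test in plain Python. It is not the authors' implementation: it uses the normal approximation with a tie correction and no continuity correction, the 0.05 significance level is an assumption (the source does not state one), and the sample values are made up for illustration because the per-run errors are not listed here.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation).

    Returns (U, p_value). Tied values receive average ranks and the
    variance of U is corrected for ties.
    """
    n1, n2 = len(x), len(y)
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    u = min(u_x, n1 * n2 - u_x)
    n = n1 + n2
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u, p_value


def same_population(x, y, alpha=0.05):
    """True if the test does NOT reject the same-population hypothesis."""
    _, p_value = mann_whitney_u(x, y)
    return p_value >= alpha


# Hypothetical per-run errors [%]; NOT taken from the datasets.
bagging_example = [28.1, 28.9, 27.5, 29.3, 28.7, 28.4, 29.0, 27.9]
m5p_example = [43.0, 43.6, 42.8, 44.1, 43.3, 42.5, 43.9, 43.4]

if __name__ == "__main__":
    print(same_population(bagging_example, m5p_example))
```

With clearly separated samples such as these, the function returns False, matching the convention of the list, where false marks a significant difference. For real analyses, a library routine such as `scipy.stats.mannwhitneyu` is the safer choice, since it also offers exact p-values for small samples.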
