SLA violation prediction : a machine learning perspective

Cloud computing reduces the maintenance costs of services and lets users access services on demand without being involved in technical implementation details. The relationship between a cloud provider and a customer is governed by a Service Level Agreement (SLA), which defines the level of service and its associated costs. An SLA usually contains specific parameters and a minimum level of quality for each element of the service, negotiated between the cloud provider and the customer. However, one or more of the agreed terms may be violated, for example because of occasional technical problems. Such violations do happen in the real world. In terms of availability, Amazon Elastic Compute Cloud (EC2) suffered an outage in 2011 during which many large customers such as Reddit and Quora were down for more than one day. Because SLA violation prediction benefits both the user and the cloud provider, cloud researchers have in recent years started investigating models capable of predicting future violations. From a machine learning point of view, SLA violation prediction amounts to a binary classification problem. In this thesis, we explore two machine learning classification models, Naive Bayes and Random Forest, to predict future violations from the features of a submitted task. Unlike previous work on SLA violation prediction or avoidance, our models are trained on a real-world dataset, which introduces new challenges. We validate our models using the Google Cloud Cluster trace as the dataset. Since SLA violations are rare events in the real world (2.2%), the classification task becomes more challenging, because the classifier tends to predict the dominant class. To overcome this issue, we re-balance the dataset using several re-sampling methods, namely Random Over-Sampling, Under-Sampling, SMOTE, NearMiss, One-sided Selection and Neighborhood Cleaning Rule, as well as an ensemble of them.
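The difficulty caused by the dominant class can be made concrete with a small sketch. It assumes a hypothetical set of 1,000 tasks with the 2.2% violation rate quoted above; the labels are synthetic, not taken from the Google trace, and the helper `evaluate` is an illustrative name rather than anything from the thesis. A trivial classifier that never predicts a violation looks very accurate yet detects nothing.

```python
def evaluate(y_true, y_pred):
    """Return (accuracy, recall on the violation class) for binary labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = correct / len(y_true)
    positives = true_pos + false_neg
    recall = true_pos / positives if positives else 0.0
    return accuracy, recall


# Assumption: 1,000 synthetic tasks, 22 of which (2.2%) violate the SLA.
y_true = [1] * 22 + [0] * 978

# A degenerate classifier that always predicts the dominant class (no violation).
y_pred = [0] * len(y_true)

accuracy, recall = evaluate(y_true, y_pred)
print(f"accuracy={accuracy:.3f} recall={recall:.3f}")
```

Accuracy alone is therefore a poor yardstick on such data; recall on the violation class shows that none of the violations are caught.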

Cloud computing reduces the maintenance costs of services and lets users access services on demand without having to be involved in technical implementation details. The relationship between a cloud service provider and a customer is governed by a Service Level Agreement (SLA), which defines the level and the associated cost of each service. The SLA usually contains specific parameters and a minimum level of quality for each element of the service, negotiated between the two parties. However, one or more of the conditions agreed in an SLA may be violated for several reasons, such as occasional technical problems. From a machine learning point of view, the problem of SLA violation prediction amounts to a binary classification problem. In this thesis, we explored two machine learning classification models, Naive Bayes and Random Forest, to predict future violations of a given task from its features. Compared with previous work on SLA violation prediction, our models were trained on real-world data, which introduces new challenges. We validated them using the Google Cloud Cluster trace as the dataset. Because SLA violations are rare events (2.2%), classifying them automatically remains a difficult task: a classification model tends strongly to predict the dominant class at the expense of rare classes. To address this problem, several re-sampling methods exist, such as Random Over-Sampling, Under-Sampling, SMOTE, NearMiss, One-sided Selection and Neighborhood Cleaning Rule. They can also be combined to re-balance the dataset.
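As an illustration of the simplest of these methods, the following is a minimal sketch of random over-sampling for a binary problem. It is not the thesis implementation: the function name `random_over_sample`, the fixed seed and the toy data are assumptions made for the example. Minority rows are duplicated at random until both classes have the same number of rows.

```python
import random


def random_over_sample(rows, labels, minority=1, seed=0):
    """Duplicate randomly chosen minority rows until both classes are equal in size.

    Assumes binary labels; `minority` is the label of the rare class.
    """
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    if not minority_idx:
        raise ValueError("no minority samples to duplicate")
    majority_count = len(labels) - len(minority_idx)
    # Number of extra copies needed (zero if the classes are already balanced).
    missing = max(0, majority_count - len(minority_idx))
    extra = [rng.choice(minority_idx) for _ in range(missing)]
    keep = list(range(len(labels))) + extra
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Assumption: 1,000 synthetic tasks with one dummy feature, 2.2% violations.
rows = [[i] for i in range(1000)]
labels = [1] * 22 + [0] * 978

balanced_rows, balanced_labels = random_over_sample(rows, labels)
n_violations = sum(balanced_labels)
n_normal = len(balanced_labels) - n_violations
print(n_violations, n_normal)
```

Re-sampling of this kind is normally applied to the training split only, so that the evaluation data keeps the original class ratio.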

Keywords

Service level agreements, Cloud computing, Machine learning, Naive Bayes, Random forest, Unbalanced classification
