Workload Modelling and Elasticity Management of Data-Intensive Systems
dc.contributor.author | Khoshkbar Foroushha, Ali Reza | |
dc.date.accessioned | 2018-12-05T03:01:49Z | |
dc.date.available | 2018-12-05T03:01:49Z | |
dc.date.issued | 2018 | |
dc.description.abstract | Efficiently and effectively processing large volumes of data (often at high velocity) using an optimal mix of data-intensive systems (e.g., batch processing, stream processing, NoSQL) is the key step in the big data value chain. The availability and affordability of these data-intensive systems as cloud managed services (e.g., Amazon Elastic MapReduce, Amazon DynamoDB) have enabled data scientists and software engineers to deploy versatile data analytics flow applications, such as click-stream analysis and collaborative filtering, with less effort. Although these complex data analytics flow applications are easy to deploy, managing their run-time performance and elasticity has emerged as a major challenge. As we discuss later in this thesis, data analytics flow applications combine multiple programming models to perform specialized and pre-defined sets of activities, such as ingestion, analytics, and storage of data. To support users across such heterogeneous workloads, where they are charged for every CPU cycle used and every data byte transferred in or out of the cloud datacenter, we need a set of intelligent performance and workload management techniques and tools. Our research methodology investigates and develops these techniques and tools by significantly extending well-known formal models from other disciplines of computer science, including machine learning, optimization and control theory. To this end, this PhD dissertation makes the following core research contributions: a) it investigates novel workload prediction models (based on machine learning techniques, such as Mixture Density Networks) to forecast how the performance parameters of data-intensive systems are affected by run-time variations in dataflow behaviours (e.g., data volume, data velocity, query mix); b) it investigates a control-theoretic approach to managing the elasticity of data-intensive systems to ensure that service level objectives are achieved. 
In the former (a), we propose a novel application of Mixture Density Networks to distribution-based resource and performance modelling of both stream and batch processing data-intensive systems. We argue that the distribution-based resource and performance modelling approach, unlike existing single-point techniques, is able to predict the whole spectrum of resource usage and performance behaviours as probability distribution functions. It therefore provides more valuable statistical measures of system performance at run-time. To demonstrate the usefulness of our technique, we apply it to the following workload management activities: i) setting predictable auto-scaling policies, which highlights the potential of distribution prediction for the consistent definition of cloud elasticity rules; and ii) designing a predictive admission controller that is able to efficiently admit or reject incoming queries based on probabilistic service level agreement compliance goals. In the latter (b), we apply advanced techniques from control and optimization theory to design an adaptive control scheme that is able to continuously detect and self-adapt to workload changes in order to meet users’ service level objectives. Moreover, we develop a workload management tool called Flower for end-to-end elasticity management of different data-intensive systems across data analytics flows. We validate the proposed models, techniques and tools through extensive numerical and empirical evaluation. | en_AU |
dc.identifier.other | b58077777 | |
dc.identifier.uri | http://hdl.handle.net/1885/154330 | |
dc.language.iso | en_AU | en_AU |
dc.subject | Data-Intensive Systems | en_AU |
dc.subject | Big Data | en_AU |
dc.subject | Workload Modelling | en_AU |
dc.subject | Elasticity | en_AU |
dc.subject | Performance prediction | en_AU |
dc.title | Workload Modelling and Elasticity Management of Data-Intensive Systems | en_AU |
dc.type | Thesis (PhD) | en_AU |
dcterms.valid | 2018 | en_AU |
local.contributor.affiliation | ANU College of Engineering & Computer Science | en_AU |
local.contributor.authoremail | a.khoshkbarforoushha@anu.edu.au | en_AU |
local.contributor.supervisor | Ranjan, Rajiv | |
local.contributor.supervisorcontact | raj.ranjan@ncl.ac.uk | en_AU |
local.description.notes | The author has deposited the thesis. | en_AU |
local.identifier.doi | 10.25911/5d514439629c1 | |
local.mintdoi | mint | |
local.type.degree | Doctor of Philosophy (PhD) | en_AU |