Workload Modelling and Elasticity Management of Data-Intensive Systems

dc.contributor.authorKhoshkbar Foroushha, Ali Reza
dc.date.accessioned2018-12-05T03:01:49Z
dc.date.available2018-12-05T03:01:49Z
dc.date.issued2018
dc.description.abstractEfficiently and effectively processing large volumes of data (often at high velocity) using an optimal mix of data-intensive systems (e.g., batch processing, stream processing, NoSQL) is the key step in the big data value chain. The availability and affordability of these data-intensive systems as cloud managed services (e.g., Amazon Elastic MapReduce, Amazon DynamoDB) have enabled data scientists and software engineers to deploy versatile data analytics flow applications, such as click-stream analysis and collaborative filtering, with less effort. Although these complex data analytics flow applications are easy to deploy, managing their run-time performance and elasticity has emerged as a major challenge. As we discuss later in this thesis, data analytics flow applications combine multiple programming models for performing a specialized and pre-defined set of activities, such as ingestion, analytics, and storage of data. To support users across such heterogeneous workloads, where they are charged for every CPU cycle used and every data byte transferred in or out of the cloud datacenter, we need a set of intelligent performance and workload management techniques and tools. Our research methodology investigates and develops these techniques and tools by significantly extending the well-known formal models available from other disciplines of computer science, including machine learning, optimization, and control theory. To this end, this PhD dissertation makes the following core research contributions: a) it investigates novel workload prediction models (based on machine learning techniques, such as Mixture Density Networks) to forecast how the performance parameters of data-intensive systems are affected by run-time variations in dataflow behaviours (e.g., data volume, data velocity, query mix); and b) it investigates a control-theoretic approach for managing the elasticity of data-intensive systems to ensure that service level objectives are achieved.
In the former (a), we propose a novel application of Mixture Density Networks in distribution-based resource and performance modelling of both stream and batch processing data-intensive systems. We argue that a distribution-based resource and performance modelling approach, unlike the existing single-point techniques, is able to predict the whole spectrum of resource usage and performance behaviours as probability distribution functions. It therefore provides more valuable statistical measures of system performance at run-time. To demonstrate the usefulness of our technique, we apply it to the following workload management activities: i) predictable auto-scaling policy setting, which highlights the potential of distribution prediction for the consistent definition of cloud elasticity rules; and ii) designing a predictive admission controller that is able to efficiently admit or reject incoming queries based on probabilistic service level agreement compliance goals. In the latter (b), we apply advanced techniques from control and optimization theory to design an adaptive control scheme that is able to continuously detect and self-adapt to workload changes in order to meet the users’ service level objectives. Moreover, we develop a workload management tool called Flower for end-to-end elasticity management of different data-intensive systems across data analytics flows. Through extensive numerical and empirical evaluation, we validate the proposed models, techniques, and tools.en_AU
dc.identifier.otherb58077777
dc.identifier.urihttp://hdl.handle.net/1885/154330
dc.language.isoen_AUen_AU
dc.subjectData-Intensive Systemsen_AU
dc.subjectBig Dataen_AU
dc.subjectWorkload Modellingen_AU
dc.subjectElasticityen_AU
dc.subjectPerformance predictionen_AU
dc.titleWorkload Modelling and Elasticity Management of Data-Intensive Systemsen_AU
dc.typeThesis (PhD)en_AU
dcterms.valid2018en_AU
local.contributor.affiliationANU College of Engineering & Computer Scienceen_AU
local.contributor.authoremaila.khoshkbarforoushha@anu.edu.auen_AU
local.contributor.supervisorRanjan, Rajiv
local.contributor.supervisorcontactraj.ranjan@ncl.ac.uken_AU
local.description.notesThe author has deposited the thesis.en_AU
local.identifier.doi10.25911/5d514439629c1
local.mintdoimint
local.type.degreeDoctor of Philosophy (PhD)en_AU

Downloads

Original bundle
Name: Khoshkbarforoushha A Thesis 2018.pdf
Size: 3.16 MB
Format: Adobe Portable Document Format