Benchmarking distributed stream processing frameworks for classical machine learning applications (MS Dissertation)


dc.contributor.advisor Dr. Timothy A. Gonsalves
dc.contributor.advisor Dr. Sriram Kailasam
dc.contributor.author Sundar, Merlin
dc.date.accessioned 2021-05-04T06:45:59Z
dc.date.available 2021-05-04T06:45:59Z
dc.date.issued 2021-04
dc.identifier.uri http://hdl.handle.net/123456789/427
dc.description A thesis submitted for the award of the degree of Doctor of Philosophy under the guidance of Dr. Timothy A. Gonsalves and Dr. Sriram Kailasam (Faculty, SCEE) en_US
dc.description.abstract In India, the large telecom service providers each serve 100 million to 400 million subscribers. Each telecommunications network may contain hundreds or more different types of network devices transmitting data among each other and to the subscriber. In this scenario, a Network Management System (NMS) may collect millions of records per second. Network faults occur continually in real time; some are of low priority while others are of high priority. Hence, it becomes imperative to analyze the data in real time so that the high-priority network faults can be managed in real time as well. In view of the growing complexity and rapid changes in the demands on the network, machine learning (ML) techniques are being used for advanced NMS. ML models are typically computationally intensive, involving training and testing phases. To handle the huge volume of data streaming at high velocity, we require not only powerful machines but also mechanisms to distribute the computation across multiple nodes. There are several open-source distributed stream processing frameworks, such as Apache Storm, Apache Flink, Apache Spark and Confluent Kafka, for building real-time machine learning applications. Prior works benchmarked some of these platforms using low-level operations such as filters, joins and windowed computations. In this thesis, we first survey multiple Distributed Stream Processing Frameworks (DSPFs) qualitatively to choose appropriate frameworks, as well as a message queuing application for ordered message delivery. Once the platforms are decided, we benchmark our four chosen DSPFs for their applicability to executing classical machine learning models. For variety in complexity of computation, we have chosen three classical machine learning models: Online K-Means, Online Linear Regression and Online Logistic Regression.
We study the following quantitative evaluation metrics: throughput, latency, CPU utilization, memory usage and input/output usage. The experiments were conducted in both standalone and cluster setups to determine the scalability of the models. In this study, we found that all four frameworks are comparable, except that Apache Spark performs marginally better than the others in the standalone setup for all algorithms, whereas in the cluster setup the best-performing framework varies between Apache Storm and Apache Spark. We have also observed speedup across different setups. These results can help system designers choose the right model and the right framework, given a specific configuration of streaming data. Later in the thesis, we also discuss a direct application based on the benchmarking experiment, called the Configuration Planner. This Configuration Planner is designed to recommend a server configuration to a telecom network administrator based on the size of the network to be managed. We describe in detail the parameters involved, the design overview and the data structures of each component of this planner. This thesis covers the design of the planner and a possible user interface for it. en_US
dc.language.iso en_US en_US
dc.publisher IITMandi en_US
dc.subject Machine Learning Algorithms en_US
dc.subject Distributed Stream Processing Frameworks en_US
dc.title Benchmarking distributed stream processing frameworks for classical machine learning applications (MS Dissertation) en_US
dc.type Thesis en_US

