
Distributed Computing for Advanced Smart Meter Data Management with focus on Electrical Utility Applications

EasyChair Preprint no. 2567

6 pages
Date: February 5, 2020


With the advent of Internet-of-Things devices and sensors in the smart grid, big data analytics tools have recently attracted considerable research interest for managing data and processing it in parallel. However, using big data analytics platforms efficiently requires complex parameter configuration and an in-depth understanding of the underlying data processing design. In this work, we analyze parallelization using Spark's Python regression library and assess performance with workloads on up to 8 nodes. By analyzing the effect of different configurations and architectures on the performance of Apache Spark, we found that a trade-off between the number of nodes and the number of cores is necessary for efficient parallel computing. A set of node and core combinations is considered to evaluate how the run time responds. The work also shows the importance of high-performance computing capability for managing big data from smart meters. We infer that the computational time depends not only on the data size but also on the number of compute nodes and the number of cores used to execute the program.
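The abstract describes sweeping combinations of nodes and cores and measuring the resulting run time. A minimal Python sketch of such a sweep follows. It assumes one Spark executor per node and a cluster manager that honours `spark.executor.instances` (for example YARN with dynamic allocation disabled); the grid values, the factor of 2 tasks per core for `spark.default.parallelism`, and the `run_job` callable are illustrative assumptions, not the authors' setup.

```python
import itertools
import time
from typing import Callable, Dict, List, Tuple

SparkConf = Dict[str, str]


def build_configs(node_counts: List[int], cores_per_node: List[int]) -> List[SparkConf]:
    """Return one Spark configuration dict per (nodes, cores) combination.

    Assumption: one executor per node, so the executor count equals the
    node count. The 2-tasks-per-core parallelism factor is illustrative.
    """
    configs: List[SparkConf] = []
    for nodes, cores in itertools.product(node_counts, cores_per_node):
        configs.append({
            "spark.executor.instances": str(nodes),
            "spark.executor.cores": str(cores),
            # Total task slots = nodes * cores; oversubscribe by 2 (assumed).
            "spark.default.parallelism": str(2 * nodes * cores),
        })
    return configs


def time_runs(
    configs: List[SparkConf],
    run_job: Callable[[SparkConf], None],
) -> List[Tuple[SparkConf, float]]:
    """Run the (hypothetical) workload once per config and record wall time in seconds."""
    results: List[Tuple[SparkConf, float]] = []
    for conf in configs:
        start = time.perf_counter()
        run_job(conf)  # hypothetical: launch the regression workload with this config
        results.append((conf, time.perf_counter() - start))
    return results


if __name__ == "__main__":
    grid = build_configs([1, 2, 4, 8], [1, 2, 4])
    # Placeholder job that does nothing; replace with a real Spark submission.
    for conf, seconds in time_runs(grid, lambda conf: None):
        print(conf["spark.executor.instances"], conf["spark.executor.cores"], f"{seconds:.6f}")
```

In a real experiment, `run_job` could create a `SparkSession`, apply each key with `SparkSession.builder.config(key, value)`, fit the regression model, and then stop the session, so that every configuration starts as a fresh application rather than reusing executors from a previous run.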

Keyphrases: Apache Spark, Big data Parallel computing, execution time, High Performance Computing, load forecast, parallel computing, run-time, Smart Grid, Smart Meter, spark machine learning

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
@booklet{EasyChair:2567,
  author = {Ameema Zainab and Shady S. Refaat and Haitham Abu-Rub and Othmane Bouhali},
  title = {Distributed Computing for Advanced Smart Meter Data Management with focus on Electrical Utility Applications},
  howpublished = {EasyChair Preprint no. 2567},
  year = {EasyChair, 2020}}