
Is it feasible to identify outputs of an arbitrary process at run time without excessively slowing down workflows?

12 pages. Published: December 11, 2023

Abstract

In this study, we explore whether file events can be identified for an arbitrary process in real time without significantly slowing down workflows, in order to support the generation of a data provenance report for the dynamic workflow manager MEOW. Unlike in traditional workflow managers, the output location of a MEOW job is not pre-defined, and an output can itself trigger another job. We established a set of criteria and examined four Linux tools against them: strace, perf script, inotify, and fanotify. Our findings suggest that strace meets our requirements, and that integrating an strace-based tracer into MEOW is both theoretically and practically viable. The implemented tracer slows the workflow by a factor of approximately 1.3, although worst-case scenarios show that the slowdown could be as high as a factor of 5. This research forms the basis for constructing MEOW’s data provenance report.
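
To make the idea of an strace-based tracer concrete, the following is a minimal sketch of how output files might be recovered from an strace log. It is not the tracer described in the paper. It assumes the log was produced by an invocation along the lines of `strace -f -e trace=open,openat -o trace.log <command>`, and the names `OPEN_RE`, `WRITE_FLAGS`, and `written_paths` are hypothetical. It also handles only the default strace line format, so calls split across "unfinished"/"resumed" lines are ignored.

```python
import re

# Matches strace lines such as (an assumed, default-format example):
#   1234 openat(AT_FDCWD, "/tmp/out.txt", O_WRONLY|O_CREAT, 0666) = 3
OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:(?:AT_FDCWD|\d+), )?"(?P<path>[^"]*)", '
    r'(?P<flags>[A-Z_|]+)(?:, [0-7]+)?\)\s*= (?P<ret>-?\d+)'
)

# Access modes that allow the process to write to the file.
WRITE_FLAGS = {"O_WRONLY", "O_RDWR"}


def written_paths(strace_lines):
    """Return paths successfully opened for writing, in first-seen order."""
    seen = []
    for line in strace_lines:
        match = OPEN_RE.search(line)
        if match is None:
            continue  # not an open/openat line in the expected format
        if int(match.group("ret")) < 0:
            continue  # the call failed, so no file was opened
        flags = set(match.group("flags").split("|"))
        path = match.group("path")
        if flags & WRITE_FLAGS and path not in seen:
            seen.append(path)
    return seen


if __name__ == "__main__":
    sample = [
        '1234 openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
        '1234 openat(AT_FDCWD, "/tmp/out.txt", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 3',
        '1234 openat(AT_FDCWD, "/tmp/missing", O_WRONLY) = -1 ENOENT (No such file or directory)',
    ]
    print(written_paths(sample))
```

A file opened for writing is only a candidate output. A fuller tracer would presumably also follow calls such as `write`, `rename`, and `unlink`, which this sketch leaves out.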

Keyphrases: data provenance report, fanotify, inotify, MEOW, perf, strace, workflow manager

In: Lindsay Quarrie (editor). Proceedings of 2023 Concurrent Processes Architectures and Embedded Systems Hybrid Virtual Conference, vol. 17, pages 81–92

Links:
BibTeX entry
@inproceedings{COPA2023:Is_it_feasible_to,
  author    = {Philip Shun Jensen and Iben Lilholm and David Marchant},
  title     = {Is it feasible to identify outputs of an arbitrary process at run time without excessively slowing down workflows?},
  booktitle = {Proceedings of 2023 Concurrent Processes Architectures and Embedded Systems Hybrid Virtual Conference},
  editor    = {Lindsay Quarrie},
  series    = {Kalpa Publications in Computing},
  volume    = {17},
  pages     = {81--92},
  year      = {2023},
  publisher = {EasyChair},
  bibsource = {EasyChair, https://easychair.org},
  issn      = {2515-1762},
  url       = {https://easychair.org/publications/paper/H7MN},
  doi       = {10.29007/mj18}}