ARCH23: Papers with Abstracts |
---|
Abstract. Benchmark Proposal: Neural Network Control Systems (NNCS) play critical roles in autonomy. However, verifying their correctness is a substantial challenge. In this paper, we consider the neural network compression of ACAS Xu, a popular benchmark usually considered for open-loop neural network verification. ACAS Xu is an air-to-air collision avoidance system for unmanned aircraft issuing horizontal turn advisories to avoid collision with an intruder aircraft. We propose specific properties and different system assumptions to use this system as a closed-loop NNCS benchmark. We present experimental results for our properties based on randomly generated test cases and provide simulation code. | Abstract. Tool presentation: When formally verifying models of cyber-physical systems, it is obviously important that their verification results can be transferred to all previous observations of the modeled systems. Our tool CORA makes it possible to transfer safety properties by checking whether all measurements of the real system lie in the set of reachable outputs of the corresponding model -- we call this reachset conformance checking. In addition, we provide strategies to establish reachset conformance by injecting nondeterminism in models. This can be seen as some form of system identification, where instead of finding the most likely parameters, we compute a set of parameter values -- not only for the model dynamics but also for the set of disturbances and measurement errors -- to establish reachset conformance. By replacing real measurements with simulation results from a high-fidelity model, one can also check whether a high-fidelity model conforms to a simple model. We demonstrate the usefulness of reachset conformance by several use cases. | Abstract. We present the results of the ARCH 2023 friendly competition for formal verification of continuous and hybrid systems with linear continuous dynamics. 
In its seventh edition, three tools participated to solve nine different benchmark problems in the category for linear continuous dynamics (in alphabetical order): CORA, JuliaReach, and Verse. This report is a snapshot of the current landscape of tools and the types of benchmarks they are particularly suited for. Due to the diversity of problems, we are not ranking tools, yet the presented results provide one of the most complete assessments of tools for the safety verification of continuous and hybrid systems with linear continuous dynamics up to this date. | Luca Geretti, Julien Alexandre Dit Sandretto, Matthias Althoff, Luis Benet, Pieter Collins, Marcelo Forets, Elena Ivanova, Yangge Li, Sayan Mitra, Stefan Mitsch, Christian Schilling, Mark Wetzlinger and Daniel Zhuang Abstract. We present the results of a friendly competition for formal verification of continuous and hybrid systems with nonlinear continuous dynamics. The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2023. This year, 6 tools participated: Ariadne, CORA, DynIbex, JuliaReach, KeYmaera X and Verse (in alphabetic order). These tools are applied to solve reachability analysis problems on six benchmark problems, two of them featuring hybrid dynamics. We do not rank the tools based on the results, but show the current status and discover the potential advantages of different tools. | Abstract. This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems, are considered, which is a class of systems classically known as intelligent control systems, or in more modern and specific terms, neural network control systems (NNCS). 
We broadly refer to this category as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2023. In the fifth edition of this AINNCS category at ARCH-COMP, three tools (CORA, JuliaReach and NNV) have been applied to solve ten different benchmark problems. In reusing the benchmarks from the last iteration, we demonstrate the continuous progress in developing these tools: Two out of three tools can verify more instances than in the 2022 iteration. A novelty of this year’s iteration is the shared computation hardware that allows for a fairer comparison among the participants. | Alessandro Abate, Henk Blom, Nathalie Cauchi, Joanna Delicaris, Sofie Haesaert, Birgit van Huijgevoort, Abolfazl Lavaei, Anne Remke, Oliver Schön, Stefan Schupp, Fedor Shmarov, Sadegh Soudjani, Lisa Willemsen and Paolo Zuliani Abstract. This report is concerned with a friendly competition for formal verification and policy synthesis of stochastic models. The main goal of the report is to introduce new benchmarks and their properties within this category and recommend next steps toward next year’s edition of the competition. Given that the tools for stochastic models are at their early stages of development compared to those of non-probabilistic models, the main focus is to report on an initiative to collect a set of minimal benchmarks that all such tools can run, thus facilitating the comparison between the efficiency of the implemented techniques. This friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in Summer 2023. | Claudio Menghi, Paolo Arcaini, Walstan Baptista, Gidon Ernst, Georgios Fainekos, Federico Formica, Sauvik Gon, Tanmay Khandait, Atanu Kundu, Giulia Pedrielli, Jarkko Peltomäki, Ivan Porres, Rajarshi Ray, Masaki Waga and Zhenya Zhang Abstract. 
This report presents the results from the 2023 friendly competition in the ARCH workshop for the falsification of temporal logic specifications over Cyber-Physical Systems. We describe the benchmark models selected to compare the tools and the competition settings and provide background on the participating teams and tools. Finally, we present and discuss our results. | Abstract. This paper reports on the Hybrid Systems Theorem Proving (HSTP) category in the ARCH-COMP Friendly Competition 2023. The characteristic features of the HSTP category remain as in the previous edition: HSTP focuses on flexibility of programming languages as structuring principles for hybrid systems, unambiguity and precision of program semantics, and mathematical rigor of logical reasoning principles. The benchmark set includes nonlinear and parametric continuous and hybrid systems and hybrid games, each in three modes: fully automatic verification, semi-automatic verification from proof hints, proof checking from scripted tactics. This instance of the competition focuses on presenting the differences between the provers on a subset of the benchmark examples. | Abstract. The repeatability evaluation for the 7th International Competition on Verifying Continuous and Hybrid Systems (ARCH-COMP’23) is summarized in this report. The competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2023, affiliated with the 2023 Cyber-Physical Systems and Internet-of-Things Week (CPS-IoT Week). In its seventh edition, tools submitted artifacts through a new automated evaluation system and were synchronized with a Git repository for the repeatability evaluation and archiving, which were applied to solve benchmark instances through different competition categories. Due to procedural changes in execution through the automated system, fewer participants than in past iterations participated in the repeatability evaluation this year. 
The process was generally to submit scripts to automatically install and execute the tools in containerized virtual environments (specifically Dockerfiles to execute within Docker containers, along with execution scripts). With the automated evaluation system, most participating categories presented performance evaluation information from this common execution platform. |