Model-based data generation for the evaluation of functional reliability and resilience of distributed machine learning systems against abnormal cases

Future production technologies will comprise a multitude of systems whose core functionality is closely tied to machine-learned models. Such systems require reliable components to ensure the safety of workers and to maintain their trust in the systems. Evaluating the functional reliability and resilience of systems based on machine-learned models is generally challenging. For this purpose, appropriate test data must be available, including data that covers abnormal cases. Such abnormal cases can be unexpected usage scenarios, erroneous inputs, accidents during operation, or even the failure of certain subcomponents.
In this work, approaches to the model-based generation of arbitrarily large amounts of data representing such abnormal cases are explored. Such computer-based generation requires domain-specific approaches, especially with respect to the nature and distribution of the data, the protocols used, or domain-specific communication structures. In previous work, we found that different use cases impose different requirements on synthetic data, and these requirements in turn imply different generation methods [1]. Building on this, various use cases are identified and different methods for the computer-based generation of realistic data, as well as for the quality assessment of such data, are explored.
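The following is a minimal, purely illustrative sketch of what model-based generation of abnormal cases could look like; the Gaussian operating model, the function names, and the deviation heuristic are assumptions for illustration and do not reflect the specific methods developed in this work.

import numpy as np

def fit_normal_model(normal_readings: np.ndarray) -> tuple[float, float]:
    """Estimate mean and standard deviation of normal operation."""
    return float(np.mean(normal_readings)), float(np.std(normal_readings))

def generate_abnormal_cases(mu: float, sigma: float, n: int,
                            deviation: float = 5.0,
                            rng: np.random.Generator | None = None) -> np.ndarray:
    """Synthesize readings that deviate strongly from the learned operating
    range, e.g. to mimic sensor faults or unexpected operating conditions."""
    rng = rng or np.random.default_rng(0)
    signs = rng.choice([-1.0, 1.0], size=n)               # deviate in either direction
    offsets = deviation * sigma * (1.0 + rng.random(n))   # at least `deviation` sigmas away
    return mu + signs * offsets

# Example: fit on simulated normal data, then generate abnormal samples.
rng = np.random.default_rng(42)
normal = rng.normal(loc=20.0, scale=0.5, size=1000)       # e.g. a temperature signal
mu, sigma = fit_normal_model(normal)
abnormal = generate_abnormal_cases(mu, sigma, n=10, rng=rng)

In practice, such generators would be adapted to the domain, for instance to the message formats and communication structures of the protocols in use, rather than to a single scalar signal as in this sketch.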
Finally, we explore the use of Federated Learning (FL) to address data privacy and security challenges in Industrial Control Systems. FL enables local model training while keeping sensitive information decentralized and private to its owners. Specifically, we investigate whether FL can benefit clients with limited knowledge by leveraging collaboratively trained models that aggregate client-specific knowledge distributions. We found that in such scenarios federated training increases classification accuracy by 31.3% compared to isolated local training. Furthermore, even when we introduce Differential Privacy, the resulting model achieves an accuracy of 99.62%, on par with an idealized case where data is independent and identically distributed across clients.
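A minimal sketch of the general idea follows, assuming simple federated averaging (FedAvg) of per-client parameters with Gaussian noise added to each update as a stand-in for a differentially private mechanism; the toy local update, the client data, and the noise scale are hypothetical and not the configuration used in this work.

import numpy as np

def local_update(global_params: np.ndarray, client_data: np.ndarray,
                 lr: float = 0.1) -> np.ndarray:
    """One step of local training: a toy gradient step towards the client's
    data mean, standing in for real local model training."""
    grad = global_params - client_data.mean(axis=0)
    return global_params - lr * grad

def federated_round(global_params: np.ndarray, clients: list[np.ndarray],
                    sigma: float = 0.0,
                    rng: np.random.Generator | None = None) -> np.ndarray:
    """Aggregate client updates by simple averaging (FedAvg). If sigma > 0,
    Gaussian noise is added to each update before aggregation, sketching how
    differential privacy could be layered on top."""
    rng = rng or np.random.default_rng(0)
    updates = []
    for data in clients:
        update = local_update(global_params, data)
        if sigma > 0.0:
            update = update + rng.normal(0.0, sigma, size=update.shape)
        updates.append(update)
    return np.mean(updates, axis=0)

# Example: three clients with skewed (non-IID) data distributions.
rng = np.random.default_rng(1)
clients = [rng.normal(loc=c, scale=0.1, size=(100, 2)) for c in (0.0, 1.0, 5.0)]
params = np.zeros(2)
for _ in range(50):
    params = federated_round(params, clients, sigma=0.01, rng=rng)

The raw client data never leaves the clients; only (noised) parameter updates are exchanged, which is what allows collaboratively trained models to aggregate client-specific knowledge while keeping sensitive information local.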
