Artifact for ASE 2025 Submission 'Navigating the Random Forest: Verifying ML Models in Safety-Critical Applications with SMT Solvers'

Abstract This artifact accompanies the paper submission "Navigating the Random Forest: Verifying ML Models in Safety-Critical Applications with SMT Solvers" and provides the implementation and resources necessary for detecting inconsistencies in Random Forest (RF) models using formal verification techniques. The primary aim is to identify prediction inconsistencies caused by specific input variable values. To achieve this, RF models are transformed into disjunctive normal form (DNF) and evaluated using the Z3 SMT solver. Another non-SMT method simply checks for equal conditions and extracts overlapping boundaries Contents This artifact contains the following items: README.md: this documentation LICENSE/LICENSE: license information of this artifact requirements.txt: list of external libraries required to run this artifact data/internal: a directory containing the JSON files of the Random Forest models src: a directory containing the source code of this artifact results/: a directory containing the experimental results produced from the execution of this artifact Execute Runtime Visualization Environment Setup (root permission required) For a detailed description of the python environment setup, please refer to the README.md file. Run Runtime Visualization After a successful environment setup, run the RF_runtime_visualization.py script. Check if current directory is: ...\RF_formal_verification\src If not, please navigate to it. Run RF_runtime_visualization.py Use console command: python RF_runtime_visualization.py Below is an example output shown on the console after the analysis is finished. ... 297 (0.3796778917312622, 0.3965378850698471] (0.8783503174781799, 1.0] (0.0, 0.5104166567325592] (0.0, 0.6826010048389435] (0.377976194024086, 1.0] (0, 1) 0.513 298 (0.3796778917312622, 0.3965378850698471] (0.8783503174781799, 1.0] (0.0, 0.5104166567325592] (0.6826010048389435, 1.0] (0.377976194024086, 1.0] (0, 1) 0.564 299 (0.3965378850698471, 1.0] (0.8783503174781799, 1.0] (0.5104166567325592, 1.0] (0.0, 1.0] (0.377976194024086, 1.0] (0, 1) 0.544 [300 rows x 7 columns] 2024-10-22 14:41:03,381 - INFO - Results exported to CSV file. 2024-10-22 14:41:03,381 - INFO - Found 300 inconsistencies. 2024-10-22 14:41:03,381 - INFO - Time taken for calculation: 52.79 seconds The execution of RF_runtime_visualization.py will result in two figures: Figure 1 shows the individual solver timings per call for Z3 and the non-SMT algorithm. Figure 2 shows the empirical runtime, including runtime-dependent parameters e1 and t_solver (ONLY FOR ALGORITHM: RF_inconsistency_analysis_nonSMT.py!) Both figures can be saved if desired (no autosave). The resulting figures are derived from data calculated on individual computing machines. Therefore, the displayed figures are highly dependent on hardware! Differences between individual machines and executions are to be expected. In particular, the visualization of the runtime limit will vary based on the evaluated runtime performance, leading to different scales in the plotted graph. Setup configuration You can change the used RF model JSON file and the desired inconsistency difference. The following configurations are supported: --model and --diff. For instance using model rf_20T_D5: python RF_runtime_visualization.py --model rf_20T_D5 # start the analysis using RF model rf_20T_D5 (no.trees=20, depth=5) Please make sure a model with the specified name actually exists within the default target folder data/internal. Please be aware that larger models may require significantly longer computation times (especially the implementation using Z3). Inconsistency threshold This threshold value defines the inconsistency difference between different tree predictions that is considered as relevant. A different threshold value can be set via: python RF_runtime_visualization.py --diff 0.75 # start the analysis using a threshold of >=0.75 for prediction inconsistencies Any values not within the required interval of [0, 1] are clamped! Both arguments (--model,--diff) can be used at the same time, like: python RF_runtime_visualization.py --model rf_20T_D5 --diff 0.75 Reproduction of Results Shown in ASE25 Submission For a detailed description of the reproduction steps, please refer to the README.md file. View the Experimental Results Per default two different results are produced: results/solver_results: CSV file that contains all identified prediction inconsistencies, for a specific configuration setup results/timing_results: CSV files that contain the individual solver times, for a specific solver and configuration setup The plain .csv files are processed via analyses that are not part of this artifact. Figures created and displayed by execution of RF_runtime_visualization.py can be saved manually using the corresponding GUI. Stand-alone Execution of RF_inconsistency_analysis_Z3.py and RF_inconsistency_analysis_nonSMT.py For a detailed description of the stand-alone exectution, please refer to the README.md file. Solvers For a detailed description of the solvers used, please refer to the README.md file. Data The data/internal folder contains different RF models, provided as JSON files, with various hyperparameter settings. These models are used to test the implementation of inconsistency detection. Random Forest Training Data: The training data used for the Random Forest models is not part of the provided artifact. RF Models For a detailed description of the Random Forest models please check the README.md. Known issues For information about known issues, please refer to the README.md file. Licence This project is licensed under GNU GPLv2. See the LICENCE file for more details.

Related Organizations

Ludwig-Maximilians-Universität München
Germany
Fraunhofer Society
Germany
Fraunhofer Institute for Cognitive Systems
Germany

Keywords

Explainable ML, SMT, Random Forest evaluation, Runtime evaluation, Formal Methods, Data-driven model verification

Impact byBIP!

	selected citations These citations are derived from selected sources. This is an alternative to the "Influence" indicator, which also reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	0
	popularity This indicator reflects the "current" impact/attention (the "hype") of an article in the research community at large, based on the underlying citation network.	Average
	influence This indicator reflects the overall/total impact of an article in the research community at large, based on the underlying citation network (diachronically).	Average
	impulse This indicator reflects the initial momentum of an article directly after its publication, based on the underlying citation network.	Average

Found an issue? Give us feedback

0

Average