ZENODO
Software, 2019
License: CC BY
Data sources: Datacite; ZENODO
Software supporting the paper: "Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition"

Authors: Matthew England; Dorian Florescu


Abstract

This toolbox supports the results in the following publication: M. England and D. Florescu. Comparing machine learning models to choose the variable ordering for cylindrical algebraic decomposition. To appear in Proc. CICM 2019, Springer LNCS, 2019. Preprint: arXiv:1904.11061.

The authors are supported by EPSRC Project EP/R019622/1: Embedding Machine Learning within Quantifier Elimination Procedures.

This toolbox requires the NLSAT database, which can be downloaded at https://cs.nyu.edu/~dejan/nonlinear/. The first file in this toolbox was run in Matlab R2018b. The CAD routine was run in Maple 2018, with an updated version of the RegularChains Library downloaded in February 2019 from http://www.regularchains.org; this updated library contains bug fixes and additional functionality. The training and evaluation of the machine learning models was done using the scikit-learn package v0.20.2 for Python 2.7.

I. <<Load_NLSAT.m>>
Description:
- Converts the redlog files corresponding to all problems with 3 variables to .txt files: NLSAT database => \poly\poly1.txt - \poly\poly7200.txt.
II. <<generate_CAD_data.py>>
Input: <<poly\poly*.txt>>
Output: <<comp_times\comp_times*.txt, comp_times\challenging_formulas_more_than_64s.txt>>
# observation: comp_times is the folder with the computation times for all problems, which are later split into training and testing data
Description:
- For each problem in \poly, the script copies the corresponding file to \poly\pol.txt and calls the Maple script <<Maple_Script3.mpl>> (Input: <<poly\pol.txt, poly\var_order.txt, poly\time_limit.txt>>; Output: <<comp_times\comp_times*.txt>>).
- For each problem, the time limit starts from 4 s and doubles if all of the orderings timed out.
- Other files used/generated by the script: <<poly\rand_order_path.txt>> # a pre-generated file in pickle format containing a fixed randomised order of the 7200 file names in poly\; used only for generating the testing data.

III. <<generate_CAD_test_data.py>>
Input: <<poly_test\poly*.txt>>
Output: <<poly\comp_times_test\comp_times*.txt>>
Description:
- For each problem in poly_test, the script copies the corresponding file to \poly_test\pol.txt and calls the Maple script <<Maple_Script_test.mpl>> (Input: <<poly_test\pol.txt, poly_test\var_order.txt, poly_test\time_limit.txt>>; Output: <<poly\comp_times2\comp_times.txt>>).
- For each problem, the time limit is fixed at 128 s.

IV. <<data_gen_ml_test.ipynb>>
Input: <<'poly\poly*.txt', 'comp_times_test\comp_times*.txt'>>
Output: <<'ML_data\y_test.txt', 'ML_data\features_test.txt', 'ML_data\no_optimal_ordering_test.txt', 'ML_data\comp_times_test.txt', 'ML_data\formula_numbers_test.txt'>>
Description:
- Generates the input/output testing data for ML.
- Generates formula_numbers_test.txt, which is later used to work out the training data.
- All formula numbers from poly_test are written in <<ML_data\formula_numbers_test.txt>> to be used later for ML.
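The doubling time-limit scheme described in step II can be sketched as follows. This is a minimal sketch only: `run_cad` is a hypothetical stand-in for the Maple CAD call, assumed to return the computation time in seconds or `None` on timeout.

```python
def run_with_doubling(run_cad, orderings, start=4.0, max_limit=128.0):
    """Retry all variable orderings, doubling the time limit whenever
    every ordering times out (the scheme described in step II)."""
    limit = start
    while limit <= max_limit:
        # run_cad(ordering, limit) is a hypothetical helper: it returns the
        # CAD computation time, or None if the computation hit the limit.
        times = {o: run_cad(o, limit) for o in orderings}
        if any(t is not None for t in times.values()):
            return limit, times
        limit *= 2.0
    return None, {}  # every ordering timed out even at max_limit
```

Problems whose orderings all exceed the largest limit would end up in a list such as challenging_formulas_more_than_64s.txt.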
V. <<data_gen_ml_brown_test.ipynb>>
Input: <<'poly\poly*.txt', 'comp_times_test\comp_times*.txt'>>
Output: <<'ML_data\y_brown_test.txt', 'ML_data\formula_numbers_brown_test.txt'>>
Description:
- Generates the Brown heuristic predictions on the testing data.

VI. <<data_gen_sotd_test.mpl>>
Input: <<'poly\poly*.txt', 'comp_times_test\comp_times*.txt'>>
Output: <<'ML_data\y_sotd_string.txt', 'ML_data/formula_numbers_sotd.txt'>>
Description:
- Generates the sotd predictions on the testing dataset.
- Observation: the numbers in formula_numbers_sotd.txt are in a different order than in formula_numbers_test.

VII. <<data_gen_ml_train.ipynb>>
Input: <<ML_data\formula_numbers_test.txt, comp_times\comp_times*.txt, poly\poly*.txt>>
Output: <<ML_data\y_train.txt, ML_data\features_train.txt, ML_data\no_optimal_ordering_train.txt, ML_data\formula_numbers_train.txt>>
Description:
- Generates the input/output training data for ML (the heuristics play no part in this step).
- formula_numbers_train.txt includes {1,...,7200} \ formula_numbers_test.txt.

VIII. <<ML_data\ML_training_and_prediction.py>>, <<ML_data\models_dictionary.py>>
Input: <<'ML_data\y_train.txt', 'ML_data\features_train.txt', 'ML_data\features_test.txt'>>
Output: <<'ML_data\y_'+model_class_name+'_test.txt'>>
Description:
- Generates ML predictions on the testing dataset for: DT, MLP, SVC and KNN.
Steps for training a model:
- The ranges for performing cross-validation (CV) are stored in a dictionary coded in models_dictionary.py, in the part corresponding to model_class_name.
- After the ranges are defined, the model is selected in ML_training_and_prediction.py, e.g. model_class_name='DecisionTreeClassifier'.
- To train the model, the user should select the choice to perform grid search CV on the training dataset.
- The best parameters returned by grid search should be entered manually in models_dictionary.py under the 'def' key of the dictionary.
- The ranges should then be reset and the process repeated until the user is confident that the CV performance could not be improved by extending the range or using a finer resolution.
- After training, ML_training_and_prediction.py should be run again; this time the user should skip training and select 'y' when asked "Continue with the previous default parameters from 'models_dictionary.py'?". This generates the model predictions and saves them in 'y_'+model_class_name+'_test.txt'.

IX. <<test_comparative_performance.ipynb>>
Input: <<'ML_data/y_sotd_string.txt', 'ML_data/formula_numbers_sotd.txt', 'ML_data/formula_numbers_test.txt', 'ML_data/y_brown_test.txt', 'ML_data/y_test.txt', 'ML_data/y_LinearSVC_test.txt', 'ML_data/y_SVC_test.txt', 'ML_data/y_MLPClassifier_test.txt', 'ML_data/y_KNeighborsClassifier_test.txt', 'ML_data/y_DecisionTreeClassifier_test.txt', 'ML_data/comp_times_test.txt'>>
Output: prints the time and accuracy of each method.
Description:
- Computes the accuracy and total time using the final prediction on the testing dataset with all methods.
- Plots a histogram showing the percentage increase in computing time with every method, compared to the minimum time.
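The set difference used in step VII to derive the training formula numbers amounts to the following. This is a sketch under the assumption (consistent with the file's role in step IV) that formula_numbers_test.txt holds one formula number per line; `read_numbers` is a hypothetical helper.

```python
def read_numbers(path):
    # One formula number per line, e.g. the contents of formula_numbers_test.txt.
    with open(path) as f:
        return {int(line) for line in f if line.strip()}

def training_numbers(test_path, total=7200):
    # Training numbers are {1, ..., total} minus the testing numbers.
    return sorted(set(range(1, total + 1)) - read_numbers(test_path))
```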
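The grid-search stage of step VIII corresponds roughly to the following scikit-learn pattern. This is a sketch only: the real parameter ranges live in models_dictionary.py, and the grid shown here is illustrative, not the one used for the paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative ranges; the actual ranges are stored in models_dictionary.py
# under the entry for the chosen model_class_name.
param_grid = {"max_depth": [5, 10, 20], "min_samples_split": [2, 5, 10]}

def tune_and_predict(X_train, y_train, X_test):
    """Grid search with cross-validation, then predict with the refit model."""
    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid, cv=5)
    search.fit(X_train, y_train)
    # In the original workflow, search.best_params_ would be copied manually
    # into models_dictionary.py (its 'def' key) before the final run.
    return search.best_params_, search.predict(X_test)
```

In the toolbox this is an interactive two-pass process: one run performs the grid search, the refined parameters are saved as defaults, and a second run skips training and only predicts.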
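The comparison in step IX reduces to computations of this shape. This is a sketch with hypothetical helper names; the notebook itself reads the files listed above, and the exact baseline for the percentage increase is the minimum time described in step IX.

```python
def accuracy(chosen, optimal_sets):
    # Fraction of test problems where the predicted ordering is an optimal one.
    hits = sum(1 for c, opts in zip(chosen, optimal_sets) if c in opts)
    return hits / float(len(chosen))

def percentage_increase(method_times, baseline):
    # method_times maps method name -> list of per-problem CAD times achieved
    # with that method's chosen ordering; baseline is the reference total time.
    totals = {m: sum(ts) for m, ts in method_times.items()}
    return {m: 100.0 * (t - baseline) / baseline for m, t in totals.items()}
```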

Keywords

machine learning, non-linear real arithmetic, cylindrical algebraic decomposition, computer algebra, symbolic computation

  • BIP! impact indicators (provided by BIP!):
    selected citations: 1 (citations derived from selected sources; an alternative to the "Influence" indicator below)
    popularity: Average (the "current" impact/attention, the "hype", of an article in the research community at large, based on the underlying citation network)
    influence: Average (the overall/total impact of an article in the research community at large, based on the underlying citation network, diachronically)
    impulse: Average (the initial momentum of an article directly after its publication, based on the underlying citation network)
  • OpenAIRE UsageCounts (provided by UsageCounts): 43 views, 2 downloads