Experimental dataset, analysis code, and trained models for: Comparative Evaluation of Six Machine Learning Models for Multi-Fuel Variable Compression Ratio Diesel Engine Emission Prediction Under Leave-One-Out Cross-Validation

This deposit contains the complete experimental dataset, reproducible analysis code, and serialised trained models supporting the journal article "Comparative Evaluation of Six Machine Learning Models for Multi-Fuel Variable Compression Ratio Diesel Engine Emission Prediction Under Leave-One-Out Cross-Validation" by Sathiyaseelan et al. (DOI to be inserted upon acceptance). A reviewer or replicator should be able to download this archive, install the listed Python dependencies, and run a single command to reproduce every numerical result, table, and figure underlying the manuscript.

WHAT IS IN THIS DEPOSIT

* data/: The 45-record steady-state emission dataset from a single-cylinder, four-stroke, water-cooled, direct-injection variable compression ratio (VCR) diesel engine (Kirloskar TV1, 661 cc, 5.2 kW at 1500 rpm), provided as both UTF-8 CSV and Microsoft Excel formats. Records cover a balanced 3 × 3 × 5 factorial design: three fuels (diesel, rubber seed oil biodiesel, Chlorella vulgaris algae oil biodiesel), three compression ratios (16:1, 17:1, 18:1), and five engine load conditions (0, 25, 50, 75, 100 % of rated load). Six exhaust gas concentrations were measured by an AVL DiGas 444N five-gas analyser: CO, HC, CO2, O2, NOx, and lambda. A detailed data dictionary documents each column, unit, instrument specification, and measurement protocol.
* src/: Six Python modules implementing the full analysis pipeline: model definitions for linear regression, polynomial regression (degree 2), support vector regression with RBF kernel, random forest, gradient boosting, and a single-hidden-layer multilayer perceptron; leave-one-out and stratified 5-fold cross-validation; performance metrics (R², RMSE, MAE, MAPE); and a single-command entry point that reproduces every reported result.
* results/: Canonical numerical results in JSON and CSV form: LOOCV performance for all six models on all six outputs (Table 6 of the manuscript); ANN-MLP architectural sensitivity analysis (Table 7); LOOCV-versus-5-fold validation comparison (Table 8); compression-ratio-stratified gradient boosting performance (Table 9); per-record out-of-fold predictions for all 36 model-output combinations.
* trained_models/: Eight serialised scikit-learn pipelines (joblib format), fitted on the full 45-sample dataset and ready for downstream prediction. The best-performing model for each output is included along with the runner-up gradient boosting variant for completeness.

REPRODUCIBILITY

All stochastic models use a fixed random seed. With the dependency versions pinned in requirements.txt, the analysis is bit-for-bit reproducible across machines. Total runtime is approximately 90 seconds on a single CPU thread; no GPU is required. Detailed reproduction instructions are in README.md.

LIMITATIONS

Per-replicate raw measurements (three replicates per cell collapsed to condition-means in the released dataset) are retained at the originating institution and are available from the corresponding author on reasonable request. Cylinder-pressure traces, brake-specific fuel consumption, brake thermal efficiency, exhaust-gas temperature, and particulate matter measurements were captured during the experimental campaign but are out of scope for this emission-focused study. The trained models were fitted on three fuels, three compression ratios, and five loads; predictions for fuels or operating points outside this calibration envelope constitute extrapolation and are not validated.
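ILLUSTRATIVE SKETCH (NOT PART OF THE DEPOSIT)

The 3 × 3 × 5 design, the leave-one-out procedure, and the four metrics named above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code shipped in src/: the fuel labels, function names, one-feature least-squares model, and the placeholder target are all hypothetical, and no measured values from the dataset are used. R² is computed on the pooled out-of-fold predictions, since a single held-out record has no variance of its own.

```python
import itertools
import math

# Factor levels as listed in the description; the string labels are hypothetical.
FUELS = ["diesel", "rubber_seed_biodiesel", "algae_biodiesel"]
COMPRESSION_RATIOS = [16, 17, 18]
LOADS_PCT = [0, 25, 50, 75, 100]


def design_grid():
    """Enumerate the balanced 3 x 3 x 5 factorial design (45 cells)."""
    return list(itertools.product(FUELS, COMPRESSION_RATIOS, LOADS_PCT))


def loo_splits(n):
    """Yield (train_indices, test_index) pairs for leave-one-out CV."""
    for i in range(n):
        yield [j for j in range(n) if j != i], i


def metrics(y_true, y_pred):
    """Return R2, RMSE, MAE and MAPE (%) on pooled out-of-fold predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        # MAPE is undefined when a true value is zero.
        "mape": 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / n,
    }


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in for a real model)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def loocv_predictions(xs, ys):
    """Refit on n - 1 records and predict the held-out record, for every record."""
    preds = [0.0] * len(xs)
    for train, test in loo_splits(len(xs)):
        a, b = fit_line([xs[j] for j in train], [ys[j] for j in train])
        preds[test] = a + b * xs[test]
    return preds


grid = design_grid()
loads = [float(load) for _, _, load in grid]
# Placeholder target: an exact linear function of load, NOT measured data.
toy_y = [1.0 + 0.02 * load for load in loads]
oof = loocv_predictions(loads, toy_y)
scores = metrics(toy_y, oof)
```

Because the placeholder target is exactly linear in load, the held-out predictions match it to floating-point precision; real emission data would not behave this way. With the released data, the out-of-fold predictions in results/ are the authoritative reference.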
