
This deposit contains the complete experimental dataset, reproducible analysis code, and serialised trained models supporting the journal article "Comparative Evaluation of Six Machine Learning Models for Multi-Fuel Variable Compression Ratio Diesel Engine Emission Prediction Under Leave-One-Out Cross-Validation" by Sathiyaseelan et al. (DOI to be inserted upon acceptance). A reviewer or replicator should be able to download this archive, install the listed Python dependencies, and run a single command to reproduce every numerical result, table, and figure underlying the manuscript.

WHAT IS IN THIS DEPOSIT

* data/: The 45-record steady-state emission dataset from a single-cylinder, four-stroke, water-cooled, direct-injection variable compression ratio (VCR) diesel engine (Kirloskar TV1, 661 cc, 5.2 kW at 1500 rpm), provided in both UTF-8 CSV and Microsoft Excel formats. Records cover a balanced 3 × 3 × 5 factorial design: three fuels (diesel, rubber seed oil biodiesel, Chlorella vulgaris algae oil biodiesel), three compression ratios (16:1, 17:1, 18:1), and five engine load conditions (0, 25, 50, 75, 100 % of rated load). Six exhaust quantities were recorded with an AVL DiGas 444N five-gas analyser: the concentrations of CO, HC, CO2, O2, and NOx, together with lambda. A detailed data dictionary documents each column, unit, instrument specification, and measurement protocol.
* src/: Six Python modules implementing the full analysis pipeline: model definitions for linear regression, polynomial regression (degree 2), support vector regression with RBF kernel, random forest, gradient boosting, and a single-hidden-layer multilayer perceptron; leave-one-out and stratified 5-fold cross-validation; performance metrics (R², RMSE, MAE, MAPE); and a single-command entry point that reproduces every reported result.
* results/: Canonical numerical results in JSON and CSV form: LOOCV performance for all six models on all six outputs (Table 6 of the manuscript); ANN-MLP architectural sensitivity analysis (Table 7); LOOCV-versus-5-fold validation comparison (Table 8); compression-ratio-stratified gradient boosting performance (Table 9); and per-record out-of-fold predictions for all 36 model-output combinations.
* trained_models/: Eight serialised scikit-learn pipelines (joblib format), fitted on the full 45-sample dataset and ready for downstream prediction. The best-performing model for each output is included along with the runner-up gradient boosting variant for completeness.

REPRODUCIBILITY

All stochastic models use a fixed random seed. With the dependency versions pinned in requirements.txt, the analysis is bit-for-bit reproducible across machines. Total runtime is approximately 90 seconds on a single CPU thread; no GPU is required. Detailed reproduction instructions are in README.md.

LIMITATIONS

The released dataset reports condition means: the three replicates measured per cell were averaged. The per-replicate raw measurements are retained at the originating institution and are available from the corresponding author on reasonable request. Cylinder-pressure traces, brake-specific fuel consumption, brake thermal efficiency, exhaust-gas temperature, and particulate matter measurements were captured during the experimental campaign but are out of scope for this emission-focused study. The trained models were fitted on three fuels, three compression ratios, and five loads; predictions for fuels or operating points outside this calibration envelope constitute extrapolation and are not validated.
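
The calibration envelope described above can be made explicit before any downstream prediction. The following is a minimal sketch, not part of the deposited src/ modules: it enumerates the 3 × 3 × 5 design and checks whether a requested operating point falls inside the tested ranges. The fuel labels and both function names are hypothetical (the actual column values are defined in the data dictionary), and treating points between tested levels as inside the envelope is an assumption made for illustration.

```python
from itertools import product

# Factor levels as stated in the deposit description.
# The string labels are hypothetical placeholders; the real values
# are defined in the data dictionary shipped with data/.
FUELS = ("diesel", "rubber_seed_biodiesel", "algae_biodiesel")
COMPRESSION_RATIOS = (16, 17, 18)
LOADS_PCT = (0, 25, 50, 75, 100)


def design_grid():
    """Enumerate every cell of the balanced 3 x 3 x 5 factorial design."""
    return list(product(FUELS, COMPRESSION_RATIOS, LOADS_PCT))


def in_calibration_envelope(fuel, compression_ratio, load_pct):
    """Return True if the point lies within the ranges spanned by the design.

    Assumption: the envelope is the set of tested fuels combined with the
    closed ranges of compression ratio and load, so a point between two
    tested levels counts as interpolation rather than extrapolation.
    """
    return (
        fuel in FUELS
        and min(COMPRESSION_RATIOS) <= compression_ratio <= max(COMPRESSION_RATIOS)
        and min(LOADS_PCT) <= load_pct <= max(LOADS_PCT)
    )


if __name__ == "__main__":
    print(len(design_grid()))                          # number of design cells
    print(in_calibration_envelope("diesel", 17, 60))   # between tested loads
    print(in_calibration_envelope("diesel", 19, 50))   # untested compression ratio
```

A guard of this kind could be called before passing inputs to one of the serialised pipelines, so that requests for an untested fuel or compression ratio are flagged rather than silently extrapolated.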
