
CO2 huff and puff technology can enhance the recovery of heavy oil in high-water-cut stages. However, its effectiveness varies significantly under different geological and fluid conditions, so the available field data form a high-dimensional, small-sample (HDSS) dataset. Conventional techniques for identifying the key factors that influence CO2 huff and puff performance, such as fuzzy mathematics, struggle with HDSS datasets, which often contain nonlinear relationships and abnormal data that cannot be removed. To accurately pinpoint the primary control factors for heavy oil CO2 huff and puff, four machine learning classification algorithms were compared: logistic regression, extreme gradient boosting, gradient boosting decision tree, and random forest. The algorithms were screened against the characteristics of HDSS datasets, taking into account both their underlying principles and the key control factors each one identified. The results showed that logistic regression has difficulty with nonlinear data, and that the extreme gradient boosting and gradient boosting decision tree algorithms are more sensitive to abnormal data. By contrast, the random forest algorithm proved insensitive to outliers and provided a reliable ranking of the factors that influence CO2 huff and puff performance. The top five control factors identified were the distance between parallel wells, cumulative gas injection volume, liquid production rate of parallel wells, huff and puff timing, and the Lorenz coefficient of heterogeneity. These findings not only support the precise implementation of heavy oil CO2 huff and puff but also offer guidance on selecting classification algorithms for typical HDSS data.
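As a rough illustration of how a random forest can rank control factors by importance, the following minimal sketch builds a small bagged ensemble of shallow Gini trees in plain Python and sums each feature's impurity decrease. It is not the authors' implementation or data: the synthetic dataset, the function names (`make_toy_data`, `forest_importance`, and so on), the forest size, and the tree depth are all assumptions chosen for illustration. Feature 0 drives the label through a nonlinear band rule, and a few flipped labels stand in for abnormal records that cannot be removed.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(X, y, rows, features):
    """Return (gain, feature, threshold) of the best split among candidate features."""
    parent = gini([y[i] for i in rows])
    best = (0.0, None, None)
    for f in features:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            gain = (parent - child) * len(rows)  # sample-weighted impurity decrease
            if gain > best[0]:
                best = (gain, f, t)
    return best

def grow(X, y, rows, depth, n_try, importance, rng):
    """Recursively split `rows`, crediting each split's gain to the feature used."""
    if depth == 0 or len(set(y[i] for i in rows)) < 2:
        return
    features = rng.sample(range(len(X[0])), n_try)  # random feature subset per node
    gain, f, t = best_split(X, y, rows, features)
    if f is None:
        return
    importance[f] += gain
    grow(X, y, [i for i in rows if X[i][f] <= t], depth - 1, n_try, importance, rng)
    grow(X, y, [i for i in rows if X[i][f] > t], depth - 1, n_try, importance, rng)

def forest_importance(X, y, n_trees=30, depth=3, seed=0):
    """Normalized impurity-based importance from bootstrap-trained shallow trees."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    n_try = max(1, int(p ** 0.5))
    importance = [0.0] * p
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        grow(X, y, rows, depth, n_try, importance, rng)
    total = sum(importance) or 1.0
    return [v / total for v in importance]

def make_toy_data(n=60, p=10, seed=1):
    """Synthetic stand-in for a small-sample dataset (hypothetical, not field data)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    # Nonlinear rule: class 1 when feature 0 lies outside a central band.
    y = [1 if abs(row[0] - 0.5) > 0.25 else 0 for row in X]
    for i in rng.sample(range(n), 3):
        y[i] = 1 - y[i]  # a few flipped labels stand in for abnormal records
    return X, y

X, y = make_toy_data()
importance = forest_importance(X, y)
ranking = sorted(range(len(importance)), key=lambda f: importance[f], reverse=True)
print("features ranked by importance:", ranking)
```

Because each tree sees a different bootstrap sample and a random subset of features at every node, a handful of abnormal records influences only part of the ensemble, which is one plausible reading of why the ranking stays stable. A band rule like the one above cannot be separated by a single linear boundary on feature 0, which is the kind of structure that a linear model such as logistic regression tends to miss.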
Keywords: CO2 huff and puff; Heavy oil; Classification algorithm; Algorithm screening; Main control factors
Subject classification: TP670-699 (Oils, fats, and waxes); TP690-692.5 (Petroleum refining. Petroleum products)
