Abstract:
Objective Existing well-log lithology identification methods struggle with imbalanced lithology classes and show insufficient sensitivity when applied to tight sandstone reservoirs.
Methods This study proposes the SSMO−SSA−LGBM model. First, the SVM−SMOTE oversampling algorithm (abbreviated as SSMO) generates synthetic samples for the under-represented lithology classes in the training set. These synthetic samples are merged with the original training set to form a balanced training dataset, on which the LightGBM (LGBM) model is built. Because LGBM has many hyperparameters, the Sparrow Search Algorithm (SSA) is used to search for the optimal hyperparameter combination. The model is trained on logging data from the Yan 10 tight sandstone reservoir in the Huachi S Block and compared with KNN, AdaBoost, Random Forest, and other models.
Results After SSMO balancing, the LGBM model recognizes minority lithology classes more accurately. SSA reaches the global optimum in relatively few iterations, yielding the optimal hyperparameters for LGBM. The SSMO−SSA−LGBM model outperforms the comparison models in predictive performance, and its lithology identification results on the validation wells are highly consistent with core data.
Conclusions SSMO effectively mitigates the adverse effect of lithology class imbalance on prediction accuracy. SSA efficiently finds the optimal hyperparameter combination for LGBM within a limited number of iterations, maximizing model performance. The proposed model performs satisfactorily when applied to the Huachi S Block.
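
The oversampling step described under Methods can be illustrated with a minimal sketch. It shows only the SMOTE-style interpolation that produces synthetic minority samples. It does not implement the SVM-guided selection of borderline samples that distinguishes SVM−SMOTE, nor the LGBM or SSA stages. The function name, the feature values, and the class counts are hypothetical and are not taken from the study.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified illustration: plain SMOTE-style interpolation only."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append(
            tuple(a + lam * (b - a) for a, b in zip(base, neighbour))
        )
    return synthetic


# Hypothetical two-feature samples (e.g. two scaled log responses)
# for a single under-represented lithology class.
minority_class = [(0.20, 0.80), (0.25, 0.75), (0.30, 0.85), (0.22, 0.90)]
majority_count = 10  # assumed size of the largest class

# Generate enough synthetic samples to match the majority class size,
# then merge them with the original minority samples.
new_points = smote_like_oversample(
    minority_class, majority_count - len(minority_class)
)
balanced_minority = minority_class + new_points
print(len(balanced_minority))
```

Because each synthetic point lies on the segment between two real minority samples, the added data stay inside the region already occupied by that lithology class. In practice, a library implementation of SVM−SMOTE would replace this function.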