Abstract:
Objective: To address problems in existing lithology recognition methods, namely redundant parameters that cause high inference latency, insufficient multi-scale mineral feature extraction, and the loss of shallow fundamental features.
Methods: A rock thin-section image classification model named CI-RepViT (Compact and Informative RepViT) is designed, based on an improved Re-parameterized Vision Transformer (RepViT). A multi-scale feature extraction strategy is adopted to construct a multi-scale receptive field system suited to rock feature representation, which enhances the extraction of mineral features at different scales. An Identity branch is introduced to alleviate residual gradient attenuation and preserve shallow fundamental features, so that critical information such as grain boundaries propagates to deeper layers without loss. In addition, the attention mechanism is replaced with Efficient Channel Attention (ECA), which removes the fully connected layers used in Squeeze-and-Excitation (SE) blocks, thereby reducing parameter redundancy and computational overhead while improving feature selection accuracy.
Results: The proposed model reduces the parameter count by 0.64 M and improves classification accuracy by 2.61%. In comparisons with six commonly used models, including ConvNeXtV2 and GoogLeNet, CI-RepViT achieves superior performance on key metrics such as accuracy and precision.
Conclusion: This study provides technical support for efficient and lightweight rock thin-section recognition under complex lithological conditions.
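To illustrate why replacing SE with ECA reduces parameters, the following is a minimal standard-library Python sketch, not the CI-RepViT implementation. It compares the weight count of the two fully connected layers in an SE block with the single 1D convolution kernel used by ECA, and shows the ECA gating steps on a small feature map. The reduction ratio of 16 and the values `gamma=2`, `b=1` are common defaults from the SE and ECA literature and are assumptions here; the abstract does not state the settings used in CI-RepViT. All function names are hypothetical.

```python
import math


def se_param_count(channels, reduction=16):
    """Weights in the two fully connected layers of an SE block (biases ignored).

    The reduction ratio of 16 is an assumed default, not a value from the paper.
    """
    hidden = max(1, channels // reduction)
    return 2 * channels * hidden


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive odd kernel size for ECA: nearest odd value of (log2(C) + b) / gamma.

    gamma and b are assumed defaults, not values from the paper.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_forward(feature_map, kernel):
    """Apply ECA-style channel gating.

    feature_map: list of C channels, each a list of H rows of W floats.
    kernel: odd-length list of 1D convolution weights shared by all channels.
    """
    # 1. Global average pooling: one scalar descriptor per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # 2. 1D convolution across neighbouring channels (zero padding),
    #    which takes the place of the fully connected layers in SE.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    logits = [
        sum(w * padded[i + j] for j, w in enumerate(kernel))
        for i in range(len(pooled))
    ]
    # 3. Sigmoid gate per channel.
    gates = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    # 4. Rescale every value in each channel by that channel's gate.
    return [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
```

Under these assumed settings, a 256-channel SE block carries 8192 fully connected weights, while the ECA kernel for the same width has 5 weights. This illustrates only the per-block difference; the 0.64 M figure reported above refers to the whole model.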