Wen S W, Ge Y H, Wang Y K, Wei N S, Zhou J G, Hu G R, et al. Efficient and comprehensive visual solution for a smart apple harvesting robot in complex settings via multi-class instance segmentation. Int J Agric & Biol Eng, 2025; 18(4): 200–215. DOI: 10.25165/j.ijabe.20251804.9619

Efficient and comprehensive visual solution for a smart apple harvesting robot in complex settings via multi-class instance segmentation

  • To enable efficient, low-cost automated apple harvesting, this study presents SCAL (Star-CAA-LADH), a multi-class instance segmentation model that requires only a single RGB sensor for image acquisition. From one RGB image, the model accurately segments fruits, fruit-bearing branches, and main branches, providing comprehensive visual input for robotic harvesting. A Star-CAA module is proposed that integrates the Star operation with a Context-Anchored Attention (CAA) mechanism, enhancing directional sensitivity and multi-scale feature perception. The Backbone and Neck networks are equipped with hierarchically structured SCA-T/F modules to improve the fusion of high- and low-level features, yielding more continuous masks and sharper boundaries. In the Head network, a Segment_LADH module optimizes classification, bounding box regression, and mask generation, improving segmentation accuracy for small and adherent targets. To increase robustness under adverse weather and other degraded imaging conditions, a Chain-of-Thought Prompted Adaptive Enhancer (CPA) module is integrated. Experimental results show that SCAL achieves 94.9% AP_M and 95.1% mAP_M, outperforming YOLOv11s by 6.6% and 4.6%, respectively. Under multi-weather testing conditions, the CPA-SCAL variant consistently achieves higher accuracy than the comparison models. After INT8 quantization, the model size is reduced to 14.5 MB, with an inference speed of 47.2 frames per second (fps) on the NVIDIA Jetson AGX Xavier. Experiments conducted in simulated orchard environments validate the effectiveness and generalization capability of SCAL, demonstrating its suitability as an efficient and comprehensive visual solution for intelligent harvesting in complex agricultural settings.

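The deployment figures in the abstract (INT8 quantization, 47.2 fps) can be made concrete with a small sketch. The abstract does not state which quantization scheme was used, so the code below assumes plain symmetric per-tensor INT8 quantization purely for illustration; the function names and the weight values are hypothetical and are not taken from the SCAL implementation. The only number drawn from the abstract is the reported frame rate.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization (assumed scheme, not confirmed
    by the paper): map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # The scale maps the largest magnitude onto 127; guard against all zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the INT8 codes."""
    return [q * scale for q in quantized]


def frame_latency_ms(fps):
    """Average time budget per frame implied by a frame rate."""
    return 1000.0 / fps


# Hypothetical weights, chosen only to illustrate the rounding behaviour.
weights = [0.82, -1.27, 0.05, 0.0, 0.64]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))

# 47.2 fps is the speed reported in the abstract for the Jetson AGX Xavier.
latency = frame_latency_ms(47.2)
print(f"scale={scale:.4f}, max_error={max_error:.5f}, latency={latency:.1f} ms")
```

Under this assumed scheme each weight is stored in one byte instead of a wider float, which is the source of the size reduction, and the reported 47.2 fps corresponds to an average budget of roughly 21 ms per frame.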