Ai X Y, Zhang T X, Yuan T, Zheng X J, Xiong Z M, Yuan J C. YOLOv8np-RCW: A multi-task deep learning model for comprehensive visual information in tomato harvesting robot. Int J Agric & Biol Eng, 2025; 18(5): 246–258. DOI: 10.25165/j.ijabe.20251805.9719

YOLOv8np-RCW: A multi-task deep learning model for comprehensive visual information in tomato harvesting robot

  • Automated tomato harvesting in greenhouses is a growing trend for reducing labor demand, and accurate, efficient visual recognition is essential to it. However, most current studies obtain harvesting information with several models applied in multiple steps, which leads to high computational cost, poor real-time performance, and low recognition precision. This study proposes YOLOv8np-RCW, an improved end-to-end model based on YOLOv8n pose that simultaneously detects tomato bunches, their maturity, and keypoints using a decoupled-head structure. The model integrates a ResNet-enhanced RepVGG architecture to balance accuracy and speed, employs the lightweight CARAFE upsampling algorithm to enlarge the receptive field, and uses the WIoU loss to improve bounding box prediction, maturity detection, and keypoint extraction. Experimental results show that the mAP50 of the YOLOv8np-RCW model for bounding boxes and keypoints is 87.3% and 86.8%, respectively, which is 6.2% and 5.5% higher than that of the YOLOv8n pose model. Bunch detection, maturity assessment, and keypoint localization together take only 9.8 ms, and the Euclidean distance error of the detected keypoints is less than 20 pixels. Based on this model, a method is proposed to quickly determine the orientation of tomato bunches using geometric cross-product and cross-multiplication calculations on the 2D keypoint coordinates, which guides the motion planning of the end-effector. In field experiments, the robot achieved a harvesting success rate of 68%, with an average time of 10.8366 seconds per tomato bunch.
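The abstract does not spell out the orientation calculation, so the following is only a minimal sketch of how a 2D cross product on keypoints could indicate which way a bunch hangs relative to its stem. The three keypoint roles (`stem_pt`, `cut_pt`, `bunch_pt`), the function names, and the tolerance are hypothetical and are not taken from the paper. Coordinates are assumed to be image pixels with the y axis pointing down.

```python
import math


def cross_2d(u, v):
    """z-component of the cross product of two 2D vectors."""
    return u[0] * v[1] - u[1] * v[0]


def bunch_orientation(stem_pt, cut_pt, bunch_pt, tol=1e-6):
    """Classify the turn from the stem direction to the bunch direction.

    All three arguments are (x, y) pixel coordinates of hypothetical
    keypoints: a point on the stem, the cutting point, and a point on
    the bunch. Returns (side, signed_angle_in_degrees).
    """
    # Vector along the stem, ending at the cutting point.
    u = (cut_pt[0] - stem_pt[0], cut_pt[1] - stem_pt[1])
    # Vector from the cutting point toward the bunch.
    v = (bunch_pt[0] - cut_pt[0], bunch_pt[1] - cut_pt[1])

    z = cross_2d(u, v)
    dot = u[0] * v[0] + u[1] * v[1]
    # atan2 of (cross, dot) gives the signed angle from u to v.
    angle = math.degrees(math.atan2(z, dot))

    if abs(z) <= tol:
        side = "collinear"
    elif z > 0:
        # With y pointing down, a positive cross product is a
        # clockwise turn as seen on screen.
        side = "clockwise"
    else:
        side = "counterclockwise"
    return side, angle


def keypoint_error(pred, truth):
    """Euclidean distance in pixels between predicted and labeled keypoints."""
    return math.hypot(pred[0] - truth[0], pred[1] - truth[1])
```

The sign of the cross product gives the side and `atan2` gives the magnitude of the turn, so no trigonometric inversion of individual vectors is needed. `keypoint_error` shows the kind of pixel distance that the reported 20-pixel bound refers to.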
