As robotic systems increasingly operate in unstructured, cluttered, and previously unseen environments, there is a growing need for manipulators that combine compliance, adaptability, and precise control. This work presents a real-time hybrid rigid–soft continuum manipulator system designed for robust open-world object reaching in such challenging environments.
The system integrates vision-based perception and 3D scene reconstruction with shape-aware motion planning to generate safe trajectories. A learning-based controller drives the hybrid arm to arbitrary target poses, leveraging the flexibility of the soft segment while maintaining the precision of the rigid segment. The system operates without environment-specific retraining, enabling direct generalization to new scenes.
Extensive real-world experiments demonstrate consistent reaching performance with errors below 2 cm across diverse cluttered setups, highlighting the potential of hybrid manipulators for adaptive and reliable operation in unstructured environments.
1 Real-world hybrid reaching. To our knowledge, this is the first hybrid manipulator system capable of open-world object reaching in cluttered, unseen environments. The system achieves sub-2 cm accuracy and high success rates across multiple test environments without environment-specific fine-tuning.
2 Multi-view RGB reconstruction for obstacle-aware planning. We develop a lightweight multi-view perception pipeline that enables obstacle-aware planning without relying on depth sensors or environment-specific retraining, making the system suitable for payload-limited hybrid manipulators.
3 Shape estimation for safe manipulation. We demonstrate that explicitly incorporating shape estimation of the soft segment into the planning loop is critical for safe and reliable hybrid manipulation, significantly improving both safety and task success in cluttered environments.
HyReach is built on a hybrid platform: a standard 6-DoF industrial robotic arm with a tri-chambered bending soft continuum arm (SCA) at its distal end. Our framework couples three stages (perception, shape-aware planning, and learning-based control) to enable safe, goal-directed reaching in visually occluded, obstacle-rich scenes.
A monocular tip-mounted camera collects multi-view RGB images during an exploratory sweep. MASt3R reconstructs metric-scale 3D geometry, while YOLO-World localizes goal objects via natural-language queries, requiring neither fixed object categories nor a depth sensor.
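To make the goal-localization step concrete, the sketch below back-projects the center of a 2D detection into camera coordinates using a standard pinhole model, given a metric depth queried from the reconstruction. The intrinsics, detection pixel, and depth value are illustrative placeholders, not HyReach's actual pipeline parameters.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates.

    Standard pinhole model: x = (u - cx) * depth / fx, and analogously for y.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Illustrative values: a 640x480 camera, a detection centered at pixel
# (400, 260), and a 0.5 m depth read from the metric-scale reconstruction.
fx = fy = 500.0
cx, cy = 320.0, 240.0
goal_cam = backproject(400, 260, 0.5, fx, fy, cx, cy)  # goal in camera frame
```

The resulting camera-frame point would then be transformed into the robot base frame using the tip camera's pose at capture time.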
A modified RRT* planner evaluates candidate trajectories against a voxel occupancy grid, checking collisions along the entire hybrid backbone. A constant-curvature model estimates the SCA's shape online, enforcing strict collision-free motion for the rigid segment while allowing bounded contact for the compliant soft segment.
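A minimal sketch of the two pieces this stage relies on: sampling points along a constant-curvature arc (parameterized by curvature, bending-plane angle, and arc length) and testing those samples against a voxel occupancy set. The sampling density, voxel size, and function names are illustrative assumptions, not the paper's implementation.

```python
import math

def cc_backbone(kappa, phi, length, n=20):
    """Sample points along a constant-curvature arc.

    kappa: curvature (1/m); phi: bending-plane angle (rad); length: arc
    length (m). The arc starts at the origin pointing along +z.
    """
    pts = []
    for i in range(1, n + 1):
        s = length * i / n
        if abs(kappa) < 1e-9:  # straight-segment limit as kappa -> 0
            pts.append((0.0, 0.0, s))
        else:
            r = 1.0 / kappa
            x = r * (1.0 - math.cos(kappa * s))  # in-plane deflection
            pts.append((x * math.cos(phi), x * math.sin(phi),
                        r * math.sin(kappa * s)))
    return pts

def in_collision(pts, occupied, voxel=0.02):
    """Check sampled backbone points against a voxel occupancy set."""
    return any(
        (int(p[0] // voxel), int(p[1] // voxel), int(p[2] // voxel)) in occupied
        for p in pts
    )

# Example: a 0.2 m segment bent into a quarter circle (kappa = pi / (2 * L)).
tip = cc_backbone(math.pi / 0.4, 0.0, 0.2)[-1]
```

The planner would run such a check for every candidate edge, with the strict (rigid) and bounded-contact (soft) rules applied to the respective portions of the backbone.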
A neural controller maps current and goal poses to actuator commands for both rigid and soft segments. Trained once on a target-pose dataset, it generalizes across environments without retraining, accurately executing planned waypoints in real time.
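The learned mapping can be sketched as a small feed-forward network from the concatenated current and goal poses to one command per actuator (six rigid joints plus three SCA chambers, per the platform description). The layer sizes, tanh activation, and random weights below are illustrative assumptions, not the trained controller.

```python
import math
import random

random.seed(0)

def mlp_forward(x, weights):
    """Minimal feed-forward pass: linear layers with tanh on hidden layers."""
    h = x
    for li, (W, b) in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) + bi for row, bi in zip(W, b)]
        if li < len(weights) - 1:  # no squashing on the output layer
            h = [math.tanh(v) for v in h]
    return h

def random_layer(n_in, n_out):
    """Placeholder weights; a real controller would load trained parameters."""
    W = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    return (W, b)

# Input: current pose (6) + goal pose (6). Output: 6 joint commands + 3
# chamber actuation values for the tri-chambered SCA.
weights = [random_layer(12, 32), random_layer(32, 9)]
cmd = mlp_forward([0.1] * 6 + [0.3] * 6, weights)
```

At run time, the controller would be queried once per planned waypoint, with the waypoint pose supplied as the goal input.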
We evaluate HyReach across four real-world test environments of increasing difficulty. Each environment is designed to stress-test a different aspect of the system, from basic tabletop reaching to fully occluded goals behind walls. All evaluations use the same trained models without any environment-specific fine-tuning.
Basic open-world tabletop setup. Unobstructed path to the goal object, validating baseline reaching accuracy and controller performance.
Rigid obstacles block the direct path. The planner must route the arm around them while maintaining collision-free rigid-segment motion.
A plant occludes a target fruit. Heavy visual clutter and dense obstacle structure test multi-view perception and compliant reaching.
Goal placed behind a wall with a small opening. Fully occluded at start; the SCA must thread through or around the aperture to reach the target.
HyReach achieves consistent sub-2 cm reaching accuracy across all four environments. Shape-informed planning significantly improves safety compared to a no-shape-estimation ablation, and outperforms both the Img2Act visual-servoing baseline (which requires environment-specific retraining) and the Rigid Only baseline (which lacks the soft segment's compliance). Adjusting the collision threshold τ allows the planner to balance path success rate against contact frequency, adapting to different obstacle types.
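The role of τ can be sketched as an admissibility rule: rigid-segment samples must be strictly collision-free, while soft-segment samples may contact obstacles as long as the contacting fraction stays below τ. The source does not specify the exact form of the threshold, so this fraction-based rule (and all names below) is an assumption for illustration.

```python
def path_admissible(rigid_pts, soft_pts, occupied, tau, voxel=0.02):
    """Accept a candidate path if the rigid segment is strictly collision-free
    and at most a fraction tau of soft-segment samples touch occupied voxels.
    """
    def hit(p):
        return (int(p[0] // voxel), int(p[1] // voxel), int(p[2] // voxel)) in occupied

    if any(hit(p) for p in rigid_pts):        # strict rule for rigid links
        return False
    contacts = sum(hit(p) for p in soft_pts)  # bounded contact for the SCA
    return contacts <= tau * len(soft_pts)
```

Under this rule, raising τ admits more paths through tight clutter at the cost of more frequent (compliant) contact, matching the success-rate/contact trade-off described above.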