Alibaba has launched the Qwen Robot Suite, its first family of AI models designed for physical-world robotics. Developed by Tongyi Lab, Alibaba’s AI research unit, the suite is already in pilot testing with selected Alibaba Cloud enterprise clients. The launch marks Alibaba’s formal entry into embodied AI – the field of machines that can perceive, reason, and interact with physical environments – as the company extends its Qwen model family beyond language and vision into robot control.
📣 Introducing the Qwen-Robot Suite — Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.
🧠Qwen-RobotNav — the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal,… pic.twitter.com/noumjTtTeS
Qwen (@Alibaba_Qwen), June 16, 2026
The suite splits robot intelligence into three interconnected layers. Qwen-RobotNav, a vision-language navigation model, is designed to help machines understand and move through physical spaces. It works in tandem with Qwen-RobotWorld, a video world model that lets robots predict and simulate how physical scenes will evolve before they take action. Physical execution is handled by Qwen-RobotManip, a generalist vision-language-action model built on the Qwen3.5-4B architecture.
What Each Model Does
Qwen-RobotNav serves as the action gateway for physical agents. It feeds vision-language understanding into motion control through controllable observation encoding and tool interfaces, and it unifies four task types in a single model: instruction following, goal navigation, object tracking, and autonomous driving.
The three models are designed to give robots capabilities including dexterous manipulation, efficient navigation, and cognitive processing. Each model can run on its own or in combination with the others, providing a flexible foundation for deploying robots in real-world scenarios.
The architecture reflects a design principle that has emerged across leading robotics AI programs: rather than building a separate specialist model for each task type or robot platform, Alibaba is building toward generalist models, each covering a whole capability area (navigation, manipulation, or environment prediction) across different tasks and hardware configurations.
Alibaba’s Broader Physical AI Positioning
Alibaba described itself as the only company in China operating all five layers of what it calls the full AI stack – chips, an agentic cloud, models, model-serving platforms, and applications. The Qwen Robot Suite extends that stack into the physical layer, where the company has been building ecosystem relationships with Chinese robotics manufacturers such as Agibot.
Alongside the robot suite, Alibaba announced Qwen3.7-Max, a new large language model positioned as a foundation for AI agents. The company claims the model can run autonomously for up to 35 hours without its performance degrading, a durability claim aimed at agentic workloads where multi-hour task execution is the norm.
The Competitive Context
Alibaba’s entry into embodied AI model development places it alongside Google DeepMind (Gemini Robotics), NVIDIA (Isaac GR00T), and Physical Intelligence (the pi models) as a foundation model provider targeting the robot intelligence layer. What sets Alibaba apart is its existing cloud infrastructure and enterprise client base in China, which gives the Qwen Robot Suite a built-in distribution pathway through Alibaba Cloud that pure-play robotics AI companies lack.
The launch also comes as ByteDance and Galbot advance their own embodied AI programs, reflecting a broader convergence among China’s technology companies on physical AI as the next competitive frontier beyond chatbots and generative AI.