The Qwen team has released Qwen-AgentWorld, which it positions as the first family of language world models built specifically to simulate the environments that agents act in, rather than to act in them directly. A world model here predicts environment dynamics: given the current observation and a candidate action, it forecasts the next state. The bet is that giving an agent an accurate internal simulator of its world improves its reasoning and planning. Two mixture-of-experts checkpoints ship: Qwen-AgentWorld-35B-A3B, with roughly three billion active parameters, and the much larger Qwen-AgentWorld-397B-A17B, with about seventeen billion active. Both are trained to simulate agentic environments spanning seven domains, using long chain-of-thought reasoning to arrive at their predictions.
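To make the observation, action, next-state contract concrete, here is a minimal Python sketch of how an agent could use a world model for one-step lookahead: it asks the simulator what each candidate action would produce and picks the best predicted outcome without touching the real environment. The `WorldModel` interface, its `predict` method, `ToyFileSystemModel`, and `plan_one_step` are all hypothetical names chosen for illustration; the toy model is hand-written rather than learned, and nothing here reflects the API of the released code.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class WorldModel(Protocol):
    """Hypothetical interface: predict the next state from (observation, action)."""

    def predict(self, observation: str, action: str) -> str: ...


@dataclass
class ToyFileSystemModel:
    """Hand-written stand-in for a learned simulator (a real one would be an LLM call)."""

    files: frozenset[str]

    def predict(self, observation: str, action: str) -> str:
        # Simulate a shell-like environment: `cat NAME` succeeds only if NAME exists.
        if action.startswith("cat "):
            name = action[len("cat "):]
            if name in self.files:
                return f"contents of {name}"
            return f"error: {name} not found"
        return "error: unknown command"


def plan_one_step(
    model: WorldModel,
    observation: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate whose *predicted* next state scores highest.

    Nothing is executed in the real environment; only the simulator is queried.
    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    return max(candidates, key=lambda action: score(model.predict(observation, action)))


if __name__ == "__main__":
    model = ToyFileSystemModel(frozenset({"notes.txt"}))
    avoid_errors = lambda state: 0.0 if state.startswith("error") else 1.0
    print(plan_one_step(model, "$ ", ["cat todo.txt", "cat notes.txt"], avoid_errors))
```

Running the script prints `cat notes.txt`, because the simulator predicts that `cat todo.txt` would fail. A learned world model would replace the string rules with generated predictions, but the planning loop around it would look much the same.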
The training data is the headline asset: more than ten million real-world environment-interaction trajectories across those seven domains. The pipeline runs in three stages. Continued pretraining builds general-purpose world-modeling ability from state-transition dynamics and augmented professional corpora; supervised fine-tuning then activates reasoning for next-state prediction; and a final reinforcement-learning stage sharpens simulation fidelity using a tailored framework whose hybrid rewards combine rubric-based and rule-based scoring. To measure whether the simulator is any good, the authors introduce AgentWorldBench, assembled from real interactions of five frontier models across nine established benchmarks, and report that Qwen-AgentWorld significantly outperforms existing frontier models at predicting how those environments evolve.
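The release does not spell out here how the rubric and rule components are combined, so the following Python sketch only illustrates the general idea of a hybrid reward for a simulator's next-state prediction. It assumes predicted observations are JSON objects, uses a hard format gate plus exact field matching as the "rule" part, and takes a rubric score in [0, 1] (for example from an LLM judge) as an input. The function names, the gating, and the `rule_weight` mixing are all hypothetical choices, not the authors' recipe.

```python
import json


def format_ok(text: str) -> bool:
    """Rule check 1 (assumed): the predicted observation must parse as a JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def rule_score(text: str, reference: dict) -> float:
    """Rule check 2 (assumed): fraction of reference fields reproduced exactly."""
    predicted = json.loads(text)
    if not reference:
        return 1.0
    matched = sum(1 for key, value in reference.items() if predicted.get(key) == value)
    return matched / len(reference)


def hybrid_reward(text: str, reference: dict, rubric_score: float, rule_weight: float = 0.5) -> float:
    """Combine verifiable rule checks with a rubric score in [0, 1].

    Malformed predictions get zero reward outright; otherwise the rule score and
    the rubric score are mixed linearly. Both choices are illustrative assumptions.
    """
    if not 0.0 <= rubric_score <= 1.0:
        raise ValueError("rubric_score must be in [0, 1]")
    if not format_ok(text):
        return 0.0
    return rule_weight * rule_score(text, reference) + (1.0 - rule_weight) * rubric_score
```

The split reflects a common design rationale: rules are cheap and unambiguous for fields that can be checked exactly (a status code, a file path), while a rubric can grade the free-form parts of a simulated observation that have no single correct string.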
Two downstream uses make the release more than a curiosity. First, as a decoupled environment simulator, Qwen-AgentWorld can stand in for thousands of real-world environments during agentic reinforcement learning, and the authors report that training agents inside the simulator yields gains that exceed those from training in the real environments alone. That kind of result matters when real rollouts are slow, costly, or unsafe to collect. Second, as a unified agent foundation model, it shows that world-model pretraining works as a warm-up, lifting downstream performance across seven agentic benchmarks. Code is released on the Qwen GitHub. The claim to watch is the simulator-trained transfer: if world-model rollouts really do transfer better than scarce real interaction, that reshapes how agentic reinforcement-learning data gets generated, and it is exactly the sort of finding the community will want to reproduce outside Alibaba's own evaluation harness.
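What "decoupled environment simulator" means in practice is that the RL loop talks to an environment interface and does not care whether a real system or a world model sits behind it. The Python sketch below shows that idea under stated assumptions: the `Env` protocol with `reset`/`step` is modeled loosely on the familiar Gym-style convention, and `SimulatedEnv`, `predict`, `reward_fn`, and `rollout` are hypothetical names, with `predict` standing in for a call to the world model. It is not the released training code.

```python
from typing import Callable, Protocol


class Env(Protocol):
    """Hypothetical environment interface shared by real and simulated backends."""

    def reset(self) -> str: ...

    def step(self, action: str) -> tuple[str, float, bool]: ...


class SimulatedEnv:
    """Environment whose transitions come from a world model, not real infrastructure."""

    def __init__(
        self,
        initial_obs: str,
        predict: Callable[[str, str], str],  # stand-in for the world-model call
        reward_fn: Callable[[str], float],   # stand-in for a task-specific reward
        max_steps: int = 10,
    ) -> None:
        self.initial_obs = initial_obs
        self.predict = predict
        self.reward_fn = reward_fn
        self.max_steps = max_steps
        self.obs = initial_obs
        self.t = 0

    def reset(self) -> str:
        self.obs = self.initial_obs
        self.t = 0
        return self.obs

    def step(self, action: str) -> tuple[str, float, bool]:
        # Simulated transition: no real side effects, no real latency or cost.
        self.obs = self.predict(self.obs, action)
        self.t += 1
        reward = self.reward_fn(self.obs)
        done = reward > 0 or self.t >= self.max_steps
        return self.obs, reward, done


def rollout(env: Env, policy: Callable[[str], str]) -> list[tuple[str, str, float]]:
    """Collect one (observation, action, reward) trajectory from any Env."""
    trajectory = []
    obs = env.reset()
    done = False
    while not done:
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        trajectory.append((obs, action, reward))
        obs = next_obs
    return trajectory
```

Because `rollout` only depends on `reset` and `step`, swapping a slow or risky real environment for a simulated one is a one-line change at construction time, which is what lets a single simulator stand in for many environments during training.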
- Surfaced across arXiv's language, agents, and evals feeds plus AK's and Hugging Face's Daily Papers — strong same-day pickup.
- The result drawing the most attention is the simulator-beats-reality claim: agents trained inside the world model reportedly surpass agents trained on real environments alone.