Interaction and active reasoning: [2] (Sokoban, Maze, and Taxi)
Joint policy and world model learning: [3]
Survey: [1]
References
[1] Zhao, Changyuan, et al. “Edge general intelligence through world models and agentic AI: Fundamentals, solutions, and challenges.” arXiv preprint arXiv:2508.09561 (2025).
[2] Shu, Bao, et al. “Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction.” arXiv preprint arXiv:2511.23476 (2025).
[3] Yu, Xiao, et al. “Dyna-Think: Synergizing Reasoning, Acting, and World Model Simulation in AI Agents.” arXiv preprint arXiv:2506.00320 (2025).