GUI Agent
Research Directions
Safety
[1] Chen, Baicheng, et al. “AdapAction: Adaptive Target Action Backdoor Attack against GUI Agents.” CVPR, 2026. [pdf]
[2] Yan, Zihe, et al. “LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents.” CVPR, 2026. [pdf] [code]
Inference Efficiency
[1] Zhou, Xurui, et al. “HiconAgent: History Context-aware Policy Optimization for GUI Agents.” CVPR, 2026. [pdf] [code] (dynamic history length)
Long Horizon
[1] Deng, Zehao, et al. “Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation.” CVPR, 2026. [pdf] [code]
[2] Kang, Bin, et al. “LongHorizonUI: A Unified Framework for Robust Long-Horizon Task Automation of GUI Agent.” ICLR, 2026. [pdf]
Extra Guidance
[1] Xie, Rui, et al. “GUIDE: Resolving Domain Bias in GUI Agents through Real-Time Web Video Retrieval and Plug-and-Play Annotation.” arXiv preprint arXiv:2603.26266 (2026). [pdf] [code]
[2] Liu, Jingjing, et al. “DocOS: Towards Proactive Document-Guided Actions in GUI Agents.” ICML, 2026. [pdf] [code]
New Domain
[1] Li, Yang, et al. “GUI-CEval: A Hierarchical and Comprehensive Chinese Benchmark for Mobile GUI Agents.” CVPR, 2026. [pdf]
[2] Liu, Ziwei, et al. “Continual GUI Agents.” ICML, 2026. [pdf] [code]
[3] Chen, Yuxi, et al. “CAPTCHA Solving for Native GUI Agents: Automated Reasoning-Action Data Generation and Self-Corrective Training.” ICML, 2026. [pdf] [code]
Training Data Synthesis
[1] Zhang, Bofei, et al. “TongUI: Internet-Scale Trajectories from Multimodal Web Tutorials for Generalized GUI Agents.” AAAI, 2026. [pdf] [code]
[2] Shao, Rui, et al. “HATS: Hardness-Aware Trajectory Synthesis for GUI Agents.” CVPR, 2026. [pdf] [code]
[3] Lv, Rui, et al. “M²-Miner: Multi-Agent Enhanced MCTS for Mobile GUI Agent Data Mining.” ICLR, 2026. [pdf]
[4] Xiong, Weimin, et al. “Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining.” ICML, 2026. [pdf] [code]
Optimization
[1] Xu, Yifan, et al. “MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents.” arXiv preprint arXiv:2509.18119 (2025). [pdf] [code]
World Model
[1] Guan, Yiming, et al. “Computer-Using World Model.” arXiv preprint arXiv:2602.17365 (2026). [pdf]
[2] Luo, Dezhao, et al. “ViMo: A Generative Visual GUI World Model for App Agents.” arXiv preprint arXiv:2504.13936 (2025). [pdf] [code]
[3] Cao, Yilin, et al. “MobileDreamer: Generative Sketch World Model for GUI Agent.” arXiv preprint arXiv:2601.04035 (2026). [pdf]
Challenging Benchmark
[1] Gong, Yichen, et al. “VenusBench-Mobile: A Challenging and User-Centric Benchmark for Mobile GUI Agents with Capability Diagnostics.” ICML, 2026. [pdf] [code]