๐Ÿ“ Selected Publications

ICLR 2026

ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows 🤖🔬

Qiushi Sun, Zhoumianze Liu, Chang Ma, Zichen Ding, Fangzhi Xu, Zhangyue Yin, Haiteng Zhao, Zhenyu Wu, Kanzhi Cheng, Zhaoyang Liu, Jianing Wang, Qintong Li, Xiangru Tang, Tianbao Xie, Xiachong Feng, Xiang Li, Ben Kao, Wenhai Wang, Biqing Qi, Lingpeng Kong, Zhiyong Wu

[Paper] | [Slides] | [Project] | [Env] | [Code] |

  • First to apply computer-using agents to assist scientific exploration 🌌
  • Dynamic environment & benchmark for realistic scientific workflows
  • Comprehensive evaluation of SOTA LLM/VLM agents 🧭

ICLR 2026

JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence

Qiushi Sun, Jingyang Gong, Yang Liu, Qiaosheng Chen, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan

[Paper] | [Slides] | [Project] | [Code] |

  • JanusCoder series: foundational models establishing a unified visual-programmatic interface ⚙️
  • A versatile data synthesis toolkit for multimodal code intelligence 🛠️
  • Superior performance on diverse text- and vision-centric tasks 🧭

ACL 2025

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis 🔥🔥

Qiushi Sun*, Kanzhi Cheng*, Zichen Ding*, Chuanyang Jin*, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu

[Paper] | [Slides] | [Project] | [Models & Data] |

  • Shift from task-driven to interaction-driven GUI data synthesis 🤖
  • A manual-free pipeline for constructing diverse GUI agent trajectories 🧬
  • Strong performance on online mobile/web benchmarks 🌟

Preprint

OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows 🛡️🧐

Qiushi Sun*, Mukai Li*, Zhoumianze Liu*, Zhihui Xie*, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong

[Paper] | [Slides] | [Project] | [Env] | [Code] |

  • MobileRisk-Live & MobileRisk, a dynamic environment and benchmark for realistic mobile agent safety 📱
  • OS-Sentinel, a hybrid detection framework combining formal verification with contextual judgment 🛡️
  • Advanced mobile agent safety at both the step-level and trajectory-level 🧭

Survey

A Survey of Neural Code Intelligence: Paradigms, Advances and Beyond 🔥🔥

Qiushi Sun, Zhirui Chen, Fangzhi Xu, Chang Ma, Kanzhi Cheng, Zhangyue Yin, Jianing Wang, Chengcheng Han, Renyu Zhu, Shuai Yuan, Pengcheng Yin, Qipeng Guo, Xipeng Qiu, Xiaoli Li, Fei Yuan, Lingpeng Kong, Xiang Li, Zhiyong Wu

[Paper] | [Slides] | [Project] | [Video] |

Let me walk you through the development of neural code intelligence:

  • Follow LMs for code as a thread to trace the field’s development 🚀
  • Explore cross-domain synergies and opportunities 🌱
  • Present a broad array of promising research avenues 💡

* denotes equal contribution; ✉ denotes corresponding author. More working drafts and preprints under review will be released later ⌛️