Chenxin Li | 李宸鑫

Hi! I'm Chenxin "Jason" Li, a final-year Ph.D. candidate at The Chinese University of Hong Kong (CUHK). I work on LLM/VLM-based agents.

I have built hands-on experience scaling the agentic abilities of LLMs/VLMs, especially in:

  • Computer-use agents, including coding agents for software automation (e.g., Photoshop and Blender), skill-driven tool use, and GUI agents (e.g., IR3D, JarvisArt, JarvisIR, Seed-1.8, UI-TARS-2).
  • Self-evolving agent harnesses for data flywheels: rubric generation, skill evolution, and recursive model self-improvement (e.g., the Doubao RSI flywheel).

I have interned at ByteDance Seed, Tencent AI, Ant Ling, and Hedra AI, and have been a visiting researcher at UT Austin and UMD. I anticipate graduating in summer 2026 and am interested in industry positions (Profile). Please feel free to reach out via email (chenxinli@link.cuhk.edu.hk) or WeChat (jasonchenxinli).

LinkedIn | Google Scholar | GitHub | X

profile photo
Selected Work
* Equal contribution, † Project Leader, ‡ Corresponding author
Seed-1.8
Seed-1.8: Towards Generalized Real-World Agency
ByteDance Seed Team

[Project] [Model Card]

Contributed to agent post-training through visual coding and IPython-based tool use.

UI-TARS-2
UI-TARS-2: Advancing GUI Agent with Multi-Turn Reinforcement Learning
ByteDance Seed Team

[Project] [Report]

Contributed to GUI grounding flywheel and agent post-training.

Ling
Ling: Open-sourced LLM with MoE Architecture by InclusionAI
Ant Group InclusionAI Team

[Project]

Contributed to long-context post-training and LLM-as-Judge verifiers.

IR3D-Bench Framework
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Parker Liu*, Chenxin Li*, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng
NeurIPS 2025

[Project] [Paper] [Code]

An agentic inverse-rendering framework that closes the loop from visual understanding to structured code generation, Blender execution, and environment feedback.

SAM-Agent Framework
SAM-Agent: Empowering Interactive Image Segmentation with Multi-turn Agentic Reinforcement Learning
Shengyuan Liu*, Chenxin Li*, Liuxin Bao, Qi Yang, Wanting Geng, Boyun Zheng, Wenting Chen, Houwen Peng, Yixuan Yuan
Preprint

[Paper] [Code]

An interactive segmentation agent system that learns multi-turn correction actions with process rewards for iterative image refinement.

U-KAN Framework
U-KAN Makes Strong Backbone for Image Segmentation and Generation
Chenxin Li*, Xinyu Liu*, Wuyang Li*, Cheng Wang*, Hengyu Liu, Yixuan Yuan
AAAI 2025

[Project] [Paper] [Code] Top-1 most influential paper of AAAI 2025

Integrating Kolmogorov-Arnold Network (KAN) layers into a U-Net-style vision backbone for segmentation and generation.

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Chenxin Li*, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding‡
CVPR 2025

[Project] [Paper] [Code]

A restoration agent system that schedules structured recovery steps across expert modules and optimizes execution outcomes.

JarvisArt
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin*, Zixu Lin*, Kunjie Lin*, Chenxin Li*, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li, Shuicheng Yan†
NeurIPS 2025

[Project] [Paper] [Code]

An artistic editing agent system that plans multi-step retouching commands and coordinates expert models for execution-quality image refinement.

EMNLP 2024 VLM fine-tuning
Visual Large Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Mengjiao Li, Zhiyuan Ji, Chenxin Li†, Lianliang Nie, Zhiyang Li, Masashi Sugiyama
EMNLP 2024

[Project] [Paper] [Code]

A simple parameter-efficient fine-tuning method for VLM alignment.

Selected Experience
  • ByteDance Seed: Computer-use agent post-training, rubric-based rewards, and training automation
  • Tencent AI: Structured code-to-environment execution for VLM agents
  • Ant Ling: Long-context mid-/post-training and LLM-as-Judge verifiers
  • Hedra AI: Omnimodal control surfaces for production video-generation systems
ScholaGO

ScholaGO (Co-founder): LLM-Powered Education Startup

Co-founded ScholaGO Education Technology Company Limited (学旅通教育科技有限公司) to build LLM-powered education products that turn static content into immersive, interactive, multimodal learning experiences. Grateful to have received funding from HKSTP, HK Tech 300, and Alibaba Cloud.

Professional Activities
  • Workshop Organizer: AIM-FM: Advancements In Foundation Models Towards Intelligent Agents (NeurIPS 2024)
  • Talks: "U-KAN" at VALSE Summit (Jun 2025) and DAMTP, University of Cambridge (Jul 2024)
  • Conference Reviewer: ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, EMNLP, AAAI, ACM MM, MICCAI, BIBM
  • Journal Reviewer: Nature Machine Intelligence, PAMI, TIP, DMLR, PR, TNNLS
Beyond Work
Reading: I dedicate substantial time to reading, especially history, philosophy, and sociology, which shapes my first-principles perspective on what AGI should be.

Investment: Investment is real-world RL: returns provide fast feedback for iteratively improving one's decision policy. Recently, I have been fascinated by (i) building benchmarks for LLMs that quantify real-world investment utility (in a similar spirit to the GDPval benchmark), and (ii) extending quantitative financial metrics to more general event and trend forecasting.