Chenxin Li | 李宸鑫

Hi! I'm Chenxin "Jason" Li, a final-year Ph.D. candidate at The Chinese University of Hong Kong (CUHK). I work on LLM/VLM-based agents.

I have built hands-on experience scaling the agentic abilities of LLMs/VLMs, especially in:

  • Computer-use agents, including coding agents for software automation (e.g., Photoshop and Blender), skill-driven tool use, and GUI agents (e.g., IR3D, JarvisArt, JarvisIR, Seed-1.8, UI-TARS-2).
  • Self-evolving agent harnesses for data flywheels: rubric generation, skill evolution, and recursive model self-improvement (e.g., the Doubao RSI flywheel).

I have interned at ByteDance Seed, Tencent AI, Ant Ling, and Hedra AI, and have been a visiting researcher at UT Austin and UMD. I anticipate graduating in summer 2026 and am interested in industry positions (Profile). Please feel free to reach out via email (chenxinli@link.cuhk.edu.hk) or WeChat (jasonchenxinli).

LinkedIn | Google Scholar | GitHub | X

profile photo
Selected Work
* Equal contribution, † Project Leader, ‡ Corresponding author
Seed-1.8
Seed-1.8: Towards Generalized Real-World Agency
ByteDance Seed Team

[Project] [Model Card]

Contributed to agent post-training through visual coding and IPython-based tool use.

UI-TARS-2
UI-TARS-2: Advancing GUI Agent with Multi-Turn Reinforcement Learning
ByteDance Seed Team

[Project] [Report]

Contributed to GUI grounding flywheel and agent post-training.

Ling
Ling: Open-sourced LLM with MoE Architecture by InclusionAI
Ant Group InclusionAI Team

[Project]

Contributed to long-context post-training and LLM-as-Judge verifiers.

IR3D-Bench Framework
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Parker Liu*, Chenxin Li*, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng
NeurIPS 2025

[Project] [Paper] [Code]

An agentic inverse-rendering framework that closes the loop from visual understanding to structured code generation, Blender execution, and environment feedback.

SAM-Agent Framework
SAM-Agent: Empowering Interactive Image Segmentation with Multi-turn Agentic Reinforcement Learning
Shengyuan Liu*, Chenxin Li*, Liuxin Bao, Qi Yang, Wanting Geng, Boyun Zheng, Wenting Chen, Houwen Peng, Yixuan Yuan
Preprint

[Paper] [Code]

An interactive segmentation agent system that learns multi-turn correction actions with process rewards for iterative image refinement.

U-KAN Framework
U-KAN Makes Strong Backbone for Image Segmentation and Generation
Chenxin Li*, Xinyu Liu*, Wuyang Li*, Cheng Wang*, Hengyu Liu, Yixuan Yuan
AAAI 2025

[Project] [Paper] [Code] Top-1 most influential paper of AAAI 2025

Integrating Kolmogorov-Arnold Network (KAN) layers into a U-Net-style vision backbone for segmentation and generation.

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Chenxin Li*, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding‡
CVPR 2025

[Project] [Paper] [Code]

A restoration agent system that schedules structured recovery steps across expert modules and optimizes execution outcomes.

JarvisArt
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin*, Zixu Lin*, Kunjie Lin*, Chenxin Li*, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li, Shuicheng Yan†
NeurIPS 2025

[Project] [Paper] [Code]

An artistic editing agent system that plans multi-step retouching commands and coordinates expert models for execution-quality image refinement.

EMNLP 2024 VLM fine-tuning
Visual Large Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Mengjiao Li, Zhiyuan Ji, Chenxin Li†, Lianliang Nie, Zhiyang Li, Masashi Sugiyama
EMNLP 2024

[Project] [Paper] [Code]

A simple parameter-efficient fine-tuning method for VLM alignment.

Selected Experience
  • ByteDance Seed: Computer-use agent post-training, rubric-based rewards, and training automation
  • Tencent AI: Structured code-to-environment execution for VLM agents
  • Ant Ling: Long-context mid-/post-training and LLM-as-Judge verifiers
  • Hedra AI: Omnimodal control surfaces for production video-generation systems
ScholaGO

ScholaGO (Co-founder): LLM-Powered Education Startup

Co-founded ScholaGO Education Technology Company Limited (学旅通教育科技有限公司) to build LLM-powered education products that turn static content into immersive, interactive, multimodal learning experiences. Grateful to have received funding from HKSTP, HK Tech 300, and Alibaba Cloud.

Professional Activities
  • Workshop Organizer: AIM-FM: Advancements In Foundation Models Towards Intelligent Agents (NeurIPS 2024)
  • Talks: "U-KAN" at VALSE Summit (Jun 2025) and DAMTP, University of Cambridge (Jul 2024)
  • Conference Reviewer: ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, EMNLP, AAAI, ACM MM, MICCAI, BIBM
  • Journal Reviewer: Nature Machine Intelligence, PAMI, TIP, DMLR, PR, TNNLS
Beyond Work
Reading: I dedicate substantial time to reading, especially history, philosophy, and sociology, which shapes my first-principles perspective on what AGI should be.

Investment: Investment is real-world RL: returns provide fast feedback for iteratively improving one's decision policy. Recently, I have been fascinated by (i) building benchmarks for LLMs that quantify real-world investment utility (in a similar spirit to the GDPval benchmark), and (ii) extending quantitative financial metrics to more general event and trend forecasting.