Chenxin Li

Chenxin Li | 李宸鑫

I am a Ph.D. student at The Chinese University of Hong Kong, advised by Prof. Yixuan Yuan. I received my M.Eng. from Xiamen University, where I also earned my B.Eng., under Prof. Xinghao Ding and Prof. Yue Huang.

My research focuses on building multimodal LLM-driven unified frameworks that seamlessly integrate understanding and generation capabilities within intelligent agents. This vision is realized through establishing a closed-loop system from perception to decision-making, leveraging my expertise in multimodal understanding as the perceptual foundation and multimodal generation as the creative engine.

[Pinned] Looking for industry positions and internship opportunities.

I warmly welcome discussions on research collaborations, as well as any profound or interesting insights. Feel free to reach out!

Email / Scholar / GitHub / X / WeChat / RedNote (🤩⭐)

📢 Latest News

[06/2025] Four papers (InfoBridge + X2Gaussian + MetaScope + Dissecting Generalized Category) accepted to ICCV 2025. Thanks and congratulations to the co-authors!
[03/2025] Four papers (Track Any Anomalous Object + EfficientSplat + FlexGS + JarvisIR) accepted to CVPR 2025.
[02/2025] One paper (InstantSplamp) accepted to ICLR 2025.
[01/2025] One paper (U-KAN) accepted to AAAI 2025.
[12/2024] One paper (ConcealGS) accepted to ICASSP 2025 and one paper (Hide-in-Motion) accepted to ICRA 2025.
[11/2024] One paper (EndoGaussian) accepted to TMI 2024.
[09/2024] One paper (Flaws can be Applause) accepted to NeurIPS 2024.
[09/2024] One paper (VLM Fine-tuning) accepted to EMNLP 2024.
[07/2024] One paper (P^2SAM) accepted to ACM MM 2024.
[07/2024] One paper (Multimodal Bio Graph) accepted to ECCV 2024.
[06/2024] Three papers (Endora + EndoSparse + GS) accepted to MICCAI 2024.
[07/2023] One paper (StegaNeRF) accepted to ICCV 2023.
[05/2022] One paper (Knowledge Condensation Distillation) accepted to ECCV 2022.

🧠 Research Philosophy

My research is guided by four philosophical principles that drive both theoretical innovation and practical impact:

🔬 Feynman Philosophy: "What I cannot create, I do not understand" - Using generation quality to evaluate understanding capability, connecting understanding and generation through attribute-encoded latent spaces
⚡ First Principles: "Goal is the Path" - Using LLMs to eliminate the learning cost between users and underlying tools, in the spirit of first-principles thinking
🔄 Reverse Thinking: "Opposing forces drive motion" - Transforming SAM's deterministic perception "disadvantages" into uncertainty perception "advantages" for open-ended visual tasks
⚖️ Dialectical Unity: "Simplicity meets Rigor" - Balancing concise design with strict theory, from simple parameter-efficient bias modifications to information theory and graph theory-based multimodal alignment

📑 Selected Publications (Google Scholar)

* Equal contribution, † Project Leader, ‡ Corresponding author

Preprint 2025

WonderFieldAgent: Empowering Scene Interaction via an Intelligent Versatile Field-driven Agent

Chenxin Li*, Hengyu Liu*, Zhiyang Yang, Yifan Liu, Wuyang Li, Yixuan Yuan

[Project] [Paper] [Code]

The first exploration of vLLM-based agentic interaction in generated 3D content.

Preprint 2025

IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering

Parker Liu*, Chenxin Li*, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng

[Project] [Paper] [Code]

First benchmarking framework for evaluating vision-language model scene understanding via inverse rendering tasks through agentic tool use.

Preprint 2025

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li, Shuicheng Yan†

[Project] [Paper] [Code]

JarvisArt outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity.

Preprint 2025

DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

[Project] [Paper] [Code]

DynamicVerse is a physical-scale, multimodal 4D modeling framework for real-world video.

ICCV 2025

InfoBridge: Balanced Multimodal Fusion by Maximizing Cross-modal Conditional Mutual Information

Chenxin Li, Yifan Liu, Xinyu Liu, Wuyang Li, Hengyu Liu, Cheng Wang, Weihao Yu, Yunlong Lin, Yixuan Yuan

[Project] [Paper] [Code]

Enhances multimodal fusion via conditional mutual information maximization for balanced cross-modal representation learning.

AAAI 2025

U-KAN: U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation

Chenxin Li*, Xinyu Liu*, Wuyang Li*, Cheng Wang*, Hengyu Liu, Yixuan Yuan

[Project] [Paper] [Code]

CVPR 2025

Track Any Anomalous Object: A Granular Video Anomaly Detection Pipeline

Yuzhi Huang*, Chenxin Li*‡, Haitao Zhang, Zixu Lin, Yunlong Lin, Hengyu Liu, Wuyang Li, Xinyu Liu, Jiechao Gao, Yue Huang, Xinghao Ding, Yixuan Yuan

[Project] [Paper] [Code]

A granular video anomaly detection framework that integrates the detection of multiple fine-grained anomalous objects into a unified framework, achieving state-of-the-art performance.

CVPR 2025

FlexGS: Train Once, Deploy Everywhere with Many-in-One Flexible 3D Gaussian Splatting

Hengyu Liu*, Yifan Wang*, Chenxin Li*, Rui Cai, Kai Wang, Wuyang Li, Pavel Molchanov, Panwang Wang, Zhangyang Wang

[Project] [Paper] [Video] [Code]

Train once, deploy everywhere with many-in-one flexible 3D Gaussian splatting for efficient multi-device deployment.

CVPR 2025

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Panwang Pan*, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding‡

[Project] [Paper] [Code]

JarvisIR is a VLM-powered intelligent system that dynamically schedules expert models for restoration.

ICLR 2025

InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting

Chenxin Li*, Hengyu Liu*, Zhiwen Fan, Wuyang Li, Yifan Liu, Panwang Pan, Yixuan Yuan

[Project] [Paper] [Code]

An initial exploration into embedding customizable, imperceptible, and recoverable information within the renders produced by off-the-shelf 3D generative models, while ensuring minimal impact on the rendered content's quality.

ECCV 2024

GTP-4o: Modality-prompted Heterogeneous Graph Learning for Omni-modal Biomedical Representation

Chenxin Li, Xinyu Liu*, Cheng Wang*, Yifan Liu, Weihao Yu, Jing Shao, Yixuan Yuan (* Equal Second-author Contribution)

[Project] [Paper] [Code]

A pioneering foray into embedding, relating, and perceiving heterogeneous patterns from various biomedical modalities holistically via graph theory.

MICCAI 2024

Endora: Video Generation Models as Endoscopy Simulators

Chenxin Li*, Hengyu Liu*, Yifan Liu*, Brandon Y. Feng, Wuyang Li, Xinyu Liu, Zhen Chen, Jing Shao, Yixuan Yuan (* Equal Contribution)

[Project] [Paper] [Video] [Code]

A pioneering exploration into high-fidelity medical video generation on endoscopy scenes.

NeurIPS 2024

Flaws can be Applause: Unleashing Potential of Segmenting Ambiguous Objects in SAM

Chenxin Li*, Yuzhi Huang*, Wuyang Li, Hengyu Liu, Xinyu Liu, Qing Xu, Zhen Chen, Yue Huang, Yixuan Yuan

[Project] [Paper] [Code]

A novel approach that leverages the ambiguity and uncertainty in object boundaries to improve segmentation performance, turning traditional segmentation "flaws" into advantages.

ACM MM 2024

P²SAM: Probabilistically Prompted SAMs Are Efficient Segmentator for Ambiguous Medical Images

Yuzhi Huang*, Chenxin Li*‡, Zixu Lin, Hengyu Liu, Haote Xu, Yifan Liu, Yue Huang, Xinghao Ding, Yixuan Yuan

[Project] [Paper] [Code]

A probabilistic prompting framework that enhances SAM's performance on ambiguous medical images through uncertainty-aware prompt generation.

ICCV 2023

StegaNeRF: Embedding Invisible Information within Neural Radiance Fields

Chenxin Li*, Brandon Y. Feng*, Zhiwen Fan*, Panwang Pan, Zhangyang Wang (* Equal Contribution)

[Project] [Paper] [Video] [Code]

Embedding multimodal invisible information (image, video, audio) into distributed visual assets.

ECCV 2022

Knowledge Condensation Distillation

Chenxin Li, Mingbao Lin, Zhiyuan Ding, Nie Lin, Yihong Zhu, Xinghao Ding, Yue Huang, Liujuan Cao

[Project] [Paper] [Code]

Training large networks efficiently and smartly by progressive data distillation.

💼 Professional Activities

Conference Reviewer

ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, EMNLP, AAAI, ACM MM, MICCAI, BIBM (and more)

Journal Reviewer

TIP, DMLR, PR, TNNLS, NCA (and more)