My research focuses on building multimodal LLM-driven unified frameworks that seamlessly integrate understanding and generation capabilities within intelligent agents. This vision is realized through establishing a closed-loop system from perception to decision-making, leveraging my expertise in multimodal understanding as the perceptual foundation and multimodal generation as the creative engine.
[Pinned] Looking for industry positions and internship opportunities.
I warmly welcome discussions on research collaborations, as well as any profound or interesting insights. Feel free to reach out!
[06/2025] Four papers (InfoBridge + X2Gaussian + MetaScope + Dissecting Generalized Category) accepted to ICCV 2025. Thanks and congratulations to the co-authors!
[03/2025] Four papers (Track Any Anomalous Object + EfficientSplat + FlexGS + JarvisIR) accepted to CVPR 2025.
[02/2025] One paper (InstantSplamp) accepted to ICLR 2025.
[01/2025] One paper (U-KAN) accepted to AAAI 2025.
[12/2024] One paper (ConcealGS) accepted to ICASSP 2025 and one paper (Hide-in-Motion) accepted to ICRA 2025.
[11/2024] One paper (EndoGaussian) accepted to TMI 2024.
[09/2024] One paper (Flaws can be Applause) accepted to NeurIPS 2024.
[09/2024] One paper (VLM Fine-tuning) accepted to EMNLP 2024.
[07/2024] One paper (P^2SAM) accepted to ACM MM 2024.
[07/2024] One paper (Multimodal Bio Graph) accepted to ECCV 2024.
[06/2024] Three papers (Endora + EndoSparse + GS) accepted to MICCAI 2024.
[07/2023] One paper (StegaNeRF) accepted to ICCV 2023.
[05/2022] One paper (Knowledge Condensation Distillation) accepted to ECCV 2022.
🧠 Research Philosophy
My research is guided by four philosophical principles that drive both theoretical innovation and practical impact:
🔬 Feynman Philosophy: "What I cannot create, I do not understand" - Using generation quality to evaluate understanding capability, and connecting understanding and generation through attribute-encoded latent spaces
⚡ First Principles: "Goal is the Path" - Using LLMs to eliminate the learning cost between users and underlying tools, in the spirit of first-principles thinking
🔄 Reverse Thinking: "Opposing forces drive motion" - Transforming the "disadvantages" of SAM's deterministic perception into "advantages" in uncertainty perception for open-ended visual tasks
⚖️ Dialectical Unity: "Simplicity meets Rigor" - Balancing concise design with rigorous theory, from simple parameter-efficient bias modifications to multimodal alignment grounded in information theory and graph theory
A granular video anomaly detection framework that unifies the detection of multiple fine-grained anomalous objects, achieving state-of-the-art performance.
An initial exploration into embedding customizable, imperceptible, and recoverable information within the renders produced by off-the-shelf 3D generative models, with minimal impact on the quality of the rendered content.
A pioneering effort to embed, relate, and perceive the heterogeneous patterns from various biomedical modalities holistically via graph theory.
A novel approach that leverages the ambiguity and uncertainty in object boundaries to improve segmentation performance, turning traditional segmentation "flaws" into advantages.