My research focuses on building multimodal LLM-backed intelligent agents that achieve understanding-generation synergy. The supporting foundations include multimodal perception and reasoning as the perceptual and understanding foundation; and multimodal generation and reconstruction as the content manipulation engine, thereby achieving a complete closed-loop from content perception to intelligent decision-making.
[Pinned] Looking for industry position and internship opportunities.
I warmly welcome discussions on research collaborations, as well as any profound or interesting insights. Feel free to reach out!
A granular video anomaly detection framework that integrates the detection of multiple fine-grained anomalous objects into a unified framework, achieving state-of-the-art performance.
An initial exploration into embedding customizable, imperceptible, and recoverable information within the renders produced by off-the-line 3D generative models, while ensuring minimal impact on the rendered content's quality.
A pioneering foray into the intriguing realm of embedding, relating and perceiving the heterogeneous patterns from various biomedical modalities holistically via a graph theory.
A novel approach that leverages the ambiguity and uncertainty in object boundaries to improve segmentation performance, turning traditional segmentation "flaws" into advantages.
Co-founder & TechLead, ScholaGO Education Technology Company Limited
I co-founded ScholaGO Education Technology Company Limited (学旅通教育科技有限公司) to develop innovative LLM-backed educational products that transform static knowledge into immersive, interactive, multimodal experiences. We secured funding from HKSTP, HK Tech 300 government entrepreneurship funds, and Alibaba Cloud, aiming to create impactful technologies that enhance education and societal well-being.
🎨 Personal Interests
📚 Reading: I dedicate substantial time each week to reading and deep contemplation. I have a particular passion for exploring history, philosophy, and sociology, which I believe has enriched my cognitive perspectives significantly.
📈 Investment:
From a reinforcement learning perspective, no things can provide more verifiable rewards than precise return figures, which enables me to continuously refine my decision-making policy.
From a synergistic perspective of perception and generation, I guess training investment decision-making abilities (perception tasks) may benefit future entrepreneurial ventures (generation tasks)? Maybe I can answer it in the future.