SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent Advances in Neural Information Processing Systems (NeurIPS) 2025 (RoboGen@IROS 2025 Best Paper Award) (* indicates equal contribution. # indicates corresponding author.)
Learning Unified Force and Position Control for Legged Loco-Manipulation Conference on Robot Learning (CoRL) 2025 (Best Paper Award) (* indicates equal contribution. # indicates corresponding author.)
GWM: Toward Scalable Gaussian World Models for Robotic Manipulation International Conference on Computer Vision (ICCV) 2025 (* indicates equal contribution. # indicates corresponding author.)
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu, Xilin Wang, Yixuan Li, Zhuofan Zhang, Xiaojian Ma, Yixin Chen, Baoxiong Jia, Wei Liang, Qian Yu, Zhidong Deng, Siyuan Huang, Qing Li.
International Conference on Computer Vision (ICCV) 2025 OpenSUN3D @ ECCV 2024
MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu*, Baoxiong Jia*, Yixin Chen*, Yandan Yang, Puhao Li, Rongpeng Su, Jiaxin Li, Qing Li, Wei Liang, Song-Chun Zhu, Tengyu Liu, Siyuan Huang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng*, Feishi Wang*, Songlin Wei*, Yuyang Li*, Bangjun Wang*, Boshi An*, Charlie Tianyue Cheng*, Haozhe Lou, Peihao Li, Yen-Jen Wang, Yutong Liang, Dylan Goetting, Chaoyi Xu, Haozhe Chen, Yuxi Qian, Yiran Geng, Jiageng Mao, Weikang Wan, Mingtong Zhang, Jiangran Lyu, Siheng Zhao, Jiazhao Zhang, Jialiang Zhang, Chengyang Zhao, Haoran Lu, Yufei Ding, Ran Gong, Yuran Wang, Yuxuan Kuang, Ruihai Wu, Baoxiong Jia, Carlo Sferrazza, Hao Dong, Siyuan Huang#, Yue Wang#, Jitendra Malik#, Pieter Abbeel#.
Robotics: Science and Systems (RSS) 2025 (RoboGen@IROS 2025 Best Open-source Award) (* indicates equal contribution.)
MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting International Conference on Learning Representations (ICLR) 2025 (* indicates equal contribution.)
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V International Conference on Robotics and Automation (ICRA) 2025 (* indicates equal contribution. # indicates corresponding author.)
PhysPart: Physically Plausible Part Completion for Interactable Objects International Conference on Robotics and Automation (ICRA) 2025 (* indicates equal contribution.)
MSR3D: Multi-modal Situated Reasoning in 3D Scenes Advances in Neural Information Processing Systems (NeurIPS) 2024 (* indicates equal contribution. # indicates corresponding author.)
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding European Conference on Computer Vision (ECCV) 2024 OpenSUN3D @ ECCV 2024 (* indicates equal contribution)
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields European Conference on Computer Vision (ECCV) 2024 Wild3D @ ECCV 2024 (* indicates equal contribution.)
Unifying 3D Vision-Language Understanding via Promptable Queries European Conference on Computer Vision (ECCV) 2024 OpenSUN3D @ ECCV 2024
An Embodied Generalist Agent in 3D World International Conference on Machine Learning (ICML) 2024 GenAI4DM & AGI @ ICLR 2024 (* indicates equal contribution.)
Human-level Few-shot Concept Induction through Minimax Entropy Learning Science Advances (SciAdv) 2024
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight) AI3DG @ CVPR 2024 (* indicates equal contribution.)
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight) HuMoGen @ CVPR 2024
ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab Advances in Neural Information Processing Systems (NeurIPS) 2023 (* indicates equal contribution.)
X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events International Conference on Computer Vision (ICCV) 2023 (Oral)
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes
Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia#, Siyuan Huang#.
International Conference on Computer Vision (ICCV) 2023 LangRob @ CoRL 2022 (* indicates equal contribution. # indicates corresponding author.)
Learning a Causal Transition Model for Object Cutting International Conference on Intelligent Robots and Systems (IROS) 2023 (* indicates equal contribution.)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes Conference on Computer Vision and Pattern Recognition (CVPR) 2023 (* indicates equal contribution.)
Improving Unsupervised Object-centric Learning with Query Optimization International Conference on Learning Representations (ICLR) 2023 (* indicates equal contribution.)
EgoTaskQA: Understanding Human Tasks in Egocentric Videos Advances in Neural Information Processing Systems (NeurIPS) 2022
Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes European Conference on Computer Vision (ECCV) 2022 (* indicates equal contribution.)
Latent Diffusion Energy-Based Model for Interpretable Text Modeling International Conference on Machine Learning (ICML) 2022
ACRE: Abstract Causal REasoning Beyond Covariation IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021 (* indicates equal contribution.)
Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021 (* indicates equal contribution.)
LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities European Conference on Computer Vision (ECCV) 2020
A Generalized Earley Parser for Human Activity Parsing and Prediction Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020
Learning Perceptual Inference by Contrasting Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight) (* indicates equal contribution.)
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019 (* indicates equal contribution.)
Learning Human-Object Interactions by Graph Parsing Neural Networks European Conference on Computer Vision (ECCV) 2018 (* indicates equal contribution.)
Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction International Conference on Machine Learning (ICML) 2018
Mining User Reviews for Mobile App Comparison ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) 2017