Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and my B.S. from Peking University in 2018.
Info: Email / Google Scholar / CV
News
New I'm co-organizing the 5th 3D Scene Understanding workshop at CVPR 2025. See you in Nashville!
New I recently gave a summary of our work at Boston Dynamics. Check out the slides!
New RoboVerse is accepted by RSS 2025! Go check it out here!
New Four papers on 3D Scene Understanding and Reconstruction are accepted by CVPR 2025!
New Two papers on Mobile Manipulation and Articulated Part Generation are accepted by ICRA 2025!
2025/01 One paper on Articulated Object Reconstruction is accepted by ICLR 2025!
2024/12 One paper on Multi-modal 3D Situated Reasoning is accepted by NeurIPS 2024!
2024/10 I recently gave a summary of our work at BIGAI at ChinaGraph 2024. Check out the slides!
2024/10 I will be attending ECCV 2024. See you in Milan!
2024/07 I recently gave a talk on Embodied 3D Vision on ZhiDX. Check out the slides!
2024/07 SceneVerse is accepted by ECCV 2024. Stay tuned for the full data and model release at this link!
2024/07 Three papers on 3D-VL and Object-centric Learning are accepted by ECCV 2024.
2024/06 Our embodied generalist LEO is accepted by ICML 2024. Check out our code and data at this link.
2024/06 I'm co-organizing the MANGO workshop at CVPR 2024. See you in Seattle!
2024/06 SceneVerse data is released! Find the download link and instructions at this link.
2024/06 Two papers on 3D motion and scene generation are accepted by CVPR 2024 as Highlights.
2024/03 LEO code and data are released! Find the download link and instructions at this link.
2024/02 Announcing SceneVerse for 3D-VL learning. Check out the project page.

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans
Huangyue Yu* ,
Baoxiong Jia* ,
Yixin Chen* ,
Yandan Yang ,
Puhao Li ,
Rongpeng Su ,
Jiaxin Li ,
Qing Li ,
Wei Liang ,
Song-Chun Zhu ,
Tengyu Liu ,
Siyuan Huang .
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025 (* indicates equal contribution.)
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning Haoran Geng* ,
Feishi Wang* ,
Songlin Wei* ,
Yuyang Li* ,
Bangjun Wang* ,
Boshi An* ,
Charlie Tianyue Cheng* ,
Haozhe Lou ,
Peihao Li ,
Yen-Jen Wang ,
Yutong Liang ,
Dylan Goetting ,
Chaoyi Xu ,
Haozhe Chen ,
Yuxi Qian ,
Yiran Geng ,
Jiageng Mao ,
Weikang Wan ,
Mingtong Zhang ,
Jiangran Lyu ,
Siheng Zhao ,
Jiazhao Zhang ,
Jialiang Zhang ,
Chengyang Zhao ,
Haoran Lu ,
Yufei Ding ,
Ran Gong ,
Yuran Wang ,
Yuxuan Kuang ,
Ruihai Wu ,
Baoxiong Jia ,
Carlo Sferrazza ,
Hao Dong ,
Siyuan Huang# ,
Yue Wang# ,
Jitendra Malik# ,
Pieter Abbeel# .
Robotics: Science and Systems (RSS) 2025 (* indicates equal contribution.)
Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting International Conference on Learning Representations (ICLR) 2025 (* indicates equal contribution.)
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V International Conference on Robotics and Automation (ICRA) 2025 (* indicates equal contribution. # indicates corresponding author.)
MSR3D: Multi-modal Situated Reasoning in 3D Scenes Advances in Neural Information Processing Systems (NeurIPS) 2024 (* indicates equal contribution. # indicates corresponding author.)
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding European Conference on Computer Vision (ECCV) 2024 OpenSUN3D @ ECCV 2024 (* indicates equal contribution.)
SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields European Conference on Computer Vision (ECCV) 2024 Wild3D @ ECCV 2024 (* indicates equal contribution.)
An Embodied Generalist Agent in 3D World International Conference on Machine Learning (ICML) 2024 GenAI4DM & AGI @ ICLR 2024 (* indicates equal contribution.)
PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight) AI3DG @ CVPR 2024 (* indicates equal contribution.)
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight) HuMoGen @ CVPR 2024
ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes Ran Gong* ,
Jiangyong Huang* ,
Yizhou Zhao ,
Haoran Geng ,
Xiaofeng Gao ,
Qingyang Wu ,
Wensi Ai ,
Ziheng Zhou ,
Demetri Terzopoulos ,
Song-Chun Zhu ,
Baoxiong Jia# ,
Siyuan Huang# .
International Conference on Computer Vision (ICCV) 2023 LangRob @ CoRL 2022 (* indicates equal contribution. # indicates corresponding author.)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes Conference on Computer Vision and Pattern Recognition (CVPR) 2023 (* indicates equal contribution.)
Improving Unsupervised Object-centric Learning with Query Optimization International Conference on Learning Representations (ICLR) 2023 (* indicates equal contribution.)