Hi!

I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, robotics, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). My recent work focuses on integrating all of my previous research into humanoid robots and making them helpful when I’m old :-)

Previously, I obtained my M.S. from UCLA in 2019 and B.S. from Peking University in 2018.

Info: Email / Google Scholar / CV

News

  • New Our paper COLA on Human-Humanoid Collaboration is accepted by ICRA 2026, check it out!
  • New Our paper SceneCOT on CoT-Reasoning in 3D scenes is accepted by ICLR 2026, check it out!
  • New Invited tutorial at EIS 2025 hosted by ACM SIGEMBED China, check out the slides!
  • New Invited talk at EAIRCon 2025 on 3D Gaussian World Models, check out the slides!
  • 2025/10 SceneWeaver receives the Best Paper Award at RoboGen@IROS25, check out the slides and talk (EN)!
  • 2025/10 We won first place at the IROS 25 UniTree Dancing Challenge!
  • 2025/10 RoboVerse receives the Best Open-source Award at RoboGen@IROS25!
  • 2025/10 Invited talk at HKU and 3DCVer on UniFP and COLA, check out the slides and talk (CN)!
  • 2025/09 UniFP receives the Best Paper Award at CoRL 2025! Oral talk available here!
  • 2025/09 One paper on Agentic 3D Scene Generation is accepted by NeurIPS 2025.
  • 2025/08 We won the humanoid dancing championship at the World Humanoid Robot Games (WHRG)!
  • 2025/06 One paper on Unified Force and Position Control is accepted by CoRL 2025 as Oral!
  • 2025/06 Two papers on 4D World Model and Embodied Vision Language are accepted by ICCV 2025!
  • 2025/06 I’m co-organizing the 5th 3D Scene Understanding workshop at CVPR 2025. See you in Nashville!
  • 2025/04 RoboVerse is accepted by RSS 2025! Go check it out here!
  • 2025/03 I recently gave a summary of our work at Boston Dynamics. Check out the slides!
  • 2025/02 Four papers on 3D Scene Understanding and Reconstruction are accepted by CVPR 2025!
  • 2025/01 Two papers on Mobile Manipulation and Articulated Part Generation are accepted by ICRA 2025!
  • 2025/01 One paper on Articulated Object Reconstruction is accepted by ICLR 2025!

Selected Recent Publications (All publications)

Learning Human-Humanoid Coordination for Collaborative Object Carrying

International Conference on Robotics and Automation (ICRA) 2026
(★ indicates equal contribution. ✉ indicates corresponding author.)

SceneCOT: Eliciting Grounded Chain-of-Thought Reasoning in 3D Scenes

International Conference on Learning Representations (ICLR) 2026
(✉ indicates corresponding author.)

SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

Yandan Yang , Baoxiong Jia★✉ , , Siyuan Huang .
Advances in Neural Information Processing Systems (NeurIPS) 2025 ( RoboGen@IROS 2025 Best Paper Award )
(★ indicates equal contribution. ✉ indicates corresponding author.)

Learning Unified Force and Position Control for Legged Loco-Manipulation

Conference on Robot Learning (CoRL) 2025 ( Best Paper Award )
(★ indicates equal contribution. ✉ indicates corresponding author.)

GWM: Toward Scalable Gaussian World Models for Robotic Manipulation

International Conference on Computer Vision (ICCV) 2025
(★ indicates equal contribution. ✉ indicates corresponding author.)

MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(★ indicates equal contribution.)

Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(★ indicates equal contribution.)

Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

Yan Wang, Baoxiong Jia, Ziyu Zhu, Siyuan Huang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
(★ indicates equal contribution.)

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

Robotics: Science and Systems (RSS) 2025 ( RoboGen@IROS 2025 Best Open-source Award )
(★ indicates equal contribution. ✉ indicates corresponding author.)

Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

International Conference on Learning Representations (ICLR) 2025
(★ indicates equal contribution.)

Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

Peiyuan Zhi , Zhiyuan Zhang , , Muzhi Han , Zeyu Zhang , , Ziyuan Jiao , Baoxiong Jia , Siyuan Huang .
International Conference on Robotics and Automation (ICRA) 2025
(★ indicates equal contribution. ✉ indicates corresponding author.)

MSR3D: Multi-modal Situated Reasoning in 3D Scenes

Advances in Neural Information Processing Systems (NeurIPS) 2024
(★ indicates equal contribution. ✉ indicates corresponding author.)

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

European Conference on Computer Vision (ECCV) 2024
OpenSUN3D @ ECCV 2024 (★ indicates equal contribution.)

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

Yu Liu, Baoxiong Jia, Yixin Chen, Siyuan Huang.
European Conference on Computer Vision (ECCV) 2024
Wild3D @ ECCV 2024 (★ indicates equal contribution.)

An Embodied Generalist Agent in 3D World

International Conference on Machine Learning (ICML) 2024
GenAI4DM & AGI @ ICLR 2024 (★ indicates equal contribution.)

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

Yandan Yang, Baoxiong Jia, Peiyuan Zhi, Siyuan Huang.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024 ( Highlight )
AI3DG @ CVPR 2024 (★ indicates equal contribution.)

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang , Yixin Chen , Baoxiong Jia , Puhao Li , Jinlu Zhang , , Tengyu Liu , Yixin Zhu , , Siyuan Huang .
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2024 ( Highlight )
HuMoGen @ CVPR 2024


Last modified by Baoxiong Jia in October 2025.