Hi!

Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and B.S. from Peking University in 2018.

Info: Email / Google Scholar / CV /

News

  •   New        SceneWeaver receives the Best Paper Award at RoboGen@IROS25. Check out the slides and talk (EN)!
  •   New        We won first place at the IROS 25 UniTree Dancing Challenge!
  •   New        RoboVerse receives the Best Open-source Award at RoboGen@IROS25!
  •   New        Invited talk at HKU and 3DCVer on UniFP and COLA. Check out the slides and talk (CN)!
  •   New        UniFP receives the Best Paper Award at CoRL 2025! Oral talk available here!
  • 2025/09    One paper on Agentic 3D Scene Generation is accepted by NeurIPS 2025.
  • 2025/08    We won the humanoid dancing championship at the World Humanoid Robot Games (WHRG)!
  • 2025/06    One paper on Unified Force and Position Control is accepted by CoRL 2025 as an Oral!
  • 2025/06    Two papers on 4D World Model and Embodied Vision Language are accepted by ICCV 2025!
  • 2025/06    I'm co-organizing the 5th 3D Scene Understanding workshop at CVPR 2025. See you in Nashville!
  • 2025/04    RoboVerse is accepted by RSS 2025! Go check it out here!
  • 2025/03    I recently gave a summary of our work at Boston Dynamics. Check out the slides!
  • 2025/02    Four papers on 3D Scene Understanding and Reconstruction are accepted by CVPR 2025!
  • 2025/01    Two papers on Mobile Manipulation and Articulated Part Generation are accepted by ICRA 2025!
  • 2025/01    One paper on Articulated Object Reconstruction is accepted by ICLR 2025!
  • 2024/12    One paper on Multi-modal 3D Situated Reasoning is accepted by NeurIPS 2024!
  • 2024/10    I recently gave a summary of our work at BIGAI at ChinaGraph 2024. Check out the slides!

Selected Recent Publications (All publications)

    SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent

    Yandan Yang* , Baoxiong Jia*# , , Siyuan Huang# .
    Advances in Neural Information Processing Systems (NeurIPS) 2025 (RoboGen@IROS 2025 Best Paper Award)
    (* indicates equal contribution. # indicates corresponding author.)

    Learning Unified Force and Position Control for Legged Loco-Manipulation

    Conference on Robot Learning (CoRL) 2025 (Best Paper Award)
    (* indicates equal contribution. # indicates corresponding author.)

    GWM: Toward Scalable Gaussian World Models for Robotic Manipulation

    International Conference on Computer Vision (ICCV) 2025
    (* indicates equal contribution. # indicates corresponding author.)

    MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
    (* indicates equal contribution.)

    Unveiling the Mist over 3D Vision-Language Understanding: Object-centric Evaluation with Chain-of-Analysis

    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
    (* indicates equal contribution.)

    Masked Point-Entity Contrast for Open-Vocabulary 3D Scene Understanding

    Yan Wang* , Baoxiong Jia* , Ziyu Zhu , Siyuan Huang .
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2025
    (* indicates equal contribution.)

    RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

    Robotics Science and Systems (RSS) 2025 (RoboGen@IROS 2025 Best Open-source Award)
    (* indicates equal contribution.)

    Building Interactable Replicas of Complex Articulated Objects via Gaussian Splatting

    International Conference on Learning Representations (ICLR) 2025
    (* indicates equal contribution.)

    Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V

    Peiyuan Zhi* , Zhiyuan Zhang* , , Muzhi Han , Zeyu Zhang , , Ziyuan Jiao , Baoxiong Jia# , Siyuan Huang# .
    International Conference on Robotics and Automation (ICRA) 2025
    (* indicates equal contribution. # indicates corresponding author.)

    MSR3D: Multi-modal Situated Reasoning in 3D Scenes

    Advances in Neural Information Processing Systems (NeurIPS) 2024
    (* indicates equal contribution. # indicates corresponding author.)

    SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

    European Conference on Computer Vision (ECCV) 2024
    OpenSUN3D @ ECCV 2024 (* indicates equal contribution.)

    SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

    Yu Liu* , Baoxiong Jia* , Yixin Chen , Siyuan Huang .
    European Conference on Computer Vision (ECCV) 2024
    Wild3D @ ECCV 2024 (* indicates equal contribution.)

    An Embodied Generalist Agent in 3D World

    International Conference on Machine Learning (ICML) 2024
    GenAI4DM & AGI @ ICLR 2024 (* indicates equal contribution.)

    PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

    Yandan Yang* , Baoxiong Jia* , , Siyuan Huang .
    Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
    AI3DG @ CVPR 2024 (* indicates equal contribution.)

    Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

    Zan Wang , Yixin Chen , Baoxiong Jia , Puhao Li , Jinlu Zhang , , Tengyu Liu , Yixin Zhu , Wei Liang , Siyuan Huang .
    Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
    HuMoGen @ CVPR 2024


    Last modified by Baoxiong Jia in October 2025.