Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and my B.S. from Peking University in 2018.

Info: Email / Google Scholar / CV

News

  •   New        I will be attending ECCV 2024. See you in Milan!
  •   New        I recently gave a talk on Embodied 3D Vision on ZhiDX. Check out the slides!
  • 2024/07    SceneVerse is accepted by ECCV 2024. Stay tuned for the full data and model release at this link!
  • 2024/07    Three papers on 3D-VL and Object-centric Learning are accepted by ECCV 2024.
  • 2024/06    Our embodied generalist LEO is accepted by ICML 2024. Check out our code and data at this link.
  • 2024/06    I'm co-organizing the MANGO workshop at CVPR 2024. See you in Seattle!
  • 2024/06    SceneVerse data is released! Find the download link and instructions at this link.
  • 2024/06    Two papers on 3D motion and scene generation are accepted by CVPR 2024 as Highlights.
  • 2024/03    LEO code and data are released! Find the download link and instructions at this link.
  • 2024/02    Announcing SceneVerse for 3D-VL learning. Check out the project page.
  • 2023/12    One paper on procedural understanding in videos is accepted by NeurIPS 2023.
  • 2023/10    Two papers are accepted by ICCV 2023. Congrats to the authors!
  • 2023/10    One paper on temporal and causal transition of objects is accepted by IROS 2023.
  • 2023/06    One paper on diffusion models for 3D is accepted by CVPR 2023.
  • 2023/05    One paper on unsupervised object-centric learning is accepted by ICLR 2023.

Recent Selected Publications (All publications)

    SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

    European Conference on Computer Vision (ECCV) 2024

    SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

    Yu Liu*, Baoxiong Jia*, Yixin Chen, Siyuan Huang.
    European Conference on Computer Vision (ECCV) 2024
    (* indicates equal contribution.)

    An Embodied Generalist Agent in 3D World

    International Conference on Machine Learning (ICML) 2024
    GenAI4DM & AGI @ ICLR 2024
    (* indicates equal contribution.)

    Human-level Few-shot Concept Induction through Minimax Entropy Learning

    Chi Zhang, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu.
    Science Advances (SciAdv) 2024

    PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

    , Baoxiong Jia*, , Siyuan Huang.
    Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
    (* indicates equal contribution.)

    Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

    Zan Wang, Yixin Chen, Baoxiong Jia, Puhao Li, Jinlu Zhang, , Tengyu Liu, Yixin Zhu, Wei Liang, Siyuan Huang.
    Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)

    ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes

    Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, , , , Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang.
    International Conference on Computer Vision (ICCV) 2023
    LangRob @ CoRL 2022
    (* indicates equal contribution.)

    Diffusion-based Generation, Optimization, and Planning in 3D Scenes

    Conference on Computer Vision and Pattern Recognition (CVPR) 2023
    (* indicates equal contribution.)

    Improving Unsupervised Object-centric Learning with Query Optimization

    Baoxiong Jia*, Yu Liu*, Siyuan Huang.
    International Conference on Learning Representations (ICLR) 2023
    (* indicates equal contribution.)

    EgoTaskQA: Understanding Human Tasks in Egocentric Videos

    Baoxiong Jia, , Song-Chun Zhu, Siyuan Huang.
    Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)


    Baoxiong Jia © 2024. All rights reserved.