Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and my B.S. from Peking University in 2018.
Info: Email / Google Scholar / CV
News
New I will be attending ECCV 2024 this year, see you in Milan!
New I recently gave a talk on Embodied 3D Vision on ZhiDX. Check out the slides!
2024/07 SceneVerse is accepted by ECCV 2024. Stay tuned for the full data and model release at this link!
2024/07 Three papers on 3D-VL and Object-centric Learning are accepted by ECCV 2024.
2024/06 Our embodied generalist LEO is accepted by ICML 2024. Check out our code and data at this link.
2024/06 I'm co-organizing the MANGO workshop at CVPR 2024. See you in Seattle!
2024/06 SceneVerse data is released! Find the download link and instructions at this link.
2024/06 Two papers on 3D motion and scene generation are accepted by CVPR 2024 as Highlight.
2024/03 LEO code and data are released! Find the download link and instructions at this link.
2024/02 Announcing SceneVerse for 3D-VL learning. Check out the project page.
2023/12 One paper on procedural understanding in videos is accepted by NeurIPS 2023.
2023/10 Two papers are accepted by ICCV 2023, congrats to the authors!
2023/10 One paper on temporal and causal transition of objects is accepted by IROS 2023.
2023/06 One paper on diffusion models for 3D is accepted by CVPR 2023.
2023/05 One paper on unsupervised object-centric learning is accepted by ICLR 2023.

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
European Conference on Computer Vision (ECCV) 2024

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields
European Conference on Computer Vision (ECCV) 2024
(* indicates equal contribution.)
An Embodied Generalist Agent in 3D World
International Conference on Machine Learning (ICML) 2024
GenAI4DM & AGI @ ICLR 2024
(* indicates equal contribution.)
Human-level Few-shot Concept Induction through Minimax Entropy Learning
Science Advances (SciAdv) 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI
Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
(* indicates equal contribution.)
Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance
Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)

ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes
Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang.
International Conference on Computer Vision (ICCV) 2023
LangRob @ CoRL 2022
(* indicates equal contribution.)
Diffusion-based Generation, Optimization, and Planning in 3D Scenes
Conference on Computer Vision and Pattern Recognition (CVPR) 2023
(* indicates equal contribution.)
Improving Unsupervised Object-centric Learning with Query Optimization
International Conference on Learning Representations (ICLR) 2023
(* indicates equal contribution.)
EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)