Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science at the University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its application to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and my B.S. from Peking University in 2018.
Info: Email / Google Scholar / CV
News
New: One paper on procedural understanding in videos is accepted by NeurIPS 2023.
2023/10 Two papers are accepted by ICCV 2023, congrats to the authors!
2023/10 One paper on temporal and causal transition of objects is accepted by IROS 2023.
06/2023 One paper on diffusion models for 3D is accepted by CVPR 2023.
05/2023 One paper on unsupervised object-centric learning is accepted by ICLR 2023.
12/2022 One paper on language grounding for robots is accepted as a spotlight by LangRob@CoRL 2022.
10/2022 EgoTaskQA project page released, check out our code and data!
09/2022 One paper on egocentric goal-oriented reasoning is accepted by NeurIPS 2022.
05/2022 One paper on systematic generalization in RAVEN test is accepted by ECCV 2022.
04/2022 One paper on latent diffusion energy-based model is accepted by ICML 2022.
09/2021 My internship on grounded spatial-temporal reasoning for videoQA at Amazon Alexa finished!
02/2021 Two papers on abstract reasoning are accepted by CVPR 2021.
05/2020 LEMMA project page released, check out our code and data!
12/2020 I advanced to candidacy!
07/2020 One paper on multi-agent multi-task activities is accepted by ECCV 2020.
02/2020 One paper on video parsing and prediction is accepted by TPAMI 2020.

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab
Advances in Neural Information Processing Systems (NeurIPS) 2023 (Track on Datasets and Benchmarks)

ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes
Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang.
International Conference on Computer Vision (ICCV) 2023
LangRob@CoRL 2022
(* indicates equal contribution.)

Diffusion-based Generation, Optimization, and Planning in 3D Scenes
Conference on Computer Vision and Pattern Recognition (CVPR) 2023

Improving Unsupervised Object-centric Learning with Query Optimization
International Conference on Learning Representations (ICLR) 2023

EgoTaskQA: Understanding Human Tasks in Egocentric Videos
Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)

Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes
European Conference on Computer Vision (ECCV) 2022

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities
European Conference on Computer Vision (ECCV) 2020

A Generalized Earley Parser for Human Activity Parsing and Prediction
Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

Learning Perceptual Inference by Contrasting
Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)