Hi! I’m Baoxiong, a research scientist at BIGAI. I received my Ph.D. from the Department of Computer Science, University of California, Los Angeles. My research interests lie at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial/temporal reasoning and its applications to acting and planning in the real world (scene/activity understanding, future prediction, grounded manipulation, etc.). Previously, I obtained my M.S. from UCLA in 2019 and my B.S. from Peking University in 2018.

Info: Email / Google Scholar / CV


  •   New        One paper on procedural understanding in videos is accepted by NeurIPS 2023.
  • 10/2023    Two papers accepted by ICCV 2023, congrats to the authors!
  • 10/2023    One paper on temporal and causal transition of objects is accepted by IROS 2023.
  • 06/2023    One paper on diffusion models for 3D is accepted by CVPR 2023.
  • 05/2023    One paper on unsupervised object-centric learning is accepted by ICLR 2023.
  • 12/2022    One paper on language grounding for robots is accepted as spotlight by LangRob@CoRL 2022.
  • 10/2022    EgoTaskQA project page released, check out our code and data!
  • 09/2022    One paper on egocentric goal-oriented reasoning is accepted by NeurIPS 2022.
  • 05/2022    One paper on systematic generalization in RAVEN test is accepted by ECCV 2022.
  • 04/2022    One paper on latent diffusion energy-based model is accepted by ICML 2022.
  • 09/2021    I finished my internship on grounded spatial-temporal reasoning for videoQA at Amazon Alexa!
  • 02/2021    Two papers on abstract reasoning are accepted by CVPR 2021.
  • 05/2020    LEMMA project page released, check out our code and data!
  • 12/2020    I advanced to candidacy!
  • 07/2020    One paper on multi-agent multi-task activities is accepted by ECCV 2020.
  • 02/2020    One paper on video parsing and prediction is accepted by TPAMI 2020.
Selected Publications (All publications)

    ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

    Jieming Cui*, , Baoxiong Jia*, Siyuan Huang, Zilong Zheng, Jianzhu Ma, Yixin Zhu.
    Advances in Neural Information Processing Systems (NeurIPS) 2023 (Track on Datasets and Benchmarks)

    ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes

    Ran Gong*, Jiangyong Huang*, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, , , , Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang.
    International Conference on Computer Vision (ICCV) 2023
    LangRob@CoRL 2022
    (* indicates equal contribution.)

    Diffusion-based Generation, Optimization, and Planning in 3D Scenes

    Conference on Computer Vision and Pattern Recognition (CVPR) 2023
    (* indicates equal contribution.)

    Improving Unsupervised Object-centric Learning with Query Optimization

    Baoxiong Jia*, Yu Liu*, Siyuan Huang.
    International Conference on Learning Representations (ICLR) 2023
    (* indicates equal contribution.)

    EgoTaskQA: Understanding Human Tasks in Egocentric Videos

    Baoxiong Jia, , Song-Chun Zhu, Siyuan Huang.
    Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)

    Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes

    European Conference on Computer Vision (ECCV) 2022
    (* indicates equal contribution.)

    Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

    Chi Zhang*, Baoxiong Jia*, Song-Chun Zhu, Yixin Zhu.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
    (* indicates equal contribution.)

    LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

    European Conference on Computer Vision (ECCV) 2020

    A Generalized Earley Parser for Human Activity Parsing and Prediction

    Siyuan Qi, Baoxiong Jia, Siyuan Huang, Ping Wei, Song-Chun Zhu.
    Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

    Learning Perceptual Inference by Contrasting

    Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)
    (* indicates equal contribution.)

    Baoxiong Jia © 2022. All rights reserved.