Hi! I’m Baoxiong, a Ph.D. candidate in the Department of Computer Science, University of California, Los Angeles. I’m currently working at VCLA@UCLA, advised by Prof. Song-Chun Zhu. Before coming to LA, I got my B.S. in Computer Science from EECS, Peking University in 2018 after spending two wonderful years in the PKU Operating System Lab advised by Prof. Yao Guo. Prior to my Ph.D. study, I joined the PKU-UCLA Joint Research Institute (PKU-UCLA JRI) 3+2 Program and obtained my M.S. in Computer Science at UCLA in 2019 under the supervision of Prof. Song-Chun Zhu. My research interest lies at the intersection of computer vision, artificial intelligence, and cognitive science, with a special focus on spatial-temporal learning/reasoning and its application to understanding and planning tasks in both the real world (scene/activity understanding, future prediction, etc.) and abstract domains (Atari, RAVEN, etc.).

Info: Email / Google Scholar / CV


  •   New!        One paper on Diffusion Models for 3D is accepted by CVPR 2023.
  •   New!        One paper on unsupervised object-centric learning is accepted by ICLR 2023.
  • 12/2022    One paper on language grounding for robots is accepted as a spotlight by LangRob@CoRL 2022.
  • 10/2022    EgoTaskQA project page released, check out our code and data!
  • 09/2022    One paper on egocentric goal-oriented reasoning is accepted by NeurIPS 2022.
  • 05/2022    One paper on systematic generalization in RAVEN test is accepted by ECCV 2022.
  • 04/2022    One paper on latent diffusion energy-based model is accepted by ICML 2022.
  • 09/2021    My internship on grounded spatial-temporal reasoning for videoQA at Amazon Alexa finished!
  • 02/2021    Two papers on abstract reasoning are accepted by CVPR 2021.
  • 05/2020    LEMMA project page released, check out our code and data!
  • 12/2020    I advanced to candidacy!
  • 07/2020    One paper on multi-agent multi-task activities is accepted by ECCV 2020.
  • 02/2020    One paper on video parsing and prediction is accepted by TPAMI 2020.
Selected Publications (All publications)

    Diffusion-based Generation, Optimization, and Planning in 3D Scenes

    Conference on Computer Vision and Pattern Recognition (CVPR) 2023
    (* indicates equal contribution.)

    Improving Unsupervised Object-centric Learning with Query Optimization

    Baoxiong Jia*, Yu Liu*, Siyuan Huang.
    International Conference on Learning Representations (ICLR) 2023
    (* indicates equal contribution.)

    EgoTaskQA: Understanding Human Tasks in Egocentric Videos

    Baoxiong Jia, , Song-Chun Zhu, Siyuan Huang.
    Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)

    Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes

    European Conference on Computer Vision (ECCV) 2022
    (* indicates equal contribution.)

    Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

    Chi Zhang*, Baoxiong Jia*, Song-Chun Zhu, Yixin Zhu.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
    (* indicates equal contribution.)

    LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

    European Conference on Computer Vision (ECCV) 2020

    A Generalized Earley Parser for Human Activity Parsing and Prediction

    Siyuan Qi, Baoxiong Jia, Siyuan Huang, Ping Wei, Song-Chun Zhu.
    Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

    Learning Perceptual Inference by Contrasting

    Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)
    (* indicates equal contribution.)

    Baoxiong Jia © 2022. All rights reserved.