Hi! I’m Baoxiong, a Ph.D. candidate in the Department of Computer Science, University of California, Los Angeles. I’m currently working at VCLA@UCLA, advised by Prof. Song-chun Zhu. Before coming to LA, I got my B.S. in Computer Science from EECS, Peking University in 2018, after spending two wonderful years in the PKU Operating System Lab advised by Prof. Yao Guo. Prior to my Ph.D. studies, I joined the PKU-UCLA Joint Research Institute (PKU-UCLA JRI) 3+2 Program and obtained my M.S. in Computer Science at UCLA in 2019 under the supervision of Prof. Song-chun Zhu. My research interests lie at the intersection of computer vision, artificial intelligence and cognitive science, with a special focus on spatial-temporal learning/reasoning and its application to understanding and planning tasks in both the real world (scene/activity understanding, future prediction, etc.) and abstracted domains (Atari, RAVEN, etc.).

Info: Email / Google Scholar / CV


  • 09/2021    I finished my internship at Amazon Alexa on grounded spatial-temporal reasoning for VideoQA!
  • 02/2021    Two papers on abstract reasoning accepted by CVPR 2021.
  • 05/2020    LEMMA project page released, check out our code and data!
  • 12/2020    I advanced to candidacy!
  • 07/2020    One paper on multi-agent multi-task activities accepted by ECCV 2020.
  • 02/2020    One paper on video parsing and prediction accepted by TPAMI 2020.
  • 09/2019    One paper on relational reasoning accepted as a spotlight at NeurIPS 2019.
  • 02/2019    One paper on IQ accepted by CVPR 2019.
  • 07/2018    One paper on human object interaction accepted by ECCV 2018.
  • 05/2018    One paper on human intention prediction accepted by ICML 2018.
  • 09/2017    I started my journey at UCLA.
Selected Publications (All publications)

    Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

    Chi Zhang*, Baoxiong Jia*, Song-chun Zhu, Yixin Zhu.
    IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
    (* indicates equal contribution.)

    LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

    European Conference on Computer Vision (ECCV) 2020

    A Generalized Earley Parser for Human Activity Parsing and Prediction

    Siyuan Qi, Baoxiong Jia, Siyuan Huang, Ping Wei, Song-chun Zhu.
    IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

    Learning Perceptual Inference by Contrasting

    Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)
    (* indicates equal contribution.)

    Baoxiong Jia © 2021. All rights reserved.