Publications

MSR3D: Multi-modal Situated Reasoning in 3D Scenes

Advances in Neural Information Processing Systems (NeurIPS) 2024
(* indicates equal contribution. # indicates corresponding author.)

SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding

European Conference on Computer Vision (ECCV) 2024
OpenSUN3D @ ECCV 2024 (* indicates equal contribution.)

SlotLifter: Slot-guided Feature Lifting for Learning Object-centric Radiance Fields

Yu Liu*, Baoxiong Jia*, Yixin Chen, Siyuan Huang.
European Conference on Computer Vision (ECCV) 2024
Wild3D @ ECCV 2024 (* indicates equal contribution.)

Unifying 3D Vision-Language Understanding via Promptable Queries

Ziyu Zhu , , Xiaojian Ma , Xuesong Niu , Yixin Chen , Baoxiong Jia , , Siyuan Huang , Qing Li .
European Conference on Computer Vision (ECCV) 2024
OpenSUN3D @ ECCV 2024

An Embodied Generalist Agent in 3D World

International Conference on Machine Learning (ICML) 2024
GenAI4DM & AGI @ ICLR 2024 (* indicates equal contribution.)

Human-level Few-shot Concept Induction through Minimax Entropy Learning

Chi Zhang, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu.
Science Advances (SciAdv) 2024

PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI

, Baoxiong Jia* , , Siyuan Huang .
Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
AI3DG @ CVPR 2024 (* indicates equal contribution.)

Move as You Say, Interact as You Can: Language-guided Human Motion Generation with Scene Affordance

Zan Wang , Yixin Chen , Baoxiong Jia , Puhao Li , Jinlu Zhang , , Tengyu Liu , Yixin Zhu , Wei Liang , Siyuan Huang .
Conference on Computer Vision and Pattern Recognition (CVPR) 2024 (Highlight)
HuMoGen @ CVPR 2024

ProBio: A Protocol-guided Multimodal Dataset for Molecular Biology Lab

Jieming Cui* , , Baoxiong Jia* , Siyuan Huang , Zilong Zheng , Jianzhu Ma , Yixin Zhu .
Advances in Neural Information Processing Systems (NeurIPS) 2023 (Track on Datasets and Benchmarks)
(* indicates equal contribution.)

X-VoE: Measuring eXplanatory Violation of Expectation in Physical Events

, , Baoxiong Jia , Zeyu Zhang , Chi Zhang , Yixin Zhu , Song-Chun Zhu .
International Conference on Computer Vision (ICCV) 2023 (Oral)

ARNOLD: A Benchmark for Language-Grounded Task Learning with Continuous States in Realistic Scenes

Ran Gong* , Jiangyong Huang* , Yizhou Zhao , Haoran Geng , Xiaofeng Gao , , , , Demetri Terzopoulos , Song-Chun Zhu , Baoxiong Jia# , Siyuan Huang# .
International Conference on Computer Vision (ICCV) 2023
LangRob @ CoRL 2022 (* indicates equal contribution. # indicates corresponding author.)

Learning a Causal Transition Model for Object Cutting

International Conference on Intelligent Robots and Systems (IROS) 2023
(* indicates equal contribution.)

Diffusion-based Generation, Optimization, and Planning in 3D Scenes

Conference on Computer Vision and Pattern Recognition (CVPR) 2023
(* indicates equal contribution.)

Improving Unsupervised Object-centric Learning with Query Optimization

Baoxiong Jia*, Yu Liu*, Siyuan Huang.
International Conference on Learning Representations (ICLR) 2023
(* indicates equal contribution.)

EgoTaskQA: Understanding Human Tasks in Egocentric Videos

Baoxiong Jia , , Song-Chun Zhu , Siyuan Huang .
Advances in Neural Information Processing Systems (NeurIPS) 2022 (Track on Datasets and Benchmarks)

Learning Algebraic Representation for Systematic Generalization in Contextual Decision Processes

European Conference on Computer Vision (ECCV) 2022
(* indicates equal contribution.)

Latent Diffusion Energy-Based Model for Interpretable Text Modeling

, Sirui Xie , Xiaojian Ma , Baoxiong Jia , , Ruiqi Gao , Yixin Zhu , Song-Chun Zhu , Ying Nian Wu .
International Conference on Machine Learning (ICML) 2022

ACRE: Abstract Causal REasoning Beyond Covariation

IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
(* indicates equal contribution.)

Abstract Spatial-Temporal Reasoning via Probabilistic Abduction and Execution

Chi Zhang*, Baoxiong Jia*, Song-Chun Zhu, Yixin Zhu.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2021
(* indicates equal contribution.)

LEMMA: A Multi-view Dataset for Learning Multi-agent Multi-task Activities

European Conference on Computer Vision (ECCV) 2020

A Generalized Earley Parser for Human Activity Parsing and Prediction

Siyuan Qi, Baoxiong Jia, Siyuan Huang, Ping Wei, Song-Chun Zhu.
Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2020

Learning Perceptual Inference by Contrasting

Conference on Neural Information Processing Systems (NeurIPS) 2019 (Spotlight)
(* indicates equal contribution.)

RAVEN: A Dataset for Relational and Analogical Visual rEasoNing

Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu.
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2019
(* indicates equal contribution.)

Learning Human-Object Interactions by Graph Parsing Neural Networks

Siyuan Qi* , Wenguan Wang* , Baoxiong Jia , , Song-Chun Zhu .
European Conference on Computer Vision (ECCV) 2018
(* indicates equal contribution.)

Generalized Earley Parser: Bridging Symbolic Grammars and Sequence Data for Future Prediction

Siyuan Qi, Baoxiong Jia, Song-Chun Zhu.
International Conference on Machine Learning (ICML) 2018

Mining User Reviews for Mobile App Comparison

Yuanchun Li , Baoxiong Jia , Yao Guo , .
ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp) 2017


Baoxiong Jia © 2024. All rights reserved.