My primary research focuses on the intersection of vision and language. Currently, I am exploring tasks involving vision, language, and robotics, such as language-driven video understanding, open-vocabulary image/video understanding, and interactive robots. Previously, my work centered on hand detection, hand pose estimation, face recognition, and person re-identification.

Students interested in research on vision and language or intelligent robots are welcome to join us!

You can contact me via e-mail: yangshuo@smbu.edu.cn; yangshuo129@gmail.com.

🔥 News

2026.07 🎉🎉 An open-vocabulary multi-label action recognition paper is accepted by CVIU 2026 (CCF-B, JCR Q2, IF=3.6)!
2026.05 🎉🎉 An interactive 3D grounding framework and dataset paper is accepted by ICML 2026 (CCF-A conference)!
2025.12 🎉🎉 An image-free multi-label image recognition paper is accepted by Pattern Recognition 2026 (CAS Zone 1, JCR Q1, IF=7.6)!
2025.06 🎉🎉 An image-text matching paper is accepted by ICCV 2025 (CCF-A conference)!
2025.04 🎉🎉 A video visual relationship detection paper is accepted by IJCAI 2025 (CCF-A conference)!
2025.04 🎉🎉 A video visual relationship detection paper is accepted by IEEE TPAMI 2025 (CCF-A, CAS Zone 1, JCR Q1, IF=20.8)!
2025.01 🎉🎉 An open-vocabulary multi-label action classification paper is published in the Journal of Computer Research and Development (《计算机研究与发展》) 2025 (CCF-A Chinese journal, IF=2.65)!
2024.12 🎉🎉 A video summarization paper is accepted by AAAI 2025 (CCF-A conference)!
2024.10 🎉🎉 An image-text matching paper is accepted by IEEE Signal Processing Letters 2024 (JCR Q2, CAS Zone 3, IF=3.2)!
2024.10 🎉🎉 A language-driven action localization paper is accepted by PRCV 2024 (CCF-C conference)!
2024.06 😊😊 I graduated from Beijing Institute of Technology (北京理工大学) and joined Shenzhen MSU-BIT University (深圳北理莫斯科大学) as an Associate Professor!
2024.02 🎉🎉 A language-driven action localization paper is accepted by IEEE TMM 2024 (CAS Zone 1, JCR Q1, IF=7.3)!
2023.12 🎉🎉 A video visual relationship detection paper is accepted by AAAI 2024 (CCF-A conference)!
2023.07 🎉🎉 A frame-supervised language-driven action localization paper is accepted by ACM MM 2023 (CCF-A conference)!
2022.04 🎉🎉 A language-driven action localization paper is accepted by IJCAI 2022 (CCF-A conference)!
2021.06 😊😊 I joined a new research group under the supervision of Prof. Xinxiao Wu.
2020.03 🎉🎉 A person re-identification paper is accepted by CVPR 2020 (CCF-A conference)!

📝 Publications

($\ast$ means equal contribution, $\dagger$ means corresponding author)

ICML 2026

AmbiRefer3D: 3D Visual Grounding with Referential Ambiguity

Rongjiang Zhu$\ast$, Wei Kang$\ast$, Zeqi Liu, Junyu Chen, Shuo Yang$\dagger$, Xinxiao Wu$\dagger$
International Conference on Machine Learning (ICML), 2026.

[Paper] [BibTex] [Project]

PR 2026

Image-free Multi-label Image Recognition via LLM-powered Hierarchical Prompt Tuning

Shuo Yang$\dagger$, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Xinxiao Wu, Qiyuan Cheng
Pattern Recognition (PR), 2026.

[Paper] [BibTex] [Code]

ICCV 2025

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

Mengxiao Tian, Xinxiao Wu, Shuo Yang$\dagger$
International Conference on Computer Vision (ICCV), 2025.

[Paper] [BibTex] [Code]

IJCAI 2025

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

Yongqi Wang, Xinxiao Wu, Shuo Yang$\dagger$
The 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025.

[Paper] [BibTex] [Code]

IEEE TPAMI 2025

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting

Yongqi Wang, Xinxiao Wu, Shuo Yang, Jiebo Luo
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

[Paper] [BibTex] [Code]

AAAI 2025

Video Summarization using Denoising Diffusion Probabilistic Model

Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper] [BibTex]

IEEE TMM 2024

Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

Shuo Yang, Xinxiao Wu, Zirui Shang, Jiebo Luo
IEEE Transactions on Multimedia (TMM), 2024.

[Paper] [BibTex]

📖 Education

2018.09 - 2024.06, Ph.D. in Computer Science, School of Computer Science & Technology, Beijing Institute of Technology.

Advisors: Shuliang Wang (2018.09 - 2021.06) and Xinxiao Wu (from 2021.06).
2014.09 - 2017.07, M.S. in Computer Science, Institute of Software, Chinese Academy of Sciences.

Advisor: Xiaoming Deng.
2010.09 - 2014.07, B.S. in Computer Science, School of Information, Beijing Union University.

💻 Experience

2024.06 - now, Associate Professor at Shenzhen MSU-BIT University, Shenzhen, China.
2019.05 - 2020.02, Research intern at Megvii Inc., Beijing, China.
2017.07 - 2018.08, Algorithm engineer at JD Finance, Beijing, China.

Shuo Yang (杨硕)
