My primary research focuses on the intersection of vision and language. Currently, I am exploring tasks that involve vision, language, and robotics, such as language-driven video understanding, open-vocabulary image/video understanding, and interactive robots. Previously, my work centered on hand detection, hand pose estimation, face recognition, and person re-identification.

Students interested in vision-and-language research or intelligent robots are welcome to join us!

You can contact me via e-mail: yangshuo@smbu.edu.cn; yangshuo129@gmail.com.

🔥 News

2025.06 🎉🎉 An image-text matching paper is accepted by ICCV 2025 (CCF-A conference)!
2025.04 🎉🎉 A video visual relationship detection paper is accepted by IJCAI 2025 (CCF-A conference)!
2025.04 🎉🎉 A video visual relationship detection paper is accepted by IEEE TPAMI 2025 (CCF-A, JCR Q1, CAS Zone 1, IF=20.8)!
2025.01 🎉🎉 An open-vocabulary multi-label action classification paper is published in the Journal of Computer Research and Development (计算机研究与发展) 2025 (CCF-A Chinese journal, IF=2.65)!
2024.12 🎉🎉 A video summarization paper is accepted by AAAI 2025 (CCF-A conference)!
2024.10 🎉🎉 An image-text matching paper is accepted by IEEE Signal Processing Letters 2024 (JCR Q2, CAS Zone 3, IF=3.2)!
2024.10 🎉🎉 A language-driven action localization paper is accepted by PRCV 2024 (CCF-C conference)!
2024.06 😊😊 I graduated from Beijing Institute of Technology (北京理工大学) and took a position as a Tenure-Track Associate Professor (Pre-Tenure) at Shenzhen MSU-BIT University (深圳北理莫斯科大学)!
2024.02 🎉🎉 A language-driven action localization paper is accepted by IEEE TMM 2024 (JCR Q1, CAS Zone 1, IF=7.3)!
2023.12 🎉🎉 A video visual relationship detection paper is accepted by AAAI 2024 (CCF-A conference)!
2023.07 🎉🎉 A frame-supervised language-driven action localization paper is accepted by ACM MM 2023 (CCF-A conference)!
2022.04 🎉🎉 A language-driven action localization paper is accepted by IJCAI 2022 (CCF-A conference)!
2021.06 😊😊 I joined a new research group supervised by Prof. Xinxiao Wu.
2020.03 🎉🎉 A person re-identification paper is accepted by CVPR 2020 (CCF-A conference)!

📝 Publications

($\ast$ means equal contribution, $\dagger$ means corresponding author)

ICCV 2025

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

Mengxiao Tian, Xinxiao Wu, Shuo Yang$\dagger$
International Conference on Computer Vision (ICCV), 2025.

[Paper] [BibTex] [Code]

IJCAI 2025

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

Yongqi Wang, Xinxiao Wu, Shuo Yang$\dagger$
The 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025.

[Paper] [BibTex] [Code]

IEEE TPAMI 2025

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting

Yongqi Wang, Xinxiao Wu, Shuo Yang, Jiebo Luo
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

[Paper] [BibTex] [Code]

AAAI 2025

Video Summarization using Denoising Diffusion Probabilistic Model

Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu
The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025.

[Paper] [BibTex]

IEEE TMM 2024

Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

Shuo Yang, Xinxiao Wu, Zirui Shang, Jiebo Luo
IEEE Transactions on Multimedia (TMM), 2024.

[Paper] [BibTex]

AAAI 2024

Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection

Shuo Yang$\ast$, Yongqi Wang$\ast$, Xiaofeng Ji, Xinxiao Wu
The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.

[Paper] [BibTex] [Code]

ACM MM 2023

Probability Distribution Based Frame-supervised Language-driven Action Localization

Shuo Yang, Zirui Shang, Xinxiao Wu
The 31st ACM International Conference on Multimedia (ACM MM), 2023.

[Paper] [BibTex] [Code]

IJCAI 2022

Entity-aware and Motion-aware Transformers for Language-driven Action Localization

Shuo Yang, Xinxiao Wu
The 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022.

[Paper] [BibTex] [Code]

CVPR 2020

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

Guan’an Wang$\ast$, Shuo Yang$\ast$, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Jian Sun
In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[Paper] [BibTex] [Code]

TIP 2018

Joint Hand Detection and Rotation Estimation Using CNN

Xiaoming Deng, Yinda Zhang, Shuo Yang, Ping Tan, Liang Chang, Ye Yuan, Hongan Wang
IEEE Transactions on Image Processing (TIP), 27(4):1888-1900, 2018.

[Paper] [Project Page]

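Journal of Computer Research and Development (计算机研究与发展) 2025, Open-Vocabulary Multi-Label Action Recognition Guided by Large Language Model Knowledge (大语言模型知识引导的开放域多标签动作识别), 朱荣江, 石语珩, 杨硕, 王子奕, 吴心筱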

SPL 2024, Source-free Image-text Matching via Uncertainty-aware Learning, Mengxiao Tian, Shuo Yang$\dagger$, Xinxiao Wu, Yunde Jia

PRCV 2024, Efficient Language-Driven Action Localization by Feature Aggregation and Prediction Adjustment, Zirui Shang, Shuo Yang$\dagger$, Xinxiao Wu

arXiv 2024, Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning, Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu

arXiv 2017, Hand3D: Hand Pose Estimation using 3D Neural Network, Xiaoming Deng$\ast$, Shuo Yang$\ast$, Yinda Zhang$\ast$, Ping Tan, Liang Chang, Hongan Wang

Acta Automatica Sinica 2016, Convolutional neural networks in image understanding, Liang Chang, Xiaoming Deng, Mingquan Zhou, Zhongke Wu, Ye Yuan, Shuo Yang, Hongan Wang

📖 Education

2018.09 - 2024.06, Ph.D. in Computer Science, School of Computer Science & Technology, Beijing Institute of Technology.
Advisors: Shuliang Wang (2018.09 - 2021.06) and Xinxiao Wu (from 2021.06).

2014.09 - 2017.07, M.S. in Computer Science, Institute of Software, Chinese Academy of Sciences.
Advisor: Xiaoming Deng.

2010.09 - 2014.07, B.S. in Computer Science, School of Information, Beijing Union University.

💻 Experience

2024.06 - present, Tenure-Track Associate Professor (Pre-Tenure) at Shenzhen MSU-BIT University, Shenzhen, China.
2019.05 - 2020.02, Research Intern at Megvii Inc., Beijing, China.
2017.07 - 2018.08, Algorithm Engineer at JD Finance, Beijing, China.

Shuo Yang (杨硕)
