My research focuses primarily on the intersection of vision and language. Currently, I am exploring the application of large language models (LLMs) to tasks involving vision, language, and robotics, such as language-driven video understanding, open-vocabulary multi-label image recognition, and interactive robots. Previously, my work centered on hand detection, hand pose estimation, face recognition, and person re-identification.

You can contact me via e-mail:

🔥 News

📝 Publications

  ($\ast$ means equal contribution)


Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

  • Shuo Yang, Xinxiao Wu, Zirui Shang, Jiebo Luo
  • IEEE Transactions on Multimedia (TMM), 2024.

    [Paper] [BibTex]

AAAI 2024

Multi-Modal Prompting for Open-Vocabulary Video Visual Relationship Detection

  • Shuo Yang$\ast$, Yongqi Wang$\ast$, Xiaofeng Ji, Xinxiao Wu
  • The 38th Annual AAAI Conference on Artificial Intelligence (AAAI), 2024.

    [Paper] [BibTex] [Code]

ACM MM 2023

Probability Distribution Based Frame-supervised Language-driven Action Localization

  • Shuo Yang, Zirui Shang, Xinxiao Wu
  • The 31st ACM International Conference on Multimedia (ACM MM), 2023.

    [Paper] [BibTex] [Code]

IJCAI 2022

Entity-aware and Motion-aware Transformers for Language-driven Action Localization

  • Shuo Yang, Xinxiao Wu
  • The 31st International Joint Conference on Artificial Intelligence (IJCAI), 2022.

    [Paper] [BibTex] [Code]

CVPR 2020

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification

  • Guan’an Wang$\ast$, Shuo Yang$\ast$, Huanyu Liu, Zhicheng Wang, Yang Yang, Shuliang Wang, Gang Yu, Jian Sun
  • In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

    [Paper] [BibTex] [Code]

TIP 2018

Joint Hand Detection and Rotation Estimation Using CNN

  • Xiaoming Deng, Yinda Zhang, Shuo Yang, Ping Tan, Liang Chang, Ye Yuan, Hongan Wang
  • IEEE Transactions on Image Processing (TIP), 27(4):1888-1900, 2018.

    [Paper] [Project Page]

  • arXiv 2024, Data-free Multi-label Image Recognition via LLM-powered Prompt Tuning, Shuo Yang, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Qiyuan Cheng, Xinxiao Wu
  • arXiv 2017, Hand3D: Hand Pose Estimation using 3D Neural Network, Xiaoming Deng$\ast$, Shuo Yang$\ast$, Yinda Zhang$\ast$, Ping Tan, Liang Chang, Hongan Wang
  • Acta Automatica Sinica 2016, Convolutional neural networks in image understanding, Liang Chang, Xiaoming Deng, Mingquan Zhou, Zhongke Wu, Ye Yuan, Shuo Yang, Hongan Wang

📖 Education

    • 2018.09 - 2024.06, Ph.D. in Computer Science, School of Computer Science & Technology, Beijing Institute of Technology.

      Advisors: Shuliang Wang (2018.09 - 2021.06) and Xinxiao Wu (from 2021.06).

    • 2014.09 - 2017.07, M.S. in Computer Science, Institute of Software, Chinese Academy of Sciences.

      Advisor: Xiaoming Deng.

    • 2010.09 - 2014.07, B.S. in Computer Science, School of Information, Beijing Union University.

💻 Experience