My primary research focuses on the intersection of vision and language. Currently, I am exploring tasks that involve vision, language, and robotics, such as language-driven video understanding, open-vocabulary image/video understanding, and interactive robots. Previously, my work centered on hand detection, hand pose estimation, face recognition, and person re-identification.

Students interested in vision-and-language research and intelligent robots are welcome to join us!

You can contact me by e-mail at yangshuo@smbu.edu.cn or yangshuo129@gmail.com.

🔥 News

📝 Publications

Google Scholar citations   ($\ast$ denotes equal contribution, $\dagger$ denotes corresponding author)

PR 2026

Image-free Multi-label Image Recognition via LLM-powered Hierarchical Prompt Tuning

  • Shuo Yang$\dagger$, Zirui Shang, Yongqi Wang, Derong Deng, Hongwei Chen, Xinxiao Wu, Qiyuan Cheng
  • Pattern Recognition (PR), 2026.

    [Paper] [BibTex] [Code]

ICCV 2025

LLM-enhanced Action-aware Multi-modal Prompt Tuning for Image-Text Matching

  • Mengxiao Tian, Xinxiao Wu, Shuo Yang$\dagger$
  • International Conference on Computer Vision (ICCV), 2025.

    [Paper] [BibTex] [Code]

IJCAI 2025

METOR: A Unified Framework for Mutual Enhancement of Objects and Relationships in Open-vocabulary Video Visual Relationship Detection

  • Yongqi Wang, Xinxiao Wu, Shuo Yang$\dagger$
  • The 34th International Joint Conference on Artificial Intelligence (IJCAI), 2025.

    [Paper] [BibTex] [Code]

IEEE TPAMI 2025

End-to-end Open-vocabulary Video Visual Relationship Detection using Multi-modal Prompting

  • Yongqi Wang, Xinxiao Wu, Shuo Yang, Jiebo Luo
  • IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025.

    [Paper] [BibTex] [Code]

AAAI 2025

Video Summarization using Denoising Diffusion Probabilistic Model

  • Zirui Shang, Yubo Zhu, Hongxi Li, Shuo Yang, Xinxiao Wu
  • The 39th Annual AAAI Conference on Artificial Intelligence (AAAI), 2025.

    [Paper] [BibTex]

IEEE TMM 2024

Dynamic Pathway for Query-Aware Feature Learning in Language-Driven Action Localization

  • Shuo Yang, Xinxiao Wu, Zirui Shang, Jiebo Luo
  • IEEE Transactions on Multimedia (TMM), 2024.

    [Paper] [BibTex]

📖 Education

  • 2018.09 - 2024.06, Ph.D. in Computer Science, School of Computer Science & Technology, Beijing Institute of Technology.

    Advisors: Shuliang Wang (2018.09 - 2021.06) and Xinxiao Wu (from 2021.06).

  • 2014.09 - 2017.07, M.S. in Computer Science, Institute of Software, Chinese Academy of Sciences.

    Advisor: Xiaoming Deng.

  • 2010.09 - 2014.07, B.S. in Computer Science, School of Information, Beijing Union University.

💻 Experience