Shangzhe Di   狄尚哲

Hi, I am a second-year PhD candidate at Shanghai Jiao Tong University (SJTU), where I am fortunate to be advised by Prof. Weidi Xie. My research focuses on video understanding and multimodal learning, driven by a passion for exploring the unknowns in these fields.

Before joining SJTU, I earned my master's and bachelor's degrees from Beihang University (BUAA). During this period, I explored video background music generation and visual object tracking under the guidance of Prof. Si Liu.

I’m always eager to connect, exchange ideas, and collaborate on innovative research. Please feel free to reach out!

Email / CV / GitHub / Google Scholar

Education

  • PhD Student, Shanghai Jiao Tong University, Apr. 2023 - Present
  • M.Eng. in Computer Science, Beihang University, Sep. 2020 - Jan. 2023
  • Exchange Student, Technical University of Munich, Apr. 2019 - Aug. 2019
  • B.Eng. in Software Engineering, Beihang University, Sep. 2016 - Jun. 2020

Research

    Unlocking Video-LLM via Agent-of-Thoughts Distillation
    Yudi Shi, Shangzhe Di, Qirui Chen, Weidi Xie
    In Submission, 2024.
    paper / project page / code

    Distill multi-step reasoning and spatial-temporal understanding into a generative video-language model.

    Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
    Shangzhe Di, Zhelun Yu, et al.
    In Submission, 2024.

    A training-free approach enabling Video-LLMs for streaming video question-answering.

    Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
    Qirui Chen, Shangzhe Di, Weidi Xie
    In AAAI, 2025.
    paper / project page / code

    Pinpoint scattered visual evidence in long egocentric videos while responding to questions.

    Grounded Question-Answering in Long Egocentric Videos
    Shangzhe Di, Weidi Xie
    In CVPR, 2024.
    paper / project page / code / bibtex

    Simultaneous query grounding and answering in long, egocentric videos.

    Video Background Music Generation with Controllable Music Transformer
    Shangzhe Di*, Zeren Jiang*, Si Liu, Zhaokai Wang, Leyan Zhu, Zexin He, Hongming Liu, Shuicheng Yan
    In ACM MM, 2021. (Best Paper Award)
    paper / project page / code / colab notebook / bibtex

    The first satisfying method for video background music generation.

Honors and Awards

  • Best Paper Award, ACM MM 2021
  • Best Video Award, IJCAI 2021 Video Competition
  • First Prize Scholarship x 2 (Top 10%), Beihang University, 2019 & 2021
  • Full Scholarship for Exchange Program, China Scholarship Council, 2019
  • Special Prize Scholarship (Top 3%), Beihang University, 2018


