zhenye234/README.md
  • 👋 Hi, I’m Ye Zhen, a PhD student at HKUST.
  • 👀 I’m interested in multimodal generation and speech synthesis.
  • If you have any questions, please feel free to contact me at zhenye312@gmail.com.

Pinned

  1. Talker-T2AV (Public)

    Talker-T2AV Joint Talking Audio-Video Generation with Autoregressive Diffusion Modeling

    Python · 30 stars

  2. LLaSA_training (Public)

    LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

    Python · 660 stars · 50 forks

  3. X-Codec-2.0 (Public)

    Codec for paper: LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

    Python · 355 stars · 53 forks

  4. xcodec (Public)

    AAAI 2025: Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

    Python · 303 stars · 25 forks

  5. FlashSpeech (Public)

    ACM MM 2024 FlashSpeech: Efficient Zero-Shot Speech Synthesis

    Python · 154 stars · 11 forks

  6. CoMoSpeech (Public)

    ACM MM 2023 CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model

    Python · 213 stars · 22 forks