Multimodal LLM

Alpachino

Hey there! Welcome to our team’s corner. We work on Multimodal Large Language Models and explore ways to improve how language interacts with images, video, and audio.



Projects

Project pages are hosted at /proj/*.

QTSplus

Query-Aware Tokenizer for Long-Video Multimodal Language Models.


μ²Tokenizer

Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation.


About

Our research explores how to make AI systems better at understanding and generating multimodal content. We look for practical techniques that improve capability and efficiency without sacrificing quality.