Multimodal LLM

Alpachino

Hey there! Welcome to our team’s corner. We’re enthusiastic about Multimodal Large Language Models, and we explore ways to improve how language interacts with images, video, and audio.

Text, Image, Video, Audio

Projects

Project pages are hosted at /proj/*.

QTSplus

Query-Aware Tokenizer for Long-Video Multimodal Language Models.

μ²Tokenizer

Differentiable Multi-Scale Multi-Modal Tokenizer for Radiology Report Generation.

About

Our research explores ways to help AI systems better understand and generate multimodal content. We’re always on the lookout for practical techniques that improve capability and efficiency without sacrificing quality.