🏢 Tencent Youtu Lab
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
2176 words · 11 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
VITA-1.5: A GPT-4o-Level Multimodal LLM for Real-Time Vision and Speech Interaction