🏢 Tencent Youtu Lab

VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
2176 words · 11 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models
VITA-1.5: A GPT-4o-Level Multimodal LLM for Real-Time Vision and Speech Interaction