🏢 College of Computer Science and Technology, Zhejiang University
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
3272 words · 16 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
Drawing on 2.5 years' worth of instructional videos, this work builds a high-quality multimodal textbook corpus and improves VLM pretraining performance.