🏢 College of Computer Science and Technology, Zhejiang University

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

1 January 2025·3272 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models

Using 2.5 years' worth of instructional videos, this work builds a high-quality multimodal textbook corpus and improves the pretraining performance of VLMs.