🏢 Hong Kong University of Science and Technology
Large Motion Video Autoencoding with Cross-modal Video VAE
2098 words · 10 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression!
Diving into Self-Evolving Training for Multimodal Reasoning
2584 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
M-STAR: a new framework for self-evolving training in multimodal reasoning!
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
1797 words · 9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
B-STAR: a new framework that improves performance by monitoring and adjusting the balance between exploration and exploitation in self-taught reasoning.
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
2165 words · 11 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
MegaPairs uses VLMs and open-domain images to generate more than 26 million high-quality multimodal training samples, substantially improving universal multimodal retrieval performance.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
2184 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than a simple user-supplied 3D trajectory input!
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
2500 words · 12 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Improving multimodal model performance by enhancing image captions with visual specialist models.
AniDoc: Animation Creation Made Easier
1844 words · 9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc: an innovative AI model that uses sparse sketches and reference images to automatically colorize and interpolate 2D animation!
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
2657 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianProperty is a training-free framework that uses LMMs to integrate physical properties into 3D Gaussians, enabling downstream tasks such as physics-based simulation and robotic grasping.