๐Ÿข Hong Kong University of Science and Technology

Large Motion Video Autoencoding with Cross-modal Video VAE
·2098 words·10 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression!
Diving into Self-Evolving Training for Multimodal Reasoning
·2584 words·13 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Hong Kong University of Science and Technology
M-STAR: a new framework for self-evolving training in multimodal reasoning!
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
·1797 words·9 mins
AI Generated · 🤗 Daily Papers · Natural Language Processing · Large Language Models · 🏢 Hong Kong University of Science and Technology
B-STAR: a new framework that improves performance by monitoring and adjusting the balance between exploration and exploitation in self-taught reasoners.
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
·2165 words·11 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Hong Kong University of Science and Technology
MegaPairs uses VLMs and open-domain images to generate more than 26 million high-quality multimodal training examples, substantially improving universal multimodal retrieval performance.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2184 words·11 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Image Generation · 🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than simple, user-supplied 3D trajectory input!
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
·2500 words·12 mins
AI Generated · 🤗 Daily Papers · Multimodal Learning · Vision-Language Models · 🏢 Hong Kong University of Science and Technology
Improving multimodal model performance by enhancing image captions with visual specialist models.
AniDoc: Animation Creation Made Easier
·1844 words·9 mins
AI Generated · 🤗 Daily Papers · Computer Vision · Video Understanding · 🏢 Hong Kong University of Science and Technology
AniDoc: an innovative AI model that uses sparse sketches and reference images to automate colorization and interpolation for 2D animation!
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·2657 words·13 mins
AI Generated · 🤗 Daily Papers · Computer Vision · 3D Vision · 🏢 Hong Kong University of Science and Technology
GaussianProperty is a training-free framework that uses LMMs to integrate physical properties into 3D Gaussians, enabling downstream tasks such as physics-based simulation and robotic grasping.