🏢 Hong Kong University of Science and Technology
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
2466 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
VideoAnydoor: high-quality video object insertion with precise motion control.
A3: Android Agent Arena for Mobile GUI Agents
1920 words · 10 mins
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): an innovative platform for dynamically evaluating the performance of AI agents in real mobile apps.
Edicho: Consistent Image Editing in the Wild
2213 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
Edicho: zero-shot image editing that keeps results consistent across images!
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
2961 words · 14 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
OS-Genesis is an innovative pipeline that automates GUI agent trajectory generation through reverse task synthesis.
Large Motion Video Autoencoding with Cross-modal Video VAE
2098 words · 10 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression!
Diving into Self-Evolving Training for Multimodal Reasoning
2584 words · 13 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
M-STAR: a new framework for self-evolving training in multimodal reasoning!
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
1797 words · 9 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Hong Kong University of Science and Technology
B-STaR: a new framework that improves performance by monitoring and adjusting the balance between exploration and exploitation in self-taught reasoners.
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
2165 words · 11 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
MegaPairs uses VLMs and open-domain images to generate more than 26 million high-quality multimodal training samples, substantially improving universal multimodal retrieval performance.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
2184 words · 11 mins
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than a user's simple 3D trajectory input!
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
2500 words · 12 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Improving multimodal model performance by enhancing image captions with visual specialist models.
AniDoc: Animation Creation Made Easier
1844 words · 9 mins
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc: an innovative AI model that uses sparse sketches and a reference image to automate colorization and interpolation for 2D animation!
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
2657 words · 13 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianProperty is a training-free framework that uses LMMs to integrate physical properties into 3D Gaussians, enabling downstream tasks such as physics-based simulation and robotic grasping.