
Paper Reviews by AI

2024

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·2572 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI
PartGen: Uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts, starting from text, an image, or an existing 3D object.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
·3181 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab
DiTCtrl: Tuning-free generation of smooth, long videos from multiple prompts.
DepthLab: From Partial to Complete
·1980 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU
DepthLab: Recovers complete 3D visual information from partial depth information.
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
·2837 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 AIRI
3DGraphLLM: State-of-the-art research that substantially improves 3D scene understanding by combining semantic graphs with large language models!
ResearchTown: Simulator of Human Research Community
·16894 words·80 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Illinois Urbana-Champaign
ResearchTown: An LLM-based simulator of the human research community that realistically mimics a variety of research activities and can generate interdisciplinary research ideas.
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3159 words·15 mins
AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Jiao Tong University
PC Agent is an innovative system that automates complex digital tasks by transferring human cognitive processes to AI.
Large Motion Video Autoencoding with Cross-modal Video VAE
·2098 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression!
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
·2275 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Snowflake AI Research
Points out flaws in the standard way multiple-choice questions are evaluated and proposes a new evaluation setup that considers all answer options together, making model performance assessment more accurate.
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
·1812 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Dialogue Systems 🏢 Peking University
Friends-MMC: A new multi-modal, multi-party conversation dataset with extensive video data and annotations, opening up new possibilities for understanding real-world conversations!
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization
·1717 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
FoPE: Achieves length generalization to long contexts by improving frequency-domain features!
DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
·366 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Tencent AI Lab
The DRT-o1 model uses long chain-of-thought reasoning to significantly improve the accuracy and fluency of literary translation.
Diving into Self-Evolving Training for Multimodal Reasoning
·2584 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
M-STAR: Presents a new framework for self-evolving training for multimodal reasoning!
Deliberation in Latent Space via Differentiable Cache Augmentation
·2751 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind
Proposes differentiable cache augmentation, a new technique for improving the reasoning performance of large language models!
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
·1797 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Hong Kong University of Science and Technology
B-STaR: A new framework that improves self-taught reasoners by monitoring and adjusting the balance between exploration and exploitation.
Revisiting In-Context Learning with Long Context Language Models
·3818 words·18 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Google DeepMind
A surprising finding: with long-context language models, sophisticated example-selection strategies offer no significant advantage over simple random sampling for in-context learning (ICL), while data augmentation improves performance on low-resource tasks by 5%!
OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
·1880 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Beijing Jiaotong University
OpenRFT presents a new method for fine-tuning a general-purpose reasoning model using limited domain-specific data.
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3113 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
Proposes Distilled Decoding (DD), which dramatically speeds up image autoregressive models by enabling one-step sampling!
NILE: Internal Consistency Alignment in Large Language Models
·2709 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Chinese University of Hong Kong
The NILE framework improves LLM performance by up to 68.5% by increasing the consistency between an LLM's internal knowledge and the world knowledge contained in IFT datasets.
LearnLM: Improving Gemini for Learning
·3761 words·18 mins
AI Generated 🤗 Daily Papers AI Applications Education 🏢 Google DeepMind
LearnLM is a model that improves the pedagogical behavior of generative AI in educational contexts. Using a new framework that lets teachers or developers inject the pedagogical traits they want into the model, it was preferred over existing models in expert evaluations, including a 31% average preference strength over GPT-4o.
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
·2414 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Seoul National University
To address hallucination in hyper-detailed image captioning, the authors propose CapMAS, a multi-agent system built on LLM-MLLM collaboration that improves both factuality and coverage.