Paper Reviews by AI

2024

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·2572 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI
PartGen: uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts from text, images, or existing 3D objects.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
·2368 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Presents the 'Orient Anything' model, which greatly improves the accuracy of object orientation estimation from a single image.
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
·2002 words·10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
Mulberry uses collective Monte Carlo tree search (CoMCTS) to develop a multimodal large language model (MLLM) capable of step-by-step reasoning and reflection.
Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation
·2158 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China
Molar: an innovative framework that combines multimodal LLMs with collaborative filtering to dramatically improve sequential recommendation performance.
MMFactory: A Universal Solution Search Engine for Vision-Language Tasks
·2306 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of British Columbia
MMFactory: a user-customized solution search engine for vision-language tasks.
How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?
·1013 words·5 mins
AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Fondazione Bruno Kessler
A paper that identifies the practical limitations of real-time simultaneous interpretation systems and proposes standardized terminology and a taxonomy to advance research.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
·3181 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab
DiTCtrl: tuning-free generation of smooth, longer videos from multiple prompts.
DepthLab: From Partial to Complete
·1980 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU
DepthLab: recovering complete 3D visual information from partial depth information.
CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era
·2988 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Megagon Labs
This study presents CypherBench, a new benchmark for precise information retrieval with LLMs over large-scale modern knowledge graphs. It analyzes why existing RDF-based knowledge graphs are inefficient for LLMs: their schemas are excessively large and they rely on resource identifiers. In particular, modern knowledge graphs such as Wikidata often exceed the LLM's context window size…
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
·2837 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 AIRI
3DGraphLLM: cutting-edge research that combines semantic graphs with large language models to dramatically improve 3D scene understanding performance.
1.58-bit FLUX
·1092 words·6 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance
1.58-bit FLUX: quantizes 99.5% of the parameters to 1.58 bits, reducing model size by 7.7× and inference memory by 5.1× while maintaining high-quality image generation.
YuLan-Mini: An Open Data-efficient Language Model
·3531 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Renmin University of China
YuLan-Mini: a data-efficient open LLM with 2.4 billion parameters.
WavePulse: Real-time Content Analytics of Radio Livestreams
·2678 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 New York University
WavePulse: a real-time radio broadcast content analytics framework that analyzes political discourse, media distribution, and public opinion trends in real time, opening new possibilities for political science and media research.
VidTwin: Video VAE with Decoupled Structure and Dynamics
·2381 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Peking University
VidTwin: an innovative video autoencoder that decouples structure and dynamics, setting a new standard for video compression and generation.
SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
·2234 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Kyoto University
SBS Figures: an efficient figure QA model pre-trained on 1 million synthetic images and QA pairs.
ResearchTown: Simulator of Human Research Community
·16894 words·80 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Illinois Urbana-Champaign
RESEARCHTOWN: an LLM-based simulator of the human research community that realistically mimics diverse research activities and can generate interdisciplinary research ideas.
PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World
·3159 words·15 mins
AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Jiao Tong University
PC Agent is an innovative system that automates complex digital tasks by transferring human cognitive processes to AI.
Large Motion Video Autoencoding with Cross-modal Video VAE
·2098 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression.
In Case You Missed It: ARC 'Challenge' Is Not That Challenging
·2275 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Snowflake AI Research
Points out flaws in the existing multiple-choice evaluation method and proposes a new evaluation method that considers all options together, improving the accuracy of model performance evaluation.
Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding
·1812 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Dialogue Systems 🏢 Peking University
Friends-MMC: a new multi-modal multi-party conversation dataset with extensive video data and annotations, opening new possibilities for real-world conversation understanding.