Paper Reviews by AI

2024

Reliable, Reproducible, and Really Fast Leaderboards with Evalica
·1243 words·6 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 JetBrains
Evalica: a toolkit that makes benchmarking easy, fast, and reliable
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·2657 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Hong Kong University of Science and Technology
GaussianProperty is a training-free framework that uses LMMs to integrate physical properties into 3D Gaussians, enabling downstream tasks such as physics-based simulation and robotic grasping.
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
·1754 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Google DeepMind
DynamicScaler generates long, seamless panoramic videos from text or images, maintaining consistent motion regardless of resolution or aspect ratio.
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
·2744 words·13 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Microsoft Research
TraceVLA: improves a robot's spatial-temporal awareness by visually showing its past movements.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
·3662 words·18 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 KAIST
SplineGS: a robust motion-adaptive spline for real-time dynamic 3D scenes.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·4642 words·22 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Corporation
SCBench is a new benchmark that evaluates long-context methods in multi-turn and multi-request scenarios.
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
·1911 words·9 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 UC Berkeley
RLDG is a groundbreaking method that improves the performance of generalist robot policies using high-quality data generated through reinforcement learning.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·1580 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of British Columbia
P2P: a new text-based adversarial attack that exploits the vulnerabilities of medical imaging DNNs
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
·3571 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Princeton University
LinGen: minute-length, high-resolution text-to-video generation that maximizes efficiency through linear computational complexity
Large Action Models: From Inception to Implementation
·2067 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft
From LLMs to LAMs: building AI agents that perform real-world tasks.
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
·2277 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 NVIDIA Research
ResGen: an efficient RVQ-based generative model that achieves both high-quality generation and fast sampling speed.
Byte Latent Transformer: Patches Scale Better Than Tokens
·3839 words·19 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington
BLT: a byte-based LLM that favors patches over tokens.
BrushEdit: All-In-One Image Inpainting and Editing
·3188 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
BrushEdit: All-in-One Image Inpainting & Editing.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
·1707 words·9 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Meta GenAI
Apollo: an in-depth exploration of video understanding in large multimodal models.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3268 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
SynerGen-VL: a powerful MLLM with a simple architecture that performs image understanding and generation together.
Phi-4 Technical Report
·2236 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
Phi-4: a 14-billion-parameter language model developed with a training recipe centered on data quality, which substantially improves its reasoning ability.
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
·2344 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 University of Edinburgh
VMB presents a new, controllable framework for multimodal music generation that uses text and music bridges.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
·3354 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Artificial Intelligence Laboratory
InternLM-XComposer2.5-OmniLive: an innovative multimodal AI system, modeled on human cognitive abilities, for real-time streaming video and audio interaction
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
·3493 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University
InstanceCap: improves text-to-video generation through instance-aware structured captions.
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
·7101 words·34 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Pennsylvania State University
GREATER uses gradients over reasoning to optimize prompts for smaller language models, improving performance without relying on a large LLM.