Paper Reviews by AI
2024
Reliable, Reproducible, and Really Fast Leaderboards with Evalica
·1243 words·6 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 JetBrains
Evalica: a toolkit that makes benchmarking easy, fast, and reliable.
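The summary above does not say which ranking models Evalica implements. The following is therefore only a generic sketch of one common way to turn pairwise model comparisons into a leaderboard: sequential Elo updates. The function name `elo_leaderboard` and the defaults `k=32` and `initial=1000` are illustrative assumptions and are not Evalica's API.

```python
def elo_leaderboard(matches, k=32.0, initial=1000.0):
    """Generic Elo sketch (not Evalica's API).

    matches: iterable of (winner, loser) pairs, processed in order.
    Returns (name, rating) pairs sorted from best to worst.
    """
    ratings = {}
    for winner, loser in matches:
        rw = ratings.setdefault(winner, initial)
        rl = ratings.setdefault(loser, initial)
        # Expected score of the winner under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((rl - rw) / 400.0))
        # Zero-sum update: the winner gains exactly what the loser gives up.
        delta = k * (1.0 - expected_w)
        ratings[winner] = rw + delta
        ratings[loser] = rl - delta
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


board = elo_leaderboard([("A", "B"), ("A", "C"), ("B", "C")])
```

Because Elo processes matches one at a time, the final ratings depend on match order. Batch estimators such as Bradley-Terry avoid that dependence, which matters when reproducibility is a goal.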
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
·2657 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Hong Kong University of Science and Technology
GaussianProperty is a training-free framework that uses LMMs to assign physical properties to 3D Gaussians, enabling downstream tasks such as physics-based simulation and robotic grasping.
DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes
·1754 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Google DeepMind
DynamicScaler generates long, seamless panoramic videos from text or images and keeps motion consistent regardless of resolution or aspect ratio.
TraceVLA: Visual Trace Prompting Enhances Spatial-Temporal Awareness for Generalist Robotic Policies
·2744 words·13 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Microsoft Research
TraceVLA: improves a robot's spatio-temporal awareness by visually presenting its past motion.
SplineGS: Robust Motion-Adaptive Spline for Real-Time Dynamic 3D Gaussians from Monocular Video
·3662 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 KAIST
SplineGS: a robust, motion-adaptive spline representation for real-time dynamic 3D scenes.
SCBench: A KV Cache-Centric Analysis of Long-Context Methods
·4642 words·22 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Corporation
SCBench is a new benchmark that evaluates long-context methods in multi-turn and multi-request scenarios.
RLDG: Robotic Generalist Policy Distillation via Reinforcement Learning
·1911 words·9 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 UC Berkeley
RLDG is a novel method that improves generalist robot policies by training them on high-quality data generated through reinforcement learning.
Prompt2Perturb (P2P): Text-Guided Diffusion-Based Adversarial Attacks on Breast Ultrasound Images
·1580 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of British Columbia
P2P: a new text-guided adversarial attack that exploits the vulnerability of DNNs for medical imaging.
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
·3571 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Princeton University
LinGen: minute-length, high-resolution text-to-video generation made efficient through linear computational complexity.
Large Action Models: From Inception to Implementation
·2067 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft
From LLMs to LAMs: building AI agents that carry out real-world tasks.
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
·2277 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 NVIDIA Research
ResGen: an efficient RVQ-based generative model that achieves both high-quality generation and fast sampling.
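The ResGen summary refers to residual vector quantization (RVQ). As a generic illustration of that technique only, and not of ResGen's model or training, the sketch below encodes a vector with a stack of codebooks, where each level quantizes the residual left by the previous levels; decoding sums the chosen codewords. The function names and the toy codebook values are made up for illustration, and real RVQ codebooks are learned.

```python
import numpy as np


def rvq_encode(x, codebooks):
    """Return one code index per level; each level quantizes the remaining residual."""
    residual = np.asarray(x, dtype=float).copy()
    codes = []
    for cb in codebooks:  # each codebook has shape (K, D)
        # Nearest codeword to the current residual (squared Euclidean distance).
        idx = int(np.argmin(np.sum((cb - residual) ** 2, axis=1)))
        codes.append(idx)
        residual -= cb[idx]  # pass what is left to the next level
    return codes


def rvq_decode(codes, codebooks):
    """Reconstruct by summing the selected codeword from every level."""
    return np.sum([cb[i] for cb, i in zip(codebooks, codes)], axis=0)


# Toy, hand-picked codebooks: a coarse level followed by a finer one.
codebooks = [
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]),
    np.array([[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]]),
]
x = np.array([1.2, 1.0])
codes = rvq_encode(x, codebooks)
x_hat = rvq_decode(codes, codebooks)
```

Here `codes` is `[1, 1]` and `x_hat` is `[1.25, 1.0]`. Adding the second level shrinks the reconstruction error from 0.2 to 0.05, which is the coarse-to-fine behavior RVQ relies on.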
Byte Latent Transformer: Patches Scale Better Than Tokens
·3839 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a byte-level LLM that groups bytes into patches instead of relying on tokens.
BrushEdit: All-In-One Image Inpainting and Editing
·3188 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
BrushEdit: All-in-One Image Inpainting & Editing.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
·1707 words·9 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Meta GenAI
Apollo: an in-depth exploration of video understanding in large multimodal models.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3268 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
SynerGen-VL: a strong MLLM that performs both image understanding and image generation with a simple architecture.
Phi-4 Technical Report
·2236 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14-billion-parameter language model trained with a recipe centered on data quality, which substantially improves its reasoning ability.
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
·2344 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 University of Edinburgh
VMB is a new, controllable framework for multimodal music generation that uses text and music as explicit bridges.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
·3354 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Artificial Intelligence Laboratory
InternLM-XComposer2.5-OmniLive: an innovative multimodal AI system that mimics human cognition for real-time interaction with streaming video and audio.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
·3493 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanjing University
InstanceCap improves text-to-video generation through instance-aware structured captions.
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
·7101 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Pennsylvania State University
GReaTer uses gradients over reasoning to optimize prompts for small language models, improving their performance without relying on large LLMs.