AI Paper Reviews by AI

Discover AI research through comprehensive reviews with advanced AI models
(powered by Gemini 1.5 & Upstage’s Document Parse)

Recent

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
4797 words · 23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Stanford University
AutoConverter is a system that automatically converts open-ended VQA questions into multiple-choice questions, making VLM (Vision Language Model) evaluation more objective and reproducible. Using AutoConverter, the researchers built VMCBench, a new benchmark that unifies 20 existing VQA datasets. VMCBen…
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
2104 words · 10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
BoostStep: improving the mathematical capability of LLMs through step-level reasoning!
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
1981 words · 10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Chinese University of Hong Kong
Dispider: enables video LLMs to interact in real time by disentangling perception, decision, and reaction.
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
1134 words · 6 mins
AI Generated 🤗 Daily Papers Natural Language Processing Speech Recognition 🏢 SandLogic Technologies Pvt Ltd.
Built on the Mamba architecture, Samba-ASR uses efficient state-space models to overcome the limitations of existing Transformer models and achieves state-of-the-art performance in speech recognition.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
3033 words · 15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University
STAR: real-world video super-resolution built on T2V models, achieving realistic spatial detail and robust temporal consistency!
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
2799 words · 14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta
THROUGH-THE-MASK, a two-stage image-to-video generation framework based on mask-based motion trajectories, enables accurate animation of multiple objects.
TransPixar: Advancing Text-to-Video Generation with Transparency
2013 words · 10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research
TransPixar: high-quality transparent video generation, even with limited data
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
2099 words · 10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China (USTC)
DepthMaster uses a single-step diffusion model and leverages its generative features to substantially improve both the accuracy and the speed of monocular depth estimation.
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
2321 words · 11 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong
GS-DiT: a video generation model that builds pseudo-4D Gaussian fields through efficient dense 3D point tracking, enabling 4D control over generated video.