AI Paper Reviews by AI

Discover AI research through comprehensive reviews with advanced AI models
(powered by Gemini 1.5 & Upstage’s Document Parse)

Recent

Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation

6 January 2025·4797 words·23 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Stanford University

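AutoConverter is a system that automatically converts open-ended VQA questions into multiple-choice questions, which can make VLM (Vision Language Model) evaluation more objective and reproducible. Using AutoConverter, the researchers built VMCBench, a new benchmark that unifies 20 existing VQA datasets. VMCBen…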

BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning

6 January 2025·2104 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory

BoostStep: Improving the mathematical ability of LLMs through step-by-step reasoning!

Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction

6 January 2025·1981 words·10 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Chinese University of Hong Kong

Dispider: Enables video LLMs that use disentangled perception, decision, and reaction for real-time interaction.

Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models

6 January 2025·1134 words·6 mins

AI Generated 🤗 Daily Papers Natural Language Processing Speech Recognition 🏢 SandLogic Technologies Pvt Ltd.

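Built on the Mamba architecture, Samba-ASR uses efficient state-space models to overcome the limitations of existing Transformer models and achieves state-of-the-art performance in speech recognition.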

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

6 January 2025·3033 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University

STAR: A real-world video super-resolution technique built on T2V models that achieves realistic spatial detail and robust temporal consistency!

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

6 January 2025·2799 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

THROUGH-THE-MASK, a two-stage image-to-video generation framework that uses mask-based motion trajectories, enables accurate animation of multiple objects.

TransPixar: Advancing Text-to-Video Generation with Transparency

6 January 2025·2013 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research

TransPixar: High-quality transparent video generation, even with limited data

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

5 January 2025·2099 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China (USTC)

DepthMaster uses a single-step diffusion model and leverages generative features to dramatically improve the accuracy and speed of monocular depth estimation.

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

5 January 2025·2321 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong

GS-DiT: An innovative video generation model that uses efficient 3D point tracking to build pseudo 4D Gaussian fields, enabling 4D video control