
Paper Reviews by AI

2025

TransPixar: Advancing Text-to-Video Generation with Transparency
·2013 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research
TransPixar: high-quality transparent video generation, even with limited data.
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
·2799 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta
THROUGH-THE-MASK, a two-stage image-to-video generation framework that uses mask-based motion trajectories, enables accurate animation of multiple objects.
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution
·3033 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University
STAR: a real-world video super-resolution technique built on T2V models that achieves realistic spatial detail and robust temporal consistency.
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
·1134 words·6 mins
AI Generated 🤗 Daily Papers Natural Language Processing Speech Recognition 🏢 SandLogic Technologies Pvt Ltd.
Built on the Mamba architecture, Samba-ASR uses efficient state-space models to overcome the limitations of existing Transformer models and achieves state-of-the-art performance in speech recognition.
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
·1981 words·10 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Chinese University of Hong Kong
Dispider: enables video LLMs that use disentangled perception, decision, and reaction for real-time interaction.
BoostStep: Boosting mathematical capability of Large Language Models via improved single-step reasoning
·2104 words·10 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Shanghai AI Laboratory
BoostStep: improving the mathematical ability of LLMs through step-by-step reasoning.
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
·4797 words·23 mins
AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Stanford University
AutoConverter is a system that automatically converts open-ended VQA questions into multiple-choice questions, which can improve the objectivity and reproducibility of VLM (Vision Language Model) evaluation. Using AutoConverter, the researchers built VMCBench, a new benchmark that unifies 20 existing VQA datasets. VMCBen…
ToolHop: A Query-Driven Benchmark for Evaluating Large Language Models in Multi-Hop Tool Use
·3178 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 ByteDance
ToolHop: a new benchmark that rigorously evaluates the multi-step tool-use ability of large language models.
Test-time Computing: from System-1 Thinking to System-2 Thinking
·699 words·4 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Soochow University
A groundbreaking study on how test-time computing can raise the reasoning ability of large language models from System-1 thinking to the level of System-2 thinking.
Scaling Laws for Floating Point Quantization Training
·5642 words·27 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tencent AI Lab
A new scaling law for floating-point quantization training: quantifies how exponent bits, mantissa bits, and the calculation precision of the scaling factor affect LLM performance.
GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking
·2321 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong
GS-DiT: an innovative video generation model with 4D video control that uses pseudo 4D Gaussian fields built through efficient 3D point tracking.
DepthMaster: Taming Diffusion Models for Monocular Depth Estimation
·2099 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China (USTC)
DepthMaster uses a single-step diffusion model and exploits generative features to dramatically improve the accuracy and speed of monocular depth estimation.
Personalized Graph-Based Retrieval for Large Language Models
·3060 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 UC Santa Cruz
The personalized graph-based retrieval-augmented generation (PGraphRAG) framework addresses the sparse-data problem and substantially improves LLM personalization performance.
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
·2176 words·11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tencent Youtu Lab
VITA-1.5: a GPT-4o-level multimodal LLM for real-time vision and speech interaction.
Virgo: A Preliminary Exploration on Reproducing o1-like MLLM
·3242 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
Virgo: uses text-based long-form thought data to achieve state-of-the-art performance on a range of multimodal benchmarks.
METAGENE-1: Metagenomic Foundation Model for Pandemic Monitoring
·2684 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Southern California
METAGENE-1, a 7-billion-parameter metagenomic large language model trained on wastewater data, achieves state-of-the-art performance on pathogen detection and genomic sequence embedding tasks.
Ingredients: Blending Custom Photos with Video Diffusion Transformers
·2088 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Kunlun Inc.
Introducing Ingredients, an innovative framework for high-quality multi-ID customized video generation.
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
·2819 words·14 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 AgiBot
EnerVerse: a future-space generation framework for robotic manipulation that improves performance on long-horizon tasks.
Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
·3175 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Ant Group
AUTO-RT: efficiently uncovers LLM vulnerabilities through automated exploration of jailbreak strategies.
VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control
·2466 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
VideoAnydoor: high-quality video object insertion with precise motion control.