Paper Reviews by AI

2025

SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
·3547 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Action Recognition 🏢 Unmanned System Research Institute, Northwestern Polytechnical University
SeFAR: a new semi-supervised learning framework that dramatically improves fine-grained action recognition performance even with limited data!
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
·1984 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University
SeedVR: improving generic video restoration with an infinite diffusion transformer
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
·2873 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Huazhong University of Science and Technology
VA-VAE resolves the optimization dilemma in high-dimensional latent spaces and achieves state-of-the-art performance in high-resolution image generation!
Nested Attention: Semantic-aware Attention Values for Concept Personalization
·1325 words·7 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tel Aviv University
Presents Nested Attention, a technique that uses a nested attention mechanism to improve the personalization performance of text-to-image models!
Graph Generative Pre-trained Transformer
·2643 words·13 mins
AI Generated 🤗 Daily Papers Machine Learning Graph Generation 🏢 Tufts University
G2PT: a new model that efficiently encodes graphs as sequences and trains a Transformer on them, dramatically improving graph generation and prediction performance!
Dynamic Scaling of Unit Tests for Code Reward Modeling
·2368 words·12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
A study presenting a method that raises the accuracy of code reward models by increasing the number of unit tests!
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
·1888 words·9 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group
CODEELO benchmark: evaluating the competitive code generation ability of LLMs with human-level Elo ratings
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery
·3521 words·17 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University
BoxingGym: a benchmark for comprehensively evaluating the experimental design and model discovery abilities of LLM-based scientific agents
A3: Android Agent Arena for Mobile GUI Agents
·1920 words·10 mins
AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): an innovative platform for dynamically evaluating AI agent performance on real mobile apps
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
·3211 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin
The TAPE (conTextualized equivAriant Position Embedding) framework improves the position-based addressing performance of transformers through dynamic positional encoding that uses contextual information.
Population Aware Diffusion for Time Series Generation
·2991 words·15 mins
AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 William & Mary
Proposes PaD-TS, a new diffusion model for time series generation that preserves population-level properties
AutoPresent: Designing Structured Visuals from Scratch
·3831 words·18 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University
AUTOPRESENT: automatically generating perfect presentation slides from natural-language instructions!
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
·3272 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 College of Computer Science and Technology, Zhejiang University
Uses 2.5 years' worth of instructional videos to build a high-quality multimodal textbook corpus and improve VLM pretraining performance

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
·3245 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University
VideoRefer Suite presents a new video LLM (VideoRefer) for fine-grained spatial-temporal object understanding, a large-scale, high-quality dataset (VideoRefer-700K), and a comprehensive benchmark (VideoRefer-Bench).
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
·2638 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Overcoming the limits of structured state space models (SSMs), which model long-range dependencies in deep neural networks! This recent study identifies the recency bias and over-smoothing problems of SSMs and presents a **polarization** technique that addresses them, improving the accuracy of long-range token correlations.
MLLM-as-a-Judge for Image Safety without Human Labeling
·5796 words·28 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta AI
Presents a new zero-shot method that judges image safety with a pretrained multimodal large language model (MLLM), using predefined safety rules and no human labeling.
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
·2196 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc
VMix: improving text-to-image diffusion models through cross-attention mixing control
Training Software Engineering Agents and Verifiers with SWE-Gym
·3117 words·15 mins
AI Generated 🤗 Daily Papers AI Applications Software Engineering 🏢 UC Berkeley
SWE-Gym: the first environment for training real-world software engineering agents
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
·2183 words·11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Singapore University of Technology and Design
TANGOFLUX: ultra-fast, high-quality text-to-audio generation with few parameters
Slow Perception: Let's Perceive Geometric Figures Step-by-step
·3207 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stepfun
Slow Perception: improving accuracy through step-by-step recognition of geometric figures