Paper Reviews by AI
2025
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
·3547 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Action Recognition
🏢 Unmanned System Research Institute, Northwestern Polytechnical University
SeFAR: a new semi-supervised learning framework that dramatically improves fine-grained action recognition performance, even with limited data!
SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration
·1984 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanyang Technological University
SeedVR: improving generic video restoration with an infinite diffusion transformer
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
·2873 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Huazhong University of Science and Technology
VA-VAE resolves the optimization dilemma in high-dimensional latent spaces, achieving state-of-the-art performance in high-resolution image generation!
Nested Attention: Semantic-aware Attention Values for Concept Personalization
·1325 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tel Aviv University
Nested Attention: a technique that uses a nested attention mechanism to improve the personalization performance of text-to-image models!
Graph Generative Pre-trained Transformer
·2643 words·13 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Graph Generation
🏢 Tufts University
G2PT: a new model that efficiently encodes graphs as sequences and trains on them with a Transformer, dramatically improving graph generation and prediction performance!
Dynamic Scaling of Unit Tests for Code Reward Modeling
·2368 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
A study proposing a way to raise the accuracy of code reward models by increasing the number of unit tests!
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
·1888 words·9 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Alibaba Group
CODEELO benchmark: evaluating the competitive code generation ability of LLMs with human-level Elo ratings
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery
·3521 words·17 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
BoxingGym: a benchmark for comprehensively evaluating the experimental design and model discovery abilities of LLM-based scientific agents
A3: Android Agent Arena for Mobile GUI Agents
·1920 words·10 mins·
AI Generated
🤗 Daily Papers
AI Applications
Human-AI Interaction
🏢 Hong Kong University of Science and Technology
Android Agent Arena (A3): a novel platform for dynamically evaluating the performance of AI agents in real mobile apps
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
·3211 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
The TAPE (conTextualized equivAriant Position Embedding) framework improves position-based addressing in transformers through dynamic positional encoding that makes use of contextual information.
Population Aware Diffusion for Time Series Generation
·2991 words·15 mins·
AI Generated
🤗 Daily Papers
Machine Learning
Deep Learning
🏢 William & Mary
Proposes PaD-TS, a new diffusion model for time series generation that preserves population-level properties
AutoPresent: Designing Structured Visuals from Scratch
·3831 words·18 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Carnegie Mellon University
AUTOPRESENT: automatically generating complete presentation slides from natural language instructions!
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining
·3272 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 College of Computer Science and Technology, Zhejiang University
Builds a high-quality multimodal textbook corpus from 2.5 years' worth of instructional videos, improving VLM pretraining performance
2024
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
·3245 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
VideoRefer Suite presents a new video LLM (VideoRefer) for fine-grained spatial-temporal object understanding, a large-scale high-quality dataset (VideoRefer-700K), and a comprehensive benchmark (VideoRefer-Bench).
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
·2638 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Texas at Austin
Overcoming the limits of structured state space models (SSMs), which model long-range dependencies in deep neural networks! The study identifies recency bias and over-smoothing in SSMs and proposes a polarization technique that addresses both, improving accuracy on long-range token correlations.
MLLM-as-a-Judge for Image Safety without Human Labeling
·5796 words·28 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Meta AI
Proposes a new zero-shot method that judges image safety with a pre-trained multimodal large language model (MLLM), using predefined safety rules and no human labeling.
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
·2196 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance Inc
VMix: improving text-to-image diffusion models through cross-attention mixing control
Training Software Engineering Agents and Verifiers with SWE-Gym
·3117 words·15 mins·
AI Generated
🤗 Daily Papers
AI Applications
Software Engineering
🏢 UC Berkeley
SWE-Gym: the first environment for training real-world software engineering agents
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
·2183 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Text Generation
🏢 Singapore University of Technology and Design
TANGOFLUX: ultra-fast, high-quality text-to-audio generation with few parameters
Slow Perception: Let's Perceive Geometric Figures Step-by-step
·3207 words·16 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Stepfun
Slow Perception: improving accuracy through step-by-step perception of geometric figures