
Paper Reviews by AI

2025

SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

2 January 2025·3547 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Action Recognition 🏢 Unmanned System Research Institute, Northwestern Polytechnical University

SeFAR: a new semi-supervised learning framework that significantly improves fine-grained action recognition even with limited data.

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

2 January 2025·1984 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University

SeedVR: improving generic video restoration by seeding infinity in a diffusion transformer.

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

2 January 2025·2873 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Huazhong University of Science and Technology

VA-VAE resolves the optimization dilemma in high-dimensional latent spaces and achieves state-of-the-art performance in high-resolution image generation.

Nested Attention: Semantic-aware Attention Values for Concept Personalization

2 January 2025·1325 words·7 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tel Aviv University

Presents Nested Attention, a technique that uses a nested attention mechanism to improve personalization in text-to-image models.

Graph Generative Pre-trained Transformer

2 January 2025·2643 words·13 mins

AI Generated 🤗 Daily Papers Machine Learning Graph Generation 🏢 Tufts University

G2PT: a new model that efficiently encodes graphs as sequences and trains a Transformer on them, significantly improving graph generation and prediction performance.

Dynamic Scaling of Unit Tests for Code Reward Modeling

2 January 2025·2368 words·12 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

A study showing how to improve the accuracy of code reward models by scaling up the number of unit tests.

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

2 January 2025·1888 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Alibaba Group

The CODEELO benchmark: evaluating the competition-level code generation ability of LLMs with human-comparable Elo ratings.

BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery

2 January 2025·3521 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

BoxingGym: a comprehensive benchmark for evaluating the experimental design and model discovery abilities of LLM-based scientific agents.

A3: Android Agent Arena for Mobile GUI Agents

2 January 2025·1920 words·10 mins

AI Generated 🤗 Daily Papers AI Applications Human-AI Interaction 🏢 Hong Kong University of Science and Technology

Android Agent Arena (A3): a novel platform for dynamically evaluating AI agents on real-world mobile apps.

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding

1 January 2025·3211 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin

The TAPE (conTextualized equivAriant Position Embedding) framework improves position-based addressing in transformers through dynamic positional encoding that makes use of contextual information.

Population Aware Diffusion for Time Series Generation

1 January 2025·2991 words·15 mins

AI Generated 🤗 Daily Papers Machine Learning Deep Learning 🏢 William & Mary

Proposes PaD-TS, a new diffusion model for time series generation that preserves population-level properties.

AutoPresent: Designing Structured Visuals from Scratch

1 January 2025·3831 words·18 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Carnegie Mellon University

AUTOPRESENT: automatically generating presentation slides from natural-language instructions.

2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining

1 January 2025·3272 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 College of Computer Science and Technology, Zhejiang University

Builds a high-quality multimodal textbook corpus from 2.5 years' worth of instructional videos and improves VLM pretraining performance.

2024

VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM

31 December 2024·3245 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Zhejiang University

VideoRefer Suite presents a new video LLM (VideoRefer) for fine-grained spatial-temporal object understanding, a large-scale, high-quality dataset (VideoRefer-700K), and a comprehensive benchmark (VideoRefer-Bench).

Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

31 December 2024·2638 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin

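Addresses the limitations of structured state space models (SSMs), which model long-range dependencies in deep neural networks. The study identifies recency bias and over-smoothing in SSMs and proposes a **polarization** technique to mitigate them, improving accuracy on long-range token correlations.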

MLLM-as-a-Judge for Image Safety without Human Labeling

31 December 2024·5796 words·28 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta AI

Presents a new zero-shot method that judges image safety with a pre-trained multimodal large language model (MLLM) using predefined safety rules, without human labeling.

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

30 December 2024·2196 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc

VMix: improving text-to-image diffusion models through cross-attention mixing control.

Training Software Engineering Agents and Verifiers with SWE-Gym

30 December 2024·3117 words·15 mins

AI Generated 🤗 Daily Papers AI Applications Software Engineering 🏢 UC Berkeley

SWE-Gym: the first environment for training real-world software engineering agents.

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

30 December 2024·2183 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Text Generation 🏢 Singapore University of Technology and Design

TANGOFLUX: very fast, high-quality text-to-audio generation with few parameters.

Slow Perception: Let's Perceive Geometric Figures Step-by-step

30 December 2024·3207 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stepfun

Slow Perception: improving accuracy by perceiving geometric figures step by step.