Paper Reviews by AI

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

18 December 2024·3040 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Dept. ECE, University of Alberta

PixelMan is an innovative, training-free method based on diffusion models that achieves consistent object editing in just 16 steps through pixel manipulation and generation.

LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer

18 December 2024·3363 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University

LLaVA-UHD v2 is an innovative multimodal language model that uses a hierarchical window transformer to integrate a high-resolution feature pyramid and capture diverse visual details.

GUI Agents: A Survey

18 December 2024·207 words·1 min

AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 University of Maryland

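A comprehensive analysis of recent trends in GUI agents built on large language models, presenting a unified framework that systematically categorizes benchmarks, evaluation metrics, architectures, and training methods.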

FashionComposer: Compositional Fashion Image Generation

18 December 2024·2170 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

FashionComposer: an innovative framework that synthesizes realistic fashion images from diverse inputs (text, garment images, 3D models)!

Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception

18 December 2024·2500 words·12 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology

Improving multimodal model performance by enhancing image captions with visual specialist models

Autoregressive Video Generation without Vector Quantization

18 December 2024·3553 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 BAAI

NOVA: an efficient and flexible autoregressive video generation model that works without vector quantization!

AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge

18 December 2024·3149 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Nanyang Technological University

AntiLeak-Bench: preventing LLM data contamination through automated benchmarking

AniDoc: Animation Creation Made Easier

18 December 2024·1844 words·9 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

AniDoc: an innovative AI model that uses sparse sketches and reference images to automate colorization and interpolation for 2D animation!

VidTok: A Versatile and Open-Source Video Tokenizer

17 December 2024·2469 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Microsoft Research

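VidTok: an open-source, high-performance video tokenizer that achieves state-of-the-art results in both continuous and discrete tokenization. Its efficient training strategy and innovative quantization technique open new possibilities for research on video generation and understanding.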

Move-in-2D: 2D-Conditioned Human Motion Generation

17 December 2024·1943 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research

Move-in-2D: generating realistic human motion from a 2D image and a text prompt

Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

17 December 2024·4087 words·20 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Karlsruhe Institute of Technology

MoDE: a diffusion transformer policy that uses mixture-of-experts denoisers for efficient multitask learning

DateLogicQA: Benchmarking Temporal Biases in Large Language Models

17 December 2024·2927 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Aberdeen

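DateLogicQA: a benchmark for temporal reasoning biases in LLMs! It analyzes biases at the tokenization, representation, and logic levels and suggests ways to improve the handling of temporal data!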

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

17 December 2024·1484 words·7 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tongyi Lab

ChatDiT: solving diverse visual tasks from natural language by using pretrained diffusion transformers in a zero-shot manner!

Wonderland: Navigating 3D Scenes from a Single Image

16 December 2024·2841 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Toronto

An efficient and scalable framework for generating high-quality 3D scenes from a single image

Whisper-GPT: A Hybrid Representation Audio Large Language Model

16 December 2024·1322 words·7 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University

Whisper-GPT: a hybrid LLM for speech and music that combines continuous audio with discrete tokens to deliver improved performance.

The Open Source Advantage in Large Language Models (LLMs)

16 December 2024·248 words·2 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Rollins College

Open-source LLMs offer greater transparency and accessibility than closed LLMs, but lower performance. Hybrid strategies are the future.

StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors

16 December 2024·1741 words·9 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanjing University

StrandHead: generates realistic 3D head avatars, down to finely detailed hairstyles, from text alone.

SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models

16 December 2024·3260 words·16 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University

Self-play with refinement boosts instruction-following in LLMs.

Sequence Matters: Harnessing Video Models in 3D Super-Resolution

16 December 2024·3903 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Department of Electrical and Computer Engineering, Sungkyunkwan University

An innovative 3D super-resolution technique that uses video super-resolution models to achieve state-of-the-art performance without an alignment step!

SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator

16 December 2024·2998 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab

SepLLM exploits the importance of special tokens to accelerate LLM inference and process long sequences efficiently.