Image Generation

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

6 January 2025·2799 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

THROUGH-THE-MASK, a two-stage image-to-video generation framework built on mask-based motion trajectories, enables accurate animation of multiple objects.

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

2 January 2025·2873 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Huazhong University of Science and Technology

VA-VAE resolves the optimization dilemma in high-dimensional latent spaces and achieves state-of-the-art performance in high-resolution image generation!

Nested Attention: Semantic-aware Attention Values for Concept Personalization

2 January 2025·1325 words·7 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tel Aviv University

Presents Nested Attention, a technique that uses a nested attention mechanism to improve the personalization performance of text-to-image models!

MLLM-as-a-Judge for Image Safety without Human Labeling

31 December 2024·5796 words·28 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta AI

Presents a new zero-shot method that judges image safety with a pre-trained multimodal large language model (MLLM), using predefined safety rules and no human labeling.

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

30 December 2024·2196 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc

VMix: improving text-to-image diffusion models through cross-attention mixing control

Edicho: Consistent Image Editing in the Wild

30 December 2024·2213 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

Edicho: zero-shot image editing that stays consistent across images!

Bringing Objects to Life: 4D generation from 3D objects

29 December 2024·2224 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NVIDIA

3to4D: realistically animates user-provided 3D objects from text prompts!

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

27 December 2024·3812 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

VideoMaker: zero-shot customized video generation using the inherent force of video diffusion models

1.58-bit FLUX

24 December 2024·1092 words·6 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

1.58-bit FLUX: quantizes 99.5% of parameters to 1.58 bits, cutting model size by 7.7x and inference memory by 5.1x while maintaining high-quality image generation!

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

22 December 2024·3113 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

Proposes Distilled Decoding (DD), a technique that dramatically speeds up image autoregressive models through one-step sampling!

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

20 December 2024·3581 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore

CLEAR: dramatically speeding up high-resolution image generation with linearized attention!

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

19 December 2024·2616 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ETH Zurich

Breaks new ground in instruction-based image editing by using unsupervised Cycle Edit Consistency (CEC)!

Parallelized Autoregressive Visual Generation

19 December 2024·3557 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

19 December 2024·2184 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

LeviTor: an innovative model that synthesizes realistic video from nothing more than simple user-supplied 3D trajectory input!

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

19 December 2024·3112 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University

Affordance-Aware Object Insertion: a realistic image compositing technique that accounts for the interaction between background and foreground!

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

18 December 2024·3040 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Dept. ECE, University of Alberta

PixelMan is an innovative diffusion-model-based method that achieves consistent object editing in just 16 steps, without training, through pixel manipulation and generation.

FashionComposer: Compositional Fashion Image Generation

18 December 2024·2170 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong

FashionComposer: an innovative framework that synthesizes realistic fashion images from diverse inputs (text, garment images, 3D models)!

Autoregressive Video Generation without Vector Quantization

18 December 2024·3553 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 BAAI

Introduces NOVA, an efficient and flexible autoregressive video generation model that works without vector quantization!

ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers

17 December 2024·1484 words·7 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tongyi Lab

ChatDiT: uses pre-trained diffusion transformers in a zero-shot manner to solve diverse visual tasks through natural language!

Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models

16 December 2024·3489 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Inha University

Real-time image protection, a countermeasure against deepfakes.