Computer Vision

TransPixar: Advancing Text-to-Video Generation with Transparency

6 January 2025·2013 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research

TransPixar: High-quality transparent video generation, even with limited data

Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation

6 January 2025·2799 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta

THROUGH-THE-MASK, a two-stage image-to-video generation framework that uses mask-based motion trajectories, enables accurate animation of multiple objects.

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

6 January 2025·3033 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University

STAR: A T2V-model-based real-world video super-resolution method that achieves realistic spatial detail and robust temporal consistency!

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

5 January 2025·2321 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong

GS-DiT: An innovative video generation model that uses efficient 3D point tracking to build pseudo 4D Gaussian fields, enabling 4D video control

DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

5 January 2025·2099 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Science and Technology of China (USTC)

DepthMaster uses a single-step diffusion model and exploits generative features to dramatically improve the accuracy and speed of monocular depth estimation.

Ingredients: Blending Custom Photos with Video Diffusion Transformers

3 January 2025·2088 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Kunlun Inc.

Introducing Ingredients, an innovative framework for high-quality multi-ID customized video generation!

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

2 January 2025·2466 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

VideoAnydoor: High-quality video object insertion with precise motion control

SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization

2 January 2025·3547 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Action Recognition 🏢 Unmanned System Research Institute, Northwestern Polytechnical University

SeFAR: A new semi-supervised learning framework that dramatically improves fine-grained action recognition, even with limited data!

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

2 January 2025·1984 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University

SeedVR: Improving generic video restoration with an infinite diffusion transformer

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

2 January 2025·2873 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Huazhong University of Science and Technology

VA-VAE resolves the optimization dilemma in high-dimensional latent spaces, achieving state-of-the-art performance in high-resolution image generation!

Nested Attention: Semantic-aware Attention Values for Concept Personalization

2 January 2025·1325 words·7 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tel Aviv University

Presents Nested Attention, a technique that uses a nested attention mechanism to improve the personalization performance of text-to-image models!

MLLM-as-a-Judge for Image Safety without Human Labeling

31 December 2024·5796 words·28 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Meta AI

Presents a new zero-shot method that judges image safety with a pre-trained multimodal large language model (MLLM) using predefined safety rules, without human labeling.

VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control

30 December 2024·2196 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance Inc

VMix: Improving text-to-image diffusion models through cross-attention mixing control

Slow Perception: Let's Perceive Geometric Figures Step-by-step

30 December 2024·3207 words·16 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stepfun

Slow Perception: Improving accuracy through step-by-step perception of geometric figures

LTX-Video: Realtime Video Latent Diffusion

30 December 2024·2625 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Lightricks

LTX-Video: An ultra-fast, real-time, high-resolution video generation model

Edicho: Consistent Image Editing in the Wild

30 December 2024·2213 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

Edicho: Zero-shot image editing that maintains consistency across images!

Bringing Objects to Life: 4D generation from 3D objects

29 December 2024·2224 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 NVIDIA

3to4D: Realistically animate user-provided 3D objects with text prompts!

On the Compositional Generalization of Multimodal LLMs for Medical Imaging

28 December 2024·4972 words·24 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Chinese University of Hong Kong, Shenzhen

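Shows that compositional generalization (CG) plays a key role in improving the generalization ability of multimodal large language models for medical imaging, and that it is effective even with limited data.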

VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models

27 December 2024·3812 words·18 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tencent AI Lab

VideoMaker: Zero-shot customized video generation using the inherent force of video diffusion models

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

24 December 2024·2572 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI

PartGen: Uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts from text, images, or existing 3D objects.