Video Understanding

TransPixar: Advancing Text-to-Video Generation with Transparency

6 January 2025·2013 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research

TransPixar: High-quality transparent video generation, even with limited data

STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution

6 January 2025·3033 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University

STAR: Real-world video super-resolution built on T2V models, achieving realistic spatial detail and robust temporal consistency!

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

5 January 2025·2321 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Multimedia Laboratory, the Chinese University of Hong Kong

GS-DiT: An innovative video generation model that uses pseudo-4D Gaussian fields, built through efficient 3D point tracking, to enable 4D video control

Ingredients: Blending Custom Photos with Video Diffusion Transformers

3 January 2025·2088 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Kunlun Inc.

Introducing Ingredients, an innovative framework for high-quality, multi-ID customized video generation!

VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control

2 January 2025·2466 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

VideoAnydoor: High-fidelity video object insertion with precise motion control

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

2 January 2025·1984 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanyang Technological University

SeedVR: Improving generic video restoration with a diffusion transformer that "seeds infinity"

LTX-Video: Realtime Video Latent Diffusion

30 December 2024·2625 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Lightricks

LTX-Video: An ultra-fast, real-time, high-resolution video generation model

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

24 December 2024·3181 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab

DiTCtrl: Seamless long-video generation from multiple prompts, without tuning

VidTwin: Video VAE with Decoupled Structure and Dynamics

23 December 2024·2381 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Peking University

VidTwin: An innovative video autoencoder that decouples structure and dynamics, setting a new standard for video compression and generation!

Large Motion Video Autoencoding with Cross-modal Video VAE

23 December 2024·2098 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

An innovative cross-modal video VAE for high-quality video generation and efficient compression!

MotiF: Making Text Count in Image Animation with Motion Focal Loss

20 December 2024·2819 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Brown University

MotiF: Improving text-driven image animation with a motion-focused loss function

AniDoc: Animation Creation Made Easier

18 December 2024·1844 words·9 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

AniDoc: An innovative AI model that uses sparse sketches and a reference image to automate colorization and in-betweening of 2D animation!

VidTok: A Versatile and Open-Source Video Tokenizer

17 December 2024·2469 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Microsoft Research

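VidTok: An open-source, high-performance video tokenizer that achieves state-of-the-art results in both continuous and discrete tokenization. Its efficient training strategy and novel quantization technique open new possibilities for video generation and understanding research.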

Move-in-2D: 2D-Conditioned Human Motion Generation

17 December 2024·1943 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research

Move-in-2D: Generating realistic human motion from a 2D image and a text prompt

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

13 December 2024·3571 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Princeton University

LinGen: Minute-length, high-resolution text-to-video generation that maximizes efficiency through linear computational complexity

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

12 December 2024·3493 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University

InstanceCap: Improving text-to-video generation with instance-aware structured captions.

Background-aware Moment Detection for Video Moment Retrieval

5 June 2023·2175 words·11 mins

AI Generated Computer Vision Video Understanding 🏢 Seoul National University

BM-DETR: Using background information to solve the weak-alignment problem in video moment retrieval!