Computer Vision

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

24 December 2024·2368 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Introduces Orient Anything, a model that substantially improves the accuracy of object orientation estimation from a single image.

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

24 December 2024·3181 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab

DiTCtrl: Tuning-free generation of smooth, longer videos from multiple prompts.

DepthLab: From Partial to Complete

24 December 2024·1980 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU

DepthLab: Recovering complete 3D visual information from partial depth information.

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

24 December 2024·2837 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 AIRI

3DGraphLLM: Combines semantic graphs with large language models to markedly improve 3D scene understanding.

1.58-bit FLUX

24 December 2024·1092 words·6 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

1.58-bit FLUX: Quantizes 99.5% of parameters to 1.58 bits, reducing model size by 7.7x and inference memory by 5.1x while maintaining high-quality image generation.

VidTwin: Video VAE with Decoupled Structure and Dynamics

23 December 2024·2381 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Peking University

VidTwin: A video autoencoder that decouples structure and dynamics, aiming to set a new standard for video compression and generation.

Large Motion Video Autoencoding with Cross-modal Video VAE

23 December 2024·2098 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

A cross-modal video VAE for high-quality video generation and efficient compression.

Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching

22 December 2024·3113 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University

Proposes Distilled Decoding (DD), a technique that dramatically speeds up image autoregressive models through one-step sampling.

Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

20 December 2024·2414 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Seoul National University

To address hallucination in hyper-detailed image captioning, this work proposes CapMAS, a multiagent system based on LLM-MLLM collaboration that improves factuality and coverage.

MotiF: Making Text Count in Image Animation with Motion Focal Loss

20 December 2024·2819 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Brown University

MotiF: Improving text-guided image animation with a motion-focused loss function.

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

20 December 2024·3581 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore

CLEAR: Linearized attention dramatically speeds up high-resolution image generation.

UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency

19 December 2024·2616 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ETH Zurich

Uses unsupervised Cycle Edit Consistency (CEC) to break new ground in instruction-based image editing.

Parallelized Autoregressive Visual Generation

19 December 2024·3557 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

19 December 2024·2184 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology

LeviTor: A model that synthesizes realistic video from nothing more than a simple user-provided 3D trajectory.

IDOL: Instant Photorealistic 3D Human Creation from a Single Image

19 December 2024·2450 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent

Introduces IDOL, a model that generates high-quality, animatable 3D avatars from a single image at very high speed.

DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation

19 December 2024·1542 words·8 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG

DI-PCG is an inverse procedural content generation method that uses a lightweight diffusion transformer to efficiently generate high-quality 3D assets from image conditions.

Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion

19 December 2024·3112 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University

Affordance-Aware Object Insertion: Realistic image compositing that accounts for the interaction between background and foreground.

Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces

18 December 2024·4794 words·23 mins

AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stanford University

Presents VSI-Bench, a new video-based benchmark that helps improve the visual-spatial intelligence of MLLMs.

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

18 December 2024·3901 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Introduces Prompt Depth Anything, a new paradigm for accurate metric depth estimation at 4K resolution using low-cost LiDAR prompts.

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation

18 December 2024·3040 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Dept. ECE, University of Alberta

PixelMan is a diffusion-model-based method that achieves consistent object editing in 16 steps, without training, through pixel manipulation and generation.