Computer Vision

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·2572 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI
PartGen: uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts from text, images, or existing 3D objects.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
·3181 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab
DiTCtrl: tuning-free generation of smooth, long videos from multiple prompts
DepthLab: From Partial to Complete
·1980 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU
DepthLab: recovering complete 3D visual information from partial depth information
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
·2837 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 AIRI
3DGraphLLM: state-of-the-art research that combines semantic graphs with large language models to dramatically improve 3D scene understanding!
Large Motion Video Autoencoding with Cross-modal Video VAE
·2098 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
An innovative cross-modal video VAE for high-quality video generation and efficient compression!
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3113 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tsinghua University
Proposes Distilled Decoding (DD), a technique that dramatically speeds up image autoregressive models through one-step sampling!
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
·2414 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Seoul National University
To address hallucination in hyper-detailed image captioning, this work proposes CapMAS, a multi-agent system built on LLM-MLLM collaboration, and improves both factuality and coverage.
MotiF: Making Text Count in Image Animation with Motion Focal Loss
·2819 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Brown University
MotiF: improving text-guided image animation with a motion-focused loss function
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
·3581 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 National University of Singapore
CLEAR: dramatically speeding up high-resolution image generation with linearized attention!
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
·2616 words·13 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ETH Zurich
Uses unsupervised Cycle Edit Consistency (CEC) to open a new horizon for instruction-based image editing!
Parallelized Autoregressive Visual Generation
·3557 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2184 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than a simple 3D trajectory input from the user!
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
·2450 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent
Presents IDOL, a model that generates high-quality, animatable 3D avatars from a single image at very high speed!
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
·1542 words·8 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG
DI-PCG is an innovative inverse procedural content generation method that uses a lightweight diffusion transformer to efficiently generate high-quality 3D assets from image conditions.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3112 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University
Affordance-Aware Object Insertion: a realistic image compositing technique that accounts for the interaction between background and foreground!
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
·4794 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stanford University
Introduces VSI-Bench, a new video-based benchmark that helps improve the visual-spatial intelligence of MLLMs!
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·3901 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Presents Prompt Depth Anything, a new paradigm for accurate 4K-resolution metric depth estimation using low-cost LiDAR prompts!
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
·3040 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Dept. ECE, University of Alberta
PixelMan is an innovative diffusion-based method that achieves consistent, training-free object editing in just 16 steps through pixel manipulation and generation.
FashionComposer: Compositional Fashion Image Generation
·2170 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong
FashionComposer: an innovative framework that synthesizes realistic fashion images from a variety of inputs (text, garment images, 3D models)!
Autoregressive Video Generation without Vector Quantization
·3553 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 BAAI
NOVA: an efficient and flexible autoregressive video generation model that works without vector quantization!