Computer Vision
PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models
·2572 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Meta AI
PartGen: Uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts from text, images, or existing 3D objects.
DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation
·3181 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Tencent AI Lab
DiTCtrl: Tuning-free generation of smooth, long videos from multiple prompts.
DepthLab: From Partial to Complete
·1980 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 HKU
DepthLab: Restores complete 3D visual information from partial depth information.
3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding
·2837 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Scene Understanding
🏢 AIRI
3DGraphLLM: Combines semantic graphs with large language models to substantially improve 3D scene understanding!
Large Motion Video Autoencoding with Cross-modal Video VAE
·2098 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
A cross-modal video VAE for high-quality video generation and efficient compression!
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3113 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
Proposes Distilled Decoding (DD), which dramatically speeds up image autoregressive models through one-step sampling!
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
·2414 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Seoul National University
To address hallucination in hyper-detailed image captioning, this work proposes CapMAS, a multi-agent system built on LLM-MLLM collaboration that improves both factuality and coverage.
MotiF: Making Text Count in Image Animation with Motion Focal Loss
·2819 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Brown University
MotiF: Improves text-guided image animation with a loss function that focuses on motion.
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
·3581 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
CLEAR: Linearized attention dramatically speeds up high-resolution image generation!
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
·2616 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
Uses unsupervised learning based on Cycle Edit Consistency (CEC) to open a new frontier in instruction-based image editing!
Parallelized Autoregressive Visual Generation
·3557 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
This work speeds up autoregressive visual generation by up to 9.5× through a parallelization strategy that accounts for token dependencies.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2184 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LeviTor: Synthesizes realistic videos from nothing more than simple 3D trajectory input from the user!
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
·2450 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent
Proposes IDOL, a model that generates high-quality, animatable 3D avatars from a single image at very high speed!
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
·1542 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent PCG
DI-PCG is an inverse procedural content generation method that uses a lightweight diffusion transformer to efficiently generate high-quality 3D assets from image conditions.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3112 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion: A realistic image compositing technique that accounts for the interaction between foreground and background!
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
·4794 words·23 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Stanford University
Introduces VSI-Bench, a new video-based benchmark for improving the visual-spatial intelligence of MLLMs!
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·3901 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Proposes Prompt Depth Anything, a new paradigm for accurate 4K-resolution metric depth estimation that uses low-cost LiDAR as a prompt!
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
·3040 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Dept. ECE, University of Alberta
PixelMan is a diffusion-model-based method that achieves consistent, training-free object editing in just 16 steps through pixel manipulation and generation.
FashionComposer: Compositional Fashion Image Generation
·2170 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
FashionComposer: A framework that composes realistic fashion images from diverse inputs (text, garment images, 3D models)!
Autoregressive Video Generation without Vector Quantization
·3553 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 BAAI
Introduces NOVA, an efficient and flexible autoregressive video generation model that works without vector quantization!