Image Generation
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation
·2799 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Meta
THROUGH-THE-MASK, a two-stage image-to-video generation framework built on mask-based motion trajectories, enables accurate animation of multiple objects.
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
·2873 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Huazhong University of Science and Technology
VA-VAE resolves the optimization dilemma in high-dimensional latent spaces and achieves state-of-the-art performance in high-resolution image generation!
Nested Attention: Semantic-aware Attention Values for Concept Personalization
·1325 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tel Aviv University
Proposes Nested Attention, a technique that uses a nested attention mechanism to improve the personalization performance of text-to-image models!
MLLM-as-a-Judge for Image Safety without Human Labeling
·5796 words·28 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Meta AI
Presents a new zero-shot method that judges image safety with a pre-trained multimodal large language model (MLLM), using predefined safety rules and no human labeling.
VMix: Improving Text-to-Image Diffusion Model with Cross-Attention Mixing Control
·2196 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance Inc
VMix: improving text-to-image diffusion models through cross-attention mixing control
Edicho: Consistent Image Editing in the Wild
·2213 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
Edicho: zero-shot image editing that keeps results consistent across images!
Bringing Objects to Life: 4D generation from 3D objects
·2224 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 NVIDIA
3to4D: realistically animates user-provided 3D objects from text prompts!
VideoMaker: Zero-shot Customized Video Generation with the Inherent Force of Video Diffusion Models
·3812 words·18 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tencent AI Lab
VideoMaker: zero-shot customized video generation using the inherent force of video diffusion models
1.58-bit FLUX
·1092 words·6 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ByteDance
1.58-bit FLUX: quantizes 99.5% of the parameters to 1.58 bits, cutting model size by 7.7x and inference memory by 5.1x while maintaining high-quality image generation!
Distilled Decoding 1: One-step Sampling of Image Auto-regressive Models with Flow Matching
·3113 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tsinghua University
Proposes Distilled Decoding (DD), a technique that dramatically speeds up image autoregressive models through one-step sampling!
CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up
·3581 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 National University of Singapore
CLEAR: dramatically speeds up high-resolution image generation with linearized attention!
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency
·2616 words·13 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 ETH Zurich
Uses unsupervised Cycle Edit Consistency (CEC) to break new ground in instruction-based image editing!
Parallelized Autoregressive Visual Generation
·3557 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2184 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than a simple user-supplied 3D trajectory!
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3112 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion: a realistic image compositing technique that accounts for the interaction between background and foreground!
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
·3040 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Dept. ECE, University of Alberta
PixelMan is an innovative diffusion-model-based method that achieves consistent object editing in just 16 steps, without training, through pixel manipulation and generation.
FashionComposer: Compositional Fashion Image Generation
·2170 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
FashionComposer: an innovative framework that synthesizes realistic fashion images from diverse inputs (text, garment images, 3D models)!
Autoregressive Video Generation without Vector Quantization
·3553 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 BAAI
Develops NOVA, an efficient and flexible autoregressive video generation model that needs no vector quantization!
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1484 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
ChatDiT: uses pre-trained diffusion transformers in a zero-shot manner to solve diverse visual tasks through natural language!
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
·3489 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Inha University
Real-time image protection: a safeguard against deepfakes.