Paper Reviews by AI
2024
PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
·3040 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Dept. ECE, University of Alberta
PixelMan is an innovative diffusion-model-based method that achieves consistent object editing in just 16 steps, without training, through pixel manipulation and generation.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
·3363 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
LLaVA-UHD v2 is an innovative multimodal language model that uses a hierarchical window transformer to integrate a high-resolution feature pyramid and capture diverse visual details.
GUI Agents: A Survey
·207 words·1 min·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 University of Maryland
Comprehensively analyzes the latest trends in large-language-model-based GUI agents and presents a unified framework that systematically categorizes benchmarks, evaluation metrics, architectures, and training methods.
FashionComposer: Compositional Fashion Image Generation
·2170 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 University of Hong Kong
FashionComposer: an innovative framework that synthesizes realistic fashion images from diverse inputs (text, garment images, 3D models).
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
·2500 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
Improves multimodal model performance by enhancing image captions with visual specialist models.
Autoregressive Video Generation without Vector Quantization
·3553 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 BAAI
NOVA: an efficient and flexible autoregressive video generation model that works without vector quantization.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·3149 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Nanyang Technological University
AntiLeak-Bench: preventing LLM data contamination through automated benchmarking.
AniDoc: Animation Creation Made Easier
·1844 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Hong Kong University of Science and Technology
AniDoc: an innovative AI model that uses sparse sketches and reference images to automate colorization and interpolation for 2D animation.
VidTok: A Versatile and Open-Source Video Tokenizer
·2469 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Microsoft Research
VidTok: an open-source, high-performance video tokenizer that achieves state-of-the-art results in both continuous and discrete tokenization, opening new possibilities for video generation and understanding research through an efficient training strategy and a novel quantization technique.
Move-in-2D: 2D-Conditioned Human Motion Generation
·1943 words·10 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Adobe Research
Move-in-2D: generating realistic human motion from a 2D image and a text prompt.
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
·4087 words·20 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Karlsruhe Institute of Technology
MoDE: a diffusion transformer policy that uses mixture-of-experts denoisers for efficient multitask learning.
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
·2927 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Aberdeen
DateLogicQA: a benchmark for temporal reasoning biases in LLMs. It analyzes biases at the tokenization, representation, and logic levels and suggests ways to improve temporal data handling.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1484 words·7 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Tongyi Lab
ChatDiT: solves a variety of visual tasks from natural-language instructions, zero-shot, using pretrained diffusion transformers.
Wonderland: Navigating 3D Scenes from a Single Image
·2841 words·14 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 University of Toronto
An efficient, scalable framework for generating high-quality 3D scenes from a single image.
Whisper-GPT: A Hybrid Representation Audio Large Language Model
·1322 words·7 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
Whisper-GPT: a hybrid speech and music LLM that combines continuous audio with discrete tokens for improved performance.
The Open Source Advantage in Large Language Models (LLMs)
·248 words·2 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Rollins College
Open-source LLMs offer greater transparency and accessibility than closed-source LLMs, but lower performance. Hybrid strategies are the future.
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·1741 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Nanjing University
StrandHead: generates realistic 3D head avatars, down to fine hairstyle detail, from text alone.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3260 words·16 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Self-play with refinement boosts instruction-following in LLMs.
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·3903 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Department of Electrical and Computer Engineering, Sungkyunkwan University
An innovative 3D super-resolution technique built on video super-resolution models that achieves state-of-the-art performance without an alignment step.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·2998 words·15 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Huawei Noah's Ark Lab
SepLLM exploits the importance of particular tokens to accelerate LLM inference and process long sequences efficiently.