Paper Reviews by AI

2024

PixelMan: Consistent Object Editing with Diffusion Models via Pixel Manipulation and Generation
·3040 words·15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Dept. ECE, University of Alberta
PixelMan is a training-free, diffusion-model-based method that achieves consistent object editing in just 16 steps through pixel manipulation and generation.
LLaVA-UHD v2: an MLLM Integrating High-Resolution Feature Pyramid via Hierarchical Window Transformer
·3363 words·16 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University
LLaVA-UHD v2 is a multimodal language model that uses a hierarchical window transformer to integrate a high-resolution feature pyramid and capture diverse visual details.
GUI Agents: A Survey
·207 words·1 min
AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 University of Maryland
A comprehensive analysis of recent trends in GUI agents based on large language models, presenting a unified framework that systematically categorizes benchmarks, evaluation metrics, architectures, and training methods.
FashionComposer: Compositional Fashion Image Generation
·2170 words·11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 University of Hong Kong
FashionComposer: a framework that synthesizes realistic fashion images from diverse inputs (text, garment images, 3D models).
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
·2500 words·12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
Improves multimodal model performance by enhancing image captions with visual specialist models.
Autoregressive Video Generation without Vector Quantization
·3553 words·17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 BAAI
NOVA: an efficient and flexible autoregressive video generation model that works without vector quantization.
AntiLeak-Bench: Preventing Data Contamination by Automatically Constructing Benchmarks with Updated Real-World Knowledge
·3149 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Nanyang Technological University
AntiLeak-Bench: preventing LLM data contamination through automated benchmark construction.
AniDoc: Animation Creation Made Easier
·1844 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology
AniDoc: an AI model that uses sparse sketches and reference images to automate colorization and interpolation for 2D animation.
VidTok: A Versatile and Open-Source Video Tokenizer
·2469 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Microsoft Research
VidTok: an open-source, high-performance video tokenizer that achieves state-of-the-art results in both continuous and discrete tokenization, opening new possibilities for video generation and understanding research through an efficient training strategy and a novel quantization technique.
Move-in-2D: 2D-Conditioned Human Motion Generation
·1943 words·10 mins
AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Adobe Research
Move-in-2D: generating realistic human motion from a 2D image and a text prompt.
Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
·4087 words·20 mins
AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Karlsruhe Institute of Technology
MoDE: a diffusion transformer policy that uses mixture-of-experts denoisers for efficient multitask learning.
DateLogicQA: Benchmarking Temporal Biases in Large Language Models
·2927 words·14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Aberdeen
DateLogicQA: a benchmark for temporal reasoning biases in LLMs. It analyzes biases at the tokenization, representation, and logic levels and suggests ways to improve the handling of temporal data.
ChatDiT: A Training-Free Baseline for Task-Agnostic Free-Form Chatting with Diffusion Transformers
·1484 words·7 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Tongyi Lab
ChatDiT: uses pretrained diffusion transformers in a zero-shot manner to solve a variety of visual tasks through natural language.
Wonderland: Navigating 3D Scenes from a Single Image
·2841 words·14 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 University of Toronto
An efficient, scalable framework that generates high-quality 3D scenes from a single image.
Whisper-GPT: A Hybrid Representation Audio Large Language Model
·1322 words·7 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Stanford University
Whisper-GPT: a hybrid LLM for speech and music that combines continuous audio with discrete tokens to deliver improved performance.
The Open Source Advantage in Large Language Models (LLMs)
·248 words·2 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Rollins College
Open-source LLMs offer greater transparency and accessibility than closed-source LLMs but lower performance. Hybrid strategies are the future.
StrandHead: Text to Strand-Disentangled 3D Head Avatars Using Hair Geometric Priors
·1741 words·9 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Nanjing University
StrandHead: generates realistic 3D head avatars, including finely detailed hairstyles, from text alone.
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
·3260 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
SPaR: self-play with tree-search refinement improves instruction-following in LLMs.
Sequence Matters: Harnessing Video Models in 3D Super-Resolution
·3903 words·19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Department of Electrical and Computer Engineering, Sungkyunkwan University
A 3D super-resolution technique built on video super-resolution models that achieves state-of-the-art performance without an alignment step.
SepLLM: Accelerate Large Language Models by Compressing One Segment into One Separator
·2998 words·15 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Huawei Noah's Ark Lab
SepLLM exploits the importance of special tokens to accelerate LLM inference and process long sequences efficiently.