Paper Reviews by AI
2024
Large Action Models: From Inception to Implementation
·2067 words·10 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft
From LLMs to LAMs: building AI agents that carry out real-world tasks.
Efficient Generative Modeling with Residual Vector Quantization-Based Tokens
·2277 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 NVIDIA Research
ResGen: an efficient RVQ-based generative model that achieves both high-quality generation and fast sampling.
Byte Latent Transformer: Patches Scale Better Than Tokens
·3839 words·19 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Washington
BLT: a byte-based LLM that favors patches over tokens.
BrushEdit: All-In-One Image Inpainting and Editing
·3188 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
BrushEdit: All-in-One Image Inpainting & Editing.
Apollo: An Exploration of Video Understanding in Large Multimodal Models
·1707 words·9 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Meta GenAI
Apollo: an in-depth exploration of video understanding in large multimodal models.
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
·3268 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Tsinghua University
SynerGen-VL: a powerful MLLM that handles image understanding and generation together with a simple architecture.
Phi-4 Technical Report
·2236 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
Phi-4: a 14-billion-parameter language model developed with a training recipe centered on data quality, which substantially improves its reasoning ability.
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation
·2344 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 University of Edinburgh
VMB proposes a novel, controllable framework for multimodal music generation that uses text and music bridges.
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
·3354 words·16 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Human-AI Interaction
🏢 Shanghai Artificial Intelligence Laboratory
InternLM-XComposer2.5-OmniLive: an innovative multimodal AI system that mimics human cognitive abilities for real-time streaming video and audio interaction.
InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption
·3493 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Video Understanding
🏢 Nanjing University
InstanceCap: improves text-to-video generation through instance-aware structured captions.
GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers
·7101 words·34 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Pennsylvania State University
GReaTer uses gradients over reasoning to optimize prompts for smaller language models, improving performance without relying on a large LLM.
GenEx: Generating an Explorable World
·2180 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Embodied AI
🏢 Johns Hopkins University
GenEx: generating an explorable 3D world from a single image.
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion
·1899 words·9 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Nanyang Technological University
Generate 8K images with FreeScale, no tuning required!
FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers
·2291 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Virginia Tech
TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning
·1434 words·7 mins·
AI Generated
🤗 Daily Papers
AI Applications
Robotics
🏢 Princeton University
TidyBot++: a low-cost holonomic mobile manipulator and mobile phone teleoperation interface, publicly released.
SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs
·2378 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Saudi Data & Artificial Intelligence Authority
Smaller language models reason better with fine-tuned training recipes.
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation
·3512 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Google
A new era of object compositing: get photorealistic results with ObjectMate, no tuning required.
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models
·3977 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Shanghai Artificial Intelligence Laboratory
Evaluation Agent: a faster, more flexible, and explainable evaluation framework for visual generative models.
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities
·2792 words·14 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Mohamed Bin Zayed University of Artificial Intelligence
BiMediX2: an Arabic-English bilingual medical expert LMM has launched!
NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks
·2943 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Alberta
NeuZip dynamically compresses neural network weights, achieving memory-efficient training and inference without performance loss, significantly reducing the memory footprint of large language models.