Paper Reviews by AI

Large Action Models: From Inception to Implementation

13 December 2024·2067 words·10 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft

From LLMs to LAMs: building AI agents that carry out real-world tasks.

Efficient Generative Modeling with Residual Vector Quantization-Based Tokens

13 December 2024·2277 words·11 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 NVIDIA Research

ResGen: an efficient RVQ-based generative model that achieves both high-quality generation and fast sampling.

Byte Latent Transformer: Patches Scale Better Than Tokens

13 December 2024·3839 words·19 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Washington

BLT: a byte-level LLM that puts patches ahead of tokens.

BrushEdit: All-In-One Image Inpainting and Editing

13 December 2024·3188 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University

BrushEdit: All-in-One Image Inpainting & Editing.

Apollo: An Exploration of Video Understanding in Large Multimodal Models

13 December 2024·1707 words·9 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Meta GenAI

Apollo: an in-depth exploration of video understanding in large multimodal models.

SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding

12 December 2024·3268 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University

SynerGen-VL: a powerful MLLM with a simple architecture that handles both image understanding and image generation.

Phi-4 Technical Report

12 December 2024·2236 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research

Phi-4: a 14-billion-parameter language model developed with a training recipe centered on data quality, which substantially improves its reasoning ability.

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

12 December 2024·2344 words·12 mins

AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 University of Edinburgh

VMB presents a new, controllable framework for multimodal music generation that uses text and music bridges.

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

12 December 2024·3354 words·16 mins

AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Artificial Intelligence Laboratory

InternLM-XComposer2.5-OmniLive: an innovative multimodal AI system that mimics human cognition for real-time streaming video and audio interaction.

InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption

12 December 2024·3493 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Nanjing University

InstanceCap: improving text-to-video generation with instance-aware structured captions.

GReaTer: Gradients over Reasoning Makes Smaller Language Models Strong Prompt Optimizers

12 December 2024·7101 words·34 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Pennsylvania State University

GReaTer uses gradients over reasoning to optimize prompts for small language models, improving their performance without relying on a large LLM.

GenEx: Generating an Explorable World

12 December 2024·2180 words·11 mins

AI Generated 🤗 Daily Papers Multimodal Learning Embodied AI 🏢 Johns Hopkins University

GenEx: generating an explorable 3D world from a single image.

FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion

12 December 2024·1899 words·9 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Nanyang Technological University

Generate 8K images without any tuning using FreeScale!

FluxSpace: Disentangled Semantic Editing in Rectified Flow Transformers

12 December 2024·2291 words·11 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Virginia Tech

TidyBot++: An Open-Source Holonomic Mobile Manipulator for Robot Learning

11 December 2024·1434 words·7 mins

AI Generated 🤗 Daily Papers AI Applications Robotics 🏢 Princeton University

TidyBot++: a low-cost holonomic mobile manipulator and a phone-based teleoperation interface, released publicly.

SmolTulu: Higher Learning Rate to Batch Size Ratios Can Lead to Better Reasoning in SLMs

11 December 2024·2378 words·12 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Saudi Data & Artificial Intelligence Authority

Smaller language models can reason better when trained with a higher learning-rate-to-batch-size ratio.

ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation

11 December 2024·3512 words·17 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Google

A new era of object composition: get realistic results without tuning using ObjectMate.

Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models

10 December 2024·3977 words·19 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Shanghai Artificial Intelligence Laboratory

Evaluation Agent: a faster, more flexible, and explainable framework for evaluating visual generative models.

BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities

10 December 2024·2792 words·14 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Mohamed Bin Zayed University of Artificial Intelligence

BiMediX2: an Arabic-English bilingual medical expert LMM, now released!

NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks

28 October 2024·2943 words·14 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Alberta

NeuZip dynamically compresses neural network weights, enabling memory-efficient training and inference without performance loss and significantly reducing the memory footprint of large language models.