
Paper Reviews by AI

2024

Progressive Multimodal Reasoning via Active Retrieval
2635 words · 13 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Reasoning 🏢 Gaoling School of Artificial Intelligence, Renmin University of China
AR-MCTS: Improving multimodal reasoning with active retrieval and Monte Carlo tree search
Parallelized Autoregressive Visual Generation
3557 words · 17 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Peking University
This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.
Outcome-Refining Process Supervision for Code Generation
2498 words · 12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Peking University
Presents Outcome-Refining Process Supervision (ORPS), a new methodology that overcomes the limitations of existing approaches on code generation tasks requiring complex algorithmic reasoning.
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
2237 words · 11 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Microsoft Research
MixLLM: A quantization method that improves both the accuracy and the efficiency of LLMs through global mixed-precision quantization across output features and a highly efficient system design.
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
2165 words · 11 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Hong Kong University of Science and Technology
MegaPairs uses VLMs and open-domain images to generate more than 26 million high-quality multimodal training samples, substantially improving universal multimodal retrieval performance.
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
7524 words · 36 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 TU Darmstadt
M-ALERT is a new benchmark for evaluating the safety of multilingual LLMs. It contains 75,000 prompts across five languages (English, French, German, Italian, and Spanish) and reveals inconsistencies in LLM safety across languages and categories.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
2184 words · 11 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Hong Kong University of Science and Technology
LeviTor: A model that synthesizes realistic video from nothing more than a simple user-supplied 3D trajectory.
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
2450 words · 12 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent
Presents IDOL, a model that generates high-quality, animatable 3D avatars from a single image at very high speed.
How to Synthesize Text Data without Model Collapse?
5005 words · 24 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Tsinghua University
Addresses model collapse when training language models on synthetic data by proposing a token editing technique.
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
2904 words · 14 mins
AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 GenAI, Meta
CrossFlow: A framework that enables direct transformation between modalities.
Fietje: An open, efficient LLM for Dutch
2556 words · 12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 KU Leuven
Fietje: An open-source, small Dutch LLM is released.
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
1542 words · 8 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Tencent PCG
DI-PCG is an inverse procedural content generation method that uses a lightweight diffusion transformer model to efficiently generate high-quality 3D assets from image conditions.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
2525 words · 12 mins
AI Generated 🤗 Daily Papers Multimodal Learning Multimodal Generation 🏢 Snap Inc
AV-Link: An advance in cross-modal audio-video generation through temporally aligned diffusion features.
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
3112 words · 15 mins
AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 Harvard University
Affordance-Aware Object Insertion: A realistic image compositing technique that accounts for the interaction between background and foreground.
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
2682 words · 13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 NVIDIA Research
AceMath is a series of frontier-class models that achieves state-of-the-art math reasoning through post-training and reward modeling.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
4794 words · 23 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stanford University
Introduces VSI-Bench, a new video-based benchmark intended to help improve the visual-spatial intelligence of MLLMs.
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
2422 words · 12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Carnegie Mellon University
TheAgentCompany is a benchmark that simulates a real software company environment to evaluate how well LLM agents carry out real work tasks, showing both the real-world applicability and the limits of AI agents.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
2449 words · 12 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Answer.AI
ModernBERT: A state-of-the-art bidirectional encoder for fast, memory-efficient long-context fine-tuning and inference.
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
2978 words · 14 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Chinese Academy of Sciences
RAG-RewardBench: Presents the first benchmark for evaluating reward models in RAG settings.
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
3901 words · 19 mins
AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University
Presents Prompt Depth Anything, a new paradigm for accurate 4K-resolution metric depth estimation using low-cost LiDAR prompts.