Paper Reviews by AI
2024
Progressive Multimodal Reasoning via Active Retrieval
·2635 words·13 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Reasoning
🏢 Gaoling School of Artificial Intelligence, Renmin University of China
AR-MCTS: improving multimodal reasoning with active retrieval and Monte Carlo tree search
Parallelized Autoregressive Visual Generation
·3557 words·17 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Peking University
This work speeds up autoregressive visual generation by up to 9.5x through a parallelization strategy that accounts for token dependencies.
Outcome-Refining Process Supervision for Code Generation
·2498 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Peking University
Proposes Outcome-Refining Process Supervision (ORPS), a new method that overcomes existing limitations in code generation tasks requiring complex algorithmic reasoning
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
·2237 words·11 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Microsoft Research
MixLLM: a quantization method that improves both the accuracy and the efficiency of LLMs through global mixed-precision quantization across output features and a highly efficient system design
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval
·2165 words·11 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Hong Kong University of Science and Technology
MegaPairs uses VLMs and open-domain images to synthesize more than 26 million high-quality multimodal training examples, substantially improving universal multimodal retrieval performance.
LLMs Lost in Translation: M-ALERT uncovers Cross-Linguistic Safety Gaps
·7524 words·36 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 TU Darmstadt
M-ALERT is a new benchmark for evaluating the safety of multilingual LLMs. It contains 75,000 prompts across five languages (English, French, German, Italian, and Spanish) and reveals safety inconsistencies in LLMs across languages and categories.
LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis
·2184 words·11 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Hong Kong University of Science and Technology
LeviTor: an innovative model that synthesizes realistic video from nothing more than a simple user-supplied 3D trajectory!
IDOL: Instant Photorealistic 3D Human Creation from a Single Image
·2450 words·12 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent
Introduces IDOL, a model that generates fast, high-quality, animatable 3D avatars from a single image!
How to Synthesize Text Data without Model Collapse?
·5005 words·24 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Tsinghua University
Addressing model collapse when training language models on synthetic data: a token editing technique!
Flowing from Words to Pixels: A Framework for Cross-Modality Evolution
·2904 words·14 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 GenAI, Meta
CrossFlow: an innovative framework that maps directly from one modality to another!
Fietje: An open, efficient LLM for Dutch
·2556 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 KU Leuven
Fietje: an open-source, small Dutch LLM released!
DI-PCG: Diffusion-based Efficient Inverse Procedural Content Generation for High-quality 3D Asset Creation
·1542 words·8 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Tencent PCG
DI-PCG is an inverse procedural content generation method that uses a lightweight diffusion transformer to efficiently produce high-quality 3D assets from image conditions.
AV-Link: Temporally-Aligned Diffusion Features for Cross-Modal Audio-Video Generation
·2525 words·12 mins·
AI Generated
🤗 Daily Papers
Multimodal Learning
Multimodal Generation
🏢 Snap Inc
AV-Link: a major advance in cross-modal audio-video generation using temporally aligned diffusion features!
Affordance-Aware Object Insertion via Mask-Aware Dual Diffusion
·3112 words·15 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Image Generation
🏢 Harvard University
Affordance-Aware Object Insertion: realistic image compositing that accounts for the interaction between background and foreground!
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
·2682 words·13 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 NVIDIA Research
AceMath is a series of frontier-class models that achieve state-of-the-art math reasoning through post-training and reward modeling.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
·4794 words·23 mins·
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Stanford University
Presents VSI-Bench, a new video-based benchmark that helps advance the visual-spatial intelligence of MLLMs!
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
·2422 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Carnegie Mellon University
The TheAgentCompany benchmark simulates a real software company environment to evaluate how well LLM agents perform real work tasks, showing both the real-world applicability and the limits of AI agents.
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
·2449 words·12 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Answer.AI
ModernBERT: a state-of-the-art bidirectional encoder for fast, memory-efficient, long-context fine-tuning and inference!
RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment
·2978 words·14 mins·
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 University of Chinese Academy of Sciences
RAG-RewardBench: the first benchmark for evaluating reward models in RAG settings!
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
·3901 words·19 mins·
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Introduces Prompt Depth Anything, a new paradigm for accurate 4K-resolution metric depth estimation that uses low-cost LiDAR as the prompt!