Visual Question Answering
Slow Perception: Let's Perceive Geometric Figures Step-by-step
3207 words · 16 mins
AI Generated
Daily Papers
Computer Vision
Visual Question Answering
Stepfun
Slow Perception: improving accuracy through step-by-step perception of geometric figures.
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
4972 words · 24 mins
AI Generated
Daily Papers
Computer Vision
Visual Question Answering
Chinese University of Hong Kong, Shenzhen
Shows that compositional generalization (CG) plays a key role in improving the generalization ability of multimodal large language models on medical imaging, and that it remains effective even with limited data.
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
2414 words · 12 mins
AI Generated
Daily Papers
Computer Vision
Visual Question Answering
Seoul National University
To improve hyper-detailed image captioning, proposes CapMAS, a multi-agent system based on LLM-MLLM collaboration, raising both factuality and coverage.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
4794 words · 23 mins
AI Generated
Daily Papers
Computer Vision
Visual Question Answering
Stanford University
Introduces VSI-Bench, a new video-based benchmark that helps improve the visual-spatial intelligence of MLLMs.