Visual Question Answering

Slow Perception: Let's Perceive Geometric Figures Step-by-step
·3207 words·16 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stepfun
Slow Perception: improving accuracy by perceiving geometric figures step by step.
On the Compositional Generalization of Multimodal LLMs for Medical Imaging
·4972 words·24 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Chinese University of Hong Kong, Shenzhen
Shows that compositional generalization (CG) plays a key role in improving how well multimodal large language models generalize on medical imaging, and that it remains effective even with limited data.
Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage
·2414 words·12 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Seoul National University
To address hallucination in hyper-detailed image captioning, proposes CapMAS, a multi-agent system built on LLM-MLLM collaboration that improves factuality and coverage.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
·4794 words·23 mins
AI Generated 🤗 Daily Papers Computer Vision Visual Question Answering 🏢 Stanford University
Introduces VSI-Bench, a new video-based benchmark that helps improve the visual-spatial intelligence of MLLMs!