🏢 Stanford University
Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation
·4797 words·23 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Question Answering
🏢 Stanford University
AutoConverter is a system that automatically converts open-ended VQA questions into multiple-choice questions, which makes VLM (Vision Language Model) evaluation more objective and reproducible. Using AutoConverter, the researchers built VMCBench, a new benchmark that unifies 20 existing VQA datasets. VMCBen…
BoxingGym: Benchmarking Progress in Automated Experimental Design and Model Discovery
·3521 words·17 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
BoxingGym: a benchmark that comprehensively evaluates the experimental design and model discovery abilities of LLM-based scientific agents.
Thinking in Space: How Multimodal Large Language Models See, Remember, and Recall Spaces
·4794 words·23 mins
AI Generated
🤗 Daily Papers
Computer Vision
Visual Question Answering
🏢 Stanford University
Introducing VSI-Bench, a new video-based benchmark that helps improve the visual-spatial intelligence of MLLMs!
Whisper-GPT: A Hybrid Representation Audio Large Language Model
·1322 words·7 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Large Language Models
🏢 Stanford University
Whisper-GPT: a hybrid speech and music LLM that combines continuous audio with discrete tokens to deliver improved performance.