๐Ÿข University of Texas at Austin

Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding
·3211 words·16 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin
The TAPE (conTextualized equivAriant Position Embedding) framework improves transformers' position-based addressing performance through dynamic positional encoding that draws on contextual information.
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing
·2638 words·13 mins
AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Texas at Austin
Overcoming the limits of structured state space models (SSMs), which model long-range dependencies in deep neural networks! This recent study identifies the recency bias and over-smoothing problems of SSMs and proposes a **polarization technique** to address them, improving accuracy on long-range token correlations.