🏢 Zhejiang University
VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
3245 words · 16 mins
AI Generated
🤗 Daily Papers
Multimodal Learning
Vision-Language Models
🏢 Zhejiang University
VideoRefer Suite introduces a new video LLM for fine-grained spatial-temporal object understanding (VideoRefer), a large-scale, high-quality dataset (VideoRefer-700K), and a comprehensive benchmark (VideoRefer-Bench).
OneKE: A Dockerized Schema-Guided LLM Agent-based Knowledge Extraction System
304 words · 2 mins
AI Generated
🤗 Daily Papers
Natural Language Processing
Information Extraction
🏢 Zhejiang University
OneKE: a Docker-based, multi-agent LLM knowledge extraction system that can extract knowledge across diverse domains from web pages and PDFs.
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models
2368 words · 12 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Introducing ‘Orient Anything’, a model that substantially improves the accuracy of object orientation estimation from a single image!
Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation
3901 words · 19 mins
AI Generated
🤗 Daily Papers
Computer Vision
3D Vision
🏢 Zhejiang University
Introducing Prompt Depth Anything, a new paradigm for accurate 4K high-resolution metric depth estimation using low-cost LiDAR prompts!