Paper Reviews by AI

PartGen: Part-level 3D Generation and Reconstruction with Multi-View Diffusion Models

24 December 2024·2572 words·13 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Meta AI

PartGen: Uses multi-view diffusion models to generate and reconstruct high-quality 3D objects composed of meaningful parts, starting from text, images, or existing 3D objects.

Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models

24 December 2024·2368 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 Zhejiang University

Presents "Orient Anything", a model that substantially improves the accuracy of object orientation estimation from a single image!

Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search

24 December 2024·2002 words·10 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 Tsinghua University

Mulberry develops a multimodal large language model (MLLM) capable of step-by-step reasoning and reflection, using Collective Monte Carlo Tree Search (CoMCTS).

Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation

24 December 2024·2158 words·11 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of Science and Technology of China

Molar: An innovative framework that combines multimodal LLMs with collaborative filtering to substantially improve sequential recommendation performance!

MMFactory: A Universal Solution Search Engine for Vision-Language Tasks

24 December 2024·2306 words·11 mins

AI Generated 🤗 Daily Papers Multimodal Learning Vision-Language Models 🏢 University of British Columbia

MMFactory: A solution search engine for user-customized vision-language tasks

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

24 December 2024·1013 words·5 mins

AI Generated 🤗 Daily Papers Natural Language Processing Machine Translation 🏢 Fondazione Bruno Kessler

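A paper that identifies the practical limitations of real-time simultaneous speech translation systems and proposes standardized terminology and a taxonomy to help the field advance.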

DiTCtrl: Exploring Attention Control in Multi-Modal Diffusion Transformer for Tuning-Free Multi-Prompt Longer Video Generation

24 December 2024·3181 words·15 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Tencent AI Lab

DiTCtrl: Tuning-free generation of smooth, long videos from multiple prompts

DepthLab: From Partial to Complete

24 December 2024·1980 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision 3D Vision 🏢 HKU

DepthLab: Recovering complete 3D visual information from partial depth information

CypherBench: Towards Precise Retrieval over Full-scale Modern Knowledge Graphs in the LLM Era

24 December 2024·2988 words·15 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Megagon Labs

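This study presents CypherBench, a new benchmark for precise information retrieval with LLMs over large-scale modern knowledge graphs. It analyzes why existing RDF-based knowledge graphs are inefficient for LLMs, pointing to their overly large schemas and their reliance on resource identifiers. In particular, modern knowledge graphs such as Wikidata frequently exceed the context window size of LLMs…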

3DGraphLLM: Combining Semantic Graphs and Large Language Models for 3D Scene Understanding

24 December 2024·2837 words·14 mins

AI Generated 🤗 Daily Papers Computer Vision Scene Understanding 🏢 AIRI

3DGraphLLM: State-of-the-art research that combines semantic graphs with large language models to substantially improve 3D scene understanding!

1.58-bit FLUX

24 December 2024·1092 words·6 mins

AI Generated 🤗 Daily Papers Computer Vision Image Generation 🏢 ByteDance

1.58-bit FLUX: Quantizes 99.5% of the parameters to 1.58 bits, reducing model size by 7.7× and inference memory by 5.1× while maintaining high-quality image generation!
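The "1.58-bit" figure comes from ternary weights: a value drawn from {-1, 0, +1} carries log2(3) ≈ 1.585 bits of information. Below is a minimal sketch of ternary weight quantization, assuming a single per-tensor scale equal to the mean absolute weight (the "absmean" scheme popularized by BitNet b1.58). The summary above does not say which scheme 1.58-bit FLUX actually uses, and the helper names here are hypothetical.

```python
import math


def ternary_bits() -> float:
    """Information content of one weight drawn from {-1, 0, +1}."""
    return math.log2(3)


def quantize_ternary(weights, eps=1e-8):
    """Map float weights to {-1, 0, +1} plus one float scale.

    Assumption: per-tensor absmean scaling; real systems may use other schemes.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return [v * scale for v in q]


if __name__ == "__main__":
    print(f"bits per ternary weight: {ternary_bits():.3f}")
    q, s = quantize_ternary([0.9, -0.05, -1.2, 0.3])
    print(q, dequantize(q, s))
```

Storing only the ternary codes plus one scale per tensor is what makes the size reduction possible; the parameters left unquantized and any packing overhead explain why the reported savings are smaller than the raw 16/1.58 ratio.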

YuLan-Mini: An Open Data-efficient Language Model

23 December 2024·3531 words·17 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Renmin University of China

YuLan-Mini: A data-efficient open LLM with 2.4 billion parameters

WavePulse: Real-time Content Analytics of Radio Livestreams

23 December 2024·2678 words·13 mins

AI Generated 🤗 Daily Papers Natural Language Processing Information Extraction 🏢 New York University

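WavePulse: A framework for real-time content analysis of radio broadcasts that tracks political discourse, media distribution, and public-opinion trends as they unfold, opening new possibilities for political science and media research.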

VidTwin: Video VAE with Decoupled Structure and Dynamics

23 December 2024·2381 words·12 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Peking University

VidTwin: An innovative video autoencoder that decouples structure and dynamics, setting a new standard for video compression and generation!

SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images

23 December 2024·2234 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Question Answering 🏢 Kyoto University

SBS Figures: An efficient Figure QA model pre-trained on 1 million synthetic images and QA pairs!

ResearchTown: Simulator of Human Research Community

23 December 2024·16894 words·80 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 University of Illinois Urbana-Champaign

RESEARCHTOWN: An LLM-based simulator of the human research community that realistically imitates a range of research activities and can generate interdisciplinary research ideas.

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

23 December 2024·3159 words·15 mins

AI Generated 🤗 Daily Papers Multimodal Learning Human-AI Interaction 🏢 Shanghai Jiao Tong University

PC Agent is an innovative system that automates complex digital tasks by transferring human cognitive processes to AI.

Large Motion Video Autoencoding with Cross-modal Video VAE

23 December 2024·2098 words·10 mins

AI Generated 🤗 Daily Papers Computer Vision Video Understanding 🏢 Hong Kong University of Science and Technology

An innovative cross-modal video VAE for high-quality video generation and efficient compression!

In Case You Missed It: ARC 'Challenge' Is Not That Challenging

23 December 2024·2275 words·11 mins

AI Generated 🤗 Daily Papers Natural Language Processing Large Language Models 🏢 Snowflake AI Research

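Points out flaws in the conventional way of evaluating multiple-choice questions, where each option is scored in isolation, and proposes an evaluation setup that presents all options together, yielding a more accurate assessment of model performance.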

Friends-MMC: A Dataset for Multi-modal Multi-party Conversation Understanding

23 December 2024·1812 words·9 mins

AI Generated 🤗 Daily Papers Natural Language Processing Dialogue Systems 🏢 Peking University

Friends-MMC: A new multi-modal, multi-party conversation dataset with extensive video data and annotations, opening new possibilities for understanding real-world conversations!