Joochan Kim

I am an incoming ECE PhD student at Purdue, starting fall 2026, advised by Ziran Wang. I am currently a research intern at KIST in Seoul, working on Embodied AI, with a focus on Vision-Language-Action models, under the mentorship of Hwasup Lim and Tackgeun You.

I received my M.S. from SNU, advised by Byoung-Tak Zhang. During my master's program, I also interned at SIMTech, A*STAR, where I was mentored by Haiyue Zhu. I received my B.S. from Yonsei University.

Research

I'm interested in Embodied AI, Multimodal AI, and data-centric AI. Most of my research is about developing embodied AI from internet AI. Highlighted papers are those with my main contributions.

The Losing Winner: An LLM Agent that Predicts the Market but Loses Money
NeurIPS Workshop on Generative AI in Finance, 2025

Fine-tuning an LLM for Bitcoin market state prediction improves accuracy but paradoxically worsens trading returns, exposing the dangers of proxy objectives and reward hacking in financial AI.

Continual Vision-and-Language Navigation
Seongjun Jeong, Gi-Cheon Kang, Seongho Choi, Joochan Kim, Byoung-Tak Zhang
BMVC, 2025

We propose the Continual Vision-and-Language Navigation (CVLN) paradigm, along with two methods for CVLN: Perplexity Replay (PerpR) and Episodic Self-Replay (ESR).

Exploring Ordinal Bias in Action Recognition for Instructional Videos
ICLR Workshop on Spurious Correlation and Shortcut Learning, 2025

Ordinal bias leads action recognition models to over-rely on dominant action pairs, inflating benchmark performance while masking a lack of true video comprehension, as revealed by action masking and sequence shuffling.

Background-aware Moment Detection for Video Moment Retrieval
Minjoon Jung, Youwon Jang, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang
WACV, 2025

We propose Background-aware Moment Detection TRansformer (BM-DETR), which carefully adopts a contrastive approach for robust prediction. BM-DETR achieves state-of-the-art performance on various benchmarks while being highly efficient.

Zero-Shot Vision-and-Language Navigation with Collision Mitigation in Continuous Environment
Seongjun Jeong, Gi-Cheon Kang, Joochan Kim, Byoung-Tak Zhang
CVPR Workshop on Embodied AI, 2025

We propose zero-shot Vision-and-Language Navigation with Collision Mitigation (VLN-CM), which outputs low-level actions while accounting for possible collisions.

Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Minjoon Jung, Seongho Choi, Joochan Kim, Jin-Hwa Kim, Byoung-Tak Zhang
EMNLP, 2022

We propose a self-supervised learning framework: the Modal-specific Pseudo Query Generation Network (MPGN). MPGN first selects candidate temporal moments via subtitle-based moment sampling, then generates pseudo queries by exploiting both visual and textual information from the selected moments.

Miscellanea

Teaching
Teaching Assistant, M1522.000300 Spring 2023