Improving Chain-of-Thought Reasoning in Vision Language Models

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes often rely on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLMs on short answers leads to poor generalization on reasoning tasks that require more detailed explanations. To address this limitation, we propose a two-stage post-training strategy that extends the use of short-answer data for enhanced CoT reasoning. First, we augment short answers with CoT reasoning generated by…
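The excerpt breaks off at the first stage, in which existing short-answer annotations are augmented with generated rationales. As a rough illustration only, not the paper's actual pipeline, the sketch below assumes a teacher model exposed through a `generate_rationale` callback and converts the augmented examples into instruction-tuning records; all names (`VQAExample`, `augment_with_cot`, `to_sft_record`) are hypothetical.

```python
# Hypothetical sketch of stage 1: attach model-generated CoT rationales
# to short-answer VQA examples, then format them for supervised fine-tuning.
# The callback, prompt format, and record layout are assumptions.

from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VQAExample:
    image_path: str
    question: str
    short_answer: str   # original minimal-rationale annotation
    rationale: str = "" # filled in with the generated chain of thought


def augment_with_cot(
    examples: List[VQAExample],
    generate_rationale: Callable[[str, str, str], str],
) -> List[VQAExample]:
    """Attach a generated chain of thought to each short-answer example.

    `generate_rationale(image_path, question, short_answer)` is assumed to
    query a stronger teacher model and return a step-by-step explanation
    consistent with the given short answer.
    """
    augmented = []
    for ex in examples:
        cot = generate_rationale(ex.image_path, ex.question, ex.short_answer)
        augmented.append(
            VQAExample(ex.image_path, ex.question, ex.short_answer, rationale=cot)
        )
    return augmented


def to_sft_record(ex: VQAExample) -> dict:
    """Format an augmented example as an instruction-tuning record."""
    return {
        "image": ex.image_path,
        "prompt": f"{ex.question}\nAnswer with step-by-step reasoning.",
        "response": f"{ex.rationale}\nFinal answer: {ex.short_answer}",
    }
```

In this reading, the short answer is kept as the final target so the augmented data remains consistent with the original labels while adding the intermediate reasoning the short annotations lacked.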
Source: Apple
Summary and translation: Reporter Seo Hyun-jin, 미주투데이