Arxiv Paper - Video Instruction Tuning With Synthetic Data


Nov 19, 2024 (4 mins)

In this episode, we discuss "Video Instruction Tuning With Synthetic Data" by Yuanhan Zhang, Jinming Wu, Wei Li, Bo Li, Zejun Ma, Ziwei Liu, and Chunyuan Li. The paper introduces LLaVA-Video-178K, a high-quality synthetic dataset for video instruction following, built to ease the development of large multimodal video models. The dataset covers detailed captioning and question-answering tasks. Training on it together with existing tuning data, the authors develop a new model, LLaVA-Video, which performs strongly across a range of video benchmarks. They plan to release the dataset, the generation pipeline, and the model checkpoints publicly.