[Episode 126] ICAL: In-Context Abstraction Learning for VLMs


Feb 03 2025 17 mins  

Seventy3: Using NotebookLM to turn papers into podcasts, so everyone can keep learning alongside AI.

Today's topic:

VLM Agents Generate Their Own Memories: Distilling Experience into Embodied Programs of Thought

Summary

This NeurIPS 2024 paper introduces In-Context Abstraction Learning (ICAL), a method that allows Vision-Language Models (VLMs) to learn from suboptimal demonstrations and human feedback. ICAL generates its own high-quality examples by abstracting noisy trajectories, correcting errors, and annotating cognitive abstractions like causal relationships and subgoals. The resulting examples significantly improve VLM performance on three benchmarks (TEACh, VisualWebArena, and Ego4D), surpassing state-of-the-art results. The paper also explores the efficiency gains and continual learning capabilities of ICAL, showing reduced reliance on human feedback and environment interactions over time. Furthermore, the impact of fine-tuning the VLM on ICAL's learned examples is evaluated.
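To make the described pipeline concrete, below is a minimal, hypothetical Python sketch of the ICAL loop as summarized above: the VLM abstracts a noisy demonstration, corrects errors, annotates causal relationships and subgoals, and refines the result with human feedback before storing it as a new in-context example. The names `vlm_generate`, `env`, and `get_human_feedback` are placeholder interfaces assumed for illustration, not the paper's actual API or code.

```python
from dataclasses import dataclass

@dataclass
class Example:
    """A learned in-context example: an abstracted trajectory plus annotations."""
    task: str
    trajectory: list      # corrected sequence of (observation, action) steps
    abstractions: dict    # e.g. {"causal": [...], "subgoals": [...]}

memory: list[Example] = []   # growing library of self-generated examples

def ical_learn(task, noisy_trajectory, env, max_rounds=3):
    """Abstract a suboptimal demonstration into a reusable in-context example.

    `vlm_generate`, `env.execute`, and `get_human_feedback` are hypothetical
    helpers standing in for the VLM call, the embodied environment, and the
    human-in-the-loop feedback channel described in the summary.
    """
    # 1. Abstraction phase: ask the VLM to correct errors in the noisy
    #    trajectory and annotate cognitive abstractions (causal links, subgoals).
    candidate = vlm_generate(
        prompt="Correct this trajectory and annotate causal relations and subgoals.",
        task=task,
        trajectory=noisy_trajectory,
        in_context_examples=memory,   # earlier examples guide new abstractions
    )

    # 2. Human-in-the-loop phase: execute the revised trajectory and refine it
    #    with natural-language feedback until it succeeds or rounds run out.
    for _ in range(max_rounds):
        result = env.execute(candidate.trajectory)
        if result.success:
            memory.append(candidate)  # store as a new in-context example
            return candidate
        feedback = get_human_feedback(result)
        candidate = vlm_generate(
            prompt="Revise the trajectory given this feedback.",
            task=task,
            trajectory=candidate.trajectory,
            feedback=feedback,
            in_context_examples=memory,
        )
    return None  # discard examples that never succeed
```

As the memory of examples grows, later abstractions and downstream tasks draw on it for in-context prompting, which is consistent with the paper's reported reduction in human feedback and environment interactions over time.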


Original paper: https://arxiv.org/abs/2406.14596