Mar 04 2025 14 mins
The 20 papers in this episode are:
[00:21] 🧠 Visual-RFT: Visual Reinforcement Fine-Tuning
[01:05] 🌐 Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models
[01:43] 🧠 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs
[02:25] 🎥 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment
[03:04] 🤔 When an LLM is apprehensive about its answers -- and when its uncertainty is justified
[03:46] 🎵 DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
[04:28] 🐯 Liger: Linearizing Large Language Models to Gated Recurrent Structures
[05:05] 📊 Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions
[05:50] 🧠 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
[06:28] ⚡ Speculative Ad-hoc Querying
[07:15] ⚡ DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
[07:52] 🎨 Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
[08:31] 🧠 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia
[09:10] ⚡ From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens
[09:47] 🔍 Large-Scale Data Selection for Instruction Tuning
[10:26] 🌐 SampleMix: A Sample-wise Pre-training Data Mixing Strategy by Coordinating Data Quality and Diversity
[11:01] 🤖 CodeArena: A Collective Evaluation Platform for LLM Code Generation
[11:47] 🎥 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation
[12:42] 🎙 PodAgent: A Comprehensive Framework for Podcast Generation
[13:18] 🏠 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

【Follow Us】
You can also find us on the following platforms for more information beyond the podcast content:
Xiaohongshu: AI速递