HuggingFace Daily AI Paper Digest

Mar 04 2025 14 mins


The 20 papers in this episode are:

[00:21] 🧠 Visual-RFT: Visual Reinforcement Fine-Tuning

[01:05] 🌐 Difix3D+: Improving 3D Reconstructions with Single-Step Diffusion Models

[01:43] 🧠 Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs

[02:25] 🎥 OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment

[03:04] 🤔 When an LLM is apprehensive about its answers -- and when its uncertainty is justified

[03:46] 🎵 DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion

[04:28] 🐯 Liger: Linearizing Large Language Models to Gated Recurrent Structures

[05:05] 📊 Qilin: A Multimodal Information Retrieval Dataset with APP-level User Sessions

[05:50] 🧠 Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs

[06:28] ⚡ Speculative Ad-hoc Querying

[07:15] ⚡ DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting

[07:52] 🎨 Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

[08:31] 🧠 Word Form Matters: LLMs' Semantic Reconstruction under Typoglycemia

[09:10] ⚡ From Hours to Minutes: Lossless Acceleration of Ultra Long Sequence Generation up to 100K Tokens

[09:47] 🔍 Large-Scale Data Selection for Instruction Tuning

[10:26] 🌐 SampleMix: A Sample-wise Pre-training Data Mixing Strategey by Coordinating Data Quality and Diversity

[11:01] 🤖 CodeArena: A Collective Evaluation Platform for LLM Code Generation

[11:47] 🎥 VideoUFO: A Million-Scale User-Focused Dataset for Text-to-Video Generation

[12:42] 🎙 PodAgent: A Comprehensive Framework for Podcast Generation

[13:18] 🏠 Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model

【Follow Us】

You can also find us on the following platform for more information beyond the podcast:

Xiaohongshu (RED): AI速递


2025.03.04 | Reinforcing visual reasoning and improving 3D reconstruction quality.
