Latent Space: The AI Engineer Podcast

Available Episodes

5 of 145

The RLVR Revolution — with Nathan Lambert (AI2, Interconnects.ai)
Chapters 00:00:00 Welcome and Guest Introduction 00:01:18 Tulu, OVR, and the RLVR Journey 00:03:40 Industry Approaches to Post-Training and Preference Data 00:06:08 Understanding RLVR and Its Impact 00:06:18 Agents, Tool Use, and Training Environments 00:10:34 Open Data, Human Feedback, and Benchmarking 00:12:44 Chatbot Arena, Sycophancy, and Evaluation Platforms 00:15:42 RLHF vs RLVR: Books, Algorithms, and Future Directions 00:17:54 Frontier Models: Reasoning, Hybrid Models, and Data 00:22:11 Search, Retrieval, and Emerging Model Capabilities 00:29:23 Tool Use, Curriculum, and Model Training Challenges 00:38:06 Skills, Planning, and Abstraction in Agent Models 00:46:50 Parallelism, Verifiers, and Scaling Approaches 00:54:33 Overoptimization and Reward Design in RL 01:02:27 Open Models, Personalization, and the Model Spec 01:06:50 Open Model Ecosystem and Infrastructure 01:13:05 Meta, Hardware, and the Future of AI Competition 01:15:42 Building an Open DeepSeek and Closing Thoughts We first had Nathan on to give us his RLHF deep dive when he was joining AI2, and now he’s back to help us catch up on the evolution to RLVR (Reinforcement Learning with Verifiable Rewards), first proposed in his Tulu 3 paper. While RLHF remains foundational, RLVR has emerged as a powerful approach for training models on tasks with clear success criteria and using verifiable, objective functions as reward signals—particularly useful in domains like math, code correctness, and instruction-following. Instead of relying solely on subjective human feedback, RLVR leverages deterministic signals to guide optimization, making it more scalable and potentially more reliable across many domains. However, he notes that RLVR is still rapidly evolving, especially regarding how it handles tool use and multi-step reasoning. We also discussed the Tulu model series, a family of instruction-tuned open models developed at AI2. Tulu is designed to be a reproducible, state-of-the-art post-training recipe for the open community. Unlike frontier labs like OpenAI or Anthropic, which rely on vast and often proprietary datasets, Tulu aims to distill and democratize best practices for instruction and preference tuning. We are impressed with how small eval suites, careful task selection, and transparent methodology can rival even the best proprietary models on specific benchmarks. One of the most fascinating threads is the challenge of incorporating tool use into RL frameworks. Lambert highlights that while you can prompt a model to use tools like search or code execution, getting the model to reliably learn when and how to use them through RL is much harder. This is compounded by the difficulty of designing reward functions that avoid overoptimization—where models learn to “game” the reward signal rather than solve the underlying task. This is particularly problematic in code generation, where models might reward hack unit tests by inserting pass statements instead of correct logic. As models become more agentic and are expected to plan, retrieve, and act across multiple tools, reward design becomes a critical bottleneck. Other topics covered: - The evolution from RLHF (Reinforcement Learning from Human Feedback) to RLVR (Reinforcement Learning from Verifiable Rewards) - The goals and technical architecture of the Tulu models, including the motivation to open-source post-training recipes - Challenges of tool use in RL: verifiability, reward design, and scaling across domains - Evaluation frameworks and the role of platforms like Chatbot Arena and emerging “arena”-style benchmarks - The strategic tension between hybrid reasoning models and unified reasoning models at the frontier - Planning, abstraction, and calibration in reasoning agents and why these concepts matter - The future of open-source AI models, including DeepSeek, OLMo, and the potential for an “American DeepSeek” - The importance of model personality, character tuning, and the model spec paradigm - Overoptimization in RL settings and how it manifests in different domains (control tasks, code, math) - Industry trends in inference-time scaling and model parallelism Finally, the episode closes with a vision for the future of open-source AI. Nathan has now written up his ambition to build an “American DeepSeek”—a fully open, end-to-end reasoning-capable model with transparent training data, tools, and infrastructure. He emphasizes that open-source AI is not just about weights; it’s about releasing recipes, evaluations, and methods that lower the barrier for everyone to build and understand cutting-edge systems. It would seem the
--------
--------
🕰️ The Oral History of Windsurf (ft. Varun Mohan, Scott Wu, Jeff Wang, Kevin Hou, Anshul R)
This is a recap episode that ends with a short fresh interview on the future of Windsurf + Cognition with Jeff Wang and Scott Wu at the end. As the story of Windsurf as an independent company has come to a dramatic close with Google and Cognition, we’re taking this opportunity to look back at our coverage of Windsurf over the last 3 years. Here’s a brief timeline with related links. Jun 2021 - Exafunction founded Oct 2022 - Codeium pivot https://windsurf.com/blog/beta-launch-announcement Dec 2022 - “Copilot for X” https://www.latent.space/p/what-building-copilot-for-x-really Mar 2023 - Codeium first episode, LS episode 2 https://www.latent.space/p/varun-mohan July 2023 - “How to Make AI UX Your Moat" ****https://www.latent.space/p/ai-ux-moat Mar 2024 - Cognition Devin launch https://www.youtube.com/watch?v=fjHtjT7GO1c Jun 2024 - Scott @ AI Engineer https://www.youtube.com/watch?v=T7NWjoD_OuY Jun 2024 - Kevin @ AI Engineer https://www.youtube.com/watch?v=DuZXbinJ4Uc Nov 2024 - “Enterprise Infra Native” https://www.latent.space/p/enterprise Nov 2024 - Windsurf launch, LS Episode https://www.latent.space/p/windsurf Mar 2025 - Kevin Hou @ AI Engineer https://www.youtube.com/watch?v=bVNNvWq6dKo Jun 2025 - Scott @ AI Engineer https://www.youtube.com/watch?v=MI83buT_23o Jun 2025 - Kevin Hou @ AI Engineer https://www.youtube.com/watch?v=JVuNPL5QO8Q Jul 2025 - Jeff + Scott, CogSurf Episode ← new one, released here. We hope this serves as food for thought for students of history, and a reintroduction to the Latent Space extended universe and backlog, for those of you who are new. Welcome! Timestamps [00:02:07] Mar 2024 Codeium @ LS [00:52:36] Mar 2024 Devin Launch Video [00:54:28] Jun 2024 Codeium @ AIE SF [01:12:14] Jun 2024 Cognition @ AIE SF [01:30:53] Nov 2024 Windsurf Launch Video [01:37:16] Nov 2024 Windsurf Launch @ LS [02:43:10] Feb 2025 Windsurf @ AIE NYC [03:03:27] Jun 2025 Cognition @ AIE SF [03:18:50] June 2025 Windsurf @ AIE SF [03:34:23] July 2025 - Cognition + Windsurf Chapters 00:00:00 Mar 2024 Codeium @ LS 00:52:36 Mar 2024 Devin Launch Video 00:54:28 Jun 2024 Codeium @ AIE SF 01:12:14 Jun 2024 Cognition @ AIE SF 01:30:53 Nov 2024 Windsurf Launch Video 01:37:16 Nov 2024 Windsurf Launch @ LS 02:43:10 Feb 2025 Windsurf @ AIE NYC 03:03:27 Jun 2025 Cognition @ AIE SF 03:18:50 June 2025 Windsurf @ AIE SF 03:34:23 July 2025 - Cognition + Windsurf
--------
--------
AI is Eating Search
ChatGPT handles 2.5B prompts/day and is on track to match Google's daily searches by end of 2026. AI agents don't browse like us—they crave queryable, chunkable data for tools like ChatGPT & Perplexity. A new industry is being born, some are calling it AI SEO, others GEO, but what is clear is that it drives amazing results. Businesses are seeing 2-4x higher conversion from visitors coming from AI compared to traditional search. Robert McCloy is the co-founder of Scrunch AI (https://scrunchai.com/), a fast growing company that helps brands and businesses re-write their content on the fly based on what agents are looking for.
--------
56:21
--------
56:21
The Future of Notebooks - with Akshay Agrawal of Marimo
Akshay Agrawal joins us to talk about Marimo and their vision for the future of Python notebooks, and how it’s the perfect canvas for AI-driven data analysis. 0:00 Introduction 0:46 Overview of Marimo and Its Features 2:33 Origin Story and Motivation Behind Marimo 4:26 Demo: Classical Machine Learning with MNIST in Marimo 6:52 Notebook Compatibility and Conversion from Jupyter 7:42 Demo: Interactive Notebook with Custom UI and Layout 10:08 AI-Native Utilities and Code Generation with Language Models 11:36 Dependency Management and Integration with UV Package Manager 13:00 Demo: Data Annotation Workflow Using a PS5 Controller 15:51 Starting from Scratch: Blank Canvas AI Use Cases 18:27 Context Formatting for AI Code Generation 19:54 Chat Interface and Local/Remote Model Support 21:01 WebAssembly Support and MoLab Cloud-Hosted Notebooks 23:21 Future Plans and Breaking Out of Old Notebook Habits 25:40 Running Marimo Notebooks as Scripts or Data Apps 26:44 Exploring AI Agents and Community Contributions 26:56 Call to Action: How to Get Started and Contribute
--------
--------
Cline: the open source coding agent that doesn't cut costs
Saoud Rizwan and Pash from Cline joined us to talk about why fast apply models got bitter lesson'd, how they pioneered the plan + act paradigm for coding, and why non-technical people use IDEs to do marketing and generate slides. Full writeup: https://www.latent.space/p/cline X: https://x.com/latentspacepod Chapters: 00:00 - Introductions 01:35 - Plan and Act Paradigm 05:37 - Model Evaluation and Early Development of Cline 08:14 - Use Cases of Cline Beyond Coding 09:09 - Why Cline is a VS Code Extension and Not a Fork 12:07 - Economic Value of Programming Agents 16:07 - Early Adoption for MCPs 19:35 - Local vs Remote MCP Servers 22:10 - Anthropic's Role in MCP Registry 22:49 - Most Popular MCPs and Their Use Cases 25:26 - Challenges and Future of MCP Monetization 27:32 - Security and Trust Issues with MCPs 28:56 - Alternative History Without MCP 29:43 - Market Positioning of Coding Agents and IDE Integration Matrix 32:57 - Visibility and Autonomy in Coding Agents 35:21 - Evolving Definition of Complexity in Programming Tasks 38:16 - Forks of Cline and Open Source Regrets 40:07 - Simplicity vs Complexity in Agent Design 46:33 - How Fast Apply Got Bitter Lesson'd 49:12 - Cline's Business Model and Bring-Your-Own-API-Key Approach 54:18 - Integration with OpenRouter and Enterprise Infrastructure 55:32 - Impact of Declining Model Costs 57:48 - Background Agents and Multi-Agent Systems 1:00:42 - Vision and Multi-Modalities 1:01:07 - State of Context Engineering 1:07:37 - Memory Systems in Coding Agents 1:10:14 - Standardizing Rules Files Across Agent Tools 1:11:16 - Cline's Personality and Anthropomorphization 1:12:55 - Hiring at Cline and Team Culture
--------
1:15:43
--------
1:15:43

More Business podcasts

Trending Business podcasts

About Latent Space: The AI Engineer Podcast

The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0. We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. Striving to give you both the definitive take on the Current Thing down to the first introduction to the tech you'll be using in the next 3 months! We break news and exclusive interviews from OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al. Full show notes always on https://latent.space

Podcast website

Business Technology Entrepreneurship

Listen to Latent Space: The AI Engineer Podcast, money money money and many other podcasts from around the world with the radio.net app

Get the free radio.net app

Stations and podcasts to bookmark
Stream via Wi-Fi or Bluetooth
Supports Carplay & Android Auto
Many other app features

Open app

Get the free radio.net app

Stations and podcasts to bookmark
Stream via Wi-Fi or Bluetooth
Supports Carplay & Android Auto
Many other app features

Latent Space: The AI Engineer Podcast

Scan code,
download the app,
start listening.