More info: https://docs.anthropic.com/en/docs/claude-code/overview
The AI coding wars have now split across four battlegrounds:
1. AI IDEs: two leading startups in Windsurf ($3B acq. by OpenAI) and Cursor ($9B valuation), with a sea of competition behind them (Cline, GitHub Copilot, etc.).
2. Vibe coding platforms: Bolt.new, Lovable, v0, etc., all growing fast and reaching tens of millions in revenue within months.
3. The teammate agents: Devin, Cosine, etc. Simply give them a task, and they will get back to you with a full PR (with mixed results).
4. The CLI-based agents: after Aider’s initial success, we are now seeing many other alternatives, including two from the main labs: OpenAI Codex and Claude Code. The main draw is that 1) they are composable and 2) they are pay-as-you-go, billed on tokens used.
Since we have already covered the first three categories, today’s guests are Boris and Cat, the lead engineer and PM for Claude Code. If you only take one thing away from this episode, it’s this framing from Boris: Claude Code is not a product so much as it is a Unix utility.
This fits very well with Anthropic’s product principle: “do the simple thing first.” Whether it’s the memory implementation (a markdown file that gets auto-loaded) or the approach to prompt summarization (just ask Claude to summarize), they always pick the smallest building blocks that are useful, understandable, and extensible. Even major features like planning (“/think”) and memory (#tags in markdown) follow the same idea of text I/O as the core interface, very much in the spirit of the original Unix design philosophy: small programs that do one thing well and compose through text streams.
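To make “do the simple thing first” concrete, here is a minimal sketch of the markdown-file memory pattern. The CLAUDE.md file name is real, but the code below is our own illustration and the helper functions are hypothetical, not Anthropic’s implementation:

```python
# Illustrative sketch of markdown-file memory -- not Anthropic's code.
# Memory is just a CLAUDE.md file that gets auto-loaded and prepended to the
# prompt; new notes are appended as plain markdown bullets.
from pathlib import Path

MEMORY_FILE = "CLAUDE.md"


def load_memory(project_dir: str) -> str:
    """Return the project memory file if it exists, else an empty string."""
    path = Path(project_dir) / MEMORY_FILE
    return path.read_text() if path.exists() else ""


def remember(project_dir: str, note: str) -> None:
    """Append a note to the markdown memory file (hypothetical helper)."""
    with (Path(project_dir) / MEMORY_FILE).open("a") as f:
        f.write(f"- {note}\n")


def build_prompt(project_dir: str, user_prompt: str) -> str:
    """Text in, text out: memory is simply prepended to the user's prompt."""
    memory = load_memory(project_dir)
    return f"{memory}\n\n{user_prompt}" if memory else user_prompt
```

Every piece is inspectable plain text, which is what keeps it understandable and composable with the rest of a Unix-style toolchain.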
Claude Code is also the most direct way to consume Sonnet for coding, without all the hidden prompting and optimization that the other products layer on top. You will feel that right away: average spend per user is about $6/day on Claude Code, compared to $20/mo for Cursor, for example. Apparently, some engineers inside Anthropic have spent >$1,000 in a single day!
If you’re building AI developer tools, there’s also a lot of alpha here on how to design a CLI tool, when to use interactive vs. non-interactive modes, and how to balance feature creation against maintenance. Enjoy!
Timestamps
[00:00:00] Intro
[00:01:59] Origins of Claude Code
[00:04:32] Anthropic’s Product Philosophy
[00:07:38] What should go into Claude Code?
[00:09:26] Claude.md and Memory Simplification
[00:10:07] Claude Code vs Aider
[00:11:23] Parallel Workflows and Unix Utility Philosophy
[00:12:51] Cost considerations and pricing model
[00:14:51] Key Features Shipped Since Launch
[00:16:28] Claude Code writes 80% of Claude Code
[00:18:01] Custom Slash Commands and MCP Integration
[00:21:08] Terminal UX and Technical Stack
[00:27:11] Code Review and Semantic Linting
[00:28:33] Non-Interactive Mode and Automation
[00:36:09] Engineering Productivity Metrics
[00:37:47] Balancing Feature Creation and Maintenance
[00:41:59] Memory and the Future of Context
[00:50:10] Sandboxing, Branching, and Agent Planning
[01:01:43] Future roadmap
[01:11:00] Why Anthropic Excels at Developer Tools
--------
1:17:21
⚡️The Rise and Fall of the Vector DB Category
Note from your hosts: we were off this week for ICLR and RSA! Instead, we’re bringing you one of the top episodes from our lightning podcast series, the shorter-format, YouTube-only side podcast we do for breaking news and faster turnarounds. Please support our work on YouTube! https://www.youtube.com/playlist?list=PLWEAb1SXhjlc5qgVK4NgehdCzMYCwZtiB
The explosion of embedding-based applications created a new challenge: efficiently storing, indexing, and searching high-dimensional vectors at scale. This gap gave rise to the vector database category, with companies like Pinecone leading the charge in 2022-2023 by defining specialized infrastructure for vector operations.
The category saw explosive growth following ChatGPT's launch in late 2022, as developers rushed to build AI applications using Retrieval-Augmented Generation (RAG). This surge was partly driven by a widespread misconception that embedding-based similarity search was the only viable method for retrieving context for LLMs!
The resulting "vector database gold rush" saw massive investment and attention directed toward vector search infrastructure, even though traditional information retrieval techniques remained equally valuable for many RAG applications.
https://x.com/jobergum/status/1872923872007217309
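To illustrate that last point, here is a minimal sketch (ours, not from the episode) of a RAG retrieval step built on classic lexical ranking with the rank_bm25 package, with no embeddings or vector database involved; the documents and query are made up:

```python
# Minimal sketch: BM25 lexical retrieval as the "R" in RAG -- no embeddings,
# no vector database. Assumes `pip install rank_bm25`.
from rank_bm25 import BM25Okapi

docs = [
    "Pinecone is a managed vector database for similarity search.",
    "BM25 is a term-frequency ranking function from classic information retrieval.",
    "Retrieval-Augmented Generation feeds retrieved documents to an LLM as context.",
]

bm25 = BM25Okapi([d.lower().split() for d in docs])

query = "how does classic information retrieval rank documents"
scores = bm25.get_scores(query.lower().split())

# The top-scoring document becomes the context passed to the LLM.
top_doc = docs[max(range(len(docs)), key=lambda i: scores[i])]
print(top_doc)
```

In practice many teams now blend lexical ranking like this with embedding similarity (hybrid search), which is part of why standalone vector databases and traditional search engines have been converging.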
Chapters
00:00 Introduction to Trondheim and Background
03:03 The Rise and Fall of Vector Databases
06:08 Convergence of Search Technologies
09:04 Embeddings and Their Importance
12:03 Building Effective Search Systems
15:00 RAG Applications and Recommendations
17:55 The Role of Knowledge Graphs
20:49 Future of Embedding Models and Innovations
--------
27:16
Why Every Agent Needs Open Source Cloud Sandboxes
Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLM workflows and agents are relying more and more on tool use and multi-modality.
The most common use cases for their sandboxes:
- Running data analysis and charting (like Perplexity)
- Executing arbitrary code generated by the model (like Manus does; see the sketch after this list)
- Running evals on code generation (see LMArena Web)
- Doing reinforcement learning for code capabilities (like Hugging Face)
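As referenced above, here is a rough, hypothetical illustration of the core problem sandboxes solve: model-generated code is untrusted, so at a minimum you run it in a separate process with a timeout instead of exec()-ing it inside your agent. None of this is the E2B SDK; E2B replaces it with isolated cloud VMs, snapshots, and an API.

```python
# Hypothetical local illustration, NOT the E2B SDK: run untrusted,
# model-generated Python in a child process with a timeout and capture output.
import subprocess
import sys
import tempfile


def run_untrusted(code: str, timeout_s: float = 10.0) -> str:
    """Write generated code to a temp file and run it in a separate process."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    result = subprocess.run(
        [sys.executable, path],          # separate interpreter process
        capture_output=True, text=True,
        timeout=timeout_s,               # kill runaway or looping generations
    )
    return result.stdout if result.returncode == 0 else result.stderr


print(run_untrusted("print(sum(range(10)))"))  # -> 45
```

A real sandbox service also isolates the filesystem and network and lets you fork or checkpoint the environment, which is what the forking, checkpointing, and parallel-execution discussion below is about.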
Timestamps:
00:00:00 Introductions
00:00:37 Origin of DevBook -> E2B
00:02:35 Early Experiments with GPT-3.5 and Building AI Agents
00:05:19 Building an Agent Cloud
00:07:27 Challenges of Building with Early LLMs
00:10:35 E2B Use Cases
00:13:52 E2B Growth vs Models Capabilities
00:15:03 The LLM Operating System (LLMOS) Landscape
00:20:12 Breakdown of JavaScript vs Python Usage on E2B
00:21:50 AI VMs vs Traditional Cloud
00:26:28 Technical Specifications of E2B Sandboxes
00:29:43 Usage-based billing infrastructure
00:34:08 Pricing AI on Value Delivered vs Token Usage
00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes
00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks
00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents
00:44:00 MCPs and Remote Agent Capabilities
00:49:22 LLMs.txt, scrapers, and bad AI bots
00:53:00 Manus and Computer Use on E2B
00:55:03 E2B for RL with Hugging Face
00:56:58 E2B for Agent Evaluation on LMArena
00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs
01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps
01:01:15 Why E2B Moved to San Francisco
01:05:49 Open Roles and Hiring Plans at E2B
--------
1:06:38
⚡️GPT 4.1: The New OpenAI Workhorse
We’ll keep this brief because we’re on a tight turnaround: GPT 4.1, previously known as the Quasar and Optimus models, is now live as the natural update for 4o/4o-mini (and the research preview of GPT 4.5). Though it is a general-purpose model family, the headline features are:
Coding abilities (o1-level on SWE-bench and SWE-Lancer, but only OK on Aider)
Instruction Following (with a very notable prompting guide)
Long Context up to 1M tokens (with the new MRCR and Graphwalks benchmarks)
Vision (simply o1 level)
Cheaper Pricing (cheaper than 4o, greatly improved prompt caching savings)
We caught up with returning guest Michelle Pokrass and Josh McGrath to get more detail on each!
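If you want to try it while you listen, here’s a minimal sketch of calling the new model, assuming the standard OpenAI Python SDK (v1.x) and the gpt-4.1 model id; the prompt is just a placeholder:

```python
# Minimal sketch assuming the OpenAI Python SDK and the gpt-4.1 model id;
# set OPENAI_API_KEY in your environment before running.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1",
    messages=[
        {"role": "system", "content": "You are a concise coding assistant."},
        {"role": "user", "content": "Write a Python one-liner that reverses a string."},
    ],
)

print(response.choices[0].message.content)
```

One practical tip echoed in the chapters below: keep stable instructions at the start of the prompt so prefix-based prompt caching can kick in.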
Chapters
00:00:00 Introduction and Guest Welcome
00:00:57 GPT 4.1 Launch Overview
00:01:54 Developer Feedback and Model Names
00:02:53 Model Naming and Starry Themes
00:03:49 Confusion Over GPT 4.1 vs 4.5
00:04:47 Distillation and Model Improvements
00:05:45 Omnimodel Architecture and Future Plans
00:06:43 Core Capabilities of GPT 4.1
00:07:40 Training Techniques and Long Context
00:08:37 Challenges in Long Context Reasoning
00:09:34 Context Utilization in Models
00:10:31 Graph Walks and Model Evaluation
00:11:31 Real Life Applications of Graph Tasks
00:12:30 Multi-Hop Reasoning Benchmarks
00:13:30 Agentic Workflows and Backtracking
00:14:28 Graph Traversals for Agent Planning
00:15:24 Context Usage in API and Memory Systems
00:16:21 Model Performance in Long Context Tasks
00:17:17 Instruction Following and Real World Data
00:18:12 Challenges in Grading Instructions
00:19:09 Instruction Following Techniques
00:20:09 Prompting Techniques and Model Responses
00:21:05 Agentic Workflows and Model Persistence
00:22:01 Balancing Persistence and User Control
00:22:56 Evaluations on Model Edits and Persistence
00:23:55 XML vs JSON in Prompting
00:24:50 Instruction Placement in Context
00:25:49 Optimizing for Prompt Caching
00:26:49 Chain of Thought and Reasoning Models
00:27:46 Choosing the Right Model for Your Task
00:28:46 Coding Capabilities of GPT 4.1
00:29:41 Model Performance in Coding Tasks
00:30:39 Understanding Coding Model Differences
00:31:36 Using Smaller Models for Coding
00:32:33 Future of Coding in OpenAI
00:33:28 Internal Use and Success Stories
00:34:26 Vision and Multi-Modal Capabilities
00:35:25 Screen vs Embodied Vision
00:36:22 Vision Benchmarks and Model Improvements
00:37:19 Model Deprecation and GPU Usage
00:38:13 Fine-Tuning and Preference Steering
00:39:12 Upcoming Reasoning Models
00:40:10 Creative Writing and Model Humor
00:41:07 Feedback and Developer Community
00:42:03 Pricing and Blended Model Costs
00:44:02 Conclusion and Wrap-Up
--------
41:52
SF Compute: Commoditizing Compute
Evan Conrad, co-founder of SF Compute, joined us to talk about how they started as an AI lab that avoided bankruptcy by selling GPU clusters, why CoreWeave’s financials look like those of a real estate business, and how GPUs are turning into a commodities market.
Chapters:
00:00:05 - Introductions
00:00:12 - Introduction of guest Evan Conrad from SF Compute
00:00:12 - CoreWeave Business Model Discussion
00:05:37 - CoreWeave as a Real Estate Business
00:08:59 - Interest Rate Risk and GPU Market Strategy Framework
00:16:33 - Why Together and DigitalOcean will lose money on their clusters
00:20:37 - SF Compute's AI Lab Origins
00:25:49 - Utilization Rates and Benefits of SF Compute Market Model
00:30:00 - H100 GPU Glut, Supply Chain Issues, and Future Demand Forecast
00:34:00 - P2P GPU networks
00:36:50 - Customer stories
00:38:23 - VC-Provided GPU Clusters and Credit Risk Arbitrage
00:41:58 - Market Pricing Dynamics and Preemptible GPU Pricing Model
00:48:00 - Future Plans for Financialization?
00:52:59 - Cluster auditing and quality control
00:58:00 - Futures Contracts for GPUs
01:01:20 - Branding and Aesthetic Choices Behind SF Compute
01:06:30 - Lessons from Previous Startups
01:09:07 - Hiring at SF Compute
Chapters
00:00:00 Introduction and Background
00:00:58 Analysis of GPU Business Models
00:01:53 Challenges with GPU Pricing
00:02:48 Revenue and Scaling with GPUs
00:03:46 Customer Sensitivity to GPU Pricing
00:04:44 CoreWeave's Business Strategy
00:05:41 CoreWeave's Market Perception
00:06:40 Hyperscalers and GPU Market Dynamics
00:07:37 Financial Strategies for GPU Sales
00:08:35 Interest Rates and GPU Market Risks
00:09:30 Optimal GPU Contract Strategies
00:10:27 Risks in GPU Market Contracts
00:11:25 Price Sensitivity and Market Competition
00:12:21 Market Dynamics and GPU Contracts
00:13:18 Hyperscalers and GPU Market Strategies
00:14:15 Nvidia and Market Competition
00:15:12 Microsoft's Role in GPU Market
00:16:10 Challenges in GPU Market Dynamics
00:17:07 Economic Realities of the GPU Market
00:18:03 Real Estate Model for GPU Clouds
00:18:59 Price Sensitivity and Chip Design
00:19:55 SF Compute's Beginnings and Challenges
00:20:54 Navigating the GPU Market
00:21:54 Pivoting to a GPU Cloud Provider
00:22:53 Building a GPU Market
00:23:52 SF Compute as a GPU Marketplace
00:24:49 Market Liquidity and GPU Pricing
00:25:47 Utilization Rates in GPU Markets
00:26:44 Brokerage and Market Flexibility
00:27:42 H100 Glut and Market Cycles
00:28:40 Supply Chain Challenges and GPU Glut
00:29:35 Future Predictions for the GPU Market
00:30:33 Speculations on Test Time Inference
00:31:29 Market Demand and Test Time Inference
00:32:26 Open Source vs. Closed AI Demand
00:33:24 Future of Inference Demand
00:34:24 Peer-to-Peer GPU Markets
00:35:17 Decentralized GPU Market Skepticism
00:36:15 Redesigning Architectures for New Markets
00:37:14 Supporting Grad Students and Startups
00:38:11 Successful Startups Using SF Compute
00:39:11 VCs and GPU Infrastructure
00:40:09 VCs as GPU Credit Transformers
00:41:06 Market Timing and GPU Infrastructure
00:42:02 Understanding GPU Pricing Dynamics
00:43:01 Market Pricing and Preemptible Compute
00:43:55 Price Volatility and Market Optimization
00:44:52 Customizing Compute Contracts
00:45:50 Creating Flexible Compute Guarantees
00:46:45 Financialization of GPU Markets
00:47:44 Building a Spot Market for GPUs
00:48:40 Auditing and Standardizing Clusters
00:49:40 Ensuring Cluster Reliability
00:50:36 Active Monitoring and Refunds
00:51:33 Automating Customer Refunds
00:52:33 Challenges in Cluster Maintenance
00:53:29 Remote Cluster Management
00:54:29 Standardizing Compute Contracts
00:55:28 Unified Infrastructure for Clusters
00:56:24 Creating a Commodity Market for GPUs
00:57:22 Futures Market and Risk Management
00:58:18 Reducing Risk with GPU Futures
00:59:14 Stabilizing the GPU Market
01:00:10 SF Compute's Anti-Hype Approach
01:01:07 Calm Branding and Expectations
01:02:07 Promoting San Francisco's Beauty
01:03:03 Design Philosophy at SF Compute
01:04:02 Artistic Influence on Branding
01:05:00 Past Projects and Burnout
01:05:59 Challenges in Building an Email Client
01:06:57 Persistence and Iteration in Startups
01:07:57 Email Market Challenges
01:08:53 SF Compute Job Opportunities
01:09:53 Hiring for Systems Engineering
01:10:50 Financial Systems Engineering Role
01:11:50 Conclusion and Farewell
The podcast by and for AI Engineers! In 2024, over 2 million readers and listeners came to Latent Space to hear about news, papers and interviews in Software 3.0.
We cover Foundation Models changing every domain in Code Generation, Multimodality, AI Agents, GPU Infra and more, directly from the founders, builders, and thinkers involved in pushing the cutting edge. We strive to give you everything from the definitive take on the Current Thing to the first introduction to the tech you'll be using in the next 3 months! We break news and run exclusive interviews with OpenAI, Anthropic, Gemini, Meta (Soumith Chintala), Sierra (Bret Taylor), tiny (George Hotz), Databricks/MosaicML (Jon Frankle), Modular (Chris Lattner), Answer.ai (Jeremy Howard), et al.
Full show notes always on https://latent.space