Podcasts about Inference

  • 611 PODCASTS
  • 1,103 EPISODES
  • 42m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Jun 9, 2026 LATEST
Inference

POPULARITY (chart by year, 2019 to 2026)



Latest podcast episodes about Inference

100x Entrepreneur
94% CAGR: What the Inference Boom means for your AI costs | Vamshi Ambati

Jun 9, 2026 | 51:56 | Transcription Available


Vamshi Ambati has spent more than two decades in AI, through the symbolic era, the statistical era, and the neural wave we're experiencing today. A CMU PhD, founder of LatentStructure and Predera (which was acquired), and now an investor at Virama Ventures, he's one of the sharper voices on what's actually happening under the hood of the AI boom.

We discuss a simple question: who wins when models become cheaper and more abundant? We try to answer it by looking at how inference spend versus compute spend is shifting, and why inference may become the biggest infrastructure opportunity of the next decade.

Vamshi explains what actually goes into the cost of a token, why AI is simultaneously getting cheaper and more expensive, and why the inference market alone could reach $1.3 trillion by 2030. If you're building in AI, or you want a clear mental model of where this industry is headed, this conversation is for you.

00:00 - Trailer
00:45 - How an AI researcher thinks after 20 years
05:53 - Where enterprise AI adoption is headed
08:35 - Drawing parallels between cloud and AI
11:20 - If building is cheap, what's valuable?
13:37 - Can computing get cheaper?
16:41 - What is inference, really?
22:22 - Why coding and customer support got eaten first?
26:48 - Which technologies are overvalued and undervalued?
29:56 - An accidental entrepreneur's journey
33:15 - Why is healthcare slow to adopt technology?
38:59 - Landing Walmart as a customer
42:36 - Should founders build in services if product isn't visible?
43:47 - Is Palantir a product company or a services company?
44:15 - How to win as a forward-deployed company
46:23 - What it takes to land large enterprise customers
49:20 - Building sales muscles as a technical founder

-------------
India's talent has built the world's tech; now it's time to lead it. This mission goes beyond startups. It's about shifting the center of gravity in global tech to include the brilliance rising from India.

What is Neon Fund? We invest in seed and early-stage founders from India and the diaspora building world-class Enterprise AI companies. We bring capital, conviction, and a community that's done it before. Subscribe for real founder stories, investor perspectives, economist breakdowns, and a behind-the-scenes look at how we're doing it all at Neon.

-------------
Check us out on:
Website: https://neon.fund/
Instagram: https://www.instagram.com/theneonshoww/
LinkedIn: https://www.linkedin.com/company/beneon/
Twitter: https://x.com/TheNeonShoww

Connect with Siddhartha on:
LinkedIn: https://www.linkedin.com/in/siddharthaahluwalia/
Twitter: https://x.com/siddharthaa7

-------------
This video is for informational purposes only. The views expressed are those of the individuals quoted and do not constitute professional advice.

De Nederlandse Kubernetes Podcast
#136: vLLM, LMD, and the Quest to Build the Linux of AI Inference

Jun 9, 2026 | 32:21


In this episode, hosts Ronald and Jan are joined at KubeCon by two guests from Red Hat: Brian Stevens, AI CTO and one of the original architects behind the creation of Kubernetes and the CNCF, and Rob Shaw, co-lead of the vLLM project and maintainer of llm-d.

Brian shares the remarkable backstory of how Kubernetes came to be open source, including how Red Hat negotiated a single committer seat before agreeing to be a launch partner, and how he later pushed Google to contribute Kubernetes to the newly formed CNCF rather than keeping it proprietary like TensorFlow.

Rob explains what an inference runtime actually is: the critical piece of software that takes an abstract AI model and runs it as efficiently as possible on a GPU or other accelerator, handling everything from CUDA-level kernel optimization to memory management and concurrent request scheduling. vLLM serves as a "Rosetta Stone" between the ever-growing zoo of models (Llama, DeepSeek, Mistral, Qwen, Nvidia Nemotron) and accelerators (Nvidia, AMD, Intel, Google TPUs).

The conversation covers model compression and quantization: how techniques like 4-bit precision can deliver 2x hardware efficiency gains while preserving 99%+ model accuracy. Brian and Rob also address the "big model vs. many small models" debate, recommending that teams always start with the largest capable model to validate a use case before optimizing down.

Looking ahead, both guests see inference as potentially the single largest workload ever run on Kubernetes, and position llm-d (now contributed to the CNCF) as the distributed inference layer that will make this possible across heterogeneous accelerator environments, preventing enterprises from ending up with 42 incompatible AI stacks.

The episode closes with a discussion on AI slop, human-in-the-loop thinking, and the future of Kubernetes as the universal platform for running AI agents at scale.

Powered by @acc-ict
ACC ICT: specialist in IT continuity. Business-critical applications and data securely available, independent of third parties, anytime and anywhere.

Like and subscribe! It helps out a lot.

You can also find us on:
- De Nederlandse Kubernetes Podcast - YouTube
- Nederlandse Kubernetes Podcast (@k8spodcast.nl) | TikTok
- De Nederlandse Kubernetes Podcast

Where can you meet us: Events

This podcast is powered by: ACC ICT - IT Continuity for Business-Critical Applications | ACC ICT
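To make the quantization point in the episode description above more concrete, here is a minimal sketch of symmetric 4-bit weight quantization in plain Python. It is not vLLM's implementation and it does not reproduce the 2x or 99%+ figures quoted in the description; the function names and sample weights are hypothetical, chosen only to show how a float is mapped to one of a few integer levels and back.

```python
def quantize_int4(weights):
    """Symmetric per-tensor quantization to signed 4-bit levels in [-7, 7].

    Hypothetical helper for illustration; returns the integer codes and
    the single float scale needed to reconstruct approximate weights.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    # Round to the nearest level and clamp to the 4-bit symmetric range.
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Map 4-bit integer codes back to approximate float weights."""
    return [c * scale for c in codes]


# Made-up sample weights, not taken from any real model.
weights = [0.12, -0.53, 0.98, -1.40, 0.07, 0.66]
codes, scale = quantize_int4(weights)
restored = dequantize_int4(codes, scale)

# Worst-case reconstruction error across the tensor.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, round(scale, 4), round(max_error, 4))
```

Each weight is stored as a small integer plus one shared scale, so the rounding error per weight is at most half the scale. Real runtimes typically quantize per channel or per group rather than per tensor, so treat the per-tensor choice here as a simplifying assumption.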

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Nebius Co-Founder on AI Infrastructure Bubbles | The Real Impact of Open Source on OpenAI & Anthropic | How Price Elastic is Demand for Compute | Could Nebius Sell 10x More Compute If They Had It & more with Roman Chernin

Jun 8, 2026 | 66:40


Roman Chernin is Co-Founder and Chief Business Officer of Nebius, one of the fastest-growing AI infrastructure companies in the world. Today, Nebius operates some of the largest AI compute clusters globally and serves leading AI labs, enterprises, and developers. It has a market cap of $57BN.

AGENDA:
00:00 - Why AI Infrastructure Is Not a Bubble
05:00 - The Real Impact of Open Source on OpenAI & Anthropic
11:00 - Jevons Paradox: Why Cheaper AI Creates More Demand
13:00 - The Four Layers of AI Infrastructure Explained
19:00 - If Nebius Had 10x More Capacity Tomorrow
26:00 - The Shift from Training to Inference and Agents
31:00 - How Token Factory Cuts AI Costs by 70%
44:00 - Sovereign AI, Europe, and the Future of Model Building
49:00 - Competing Against Hyperscalers with 10x More Capital
59:00 - The Biggest Threat to Nebius Isn't Competition, It's Consolidation

Everyday AI Podcast – An AI and ChatGPT Podcast
Ep 792: Autonomous Copilot agents, new Codex tools, Github CoPilot app and 7 more AI updates you should be using

Jun 5, 2026 | 36:45


✅ New autonomous agents. ✅ Canva designs made for you. ✅ Codex upgrades to make your business move. If you had your head down in spreadsheets this week, you missed some MAJOR AI upgrades that are available now. We track what's hot and what's not and break it all down on Fridays with our Friday Features.

Autonomous Copilot agents, new Codex tools, Github CoPilot app and 7 more AI updates you should be using: An Everyday AI Chat with Jordan Wilson

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Today's Episode on LinkedIn: Thoughts on this? Join the convo on LinkedIn and connect with other AI leaders.
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
- OpenAI Codex Role-Specific Plugins Launch
- Microsoft Build Conference AI Feature Releases
- ChatGPT Memory and Business Account Upgrades
- Microsoft Flash Image Model for PowerPoint
- Canva Integrated with ChatGPT and Codex
- GitHub Copilot Standalone Desktop App Preview
- Microsoft Autopilot Always-On Work Agents
- OpenAI Models Now Available on AWS Bedrock
- Codex Sites: AI-Built Internal Web Apps

Timestamps:
00:00 OpenAI's big money moves
03:47 Explaining role-specific plugins
09:02 Microsoft's new image model release
11:09 Microsoft's AI strategy and Canva update
14:23 Canva integration with ChatGPT
16:56 GitHub Copilot's new canvas feature
20:46 AI token subscription changes
24:42 AWS adds OpenAI models to Bedrock
28:25 Introducing OpenAI's Codex Sites Feature
32:07 Launch of OpenAI's New Plug-in
34:16 Overview of podcast structure

Keywords: Autonomous copilot agents, Codex tools, GitHub Copilot app, OpenAI Codex, ChatGPT business accounts, OpenAI enterprise, Microsoft Build conference, Microsoft always-on agents, AWS AI updates, Canva plugin, ChatGPT memory upgrade, Windows Codex integration, Microsoft Flash model, Enterprise apps integration, Role-specific plugins, Sales data analytics, Product design AI, Creative production AI, Investment banking plugin, Public equity investing, Data analytics plugin, Workspace admins, App permissions, Role-aware work agent, Financial research automation, Microsoft image generation model, PowerPoint AI integration, OneDrive AI features, Visual design creation, Canva app for ChatGPT, Canva MCP server, Agentic context carry, Full screen design preview, GitHub Copilot desktop app, GitHub Copilot Canvas, Agent-native command center, Parallel agent work tree, Code app interface, Model options in GitHub, Token usage limits, Subscription token subsidizing, Anthropic token efficiency, Amazon Bedrock, GPT-4, GPT-4.5, Small language models, Token reckoning, Security governance, Inference engine, Code app sidebar, Codex Sites, Internal dashboards, Project trackers, Interactive web apps, Shareable AI apps, Enterprise data connectors, ChatGPT Canvas, Automated workflow, Workplace authentication, Creative briefs repository.

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️
Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com. Also, here's a link to the entire series on a Spotify playlist.

Sadler's Lectures
William Clifford, The Ethics Of Belief - The Limits Of Inference - Sadler's Lectures

Jun 4, 2026 | 14:00


This lecture discusses William Clifford's 1877 essay "The Ethics Of Belief", in which he makes and argues for the central claim "it is wrong always, everywhere, and for any one, to believe anything upon insufficient evidence." It focuses on the third section of his essay, titled "The Limits Of Inference", in which Clifford discusses the conditions for having well-founded beliefs about matters we have no direct experience of, for example matters of everyday life, science, or history. We inevitably rely upon the assumption that the future or present will resemble what we have experienced in the past.

To support my ongoing work, go to my Patreon site - www.patreon.com/sadler
If you'd like to make a direct contribution, you can do so here - www.paypal.me/ReasonIO
You can find over 4,000 philosophy videos in my main YouTube channel - www.youtube.com/user/gbisadler
Get Clifford's The Ethics of Belief - https://amzn.to/41WkkYA

The Cloudcast
Cerebras is disrupting the market with Fast Inference

Jun 3, 2026 | 35:21


SUMMARY: After the first successful AI IPO of 2026, we dig into what makes the Cerebras WSE architecture unique in the market for fast inference.

GUEST: Andy Hock, Chief Strategy Officer at Cerebras AI

SHOW: 1033

SHOW TRANSCRIPT: The Enterprise AI Show #1033 Transcript

SHOW VIDEO: https://youtu.be/ed2nVbOtZiA

SHOW SPONSORS:
- OutShift - “Scaling Out Superintelligence” The Internet of Cognition architecture
- ShareGate - ShareGate Protect. Microsoft 365 Governance, we got this!
- Nasuni - Activate your data for AI and request a demo

SHOW NOTES:
- OpenAI announces 750MW partnership with Cerebras
- Cerebras and AWS partnership
- Cerebras announces IPO

Topic 1 - Welcome to the show. Tell us about your background, and what you focus on today.

Topic 2 - For anyone that's not familiar with Cerebras, give us an overview of the company, and especially an overview of the Cerebras technologies (e.g. Wafer-Scale Engine).

Topic 3 - Cerebras' WSE architecture is different from many of the GPU or GPU-like architectures in the market today. Centralized vs. distributed architectures always have their tradeoffs. Walk us through the technical and economic value of the Cerebras architecture.

Topic 4 - Congratulations on the recent IPO (raised $5.55B). Let's use that as a point in time vs the previous planned IPO. How has the market changed in that timeframe, and how has the Cerebras position changed?

Topic 5 - Cerebras (today) offers both WSE hardware and Cerebras Cloud (API), which are very different GTM paths. Can we expect both of those to stay top priorities, or have the market dynamics shifted such that the priorities shift more towards the WSE business, as we're seeing OpenAI, AWS and other engagements announced?

Topic 6 - Is Cerebras a training and inference company, or are the economics of inference significantly different enough that it needs to be the sole focus of the company (for now)?

Topic 7 - How much effort is it for any company to add support for the Cerebras chips if they have previously been using other architectures?

Topic 8 - An IPO is a major milestone for any company, but the markets will now look for your future story. How do you see the AI market evolving over the next 2-5 years, and what are some things that people aren't understanding yet about how it will evolve?

FEEDBACK?
Email: show @ the enterprise ai show dot com
Bluesky: @TheEntAIShow.bsky.social
Twitter/X: @TheEntAIShow
Instagram: @TheEntAIShow

Alexa's Input (AI)
How vLLM and llm-d Changed AI Inference with Rob Shaw

Jun 3, 2026 | 102:59


In this episode of Alexa's Input (AI), I sat down with Rob Shaw from Red Hat to talk about how AI inference evolved from a simple model serving problem into a large-scale distributed systems problem.

We explored the infrastructure shifts behind modern LLM serving, including how vLLM and PagedAttention changed the economics and efficiency of inference, why KV cache management became one of the most important bottlenecks in production AI systems, and how orchestration layers like llm-d are emerging to coordinate distributed inference.

We also discuss:
- how LLM inference differs from traditional model serving runtimes
- KV cache, prefix caching, and cache-aware routing
- why throughput and latency became major infrastructure challenges
- long-context agents and repeated inference calls
- distributed inference on Kubernetes
- intelligent routing, flow control, and load balancing
- prefill/decode disaggregation
- enterprise AI deployment realities

vLLM has become one of the most important open-source projects in AI infrastructure, and llm-d represents a newer shift toward treating inference as a coordinated distributed system rather than just a single runtime problem.

If you want to better understand the systems layer beneath modern AI applications, this episode is a deep dive into where inference infrastructure is heading next.

General Podcast Links
Watch: https://www.youtube.com/@alexa_griffith
Read: https://alexasinput.substack.com/
Listen: https://creators.spotify.com/pod/profile/alexagriffith/
More: https://linktr.ee/alexagriffith

Learn more about the host at
Website: https://alexagriffith.com/
LinkedIn: https://www.linkedin.com/in/alexa-griffith/

Find out more about the guest at:
LinkedIn: https://www.linkedin.com/in/robert-shaw-1a01399a/
Red Hat Articles: https://developers.redhat.com/author/robert-shaw
Github: https://github.com/robertgshaw2-redhat

Resources
vLLM Website: https://vllm.ai/
vLLM GitHub Repository: https://github.com/vllm-project/vllm
llm-d Website: https://llm-d.ai/
llm-d GitHub Repository: https://github.com/llm-d/llm-d

Keywords
AI inference, vLLM, llm-d, distributed inference, GPU optimization, open source AI, Kubernetes, multi-cluster deployment, AI infrastructure, enterprise AI, model optimization, speculative decoding, mixture of experts, AI deployment, performance tuning, AI systems, neural network scaling

Key Topics
- Evolution of vLLM and llm-d
- Distributed inference and routing
- GPU utilization and performance optimization
- Open source AI infrastructure
- Enterprise deployment challenges and solutions
- Standardization in Kubernetes for NIC exposure
- Performance optimizations: quantization and speculative decoding
- Mixture of experts architecture and parallelism strategies
- Flow control and request scheduling in AI systems
- Emerging hardware for AI inference, Cerebras processor
- Reinforcement learning and AI system support
- Modular architecture of vLLM and ecosystem projects
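The prefix caching and cache-aware routing topics mentioned in the episode description above can be illustrated with a toy router in plain Python. This is a sketch under stated assumptions, not how llm-d or vLLM actually schedule requests: the class name `CacheAwareRouter`, the block size, and the replica names are all hypothetical. The idea shown is only that a request is sent to the replica that already holds the longest matching prompt prefix, with load as a tie-breaker.

```python
BLOCK_SIZE = 4  # tokens per cached block (hypothetical value)


def prefix_blocks(tokens):
    """Return one cumulative prefix key per full block of tokens."""
    return [
        tuple(tokens[:end])
        for end in range(BLOCK_SIZE, len(tokens) + 1, BLOCK_SIZE)
    ]


class CacheAwareRouter:
    """Toy router: prefer the replica with the longest cached prefix."""

    def __init__(self, replicas):
        self.replicas = list(replicas)
        self.cached = {name: set() for name in self.replicas}
        self.load = {name: 0 for name in self.replicas}

    def _cached_prefix_len(self, name, blocks):
        # Count consecutive leading blocks already cached on this replica.
        count = 0
        for key in blocks:
            if key not in self.cached[name]:
                break
            count += 1
        return count

    def route(self, tokens):
        blocks = prefix_blocks(tokens)
        # Longest cached prefix wins; ties go to the least-loaded replica.
        best = max(
            self.replicas,
            key=lambda n: (self._cached_prefix_len(n, blocks), -self.load[n]),
        )
        self.cached[best].update(blocks)
        self.load[best] += 1
        return best


router = CacheAwareRouter(["replica-a", "replica-b"])
system_prompt = list(range(8))  # stand-in token ids shared by two requests

first = router.route(system_prompt + [100, 101, 102, 103])
second = router.route(system_prompt + [200, 201, 202, 203])
third = router.route([900, 901, 902, 903])
print(first, second, third)
```

The second request shares the system prompt with the first, so it lands on the same replica and could reuse its cached blocks; the third shares nothing and goes to the less loaded replica. A real scheduler would also have to account for cache eviction and queue depth, which this sketch ignores.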

Remotely Curious
Coming soon: Working Smarter season three

Jun 2, 2026 | 2:17


Modern work can be frustrating and chaotic—if you don't have the right tools. From context engineering to multimodal search, go behind the scenes and hear how Dropbox engineers are building AI that actually understands you, so you can focus on the work that matters most. If you're new to Working Smarter, we've travelled from the F1 track to the bottom of a lake, and heard real stories from chefs, doctors, lawyers, and founders about how AI is helping them do more of what they love about their jobs. But in our third season, we're talking to the people behind the tools—the engineers and product leaders building helpful, time-saving AI features into the Dropbox experience you already know and trust. You'll hear all about their work on agents, inference, security, and, of course, how the people building AI use AI themselves. ~ ~ ~  Working Smarter is brought to you by Dropbox. Find, organize, and share your work—all in one place—with context-aware AI from Dropbox. You can listen to more episodes of Working Smarter on Apple Podcasts, Spotify, YouTube, Amazon Music, or wherever you get your podcasts. To read more stories and past interviews, visit workingsmarter.ai This show would not be possible without the talented team at Cosmic Standard: producer Ben Montoya, sound engineer Aja Simpson, technical director Jacob Winik, and executive producer Eliza Smith. Special thanks to our illustrator Fanny Luor, marketing consultant Meggan Ellingboe, and editorial support from Catie Keck.  Our theme song was composed by Doug Stuart.  Working Smarter is hosted by Matthew Braga. Thanks for listening!

Causal Bandits Podcast
Strait of Hormuz: Causal Models for Rare Events | Alexander Denev S2E11 | CausalBanditsPodcast.com

Jun 1, 2026 | 43:28 | Transcription Available


*How do you forecast an event that has never happened before?*

The recent closure and reopening of the Strait of Hormuz are unique events. For events like these, traditional risk models lose their statistical basis: repetition. Alexander Denev returns to the podcast to show how causal models (Bayesian networks) let us reason about rare events despite this limitation.

In this episode, we cover:
- Why value-at-risk and other correlation-based models break exactly when you need them most
- How a causal structure can "hold in time"
- Building scenarios with LLMs: benefits, drawbacks, and lessons learned
- Historical analogy as a modeling tool: Bosphorus, Hormuz, and more
- A three-way robustness test for any Bayesian network
- How the model's call held up: a ceasefire, a still-closed strait, and lasting infrastructure damage keeping oil elevated

"History doesn't repeat itself, but it rhymes."

------------------------------------------------------------
Video version available on YouTube: https://youtu.be/FzKy2ws-7qs
Recorded on May 29, 2026 in London, UK.
------------------------------------------------------------

*About The Guest*
Alexander Denev works at the intersection of quantitative finance, causality, and AI. He's the CEO of Turnleaf Analytics and the author of two books on applying Bayesian networks and probabilistic graphical models to finance and scenario analysis.

Connect with Alexander:
- Alexander on LinkedIn: https://www.linkedin.com/in/alexander-denev-66a25824/
- Alexander's web page: https://turnleafanalytics.com/

*About The Host*
Aleksander (Alex) Molak is an independent machine learning researcher, educator, entrepreneur and a best-selling author in the area of causality (https://amzn.to/3QhsRz4).

Connect with Alex:
- Alex on the Internet: https://bit.ly/aleksander-molak

*Links*
Web
- Alexander's LinkedIn post, Bayesian-network scenario for the Strait of Hormuz / Israel-Iran-US conflict: https://www.linkedin.com/posts/alexander-denev-66a25824_when-modelling-the-impact-of-events-that-share-7442892381668048896-JDs5/
- Risk.net article, "Iran confusion makes the case for causal modelling": https://www.risk.net/our-take/7963361/iran-confusion-makes-the-case-for-causal-modelling

Books
- Rebonato, R. & Denev, A. - Portfolio Management under Stress: A Bayesian-Net Approach to Coherent Asset Allocation (https://amzn.to/3vE6Jc1)
- López de Prado, M. - Advances in Financial Machine Learning (https://amzn.to/3PXD8kH)
- Molak, A. - Causal Inference and Discovery in Python (https://amzn.to/3VVK4m3)
- Denev, A. - Probabilistic Graphical Models: A New Way of Thinking in Financial Modelling (https://amzn.to/3VQeLJm)
- Pearl, J. & Mackenzie, D. - The Book of Why (recommended entry point) (https://amzn.to/4e0ATrZ)
- Pearl, J. - Causality: Models, Reasoning and Inference (for advanced readers) (https://amzn.to/49zBKf5)
- Rebonato, R. - Coherent Stress Testing: A Bayesian Approach to the Analysis of Financial Stress (https://amzn.to/3RC411e)
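As a rough illustration of the idea in the episode description above, that a causal structure lets you reason about an event with no historical frequency, here is a minimal three-node Bayesian network evaluated by brute-force enumeration in plain Python. Everything in it is an assumption for illustration: the chain (escalation, then closure, then price spike), the variable names, and every probability are invented and are not taken from the episode or from Alexander Denev's model.

```python
from itertools import product

# All probabilities below are invented for illustration only.
P_ESCALATION = 0.10                           # P(escalation)
P_CLOSURE_GIVEN = {True: 0.40, False: 0.01}   # P(closure | escalation)
P_SPIKE_GIVEN = {True: 0.90, False: 0.05}     # P(spike | closure)

NAMES = ("escalation", "closure", "spike")


def joint(escalation, closure, spike):
    """Joint probability of one full assignment, via the chain factorization."""
    p = P_ESCALATION if escalation else 1 - P_ESCALATION
    p_closure = P_CLOSURE_GIVEN[escalation]
    p *= p_closure if closure else 1 - p_closure
    p_spike = P_SPIKE_GIVEN[closure]
    p *= p_spike if spike else 1 - p_spike
    return p


def probability(query, evidence=None):
    """P(query | evidence), summing the joint over all consistent worlds."""
    evidence = evidence or {}
    numerator = denominator = 0.0
    for values in product([True, False], repeat=len(NAMES)):
        world = dict(zip(NAMES, values))
        if any(world[k] != v for k, v in evidence.items()):
            continue
        p = joint(**world)
        denominator += p
        if all(world[k] == v for k, v in query.items()):
            numerator += p
    return numerator / denominator


p_spike = probability({"spike": True})
p_spike_given_escalation = probability({"spike": True}, {"escalation": True})
print(round(p_spike, 5), round(p_spike_given_escalation, 5))
```

The point of the sketch is that the conditional tables can be elicited from expert judgment or analogy rather than estimated from repeated observations, and the network then propagates them consistently. Enumeration is only practical for a handful of binary nodes; larger networks need a proper inference library.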

Crazy Wisdom
Episode #549: From MS-DOS to Vibe Coding: How Non-Technical Founders Build Complex Software

May 29, 2026 | 70:14


Stewart Alsop sat down with Michael Shackelford to discuss their experiences building applications through vibe coding, the practice of using AI to create software without traditional programming expertise. Stewart, who runs the AI Whispers community in Buenos Aires and hosts the Crazy Wisdom podcast (with over 660 interviews), shared how he went from teaching people prompt engineering to building his own video conferencing software as a Riverside.fm replacement, while Michael opened up about his year-long journey creating Genrupt Inc, an AI-powered content generation tool for e-commerce sellers. The conversation covered everything from the decline in quality of Claude's reasoning capabilities and how Chinese companies used distillation attacks to copy Anthropic's models, to the importance of spaced repetition systems for managing knowledge in the age of LLMs, with both sharing battle-tested prompting strategies like asking AI to "explain it to me in genius terms" and using deep research queries to reverse engineer how competitors build their products.

Show Notes:
- Dan Martell's book "Buy Back Your Time" was mentioned as one of the best business books for thinking about life and business
- Check out John Vervaeke's "Awakening from the Meaning Crisis" for understanding relevance realization and why AI fundamentally cannot determine what's relevant to humans without being told

Timestamps
00:00 Michael discusses being exhausted from getting his app ready for launch, working nonstop with AI to prepare landing page for podcast traffic driving beta signups
05:00 Stewart explains starting AI Whispers in Buenos Aires after leaving OpenAI vendor company, meeting early adopters like Torin who was building mind-reading EEG technology
10:00 Discussion of how corporations resist AI adoption due to political games and job security fears while some companies use AI as excuse for pandemic-era layoffs
15:00 Stewart describes teaching workshops on using LLMs as linguistic tools rather than coding tools, noting technical people often lack humanities background needed for prompting
20:00 Explaining chatbot wrappers, API calls, and how Anthropic's reasoning quality declined after Chinese distillation attacks copied their secret sauce developed with philosophers
25:00 Technical discussion of model training, fine-tuning versus RAG for new information, and different approaches to updating AI knowledge beyond initial training
30:00 Stewart describes building podcast recording software to replace expensive Riverside, struggling with syncing audio and video files across different computer clocks
35:00 Discussion of critical factors in vibe coding, discovering unknown technical requirements, and how AIs don't automatically reveal missing information
40:00 Stewart's reverse engineering process using deep research function to study competitors' hiring and technology stacks, separating planning agents from coding agents
45:00 Prompting techniques including "explain like I know everything" and using spaced repetition systems to capture valuable prompts and technical knowledge
50:00 Michael explains his Generux app for generating ecommerce content using Amazon review data analysis to inform high-converting listing images and videos
55:00 Discussion of founder mentality involving self-delusion about project timelines, Michael working nine-plus hours daily for nine months on app development
60:00 Comparing Amazon's expert software to prosumer software approach, discussing distribution challenges and future robotics applications for customized products
65:00 Stewart demonstrates spaced repetition app for memory improvement and knowledge retention, explaining relevance realization problem that AI agents cannot solve without embodiment

Key Insights
1. Stewart Alsop started AI Whisperers in Buenos Aires after leaving his role at Invisible Technologies, which was OpenAI's largest vendor for RLHF work. He noticed that machine learning engineers at tech companies lacked the humanities background needed to properly interact with large language models, which are fundamentally linguistic tools. This led him to create weekly workshops teaching non-technical people how to use AI effectively, running events every Thursday for two years straight. The group attracted intense geeks from the start and eventually led to Stewart speaking right after Vitalik Buterin at DevConnect, marking a significant milestone for the community.
2. Large corporations are resistant to AI adoption due to multiple factors including political dynamics within organizations and employees fearing job loss. Many companies that grew during the pandemic are now using AI as an excuse to downsize when the real issue is inefficiency from rapid expansion. Stewart observed that even technical people in machine learning often don't understand how to properly use AI tools because they lack linguistic and humanities training. The fundamental problem is educational, requiring companies to train people how to use these new tools while those same people resist learning them.
3. Vibe coding has evolved significantly with Claude Code being a game changer that reduced the technical barrier to entry. Before Claude Code, developers needed substantial technical knowledge to work through constant doom loops and debugging cycles. The success of coding AI tools stems from thirty years of testing infrastructure that provides clear yes or no feedback on whether code works. This infrastructure doesn't exist in the same way for manufacturing, science, and other fields, which is why software became the dominant area for AI assistance initially.
4. Claude's quality degradation over recent months resulted from multiple factors including distillation attacks by Chinese companies who reverse engineered Anthropic's reasoning capabilities. Anthropic had hired philosophers, sociologists, and psychologists to develop exceptional reasoning in Claude 4.5, but this was expensive to run. When Chinese models like Kimi copied these capabilities at one tenth the cost, and when mainstream users flooded the platform before Anthropic's planned IPO, the company had to reduce quality to manage computational costs. This represents a significant loss for power users who relied on Claude's superior reasoning abilities.
5. Stewart built a podcast recording application to replace Riverside because he needed API access to automate workflows, which Riverside wanted one thousand dollars monthly to provide. The technical challenge involves syncing audio and video from local recordings on multiple computers with different clocks through a server, then merging them so voices match lip movements. This problem requires understanding complex timing issues across different network conditions and file formats. Stewart has been working through AI psychosis for months on this FFMPEG pipeline problem, illustrating how vibe coding still requires building intuition about technical problems even without traditional coding knowledge.
6. The transition from expert software to prosumer software represents a major opportunity for AI-enabled tools. Expert software like Photoshop, Blender, and terminal interfaces have extreme complexity that intimidates beginners, but AI is making these capabilities accessible through natural language. The reign of specialists is ending as generalists with broad knowledge and curiosity can now build complete applications by leveraging AI to fill technical gaps. This shift particularly benefits entrepreneurs and founders who specialize in getting into difficult situations and figuring them out, even when they originally thought tasks would be easier than they turned out to be.
7. Building applications with AI requires accepting massive time investments beyond initial estimates and developing strategies for overcoming knowledge gaps. Michael estimated his ecommerce content generation app would take months but spent nearly a year working over nine hours daily, while Stewart spent months solving audio-video sync issues. Success requires using tools like deep research to understand how competitors solve problems, maintaining separate planning and coding agents, and learning to ask the right questions. The key insight is that vibe coders can achieve ninety percent of functionality independently, but the final ten percent often requires understanding specific technical concepts that AI cannot intuit without proper context and domain knowledge.

Catalyst with Shayle Kann
Building inference data centers on the high seas

May 28, 2026 | 48:05


Amidst the increasing urgency of powering data centers, a new solution has entered the mix: send them out to sea. In this episode, Shayle speaks to Garth Sheldon-Coulson, co-founder and CEO of Panthalassa. The company is building 85-meter steel "nodes" – taller than Big Ben – that it deploys into the deep ocean. These untethered, self-propelled nodes harness wave energy to power AI clusters, then beam their data back to land via satellite. The technology isn't without its fair share of logistic complications, but it nonetheless offers a pathway to powering the AI boom that's largely independent from grid or fuel constraints. Shayle and Garth cover topics including: - The physics and mechanics that power Panthalassa's nodes - The significance of building an autonomous fleet - The energy generation waiting to be tapped in the open ocean - The logistics and unit economics behind scaling Panthalassa's technology - Why deep-sea compute is well-suited for long-running workloads like inference and reinforcement learning - Catalyst: AI scaling pathways: On grid, on edge, off grid, off planet - Catalyst: How to build more hydropower - Latitude Media: Are Thiel-funded floating data centers enough to make wave energy pencil?  - Open Circuit: Grid utilization vs expansion: The 100 GW debate - Latitude Media: What geothermal can learn from offshore wind's demise Credits: Hosted by Shayle Kann. Produced and edited by Max Savage Levenson. Original music and engineering by Sean Marquand. Stephen Lacey is our executive editor. Catalyst is brought to you by EnergyHub. EnergyHub helps utilities build next-generation virtual power plants that unlock reliable flexibility at every level of the grid. See how EnergyHub helps unlock the power of flexibility at scale, and deliver more value through cross-DER dispatch with their leading Edge DERMS platform, by visiting energyhub.com. Tune into Critical Capital, a brand new podcast from Crux and Latitude Studios. 
Hosted by Crux CEO Alfred Johnson, Critical Capital explores the interlocking forces powering clean and critical infrastructure. Join us every other Tuesday for in-depth conversations at the intersection of energy, government, finance, and global markets. Listen here, or wherever you get podcasts.

Catalyst is brought to you by FischTank PR, an award-winning climate and energy tech, renewables, and sustainability-focused PR firm dedicated to elevating the work of both early-stage and established companies. Learn more about their PR approach and how they can support your company's messaging by visiting fischtankpr.com.

Tech Disruptors
Cerebras After IPO: OpenAI, AWS and Inference

Tech Disruptors

Play Episode Listen Later May 28, 2026 42:30


“OpenAI has only two AI accelerator compute vendors in production today, Cerebras and Nvidia,” Cerebras CEO Andrew Feldman says. Four days after Cerebras went public, Feldman joined Bloomberg Intelligence's Kunjan Sobhani to discuss the company's next chapter and the rapidly shifting AI infrastructure landscape. Feldman breaks down the OpenAI deal, the strategic AWS partnership around disaggregated inference and why Cerebras believes fast inference is becoming the industry's defining battleground. He explains how Cerebras evolved from building the world's largest chip to operating one of the fastest inference platforms, why disaggregated inference could reshape hyperscale AI deployments and how the company is navigating power, memory and data-center constraints. The episode also explores the competitive landscape beyond GPUs and Feldman's broader perspective on the next phase of AI compute.

The Data Center Frontier Show
Nomads at the Frontier: Phillip Koblence on AI Infrastructure, Inference Demand, and the Industry's Growing Visibility at Data Center World 2026

The Data Center Frontier Show

Play Episode Listen Later May 28, 2026 17:03


Recorded live at Data Center World 2026, Data Center Frontier Editor in Chief Matt Vincent sits down with Phillip Koblence, COO of NYI and co-founder of Nomad Futurist, for the latest installment of Nomads at the Frontier. The conversation explores the accelerating realities of AI infrastructure buildouts, the industry's growing focus on community engagement, workforce shortages, and the shift toward inference-driven deployments following NVIDIA GTC 2026. Koblence discusses why major interconnection hubs and edge-adjacent urban facilities may become increasingly important in the inference era, the operational realities of deploying AI infrastructure in legacy carrier hotels like 60 Hudson Street, and why the industry can no longer remain invisible to the communities where it builds.

Additional topics include:
- The continuing surge in digital infrastructure demand
- Why conference attendance reflects sustained industry expansion
- Power constraints and energy storage discussions emerging at Data Center World
- AI factories and the evolving economic role of data centers
- Workforce shortages across engineering and skilled trades
- Nomad Futurist's workforce development initiatives with Infrastructure Masons and I Am The Armed Forces
- The growing complexity and diversity of the data center ecosystem

“Every element of everything within the data center has a full sub-vertical industry associated with it,” Koblence says during the discussion. “People would be surprised how large of an ecosystem is involved in creating the digital economy that exists today.”

Listen now for a candid, fast-moving conversation on the state of AI infrastructure and the future of digital infrastructure development.

Bill Wenstrom
Ephesians 5.14c-Quotation from Isaiah 60 is an Inference from the Contents of Ephesians 5.6-13

Bill Wenstrom

Play Episode Listen Later May 28, 2026 53:16


Ephesians Series: Ephesians 5:14c-Quotation from Isaiah 60 is an Inference from the Contents of Ephesians 5:6-13-Lesson # 347

Wenstrom Bible Ministries
Ephesians 5.14c-Quotation from Isaiah 60 is an Inference from the Contents of Ephesians 5.6-13

Wenstrom Bible Ministries

Play Episode Listen Later May 28, 2026 53:16


Ephesians Series: Ephesians 5:14c-Quotation from Isaiah 60 is an Inference from the Contents of Ephesians 5:6-13-Lesson # 347

Impact Pricing
AI Agents, Zero Humans, and the End of SaaS Per-Seat Pricing with Ajit Ghuman

Impact Pricing

Play Episode Listen Later May 25, 2026 30:51


Ajit Ghuman is the co-founder of Monetizely, former VP of Product at Segment, and author of Price to Scale. In this episode, Ajit breaks down one of the biggest pricing challenges AI companies are about to face: what happens when software no longer supports employees, but starts replacing them entirely? If your company is building, pricing, or monetizing AI products, this episode will change how you think about per-seat pricing, buyer psychology, and the future of SaaS monetization.

Why you have to check out today's podcast:
- Understand why per-user pricing may stop working as AI agents increasingly replace human workflows inside software products.
- Learn Ajit Ghuman's 3-part "Agentic Pricing Spectrum" for evaluating AI products based on autonomy, operational scope, and output-to-cost dynamics.
- Discover why buyers are suddenly comfortable with tokens, credits, and bundled AI pricing, even when they don't fully understand what those units actually mean.

"Unless you understand what your market is, who your buyers are, what do they want... it's the only thing that I start with when I do any project." – Ajit Ghuman

Topics Covered:
- 02:02 – Why Pricing Became the Most Direct Link to Customer Value. How pricing became the clearest connection between products, value, and business strategy.
- 06:29 – The AI Pricing Problem Nobody Has Fully Solved Yet. Why AI is forcing SaaS companies to rethink seats, tokens, outcomes, and margins.
- 07:38 – "Zero Human Companies" and the End of Per-User Pricing. Ajit explores a future where AI agents replace entire job functions and asks the terrifying question: what happens when there's no user left to charge for?
- 12:30 – Why Cursor Still Charges Per User (For Now). A fascinating breakdown of AI coding tools, human "anchors," and why most AI products still can't fully move to outcome-based pricing.
- 16:51 – The Coming AI Commoditization Wave. Why Ajit believes agentic AI companies could rise, and collapse, dramatically faster than traditional SaaS businesses.
- 23:07 – Why Buyers Suddenly Accept Tokens, Credits, and Weird AI Pricing. Ajit explains how ChatGPT normalized token-based pricing, even though most buyers still don't fully understand what they're paying for.
- 26:00 – The Real Reason AI Pricing Feels So Chaotic Right Now. Inference costs are dropping, users are disappearing, and pricing anchors keep shifting faster than companies can adapt.
- 29:35 – The One Pricing Principle That Still Matters in the AI Era. Despite all the chaos around AI monetization, Ajit says successful pricing still starts with deeply understanding your buyers and their problems.

Key Takeaways:
- "The anchor is still the human… but the moment the human disappears, per-user pricing starts breaking." – Ajit Ghuman
- "Agentic AI may compress 20 years of SaaS evolution into just a few years." – Ajit Ghuman

People / Resources Mentioned:
- Cursor: Used as a real-world example of current AI pricing models
- Harvey AI: Referenced as an example of high-value AI transformation inside the legal industry
- Anthropic: Mentioned in relation to inference models powering AI tools
- OpenAI: Referenced throughout the discussion on tokens and AI pricing behavior
- Salesforce: Discussed in relation to potential future shifts away from per-seat pricing
- Zoom: Used as an example of changing pricing priorities during growth stages

Connect with Ajit Ghuman:
- Website: https://www.getmonetizely.com/
- LinkedIn: https://www.linkedin.com/in/ajitpalghuman/
- Email: ajit@getmonetizely.com

Connect with Mark Stiving:
- LinkedIn: https://www.linkedin.com/in/stiving/
- Email: mark@impactpricing.com

Bitcoin Magazine
How China Hijacked Bernie & AOC's War on AI | Bitcoin Policy Hour EP 38

Bitcoin Magazine

Play Episode Listen Later May 22, 2026 57:21


On April 29th, the US Senate hosted a panel on the "existential threat" of AI and two of the four panelists worked for the Chinese government. One month earlier, Bernie Sanders and AOC introduced legislation imposing a federal moratorium on American AI data centers. On Bitcoin Policy Hour EP 38, Zack Cohen, Ken Egan, and Zack Shapiro unpack a new Bitcoin Policy Institute report by Sam Lyman exposing the CCP influence operation steering US AI policy. They also cover the Clarity Act vote in Senate Banking, the BRCA fight, and the Digital Asset Parity Act. Sam Lyman's BPI Report: https://www.btcpolicy.org/articles/foreign-influence-in-the-campaign-against-american-ai

Freemius
AI app pricing models: How to choose the right one without getting destroyed by inference costs

Freemius

Play Episode Listen Later May 22, 2026


You built an AI app, people are using it, and now you need a pricing model that can survive real usage. The problem? Most pricing guides are written for funded...

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!

This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring, with HA fiber interconnects between their metal, GCP, and AWS, because workload discoverability was unintentionally still tied to GCP. All has been resolved with a post-mortem.

Railway did not start as an AI infrastructure company.

It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Docker files, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.

For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor.

Today, Railway has raised $124m and is growing very fast. A 35-person team supports 3 million users, adding roughly 100,000 signups a week. Their bare metal data centers have a 3-month payback period vs. renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers they own have actually appreciated in value as RAM prices have climbed, basically meaning their cash in the bank plus the value of their hardware now exceeds the capital they've raised.

From rebuilding Railway's network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world. 
In this episode, Railway's founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.

We go deep on Railway's infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.

We discuss:
* How Railway went from a slow six-year grind to adding 100,000 users a week
* How Railway thinks about agents as the next dominant software species
* Why agents need version control, observability, compute, storage, and orchestration at 1000x scale
* The economics of Railway's own-metal data centers and three-month payback
* How Railway uses cloud bursting while scaling its own infrastructure
* Why data center debt can be a better tool than venture debt for infra startups
* Central Station, Railway's internal system for clustering customer feedback and incidents
* Why responsible disclosure and over-communication matter for platforms
* Why feature flags, progressive rollouts, and shadow traffic are essential for agents
* Temporal's strengths, pain points, and why workflows matter for agents
* Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems
* Why “cattle, not pets” may change if you can clone the pets
* Why Railway is building a new cloud from scratch instead of copying hyperscalers
* The solo founder path, focus, writing, and how Jake thinks about company building

Railway:
* Website: https://railway.com/
* X: https://x.com/Railway

Jake Cooper:
* LinkedIn: https://www.linkedin.com/in/thejakecooper/
* X: https://x.com/JustJake

Timestamps
00:00:00 Introduction: What Is Railway?
00:02:07 Jake's Path to Railway
00:06:13 Railway's Six-Year Growth Story
00:08:52 Rebuilding the Business After the Free Tier
00:11:17 Agents as the Next Software Platform
00:13:29 Railway's Infrastructure Philosophy
00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch
00:17:22 Cloud Bursting and Five-Cloud Networking
00:20:20 Data Center Debt and Infra Financing
00:23:31 Data Centers in Space
00:25:24 What Agents Need From Infrastructure
00:28:24 CLIs, Canvas, and Agent-Native UX
00:35:15 Central Station, Incidents, and Responsible Disclosure
00:40:30 Safe Rollouts, SRE Agents, and Production Forks
00:45:00 AI SRE, Specs, Code, and Tests
00:48:24 Self-Replicating Infrastructure and the New Serverless
00:53:18 Heroku, Temporal, and Workflow Engines
01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems
01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration
01:10:56 The Pull Request Is Dying
01:12:28 Feature Flags and the Agent-Era SDLC
01:16:15 Cattle, Pets, and Cloning Machines
01:19:29 Solo Founder Lessons
01:24:12 Focus, GPUs, and Building a New Cloud
01:28:20 Closing Thoughts

Transcript

Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space.

Swyx [00:00:10]: Hey, hey, hey. Today we're in the studio with Jake Cooper of Railway.

Alessio [00:00:14]: Conductor of Railway.

Swyx [00:00:15]: Conductor at Railway. Yeah.

Alessio [00:00:16]: Choo-choo.

Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?

Jake [00:00:20]: We call some of our volunteer moderators conductors. I don't have a business card. We're not that big yet. At some point I will. 
I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”

Swyx [00:00:30]: Business cards are coming back.

Jake [00:00:32]: They're cool. They're hip. The conductor thing is good. We're trying to figure out what we want to call each other internally. Some people think it's super cringe and say, “You don't need a name for people internally.” Some people want to call each other something. We still don't have a really good one.

Jake [00:00:55]: We've got New Railcrews, Trainiacs. Nothing has stuck yet.

Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don't know, what is Railway? Let's give people a crisp definition up front.

Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you're off to the races.

Swyx [00:01:22]: You've got a nice animation on the landing page.

Jake [00:01:24]: Thank you. None of my work, by the way. They don't let me touch the design stuff anymore.

Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.

The Railway Origin Story: From Uber Systems to a New Cloud

Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?

Jake [00:02:21]: It was curiosity to keep going deeper. 
I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.

Swyx [00:02:44]: Which, by the way, I'm happy to talk about, pros and cons.

Jake [00:02:48]: Totally.

Swyx [00:02:51]: But let's do the Railway story.

Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it's walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don't care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.

Jake [00:03:17]: I don't have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That's what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.

Swyx [00:03:49]: Other patches to the Linux kernel this week?

Jake [00:03:51]: Yeah. Not upstream. Our fork.

Swyx [00:03:52]: That's a flex. Railpack? No, this is different. This is the OS on top of Railpack?

Jake [00:03:57]: No, this is an actual kernel patch. It's always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.

Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?

Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we're building for some of the agentic stuff. 
Maybe it'll be useful upstream, but it's deeply useful for us internally.

Open Source, Forks, and Non-Deterministic Versioning

Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?

Jake [00:04:38]: GitHub's original sin is that it's almost a series of broken pointers. You have this thing, then you clone it, and now you've lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?

Jake [00:04:51]: We think of Git in a discrete sense: I've either made a change and merged upstream, or I haven't. What would it look like if it were percentage-based, a little more non-deterministic, or a stream of changes that users traverse as a percentage rolled out in general and then rolled all the way up?

Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don't take any new packages.” The ideal end state is that you roll out progressively to users with the minimum impact zone and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.

Jake [00:05:53]: It's okay if Johnny Vibe Coder gets a broken patch because there's so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.

The Long Grind: First Users, Free Tier, and Making the Business Work

Swyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?

Jake [00:06:22]: Daily signups, I think.

Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you're on a rocket ship. 
You say, “Don't doubt your fight and don't quit.” Maybe pick out certain points that were key inflections for the company.

Jake [00:06:40]: At the start, it's about getting your first 100 users, hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how's it going?” It was rare, so getting those first 100 users to come back was the start.

Jake [00:07:14]: Then you build a consultancy factory because users want all these things. You have to go back to the board and ask, “What is the actual product offering I want to build on top of this?”

Jake [00:07:28]: VCs want charts that always go up and to the right, but in reality you don't necessarily want charts that look like that. For us, there have been periods of expansion where we add features to test use cases, and periods of compaction where we ask, “If the experience we have is good, how do we make it significantly better?” Maybe we strip out features that don't fit our ICP anymore.

Jake [00:07:57]: The boom from 2022 to 2023 came from the free tier. Everybody under the sun was using it.

Swyx [00:08:09]: A lot of Reddit bots and Discord bots.

Jake [00:08:12]: And crypto miners. When you build an open product on the internet where anybody can sign up, the internet is a horrible place with so many things. You go through periods of asking, “How do I reach as many people as possible?” Then, “How do I fit the exact use case for the people who really matter and are really excited about this specific thing?”

Jake [00:08:39]: Then there was a two-year period of making the actual business work. During the free-tier era, we were losing about half a million dollars a month.

Swyx [00:08:59]: On a $20 million bank account.

Jake [00:09:02]: On a $20 million bank account with maybe $50,000 a month in revenue. That's a horrible business. 
I don't know how anybody invested. But you have to go through it and say, “We have an experience people love, but the business has to work.”

Jake [00:09:17]: There are two schools of thought. You can run the horrible business all the way up with bad margins, or you can go back and make it work. We've always wanted a super lean team. We're 35 people right now. It's very small.

Swyx [00:09:36]: Supporting three million already?

Jake [00:09:38]: Yeah. We're adding 100,000 users a week right now, so it's growing fast. We don't want to add headcount for the sake of headcount or throw bodies at problems. We want to build systems. It's hard to build systems during expansion because you're adding things to the system because people are asking for them or things are breaking.

Jake [00:10:00]: We had to cut off the free users for a little while, rebuild the business, and make sure it worked. We want to reach as many people as possible because software is important. It's become difficult to create things in the physical world, so it's important to make it easy for people to build in the virtual world and have access to creation. But there are legs to that journey.

Jake [00:10:30]: You can see divots in the charts. If you follow between 2025 and 2026, it's either summer or winter. People go on holiday with family.

Swyx [00:10:50]: It affects that much?

Jake [00:10:51]: Yeah. It's kind of B2C and kind of B2B. People are shipping constantly, then they stop. Our activation curve now shows more people activating on weekdays because we have more business users, so it smooths out over time.

Agents as the New Interface to Deployment

Swyx [00:11:17]: Was there a point where you started prioritizing AI development or agent development?

Jake [00:11:24]: We've prioritized agentic as a top-of-funnel thing. 
Over the last six months, we've deeply prioritized agentic as a mechanism to build and deploy things because we believe the curve is so steep and that is how people will build and deploy software.

Jake [00:11:42]: It almost fundamentally doesn't matter whether this is dot-com or not because we're all on the internet anyway. If agents are going to deploy a bunch of things and we hit an inference wall at some point, we'll fix those problems. The dominant species over the next 10 years is that we've moved from assembly to C to C++ to JavaScript to words. You're going to need to close that loop.

Swyx [00:12:13]: When you say this is dot-com, did you mean buying the domain, or the general case?

Jake [00:12:17]: I mean the dot-com era, when companies had a huge run-up because people understood the internet was important. Then they hit bottlenecks, fundamental laws of physics, math didn't work, and everybody came back down to earth. But it didn't matter because the internet became so impactful. If you operate on a long enough time horizon, you should build these things anyway because you can see where it's going.

Jake [00:12:45]: That's where I think a lot of agent stuff is. You get to a point where you're running thousands of agents in parallel. What is the inference cost? What is the compute cost? How do you make that efficient? How do you coordinate all this? We have issues coordinating humans; we don't even have good tooling for that. Now we have to figure out how to get agents to coordinate, safely version changes, and know when to raise their hand for someone to intervene. Otherwise it becomes an interrupt factory.

Railway's Infrastructure Thesis: Network, Compute, Storage, and Metal

Swyx [00:13:19]: Let's go right into the technical side. What are the core infrastructure or architectural beliefs of Railway that allow you to do what you do?

Jake [00:13:29]: The primitives matter a lot for us. We need network, compute, storage, and orchestration around it. 
You need control over a lot of those things. We've talked a lot about how we don't really use Kubernetes because we want higher-order control to place workloads in very specific places.

Jake [00:13:48]: The reason is that you have to be very efficient with agents: memory reuse and all these other things, or you're going to massively blow up your cost structure. Being able to rack and stack your own servers and build your own metal unlocks performance and cost. Experiences where you're running 1,000 agents in parallel are not massively cost prohibitive.

Jake [00:14:13]: Token use and compute use are blowing up. Over time, those things have to get a lot more efficient. You can get a lot of margin to make those experiences solid by building your own metal. That's all in service of offering a differentiated experience to as many people as humanly possible.

Swyx [00:14:51]: You have a data center in Singapore.

Jake [00:14:53]: Yeah. We have two in every other region now. In Singapore, we're adding a second one in Q3.

Swyx [00:14:58]: What's it like? I've never built a data center. Do you go to Equinix and say, “I want some slots?”

Jake [00:15:05]: Yeah. Equinix. You basically go and say, “I want power and I want a cage.” They say, “Great, here's what it's going to be.” You rent the cage for a period of time, fill it with racks and servers, and hook up internet to it. That's all the pieces.

Swyx [00:15:36]: Then you handle everything else.

Jake [00:15:37]: You handle everything else.

Swyx [00:15:39]: What's the math versus clouds doing it for you?

Jake [00:15:43]: If we rented in the cloud, our payback period when we go to metal is about three months.

Swyx [00:15:50]: Which is crazy.

Jake [00:15:51]: It's nuts. That's four years of depreciated hardware. You're going to see a lot of this compute crunch because hyperscalers are buying up a lot of stuff. 
We're working directly with OEMs, resellers, and people building these machines: Supermicro, Dell, and others.

Jake [00:16:11]: Upstream, there's a bunch of supply pressure. When we raised our last round, between deploying capital for servers and now, the amount of money we've raised is less than the amount of money we have in the bank plus the value of the servers because the servers have appreciated as RAM has gone up. It's nuts how valuable hardware has become.

Jake [00:16:50]: If you look at hyperscalers, they deployed around $80 billion of capital expenditures this year, and next year will be more. That's a massive infrastructure build-out. You look at that and think it's crazy that they're spending way more than the Manhattan Project. But if every person is going to run dozens or hundreds of agents in parallel, you have no conceptual idea how much compute is required to make that experience happen, even if you're deeply efficient and sharing resources. And that doesn't even count inference.

Swyx [00:17:22]: How do you plan the build-out? The growth chart is so vertical. Are you usually at 100% utilization as soon as racks are live? How far ahead are you planning?

Jake [00:17:33]: We still maintain cloud presence for bursting. We work with AWS, GCP, and a few other clouds. We can rent, and then the moment we get space or power, we compact those workloads off the cloud. We started on the clouds, then built a system to migrate to our own metal. There's nothing that says you can't continually do that again, and that's exactly what we do. We never want to be compute constrained.

Jake [00:18:09]: At the start of the year, we actually became compute constrained because one upstream provider wasn't able to give us quota at the rate we needed, and the hardware was slower. I spent a weekend rebuilding our entire network overlay so we could straddle five clouds: Oracle, AWS, ourselves, GCP, and one other one. 
We can do more than that now.

Jake [00:18:38]: We got into a spot where we were trying to pack instances tight because we couldn't get enough compute. That led to a few reliability issues, which are now past us. I made a tweet pointing out that it's becoming harder and harder to acquire compute at the rate these models need to acquire compute. We got bit by it.

Swyx [00:19:15]: How do you think about pricing knowing you might not have your own metal available at all times? Are you pricing assuming you need extra margin if you end up going into the cloud?

Jake [00:19:26]: Because we've built out our metal data centers, our margins on metal are around 70%. We can deeply subsidize the cloud business if we want to scale at a reasonable rate. We have a few levers: metal, which makes the margins; cloud burst; debt to buy servers; and venture capital. It's an interesting operational problem: how much cash do we have, how much should we raise, how quickly can we deploy it, and can we scale revenue as quickly as we scale compute?

Jake [00:20:05]: If we continue making it trivially easy for people to build and deploy, then the faster we close that loop and the more operationally excellent we are with capital, the faster the business can scale. It's almost a straight linear deployment rate.

Financing Infrastructure: Hardware Debt, VC, and Operational Leverage

Swyx [00:20:20]: I think infra startups raising debt is a tool people don't utilize enough or know enough about. What can you tell us about that? Is it secured against your CPUs?

Jake [00:20:32]: It's secured against our hardware.

Swyx [00:20:37]: What rates do you get? Who are the lenders?

Jake [00:20:39]: We pay prime plus a spread, and we can refinance any of the debt as rates go down. The terms are pretty good. The unfortunate thing is that Twitter has no nuance, so people say, “Venture debt bad.” But as with all things, there are specific tools and areas where you can be deliberate instead of using one tool as a hammer. 
Venture capital is not the hammer for everything. You have to explore and figure out what works.

Swyx [00:21:12]: VC is usually the most expensive financing you can get.

Jake [00:21:15]: Yeah. I also think people think about VC incorrectly from a capital-raising perspective. Most people think, “How do I raise as much money as possible from whoever is probably the best I can get at that time?” That's close to right, but what we've tried to do is figure out what unfair advantage we can buy with that equity.

Jake [00:21:34]: It's the most expensive equity you're going to give away at that point in time, assuming the company keeps getting better. How do you use it to work with someone stellar who complements you? In the seed stage, I had never started a company. Ray Tonsing had good advice, and I could text him all the time. He was really fast. Awesome.

Jake [00:22:01]: Then with John and Erica at Unusual, they said, “You roughly know what you're doing building a product. We'll mostly leave you alone and be available for advice.” Amazing. Then we got to Series A and the business was an operational tire fire because we didn't know how to scale a business. Work with Erica, and Jordan is over at Redpoint, so bonus.

Jake [00:22:28]: Now we've raised from TQ and FPV as we're moving into enterprises. Every step of the way, we've asked: who can we partner with at this specific time to unlock the next section of the journey? I don't know enterprise sales. As an engineer, I can eyeball what features we might need, and we have wonderful people internally who can help. But you want boardroom dynamics where everyone is aligned and asking, “How do we win this?” instead of bickering about strategy.

Data Centers in Space and the Physics of Compute

Swyx [00:23:31]: You had a tweet about data centers in space. Why no data centers in space?

Jake [00:23:37]: It's not “no data centers in space.” My hot take is that I think it is solvable.
I've just never seen anybody solve it.

Swyx [00:23:49]: You said, “How are you going to dissipate that much heat in a vacuum?” You're making a physics claim.

Jake [00:23:55]: I haven't seen anybody prove how you're going to dissipate that much heat in a vacuum. It doesn't mean it's not possible. It just means nobody has brought it up yet.

Swyx [00:24:05]: Astrophage.

Jake [00:24:06]: I don't know what that is.

Swyx [00:24:07]: The Martian thing. Okay, you're very logical.

Jake [00:24:09]: It could work. A lot of people are putting the cart before the horse. They say, “We're going to put data centers in space.” Okay, but how? “We have time to figure it out.” It's like in The Martian where they ask how they're going to intercept something and say, “We'll figure it out.”

Swyx [00:24:36]: Making a bet on human invention is weird because you blind trust that it can be solved. But with physics, there are first-principles bounds you can put on it. Maybe not. Maybe you're asking to travel time or break a fundamental thermodynamic law.

Jake [00:24:57]: I don't know how VCs do this either. How do you know what's not possible and a grift versus what's possible but sounds completely insane? “We're going to put data centers in space.” Coin flip as to which it is, and I guess you'll know in 10 years. That's one cycle.

What Agents Need: Versioning, Observability, and 1,000x Scale

Swyx [00:25:23]: Moving back to agents. The branching, fast spin-up, and orchestration you do feels like pre-work that happened to be exactly what agents want. What do agents want differently than humans?

Jake [00:25:37]: They want the ability to version things. It's not that different; it materializes slightly differently. Agents want a way to test changes incrementally. Engineers have feature flags. Is there a reason agents can't use feature flags? I don't think so.

Jake [00:25:54]: They want version control. Can we use Git or not Git? That one is up in the air.
I think something outside Git will emerge for how we version these things over time. They need observability. You need to query what happened, when it happened, which steps failed, traces, logs, metrics, and all the rest. They need network, compute, and storage. They need to write files, save files, iterate on files, and snapshot file systems.

Jake [00:26:25]: A lot of what humans needed is in line with what agents need. Branching and forking are not different; we're just moving 1,000 times quicker. It can look like you need something massively different, but what you need is something massively better than what existed. You need orchestration massively better than Kubernetes. You need networking probably better than Envoy. It goes all the way down the stack.

Jake [00:26:55]: If the workload profile doesn't change so much as it gets massively compressed because you need thousands of these things, what assumptions change? etcd is going to melt. You need to replace it with something. You can go all the way down the stack and say, “That part has to change, that part has to change, and that part has to change.”

Jake [00:27:19]: The interesting thing about the super-exponential curve is that you have to build systems where you can rip out those parts at any time because a new bottleneck might emerge. You get good at parallel agents, and a different part of the system breaks. So it's similar to what humans needed, but at 1,000x scale.

Jake [00:27:55]: How do you do code review in the age of agents?

Swyx [00:28:00]: You throw more agents at it.

Jake [00:28:01]: You don't. But then who reviews for CVEs and all these other things?

Swyx [00:28:07]: More agents.

Jake [00:28:08]: And that's how we hit the inference wall. You can continually throw agents at the problem, but I think there's a limit to the number of agents you can throw at a problem.

CLI, Agent Handles, and Closing the Loop

Swyx [00:28:24]: You already had a CLI before it was cool.
How is the shape of what you're exposing changing, if at all?

Jake [00:28:28]: CLIs have always been cool. The CLI changes because we think about how to give Claude, Codex, ChatGPT, or any model a handhold.

Jake [00:28:50]: A CLI is a single command: deploy, get logs, and so on. Things that were prohibitively annoying to humans are not annoying to agents. They're nice. If I handed you a CLI with 40 arguments and 600 flags, you'd think, “I'm never going to use all of this.” But if you hand it to an agent, it says, “This is excellent. I have so many handles to work with.”

Jake [00:29:24]: If you're going to expose things to agents that way, you want as many handles as possible where they can get information, query dynamic information, and close the loop quickly. Most problems right now are about how to close the loop as quickly as possible. Where does the agent get stuck, and how can you remove that?

Jake [00:29:49]: Telemetry is important. If you can tell where the agent gets stuck from the CLI and say, “12% of people deviate from the happy path because of this, and now I add this argument and drive it down to 2%,” you massively increase the rate of loop closure.

Jake [00:30:03]: That's how we think about not just the CLI, but every point in the dashboard. It's a user journey: I hear about Railway. I get something deployed. I get my first green build or aha moment. I see an endpoint, logs, whatever. Then I iterate. The iteration loop is indefinite. The user wants to deploy a new thing, a Postgres instance, change code, and keep iterating.

Jake [00:30:36]: If you focus on the iteration loops and what's blocking them from closing quickly, one thing we say internally is: you never want to be waiting on compute anymore. You always want to be waiting on intelligence.
If you're waiting on compute, there's a bottleneck that needs to be destroyed because eventually that bottleneck becomes so large that another workflow emerges to change it.

Jake [00:31:04]: We've built a product where you push code, build it, and so on. But I fundamentally believe the push-pull loop is going away. We'll get to a point where you make a small change in production, that change is versioned across your infrastructure, you're working alongside copy-on-write versions of your database and infrastructure, and then you merge it in and it's instantaneously live. That's the holy grail of loops. The push-pull-rebuild thing is a point of friction that we're removing entirely.

Canvas as Output: Dashboards, Context Anchors, and Hyperstructures

Swyx [00:31:43]: It's incredibly fast. If anyone hasn't tried it, that fast feedback is great. My hot take is that Railway was famous for its canvas, which visualizes your infrastructure and lets you manipulate it visually. But that was for humans. For the next phase of growth, Railway CLI is more important than canvas.

Jake [00:32:05]: The canvas is funny because it's a mechanism to show changes over time. You're right that previously we used it a lot as an input. Moving forward, its goal is more like an output. You would go to the canvas, make changes, see them, and watch your infrastructure evolve. Now agents have access to the CLI and can make those changes. So the canvas becomes an output: what information does the human need at this moment to make suitable decisions about control requests? Do I approve this or not?

Jake [00:32:57]: It also has to be an anchor for your context, a port in the storm. Think of it like layers in a file system. You start with a project, then drill down into services, then into a function or code, because you want to represent the entire thing not just in your head, but in the canvas.
Other people can share that representation, think on the same wavelength, and move quickly.

Jake [00:33:33]: A lot of organizations get in trouble as they scale because all the context lives in someone's head. “How does this microservice work?” “I have no idea; go ask this person.” Then you have whole categories of products built around context discovery. A lot of that melts away if you have a solid hierarchy and can infinitely nest services, code, context, and everything else all the way down. That's what lets you build these structures over time.

Jake [00:34:18]: It's also what lets us build what I've called hyperstructures: things that are way bigger. You look at the Golden Gate Bridge and ask, “How did we build that?” There's a meme that we lost the technology. To some extent, yes, because the coordination that built those things evolved and changed. We lost some of the art of building structure as we jammed everything into Slack.

Swyx [00:34:52]: But you jam everything in Discord.

Jake [00:34:53]: Same point. It doesn't matter. It's message passing and interrupts, message passing and interrupts.

Swyx [00:35:00]: So you're arguing there should be something better and more structured than Slack?

Jake [00:35:04]: Yeah. For sure. I think Slack is awful, and Discord is awful too.

Central Station: Context Routing, Support, and Incident Clusters

Swyx [00:35:09]: This is the equivalent of my mom test. What have you done that has your solution to this?

Jake [00:35:15]: Internally, we've built a tool called Central Station that aggregates all the context from our users. Every piece of feedback, every customer support item, everything gets aggregated into clusters. If an incident is brewing, we can determine how many users are affected and break off a discussion based on that.

Jake [00:35:40]: That is more helpful than long-running channels where you're trying to decide which channel to put something in.
If you can dynamically aggregate information and dynamically route it to the right person based on context, it works better. We know internally that these four people are close to networking. If we see a networking thing, we can drill it down to those four people. If it's with this part, we can look at the commits. This is no longer a manual process internally.

Jake [00:36:13]: If you go to station or help.railway.com, that's why we built it. We wanted to scale with a massive amount of leverage by aggregating feedback.

Swyx [00:36:27]: This is built in-house?

Jake [00:36:28]: Yep.

Swyx [00:36:29]: I remember helping out on this one with Angelo in 2023. You scale a lot with a very small team.

Jake [00:36:38]: Yeah. We're about 10 times bigger now.

Swyx [00:36:40]: You have your full developer code here? Very cool.

Jake [00:36:44]: If you go to railway.com/stats, we expose this as a pub-sub-able thing. It's all real-time metrics. There's a way to get it as JSON somewhere if you care.

Jake [00:37:01]: We're big on trying to build everything in public and talk about what we're working on. We've had issues in the past, and we'll say, “Here's how we're fixing these things.” We've gotten compliments and flak for incident reports. We're always trying to make them better and talk with people.

Incidents, Disclosure, and Progressive Rollouts

Swyx [00:37:20]: You had a big one recently. I liked that it was scoped to 3,000. You presumably used Central Station. Talk through what happened and how you address it internally as a team.

Jake [00:37:38]: Internally, this one really sucked. It had to do with an upstream provider that didn't do the behavior it said it documented, which is unfortunate given they wrote the RFC for how the behavior should work. We rolled those things out, and Central Station caught it initially when a couple users said caches weren't invalidating.
We turned it off immediately.

Jake [00:38:03]: When you roll out to a large user base of three million people, you get a lot of disparate behaviors. We tested in staging and had tests, but we hit an edge case. We've hardened those systems, and now we can make that better. But it was a tough one.

Swyx [00:38:39]: I always wonder how private disclosure is supposed to work if people find an issue. Are they supposed to contact you first? When you run a platform, these things will happen. What channels should people pursue to quietly resolve it before it becomes a bigger incident?

Jake [00:38:59]: There's responsible disclosure. We err on the side of over-disclosing and letting you know something is wrong versus having your provider gaslight you. We've erred on sharing those things more publicly, even if they impact a small subset of users. That's a decision we've made internally. We have four values. One is honor. The honorable thing is to notify people to the widest degree at which they may have been affected or there was an issue, and then confront it head-on: why did it happen, what can we do better?

Swyx [00:39:45]: Not the whole user base. That's because of incremental rollouts and other things?

Jake [00:39:50]: Yeah. Progressive rollouts.

Swyx [00:39:54]: That should be the norm at all large platforms.

Jake [00:39:58]: It should. A variety of companies do this. There's the quote that Meta runs 10,000 different versions of Meta. To our earlier point about agents, they need the same thing. They need shadow traffic and all these other things. We've built so much ceremony around production being sacred that we need to make it trivially easy to test different behaviors in a safe environment. Then you can make mistakes in a safe environment.

Safe AI SRE: Customer Agents, Forked Environments, and Production Parity

Alessio [00:40:30]: Do you see a world where these things get automatically caught, not necessarily by your agent, but by your customer's agent?
The cache invalidation issue seems easy to check if you know to look for it.

Jake [00:40:44]: It's hard because to determine it, we almost need to hook into your observability infrastructure. That's why we have the template loop on the platform: so you can roll things out progressively. You can roll out to Johnny Vibe Coder initially, or push a shard that someone consumes at their own leisure. Or you can roll it out over weeks: 0.1% of people, 1% of people, early adopters, then all the way up. That's the non-deterministic version control we talked about earlier.

Jake [00:41:30]: I believe that's where most things should go, because most companies end up building staged rollout systems in-house. It's the same thing built again and again at every company. There's a massive opportunity to consolidate developer debt.

Alessio [00:41:45]: You should have a free tier. Model providers give free tokens if you let them use the data. You could give free compute if someone is the number-one shard that goes out and lets you plug into their observability.

Jake [00:41:55]: We do that. That's why we talked about the impact on 3,000 people. We start with lower-impact people. Larger companies on the platform are last to receive those rollouts so they have a version of the platform that's deeply stable.

Alessio [00:42:16]: I have three services, so I'm sure I get the first rollout. You can nuke my thing at any time. There are all these SRE agent companies. Observability people also want agents that fix upstream problems. You have your own agent in the canvas now. How do you see that playing out?

Jake [00:42:39]: It's the stacking entropy problem. If you don't have primitives to make iteration in production safe, it becomes difficult. If you're an observability provider saying, “Here's the fix to this error,” assume 80% are good and make sense.
But in the last 20% long tail of complex issues, if you let somebody stamp it, you create an opportunity for an incident.

Jake [00:43:08]: That's why forked environments are important. People have staging, but it always drifts from production. You need primitives, workflows, and experience built first-party on the platform so you can fork any service at any point in time.

Jake [00:43:33]: I think of the canvas as a sheet of transparency paper. The agent is a little guy you push up into the canvas. It should say, “I need to copy that service and that service so I can test these two things.” It gets a read-only copy of production. Anything that's PII gets marked as a transform when we clone the database, create a copy-on-write version, or read from it. Then the agent makes changes and asks, “Does this actually work?” as close to production as possible.

Jake [00:44:22]: That's how close you have to be, or you get massive drift. The system becomes unstable. You see this with massive systems built on Docker for local, Kubernetes for production, and a specific thing for something else. That complexity slows developers and becomes unstable at scale, making it hard to iterate. We want to compress that way down and say, “As close to prod as possible is where we want to be.”

From AI SRE Skeptic to Agent Believer

Swyx [00:45:00]: I was texting Erica for questions, and she says you were originally not a believer in AI SRE. Have you come around on it?

Jake [00:45:10]: I flipped, but I'm still not a believer in AI SRE if you don't have the primitives to make it safe. If you unleash AI SRE on production infrastructure without safe primitives for copying volumes and making sure things are fine, it's going to nuke your production database. It's not a matter of if, but when. I'm a big believer in making those loops safe.

Jake [00:45:33]: I was a deep AI skeptic until 2023.
In 2024, I thought, “Maybe I can roughly make this thing do it.” In 2025, I thought, “Now I can hold this.” Over winter break, everybody came back saying, “It's almost impossible to hold this.”

Swyx [00:46:01]: Did you see this on the Claude docs? Clawdbot? OpenClaw?

Jake [00:46:06]: It's gotten to a point where it's harder to hold it wrong than to hold it right. There's a scene in Avengers where Vision picks up Thor's hammer and says it's terribly well-balanced. It self-balances and works well. I'm a deep believer at this point that this will be the dominant species: assembly, C, C++, JavaScript, words.

Swyx [00:46:35]: It feels like a big jump.

Jake [00:46:37]: It is. But it's not like you abandon CPU-based discrete logic and move straight to fuzzy logic. You need both. Your skills should call code or applications or some static structure. You can use skills to distill what the procedure should be or how the code should act.

Jake [00:47:02]: I'm coming to a thesis: you need three points. You need a clear spec defining the system, the code, and the tests. When you say it out loud, if you've been in engineering long enough, you're like, “Of course. That's an RFC, tests, and code.” But they all matter. Having them together lets them reinforce each other: the spec and tests match, but the code doesn't, so reconcile it. Or the tests and code match but the spec doesn't, so reconcile that. That's the iteration loop.

Jake [00:47:41]: That's why you're seeing people talk about software factories, docs, and reconciliation. Some of that is architectural astronomy if you don't implement it, but that loop is where most things will end up.

Swyx [00:48:07]: For listeners, we've been talking about this on the pod for three years: the holy trinity of specs and tests. Itamar Friedman from Qodo is the reference if people want to look it up.

Self-Modifying Infrastructure and the End of Push-Pull-Rebuild

Swyx [00:48:18]: One thing I want to mention on the OpenClaw idea is self-modification.
I don't know how Railway would support it, but I have my OpenClaw, and I just tell it it has the Railway CLI and can do whatever. In theory, whatever capabilities or new infra it needs, it can call the Railway CLI, provision it, and add it to itself. The agent can modify its own infra.

Jake [00:48:45]: It's nuts. I have a loop set up where you put the Railway CLI on top of something that runs on Railway. You're authenticated as whatever the current box is, and you can make any changes to it. Then you call Railway deploy, and it deploys itself.

Jake [00:49:04]: It's like: “I need to spin up this instance of this environment. I already exist in this environment. Excellent, I have access to a Postgres instance now.” That's where we want to go with agentic, self-replicating infrastructure. That's your loop: iterate in production. You continue making changes. If it works, merge it upstream. If it doesn't, throw it away.

Jake [00:49:37]: How do you make throwaway copies trivial to spin up and super cheap? The era of “I have an AWS instance with four vCPU and 16 gigs of RAM” is going to get destroyed. If you do that for agents, you need a thousand of those machines. It's prohibitively expensive compared with what we've spent a ton of time figuring out: the atomic unit of deploy, whether you call it isolates, sandboxes, or something else. Only pay for what you use, spin up instantaneously, and close the loop as quickly as possible.

Jake [00:50:15]: If the system can self-replicate safely and say, “This is my environment, I'm making these changes,” it can come back with, “Does this look good? This is a new state of infrastructure given this prompt. I think I've solved it.” Then you go back and say, “Actually, it looks different.” It does the loop again. Then you say, “Cool. Apply.”

Swyx [00:50:38]: That's retroactively obvious, which is the most useful kind. Any other comments on agent deployment on Railway?

Jake [00:50:51]: It's getting better every day. I'm on X or Twitter.
You can always yell at me about the parts not working as well as they should, because plenty of things should work way better.

The New Serverless: Stateful, Long-Running, Pay-for-What-You-Use Linux

Swyx [00:51:04]: At this stage, when people want massively or embarrassingly parallel compute, they usually talk serverless. I feel like there's a new serverless compared to the previous five years of serverless. You're in that new bucket. Do you have comparisons or philosophical differences you want to call out?

Jake [00:51:31]: It's somewhere in between. It's the ability to run stateful, long-running workflows or executions.

Swyx [00:51:42]: Vercel has Fluid Compute, Cloudflare has some container thing, Google has App Runner and others.

Jake [00:51:55]: That's where everything is roughly going, and it's why we've been working on this for six years. We believe users need access to a computer: a box that speaks Linux. They need to deploy what they want. Other systems change the surface area of what you can build. For us, users need a computer and need to deploy anything they truly want. That's why we've focused on the primitives: network, compute, storage. If we give you those and expose them so you can run things indefinitely, that's where we believe it's going.

Jake [00:52:43]: Twitter has no nuance, so everyone says “servers” or “serverless.” It's always somewhere in the middle: I want to run it for a long time, but I don't want to provision the resource statically or pay for things I'm not using. That's been our thesis from day one: pay only for what you use, run it indefinitely, and it is full Linux.

Swyx [00:53:12]: That's why I like the naming of Fluid. It's fluid. Flexible.

Heroku, Focus, and Carrying the Torch Without Becoming the Past

Swyx [00:53:18]: Another milestone is the Heroku official deprecation. You're one of the presumptive new Herokus. “New Heroku” has been a category for as long as I've been in developer tooling. It's finally happening. What was that like?
Any behind-the-scenes of, “This is the moment”?

Jake [00:53:42]: You have people where you're like, “You were running stuff on here? You, as this company?” It's crazy that names you would know are running on it and now coming to us saying, “We want to move a lot of this off.”

Swyx [00:54:00]: Any behind-the-scenes on why Salesforce let Heroku stagnate?

Jake [00:54:05]: I can only guess. It's hard when it's not your business. Salesforce's business is to build a great CRM. That's their focus. Then you acquire a compute business as an offshoot. A lot of early Meta people talk about focus. Boz has a write-up about how in the early days of Meta they had no money, so they were forced to focus. Then they turned on the money tree and had no reason not to split their focus.

Jake [00:54:52]: But that dilutes your product. You get offshoots where you ask, “Is this the focus of the business?” If it's not core, it languishes. A lot of companies get in trouble when they split focus because they're fighting a multi-front war, not just externally but internally for alignment. Where are we going? What are we doing? What is our purpose?

Jake [00:55:24]: If you're Salesforce-built and mission-driven, you want to work on Salesforce. Heroku is off to the side. It's not core to the business. Getting resources, budget, focus, and alignment internally becomes hard. It was a matter of time.

Swyx [00:56:06]: Kudos for them to call it out instead of leaving it unknown.

Jake [00:56:12]: Their release was a little odd. They called it out, but they didn't say they were shutting it down. Behind the scenes, I think they issued messages to people saying they should close accounts and that they were going to deprecate and remove things over time.

Jake [00:56:30]: It's crazy because some of my first deployment experiences were on Heroku. You start with dragging things into an FTP server, then you try to get a deploy working, and then it's Heroku. It was the on-ramp for us. But the wheel turns.
New things emerge. We're happy to carry the torch for a lot of that. But we don't want to be the new Heroku. We want to be the way people build and deploy software, and ultimately the way people monetize software over time.

Swyx [00:57:19]: It's still a big crown to be the new Heroku. There are 50 companies that fought for that.

Jake [00:57:23]: Everybody is holding some portion of it. We're happy to support people and companies. The platform works differently. The game loop is similar, but we've been dogmatic about where these things are going: primitives, agents, fan-out. Some things fit; some workflows need to change. We have an approximation of Heroku pipelines with the environment system. It's exciting. We've got a ton of people we can support, and it's growing a lot.

Temporal, Workflow Engines, and State Machines

Swyx [00:58:12]: I have one more technical question about Temporal. I've sold my shares. You're a power user and one of our earliest customers. I met you through Temporal. You built on Temporal. You have complaints. This may be the most neutral and informed conversation anyone will hear about Temporal without someone working at the company.

Jake [00:58:39]: That's fair. I've used Temporal for almost 10 years because of Cadence at Uber.

Swyx [00:58:52]: Give people a sense of what Cadence was at Uber.

Jake [00:58:57]: Cadence was the precursor to Temporal. It powers trip actions, rides, when you rent a Jump bike or scooter or car. You're running workflows for a period of time and saying, “This ride will run indefinitely until it finishes.” You attach information: you paused in this zone, so add this charge to the bill. When you end the trip, the workflow is done. That experience was powered by Cadence at the time.

Swyx [00:59:34]: I used to say it's like programming the entire user journey top-down as one function.

Jake [00:59:39]: It's a powerful idea and important. It's also important for the next phase of the agentic journey.
You want an agent to do a specific task, be complete or incomplete on that task, and move on to the next thing. You need a way to manage workflows dynamically.

Jake [00:59:59]: Temporal was always great in theory, and great when you got it working the way you wanted in production. But it required you to model the entire journey in your head. If you didn't, you could cause issues where replaying the state of the workflow causes non-determinism.

Swyx [01:00:25]: Because it works on deterministic workflow history.

Jake [01:00:28]: Exactly. I describe it as a jet engine. If you know how to operate it and run it, it's great. But you can't hand it to people trying to build complicated things if they don't have the whole state in their head.

Jake [01:00:48]: We run our whole deployment pipeline on top of it. That's a reasonably complicated workflow: pre-commit hooks, signaling, queuing, and all the rest. We ran into the same thing at Uber. As you express a large workflow, it gets more complicated, with more states in the state machine that you have to map back to the workflow.

Swyx [01:01:15]: It's a lot of ifs.

Jake [01:01:16]: Exactly. At Uber, we built a system for doing the state machine and testing it. We've started to build some of those things here because it's grown heavily. It's not quite love-hate. When it works well, it works super well. But if someone who doesn't have full context puts something into the system that invalidates state or causes non-determinism, or spins off a ton of activities, you have to keep track of underlying SRE knobs like activity slots. Those should scale with memory, vCPU, and so on. It becomes a bear to scale.

Swyx [01:02:10]: You need a capable sysadmin running things behind the scenes. If you moved off, what would you do?

Jake [01:02:19]: We'd build our own workflow engine.
We have a few internally that we've worked on.

Swyx [01:02:27]: This is one of those classes of things you typically wouldn't vibe code, but I'm wondering if you can.

Jake [01:02:33]: I still don't think you should vibe code it. You still want to run decent tests to make sure it works.

Swyx [01:02:39]: Temporal didn't invent that from scratch either. There are libraries you can run. On top of that, it's just a state machine that you have to map out. Ultimately, you define the instructions you want and run them through a state machine.

Jake [01:03:00]: It's very doable. Workflow stuff is interesting. Restate is doing neat stuff here.

Swyx [01:03:10]: You're tied into JavaScript. Are you a JavaScript maxi?

Jake [01:03:13]: Internally, we have TypeScript, Rust, and Go. We don't add more languages. Actually, we have a little C because we write BPF code and hooks. But those are the languages.

Swyx [01:03:28]: Is this for sidecars?

Jake [01:03:32]: No. It's for the networking stack, volumes, and things like that. We use TypeScript a lot because it powers the dashboard, but we're moving a lot of workflow stuff off the dashboard stack and into the infrastructure stack.

Railpack, Nixpacks, and Content-Addressable Filesystems

Swyx [01:04:00]: Cool. Any other technical infrastructure stuff? Railpacks?

Jake [01:04:07]: We built an engine for determining dependencies based on source code. It's called Railpack. We built the first version, Nixpacks, on top of Nix, and then we moved.

Swyx [01:04:17]: People have been trying to get me to adopt Nix and NixOS for four years. Is it ever going to be a thing?

Jake [01:04:23]: I don't know. We're excited about it, but it has pain points. Think of it as a stack of versioned binaries at specific slices in time. If you want version X and version Y, you bloat the package space, which blows up image size and makes real-world workloads difficult.

Swyx [01:04:53]: But you content-address it and cache it.
In theory, there are optimizations.
Jake [01:05:00]: In theory, yes. But with a large enough user base and disparate enough machines, you run into a problem Meta described in the XFaaS paper, their internal serverless system. It becomes difficult at scale unless you break out specific runtimes.
Jake [01:05:24]: We didn't want to do that because we wanted to truly allow you to deploy anything. That was our initial thing with Nix. But we've moved toward interesting work around content-addressable file systems that can lazy-load anything from any point and page it into memory.
Swyx [01:05:48]: Amazing.
Jake [01:05:49]: The future is very bright. It's crazy, and it's going to be nuts.

Coding Agent Spend, Roadmaps, and Token ROI

Swyx [01:05:54]: Founder journey stuff?
Alessio [01:05:56]: Your cloud usage: you tweeted you're going to spend $300K this month?
Jake [01:06:01]: I think we got to $200K.
Alessio [01:06:02]: Coding agents?
Jake [01:06:03]: Yeah.
Swyx [01:06:04]: Across the company?
Alessio [01:06:05]: You only have 35 people, so I'm sure they're not all spending $10K a month. What's the distribution?
Jake [01:06:10]: I think I'm at about $25K. We have power users all the way down. We came back from winter break, and I basically said, “If you're writing code by hand, you're doing this wrong.” The tools are good enough now that you can move extremely quickly. There are issues and pain points, but you should be reviewing the code you are writing instead of writing it by hand.
Jake [01:06:40]: Architectural patterns matter more now than ever, but you shouldn't spend your time generating code you would write. If you know how to write it, ask the agent to write it and reconcile it until it looks like you would have written it yourself.
Jake [01:06:58]: People misconstrue my propensity to push people toward agents as connected to our growth and some reliability bumps. They're not necessarily related. 
The tools are good enough to move extremely quickly and build things way larger than you could before.
Jake [01:07:19]: To the earlier point about cooling data centers in space: I don't know. But with software, you can ask, “How would I build block storage from scratch? How would I do these things?” I have ideas because I have history and have read papers. Let me work them out and build massive test benches with thousands of tests, because those are now free to author. If you're not using AI systems to speed-run your roadmap and reconcile your existing system onto the future, you're missing a large point of what's happening.
Alessio [01:08:12]: What's the path to spending $3 million a month? Is it bound by ideas and things customers can absorb?
Jake [01:08:19]: For most companies, it's bound by deployment at this point. That's why we've seen a massive boom in users and companies, from Fortune 50s down, asking how to get developers to move faster. You'll probably hit your CFO before any technical limits because they'll look at the eye-watering amount of money spent on tokens. Inference costs have to come down, but we're inference constrained now. There will be price discovery around what makes sense for an org to adopt.
Jake [01:09:06]: I think you'll end up with the F1 driver concept. If someone is really adept at these things, it makes sense to put them in a $3 million car. If they're not, it probably doesn't make sense. You'll take a few people and say, “You can drive the F1 car. We need to go in this direction. Figure out if it works and prototype it.”
Jake [01:09:33]: We've done some of that and vastly accelerated our roadmap. We thought we'd ship something in a few years; now we can probably ship it in a few months because we validated it and don't have to build it incrementally. We can skip steps and move toward our vision.
Alessio [01:09:58]: A lot of people are realizing the roadmap doesn't always have a business impact, so they say tokens are too expensive. 
But if your roadmap were built to make more money by the time you built it, you'd have token pricing for it, the same way you do with sales. You'd spend a billion dollars on sales if you knew you would get $2 billion of revenue.
Jake [01:10:19]: Exactly. A naive way to measure this is the percentage of tokens that end up in production. If you can measure impact because those tokens end up in production, that's awesome. But the burden of proof will rise. Internally, we have a growing number of pull requests that haven't merged. The question becomes: how do you get this into production? It's about how quickly you can build and deploy software, which is exciting because that's our whole thing.

The SDLC Shift: Prompt Requests, Feature Flags, and Safe Rollouts

Swyx [01:10:56]: The SDLC is changing. One thesis is that the pull request is dying. It's going to be the prompt request. Beyond that, code review is also kind of dying if you have all the other systems in place. What else is changing about the SDLC?
Jake [01:11:19]: The AI SRE and the tools to make it happen. An AI SRE is pie-in-the-sky aspirational. What does it take to get an AI SRE? What tools do you need to build?
Swyx [01:11:32]: You should expose your tooling to customers at some point. The Central Station command center.
Jake [01:11:39]: We have it for template maintainers. Template maintainers can deploy and maintain templates, and they get feedback. We're going to expose those things incrementally.
Swyx [01:11:51]: Clustering around incidents. Everyone has a version of that, but I don't think anyone has solved it.
Jake [01:11:56]: I won't say we've solved it internally, but it's gotten so good that we can see incidents forming pretty quickly. At some point, those will be things either someone else builds or we build. We've always built things purpose-built for us. 
If it makes sense to make it useful for users, monetize it, or turn that loop into a profit center instead of a cost center, we want to do that.
Jake [01:12:28]: Pull request is definitely dying.
Swyx [01:12:29]: Do you do first-party feature flagging and incremental rollout stuff?
Jake [01:12:34]: We have a feature-flagging engine we built internally and will eventually roll out.
Swyx [01:12:38]: I don't see it as a user. How come you didn't give us what you have?
Jake [01:12:43]: We have to beta test it. We care a lot about the quality of the things. There's plenty we've used internally that doesn't make it all the way through the journey because it fails. It works for one service but not multiple services. We'd have to build it for multiple services and know that if we released it, we'd rebuild it again and again. Some things are worth that, but many inform the roadmap.
Jake [01:13:18]: We don't want to dilute the experience by saying, “This works, but only for this service,” unless it's a core initiative. Over the next few months, we'll roll out things that work for a single service, then multiple services, then multiple services across the environment. You have to be deliberate. Otherwise you create broken disparate experiences and support load because people ask how to use the feature.
Jake [01:13:52]: It's the earlier expansion and compaction pattern. You expand the company to get features, then compact and smooth them out so the experience is stellar. You told me in the hallway, “It's gotten so much better.” Internally we're saying, “This part really sucks. We need to make it significantly better.”
Swyx [01:14:11]: I can attest to that over the last three years watching you build Railway. For listeners, feature flagging is a huge part of Uber culture. So much so that they have too many feature flags and another thing to remove feature flags. Facebook has Gatekeeper. Agents are going to need this. It's fundamental to incremental rollouts. OpenAI acquired Statsig. 
GPT-5 is routing and flagging through different models.
Jake [01:14:56]: It's super important. If the software development lifecycle is going to change because we're doing things 1,000 times faster and 1,000 times more concurrently, what becomes important at scale?
Jake [01:15:16]: Before I started Railway, I built a feature-flagging product and tried to sell it. It was an easier version of LaunchDarkly. I ran into a problem: anyone small enough to adopt your technology doesn't care about feature flags, and anyone large enough to need feature flags needs so much scale that you have to build out all the infrastructure. I scrapped it.
Jake [01:15:42]: But what is old is new again. Companies are trying to move quickly, but you can't YOLO a vibe-coded thing straight into production. You need to say, “Here's my blast radius, my impact, and I want to shadow it for these users.” Feature flags. You're going to need the tools larger companies built to maintain their structures. Everything gets compressed by 1,000x so everybody can build those structures quickly.
Jake [01:16:07]: That's exactly where we are: compressing the software development lifecycle, then expanding it and adding more new things.

Cattle, Pets, and Clonable Infrastructure

Swyx [01:16:15]: Another term that comes to mind for newer developers is “cattle, not pets.” People treat production like a pet. It has a name. You baby it and keep it alive. With cattle, you can mass farm, roll out, portion parts out, and kill them.
Jake [01:16:37]: I think that might change. You can move toward having pets as long as you have a cloning machine for your pets.
Swyx [01:16:52]: Yeah.
Jake [01:16:52]: If you can snapshot every single thing at every frame, it doesn't matter if something gets obliterated because you have a snapshot of it. The things we've built right now are designed to block changes from the hermetically sealed DevOps line. You have to write a Dockerfile because you nee

Sharp Tech with Ben Thompson
(Preview) Inference in the Agentic Future, xAI Is Two Companies in One, Q&A on Elon's Lawsuit, Intel, Apple

Sharp Tech with Ben Thompson

Play Episode Listen Later May 15, 2026 25:57


Ben and Andrew discuss the future of computing and its implications for the chip market, including what Cerebras is doing that's different, why speed may no longer be a top priority for inference, good news for China's AI ecosystem, the future for Nvidia, and questions on Pat Gelsinger's role in Intel's revival. From there: Both sides of the Anthropic-xAI deal, including Anthropic's compute solution and the triumph of market principles, as well as the market's message to Elon Musk and xAI, and the implications for SpaceX. At the end: Thoughts on Musk's OpenAI lawsuit, a theory on Apple's gross margins and a land grab, and a listener's wife enters founder mode with Claude.

The Brave Marketer
Your AI Chats Aren't Private (And How "Unlinkable Inference" Can Help)

The Brave Marketer

Play Episode Listen Later May 13, 2026 30:07


Ken Liu (Computer Science PhD at the Stanford AI Lab) and Erik Chi (CS PhD at UMich) are the creators of the Open Anonymity Project, which lets people prove things about themselves online without revealing their identity. In this episode we explore what it means for AI systems to "know" you; why today's so-called privacy modes fall short; and how the next generation of AI systems could be built with privacy as a default, rather than an afterthought.

Key Takeaways:
• What "unlinkable inference" means and why it changes the privacy model of AI chat tools
• What actually happens to your data the moment you hit "send" in a typical AI system
• Why incognito mode in AI tools is largely a UI illusion, rather than a real privacy protection
• The role of metadata in identifying and profiling users, and how "secretary models" could enable personalization without sacrificing privacy
• How anti-censorship and privacy intersect in a future dominated by agentic AI systems
• Why now is the time to rethink assumptions about privacy in AI tools

Guest Bio:
Ken Liu is a Computer Science PhD student at the Stanford AI Lab, advised by Percy Liang and Sanmi Koyejo. His research focuses on foundation models and data/user privacy, and the intersection between the two. His recent work studies the privacy properties of AI (such as membership, memorization, and unlearning), and various AI privacy tools (such as anonymization, differential privacy, and federated learning). His papers have earned spotlights at top venues, and his findings have been deployed at scale on Android. Ken also led a team to a 1st-place win at the US-UK PETs Prize sponsored by the White House OSTP and the UK Government. Previously, Ken spent time at Google DeepMind, Carnegie Mellon University, Meta, Apple, and Amazon.
Erik Chi is a CS PhD at UMich, advised by J. Alex Halderman. His research focuses on security and privacy, particularly network security and anti-censorship. He worked on a new standard for implementing and distributing censorship circumvention protocols, a standard that's now being adopted by VPN vendors to help millions of users access the free Internet. He also did content moderation (surveillance) and recommendation systems at ByteDance before realizing how censors will evolve in the AI era.

About this Show:
The Brave Technologist is here to shed light on the opportunities and challenges of emerging tech. To make it digestible, less scary, and more approachable for all! Join us as we embark on a mission to demystify artificial intelligence, challenge the status quo, and empower everyday people to embrace the digital revolution. Whether you're a tech enthusiast, a curious mind, or an industry professional, this podcast invites you to join the conversation and explore the future of AI together. The Brave Technologist Podcast is hosted by Luke Mulks, VP Business Operations at Brave Software, makers of the privacy-respecting Brave browser and Search engine, and now powering AI everywhere with the Brave Search API.
Music by: Ari Dvorin
Produced by: Sam Laliberte

Sadler's Lectures
William Clifford, The Ethics Of Belief - The Limits Of Inference

Sadler's Lectures

Play Episode Listen Later May 11, 2026 14:00


William Clifford, The Ethics Of Belief - The Limits Of Inference by Lectures on classic and contemporary philosophical texts and thinkers by Gregory B. Sadler

SaaS Metrics School
What Belongs in AI COGS? The Financial Framework SaaS Companies Are Scrambling to Build

SaaS Metrics School

Play Episode Listen Later May 9, 2026 4:24


Are AI inference costs already eating into your gross margin, and you can't even see them on your P&L? In episode #370, Ben Murray breaks down exactly what belongs in AI COGS for SaaS companies offering an AI-first or AI-infused product line. Inference bills are stacking up fast, infrastructure-layer spend is the surprise line item nobody priced in, and most finance teams haven't built the GL account structure to capture any of it cleanly. If you don't get the framework in place now, you'll be reporting AI gross margin you can't actually defend by next quarter, and your board will notice.
• The 5 cost categories every AI COGS framework needs: inference, model hosting/GPU infrastructure, the AI infrastructure layer, monitoring and observability, and AI-specific support
• Why AI inference costs deserve their own GL account and shouldn't be buried inside your cloud hosting bill where they disappear
• The surprise cost line one industry report flagged as the #1 unexpected AI expense, hiding in data platform usage, networking, and egress
• How to structure your COGS cost centers so you can deliver clean margins by AI product line, not just lumped together at the company level
• Why token tracking by customer cohort (heavy / medium / light users) is now table stakes for any AI product sold as a subscription
• The deployed-engineer question: should AI support tickets sit with tech support or a specialized team, and how that decision rewires your margin model
Tune in to get the AI COGS framework in place before your gross margin lands on a board slide you can't defend.
Resources Mentioned
Ben's new AI course: https://www.thesaasacademy.com/ai-finance-metrics-saas
Ben's blog post: What Should Be Included in AI COGS: https://www.thesaascfo.com/what-should-be-included-in-ai-cogs/
SaaS Metrics Foundation course: https://www.thesaasacademy.com/the-saas-metrics-foundation

Learning Bayesian Statistics
#157 Amortized Inference & BayesFlow in Practice, with Stefan Radev

Learning Bayesian Statistics

Play Episode Listen Later May 6, 2026 78:43


Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work

Takeaways:
Q: What is simulation-based inference and what does "sim-to-real" mean?
A: Simulation-based inference (SBI) uses a mechanistic simulator as an epistemic tool: you train a neural network on a large number of labeled simulations and then deploy it on real, unlabeled data. The "sim-to-real" framing captures the key asymmetry -- your network never sees real data during training, only simulations, but it generalizes to real observations at inference time. This is the opposite of the more common "synthetic-for-ML" approach, where fake data is used purely to augment real training data.
Q: What is the amortized inference agent skill and what does it do?
A: It's an open-source AI agent skill, co-developed by Stefan and Alexandre, that teaches an AI coding agent to run a complete, state-of-the-art amortized inference workflow. Because amortized inference is recent enough that it's underrepresented in LLM training data, vanilla agents tend to get it wrong. The skill injects the right methodology: it guides the agent to set up the simulator, choose the right network architecture, run a pilot, train with appropriate diagnostics, and produce an actionable report -- without the user needing to know the details.
Q: What is calibration coverage and why should you never skip it?
A: Calibration coverage tells you whether your posterior uncertainty is honest -- whether your credible intervals actually contain the true parameter at the right frequency. A model can show poor parameter recovery yet still be well-calibrated (because it's falling back on the prior), or it can appear to recover parameters while being poorly calibrated. Running calibration diagnostics both in-sample and out-of-sample is especially revealing for hierarchical models, which often appear to underfit in-sample but generalize much better out-of-sample thanks to shrinkage.
Full takeaways here

Chapters:
00:00:00 How does amortized inference fit into the Bayesian workflow?
00:12:03 What does "sim-to-real" mean in simulation-based inference?
00:15:57 Why is amortized inference particularly suited to psychology and neuroscience?
00:21:51 What is the amortized inference agent skill?
00:39:00 What is calibration coverage and how do you interpret it?
00:41:50 How do you decide what to do next after your first training run?
00:44:53 How do actionable insights make Bayesian workflows more usable?
00:49:08 What are the unique challenges of hierarchical models in amortized inference?
01:00:51 What is the current state of BayesFlow's support for hierarchical models?
01:05:00 What are the main failure modes of amortized inference and how do you handle model misspecification?

Thank you to my Patrons for making this episode possible!
Links from the show
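
The calibration-coverage idea from the takeaways can be made concrete with a small sketch. This is only an illustration and not BayesFlow code: it assumes a toy conjugate Gaussian model (prior mu ~ N(0, 1), observations y ~ N(mu, 1)) whose exact posterior stands in for a trained amortized network. The function names and the `shrink` parameter are hypothetical, introduced here to mimic an overconfident posterior.

```python
import random
from statistics import NormalDist


def posterior(ys, prior_var=1.0, noise_var=1.0):
    """Exact posterior mean/variance of mu for y_i ~ N(mu, noise_var), mu ~ N(0, prior_var)."""
    n = len(ys)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (sum(ys) / noise_var)
    return post_mean, post_var


def coverage(level=0.9, n_sims=2000, n_obs=5, seed=0, shrink=1.0):
    """Fraction of simulations whose central credible interval contains the true mu.

    shrink < 1 artificially narrows the posterior variance (hypothetical knob)
    to imitate a poorly calibrated, overconfident approximation.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # half-width in posterior standard deviations
    hits = 0
    for _ in range(n_sims):
        mu = rng.gauss(0.0, 1.0)                          # draw the true parameter from the prior
        ys = [rng.gauss(mu, 1.0) for _ in range(n_obs)]   # simulate data given that parameter
        mean, var = posterior(ys)
        half = z * (var * shrink) ** 0.5
        hits += (mean - half) <= mu <= (mean + half)
    return hits / n_sims


if __name__ == "__main__":
    print("honest posterior:       ", coverage())
    print("overconfident posterior:", coverage(shrink=0.25))
```

With an honest posterior the empirical coverage should land near the nominal 0.90; with the artificially narrowed posterior it should fall well below, which is the kind of miscalibration the diagnostic is meant to expose.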

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups
Baseten CEO Tuhin Srivastava on the AI Inference Crunch, Custom Models, and Building the Inference Cloud

No Priors: Artificial Intelligence | Machine Learning | Technology | Startups

Play Episode Listen Later May 1, 2026 42:57


Baseten CEO and co-founder Tuhin Srivastava sits down with Sarah Guo and Elad Gil to discuss the rapid growth of AI inference demand, Baseten's 30x growth, and why inference is becoming the strategic “last market.” Tuhin Srivastava argues the application layer will persist because companies with unique user signals can encode value into workflows and post-train specialized models, citing examples like Abridge and support workflows. The conversation covers GPU capacity constraints, Baseten's multi-cloud fabric across 18 clouds and 90 clusters, long-term contracting dynamics, the importance of the software layer for stickiness, evolving workloads, multichip possibilities, and operational lessons at scale.
Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @Tuhinone
Chapters:
00:31 Baseten growth
01:55 Why the app layer wins
05:57 Serving frontier customers
07:55 Open source model mix
09:21 Chinese models and geopolitics
13:07 Custom inference dominates
14:22 Post training acquisition
17:10 When to invest in custom models
18:35 Supply crunch and data centers
22:25 Longer GPU Contracts
24:09 What Makes a Winner
26:07 Multi Chip Future
28:19 Runtime Roadmap
31:08 Scaling Edge Cases
33:48 Hiring and Leadership
36:44 Operations Pager Culture
38:19 Efficiency Drives Demand
40:41 Concierge Everything Future
42:34 Conclusion

2024
Inference Provider – Cyber Risk Management – Minors Online

2024

Play Episode Listen Later May 1, 2026


A new clash between the EU and Meta, this time over age verification on Facebook and Instagram. According to a preliminary investigation by the European Commission, Meta failed to adequately protect children under 13, allowing them to browse its social networks undisturbed. Innocenzo Genna, an expert on European digital regulation, tells us more. We also discuss other possible bans on social networks and websites in the EU.
Growing exposure to cyberattacks and stringent European cybersecurity regulation require companies to manage cyber risk proactively and preventively. In this context, it is important to find solutions that can also guarantee that highly sensitive data remains under EU jurisdiction. We discuss this with Marco Riccardi, CEO and co-founder of QuoIntelligence, a scale-up that has raised 7.3 million euros in capital.
For companies especially, it will be essential to develop AI solutions using private infrastructure rather than large cloud providers such as OpenAI, Anthropic, etc., both to optimize consumption and costs and for data confidentiality and sovereignty. We discuss this with Andrea Pili, CEO and co-founder of Xference, a startup that has created one of the first inference factories in Italy and has announced an agreement with Aruba.
And as always, in Digital News, the week's most important innovation and technology news.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
How to Engineer AI Inference Systems with Philip Kiely - #766

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Apr 30, 2026 54:51


In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-production can move in hours, not months, and why understanding “the knobs” of inference—batching, quantization, speculation, and KV cache reuse—lets teams design better products and SLAs. We trace the inference maturity journey from closed APIs to dedicated deployments and in-house platforms, discuss GPU lifecycles, and survey today's runtime landscape, including vLLM, SGLang, and TensorRT LLM. Finally, we look ahead to agents and multimodality, making the case for specialized, workload-specific runtimes when performance and efficiency matter most. The complete show notes for this episode can be found at https://twimlai.com/go/766.

Interpreting India
An African Perspective for Building AI for Global South | AI Summit Special

Interpreting India

Play Episode Listen Later Apr 30, 2026 48:07


This episode is part of our special series on the India AI Impact Summit, examining the conversations, decisions, and debates that are shaping global AI governance. Raymond draws a distinction early in the conversation that shapes everything that follows: training and inference are not the same thing, and conflating them is leading a lot of countries to make expensive mistakes. Training, he says, is like building the engine. Inference is running the transport system every single day. Most countries do not need to build the engine. What they need is airports, roads, and reliable infrastructure that gets the technology into the hands of people. The global assumption that frontier model training is the only legitimate AI pathway is, in his view, one of the more consequential misreads of the moment. On the ground realities of building in Africa, Raymond is specific about where the bottlenecks actually are. It is not ambition. It is power reliability, cost of connectivity, access to capital, and the kind of financing frameworks that have not yet caught up with what AI infrastructure actually requires. He points to genuinely interesting anomalies, such as Ethiopia's extremely low cost of power sitting alongside very limited terrestrial fiber diversity, as a reminder that building in the Global South is not about replicating Silicon Valley at a discount. It is about finding combinations of constraints that can actually be made to work, and optimising for reliability, cost efficiency, and practical impact rather than scale and prestige. His advice to governments is to start with problems, not hardware. Prestigious projects with no clear use case, over-regulation before a single GPU cluster exists, and attempts to rebuild sovereign versions of large compute clusters are all, in his view, things to ignore. 
What countries should actually invest in is reliable and clean power, public interest compute access, data governance frameworks, sector specific pilots in health, agriculture, and education, and talent development that works by getting the technology into the hands of people rather than running structured boot camps. For Raymond, the success metric for Africa in five years should not be the size of anyone's model. It should be whether AI has meaningfully improved economic productivity and public service delivery across the continent.

Episode Contributors
Nidhi Singh is an associate fellow at Carnegie India. Her current research interests include data governance, artificial intelligence and emerging technologies. Her work focuses on the implications of information technology law and policy from a Global Majority and Asian perspective.
Raymond Ononiwu is the founder of Horus Lab, a technology and infrastructure company building Africa's next-generation digital backbone through modular, renewable-powered, AI-ready data centers. An engineer with more than 15 years of experience delivering products across Mixed Reality, Windows Analytics, and Teams Copilot, his work has powered platforms relied on by hundreds of millions globally.

Every two weeks, Interpreting India brings you diverse voices from India and around the world to explore the critical questions shaping the nation's future. We delve into how technology, the economy, and foreign policy intertwine to influence India's relationship with the global stage.
As a Carnegie India production hosted by Carnegie scholars, Interpreting India provides insightful, cutting-edge perspectives by tackling the defining questions that chart India's course through the next decade.
Stay tuned for thought-provoking discussions, expert insights, and a deeper understanding of India's place in the world.
Don't forget to subscribe, share, and leave a review to join the conversation and be part of Interpreting India's journey.

The Engineering Leadership Podcast
Scaling TensorFlow, Navigating Startup Pivots, ML Edge Infrastructure and AI Inference Strategy w/ Rajat Monga #256

The Engineering Leadership Podcast

Play Episode Listen Later Apr 28, 2026 40:34


Rajat Monga, CVP AI Frameworks @ Microsoft, joins the podcast to discuss his leadership and founder journey, from Google Brain / Tensorflow to inference.io and back to Microsoft. He dissects what it means to refound vs. start from scratch, the value of the open source community, and strategies for discovering what problem to solve when going the startup route. We also cover how to determine your users' hidden incentives and what that means for both product development & marketing, along with navigating the balance between a product's usefulness and consumers' willingness to pay for it. Additionally, Rajat shares about what he's currently up to at Microsoft and the emerging ML / AI technologies he's most excited about.   ABOUT RAJAT MONGA Rajat Monga is responsible for enabling an efficient AI stack at Microsoft from cloud to the edge. Before joining Microsoft, Rajat was founder and CEO of Inference.io, a smart analytics platform powered by AI. During his decade-long tenure at Google, he co-founded and led TensorFlow, and was a founding member of Google Brain. He's built out and led many engineering teams, and designed large scale distributed systems including web scale crawling and eBay's search engine. Rajat is a graduate of IIT Delhi.   Unblocked: The context engine your coding agents are missing. Give your coding agents the context your best engineers have. Your agents can read code, but they don't know how your team works. Rules and MCPs give access to information but not understanding. That's why you still have to tell them where to look and what to look for. Unblocked gives your agents the history, conventions, and decisions behind your code so they generate mergeable output without the back and forth. It automatically surfaces the right context for every task, so agents stay on track without the set up tax or the correction loops. 
getunblocked.com/elc

SHOW NOTES:
Rajat's journey with Google Brain: Scaling deep learning from single PCs to thousands of machines with Jeff Dean & Andrew Ng (2:57)
Moving from Google Brain to TensorFlow: Why new hardware and architectures required a total system rebuild (6:02)
The "refounding" question: Choosing between starting from scratch or evolving an existing system (8:33)
Why Google open-sourced TensorFlow to set industry standards and avoid supporting external copies (10:16)
How open-source enabled global innovation, from Japanese cucumber sorting to African plant health (12:02)
Transitioning as a leader: Why Rajat left Google during the height of TensorFlow to found a company (13:57)
The discovery phase at inference.io: Navigating the pivot from IoT into solving data analytics gaps (15:31)
Lessons on PMF: Moving beyond a "useful" product to one that solves a truly critical customer pain point (16:52)
Why habits are harder to change than technology and the challenge of competing with established workflows (21:02)
Marketing strategies: Tailoring personas for top-down prestige versus bottom-up personal efficiency (23:19)
Deciding when to stop: A founder's framework for re-evaluating bets based on current knowledge (24:57)
Rajat's new role at Microsoft: Overseeing Edge infrastructure and large-scale Cloud AI inference (27:46)
Dissecting ML edge strategy: Using ONNX Runtime to unify AI performance across Windows, iOS, and Android (30:02)
Edge AI trends: Shifting from experimental models to production models optimized for cost and privacy (31:20)
The future of Edge: How on-device processing will power AI in robotics, smart glasses, and wearables (33:23)
Scaling inference: Treating multi-GPU clusters like a distributed operating system for AI models (34:25)
Rapid fire questions (37:45)

LINKS AND RESOURCES
Epic Disruptions: 11 Innovations That Shaped Our Modern World - Innovation expert Scott Anthony masterfully weaves together the fascinating stories behind history's most transformative disruptions, from ninth-century China to twenty-first-century Silicon Valley. Through eleven pivotal innovations, including the printing press, mass-produced automobiles, the McDonald's revolutionary food system, and the iPhone, Anthony reveals the hidden patterns behind world-changing breakthroughs.

This episode wouldn't have been possible without the help of our incredible production team:
Patrick Gallagher - Producer & Co-Host
Jerry Li - Co-Host
Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/
Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/
Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/
Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

WSJ Tech News Briefing
TNB Tech Minute: Google Introduces New AI Inference Chip

WSJ Tech News Briefing

Play Episode Listen Later Apr 22, 2026 2:40


Plus: Anthropic investigates report of "unauthorized access" to its Mythos AI model. And energy company GE Vernova lifts yearly outlook on surge from data center demand. Julie Chang hosts. Learn more about your ad choices. Visit megaphone.fm/adchoices

Cannabis School
San Fernando Valley OG / SFVOG

Cannabis School

Play Episode Listen Later Apr 21, 2026 25:32


San Fernando Valley OG is one of those strains that doesn't rush you… it settles you. Smooth inhale, calm head, relaxed body, then out of nowhere you realize you've melted into the chair a little more than planned. In this episode, we break down San Fernando Valley OG from Fire Star and how it actually shows up in real life, not just on paper. The flavor leans sour, earthy, a little cheesy on the back end, and surprisingly solid for older flower.
What's driving that experience:
Terpenes you'll usually see:
• Myrcene – that heavy, body-relax feel
• Caryophyllene – adds a little spice, helps take the edge off stress
• Limonene – keeps your head from going fully foggy
• Linalool – soft, calming, slightly floral layer
This combo is why it feels balanced at first… but if you keep going, it leans more into that relaxed, almost sleepy zone.
Cannabinoids (typical range):
• THC: ~18–24%
• CBD: usually very low, often

SaaS Metrics School
AI Inference Costs Are Crushing SaaS Gross Margins — Here's What to Do About It

SaaS Metrics School

Play Episode Listen Later Apr 21, 2026 5:59


Is your AI SaaS company skating on thin ice because of exploding compute costs you're not tracking? In episode #365, Ben Murray tackles one of the most pressing financial challenges facing AI-first SaaS companies: the structural margin compression caused by LLM inference costs. Traditional SaaS was built on near-zero marginal cost per customer; that era is over. If you're building on top of AI, every prompt, query, and agentic workflow is a hard COGS line that scales with revenue, and if you're not managing it, it will quietly destroy your unit economics.
• Why AI-first SaaS companies are running 50–60% gross margins (vs. 70–80% for legacy SaaS), and what Bessemer data shows about AI supernovas with margins as low as 25%
• How inference and compute costs differ fundamentally from traditional SaaS COGS, and why they won't scale down the way hosting costs did
• Why token costs vary wildly (from $1–2 per million to $30–180+ for frontier models) and how that variability makes feature-level economics a CFO priority
• 5 tactical ways to reduce LLM spend: model routing, prompt caching, context compaction, semantic caching, and batch processing
• How to set up your GL accounts and COGS tracking to allocate inference costs by feature, so you actually understand the economics of what you've built
Tune in before your next board meeting, because if you're not tracking AI inference costs at the feature level, you're flying blind on your most important unit economics.
Resources Mentioned
The SaaS CFO: https://www.thesaascfo.com/
Ray Rike (AI to ROI Newsletter): https://ai2roi.substack.com/
Tomas Tunguz: https://tomtunguz.com/
Fungies.io (5 Ways to Save on LLM Costs): https://fungies.io
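To make the feature-level economics in this episode concrete, here is a minimal Python sketch of how inference spend might be allocated by feature and folded into gross margin. Everything in it is an illustrative assumption: the feature names, token volumes, revenue, and non-AI COGS are invented, and the two token prices are simply picked from the $1–2 and $30–180+ per-million ranges quoted above. It is not the episode's own model.

```python
from dataclasses import dataclass


@dataclass
class FeatureUsage:
    """Monthly token usage for one product feature (all values hypothetical)."""
    name: str
    tokens_per_month: int
    price_per_million_usd: float  # assumed blended input/output token price


def inference_cost(feature: FeatureUsage) -> float:
    """Monthly inference cost in USD: tokens / 1M * price per million tokens."""
    return feature.tokens_per_month / 1_000_000 * feature.price_per_million_usd


def gross_margin(revenue: float, cogs: float) -> float:
    """Gross margin as a fraction of revenue."""
    return (revenue - cogs) / revenue


# Hypothetical inputs: a cheap high-volume feature and a pricey frontier-model one.
features = [
    FeatureUsage("chat_assistant", 2_000_000_000, 2.0),
    FeatureUsage("report_agent", 500_000_000, 30.0),
]
monthly_revenue = 100_000.0
other_cogs = 20_000.0  # hosting, support, and other non-AI COGS

cost_by_feature = {f.name: inference_cost(f) for f in features}
total_inference = sum(cost_by_feature.values())
margin_without_ai = gross_margin(monthly_revenue, other_cogs)
margin_with_ai = gross_margin(monthly_revenue, other_cogs + total_inference)

for name, cost in cost_by_feature.items():
    print(f"{name}: ${cost:,.0f} per month")
print(f"gross margin without inference: {margin_without_ai:.0%}")
print(f"gross margin with inference:    {margin_with_ai:.0%}")
```

In this made-up scenario the lower-volume feature dominates spend because of its per-token price, which is the kind of pattern per-feature tracking is meant to surface.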

a16z
Network Effects, AI Costs, and the Future of Consumer Investing with Anish Acharya on The Kevin Rose Show

a16z

Play Episode Listen Later Apr 19, 2026 58:45


This episode originally aired on The Kevin Rose Show. Kevin Rose speaks with Anish Acharya, general partner at a16z, about how AI is rewriting the rules of consumer software, the defensibility of network effects in a world where anyone can spin up an app in 48 hours, and why the real threat to consumer founders may be the cost of inference, not competition. They also discuss model pricing, the future of the four-day work week, and peptides.   Resources: Follow Anish on X: https://x.com/illscience Follow Kevin on X: https://x.com/kevinrose   Stay Updated: Find a16z on YouTube, X, and LinkedIn. Listen to the a16z Show on Spotify and Apple Podcasts. Follow our host: https://twitter.com/eriktorenberg Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Brave Dynamics: Authentic Leadership Reflections
Eugene Cheah: Open-Source AI and the Future of Work - E686

Brave Dynamics: Authentic Leadership Reflections

Play Episode Listen Later Apr 12, 2026 52:24


Is open source the true future of Artificial Intelligence? In this episode of the BRAVE Southeast Asia Tech Podcast, Jeremy Au sits down with Eugene Cheah, CEO and Co-Founder of Featherless AI. They dive deep into the architecture of the RWKV model, the intense global competition between open source and closed source AI, and how China is aggressively pushing an open source strategy to bypass chip constraints. Recorded with a focus on the Southeast Asian tech ecosystem, this episode breaks down the immediate impact of AI on the global south, specifically highlighting the vulnerability of the BPO and call center industries in the Philippines. Eugene also shares his extraordinary journey from building UIlicious to securing a $1M investment in San Francisco with no pitch deck, and his ongoing work with the Linux Foundation and the World Trade Organization to bridge the global AI language divide. Discover tactical insights into startup bootstrapping, macroeconomics, and the entrepreneurial mindset required to navigate the hyper-competitive deep tech space. Tune in to learn how to future-proof your business and stay ahead of the AI curve in Southeast Asia. 00:00 - Introduction & Featherless AI 02:59 - From UIlicious to AI Research 05:45 - RWKV & the Transformer Alternative 07:13 - Spinning Out Featherless as a New Company 09:10 - Fundraising in San Francisco 16:15 - Open Source vs. 
Closed Source AI 21:52 - China's Open Source AI Strategy 23:57 - Advantages & Disadvantages of Open Source 28:06 - Inference as a Service & Model Variety 32:13 - The Future of AI: Reliability & Specialization 36:35 - Personal Growth & Navigating AI Politics 39:01 - Policy Advice for Southeast Asia & Global AI Impact 43:39 - Multilingual AI & Closing the Global Divide 48:44 - Being Brave: Founding Story & Closing Reflections Watch, listen or read the full insight at https://www.bravesea.com/blog/eugene-cheah-featherless-ai-open-source-ai Get transcripts, startup resources & community discussions at https://www.bravesea.com WhatsApp: https://whatsapp.com/channel/0029VakR55X6BIElUEvkN02e TikTok: https://www.tiktok.com/@jeremyau Instagram: https://www.instagram.com/jeremyauz Twitter X : https://x.com/jeremyau LinkedIn: https://www.linkedin.com/company/bravesea English: Spotify | YouTube | Apple Podcasts Bahasa Indonesia: Spotify | YouTube | Apple Podcasts Chinese: Spotify | YouTube | Apple Podcasts #Singapore #China #Philippines #AI #ArtificalIntelligence #MachineLearning #Technology #TechNews #VentureCapital #Startup #Podcast #southeastasia #techpodcast

Amelia's Weekly Fish Fry
Solving the Memory Wall: A Deep Dive into AI Inference with Sandra Rivera

Amelia's Weekly Fish Fry

Play Episode Listen Later Apr 10, 2026 16:40 Transcription Available


This week, I'm excited to welcome Sandra Rivera from VSORA! We dive into a discussion on why AI inference is essential for deployment at scale, specifically focusing on how VSORA's patented software architecture addresses the "memory wall" by collapsing memory layers. We explore their recent tape-out, which promises approximately 3X the performance at half the power of leading GPUs. We also chat about deployment use cases, the need for low latency and high determinism, future plans for OEM modules and MLPerf benchmarking, and even get a brief look into Sandra's family llama farm.

Learning Bayesian Statistics
#155 Probabilistic Programming for the Real World, with Andreas Munk

Learning Bayesian Statistics

Play Episode Listen Later Apr 8, 2026 114:07


Support & Resources
→ Support the show on Patreon
→ Bayesian Modeling Course (first 2 lessons free)
Our theme music is « Good Bayesian », by Baba Brinkman (feat MC Lars and Mega Ran). Check out his awesome work.
Takeaways:
Q: Why is bridging deep learning and probabilistic programming so important?
A: Deep learning is extraordinarily good at fitting complex functions, but it throws away uncertainty. Probabilistic programming keeps uncertainty explicit throughout. Combining the two – as in inference compilation – lets you get the expressiveness of neural networks while still doing proper Bayesian inference.
Q: What is inference compilation and how does it relate to amortized inference?
A: Amortized inference is the general idea of training a model upfront so you don't have to run expensive inference from scratch every single time. Inference compilation is a specific form of amortized inference where a neural network is trained to propose good posterior samples for a given probabilistic program – essentially learning to do inference rather than computing it fresh each query.
Q: What is PyProb and what problems does it solve?
A: PyProb is a probabilistic programming library designed specifically to support amortized inference workflows. It lets you write probabilistic models in Python and then train inference networks on top of them, making methods like inference compilation practical for real-world simulators and scientific models.
Q: What are probabilistic surrogate networks and why do they matter?
A: A probabilistic surrogate network is a learned approximation of a complex, expensive simulator that preserves uncertainty. Instead of running a costly simulation thousands of times, you train a surrogate that can answer probabilistic queries much faster – crucial for applications like risk modeling where speed and uncertainty quantification both matter.
Chapters:
00:00:00 Introduction to Bayesian Inference and Its Barriers
00:03:51 Andreas Munk's Journey into Statistics
00:10:09 Bridging the Gap: Bayesian Inference in Real-World Applications
00:15:56 Deep Learning Meets Probabilistic Programming
00:22:05 Understanding Inference Compilation and Amortized Inference
00:28:14 Exploring PyProb: A Tool for Amortized Inference
00:33:55 Probabilistic Surrogate Networks and Their Applications
00:38:10 Building Surrogate Models for Probabilistic Programming
00:45:44 The Challenge of Bayesian Inference in Enterprises
00:52:57 Communicating Uncertainty to Stakeholders
01:01:09 Democratizing Bayesian Inference with Evara
01:06:27 Insurance Pricing and Latent Variables
01:16:41 Modeling Uncertainty in Predictions
01:20:29 Dynamic Inference and Decision-Making
01:23:17 Updating Models with Actual Data
01:26:11 The Future of Bayesian Sampling in Excel
01:31:54 Navigating Business Challenges and Growth
01:36:40 Exploring Language Models and Their Applications
01:38:35 The Quest for Better Inference Algorithms
01:41:01 Dinner with Great Minds: A Thought Experiment
Thank you to my Patrons for making this episode possible!
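The idea of amortized inference described in the takeaways (pay a training cost once, then answer each new query cheaply) can be illustrated with a toy sketch. This is not PyProb's API and not inference compilation proper, which trains a neural network to propose posterior samples. As a stand-in, the sketch assumes a one-parameter Gaussian model (prior theta ~ N(0, 1), observation x ~ N(theta, noise_sd²)) and fits a plain least-squares line on simulated pairs, so the learned function only approximates the posterior mean. All function names are hypothetical.

```python
import random


def simulate(rng: random.Random, noise_sd: float) -> tuple[float, float]:
    """Draw (theta, x) from the assumed model: theta ~ N(0, 1), x ~ N(theta, noise_sd^2)."""
    theta = rng.gauss(0.0, 1.0)
    x = rng.gauss(theta, noise_sd)
    return theta, x


def train_amortized_estimator(n_samples: int, noise_sd: float, seed: int = 0):
    """Fit theta ~ slope * x + intercept by least squares on simulated pairs.

    This is the up-front cost; afterwards each new observation is handled
    by a single multiply-add instead of a fresh inference run.
    """
    rng = random.Random(seed)
    pairs = [simulate(rng, noise_sd) for _ in range(n_samples)]
    mean_theta = sum(t for t, _ in pairs) / n_samples
    mean_x = sum(x for _, x in pairs) / n_samples
    cov = sum((x - mean_x) * (t - mean_theta) for t, x in pairs)
    var = sum((x - mean_x) ** 2 for _, x in pairs)
    slope = cov / var
    intercept = mean_theta - slope * mean_x
    return slope, intercept


NOISE_SD = 1.0
slope, intercept = train_amortized_estimator(20_000, NOISE_SD)


def amortized_posterior_mean(x: float) -> float:
    """Cheap per-query estimate from the trained linear map."""
    return slope * x + intercept


def exact_posterior_mean(x: float) -> float:
    """Closed-form posterior mean for this conjugate Gaussian model."""
    return x / (1.0 + NOISE_SD ** 2)


for x_obs in (-1.0, 0.5, 2.0):
    print(x_obs, amortized_posterior_mean(x_obs), exact_posterior_mean(x_obs))
```

The conjugate model is chosen only because its exact posterior mean is known, which makes the learned shortcut easy to check. For a real simulator no closed form exists, and that is where a trained inference network would earn its keep.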

Troubleshooting Agile
Disagree and Commit - Part II

Troubleshooting Agile

Play Episode Listen Later Apr 8, 2026 21:38


Are you starting from the bottom of the ladder? In the second episode of a two-part series on the practice of disagree and commit, Squirrel uses the ladder of inference to quiz Jeffrey on the reasoning behind his involvement in the sport of deer hunting, with the aim to demonstrate how two people with opposed beliefs can still find common ground on which they can move forward. LINKS: - Disagree and Commit: https://en.wikipedia.org/wiki/Disagree_and_commit - Test-Driven Development for People: https://itrevolution.com/articles/test-driven-development-for-people/ - The Ladder of Inference: https://douglassquirrel.com/LadderOfInferenceHandout.pdf -------------------------------------------------- You'll find free videos and practice material, plus our book Agile Conversations, at agileconversations.com And we'd love to hear any thoughts, ideas, or feedback you have about the show: email us at info@agileconversations.com -------------------------------------------------- About Your Hosts Douglas Squirrel and Jeffrey Fredrick joined forces at TIM Group in 2013, where they studied and practised the art of management through difficult conversations. Over a decade later, they remain united in their passion for growing profitable organisations through better communication. Squirrel is an advisor, author, keynote speaker, coach, and consultant, and he's helped over 300 companies of all sizes make huge, profitable improvements in their culture, skills, and processes. You can find out more about his work here: douglassquirrel.com/index.html Jeffrey is Vice President of Engineering at ION Analytics, Organiser at CITCON, the Continuous Integration and Testing Conference, and is an accomplished author and speaker. You can connect with him here: www.linkedin.com/in/jfredrick/

DevOps Paradox
DOP 344: KubeCon EU 2026 Review

DevOps Paradox

Play Episode Listen Later Apr 1, 2026 53:56


#344: Kubernetes is boring now. That's the whole point. KubeCon EU 2026 in Amsterdam -- likely the biggest KubeCon ever at more than 13,000 attendees -- made one thing extremely clear: the container orchestrator is done being interesting on its own. Every keynote, every new sandbox project, every vendor announcement pointed the same direction. AI. Inference. Agents. NVIDIA donated a DRA driver for GPUs to CNCF. Google open-sourced their cluster autoscaler and shipped a DRA driver for TPUs. Red Hat brought LLM-D for disaggregated inference. NVIDIA contributed the KAI Scheduler for AI workloads. The Gateway API now has an inference extension in beta -- model routing baked directly into the Kubernetes networking layer. And here's the thing Whitney pointed out that should make everyone pause: you can't even run inference workloads in containers. They can escape. You need micro VMs. So the container orchestrator is orchestrating things that aren't containers. The platform engineering conversation shifted too. The bottleneck isn't technology anymore -- it's culture. Getting teams to work together differently. And if your company can't trust its own employees to make decisions, good luck trusting agents. Viktor's take on the determinism objection was blunt: agents aren't deterministic, but neither are you. You just think you are. One thread that kept surfacing: agents as first-class platform users. Not agents doing agent things -- agents as the users your platform serves. Viktor sees it in real time -- pull requests created by agents, reviewed by his Claude, responses written by the submitter's agent. Humans aren't even in the conversation anymore. The new CNCF sandbox projects tell the story too. LLM-D, KAI Scheduler, Higress (AI-native gateway). And then Velero -- the Kubernetes backup tool that everyone assumed was already CNCF -- finally donated by Broadcom. Which raises a fair question: is CNCF becoming a dumping ground for projects companies don't want to maintain? 
Probably some of both. Viktor compared the current state to the first five years of Kubernetes -- everyone focused on low-level components, trying to figure out how to combine 57 different tools. The next wave will be higher-level platforms that bundle all of it. And somewhere underneath it all, the mainframe keeps running. Viktor's bet: it'll outlive AI.   YouTube channel: https://youtube.com/devopsparadox   Review the podcast on Apple Podcasts: https://www.devopsparadox.com/review-podcast/   Slack: https://www.devopsparadox.com/slack/   Connect with us at: https://www.devopsparadox.com/contact

Azeem Azhar's Exponential View
What NVIDIA's bet on OpenClaw means for the future of AI and your token budget

Azeem Azhar's Exponential View

Play Episode Listen Later Mar 25, 2026 36:35


Welcome to Exponential View, the show where I explore how exponential technologies such as AI are reshaping our future. I've been studying AI and exponential technologies at the frontier for over ten years. Each week, I share some of my analysis or speak with an expert guest to shed light on a particular topic. To keep up with the Exponential transition, subscribe to this channel or to my newsletter: https://www.exponentialview.co/
----
Last week Jensen Huang shared the numbers from NVIDIA's order book: AI compute demand has grown a millionfold in two years. Much GTC coverage focused on chips, robots, data centers in space, but I think Jensen revealed something far more important in his keynote: “the inference inflection has arrived,” and this is about to transform how all companies should manage their budgets. The inference era is already the operating assumption of the world's most valuable company.
In this week's podcast, I cover:
(1:20) NVIDIA's $1 trillion order book
(1:56) OpenClaw: our era's web browser
(7:54) Training vs Inference: how AI is changing
(12:50) Pre-fill vs. decode: the technical split
(18:06) The Harness: why OpenClaw changes everything
(18:59) The engine is useless without a car
(22:21) From 100M to 870M tokens per day
(24:29) Meet my agent R Mini Arnold's team
(26:16) AI focus group simulations at $10–50 a run
(29:36) Jensen's self-interest (and why he's still right)
(33:07) AI governance: token budgets don't belong with IT
(35:07) From training economy to inference economy
Read my essay "Magnitudes of Intelligence" on Substack: https://www.exponentialview.co/p/the-hundred-million-token-day
Access the solar supercycle model here: https://www.exponentialview.co/p/solar-supercycle
----
Where to find me:
Exponential View newsletter: https://www.exponentialview.co/
Website: https://www.azeemazhar.com/
LinkedIn: https://www.linkedin.com/in/azeem/
Twitter/X: https://x.com/azeem
Production by EPIIPLUS1. Production and research: Baba Films, Chantal Smith, Marija Gavrilov.

Not Investment Advice
260: Nvidia GTC, AI Inference Explained, Jensen new Steve Jobs & Super Micro's $2.5B Smuggling Scheme

Not Investment Advice

Play Episode Listen Later Mar 25, 2026 35:00


The NIA boys discuss Nvidia GTC, AI Inference Explained, Jensen new Steve Jobs & Super Micro's $2.5B Smuggling Scheme
Timestamps
(00:00:00) - Intro
(00:04:21) - Meme of the Week
(00:05:37) - Jensen new Steve Jobs
(00:09:29) - Nvidia GTC
(00:13:46) - AI Agents and OpenClaw
(00:22:11) - AI Inference Explained
(00:29:31) - Super Micro's $2.5B Smuggling Scheme
What Is Not Investment Advice?
Every week, Jack Butcher, Bilal Zaidi & Trung Phan discuss what they're finding on the edges of the internet + the latest in business, technology and memes.
Subscribe + listen on your fav podcast app:
Apple: https://pod.link/notadvicepod.apple
Spotify: https://pod.link/notadvicepod.spotify
Others: https://pod.link/notadvicepod
Listen into our group chat on Telegram: https://t.me/notinvestmentadvice
Let us know what you think on Twitter:
http://twitter.com/bzaidi
http://twitter.com/trungtphan
http://twitter.com/jackbutcher
http://twitter.com/niapodcast
Hosted on Acast. See acast.com/privacy for more information.

The Next Wave - Your Chief A.I. Officer
AI Tool Better Than OpenClaw? + NVIDIA'S $1T Prediction & AI Image Wars

The Next Wave - Your Chief A.I. Officer

Play Episode Listen Later Mar 25, 2026 43:56


Get Matt's AI playbook: https://clickhubspot.com/kfcr Episode 102: Is NVIDIA really the “sun” at the center of the AI universe? Host Matt Wolfe (https://x.com/mreflow) and Joe Fier (https://www.youtube.com/@joefier) break down everything you need to know from NVIDIA's recent GTC conference, the hottest new AI tools for business and marketing, and the changing landscape of AI data, agents, and robotics. This episode dives deep into the explosive potential of NVIDIA's AI roadmap, why Jensen Huang thinks chip sales will hit $1 trillion, and how accessible agent tools like OpenClaw and NemoClaw could change everything for everyday users and enterprises. Plus, Matt Wolfe and Joe Fier explore the rise of data-for-hire side hustles like DoorDash Tasks, where humans help train AI in the real world, and the jaw-dropping athletic skills of the newest generation of robotics. Whether you're wondering where the money and innovation are flowing next—or concerned about the privacy, data, and future job market in the age of AI—it's all here in this packed, must-hear “special” episode. Check out The Next Wave YouTube Channel if you want to see Matt and Nathan on screen: https://lnk.to/thenextwavepd — Show Notes: (00:00) NVIDIA's Future and Growth (06:31) OpenClaw: AI Accessible to All (08:56) AI Compute Shifting to Inference (12:39) Accelerating AI Thinking Time (15:20) Nemo Claw: AI Assistant Revolution (19:00) Stock Buybacks Boost Value (24:11) NVIDIA's Expansive Industry Influence (28:34) Future Economy: UBI or Data Payment? 
(29:38) Data Privacy and AI Advertising (32:45) CAPTCHA Origins and Duolingo (38:18) Robots: Cool, But Not Smart (40:18) Farewell and Thank You — Mentions: Joe Fier: https://www.youtube.com/@joefier NVIDIA: https://www.nvidia.com/en-us/ Jensen Huang: https://www.linkedin.com/in/jenhsunhuang/ OpenClaw: https://openclaw.ai/ NemoClaw: https://www.nvidia.com/en-us/ai/nemoclaw/ Future Tools: https://futuretools.io/ Get the guide to build your own Custom GPT: https://clickhubspot.com/tnw — Check Out Matt's Stuff: • Future Tools - https://futuretools.beehiiv.com/ • Blog - https://www.mattwolfe.com/ • YouTube- https://www.youtube.com/@mreflow — Check Out Nathan's Stuff: Newsletter: https://news.lore.com/ Blog - https://lore.com/ The Next Wave is a HubSpot Original Podcast // Brought to you by Hubspot Media // Production by Darren Clarke // Editing by Ezra Bakker Trupiano

Interviews: Tech and Business
Deloitte CTO: Advice to CIOs on Enterprise AI | CXOTalk #912

Interviews: Tech and Business

Play Episode Listen Later Mar 25, 2026 53:00


Bill Briggs, CTO of Deloitte, shares findings and advice for Chief Information Officers (CIOs) from the 2026 TechTrends report: 93% of enterprise AI spending goes to technology and tooling, while only 7% of funding goes to culture, change management, and learning. Briggs explains why this imbalance drives failed pilots and runaway costs, and what leaders should do about it.

The Cloudcast
Three Thoughts from NVIDIA GTC 2026

The Cloudcast

Play Episode Listen Later Mar 22, 2026 28:00


SUMMARY: We dig into the NVIDIA GTC keynote and highlight three things - accelerated computing for everything, the complexity of the new inference stack, and NVIDIA's “open” software stack including NemoClaw.
SHOW: 1012
SHOW TRANSCRIPT: The Reasoning Show #1012 Transcript
SHOW VIDEO: https://youtu.be/aXOr91q76yM
SHOW SPONSORS:
VENTION - Ready for expert developers who actually deliver? Visit ventionteams.com
SHOW NOTES:
NVIDIA GTC 2026 (Keynote)
NVIDIA NemoClaw - OpenClaw + OpenShell + NVIDIA Agent Toolkit
NVIDIA adds Groq LPU to their rack systems
NVIDIA to invest $26B in Open Weight Models
Interview with Jensen about Accelerated Computing (Stratechery)
Topic 1 - Jensen's trying to paint the bigger picture of accelerated computing everywhere (robotics, autonomous driving, gen-ai, physical ai - but also just everyday enterprise apps). Everything is about keeping the stock price up, and margins high. The stock price provides the war chest to fight off all foes.
Topic 2 - The inference architecture is a complex mix of GPUs, CPUs, ASICs/LPUs, high-speed networking and seems very different from the training architecture. How big is the burden on data center providers? What are the inference alternatives emerging?
Topic 3 - Jensen talked a lot about OpenClaw and eventually about NVIDIA's NemoClaw. How does his interest in Agentic AI tie into his interest in building NVIDIA's own frontier model?
FEEDBACK?
Email: show @ reasoning dot show
Bluesky: @reasoningshow.bsky.social
Twitter/X: @ReasoningShow
Instagram: @reasoningshow
TikTok: @reasoningshow

All-In with Chamath, Jason, Sacks & Friedberg
Jensen Huang LIVE: Nvidia's Future, Physical AI, Rise of the Agent, Inference Explosion, AI PR Crisis

All-In with Chamath, Jason, Sacks & Friedberg

Play Episode Listen Later Mar 19, 2026 66:06


(0:00) Jensen Huang joins the show! (0:26) Acquiring Groq and the inference explosion (8:53) Decision making at the world's most valuable company (10:47) Physical AI's $50T market, OpenClaw's future, the new operating system for modern AI computing (16:38) AI's PR crisis, refuting doomer narratives, Anthropic's comms mistakes (20:48) Revenue capacity, token allocation for employees, Karpathy's autoresearch, agentic future (30:50) Open source, global diffusion, Iran/Taiwan supply chain impact (39:45) Self-driving platform, facing competition from active customers, responding to growth slowdown predictions (47:32) Datacenters in space, AI healthcare, Robotics (56:10) OpenAI/Anthropic revenue potential, how to build an AI moat (59:04) Advice to young people on excelling in the AI era Follow the besties: https://x.com/chamath https://x.com/Jason https://x.com/DavidSacks https://x.com/friedberg Follow on X: https://x.com/theallinpod Follow on Instagram: https://www.instagram.com/theallinpod Follow on TikTok: https://www.tiktok.com/@theallinpod Follow on LinkedIn: https://www.linkedin.com/company/allinpod Intro Music Credit: https://rb.gy/tppkzl https://x.com/yung_spielburg Intro Video Credit: https://x.com/TheZachEffect

Sharp Tech with Ben Thompson
(Preview) OpenAI's Enterprise Pivot, The Rise of Agents and Bubble Counterpoints, Nvidia Changes Its Inference Story

Sharp Tech with Ben Thompson

Play Episode Listen Later Mar 19, 2026 32:50


Ben and Andrew begin with the news that OpenAI is shifting away from “side quests” and allocating resources to the enterprise space, including Dropbox history to explain OpenAI's present, lessons in the enterprise space generally (and what you learn in business school), and OpenAI taking cues from 1980s Microsoft. From there: Talking through Ben's article on Monday, including the implications of agents and questions about integration as durable differentiation for Anthropic and OpenAI. At the end: Nvidia's new messaging on inference chips and Groq integration, and a word about winters (and whiners) in Wisconsin.

WSJ Tech News Briefing
Inside Nvidia's Age of Inference

WSJ Tech News Briefing

Play Episode Listen Later Mar 17, 2026 13:24


Nvidia made its name making chips for training AI models, but a new kind of computing is the talk of the town at the tech powerhouse's annual conference. WSJ's Robbie Whelan explains how the world's biggest company is trying to pivot in the face of inference-mania. Plus, WSJ reporter Kate Clark on how software engineers are faring as (occasionally bossy) bot managers. Katie Deighton hosts. Sign up for the WSJ's free Technology newsletter. Learn more about your ad choices. Visit megaphone.fm/adchoices

WSJ Tech News Briefing
TNB Tech Minute: Amazon and Cerebras Announce Multiyear Inference Chips Partnership

WSJ Tech News Briefing

Play Episode Listen Later Mar 13, 2026 2:54


Plus: Uber is speeding up its rollout of robotaxi services. And EssilorLuxottica's dominance in eyewear could erode amid smart glasses boom. Katherine Sullivan hosts. Learn more about your ad choices. Visit megaphone.fm/adchoices

Dev Interrupted
Inference is the new 401k matching and what we're learning from AI-related outages

Dev Interrupted

Play Episode Listen Later Mar 13, 2026 21:49


Are we heading toward a bizarre future where your engineering salary is paid in AI compute tokens instead of cash? Andrew and Ben tackle the latest tech industry shakeups, starting with Meta's acquisition of Moltbook and the controversial idea of making inference limits a core employee benefit. They also break down Charlie Guo's harness engineering playbook, the growing pains behind recent AWS AI-driven outages, and the toxic pressure to constantly run dozens of autonomous agents. Finally, they wrap up by sharing their own agentic weekend projects and debating the catastrophic risks of vibe-coding your laptop's file permissions.
Follow the show:
Subscribe to our Substack
Follow us on LinkedIn
Subscribe to our YouTube Channel
Leave us a Review
Follow the hosts:
Follow Andrew
Follow Ben
Follow Dan
Follow today's stories:
Silicon Valley is buzzing about this new idea: AI compute as compensation
The Emerging "Harness Engineering" Playbook
Meta acquired Moltbook, the AI agent social network that went viral because of fake posts
“A spate of outages, including incidents tied to the use of AI coding tools”, right on schedule
Every minute you aren't running 69 agents, you are falling behind
OFFERS
Start Free Trial: Get started with LinearB's AI productivity platform for free.
Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era.
LEARN ABOUT LINEARB
AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production.
AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance.
AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil.
MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.

The Tech Blog Writer Podcast
d-Matrix - Ultra-low Latency Batched Inference for Gen AI

The Tech Blog Writer Podcast

Play Episode Listen Later Mar 7, 2026 26:28


What happens when the real bottleneck in artificial intelligence is no longer training models, but actually running them at scale? In this episode of Tech Talks Daily, I sit down with Satyam Srivastava from d-Matrix to explore a shift that is quietly reshaping the entire AI infrastructure landscape. While much of the early AI race focused on training ever larger models, the next phase of AI adoption is increasingly defined by inference. That is the moment when trained models are deployed and used to generate real-world results millions of times a day. Satyam brings a unique perspective shaped by years of experience in signal processing, machine learning, and hardware architecture, including time spent at NVIDIA and Intel working on graphics, media technologies, and AI systems. Now at d-Matrix, he is helping design next-generation computing architectures focused on one of the biggest challenges facing the AI industry today: efficiently running large language models without overwhelming data centers with unsustainable power and infrastructure demands. During our conversation, we explored why the industry underestimated the infrastructure implications of inference at scale. While training large models grabs headlines, the real operational pressure often comes later when those models must serve millions of queries in real time. That shift places enormous strain on memory bandwidth, energy consumption, and data movement inside modern data centers. Satyam explains how d-Matrix identified this challenge years before generative AI exploded into the mainstream. Instead of focusing on training hardware like many AI startups at the time, the company concentrated on inference efficiency. That decision is becoming increasingly relevant as organizations begin to realize that simply adding more GPUs to data centers is not a sustainable long-term strategy. 
We also discuss the growing power constraints surrounding AI infrastructure, and why efficiency-driven design may be the only realistic path forward. With electricity supply, cooling capacity, and semiconductor availability all becoming limiting factors, the industry is being forced to rethink how AI systems are architected. Custom silicon, purpose-built accelerators, and heterogeneous computing environments are now emerging as key pieces of the puzzle. The conversation also touches on the geopolitical and economic importance of AI semiconductor leadership, and why the relationship between frontier AI labs, infrastructure providers, and chip designers is becoming increasingly strategic. As governments and companies compete to maintain technological leadership, the question of who controls the hardware powering AI may prove just as important as the models themselves. Looking ahead, Satyam shares his perspective on how the role of engineers will evolve as AI infrastructure becomes more specialized and energy-aware. Foundational engineering skills remain essential, but the next generation of engineers will also need to think in terms of entire systems, combining software, hardware, and AI tools to build more efficient computing environments. As AI continues to move from research labs into everyday products and services, are organizations prepared for the infrastructure shift that comes with an inference-driven future? And could efficiency, rather than raw computing power, become the defining metric of the next phase of the AI race?