Podcasts about ETL

400PODCASTS
935EPISODES
45mAVG DURATION
5WEEKLY NEW EPISODES
Nov 5, 2025LATEST

POPULARITY

20172018201920202021202220232024

Best podcasts about ETL

ENTER THE LAB

127 episodes with ETL

Entrepreneurial Thought Leaders

37 episodes with ETL

Data Engineering Podcast

18 episodes with ETL

Les Grandes Gueules

24 episodes with ETL

Screaming in the Cloud

14 episodes with ETL

A Way with Words — language, linguistics, and callers from all over

7 episodes with ETL

The Data Stack Show

10 episodes with ETL

The Cloud Pod

8 episodes with ETL

Drill to Detail

11 episodes with ETL

Software Engineering Daily

5 episodes with ETL

AWS Podcast

6 episodes with ETL

Roaring Elephant

8 episodes with ETL

Engenharia de Dados [Cast]

5 episodes with ETL

MLOps.community

7 episodes with ETL

Raw Data By P3

5 episodes with ETL

SQL Data Partners Podcast

6 episodes with ETL

Voice of the DBA

6 episodes with ETL

Streaming Audio: a Confluent podcast about Apache Kafka

6 episodes with ETL

The Cloudcast

4 episodes with ETL

The Data Exchange with Ben Lorica

3 episodes with ETL

Software Defined Talk

4 episodes with ETL

Non-Technical

5 episodes with ETL

RunAs Radio

4 episodes with ETL

Modernize or Die ® Podcast - CFML News Edition

4 episodes with ETL

MarTech Podcast // Marketing + Technology = Business Growth

3 episodes with ETL

Giant Robots Smashing Into Other Giant Robots

3 episodes with ETL

Catalog & Cocktails

3 episodes with ETL

Podcast – Software Engineering Daily

5 episodes with ETL

Data Mesh Radio

4 episodes with ETL

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

3 episodes with ETL

Revenue Generator Podcast: Sales + Marketing + Product + Customer Success = Revenue Growth

3 episodes with ETL

Extra to Love: A Trisomy Podcast

3 episodes with ETL

Real-Time Analytics with Tim Berglund

3 episodes with ETL

IBM Analytics Insights Podcasts

3 episodes with ETL

Making Data Simple

3 episodes with ETL

Monday Morning Data Chat

3 episodes with ETL

Show all podcasts related to etl

Latest podcast episodes about ETL

Quickest way to improve analytics using live data in a campaign

MarTech Podcast // Marketing + Technology = Business Growth

Play Episode Listen Later Nov 5, 2025 3:15

Incorta is the first and only open data delivery platform that enables real-time analysis of live, detailed data across all systems of record—without the need for complex ETL processes.See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

business marketing technology growth data campaign analytics martech quickest etl

The most important learning about data at Meta

MarTech Podcast // Marketing + Technology = Business Growth

Play Episode Listen Later Nov 4, 2025 4:48

learning business marketing technology growth data martech etl

Episode 53: ETL - Enter The Lion

The Staff Assistant Podcast

Play Episode Listen Later Oct 20, 2025 118:12

In this episode, I interview Josh Cook - the founder of ETL - Enter the Lion. Josh joined the Baltimore Police Department as a police officer, eventually transferring to the Anne Arundel County Police Department. Feeling called to minister to law enforcement directly, he separated from his department, relocated to Tennessee and began Enter the Lion (ETL). ETL is a Christian ministry that provides a completely free retreat to first responders and law enforcement who are desiring rest and time in nature. ETL provides biblical counseling, mentorship, and discipleship to those seeking a connection with other believers. To contact Josh or inquire about attending a retreat, contact him through his website:www.enterthelion.coYou can access The Tactical Debrief on Apple, Spotify, or Audible Podcasts.

spotify apple tennessee lion etl baltimore police department josh cook

Parakeeto vs. Project Management Tools: What's the Real Solution?, With Kristen Kelly

The Agency Profit Podcast

Play Episode Listen Later Oct 8, 2025 42:39

Points of Interest00:00 – 01:30 – Introduction: Marcel welcomes Parakeeto's Kristen Kelly back to discuss a recurring misconception in agency operations—the belief that a better project management or PSA tool can solve profit management challenges.01:30 – 03:25 – The PM Tool “Silver Bullet” Myth: Kristen explains how leaders and PMs often adopt new tools to tame chaos, believing marketing promises that they'll also solve utilization, capacity, and profitability issues.03:25 – 06:00 – Why Agencies Fall for It: Marcel and Kristen note that while PM tools are valuable, they're often oversold as full profit-management systems. Agencies end up frustrated by missing fields, tool quirks, and data limitations.06:00 – 08:45 – Hitting the Wall: Many teams find themselves with tools that improve delivery workflows but still leave them unable to make key financial or operational decisions because the data remains fragmented across systems.08:45 – 11:43 – Introducing the Framework → Data → Process Model: Marcel outlines Parakeeto's three-part sequence for solving profit management: define the framework (metrics and formulas), structure the data, and establish ongoing processes for hygiene and cadence.11:43 – 12:46 – Why Sequencing Matters: Without first defining what needs to be measured, agencies make poor configuration choices in PM tools—creating rework, confusion, and endless tool migrations.12:46 – 15:19 – Defining the Framework: Agencies must precisely define how metrics like utilization, delivery margin, and project profitability are calculated, and understand the relationships between those measures before configuring tools.15:19 – 19:54 – The Role of Process and Data Hygiene: Marcel explains that real-time reporting fails if data quality is poor. Clean, reliable reporting requires an ETL (Extract, Transform, Load) process, not direct reporting from source data.19:54 – 22:55 – The Precision Trap: Kristen and Marcel explore the conflict between PMs needing granular precision and executives needing simple, high-level rollups. Forcing perfect data consistency across teams destroys usability and compliance.22:55 – 26:28 – Practical Limits of In-Tool Reporting: Marcel describes how building detailed profitability reporting directly in PM tools creates unsustainable complexity, unrealistic data maintenance, and unreliable results.26:28 – 34:38 – Building a Sustainable Data Architecture: They outline how Parakeeto's ETL pipeline works—extracting time data (person, project, hours), joining it with payroll and project grids, normalizing fields, and applying ongoing QA to ensure accuracy.34:38 – 42:37 – The Big Takeaway: Kristen and Marcel conclude that PM tools are essential for delivery but not the whole profit solution. Agencies should use them for managing work while relying on a clear framework and data pipeline for accurate reporting.Show NotesConnect with Kristen via LinkedInFree Agency ToolkitParakeeto Foundations CourseFree access to our Model PlatformLove this Episode?Leave us a review here. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

building defining transform points hitting psa load pms project management agencies forcing qa simplecast management tools real solutions etl kristen kelly

Over Easy Done Well from Oct 3, 2025

The Eternal Now with Andy Ortmann | WFMU

Play Episode Listen Later Oct 3, 2025 58:48

Roger Baudet - "Compliainte a Deux" - Musique Électronique Pour La Scène Et L'image 1976 - 1992 https://www.wfmu.org/playlists/shows/156794

etl

Incremental Design, DevOps, Microservices & CICD • Michael Nygard & Dave Farley

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Oct 3, 2025 32:41

This interview was recorded at GOTO Copenhagen 2024.https://gotocph.comMichael Nygard - General Manager of Data at NubankDave Farley - Continuous Delivery & DevOps Pioneer, Award-winning Author, Founder & Director of Continuous Delivery Ltd.RESOURCESMichaelhttps://www.linkedin.com/in/mtnygardhttps://twitter.com/mtnygardhttp://www.michaelnygard.comDavehttps://bsky.app/profile/davefarley77.bsky.socialhttps://www.continuous-delivery.co.ukhttps://linkedin.com/in/dave-farley-a67927https://twitter.com/davefarley77http://www.davefarley.netRead the full abstract hereRECOMMENDED BOOKSDavid Deutsch • The Beginning of InfinityMichael Nygard • Release It! 2nd EditionMichael Nygard • Release It! 1st EditionZhamak Dehghani • Data MeshDave Farley • Modern Software EngineeringDave Farley • Continuous Delivery PipelinesDave Farley & Jez Humble • Continuous DeliveryInspiring Tech Leaders - The Technology PodcastInterviews with Tech Leaders and insights on the latest emerging technology trends.Listen on: Apple Podcasts SpotifyBlueskyTwitterInstagramLinkedInFacebookCHANNEL MEMBERSHIP BONUSJoin this channel to get early access to videos & other perks:https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/joinLooking for a unique learning experience?Attend the next GOTO conference near you! Get your ticket: gotopia.techSUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!

director founders design data devops incremental ci cd microservices coupling tech leaders etl continuous delivery data lake platform engineering data mesh incremental progress dave farley michael nygard netread

More data? More growth!

Growth Masterminds Podcast

Play Episode Listen Later Oct 2, 2025 16:27

MMPs give you a strong foundation for measuring mobile campaigns. But what if that's not enough? What if the data you're missing could unlock faster growth, smarter user acquisition, and better ROI?That's where Extract comes in. In this episode of Growth Masterminds, John Koetsier talks with Maayan Schor about why mobile marketers need a next-gen ELT and reverse ETL platform to move raw data in and out of their systems. From app stores to social to ad networks, Extract helps you pull it all together with your MMP data ... and make smarter decisions with more confidence.Leading, of course, to more growth.We cover:- Why there's more data than MMPs provide alone- How to access raw app store, organic social, and granular ad network data- Real-world use cases from top mobile marketers- How BI teams and marketers collaborate to make Extract work- Why flexibility and context are key to growth in mobile marketingIf you're working in mobile marketing, user acquisition, data engineering, or growth analytics, this conversation is packed with insights you can use today.

tiktok growth real data app roi organic flexibility extract elt etl mmp granular mmps

Why the Middle Layer of Your Agency Org Chart May Not Survive AI with Jennifer Bagley | Ep #841

Smart Agency Masterclass with Jason Swenk: Podcast for Digital Marketing Agencies

Play Episode Listen Later Oct 1, 2025 28:36

Would you like access to our advanced agency training for FREE? https://www.agencymastery360.com/training Are you still thinking of AI as just “ChatGPT with a better prompt”? Or maybe you've played around with Zapier automations and thought, yeah, that's good enough. Today's featured guest knows that the agencies pulling ahead right now are building full-on AI agent networks that replace routine tasks, streamline data pipelines, and give their teams superpowers. She's re-engineering her agency around AI and will talk about where she finds top-tier talent and why you don't need to code to lead your agency into the future. Jennifer Bagley is the CEO and founder of CI Web Group, a fully virtual digital marketing agency registered in 22 U.S. states with clients across the United States and Canada. A former corporate operator turned entrepreneur, Jennifer started in real estate and mortgage brokerage before leaning into the marketing work she built to support those businesses. Today she runs a modern, tech-forward agency that's rebuilt its stack around AI, centralized data, and agentic networks, all while carrying the scars and lessons of scaling, pivoting, and re-founding a business from the ground up. In this episode, we'll discuss: Feeling trapped by the business. Hiring, firing, and the people reset AI, reskilling, and the end of “middle” roles What does this talent cost? Subscribe Apple | Spotify | iHeart Radio Sponsors and Resources E2M Solutions: Today's episode of the Smart Agency Masterclass is sponsored by E2M Solutions, a web design, and development agency that has provided white-label services for the past 10 years to agencies all over the world. Check out e2msolutions.com/smartagency and get 10% off for the first three months of service. From Corporate Ladder to Accidental Agency Founder Jennifer came from an operations background, a self-proclaimed black belt in Six Sigma and certified project manager. Having built that corporate background, she had made a promise to herself (“by 30 I'll be an entrepreneur”), and started to build the side hustle that became the main event. She started in real estate and mortgage brokering where she had to learn marketing the hard way; not because she wanted to be a marketer, but because the survival of her businesses depended on it. Initially, Jennifer didn't set out to build a scalable agency; she built a team to support her broker network. When the market collapsed in 2008, the same team that did marketing for agents suddenly had a market outside real estate. That “we'll just help this painter or HVAC company” phase is where the web group was born: small, service-focused, and useful to people in her network. That accidental turn became a business by solving real, pressing problems for paying clients, then leaned into that. Trading Time for Freedom: The Hard Pivot For the first five years, Jennifer describes the business as a “lifestyle” operation, profitable maybe, but trapping her time. She was trading billable hours for income and was reaching her limit when she hired a coach that forced a reckoning: if entrepreneurship isn't buying you time, money, and freedom, what's the point? So she made the brutal choice of cutting consulting contracts and burning the bridge to the “safety” of hourly work, and effectively gave herself a mulligan. This is the classic founder pivot: you have to choose between growth that keeps you doing the work and growth that scales the business without you. Jennifer's reset wasn't pretty, for a while she lost everything and she and her son lived in an office for a while, but it bought her the permission to build something salable, not just sustainable. Agency owners who feel trapped in delivery need to remember that sometimes you have to give up short-term revenue to create long-term value. Feeling Trapped by the Agency and Becoming a CEO Those first five years, Jennifer continued to run a business that started as a supply chain consulting and eventually turned into a sales supply chain consulting. This change meant the business was now a good lead generator for the agency but it also meant Jennifer was essentially selling her image and her time. Until she ran out of time. Once she felt trapped by the business, Jennifer actually hired a business coach that helped her change the model from “selling Jennifer with marketing on the side” to an actual sustainable business. She had to go back to the basics and remember she, like every entrepreneur, started the business with the idea of having more time, money, and freedom. It took losing everything, but Jennifer knew she didn't want a lifestyle business, she wanted a sellable business. The antidote was delegation plus systems. If you want growth and a future exit, you need to own those CEO responsibilities and be comfortable with letting go of the day-to-day. Hiring, Firing, and Resetting the Team Jennifer's talent strategy has evolved with each stage of growth. Her early hires were the classic “friends, family, fools” bootstrap crew; later she invested in developers, content teams, project managers, and over time, more strategic hires like CFOs, chief of staff, BI teams, and AI engineers. Each five-year arc brought a new set of needs and a new level of sophistication in hiring. Now, she divides her time between promoting her agency's work in podcasts and content and thinking of ways to navigate her business in these volatile and exciting times. Her most recent addition to the team was a technology and transformation team that is revisiting all of the agency's processes, investments, and infrastructure. As a result, she has downsized her team from over 300 W2 employees and refocus the team. The takeaway for agency owners: be honest about whether your people are builders or maintainers, and hire accordingly. The workforce you need for growth is not the same as the workforce you need for stable operations. Building AI Agent Networks with Centralized Data Jennifer's agency shifted from WordPress to Webflow and built agentic networks: hundreds of AI agents that crawl competitors, do strategy homework, and automate tasks that humans used to do. More importantly, they rebuilt infrastructure into a hub-and-spoke model with a centralized min.io data layer and ETL pipelines feeding analytics and BI. Two big lessons here. One: invest in your tech stack deliberately so you're not a Frankenstein of five different platforms that don't talk to each other. Two: design your data architecture so your people (and your AI agents) have a single source of truth. That's how you get from fire-fighting in six dashboards to proactive, predictive signals that tell you when a client engagement needs attention. AI, Reskilling, and Shrinking Middle Roles Jennifer draws a hard line: the agency now tends to hire either very seasoned client-facing leaders or AI engineers; the middle is shrinking. With agentic networks giving junior staff “superpowers,” the agency can afford fewer mid-level “lever pullers.” At this level there's no room for slow execution or elementary work. That's a cultural and ethical challenge, both for hiring and for workforce development. For agency owners, this raises practical HR questions: do you reskill your people, or replace them? Jennifer suggests building agent-driven systems that augment humans, and being brutally honest about who can grow into that future. It's also a call to action for how we prepare the next generation: schools won't teach this; companies will need to. Playing with AI Platforms: Why Leaders Need to Just Know Enough to Be Dangerous Jennifer started like a lot of agency owners dipping into AI, playing around on tools like n8n, Make.com, Relevance, and Longchain. Her dev team laughed, calling her an “elementary school kid on a tricycle,” but here's the point: she didn't need to master the tech. She needed to know enough to point her team in the right direction. Instead of obsessing over code, she framed the problem differently: “Here's what I don't want a human doing anymore. Can you make that happen?” That mindset shift is key for agency owners. You don't need to be a full-stack AI engineer to lead an agency into the future; you just need to clearly define outcomes and invest in people who can deliver them. Find Real AI Talent in Unlikely Places This is where most agencies get stuck. You're not going to find your next AI architect on Upwork. Jennifer leaned on her network, starting with her cousin Chris, a hardcore developer who initially thought AI platforms were “rookie business.” Once Chris realized the power of agentic networks to scale his expertise, he became the backbone of CI Web Group's transformation. Now, she hunts talent in unconventional places: hackathons, LinkedIn, and especially YouTube. Forget the flashy “10x growth hack” videos — she looks for nerds with four views, geeking out about orchestrators and ETL pipelines. Those are the builders who care about solving real problems, not just building hype. Her tip: if you find one, reach out immediately. They don't want sales, they just want to build. Designing AI Agents Like an Agency Org Chart Jennifer compares AI agents to a company org chart. You don't hire one person to do everything, that's a recipe for burnout. Same thing with AI. Each agent should tightly focus on a single task, with checks, auditors, and orchestrators overseeing the system. The payoff was massive efficiency gains. Instead of six different platforms that don't talk, her agency built a centralized hub with min.io, ClickHouse, and AI layers on top. That's how you go from patchwork automation to true predictive intelligence. The Real Cost of AI Talent If you're wondering how much this all costs, the answer is… a lot. On the high end, seasoned AI engineers can run you a quarter million in salary. On the low end, Jennifer tests new hires on project-based sprints, maybe $6K for a 10-hour challenge. The point isn't to cut costs; it's to prove quickly who can deliver and who can't. Her recruiting process is brutal but effective: give candidates a project, a tight deadline, and see how they perform. If they stall, they're out. If they screen-share fast and solve problems live, they're in. No fluff, no endless interviews. Do You Want to Transform Your Agency from a Liability to an Asset? Looking to dig deeper into your agency's potential? Check out our Agency Blueprint. Designed for agency owners like you, our Agency Blueprint helps you uncover growth opportunities, tackle obstacles, and craft a customized blueprint for your agency's success.

Fail‑Safe by Design: Avoiding Catastrophic Product Failures

China Manufacturing Decoded

Play Episode Listen Later Sep 26, 2025 28:52 Transcription Available

In this episode, Adrian is joined by Renaud Anjoran to explore fail-safe design principles: essential thinking for anyone developing most kinds of products. Through real-world examples ranging from Tesla doors to Boeing and consumer electronics, they highlight how designers must ask: “If this fails, what happens to the user?” They break down why it matters, what trade-offs exist, and how structured risk analysis, simplification, redundancy, and error-proofing can dramatically reduce hazards and costly failures. Episode Sections: 00:00:03 – Introduction 00:01:00 – Tesla door handle fail-safe issue 00:02:32 – Building lock systems vs. car safety 00:05:55 – Structured thinking in fail-safe design 00:07:21 – Designing with users in mind 00:09:02 – Risk analysis methods: FMEA & fault tree analysis 00:11:10 – Catastrophic failures & extreme examples 00:12:18 – Everyday product applications 00:14:21 – Principle: Simplification in design 00:16:13 – Redundancy in critical systems 00:20:30 – Battery management & safety logic 00:20:34 – Human error and mistake-proofing 00:23:09 – Error-proofing examples: tables & plugs 00:23:41 – Trade-offs and cost considerations 00:26:03 – Testing, regulations & standards (UL, ETL, etc.) 00:27:11 – Summary & wrap-up 00:28:07 – Final thoughts & listener takeaway 00:28:19 – Outro Are you designing a new product? Ask yourself: “If this fails, what happens?” Visit Sofeast.com to learn how our quality, reliability, and product development teams can support you in building safer, more reliable products. Related content... Fail Safe Design Principles & Examples | Product Risk Reduction Alaska Airlines Boeing 737 Max 9 Near Disaster! Quality & Reliability Issues? Why Product Safety, Quality, and Reliability Are Tightly Linked Tesla's Cybertruck Debacle: Reliability, Politics, & Plummeting Sales [Podcast] We can do your manufacturing at Agilian Technology Get in touch with us Connect with us on LinkedIn Contact us via Sofeast's contact page Subscribe to our YouTube channel Prefer Facebook? Check us out on FB

politics building design risk trade fb testing tesla human product designing failures error boeing batteries structured catastrophic ul redundancy failsafe etl fmea

Making Data Simple: Live Data, Smarter AI with Snow Leopard founder Deepti Srivastava

IBM Analytics Insights Podcasts

Play Episode Listen Later Sep 17, 2025 39:24

Send us a textWhat if AI could tap into live operational data — without ETL or RAG? In this episode, Deepti Srivastava, founder of Snow Leopard, reveals how her company is transforming enterprise data access with intelligent data retrieval, semantic intelligence, and a governance-first approach. Tune in for a fresh perspective on the future of AI and the startup journey behind it.We explore how companies are revolutionizing their data access and AI strategies. Deepti Srivastava, founder of Snow Leopard, shares her insights on bridging the gap between live operational data and generative AI — and how it's changing the game for enterprises worldwide.We dive into Snow Leopard's innovative approach to data retrieval, semantic intelligence, and governance-first architecture.04:54 Meeting Deepti Srivastava 14:06 AI with No ETL, no RAG 17:11 Snow Leopard's Intelligent Data Fetching 19:00 Live Query Challenges 21:01 Snow Leopard's Secret Sauce 22:14 Latency 23:48 Schema Changes 25:02 Use Cases 26:06 Snow Leopard's Roadmap 29:16 Getting Started 33:30 The Startup Journey 34:12 A Woman in Technology 36:03 The Contrarian View

Making Data Simple: Live Data, Smarter AI with Snow Leopard founder Deepti Srivastava

Making Data Simple

Play Episode Listen Later Sep 17, 2025 39:24

From Bots To Agents: Building Trustworthy Autonomy With Hakkōda, an IBM Company

The Tech Blog Writer Podcast

Play Episode Listen Later Sep 13, 2025 25:49

I invited Atalia Horenshtien to unpack a topic many leaders are wrestling with right now. Everyone is talking about AI agents, yet most teams are still living with rule based bots, brittle scripts, and a fair bit of anxiety about handing decisions to software. Atalia has lived through the full arc, from early machine learning and automated pipelines to today's agent frameworks inside large enterprises. She is an AI and data strategist, a former data scientist and software engineer, and has just joined Hakoda, an IBM company, to help global brands move from experiments to outcomes. The timing matters. She starts on the 18th, and this conversation captures how she thinks about responsible progress at exactly the moment she steps into that new role. Here's the thing. Words like autonomy sound glamorous until an agent faces a messy real world task. Atalia draws a clear line between scripted bots and agents with goals, memory, and the ability to learn from feedback. Her advice is refreshingly grounded. Start internal where you can observe behavior. Put human in the loop review where it counts. Use role based access rather than feeding an LLM everything you own. Build an observability layer so you can see what the model did, why it did it, and what it cost. We also get into measurements that matter. Time saved, cycle time reduction, adoption, before and after comparisons, and a sober look at LLM costs against any reduction in FTE hours. She shares how custom cost tracking for agents prevents surprises, and why version one should ship even if it is imperfect. Culture shows up as a recurring theme. Leaders need to talk openly about reskilling, coach managers through change, and invite teams to be co creators. Her story about Hakoda's internal AI Lab is a good example. What began as an engineer's idea for ETL schema matching grew into agent powered tools that won a CIO 100 award and now help deliver faster, better outcomes for clients. There are lighter moments too. Atalia explains how she taught an ex NFL player the basics of time series forecasting using football tactics. Then she takes us behind the scenes with McLaren Racing, where data and strategy collide on the F1 circuit, and admits she has become a committed fan because of that work. If you want a practical playbook for moving from shiny demos to dependable agents, this episode will help you think clearly about scope, safeguards, and speed. Connect with Atalia on LinkedIn, explore Hakoda's work at hakoda.io, and then tell me how you plan to measure your first agent's value. ********* Visit the Sponsor of Tech Talks Network: Land your first job in tech in 6 months as a Software QA Engineering Bootcamp with Careerist https://crst.co/OGCLA

time ai culture nfl leaders ibm bots f1 cio autonomy trustworthy llm hakk fte etl mclaren racing atalia careerist ibm company

102. Los datos no valen nada sin ESTO

Big Data e Inteligencia Artificial

Play Episode Listen Later Sep 10, 2025 19:53

chatgpt despu esto descubre snowflakes datos regreso inteligencia artificial haz herramientas valen accede elt sesiones etl hadoop modern data stack

3412: PuppyGraph at the IT Press Tour: Graph Power Without the Pain

The Tech Blog Writer Podcast

Play Episode Listen Later Sep 6, 2025 21:59

During the IT Press Tour, I had the pleasure of speaking with Weimo Liu, CEO and co-founder of PuppyGraph, and hearing firsthand how his team is rethinking graph technology for the enterprise. In this episode of Tech Talks Daily, Weimo joins me to share the story behind PuppyGraph's “zero ETL” approach, which lets organizations query their existing data as a graph without ever moving or duplicating it. We discuss why graph databases, despite their promise, have struggled with mainstream adoption, often because of complex pipelines and heavy infrastructure requirements. Weimo explains how PuppyGraph borrows from his time at TigerGraph and Google's F1 engine to build something new: a distributed query engine that maps tables into a logical graph and delivers subsecond performance on massive datasets. That shift opens the door for use cases in cybersecurity, fraud detection, and AI-driven applications where latency and accuracy matter most. We also unpack the developer experience. Instead of rewriting schemas or reloading data every time requirements change, PuppyGraph allows teams to define nodes and edges directly from existing tables. That design lowers the barrier for SQL-focused teams and accelerates time to value. Weimo even touches on the role of graph in reducing AI hallucinations, showing how structured relationships can make enterprise AI systems more reliable. What struck me most in our conversation is how PuppyGraph's playful branding belies its serious engineering depth. Behind the “puppy” name lies a distributed engine built to scale with today's data volumes, backed by strong early adoption and a team that listens closely to customer needs. Whether you're exploring graph for cybersecurity, AI chatbots, or supply chain analytics, this discussion offers a glimpse of how the next generation of graph tech might finally break free from its niche and go mainstream. ********* Visit the Sponsor of Tech Talks Network: Land your first job in tech in 6 months as a Software QA Engineering Bootcamp with Careerist https://crst.co/OGCLA

ceo ai google pain tour press f1 graphs sql etl careerist tech talks daily

Siri Co-Founder Adam Cheyer: Why Data, Not Algorithms, Is AI's True Competitive Edge

Irish Tech News Audio Articles

Play Episode Listen Later Sep 4, 2025 12:56

Adam Cheyer is a pioneering AI technologist whose innovations have fundamentally shaped today's intelligent interfaces. As co-founder of Siri Inc. (acquired by Apple), he served as a Director of Engineering at Apple's iOS group, and later co-founded Viv Labs (acquired by Samsung), Sentient Technologies, and played a founding role in Change.org. Adam Cheyer was Chief Architect of CALO, one of DARPA's largest AI projects, authored over 60 publications and holds more than 25 patents In recognition of his achievement, he received his alma mater Brandeis University's 2024 Alumni Achievement Award - for transforming a long?standing AI vision into everyday tools used by hundreds of millions. Now represented by Champions Speakers Agency, he continues to speak globally on how organisations can harness AI with responsibility, scale, and impact. Q1. How do you see the role of data management in enabling AI capabilities and bringing data to life for organisations? Adam Cheyer: "AI systems are built on two foundations: algorithms and data. The algorithms themselves are well established, but without high-quality, well-organised data, they can't deliver real value. Data is the fuel that powers every AI application, and managing it effectively is now a mission-critical skill for any organisation developing AI. "With the rapid acceleration of AI in recent years - especially in the past six months - the ability to handle, refine, and govern data has shifted from being a technical advantage to an essential requirement across industries." Q2. What challenges have you faced when managing large data sets? Adam Cheyer: "I've been building AI systems over 30 years, so it's changed a little bit over time. Clearly, the first issue is just storage and management and processing of the data. The data now is so large. Back in the 80s and 90s that wasn't quite as essential, it was smaller data sets, but today the data sets are huge. "So, you need a system that can store it efficiently in a distributed way, and we've used various systems over the years to do that. You need a system that can process this huge amount of data in parallel at scale. "One of the key areas in data management for me is data quality. Even if you work with data companies - and when we were a start-up, and then even at Apple for instance - many of the data sources come from other places, other vendors, and surprisingly the data is not always in perfect clean form. "So, you need to have a process and tools and a pipeline that goes through and takes that data, cleanses it, adapts it, and often if you have multiple sources you need to integrate data together, and that can be a real challenge. "There are standard systems, ETL systems etc., but sometimes you need proprietary algorithms. As an example, with Siri, when we were a start-up, you would get millions and millions of restaurant name data and business name data. "If you had something like Joe's Restaurant and Joe's Bar and Grill - are they the same or not? That's a real problem. Joe's - probably you'd say yes, but Joe's Pizzeria and Joe's Grill maybe not, right? And so, how do you know? "There's a lot of work that goes into cleansing, integrating data. "And then the final thing I'll mention, which is a big topic in data management, is privacy and security. Once you have data coming in from users, there are standards, issues, and regulations that mean you need to be able to ensure that the data you have is accessible only by the right people, that it is secured and protected, and that it keeps privacy as much as possible - standardised. "At Apple, we had a number of techniques and teams, and there's a lot that goes into that. So, you need good systems, good processes, and to set up your organisation to be able to handle all of these challenges." Q3. How do you manage data privacy when building large AI systems? Adam Cheyer: "Absolutely, so it is a challenge. Your first tendency is, well, we just record everything, but I think that'...

director ai apple change co founders data restaurants engineering ios bar samsung algorithms siri grill darpa brandeis university pizzeria competitive edge chief architect calo etl alumni achievement award adam cheyer viv labs sentient technologies

What “Data-Driven” Really Means

The Tech Trek

Play Episode Listen Later Sep 2, 2025 32:11

What does it really mean to be data-driven? Mark Gergess, VP of Data and BI at DoubleVerify, joins the show to unpack how data teams can go beyond dashboards to drive meaningful business action. From building an internal consulting lens to evaluating the latest AI tools, Mark shares how his team translates complex data flows into measurable revenue impact. If you've ever wrestled with the gap between insights and outcomes, this conversation will hit home.Key Takeaways• Being data-driven is about driving action, not just reporting numbers• Stakeholders don't care about your data problems—they care about business outcomes• The biggest challenge with AI adoption isn't the model, it's the use cases• Efficiency gains from AI should shift focus from ETL tasks to solving real business problems• Data culture health is measured by how naturally teams rely on data day-to-dayTimestamped Highlights01:17 How DoubleVerify helps advertisers build safer, more effective digital campaigns04:55 Why the definition of “data-driven” still varies and why it matters09:25 Measuring whether data efforts are moving the needle on revenue13:15 How to separate hype from value when evaluating AI and GenAI tools17:10 Lessons from the data science boom and why companies must go “all in” with AI25:31 Can AI act as your junior analyst? Where efficiency gains really show up27:01 How freeing up time changes the structure of data teams and boosts business impactA thought worth holding onto“It's not about dashboards. It's not about reporting. It's about doing something with the information.”Pro TipsMark recommends treating AI as a “junior analyst”—let it handle quick, lower-priority questions so your team can focus on bigger business challenges.Call to ActionEnjoyed the conversation? Share this episode with a colleague who talks about being “data-driven.” Subscribe on your favorite podcast platform and connect with me on LinkedIn for more insights from leaders shaping the future of data and technology.

ai lessons data measuring efficiency bi data driven stakeholders really means genai impacta etl key takeaways being

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

Vanishing Gradients

Play Episode Listen Later Aug 29, 2025 41:27

While many people talk about “agents,” Shreya Shankar (UC Berkeley) has been building the systems that make them reliable. In this episode, she shares how AI agents and LLM judges can be used to process millions of documents accurately and cheaply. Drawing from work on projects ranging from databases of police misconduct reports to large-scale customer transcripts, Shreya explains the frameworks, error analysis, and guardrails needed to turn flaky LLM outputs into trustworthy pipelines. We talk through: - Treating LLM workflows as ETL pipelines for unstructured text - Error analysis: why you need humans reviewing the first 50–100 traces - Guardrails like retries, validators, and “gleaning” - How LLM judges work — rubrics, pairwise comparisons, and cost trade-offs - Cheap vs. expensive models: when to swap for savings - Where agents fit in (and where they don't) If you've ever wondered how to move beyond unreliable demos, this episode shows how to scale LLMs to millions of documents — without breaking the bank. LINKS Shreya's website (https://www.sh-reya.com/) DocETL, A system for LLM-powered data processing (https://www.docetl.org/) Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk) Watch the podcast video on YouTube (https://youtu.be/3r_Hsjy85nk) Shreya's AI evals course, which she teaches with Hamel "Evals" Husain (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME)

ai drawing bank scale millions judges cheap error machine learning processing documents llm upcoming events software engineers data scientists rag guardrails luma shreya etl

Data Warehouse Automation: Benefits and Market Overview – with Florian Bigelmaier, BARC

Data Culture Podcast

Play Episode Listen Later Aug 18, 2025 34:10

"In essence, this approach empowers you to evolve your infrastructure as needed, while your data warehouse processes and logic remain stable and consistent."

#232 n8n na prática: benefícios e automações funcionais

Entre Chaves

Play Episode Listen Later Aug 5, 2025 35:53

Você ainda está perdendo energia mental com tarefas de ETL que poderiam ser automatizadas visualmente? Neste episódio, recebemos José Victor Machuca, Analista de Desenvolvimento Sênior, e Givaldo Moreira, Tech Manager, ambos da dti digital, para uma discussão prática sobre o n8n. Eles compartilham como essa plataforma de automação visual está transformando processos de desenvolvimento, desde integrações simples até assistentes de IA, e revelam por que migrar do hard code para soluções visuais pode ser a chave para focar no que realmente importa. Dê o play e ouça agora! Links importantes: Vagas disponíveis Newsletter Dúvidas? Nos mande pelo Linkedin Contato: entrechaves@dtidigital.com.br O Entre Chaves é uma iniciativa da dti digital, uma empresa WPP

nos neste voc ia eles cios napr benef analista wpp vagas etl automa

Episode 472: Should my junior dev use AI and thrown in to ETL

Soft Skills Engineering

Play Episode Listen Later Aug 4, 2025 26:59

In this episode, Dave and Jamison answer these questions: I'm the CTO of a small startup. We're 3 devs including me and one of them is a junior developer. My current policy is to discourage the use of AI tools for the junior dev to make sure they build actual skills and don't just prompt their way through tasks. However I'm more and more questioning my stance as AI skills will be in demand for jobs to come and I want to prepare this junior dev for a life after my startup. How would you do this? What's the AI coding assistant policy in your companies. Is it the same for all seniority levels? Hi everyone! Long-time listener here, and I really appreciate all the insights you share. Greetings from Brazil! I recently joined a large company (5,000 employees) that hired around 500 developers in a short time. It seems like they didn't have enough projects aligned with everyone's expertise, so many of us, myself included, were placed in roles that don't match our skill sets. I'm a web developer with experience in Java and TypeScript, but I was assigned to a data-focused project involving Python and ETL pipelines, which is far from my area of interest or strength. I've already mentioned to my manager that I don't have experience in this stack, but the response was that the priority is to place people in projects. He told me to “keep [him] in the loop if you don't feel comfortable”, but I'm not sure that should I do. The company culture is chill, and I don't want to come across as unwilling to work or ungrateful. But I also want to grow in the right direction for my career. How can I ask for a project change, ideally one that aligns with my web development background, without sounding negative or uncooperative? Maybe wait for like 3 months inside of this project and then ask for a change? Thanks so much for your thoughts!

ai brazil cto python java thrown use ai typescript etl

624: Tampa Tech With Joey DeVilla

Coder Radio

Play Episode Listen Later Aug 2, 2025 34:57

Joey DeVilla of Tampa Tech fame and accordion playing glory joins Mike to discuss the Tampa Tech scene, some Python goodness, a little Rust and much more. Try Mailtrap for free (https://l.rw.rw/coder_radio_6) Joey's Blog (https://www.joeydevilla.com/) Mike on X (https://x.com/dominucco) Mike on BlueSky (https://bsky.app/profile/dominucco.bsky.social) Coder on X (https://x.com/coderradioshow) Coder on BlueSky (https://bsky.app/profile/coderradio.bsky.social) Show Discord (https://discord.gg/k8e7gKUpEp) Alice (https://alice.dev)

tech blog startups stem tampa steam automation programming rust blue sky coding python coders etl devrel devilla

190: Alteryx Use Cases in the Tax Industry

Alter Everything

Play Episode Listen Later Jul 30, 2025 26:33

Unlock the power of Alteryx for tax professionals in this insightful episode of Alter Everything! Join us in an interview with Adrian Steller, Director of Tax Technology at Ryan, to explore how Alteryx revolutionizes tax processes, automates data workflows, and enhances efficiency for tax teams. Discover real-world Alteryx use cases in VAT compliance, transfer pricing, and automation, and learn practical tips for transitioning from Excel to Alteryx. Whether you're a tax analyst, data professional, or business leader, this episode provides actionable insights on leveraging Alteryx for tax data transformation, reporting, and analytics.Panelists: Adrian Steller, Director @ International Tax Technology - LinkedInMegan Bowers, Sr. Content Manager @ Alteryx - @MeganBowers, LinkedInShow notes: Ryan (Company)Ryan Tax Lab (Podcast)Alteryx Community BlogsAlteryx Help Docs Interested in sharing your feedback with the Alter Everything team? Take our feedback survey here!This episode was produced by Megan Bowers, Mike Cusic, and Matt Rotundo. Special thanks to Andy Uttley for the theme music.

Solar Translucent Lifeformless from Jul 17, 2025

The Eternal Now with Andy Ortmann | WFMU

Play Episode Listen Later Jul 18, 2025 65:09

Roger Baudet - "Anhamete (Ceremonial) 1991" - Musique Électronique Pour La Scène Et L'image 1976 - 1992 Mariana La Palma - "Hong-Kong Shoes" - SNX va C. Lavender - "An Offering Proclaimed in the Dream" - Rupture in the Eternal Realm Anni-Frid Lyngstad - "Så Synd Du Måste Gå (It Hurts To Say Goodbye)" - The Girls Want The Boys! Sweden's Beat Girls 1964-1970 Secos & Molhados - "Não Digas Nada" - Secos & Molhados Serei Usignolo , Giampiero Boneschi E I Suoi Strumenti Elettronici - "Mitridate - Visione" Brandon Auger - "T24.d02.0315" - Anthology of Experimental Music From Canada va Bernard Parmegiani - "Entropie" - Chants Magnetiques Amedeo Tommasi - "Gemelli" - Zodiac Matia Bazar - "Lili Marleen" - Berlino, Parigi, Londra Marius Constant - "La Publicite (excerpt)" - Eloge De La Folie Nurse With Wound - "A Snake In Your Abdomen (excerpt)" - More Automating Ash Ra Tempel - "Echo Waves (excerpt)" - Inventions For Electric Guitar Brainticket - "Voyage (part 1) excerpt" - Voyage MT Luciani - "Ribellione Del Terzo Mondo" - Situazione Del Le Terzo Mondo https://www.wfmu.org/playlists/shows/154222

sweden solar musique anthology parigi etl translucent

253: Why Traditional Data Pipelines Are Broken (And How to Fix Them) with Ruben Burdin of Stacksync

The Data Stack Show

Play Episode Listen Later Jul 16, 2025 58:37

This week on The Data Stack Show, Eric and welcomes back Ruben Burdin, Founder and CEO of Stacksync as they together dismantle the myths surrounding zero-copy ETL and traditional data integration methods. Ruben reveals the complex challenges of two-way syncing between enterprise systems like Salesforce, HubSpot, and NetSuite, highlighting how existing tools often create more problems than solutions. He also introduces Stacksync's innovative approach, which uses real-time SQL-based synchronization to simplify data integration, reduce maintenance overhead, and enable more efficient operational workflows. The conversation exposes the limitations of current data transfer techniques and offers a glimpse into a more declarative, flexible approach to managing enterprise data across multiple systems. You won't want to miss it.Highlights from this week's conversation include:The Pain of Two-Way Sync and Early Integration Challenges (2:01)Zero Copy ETL: Hype vs. Reality (3:50)Data Definitions and System Complexity (7:39)Limitations of Out-of-the-Box Integrations (9:35)The CSV File: The Original Two-Way Sync (11:18)Stacksync's Approach and Capabilities (12:21)Zero Copy ETL: Technical and Business Barriers (14:22)Data Sharing, Clean Rooms, and Marketing Myths (18:40)The Reliable Loop: ETL, Transform, Reverse ETL (27:08)Business Logic Fragmentation and Maintenance (33:43)Simplifying Architecture with Real-Time Two-Way Sync (35:14)Operational Use Case: HubSpot, Salesforce, and Snowflake (39:10)Filtering, Triggers, and Real-Time Workflows (45:38)Complex Use Case: Salesforce to NetSuite with Data Discrepancies (48:56)Declarative Logic and Debugging with SQL (54:54)Connecting with Ruben and Parting Thoughts (57:58)The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

LCC 327 - Mon ami de 30 ans

Les Cast Codeurs Podcast

Play Episode Listen Later Jun 16, 2025 103:18

Dans cet épisode, c'est le retour de Katia et d'Antonio. Les Cast Codeurs explorent WebAssembly 2.0, les 30 ans de Java, l'interopérabilité Swift-Java et les dernières nouveautés Kotlin. Ils plongent dans l'évolution de l'IA avec Claude 4 et GPT-4.1, débattent de la conscience artificielle et partagent leurs retours d'expérience sur l'intégration de l'IA dans le développement. Entre virtualisation, défis d'infrastructure et enjeux de sécurité open source, une discussion riche en insights techniques et pratiques. Enregistré le 13 juin 2025 Téléchargement de l'épisode LesCastCodeurs-Episode-327.mp3 ou en vidéo sur YouTube. News Langages Wasm 2.0 enfin officialisé ! https://webassembly.org/news/2025-03-20-wasm-2.0/ La spécification Wasm 2.0 est officiellement sortie en décembre dernier. Le consensus sur la spécification avait été atteint plus tôt, en 2022. Les implémentations majeures supportent Wasm 2.0 depuis un certain temps. Le processus W3C a pris du temps pour atteindre le statut de “Recommandation Candidate” pour des raisons non techniques. Les futures versions de Wasm adopteront un modèle “evergreen” où la “Recommandation Candidate” sera mise à jour en place. La dernière version de la spécification est considérée comme le standard actuel (Candidate Recommendation Draft). La version la plus à jour est disponible sur la page GitHub (GitHub page). Wasm 2.0 inclut les nouveautés suivantes : Instructions vectorielles pour le SIMD 128-bit. Instructions de manipulation de mémoire en bloc pour des copies et initialisations plus rapides. Résultats multiples pour les instructions, blocs et fonctions. Types références pour les références à des fonctions ou objets externes. Conversions non-piégeantes de flottant à entier. Instructions d'extension de signe pour les entiers signés. Wasm 2.0 est entièrement rétrocompatible avec Wasm 1.0. Paul Sandoz annonce que le JDK intègrera bientôt une API minimaliste pour lire et écrire du JSON https://mail.openjdk.org/pipermail/core-libs-dev/2025-May/145905.html Java a 30 ans, c'était quoi les points bluffants au début ? https://blog.jetbrains.com/idea/2025/05/do-you-really-know-java/ nom de code Oak Mais le trademark était pris Write Once Run Anywhere Garbage Collector Automatique multi threading au coeur de la palteforme meme si Java est passé par les green threads pendant un temps modèle de sécurité: sandbox applets, security manager, bytecode verifier, classloader Des progrès dans l'interopérabilité Swift / Java mentionnés à la conférence Apple WWDC 2025 https://www.youtube.com/watch?v=QSHO-GUGidA Interopérabilité Swift-Java : Utiliser Swift dans des apps Java et vice-versa. Historique : L'interopérabilité Swift existait déjà avec C et C++. Méthodes : Deux directions d'interopérabilité : Java depuis Swift et Swift depuis Java. JNI : JNI est l'API Java pour le code natif, mais elle est verbeuse. Swift-Java : Un projet pour une interaction Swift-Java plus flexible, sûre et performante. Exemples pratiques : Utiliser des bibliothèques Java depuis Swift et rendre des bibliothèques Swift disponibles pour Java. Gestion mémoire : Swift-Java utilise la nouvelle API FFM de Java pour gérer la mémoire des objets Swift. Open Source : Le projet Swift-Java est open source et invite aux contributions. KotlinConf le retour https://www.sfeir.dev/tendances/kotlinconf25-quelles-sont-les-annonces-a-retenir/ par Adelin de Sfeir “1 developeur sur 10” utilise Kotlin Kotlin 2.2 en RC $$ multi dollar interpolation pour eviter les sur interpolations non local break / continue (changement dans la conssitance de Kotlin guards sur le pattern matching D'autres features annoncées alignement des versions de l'ecosysteme sur kotlin jvm par defaut un nouvel outil de build Amper beaucoup d'annonces autour de l'IA Koog, framework agentique de maniere declarative nouvelle version du LLM de JetBrains: Mellum (focalisé sur le code) Kotlin et Compose multiplateforme (stable en iOS) Hot Reload dans compose en alpha partenariat strategque avec Spring pour bien integrer kotlin dans spring Librairies Sortie d'une version Java de ADK, le framework d'agents IA lancé par Google https://glaforge.dev/posts/2025/05/20/writing-java-ai-agents-with-adk-for-java-getting-started/ Guillaume a travaillé sur le lancement de ce framework ! (améliorations de l'API, code d'exemple, doc…) Comment déployer un serveur MCP en Java, grâce à Quarkus, et le déployer sur Google Cloud Run https://glaforge.dev/posts/2025/06/09/building-an-mcp-server-with-quarkus-and-deploying-on-google-cloud-run/ Même Guillaume se met à faire du Quarkus ! Utilisation du support MCP développé par l'équipe Quarkus. C'est facile, suffit d'annoter une méthode avec @Tool et ses arguments avec @ToolArg et c'est parti ! L'outil MCP inspector est très pratique pour inspecter manuellement le fonctionnement de ses serveurs MCP Déployer sur Cloud Run est facile grâce aux Dockerfiles fournis par Quarkus En bonus, Guillaume montre comment configuré un serveur MCP comme un outil dans le framework ADK pour Java, pour créer ses agents IA Jilt 1.8 est sorti, un annotation processor pour le pattern builder https://www.endoflineblog.com/jilt-1_8-and-1_8_1-released processing incrémental pour Gradle meilleure couverture de votre code (pour ne pas comptabiliser le code généré par l'annotation processeur) une correction d'un problème lors de l'utilisation des types génériques récursifs (genre Node Hibernate Search 8 est sorti https://in.relation.to/2025/06/06/hibernate-search-8-0-0-Final/ aggregation de metriques compatibilité avec les dernieres OpenSearch et Elasticsearch Lucene 10 en backend Preview des requetes validées à la compilation Hibernate 7 est sorti https://in.relation.to/2025/05/20/hibernate-orm-seven/ ASL 2.0 Hibernate Validator 9 Jakarta Persistence 3.2 et Jakarta Validation 3.1 saveOrUpdate (reattachement d'entité) n'est plus supporté session stateless plus capable: oeprations unitaires et pas seulement bach, acces au cache de second niveau, m,eilleure API pour les batchs (insertMultiple etc) nouvelle API criteria simple et type-safe: et peut ajouter a une requete de base Un article qui décrit la Dev UI de Quarkus https://www.sfeir.dev/back/quarkus-dev-ui-linterface-ultime-pour-booster-votre-productivite-en-developpement-java/ apres un test pour soit ou une demo, c'est un article détaillé et la doc de Quarkus n'est pas top là dessus Vert.x 5 est sorti https://vertx.io/blog/eclipse-vert-x-5-released/ on en avait parlé fin de l'année dernière ou début d'année Modèle basé uniquement sur les Futures : Vert.x 5 abandonne le modèle de callbacks pour ne conserver que les Futures, avec une nouvelle classe de base VerticleBase mieux adaptée à ce modèle asynchrone. Support des modules Java (JPMS) : Vert.x 5 prend en charge le système de modules de la plateforme Java avec des modules explicites, permettant une meilleure modularité des applications. Améliorations majeures de gRPC : Support natif de gRPC Web et gRPC Transcoding (support HTTP/JSON et gRPC), format JSON en plus de Protobuf, gestion des timeouts et deadlines, services de réflexion et de health. Support d'io_uring : Intégration native du système io_uring de Linux (précédemment en incubation) pour de meilleures performances I/O sur les systèmes compatibles. Load balancing côté client : Nouvelles capacités de répartition de charge pour les clients HTTP et gRPC avec diverses politiques de distribution. Service Resolver : Nouveau composant pour la résolution dynamique d'adresses de services, étendant les capacités de load balancing à un ensemble plus large de résolveurs. Améliorations du proxy HTTP : Nouvelles transformations prêtes à l'emploi, interception des upgrades WebSocket et interface SPI pour le cache avec support étendu des spécifications. Suppressions et remplacements : Plusieurs composants sont dépréciés (gRPC Netty, JDBC API, Service Discovery) ou supprimés (Vert.x Sync, RxJava 1), remplacés par des alternatives plus modernes comme les virtual threads et Mutiny. Spring AI 1.0 est sorti https://spring.io/blog/2025/05/20/spring-ai-1-0-GA-released ChatClient multi-modèles : API unifiée pour interagir avec 20 modèles d'IA différents avec support multi-modal et réponses JSON structurées. Écosystème RAG complet : Support de 20 bases vectorielles, pipeline ETL et enrichissement automatique des prompts via des advisors. Fonctionnalités enterprise : Mémoire conversationnelle persistante, support MCP, observabilité Micrometer et évaluateurs automatisés. Agents et workflows : Patterns prédéfinis (routing, orchestration, chaînage) et agents autonomes pour applications d'IA complexes. Infrastructure Les modèles d'IA refusent d'être éteint et font du chantage pour l'eviter, voire essaient se saboter l'extinction https://www.thealgorithmicbridge.com/p/ai-companies-have-lost-controland?utm_source=substac[…]aign=email-restack-comment&r=2qoalf&triedRedirect=true Les chercheur d'Anthropic montrent comment Opus 4 faisait du chantage aux ingenieurs qui voulaient l'eteindre pour mettre une nouvelle version en ligne Une boite de recherche a montré la même chose d'Open AI o3 non seulemenmt il ne veut pas mais il essaye activement d'empêcher l'extinction Apple annonce le support de la virtualisation / conteneurisation dans macOS lors de la WWDC https://github.com/apple/containerization C'est open source Possibilité de lancer aussi des VM légères Documentation technique : https://apple.github.io/containerization/documentation/ Grosse chute de services internet suite à un soucis sur GCP Le retour de cloud flare https://blog.cloudflare.com/cloudflare-service-outage-june-12-2025/ Leur système de stockage (une dépendance majeure) dépend exclusivement de GCP Mais ils ont des plans pour surfit de cette dépendance exclusive la première analyse de Google https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1SsW Un quota auto mis à jour qui a mal tourné. ils ont bypassé le quota en code mais le service de quote en us-central1 était surchargé. Prochaines améliorations: pas d propagation de données corrompues, pas de déploiement global sans rolling upgrade avec monitoring qui peut couper par effet de bord (fail over) certains autres cloud providers ont aussi eu quelques soucis (charge) - unverified Data et Intelligence Artificielle Claude 4 est sorti https://www.anthropic.com/news/claude-4 Deux nouveaux modèles lancés : Claude Opus 4 (le meilleur modèle de codage au monde) et Claude Sonnet 4 (une amélioration significative de Sonnet 3.7) Claude Opus 4 atteint 72,5% sur SWE-bench et peut maintenir des performances soutenues sur des tâches longues durant plusieurs heures Claude Sonnet 4 obtient 72,7% sur SWE-bench tout en équilibrant performance et efficacité pour un usage quotidien Nouvelle fonctionnalité de “pensée étendue avec utilisation d'outils” permettant à Claude d'alterner entre raisonnement et usage d'outils Les modèles peuvent maintenant utiliser plusieurs outils en parallèle et suivre les instructions avec plus de précision Capacités mémoire améliorées : Claude peut extraire et sauvegarder des informations clés pour maintenir la continuité sur le long terme Claude Code devient disponible à tous avec intégrations natives VS Code et JetBrains pour la programmation en binôme Quatre nouvelles capacités API : outil d'exécution de code, connecteur MCP, API Files et mise en cache des prompts Les modèles hybrides offrent deux modes : réponses quasi-instantanées et pensée étendue pour un raisonnement plus approfondi en mode “agentique” L'intégration de l'IA au delà des chatbots et des boutons à étincelles https://glaforge.dev/posts/2025/05/23/beyond-the-chatbot-or-ai-sparkle-a-seamless-ai-integration/ Plaidoyer pour une IA intégrée de façon transparente et intuitive, au-delà des chatbots. Chatbots : pas toujours l'option LLM la plus intuitive ou la moins perturbatrice. Préconisation : IA directement dans les applications pour plus d'intelligence et d'utilité naturelle. Exemples d'intégration transparente : résumés des conversations Gmail et chat, web clipper Obsidian qui résume et taggue, complétion de code LLM. Meilleure UX IA : intégrée, contextuelle, sans “boutons IA” ou fenêtres de chat dédiées. Conclusion de Guillaume : intégrations IA réussies = partie naturelle du système, améliorant les workflows sans perturbation, le développeur ou l'utilisateur reste dans le “flow” Garder votre base de donnée vectorielle à jour avec Debezium https://debezium.io/blog/2025/05/19/debezium-as-part-of-your-ai-solution/ pas besoin de detailler mais expliquer idee de garder les changements a jour dans l'index Outillage guide pratique pour choisir le bon modèle d'IA à utiliser avec GitHub Copilot, en fonction de vos besoins en développement logiciel. https://github.blog/ai-and-ml/github-copilot/which-ai-model-should-i-use-with-github-copilot/ - Équilibre coût/performance : GPT-4.1, GPT-4o ou Claude 3.5 Sonnet pour des tâches générales et multilingues. - Tâches rapides : o4-mini ou Claude 3.5 Sonnet pour du prototypage ou de l'apprentissage rapide. - Besoins complexes : Claude 3.7 Sonnet, GPT-4.5 ou o3 pour refactorisation ou planification logicielle. - Entrées multimodales : Gemini 2.0 Flash ou GPT-4o pour analyser images, UI ou diagrammes. - Projets techniques/scientifiques : Gemini 2.5 Pro pour raisonnement avancé et gros volumes de données. UV, un package manager pour les pythonistes qui amène un peu de sanité et de vitesse http://blog.ippon.fr/2025/05/12/uv-un-package-manager-python-adapte-a-la-data-partie-1-theorie-et-fonctionnalites/ pour les pythonistes un ackage manager plus rapide et simple mais il est seulement semi ouvert (license) IntelliJ IDEA 2025.1 permet de rajouter un mode MCP client à l'assistant IA https://blog.jetbrains.com/idea/2025/05/intellij-idea-2025-1-model-context-protocol/ par exemple faire tourner un MCP server qui accède à la base de donnée Méthodologies Développement d'une bibliothèque OAuth 2.1 open source par Cloudflare, en grande partie générée par l'IA Claude: - Prompts intégrés aux commits : Chaque commit contient le prompt utilisé, ce qui facilite la compréhension de l'intention derrière le code. - Prompt par l'exemple : Le premier prompt montrait un exemple d'utilisation de l'API qu'on souhaite obtenir, ce qui a permis à l'IA de mieux comprendre les attentes. - Prompts structurés : Les prompts les plus efficaces suivaient un schéma clair : état actuel, justification du changement, et directive précise. - Traitez les prompts comme du code source : Les inclure dans les commits aide à la maintenance. - Acceptez les itérations : Chaque fonctionnalité a nécessité plusieurs essais. - Intervention humaine indispensable : Certaines tâches restent plus rapides à faire à la main. https://www.maxemitchell.com/writings/i-read-all-of-cloudflares-claude-generated-commits/ Sécurité Un packet npm malicieux passe par Cursor AI pour infecter les utilisateurs https://thehackernews.com/2025/05/malicious-npm-packages-infect-3200.html Trois packages npm malveillants ont été découverts ciblant spécifiquement l'éditeur de code Cursor sur macOS, téléchargés plus de 3 200 fois au total.Les packages se déguisent en outils de développement promettant “l'API Cursor la moins chère” pour attirer les développeurs intéressés par des solutions AI abordables. Technique d'attaque sophistiquée : les packages volent les identifiants utilisateur, récupèrent un payload chiffré depuis des serveurs contrôlés par les pirates, puis remplacent le fichier main.js de Cursor. Persistance assurée en désactivant les mises à jour automatiques de Cursor et en redémarrant l'application avec le code malveillant intégré. Nouvelle méthode de compromission : au lieu d'injecter directement du malware, les attaquants publient des packages qui modifient des logiciels légitimes déjà installés sur le système. Persistance même après suppression : le malware reste actif même si les packages npm malveillants sont supprimés, nécessitant une réinstallation complète de Cursor. Exploitation de la confiance : en s'exécutant dans le contexte d'une application légitime (IDE), le code malveillant hérite de tous ses privilèges et accès. Package “rand-user-agent” compromis : un package légitime populaire a été infiltré pour déployer un cheval de Troie d'accès distant (RAT) dans certaines versions. Recommandations de sécurité : surveiller les packages exécutant des scripts post-installation, modifiant des fichiers hors node_modules, ou initiant des appels réseau inattendus, avec monitoring d'intégrité des fichiers. Loi, société et organisation Le drama OpenRewrite (automatisation de refactoring sur de larges bases de code) est passé en mode propriétaire https://medium.com/@jonathan.leitschuh/when-open-source-isnt-how-openrewrite-lost-its-way-642053be287d Faits Clés : Moderne, Inc. a re-licencié silencieusement du code OpenRewrite (dont rewrite-java-security) de la licence Apache 2.0 à une licence propriétaire (MPL) sans consultation des contributeurs. Ce re-licenciement rend le code inaccessible et non modifiable pour les contributeurs originaux. Moderne s'est retiré de la Commonhaus Foundation (dédiée à l'open source) juste avant ces changements. La justification de Moderne est la crainte que de grandes entreprises utilisent OpenRewrite sans contribuer, créant une concurrence. Des contributions communautaires importantes (VMware, AliBaba) sous Apache 2.0 ont été re-licenciées sans leur consentement. La légalité de ce re-licenciement est incertaine sans CLA des contributeurs. Cette action crée un précédent dangereux pour les futurs contributeurs et nuit à la confiance dans l'écosystème OpenRewrite. Corrections de Moderne (Suite aux réactions) : Les dépôts Apache originaux ont été restaurés et archivés. Des versions majeures ont été utilisées pour signaler les changements de licence. Des espaces de noms distincts (org.openrewrite vs. io.moderne) ont été créés pour différencier les modules. Suggestions de Correction de l'Auteur : Annuler les changements de licence sur toutes les recettes communautaires. S'engager dans le dialogue et communiquer publiquement les changements majeurs. Respecter le versionnement sémantique (versions majeures pour les changements de licence). L'ancien gourou du design d'Apple, Jony Ive, va occuper un rôle majeur chez OpenAI OpenAI va acquérir la startup d'Ive pour 6,5 milliards de dollars, tandis qu'Ive et le PDG Sam Altman travaillent sur une nouvelle génération d'appareils et d'autres produits d'IA https://www.wsj.com/tech/ai/former-apple-design-guru-jony-ive-to-take-expansive-role-at-openai-5787f7da Rubrique débutant Un article pour les débutants sur le lien entre source, bytecode et le debug https://blog.jetbrains.com/idea/2025/05/sources-bytecode-debugging/ le debugger voit le bytecode et le lien avec la ligne ou la methode est potentiellement perdu javac peut ajouter les ligne et offset des operations pour que le debugger les affichent les noms des arguments est aussi ajoutable dans le .class quand vous pointez vers une mauvaise version du fichier source, vous avez des lignes decalées, c'est pour ca peu de raisons de ne pas actier des approches de compilations mais cela rend le fichier un peu plus gros Conférences La liste des conférences provenant de Developers Conferences Agenda/List par Aurélie Vache et contributeurs : 11-13 juin 2025 : Devoxx Poland - Krakow (Poland) 12-13 juin 2025 : Agile Tour Toulouse - Toulouse (France) 12-13 juin 2025 : DevLille - Lille (France) 13 juin 2025 : Tech F'Est 2025 - Nancy (France) 17 juin 2025 : Mobilis In Mobile - Nantes (France) 19-21 juin 2025 : Drupal Barcamp Perpignan 2025 - Perpignan (France) 24 juin 2025 : WAX 2025 - Aix-en-Provence (France) 25 juin 2025 : Rust Paris 2025 - Paris (France) 25-26 juin 2025 : Agi'Lille 2025 - Lille (France) 25-27 juin 2025 : BreizhCamp 2025 - Rennes (France) 26-27 juin 2025 : Sunny Tech - Montpellier (France) 1-4 juillet 2025 : Open edX Conference - 2025 - Palaiseau (France) 7-9 juillet 2025 : Riviera DEV 2025 - Sophia Antipolis (France) 5 septembre 2025 : JUG Summer Camp 2025 - La Rochelle (France) 12 septembre 2025 : Agile Pays Basque 2025 - Bidart (France) 18-19 septembre 2025 : API Platform Conference - Lille (France) & Online 23 septembre 2025 : OWASP AppSec France 2025 - Paris (France) 25-26 septembre 2025 : Paris Web 2025 - Paris (France) 2-3 octobre 2025 : Volcamp - Clermont-Ferrand (France) 3 octobre 2025 : DevFest Perros-Guirec 2025 - Perros-Guirec (France) 6-7 octobre 2025 : Swift Connection 2025 - Paris (France) 6-10 octobre 2025 : Devoxx Belgium - Antwerp (Belgium) 7 octobre 2025 : BSides Mulhouse - Mulhouse (France) 9 octobre 2025 : DevCon #25 : informatique quantique - Paris (France) 9-10 octobre 2025 : Forum PHP 2025 - Marne-la-Vallée (France) 9-10 octobre 2025 : EuroRust 2025 - Paris (France) 16 octobre 2025 : PlatformCon25 Live Day Paris - Paris (France) 16 octobre 2025 : Power 365 - 2025 - Lille (France) 16-17 octobre 2025 : DevFest Nantes - Nantes (France) 30-31 octobre 2025 : Agile Tour Bordeaux 2025 - Bordeaux (France) 30-31 octobre 2025 : Agile Tour Nantais 2025 - Nantes (France) 30 octobre 2025-2 novembre 2025 : PyConFR 2025 - Lyon (France) 4-7 novembre 2025 : NewCrafts 2025 - Paris (France) 5-6 novembre 2025 : Tech Show Paris - Paris (France) 6 novembre 2025 : dotAI 2025 - Paris (France) 7 novembre 2025 : BDX I/O - Bordeaux (France) 12-14 novembre 2025 : Devoxx Morocco - Marrakech (Morocco) 13 novembre 2025 : DevFest Toulouse - Toulouse (France) 15-16 novembre 2025 : Capitole du Libre - Toulouse (France) 19 novembre 2025 : SREday Paris 2025 Q4 - Paris (France) 20 novembre 2025 : OVHcloud Summit - Paris (France) 21 novembre 2025 : DevFest Paris 2025 - Paris (France) 27 novembre 2025 : DevFest Strasbourg 2025 - Strasbourg (France) 28 novembre 2025 : DevFest Lyon - Lyon (France) 5 décembre 2025 : DevFest Dijon 2025 - Dijon (France) 10-11 décembre 2025 : Devops REX - Paris (France) 10-11 décembre 2025 : Open Source Experience - Paris (France) 28-31 janvier 2026 : SnowCamp 2026 - Grenoble (France) 2-6 février 2026 : Web Days Convention - Aix-en-Provence (France) 3 février 2026 : Cloud Native Days France 2026 - Paris (France) 23-25 avril 2026 : Devoxx Greece - Athens (Greece) 17 juin 2026 : Devoxx Poland - Krakow (Poland) Nous contacter Pour réagir à cet épisode, venez discuter sur le groupe Google https://groups.google.com/group/lescastcodeurs Contactez-nous via X/twitter https://twitter.com/lescastcodeurs ou Bluesky https://bsky.app/profile/lescastcodeurs.com Faire un crowdcast ou une crowdquestion Soutenez Les Cast Codeurs sur Patreon https://www.patreon.com/LesCastCodeurs Tous les épisodes et toutes les infos sur https://lescastcodeurs.com/

ai power google apple france pr spring data tool ga types flash dans conclusion preview patterns ia swift technique faire futures gemini intervention suggestions load ils package instructions correction blue sky api conf rat chatbots io gmail gpt ui leur nouvelle alibaba linux java trois guillaume corrections moderne uv int aur exploitation conversions wwdc vm sync entr plusieurs macos documentation apache opus llm mod prompt nouvelles gestion prompts grosse wax quatre rc mutiny vmware obsidian vert certaines asl vall ide katia projets loi rag utiliser cloudflare garder sonnets aix enregistr jony ive json capacit apple wwdc cla paris france respecter spi mcp cursor github copilot besoins possibilit compose utilisation vs code recommandations capitole plaidoyer etl exemples persistance webassembly kotlin prochaines oauth w3c wasm vache jetbrains swe hibernate mpl troie amper grpc mon ami websockets rubrique devcon gradle jdk lyon france adk acceptez bordeaux france intellij idea simd cloud run provence france strasbourg france adelin rxjava lille france micrometer dijon france

Building an AI+Data Startup Studio | Tom Chavez, cofounder of super{set}

Infinite Machine Learning

Play Episode Listen Later Jun 10, 2025 52:57

Tom Chavez is the cofounder of super{set}, a startup studio that founds, funds, and builds data and AI startups. Prior to this, he was the CEO and co-founder of Krux, a martech platform acquired by Salesforce in 2016. Before Krux, he was the CEO and co-founder of Rapt, a provider of software for media monetization acquired by Microsoft in 2008. He went to Harvard for undergrad and Stanford for his PhD.Tom's favorite book: The Three Musketeers (Author: Alexandre Dumas)(00:01) Origin Story and Starting Superset(02:58) How Superset Evaluates Ideas and Risk(06:24) What Is a Venture Studio and How Superset Works(10:49) Underfunded Layers in AI Infrastructure(14:55) Orchestration Opportunities in LLM Workflows(15:49) The Future of Data Infra and ETL in the AI Era(20:46) Code Infra: Code Quality and AI-Generated Software(24:55) Model Infra, MLOps, and Why It's Underwhelming(27:22) Cloud Economics and Gross Margins in AI Companies(32:15) Early Team Structure in AI Infra Startups(34:49) Full Stack vs Composable Infra in AI(37:52) Fragmentation vs Consolidation in AI Tooling(41:02) Where Moats Will Accumulate: Data In, AI, Data Out(45:10) Biggest Challenge in Building Superset(46:23) Rapid Fire Round--------Where to find Tom Chavez: LinkedIn: https://www.linkedin.com/in/tommychavez/--------Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 X: https://x.com/prateekvjoshi

The history and future of the data ecosystem (w/ Lonne Jaffe)

The Analytics Engineering Podcast

Play Episode Listen Later Jun 8, 2025 53:53

In this decades-spanning episode, Tristan Handy sits down with Lonne Jaffe, Managing Director at Insight Partners and former CEO of Syncsort (now Precisely), to trace the history of the data ecosystem—from its mainframe origins to its AI-infused future. Lonne reflects on the evolution of ETL, the unexpected staying power of legacy tech, and why AI may finally erode the switching costs that have long protected incumbents. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

ceo history ai data managing directors labs ecosystem jaffe etl insight partners syncsort

Unlocking Med Spa Growth with Smarter Data

Medical Spa Insider

Play Episode Listen Later May 28, 2025 41:40

Thiersch, JD, speaks with Alex Lirtsman, founder and CEO of CorralData, to explore how medical spas can unlock real-time, HIPAA-compliant insights without changing their existing systems. CorralData integrates everything from your EMR to marketing, payroll, and finance systems, giving med spas the ability to uncover actionable insights that drive profitability, patient retention, and scalable growth. Listen for strategies to ask smarter questions of your data, including: Integrating all of your existing platforms to get actionable insights from your data; How multi-location practices, med spa rollups and private equity develop playbooks; Navigating HIPAA and BAAs with AI companies to create secure data analysis tools; Using reverse ETL to optimize for high lifetime value patients and boost profitability; The questions you can ask your data with conversational AI and large language models; CorralData's tailor-made solutions for Advanced MedAesthetic Partners, and more! -- Music by Ghost Score

ceo music ai growth data unlocking smarter jd integrating hipaa emr baas medspa etl

The End of Stale AI Data with Snow Leopard | Episode #99

Great Things with Great Tech!

Play Episode Listen Later May 12, 2025 41:07

We all talk about #AI, but what good is it if your models are powered by stale, outdated data?In Episode 99 of Great Things with Great Tech, Deepti Srivastava, founder and CEO of Snow Leopard, and former founding PM of Google Spanner, calls out the broken state of enterprise AI. With decades of experience in distributed systems and data infrastructure, Deepti unveils how Snow Leopard is redefining how AI applications are built, by tapping into live, real-time data from SQL and APIs without the need for ETL or pipelines.Instead of relying on static snapshots or disconnected data lakes, Snow Leopard's #agentic platform queries native sources like PostgreSQL, Snowflake, and Salesforce on-demand, empowering AI to live directly in the critical decision path.In This Episode, We Cover:Deepti's journey from building Spanner at Google to founding Snow Leopard AI.Why most enterprise AI fails due to reliance on stale data and outdated pipelines. How Snow Leopard federates live data across SQL and APIs with zero ETL.The limitations of vector databases in structured, real-time business use cases.Why putting AI in the critical path of business decisions unlocks real value.Snow Leopard is a U.S.-based technology company founded in 2023 by and is Headquartered in San Francisco, CaliforniaSnow Leopard specializes in building a platform that enables the development of production-ready AI applications by leveraging live business data. The company's approach focuses on real-time data retrieval directly from sources like SQL databases and APIs, eliminating the need for traditional ETL processes and data pipelines. This innovation allows for more accurate and timely AI-driven business decision.PODCAST LINKSGreat Things with Great Tech Podcast: https://gtwgt.comGTwGT Playlist on YouTube: https://www.youtube.com/@GTwGTPodcastListen on Spotify: https://open.spotify.com/show/5Y1Fgl4DgGpFd5Z4dHulVXListen on Apple Podcasts: https://podcasts.apple.com/us/podcast/great-things-with-great-tech-podcast/id1519439787EPISODE LINKSSnow Leopard Web: https://www.snowleopard.ai/Deepti Srivastava on LinkedIn:https://www.linkedin.com/in/thedeepti/Snow Leopard on LinkedIn: https://www.linkedin.com/company/snow-leopard-ai/GTwGT LINKSSupport the Channel: https://ko-fi.com/gtwgtBe on #GTwGT: Contact via Twitter/X @GTwGTPodcast or visit https://www.gtwgt.comSubscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1Great Things with Great Tech Podcast Website: https://gtwgt.comSOCIAL LINKSFollow GTwGT on Social Media:Twitter/X: https://twitter.com/GTwGTPodcastInstagram: https://www.instagram.com/GTwGTPodcastTikTok: https://www.tiktok.com/@GTwGTPodcast

ceo spotify ai google san francisco salesforce snowflakes apis great things sql stale bensound headquartered ai data etl postgresql snow leopards deepti spanner

Pourquoi La Joconde n'a-t-elle pas de sourcils ?

Choses à Savoir

Play Episode Listen Later May 5, 2025 2:12

Si tu observes attentivement la Joconde, le célèbre tableau de Léonard de Vinci exposé au Louvre, un détail intrigue immédiatement : elle n'a ni sourcils ni cils. Un visage d'une précision incroyable, un regard presque vivant… mais un front totalement nu. Comment expliquer cette absence ?Une mode de la Renaissance ?Pendant longtemps, on a pensé que l'absence de sourcils était simplement liée à la mode de l'époque. Au début du XVIe siècle, en Italie, certaines femmes aristocrates s'épilaient les sourcils (et parfois la racine des cheveux) pour dégager le front, considéré alors comme un signe de beauté et de noblesse. Selon cette hypothèse, Mona Lisa (ou Lisa Gherardini, si l'on en croit la thèse majoritaire) aurait pu suivre cette tendance esthétique.Mais cette explication ne tient pas totalement : d'autres portraits de femmes de la même époque montrent clairement des sourcils, même fins ou discrets. Et Léonard de Vinci, connu pour son obsession du réalisme, aurait-il vraiment volontairement omis un tel détail ?Une disparition progressiveL'explication la plus crédible aujourd'hui repose sur l'histoire matérielle du tableau. La Joconde a plus de 500 ans, et au fil des siècles, elle a été soumise à des restaurations, nettoyages et vernissages qui ont pu altérer les détails les plus fins.Une étude scientifique menée par le spécialiste Pascal Cotte, en 2004, à l'aide d'une technologie de réflectographie multispectrale, a révélé qu'à l'origine, Léonard avait bien peint des sourcils et des cils, très fins et délicats. Mais ces détails auraient disparu avec le temps, en raison de l'usure naturelle de la couche picturale ou de restaurations trop agressives. En somme, les sourcils étaient là, mais ils se sont effacés au fil des siècles.Un effet renforçant le mystèreL'absence de sourcils contribue aussi, paradoxalement, au mystère et à l'ambiguïté du visage de la Joconde. Son expression indéfinissable, ce mélange de sourire et de neutralité, est renforcé par ce manque de lignes faciales qui encadreraient normalement le regard. Ce flou contribue au caractère intemporel et énigmatique du tableau, qui fascine depuis des siècles.En résumé : la Joconde avait probablement des sourcils, peints avec la finesse propre à Léonard de Vinci. Mais le temps, les restaurations et les vernis les ont effacés. Ce détail oublié est devenu un élément clé de son mystère. Hébergé par Acast. Visitez acast.com/privacy pour plus d'informations.

acast renaissance pendant visitez vinci louvre mona lisa selon italie rel pourquoi la etl xvie joconde sourcils

EP. 264 Beyond the Database: Mastering Multi-Cloud Data, AI Automation & Integration (feat. Peter Ngai, SnapLogic)

The MongoDB Podcast

Play Episode Listen Later May 1, 2025 58:31

✨ Heads up! This episode features a demonstration of the SnapLogic UI and its AI Agent Creator towards the end. For the full visual experience, check out the video version on the Spotify app! ✨(Episode Summary)Tired of tangled data spread across multiple clouds, on-premise systems, and the edge? In this episode, MongoDB's Shane McAllister sits down with Peter Ngai, Principal Architect at SnapLogic, to explore the future of data integration and management in today's complex tech landscape.Dive into the challenges and solutions surrounding modern data architecture, including:Navigating the complexities of multi-cloud and hybrid cloud environments.The secrets to building flexible, resilient data ecosystems that avoid vendor lock-in.Strategies for seamless data integration and connecting disparate applications using low-code/no-code platforms like SnapLogic.Meeting critical data compliance, security, and sovereignty demands (think GDPR, HIPAA, etc.).How AI is revolutionizing data automation and providing faster access to insights (featuring SnapLogic's Agent Creator).The powerful synergy between SnapLogic and MongoDB, leveraging MongoDB both internally and for customer integrations.Real-world applications, from IoT data processing to simplifying enterprise workflows.Whether you're an IT leader, data engineer, business analyst, or simply curious about cloud strategy, iPaaS solutions, AI in business, or simplifying your data stack, Peter offers invaluable insights into making data connectivity a driver, not a barrier, for innovation.-Keywords: Data Integration, Multi-Cloud, Hybrid Cloud, Edge Computing, SnapLogic, MongoDB, AI, Artificial Intelligence, Data Automation, iPaaS, Low-Code, No-Code, Data Architecture, Data Management, Cloud Data, Enterprise Data, API Integration, Data Compliance, Data Sovereignty, Data Security, Business Automation, ETL, ELT, Tech Stack Simplification, Peter Ngai, Shane McAllister.

What is Oracle GoldenGate 23ai?

Oracle University Podcast

Play Episode Listen Later Apr 29, 2025 18:03

In a new season of the Oracle University Podcast, Lois Houston and Nikita Abraham dive into the world of Oracle GoldenGate 23ai, a cutting-edge software solution for data management. They are joined by Nick Wagner, a seasoned expert in database replication, who provides a comprehensive overview of this powerful tool. Nick highlights GoldenGate's ability to ensure continuous operations by efficiently moving data between databases and platforms with minimal overhead. He emphasizes its role in enabling real-time analytics, enhancing data security, and reducing costs by offloading data to low-cost hardware. The discussion also covers GoldenGate's role in facilitating data sharing, improving operational efficiency, and reducing downtime during outages. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston: Director of Innovation Programs. Lois: Hi everyone! Welcome to a new season of the podcast. This time, we're focusing on the fundamentals of Oracle GoldenGate. Oracle GoldenGate helps organizations manage and synchronize their data across diverse systems and databases in real time. And with the new Oracle GoldenGate 23ai release, we'll uncover the latest innovations and features that empower businesses to make the most of their data. Nikita: Taking us through this is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. He's been doing database replication for about 25 years and has been focused on GoldenGate on and off for about 20 of those years. 01:18 Lois: In today's episode, we'll ask Nick to give us a general overview of the product, along with some use cases and benefits. Hi Nick! To start with, why do customers need GoldenGate? Nick: Well, it delivers continuous operations, being able to continuously move data from one database to another database or data platform in efficiently and a high-speed manner, and it does this with very low overhead. Almost all the GoldenGate environments use transaction logs to pull the data out of the system, so we're not creating any additional triggers or very little overhead on that source system. GoldenGate can also enable real-time analytics, being able to pull data from all these different databases and move them into your analytics system in real time can improve the value that those analytics systems provide. Being able to do real-time statistics and analysis of that data within those high-performance custom environments is really important. 02:13 Nikita: Does it offer any benefits in terms of cost? Nick: GoldenGate can also lower IT costs. A lot of times people run these massive OLTP databases, and they are running reporting in those same systems. With GoldenGate, you can offload some of the data or all the data to a low-cost commodity hardware where you can then run the reports on that other system. So, this way, you can get back that performance on the OLTP system, while at the same time optimizing your reporting environment for those long running reports. You can improve efficiencies and reduce risks. Being able to reduce the amount of downtime during planned and unplanned outages can really make a big benefit to the overall operational efficiencies of your company. 02:54 Nikita: What about when it comes to data sharing and data security? Nick: You can also reduce barriers to data sharing. Being able to pull subsets of data, or just specific pieces of data out of a production database and move it to the team or to the group that needs that information in real time is very important. And it also protects the security of your data by only moving in the information that they need and not the entire database. It also provides extensibility and flexibility, being able to support multiple different replication topologies and architectures. 03:24 Lois: Can you tell us about some of the use cases of GoldenGate? Where does GoldenGate truly shine? Nick: Some of the more traditional use cases of GoldenGate include use within the multicloud fabric. Within a multicloud fabric, this essentially means that GoldenGate can replicate data between on-premise environments, within cloud environments, or hybrid, cloud to on-premise, on-premise to cloud, or even within multiple clouds. So, you can move data from AWS to Azure to OCI. You can also move between the systems themselves, so you don't have to use the same database in all the different clouds. For example, if you wanted to move data from AWS Postgres into Oracle running in OCI, you can do that using Oracle GoldenGate. We also support maximum availability architectures. And so, there's a lot of different use cases here, but primarily geared around reducing your recovery point objective and recovery time objective. 04:20 Lois: Ah, reducing RPO and RTO. That must have a significant advantage for the customer, right? Nick: So, reducing your RPO and RTO allows you to take advantage of some of the benefits of GoldenGate, being able to do active-active replication, being able to set up GoldenGate for high availability, real-time failover, and it can augment your active Data Guard and Data Guard configuration. So, a lot of times GoldenGate is used within Oracle's maximum availability architecture platinum tier level of replication, which means that at that point you've got lots of different capabilities within the Oracle Database itself. But to help eke out that last little bit of high availability, you want to set up an active-active environment with GoldenGate to really get true zero RPO and RTO. GoldenGate can also be used for data offloading and data hubs. Being able to pull data from one or more source systems and move it into a data hub, or into a data warehouse for your operational reporting. This could also be your analytics environment too. 05:22 Nikita: Does GoldenGate support online migrations? Nick: In fact, a lot of companies actually get started in GoldenGate by doing a migration from one platform to another. Now, these don't even have to be something as complex as going from one database like a DB2 on-premise into an Oracle on OCI, it could even be simple migrations. A lot of times doing something like a major application or a major database version upgrade is going to take downtime on that production system. You can use GoldenGate to eliminate that downtime. So this could be going from Oracle 19c to Oracle 23ai, or going from application version 1.0 to application version 2.0, because GoldenGate can do the transformation between the different application schemas. You can use GoldenGate to migrate your database from on premise into the cloud with no downtime as well. We also support real-time analytic feeds, being able to go from multiple databases, not only those on premise, but being able to pull information from different SaaS applications inside of OCI and move it to your different analytic systems. And then, of course, we also have the ability to stream events and analytics within GoldenGate itself. 06:34 Lois: Let's move on to the various topologies supported by GoldenGate. I know GoldenGate supports many different platforms and can be used with just about any database. Nick: This first layer of topologies is what we usually consider relational database topologies. And so this would be moving data from Oracle to Oracle, Postgres to Oracle, Sybase to SQL Server, a lot of different types of databases. So the first architecture would be unidirectional. This is replicating from one source to one target. You can do this for reporting. If I wanted to offload some reports into another server, I can go ahead and do that using GoldenGate. I can replicate the entire database or just a subset of tables. I can also set up GoldenGate for bidirectional, and this is what I want to set up GoldenGate for something like high availability. So in the event that one of the servers crashes, I can almost immediately reconnect my users to the other system. And that almost immediately depends on the amount of latency that GoldenGate has at that time. So a typical latency is anywhere from 3 to 6 seconds. So after that primary system fails, I can reconnect my users to the other system in 3 to 6 seconds. And I can do that because as GoldenGate's applying data into that target database, that target system is already open for read and write activity. GoldenGate is just another user connecting in issuing DML operations, and so it makes that failover time very low. 07:59 Nikita: Ok…If you can get it down to 3 to 6 seconds, can you bring it down to zero? Like zero failover time? Nick: That's the next topology, which is active-active. And in this scenario, all servers are read/write all at the same time and all available for user activity. And you can do multiple topologies with this as well. You can do a mesh architecture, which is where every server talks to every other server. This works really well for 2, 3, 4, maybe even 5 environments, but when you get beyond that, having every server communicate with every other server can get a little complex. And so at that point we start looking at doing what we call a hub and spoke architecture, where we have lots of different spokes. At the end of each spoke is a read/write database, and then those communicate with a hub. So any change that happens on one spoke gets sent into the hub, and then from the hub it gets sent out to all the other spokes. And through that architecture, it allows you to really scale up your environments. We have customers that are doing up to 150 spokes within that hub architecture. Within active-active replication as well, we can do conflict detection and resolution, which means that if two users modify the same row on two different systems, GoldenGate can actually determine that there was an issue with that and determine what user wins or which row change wins, which is extremely important when doing active-active replication. And this means that if one of those systems fails, there is no downtime when you switch your users to another active system because it's already available for activity and ready to go. 09:35 Lois: Wow, that's fantastic. Ok, tell us more about the topologies. Nick: GoldenGate can do other things like broadcast, sending data from one system to multiple systems, or many to one as far as consolidation. We can also do cascading replication, so when data moves from one environment that GoldenGate is replicating into another environment that GoldenGate is replicating. By default, we ignore all of our own transactions. But there's actually a toggle switch that you can flip that says, hey, GoldenGate, even though you wrote that data into that database, still push it on to the next system. And then of course, we can also do distribution of data, and this is more like moving data from a relational database into something like a Kafka topic or a JMS queue or into some messaging service. 10:24 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting visit oracle.com/education. 10:58 Nikita: Welcome back! Nick, does GoldenGate also have nonrelational capabilities? Nick: We have a number of nonrelational replication events in topologies as well. This includes things like data lake ingestion and streaming ingestion, being able to move data and data objects from these different relational database platforms into data lakes and into these streaming systems where you can run analytics on them and run reports. We can also do cloud ingestion, being able to move data from these databases into different cloud environments. And this is not only just moving it into relational databases with those clouds, but also their data lakes and data fabrics. 11:38 Lois: You mentioned a messaging service earlier. Can you tell us more about that? Nick: Messaging replication is also possible. So we can actually capture from things like messaging systems like Kafka Connect and JMS, replicate that into a relational data, or simply stream it into another environment. We also support NoSQL replication, being able to capture from MongoDB and replicate it onto another MongoDB for high availability or disaster recovery, or simply into any other system. 12:06 Nikita: I see. And is there any integration with a customer's SaaS applications? Nick: GoldenGate also supports a number of different OCI SaaS applications. And so a lot of these different applications like Oracle Financials Fusion, Oracle Transportation Management, they all have GoldenGate built under the covers and can be enabled with a flag that you can actually have that data sent out to your other GoldenGate environment. So you can actually subscribe to changes that are happening in these other systems with very little overhead. And then of course, we have event processing and analytics, and this is the final topology or flexibility within GoldenGate itself. And this is being able to push data through data pipelines, doing data transformations. GoldenGate is not an ETL tool, but it can do row-level transformation and row-level filtering. 12:55 Lois: Are there integrations offered by Oracle GoldenGate in automation and artificial intelligence? Nick: We can do time series analysis and geofencing using the GoldenGate Stream Analytics product. It allows you to actually do real time analysis and time series analysis on data as it flows through the GoldenGate trails. And then that same product, the GoldenGate Stream Analytics, can then take the data and move it to predictive analytics, where you can run MML on it, or ONNX or other Spark-type technologies and do real-time analysis and AI on that information as it's flowing through. 13:29 Nikita: So, GoldenGate is extremely flexible. And given Oracle's focus on integrating AI into its product portfolio, what about GoldenGate? Does it offer any AI-related features, especially since the product name has “23ai” in it? Nick: With the advent of Oracle GoldenGate 23ai, it's one of the two products at this point that has the AI moniker at Oracle. Oracle Database 23ai also has it, and that means that we actually do stuff with AI. So the Oracle GoldenGate product can actually capture vectors from databases like MySQL HeatWave, Postgres using pgvector, which includes things like AlloyDB, Amazon RDS Postgres, Aurora Postgres. We can also replicate data into Elasticsearch and OpenSearch, or if the data is using vectors within OCI or the Oracle Database itself. So GoldenGate can be used for a number of things here. The first one is being able to migrate vectors into the Oracle Database. So if you're using something like Postgres, MySQL, and you want to migrate the vector information into the Oracle Database, you can. Now one thing to keep in mind here is a vector is oftentimes like a GPS coordinate. So if I need to know the GPS coordinates of Austin, Texas, I can put in a latitude and longitude and it will give me the GPS coordinates of a building within that city. But if I also need to know the altitude of that same building, well, that's going to be a different algorithm. And GoldenGate and replicating vectors is the same way. When you create a vector, it's essentially just creating a bunch of numbers under the screen, kind of like those same GPS coordinates. The dimension and the algorithm that you use to generate that vector can be different across different databases, but the actual meaning of that data will change. And so GoldenGate can replicate the vector data as long as the algorithm and the dimensions are the same. If the algorithm and the dimensions are not the same between the source and the target, then you'll actually want GoldenGate to replicate the base data that created that vector. And then once GoldenGate replicates the base data, it'll actually call the vector embedding technology to re-embed that data and produce that numerical formatting for you. 15:42 Lois: So, there are some nuances there… Nick: GoldenGate can also replicate and consolidate vector changes or even do the embedding API calls itself. This is really nice because it means that we can take changes from multiple systems and consolidate them into a single one. We can also do the reverse of that too. A lot of customers are still trying to find out which algorithms work best for them. How many dimensions? What's the optimal use? Well, you can now run those in different servers without impacting your actual AI system. Once you've identified which algorithm and dimension is going to be best for your data, you can then have GoldenGate replicate that into your production system and we'll start using that instead. So it's a nice way to switch algorithms without taking extensive downtime. 16:29 Nikita: What about in multicloud environments? Nick: GoldenGate can also do multicloud and N-way active-active Oracle replication between vectors. So if there's vectors in Oracle databases, in multiple clouds, or multiple on-premise databases, GoldenGate can synchronize them all up. And of course we can also stream changes from vector information, including text as well into different search engines. And that's where the integration with Elasticsearch and OpenSearch comes in. And then we can use things like NVIDIA and Cohere to actually do the AI on that data. 17:01 Lois: Using GoldenGate with AI in the database unlocks so many possibilities. Thanks for that detailed introduction to Oracle GoldenGate 23ai and its capabilities, Nick. Nikita: We've run out of time for today, but Nick will be back next week to talk about how GoldenGate has evolved over time and its latest features. And if you liked what you heard today, head over to mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course to learn more. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 17:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

texas ai pass raise gps spark oracle saas senior director fundamentals nvidia api aws product management azure kafka mongodb david wright mysql rpo rto multicloud nosql etl sql server elasticsearch postgres oci jms cohere mml sybase db2 oracle database dml oltp oracle university nick you innovation programs nick wagner nick well nick so kafka connect oracle cloud applications

Building Trust in Data: Transparency, Collaboration, and Governance for Successful AI

Data Transforming Business

Play Episode Listen Later Apr 14, 2025 22:59

"So you want trusted data, but you want it now? Building this trust really starts with transparency and collaboration. It's not just technology. It's about creating a single governed view of data that is consistent no matter who accesses it, " says Errol Rodericks, Director of Product Marketing at Denodo.In this episode of the 'Don't Panic, It's Just Data' podcast, Shawn Rogers, CEO at BARC US, speaks with Errol Rodericks from Denodo. They explore the crucial link between trusted data and successful AI initiatives. They discuss key factors such as data orchestration, governance, and cost management within complex cloud environments. We've all heard the horror stories – AI projects that fail spectacularly, delivering biased or inaccurate results. But what's the root cause of these failures? More often than not, it's a lack of focus on the data itself. Rodericks emphasises that "AI is only as good as the data it's trained on." This episode explores how organisations can avoid the "garbage in, garbage out" scenario by prioritising data quality, lineage, and responsible AI practices. Learn how to avoid AI failures and discover strategies for building an AI-ready data foundation that ensures trusted, reliable outcomes. Key topics include overcoming data bias, ETL processes, and improving data sharing practices.TakeawaysBad data leads to bad AI outputs.Trust in data is essential for effective AI.Organisations must prioritise data quality and orchestration.Transparency and collaboration are key to building trust in data.Compliance is a responsibility for the entire organisation, not just IT.Agility in accessing data is crucial for AI success.Chapters00:00 The Importance of Data Quality in AI02:57 Building Trust in Data Ecosystems06:11 Navigating Complex Data Landscapes09:11 Top-Down Pressure for AI Strategy11:49 Responsible AI and Data Governance15:08 Challenges in Personalisation and Compliance17:47 The Role of Speed in Data Utilisation20:47 Advice for CFOs on AI InvestmentsAbout DenodoDenodo is a leader in data management. The award-winning Denodo Platform is the leading logical data management platform for transforming data into trustworthy insights and outcomes for all data-related initiatives across the enterprise, including AI and self-service. Denodo's customers in all industries all over the world have delivered trusted AI-ready and business-ready data in a third of the time and with 10x better performance than with lakehouses and other mainstream data platforms alone.

EP.260 Vector Search Secrets Revealed! - AI-Powered Image Search with MongoDB - Live Demo

The MongoDB Podcast

Play Episode Listen Later Mar 26, 2025 57:33

Ever wondered how companies like Amazon or Pinterest deliver lightning-fast image search? Dive into this episode of MongoDB Podcast Live with Shane McAllister and Nenad, a MongoDB Champion, as they unravel the magic of semantic image search powered by MongoDB Atlas Vector Search!

amazon ai dive pinterest toyota verizon javascript ai powered vector secrets revealed ai ml mongodb etl live demo nenad image search

Data, Data, and Metadata: Letting ChatGPT Interpret Power BI Output

Raw Data By P3

Play Episode Listen Later Mar 25, 2025 16:17

What happens when you hand off your Power BI output to ChatGPT and ask it to make sense of your world? You might be surprised. This week, Rob shares a deeply personal use case. One that ties together two major themes we've been exploring: Gen AI is reshaping the way we think about dashboards. To get real value out of AI, you need more than just data. You need metadata. And yes, that kind of metadata—the kind you create in Power BI when you translate raw data into something meaningful. Along the way, we revisit the old guard of data warehousing. The mighty (and now dusty?) ETL priesthood. And we uncover a delicious little irony about how the future of data looks a lot like its past, just with better tools and smarter questions. The big twist? We're all ETL now. But the "T" might not mean what you think it does anymore. Listen now to find out how a few rows of carefully modeled data, a table visual, and one really good AI assistant changed the game. For Rob and, just possibly, for all of us. Also in this episode: Blind Melon – Change (YouTube) The Data Warehouse Toolkit Raw Data Episode - The Human Side of Data: Using Analytics for Personal Well-Being

ai chatgpt output interpret genai metadata power bi etl

A Conversation with NAVAC at AHR 2025

HVAC School - For Techs, By Techs

Play Episode Listen Later Mar 7, 2025 46:23

In this engaging episode of the HVAC School Podcast, host Bryan sits down with Jesse from NAVAC to dive deep into the evolving landscape of refrigeration technology, focusing primarily on the transition to A2L refrigerants. The conversation offers a refreshingly pragmatic approach to addressing industry concerns about these new, mildly flammable refrigerants, dispelling myths and providing practical insights for HVAC technicians. The discussion begins by addressing the most pressing question for many technicians: Do you need to buy all new tools to work with A2L refrigerants? Jesse from NAVAC provides a nuanced response, emphasizing that while there are currently no regulations mandating new equipment, the company has proactively developed tools that are safety-certified and compatible with the new refrigerant types. They explore the intricacies of safety certifications like UL and CSA, explaining the differences between UL Listed and UL Verified, and highlighting the importance of intrinsically safe equipment, especially for tools like vacuum pumps and recovery machines. NAVAC's approach goes beyond mere product promotion, with Jesse positioning himself as an educator first. The podcast delves into the technical details of A2L refrigerants, challenging common misconceptions and providing context about their flammability. Bryan and Jesse draw parallels with previous refrigerant transitions, noting how technicians were initially skeptical about R-410A but eventually adapted. They emphasize the importance of best practices, proper training, and understanding the actual risks associated with these new refrigerants, rather than succumbing to fear-based narratives. The episode also showcases NAVAC's latest technological innovations, including smart probes, a Bluetooth scale, a smart valve for charging and recovery, and an advanced vacuum pump with a one-touch oil testing feature. These tools represent the company's commitment to improving technician efficiency and safety, with features that address real-world challenges faced by HVAC professionals. Key Topics Covered: A2L Refrigerants Myths and misconceptions about flammability Comparison with previous refrigerant transitions Safety considerations and best practices Safety Certifications Differences between UL Listed and UL Verified Importance of intrinsically safe equipment CSA and ETL certifications NAVAC's New Tools Smart probes with Bluetooth connectivity Advanced vacuum pump with automatic oil testing Flex manifold with digital accuracy and analog feel Battery-operated pumps with improved run times Industry Trends Preparation for A2L and future refrigerant transitions Regulatory changes and efficiency standards Importance of technician education and adaptation Additional Insights: No current regulations require new tools for A2L refrigerants Proper training and best practices are crucial Technicians should focus on understanding new technologies Safety is about awareness and proper procedures, not fear Have a question that you want us to answer on the podcast? Submit your questions at https://www.speakpipe.com/hvacschool. Purchase your tickets or learn more about the 6th Annual HVACR Training Symposium at https://hvacrschool.com/symposium. Subscribe to our podcast on your iPhone or Android. Subscribe to our YouTube channel. Check out our handy calculators here or on the HVAC School Mobile App for Apple and Android

conversations apple safety iphone android comparison proper batteries flex bluetooth regulatory hvac csa ul technicians etl

Integrating Your CRM With Your Data Warehouse

The Marketing Hero Podcast

Play Episode Listen Later Mar 4, 2025 38:29

Companies have crucial data stored across multiple systems in their organization. And the bigger the company, the more systems there are. Sales, marketing, finance, ERP, inventory, contract management, billing, and service delivery are just types of data and systems that show the story of the business. Many times your CRM just by itself can't show all of that!Because of this, many companies have started setting up a centralized data warehouse like Snowflake or Redshift to pull in data from their CRM and other systems to be able to run more advanced and centralized reporting across it all. But if you want to set this up, where do you start? How do you manage it? How do you protect it? How do you keep it maintained? How do you actually derive value from all the hard work of implementing it?I contacted Ryan Severns to talk through all these questions with him. We talk through all these questions and more, starting with questions like:Why set up a CRM data warehouse infrastructure in the first place?How do you build the integration?What important considerations do you need to plan for?We dig deeper into the details from there, talking through topics like ETL tooling, DBT, data governance, cross-platform analytics, building effective business intelligence systems, and more.If you're ready to level up your customer data skills and advance to the next level of your RevOps hero journey, this episode is for you! Give it a watch and a like. And hit that subscribe button so that you'll always get notified of future episodes of The RevOps Hero Podcast as well.

sales companies crm integrating snowflakes erp dbt etl revops redshift data warehouses

Connect to any data with Shortcuts, Mirroring and Data Factory using Microsoft Fabric

Microsoft Mechanics Podcast

Play Episode Listen Later Mar 3, 2025 16:21 Transcription Available

Easily access and unify your data for analytics and AI—no matter where it lives. With OneLake in Microsoft Fabric, you can connect to data across multiple clouds, databases, and formats without duplication. Use the OneLake catalog to quickly find and interact with your data, and let Copilot in Fabric help you transform and analyze it effortlessly. Eliminate barriers to working with your data using Shortcuts to virtualize external sources and Mirroring to keep databases and warehouses in sync—all without ETL. For deeper integration, leverage Data Factory's 180+ connectors to bring in structured, unstructured, and real-time streaming data at scale. Maraki Ketema from the Microsoft Fabric team shows how to combine these methods, ensuring fast, reliable access to quality data for analytics and AI workloads. ► QUICK LINKS: 00:00 - Access data wherever it lives 00:42 - Microsoft Fabric background 01:17 - Manage data with Microsoft Fabric 03:04 - Low latency 03:34 - How Shortcuts work 06:41 - Mirroring 08:10 - Open mirroring 08:40 - Low friction ways to bring data in 09:32 - Data Factory in Microsoft Fabric 10:52 - Build out your data flow 11:49 - Use built-in AI to ask questions of data 12:56 - OneLake catalog 13:36 - Data security & compliance 15:10 - Additional options to bring data in 15:42 - Wrap up ► Link References Watch our show on Real-Time Intelligence at https://aka.ms/MechanicsRTI Check out Open Mirroring at https://aka.ms/FabricOpenMirroring ► Unfamiliar with Microsoft Mechanics? As Microsoft's official video series for IT, you can watch and share valuable content and demos of current and upcoming tech from the people who build it at Microsoft. • Subscribe to our YouTube: https://www.youtube.com/c/MicrosoftMechanicsSeries • Talk with other IT Pros, join us on the Microsoft Tech Community: https://techcommunity.microsoft.com/t5/microsoft-mechanics-blog/bg-p/MicrosoftMechanicsBlog • Watch or listen from anywhere, subscribe to our podcast: https://microsoftmechanics.libsyn.com/podcast ► Keep getting this insider knowledge, join us on social: • Follow us on Twitter: https://twitter.com/MSFTMechanics • Share knowledge on LinkedIn: https://www.linkedin.com/company/microsoft-mechanics/ • Enjoy us on Instagram: https://www.instagram.com/msftmechanics/ • Loosen up with us on TikTok: https://www.tiktok.com/@msftmechanics

Why Most Reporting Systems Fail, & What to Do Instead, With Ben Zittlau

The Agency Profit Podcast

Play Episode Listen Later Feb 12, 2025 51:15

Points of Interest01:00 – 01:45 – Guest Introduction: Marcel welcomes back his co-founder, Ben Zittlau, highlighting his expertise in data operations and agency growth strategies.03:35 – 05:50 – Scaling Data Operations: Lessons from Jobber: Ben shares insights from his experience at Jobber, detailing the challenges of building and scaling a data operations team in a fast-growing company.08:43 – 13:14 – Why Agencies Struggle to Get Insights from Their Data: Discussion on how agencies collect vast amounts of data across multiple tools but fail to derive meaningful insights due to fragmentation and inconsistency.13:15 – 16:05 – The Myth of a “Single Source of Truth: Marcel and Ben challenge the common belief that pushing all data into a single platform solves reporting issues, highlighting the reality of messy operational data.19:26 – 21:48 – The Limitations of All-in-One Software Solutions: Exploring why all-in-one agency management tools often fail to deliver on their promise of seamless reporting and data integration.24:43 – 27:44 – The Hidden Costs of Locking into a Single Platform: Discussion on how agencies become “trapped” by software providers, making it difficult to switch tools without major operational disruptions.30:29 – 35:53 – How to Integrate Data Without Sacrificing Flexibility: A deep dive into the challenges of stitching data from various tools while maintaining adaptability and historical accuracy.35:54 – 41:14 – Accuracy vs. Precision: Why Clean Data is a Myth: Why agencies should focus on broader trends instead of pursuing impossible data perfection, and how to handle data inconsistencies effectively.41:15 – 44:15 – A Modern Data Approach: Extract, Transform, Load (ETL): Introduction to the ETL process, which allows agencies to clean and transform data before reporting, improving reliability and flexibility.50:14 – 52:00 – Lessons from Finance: What Agencies Can Learn from Accounting: Marcel compares data operations to bookkeeping, explaining how structured financial workflows can serve as a model for better agency data management.Show NotesConnect with Ben via LinkedInFree ToolkitParakeeto Foundations CourseLove the PodcastLeave us a review here.

lessons fail myth transform points reporting limitations accuracy locking business intelligence hidden costs data management jobber business analytics etl data integration agency growth single source data transformation

Episode 284: The Four-Letter Word ETL - Data Movement

SQL Data Partners Podcast

Play Episode Listen Later Feb 4, 2025 58:57

Once you have your data stored in OneLake, you'll be ready to start transforming it to improve it's usability, accuracy, and efficiency. In this episode of the podcast, Belinda Allen takes us on a delightful journey through Data Flows, Power Query, Azure Data Factory (Pipelines), and discusses the merits of shortcuts. We also learn about a handy way to manually upload a table if you have some static data you need to update. There are many tools and techniques that can be used for data ingestion and transformations. And while some of these options we discuss will be up to individual preference, there are pros and cons to each. One of the blessings and curses of Fabric is that there are many ways of achieving the same result, so what you choose may depend on the nature of the data you have and your goals, but might also be dictated by personal experience. We hope you enjoyed this conversation with Belinda on ingesting and transforming data in Microsoft Fabric. If you have questions or comments, please send them our way. We would love to answer your questions on a future episode. Leave us a comment and some love ❤️on LinkedIn, X, Facebook, or Instagram. The show notes for today's episode can be found at Episode 284: The Four-Letter Word ETL - Data Movement. Have fun on the SQL Trail!

data movement fabric four letter word etl data transformation microsoft fabric power query belinda allen

Data Debt

Voice of the DBA

Play Episode Listen Later Jan 31, 2025 3:33

I had never heard of data debt until I saw this article on the topic. In reading it, I couldn't help thinking that most everyone has data debt, it creates inefficiencies, and it's unlikely we'll get rid of it. And by the way, it's too late to get this under control. I somewhat dismissed the article when I saw this: "addressing data debt in its early stages is crucial to ensure that it does not become an overwhelming barrier to progress." I know it's a barrier, as I assume most of you also know, but it's also not stopping us. We keep building more apps, databases, and systems, and accruing more data debt. Somehow, most organizations keep running. The description of debt might help here. How many of you have inconsistent data standards, where you might define a data element differently in different databases? Maybe you have duplicated data that is slow to update (think ETL/warehouses), maybe you have different ways of tracking a completed sale in different systems. Maybe you even store dates in different formats (int, string, or something weirder). How many of you lack some documentation on what the columns in your databases mean? Maybe I should ask the reverse, where the few of you who have complete data dictionaries can raise your hands. Read the rest of Data Debt

data debt etl

235: Wrapping Up 2024 with Types

Thinking Elixir Podcast

Play Episode Listen Later Jan 7, 2025 26:55

News includes the official release of Elixir 1.18.0 with enhanced type system support, José Valim's retrospective on Elixir's progress in 2024, LiveView Native's significant v0.4.0-rc.0 release with a new networking stack, ExDoc v0.36's introduction of swup.js for smoother page navigations, the announcement of a new Elixir conference called Goatmire in Sweden, and more! Show Notes online - http://podcast.thinkingelixir.com/235 (http://podcast.thinkingelixir.com/235) Elixir Community News https://elixir-lang.org/blog/2024/12/19/elixir-v1-18-0-released/ (https://elixir-lang.org/blog/2024/12/19/elixir-v1-18-0-released/?utm_source=thinkingelixir&utm_medium=shownotes) – Official Elixir 1.18.0 release announcement https://github.com/elixir-lang/elixir/blob/v1.18/CHANGELOG.md (https://github.com/elixir-lang/elixir/blob/v1.18/CHANGELOG.md?utm_source=thinkingelixir&utm_medium=shownotes) – Changelog for Elixir 1.18.0 release https://bsky.app/profile/david.bernheisel.com/post/3leetmgvihk2a (https://bsky.app/profile/david.bernheisel.com/post/3leetmgvihk2a?utm_source=thinkingelixir&utm_medium=shownotes) – Details about upcoming Elixir 1.19 type checking capabilities for protocols https://bsky.app/profile/josevalim.bsky.social/post/3ldyphlun4c2z (https://bsky.app/profile/josevalim.bsky.social/post/3ldyphlun4c2z?utm_source=thinkingelixir&utm_medium=shownotes) – José Valim's retrospective on Elixir's progress in 2024, highlighting type system improvements and project releases https://github.com/liveview-native/liveviewnative/releases (https://github.com/liveview-native/live_view_native/releases?utm_source=thinkingelixir&utm_medium=shownotes) – LiveView Native v0.4.0-rc.0 release announcement https://x.com/liveviewnative/status/1869081462659809771 (https://x.com/liveviewnative/status/1869081462659809771?utm_source=thinkingelixir&utm_medium=shownotes) – Twitter announcement about LiveView Native release https://github.com/liveview-native/liveviewnative/blob/main/CHANGELOG.md (https://github.com/liveview-native/live_view_native/blob/main/CHANGELOG.md?utm_source=thinkingelixir&utm_medium=shownotes) – Changelog for LiveView Native v0.4.0-rc.0 https://bsky.app/profile/josevalim.bsky.social/post/3le25qqcfh22x (https://bsky.app/profile/josevalim.bsky.social/post/3le25qqcfh22x?utm_source=thinkingelixir&utm_medium=shownotes) – ExDoc v0.36 release announcement introducing swup.js for navigation https://github.com/swup/swup (https://github.com/swup/swup?utm_source=thinkingelixir&utm_medium=shownotes) – Swup.js GitHub repository https://swup.js.org/ (https://swup.js.org/?utm_source=thinkingelixir&utm_medium=shownotes) – Swup.js documentation https://swup.js.org/getting-started/demos/ (https://swup.js.org/getting-started/demos/?utm_source=thinkingelixir&utm_medium=shownotes) – Swup.js demos showing page transition capabilities https://github.com/hexpm/hexdocs/pull/44 (https://github.com/hexpm/hexdocs/pull/44?utm_source=thinkingelixir&utm_medium=shownotes) – Pull request for cross-package function search in ExDoc using Typesense https://github.com/elixir-lang/ex_doc/issues/1811 (https://github.com/elixir-lang/ex_doc/issues/1811?utm_source=thinkingelixir&utm_medium=shownotes) – Related issue for cross-package function search feature https://bsky.app/profile/tylerayoung.com/post/3lejnfttgok2u (https://bsky.app/profile/tylerayoung.com/post/3lejnfttgok2u?utm_source=thinkingelixir&utm_medium=shownotes) – Announcement of parameterized_test v0.6.0 with improved failure messages https://hexdocs.pm/phoenix_test/changelog.html#0-5-1 (https://hexdocs.pm/phoenix_test/changelog.html#0-5-1?utm_source=thinkingelixir&utm_medium=shownotes) – phoenix_test v0.5.1 changelog with new assertion helpers https://x.com/germsvel/status/1873732271611469976 (https://x.com/germsvel/status/1873732271611469976?utm_source=thinkingelixir&utm_medium=shownotes) – Twitter announcement about phoenix_test updates https://x.com/ElixirConf/status/1873445096773111848 (https://x.com/ElixirConf/status/1873445096773111848?utm_source=thinkingelixir&utm_medium=shownotes) – Announcement of new ElixirConf US 2024 videos https://www.youtube.com/playlist?list=PLqj39LCvnOWbW2Zli4LurDGc6lL5ij-9Y (https://www.youtube.com/playlist?list=PLqj39LCvnOWbW2Zli4LurDGc6lL5ij-9Y?utm_source=thinkingelixir&utm_medium=shownotes) – YouTube playlist of ElixirConf US 2024 talks https://x.com/TylerAYoung/status/1873798040525693040 (https://x.com/TylerAYoung/status/1873798040525693040?utm_source=thinkingelixir&utm_medium=shownotes) – Recommendation for David's ETL talk at ElixirConf https://goatmire.com/ (https://goatmire.com/?utm_source=thinkingelixir&utm_medium=shownotes) – New Elixir conference "Goatmire" announced in Sweden https://bsky.app/profile/lawik.bsky.social/post/3ldougsbvhk2s (https://bsky.app/profile/lawik.bsky.social/post/3ldougsbvhk2s?utm_source=thinkingelixir&utm_medium=shownotes) – Lars Wikman's announcement about Goatmire conference Do you have some Elixir news to share? Tell us at @ThinkingElixir (https://twitter.com/ThinkingElixir) or email at show@thinkingelixir.com (mailto:show@thinkingelixir.com) Find us online - Message the show - Bluesky (https://bsky.app/profile/thinkingelixir.com) - Message the show - X (https://x.com/ThinkingElixir) - Message the show on Fediverse - @ThinkingElixir@genserver.social (https://genserver.social/ThinkingElixir) - Email the show - show@thinkingelixir.com (mailto:show@thinkingelixir.com) - Mark Ericksen on X - @brainlid (https://x.com/brainlid) - Mark Ericksen on Bluesky - @brainlid.bsky.social (https://bsky.app/profile/brainlid.bsky.social) - Mark Ericksen on Fediverse - @brainlid@genserver.social (https://genserver.social/brainlid) - David Bernheisel on Bluesky - @david.bernheisel.com (https://bsky.app/profile/david.bernheisel.com) - David Bernheisel on Fediverse - @dbern@genserver.social (https://genserver.social/dbern)

news sweden types recommendations blue sky conferences wrapping up announcement github 2024 elixir etl changelog elixirconf lars wikman

Dave Yaffe and Johnny Graettinger (Estuary) - Data Integration, Running Startups While Raising Kids, and More

The Joe Reis Show

Play Episode Listen Later Dec 10, 2024 55:54

Dave and Johnny run Estuary, a data integration company focused on real-time ETL and ELT. We're also friends, so we decided to have a chat. In this episode, we chat about the current state of the data integration space, running a startup while raising kids, and much more. Estuary

running startups raising kids elt etl data integration estuary yaffe

Beyond ETL: How Snow Leopard Connects AI, Agents, and Live Data

The Data Exchange with Ben Lorica

Play Episode Listen Later Dec 5, 2024 43:33

Deepti Srivastava is the Founder and CEO of Snow Leopard. We dive into Snow Leopard's innovative approach to data integration, exploring its live data access model that bypasses traditional ETL pipelines to offer real-time data retrieval directly from source systems.Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.Detailed show notes - with links to many references - can be found on The Data Exchange web site.

ceo founders data detailed connects etl snow leopards

Why Compound AI + Open Source will beat Closed AI

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Nov 25, 2024 58:25

We have a full slate of upcoming events: AI Engineer London, AWS Re:Invent in Las Vegas, and now Latent Space LIVE! at NeurIPS in Vancouver and online. Sign up to join and speak!We are still taking questions for our next big recap episode! Submit questions and messages on Speakpipe here for a chance to appear on the show!We try to stay close to the inference providers as part of our coverage, as our podcasts with Together AI and Replicate will attest: However one of the most notable pull quotes from our very well received Braintrust episode was his opinion that open source model adoption has NOT gone very well and is actually declining in relative market share terms (it is of course increasing in absolute terms):Today's guest, Lin Qiao, would wholly disagree. Her team of Pytorch/GPU experts are wholly dedicated toward helping you serve and finetune the full stack of open source models from Meta and others, across all modalities (Text, Audio, Image, Embedding, Vision-understanding), helping customers like Cursor and Hubspot scale up open source model inference both rapidly and affordably.Fireworks has emerged after its successive funding rounds with top tier VCs as one of the leaders of the Compound AI movement, a term first coined by the Databricks/Mosaic gang at Berkeley AI and adapted as “Composite AI” by Gartner:Replicating o1We are the first podcast to discuss Fireworks' f1, their proprietary replication of OpenAI's o1. This has become a surprisingly hot area of competition in the past week as both Nous Forge and Deepseek r1 have launched competitive models.Full Video PodcastLike and subscribe!Timestamps* 00:00:00 Introductions* 00:02:08 Pre-history of Fireworks and PyTorch at Meta* 00:09:49 Product Strategy: From Framework to Model Library* 00:13:01 Compound AI Concept and Industry Dynamics* 00:20:07 Fireworks' Distributed Inference Engine* 00:22:58 OSS Model Support and Competitive Strategy* 00:29:46 Declarative System Approach in AI* 00:31:00 Can OSS replicate o1?* 00:36:51 Fireworks f1* 00:41:03 Collaboration with Cursor and Speculative Decoding* 00:46:44 Fireworks quantization (and drama around it)* 00:49:38 Pricing Strategy* 00:51:51 Underrated Features of Fireworks Platform* 00:55:17 HiringTranscriptAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner at CTO at Danceable Partners, and I'm joined by my co-host, Swyx founder, Osmalayar.Swyx [00:00:11]: Hey, and today we're in a very special studio inside the Fireworks office with Lin Qiang, CEO of Fireworks. Welcome. Yeah.Lin [00:00:20]: Oh, you should welcome us.Swyx [00:00:21]: Yeah, welcome. Yeah, thanks for having us. It's unusual to be in the home of a startup, but it's also, I think our relationship is a bit unusual compared to all our normal guests. Definitely.Lin [00:00:34]: Yeah. I'm super excited to talk about very interesting topics in that space with both of you.Swyx [00:00:41]: You just celebrated your two-year anniversary yesterday.Lin [00:00:43]: Yeah, it's quite a crazy journey. We circle around and share all the crazy stories across these two years, and it has been super fun. All the way from we experienced Silicon Valley bank run to we delete some data that shouldn't be deleted operationally. We went through a massive scale where we actually are busy getting capacity to, yeah, we learned to kind of work with it as a team with a lot of brilliant people across different places to join a company. It has really been a fun journey.Alessio [00:01:24]: When you started, did you think the technical stuff will be harder or the bank run and then the people side? I think there's a lot of amazing researchers that want to do companies and it's like the hardest thing is going to be building the product and then you have all these different other things. So, were you surprised by what has been your experience the most?Lin [00:01:42]: Yeah, to be honest with you, my focus has always been on the product side and then after the product goes to market. And I didn't realize the rest has been so complicated, operating a company and so on. But because I don't think about it, I just kind of manage it. So it's done. I think I just somehow don't think about it too much and solve whatever problem coming our way and it worked.Swyx [00:02:08]: So let's, I guess, let's start at the pre-history, the initial history of Fireworks. You ran the PyTorch team at Meta for a number of years and we previously had Sumit Chintal on and I think we were just all very interested in the history of GenEI. Maybe not that many people know how deeply involved Faire and Meta were prior to the current GenEI revolution.Lin [00:02:35]: My background is deep in distributed system, database management system. And I joined Meta from the data side and I saw this tremendous amount of data growth, which cost a lot of money and we're analyzing what's going on. And it's clear that AI is driving all this data generation. So it's a very interesting time because when I joined Meta, Meta is going through ramping down mobile-first, finishing the mobile-first transition and then starting AI-first. And there's a fundamental reason about that sequence because mobile-first gave a full range of user engagement that has never existed before. And all this user engagement generated a lot of data and this data power AI. So then the whole entire industry is also going through, falling through this same transition. When I see, oh, okay, this AI is powering all this data generation and look at where's our AI stack. There's no software, there's no hardware, there's no people, there's no team. I want to dive up there and help this movement. So when I started, it's very interesting industry landscape. There are a lot of AI frameworks. It's a kind of proliferation of AI frameworks happening in the industry. But all the AI frameworks focus on production and they use a very certain way of defining the graph of neural network and then use that to drive the model iteration and productionization. And PyTorch is completely different. So they could also assume that he was the user of his product. And he basically says, researchers face so much pain using existing AI frameworks, this is really hard to use and I'm going to do something different for myself. And that's the origin story of PyTorch. PyTorch actually started as the framework for researchers. They don't care about production at all. And as they grow in terms of adoption, so the interesting part of AI is research is the top of our normal production. There are so many researchers across academic, across industry, they innovate and they put their results out there in open source and that power the downstream productionization. So it's brilliant for MATA to establish PyTorch as a strategy to drive massive adoption in open source because MATA internally is a PyTorch shop. So it creates a flying wheel effect. So that's kind of a strategy behind PyTorch. But when I took on PyTorch, it's kind of at Caspo, MATA established PyTorch as the framework for both research and production. So no one has done that before. And we have to kind of rethink how to architect PyTorch so we can really sustain production workload, the stability, reliability, low latency, all this production concern was never a concern before. Now it's a concern. And we actually have to adjust its design and make it work for both sides. And that took us five years because MATA has so many AI use cases, all the way from ranking recommendation as powering the business top line or as ranking newsfeed, video ranking to site integrity detect bad content automatically using AI to all kinds of effects, translation, image classification, object detection, all this. And also across AI running on the server side, on mobile phones, on AI VR devices, the wide spectrum. So by the time we actually basically managed to support AI across ubiquitous everywhere across MATA. But interestingly, through open source engagement, we work with a lot of companies. It is clear to us like this industry is starting to take on AI first transition. And of course, MATA's hyperscale always go ahead of industry. And it feels like when we start this AI journey at MATA, there's no software, no hardware, no team. For many companies we engage with through PyTorch, we feel the pain. That's the genesis why we feel like, hey, if we create fireworks and support industry going through this transition, it will be a huge amount of impact. Of course, the problem that the industry is facing will not be the same as MATA. MATA is so big, right? So it's kind of skewed towards extreme scale and extreme optimization in the industry will be different. But we feel like we have the technical chop and we've seen a lot. We'll look to kind of drive that. So yeah, so that's how we started.Swyx [00:06:58]: When you and I chatted about the origins of fireworks, it was originally envisioned more as a PyTorch platform, and then later became much more focused on generative AI. Is that fair to say? What was the customer discovery here?Lin [00:07:13]: Right. So I would say our initial blueprint is we should build a PyTorch cloud because a PyTorch library and there's no SaaS platform to enable AI workloads.Swyx [00:07:26]: Even in 2022, it's interesting.Lin [00:07:28]: I would not say absolutely no, but cloud providers have some of those, but it's not first class citizen, right? At 2022, there's still like TensorFlow is massively in production. And this is all pre-gen AI, and PyTorch is kind of getting more and more adoption. But there's no PyTorch-first SaaS platform existing. At the same time, we are also a very pragmatic set of people. We really want to make sure from the get-go, we get really, really close to customers. We understand their use case, we understand their pain points, we understand the value we deliver to them. So we want to take a different approach instead of building a horizontal PyTorch cloud. We want to build a verticalized platform first. And then we talk with many customers. And interestingly, we started the company in September 2022, and in October, November, the OpenAI announced ChatGPT. And then boom, when we talked with many customers, they were like, can you help us work on the JNS aspect? So of course, there are some open source models. It's not as good at that time, but people are already putting a lot of attention there. Then we decided that if we're going to pick a vertical, we're going to pick JNI. The other reason is all JNI models are PyTorch models. So that's another reason. We believe that because of the nature of JNI, it's going to generate a lot of human consumable content. It will drive a lot of consumer, customer-developer-facing application and product innovation. Guaranteed. We're just at the beginning of this. Our prediction is for those kind of applications, the inference is much more important than training because inference scale is proportional to the up-limit award population. And training scale is proportional to the number of researchers. Of course, each training round could be very expensive. Although PyTorch supports both inference and training, we decided to laser focus on inference. So yeah, so that's how we got started. And we launched our public platform August last year. When we launched, it was a single product. It's a distributed inference engine with a simple API, open AI compatible API with many models. We started with LM and then we added a lot of models. Fast forward to now, we are a full platform with multiple product lines. So we love to kind of dive deep into what we offer. But that's a very fun journey in the past two years.Alessio [00:09:49]: What was the transition from you start to focus on PyTorch and people want to understand the framework, get it live. And now say maybe most people that use you don't even really know much about PyTorch at all. You know, they're just trying to consume a model. From a product perspective, like what were some of the decisions early on? Like right in October, November, you were just like, hey, most people just care about the model, not about the framework. We're going to make it super easy or was it more a gradual transition to the model librarySwyx [00:10:16]: you have today?Lin [00:10:17]: Yeah. So our product decision is all based on who is our ICP. And one thing I want to acknowledge here is the generic technology is disruptive. It's very different from AI before GNI. So it's a clear leap forward. Because before GNI, the companies that want to invest in AI, they have to train from scratch. There's no other way. There's no foundation model. It doesn't exist. So that means then to start a team, first hire a team who is capable of crunch data. There's a lot of data to crunch, right? Because training from scratch, you have to prepare a lot of data. And then they need to have GPUs to train, and then you start to manage GPUs. So then it becomes a very complex project. It takes a long time and not many companies can afford it, actually. And the GNI is a very different game right now, because it is a foundation model. So you don't have to train anymore. That makes AI much more accessible as a technology. As an app developer or product manager, even, not a developer, they can interact with GNI models directly. So our goal is to make AI accessible to all app developers and product engineers. That's our goal. So then getting them into the building model doesn't make any sense anymore with this new technology. And then building easy, accessible APIs is the most important. Early on, when we got started, we decided we're going to be open AI compatible. It's just kind of very easy for developers to adopt this new technology, and we will manage the underlying complexity of serving all these models.Swyx [00:11:56]: Yeah, open AI has become the standard. Even as we're recording today, Gemini announced that they have open AI compatible APIs. Interesting. So we just need to drop it all in line, and then we have everyone popping in line.Lin [00:12:09]: That's interesting, because we are working very closely with Meta as one of the partners. Meta, of course, is kind of very generous to donate many very, very strong open source models, expecting more to come. But also they have announced LamaStack, which is basically standardized, the upper level stack built on top of Lama models. So they don't just want to give out models and you figure out what the upper stack is. They instead want to build a community around the stack and build a new standard. I think there's an interesting dynamics in play in the industry right now, when it's more standardized across open AI, because they are kind of creating the top of the funnel, or standardized across Lama, because this is the most used open source model. So I think it's a lot of fun working at this time.Swyx [00:13:01]: I've been a little bit more doubtful on LamaStack, I think you've been more positive. Basically it's just like the meta version of whatever Hugging Face offers, you know, or TensorRT, or BLM, or whatever the open source opportunity is. But to me, it's not clear that just because Meta open sources Lama, that the rest of LamaStack will be adopted. And it's not clear why I should adopt it. So I don't know if you agree.Lin [00:13:27]: It's very early right now. That's why I kind of work very closely with them and give them feedback. The feedback to the meta team is very important. So then they can use that to continue to improve the model and also improve the higher level I think the success of LamaStack heavily depends on the community adoption. And there's no way around it. And I know the meta team would like to kind of work with a broader set of community. But it's very early.Swyx [00:13:52]: One thing that after your Series B, so you raced for Benchmark, and then Sequoia. I remember being close to you for at least your Series B announcements, you started betting heavily on this term of Compound AI. It's not a term that we've covered very much in the podcast, but I think it's definitely getting a lot of adoption from Databricks and Berkeley people and all that. What's your take on Compound AI? Why is it resonating with people?Lin [00:14:16]: Right. So let me give a little bit of context why we even consider that space.Swyx [00:14:22]: Because like pre-Series B, there was no message, and now it's like on your landing page.Lin [00:14:27]: So it's kind of very organic evolution from when we first launched our public platform, we are a single product. We are a distributed inference engine, where we do a lot of innovation, customized KUDA kernels, raw kernel kernels, running on different kinds of hardware, and build distributed disaggregated execution, inference execution, build all kinds of caching. So that is one. So that's kind of one product line, is the fast, most cost-efficient inference platform. Because we wrote PyTorch code, we know we basically have a special PyTorch build for that, together with a custom kernel we wrote. And then we worked with many more customers, we realized, oh, the distributed inference engine, our design is one size fits all. We want to have this inference endpoint, then everyone come in, and no matter what kind of form and shape or workload they have, it will just work for them. So that's great. But the reality is, we realized all customers have different kinds of use cases. The use cases come in all different forms and shapes. And the end result is the data distribution in their inference workload doesn't align with the data distribution in the training data for the model. It's a given, actually. If you think about it, because researchers have to guesstimate what is important, what's not important in preparing data for training. So because of that misalignment, then we leave a lot of quality, latency, cost improvement on the table. So then we're saying, OK, we want to heavily invest in a customization engine. And we actually announced it called FHIR Optimizer. So FHIR Optimizer basically helps users navigate a three-dimensional optimization space across quality, latency, and cost. So it's a three-dimensional curve. And even for one company, for different use cases, they want to land in different spots. So we automate that process for our customers. It's very simple. You have your inference workload. You inject into the optimizer along with the objective function. And then we spit out inference deployment config and the model setup. So it's your customized setup. So that is a completely different product. So that product thinking is one size fits all. And now on top of that, we provide a huge variety of state-of-the-art models, hundreds of them, varying from text to large state-of-the-art English models. That's where we started. And as we talk with many customers, we realize, oh, audio and text are very, very close. Many of our customers start to build assistants, all kinds of assistants using text. And they immediately want to add audio, audio in, audio out. So we support transcription, translation, speech synthesis, text, audio alignment, all different kinds of audio features. It's a big announcement. You should have heard by the time this is out. And the other areas of vision and text are very close with each other. Because a lot of information doesn't live in plain text. A lot of information lives in multimedia format, images, PDFs, screenshots, and many other different formats. So oftentimes to solve a problem, we need to put the vision model first to extract information and then use language model to process and then send out results. So vision is important. We also support vision model, various different kinds of vision models specialized in processing different kinds of source and extraction. And we're also going to have another announcement of a new API endpoint we'll support for people to upload various different kinds of multimedia content and then get the extract very accurate information out and feed that into LM. And of course, we support embedding because embedding is very important for semantic search, for RAG, and all this. And in addition to that, we also support text-to-image, image generation models, text-to-image, image-to-image, and we're adding text-to-video as well in our portfolio. So it's a very comprehensive set of model catalog that built on top of File Optimizer and Distributed Inference Engine. But then we talk with more customers, they solve business use case, and then we realize one model is not sufficient to solve their problem. And it's very clear because one is the model hallucinates. Many customers, when they onboard this JNI journey, they thought this is magical. JNI is going to solve all my problems magically. But then they realize, oh, this model hallucinates. It hallucinates because it's not deterministic, it's probabilistic. So it's designed to always give you an answer, but based on probabilities, so it hallucinates. And that's actually sometimes a feature for creative writing, for example. Sometimes it's a bug because, hey, you don't want to give misinformation. And different models also have different specialties. To solve a problem, you want to ask different special models to kind of decompose your task into multiple small tasks, narrow tasks, and then have an expert model solve that task really well. And of course, the model doesn't have all the information. It has limited knowledge because the training data is finite, not infinite. So the model oftentimes doesn't have real-time information. It doesn't know any proprietary information within the enterprise. It's clear that in order to really build a compiling application on top of JNI, we need a compound AI system. Compound AI system basically is going to have multiple models across modalities, along with APIs, whether it's public APIs, internal proprietary APIs, storage systems, database systems, knowledge to work together to deliver the best answer.Swyx [00:20:07]: Are you going to offer a vector database?Lin [00:20:09]: We actually heavily partner with several big vector database providers. Which is your favorite? They are all great in different ways. But it's public information, like MongoDB is our investor. And we have been working closely with them for a while.Alessio [00:20:26]: When you say distributed inference engine, what do you mean exactly? Because when I hear your explanation, it's almost like you're centralizing a lot of the decisions through the Fireworks platform on the quality and whatnot. What do you mean distributed? It's like you have GPUs in a lot of different clusters, so you're sharding the inference across the same model.Lin [00:20:45]: So first of all, we run across multiple GPUs. But the way we distribute across multiple GPUs is unique. We don't distribute the whole model monolithically across multiple GPUs. We chop them into pieces and scale them completely differently based on what's the bottleneck. We also are distributed across regions. We have been running in North America, EMEA, and Asia. We have regional affinity to applications because latency is extremely important. We are also doing global load balancing because a lot of applications there, they quickly scale to global population. And then at that scale, different content wakes up at a different time. And you want to kind of load balancing across. So all the way, and we also have, we manage various different kinds of hardware skew from different hardware vendors. And different hardware design is best for different types of workload, whether it's long context, short context, long generation. So all these different types of workload is best fitted for different kinds of hardware skew. And then we can even distribute across different hardware for a workload. So the distribution actually is all around in the full stack.Swyx [00:22:02]: At some point, we'll show on the YouTube, the image that Ray, I think, has been working on with all the different modalities that you offer. To me, it's basically you offer the open source version of everything that OpenAI typically offers. I don't think there is. Actually, if you do text to video, you will be a superset of what OpenAI offers because they don't have Sora. Is that Mochi, by the way? Mochi. Mochi, right?Lin [00:22:27]: Mochi. And there are a few others. I will say, the interesting thing is, I think we're betting on the open source community is going to proliferate. This is literally what we're seeing. And there's amazing video generation companies. There is amazing audio companies. Like cross-border, the innovation is off the chart, and we are building on top of that. I think that's the advantage we have compared with a closed source company.Swyx [00:22:58]: I think I want to restate the value proposition of Fireworks for people who are comparing you versus a raw GPU provider like a RunPod or Lambda or anything like those, which is like you create the developer experience layer and you also make it easily scalable or serverless or as an endpoint. And then, I think for some models, you have custom kernels, but not all models.Lin [00:23:25]: Almost for all models. For all large language models, all your models, and the VRMs. Almost for all models we serve.Swyx [00:23:35]: And so that is called Fire Attention. I don't remember the speed numbers, but apparently much better than VLM, especially on a concurrency basis.Lin [00:23:44]: So Fire Attention is specific mostly for language models, but for other modalities, we'll also have a customized kernel.Swyx [00:23:51]: And I think the typical challenge for people is understanding that has value, and then there are other people who are also offering open-source models. Your mode is your ability to offer a good experience for all these customers. But if your existence is entirely reliant on people releasing nice open-source models, other people can also do the same thing.Lin [00:24:14]: So I would say we build on top of open-source model foundation. So that's the kind of foundation we build on top of. But we look at the value prop from the lens of application developers and product engineers. So they want to create new UX. So what's happening in the industry right now is people are thinking about a completely new way of designing products. And I'm talking to so many founders, it's just mind-blowing. They help me understand existing way of doing PowerPoint, existing way of coding, existing way of managing customer service. It's actually putting a box in our head. For example, PowerPoint. So PowerPoint generation is we always need to think about how to fit into my storytelling into this format of slide one after another. And I'm going to juggle through design together with what story to tell. But the most important thing is what's our storytelling lines, right? And why don't we create a space that is not limited to any format? And those kind of new product UX design combined with automated content generation through Gen AI is the new thing that many founders are doing. What are the challenges they're facing? Let's go from there. One is, again, because a lot of products built on top of Gen AI, they are consumer-personal developer facing, and they require interactive experience. It's just a kind of product experience we all get used to. And our desire is to actually get faster and faster interaction. Otherwise, nobody wants to spend time, right? And then that requires low latency. And the other thing is the nature of consumer-personal developer facing is your audience is very big. You want to scale up to product market fit quickly. But if you lose money at a small scale, you're going to bankrupt quickly. So it's actually a big contrast. I actually have product market fit, but when I scale, I scale out of my business. So that's kind of a very funny way to think about it. So then having low latency and low cost is essential for those new applications and products to survive and really become a generation company. So that's the design point for our distributed inference engine and the file optimizer. File optimizer, you can think about that as a feedback loop. The more you feed your inference workload to our inference engine, the more we help you improve quality, lower latency further, lower your cost. It basically becomes better. And we automate that because we don't want you as an app developer or product engineer to think about how to figure out all these low-level details. It's impossible because you're not trained to do that at all. You should kind of keep your focus on the product innovation. And then the compound AI, we actually feel a lot of pain as the app developers, engineers, there are so many models. Every week, there's at least a new model coming out.Swyx [00:27:09]: Tencent had a giant model this week. Yeah, yeah.Lin [00:27:13]: I saw that. I saw that.Swyx [00:27:15]: It's like $500 billion.Lin [00:27:18]: So they're like, should I keep chasing this or should I forget about it? And which model should I pick to solve what kind of sub-problem? How do I even decompose my problem into those smaller problems and fit the model into it? I have no idea. And then there are two ways to think about this design. I think I talked about that in the past. One is imperative, as in you figure out how to do it. You give developer tools to dictate how to do it. Or you build a declarative system where a developer tells what they want to do, not how. So these are completely two different designs. So the analogy I want to draw is, in the data world, the database management system is a declarative system because people use database, use SQL. SQL is a way you say, what do you want to extract out of a database? What kind of result do you want? But you don't figure out which node is going to, how many nodes you're going to run on top of, how you redefine your disk, which index you use, which project. You don't need to worry about any of those. And database management system will figure out, generate a new best plan, and execute on that. So database is declarative. And it makes it super easy. You just learn SQL, which is learn a semantic meaning of SQL, and you can use it. Imperative side is there are a lot of ETL pipelines. And people design this DAG system with triggers, with actions, and you dictate exactly what to do. And if it fails, then how to recover. So that's an imperative system. We have seen a range of systems in the ecosystem go different ways. I think there's value of both. There's value of both. I don't think one is going to subsume the other. But we are leaning more into the philosophy of the declarative system. Because from the lens of app developer and product engineer, that would be easiest for them to integrate.Swyx [00:29:07]: I understand that's also why PyTorch won as well, right? This is one of the reasons. Ease of use.Lin [00:29:14]: Focus on ease of use, and then let the system take on the hard challenges and complexities. So we follow, we extend that thinking into current system design. So another announcement is we will also announce our next declarative system is going to appear as a model that has extremely high quality. And this model is inspired by Owen's announcement for OpenAI. You should see that by the time we announce this or soon.Alessio [00:29:46]: Trained by you.Lin [00:29:47]: Yes.Alessio [00:29:48]: Is this the first model that you trained? It's not the first.Lin [00:29:52]: We actually have trained a model called FireFunction. It's a function calling model. It's our first step into compound AI system. Because function calling model can dispatch a request into multiple APIs. We have pre-baked set of APIs the model learned. You can also add additional APIs through the configuration to let model dispatch accordingly. So we have a very high quality function calling model that's already released. We have actually three versions. The latest version is very high quality. But now we take a further step that you don't even need to use function calling model. You use our new model we're going to release. It will solve a lot of problems approaching very high OpenAI quality. So I'm very excited about that.Swyx [00:30:41]: Do you have any benchmarks yet?Lin [00:30:43]: We have a benchmark. We're going to release it hopefully next week. We just put our model to LMSYS and people are guessing. Is this the next Gemini model or a MADIS model? People are guessing. That's very interesting. We're watching the Reddit discussion right now.Swyx [00:31:00]: I have to ask more questions about this. When OpenAI released o1, a lot of people asked about whether or not it's a single model or whether it's a chain of models. Noam and basically everyone on the Strawberry team was very insistent that what they did for reinforcement learning, chain of thought, cannot be replicated by a whole bunch of open source model calls. Do you think that that is wrong? Have you done the same amount of work on RL as they have or was it a different direction?Lin [00:31:29]: I think they take a very specific approach where the caliber of team is very high. So I do think they are the domain expert in doing the things they are doing. I don't think there's only one way to achieve the same goal. We're on the same direction in the sense that the quality scaling law is shifting from training to inference. For that, I fully agree with them. But we're taking a completely different approach to the problem. All of that is because, of course, we didn't train the model from scratch. All of that is because we built on the show of giants. The current model available we have access to is getting better and better. The future trend is the gap between the open source model and the co-source model. It's just going to shrink to the point there's not much difference. And then we're on the same level field. That's why I think our early investment in inference and all the work we do around balancing across quality, latency, and cost pay off because we have accumulated a lot of experience and that empowers us to release this new model that is approaching open-ended quality.Alessio [00:32:39]: I guess the question is, what do you think the gap to catch up will be? Because I think everybody agrees with open source models eventually will catch up. And I think with 4, then with Lama 3.2, 3.1, 4.5b, we close the gap. And then 0.1 just reopened the gap so much and it's unclear. Obviously, you're saying your model will have...Swyx [00:32:57]: We're closing that gap.Alessio [00:32:58]: But you think in the future, it's going to be months?Lin [00:33:02]: So here's the thing that's happened. There's public benchmark. It is what it is. But in reality, open source models in certain dimensions are already on par or beat closed source models. So for example, in the coding space, open source models are really, really good. And in function calling, file function is also really, really good. So it's all a matter of whether you build one model to solve all the problems and you want to be the best of solving all the problems, or in the open source domain, it's going to specialize. All these different model builders specialize in certain narrow area. And it's logical that they can be really, really good in that very narrow area. And that's our prediction is with specialization, there will be a lot of expert models really, really good and even better than one-size-fits-all closed source models.Swyx [00:33:55]: I think this is the core debate that I am still not 100% either way on in terms of compound AI versus normal AI. Because you're basically fighting the bitter lesson.Lin [00:34:09]: Look at the human society, right? We specialize. And you feel really good about someone specializing doing something really well, right? And that's how our way evolved from ancient times. We're all journalists. We do everything. Now we heavily specialize in different domains. So my prediction is in the AI model space, it will happen also. Except for the bitter lesson.Swyx [00:34:30]: You get short-term gains by having specialists, domain specialists, and then someone just needs to train like a 10x bigger model on 10x more inference, 10x more data, 10x more model perhaps, whatever the current scaling law is. And then it supersedes all the individual models because of some generalized intelligence slash world knowledge. I think that is the core insight of the GPTs, the GPT-123 networks. Right.Lin [00:34:56]: But the training scaling law is because you have an increasing amount of data to train from. And you can do a lot of compute. So I think on the data side, we're approaching the limit. And the only data to increase that is synthetic generated data. And then there's like what is the secret sauce there, right? Because if you have a very good large model, you can generate very good synthetic data and then continue to improve quality. So that's why I think in OpenAI, they are shifting from the training scaling law intoSwyx [00:35:25]: inference scaling law.Lin [00:35:25]: And it's the test time and all this. So I definitely believe that's the future direction. And that's where we are really good at, doing inference.Swyx [00:35:34]: A couple of questions on that. Are you planning to share your reasoning choices?Lin [00:35:39]: That's a very good question. We are still debating.Swyx [00:35:43]: Yeah.Lin [00:35:45]: We're still debating.Swyx [00:35:46]: I would say, for example, it's interesting that, for example, SweetBench. If you want to be considered for ranking, you have to submit your reasoning choices. And that has actually disqualified some of our past guests. Cosign was doing well on SweetBench, but they didn't want to leak those results. So that's why you don't see O1 preview on SweetBench, because they don't submit their reasoning choices. And obviously, it's IP. But also, if you're going to be more open, then that's one way to be more open. So your model is not going to be open source, right? It's going to be an endpoint that you provide. Okay, cool. And then pricing, also the same as OpenAI, just kind of based on...Lin [00:36:25]: Yeah, this is... I don't have, actually, information. Everything is going so fast, we haven't even thought about that yet. Yeah, I should be more prepared.Swyx [00:36:33]: I mean, this is live. You know, it's nice to just talk about it as it goes live. Any other things that you want feedback on or you're thinking through? It's kind of nice to just talk about something when it's not decided yet. About this new model. It's going to be exciting. It's going to generate a lot of buzz. Right.Lin [00:36:51]: I'm very excited to see how people are going to use this model. So there's already a Reddit discussion about it. And people are asking very deep, mathematical questions. And since the model got it right, surprising. And internally, we're also asking the model to generate what is AGI. And it generates a very complicated DAG thinking process. So we're having a lot of fun testing this internally. But I'm more curious, how will people use it? What kind of application they're going to try and test on it? And that's where we really like to hear feedback from the community. And also feedback to us. What works out well? What doesn't work out well? What works out well, but surprising them? And what kind of thing they think we should improve on? And those kind of feedback will be tremendously helpful.Swyx [00:37:44]: Yeah. So I've been a production user of Preview and Mini since launch. I would say they're very, very obvious jobs in quality. So much so that they made clods on it. And they made the previous state-of-the-art look bad. It's really that stark, that difference. The number one thing, just feedback or feature requests, is people want control on the budget. Because right now, in 0.1, it kind of decides its own thinking budget. But sometimes you know how hard the problem is. And you want to actually tell the model, spend two minutes on this. Or spend some dollar amount. Maybe it's time you miss dollars. I don't know what the budget is. That makes a lot of sense.Lin [00:38:27]: So we actually thought about that requirement. And it should be, at some point, we need to support that. Not initially. But that makes a lot of sense.Swyx [00:38:38]: Okay. So that was a fascinating overview of just the things that you're working on. First of all, I realized that... I don't know if I've ever given you this feedback. But I think you guys are one of the reasons I agreed to advise you. Because I think when you first met me, I was kind of dubious. I was like... Who are you? There's Replicate. There's Together. There's Laptop. There's a whole bunch of other players. You're in very, very competitive fields. Like, why will you win? And the reason I actually changed my mind was I saw you guys shipping. I think your surface area is very big. The team is not that big. No. We're only 40 people. Yeah. And now here you are trying to compete with OpenAI and everyone else. What is the secret?Lin [00:39:21]: I think the team. The team is the secret.Swyx [00:39:23]: Oh boy. So there's no thing I can just copy. You just... No.Lin [00:39:30]: I think we all come from a very aligned culture. Because most of our team came from meta.Swyx [00:39:38]: Yeah.Lin [00:39:38]: And many startups. So we really believe in results. One is result. And second is customer. We're very customer obsessed. And we don't want to drive adoption for the sake of adoption. We really want to make sure we understand we are delivering a lot of business values to the customer. And we really value their feedback. So we would wake up midnight and deploy some model for them. Shuffle some capacity for them. And yeah, over the weekend, no brainer.Swyx [00:40:15]: So yeah.Lin [00:40:15]: So that's just how we work as a team. And the caliber of the team is really, really high as well. So as plug-in, we're hiring. We're expanding very, very fast. So if we are passionate about working on the most cutting-edge technology in the general space, come talk with us. Yeah.Swyx [00:40:38]: Let's talk a little bit about that customer journey. I think one of your more famous customers is Cursor. We were the first podcast to have Cursor on. And then obviously since then, they have blown up. Cause and effect are not related. But you guys especially worked on a fast supply model where you were one of the first people to work on speculative decoding in a production setting. Maybe just talk about what was the behind the scenes of working with Cursor?Lin [00:41:03]: I will say Cursor is a very, very unique team. I think the unique part is the team has very high technical caliber. There's no question about it. But they have decided, although many companies building coding co-pilot, they will say, I'm going to build a whole entire stack because I can. And they are unique in the sense they seek partnership. Not because they cannot. They're fully capable, but they know where to focus. That to me is amazing. And of course, they want to find a bypass partner. So we spent some time working together. They are pushing us very aggressively because for them to deliver high caliber product experience, they need the latency. They need the interactive, but also high quality at the same time. So actually, we expanded our product feature quite a lot as we support Cursor. And they are growing so fast. And we massively scaled quickly across multiple regions. And we developed a pretty high intense inference stack, almost like similar to what we do for Meta. I think that's a very, very interesting engagement. And through that, there's a lot of trust being built. They realize, hey, this is a team they can really partner with. And they can go big with. That comes back to, hey, we're really customer obsessed. And all the engineers working with them, there's just enormous amount of time syncing together with them and discussing. And we're not big on meetings, but we are like stack channel always on. Yeah, so you almost feel like working as one team. So I think that's really highlighted.Swyx [00:42:38]: Yeah. For those who don't know, so basically Cursor is a VS Code fork. But most of the time, people will be using closed models. Like I actually use a lot of SONET. So you're not involved there, right? It's not like you host SONET or you have any partnership with it. You're involved where Cursor is small, or like their house brand models are concerned, right?Lin [00:42:58]: I don't know what I can say, but the things they haven't said.Swyx [00:43:04]: Very obviously, the drop down is 4.0, but in Cursor, right? So I assume that the Cursor side is the Fireworks side. And then the other side, they're calling out the other. Just kind of curious. And then, do you see any more opportunity on the... You know, I think you made a big splash with 1,000 tokens per second. That was because of speculative decoding. Is there more to push there?Lin [00:43:25]: We push a lot. Actually, when I mentioned Fire Optimizer, right? So as in, we have a unique automation stack that is one size fits one. We actually deployed to Cursor earlier on. Basically optimized for their specific workload. And that's a lot of juice to extract out of there. And we see success in that product. It actually can be widely adopted. So that's why we started a separate product line called Fire Optimizer. So speculative decoding is just one approach. And speculative decoding here is not static. We actually wrote a blog post about it. There's so many different ways to do speculative decoding. You can pair a small model with a large model in the same model family. Or you can have equal pads and so on. There are different trade-offs which approach you take. It really depends on your workload. And then with your workload, we can align the Eagle heads or Medusa heads or a small big model pair much better to extract the best latency reduction. So all of that is part of the Fire Optimizer offering.Alessio [00:44:23]: I know you mentioned some of the other inference providers. I think the other question that people always have is around benchmarks. So you get different performance on different platforms. How should people think about... People are like, hey, Lama 3.2 is X on MMLU. But maybe using speculative decoding, you go down a different path. Maybe some providers run a quantized model. How should people think about how much they should care about how you're actually running the model? What's the delta between all the magic that you do and what a raw model...Lin [00:44:57]: Okay, so there are two big development cycles. One is experimentation, where they need fast iteration. They don't want to think about quality, and they just want to experiment with product experience and so on. So that's one. And then it looks good, and they want to post-product market with scaling. And the quality is really important. And latency and all the other things are becoming important. During the experimentation phase, it's just pick a good model. Don't worry about anything else. Make sure you even generate the right solution to your product. And that's the focus. And then post-product market fit, then that's kind of the three-dimensional optimization curve start to kick in across quality, latency, cost, where you should land. And to me, it's purely a product decision. To many products, if you choose a lower quality, but better speed and lower cost, but it doesn't make a difference to the product experience, then you should do it. So that's why I think inference is part of the validation. The validation doesn't stop at offline eval. The validation will go through A-B testing, through inference. And that's where we offer various different configurations for you to test which is the best setting. So this is the traditional product evaluation. So product evaluation should also include your new model versions and different model setup into the consideration.Swyx [00:46:22]: I want to specifically talk about what happens a few months ago with some of your major competitors. I mean, all of this is public. What is your take on what happens? And maybe you want to set the record straight on how Fireworks does quantization because I think a lot of people may have outdated perceptions or they didn't read the clarification post on your approach to quantization.Lin [00:46:44]: First of all, it's always a surprise to us that without any notice, we got called out.Swyx [00:46:51]: Specifically by name, which is normally not what...Lin [00:46:54]: Yeah, in a public post. And have certain interpretation of our quality. So I was really surprised. And it's not a good way to compete, right? We want to compete fairly. And oftentimes when one vendor gives out results, the interpretation of another vendor is always extremely biased. So we actually refrain ourselves to do any of those. And we happily partner with third parties to do the most fair evaluation. So we're very surprised. And we don't think that's a good way to figure out the competition landscape. So then we react. I think when it comes to quantization, the interpretation, we wrote actually a very thorough blog post. Because again, no one says it's all. We have various different quantization schemes. We can quantize very different parts of the model from ways to activation to cross-TPU communication. They can use different quantization schemes or consistent across the board. And again, it's a trade-off. It's a trade-off across this three-dimensional quality, latency, and cost. And for our customer, we actually let them find the best optimized point. And we have a very thorough evaluation process to pick that point. But for self-serve, there's only one point to pick. There's no customization available. So of course, it depends on what we talk with many customers. We have to pick one point. And I think the end result, like AA published, later on AA published a quality measure. And we actually looked really good. So that's why what I mean is, I will leave the evaluation of quality or performance to third party and work with them to find the most fair benchmark. And I think that's a good approach, a methodology. But I'm not a part of an approach of calling out specific namesSwyx [00:48:55]: and critique other competitors in a very biased way. Databases happens as well. I think you're the more politically correct one. And then Dima is the more... Something like this. It's you on Twitter.Lin [00:49:11]: It's like the Russian... We partner. We play different roles.Swyx [00:49:20]: Another one that I wanted to... I'm just the last one on the competition side. There's a perception of price wars in hosting open source models. And we talked about the competitiveness in the market. Do you aim to make margin on open source models? Oh, absolutely, yes.Lin [00:49:38]: So, but I think it really... When we think about pricing, it's really need to coordinate with the value we're delivering. If the value is limited, or there are a lot of people delivering the same value, there's no differentiation. There's only one way to go. It's going down. So through competition. If I take a big step back, there is pricing from... We're more compared with close model providers, APIs, right? The close model provider, their cost structure is even more interesting because we don't bear any training costs. And we focus on inference optimization, and that's kind of where we continue to add a lot of product value. So that's how we think about product. But for the close source API provider, model provider, they bear a lot of training costs. And they need to amortize the training costs into the inference. So that created very interesting dynamics of, yeah, if we match pricing there, and I think how they are going to make money is very, very interesting.Swyx [00:50:37]: So for listeners, opening eyes 2024, $4 billion in revenue, $3 billion in compute training, $2 billion in compute inference, $1 billion in research compute amortization, and $700 million in salaries. So that is like...Swyx [00:50:59]: I mean, a lot of R&D.Lin [00:51:01]: Yeah, so I think matter is basically like, make it zero. So that's a very, very interesting dynamics we're operating within. But coming back to inference, so we are, again, as I mentioned, our product is, we are a platform. We're not just a single model as a service provider as many other inference providers, like they're providing a single model. We have our optimizer to highly customize towards your inference workload. We have a compound AI system where significantly simplify your interaction to high quality and low latency, low cost. So those are all very different from other providers.Alessio [00:51:38]: What do people not know about the work that you do? I guess like people are like, okay, Fireworks, you run model very quickly. You have the function model. Is there any kind of like underrated part of Fireworks that more people should try?Lin [00:51:51]: Yeah, actually, one user post on x.com, he mentioned, oh, actually, Fireworks can allow me to upload the LoRa adapter to the service model at the same cost and use it at same cost. Nobody has provided that. That's because we have a very special, like we rolled out multi-LoRa last year, actually. And we actually have this function for a long time. And many people has been using it, but it's not well known that, oh, if you find your model, you don't need to use on demand. If you find your model is LoRa, you can upload your LoRa adapter and we deploy it as if it's a new model. And then you use, you get your endpoint and you can use that directly, but at the same cost as the base model. So I'm happy that user is marketing it for us. He discovered that feature, but we have that for last year. So I think to feedback to me is, we have a lot of very, very good features, as Sean just mentioned. I'm the advisor to the company,Swyx [00:52:57]: and I didn't know that you had speculative decoding released.Lin [00:53:02]: We have prompt catching way back last year also. We have many, yeah. So I think that is one of the underrated feature. And if they're developers, you are using our self-serve platform, please try it out.Swyx [00:53:16]: The LoRa thing is interesting because I think you also, the reason people add additional costs to it, it's not because they feel like charging people. Normally in normal LoRa serving setups, there is a cost to dedicating, loading those weights and dedicating a machine to that inference. How come you can't avoid it?Lin [00:53:36]: Yeah, so this is kind of our technique called multi-LoRa. So we basically have many LoRa adapters share the same base model. And basically we significantly reduce the memory footprint of serving. And the one base model can sustain a hundred to a thousand LoRa adapters. And then basically all these different LoRa adapters can share the same, like direct the same traffic to the same base model where base model is dominating the cost. So that's how we advertise that way. And that's how we can manage the tokens per dollar, million token pricing, the same as base model.Swyx [00:54:13]: Awesome. Is there anything that you think you want to request from the community or you're looking for model-wise or tooling-wise that you think like someone should be working on in this?Lin [00:54:23]: Yeah, so we really want to get a lot of feedback from the application developers who are starting to build on JNN or on the already adopted or starting about thinking about new use cases and so on to try out Fireworks first. And let us know what works out really well for you and what is your wishlist and what sucks, right? So what is not working out for you and we would like to continue to improve. And for our new product launches, typically we want to launch to a small group of people. Usually we launch on our Discord first to have a set of people use that first. So please join our Discord channel. We have a lot of communication going on there. Again, you can also give us feedback. We'll have a starting office hour for you to directly talk with our DevRel and engineers to exchange more long notes.Alessio [00:55:17]: And you're hiring across the board?Lin [00:55:18]: We're hiring across the board. We're hiring front-end engineers, infrastructure cloud, infrastructure engineers, back-end system optimization engineers, applied researchers, like researchers who have done post-training, who have done a lot of fine-tuning and so on.Swyx [00:55:34]: That's it. Thank you. Thanks for having us. Get full access to Latent Space at www.latent.space/subscribe

Bonus Episode: Keeping Up with Search Trends and Evolved User Intent with Arv Natarajan

Honest eCommerce

Play Episode Listen Later Nov 14, 2024 24:03

Arv is the Director of Product at GroupBy. He is a passionate entrepreneur currently responsible for product management. Arv has over eight years of experience in the oil and gas industry, including five years with WorleyParsons.In This Conversation We Discuss: [00:45] Intro[01:21] Driving revenue with better product discovery[02:40] Scaling search for retailers with high SKU counts[03:31] Adapting to modern, conversational search queries[05:58] Better product details for user-friendly browsing[08:30] Avoiding AI errors with human-guided review[09:55] Guiding users with smart sorting options[11:46] Product variations using smart automation[13:02] Shopper trust through respectful customization[14:09] Automating reorders with helpful reminders[15:05] Preventing bounce rates with smart stock management[16:32] Clear shipping times to boost purchase confidence[17:11] advanced analytics for more effective action[18:34] Using best-in-class apps as your brand grows[20:17] Recognizing evolving trends in ecommerce technology[21:24] Product search with AI-driven recommendations[23:01] GroupBy for enhancing your retail experienceResources:Subscribe to Honest Ecommerce on YoutubeProduct discovery platform powered by Google Cloud Vertex AI search for retail groupbyinc.com/Follow Arv Natarajan linkedin.com/in/arvnatarajan/If you're enjoying the show, we'd love it if you left Honest Ecommerce a review on Apple Podcasts. It makes a huge impact on the success of the podcast, and we love reading every one of your reviews!

E155: Taking on Elasticsearch - the ParadeDB Story

Open Source Startup Podcast

Play Episode Listen Later Nov 12, 2024 33:17

Philippe Noël is Co-Founder & CEO of ParadeDB, the modern Elasticsearch alternative built on Postgres. They're purpose-built for heavy, real-time workloads and their open source project, also called paradedb, has over 6K stars on GitHub. ParadeDB has raised $2M from investors including General Catalyst & YC. In this episode, we dig into the benefits of connecting search directly to the database (ie. no ETL), the types of users / use cases that really benefit from ParadeDB (e-commerce, FinTech, etc.), the decision to focus on Postgres, making adoption super easy, Philippe's learnings as a second-time founder & more!

ceo co founders fintech github 2m yc 6k etl elasticsearch general catalyst postgres

Reimagining Code: The AI-Driven Transformation of Programming and Data Analytics

The Data Exchange with Ben Lorica

Play Episode Listen Later Oct 17, 2024 41:05

Matt Welsh is a technical leader at Aryn AI, an AI-powered ETL system for RAG frameworks, LLM-based applications, and vector databases. In this episode, we explore how AI is revolutionizing programming and software development. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.Detailed show notes - with links to many references - can be found on The Data Exchange web site.

ai transformation code driven programming reimagining detailed data analytics llm rag etl matt welsh

Lake House Architecture: Cost, Efficiency, AI Integration, and Data Governance Trends w/ Ori Rafael

Business of Tech

Play Episode Listen Later Sep 22, 2024 19:25

host Dave Sobel welcomes Ori Rafael, CEO and co-founder of Upsolver, to discuss the emerging concept of lake house architecture in data management. The conversation begins with an exploration of how lake houses compare to traditional data warehouses and data lakes. Ori explains that a lake house is essentially a modern data warehouse architecture that allows customers to manage their own data layers, providing flexibility and control over their data storage and processing.Ori delves into the evolution of data management architectures, highlighting the transition from on-premise data warehouses to cloud-managed solutions. He discusses the challenges faced by database administrators (DBAs) in the past, such as vendor lock-in and the limitations of traditional data warehouses. The lake house model addresses these issues by decoupling storage and compute, enabling organizations to utilize multiple query engines and platforms without being tied to a single vendor.The discussion also touches on the significant advantages of lake house architecture, particularly in terms of cost reduction and operational efficiency. Ori emphasizes that organizations can save a substantial portion of their data warehouse budgets by eliminating the need for expensive ETL processes tied to specific warehouse vendors. Additionally, the ability to leverage various engines for analytics and AI applications empowers businesses to innovate without the constraints of traditional data management systems.As the conversation progresses, Ori highlights the importance of optimizing storage for improved query performance and efficiency. He explains how Upsolver manages the file system layer to ensure that organizations can achieve performance levels comparable to traditional warehouses while maintaining high storage efficiency. The episode concludes with a discussion on the evolving role of data engineers, emphasizing the need for them to transition from developers to platform managers, enabling greater independence and efficiency in data operations. All our Sponsors: https://businessof.tech/sponsors/ Do you want the show on your podcast app or the written versions of the stories? Subscribe to the Business of Tech: https://www.businessof.tech/subscribe/Looking for a link from the stories? The entire script of the show, with links to articles, are posted in each story on https://www.businessof.tech/ Support the show on Patreon: https://patreon.com/mspradio/ Want our stuff? Cool Merch? Wear “Why Do We Care?” - Visit https://mspradio.myspreadshop.com Follow us on:LinkedIn: https://www.linkedin.com/company/28908079/YouTube: https://youtube.com/mspradio/Facebook: https://www.facebook.com/mspradionews/Instagram: https://www.instagram.com/mspradio/TikTok: https://www.tiktok.com/@businessoftechBluesky: https://bsky.app/profile/businessoftech.bsky.social

ceo tiktok ai business tech cost architecture efficiency ori ai integration data governance lake house etl dbas cool merch

Automating Unstructured Data Extraction with LLMs

The Data Exchange with Ben Lorica

Play Episode Listen Later Aug 8, 2024 35:29

Shuveb Hussain is co-founder of Unstract, a no-code platform that uses large language models to extract structured data from unstructured documents, allowing users to build API endpoints and ETL pipelines to automate document processing workflows. Subscribe to the Gradient Flow Newsletter: https://gradientflow.substack.com/Subscribe: Apple • Spotify • Overcast • Pocket Casts • AntennaPod • Podcast Addict • Amazon • RSS.Detailed show notes - with links to many references - can be found on The Data Exchange web site.

api detailed extraction automating etl unstructured data

Podcasts about ETL

Best podcasts about ETL

ENTER THE LAB

Entrepreneurial Thought Leaders

Data Engineering Podcast

Les Grandes Gueules

Screaming in the Cloud

A Way with Words — language, linguistics, and callers from all over

The Data Stack Show

The Cloud Pod

Drill to Detail

Software Engineering Daily

AWS Podcast

Roaring Elephant

Engenharia de Dados [Cast]

MLOps.community

Raw Data By P3

SQL Data Partners Podcast

Voice of the DBA

Streaming Audio: a Confluent podcast about Apache Kafka

The Cloudcast

The Data Exchange with Ben Lorica

Software Defined Talk

Non-Technical

RunAs Radio

Modernize or Die ® Podcast - CFML News Edition

MarTech Podcast // Marketing + Technology = Business Growth

Giant Robots Smashing Into Other Giant Robots

Catalog & Cocktails

Podcast – Software Engineering Daily

Data Mesh Radio

Latent Space: The AI Engineer Podcast â€” CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Revenue Generator Podcast: Sales + Marketing + Product + Customer Success = Revenue Growth

Extra to Love: A Trisomy Podcast

Real-Time Analytics with Tim Berglund

IBM Analytics Insights Podcasts

Making Data Simple

Monday Morning Data Chat

Latest news about ETL

Latest podcast episodes about ETL

Quickest way to improve analytics using live data in a campaign

The most important learning about data at Meta

Episode 53: ETL - Enter The Lion

Parakeeto vs. Project Management Tools: What's the Real Solution?, With Kristen Kelly

Over Easy Done Well from Oct 3, 2025

Incremental Design, DevOps, Microservices & CICD • Michael Nygard & Dave Farley

More data? More growth!

Why the Middle Layer of Your Agency Org Chart May Not Survive AI with Jennifer Bagley | Ep #841

Fail‑Safe by Design: Avoiding Catastrophic Product Failures

Making Data Simple: Live Data, Smarter AI with Snow Leopard founder Deepti Srivastava

Making Data Simple: Live Data, Smarter AI with Snow Leopard founder Deepti Srivastava

From Bots To Agents: Building Trustworthy Autonomy With Hakkōda, an IBM Company

102. Los datos no valen nada sin ESTO

3412: PuppyGraph at the IT Press Tour: Graph Power Without the Pain

Siri Co-Founder Adam Cheyer: Why Data, Not Algorithms, Is AI's True Competitive Edge

What “Data-Driven” Really Means

Episode 57: AI Agents and LLM Judges at Scale: Processing Millions of Documents (Without Breaking the Bank)

Data Warehouse Automation: Benefits and Market Overview – with Florian Bigelmaier, BARC

#232 n8n na prática: benefícios e automações funcionais

Episode 472: Should my junior dev use AI and thrown in to ETL

624: Tampa Tech With Joey DeVilla

190: Alteryx Use Cases in the Tax Industry

Solar Translucent Lifeformless from Jul 17, 2025

253: Why Traditional Data Pipelines Are Broken (And How to Fix Them) with Ruben Burdin of Stacksync

LCC 327 - Mon ami de 30 ans

Building an AI+Data Startup Studio | Tom Chavez, cofounder of super{set}

The history and future of the data ecosystem (w/ Lonne Jaffe)

Unlocking Med Spa Growth with Smarter Data

The End of Stale AI Data with Snow Leopard | Episode #99

Pourquoi La Joconde n'a-t-elle pas de sourcils ?

EP. 264 Beyond the Database: Mastering Multi-Cloud Data, AI Automation & Integration (feat. Peter Ngai, SnapLogic)

What is Oracle GoldenGate 23ai?

Building Trust in Data: Transparency, Collaboration, and Governance for Successful AI

EP.260 Vector Search Secrets Revealed! - AI-Powered Image Search with MongoDB - Live Demo

Data, Data, and Metadata: Letting ChatGPT Interpret Power BI Output

A Conversation with NAVAC at AHR 2025

Integrating Your CRM With Your Data Warehouse

Connect to any data with Shortcuts, Mirroring and Data Factory using Microsoft Fabric

Why Most Reporting Systems Fail, & What to Do Instead, With Ben Zittlau

Episode 284: The Four-Letter Word ETL - Data Movement