Podcasts about duckdb

  • 62 PODCASTS
  • 116 EPISODES
  • 48m AVG DURATION
  • 1 EPISODE EVERY OTHER WEEK
  • Oct 20, 2025 LATEST

POPULARITY (chart by year, 2017 to 2024)


Best podcasts about duckdb

Latest podcast episodes about duckdb

Talk Python To Me - Python conversations for passionate developers
#524: 38 things Python developers should learn in 2025

Oct 20, 2025 · 69:15 · Transcription Available


Python in 2025 is different. Threads really are about to run in parallel, installs finish before your coffee cools, and containers are the default. In this episode, we count down 38 things to learn this year: free-threaded CPython, uv for packaging, Docker and Compose, Kubernetes with Tilt, DuckDB and Arrow, PyScript at the edge, plus MCP for sane AI workflows. Expect practical wins and migration paths. No buzzword bingo, just what pays off in real apps. Join me along with Peter Wang and Calvin Hendryx-Parker for a fun, fast-moving conversation. Episode sponsors Seer: AI Debugging, Code TALKPYTHON Agntcy Talk Python Courses Links from the show Calvin Hendryx-Parker: github.com/calvinhp Peter on BSky: @wang.social Free-Threaded Wheels: hugovk.github.io Tilt: tilt.dev The Five Demons of Python Packaging That Fuel Our ...: youtube.com Talos Linux: talos.dev Docker: Accelerated Container Application Development: docker.com Scaf - Six Feet Up: sixfeetup.com BeeWare: beeware.org PyScript: pyscript.net Cursor: The best way to code with AI: cursor.com Cline - AI Coding, Open Source and Uncompromised: cline.bot Watch this episode on YouTube: youtube.com Episode #524 deep-dive: talkpython.fm/524 Episode transcripts: talkpython.fm Theme Song: Developer Rap

alphalist.CTO Podcast - For CTOs and Technical Leaders
#130 - From PhD Research to DuckDB: Building the Next Generation of Analytical DBs with Mark Raasveldt // CTO @ DuckDB

Oct 16, 2025 · 53:12 · Transcription Available


Mark Raasveldt, co-founder and CTO of DuckDB Labs, shares his journey from academic research at CWI Amsterdam to creating one of the most innovative analytical databases of the last decade. Mark discusses the technical challenges of building DuckDB from scratch, the philosophy behind embedded analytical databases, and why single-node performance still matters in our cloud-first world. He provides insights into open source business models, the evolution of data formats like Parquet, and how DuckDB is democratizing high-performance analytics for developers everywhere.

DataTalks.Club
Berlin PyData 2025 Conference Interviews

Sep 26, 2025 · 49:21


At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

- Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
- Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
- Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
- Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
- Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok
Master's student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.
Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki
Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today's ML tooling and dbt's impact on analytics.
Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz
Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.
Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra
Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.
Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza
Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.
Connect: https://www.linkedin.com/in/mehd-io/

MLOps.community
The DuckLake Lakehouse Format // Hannes Mühleisen // #339

Sep 19, 2025 · 57:24


The DuckLake Lakehouse Format // MLOps Podcast #339 with Hannes Mühleisen, Co-founder and CEO of DuckDB Labs.

Join the Community: https://go.mlops.community/YTJoinIn
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Managing data on Object Stores has been a painful affair. Users had to choose between data swamp chaos or a maze of metadata files with catalog servers on top. DuckLake is a new paradigm for managing data on object stores: First, it uses classical SQL data management systems to manage metadata. Second, actual data is stored in Parquet files on pretty arbitrary storage. Third, processing queries is done client-side, or anywhere really. DuckDB is the first system to integrate with DuckLake using an extension with the same name. Conceptually, DuckLake enables central control over truth while decentralizing compute and storage entirely. DuckLake turns data warehouse architecture upside down by departing from the integrated metadata/compute layer towards a fully disconnected operation with only centralized metadata. For the first time, DuckLake allows a “multi-player” experience with DuckDB, where computation stays fully local, but transactional control is centralized.

// Bio
Hannes Mühleisen
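
The three-part split described in the abstract (metadata in a SQL database, data in files on arbitrary storage, query processing on the client) can be illustrated with a toy sketch. This is not DuckLake's actual schema or API: `sqlite3` stands in for the metadata database, CSV files stand in for Parquet, and the `data_files` table and every function name here are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def write_rows(path, rows):
    """Write one immutable data file (CSV here as a stand-in for Parquet)."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def register_file(catalog, table, path):
    """Record a new data file for `table` in a single catalog transaction."""
    with catalog:  # commits on success, rolls back on error
        catalog.execute(
            "INSERT INTO data_files (table_name, path) VALUES (?, ?)",
            (table, str(path)),
        )


def scan(catalog, table):
    """Client-side scan: ask the catalog which files exist, then read them locally."""
    paths = [
        p
        for (p,) in catalog.execute(
            "SELECT path FROM data_files WHERE table_name = ? ORDER BY path",
            (table,),
        )
    ]
    for p in paths:
        with open(p, newline="") as f:
            for name, amount in csv.reader(f):
                yield name, int(amount)


def demo():
    # Hypothetical catalog schema: one row per data file.
    catalog = sqlite3.connect(":memory:")
    catalog.execute("CREATE TABLE data_files (table_name TEXT, path TEXT)")
    with tempfile.TemporaryDirectory() as storage:
        batches = [[("a", 1), ("b", 2)], [("c", 3)]]
        for i, rows in enumerate(batches):
            path = Path(storage) / f"orders_{i}.csv"
            write_rows(path, rows)
            register_file(catalog, "orders", path)
        # The aggregation runs in the client process, not in the catalog.
        return sum(amount for _, amount in scan(catalog, "orders"))


total = demo()
```

The point of the sketch is only the division of labor: the catalog decides which files belong to a table, while reading and aggregating happen wherever the client runs.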

Stories from the Hackery
The Engine Behind AI: Why Data Engineering is in Demand | Stories From The Hackery

Sep 10, 2025 · 67:21


In this episode of Stories from the Hackery, we talk with Nashville tech leader and hiring manager Jason Turan about one of tech's most in-demand fields: data engineering. Jason, a long-time friend of NSS, was one of the first people to tell us that Nashville needed more data engineers. He shares his perspective on what a data engineer does, describing the role as the "connective tissue between data producers and data consumers". Listen in to hear us discuss: - Why data engineers are essential for flipping the 80/20 rule, allowing data scientists and analysts to spend less time cleaning data and more time finding insights. - How the rise of generative AI has acted as an "accelerant," increasing the need for high-quality data and the professionals who can provide it. - Actionable advice for getting started in the field, including the importance of focusing on a "T-shaped skillset" with SQL at its core. - Why Jason's number one piece of advice is to be curious, experiment, and "go out and do the thing". 
01:20 Meet Jason Turan: His Tech Origin Story 03:04 Jason's History with NSS and Hiring Grads 07:28 Defining Data Engineering: The "Connective Tissue" of Tech 11:15 Why Nashville is a Hub for Data Engineers 13:56 Healthcare's Impact on Nashville's Data Jobs 20:35 How GenAI Accelerates the Need for Data Engineers 31:33 Getting Started: Lower Barriers to Entry 39:03 A Top Use Case for AI: Understanding Your Codebase 52:21 Misconceptions & the "T-Shaped Skillset" 55:29 The Value of Hands-On Learning: "Go Do the Thing" 58:52 Lightning Round: Favorite Tech Tools 01:00:32 Lightning Round: Top Reads & Resources Links Metabase: https://www.metabase.com/ DuckDB: https://duckdb.org/ MotherDuck: https://motherduck.com/ Ralph Kimball: The Data Warehouse Toolkit: https://www.amazon.com/gp/product/1118530802 Bill Inmon: Building the Data Warehouse: https://www.amazon.com/Building-Data-Warehouse-W-Inmon/dp/0764599445 Edward Tufte: The Visual Display of Quantitative Information: https://www.amazon.com/Visual-Display-Quantitative-Information/dp/0961392142 Brendan Keeler: The Health API Guy: https://healthapiguy.substack.com/ TLDR Newsletter: https://tldr.tech/ Nashville Technology Council (NTC): https://technologycouncil.com/

Kodsnack in English
Kodsnack 654 - German-style strings, with Matt Topol

Aug 5, 2025 · 53:20


Fredrik talks to Matt Topol about Arrow and how the Arrow ecosystem is evolving. Arrow is an open source, columnar in-memory data format designed for efficient data processing and analytics - which means passing data between things without needing to transform it, and ideally even without needing to copy it. What makes the ecosystem grow, and why is it very cool to have Arrow on the GPU? What is the connection between Arrow, machine learning, and Hugging Face? Matt emphasizes the value of open standards, even as they work with or within more closed systems they can help open things up, and help bring about more modular solutions so that developers can focus on doing their core area really well. This episode can be seen as a follow-up to episode 567, where Matt first joined to discuss everything Arrow. Recorded during Øredev 2024. Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi. Links Matt Matt’s Øredev 2023 talks: State of the Apache Arrow ecosystem: How your project can leverage Arrow! and Leveraging Apache Arrow for ML workflows Previous episodes with Matt Øredev 2024 Matt’s Øredev 2024 talks - on Arrow ADBC and Composable and modular data systems ADBC - Arrow database connectivity Arrow Snowflake Snowflake drivers for ADBC Bigquery The Bigquery driver Microsoft Fabric Duckdb Postgres SQLite Arrow flight - RPC framework for services based on Arrow data Arrow flight SQL Microsoft Power BI Velox Apache datafusion Query planning Substrait - query IR Polaris Libcudf Nvidia RAPIDS Pytorch Tensorflow Arrow device interface DLPack - in-memory tensor structure Tensors Nanoarrow Voltron data - where Matt used to work. 
He’s now at Columnar Theseus GPU compute engine The composable data management system manifesto Support us on Ko-fi! Matt’s book - In-memory analytics with Apache Arrow Spark Spark connect RPC UDFs Photon Datafusion Apache Cassandra ODBC JDBC R - programming language for statistical computing Hugging face Ray Stringview - “German-style strings” Scaling up with R and Arrow - the book on using Arrow with R Titles It’s gotten a lot bigger The bones of it are in the repo (Powered by ADBC) Individual compute components Feed it substrate Where the ecosystem is going Arrow on the GPU The data stays on the GPU A forced copy Leverage that device interface Without forcing the copy Shy of that last mile Turtles all the way down The guy who said yes German-style strings

DataTalks.Club
From Simulations to Freelance Data Engineering: Orell's Journey Out of Academia and Into Consulting - Orell Garten

Aug 1, 2025 · 58:22


In this episode, we talk with Orell about his journey from electrical engineering to freelancing in data engineering. We explore lessons from startup life, working with messy industrial data, the realities of freelancing, and how to stay up to date with new tools.

Topics covered:
- Why Orell left a PhD and a simulation-focused start-up after Covid hit
- What he learned trying (and failing) to commercialise medical-imaging simulations
- The first freelance project and the long, quiet months that followed
- How he now finds clients, keeps projects small and delivers value quickly
- Typical work he does for industrial companies: parsing messy machine logs, building simple pipelines, adding structure later
- Favorite everyday tools (Python, DuckDB, a bit of C++) and the habit of blocking time for learning
- Advice for anyone thinking about freelancing: cash runway, networking, and focusing on problems rather than “perfect” tech choices

A practical conversation for listeners who are curious about moving from research or permanent roles into freelance data engineering.

Python Bytes
#441 It's Michaels All the Way Down

Jul 21, 2025 · 27:48 · Transcription Available


Topics covered in this episode:
- Distributed sqlite follow up: Turso and Litestream
- PEP 792 – Project status markers in the simple index
- Run coverage on tests
- docker2exe: Convert a Docker image to an executable
- Extras
- Joke

Watch on YouTube

About the show
Sponsored by Digital Ocean: pythonbytes.fm/digitalocean-gen-ai Use code DO4BYTES and get $200 in free credit

Connect with the hosts
- Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
- Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
- Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too.
Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form, add your name and email to our friends of the show list. We'll never share it.

Michael #1: Distributed sqlite follow up: Turso and Litestream
- Michael Booth: Turso marries the familiarity and simplicity of SQLite with modern, scalable, and distributed features. Seems to me that Turso is to SQLite what MotherDuck is to DuckDB.
- Mike Fiedler: Continue to use the SQLite you love and care about (even the one inside Python runtime) and launch a daemon that watches the db for changes and replicates changes to an S3-type object store.
- Deeper dive: Litestream: Revamped

Brian #2: PEP 792 – Project status markers in the simple index
- Currently 3 status markers for packages
  - Trove Classifier status
  - Indices can be yanked
  - PyPI projects - admins can quarantine a project, owners can archive a project
- Proposal is to have something that can have only one state
  - active
  - archived
  - quarantined
  - deprecated
- This has been Approved, but not Implemented yet.

Brian #3: Run coverage on tests
- Hugo van Kemenade
- And apparently, run Ruff with at least F811 turned on
- Helps with copy/paste/modify mistakes, but also subtler bugs like consumed generators being reused.

Michael #4: docker2exe: Convert a Docker image to an executable
- This tool can be used to convert a Docker image to an executable that you can send to your friends.
- Build with a simple command: $ docker2exe --name alpine --image alpine:3.9
- Requires docker on the client device
- Probably doesn't map volumes/ports/etc, though could potentially be exposed in the dockerfile.

Extras
Brian:
- Back catalog of Test & Code is now on YouTube under @TestAndCodePodcast
- So far 106 of 234 episodes are up. The rest are going up according to daily limits.
- Ordering is rather chaotic, according to upload time, not release ordering.
- There will be a new episode this week: pytest-django with Adam Johnson

Joke: If programmers were doctors
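
The copy/paste/modify mistake mentioned under "Run coverage on tests" can be sketched with the standard library alone. This is an illustration of the failure mode that a redefinition check such as Ruff's F811 is meant to flag, not a reproduction of how Ruff implements it; `TEST_MODULE` and both helper functions are made up for the example.

```python
import ast
from collections import Counter

# A hypothetical test module where a test was pasted and its name never changed.
TEST_MODULE = '''
def test_add():
    assert 1 + 1 == 2

def test_sub():
    assert 3 - 1 == 2

def test_add():  # pasted copy, silently replaces the first test_add
    assert 2 + 2 == 4
'''


def redefined_functions(source):
    """Return names of top-level functions that are defined more than once."""
    tree = ast.parse(source)
    counts = Counter(
        node.name for node in tree.body if isinstance(node, ast.FunctionDef)
    )
    return sorted(name for name, n in counts.items() if n > 1)


def surviving_tests(source):
    """Execute the module and list the test functions that actually remain."""
    namespace = {}
    exec(source, namespace)
    return sorted(name for name in namespace if name.startswith("test_"))


duplicates = redefined_functions(TEST_MODULE)
survivors = surviving_tests(TEST_MODULE)
```

Three tests are written but only two names survive, so a test runner would collect two. Measuring coverage of the test files themselves would show the body of the first `test_add` as never executed, which is the signal the episode is pointing at.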

CaSE: Conversations about Software Engineering
Data Architecture with Christoph Windheuser

Jul 2, 2025 · 109:08 · Transcription Available


The three of us talk with Christoph Windheuser about the styles in data architecture (data mesh, data lake (house), and data warehouse) and how to choose between them. In between, Christoph explains data quality, data lineage, and data catalogs, the cornerstones of any modern approach. We end with emerging trends, DuckDB, and data governance.

The Data Stack Show
249: Quacking Through Data: Duckdb's Emerging Ecosystem

Jun 18, 2025 · 19:20


This week on The Data Stack Show, John Wessel and Matt Kelliher-Gibson dive into the recent Duck Lake announcement, exploring the evolving landscape of data analytics technologies. They discuss DuckDB's role as a lightweight, local analytics database and its potential as a caching layer for open table formats like Iceberg. The conversation also highlights the current state of data storage standards, focusing on agreements around Parquet and Iceberg, while noting the ongoing complexity in catalog management. Key takeaways include the importance of local compute solutions, the early stage of open table formats, and the potential for simplified data infrastructure that can provide faster, more cost-effective analytics workflows. The episode underscores the ongoing innovation in data technologies and the need for more streamlined, flexible data management solutions. Don't miss it!

Highlights from this week's conversation include:
- Discussion on Duck Lake Announcement (1:41)
- Compatibility with Apache Iceberg (4:05)
- Use Cases for DuckDB (6:23)
- Concerns About Data Management (10:01)
- Introduction to Data Formats (11:40)
- Catalog Space Challenges (13:13)
- Metadata Orchestration (14:54)
- Simplicity in Data Management (15:25)
- SQL Demo Discussion (17:26)
- Wrap-Up and Final Thoughts (18:44)

The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Product Guru's
Will AI agents put an end to devs? | Anderson Amaral - Co-Founder @Scoras

Jun 4, 2025 · 66:21


In this episode of Product Guru's, Paulo Chiodi talks with Anderson Amaral, one of the leading specialists in AI agents and LLMs in Brazil, according to the tech community on LinkedIn itself. Founder of Scoras Digital and Scoras Academy, Anderson shares his career path, explains the difference between LLMs, SLMs, and autonomous agents, and shows in practice how a multi-agent AI system works.

The conversation covers technical and strategic topics with both lightness and depth: from the impact of tools like Devin and Manus AI on the future of software development to the ethical and technical risks of using AI at scale. Anderson also gives valuable tips for anyone who wants to start learning AI hands-on, highlighting career opportunities, tools like LangGraph, and the challenges of the field in 2025. An essential episode for PMs, devs, and creators of digital products.

/// Where to find the guest:
Anderson Amaral | Co-Founder @ Scoras
https://www.linkedin.com/in/andersonlamaral

/// Important message:
The future of digital products has already begun, and Artificial Intelligence is part of the team. PM3 has just launched the Formação em Gestão de Produtos de IA (AI Product Management program): a course designed for Product Managers who want to create, delegate, and innovate with more intelligence. Far beyond prompts: you will learn to lead AI-based products, master topics such as Machine Learning, Deep Learning, and Generative AI, and apply new forms of discovery, experimentation, and validation. Prepare for the fastest-growing market in the world and become the PM who leads the transformation. Follow the link to learn more: https://go.pm3.com.br/ProductGurus-AI-Specialist

/// Other partners:
Codando sem Codar - The largest AI (Vibe) Coding community in Brazil: https://codandosemcodar.com.br/?utm_campaign=pg_podcast
Curling - From training to building AI solutions, we are there at every step. https://www.usecurling.com/

/// In this episode we cover:
- Combining multiple LLMs to solve tasks in a coordinated way.
- Scoras Academy has already trained almost 500 students in under 6 months.
- The difference between an LLM and an agent is that the agent acts in the world based on fixed instructions.
- Chinese models such as DeepSeek are more efficient and cheaper by design.
- LangGraph is a powerful tool for building custom multi-agent systems.
- Small models (SLMs) solve 80% of enterprise AI problems.
- The biggest challenge today is not technical but ethical and security-related.
- Introspective professionals without communication skills tend to be replaced by AI.
- AI agents have the potential to generate scams and deepfakes, more dangerous than NFTs.
- Start with simple real-world cases and then move on to solutions like LangGraph and DuckDB.

/// Chapters
00:00 Introduction and guest presentation
01:41 Origin of Scoras and the founding of the companies
04:30 Growth of Scoras Academy and the talent shortage in Brazil
07:59 Coupon and invitation to study at Scoras Academy
10:28 Difference between LLMs and autonomous agents
16:09 How AI model training works
19:29 Why are Chinese models cheaper?
23:08 DeepSeek and computational efficiency
29:08 Practical demonstration: how a multi-agent system works
39:10 Difference between LLM and SLM (Small Language Model)
42:08 Technical and ethical challenges of AI agents
48:12 Have AI agents become the new "NFT"?
54:28 The future of development with AI and tools like Devin
01:00:06 Practical advice for anyone who wants to learn AI
01:07:28 Closing and final invitation

/// Where to find Product Guru's:
WhatsApp: https://whatsapp.com/channel/0029Va7uwHS5fM5U0LIatu3X
X (formerly Twitter): https://twitter.com/product_gurus
LinkedIn: https://www.linkedin.com/company/product-guru-s/
Instagram: https://www.instagram.com/product.gurus/

The Joe Reis Show
Hamilton Ulmer - Instant SQL with DuckDB/MotherDuck - Practical Data Lunch and Learn

May 30, 2025 · 51:06


Imagine writing SQL and getting instant results as you type. Yes, this is reality now. It's amazing!

DuckDB/MotherDuck's Instant SQL made a big splash at last month's Data Council. Hamilton Ulmer gives a demo of Instant SQL at the Practical Data Community.

Instant SQL: https://motherduck.com/blog/introducing-instant-sql/
Practical Data Community Discord: https://discord.gg/gNfw5AKWSK

The Real Python Podcast
Exploring DuckDB & Comparing Python Expressions vs Statements

Apr 18, 2025 · 52:01


Are you looking for a fast database that can handle large datasets in Python? What's the difference between a Python expression and a statement? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.
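
As a rough illustration of the expression-versus-statement question raised above (not taken from the episode or the article it discusses): an expression is code that evaluates to a value, while a statement performs an action such as binding a name. The sketch below leans on the fact that `ast.parse(..., mode="eval")` accepts only a single expression; the `classify` helper is hypothetical.

```python
import ast


def classify(source):
    """Label a snippet 'expression' if it parses as one, otherwise 'statement'."""
    try:
        ast.parse(source, mode="eval")  # eval mode accepts a single expression only
        return "expression"
    except SyntaxError:
        pass
    ast.parse(source, mode="exec")  # still raises SyntaxError for invalid code
    return "statement"


examples = ["1 + 2", "x = 1 + 2", "print('hi')", "import os"]
labels = {src: classify(src) for src in examples}
```

Note that `print('hi')` is classified as an expression: a function call produces a value (here `None`), and Python lets any expression stand alone as a statement. The "statement" label in this sketch therefore means "a statement that is not also an expression".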

The Data Engineering Show
Beyond Database Optimization with AI

Mar 19, 2025 · 30:52


In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.

DataTalks.Club
Trends in Data Engineering – Adrian Brudaru

Mar 7, 2025 · 56:59


In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.

About the speaker:
Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted.

As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, DLT Hub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like DBT in the ecosystem

R Weekly Highlights
Issue 2025-W10 Highlights

Mar 7, 2025 · 41:45 · Transcription Available


A major milestone for leveraging LLMs in R just landed with the new ellmer package, along with a terrific showcase of retrieval-augmented generation combining ellmer and DuckDB. Plus an inspiring roundup of the recent Closeread contest winners.

Episode Links
- This week's curator: Sam Parmar - @parmsam@fosstodon.org (Mastodon) & @parmsam_ (X/Twitter)
- Announcing ellmer: A package for interacting with Large Language Models in R
- Rapid RAG Prototyping: Building a Retrieval Augmented Generation Prototype with ellmer and DuckDB
- Winners of the Closeread Prize – Data-Driven Scrollytelling with Quarto
- Entire issue available at rweekly.org/2025-W10

Supplement Resources
- Coder Radio episode 608 - R with Eric Nantz https://coder.show/608
- nhyris - The minimal framework for transform R shiny application into standalone

Supporting the show
- Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
- R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
- A new way to think about value: https://value4value.info
- Get in touch with us on social media
  - Eric Nantz: @rpodcast@podcastindex.social (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
  - Mike Thomas: @mike_thomas@fosstodon.org (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)

Music credits powered by OCRemix
- Watermelon Flava - Breath of Fire III - Joshua Morse, posu yan - https://ocremix.org/remix/OCR01411
- Stomp the Summer Sky - Secret of Mana - Ziwtra - https://ocremix.org/remix/OCR00859

Entre Dev y Ops Podcast
EDyO 96 - Fosdem 2025

Mar 5, 2025


In episode 96 of the Entre Dev y Ops podcast we talk about the twenty-fifth anniversary of FOSDEM.

Entre Dev y Ops blog - https://www.entredevyops.es
Entre Dev y Ops Telegram - https://t.me/entredevyops
Entre Dev y Ops Twitter - https://twitter.com/entredevyops
Entre Dev y Ops LinkedIn - https://www.linkedin.com/company/entredevyops/
Entre Dev y Ops Patreon - https://www.patreon.com/edyo
Entre Dev y Ops Amazon - https://amzn.to/2HrlmRw

Links discussed:
Fosdem 2025 - https://fosdem.org/2025/
Fosdem Treasure Hunt - https://fosdem.org/2025/news/2025-01-30-treasure-hunt/
Curl - https://curl.se/
Luanti (formerly Minetest) - https://www.luanti.org/
0 A.D. - https://play0ad.com/
The Battle for Wesnoth - https://www.wesnoth.org
JavaScript optimization talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4391-how-to-lose-weight-optimising-memory-usage-in-javascript-and-beyond/
DuckDB and graph queries talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4135-empowering-data-analytics-high-performance-graph-queries-in-duckdb-with-duckpgq/
Second brain talk - https://fosdem.org/2025/schedule/event/fosdem-2025-6542-building-your-local-llm-second-brain/
Hugging Face ecosystem talk - https://fosdem.org/2025/schedule/event/fosdem-2025-6341-hugging-face-ecosystem-for-local-ai-ml/
DuckDB - https://duckdb.org
DuckDB Con in Amsterdam - https://duckdb.org/events/2025/01/31/duckcon6/
Leslie Lamport talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4941-was-leslie-lamport-right-/
Paper on consistency - https://www.scs.stanford.edu/17au-cs244b/labs/projects/clow_jiang.pdf
immich - https://immich.app/
FuriLabs - https://furilabs.com/
TinyGo - https://tinygo.org
Gopher Badge - https://gopherbadge.com/
FastHTML - https://fastht.ml/
FastHTML context for LLMs - https://docs.fastht.ml/llms-ctx.txt
Xwiki - https://www.xwiki.org
THE PEN of discord - https://www.amazon.com/Tactical-Multi-Tool-Utility-Screwdriver-Touchscreen/dp/B0BGQXVCFD

AWS Bites
140. DuckDB Meets AWS: A Match Made in Cloud

Feb 21, 2025 · 17:38


In this episode, we explore DuckDB, an open-source analytical database known for its speed and simplicity. Discover how DuckDB stands out in various applications and compare it to other tools like SQLite, Athena, Pandas, and Polars. We also demonstrate integrating DuckDB with AWS Lambda and Step Functions for serverless analytics.

AWS Bites is brought to you by fourTheorem. If you are looking for a partner to architect, develop and modernise on AWS, give fourTheorem a call. Check out fourtheorem.com

In this episode, we mentioned the following resources:
- Our `duck-query-lambda`, a Lambda runtime for DuckDB queries: https://github.com/fourTheorem/duck-query-lambda
- DuckDB's official website: https://duckdb.org/
- LibSQL: https://github.com/tursodatabase/libsql

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on X/Twitter, BlueSky or LinkedIn:
- https://twitter.com/eoins | https://bsky.app/profile/eoin.sh | https://www.linkedin.com/in/eoins/
- https://twitter.com/loige | https://bsky.app/profile/loige.co | https://www.linkedin.com/in/lucianomammino/

Bigdata Hebdo
Episode 211 - Motherduck

Bigdata Hebdo

Play Episode Listen Later Jan 23, 2025 55:19


BigDataHebdo welcomes Mehdi, Developer Advocate at MotherDuck, to explore the world of DuckDB and MotherDuck. On the agenda: DuckDB's academic origins, its evolution into a high-performance analytical SQL engine, and its MotherDuck extension, which lets you use it as an online data warehouse. Show notes at http://bigdatahebdo.com/podcast/episode-211-motherduck/

The GeekNarrator
Power of #Duckdb with Postgres: pg_duckdb

The GeekNarrator

Play Episode Listen Later Jan 22, 2025 60:19


The GeekNarrator memberships can be joined here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join Membership will get you access to member-only videos, exclusive notes and a monthly 1:1 with me. Here you can see all the member-only videos: https://www.youtube.com/playlist?list=UUMO_mGuY4g0mggeUGM6V1osdA About this episode: Hey folks - In this episode we have Jelte with us, who is the main contributor to the pg_duckdb project, a Postgres extension that adds the power of #duckdb to our beloved #postgresql. We will try to understand how it works, why it is needed, and what the future of pg_duckdb looks like. If you love #Postgres or #Duckdb or just understanding #database internals, then this episode will give you solid insights into Postgres query processing, DuckDB analytics, the Postgres extension ecosystem and so on. Basics: pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications. Chapters: 00:00 Introduction to PG-DuckDB 03:40 Understanding the Integration of DuckDB with Postgres 06:23 Architecture of PG-DuckDB: Query Processing Explained 10:02 Configuring DuckDB for Analytics Queries 15:37 Managing Workloads: Transactional vs. Analytical 21:02 Observability and Debugging in DuckDB 25:58 Data Deletion and GDPR Compliance 30:46 Schema Management and Migration Challenges 33:14 Managing Schema Changes in Databases 35:21 Upgrading Database Extensions 36:33 Enhancing Data Reading Methods 38:33 Future Features and Improvements 45:54 Use Cases for PGDuckDB 50:03 Challenges in Building the Extension 55:25 Getting Involved with PGDuckDB Important links: The DuckDB Discord server, which has a pg_duckdb channel inside it: https://discord.duckdb.org/ repo: https://github.com/duckdb/pg_duckdb good-first-issue issues: https://github.com/duckdb/pg_duckdb/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22 Like building real stuff? Try out CodeCrafters and build amazing real-world systems like Redis, Kafka, SQLite. Use the link below to sign up and get 40% off a paid subscription. https://app.codecrafters.io/join?via=geeknarrator Link to other playlists. LIKE, SHARE and SUBSCRIBE If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet. Database internals series: https://youtu.be/yV_Zp0Mi3xs Popular playlists: Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA- Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17 Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN Stay Curious! Keep Learning! #sql #postgres #databasesystems

Infinite Machine Learning
Building MotherDuck to a $400M Company

Infinite Machine Learning

Play Episode Listen Later Jan 16, 2025 49:18 Transcription Available


Jordan Tigani is the cofounder and CEO of MotherDuck, a data warehouse platform based on open source database DuckDB. They've raised $100M in funding from amazing investors like Andreessen Horowitz, Felicis, Madrona, and Altimeter. He was previously the CPO at SingleStore and spent 11 years at Google before that. He has a degree in electrical engineering from Harvard. Jordan's favorite book: The Master and Margarita (Author: Mikhail Bulgakov) (00:01) Introduction (00:08) Founding of MotherDuck (01:12) The Philosophy of Shipping Products at MotherDuck (05:02) Founding Story and Identifying the Market Opportunity (10:57) Building the First Version and Overcoming Early Challenges (12:23) Validating Customer Needs and Asking the Right Questions (18:24) Deciding What Features to Prioritize and Exclude (21:30) Positioning a New Product in a Mature Market (27:36) Overcoming Challenges in Scaling MotherDuck (32:29) Measuring Success of New Features in Enterprise Products (36:20) Structuring the Organization for Effective Execution (41:09) Preparing MotherDuck for the AI Native Era (43:28) Rapid Fire Round Where to find Jordan Tigani: LinkedIn: https://www.linkedin.com/in/jordantigani/ Where to find Prateek Joshi: Newsletter: https://prateekjoshi.substack.com Website: https://prateekj.com LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19 Twitter: https://twitter.com/prateekvjoshi

Postgres FM
pg_duckdb

Postgres FM

Play Episode Listen Later Jan 3, 2025 40:08


Michael and Nikolay are joined by Joe Sciarrino and Jelte Fennema-Nio to discuss pg_duckdb: what it is, how it started, what early users are using it for, and what they're working on next. Here are some links to things they mentioned: Joe Sciarrino https://postgres.fm/people/joe-sciarrino Jelte Fennema-Nio https://postgres.fm/people/jelte-fennema-nio pg_duckdb https://github.com/duckdb/pg_duckdb Hydra https://www.hydra.so MotherDuck https://motherduck.com The problems and benefits of an elephant with a beak (lightning talk by Jelte) https://www.youtube.com/watch?v=ogvbKE4fw9A&list=PLF36ND7b_WU4QL6bA28NrzBOevqUYiPYq&t=1073s pg_duckdb announcement post (by Jordan and Brett from MotherDuck) https://motherduck.com/blog/pg_duckdb-postgresql-extension-for-duckdb-motherduck pg_duckdb 0.2 release https://github.com/duckdb/pg_duckdb/releases/tag/v0.2.0 What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc! Postgres FM is produced by: Michael Christofides, founder of pgMustard Nikolay Samokhvalov, founder of Postgres.ai With special thanks to: Jessie Draws for the elephant artwork

Talk Python To Me - Python conversations for passionate developers
#491: DuckDB and Python: Ducks and Snakes living together

Talk Python To Me - Python conversations for passionate developers

Play Episode Listen Later Dec 27, 2024 62:03 Transcription Available


Join me for an insightful conversation with Alex Monahan, who works on documentation, tutorials, and training at DuckDB Labs. We explore why DuckDB is gaining momentum among Python and data enthusiasts, from its in-process database design to its blazingly fast, columnar architecture. We also dive into indexing strategies, concurrency considerations, and the fascinating way MotherDuck (the cloud companion to DuckDB) handles large-scale data seamlessly. Don't miss this chance to learn how a single pip install could totally transform your Python data workflow! Episode sponsors Sentry Error Monitoring, Code TALKPYTHON Data Citizens Podcast Talk Python Courses Links from the show Alex on Mastodon: @__Alex__ DuckDB: duckdb.org MotherDuck: motherduck.com SQLite: sqlite.org Moka-Py: github.com PostgreSQL: www.postgresql.org MySQL: www.mysql.com Redis: redis.io Apache Parquet: parquet.apache.org Apache Arrow: arrow.apache.org Pandas: pandas.pydata.org Polars: pola.rs Pyodide: pyodide.org DB-API (PEP 249): peps.python.org/pep-0249 Flask: flask.palletsprojects.com Gunicorn: gunicorn.org MinIO: min.io Amazon S3: aws.amazon.com/s3 Azure Blob Storage: azure.microsoft.com/products/storage Google Cloud Storage: cloud.google.com/storage DigitalOcean: www.digitalocean.com Linode: www.linode.com Hetzner: www.hetzner.com BigQuery: cloud.google.com/bigquery DBT (Data Build Tool): docs.getdbt.com Mode: mode.com Hex: hex.tech Python: www.python.org Node.js: nodejs.org Rust: www.rust-lang.org Go: go.dev .NET: dotnet.microsoft.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to Talk Python on YouTube: youtube.com Talk Python on Bluesky: @talkpython.fm at bsky.app Talk Python on Mastodon: talkpython Michael on Bluesky: @mkennedy.codes at bsky.app Michael on Mastodon: mkennedy

The Joe Reis Show
Hannes Muhleisen - DuckDB Deep Dive, The Challenges of Lakehouses, and More

The Joe Reis Show

Play Episode Listen Later Dec 12, 2024 77:56


Hannes Muhleisen is the creator of DuckDB and CEO of DuckDB Labs. We finally got a chance to meet in person at the Forward Data Conference in Paris. We hit it off immediately, and at times, I felt like I was talking with my long lost brother. Hannes is a very cool guy! While at the conference, we recorded a chat about all things DuckDB, the challenges of data lakehouses and open table formats, local-first tech, and much more.

DOU Podcast
More layoffs at Reface | a new Bitcoin record | OpenAI's "advent calendar" - DOU News #176

DOU Podcast

Play Episode Listen Later Dec 8, 2024 26:59


Project Geospatial
FOSS4G - Blazing Fast Geospatial SQL in DuckDB - Isaac Brodsky

Project Geospatial

Play Episode Listen Later Oct 28, 2024 26:43


Isaac Brodsky discusses the integration of H3, an open-source hierarchical hexagonal grid system, with DuckDB, an analytical SQL database, to enhance geospatial data analysis. This combination enables efficient querying and manipulation of diverse datasets in real-time.

The Geospatial Index
Crunchy Data

The Geospatial Index

Play Episode Listen Later Oct 28, 2024 45:28


Elizabeth Christensen of Crunchy Data walks us through how to use open source tooling to avoid paying the Esri tax. It was a great tour of the options and also a nice vibe check of the industry. A headline here is she echoes former guest Stephanie May in endorsing DuckDB. She also wanted to pass on that a great way to find out more is to attend PostGIS Day! More here. On the topic of resources, Elizabeth has been very helpful and provided the following set of links: #PostgreSQL, open source relational database https://www.postgresql.org/ #PostGIS, open source GIS data store https://postgis.net/ #PostGIS Day 2024 https://www.crunchydata.com/community/events/postgis-day-2024 #Crunchy Data, Postgres and PostGIS services provider https://www.crunchydata.com/ #Open Source Geospatial Foundation https://www.osgeo.org/ #QGIS download, open source mapping https://www.qgis.org/ #Simple map SQL queries as QGIS layers https://www.crunchydata.com/blog/connecting-qgis-to-postgres-and-postgis #pg_tileserv - Tile server for PostGIS https://github.com/CrunchyData/pg_tileserv #pg_featureserv - API JSON server for PostGIS https://github.com/CrunchyData/pg_featureserv/ #OpenLayers project https://openlayers.org/ #OpenLayers + PgRouting + pg_tileserv + pg_featureserv sample code https://github.com/CrunchyData/pg_featureserv/tree/master/demo #PostGIS day videos https://www.youtube.com/@CrunchyDataPostgres #Crunchy Data's Postgres Playground https://www.crunchydata.com/developers/tutorials #Really cool open source GIS people to follow Paul Ramsey @ Crunchy Data / cleverelephant Regina Obe @ Paragon Ryan Lambert @ RustProofLabs Cliff Patterson @ Luna Geospatial Matt Forrest @ Whereabots #Elizabeth's crunchy blogs https://www.crunchydata.com/blog/author/elizabeth-christensen #Elizabeth's LinkedIn https://www.linkedin.com/in/elizabeth-garrett-christensen/ #Elizabeth's Twitter https://twitter.com/sqlliz THE GEOSPATIAL INDEX The Geospatial Index is a comprehensive listing of all publicly traded geospatial businesses worldwide. Why? The industry is growing at ~5% annually (after inflation and after adjusting for base rates). This rate varies significantly, however, by sub index. For $480,000 to start, this growth rate is $5,000,000 over a working life. This channel, Bluesky account, newsletter, watchlist and podcast express the view that you are serious about geospatial if you take the view of an investor, venture capitalist or entrepreneur. You are expected to do your own research. This is not a replacement for that. This is not investment advice. Consider it entertainment. NOT THE OPINION OF MY EMPLOYER NOT YOUR FIDUCIARY NOT INVESTMENT ADVICE Bluesky: https://bsky.app/profile/geospatialindex.bsky.social LinkedIn: https://uk.linkedin.com/in/geospatialindex Watchlist: https://www.tradingview.com/watchlists/123254792/ Newsletter: https://www.geospatial.money/ Podcast: https://open.spotify.com/show/5gpQUsaWxEBpYCnypEdHFC

Practical AI
Big data is dead, analytics is alive

Practical AI

Play Episode Listen Later Oct 24, 2024 50:21


We are on the other side of "big data" hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique, including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-SQL, vector search, and AI-driven SQL query correction.

The MAD Podcast with Matt Turck
The Death of Big Data and Why It's Time To Think Small | Jordan Tigani, CEO, MotherDuck

The MAD Podcast with Matt Turck

Play Episode Listen Later Oct 24, 2024 59:00


A founding engineer on Google BigQuery and now at the helm of MotherDuck, Jordan Tigani challenges the decade-long dominance of Big Data and introduces a compelling alternative that could change how companies handle data. Jordan discusses why Big Data technologies are an overkill for most companies, how MotherDuck and DuckDB offer fast analytical queries, and lessons learned as a technical founder building his first startup. Watch the episode with Tomasz Tunguz: https://youtu.be/gU6dGmZzmvI Website - https://motherduck.com Twitter - https://x.com/motherduck Jordan Tigani LinkedIn - https://www.linkedin.com/in/jordantigani Twitter - https://x.com/jrdntgn FIRSTMARK Website - https://firstmark.com Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ Twitter - https://twitter.com/mattturck (00:00) Intro (00:56) What is the Small Data? (06:56) Marketing strategy of MotherDuck (08:39) Processing Small Data with Big Data stack (15:30) DuckDB (17:21) Creation of DuckDB (18:48) Founding story of MotherDuck (24:08) MotherDuck's community (25:25) MotherDuck of today ($100M raised) (33:15) Why MotherDuck and DuckDB are so fast? (39:08) The limitations and the future of MotherDuck's platform (39:49) Small Models (42:37) Small Data and the Modern Data Stack (46:47) Making things simpler with a shift from Big Data to Small Data (50:04) Jordan Tigani's entrepreneurial journey (58:31) Outro

Changelog Master Feed
Big data is dead, analytics is alive (Practical AI #292)

Changelog Master Feed

Play Episode Listen Later Oct 24, 2024 50:21


We are on the other side of "big data" hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique, including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-SQL, vector search, and AI-driven SQL query correction.

R Weekly Highlights
Issue 2024-W43 Highlights

R Weekly Highlights

Play Episode Listen Later Oct 23, 2024 49:59 Transcription Available


Bringing tidy principles to a fundamental visualization for gene expressions, being on your best "behavior" for organizing your tests, and how data.table stacks up to DuckDB and polars for reshaping your data layouts. Episode Links: This week's curator: Jon Carroll - @jonocarroll@fosstodon.org (Mastodon) & @carroll_jono (X/Twitter) Exploring the tidyHeatmap R package Don't Expect That "Function Works Correctly", Do This Instead Comparing data.table reshape to duckdb and polars Entire issue available at rweekly.org/2024-W43 Supplement Resources: tidyHeatmap: Draw heatmap simply using a tidy data frame https://stemangiola.github.io/tidyHeatmap/ Novel App knock-in mouse model shows key features of amyloid pathology and reveals profound metabolic dysregulation of microglia https://molecularneurodegeneration.biomedcentral.com/articles/10.1186/s13024-022-00547-7 Shiny App-Packages chapter on writing tests and specifications https://mjfrigaard.github.io/shiny-app-pkgs/test_specs.html WANT CLEANER UNIT TESTS? TRY ARRANGE, ACT, ASSERT COMMENTS https://jakubsob.github.io/blog/want-cleaner-test-try-arrange-act-assert/ Super Data Science Podcast 827: Polars: Past, Present and Future, with Polars Creator Ritchie Vink https://www.superdatascience.com/podcast/827 duckplyr: A DuckDB-backed version for dplyr https://duckplyr.tidyverse.org/ Supporting the show: Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index. A new way to think about value: https://value4value.info Get in touch with us on social media: Eric Nantz: @rpodcast@podcastindex.social (Mastodon) and @theRcast (X/Twitter) Mike Thomas: @mike_thomas@fosstodon.org (Mastodon) and @mike_ketchbrook (X/Twitter) Music credits powered by OCRemix: Black Feathers in the Sky - Kid Icarus: Uprising - MkVaff - https://ocremix.org/remix/OCR04200 Cross-Examination - Phoenix Wright: Ace Attorney - PrototypeRaptor - https://ocremix.org/remix/OCR01846

What's New In Data
Small Data, Big Impact: Insights from MotherDuck's Jacob Matson

What's New In Data

Play Episode Listen Later Sep 19, 2024 41:35 Transcription Available


What makes MotherDuck and DuckDB a game-changer for data analytics? Join us as we sit down with Jacob Matson, a renowned expert in SQL Server, dbt, and Excel, who recently became a developer advocate at MotherDuck. During this episode, Jacob shares his compelling journey to MotherDuck, driven by his frequent use of DuckDB for solving data challenges. We explore the unique attributes of DuckDB, comparing it to SQLite for analytics, and uncover its architectural benefits, such as utilizing multi-core machines for parallel query execution. Jacob also sheds light on how MotherDuck is pushing the envelope with their innovative concept of multiplayer analytics. Our discussion takes a deep dive into MotherDuck's innovative tenancy model and how it impacts database workloads, highlighting the use of DuckDB format in Wasm for enhanced data visualization. Jacob explains how this approach offers significant compression and faster query performance, making data visualization more interactive. We also touch on the potential and limitations of replacing traditional BI tools with Mosaic, and where MotherDuck stands in the modern data stack landscape, especially for organizations that don't require the scale of BigQuery or Snowflake. Plus, get a sneak peek into the upcoming Small Data Conference in San Francisco on September 23rd, where we'll explore how small data solutions can address significant problems without relying on big data. Don't miss this episode packed with insights on DuckDB and MotherDuck innovations! Small Data SF Signup Discount Code: MATSON100 What's New In Data is a data thought leadership series hosted by John Kutay who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss latest trends, common patterns for real world data patterns, and analytics success stories.

ThoughtWorks Podcast
Exploring DuckDB: A relational database built for online analytical processing

ThoughtWorks Podcast

Play Episode Listen Later Sep 19, 2024 35:26


Like every other kind of technology, when it comes to databases there's no one-size-fits-all solution that's going to be the best thing for the job every time. That's what drives innovation and new solutions. It's ultimately also the story behind DuckDB, an open source relational database specifically designed for the demands of online analytical processing (OLAP), and particularly useful for data analysts, scientists and engineers. To get a deeper understanding of DuckDB and how the product has developed, on this episode of the Technology Podcast hosts Ken Mugrage and Lilly Ryan are joined by Thoughtworker Ned Letcher and Thoughtworks alumnus Simon Aubury. Ned and Simon explain the thinking behind DuckDB, the design decisions made by the project and how it's being used by data practitioners in the wild. Learn more about DuckDB: https://duckdb.org/why_duckdb.html

DevZen Podcast
The Welded Battery - Episode 475

DevZen Podcast

Play Episode Listen Later Sep 10, 2024 133:47


In this episode we share our weekly discoveries, discuss VPNs in Russia, compare Swift and Rust, and talk about DirectX 9, Windows 10, DuckDB 1.1.0 and retro gaming. [00:03:22] What we learned this week The first professional hosting of cloud VPS/VDS servers - VDSina Open Data Protocol - Wikipedia A $5 DIY welding inverter! https://www.amazon.co.uk/dp/B0C9WWCQ82/ref=emc_bcc_2_i?th=1 [00:03:39] A VPN that…

The Joe Reis Show
Jordan Tigani - Why Small Data is Awesome, DuckDB, and More

The Joe Reis Show

Play Episode Listen Later Sep 5, 2024 54:15


Jordan Tigani is back to chat about why small data is awesome, data lakehouses, DuckDB, AI, and much more. Motherduck: https://motherduck.com/ LinkedIn: https://www.linkedin.com/in/jordantigani/ Twitter: https://twitter.com/jrdntgn?lang=en

AI + a16z
AI, SQL, and the End of Big Data

AI + a16z

Play Episode Listen Later Aug 30, 2024 33:08


In this episode of AI + a16z, a16z General Partner Jennifer Li joins MotherDuck Cofounder and CEO Jordan Tigani to discuss DuckDB's spiking popularity as the era of big data wanes, as well as the applicability of SQL-based systems for AI workloads and the prospect of text-to-SQL for analyzing data. Here's an excerpt of Jordan discussing an early win when it comes to applying generative AI to data analysis: "Everybody forgets syntax for various SQL calls. And it's just like in coding. So there's some people that memorize . . . all of the code base, and so they don't need auto-complete. They don't need any copilot. . . . They don't need an IDE; they can just type in Notepad. But for the rest of us, I think these tools are super useful. And I think we have seen that these tools have already changed how people are interacting with their data, how they're writing their SQL queries. "One of the things that we've done . . . is we focused on improving the experience of writing queries. Something we found is actually really useful is when somebody runs a query and there's an error, we basically feed the line of the error into GPT-4 and ask it to fix it. And it turns out to be really good. ". . . It's a great way of letting you stay in the flow of writing your queries and having true interactivity." Learn more: Small Data SF conference DuckDB Follow everyone on X: Jordan Tigani Jennifer Li Derrick Harris Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
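The error-repair loop Jordan describes (run the query, and on failure hand the query plus the error to a model and retry) can be sketched roughly as below. This is only an illustration under stated assumptions, not MotherDuck's implementation: Python's built-in `sqlite3` stands in for DuckDB so the sketch runs without extra installs, and `run_with_repair` and `toy_fixer` are hypothetical names, with `toy_fixer` a hard-coded stand-in for the model call.

```python
import sqlite3


def run_with_repair(conn, sql, suggest_fix, max_attempts=2):
    """Run sql; on a database error, ask suggest_fix for a new query and retry."""
    for attempt in range(max_attempts):
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Out of attempts: surface the last error to the caller.
            if attempt == max_attempts - 1:
                raise
            # Hypothetical hook: a real system would send the query and
            # the error message to a language model here.
            sql = suggest_fix(sql, str(exc))


def toy_fixer(sql, error_message):
    # Stand-in for the model call (assumption): repairs one known typo.
    return sql.replace("SELEC ", "SELECT ", 1)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# The first attempt fails with a syntax error; the repaired query succeeds.
rows = run_with_repair(conn, "SELEC sum(x) FROM t", toy_fixer)
print(rows)  # [(6,)]
```

Capping the number of attempts keeps a bad suggestion from looping forever, and the original error still reaches the user when the repair does not help.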

Voice of the DBA
Trying New Technology

Voice of the DBA

Play Episode Listen Later Aug 29, 2024 4:13


I had someone ask me about DuckDB recently. Would I think that's a good choice for a database? I don't really know. From their blog and some online research, maybe, but it's also a minority player in a niche space. I had a chat recently with someone that had implemented ArangoDB, a graph database. Why that and not Neo4j, I asked them. Someone at the company had tried it and recommended it. Not a bad reason, as I think experience with tech is important, but it's not the only thing. Read the rest of Trying New Technology

Software Engineering Daily
DuckDB with Hannes Mühleisen

Software Engineering Daily

Play Episode Listen Later Aug 8, 2024


DuckDB is an open-source column-oriented relational database that was first released in 2019. It's designed to provide high performance on complex queries against large databases, and focuses on online analytical processing workloads. Hannes Mühleisen is the co-creator of DuckDB, and is the CEO and co-founder of DuckDB Labs. He joins the show to talk about the project.

Developer Voices
Practical Applications for DuckDB (with Simon Aubury & Ned Letcher)

Developer Voices

Play Episode Listen Later Jul 31, 2024 68:04


DuckDB's become a favourite data-handling tool of mine, simply because it does so many small things well. It can read and write a huge number of data formats; it can infer schemas automatically when you just want to move quickly; and it can interface with most languages, run like lightning on the desktop or be embedded into a webpage. I'm a huge fan. But I'm not nearly as knowledgeable as this week's two fans, Simon Aubury and Ned Letcher, who've just written a book on all the many ways you can use DuckDB and all the hidden tricks and tips that help you make the most of this. So in this episode we're taking a practical look at DuckDB, what problems it can solve at work, and how to start getting the most out of it. Getting Started with DuckDB (book): https://packt.link/byKYt DuckDB episode with Hannes Mühleisen: https://youtu.be/pZV9FvdKmLc DuckDB: https://duckdb.org/ dplyr, the data-manipulation language: https://dplyr.tidyverse.org/ duckplyr, DuckDB's 'native' version: https://github.com/duckdblabs/duckplyr Substrait: https://substrait.io/ Observable (Markdown+DuckDB=Reports): https://observablehq.com/framework/ DuckDB's "friendly" SQL: https://duckdb.org/docs/sql/dialect/friendly_sql.html Community Extensions: https://community-extensions.duckdb.org/ DuckCon #5: https://duckdb.org/2024/08/15/duckcon5.html Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join Simon on Twitter: https://x.com/SimonAubury Ned on Twitter: https://x.com/nletcher Kris on Mastodon: http://mastodon.social/@krisajenkins Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/ Kris on Twitter: https://twitter.com/krisajenkins

The Data Stack Show
194: Building Retail Churn Prediction on DuckDB with Clint Dunn of Wilde

The Data Stack Show

Play Episode Listen Later Jun 19, 2024 48:08


Highlights from this week's conversation include: Clint's Background and Journey in Data (0:51) Starting a Data Career (2:01) Transition to Startup SaaS World (4:27) Clint's Connection to a Federal Reserve Database (5:31) Challenges in Predictive Modeling (10:27) Data Input Challenges (15:50) Marketers' Workflow and Data Integration (18:29) Soft ROI vs. Hard ROI in Data Analysis (21:31) Balancing Internal Marketing and Data Team's Value (22:35) Simplifying Data Inputs for Predictive Models (25:09) Data Analysis Workflow and Tech Stack (29:06) Open Data Formats and Impact on Data Platforms (34:40) The S3 and Ecosystem Model (37:08) In-browser SQL Queries with DuckDB (39:24) Data Security Concerns and Solutions (41:47) Clean Rooms and Data Sharing (43:32) Final Thoughts and Takeaways (47:35) The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Python Bytes
#388 Don't delete all the repos

Python Bytes

Play Episode Listen Later Jun 18, 2024 21:59


Topics covered in this episode: PSF Elections coming up Cloud engineer gets 2 years for wiping ex-employer's code repos Python: Import by string with pkgutil.resolve_name() DuckDB goes 1.0 Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: PSF Elections coming up This is elections for the PSF Board and for 3 bylaw changes. To vote in the PSF election, you need to be a Supporting, Managing, Contributing, or Fellow member of the PSF, … And affirm your voting status by June 25. See Affirm your PSF Membership Voting Status for more details. Timeline Board Nominations open: Tuesday, June 11th, 2:00 pm UTC Board Nominations close: Tuesday, June 25th, 2:00 pm UTC Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC same date is also for voter affirmation. Announce candidates: Thursday, June 27th Voting start date: Tuesday, July 2nd, 2:00 pm UTC Voting end date: Tuesday, July 16th, 2:00 pm UTC See also Thinking about running for the Python Software Foundation Board of Directors? Let's talk! There's still one upcoming office hours session on June 18th, 12 PM UTC And For your consideration: Proposed bylaws changes to improve our membership experience 3 proposed bylaws changes Michael #2: Cloud engineer gets 2 years for wiping ex-employer's code repos Miklos Daniel Brody, a cloud engineer, was sentenced to two years in prison and a restitution of $529,000 for wiping the code repositories of his former employer in retaliation for being fired. 
The court documents state that Brody's employment was terminated after he violated company policies by connecting a USB drive. Brian #3: Python: Import by string with pkgutil.resolve_name() Adam Johnson You can use pkgutil.resolve_name("[HTML_REMOVED]:[HTML_REMOVED]") to import classes, functions or modules using strings. You can also use importlib.import_module("[HTML_REMOVED]") Both of these techniques are so that you have an object imported, but the end thing isn't imported into the local namespace. Michael #4: DuckDB goes 1.0 via Alex Monahan The cloud hosted product @MotherDuck also opened up General Availability Codenamed "Snow Duck" The core theme of the 1.0.0 release is stability. Extras Brian: Sending us topics. Please send before Tuesday. But any time is welcome. NumPy 2.0 htmx 2.0.0 Michael: Get 6 months of PyCharm Pro for free. Just take a course (even a free one) at Talk Python Training. Then visit your account page > details tab and have fun. Coming soon at Talk Python: Shiny for Python Joke: .gitignore thoughts won't let me sleep
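As a small illustration of the import-by-string item above, here is a minimal sketch using only the standard library. The targets `collections:OrderedDict` and `json` are arbitrary examples chosen for this sketch rather than taken from the show notes, and `pkgutil.resolve_name` needs Python 3.9 or later.

```python
import importlib
import pkgutil

# "module:object" form: the part before the colon is imported as a module,
# and the part after it is looked up as an attribute on that module.
ordered_dict_cls = pkgutil.resolve_name("collections:OrderedDict")

# importlib.import_module returns the module object itself.
json_module = importlib.import_module("json")

# Neither call binds the names `collections` or `json` in this namespace;
# only the variables assigned above exist.
print(ordered_dict_cls.__name__)     # OrderedDict
print(json_module.dumps({"a": 1}))   # {"a": 1}
```

This pattern tends to be useful for plugin systems or settings files where the thing to load is named in configuration rather than in code.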

Hacker News Recap
June 3rd, 2024 | I Am So Sick of Leetcode-Style Interviews

Hacker News Recap

Play Episode Listen Later Jun 4, 2024 17:07


This is a recap of the top 10 posts on Hacker News on June 3rd, 2024. This podcast was generated by wondercraft.ai
(00:30): How many photons are received per bit transmitted from Voyager 1? Original post: https://news.ycombinator.com/item?id=40561872&utm_source=wondercraft_ai
(02:36): Hacking millions of modems and investigating who hacked my modem. Original post: https://news.ycombinator.com/item?id=40570781&utm_source=wondercraft_ai
(04:15): I Am So Sick of Leetcode-Style Interviews. Original post: https://news.ycombinator.com/item?id=40571395&utm_source=wondercraft_ai
(05:39): Diffusion on syntax trees for program synthesis. Original post: https://news.ycombinator.com/item?id=40569531&utm_source=wondercraft_ai
(07:22): If English was written like Chinese (1999). Original post: https://news.ycombinator.com/item?id=40565060&utm_source=wondercraft_ai
(08:51): FBI Raids Big Corporate Landlord over Nationwide Rent Hikes. Original post: https://news.ycombinator.com/item?id=40562834&utm_source=wondercraft_ai
(10:33): Why YC went to DC. Original post: https://news.ycombinator.com/item?id=40564639&utm_source=wondercraft_ai
(12:08): DuckDB 1.0.0. Original post: https://news.ycombinator.com/item?id=40562342&utm_source=wondercraft_ai
(13:31): What if they gave an Industrial Revolution and nobody came? (2023). Original post: https://news.ycombinator.com/item?id=40562741&utm_source=wondercraft_ai
(15:14): Oldest largest German Minecraft server shut down and open sourced everything. Original post: https://news.ycombinator.com/item?id=40566533&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

The MAD Podcast with Matt Turck
AI, Data and Blockchain: a VC perspective | Tomasz Tunguz, Founder of Theory Ventures

The MAD Podcast with Matt Turck

Play Episode Listen Later May 16, 2024 54:54


In this episode, we sat down with Tomasz Tunguz (https://twitter.com/ttunguz), the founder of Theory Ventures and a leading voice in the tech investment space. We discussed the transformative potential of Ethereum as a database company, the importance of data security in a decentralized world, and the evolving landscape of AI technologies from foundational models to AI-native applications.

.NET Rocks!
Visually Debugging EF Queries with Giorgi Dalakishvili

.NET Rocks!

Play Episode Listen Later Apr 25, 2024 47:00


How do you debug your EF queries? Carl and Richard talk to Giorgi Dalakishvili about his open-source Visual Studio extension, EFCore Visualizer. Giorgi talks about bringing together the EF rendering of the query with the database query plan to ensure you retrieve data from your database as efficiently as possible. The conversation ranges over a number of tools Giorgi has built over the years, including EF Framework Exceptions, DuckDB.NET, and more!

The Changelog
Another one bites the dust

The Changelog

Play Episode Listen Later Mar 25, 2024 9:07


Redis' re-licensing prompts forks like Drew DeVault's Redict, Matthew Miller thinks we need more community built software, Paul Gross makes the case that DuckDB is the new jq, Anton Zhiyanov shares how he makes a living as a developer despite being “pretty dumb” & Baldur Bjarnason chimes in on the state of the web developer job market.

Talk Python To Me - Python conversations for passionate developers

Do you have data that you pull from external sources or that is generated and appears at your digital doorstep? I bet that data needs to be processed, filtered, transformed, distributed, and much more. One of the biggest tools to create these data pipelines with Python is Dagster. And we are fortunate to have Pedram Navid on the show this episode. Pedram is the Head of Data Engineering and DevRel at Dagster Labs. And we're talking data pipelines this week at Talk Python. Episode sponsors Talk Python Courses Posit Links from the show Rock Solid Python with Types Course: training.talkpython.fm Pedram on Twitter: twitter.com Pedram on LinkedIn: linkedin.com Ship data pipelines with extraordinary velocity: dagster.io dagster-open-platform: github.com The Dagster Master Plan: dagster.io data load tool (dlt): dlthub.com DataFrames for the new era: pola.rs Apache Arrow: arrow.apache.org DuckDB is a fast in-process analytical database: duckdb.org Ship trusted data products faster: www.getdbt.com Watch this episode on YouTube: youtube.com Episode transcripts: talkpython.fm --- Stay in touch with us --- Subscribe to us on YouTube: youtube.com Follow Talk Python on Mastodon: talkpython Follow Michael on Mastodon: mkennedy

Data Engineering Podcast
Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

Play Episode Listen Later Feb 25, 2024 56:00


Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. 
And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today! Your host is Tobias Macey and today I'm interviewing Paul Dix about his investment in the Apache Arrow ecosystem and how it led him to create the latest PFAD in database design Interview Introduction How did you get involved in the area of data management? Can you start by describing the FDAP stack and how the components combine to provide a foundational architecture for database engines? This was the core of your recent re-write of the InfluxDB engine. What were the design goals and constraints that led you to this architecture? Each of the architectural components are well engineered for their particular scope. What is the engineering work that is involved in building a cohesive platform from those components? 
One of the major benefits of using open source components is the network effect of ecosystem integrations. That can also be a risk when the community vision for the project doesn't align with your own goals. How have you worked to mitigate that risk in your specific platform? Can you describe the operational/architectural aspects of building a full data engine on top of the FDAP stack? What are the elements of the overall product/user experience that you had to build to create a cohesive platform? What are some of the other tools/technologies that can benefit from some or all of the pieces of the FDAP stack? What are the pieces of the Arrow ecosystem that are still immature or need further investment from the community? What are the most interesting, innovative, or unexpected ways that you have seen parts or all of the FDAP stack used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on/with the FDAP stack? When is the FDAP stack the wrong choice? What do you have planned for the future of the InfluxDB IOx engine and the FDAP stack? Contact Info LinkedIn (https://www.linkedin.com/in/pauldix/) pauldix (https://github.com/pauldix) on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! 
Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story. Links FDAP Stack Blog Post (https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/) Apache Arrow (https://arrow.apache.org/) DataFusion (https://arrow.apache.org/datafusion/) Arrow Flight (https://arrow.apache.org/docs/format/Flight.html) Apache Parquet (https://parquet.apache.org/) InfluxDB (https://www.influxdata.com/products/influxdb/) Influx Data (https://www.influxdata.com/) Podcast Episode (https://www.dataengineeringpodcast.com/influxdb-timeseries-data-platform-episode-199) Rust Language (https://www.rust-lang.org/) DuckDB (https://duckdb.org/) ClickHouse (https://clickhouse.com/) Voltron Data (https://voltrondata.com/) Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/) Velox (https://github.com/facebookincubator/velox) Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/) Trino (https://trino.io/) ODBC == Open DataBase Connectivity (https://en.wikipedia.org/wiki/Open_Database_Connectivity) GeoParquet (https://github.com/opengeospatial/geoparquet) ORC == Optimized Row Columnar (https://orc.apache.org/) Avro (https://avro.apache.org/) Protocol Buffers (https://protobuf.dev/) gRPC (https://grpc.io/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Data Engineering Podcast
Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

Play Episode Listen Later Feb 11, 2024 59:55


Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. 
Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Your host is Tobias Macey and today I'm interviewing Andy Jefferson about how to solve the problem of data sharing Interview Introduction How did you get involved in the area of data management? Can you start by giving some context and scope of what we mean by "data sharing" for the purposes of this conversation? What is the current state of the ecosystem for data sharing protocols/practices/platforms? What are some of the main challenges/shortcomings that teams/organizations experience with these options? What are the technical capabilities that need to be present for an effective data sharing solution? How does that change as a function of the type of data? (e.g. tabular, image, etc.) What are the requirements around governance and auditability of data access that need to be addressed when sharing data? What are the typical boundaries along which data access requires special consideration for how the sharing is managed? Many data platform vendors have their own interfaces for data sharing. What are the shortcomings of those options, and what are the opportunities for abstracting the sharing capability from the underlying platform? What are the most interesting, innovative, or unexpected ways that you have seen data sharing/Bobsled used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data sharing? When is Bobsled the wrong choice? What do you have planned for the future of data sharing? 
Contact Info LinkedIn (https://www.linkedin.com/in/andyjefferson/?originalSubdomain=de) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story. Links Bobsled (https://www.bobsled.co/) OLAP == OnLine Analytical Processing (https://en.wikipedia.org/wiki/Online_analytical_processing) Cassandra (https://cassandra.apache.org/_/index.html) Podcast Episode (https://www.dataengineeringpodcast.com/cassandra-global-scale-database-episode-220) Neo4J (https://neo4j.com/) FTP == File Transfer Protocol (https://en.wikipedia.org/wiki/File_Transfer_Protocol) S3 Access Points (https://aws.amazon.com/s3/features/access-points/) Snowflake Sharing (https://docs.snowflake.com/en/guides-overview-sharing) BigQuery Sharing (https://cloud.google.com/bigquery/docs/authorized-datasets) Databricks Delta Sharing (https://www.databricks.com/product/delta-sharing) DuckDB (https://duckdb.org/) Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Python Bytes
#367 A New Cloud Computing Paradigm at Python Bytes

Python Bytes

Play Episode Listen Later Jan 16, 2024 36:21


Topics covered in this episode: Leaving the cloud PEP 723 - Inline script metadata Flet for Android harlequin: The SQL IDE for Your Terminal. Extras Joke Watch on YouTube About the show Sponsored by Bright Data: pythonbytes.fm/brightdata Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too. Michael #1: Leaving the cloud Also see Five values guiding our cloud exit We value independence above all else. We serve the internet. We spend our money wisely. We lead the way. We seek adventure. And We stand to save $7m over five years from our cloud exit Slice our new monster 192-thread Dell R7625s into isolated VMs Which added a combined 4,000 vCPUs with 7,680 GB of RAM and 384TB of NVMe storage to our server capacity They created Kamal — Deploy web apps anywhere A lot of these ideas have changed how I run the infrastructure at Talk Python and for Python Bytes. Brian #2: PEP 723 - Inline script metadata Author: Ofek Lev This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts. Example:
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///
import requests
from rich.pretty import pprint
resp = requests.get("https://peps.python.org/api/peps.json")
data = resp.json()
pprint([(k, v["title"]) for k, v in data.items()][:10])
Michael #3: Flet for Android via Balázs Remember Flet? Here's a code sample (scroll down a bit). It's amazing but has been basically impossible to deploy. Now we have Android. Here's a good YouTube video showing the build process for APKs. Brian #4: harlequin: The SQL IDE for Your Terminal.
Ted Conbeer & other contributors Works with DuckDB and SQLite Speaking of SQLite Jeff Triplett and warnings of using Docker and SQLite in production Anže's post and an article: Django, SQLite, and the Database is Locked Error Extras Brian: Recent Python People episodes Will Vincent Julian Sequeira Pamela Fox Michael: PageFind and how I'm using it When "Everything" Becomes Too Much: The npm Package Chaos of 2024 Essay: Unsolicited Advice for Mozilla and Firefox SciPy 2024 is coming to Washington Joke: Careful with that bike lock combination code
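To make the PEP 723 item in the notes above more concrete, here is a minimal sketch of how a tool might pull the embedded metadata out of a script. It is a simplified illustration and not the PEP's reference implementation: the function name extract_script_metadata and the SCRIPT sample are hypothetical, and edge cases covered by the specification (such as multiple blocks) are ignored.

```python
from typing import Optional

# Hypothetical script text, modelled on the example in the episode notes.
SCRIPT = '''\
# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "requests<3",
#   "rich",
# ]
# ///
import requests
'''


def extract_script_metadata(source: str) -> Optional[str]:
    """Return the TOML text of the first '# /// script' block, or None."""
    lines = source.splitlines()
    try:
        start = lines.index("# /// script")
    except ValueError:
        return None  # no metadata block in this source
    content = []
    for line in lines[start + 1:]:
        if line == "# ///":
            # Closing marker found: the collected lines form the TOML body.
            return "\n".join(content)
        if line == "#":
            content.append("")  # a bare '#' stands for an empty TOML line
        elif line.startswith("# "):
            content.append(line[2:])  # strip the '# ' comment prefix
        else:
            return None  # block was not closed before a non-comment line
    return None


metadata = extract_script_metadata(SCRIPT)
print(metadata)
```

On Python 3.11 or later, the returned text could then be passed to tomllib.loads() to get a dictionary with the requires-python and dependencies keys.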