Podcasts about duckdb

57 PODCASTS
107 EPISODES
47m AVG DURATION
1 MONTHLY NEW EPISODE
Jul 21, 2025 LATEST

POPULARITY (chart by year, 2017 to 2024)

Best podcasts about duckdb

Data Engineering Podcast

9 episodes with duckdb

Python Bytes

3 episodes with duckdb

Screaming in the Cloud

3 episodes with duckdb

Bigdata Hebdo

6 episodes with duckdb

Talk Python To Me - Python conversations for passionate developers

2 episodes with duckdb

The Data Stack Show

3 episodes with duckdb

Software Engineering Daily

2 episodes with duckdb

What's New In Data

3 episodes with duckdb

Hacker News Recap

3 episodes with duckdb

Latest podcast episodes about duckdb

#441 It's Michaels All the Way Down

Python Bytes

Jul 21, 2025 · 27:48 · Transcription available

Topics covered in this episode: Distributed sqlite follow up: Turso and Litestream; PEP 792 – Project status markers in the simple index; Run coverage on tests; docker2exe: Convert a Docker image to an executable; Extras; Joke. Watch on YouTube.

About the show
Sponsored by Digital Ocean: pythonbytes.fm/digitalocean-gen-ai Use code DO4BYTES and get $200 in free credit

Connect with the hosts
Michael: @mkennedy@fosstodon.org / @mkennedy.codes (bsky)
Brian: @brianokken@fosstodon.org / @brianokken.bsky.social
Show: @pythonbytes@fosstodon.org / @pythonbytes.fm (bsky)

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Monday at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it.

Michael #1: Distributed sqlite follow up: Turso and Litestream
Michael Booth: Turso marries the familiarity and simplicity of SQLite with modern, scalable, and distributed features. Seems to me that Turso is to SQLite what MotherDuck is to DuckDB.
Mike Fiedler: Continue to use the SQLite you love and care about (even the one inside Python runtime) and launch a daemon that watches the db for changes and replicates changes to an S3-type object store.
Deeper dive: Litestream: Revamped

Brian #2: PEP 792 – Project status markers in the simple index
Currently 3 status markers for packages: Trove Classifier status; Indices can be yanked; PyPI projects - admins can quarantine a project, owners can archive a project.
Proposal is to have something that can have only one state: active, archived, quarantined, deprecated.
This has been Approved, but not Implemented yet.

Brian #3: Run coverage on tests
Hugo van Kemenade
And apparently, run Ruff with at least F811 turned on.
Helps with copy/paste/modify mistakes, but also subtler bugs like consumed generators being reused.

Michael #4: docker2exe: Convert a Docker image to an executable
This tool can be used to convert a Docker image to an executable that you can send to your friends.
Build with a simple command: $ docker2exe --name alpine --image alpine:3.9
Requires docker on the client device.
Probably doesn't map volumes/ports/etc, though could potentially be exposed in the dockerfile.

Extras
Brian: Back catalog of Test & Code is now on YouTube under @TestAndCodePodcast. So far 106 of 234 episodes are up. The rest are going up according to daily limits. Ordering is rather chaotic, according to upload time, not release ordering. There will be a new episode this week: pytest-django with Adam Johnson.

Joke: If programmers were doctors
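
The copy/paste/modify mistake from the "Run coverage on tests" item can be illustrated with a small sketch. This is not code from the show: the `SOURCE` snippet and the `redefined_functions` helper are hypothetical, and the check only covers top-level functions, a narrow slice of what Ruff's F811 rule reports. It shows why a pasted test that was never renamed silently replaces the original, so the first test body never runs (which is also what a coverage report on the test file would reveal).

```python
import ast

# Hypothetical test module: the second test was copy/pasted and edited,
# but the author forgot to rename it.
SOURCE = '''
def test_addition():
    assert 1 + 1 == 2

def test_addition():
    assert 2 + 2 == 4
'''


def redefined_functions(source: str) -> list[str]:
    """Return names of top-level functions that are defined more than once."""
    seen: set[str] = set()
    duplicates: list[str] = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            if node.name in seen and node.name not in duplicates:
                duplicates.append(node.name)
            seen.add(node.name)
    return duplicates


# Static view: which names are defined twice?
duplicates = redefined_functions(SOURCE)
print("redefined:", duplicates)

# Runtime view: after executing the module, only one function object
# survives under that name, so a test runner can only collect one test.
namespace: dict = {}
exec(SOURCE, namespace)
surviving_tests = [name for name in namespace if name.startswith("test_")]
print("collected:", surviving_tests)
```

Two definitions go in, one test comes out; the first body is dead code that a coverage report would mark as never executed.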

249: Quacking Through Data: Duckdb's Emerging Ecosystem

The Data Stack Show

Jun 18, 2025 · 19:20

This week on The Data Stack Show, John Wessel and Matt Kelliher-Gibson dive into the recent Duck Lake announcement, exploring the evolving landscape of data analytics technologies. They discuss DuckDB's role as a lightweight, local analytics database and its potential as a caching layer for open table formats like Iceberg. The conversation also highlights the current state of data storage standards, focusing on agreements around Parquet and Iceberg, while noting the ongoing complexity in catalog management. Key takeaways include the importance of local compute solutions, the early stage of open table formats, and the potential for simplified data infrastructure that can provide faster, more cost-effective analytics workflows. The episode underscores the ongoing innovation in data technologies and the need for more streamlined, flexible data management solutions. Don't miss it!

Highlights from this week's conversation include:
Discussion on Duck Lake Announcement (1:41)
Compatibility with Apache Iceberg (4:05)
Use Cases for DuckDB (6:23)
Concerns About Data Management (10:01)
Introduction to Data Formats (11:40)
Catalog Space Challenges (13:13)
Metadata Orchestration (14:54)
Simplicity in Data Management (15:25)
SQL Demo Discussion (17:26)
Wrap-Up and Final Thoughts (18:44)

The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Will AI agents put an end to devs? | Anderson Amaral - Co-Founder @Scoras

Product Guru's

Jun 4, 2025 · 66:21

In this episode of Product Guru's, Paulo Chiodi talks with Anderson Amaral, one of the leading specialists in AI agents and LLMs in Brazil, according to the tech community on LinkedIn itself. Founder of Scoras Digital and Scoras Academy, Anderson shares his career path, explains the difference between LLMs, SLMs, and autonomous agents, and shows in practice how a multi-agent AI system works.

The conversation covers technical and strategic topics with both lightness and depth: from the impact of tools like Devin and Manus AI on the future of software development to the ethical and technical risks of using AI at scale. Anderson also gives valuable tips for anyone who wants to start learning AI hands-on, highlighting career opportunities, tools like LangGraph, and the challenges of the field in 2025. An essential episode for PMs, devs, and creators of digital products.

Where to find the guest:
Anderson Amaral | Co-Founder @ Scoras https://www.linkedin.com/in/andersonlamaral

Important message:
The future of digital products has already begun, and Artificial Intelligence is part of the team. PM3 has just launched its AI Product Management program: a course designed for Product Managers who want to create, delegate, and innovate with more intelligence. Far beyond prompts: you will learn to lead AI-based products, master topics such as Machine Learning, Deep Learning, and Generative AI, and apply new forms of discovery, experimentation, and validation. Get ready for the fastest-growing market in the world and become the PM who leads the transformation. Follow the link to learn more: https://go.pm3.com.br/ProductGurus-AI-Specialist

Other partners:
Codando sem Codar - The largest AI (Vibe) Coding community in Brazil: https://codandosemcodar.com.br/?utm_campaign=pg_podcast
Curling - From training to building AI solutions, we are there at every step. https://www.usecurling.com/

In this episode we cover:
Combining multiple LLMs to solve tasks in a coordinated way.
Scoras Academy has already trained almost 500 students in less than 6 months.
The difference between an LLM and an agent is that the agent acts in the world based on fixed instructions.
Chinese models such as DeepSeek are more efficient and cheaper by design.
LangGraph is a powerful tool for building custom multi-agent systems.
Small models (SLMs) solve 80% of enterprise AI problems.
The biggest challenge today is not technical, but ethical and security-related in the use of AI.
Introspective professionals without communication skills tend to be replaced by AI.
AI agents have the potential to generate scams and deepfakes, which are more dangerous than NFTs.
Start with simple real-world cases and then move on to solutions like LangGraph and DuckDB.

Chapters
00:00 Introduction and presentation of the guest
01:41 Origin of Scoras and the creation of the companies
04:30 Growth of Scoras Academy and the talent shortage in Brazil
07:59 Coupon and invitation to study at Scoras Academy
10:28 Difference between LLMs and autonomous agents
16:09 How AI model training works
19:29 Why are Chinese models cheaper?
23:08 DeepSeek and computational efficiency
29:08 Practical demonstration: how a multi-agent system works
39:10 Difference between LLM and SLM (Small Language Model)
42:08 Technical and ethical challenges of AI agents
48:12 Have AI agents become the new "NFT"?
54:28 The future of development with AI and tools like Devin
01:00:06 Practical advice for those who want to learn AI
01:07:28 Closing and final invitation

Where to find Product Guru's:
WhatsApp: https://whatsapp.com/channel/0029Va7uwHS5fM5U0LIatu3X
X (formerly Twitter): https://twitter.com/product_gurus
LinkedIn: https://www.linkedin.com/company/product-guru-s/
Instagram: https://www.instagram.com/product.gurus/

Hamilton Ulmer - Instant SQL with DuckDB/MotherDuck - Practical Data Lunch and Learn

The Joe Reis Show

May 30, 2025 · 51:06

Imagine writing SQL and getting instant results as you type? Yes, this is reality now. It's amazing!

DuckDB/MotherDuck's Instant SQL made a big splash at last month's Data Council. Hamilton Ulmer gives a demo of Instant SQL at the Practical Data Community.

Instant SQL: https://motherduck.com/blog/introducing-instant-sql/
Practical Data Community Discord: https://discord.gg/gNfw5AKWSK


Exploring DuckDB & Comparing Python Expressions vs Statements

The Real Python Podcast

Apr 18, 2025 · 52:01

Are you looking for a fast database that can handle large datasets in Python? What's the difference between a Python expression and a statement? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.


Go makes everything faster. Even ducks!

Cup o' Go

Mar 24, 2025 · 39:56 · Transcription available

Updates on old news:


Beyond Database Optimization with AI

The Data Engineering Show

Mar 19, 2025 · 30:52

In this episode of The Data Engineering Show, the bros welcome Hannes Mühleisen, CEO of DuckDB Labs and co-creator of DuckDB. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works, and the emerging solutions for database technology in the age of AI.


Trends in Data Engineering – Adrian Brudaru

DataTalks.Club

Mar 7, 2025 · 56:59

In this podcast episode, we talked with Adrian Brudaru about the past, present and future of data engineering.

About the speaker:
Adrian Brudaru studied economics in Romania but soon got bored with how creative the industry was, and chose to go instead for the more factual side. He ended up in Berlin at the age of 25 and started a role as a business analyst. At the age of 30, he had enough of startups and decided to join a corporation, but quickly found out that it did not provide the challenge he wanted.

As going back to startups was not a desirable option either, he decided to postpone his decision by taking freelance work and has never looked back since. Five years later, he co-founded a company in the data space to try new things. This company is also looking to release open source tools to help democratize data engineering.

0:00 Introduction to DataTalks.Club
1:05 Discussing trends in data engineering with Adrian
2:03 Adrian's background and journey into data engineering
5:04 Growth and updates on Adrian's company, DLT Hub
9:05 Challenges and specialization in data engineering today
13:00 Opportunities for data engineers entering the field
15:00 The "Modern Data Stack" and its evolution
17:25 Emerging trends: AI integration and Iceberg technology
27:40 DuckDB and the emergence of portable, cost-effective data stacks
32:14 The rise and impact of dbt in data engineering
34:08 Alternatives to dbt: SQLMesh and others
35:25 Workflow orchestration tools: Airflow, Dagster, Prefect, and GitHub Actions
37:20 Audience questions: Career focus in data roles and AI engineering overlaps
39:00 The role of semantics in data and AI workflows
41:11 Focusing on learning concepts over tools when entering the field
45:15 Transitioning from backend to data engineering: challenges and opportunities
47:48 Current state of the data engineering job market in Europe and beyond
49:05 Introduction to Apache Iceberg, Delta, and Hudi file formats
50:40 Suitability of these formats for batch and streaming workloads
52:29 Tools for streaming: Kafka, SQS, and related trends
58:07 Building AI agents and enabling intelligent data applications
59:09 Closing discussion on the place of tools like DBT in the ecosystem

Issue 2025-W10 Highlights

R Weekly Highlights

Mar 7, 2025 · 41:45 · Transcription available

A major milestone for leveraging LLMs in R just landed with the new ellmer package, along with a terrific showcase of retrieval-augmented generation combining ellmer and DuckDB. Plus an inspiring roundup of the recent Closeread contest winners.

Episode Links
This week's curator: Sam Parmar - @parmsam@fosstodon.org (Mastodon) & @parmsam_ (X/Twitter)
Announcing ellmer: A package for interacting with Large Language Models in R
Rapid RAG Prototyping: Building a Retrieval Augmented Generation Prototype with ellmer and DuckDB
Winners of the Closeread Prize – Data-Driven Scrollytelling with Quarto
Entire issue available at rweekly.org/2025-W10

Supplement Resources
Coder Radio episode 608 - R with Eric Nantz https://coder.show/608
nhyris - The minimal framework for transform R shiny application into standalone

Supporting the show
Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback
R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index.
A new way to think about value: https://value4value.info

Get in touch with us on social media
Eric Nantz: @rpodcast@podcastindex.social (Mastodon), @rpodcast.bsky.social (BlueSky) and @theRcast (X/Twitter)
Mike Thomas: @mike_thomas@fosstodon.org (Mastodon), @mike-thomas.bsky.social (BlueSky), and @mike_ketchbrook (X/Twitter)

Music credits powered by OCRemix
Watermelon Flava - Breath of Fire III - Joshua Morse, posu yan - https://ocremix.org/remix/OCR01411
Stomp the Summer Sky - Secret of Mana - Ziwtra - https://ocremix.org/remix/OCR00859


EDyO 96 - Fosdem 2025

Entre Dev y Ops Podcast

Mar 5, 2025

In episode 96 of the Entre Dev y Ops podcast we talk about the twenty-fifth anniversary of FOSDEM.

Blog Entre Dev y Ops - https://www.entredevyops.es
Telegram Entre Dev y Ops - https://t.me/entredevyops
Twitter Entre Dev y Ops - https://twitter.com/entredevyops
LinkedIn Entre Dev y Ops - https://www.linkedin.com/company/entredevyops/
Patreon Entre Dev y Ops - https://www.patreon.com/edyo
Amazon Entre Dev y Ops - https://amzn.to/2HrlmRw

Links discussed:
Fosdem 2025 - https://fosdem.org/2025/
Fosdem Treasure Hunt - https://fosdem.org/2025/news/2025-01-30-treasure-hunt/
Curl - https://curl.se/
Luanti (formerly Minetest) - https://www.luanti.org/
0 A.D. - https://play0ad.com/
The Battle for Wesnoth - https://www.wesnoth.org
JavaScript optimization talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4391-how-to-lose-weight-optimising-memory-usage-in-javascript-and-beyond/
DuckDB and graph queries talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4135-empowering-data-analytics-high-performance-graph-queries-in-duckdb-with-duckpgq/
Second brain talk - https://fosdem.org/2025/schedule/event/fosdem-2025-6542-building-your-local-llm-second-brain/
Hugging Face ecosystem talk - https://fosdem.org/2025/schedule/event/fosdem-2025-6341-hugging-face-ecosystem-for-local-ai-ml/
DuckDB - https://duckdb.org
DuckDB Con in Amsterdam - https://duckdb.org/events/2025/01/31/duckcon6/
Leslie Lamport talk - https://fosdem.org/2025/schedule/event/fosdem-2025-4941-was-leslie-lamport-right-/
Paper on consistency - https://www.scs.stanford.edu/17au-cs244b/labs/projects/clow_jiang.pdf
immich - https://immich.app/
FuriLabs - https://furilabs.com/
TinyGo - https://tinygo.org
Gopher Badge - https://gopherbadge.com/
FastHTML - https://fastht.ml/
FastHTML context for LLMs - https://docs.fastht.ml/llms-ctx.txt
Xwiki - https://www.xwiki.org
THE PEN of discord - https://www.amazon.com/Tactical-Multi-Tool-Utility-Screwdriver-Touchscreen/dp/B0BGQXVCFD


140. DuckDB Meets AWS: A Match Made in Cloud

AWS Bites

Feb 21, 2025 · 17:38

In this episode, we explore DuckDB, an open-source analytical database known for its speed and simplicity. Discover how DuckDB stands out in various applications and compare it to other tools like SQLite, Athena, Pandas, and Polars. We also demonstrate integrating DuckDB with AWS Lambda and Step Functions for serverless analytics.

AWS Bites is brought to you by fourTheorem. If you are looking for a partner to architect, develop and modernise on AWS, give fourTheorem a call. Check out fourtheorem.com

In this episode, we mentioned the following resources:
Our `duck-query-lambda`, a Lambda runtime for DuckDB queries: https://github.com/fourTheorem/duck-query-lambda
DuckDB's official website: https://duckdb.org/
LibSQL: https://github.com/tursodatabase/libsql

Do you have any AWS questions you would like us to address?
Leave a comment here or connect with us on X/Twitter, BlueSky or LinkedIn:
- https://twitter.com/eoins | https://bsky.app/profile/eoin.sh | https://www.linkedin.com/in/eoins/
- https://twitter.com/loige | https://bsky.app/profile/loige.co | https://www.linkedin.com/in/lucianomammino/


Episode 211 - Motherduck

Bigdata Hebdo

Jan 23, 2025 · 55:19

BigDataHebdo welcomes Mehdi, Developer Advocate at MotherDuck, to explore the world of DuckDB and MotherDuck. On the agenda: DuckDB's academic origins, its evolution into a high-performance analytical SQL engine, and its extension MotherDuck, which lets you use it as an online data warehouse.

Show notes at http://bigdatahebdo.com/podcast/episode-211-motherduck/


Power of #Duckdb with Postgres: pg_duckdb

The GeekNarrator

Jan 22, 2025 · 60:19

The GeekNarrator memberships can be joined here: https://www.youtube.com/channel/UC_mGuY4g0mggeUGM6V1osdA/join
Membership will get you access to member-only videos, exclusive notes and a monthly 1:1 with me. Here you can see all the member-only videos: https://www.youtube.com/playlist?list=UUMO_mGuY4g0mggeUGM6V1osdA

About this episode:
Hey folks - In this episode we have Jelte with us, who is the main contributor to the pg_duckdb project, a Postgres extension that adds the power of #duckdb to our beloved #postgresql. We will try to understand how it works, why it is needed, and what the future of pg_duckdb looks like. If you love #Postgres or #Duckdb or just understanding #database internals, then this episode will give you pretty solid insights into Postgres query processing, DuckDB analytics, the Postgres extension ecosystem and so on.

Basics:
pg_duckdb is a Postgres extension that embeds DuckDB's columnar-vectorized analytics engine and features into Postgres. We recommend using pg_duckdb to build high performance analytics and data-intensive applications.

Chapters:
00:00 Introduction to PG-DuckDB
03:40 Understanding the Integration of DuckDB with Postgres
06:23 Architecture of PG-DuckDB: Query Processing Explained
10:02 Configuring DuckDB for Analytics Queries
15:37 Managing Workloads: Transactional vs. Analytical
21:02 Observability and Debugging in DuckDB
25:58 Data Deletion and GDPR Compliance
30:46 Schema Management and Migration Challenges
33:14 Managing Schema Changes in Databases
35:21 Upgrading Database Extensions
36:33 Enhancing Data Reading Methods
38:33 Future Features and Improvements
45:54 Use Cases for PGDuckDB
50:03 Challenges in Building the Extension
55:25 Getting Involved with PGDuckDB

Important links:
The duckdb discord server, which has a pg_duckdb channel inside it: https://discord.duckdb.org/
repo: https://github.com/duckdb/pg_duckdb
good-first-issue issues: https://github.com/duckdb/pg_duckdb/issues?q=sort%3Aupdated-desc+is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22

Like building real stuff?
Try out CodeCrafters and build amazing real world systems like Redis, Kafka, Sqlite. Use the link below to sign up and get 40% off a paid subscription. https://app.codecrafters.io/join?via=geeknarrator

Link to other playlists. LIKE, SHARE and SUBSCRIBE
If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.
Database internals series: https://youtu.be/yV_Zp0Mi3xs
Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning! #sql #postgres #databasesystems


Building MotherDuck to a $400M Company

Infinite Machine Learning

Jan 16, 2025 · 49:18 · Transcription available

Jordan Tigani is the cofounder and CEO of MotherDuck, a data warehouse platform based on the open source database DuckDB. They've raised $100M in funding from amazing investors like Andreessen Horowitz, Felicis, Madrona, and Altimeter. He was previously the CPO at SingleStore and spent 11 years at Google before that. He has a degree in electrical engineering from Harvard.

Jordan's favorite book: The Master and Margarita (Author: Mikhail Bulgakov)

(00:01) Introduction
(00:08) Founding of MotherDuck
(01:12) The Philosophy of Shipping Products at MotherDuck
(05:02) Founding Story and Identifying the Market Opportunity
(10:57) Building the First Version and Overcoming Early Challenges
(12:23) Validating Customer Needs and Asking the Right Questions
(18:24) Deciding What Features to Prioritize and Exclude
(21:30) Positioning a New Product in a Mature Market
(27:36) Overcoming Challenges in Scaling MotherDuck
(32:29) Measuring Success of New Features in Enterprise Products
(36:20) Structuring the Organization for Effective Execution
(41:09) Preparing MotherDuck for the AI Native Era
(43:28) Rapid Fire Round

Where to find Jordan Tigani:
LinkedIn: https://www.linkedin.com/in/jordantigani/

Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi

pg_duckdb

Postgres FM

Jan 3, 2025 · 40:08

Michael and Nikolay are joined by Joe Sciarrino and Jelte Fennema-Nio to discuss pg_duckdb: what it is, how it started, what early users are using it for, and what they're working on next.

Here are some links to things they mentioned:
Joe Sciarrino https://postgres.fm/people/joe-sciarrino
Jelte Fennema-Nio https://postgres.fm/people/jelte-fennema-nio
pg_duckdb https://github.com/duckdb/pg_duckdb
Hydra https://www.hydra.so
MotherDuck https://motherduck.com
The problems and benefits of an elephant with a beak (lightning talk by Jelte) https://www.youtube.com/watch?v=ogvbKE4fw9A&list=PLF36ND7b_WU4QL6bA28NrzBOevqUYiPYq&t=1073s
pg_duckdb announcement post (by Jordan and Brett from MotherDuck) https://motherduck.com/blog/pg_duckdb-postgresql-extension-for-duckdb-motherduck
pg_duckdb 0.2 release https://github.com/duckdb/pg_duckdb/releases/tag/v0.2.0

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

Postgres FM is produced by:
Michael Christofides, founder of pgMustard
Nikolay Samokhvalov, founder of Postgres.ai

With special thanks to:
Jessie Draws for the elephant artwork


#491: DuckDB and Python: Ducks and Snakes living together

Talk Python To Me - Python conversations for passionate developers

Dec 27, 2024 · 62:03 · Transcription available

Join me for an insightful conversation with Alex Monahan, who works on documentation, tutorials, and training at DuckDB Labs. We explore why DuckDB is gaining momentum among Python and data enthusiasts, from its in-process database design to its blazingly fast, columnar architecture. We also dive into indexing strategies, concurrency considerations, and the fascinating way MotherDuck (the cloud companion to DuckDB) handles large-scale data seamlessly. Don't miss this chance to learn how a single pip install could totally transform your Python data workflow!

Episode sponsors
Sentry Error Monitoring, Code TALKPYTHON
Data Citizens Podcast
Talk Python Courses

Links from the show
Alex on Mastodon: @__Alex__
DuckDB: duckdb.org
MotherDuck: motherduck.com
SQLite: sqlite.org
Moka-Py: github.com
PostgreSQL: www.postgresql.org
MySQL: www.mysql.com
Redis: redis.io
Apache Parquet: parquet.apache.org
Apache Arrow: arrow.apache.org
Pandas: pandas.pydata.org
Polars: pola.rs
Pyodide: pyodide.org
DB-API (PEP 249): peps.python.org/pep-0249
Flask: flask.palletsprojects.com
Gunicorn: gunicorn.org
MinIO: min.io
Amazon S3: aws.amazon.com/s3
Azure Blob Storage: azure.microsoft.com/products/storage
Google Cloud Storage: cloud.google.com/storage
DigitalOcean: www.digitalocean.com
Linode: www.linode.com
Hetzner: www.hetzner.com
BigQuery: cloud.google.com/bigquery
DBT (Data Build Tool): docs.getdbt.com
Mode: mode.com
Hex: hex.tech
Python: www.python.org
Node.js: nodejs.org
Rust: www.rust-lang.org
Go: go.dev
.NET: dotnet.microsoft.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to Talk Python on YouTube: youtube.com
Talk Python on Bluesky: @talkpython.fm at bsky.app
Talk Python on Mastodon: talkpython
Michael on Bluesky: @mkennedy.codes at bsky.app
Michael on Mastodon: mkennedy

Hannes Muhleisen - DuckDB Deep Dive, The Challenges of Lakehouses, and More

The Joe Reis Show

Dec 12, 2024 · 77:56

Hannes Muhleisen is the creator of DuckDB and CEO of DuckDB Labs. We finally got a chance to meet in person at the Forward Data Conference in Paris. We hit it off immediately, and at times, I felt like I was talking with my long lost brother. Hannes is a very cool guy! While at the conference, we recorded a chat about all things DuckDB, the challenges of data lakehouses and open table formats, local-first tech, and much more.


More layoffs at Reface | new Bitcoin record | OpenAI's "advent calendar" - DOU News #176

DOU Podcast

Dec 8, 2024 · 26:59


FOSS4G - Blazing Fast Geospatial SQL in DuckDB - Isaac Brodsky

Project Geospatial

Oct 28, 2024 · 26:43

Isaac Brodsky discusses the integration of H3, an open-source hierarchical hexagonal grid system, with DuckDB, an analytical SQL database, to enhance geospatial data analysis. This combination enables efficient querying and manipulation of diverse datasets in real-time.


Crunchy Data

The Geospatial Index

Oct 28, 2024 · 45:28

Elizabeth Christensen of Crunchy Data walks us through how to use open source tooling to avoid paying the Esri tax. It was a great tour of the options and also a nice vibe check of the industry. A headline here is she echoes former guest Stephanie May in endorsing DuckDB. She also wanted to pass on that a great way to find out more is to attend PostGIS Day! More here. On the topic of resources, Elizabeth has been very helpful and provided the following set of links: # PostgreSQL, open source relational database https://www.postgresql.org/ # PostGIS, open source GIS data store https://postgis.net/ # PostGIS Day 2024 https://www.crunchydata.com/community/events/postgis-day-2024 # Crunchy Data, Postgres and PostGIS services provider https://www.crunchydata.com/ # Open Source Geospatial Foundation https://www.osgeo.org/ # QGIS download, open source mapping https://www.qgis.org/ # Simple map SQL queries as QGIS layers https://www.crunchydata.com/blog/connecting-qgis-to-postgres-and-postgis # pg_tileserv - Tile server for PostGIS https://github.com/CrunchyData/pg_tileserv # pg_featureserv - API JSON server for PostGIS https://github.com/CrunchyData/pg_featureserv/ # OpenLayers project https://openlayers.org/ # OpenLayers + PgRouting + pg_tileserv + pg_featureserv sample code https://github.com/CrunchyData/pg_featureserv/tree/master/demo # PostGIS day videos https://www.youtube.com/@CrunchyDataPostgres # Crunchy Data's Postgres Playground https://www.crunchydata.com/developers/tutorials # Really cool open source GIS people to follow: Paul Ramsey @ Crunchy Data / cleverelephant, Regina Obe @ Paragon, Ryan Lambert @ RustProofLabs, Cliff Patterson @ Luna Geospatial, Matt Forrest @ Whereabots # Elizabeth's crunchy blogs https://www.crunchydata.com/blog/author/elizabeth-christensen # Elizabeth's LinkedIn https://www.linkedin.com/in/elizabeth-garrett-christensen/ # Elizabeth's Twitter https://twitter.com/sqlliz
THE GEOSPATIAL INDEX The Geospatial Index is a comprehensive listing of all publicly traded geospatial businesses worldwide. Why? The industry is growing at ~5% annually (after inflation and after adjusting for base rates). This rate varies significantly, however, by sub index. For $480,000 to start, this growth rate is $5,000,000 over a working life. This channel, Bluesky account, newsletter, watchlist and podcast express the view that you are serious about geospatial if you take the view of an investor, venture capitalist or entrepreneur. You are expected to do your own research. This is not a replacement for that. This is not investment advice. Consider it entertainment. NOT THE OPINION OF MY EMPLOYER NOT YOUR FIDUCIARY NOT INVESTMENT ADVICE Bluesky: https://bsky.app/profile/geospatialindex.bsky.social LinkedIn: https://uk.linkedin.com/in/geospatialindex Watchlist: https://www.tradingview.com/watchlists/123254792/ Newsletter: https://www.geospatial.money/ Podcast: https://open.spotify.com/show/5gpQUsaWxEBpYCnypEdHFC
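As a rough check on the Geospatial Index figures above, the sketch below applies ordinary compound growth to the quoted $480,000 starting amount and ~5% annual rate. The notes do not say how long "a working life" is, so the 48-year horizon used here is an assumption, and the function names are illustrative only.

```python
import math


def future_value(principal: float, annual_rate: float, years: float) -> float:
    """Value of `principal` after compounding once a year at `annual_rate`."""
    return principal * (1 + annual_rate) ** years


def years_to_reach(principal: float, target: float, annual_rate: float) -> float:
    """Years of annual compounding needed for `principal` to grow to `target`."""
    return math.log(target / principal) / math.log(1 + annual_rate)


# Figures quoted in the show notes: $480,000 growing at ~5% a year.
years_needed = years_to_reach(480_000, 5_000_000, 0.05)

# Assumed horizon (not stated in the notes): a 48-year working life.
value_after_48 = future_value(480_000, 0.05, 48)

print(f"Years to reach $5,000,000: {years_needed:.1f}")
print(f"Value after 48 years: ${value_after_48:,.0f}")
```

Under these assumptions the two quoted numbers line up at a horizon of roughly 48 years; a shorter working life or a different sub-index growth rate would change the result considerably.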

simple data newsletter blue sky gis watchlist sql crunchy tile postgresql esri postgres duckdb qgis stephanie may postgis

Big data is dead, analytics is alive

Practical AI

Play Episode Listen Later Oct 24, 2024 50:21

We are on the other side of "big data" hype, but what is the future of analytics and how does AI fit in? Till and Adithya from MotherDuck join us to discuss why DuckDB is taking the analytics and AI world by storm. We dive into what makes DuckDB, a free, in-process SQL OLAP database management system, unique, including its ability to execute lightning-fast analytics queries against a variety of data sources, even on your laptop! Along the way we dig into the intersections with AI, such as text-to-SQL, vector search, and AI-driven SQL query correction.

ai alive analytics big data sql duckdb chris benson adithya daniel whitenack

The Death of Big Data and Why It's Time To Think Small | Jordan Tigani, CEO, MotherDuck

The MAD Podcast with Matt Turck

Play Episode Listen Later Oct 24, 2024 59:00

A founding engineer on Google BigQuery and now at the helm of MotherDuck, Jordan Tigani challenges the decade-long dominance of Big Data and introduces a compelling alternative that could change how companies handle data. Jordan discusses why Big Data technologies are overkill for most companies, how MotherDuck and DuckDB offer fast analytical queries, and lessons learned as a technical founder building his first startup. Watch the episode with Tomasz Tunguz: https://youtu.be/gU6dGmZzmvI Website - https://motherduck.com Twitter - https://x.com/motherduck Jordan Tigani LinkedIn - https://www.linkedin.com/in/jordantigani Twitter - https://x.com/jrdntgn FIRSTMARK Website - https://firstmark.com Twitter - https://twitter.com/FirstMarkCap Matt Turck (Managing Director) LinkedIn - https://www.linkedin.com/in/turck/ Twitter - https://twitter.com/mattturck (00:00) Intro (00:56) What is the Small Data? (06:56) Marketing strategy of MotherDuck (08:39) Processing Small Data with Big Data stack (15:30) DuckDB (17:21) Creation of DuckDB (18:48) Founding story of MotherDuck (24:08) MotherDuck's community (25:25) MotherDuck of today ($100M raised) (33:15) Why MotherDuck and DuckDB are so fast? (39:08) The limitations and the future of MotherDuck's platform (39:49) Small Models (42:37) Small Data and the Modern Data Stack (46:47) Making things simpler with a shift from Big Data to Small Data (50:04) Jordan Tigani's entrepreneurial journey (58:31) Outro

death marketing big data founding 100m think small small data modern data stack duckdb tomasz tunguz google bigquery

Big data is dead, analytics is alive (Practical AI #292)

Changelog Master Feed

Play Episode Listen Later Oct 24, 2024 50:21

ai alive analytics big data sql practical ai duckdb chris benson adithya daniel whitenack

Issue 2024-W43 Highlights

R Weekly Highlights

Play Episode Listen Later Oct 23, 2024 49:59 Transcription Available

Bringing tidy principles to a fundamental visualization for gene expressions, being on your best "behavior" for organizing your tests, and how data.table stacks up to DuckDB and polars for reshaping your data layouts. Episode Links: This week's curator: Jon Carroll - @jonocarroll@fosstodon.org (Mastodon) & @carroll_jono (X/Twitter); Exploring the tidyHeatmap R package; Don't Expect That "Function Works Correctly", Do This Instead; Comparing data.table reshape to duckdb and polars; Entire issue available at rweekly.org/2024-W43. Supplement Resources: tidyHeatmap: Draw heatmap simply using a tidy data frame https://stemangiola.github.io/tidyHeatmap/ Novel App knock-in mouse model shows key features of amyloid pathology and reveals profound metabolic dysregulation of microglia https://molecularneurodegeneration.biomedcentral.com/articles/10.1186/s13024-022-00547-7 Shiny App-Packages chapter on writing tests and specifications https://mjfrigaard.github.io/shiny-app-pkgs/test_specs.html WANT CLEANER UNIT TESTS? TRY ARRANGE, ACT, ASSERT COMMENTS https://jakubsob.github.io/blog/want-cleaner-test-try-arrange-act-assert/ Super Data Science Podcast 827: Polars: Past, Present and Future, with Polars Creator Ritchie Vink https://www.superdatascience.com/podcast/827 duckplyr: A DuckDB-backed version for dplyr https://duckplyr.tidyverse.org/ Supporting the show: Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback. R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index.
First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index. A new way to think about value: https://value4value.info Get in touch with us on social media: Eric Nantz: @rpodcast@podcastindex.social (Mastodon) and @theRcast (X/Twitter); Mike Thomas: @mike_thomas@fosstodon.org (Mastodon) and @mike_ketchbrook (X/Twitter). Music credits powered by OCRemix: Black Feathers in the Sky - Kid Icarus: Uprising - MkVaff - https://ocremix.org/remix/OCR04200 Cross-Examination - Phoenix Wright: Ace Attorney - PrototypeRaptor - https://ocremix.org/remix/OCR01846

future act mastodon alby mike thomas duckdb podcastindex jon carroll

Small Data, Big Impact: Insights from MotherDuck's Jacob Matson

What's New In Data

Play Episode Listen Later Sep 19, 2024 41:35 Transcription Available

What makes MotherDuck and DuckDB a game-changer for data analytics? Join us as we sit down with Jacob Matson, a renowned expert in SQL Server, dbt, and Excel, who recently became a developer advocate at MotherDuck. During this episode, Jacob shares his compelling journey to MotherDuck, driven by his frequent use of DuckDB for solving data challenges. We explore the unique attributes of DuckDB, comparing it to SQLite for analytics, and uncover its architectural benefits, such as utilizing multi-core machines for parallel query execution. Jacob also sheds light on how MotherDuck is pushing the envelope with their innovative concept of multiplayer analytics. Our discussion takes a deep dive into MotherDuck's innovative tenancy model and how it impacts database workloads, highlighting the use of DuckDB format in Wasm for enhanced data visualization. Jacob explains how this approach offers significant compression and faster query performance, making data visualization more interactive. We also touch on the potential and limitations of replacing traditional BI tools with Mosaic, and where MotherDuck stands in the modern data stack landscape, especially for organizations that don't require the scale of BigQuery or Snowflake. Plus, get a sneak peek into the upcoming Small Data Conference in San Francisco on September 23rd, where we'll explore how small data solutions can address significant problems without relying on big data. Don't miss this episode packed with insights on DuckDB and MotherDuck innovations! Small Data SF Signup Discount Code: MATSON100. What's New In Data is a data thought leadership series hosted by John Kutay who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss latest trends, common patterns for real world data patterns, and analytics success stories.

san francisco excel bi mosaic snowflakes big impact matson sql server wasm sqlite bigquery small data duckdb

Exploring DuckDB: A relational database built for online analytical processing

ThoughtWorks Podcast

Play Episode Listen Later Sep 19, 2024 35:26

Like every other kind of technology, when it comes to databases there's no one-size-fits-all solution that's going to be the best thing for the job every time. That's what drives innovation and new solutions. It's ultimately also the story behind DuckDB, an open source relational database specifically designed for the demands of online analytical processing (OLAP), and particularly useful for data analysts, scientists and engineers. To get a deeper understanding of DuckDB and how the product has developed, on this episode of the Technology Podcast hosts Ken Mugrage and Lilly Ryan are joined by Thoughtworker Ned Letcher and Thoughtworks alumnus Simon Aubury. Ned and Simon explain the thinking behind DuckDB, the design decisions made by the project and how its being used by data practitioners in the wild. Learn more about DuckDB: https://duckdb.org/why_duckdb.html

online built processing ned analytical thoughtworks technology podcast olap relational databases duckdb

Welded Battery - Episode 475

DevZen Podcast

Play Episode Listen Later Sep 10, 2024 133:47

In this episode we share our weekly discoveries, discuss VPNs in Russia, compare Swift and Rust, and talk about DirectX 9, Windows 10, DuckDB 1.1.0 and retro gaming. [00:03:22] What we learned this week: The first professional hosting of cloud VPS/VDS servers - VDSina; Open Data Protocol - Wikipedia; A DIY welding inverter for $5! https://www.amazon.co.uk/dp/B0C9WWCQ82/ref=emc_bcc_2_i?th=1 [00:03:39] A VPN that…

wikipedia vpn windows 10 directx duckdb

Jordan Tigani - Why Small Data is Awesome, DuckDB, and More

The Joe Reis Show

Play Episode Listen Later Sep 5, 2024 54:15

Jordan Tigani is back to chat about why small data is awesome, data lakehouses, DuckDB, AI, and much more. Motherduck: https://motherduck.com/ LinkedIn: https://www.linkedin.com/in/jordantigani/ Twitter: https://twitter.com/jrdntgn?lang=en

ai small data duckdb

AI, SQL, and the End of Big Data

AI + a16z

Play Episode Listen Later Aug 30, 2024 33:08

In this episode of AI + a16z, a16z General Partner Jennifer Li joins MotherDuck Cofounder and CEO Jordan Tigani to discuss DuckDB's spiking popularity as the era of big data wanes, as well as the applicability of SQL-based systems for AI workloads and the prospect of text-to-SQL for analyzing data. Here's an excerpt of Jordan discussing an early win when it comes to applying generative AI to data analysis: "Everybody forgets syntax for various SQL calls. And it's just like in coding. So there's some people that memorize . . . all of the code base, and so they don't need auto-complete. They don't need any copilot. . . . They don't need an IDE; they can just type in Notepad. But for the rest of us, I think these tools are super useful. And I think we have seen that these tools have already changed how people are interacting with their data, how they're writing their SQL queries. "One of the things that we've done . . . is we focused on improving the experience of writing queries. Something we found is actually really useful is when somebody runs a query and there's an error, we basically feed the line of the error into GPT-4 and ask it to fix it. And it turns out to be really good. ". . . It's a great way of letting you stay in the flow of writing your queries and having true interactivity." Learn more: Small Data SF conference, DuckDB. Follow everyone on X: Jordan Tigani, Jennifer Li, Derrick Harris. Check out everything a16z is doing with artificial intelligence here, including articles, projects, and more podcasts.
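The error-repair loop Jordan describes (run a query, and when it fails, hand the query and its error to a model and retry with the suggestion) might look roughly like the sketch below. This is not MotherDuck's implementation. The standard-library sqlite3 module stands in for DuckDB so the example runs anywhere, and `run_with_fix` and `toy_fixer` are hypothetical names, with `toy_fixer` standing in for a call to a language model.

```python
import sqlite3
from typing import Callable, List, Tuple

# A "fixer" takes the failing query and the error text and returns a new query.
Fixer = Callable[[str, str], str]


def run_with_fix(
    conn: sqlite3.Connection,
    query: str,
    suggest_fix: Fixer,
    max_fixes: int = 1,
) -> Tuple[str, List[tuple]]:
    """Run `query`; on a SQL error, ask `suggest_fix` for a repair and retry.

    Returns the query that finally succeeded together with its rows.
    Re-raises the last error once `max_fixes` repairs have been tried.
    """
    current = query
    for fixes_used in range(max_fixes + 1):
        try:
            return current, conn.execute(current).fetchall()
        except sqlite3.Error as exc:
            if fixes_used == max_fixes:
                raise
            current = suggest_fix(current, str(exc))
    raise AssertionError("unreachable")


def toy_fixer(query: str, error: str) -> str:
    """Hypothetical stand-in for a model call: repairs one known typo."""
    return query.replace("SELEC ", "SELECT ")


conn = sqlite3.connect(":memory:")
fixed_query, rows = run_with_fix(conn, "SELEC 1 + 1", toy_fixer)
print(fixed_query, rows)
```

Keeping the fixer as a plain callable means the retry logic can be exercised with a deterministic stub, while a real deployment would swap in a model-backed function and likely show the suggestion to the user before running it.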

ai artificial intelligence id big data data science gpt databases sql data analysis notepad foundation models duckdb

Trying New Technology

Voice of the DBA

Play Episode Listen Later Aug 29, 2024 4:13

I had someone ask me about DuckDB recently. Did I think it was a good choice for a database? I don't really know. From their blog and some online research, maybe, but it's also a minority player in a niche space. I had a chat recently with someone that had implemented ArangoDB, a graph database. Why that and not Neo4j, I asked them. Someone at the company had tried it and recommended it. Not a bad reason, as I think experience with tech is important, but it's not the only thing. Read the rest of Trying New Technology

technology databases new technology duckdb

DuckDB with Hannes Mühleisen

Software Engineering Daily

Play Episode Listen Later Aug 8, 2024

DuckDB is an open-source column-oriented relational database that was first released in 2019. It's designed to provide high performance on complex queries against large databases, and focuses on online analytical processing workloads. Hannes Mühleisen is the Co-Creator of DuckDB, and is the CEO and Co-Founder of DuckDB Labs. He joins the show to talk about The post DuckDB with Hannes Mühleisen appeared first on Software Engineering Daily.

ceo co founders co creators hannes duckdb software engineering daily

DuckDB with Hannes Mühleisen

Podcast – Software Engineering Daily

Play Episode Listen Later Aug 8, 2024

DuckDB is an open-source column-oriented relational database that was first released in 2019. It’s designed to provide high performance on complex queries against large databases, and focuses on online analytical processing workloads. Hannes Mühleisen is the Co-Creator of DuckDB, and is the CEO and Co-Founder of DuckDB Labs. He joins the show to talk about The post DuckDB with Hannes Mühleisen appeared first on Software Engineering Daily.

ceo co founders co creators hannes duckdb software engineering daily

Practical Applications for DuckDB (with Simon Aubury & Ned Letcher)

Developer Voices

Play Episode Listen Later Jul 31, 2024 68:04

DuckDB's become a favourite data-handling tool of mine, simply because it does so many small things well. It can read and write a huge number of data formats; it can infer schemas automatically when you just want to move quickly; and it can interface with most languages, run like lightning on the desktop or be embedded into a webpage. I'm a huge fan. But I'm not nearly as knowledgeable as this week's two fans, Simon Aubury and Ned Letcher, who've just written a book on all the many ways you can use DuckDB and all the hidden tricks and tips that help you make the most of this. So in this episode we're taking a practical look at DuckDB, what problems it can solve at work, and how to start getting the most out of it. Getting Started with DuckDB (book): https://packt.link/byKYt DuckDB episode with Hannes Mühleisen: https://youtu.be/pZV9FvdKmLc DuckDB: https://duckdb.org/ dplyr, the data-manipulation language: https://dplyr.tidyverse.org/ duckplyr, DuckDB's ‘native' version: https://github.com/duckdblabs/duckplyr Substrait: https://substrait.io/ Observable (Markdown+DuckDB=Reports): https://observablehq.com/framework/ DuckDB's “friendly” SQL: https://duckdb.org/docs/sql/dialect/friendly_sql.html Community Extensions: https://community-extensions.duckdb.org/ DuckCon #5: https://duckdb.org/2024/08/15/duckcon5.html Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices Support Developer Voices on YouTube: https://www.youtube.com/@developervoices/join Simon on Twitter: https://x.com/SimonAubury Ned on Twitter: https://x.com/nletcher Kris on Mastodon: http://mastodon.social/@krisajenkins Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/ Kris on Twitter: https://twitter.com/krisajenkins

getting started mastodon sql practical applications duckdb

Issue 2024-W26 Highlights

R Weekly Highlights

Play Episode Listen Later Jun 26, 2024 45:56 Transcription Available

The latest updates to the rayverse bring new meaning to smoothing out the rough edges of your next 3-D visualization, the momentum of DuckDB continues with the MotherDuck data warehouse, and the role nanoparquet plays to bring the benefits of parquet to small data sets. Episode Links: This week's curator: Eric Nantz: @rpodcast@podcastindex.social (Mastodon) and @theRcast (X/Twitter); Sculpting the Moon in R: Subdivision Surfaces and Displacement Mapping; Joining the flock from R: working with data on MotherDuck; nanoparquet 0.3.0; Entire issue available at rweekly.org/2024-W26. Supporting the show: Use the contact page at https://serve.podhome.fm/custompage/r-weekly-highlights/contact to send us your feedback. R-Weekly Highlights on the Podcastindex.org - You can send a boost into the show directly in the Podcast Index. First, top-up with Alby, and then head over to the R-Weekly Highlights podcast entry on the index. A new way to think about value: https://value4value.info Get in touch with us on social media: Eric Nantz: @rpodcast@podcastindex.social (Mastodon) and @theRcast (X/Twitter); Mike Thomas: @mikethomas@fosstodon.org (Mastodon) and @mikeketchbrook (X/Twitter). Music credits powered by OCRemix: The Amazon Session - Ducktales - Gux - https://ocremix.org/remix/OCR00402 Doomsday - Sonic & Knuckles - elzfernomusic - https://ocremix.org/remix/OCR02532

moon mastodon knuckles alby mike thomas duckdb podcastindex

194: Building Retail Churn Prediction on DuckDB with Clint Dunn of Wilde

The Data Stack Show

Play Episode Listen Later Jun 19, 2024 48:08

Highlights from this week's conversation include: Clint's Background and Journey in Data (0:51); Starting a Data Career (2:01); Transition to Startup SaaS World (4:27); Clint's Connection to a Federal Reserve Database (5:31); Challenges in Predictive Modeling (10:27); Data Input Challenges (15:50); Marketers' Workflow and Data Integration (18:29); Soft ROI vs. Hard ROI in Data Analysis (21:31); Balancing Internal Marketing and Data Team's Value (22:35); Simplifying Data Inputs for Predictive Models (25:09); Data Analysis Workflow and Tech Stack (29:06); Open Data Formats and Impact on Data Platforms (34:40); The S3 and Ecosystem Model (37:08); In-browser SQL Queries with DuckDB (39:24); Data Security Concerns and Solutions (41:47); Clean Rooms and Data Sharing (43:32); Final Thoughts and Takeaways (47:35). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

#388 Don't delete all the repos

Python Bytes

Play Episode Listen Later Jun 18, 2024 21:59

Topics covered in this episode: PSF Elections coming up Cloud engineer gets 2 years for wiping ex-employer's code repos Python: Import by string with pkgutil.resolve_name() DuckDB goes 1.0 Extras Joke Watch on YouTube About the show Sponsored by ScoutAPM: pythonbytes.fm/scout Connect with the hosts Michael: @mkennedy@fosstodon.org Brian: @brianokken@fosstodon.org Show: @pythonbytes@fosstodon.org Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 10am PT. Older video versions available there too. Finally, if you want an artisanal, hand-crafted digest of every week of the show notes in email form? Add your name and email to our friends of the show list, we'll never share it. Brian #1: PSF Elections coming up This is elections for the PSF Board and for 3 bylaw changes. To vote in the PSF election, you need to be a Supporting, Managing, Contributing, or Fellow member of the PSF, … And affirm your voting status by June 25. See Affirm your PSF Membership Voting Status for more details. Timeline Board Nominations open: Tuesday, June 11th, 2:00 pm UTC Board Nominations close: Tuesday, June 25th, 2:00 pm UTC Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC same date is also for voter affirmation. Announce candidates: Thursday, June 27th Voting start date: Tuesday, July 2nd, 2:00 pm UTC Voting end date: Tuesday, July 16th, 2:00 pm UTC See also Thinking about running for the Python Software Foundation Board of Directors? Let's talk! There's still one upcoming office hours session on June 18th, 12 PM UTC And For your consideration: Proposed bylaws changes to improve our membership experience 3 proposed bylaws changes Michael #2: Cloud engineer gets 2 years for wiping ex-employer's code repos Miklos Daniel Brody, a cloud engineer, was sentenced to two years in prison and a restitution of $529,000 for wiping the code repositories of his former employer in retaliation for being fired. 
The court documents state that Brody's employment was terminated after he violated company policies by connecting a USB drive. Brian #3: Python: Import by string with pkgutil.resolve_name() Adam Johnson You can use pkgutil.resolve_name() with a "module:object" string to import classes, functions or modules using strings. You can also use importlib.import_module() with a module name string. Both of these techniques give you the imported object without binding its name in the local namespace. Michael #4: DuckDB goes 1.0 via Alex Monahan The cloud hosted product @MotherDuck also opened up General Availability Codenamed "Snow Duck" The core theme of the 1.0.0 release is stability. Extras Brian: Sending us topics. Please send before Tuesday. But any time is welcome. NumPy 2.0 htmx 2.0.0 Michael: Get 6 months of PyCharm Pro for free. Just take a course (even a free one) at Talk Python Training. Then visit your account page > details tab and have fun. Coming soon at Talk Python: Shiny for Python Joke: .gitignore thoughts won't let me sleep
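A minimal sketch of the import-by-string item from the show notes above, using only the standard library. The choice of the json module and the variable names are illustrative assumptions; pkgutil.resolve_name() requires Python 3.9 or newer.

```python
import importlib
import pkgutil

# "module:object" form: everything before the colon is imported as a module,
# everything after it is looked up as attributes on that module.
loads = pkgutil.resolve_name("json:loads")
decoder_cls = pkgutil.resolve_name("json.decoder:JSONDecoder")

# import_module takes a dotted module path and returns the module object.
json_module = importlib.import_module("json")

# Neither call binds the name `json` in this namespace; only the variables
# assigned above refer to the imported objects.
print(loads('{"ok": true}'))
print(decoder_cls.__name__, json_module.__name__)
```

This pattern is handy when the dotted path comes from configuration, for example a settings file that names a backend class as a string.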

June 3rd, 2024 | I Am So Sick of Leetcode-Style Interviews

Hacker News Recap

Play Episode Listen Later Jun 4, 2024 17:07

This is a recap of the top 10 posts on Hacker News on June 3rd, 2024. This podcast was generated by wondercraft.ai (00:30): How many photons are received per bit transmitted from Voyager 1? Original post: https://news.ycombinator.com/item?id=40561872&utm_source=wondercraft_ai (02:36): Hacking millions of modems and investigating who hacked my modem Original post: https://news.ycombinator.com/item?id=40570781&utm_source=wondercraft_ai (04:15): I Am So Sick of Leetcode-Style Interviews Original post: https://news.ycombinator.com/item?id=40571395&utm_source=wondercraft_ai (05:39): Diffusion on syntax trees for program synthesis Original post: https://news.ycombinator.com/item?id=40569531&utm_source=wondercraft_ai (07:22): If English was written like Chinese (1999) Original post: https://news.ycombinator.com/item?id=40565060&utm_source=wondercraft_ai (08:51): FBI Raids Big Corporate Landlord over Nationwide Rent Hikes Original post: https://news.ycombinator.com/item?id=40562834&utm_source=wondercraft_ai (10:33): Why YC went to DC Original post: https://news.ycombinator.com/item?id=40564639&utm_source=wondercraft_ai (12:08): DuckDB 1.0.0 Original post: https://news.ycombinator.com/item?id=40562342&utm_source=wondercraft_ai (13:31): What if they gave an Industrial Revolution and nobody came? (2023) Original post: https://news.ycombinator.com/item?id=40562741&utm_source=wondercraft_ai (15:14): Oldest largest German Minecraft server shut down and open sourced everything Original post: https://news.ycombinator.com/item?id=40566533&utm_source=wondercraft_ai This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

ai english chinese style original sick hacking voyager oldest industrial revolution diffusion yc hn hacker news duckdb

AI, Data and Blockchain: a VC perspective | Tomasz Tunguz, Founder of Theory Ventures

The MAD Podcast with Matt Turck

Play Episode Listen Later May 16, 2024 54:54

In this episode, we sat down with Tomasz Tunguz (https://twitter.com/ttunguz), the founder of Theory Ventures and a leading voice in the tech investment space. We discussed the transformative potential of Ethereum as a database company, the importance of data security in a decentralized world, and the evolving landscape of AI technologies from foundational models to AI-native applications.

Issue 2024-W20 Highlights

R Weekly Highlights

Play Episode Play 264 sec Highlight Listen Later May 15, 2024 49:15 Transcription Available

An aesthetically-pleasing journey through the history of R, another demonstration of DuckDB's power with analytics, and how webR with shinylive brings new learning life to the Pharmaverse TLG gallery. Episode Links: This week's curator: Sam Parmar - @parmsam@fosstodon.org (Mastodon) & @parmsam_ (X/Twitter); The Aesthetics Wiki - an R Addendum; R Dplyr vs. DuckDB - How to Enhance Your Data Processing Pipelines with R DuckDB; TLG Catalog

podcasting sean combs mastodon quarto adam curry alby mike thomas dave jones web r duckdb apache arrow podcastindex

Visually Debugging EF Queries with Giorgi Dalakishvili

.NET Rocks!

Play Episode Listen Later Apr 25, 2024 47:00

How do you debug your EF queries? Carl and Richard talk to Giorgi Dalakishvili about his open-source Visual Studio extension, EFCore Visualizer. Giorgi talks about bringing together the EF rendering of the query with the database query plan to ensure you retrieve data from your database as efficiently as possible. The conversation ranges over a number of tools Giorgi has built over the years, including EF Framework Exceptions, DuckDB.NET, and more!

ef visually queries debugging visual studio giorgi duckdb

Another one bites the dust

The Changelog

Play Episode Listen Later Mar 25, 2024 9:07

Redis' re-licensing prompts forks like Drew DeVault's Redict, Matthew Miller thinks we need more community built software, Paul Gross makes the case that DuckDB is the new jq, Anton Zhiyanov shares how he makes a living as a developer despite being “pretty dumb” & Baldur Bjarnason chimes in on the state of the web developer job market.

dust another one bites redis matthew miller paul gross duckdb jerod santo

Another one bites the dust

Changelog News

Play Episode Listen Later Mar 25, 2024 9:07 Transcription Available

dust another one bites redis matthew miller paul gross duckdb jerod santo

From .NET to DuckDB: Unleashing the Database Evolution with Giorgi Dalakishvili

The .NET Core Podcast

Play Episode Listen Later Mar 22, 2024 65:15

NService Bus This episode of The Modern .NET Show is supported, in part, by NServiceBus, the ultimate tool to build robust and reliable systems that can handle failures gracefully, maintain high availability, and scale to meet growing demand. Make sure you click the link in the show notes to learn more about NServiceBus. Show Notes Yeah. So what I was thinking the other day is that what we want is to concentrate on the business logic that we need to implement and spend as small as little time as possible configuring, installing and figuring out the tools and libraries that we are using for this specific task. Like our mission is to produce the business logic and we should try to minimize the time that we spend on the tools and libraries that enable us to build the software. —Giorgi Dalakishvili Welcome to The Modern .NET Show! Formerly known as The .NET Core Podcast, we are the go-to podcast for all .NET developers worldwide and I am your host Jamie "GaProgMan" Taylor. In this episode, I spoke with Giorgi Dalakishvili about Postgresql, DuckDB, and where you might use either of them in your applications. As Giorgi points out, .NET has support for SQL Server baked in, but there's also support for other database technologies too: Yes, there are many database technologies and just like you, for me, SQL Server was the default go to database for quite a long time because it's from Microsoft. All the frameworks and libraries work with SQL Server out of the box, and have usually better support for SQL Server than for other databases. But recently I have been diving into Postgresql, which is a free database and I discovered that it has many interesting features and I think that many .NET developers will be quite excited about these features. They are very useful in some very specific scenarios. And it also has a very good support for .NET. Nowadays there is a .NET driver for Postgres, there is a .NET driver for Entity Framework core. 
So I would say it's not behind SQL server in terms of .NET support or feature wise. —Giorgi Dalakishvili He also points out that our specialist skill as developers is not to focus on the tools, libraries, and frameworks, but to use what we have in our collective toolboxes to build the business logic that our customers, clients, and users desire of us. And along the way, he drops some knowledge on an essential NuGet package for those of us who are using Entity Framework. So let's sit back, open up a terminal, type in dotnet new podcast and we'll dive into the core of Modern .NET. Supporting the Show If you find this episode useful in any way, please consider supporting the show by either leaving a review (check our review page for ways to do that), sharing the episode with a friend or colleague, buying the host a coffee, or considering becoming a Patron of the show. Full Show Notes The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at: https://dotnetcore.show/season-6/from-net-to-DuckDB-unleashing-the-database-evolution-with-giorgi-dalakishvili/ Useful Links Giorgi's GitHub DuckDB .NET Driver Postgres Array data type Postgres Range data type DuckDB DbUpdateException EntityFramework.Exceptions JsonB data type Vector embeddings Cosine similarity Vector databases: Chroma qdrant pgvector pgvector .NET library OLAP queries parquet files Dapper DuckDB documentation Dapr DuckDB Wasm; run DuckDB in your browser GitHub Codespaces Connecting with Giorgi: on Twitter on LinkedIn on his website Supporting the show: Leave a rating or review Buy the show a coffee Become a patron Getting in touch: via the contact page joining the Discord Music created by Mono Memory Music, licensed to RJJ Software for use in The Modern .NET Show Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts, this will help the show's audience grow. 
Or you can just share the show with a friend. And don't forget to reach out via our Contact page. We're very interested in your opinion of the show, so please get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast.

March 21st, 2024 | U.S. sues Apple, accusing it of maintaining an iPhone monopoly

Hacker News Recap

Play Episode Listen Later Mar 22, 2024 18:31

This is a recap of the top 10 posts on Hacker News on March 21st, 2024. This podcast was generated by wondercraft.ai (00:34): U.S. sues Apple, accusing it of maintaining an iPhone monopoly Original post: https://news.ycombinator.com/item?id=39778999&utm_source=wondercraft_ai (02:04): Difftastic, a structural diff tool that understands syntax Original post: https://news.ycombinator.com/item?id=39778412&utm_source=wondercraft_ai (04:09): The baffling intelligence of a single cell: The story of E. coli chemotaxis Original post: https://news.ycombinator.com/item?id=39777229&utm_source=wondercraft_ai (06:02): The Reddits Original post: https://news.ycombinator.com/item?id=39778590&utm_source=wondercraft_ai (07:45): Hackers found a way to open any of 3M hotel keycard locks Original post: https://news.ycombinator.com/item?id=39779291&utm_source=wondercraft_ai (09:18): DuckDB as the New jq Original post: https://news.ycombinator.com/item?id=39782356&utm_source=wondercraft_ai (11:06): Ludic: New framework for Python with seamless Htmx support Original post: https://news.ycombinator.com/item?id=39776199&utm_source=wondercraft_ai (12:53): GoFetch: New side-channel attack using data memory-dependent prefetchers Original post: https://news.ycombinator.com/item?id=39779195&utm_source=wondercraft_ai (14:34): Ikigai: What We Got Wrong and How to Find Meaning in Life Original post: https://news.ycombinator.com/item?id=39777896&utm_source=wondercraft_ai (16:04): Research shows plant-based polymers can disappear within seven months Original post: https://news.ycombinator.com/item?id=39777898&utm_source=wondercraft_ai This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

#454: Data Pipelines with Dagster

Talk Python To Me - Python conversations for passionate developers

Play Episode Listen Later Mar 21, 2024 58:25

Do you have data that you pull from external sources, or that is generated and appears at your digital doorstep? I bet that data needs to be processed, filtered, transformed, distributed, and much more. One of the biggest tools for creating these data pipelines with Python is Dagster. We are fortunate to have Pedram Navid on the show this episode. Pedram is the Head of Data Engineering and DevRel at Dagster Labs, and we're talking data pipelines this week at Talk Python.

Episode sponsors
Talk Python Courses
Posit

Links from the show
Rock Solid Python with Types Course: training.talkpython.fm
Pedram on Twitter: twitter.com
Pedram on LinkedIn: linkedin.com
Ship data pipelines with extraordinary velocity: dagster.io
dagster-open-platform: github.com
The Dagster Master Plan: dagster.io
data load tool (dlt): dlthub.com
DataFrames for the new era: pola.rs
Apache Arrow: arrow.apache.org
DuckDB is a fast in-process analytical database: duckdb.org
Ship trusted data products faster: www.getdbt.com
Watch this episode on YouTube: youtube.com
Episode transcripts: talkpython.fm

--- Stay in touch with us ---
Subscribe to us on YouTube: youtube.com
Follow Talk Python on Mastodon: talkpython
Follow Michael on Mastodon: mkennedy

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Engineering Podcast

Play Episode Listen Later Feb 25, 2024 56:00

Summary Building a database engine requires a substantial amount of engineering effort and time investment. Over the decades of research and development into building these software systems there are a number of common components that are shared across implementations. When Paul Dix decided to re-write the InfluxDB engine he found the Apache Arrow ecosystem ready and waiting with useful building blocks to accelerate the process. In this episode he explains how he used the combination of Apache Arrow, Flight, Datafusion, and Parquet to lay the foundation of the newest version of his time-series database. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. 
And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today! Your host is Tobias Macey and today I'm interviewing Paul Dix about his investment in the Apache Arrow ecosystem and how it led him to create the latest PFAD in database design Interview Introduction How did you get involved in the area of data management? Can you start by describing the FDAP stack and how the components combine to provide a foundational architecture for database engines? This was the core of your recent re-write of the InfluxDB engine. What were the design goals and constraints that led you to this architecture? Each of the architectural components are well engineered for their particular scope. What is the engineering work that is involved in building a cohesive platform from those components? 
One of the major benefits of using open source components is the network effect of ecosystem integrations. That can also be a risk when the community vision for the project doesn't align with your own goals. How have you worked to mitigate that risk in your specific platform? Can you describe the operational/architectural aspects of building a full data engine on top of the FDAP stack? What are the elements of the overall product/user experience that you had to build to create a cohesive platform? What are some of the other tools/technologies that can benefit from some or all of the pieces of the FDAP stack? What are the pieces of the Arrow ecosystem that are still immature or need further investment from the community? What are the most interesting, innovative, or unexpected ways that you have seen parts or all of the FDAP stack used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on/with the FDAP stack? When is the FDAP stack the wrong choice? What do you have planned for the future of the InfluxDB IOx engine and the FDAP stack? Contact Info LinkedIn (https://www.linkedin.com/in/pauldix/) pauldix (https://github.com/pauldix) on GitHub Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! 
Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
FDAP Stack Blog Post (https://www.influxdata.com/blog/flight-datafusion-arrow-parquet-fdap-architecture-influxdb/)
Apache Arrow (https://arrow.apache.org/)
DataFusion (https://arrow.apache.org/datafusion/)
Arrow Flight (https://arrow.apache.org/docs/format/Flight.html)
Apache Parquet (https://parquet.apache.org/)
InfluxDB (https://www.influxdata.com/products/influxdb/)
Influx Data (https://www.influxdata.com/)
Podcast Episode (https://www.dataengineeringpodcast.com/influxdb-timeseries-data-platform-episode-199)
Rust Language (https://www.rust-lang.org/)
DuckDB (https://duckdb.org/)
ClickHouse (https://clickhouse.com/)
Voltron Data (https://voltrondata.com/)
Podcast Episode (https://www.dataengineeringpodcast.com/voltron-data-apache-arrow-episode-346/)
Velox (https://github.com/facebookincubator/velox)
Iceberg (https://iceberg.apache.org/)
Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/)
Trino (https://trino.io/)
ODBC == Open DataBase Connectivity (https://en.wikipedia.org/wiki/Open_Database_Connectivity)
GeoParquet (https://github.com/opengeospatial/geoparquet)
ORC == Optimized Row Columnar (https://orc.apache.org/)
Avro (https://avro.apache.org/)
Protocol Buffers (https://protobuf.dev/)
gRPC (https://grpc.io/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Data Sharing Across Business And Platform Boundaries

Data Engineering Podcast

Play Episode Listen Later Feb 11, 2024 59:55

Summary Sharing data is a simple concept, but complicated to implement well. There are numerous business rules and regulatory concerns that need to be applied. There are also numerous technical considerations to be made, particularly if the producer and consumer of the data aren't using the same platforms. In this episode Andrew Jefferson explains the complexities of building a robust system for data sharing, the techno-social considerations, and how the Bobsled platform that he is building aims to simplify the process. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. 
Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Your host is Tobias Macey and today I'm interviewing Andy Jefferson about how to solve the problem of data sharing Interview Introduction How did you get involved in the area of data management? Can you start by giving some context and scope of what we mean by "data sharing" for the purposes of this conversation? What is the current state of the ecosystem for data sharing protocols/practices/platforms? What are some of the main challenges/shortcomings that teams/organizations experience with these options? What are the technical capabilities that need to be present for an effective data sharing solution? How does that change as a function of the type of data? (e.g. tabular, image, etc.) What are the requirements around governance and auditability of data access that need to be addressed when sharing data? What are the typical boundaries along which data access requires special consideration for how the sharing is managed? Many data platform vendors have their own interfaces for data sharing. What are the shortcomings of those options, and what are the opportunities for abstracting the sharing capability from the underlying platform? What are the most interesting, innovative, or unexpected ways that you have seen data sharing/Bobsled used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on data sharing? When is Bobsled the wrong choice? What do you have planned for the future of data sharing? 
Contact Info
LinkedIn (https://www.linkedin.com/in/andyjefferson/?originalSubdomain=de)

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links
Bobsled (https://www.bobsled.co/)
OLAP == OnLine Analytical Processing (https://en.wikipedia.org/wiki/Online_analytical_processing)
Cassandra (https://cassandra.apache.org/_/index.html)
Podcast Episode (https://www.dataengineeringpodcast.com/cassandra-global-scale-database-episode-220)
Neo4J (https://neo4j.com/)
FTP == File Transfer Protocol (https://en.wikipedia.org/wiki/File_Transfer_Protocol)
S3 Access Points (https://aws.amazon.com/s3/features/access-points/)
Snowflake Sharing (https://docs.snowflake.com/en/guides-overview-sharing)
BigQuery Sharing (https://cloud.google.com/bigquery/docs/authorized-datasets)
Databricks Delta Sharing (https://www.databricks.com/product/delta-sharing)
DuckDB (https://duckdb.org/)
Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

#367 A New Cloud Computing Paradigm at Python Bytes

Python Bytes

Play Episode Listen Later Jan 16, 2024 36:21

Topics covered in this episode:
Leaving the cloud
PEP 723 - Inline script metadata
Flet for Android
harlequin: The SQL IDE for Your Terminal.
Extras
Joke

Watch on YouTube

About the show
Sponsored by Bright Data: pythonbytes.fm/brightdata

Connect with the hosts
Michael: @mkennedy@fosstodon.org
Brian: @brianokken@fosstodon.org
Show: @pythonbytes@fosstodon.org

Join us on YouTube at pythonbytes.fm/live to be part of the audience. Usually Tuesdays at 11am PT. Older video versions available there too.

Michael #1: Leaving the cloud
Also see Five values guiding our cloud exit
We value independence above all else.
We serve the internet.
We spend our money wisely.
We lead the way.
We seek adventure.
And We stand to save $7m over five years from our cloud exit
Slice our new monster 192-thread Dell R7625s into isolated VMs
Which added a combined 4,000 vCPUs with 7,680 GB of RAM and 384TB of NVMe storage to our server capacity
They created Kamal - Deploy web apps anywhere
A lot of these ideas have changed how I run the infrastructure at Talk Python and for Python Bytes.

Brian #2: PEP 723 - Inline script metadata
Author: Ofek Lev
This PEP specifies a metadata format that can be embedded in single-file Python scripts to assist launchers, IDEs and other external tools which may need to interact with such scripts.
Example:

    # /// script
    # requires-python = ">=3.11"
    # dependencies = [
    #   "requests<3",
    #   "rich",
    # ]
    # ///

    import requests
    from rich.pretty import pprint

    resp = requests.get("https://peps.python.org/api/peps.json")
    data = resp.json()
    pprint([(k, v["title"]) for k, v in data.items()][:10])

Michael #3: Flet for Android
via Balázs
Remember Flet? Here's a code sample (scroll down a bit). It's amazing but has been basically impossible to deploy. Now we have Android.
Here's a good YouTube video showing the build process for APKs.

Brian #4: harlequin: The SQL IDE for Your Terminal.
Ted Conbeer & other contributors
Works with DuckDB and SQLite
Speaking of SQLite
Jeff Triplett and warnings of using Docker and SQLite in production
Anže's post and article: Django, SQLite, and the Database is Locked Error

Extras
Brian:
Recent Python People episodes: Will Vincent, Julian Sequeira, Pamela Fox
Michael:
PageFind and how I'm using it
When "Everything" Becomes Too Much: The npm Package Chaos of 2024
Essay: Unsolicited Advice for Mozilla and Firefox
SciPy 2024 is coming to Washington

Joke: Careful with that bike lock combination code

Adding An Easy Mode For The Modern Data Stack With 5X

Data Engineering Podcast

Play Episode Listen Later Dec 18, 2023 56:12

Summary The "modern data stack" promised a scalable, composable data platform that gave everyone the flexibility to use the best tools for every job. The reality was that it left data teams in the position of spending all of their engineering effort on integrating systems that weren't designed with compatible user experiences. The team at 5X understand the pain involved and the barriers to productivity and set out to solve it by pre-integrating the best tools from each layer of the stack. In this episode founder Tarush Aggarwal explains how the realities of the modern data stack are impacting data teams and the work that they are doing to accelerate time to value. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. 
Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Your host is Tobias Macey and today I'm welcoming back Tarush Aggarwal to talk about what he and his team at 5x data are building to improve the user experience of the modern data stack. Interview Introduction How did you get involved in the area of data management? Can you describe what 5x is and the story behind it? We last spoke in March of 2022. What are the notable changes in the 5x business and product? What are the notable shifts in the data ecosystem that have influenced your adoption and product direction? What trends are you most focused on tracking as you plan the continued evolution of your offerings? What are the points of friction that teams run into when trying to build their data platform? Can you describe design of the system that you have built? What are the strategies that you rely on to support adaptability and speed of onboarding for new integrations? 
What are some of the types of edge cases that you have to deal with while integrating and operating the platform implementations that you design for your customers? What is your process for selection of vendors to support? How would you characterize your relationships with the vendors that you rely on? For customers who have pre-existing investment in a portion of the data stack, what is your process for engaging with them to understand how best to support their goals? What are the most interesting, innovative, or unexpected ways that you have seen 5XData used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on 5XData? When is 5X the wrong choice? What do you have planned for the future of 5X?

Contact Info
LinkedIn (https://www.linkedin.com/in/tarushaggarwal/)
@tarush (https://twitter.com/tarush) on Twitter

Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links 5X (https://5x.co) Informatica (https://www.informatica.com/) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) Looker (https://cloud.google.com/looker/) Podcast Episode (https://www.dataengineeringpodcast.com/looker-with-daniel-mintz-episode-55/) DuckDB (https://duckdb.org/) Podcast Episode (https://www.dataengineeringpodcast.com/duckdb-in-process-olap-database-episode-270/) Redshift (https://aws.amazon.com/redshift/) Reverse ETL (https://medium.com/memory-leak/reverse-etl-a-primer-4e6694dcc7fb) Fivetran (https://www.fivetran.com/) Podcast Episode (https://www.dataengineeringpodcast.com/fivetran-data-replication-episode-93/) Rudderstack (https://www.rudderstack.com/) Podcast Episode (https://www.dataengineeringpodcast.com/rudderstack-open-source-customer-data-platform-episode-263/) Peak.ai (https://peak.ai/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack

Data Engineering Podcast

Play Episode Listen Later Dec 11, 2023 49:51

Summary If your business metrics looked weird tomorrow, would you know about it first? Anomaly detection is focused on identifying those outliers for you, so that you are the first to know when a business critical dashboard isn't right. Unfortunately, it can often be complex or expensive to incorporate anomaly detection into your data platform. Andrew Maguire got tired of solving that problem for each of the different roles he has ended up in, so he created the open source Anomstack project. In this episode he shares what it is, how it works, and how you can start using it today to get notified when the critical metrics in your business aren't quite right. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. 
Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). That's three free boards at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. 
Your host is Tobias Macey and today I'm interviewing Andrew Maguire about his work on the Anomstack project and how you can use it to run your own anomaly detection for your metrics Interview Introduction How did you get involved in the area of data management? Can you describe what Anomstack is and the story behind it? What are your goals for this project? What other tools/products might teams be evaluating while they consider Anomstack? In the context of Anomstack, what constitutes a "metric"? What are some examples of useful metrics that a data team might want to monitor? You put in a lot of work to make Anomstack as easy as possible to get started with. How did this focus on ease of adoption influence the way that you approached the overall design of the project? What are the core capabilities and constraints that you selected to provide the focus and architecture of the project? Can you describe how Anomstack is implemented? How have the design and goals of the project changed since you first started working on it? What are the steps to getting Anomstack running and integrated as part of the operational fabric of a data platform? What are the sharp edges that are still present in the system? What are the interfaces that are available for teams to customize or enhance the capabilities of Anomstack? What are the most interesting, innovative, or unexpected ways that you have seen Anomstack used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Anomstack? When is Anomstack the wrong choice? What do you have planned for the future of Anomstack? Contact Info LinkedIn (https://www.linkedin.com/in/andrewm4894/) Twitter (https://twitter.com/@andrewm4894) GitHub (http://github.com/andrewm4894) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. 
Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links
Anomstack GitHub repo (http://github.com/andrewm4894/anomstack)
Airflow Anomaly Detection Provider GitHub repo (https://github.com/andrewm4894/airflow-provider-anomaly-detection)
Netdata (https://www.netdata.cloud/)
Metric Tree (https://www.datacouncil.ai/talks/designing-and-building-metric-trees)
Semantic Layer (https://en.wikipedia.org/wiki/Semantic_layer)
Prometheus (https://prometheus.io/)
Anodot (https://www.anodot.com/)
Chaos Genius (https://www.chaosgenius.io/)
Metaplane (https://www.metaplane.dev/)
Anomalo (https://www.anomalo.com/)
PyOD (https://pyod.readthedocs.io/)
Airflow (https://airflow.apache.org/)
DuckDB (https://duckdb.org/)
Anomstack Gallery (https://github.com/andrewm4894/anomstack/tree/main/gallery)
Dagster (https://dagster.io/)
InfluxDB (https://www.influxdata.com/)
TimeGPT (https://docs.nixtla.io/docs/timegpt_quickstart)
Prophet (https://facebook.github.io/prophet/)
GreyKite (https://linkedin.github.io/greykite/)
OpenLineage (https://openlineage.io/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Latest podcast episodes about duckdb

#441 It's Michaels All the Way Down

249: Quacking Through Data: Duckdb's Emerging Ecosystem

Will AI agents put an end to devs? | Anderson Amaral - Co Founder @ScoraS

Hamilton Ulmer - Instant SQL with DuckDB/MotherDuck - Practical Data Lunch and Learn

Exploring DuckDB & Comparing Python Expressions vs Statements

Go makes everything faster. Even ducks!

Beyond Database Optimization with AI

Trends in Data Engineering – Adrian Brudaru

Issue 2025-W10 Highlights

EDyO 96 - Fosdem 2025

140. DuckDB Meets AWS: A Match Made in Cloud

Episode 211 - Motherduck

Power of #Duckdb with Postgres: pg_duckdb

Building MotherDuck to a $400M Company

pg_duckdb

#491: DuckDB and Python: Ducks and Snakes living together

Hannes Muhleisen - DuckDB Deep Dive, The Challenges of Lakehouses, and More

More layoffs at Reface | new Bitcoin record | OpenAI's "advent calendar" - DOU News #176

FOSS4G - Blazing Fast Geospatial SQL in DuckDB - Isaac Brodsky

Crunchy Data

Big data is dead, analytics is alive

The Death of Big Data and Why It's Time To Think Small | Jordan Tigani, CEO, MotherDuck

Big data is dead, analytics is alive (Practical AI #292)

Issue 2024-W43 Highlights

Small Data, Big Impact: Insights from MotherDuck's Jacob Matson

Exploring DuckDB: A relational database built for online analytical processing

Сваренная батарея — Episode 475

Jordan Tigani - Why Small Data is Awesome, DuckDB, and More

AI, SQL, and the End of Big Data

Trying New Technology

DuckDB with Hannes Mühleisen

Practical Applications for DuckDB (with Simon Aubury & Ned Letcher)

Issue 2024-W26 Highlights

194: Building Retail Churn Prediction on DuckDB with Clint Dunn of Wilde

#388 Don't delete all the repos

June 3rd, 2024 | I Am So Sick of Leetcode-Style Interviews

AI, Data and Blockchain: a VC perspective | Tomasz Tunguz, Founder of Theory Ventures

Issue 2024-W20 Highlights

Visually Debugging EF Queries with Giorgi Dalakishvili

Another one bites the dust

From .NET to DuckDB: Unleashing the Database Evolution with Giorgi Dalakishvili

March 21st, 2024 | U.S. sues Apple, accusing it of maintaining an iPhone monopoly

#454: Data Pipelines with Dagster

Find Out About The Technology Behind The Latest PFAD In Analytical Database Development

Data Sharing Across Business And Platform Boundaries

#367 A New Cloud Computing Paradigm at Python Bytes

Adding An Easy Mode For The Modern Data Stack With 5X

Run Your Own Anomaly Detection For Your Critical Business Metrics With Anomstack