Trino Community Broadcast is a show where we cover events and happenings within the open-source Trino community and show off some cool stuff about Trino. Learn more at https://trino.io
Jan Waś teaches us about the new Faker connector and how you can use it to emulate data that does not exist on any storage, how you can shape it as you need, and how you can then learn real SQL, build real reports, and make some real charts - all with fake data. Details at https://trino.io/episodes/71
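For anyone who wants to try the Faker connector after watching: a catalog backed by it needs only a minimal properties file. This is a sketch based on the connector's basic setup; the catalog name generator is our own choice, and any tuning properties beyond connector.name should be checked against the Trino documentation.

```
# etc/catalog/generator.properties - minimal Faker catalog (assumed setup)
connector.name=faker
```

Once the catalog is registered, you define tables in it with ordinary CREATE TABLE statements and SELECT from them; the connector fills in generated values matching each column's type, so no storage is involved.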
Manfred Moser is joined by Peter Kosztolanyi to talk about the origins, current status, and future of the new Preview Web UI for Trino, before we play around with it in a demo. More info at https://trino.io/episodes/70
Show notes and more details at https://trino.io/episodes/69
Show notes with more details at https://trino.io/episodes/68
More details at https://trino.io/episodes/67
Manfred is joined by Wren AI team members and contributors to talk about the new AI-powered text-to-SQL tool and its great support for Trino. More details at https://trino.io/episodes/66
Manfred and Cole talk about recent releases and various performance-enhancing features. Details at https://trino.io/episodes/65
Sebastian Bernauer and Sönke Liebau from Stackable join us to talk about their experience with using Open Policy Agent for access control with Trino. More details at https://trino.io/episodes/64
Emily Sunaryo, DevRel intern at Starburst, joins us to talk about her experience learning Trino and starting to write a web application with JavaScript to query data in Trino. More details at https://trino.io/episodes/63
More details at https://trino.io/episodes/62
Cole and Manfred talk with our guest Patrick Pichler from CreativeData about Power BI and his open source Trino connector. More details at https://trino.io/episodes/61
We chat with Isa Inalcik from BestSecret about his proofs of concept for Trino functions calling AI/LLM systems. More details at https://trino.io/episodes/60
More details at https://trino.io/episodes/59
Timestamps:
- 0:00 Intro
- 1:36 Releases 437-438
- 4:12 Introducing Peaka
- 8:07 An overview of Peaka
- 16:02 The engineering of Peaka
- 20:04 Connectors
- 26:51 Peaka demo
- 41:34 Managing catalogs and security
- 51:06 Peaka wrap-up
- 53:14 PR of the episode: Filesystem caching with Alluxio
- 56:16 Outro
Timestamps:
- 0:00 Intro
- 1:48 Releases 428-430
- 6:30 Introducing Denis Magda from @YugabyteDB
- 7:56 JDBC, Trino's JDBC driver, and the Postgres connector
- 14:08 Introducing YugabyteDB
- 21:33 Demo time! Trino with PostgreSQL
- 29:56 Demoing Trino with YugabyteDB
- 44:57 Failover and resiliency
- 56:05 Upcoming events and Trino Summit soon!
- Intro Music: 0:00
- Intro: 0:31
- Trino releases 408-410: 2:02
- Introducing the Ignite connector to Trino: 5:30
- What is Ignite?: 7:30
- Contributing the Ignite connector: 10:10
- PR of the episode: 52:20
DolphinScheduler is a popular Apache data workflow orchestrator that enables running complex data pipelines. The project recently added a Trino integration, and its contributors will be demonstrating how to use DolphinScheduler to run a series of transformations on the data lakehouse with Trino.

- Intro Music: 0:00
- Intro: 0:31
- Trino release 407: 13:22
- What is workflow orchestration?: 21:12
- Why do we need a workflow orchestration tool for building a data lake?: 31:07
- What is Apache DolphinScheduler?: 37:35
- Does DolphinScheduler have any computing engine or storage layer?: 53:11
- What are the differences with other workflow orchestrators, such as Apache Airflow?: 58:46
- Demo: Creating a simple Trino workflow in DolphinScheduler: 1:26:44
- PR: Improve performance of Parquet files: 1:47:04

Show Notes: https://trino.io/episodes/45
Show Page: https://trino.io/broadcast/
We're going to discuss all of the awesome sessions that happened during Trino Summit this year. Manfred, Cole, and I will be joined by Martin, Dain, Brian Zhan, and Claudius for their perspectives on what they found most interesting about the summit. We also dive into stats around the summit and some exciting topics discussed off-camera. We'll also dive into some key takeaways from the Trino Contributor Congregation that took place the day after and some of the topics we went over there.

- Intro Music: 0:00
- Intro: 0:32
- Trino Summit intro: 1:46
- Why the Pokémon theme?: 3:56
- Overview of Trino Summit and what stood out: 10:58
- Bringing Trino to the masses: 33:57
- Trino Contributor Congregation recap: 43:52
- Releases: 1:00:01
- Backlog grooming: 1:03:51

Show Notes: https://trino.io/episodes/42.html
Show Page: https://trino.io/broadcast/
Trino's initial use case was around replacing the Apache Hive runtime. As data lakes grew into prominence, it became clear that having a faster query engine didn't solve all problems. The Hive model itself was a huge bottleneck and didn't provide features that companies needed akin to data warehouses and databases. Apache Hudi is a new table format created at Uber that aims to address many of these issues and usher in a new generation of data lakes. Tune in as we speak with Sagar Sumit, contributor to the Trino Hudi connector, and Grace Lu, who uses Trino and Hudi at Robinhood, to discuss the new Hudi connector and future plans!

- Intro Music: 0:00
- Intro: 0:32
- Releases: 14:43
- Concept of the episode: Intro to Hudi and the Hudi connector: 22:29
- Concept of the episode: Merge on read and copy on write tables: 28:28
- Concept of the episode: Hudi metadata table: 39:24
- Concept of the episode: Hudi data layout: 46:39
- Concept of the episode: Robinhood Trino and Hudi use cases: 51:12
- Concept of the episode: Current state and roadmap for the Hudi connector: 1:03:15
- Pull request of the episode: PR 14445: Fault-tolerant execution for PostgreSQL and MySQL connectors: 1:08:14
- Demo of the episode: Using the Hudi connector: 1:13:34

Show Notes: https://trino.io/episodes/41.html
Show Page: https://trino.io/broadcast/
Join us for this next episode of the broadcast, where we bring back Ryan Blue, the creator of Iceberg, to discuss some of the latest happenings in the Iceberg community, and to demo a bunch of new features that have come out in the Trino Iceberg connector. We also have a new guest, Tabular Developer Advocate Sam Redai, shedding light on this incredible community as well! Since the first episodes, Iceberg has finalized the v2 spec and added a lot of new features along the way. Likewise, we've improved Trino's writing capabilities around Iceberg - so much so that you can use Trino as the sole query engine atop Iceberg to support your data lake. We'll talk about all of this and more, so don't miss it!

- Intro Music: 0:00
- Intro: 0:32
- Releases: 6:27
- Concept of the episode: What is Iceberg?: 11:27
- Concept of the episode: Why Iceberg over other formats?: 16:50
- Concept of the episode: Metadata catalogs: 35:40
- Concept of the episode: Branching, tagging, and auditing, oh my!: 43:54
- Concept of the episode: The Puffin format: 50:53
- Concept of the episode: Trino Iceberg connector updates: 1:01:38
- Pull request of the episode: PR 13111: Scale table writers per task based on throughput: 1:11:37
- Demo of the episode: DML operations on Iceberg using Trino: 1:15:31

Show Notes: https://trino.io/episodes/40.html
Show Page: https://trino.io/broadcast/
In this episode we sit down with engineers Steve Morgan and Edward Morgan to discuss how they use Trino at Raft. Raft provides consulting services and is particularly skilled at DevSecOps. One particular challenge they face is dealing with fragmented government infrastructure. We dive in to learn how Trino enables Raft to supply government sector clients with a data fabric solution. Raft takes a special stance on using and contributing to open source solutions that run well on the cloud.

- Intro Music: 0:00
- Intro: 0:32
- Releases: 5:14
- Concept of the episode: Trino at Raft: 12:08
- Concept of the episode: Software factory: 13:21
- Concept of the episode: Standards and anatomy of a stack: 16:24
- Concept of the episode: Data Fabric at Raft: 18:58
- Concept of the episode: Security concerns around Trino: 22:01
- Concept of the episode: Iron Bank container repository: 29:27
- Concept of the episode: Data Fabric user perspective: 36:13
- Concept of the episode: Challenges for adoption: 45:52
- Pull request of the episode: PR 13354: Add S3 Select pushdown for JSON files: 1:05:34
- Demo of the episode: Running Great Expectations on a Trino Data Lakehouse Tutorial: 1:08:01
- Question of the episode: How can I deploy Trino on Kubernetes without using the Helm chart?: 1:13:55

Show Notes: https://trino.io/episodes/39.html
Show Page: https://trino.io/broadcast/
We'll be taking a more focused look at a specific feature that's being added to Trino: polymorphic table functions. We're excited to talk about what they do, where we are so far, where we're going, and how you can leverage them to make Trino better than ever!

Show Notes: https://trino.io/episodes/38.html
Show Page: https://trino.io/broadcast/
YouTube Video: https://www.youtube.com/watch?v=90e5WxhwNas
This episode introduces the benefits of having the Trino community around the Trino project. What is the purpose of communities in tech projects? Would the product be successful without a community or anyone to maintain it? We introduce some new faces that will be stewards in our journey to growing the adoption of our favorite query engine, what each of them does, and how their work impacts you as a community member! Most importantly, you can learn how to get involved and help us learn how to best navigate ideas, issues, or any other contribution you may have that helps make our favorite query engine the best-in-class!

- Intro song: 00:00
- Intro: 00:32
- Releases: 9:37
- Concept of the episode: How to strengthen the Trino community: 15:07
- Concept of the episode: Pull request process: 30:33
- Concept of the episode: Impact of community and developer experience: 33:07
- Concept of the episode: Community metrics for better decision making: 44:00
- Pull request of the episode: PR 12259: Support updating Iceberg table partitioning: 1:09:42
- Demo of the episode: Iceberg table partition migrations: 1:16:00
- Question of the episode: Can I force a pushdown join into a connected data source?: 1:28:40

Show Notes: https://trino.io/episodes/37.html
Show Page: https://trino.io/broadcast/
As Trino preps to jump to Java 17, we discuss the features added between Java 11 and Java 17, walk with Martin through a few of the potential uses of new features like the Vector API, language improvements, and G1GC speedups, and finally dive into some of the features that we'll be implementing in the upcoming months under a new project in Trino!

- Intro song: 00:00
- Intro: 00:36
- Releases: 8:17
- Question of the episode: Will Trino be making a vectorized C++ version of Trino workers?: 19:22
- Concept of the episode: Java 17 and rearchitecting Trino: 36:39
- Java 17 Updates: Performance: 40:10
- Java 17 Updates: Garbage collectors: 46:45
- Java 17 Updates: Java auto-vectorization: 1:06:22
- Java 17 Updates: Java Vector API: 1:12:08
- Java 17 Updates: Language features: 1:17:14
- Rearchitecting Trino: Update to Java 17: 1:27:19
- Rearchitecting Trino: Revamping Trino: 1:32:36
- Rearchitecting Trino: Project Hummingbird: 1:39:40
- Pull request of the episode: PR 4649: Disable JIT byte code recompilation cutoffs in default jvm.config: 1:42:31
- Demo of the episode: FizzBuzz - SIMD style!: 1:49:18

Show Notes: https://trino.io/episodes/36.html
Show Page: https://trino.io/broadcast/
In our Trino Community Broadcast episode 35 we catch up on recent releases 375, 376, 377, and 378. We then talk about how Trino is packaged as a tarball, an RPM, and a Docker container, what some of the differences are, and how you can customize each of them. Beyond that, we also ask for your feedback and input on your usage of the different packages. As a next step we chat about adopting Java 17 as the standard for Trino, and then we get a demo of a new feature of the web UI.

- Intro song: 00:00
- Intro: 00:32
- Releases: 4:22
- Concept of the episode: Packaging Trino: 21:28
- Additional topic of the episode: Modernizing Trino with Java 17: 46:49
- Pull request of the episode: Worker stats in the Web UI: 55:25
- Question of the episode: Is HDFS supported by the Delta Lake connector?: 1:01:52
- Demo of the episode: Tarball installation and new Web UI feature: 1:05:58

Show Notes: https://trino.io/episodes/35.html
Show Page: https://trino.io/broadcast/
News from the Trino releases 372, 373, and 374, and an update on Project Tardigrade are the start. Then we dive into the details of the new Delta Lake connector contributed to Trino by Starburst.

- Intro song: 00:00
- Intro: 00:37
- Releases: 2:05
- Project Tardigrade update: 9:21
- Concept of the episode: A new connector for Delta Lake object storage: 18:37
- Pull request of the episode: Add Delta Lake connector and documentation: 26:10
- Demo of the episode: Delta Lake connector in action: 29:14
- Question of the episode: How do I secure the connection from a Trino cluster to the data source?: 54:00

Show Notes: https://trino.io/episodes/34.html
Show Page: https://trino.io/broadcast/
Goldman Sachs uses Trino to reduce last-mile ETL and provide a unified way of accessing data through federated joins. Making a variety of data sets from different sources available in one spot for their data science team was a tall order. Data must be quickly accessible to data consumers, and systems like Trino must be reliable for users to trust this singular access point for their data. Join us on this next episode as we discuss with engineers from Goldman Sachs how they integrated Trino and achieved scaling and high availability.

- Intro Song: 00:00
- Intro: 00:28
- News: 8:39
- Concept of the month: High Availability with Trino: 20:23
- PR of the month: PR 8956: Add support for external db for schema management in mongodb connector: 1:04:09
- Bonus PR of the month: PR 8202: Metadata for alias in elasticsearch connector only uses the first mapping: 1:15:15
- Demo of the month: Trino Fiddle: A tool for easy online testing and sharing of Trino SQL problems and their solutions: 1:32:08
- Question of the month: Does the Trino Hive connector support CarbonData?: 1:38:09

Show Notes: https://trino.io/episodes/33.html
Show Page: https://trino.io/broadcast/
While Trino has been proven to run batch analytic workloads at scale, many have avoided long-running batch jobs for fear of query failure. Join this month's broadcast discussing the project introducing granular fault-tolerance to Trino. Codenamed Project Tardigrade, it is being thoughtfully crafted to maintain the speed advantage that Trino has over other query engines while increasing the resiliency of queries. We will discuss some of the design proposals being considered with Tardigrade engineers Andrii, Zebing, Lukasz Osipiuk, and Martin. We'll also cover how fault-tolerance will be exposed to users, and we will do a demo to showcase retries. Project Tardigrade is named after the microscopic tardigrades that are the world's most indestructible creatures, akin to the resiliency we are adding to Trino's queries. We look forward to telling you more as features unfold.

- Intro Song: 00:00
- Intro: 00:34
- News: 7:56
- Concept of the month: Introducing Project Tardigrade: 20:26
- Concept of the month: Why ETL in Trino?: 22:57
- Concept of the month: Why are people reluctant to do their ETL in Trino?: 35:28
- Concept of the month: What are the limitations of the current architecture?: 43:25
- Concept of the month: Trino engine improvements with Project Tardigrade: 52:59
- Demo of the month: Task retries with Project Tardigrade: 1:19:10
- PR of the month: PR 10319: Trino lineage fails for AliasedRelation: 1:21:57
- Question of the month: How do you cast JSON to varchar with Trino?: 1:24:38

Show Notes: https://trino.io/episodes/32.html
Show Page: https://trino.io/broadcast/
In the previous Trinites installment (https://trino.io/episodes/24.html), we introduced Kubernetes (k8s) and its concepts and showed how to use k8s with Trino. After discussing Kubernetes, we did a demo showing how to deploy Trino on k8s. This round, we're going to take the same k8s concepts and dive in a little deeper to help newbies to k8s (KuberNewbies...Kubies?) deploy Trino to the cloud (specifically the most common cloud provider, AWS). This takes us from proving Trino is awesome just to yourself to proving it to your coworkers and your boss, and doing so at scale! We will deploy using Amazon's EKS service.

- Intro Song: 00:00
- Intro: 00:34
- News: 9:08
- Concept of the week: ReplicaSets, Deployments, and Services: 27:38
- Demo of the month: Deploy Trino k8s to Amazon EKS: 1:21:31
- PR of the week: PR 8921: Support TRUNCATE TABLE statement: 1:34:41
- Question of the week: How do I run system.sync_partition_metadata with different catalogs?: 1:38:21

Show Notes: https://trino.io/episodes/31.html
Show Page: https://trino.io/broadcast/
Trino and dbt have become a common pattern due to Trino's ability to query data over multiple data sources using ANSI SQL and dbt's capabilities to model robust pipelines using SQL and YAML files. José Cabada from Talkdesk joins us to discuss how Talkdesk uses Trino and dbt as central elements of their data platform to realize a data mesh. We then dive into why they are doing this and discuss what impact a data mesh strategy has on an engineer's day-to-day work life.

- Intro Song: 00:00
- Intro: 00:34
- News: 5:48
- Concept of the week: Trino and dbt, a hot data mesh: 15:35
- PR of the week: Partitioned table tests and fixed PR 9757: 1:24:15
- Question of the week: What's the difference between location and external_location?: 1:28:18

Show Notes: https://trino.io/episodes/30.html
Show Page: https://trino.io/broadcast/
This is a revisit of our inaugural episode, "What is Presto?", where we dive again into the question of what is Trino? We'll cover the history and the architecture, discuss a few use cases, and show how to get started with the project.

- Intro Song: 00:00
- Intro: 00:34
- News: 8:26
- Concept of the week: What is Trino?: 17:03
- PR of the week: PR 8821: Add HTTP/S query event logger: 58:35
- Question of the week: Does the Hive connector depend on the Hive runtime?: 1:02:43

Show Notes: https://trino.io/episodes/29.html
Show Page: https://trino.io/broadcast/
- Concept of the week: Event Stream abstractions and Pravega: 15:15
- Demo of the week: Event Stream abstractions and Pravega: 1:11:00
- PR of the week: Pravega presto-connector PR 49: 1:20:51
- Question of the week: What is the point of Trino Forum and what is the relationship to Trino Slack?: 1:26:07

Show Notes: https://trino.io/episodes/28.html
Show Page: https://trino.io/broadcast/
- Intro Song: 00:00
- Intro: 00:34
- News: 5:53
- Concept of the week: LakeFS and Git on Object Storage: 9:06
- Demo of the week: Running Trino on LakeFS: 40:45
- PR of the week: PR 8762: Add query error info to cluster overview page in web UI: 1:11:11
- Question of the week: Why are deletes so limited in Trino?: 1:14:14

Show Notes: https://trino.io/episodes/27.html
Show Page: https://trino.io/broadcast/
Trino is an enabler when it comes to giving you a single source of access across data sources. But how does anyone know where to find the data that they need? Many times, multiple teams have their own view of the world when it comes to the data they need, but how can teams discover data beyond their day-to-day operations? Further questions, like who owns the data and how different data sources relate to each other, can all be answered with data discovery tools like Amundsen. Trino gets you quick access to your data, while Amundsen tells you how to navigate it.

- Intro Song: 00:00
- Intro: 00:33
- News: 6:54
- Concept of the week: Data discovery and Amundsen: 12:04
- Concept of the week: Amundsen Architecture: 24:10
- Concept of the week: Amundsen as a subcomponent to data mesh: 45:20
- PR of the week: Index Trino Views: 1:00:06
- Question of the week: Can I add a UDF without restarting Trino?: 1:12:54

Show Notes: https://trino.io/episodes/26.html
Show Page: https://trino.io/broadcast/
If you know Trino, you know it allows for flexible architectures that include many systems supporting varying use cases. We've come to accept this potpourri of systems as a general modus operandi for most businesses. Many times the data is copied to different systems to accomplish varying use cases, from performance and data warehousing to merging cross-cutting data into a single store. When copying data between systems, how do these systems stay in sync? We discuss Change Data Capture (CDC), and a particular implementation of it called Debezium, as a method that solves this issue. It's a critical need, especially for Trino, to know that the state across the data sources we query is valid. Tune in to hear about CDC, Debezium, and how they are applied in real-world use cases along with Trino on the next show!

- Intro Song: 00:00
- Intro: 00:33
- News: 6:02
- Concept of the week: Change Data Capture: 15:34
- Concept of the week: Debezium: 27:32
- Concept of the week: Debezium + Trino at Zomato: 45:47
- PR of the week: PR 4140: Implement aggregation pushdown in Pinot: 1:04:25
- Question of the week: Is there an array function that flattens a row into three rows?: 1:11:05

Show Notes: https://trino.io/episodes/25.html
Show Page: https://trino.io/broadcast/
- Intro Song: 00:00
- Intro: 00:33
- News: 8:02
- Concept of the week: K8s architecture: Containers, Pods, and kubelets: 14:27
- PR of the week: PR 11: Merge contributor version of k8s charts with the community version: 55:20
- Demo: Running the Trino charts with kubectl: 57:42

Show Notes: https://trino.io/episodes/24.html
Show Page: https://trino.io/broadcast/
- Intro Song: 00:00
- Intro: 00:34
- News: 5:18
- Concept of the week: Row pattern matching and MATCH_RECOGNIZE: 14:26
- PR of the week: PR 8348: Document row pattern recognition in window: 52:16
- Demo: Showing MATCH_RECOGNIZE functionality by example: 57:13
- Question of the week: How do you tag a list of rows with custom periodic rules?: 1:12:51

Show Notes: https://trino.io/episodes/23.html
Show Page: https://trino.io/broadcast/
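For a flavor of what the MATCH_RECOGNIZE demo covers, here is a sketch of the row pattern matching syntax, loosely modeled on the examples in the Trino documentation. The orders table and its columns are illustrative, so treat this as an assumed sketch rather than a tested query.

```sql
-- Find, per customer, the lowest price in each "dip then recovery" run
SELECT m.custkey, m.bottom_price
FROM orders
  MATCH_RECOGNIZE (
    PARTITION BY custkey               -- match within each customer's orders
    ORDER BY orderdate                 -- evaluate the pattern in date order
    MEASURES LAST(DOWN.totalprice) AS bottom_price
    ONE ROW PER MATCH
    PATTERN (DOWN+ UP+)                -- a falling streak followed by a rising one
    DEFINE
      DOWN AS totalprice < PREV(totalprice),
      UP   AS totalprice > PREV(totalprice)
  ) AS m
```

The DEFINE clause labels rows, PATTERN describes the sequence of labels to match, and MEASURES projects values out of each matched run.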
This episode will cover LinkedIn's journey to upgrade from PrestoSQL to Trino and some of the operational challenges LinkedIn's engineering team has faced at their scale.

- Intro Song: 00:00
- Intro: 00:36
- News: 7:39
- Concept of the week: Trino usage at LinkedIn: 15:55
- Concept of the week: Trino hardware and operational scale: 23:23
- Concept of the week: Challenges operating at scale: 44:09
- Concept of the week: Open source at LinkedIn: 48:36
- Concept of the week: PrestoSQL to Trino upgrade challenges: 58:11
- Concept of the week: PrestoSQL to Trino upgrade steps: 1:13:32
- PR of the week: Digging into join queries: 1:33:18
- Demo: How to research the performance of a join: 1:38:53
- Question of the week: How can I query Hive views from Trino?: 1:48:10

Show Notes: https://trino.io/episodes/22.html
Show Page: https://trino.io/broadcast/
- Intro Song: 00:00
- Intro: 00:35
- News: 7:42
- Question of the week: Can dbt connect to different databases in the same project?: 18:18
- Concept of the week: What is dbt?: 21:28
- Concept of the week: dbt + Trino: 38:09
- Demo: Querying Trino from a dbt project: 47:21
- PR of the week: PR 8283: Externalised destination table cache expiry duration for BigQuery Connector: 1:21:13

Show Notes: https://trino.io/episodes/21.html
Show Page: https://trino.io/broadcast/
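As a taste of the dbt + Trino setup discussed in this episode, a dbt profile pointing at a Trino cluster looks roughly like the following. The host, catalog, and schema values are placeholders, and the exact field names should be verified against the documentation of the dbt adapter for Trino; this is an assumed sketch, not a copy of the demo's config.

```yaml
# ~/.dbt/profiles.yml - hypothetical profile for a dbt project using Trino
my_trino_project:
  target: dev
  outputs:
    dev:
      type: trino
      host: trino.example.com   # placeholder coordinator host
      port: 8080
      user: dbt
      catalog: hive             # catalog that dbt models are built in
      schema: analytics         # default schema for built models
```

With this in place, dbt compiles its SQL models and runs them against the cluster through the named catalog and schema.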