From StarTree, founded by the creators of Apache Pinot™, "Real-Time Analytics with Tim Berglund" is a podcast dedicated to bringing analytics from the dashboard to the user interface. Accessible but technically rich, the show focuses on the infrastructure, tools, and techniques being developed by the people building systems that are serving analytics to our users in real time. New episodes every Monday.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | After 50+ podcast episodes of the Real-Time Analytics Podcast, Tim Berglund bids farewell in his last episode as host of the show. As Tim mentions, the podcast will return in the near future with a new host. Thank you for listening and stay tuned.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Join us for this episode of the Real-Time Analytics podcast where Tim Berglund and Christina Lin from Redpanda discuss the innovative use of WebAssembly for stateless transformations directly within brokers. Learn about the benefits and architecture of this approach, which utilizes unused compute resources for efficient data processing.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | In this episode of the Real-Time Analytics podcast, Tim Berglund explores the innovative architecture of WarpStream with CEO and co-founder Richard Artoul. Discover how WarpStream is transforming the landscape of Kafka with a cloud-native approach that promises operational simplicity, reduced costs, and a stateless data architecture. Richard delves into the technical details of WarpStream, explaining its unique design that separates data and metadata and utilizes object storage for scalability and efficiency.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Join us for episode #51 of the Real-Time Analytics podcast as our host, Tim Berglund, is joined by Tim Veil, VP of Solutions Engineering and Enablement at StarTree. Dive into a discussion about Testcontainers, a powerful tool that leverages Docker for sophisticated integration testing. Learn how Testcontainers simplifies the testing process against real databases like Apache Pinot, enhancing code reliability and CI pipeline efficiency.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us on the Real-Time Analytics podcast as Tim Berglund sits down with Dr. Rachel Laudan, renowned food historian and author, for a fascinating exploration into the evolution of food delivery and its cultural implications. Dr. Laudan shares her insights on how cooking and food preparation have transformed over centuries, reflecting broader social and economic shifts. This episode promises to deepen your understanding of our culinary past and present, shedding light on the future of meal delivery.
Pinot 1.1: https://docs.pinot.apache.org/basics/releases | Sub: https://stree.ai/sub | In this release video, Tim Berglund (VP of Developer Relations, StarTree) covers the updates since Pinot 1.0, including 166 new features and 152 bug fixes. Tim delves into key enhancements such as the introduction of vector index support—vital for AI and machine learning applications—and improvements in the multi-stage query engine. He also explains the significance of sticky query routing and new approximation algorithms like HyperLogLog++. Whether you're a seasoned developer or a data enthusiast keen to understand the latest trends in database technology, this video offers valuable insights into optimizing real-time data processing with Apache Pinot.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us for Part 2 of our conversation with Curtis Galione (Confluent) about the integration of Flink and Kafka. Building on last week's discussion, we explore the challenges and innovations at the intersection of these powerful technologies. Curtis shares his thoughts on why Flink remains focused on its core mission despite the evolving demands for data query capabilities, and the dynamic between Kafka's data handling and Flink's processing capabilities. Tune in to gain a better understanding of stream processing and the strategic decisions behind system capabilities in real-time analytics. Remember to use the 30% discount for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30) ► xkcd that Curtis referenced: https://xkcd.com/1838/ ► One SQL to Rule Them All: https://arxiv.org/abs/1905.12133
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! This week, Tim Berglund chats with Curtis Galione, a Flink aficionado from the Advanced Technology Group at Confluent, about the interplay between Flink and Kafka and how they revolutionize data infrastructure. This conversation explores the complexities and innovations of stream processing, promising to enhance your understanding of current data processing challenges. Remember to use the 30% discount for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30) ► xkcd that Curtis referenced: https://xkcd.com/1838/ ► One SQL to Rule Them All: https://arxiv.org/abs/1905.12133
Join: https://stree.ai/slack | Sub: https://stree.ai/sub | New episodes every Monday! We're back with another special episode of "The Real-Time Analytics Podcast". We launched our second podcast series called "Keyboard and Quill" at the start of March, which is really more of a tech history podcast. We're trying to bring together various threads of innovation and technology development and cultural change over the last 40,000 years, bringing all those threads together to the technology that we normally talk about in this podcast. For episodes on the history of telecommunications, Tim talked to Dr. Mara Mills, a professor in the media studies department at NYU about the history of the telephone. Since we only used a few snippets of it in "Keyboard and Quill", we wanted you to be able to hear the whole thing. Remember to use the 30% discount Tim mentioned for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30)
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics Podcast, Tim talks with Ryan Wright, CEO of thatDot and creator of Quine, a project heralded as the world's first streaming graph. Wright delves into Quine's innovative approach to overcoming the limitations of traditional graph databases, particularly in real-time analytics and cybersecurity applications. Listen as they highlight Quine's impact on data analytics, illustrating its potential to revolutionize how industries manage and interpret interconnected data streams. Remember to use the 30% discount Tim mentioned for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30) ► Quine.io
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Today, Tim dives into the world of Kafka Streams with Matthias Sax, Software Engineer at Confluent and core contributor to Apache Kafka. Matthias updates us on the latest in Interactive Queries, their enhancements in recent releases, insights on stream processing and how Kafka Streams stands out in the real-time analytics landscape. Remember to use the 30% discount Tim mentioned for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30)
Register: https://stree.ai/rtapod30 | Tim's new podcast series: https://stree.ai/keyboardandquill | Tim sits down with Professor Laine Nooney (NYU) to discuss the significance of software in the evolution of personal computing, as part of unpacking Laine's book, "The Apple II Age: How the Computer Became Personal." Here's the 30% discount Tim mentioned for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30)
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us as we dive into the fascinating intersection of API gateways and real-time analytics with Viktor Gamov, the new head of Developer Advocacy at StarTree. Viktor shares his insights from his recent experiences and explores how these technologies are transforming user-facing analytics. We also discuss Viktor's upcoming all-day Pinot training at the RTA Summit and delve into some intriguing topics like the "Hyperion Cantos" and the concept of a StarTree. Join us at the summit, visit rtasummit.com ► Hyperion Cantos by Dan Simmons: https://www.amazon.com/Hyperion-Cantos-Book-Complete-Set/dp/B084ZB7SMP ► The Four: The Hidden DNA of Amazon, Apple, Facebook, and Google by Scott Galloway: https://www.amazon.com/Four-Hidden-Amazon-Facebook-Google/dp/0525501223 ► The Movie Database: https://developer.themoviedb.org/docs/getting-started ► Developer Voices episode ft. Bobby Calderwood: https://www.youtube.com/watch?v=V7vhSHqMxus
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode, we continue our conversation with Ujwala Tulshigiri, Engineering Manager at Uber, focusing on the technical intricacies of migrating workloads and technology consolidation. Ujwala provides an in-depth look into Uber's strategic approach to infrastructure decisions, the challenges of technology migration, and how they contribute to and leverage the open-source community. She discusses the complexities of replacing systems like Elasticsearch with alternatives like Pinot, addressing the nuances of data management, search capabilities, and the importance of maintaining low-latency operations.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund is joined by Ujwala Tulshigiri, Engineering Manager at Uber, to explore the journey of technology consolidation and the strategic embrace of open-source solutions in challenging economic times. Ujwala offers deep insights into navigating the complexities of technology migration, leveraging the power of the Apache Pinot community, and fostering innovation through collaboration. Tune in to part one of this engaging conversation to learn how Uber optimizes its technology stack for efficiency and scalability.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us as we dive into the world of stream processing with Micah Wylde, CEO and co-founder of Arroyo. Discover how Arroyo, a cloud-first SQL native stream processing framework, addresses the challenges of previous generations of stream processing technologies. Learn about its unique approach to making stream processing accessible to non-experts and how it aims to revolutionize real-time data analysis. Whether you're a developer, data scientist, or just curious about the future of stream processing, this episode is packed with insights into Arroyo's design, goals, and how it's changing the game.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Dive into part two of our conversation with Eric Sammer as we explore the evolution of stream processing from Hadoop to Kafka and Flink. Eric shares his insights on the transformative journey of data processing technologies and their impact on the industry. Tune in for a compelling look at the past, present, and future of stream processing.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Eric Sammer (founder & CEO, Decodable) is back! This time, he and Tim talk about the current landscape of stream processing and explore the various architectures and their real-world applications. From practical insights to engaging anecdotes, this episode is a must-listen for anyone keen on understanding the dynamic world of real-time data processing. Episode #16 with Eric Sammer: https://youtu.be/I7Cs_OBM2bM
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us for part two of our conversation with Joe Reis, host of 'Monday Morning Data Chat' and co-author of 'Fundamentals of Data Engineering'. In this episode, we continue our exploration of the evolution of data engineering and the shift towards real-time analytics. We discuss the fine line between streaming and real-time processing, the transition from ETL to data engineering, and the significance of immediate data processing in user interactions.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this week's episode, Tim chats with Joe Reis, a seasoned expert in data engineering and co-author of "Fundamentals of Data Engineering." They delve into the evolution of data engineering from ETL, the role of real-time data and analytics, and the future trajectory of the field. Joe also shares his diverse experiences, from being a 'recovering data scientist' to his current focus as a content creator and consultant. ► The Fundamentals of Data Engineering by Joe Reis and Matt Housley: https://www.oreilly.com/library/view/fundamentals-of-data/9781098108298/ ► The Joe Reis Show: https://open.spotify.com/show/3mcKitYGS4VMG2eHd2PfDN ► Monday Morning Data Chat: https://podcasts.apple.com/us/podcast/monday-morning-data-chat/id1565154727
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of "The Real-Time Analytics Podcast," Tim Berglund is joined by returning guest Peter Corless (Director of Product Marketing, StarTree) to delve into the complex world of federated data systems. They discuss the evolution of data architectures, the challenges of federated identity and data governance, and the implications for modern businesses. Tune in for an insightful conversation on the intricacies and future directions of federated data in an era of diverse and interconnected systems.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, Tim Berglund chats with Anna McDonald about the fascinating world of Kafka Streams. Anna, a customer success technical architect at Confluent, shares her insights on the core concepts of Kafka Streams, including the all-important table and stream abstractions. They delve into the benefits of statefulness and durability, such as active and standby tasks, which ensure seamless failover, and how Kafka Streams stores state in RocksDB and in Kafka itself. New episodes every Monday resume on January 8, 2024!
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, host Tim Berglund welcomes Eric Sammer, Founder and CEO of Decodable. Eric, an industry leader in event streaming technology, discusses the company's focus on stream processing, real-time data processing, and integration with systems like Apache Pinot and StarTree. The conversation delves into the challenges and complexities of managing data, from data cleansing to structuring for different use cases. They explore the ideal balance between generalized and specialized systems, emphasizing the importance of flexibility. Ultimately, they highlight how stream processing serves as an effective solution to adjust and distribute data intelligently, providing an essential abstraction point. New episodes every Monday resume on January 8, 2024!
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, we have Apache Flink's PMC Chair, Robert Metzger, on the show, who provides a friendly introduction to the world of Flink. Like a tour guide, he navigates listeners through Flink's role as a handy tool for building applications that process data in real-time. Metzger illustrates Flink's unique ability to work smoothly with both batch and streaming data, making it a nifty sidekick for anyone dealing with everything from historical data to real-time processing. New episodes every Monday resume on January 8, 2024!
Tim Berglund and Rachel Pedreschi introduce our limited series podcast, Keyboard and Quill, launching in March 2024: https://stree.ai/keyboardandquill Fear not, Tim will continue to host our weekly podcast for data professionals, Real-Time Analytics with Tim Berglund: https://stree.ai/podcast ABOUT KEYBOARD AND QUILL: Keyboard and Quill is a playful, narrative podcast exploring data and technology through time. Co-hosted by Silicon Valley database startup veterans, Tim Berglund and Rachel Pedreschi, and inspired by the soundscapes of NPR's award-winning Radiolab, Keyboard and Quill is a lively look at how we went from the wheel to the smartphone, and from the printing press to the complex data ecosystems powering our modern lives. Tim and Rachel interview academics, philosophers, technologists, thought leaders, cofounders, software engineers, and a variety of other Silicon Valley professionals.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the "Real-Time Analytics" podcast, Tim is joined by Dunith Danushka, a senior developer advocate at Redpanda. They dive into the fascinating world of Redpanda, a cutting-edge streaming technology that's reshaping the real-time analytics ecosystem. Dunith shares his expert knowledge, comparing Redpanda to Kafka, explaining its unique features, and how it operates as an immutable append-only log. They also discuss the technical nuances, including the Seastar framework, and how Redpanda achieves superior performance by working closely with hardware. ► https://redpanda.com/blog/redpanda-vs-kafka-performance-benchmark
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join host Tim Berglund and StarTree's Peter Corless on the "Real-Time Analytics" podcast as they explore the evolution of data architecture and the relevance of the 'data stack' concept in today's tech landscape. They delve into the shift from traditional structures like the LAMP stack to more dynamic, complex systems, underscoring the need for new frameworks and terminologies. ► LAMP stack: https://en.wikipedia.org/wiki/LAMP_(software_bundle) ► JAM stack: https://jamstack.org/ ► OSI reference model: https://en.wikipedia.org/wiki/OSI_model ► Recent Trino episode of podcast: https://youtu.be/_eFdbfn1gO0 ► The Star Trek Federation: https://memory-alpha.fandom.com/wiki/United_Federation_of_Planets ► Look for Peter Corless in StarTree Community Slack (stree.ai/slack)
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics Podcast, host Tim Berglund dives deep into the nuanced world of transactional versus analytical stream processing. Reflecting on his experiences and expert interviews, Tim brings fresh insights into querying streams and the technology behind it. He revisits his early work with Apache Pinot and Kafka, offering a unique perspective on the evolving field of streaming SQL technologies. ► Episode #24 with Hojjat Jafarpour: https://youtu.be/CFvaRPiNXJc ► Episode #19 on Upserts & Deletes ft. Navina Ramesh: https://youtu.be/qa9ZCMYVpa8
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Dive into the world of advanced SQL querying with Elon Azoulay, a software engineer at Starburst. In this conversation, we explore Trino, a massively parallel distributed SQL engine, and its groundbreaking capabilities in federating queries across diverse data sources—from data lakes to APIs like Pinot.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join host Tim as he talks with Sandeep Dabade, who demystifies the impressive star-tree index of Apache Pinot. Discover how this advanced feature optimizes OLAP databases, striking a balance between storage and high-speed query performance, and listen to real-world test cases showcasing its lightning-fast capabilities. Sandeep's blogs: ► https://startree.ai/blog/best-practices-for-designing-tables-in-apache-pinot ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-1-understanding-the-impact-on-query-performance ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-2-understanding-the-impact-during-high-concurrency ► https://startree.ai/blog/star-tree-index-in-apache-pinot-part-3-understanding-the-impact-in-real-customer
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the 'Real-Time Analytics' podcast, dive into the world of Pinot capacity planning with Sandeep Dabade, a solutions engineer at StarTree. Discover how to calculate the perfect cluster size for your real-time analytics requirements and explore essential technical KPIs like read throughput, write throughput, and data size. Sandeep shares invaluable insights into optimizing Pinot for seamless data processing and analytics, making this episode essential for anyone tackling real-time data challenges. Sandeep's blogs: ► https://startree.ai/blog/best-practices-for-designing-tables-in-apache-pinot ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-1-understanding-the-impact-on-query-performance ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-2-understanding-the-impact-during-high-concurrency ► https://startree.ai/blog/star-tree-index-in-apache-pinot-part-3-understanding-the-impact-in-real-customer
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us for Part 2 of the "Real-Time Analytics" podcast featuring Neha Pawar of StarTree, where we delve into Apache Pinot's advanced features including its pluggable architecture, upserts, and Kafka integration. Uncover how Pinot maintains data integrity in real-time analytics and get an insider's look at StarTree Cloud's exclusive tiered storage system.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the "Real-Time Analytics" podcast, host Tim and guest Neha Pawar, a founding engineer of StarTree, explore Apache Pinot's unique capabilities in real-time analytics. Neha unpacks Pinot's efficiency, low latency, and high throughput, revealing its prowess in offering real-time insights to end users. Tune in to this first installment of a two-part series for an insightful discussion on the intricacies and innovations that make Pinot a standout in the analytics landscape.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, we welcome Johan Adami, a seasoned software engineer from Stripe, who shares his experience building out Pinot as an internal service for enhanced real-time analytics. Listen in as Johan unveils the journey from the integration of Apache Pinot to tackling the complexities of real-time data processing, offering a first-hand account of the challenges and achievements encountered along the way.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Ever wondered how enterprise IT data has evolved over the years? Join Tim Berglund and Guru Sattanathan as they navigate through the intriguing phases of data silos, application integrations, and the inevitable rise of real-time analytics. Guru's perspective, deeply rooted in his personal career journey and rich experience, brings out the essence of how software silos have transformed over the years.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | StarTree's Tim Berglund unpacks Apache Pinot 1.0! This major milestone release is functionally complete and widely used in production environments. It has introduced many new features to support query-time native JOINs by extending the multi-stage query engine, upsert capabilities (delete, metadata TTL, segment preloading and segment compaction), NULL value support in queries, support for SPI-based pluggable indexes, and improvements to the Spark 3 connector. Be sure to subscribe to catch our future videos covering each new release of Apache Pinot.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the "Real-Time Analytics" podcast, Tim sits down with Hojjat Jafarpour, a leading figure in the streaming SQL domain. From the early days of KSQL to the cutting-edge work with DeltaStreams, they dive deep into the evolution and impact of real-time analytics, streaming SQL, and cloud-native data solutions. As a former colleague and friend, Hojjat shares his journey and insights on where the streaming world is heading. Join them for an insightful conversation that bridges the past and future of stream processing.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us on the Real-Time Analytics Podcast as we delve into the intriguing intersection of data mesh and event streaming with Hubert Dulay, a developer advocate at StarTree and the author of "Streaming Data Mesh." Our host, Tim Berglund, uncovers the journey from Zhamak Dehghani's initial concept to Hubert's vision of implementing it in a streaming context. Understand the essence of treating data as a product, the future of streaming technologies, and the transformative role of data in modern businesses. ► Hubert's book: https://www.oreilly.com/library/view/streaming-data-mesh/9781098130718/ ► Zhamak Dehghani's Real-Time Analytics Summit keynote: https://youtu.be/Pz3UPpv_JIs
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Discover the secrets behind Wix's cutting-edge real-time analytics in this intriguing episode of the "Real-Time Analytics" podcast. We're joined by Josef Goldstein, Head of R&D for Data Engineering and Analytics Infrastructure at Wix. Tune in to learn how Wix not only uses analytics to make data-driven decisions but also empowers its users to do the same. Uncover the complexities of providing real-time analytics at scale and why it's a game-changer for both Wix and its users.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join host Tim Berglund as he sits down with Lakshmi Rao, a staff engineer at Stripe, who has been at the forefront of data infrastructure, especially in real-time and streaming technologies. Dive into their discussion about Lakshmi's journey from Kafka to Flink and finally to Pinot, understanding the growth and development of real-time analytics in the payment sector. Discover the significance of real-time analytics at Stripe, both for customer-facing and internal tools. If you're curious about the evolution of real-time infrastructure and how major players like Stripe navigate this space, this episode is a must-listen! ► Meetup: Building a Real-Time Analytics Platform (Lakshmi Rao, Stripe) | San Francisco 2023: https://youtu.be/yqdRegoCiJ0
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Today Tim is joined by Ralph Debusmann (Enterprise Kafka Engineer, Migros) and Hubert Dulay (Developer Advocate, StarTree) to delve deep into the world of streaming databases. They explore the blend of traditional databases with streaming elements, aiming to make stream processing more user-friendly with SQL. Discussing tools like ksqlDB and Materialize, they touch upon Martin Kleppmann's theories of transforming databases and the pros and cons of current streaming platforms. Dive in to learn more about the future of data streaming! ► Turning the database inside-out: https://martin.kleppmann.com/2015/11/05/database-inside-out-at-oredev.html
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! StarTree's Tim Berglund and Navina Ramesh sit down to discuss the complex issue of upserts and deletes in analytical databases. They cover the challenges and necessity of these features in real-time analytical processing. Unlike traditional databases where records can be updated, analytical databases are typically immutable, making Pinot unique in its ability to support upserts. The conversation sheds light on why these functionalities are game-changers for real-time analytics.
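For readers who want a concrete picture of the upsert idea mentioned above, here is a minimal sketch in Python. It is not Pinot's implementation. It only illustrates, under assumed names (`UpsertTable`, `ingest`, `scan` are all hypothetical), how a store can keep its records immutable and still answer queries with the latest version of each primary key.

```python
class UpsertTable:
    """Toy append-only table with upsert and delete semantics (illustrative only)."""

    def __init__(self):
        self.log = []      # append-only records; existing entries are never modified
        self.latest = {}   # primary key -> index of the newest record in the log

    def ingest(self, key, row, deleted=False):
        # Append a new version; the old version stays in the log untouched.
        self.log.append((key, row, deleted))
        self.latest[key] = len(self.log) - 1

    def scan(self):
        # Return only the newest, non-deleted version of each key.
        rows = []
        for idx in self.latest.values():
            _, row, deleted = self.log[idx]
            if not deleted:
                rows.append(row)
        return rows


table = UpsertTable()
table.ingest("order-1", {"id": "order-1", "status": "placed"})
table.ingest("order-2", {"id": "order-2", "status": "placed"})
table.ingest("order-1", {"id": "order-1", "status": "shipped"})  # upsert
table.ingest("order-2", None, deleted=True)                      # delete marker
visible = table.scan()
```

The design choice sketched here is that updates and deletes are just new appended records plus a small key-to-location map, so the stored data never has to be rewritten in place.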
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! On this week's episode, Tim chats with Ken Krugler about the popularity of vector databases and generative AI, such as ChatGPT-4. They then explore Ken's work with Word2vec and the challenge of fast vector searches in advertising. Ken shares some fascinating insights into semantic search and the mechanics of working with large data sets. The conversation concludes with an appreciation for the depth and creativity that AI can offer, demonstrated by an interesting experiment Ken conducts with summarizing a philosophical paper using different character voices, like a surfer dude and a Jesuit priest. ► Hierarchical Navigable Small World (HNSW): https://towardsdatascience.com/similarity-search-part-4-hierarchical-navigable-small-world-hnsw-2aad4fe87d37?gi=ea38f97d58f7
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund welcomes Eric Sammer, Founder and CEO of Decodable. Eric, an industry leader in event streaming technology, discusses the company's focus on stream processing, real-time data processing, and integration with systems like Apache Pinot and StarTree. The conversation delves into the challenges and complexities of managing data, from data cleansing to structuring for different use cases. They explore the ideal balance between generalized and specialized systems, emphasizing the importance of flexibility. Ultimately, they highlight how stream processing serves as an effective solution to adjust and distribute data intelligently, providing an essential abstraction point.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! On today's episode, Tim Berglund sits down for a chat with Bill Bejeck, a prominent figure in the world of Kafka and real-time analytics. They dig into Apache Kafka, Kafka Streams, and interactive queries in depth. Bill describes interactive queries as a way to scrutinize the state of a Kafka Streams application, whether that's a simple key lookup or an analysis of complex aggregations. The conversation also explores the functionality of KTables and how Kafka Streams manages state. If you've ever wondered about interactive queries or Kafka Streams at large, this is the episode for you. ► Anna's previous episodes: https://youtu.be/K14Kn0D-I4Y https://youtu.be/nCLN15W_WOc ► Bill's book, Kafka Streams in Action: https://www.manning.com/books/kafka-streams-in-action ► Kafka Streams 101 course: https://developer.confluent.io/courses/kafka-streams/get-started/?utm_medium=sem&utm_source=google&utm_campaign=ch.sem_br.nonbrand_tp.prs_tgt.dsa_mt.dsa_rgn.namer_lng.eng_dv.all_con.confluent-developer&utm_term=&creative=&device=c&placement=&gad=1&gclid=CjwKCAjwx_eiBhBGEiwA15gLN00L7kvbE0vwVuL9IIGu78PBhzaTTzZU3REN-z2FTr968azH4KouiRoCV4oQAvD_BwE
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund discusses the complexities of querying data streams. Tim examines two types of queries, transactional and analytical, and details four methods to handle such inquiries: dumping the data into a data lake, loading it into a relational database, using a stream processor, or using a real-time analytics database. Each approach has its merits and drawbacks, relating to infrastructure, latency, and the type of analysis required. ► Robert Zych's tweet: https://twitter.com/zychr/status/1540553490648289280 ► Gunnar Morling's episode: https://youtu.be/cyeKnfdjQlw
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In part two of the "Real-Time Analytics" podcast, Robert Metzger, the PMC chair of Apache Flink, elaborates on using Flink as a developer. Metzger discusses the spectrum of APIs in Flink, ranging from expressive APIs to easy-to-use APIs. He mentions the process function, a low-level, flexible API that exposes basic building blocks of Flink, such as real-time events, state, and event time. Metzger also speaks about the windowing API of Flink and the Async I/O operator. He further details how Flink users can work with a combination of SQL and Java code in the DataStream API. You won't want to miss this episode! ► Flink Deployments At Decodable: https://www.decodable.co/blog/flink-deployments-at-decodable ► 3 Reasons Why You Need Apache Flink for Stream Processing: https://thenewstack.io/3-reasons-why-you-need-apache-flink-for-stream-processing/#:~:text=For%20example%2C%20Uber%20uses%20Flink,streaming%20data%20at%20massive%20scale.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Today we have Apache Flink's PMC Chair, Robert Metzger, on the show, who provides a friendly introduction to the world of Flink. Like a tour guide, he navigates listeners through Flink's role as a handy tool for building applications that process data in real-time. Metzger illustrates Flink's unique ability to work smoothly with both batch and streaming data, making it a nifty sidekick for anyone dealing with everything from historical data to real-time processing. Make sure to tune into Part 2 next week, where Robert will dive even deeper into this technology.
In this episode of the Real-Time Analytics Podcast, host Tim Berglund continues his conversation with Anna McDonald about Kafka Streams and the complexities of stream processing related to time. They explore the different types of windows available in Kafka Streams, including hopping, tumbling, session, and sliding windows. Anna provides insightful explanations and examples of each window type, highlighting their unique features and use cases. Don't miss out on this informative and engaging conversation on real-time analytics and Kafka Streams. ► Part 1 of Anna's episode: https://youtu.be/K14Kn0D-I4Y ► Anna's Real-Time Analytics Summit 2023 presentation: https://youtu.be/tratRsV1TiI
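As a rough illustration of three of the window types named above, the following Python sketch assigns event timestamps to tumbling, hopping, and session windows. It does not use the Kafka Streams API, and the function names and integer timestamps are assumptions made for the example. Sliding windows are left out because their boundaries depend on pairs of records rather than on a fixed grid.

```python
def tumbling_window(ts, size):
    """Fixed-size, non-overlapping windows: each timestamp falls in exactly one."""
    start = ts - (ts % size)
    return (start, start + size)


def hopping_windows(ts, size, advance):
    """Fixed-size windows that start every `advance` units, so they can overlap."""
    windows = []
    start = ts - (ts % advance)          # latest window start at or before ts
    while start > ts - size:             # keep windows whose span still covers ts
        if start >= 0:                   # this sketch ignores negative start times
            windows.append((start, start + size))
        start -= advance
    return sorted(windows)


def session_windows(timestamps, gap):
    """Group timestamps into sessions separated by more than `gap` of inactivity."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][1] <= gap:
            sessions[-1] = (sessions[-1][0], ts)   # extend the current session
        else:
            sessions.append((ts, ts))              # start a new session
    return sessions


one_window = tumbling_window(7, size=5)
overlapping = hopping_windows(7, size=10, advance=5)
sessions = session_windows([1, 2, 10, 11, 30], gap=5)
```

With these inputs, timestamp 7 lands in a single tumbling window but in two overlapping hopping windows, while the session grouping depends only on the gaps between consecutive events.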
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join Tim Berglund as he chats with Anna McDonald about the fascinating world of Kafka Streams. Anna, a customer success technical architect at Confluent, shares her insights on the core concepts of Kafka Streams, including the all-important table and stream abstractions. They delve into the benefits of statefulness and durability, such as active and standby tasks, which ensure seamless failover, and how Kafka Streams stores state in RocksDB and in Kafka itself. With a teaser for the next episode, this conversation promises an exciting exploration of data ingestion and time management in Kafka Streams. Don't miss out on this insightful discussion! ► Starting with Apache Kafka: https://developer.confluent.io/learn-kafka/apache-kafka/events/ ► KIP-392 information: https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! On today's episode of the Real-Time Analytics podcast, Tim Berglund interviews Oli Makhasoeva, the Director of Developer Relations at Bytewax. Oli, who joined Bytewax as a founding member just six months ago, shares her passion for making data streaming more accessible for Python users. As a data scientist herself, Oli understands the challenges faced when transitioning batch processes to real-time systems. Bytewax, as a Python-native stream processing library, aims to bridge this gap and provide a more convenient and efficient solution for data scientists seeking real-time capabilities. The conversation delves into the benefits of Bytewax's Python-native approach, making it a promising tool for stream processing needs.