Podcasts about Apache Pinot

  • 18 PODCASTS
  • 41 EPISODES
  • 39m AVG DURATION
  • INFREQUENT EPISODES
  • Apr 7, 2025 LATEST

POPULARITY (chart covering 2017 to 2024)


Latest podcast episodes about Apache Pinot

Great Things with Great Tech!
Real-Time Analytics... Supercharging AI and Observability with StarTree | Episode #97

Apr 7, 2025 | 40:48


Did you know every time you order food, book a ride, or even check who viewed your profile, real-time analytics is powering your experience behind the scenes?

In this episode of Great Things with Great Tech, we dive deep into the power of real-time analytics with Kishore Gopalakrishna, CEO and Co-founder of StarTree. StarTree leverages Apache Pinot, a high-performance real-time analytics database, revolutionizing how leading companies like Uber, LinkedIn, Walmart, and Etsy provide instant insights and personalized experiences at massive scale.

Kishore shares his journey from a gaming enthusiast fascinated by distributed systems to building mission-critical platforms at Yahoo and LinkedIn, eventually creating Apache Pinot. Discover how StarTree is powering billions of real-time queries per week, enabling businesses to enhance customer interactions, optimize operational decisions, and supercharge modern AI and observability.

Key Takeaways:
  • How real-time analytics transform industries, enabling instantaneous insights and rapid decision-making.
  • The evolution from traditional databases to highly efficient columnar, real-time analytics systems.
  • Real-world applications of Apache Pinot, from consumer apps to enterprise observability and operational excellence.
  • How real-time data is accelerating innovations in AI, specifically through Real-Time Retrieval-Augmented Generation (RAG).
  • The future of analytics: seamless data ingestion, enhanced concurrency, and the growing demand for sub-second response times.

Links & Resources:
  • StarTree: https://startree.ai
  • Kishore Gopalakrishna on LinkedIn: https://www.linkedin.com/in/kgopalak/
  • Apache Pinot: https://pinot.apache.org

☑️ Support the Channel: https://ko-fi.com/gtwgt
☑️ Be on #GTwGT: Contact via Twitter @GTwGTPodcast or visit https://www.gtwgt.com
☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1

Check out the full episode on our platforms:
  • Spotify: https://open.spotify.com/episode/2l9aZpvwhWcdmL0lErpUHC?si=x3YOQw_4Sp-vtdjyroMk3Q
  • Apple Podcasts: https://podcasts.apple.com/us/podcast/darknet-diaries-with-jack-rhysider-episode-83/id1519439787?i=1000654665731

Follow Us:
  • Website: https://gtwgt.com
  • Twitter: https://twitter.com/GTwGTPodcast
  • Instagram: https://instagram.com/GTwGTPodcast

☑️ Music: https://www.bensound.com

Open at Intel
AI, Community, and the Future of Generative Applications

Nov 27, 2024 | 20:53


In this engaging conversation at the All Things Open conference, Tim Spann, Principal Developer Advocate at Zilliz, discusses the importance of community collaboration in advancing AI technologies. He emphasizes the need for diverse perspectives in solving complex problems and highlights his work with the Milvus open source vector database. Tim also explains the evolving landscape of retrieval augmented generation (RAG) and its applications and shares insights into the future of AI development. The conversation concludes on a lighter note with Tim describing his creative use of Milvus in a fun Halloween project to catalog and identify ghosts.

00:00 Introduction
00:41 Meet Tim Spann: Principal Developer Advocate
01:35 The Importance of Community in AI
02:56 Advanced RAG and Multimodal Models
06:17 The Future of Agentic RAG
09:04 Challenges and Excitement in AI Development
13:35 Building AI the Right Way
17:50 Fun with AI: Capturing Ghosts
19:24 Conclusion and Final Thoughts

Guest: Tim Spann is a Principal Developer Advocate for Zilliz and Milvus. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

The GeekNarrator
Learnings from building Open Source Distributed Systems with Kishore Gopalakrishna

Aug 27, 2024 | 60:24


In this episode of The Geek Narrator podcast, hosted by Kaivalya Apte, we welcome a special guest, Kishore Gopalakrishna from StarTree, co-author of Apache Pinot and other notable projects. Kishore shares his extensive experience in building real-time analytics and streaming systems, including Apache Pinot, Espresso, Apache Helix, and ThirdEye. The episode delves into the motivations and challenges behind creating these systems, the innovations they brought to distributed systems, and the impact of community on open-source projects. Kishore also discusses the evolution of testing methodologies, cost optimizations in transactional and analytical systems, and key considerations for companies evaluating real-time analytics solutions. Don't miss this in-depth conversation packed with valuable insights for both seasoned developers and tech enthusiasts!

Chapters:
00:00 Introduction
03:13 Building Distributed Systems at LinkedIn
08:57 Testing and Challenges in Distributed Systems
30:50 Advantages of Columnar Storage
33:04 The Importance of Upserts
34:24 Building a Strong Open Source Community
41:10 Challenges and Lessons in System Design
51:35 Real-Time Analytics: Do You Need It?

StarTree: https://startree.ai/
Apache Pinot: https://pinot.apache.org/

If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

#distributedsystems #kafka #s3 #streaming #realtimeanalytics #database #pinot #startree

AWS re:Think Podcast
Episode 28: Real Time Analytics with Apache Pinot and Startree

Aug 20, 2024 | 42:50


Companies need to provide real time insights to both customers and internal users. These insights power use cases such as personalization and fraud detection. StarTree Cloud is a real-time analytics platform built on Apache Pinot for building such applications that depend on real time insights. In this episode we meet with Chinmay Soman, Head of Product at Startree.ai to discuss the different dimensions of real-time analytics and how Apache Pinot and StarTree Cloud offer a robust platform for providing such insights to applications.

AWS Hosts: Nolan Chen & Malini Chatterjee
Email Your Feedback: rethinkpodcast@amazon.com

Resources:
StarTree: https://startree.ai
StarTree community Slack: https://communityinviter.com/apps/startreedata/startree-community
Apache Pinot Slack: https://communityinviter.com/apps/apache-pinot/apache-pinot
Serverless / Free forever workspace: https://stree.ai/free

Software at Scale
Software at Scale 60 - Data Platforms with Aravind Suresh

Aug 5, 2024 | 34:51


Aravind was a Staff Software Engineer at Uber, and currently works at OpenAI.

Apple Podcasts | Spotify | Google Podcasts

Edited Transcript

Can you tell us about the scale of data Uber was dealing with when you joined in 2018, and how it evolved?

When I joined Uber in mid-2018, we were handling a few petabytes of data. The company was going through a significant scaling journey, both in terms of launching in new cities and the corresponding increase in data volume. By the time I left, our data had grown to over an exabyte. To put it in perspective, the amount of data grew by a factor of about 20 in just a three to four-year period.

Currently, Uber ingests roughly a petabyte of data daily. This includes some replication, but it's still an enormous amount. About 60-70% of this is raw data, coming directly from online systems or message buses. The rest is derived data sets and model data sets built on top of the raw data.

That's an incredible amount of data. What kinds of insights and decisions does this enable for Uber?

This scale of data enables a wide range of complex analytics and data-driven decisions. For instance, we can analyze how many concurrent trips we're handling throughout the year globally. This is crucial for determining how many workers and CPUs we need running at any given time to serve trips worldwide.

We can also identify trends like the fastest growing cities or seasonal patterns in traffic. The vast amount of historical data allows us to make more accurate predictions and spot long-term trends that might not be visible in shorter time frames.

Another key use is identifying anomalous user patterns. For example, we can detect potentially fraudulent activities like a single user account logging in from multiple locations across the globe. We can also analyze user behavior patterns, such as which cities have higher rates of trip cancellations compared to completed trips.

These insights don't just inform day-to-day operations; they can lead to key product decisions. For instance, by plotting heat maps of trip coordinates over a year, we could see overlapping patterns that eventually led to the concept of Uber Pool.

How does Uber manage real-time versus batch data processing, and what are the trade-offs?

We use both offline (batch) and online (real-time) data processing systems, each optimized for different use cases. For real-time analytics, we use tools like Apache Pinot. These systems are optimized for low latency and quick response times, which is crucial for certain applications.

For example, our restaurant manager system uses Pinot to provide near-real-time insights. Data flows from the serving stack to Kafka, then to Pinot, where it can be queried quickly. This allows for rapid decision-making based on very recent data.

On the other hand, our offline flow uses the Hadoop stack for batch processing. This is where we store and process the bulk of our historical data. It's optimized for throughput: processing large amounts of data over time.

The trade-off is that real-time systems are generally 10 to 100 times more expensive than batch systems. They require careful tuning of indexes and partitioning to work efficiently. However, they enable us to answer queries in milliseconds or seconds, whereas batch jobs might take minutes or hours.

The choice between batch and real-time depends on the specific use case. We always ask ourselves: Does this really need to be real-time, or can it be done in batch? The answer to this question goes a long way in deciding which approach to use and in building maintainable systems.

What challenges come with maintaining such large-scale data systems, especially as they mature?

As data systems mature, we face a range of challenges beyond just handling the growing volume of data. One major challenge is the need for additional tools and systems to manage the complexity.

For instance, we needed to build tools for data discovery. When you have thousands of tables and hundreds of users, you need a way for people to find the right data for their needs. We built a tool called Databook at Uber to solve this problem.

Governance and compliance are also huge challenges. When you're dealing with sensitive customer data, you need robust systems to enforce data retention policies and handle data deletion requests. This is particularly challenging in a distributed system where data might be replicated across multiple tables and derived data sets.

We built an in-house lineage system to track which workloads derive from what data. This is crucial for tasks like deleting specific data across the entire system. It's not just about deleting from one table; you need to track down and update all derived data sets as well.

Data deletion itself is a complex process. Because most files in the batch world are kept immutable for efficiency, deleting data often means rewriting entire files. We have to batch these operations and perform them carefully to maintain system performance.

Cost optimization is an ongoing challenge. We're constantly looking for ways to make our systems more efficient, whether that's by optimizing our storage formats, improving our query performance, or finding better ways to manage our compute resources.

How do you see the future of data infrastructure evolving, especially with recent AI advancements?

The rise of AI and particularly generative AI is opening up new dimensions in data infrastructure. One area we're seeing a lot of activity in is vector databases and semantic search capabilities. Traditional keyword-based search is being supplemented or replaced by embedding-based semantic search, which requires new types of databases and indexing strategies.

We're also seeing increased demand for real-time processing. As AI models become more integrated into production systems, there's a need to handle more GPUs in the serving flow, which presents its own set of challenges.

Another interesting trend is the convergence of traditional data analytics with AI workloads. We're starting to see use cases where people want to perform complex queries that involve both structured data analytics and AI model inference.

Overall, I think we're moving towards more integrated, real-time, and AI-aware data infrastructure. The challenge will be balancing the need for advanced capabilities with concerns around cost, efficiency, and maintainability.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.softwareatscale.dev
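
The lineage-driven deletion described in the transcript above can be pictured as a walk over a dependency graph. The following is only a minimal sketch under stated assumptions: the dataset names, the `LINEAGE` map, and the `downstream_of` helper are all hypothetical and do not describe Uber's in-house lineage system.

```python
from collections import deque

# Hypothetical lineage map: each dataset points to the datasets derived from it.
LINEAGE = {
    "raw.trips": ["derived.trips_daily", "derived.rider_features"],
    "derived.trips_daily": ["reports.city_growth"],
    "derived.rider_features": ["models.fraud_training_set"],
    "reports.city_growth": [],
    "models.fraud_training_set": [],
}


def downstream_of(source, lineage):
    """Return every dataset derived, directly or transitively, from `source`."""
    found = []
    visited = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in visited:
                visited.add(child)
                found.append(child)
                queue.append(child)
    return found


# Every dataset that would need a rewrite if a record in raw.trips is deleted.
print(downstream_of("raw.trips", LINEAGE))
```

A breadth-first walk is used here so that datasets closest to the source are listed first; a real system would also have to batch the resulting file rewrites, as the transcript notes.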

Real-Time Analytics with Tim Berglund
Testcontainers and Apache Pinot with Tim Veil | Ep. 51

May 6, 2024 | 26:28


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Join us for episode #51 of the Real-Time Analytics podcast as our host, Tim Berglund, is joined by Tim Veil, VP of Solutions Engineering and Enablement at StarTree. Dive into a discussion about Testcontainers, a powerful tool that leverages Docker for sophisticated integration testing. Learn how Testcontainers simplifies the testing process against real databases like Apache Pinot, enhancing code reliability and CI pipeline efficiency.

Real-Time Analytics with Tim Berglund

Pinot 1.1: https://docs.pinot.apache.org/basics/releases | Sub: https://stree.ai/sub | In this release video, Tim Berglund (VP of Developer Relations, StarTree) covers the updates since Pinot 1.0, including 166 new features and 152 bug fixes. Tim delves into key enhancements such as the introduction of vector index support—vital for AI and machine learning applications—and improvements in the multi-stage query engine. He also explains the significance of sticky query routing and new approximation algorithms like HyperLogLog++. Whether you're a seasoned developer or a data enthusiast keen to understand the latest trends in database technology, this video offers valuable insights into optimizing real-time data processing with Apache Pinot.

Developer Voices
How Apache Pinot Achieves 200,000 Queries per Second (with Tim Berglund)

Mar 20, 2024 | 74:28


The likes of LinkedIn and Uber use Pinot to power some astonishingly high-scale queries against realtime data. The numbers alone would make an impressive case-study. But behind the headline lies a fascinating set of architectural decisions and constraints to get there. So how does Pinot work? How does it process queries? How are the various roles split across a cluster? And equally important - what does it *not* try to achieve?

Joining me to go through the nuts and bolts of how Pinot handles SQL queries is Tim Berglund, veteran technology explainer of the realtime-data world. He takes us through Pinot step-by-step, covering the roles of brokers, servers, controllers and minions as we build up the picture of a query engine that's interesting in theory and massively performant in practice.

Apache Pinot: https://pinot.apache.org/
Apache Pinot Docs: https://docs.pinot.apache.org/
StarTree: https://startree.ai/
Event Driven Design episode with Bobby Calderwood: https://youtu.be/V7vhSHqMxus
Tim on Twitter: https://twitter.com/tlberglund
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Kris on Twitter: https://twitter.com/krisajenkins

#podcast #softwaredevelopment #apachepinot #database #dataengineering #sql
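
The split between brokers and servers mentioned in this episode follows a scatter-gather pattern: a broker fans a query out to the servers holding the data and merges their partial results. Here is a deliberately simplified, in-process sketch; the server names, rows, and function names are invented for illustration and this is not Pinot's code.

```python
from collections import Counter

# Hypothetical data held by two "servers"; each row is (city, trips).
SERVER_SEGMENTS = {
    "server-1": [("Berlin", 3), ("Paris", 5), ("Berlin", 2)],
    "server-2": [("Paris", 1), ("Rome", 4)],
}


def server_partial_sum(rows):
    """Server side: aggregate only the rows this server holds."""
    partial = Counter()
    for city, trips in rows:
        partial[city] += trips
    return partial


def broker_query(segments_by_server):
    """Broker side: send the query to each server, then merge the partial sums."""
    merged = Counter()
    for rows in segments_by_server.values():
        merged.update(server_partial_sum(rows))
    return dict(merged)


# Equivalent in spirit to: SELECT city, SUM(trips) ... GROUP BY city
print(broker_query(SERVER_SEGMENTS))
```

Because each server returns an already-aggregated partial result, the broker merges a handful of counters rather than every raw row, which is one reason the pattern suits high query rates.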

Real-Time Analytics with Tim Berglund
Uber & Open-Source: Ujwala Tulshigiri's Insights - Part 2 | Ep. 42

Feb 26, 2024 | 28:28


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode, we continue our conversation with Ujwala Tulshigiri, Engineering Manager at Uber, focusing on the technical intricacies of migrating workloads and technology consolidation. Ujwala provides an in-depth look into Uber's strategic approach to infrastructure decisions, the challenges of technology migration, and how they contribute to and leverage the open-source community. She discusses the complexities of replacing systems like Elasticsearch with alternatives like Pinot, addressing the nuances of data management, search capabilities, and the importance of maintaining low-latency operations.

Real-Time Analytics with Tim Berglund
Uber's Scalable Tech Strategy with Ujwala Tulshigiri - Part 1 | Ep. 41

Feb 20, 2024 | 23:32


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund is joined by Ujwala Tulshigiri, Engineering Manager at Uber, to explore the journey of technology consolidation and the strategic embrace of open-source solutions in challenging economic times. Ujwala offers deep insights into navigating the complexities of technology migration, leveraging the power of the Apache Pinot community, and fostering innovation through collaboration. Tune in to part one of this engaging conversation to learn how Uber optimizes its technology stack for efficiency and scalability.

Real-Time Analytics with Tim Berglund
Best of 2023: Navigating Event Streaming with Eric Sammer, Decodable's CEO

Dec 26, 2023 | 31:30


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, host Tim Berglund welcomes Eric Sammer, Founder and CEO of Decodable. Eric, an industry leader in event streaming technology, discusses the company's focus on stream processing, real-time data processing, and integration with systems like Apache Pinot and StarTree. The conversation delves into the challenges and complexities of managing data, from data cleansing to structuring for different use cases. They explore the ideal balance between generalized and specialized systems, emphasizing the importance of flexibility. Ultimately, they highlight how stream processing serves as an effective solution to adjust and distribute data intelligently, providing an essential abstraction point. New episodes every Monday resume on January 8, 2024!

Real-Time Analytics with Tim Berglund
Unraveling the Stream: Transactional vs Analytical Processing | Ep. 32

Nov 20, 2023 | 18:28


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-time Analytics Podcast, host Tim Berglund dives deep into the nuanced world of transactional versus analytical stream processing. Reflecting on his experiences and expert interviews, Tim brings fresh insights into querying streams and the technology behind it. He revisits his early work with Apache Pinot and Kafka, offering a unique perspective on the evolving field of streaming SQL technologies.
► Episode #24 with Hojjat Jafarpour: https://youtu.be/CFvaRPiNXJc
► Episode #19 on Upserts & Deletes ft. Navina Ramesh: https://youtu.be/qa9ZCMYVpa8

Real-Time Analytics with Tim Berglund
Unveiling the Speed of Star-Tree Index with Sandeep Dabade | Ep. 30

Nov 6, 2023 | 31:01


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join host Tim as he and Sandeep Dabade demystify the impressive star-tree index of Apache Pinot. Discover how this advanced feature optimizes OLAP databases, striking a balance between storage and high-speed query performance, and listen to real-world test cases showcasing its lightning-fast capabilities.
Sandeep's blogs:
► https://startree.ai/blog/best-practices-for-designing-tables-in-apache-pinot
► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-1-understanding-the-impact-on-query-performance
► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-2-understanding-the-impact-during-high-concurrency
► https://startree.ai/blog/star-tree-index-in-apache-pinot-part-3-understanding-the-impact-in-real-customer
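
The storage-versus-speed balance mentioned in this episode can be illustrated with plain pre-aggregation. The sketch below is an assumption-laden toy, not Pinot's star-tree implementation: it stores sums for every combination of dimension values, using a `*` marker for "any value", so a filtered query becomes a single dictionary lookup. The rows, `build_preaggregates`, and `lookup` are hypothetical.

```python
from itertools import product

STAR = "*"  # stands for "any value" of a dimension

# Hypothetical fact rows.
ROWS = [
    {"country": "US", "browser": "chrome", "clicks": 10},
    {"country": "US", "browser": "safari", "clicks": 4},
    {"country": "DE", "browser": "chrome", "clicks": 7},
]
DIMENSIONS = ["country", "browser"]


def build_preaggregates(rows, dimensions, metric):
    """Pre-compute the metric sum for every (value or star) key combination."""
    table = {}
    for row in rows:
        # Each row contributes to its concrete value and to the star bucket
        # of every dimension.
        choices = [(row[d], STAR) for d in dimensions]
        for key in product(*choices):
            table[key] = table.get(key, 0) + row[metric]
    return table


def lookup(table, dimensions, filters):
    """Answer a SUM query with one lookup; unfiltered dimensions use the star."""
    key = tuple(filters.get(d, STAR) for d in dimensions)
    return table.get(key, 0)


TABLE = build_preaggregates(ROWS, DIMENSIONS, "clicks")
print(lookup(TABLE, DIMENSIONS, {"country": "US"}))      # sum of clicks for US
print(lookup(TABLE, DIMENSIONS, {"browser": "chrome"}))  # sum of clicks for chrome
print(len(TABLE), "pre-aggregated entries for", len(ROWS), "rows")
```

The last line shows the cost side of the trade: the pre-aggregated table holds more entries than the raw data, which is the kind of storage overhead a real index has to bound.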

Real-Time Analytics with Tim Berglund
Deep Dive: Exploring StarTree's Advanced Features with Neha Pawar - Part 2 | Ep. 28

Oct 23, 2023 | 33:42


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us for Part 2 of the "Real-Time Analytics" podcast featuring Neha Pawar of StarTree, where we delve into Apache Pinot's advanced features including its pluggable architecture, upserts, and Kafka integration. Uncover how Pinot maintains data integrity in real-time analytics and get an insider's look at StarTree Cloud's exclusive tiered storage system.

Real-Time Analytics with Tim Berglund
Neha Pawar on Apache Pinot's Edge in Real-Time Analytics | Ep. 27

Oct 16, 2023 | 24:46


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the "Real-Time Analytics" podcast, host Tim and guest Neha Pawar, a founding engineer of StarTree, explore Apache Pinot's unique capabilities in real-time analytics. Neha unpacks Pinot's efficiency, low latency, and high throughput, revealing its prowess in offering real-time insights to end users. Tune in to this first installment of a two-part series for an insightful discussion on the intricacies and innovations that make Pinot a standout in the analytics landscape.

Real-Time Analytics with Tim Berglund
Inside Stripe's Data Revolution with Johan Adami | Ep. 26

Oct 10, 2023 | 32:34


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, we welcome Johan Adami, a seasoned software engineer from Stripe, who shares his experience building out Pinot as an internal service for enhanced real-time analytics. Listen in as Johan unveils the journey from the integration of Apache Pinot to tackling the complexities of real-time data processing, offering a first-hand account of the challenges and achievements encountered along the way.

Real-Time Analytics with Tim Berglund

Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | StarTree's Tim Berglund unpacks Apache Pinot 1.0! This major milestone release is functionally complete and widely used in production environments. It has introduced many new features to support query-time native JOINs by extending the multi-stage query engine, upsert capabilities (delete, metadata TTL, segment preloading and segment compaction), NULL value support in queries, support for SPI-based pluggable indexes, and improvements to the Spark 3 connector. Be sure to subscribe to catch our future videos covering each new release of Apache Pinot.

Real-Time Analytics with Tim Berglund
Upserts & Deletes in Apache Pinot: A Discussion with Navina Ramesh | Ep. 19

Aug 14, 2023 | 21:22


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! StarTree's Tim Berglund and Navina Ramesh sit down to discuss the complex issue of upserts and deletes in analytical databases. They cover the challenges and necessity of these features in real-time analytical processing. Unlike traditional databases where records can be updated, analytical databases are typically immutable, making Pinot unique in its ability to support upserts. The conversation sheds light on why these functionalities are game-changers for real-time analytics.
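
One common way to reconcile immutable storage with upserts is to keep appending records while tracking, per primary key, which record is the newest. The sketch below illustrates that general idea only; the `UpsertTable` class and its fields are hypothetical and are not presented as Pinot's implementation.

```python
class UpsertTable:
    """Append-only storage with a per-key pointer to the newest record."""

    def __init__(self):
        self.docs = []    # records are only ever appended, never rewritten
        self.valid = []   # valid[i] is True if docs[i] is the newest for its key
        self.latest = {}  # primary key -> index of the newest record

    def ingest(self, key, record):
        previous = self.latest.get(key)
        if previous is not None:
            # The old record stays on disk; it is just hidden from queries.
            self.valid[previous] = False
        self.docs.append((key, record))
        self.valid.append(True)
        self.latest[key] = len(self.docs) - 1

    def scan(self):
        """Return only the newest record for each key, in ingestion order."""
        return [doc for doc, ok in zip(self.docs, self.valid) if ok]


table = UpsertTable()
table.ingest("order-1", {"status": "placed"})
table.ingest("order-2", {"status": "placed"})
table.ingest("order-1", {"status": "delivered"})  # supersedes the first record
print(table.scan())
```

Queries see one row per key even though three records were stored, which is why upserts cost extra bookkeeping at ingestion time rather than at query time.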

Enterprise Java Newscast
Stackd 66: Streams, Messages, Events, and a Java User Group

Aug 11, 2023 | 121:43


Ian, Kito, and Josh are joined by Java Champion, Streaming Developer Advocate at DataStax, and President of Chicago-JUG, Mary Grygleski. They discuss news about Capacitor, Angular, PrimeNG Designer for Tailwind, JetBrains Compose Multiplatform for iOS, JDK 21, AI developer tools, Jakarta EE 10, and more. Kito announces the work he is doing on the Jakarta EE Tutorial, and then they delve into Mary's background and event streaming with Apache Pulsar, plus tools like Apache Pinot, Apache Flink, RisingWave, ByteWax and Apache Cassandra.

We Thank DataDog for sponsoring this podcast! https://www.pubhouse.net/datadog

Front End
  - Announcing Capacitor 5.0 - Ionic Blog (https://ionic.io/blog/announcing-capacitor-5)
  - Angular v16 is here! (https://blog.angular.io/angular-v16-is-here-4d7a28ec680d)
  - Compose Multiplatform (https://blog.jetbrains.com/kotlin/2023/05/compose-multiplatform-for-ios-is-in-alpha/)
  - PrimeNG Designer - Tailwind (Q3 2023) (https://www.primefaces.org/primeng-theme-designer-with-tailwind/)

Server Side Java
  - Kito is working with Bauke Scholtz and Arjan Tijms to refresh the Jakarta EE Tutorial
    - Eclipse Documentation for Jakarta EE (https://projects.eclipse.org/projects/ee4j.jakartaee-documentation)
    - Antora (https://antora.org)
    - Asciidoc (http://asciidoc.org)
  - Jakarta EE 10; MicroProfile 6; Java SE 20; Open Liberty (https://openliberty.io/blog/2023/04/04/23.0.0.3.html)
  - Jakarta EE Starter (https://start.jakarta.ee/)

AI/ML
  - Phind - AI search engine for developers (https://www.phind.com/)
  - 92% of devs using AI coding assistants (https://www.zdnet.com/article/github-developer-survey-finds-92-of-programmers-using-ai-tools/)

Java Platform
  - JDK 21, the next LTS release, due out in September (https://www.infoworld.com/article/3689880/jdk-21-the-new-features-in-java-21.html)

IDE and Tools
  - Grazie Professional - IntelliJ IDEs Plugin | Marketplace (https://plugins.jetbrains.com/plugin/16136-grazie-professional)

Chat w/Mary
  - Twitter: @mgrygles (https://twitter.com/mgrygles)
  - Discord server: https://discord.gg/RMU4Juw
  - LinkedIn: https://www.linkedin.com/in/mary-grygleski/
  - Apache Pulsar (https://pulsar.apache.org/)
  - Apache Pinot (https://pinot.apache.org/)
  - Apache Flink (https://flink.apache.org/)
  - RisingWave (https://www.risingwave.dev/)
  - ByteWax (https://bytewax.io/)
  - Apache Cassandra (https://cassandra.apache.org/)
  - Apache Kafka (https://kafka.apache.org/)

Picks
  - Quantum Energy Squares (Kito) (https://quantumsquares.com/)
  - JBOSS EAP on Azure (Josh) (https://learn.microsoft.com/en-us/azure/developer/java/ee/jboss-on-azure)
  - Interstellar (Mary) (https://www.imdb.com/title/tt0816692/)
  - Black Mirror Season 6 Episode 1 - Joan Is Awful - Netflix (Ian) (https://www.rottentomatoes.com/tv/black_mirror/s06/e01)

Other Pubhouse Network podcasts
  - Breaking into Open Source (https://www.pubhouse.net/breaking-into-open-source)
  - OffHeap (https://www.javaoffheap.com/)
  - Java Pubhouse (https://www.javapubhouse.com/)

Events
  - Lone Star Software Symposium - July 14 - 15, Austin, TX, USA (https://nofluffjuststuff.com/austin)
  - ÜberConf - July 18 - 21, Denver, CO, USA (https://uberconf.com/)
  - Nebraska.code() - July 19-20, Lincoln, NE, USA (https://nebraskacode.amegala.com/)

Real-Time Analytics with Tim Berglund
Navigating Event Streaming with Eric Sammer, Decodable's CEO | Ep. 17

Jul 31, 2023 | 30:49


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund welcomes Eric Sammer, Founder and CEO of Decodable. Eric, an industry leader in event streaming technology, discusses the company's focus on stream processing, real-time data processing, and integration with systems like Apache Pinot and StarTree. The conversation delves into the challenges and complexities of managing data, from data cleansing to structuring for different use cases. They explore the ideal balance between generalized and specialized systems, emphasizing the importance of flexibility. Ultimately, they highlight how stream processing serves as an effective solution to adjust and distribute data intelligently, providing an essential abstraction point.

Engenharia de Dados [Cast]
A Day in a Life of a Founding Engineer at StarTree: Apache Pinot with Neha Pawar

Jul 25, 2023 | 69:21


In today's episode, Luan Moreno and Mateus Oliveira interview Neha Pawar, currently a Founding Engineer at StarTree.

Apache Pinot is a low-latency OLAP database that was developed for analytical queries at LinkedIn. Its goal is to solve one of the problems that technologies such as Apache Kafka do not solve: querying billions of events with good performance and low latency.

With Apache Pinot, you get the following benefits:
  • High performance for analytical queries
  • Data stored in Apache Pinot is compressed
  • Thousands of concurrent accesses to the data stored in Apache Pinot

We also talk about:
  • The creation of Apache Pinot
  • User-facing analytics
  • Deployment types in Apache Pinot
  • What's coming next in Apache Pinot

Learn more about Apache Pinot, a technology capable of storing data in real time and running queries with low latency, down to milliseconds.

Neha Pawar = LinkedIn
https://pinot.apache.org/
Luan Moreno = https://www.linkedin.com/in/luanmoreno/

The GeekNarrator
Tim Berglund on Realtime Analytics with Apache Pinot

Jul 3, 2023 | 51:04


Hey Everyone, in the 43rd episode I speak with Tim Berglund on Realtime Analytics with Apache Pinot.

Chapters:
00:00 Introduction
01:22 What do we mean by analytics and realtime analytics?
05:35 Can we define realtime in millis, seconds or minutes?
08:54 What is the fundamental difference between traditional analytics systems and Apache Pinot?
12:19 Was Kafka one of the reasons Apache Pinot could reach its full potential?
16:50 E-commerce Application example - How do I get my data in?
20:07 How is data stored (structured) on the disk?
23:31 Are joins available in Apache Pinot?
26:07 Joins vs pre-computing at ingestion
27:15 How is historical data ingested into Apache Pinot?
28:14 Types of indexes available in Apache Pinot
35:42 Do indexes cause write amplification? Is that a problem in Apache Pinot?
40:02 Point lookups in Apache Pinot
42:54 Anomaly Detection
45:51 Coming up in Apache Pinot

Links:
StarTree: https://startree.ai/
Apache Pinot: https://pinot.apache.org/
Joins in Pinot: https://startree.ai/blog/apache-pinot...
Apache Pinot Indexes: https://docs.pinot.apache.org/basics/...

Other playlists:
  • Distributed Syste...
  • Modern Databases
  • Serverless Archit...
  • Software Engineering

I hope you like the episode. Like, share and subscribe to the channel. Cheers, The GeekNarrator

GOTO - Today, Tomorrow and the Future
Unlocking the Power of Real-Time Analytics • Tim Berglund & Adi Polak

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Jun 2, 2023 44:07 Transcription Available


This interview was recorded for GOTO Unscripted (gotopia.tech). Read the full transcription of this interview here.
Tim Berglund - VP DevRel at StarTree & Author of "Gradle Beyond the Basics"
Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS
RESOURCES
Tim: timberglund.com, twitter.com/tlberglund, linkedin.com/in/tlberglund
Adi: twitter.com/AdiPolak, instagram.com/polak.code, linkedin.com/in/polak-adi
Tools & companies: pinot.apache.org, twitter.com/startreedata, linkedin.com/company/startreedata, dev.startree.ai, stree.ai/slack
YT videos: Data Mesh • Zhamak Dehghani; Beyond Microservices • Gwen Shapira
DESCRIPTION
Adi Polak and Tim Berglund explore the concept of analytics and what it truly means in the software development world. They delve into the benefits of real-time analytics for product development, highlighting the fine line between compute and storage and the technical requirements for achieving effective real-time analytics. They also discuss the applications of real-time analytics through the lens of Apache Pinot and StarTree Cloud, exploring use cases such as the popular "Who's Watched My Profile on LinkedIn" feature powered by Apache Pinot.
RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark
Tim Berglund • Gradle Beyond the Basics
Tim Berglund & Matthew McCullough • Building and Testing with Gradle
Mark Needham • Building Real-Time Analytics Systems
Gwen Shapira, Todd Palino, Rajini Sivaram & Krit Petty • Kafka: The Definitive Guide
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily

Real-Time Analytics with Tim Berglund
Digging Deep Into Apache Pinot Internals | Ep. 6: ft Rong Rong

Real-Time Analytics with Tim Berglund

Play Episode Listen Later May 8, 2023 31:57


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Who remembers taxis? They were these yellow cars that appeared at random with seemingly inexplicable service charges. You may be more familiar with taxi 2.0: Uber - a platform fully powered by real-time analytics. In this episode, Tim sits down with Rong Rong (Software Engineer, StarTree) to talk about how Uber made use of Apache Pinot, when he came over to StarTree, as well as a deep dive into some Apache Pinot internals.

Real-Time Analytics with Tim Berglund
Uber, LinkedIn, Pinot and Open Source | Ep. 5: ft Mayank Shrivastava

Real-Time Analytics with Tim Berglund

Play Episode Listen Later May 1, 2023 30:26


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! When talking about developing Apache Pinot, Mayank Shrivastava (founding engineer, StarTree) says, “We have to keep venturing into areas where we don't belong.” In this episode, Tim asks Mayank all about his time at LinkedIn, how companies like Uber use this open source technology, and how much of real-time analytics was driven by the ever-elusive algorithm. From use cases to Apache Pinot internals, this week is all about digging into the details.

Real-Time Analytics with Tim Berglund
Mr. Debezium on Pinot, Flink, CDC & Decodable | Ep. 4: Gunnar Morling

Real-Time Analytics with Tim Berglund

Play Episode Listen Later Apr 24, 2023 28:13 Transcription Available


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Mr. Debezium (also known as Gunnar Morling, Software Engineer at Decodable) has been in the business of open source for a long time. While working with Debezium at Red Hat for the better part of a decade, he's seen how all sorts of technologies integrate together and which to use in a variety of use cases. In this episode, Tim talks to Gunnar about his new role at Decodable, if it's really a “Pinot vs Flink” world we're living in, and when to utilize a variety of streaming technologies.
ABOUT THE PODCAST
From StarTree, the original creators of Apache Pinot, "Real-Time Analytics with Tim Berglund" is a podcast dedicated to bringing analytics from the dashboard to the user interface. Accessible but technically rich, the show focuses on the infrastructure, tools, and techniques being developed by the people building the systems that are serving analytics to our users in real-time. New episodes every Monday. Follow on Spotify, Apple, Google, etc at https://stree.ai/podcast

Real-Time Analytics with Tim Berglund
How Apache Pinot Began ft. Kishore Gopalakrishna of StarTree | Ep. 2

Real-Time Analytics with Tim Berglund

Play Episode Listen Later Apr 10, 2023 29:02 Transcription Available


Do you remember the day you could magically see who viewed your profile on LinkedIn? What was once a website for resumes suddenly gave rise to a new category of database: real-time analytics. From building that new database to what makes the open source community so special, Kishore Gopalakrishna (Co-Creator of Apache Pinot™ and Co-Founder & CEO of StarTree) talks about it all in this week's episode.

The GeekNarrator
Kafka, Realtime analytics and Apache Pinot with Tim Berglund Part-1

The GeekNarrator

Play Episode Listen Later Jan 3, 2023 36:33


Hey Everyone, In this episode I talked to Tim Berglund about his vast experience with Kafka, realtime analytics and Apache Pinot. I hope you like the episode. Do watch the part-2. Cheers, The GeekNarrator

The GeekNarrator
Kafka, Realtime analytics and Apache Pinot with Tim Berglund Part-2

The GeekNarrator

Play Episode Listen Later Jan 3, 2023 39:38


Hey everyone, This is the part-2 of our episode with Tim Berglund. We have covered some advanced topics on Kafka, Kafka Streams and Apache Pinot. I hope you like the discussion. Cheers, The GeekNarrator

Engenharia de Dados [Cast]
Enabling User-Facing Analytics using Apache Pinot with Kishore Gopalakrishna

Engenharia de Dados [Cast]

Play Episode Listen Later Dec 29, 2022 52:11


In this episode we interview Kishore Gopalakrishna, Co-Founder and CEO of StarTree. Luan Moreno and Mateus Oliveira chat with the co-creator of this powerful tool called Apache Pinot. Pinot is an OLAP datastore built to answer analytical queries with response times in the millisecond range, and it can be considered a database for real-time queries. It can ingest from batch data sources (Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as streaming data sources (Apache Kafka, Apache Pulsar, Amazon Kinesis). Pinot was designed to run OLAP queries in real time, with low latency over large volumes of events, in order to deliver on the concept of User-Facing Analytics. It was created and developed by engineers at LinkedIn and Uber and designed to scale up and out without limits. Apache Pinot | Kishore Gopalakrishna | StarTree | Luan Moreno = https://www.linkedin.com/in/luanmoreno/

The GeekNarrator
Tiered Storage implementation by StarTree (Apache Pinot) with Neha Pawar

The GeekNarrator

Play Episode Listen Later Oct 29, 2022 63:11


In this podcast I have invited Neha Pawar, who is one of the Founding Engineers at StarTree (the company powering Apache Pinot). We talked about how StarTree has implemented Tiered storage and how it differs from other available implementations. Note: Currently tiered storage is available only in StarTree's Pinot and not available in the open source version. But it's only a matter of time. Chapters: 00:00 Introduction 03:28 What does Tiered Storage mean? 05:51 How many tiers are typically supported? 07:30 Is it mainly about Cost Optimisation? How do I compare the cost savings vs performance hit? 15:41 What is mmap and how does it help? 16:45 How do I implement/approach Tiered Storage? What are the challenges? 23:00 What is Apache Pinot? When we say low latency, how low is it? 25:00 How is it implemented in StarTree (Apache Pinot)? 36:45 What happens when I query for a larger number of (or all) columns? How is that optimised? 47:10 What are the failure modes? 50:15 How can we test and validate Tiered Storage as a feature? 54:30 How would bloom filter false positives affect performance and correctness? 56:15 Can I move back my data from Cold storage to Hot Storage? 57:45 What other cloud storage services are supported other than S3? 58:35 What is the future of Tiered Storage?

Founder Real Talk
Kishore Gopalakrishna, Co-founder and CEO of StarTree, on Building Real-Time Analytics and Leveraging Community Support

Founder Real Talk

Play Episode Listen Later Aug 29, 2022 26:31


Kishore Gopalakrishna, co-founder and CEO of StarTree, created a solution to a database problem with his co-worker and eventual co-founder Xiang Fu while working at LinkedIn. At the time, LinkedIn was debuting its now-popular feature called Who's Viewed Your Profile, which required the ability to slice and dice massive amounts of data in real time. Kishore and Xiang developed what they called Apache Pinot, a real-time distributed analytical processing data store used to deliver scalable real-time analytics with very low latency. The pair went on to found their open source company, StarTree, in 2019 to build a commercial version of Apache Pinot. The analytics provided by its technology are increasingly essential for all kinds of business decision makers, and the company has quickly emerged as a leader in serving up real-time user-facing analytics at very low latency, for millions. In this episode, Kishore talks about the solutions StarTree provides, its key relationship with the developer community, and the roadmap for the company, which just announced a $47 million Series B led by GGV, with participation from existing investors Bain and CRV as well as new investor Sapphire Ventures.

TechCrunch Startups – Spoken Edition
Data analytics startup StarTree secures cash to expand its Apache Pinot–powered platform

TechCrunch Startups – Spoken Edition

Play Episode Listen Later Aug 29, 2022 5:09


StarTree, a company building what it describes as an “analytics-as-a-service” platform, today announced that it raised $47 million in a Series B round led by GGV Capital with participation from Sapphire Ventures, Bain Capital Ventures, and CRV.

The GeekNarrator
Running Distributed Systems like a Pro with Mayank Shrivastava

The GeekNarrator

Play Episode Listen Later Aug 6, 2022 63:24


Hey Everyone, In this episode I am talking to Mayank Shrivastava, who has vast experience in building and maintaining high-scale distributed systems. He was on the team that originally built Apache Pinot at LinkedIn and is now working at StarTree as the Head of Core Data Engineering. He has shared some amazing insights from his experience and there is a lot to learn from our discussion. We discuss the following: 00:00 Introduction 04:20 Practices to follow while designing and developing Distributed Systems 05:47 What do we mean by Solid Scalable Design? How do we approach that? 09:00 Safety Nets for developing Distributed systems 10:21 When is the right time to do performance benchmarking? 17:00 What is release certification? 21:00 Deploying to Production 24:45 Example when Canary Deployment might not be a good strategy? 26:00 Example when Canary Deployment is a good strategy? 27:30 Post Deployment - how do we observe our system? 33:30 How do we avoid on-call (alerting) noise? 42:00 Maintaining a Large scale Distributed system 47:15 Scaling up/down for stateful systems 51:30 Handling Failures in Production (Disaster Recovery) 01:00:30 Runbooks - How do we keep them updated? References: The GeekNarrator LinkedIn page: https://www.linkedin.com/company/86276626 Kaivalya Apte: https://www.linkedin.com/in/kaivalya-apte-2217221a/ Geeknarrator website: www.geeknarrator.com Mayank Shrivastava: https://www.linkedin.com/in/mayankshriv/ StarTree: https://www.startree.ai/ Apache Pinot: https://pinot.apache.org/ Hope you enjoy the discussion and learn from it. Please hit the like button if you liked my discussion with Mayank and please subscribe to the channel for more content like this. Cheers, The GeekNarrator

Open Source Startup Podcast
E41: Real-time Analytics Powered by Startree & Apache Pinot

Open Source Startup Podcast

Play Episode Listen Later Jul 6, 2022 44:47


Kishore Gopalakrishna is Co-Founder & CEO of StarTree, the real-time analytics platform that provides a managed service on top of the open-source distributed data store Apache Pinot. Kishore is also the co-creator of Apache Pinot, which was started while he was at LinkedIn. Since leaving to build StarTree, Kishore and his team have raised $28M from investors including GGV, Bain Capital Ventures, and CRV. In this episode, we discuss the right time to launch a managed service on top of an open source project, the importance of relentless focus on customer needs and use cases early on, community building, and much more.

Open||Source||Data
Apache Pinot and Real-Time Analytics with Neha Pawar

Open||Source||Data

Play Episode Listen Later May 25, 2022 40:57


This episode features an interview with Neha Pawar, a Founding Engineer at StarTree. StarTree is a software development company that focuses on democratizing data for all users by providing real-time, user-facing analytics. Prior to her time at StarTree, Neha was a Senior Software Engineer on LinkedIn's Data Analytics team, where she spent five years working on Apache Pinot. Neha has provided countless contributions to Pinot over the years, focusing on real-time streaming integrations, ingestion, and storage. In this episode, Sam sits down with Neha to discuss Apache Pinot's impact on the data community and how LinkedIn popularized real-time analytics.
-------------------
“Many people do think that a batch is good enough, real-time infra is expensive anyway. And what difference is it going to make if the data shown in this application is a day ago or an hour ago, and it's not real-time to the nearest second? And while that is true, in some cases, but in many other cases, not having real-time data can be super expensive and can affect the business badly and also make them irrelevant. You need the real-time data and then you also need to be able to analyze that data at the speed of your thought. For example, if you are having fraudulent activity somewhere, you can't wait for, ‘Hey, my model is going to learn about this.' And then the next time, be able to tell me that that was a fraudulent activity. You need to be able to analyze all that data right now. So, it's not just a nice-to-have, it's a must-have.” – Neha Pawar
-------------------
Episode Timestamps: (01:58): What open source data means to Neha (06:04): Neha's learnings from the LinkedIn Data Analytics Team (07:07): What piqued Neha's interest in real-time data analytics (08:30): Neha's first experiences working on Apache Pinot (11:40): How the work of real-time data spread from LinkedIn to other companies (17:30): How the Apache community has grown (24:04): Neha's focus at StarTree (30:41): Neha's motivation for tiered storage at StarTree (37:07): Neha's advice for open source data folks
-------------------
Links: LinkedIn - Connect with Neha | LinkedIn - Connect with StarTree | Twitter - Follow Neha | Twitter - Follow StarTree | Visit StarTree

Software Engineering Daily
Pinot and StarTree with Chinmay Soman

Software Engineering Daily

Play Episode Listen Later May 9, 2022 44:17


Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a ...

Podcast – Software Engineering Daily
Pinot and StarTree with Chinmay Soman

Podcast – Software Engineering Daily

Play Episode Listen Later May 9, 2022 51:58


Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a ...

SaaS for Developers
Speed of Apache Pinot - Cost of Cloud Storage

SaaS for Developers

Play Episode Listen Later Mar 25, 2022 41:29


Companies that take existing software and manage it in the cloud have the opportunity to rethink their architecture and use cloud services appropriately. The resulting system is often cheaper, simpler, more elastic, and more manageable. Neha Pawar, a founding engineer at StarTree Data and a PMC member of Apache Pinot, recently published a blog post about her experiments with tiered storage for Pinot. We had an in-depth conversation about how they approached the architecture change and the challenges of achieving reasonable latencies with cloud object storage like S3. And the big question: Why did otherwise reasonable engineers decide to tackle this project's prototype over a weekend? Interested in more SaaS? Join our Slack: https://saas-community.github.io/ The blog post: https://www.startree.ai/blogs/introducing-tiered-storage Twitter thread: https://twitter.com/KishoreBytes/status/1503075370240659462

Data Engineering Podcast
Accelerate Your Embedded Analytics With Apache Pinot

Data Engineering Podcast

Play Episode Listen Later Mar 20, 2022 72:56


Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user product creates a significant shift in requirements for your data layer. The Pinot OLAP datastore was created for this purpose, optimizing for low latency queries on rapidly updating datasets with highly concurrent queries. In this episode Kishore Gopalakrishna and Xiang Fu explain how it is able to achieve those characteristics, their work at StarTree to make it more easily available, and how you can start using it for your own high throughput data workloads today.