Did you know every time you order food, book a ride, or even check who viewed your profile, real-time analytics is powering your experience behind the scenes? In this episode of Great Things with Great Tech, we dive deep into the power of real-time analytics with Kishore Gopalakrishna, CEO and Co-founder of StarTree. StarTree leverages Apache Pinot, a high-performance real-time analytics database, revolutionizing how leading companies like Uber, LinkedIn, Walmart, and Etsy provide instant insights and personalized experiences at massive scale. Kishore shares his journey from a gaming enthusiast fascinated by distributed systems to building mission-critical platforms at Yahoo and LinkedIn, eventually creating Apache Pinot. Discover how StarTree is powering billions of real-time queries per week, enabling businesses to enhance customer interactions, optimize operational decisions, and supercharge modern AI and observability. Key Takeaways: How real-time analytics transform industries, enabling instantaneous insights and rapid decision-making. The evolution from traditional databases to highly efficient columnar, real-time analytics systems. Real-world applications of Apache Pinot, from consumer apps to enterprise observability and operational excellence. How real-time data is accelerating innovations in AI, specifically through Real-Time Retrieval-Augmented Generation (RAG). 
The future of analytics: seamless data ingestion, enhanced concurrency, and the growing demand for sub-second response times. Links & Resources: Web StarTree: https://startree.ai Kishore Gopalakrishna on LinkedIn: https://www.linkedin.com/in/kgopalak/ Apache Pinot: https://pinot.apache.org ☑️ Support the Channel: https://ko-fi.com/gtwgt ☑️ Be on #GTwGT: Contact via Twitter @GTwGTPodcast or visit https://www.gtwgt.com ☑️ Subscribe to YouTube: https://www.youtube.com/@GTwGTPodcast?sub_confirmation=1 Check out the full episode on our platforms: Spotify: https://open.spotify.com/episode/2l9aZpvwhWcdmL0lErpUHC?si=x3YOQw_4Sp-vtdjyroMk3Q Apple Podcasts: https://podcasts.apple.com/us/podcast/darknet-diaries-with-jack-rhysider-episode-83/id1519439787?i=1000654665731 Follow Us: Website: https://gtwgt.com Twitter: https://twitter.com/GTwGTPodcast Instagram: https://instagram.com/GTwGTPodcast ☑️ Music: https://www.bensound.com
In this episode of The Geek Narrator podcast, hosted by Kaivalya Apte, we welcome a special guest, Kishore Gopalakrishna from StarTree, co-author of Apache Pinot and other notable projects. Kishore shares his extensive experience in building real-time analytics and streaming systems, including Apache Pinot, Espresso, Apache Helix, and Third Eye. The episode delves into the motivations and challenges behind creating these systems, the innovations they brought to distributed systems, and the impact of community on open-source projects. Kishore also discusses the evolution of testing methodologies, cost optimizations in transactional and analytical systems, and key considerations for companies evaluating real-time analytics solutions. Don't miss this in-depth conversation packed with valuable insights for both seasoned developers and tech enthusiasts! Chapters: 00:00 Introduction 03:13 Building Distributed Systems at LinkedIn 08:57 Testing and Challenges in Distributed Systems 30:50 Advantages of Columnar Storage 33:04 The Importance of Upserts 34:24 Building a Strong Open Source Community 41:10 Challenges and Lessons in System Design 51:35 Real-Time Analytics: Do You Need It? StarTree: https://startree.ai/ Apache Pinot: https://pinot.apache.org/ If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet. Database internals series: https://youtu.be/yV_Zp0Mi3xs Popular playlists: Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA- Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17 Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN Stay Curious! Keep Learning! #distributedsystems #kafka #s3 #streaming #realtimeanalytics #database #pinot #startree
Companies need to provide real-time insights to both customers and internal users. These insights power use cases such as personalization and fraud detection. StarTree Cloud is a real-time analytics platform built on Apache Pinot for building applications that depend on such real-time insights. In this episode we meet with Chinmay Soman, Head of Product at StarTree, to discuss the different dimensions of real-time analytics and how Apache Pinot and StarTree Cloud offer a robust platform for providing such insights to applications. AWS Hosts: Nolan Chen & Malini Chatterjee. Email Your Feedback: rethinkpodcast@amazon.com Resources: StarTree: https://startree.ai StarTree community Slack: https://communityinviter.com/apps/startreedata/startree-community Apache Pinot Slack: https://communityinviter.com/apps/apache-pinot/apache-pinot Serverless / Free forever workspace: https://stree.ai/free
The pace and scale of modern business put traditional data architectures on their heels. To keep user-facing apps running smoothly, companies need a new way to observe and address critical issues. That's the focus of Apache Pinot, an open-source distributed database specifically designed for real-time analytics. It excels in providing low-latency query responses even at high throughput, making it ideal for scenarios where immediate insights from massive amounts of data are essential. How does it work, and how can it help you? Check out this episode of DM Radio to find out! Host @eric_kavanagh will interview Kishore Gopalakrishna of StarTree, and Hyoun Park of Amalgam Insights.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Join us for episode #51 of the Real-Time Analytics podcast as our host, Tim Berglund, is joined by Tim Veil, VP of Solutions Engineering and Enablement at StarTree. Dive into a discussion about Testcontainers, a powerful tool that leverages Docker for sophisticated integration testing. Learn how Testcontainers simplifies the testing process against real databases like Apache Pinot, enhancing code reliability and CI pipeline efficiency.
Pinot 1.1: https://docs.pinot.apache.org/basics/releases | Sub: https://stree.ai/sub | In this release video, Tim Berglund (VP of Developer Relations, StarTree) covers the updates since Pinot 1.0, including 166 new features and 152 bug fixes. Tim delves into key enhancements such as the introduction of vector index support—vital for AI and machine learning applications—and improvements in the multi-stage query engine. He also explains the significance of sticky query routing and new approximation algorithms like HyperLogLog++. Whether you're a seasoned developer or a data enthusiast keen to understand the latest trends in database technology, this video offers valuable insights into optimizing real-time data processing with Apache Pinot.
On today's episode we are interviewing Jess Iandiorio, a repeat startup CMO currently at StarTree, and previously at unicorns including Starburst, Mirakl and Acquia. She covers how to budget for brand, why brand design matters, and how brand fuels demand. You can learn more by following us on LinkedIn using the links below: Jess Iandiorio | StarTree.ai | Amrita Gurney | Janessa Lantz. Thanks for listening!
“So you're telling me that a DevRel's job is more than just posting funny tweets and partying at conferences?” Yes. Yes we are. And so is our guest on today's episode - Tim Berglund. Tim is the Vice President of Developer Relations at StarTree, and he breaks down for us what DevRel means to him, and how he counteracts people's misconceptions about the role. In today's world, it's as difficult as ever for a DevRel to describe their job, but Tim gives it a go on this episode of the podcast. What's important to know is that a DevRel's job should leave room for “fun” stuff. The whole reason the role benefits a company is that it gives a member of the team the time and space to explore new, creative ways of doing things and achieving goals. A DevRel that wasn't having at least a little bit of fun wouldn't be doing a very good job! Though after the play hard comes the work hard. And for DevRels that often comes in the form of justifying your own existence. Tim explains that DevRel will never be like marketing, where webinars and dinners can lead to a phone call the next day - easy to track, right? DevRel is a little more complicated than that. You might not know who's on the other side of your video, or blog post, and might never be able to track their journey from interacting with your content, to becoming a customer years down the line. So in the meantime, you're left with making general measurements and collecting testimonies that, while helpful, could be much improved upon as a system. Reach out to Tim here: https://www.linkedin.com/in/tlberglund/ Listen to Tim's Podcast - ‘Real Time Analytics' here: https://podcasts.apple.com/us/podcast/real-time-analytics-with-tim-berglund/id1680445905 Find out more and listen to previous podcasts here: https://www.voxgig.com/podcast Subscribe to our newsletter for weekly updates and information about upcoming meetups: https://voxgig.substack.com/ Join the Dublin DevRel Meetup group here: www.devrelmeetup.com
The likes of LinkedIn and Uber use Pinot to power some astonishingly high-scale queries against realtime data. The numbers alone would make an impressive case-study. But behind the headline lies a fascinating set of architectural decisions and constraints to get there. So how does Pinot work? How does it process queries? How are the various roles split across a cluster? And equally important - what does it *not* try to achieve? Joining me to go through the nuts and bolts of how Pinot handles SQL queries is Tim Berglund, veteran technology explainer of the realtime-data world. He takes us through Pinot step-by-step, covering the roles of brokers, servers, controllers and minions as we build up the picture of a query engine that's interesting in theory and massively performant in practice. | Apache Pinot: https://pinot.apache.org/ Apache Pinot Docs: https://docs.pinot.apache.org/ StarTree: https://startree.ai/ Event Driven Design episode with Bobby Calderwood: https://youtu.be/V7vhSHqMxus Tim on Twitter: https://twitter.com/tlberglund Kris on Mastodon: http://mastodon.social/@krisajenkins Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/ Kris on Twitter: https://twitter.com/krisajenkins | #podcast #softwaredevelopment #apachepinot #database #dataengineering #sql
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us as we dive into the fascinating intersection of API gateways and real-time analytics with Viktor Gamov, the new head of Developer Advocacy at StarTree. Viktor shares his insights from his recent experiences and explores how these technologies are transforming user-facing analytics. We also discuss Viktor's upcoming all-day Pinot training at the RTA Summit and delve into some intriguing topics like the "Hyperion Cantos" and the concept of a StarTree. Join us at the summit: visit rtasummit.com ► Hyperion Cantos by Dan Simmons: https://www.amazon.com/Hyperion-Cantos-Book-Complete-Set/dp/B084ZB7SMP ► The Four: The Hidden DNA of Amazon, Apple, Facebook, and Google by Scott Galloway: https://www.amazon.com/Four-Hidden-Amazon-Facebook-Google/dp/0525501223 ► The Movie Database: https://developer.themoviedb.org/docs/getting-started ► Developer Voices episode ft. Bobby Calderwood: https://www.youtube.com/watch?v=V7vhSHqMxus
Unlock the secrets of engaging developer communities and the transformative world of real-time data analytics with our guest, Viktor Gamov of StarTree. From crafting code to leading developer relations, Viktor unravels his career evolution, highlighting how fostering connections and sharing knowledge with developers has reshaped the landscape of tech communication. His take on the democratization of technical know-how reveals the profound impact of making what was once consultancy-exclusive accessible to all. Tune in for a masterclass on the importance of community in the tech industry and how it can break barriers for innovation. Are you ready to see data come to life? Viktor's thrilling exposition on Kafka, KSQL, and Apache Pinot turns the arcane into the amazing, using a real-time Pac-Man game dashboard to illustrate the revolutionary shift from batch to stream processing. Witness the rebirth of open-source technologies and grasp the concept of 'data in motion' as we discuss the critical importance of streaming platforms in modern data architecture. Viktor's expertise in developer relations shines as he demonstrates the value of making complex tech relatable and relevant to business needs. The data landscape is ever-evolving, and with the rise of AI, the stakes have never been higher. In an era where milliseconds matter, Viktor peels back the layers of how Apache Pinot is driving real-world solutions across industries. From restaurant load management to transaction tracking, discover how real-time analytics are informing strategic business decisions. As we journey with our guest back to his roots in data and game development, we're reminded of the cyclical nature of passion and profession, where one's beginnings often foretell the trajectory of their career. 
Don't miss out on this episode, where we connect the dots between nostalgia and the next wave of data innovation. What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data, and analytics success stories.
SHOWNOTES: Get ready for the latest episode of Masters of MEDDICC! Host Andy Whyte chats with experienced revenue leader, Jeff Miller, current CRO of StarTree. Listen and have a chuckle as the two talk about their shared history of selling door-to-door - and what they learned from it. The way you communicate with your customer can make or break a deal or engagement. There are a lot of opinions out there about the best way to do it - product-led or open-source? Either way, Andy and Jeff look at how MEDDIC can underpin it all, and help you explain the value of your solution to your customers. Andy and Jeff discuss how their ADHD acts as a superpower in sales. Jeff, as always, ties it back to the customer, comparing how he explained how his ADHD makes his brain work differently to understanding that every customer will think differently to you. ABOUT JEFF: Jeff Miller is the Chief Revenue Officer at StarTree, leading the company's go-to-market strategy across its Sales, Revenue Operations, Solutions Engineering, Customer Success and Product-Led Growth teams. Jeff brings more than two decades of experience leading, managing and growing successful sales teams and exponentially increasing revenue for high-growth tech companies. Prior to joining StarTree, Jeff served as the CRO of Cockroach Labs, where he helped lead the company to its Series F stage and a $5 billion valuation with over 250 customers. He previously served as Senior Vice President of Sales at Hortonworks, leading the organization to become the fastest-growing software company ever to get $100 million in ARR and a successful IPO.
Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
For many organizations today, data management comes down to handing over their data to one of the "Big 5" data vendors: Amazon, Microsoft Azure and Google, plus Snowflake and Databricks. But analysts David Vellante and George Gilbert believe that the needs of modern data applications coupled with the evolution of open storage management may lead to the emergence of a "sixth data platform". The sixth data platform hypothesis is that open data formats may enable interoperability, leading the transition away from vertically integrated vendor-controlled platforms towards independent management of data storage and permissions. It's an interesting scenario, and one that would benefit users by forcing vendors to compete for every workload based on the business value delivered, irrespective of lock-in. But how close are we to realizing this? To answer this question, we have to examine open data formats and their interoperability potential across clouds and formats, as well as on the semantics and governance layer. We caught up with Peter Corless and Alex Merced to talk about all of that. Article published on Orchestrate all the Things: https://linkeddataorchestration.com/2024/01/11/data-management-in-2024-open-data-formats-and-a-common-language-for-a-sixth-data-platform/
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of "The Real Time Analytics Podcast," Tim Berglund is joined by returning guest Peter Corless (Director of Product Marketing, StarTree) to delve into the complex world of federated data systems. They discuss the evolution of data architectures, the challenges of federated identity and data governance, and the implications for modern businesses. Tune in for an insightful conversation on the intricacies and future directions of federated data in an era of diverse and interconnected systems.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join host Tim Berglund and StarTree's Peter Corless on the "Real-Time Analytics" podcast as they explore the evolution of data architecture and the relevance of the 'data stack' concept in today's tech landscape. They delve into the shift from traditional structures like the LAMP stack to more dynamic, complex systems, underscoring the need for new frameworks and terminologies. ► LAMP stack: https://en.wikipedia.org/wiki/LAMP_(software_bundle) ► JAM stack: https://jamstack.org/ ► OSI reference model: https://en.wikipedia.org/wiki/OSI_model ► Recent Trino episode of podcast: https://youtu.be/_eFdbfn1gO0 ► The Star Trek Federation: https://memory-alpha.fandom.com/wiki/United_Federation_of_Planets ► Look for Peter Corless in StarTree Community Slack (stree.ai/slack)
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the 'Real-Time Analytics' podcast, dive into the world of Pinot capacity planning with Sandeep Dabade, a solutions engineer at StarTree. Discover how to calculate the perfect cluster size for your real-time analytics requirements and explore essential technical KPIs like read throughput, write throughput, and data size. Sandeep shares invaluable insights into optimizing Pinot for seamless data processing and analytics, making this episode essential for anyone tackling real-time data challenges. Sandeep's blogs: ► https://startree.ai/blog/best-practices-for-designing-tables-in-apache-pinot ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-1-understanding-the-impact-on-query-performance ► https://startree.ai/blog/star-tree-indexes-in-apache-pinot-part-2-understanding-the-impact-during-high-concurrency ► https://startree.ai/blog/star-tree-index-in-apache-pinot-part-3-understanding-the-impact-in-real-customer
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us for Part 2 of the "Real-Time Analytics" podcast featuring Neha Pawar of StarTree, where we delve into Apache Pinot's advanced features including its pluggable architecture, upserts, and Kafka integration. Uncover how Pinot maintains data integrity in real-time analytics and get an insider's look at StarTree Cloud's exclusive tiered storage system.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the "Real-Time Analytics" podcast, host Tim and guest Neha Pawar, a founding engineer of StarTree, explore Apache Pinot's unique capabilities in real-time analytics. Neha unpacks Pinot's efficiency, low latency, and high throughput, revealing its prowess in offering real-time insights to end users. Tune in to this first installment of a two-part series for an insightful discussion on the intricacies and innovations that make Pinot a standout in the analytics landscape.
Summary Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. 
Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold) You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! As more people start using AI for projects, two things are clear: It's a rapidly advancing field, but it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES (https://Neo4j.com/NODES). Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable Interview Introduction How did you get involved in the area of data management? Can you describe what Decodable is and the story behind it? What are the notable changes to the Decodable platform since we last spoke? (October 2021) What are the industry shifts that have influenced the product direction? 
What are the problems that customers are trying to solve when they come to Decodable? When you launched your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL? What are the developer experience challenges that are particular to working with streaming data? How have you worked to address that in the Decodable platform and interfaces? As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced? What are the most interesting, innovative, or unexpected ways that you have seen Decodable used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable? When is Decodable the wrong choice? What do you have planned for the future of Decodable? Contact Info esammer (https://github.com/esammer) on GitHub LinkedIn (https://www.linkedin.com/in/esammer/) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story. 
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Decodable (https://www.decodable.co/) Podcast Episode (https://www.dataengineeringpodcast.com/decodable-streaming-data-pipelines-sql-episode-233/) Flink (https://flink.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/) Debezium (https://debezium.io/) Podcast Episode (https://www.dataengineeringpodcast.com/debezium-change-data-capture-episode-114/) Kafka (https://kafka.apache.org/) Redpanda (https://redpanda.com/) Podcast Episode (https://www.dataengineeringpodcast.com/vectorized-red-panda-streaming-data-episode-152/) Kinesis (https://aws.amazon.com/kinesis/) PostgreSQL (https://www.postgresql.org/) Podcast Episode (https://www.dataengineeringpodcast.com/postgresql-with-jonathan-katz-episode-42/) Snowflake (https://www.snowflake.com/en/) Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/) Databricks (https://www.databricks.com/) Startree (https://startree.ai/) Pinot (https://pinot.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pinot-embedded-analytics-episode-273/) Rockset (https://rockset.com/) Podcast Episode (https://www.dataengineeringpodcast.com/rockset-serverless-analytics-episode-101/) Druid (https://druid.apache.org/) InfluxDB (https://www.influxdata.com/) Samza (https://samza.apache.org/) Storm (https://storm.apache.org/) Pulsar (https://pulsar.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/pulsar-fast-and-scalable-messaging-with-rajan-dhabalia-and-matteo-merli-episode-17) ksqlDB (https://ksqldb.io/) Podcast Episode (https://www.dataengineeringpodcast.com/ksqldb-kafka-stream-processing-episode-122/) dbt (https://www.getdbt.com/) GitHub Actions (https://github.com/features/actions) Airbyte 
(https://airbyte.com/) Singer (https://www.singer.io/) Splunk (https://www.splunk.com/) Outbox Pattern (https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join us on the Real-Time Analytics Podcast as we delve into the intriguing intersection of data mesh and event streaming with Hubert Dulay, a developer advocate at StarTree and the author of "Streaming Data Mesh." Our host, Tim Berglund, uncovers the journey from Zhamak Dehghani's initial concept to Hubert's vision of implementing it in a streaming context. Understand the essence of treating data as a product, the future of streaming technologies, and the transformative role of data in modern businesses. Hubert's book: https://www.oreilly.com/library/view/streaming-data-mesh/9781098130718/ Zhamak Dehghani's Real-Time Analytics Summit keynote: https://youtu.be/Pz3UPpv_JIs
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Today Tim is joined by Ralph Debusmann (Enterprise Kafka Engineer, Migros) and Hubert Dulay (Developer Advocate, StarTree) as they delve deep into the world of streaming databases. They explore the blend of traditional databases with streaming elements, aiming to make stream processing more user-friendly with SQL. Discussing tools like ksqlDB and Materialize, they touch upon Martin Kleppmann's theories of transforming databases and the pros and cons of current streaming platforms. Dive in to learn more about the future of data streaming! Turning the database inside-out: https://martin.kleppmann.com/2015/11/05/database-inside-out-at-oredev.html
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! StarTree's Tim Berglund and Navina Ramesh sit down to discuss the complex issue of upserts and deletes in analytical databases. They cover the challenges and necessity of these features in real-time analytical processing. Unlike traditional databases where records can be updated, analytical databases are typically immutable, making Pinot unique in its ability to support upserts. The conversation sheds light on why these functionalities are game-changers for real-time analytics.
The Pinot team at Uber wrote an excellent paper about the real-time analytics platform they built. Chinmay, formerly a principal engineer at Uber and now head of product at StarTree, joined me for a conversation. We discussed the challenges they encountered at Uber, the solutions they came up with, the platform they built, and how to best apply their experience to companies much smaller than Uber. The paper: https://arxiv.org/pdf/2104.00087.pdf
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In this episode of the Real-Time Analytics podcast, host Tim Berglund welcomes Eric Sammer, Founder and CEO of Decodable. Eric, an industry leader in event streaming technology, discusses the company's focus on stream processing, real-time data processing, and integration with systems like Apache Pinot and StarTree. The conversation delves into the challenges and complexities of managing data, from data cleansing to structuring for different use cases. They explore the ideal balance between generalized and specialized systems, emphasizing the importance of flexibility. Ultimately, they highlight how stream processing serves as an effective solution to adjust and distribute data intelligently, providing an essential abstraction point.
In today's episode, Luan Moreno and Mateus Oliveira interview Neha Pawar, currently a Founding Engineer at StarTree. Apache Pinot is a low-latency OLAP database that was developed for analytical queries at LinkedIn. Its goal is to solve one of the problems that technologies such as Apache Kafka do not: querying billions of events with high performance and low latency. With Apache Pinot, you get the following benefits: high-performance analytical queries; data stored in Apache Pinot is compressed; support for thousands of concurrent accesses to the data stored in Apache Pinot. We also cover these topics: the creation of Apache Pinot; user-facing analytics; deployment types in Apache Pinot; what's coming next in Apache Pinot. Learn more about Apache Pinot, a technology that can store data in real time and run queries with low latency, down to milliseconds. Neha Pawar = LinkedIn | https://pinot.apache.org/ | Luan Moreno = https://www.linkedin.com/in/luanmoreno/
In this episode, Tim Berglund, Vice President of Developer Relations at StarTree, joins us to discuss what has and hasn't changed in the world of developer education, why hardware feels so magical when compared to software, and why being a teacher and an outstanding developer are two completely different skills.
This interview was recorded for GOTO Unscripted. gotopia.tech Read the full transcription of this interview here. Tim Berglund - VP DevRel at StarTree & Author of "Gradle Beyond the Basics". Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS. RESOURCES: Tim: timberglund.com, twitter.com/tlberglund, linkedin.com/in/tlberglund. Adi: twitter.com/AdiPolak, instagram.com/polak.code, linkedin.com/in/polak-adi. Tools & companies: pinot.apache.org, twitter.com/startreedata, linkedin.com/company/startreedata, dev.startree.ai, stree.ai/slack. YT videos: Data Mesh • Zhamak Dehghani; Beyond Microservices • Gwen Shapira. DESCRIPTION: Adi Polak and Tim Berglund explore the concept of analytics and what it truly means in the software development world. They delve into the benefits of real-time analytics for product development, highlighting the fine line between compute and storage and the technical requirements for achieving effective real-time analytics. They also discuss the applications of real-time analytics through the lens of Apache Pinot and StarTree Cloud, exploring use cases such as the popular "Who's Watched My Profile on LinkedIn" feature powered by Apache Pinot. RECOMMENDED BOOKS: Adi Polak • Scaling Machine Learning with Spark; Tim Berglund • Gradle Beyond the Basics; Tim Berglund & Matthew McCullough • Building and Testing with Gradle; Mark Needham • Building Real-Time Analytics Systems; Gwen Shapira, Todd Palino, Rajini Sivaram & Krit Petty • Kafka: The Definitive Guide. Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech. SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Who remembers taxis? They were these yellow cars that appeared at random with seemingly inexplicable service charges. You may be more familiar with taxi 2.0: Uber - a platform fully powered by real-time analytics. In this episode, Tim sits down with Rong Rong (Software Engineer, StarTree) to talk about how Uber made use of Apache Pinot, when he came over to StarTree, as well as a deep dive into some Apache Pinot internals.
Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! When talking about developing Apache Pinot, Mayank Shrivastava (founding engineer, StarTree) says, “We have to keep venturing into areas where we don't belong.” In this episode, Tim asks Mayank all about his time at LinkedIn, how companies like Uber use this open source technology, and how much of real-time analytics was driven by the ever-elusive algorithm. From use cases to Apache Pinot internals, this week is all about digging into the details.
Do you remember the day you could magically see who viewed your profile on LinkedIn? What was once a website for resumes suddenly gave rise to a new category of database: real-time analytics. From building that new database to what makes the open source community so special, Kishore Gopalakrishna (Co-Creator of Apache Pinot™ and Co-Founder & CEO of StarTree) talks about it all in this week's episode.
Real-time analytics. We've all heard the term, but what does it really mean? It's only fitting that we kick off this podcast with some history and exploration of the term by our host, Tim Berglund (Developer Relations, StarTree).
In this episode we interview Kishore Gopalakrishna, Co-Founder and CEO of StarTree. Luan Moreno and Mateus Oliveira chat with the co-creator of the powerful tool called Apache Pinot. Pinot is an OLAP datastore built to answer analytical queries with response times in the millisecond range, and it can be considered a database for real-time queries. It can ingest from batch data sources (Hadoop HDFS, Amazon S3, Azure ADLS, Google Cloud Storage) as well as streaming data sources (Apache Kafka, Apache Pulsar, Amazon Kinesis). Pinot was designed to run OLAP queries in real time, with low latency over large volumes of events, to deliver on the concept of user-facing analytics. It was created and developed by engineers at LinkedIn and Uber and designed to scale up and out without limits. Apache Pinot. Kishore Gopalakrishna. StarTree. Luan Moreno = https://www.linkedin.com/in/luanmoreno/
This interview was recorded for the GOTO Book Club. gotopia.tech/bookclub Read the full transcription of the interview here. Viktor Gamov - Principal Developer Advocate at Kong & Co-Author of "Kafka in Action". Tim Berglund - VP DevRel at StarTree & Author of "Gradle Beyond the Basics". DESCRIPTION: Kafka has been on developers' radars for quite a while now. Viktor Gamov's co-authored book “Kafka in Action” ensures that you have a list of recipes to dive into. Joined by Tim Berglund, VP DevRel at StarTree, they explore the fundamentals of Apache Kafka. Learn what Kafka can help you achieve, what Viktor's favorite MCU film is and what “Highway to Mars” by Beast In Black has to do with all of this. The interview is based on Viktor's co-authored book "Kafka in Action". LINKS: Streaming Audio: Apache Kafka® & Real-Time Data. RECOMMENDED BOOKS: Viktor Gamov, Dylan Scott & Dave Klein • Kafka in Action; Viktor Gamov, Tartakovsky, Rasputnis & Fain • Enterprise Web Development; Tim Berglund • Gradle Beyond the Basics; Tim Berglund & Matthew McCullough • Building and Testing with Gradle; Shapira, Palino, Sivaram & Petty • Kafka: The Definitive Guide; Martin Kleppmann • Designing Data-Intensive Applications; Jono Bacon • People Powered; Mary Thengvall • The Business Value of Developer Relations; Jay Kreps • I ❤️ Logs; Zhamak Dehghani • Data Mesh; Bill Bejeck • Kafka Streams in Action. Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech. SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily
In this podcast I have invited Neha Pawar, who is one of the Founding Engineers at StarTree (the company powering Apache Pinot). We talked about how StarTree has implemented tiered storage and how it differs from other available implementations. Note: Currently tiered storage is available only in StarTree's Pinot and not in the open source version. But it's only a matter of time. Chapters: 00:00 Introduction 03:28 What does Tiered Storage mean? 05:51 How many tiers are typically supported? 07:30 Is it mainly about Cost Optimisation? How do I compare the cost savings vs performance hit? 15:41 What is mmap and how does it help? 16:45 How do I implement/approach Tiered Storage? What are the challenges? 23:00 What is Apache Pinot? When we say low latency, how low is it? 25:00 How is it implemented in StarTree (Apache Pinot)? 36:45 What happens when I query for more (or all) columns? How is that optimised? 47:10 What are the failure modes? 50:15 How can we test and validate Tiered Storage as a feature? 54:30 How would bloom filter false positives affect performance and correctness? 56:15 Can I move my data back from Cold Storage to Hot Storage? 57:45 What other cloud storage services are supported other than S3? 58:35 What is the future of Tiered Storage?
What's your favorite podcast? Would you like to find some new ones? In celebration of International Podcast Day, Kris Jenkins invites 12 experts from the Apache Kafka® community to talk about their favorite podcasts. Unlike other episodes where guests educate developers and tell stories about Kafka, its surrounding technological ecosystem, or the Cloud, this special episode provides a glimpse into what these guests have learned through listening to podcasts that you might also find interesting. Through a virtual international tour, Kris chatted with Bill Bejeck (Integration Architect, Confluent), Nikoleta Verbeck (Senior Solutions Engineer, CSID, Confluent), Ben Stopford (Lead Technologist, OCTO, Confluent), Noelle Gallagher (Video Producer, Editor), Danica Fine (Senior Developer Advocate, Confluent), Tim Berglund (VP, Developer Relations, StarTree), Ben Ford (Founder and CEO, Commando Development), Jeff Bean (Group Manager, Technical Marketing, Confluent), Domenico Fioravanti (Director of Engineering, Therapie Clinic), Francesco Tisiot (Senior Developer Advocate, Aiven), Robin Moffatt (Principal, Developer Advocate, Confluent), and Simon Aubury (Principal Data Engineer, ThoughtWorks). They share recommendations covering a wide range of topics such as building distributed systems, travel, data engineering, Greek mythology, data mesh, economics, and music and the arts. EPISODE LINKS: Common Apache Kafka Mistakes to Avoid; Flink vs Kafka Streams/ksqlDB; Why Data Mesh ft. Ben Stopford; Practical Data Pipeline ft. Danica Fine; What Could Go Wrong with a Kafka JDBC Connector?; Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt; Serverless Stream Processing with Apache Kafka ft. Bill Bejeck; Scaling an Apache Kafka-Based Architecture at Therapie Clinic; Event-Driven Systems and Agile Operations; Real-Time Stream Processing, Monitoring, and Analytics with Apache Kafka; Watch the video version of this podcast; Kris Jenkins' Twitter; Streaming Audio Playlist; Join the Confluent Community; Learn more with Kafka tutorials, resources, and guides at Confluent Developer; Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Highlights from this week's conversation include: Kishore's background and career journey (2:30); Internal analytics versus user-facing analytics (3:49); New ways of thinking about analytics (8:06); What makes Pinot different (13:45); How Pinot transforms systems (21:53); Understanding the data landscape (32:40); The Pinot user experience (36:27); Something exciting about StarTree (40:05); When you should adopt this technology (43:15). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Kishore Gopalakrishna, co-founder and CEO of StarTree, created a solution to a database problem with his co-worker and eventual co-founder Xiang Fu while working at LinkedIn. At the time, LinkedIn was debuting its now-popular feature called Who's Viewed Your Profile, which required the ability to slice and dice massive amounts of data in real time. Kishore and Xiang developed what they called Apache Pinot, a real-time distributed analytical processing data store used to deliver scalable real-time analytics with very low latency. The pair went on to found their open source company, StarTree, in 2019 to build a commercial version of Apache Pinot. The analytics provided by its technology are increasingly essential for all kinds of business decision makers, and the company has quickly emerged as a leader in serving up real-time user-facing analytics at very low latency, for millions. In this episode, Kishore talks about the solutions StarTree provides, its key relationship with the developer community, and the roadmap for the company, which just announced a $47 million Series B led by GGV, with participation from existing investors Bain and CRV as well as new investor Sapphire Ventures.
StarTree, a company building what it describes as an “analytics-as-a-service” platform, today announced that it raised $47 million in a Series B round led by GGV Capital with participation from Sapphire Ventures, Bain Capital Ventures, and CRV.
In this bonus episode, Eric and Kostas preview their upcoming conversation with Kishore Gopalakrishna of StarTree.
Hey Everyone, In this episode I am talking to Mayank Shrivastava, who has vast experience in building and maintaining high-scale distributed systems. He was on the team that originally built Apache Pinot at LinkedIn and is now working at StarTree as the Head of Core Data Engineering. He has shared some amazing insights from his experience and there is a lot to learn from our discussion. We discuss the following: 00:00 Introduction 04:20 Practices to follow while designing and developing Distributed Systems 05:47 What do we mean by Solid Scalable Design? How do we approach that? 09:00 Safety Nets for developing Distributed systems 10:21 When is the right time to do performance benchmarking? 17:00 What is release certification? 21:00 Deploying to Production 24:45 Example when Canary Deployment might not be a good strategy? 26:00 Example when Canary Deployment is a good strategy? 27:30 Post Deployment - how do we observe our system? 33:30 How do we avoid on-call (alerting) noise? 42:00 Maintaining a Large scale Distributed system 47:15 Scaling up/down for stateful systems 51:30 Handling Failures in Production (Disaster Recovery) 01:00:30 Runbooks - How do we keep them updated? References: The GeekNarrator LinkedIn page: https://www.linkedin.com/company/86276626 Kaivalya Apte: https://www.linkedin.com/in/kaivalya-apte-2217221a/ Geeknarrator website: www.geeknarrator.com Mayank Shrivastava: https://www.linkedin.com/in/mayankshriv/ StarTree: https://www.startree.ai/ Apache Pinot: https://pinot.apache.org/ Hope you enjoy the discussion and learn from it. Please hit the like button if you liked my discussion with Mayank and please subscribe to the channel for more content like this. Cheers, The GeekNarrator
Kishore Gopalakrishna is Co-Founder & CEO of StarTree, the real-time analytics platform that provides a managed service on top of the open-source distributed data store Apache Pinot. Kishore is also the co-creator of Apache Pinot, which was started while he was at LinkedIn. Since leaving to build StarTree, Kishore and his team have raised $28M from investors including GGV, Bain Capital Ventures, and CRV. In this episode, we discuss the right time to launch a managed service on top of an open source project, the importance of relentless focus on customer needs and use cases early on, community building, and much more.
This episode features an interview with Neha Pawar, a Founding Engineer at StarTree. StarTree is a software development company that focuses on democratizing data for all users by providing real-time, user-facing analytics. Prior to her time at StarTree, Neha was a Senior Software Engineer on LinkedIn's Data Analytics team where she spent five years working on Apache Pinot. Neha has provided countless contributions to Pinot over the years, focusing on real-time streaming integrations, ingestion, and storage. In this episode, Sam sits down with Neha to discuss Apache Pinot's impact on the data community and how LinkedIn popularized real-time analytics. ------------------- "Many people do think that a batch is good enough, real-time infra is expensive anyway. And what difference is it going to make if the data shown in this application is a day ago or an hour ago, and it's not real-time to the nearest second? And while that is true, in some cases, but in many other cases, not having real-time data can be super expensive and can affect the business badly and also make them irrelevant. You need the real-time data and then you also need to be able to analyze that data at the speed of your thought. For example, if you are having fraudulent activity somewhere, you can't wait for, 'Hey, my model is going to learn about this.' And then the next time, be able to tell me that that was a fraudulent activity. You need to be able to analyze all that data right now. So, it's not just a nice-to-have, it's a must-have." – Neha Pawar ------------------- Episode Timestamps: (01:58): What open source data means to Neha; (06:04): Neha's learnings from the LinkedIn Data Analytics Team; (07:07): What piqued Neha's interest in real-time data analytics; (08:30): Neha's first experiences working on Apache Pinot; (11:40): How the work of real-time data spread from LinkedIn to other companies; (17:30): How the Apache community has grown; (24:04): Neha's focus at StarTree; (30:41): Neha's motivation for tiered storage at StarTree; (37:07): Neha's advice for open source data folks ------------------- Links: LinkedIn - Connect with Neha; LinkedIn - Connect with StarTree; Twitter - Follow Neha; Twitter - Follow StarTree; Visit StarTree
Real-time analytics are difficult to achieve because large amounts of data must be integrated into a data set as that data streams in. As the world moved from batch analytics powered by Hadoop into a norm of “real-time” analytics, a variety of open source systems emerged. One of these was Apache Pinot. StarTree is a [...] The post Pinot and StarTree with Chinmay Soman appeared first on Software Engineering Daily.
Data and analytics are permeating every system, including customer-facing applications. The introduction of embedded analytics to an end-user product creates a significant shift in requirements for your data layer. The Pinot OLAP datastore was created for this purpose, optimizing for low latency queries on rapidly updating datasets with highly concurrent queries. In this episode Kishore Gopalakrishna and Xiang Fu explain how it is able to achieve those characteristics, their work at StarTree to make it more easily available, and how you can start using it for your own high throughput data workloads today.