Podcasts about ksqlDB

  • 11 PODCASTS
  • 41 EPISODES
  • 37m AVG DURATION
  • ? INFREQUENT EPISODES
  • Jan 11, 2024 LATEST

POPULARITY

[Popularity chart, 2017–2024]


Best podcasts about ksqlDB

Latest podcast episodes about ksqlDB

Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
(Part 1/N): Confluent Cloud (Managed Kafka as a Service) - Create a cluster, generate API keys, create topics, publish messages

Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)

Play Episode Listen Later Jan 11, 2024 45:53


In this podcast, the host explores Confluent Cloud, a fully managed Kafka service. The host shares their experience with RabbitMQ and Kafka and explains the value of using a managed service like Confluent Cloud. They walk through the process of signing up for an account, creating a cluster, generating API keys, and creating topics. The host also discusses the use of connectors and introduces ksqlDB and Apache Flink. They explore cluster settings, message consumption, and additional features of Confluent Cloud. The podcast concludes with a summary of the topics covered.

Takeaways:
  • Confluent Cloud is a fully managed Kafka service that provides added value through pre-built connectors and ease of use.
  • Creating a cluster, generating API keys, and creating topics are essential steps in getting started with Confluent Cloud.
  • ksqlDB and Apache Flink offer stream processing capabilities and can be integrated with Confluent Cloud.
  • Cluster settings, message consumption, and additional features like stream lineage and stream designer enhance the functionality of Confluent Cloud.
  • Using a managed service like Confluent Cloud allows developers to focus on solving customer problems rather than managing infrastructure.

Chapters:
  00:00 Introduction
  02:25 Exploring Confluent Cloud
  09:14 Creating a Cluster and API Keys
  11:00 Creating Topics
  13:20 Sending Messages to Topics
  15:12 Introduction to ksqlDB and Apache Flink
  17:03 Exploring Connectors
  25:44 Cluster Settings and Configuration
  28:05 Consuming Messages
  35:20 Stream Lineage and Stream Designer
  38:44 Exploring Additional Features
  44:21 Summary and Conclusion

Snowpal Products: Backends as Services on AWS Marketplace; Mobile Apps on App Store and Play Store; Web App; Education Platform for Learners and Course Creators
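The create-keys-and-publish flow described above can be sketched with the confluent-kafka Python client, which accepts a Confluent Cloud API key/secret pair as SASL/PLAIN credentials. This is a minimal sketch under assumptions: the bootstrap server, topic name, and payload below are placeholders, not values from the episode.

```python
def cloud_producer_config(bootstrap_servers, api_key, api_secret):
    """Client settings for a Confluent Cloud cluster: the API key/secret
    pair generated in the console is passed as SASL/PLAIN credentials."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,
        "sasl.password": api_secret,
    }

def publish_example():
    """Publish one message; call only with a real cluster endpoint.
    Requires `pip install confluent-kafka`."""
    from confluent_kafka import Producer  # deferred: optional dependency
    conf = cloud_producer_config(
        "pkc-xxxxx.us-east-1.aws.confluent.cloud:9092",  # placeholder host
        "MY_API_KEY", "MY_API_SECRET")
    producer = Producer(conf)
    producer.produce("orders", key="order-1", value='{"amount": 42}')
    producer.flush()  # block until the broker acknowledges delivery
```

The same config dict also works for a consumer once `group.id` and offset settings are added.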

Data Engineering Podcast
Reducing The Barrier To Entry For Building Stream Processing Applications With Decodable

Data Engineering Podcast

Play Episode Listen Later Oct 15, 2023 68:28


Summary: Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements:
  • Hello and welcome to the Data Engineering Podcast, the show about modern data management.
  • Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack).
  • This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold).
  • You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results, all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free!
  • As more people start using AI for projects, two things are clear: it's a rapidly advancing field, but it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES (https://Neo4j.com/NODES).
  • Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview:
  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Decodable is and the story behind it?
  • What are the notable changes to the Decodable platform since we last spoke? (October 2021)
  • What are the industry shifts that have influenced the product direction?
  • What are the problems that customers are trying to solve when they come to Decodable?
  • When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
  • What are the developer experience challenges that are particular to working with streaming data? How have you worked to address that in the Decodable platform and interfaces?
  • As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
  • What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
  • When is Decodable the wrong choice?
  • What do you have planned for the future of Decodable?

Contact Info: esammer (https://github.com/esammer) on GitHub; LinkedIn (https://www.linkedin.com/in/esammer/)

Parting Question: From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements:
  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
  • Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
  • To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers.

Links:
  • Decodable (https://www.decodable.co/); Podcast Episode (https://www.dataengineeringpodcast.com/decodable-streaming-data-pipelines-sql-episode-233/)
  • Flink (https://flink.apache.org/); Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57/)
  • Debezium (https://debezium.io/); Podcast Episode (https://www.dataengineeringpodcast.com/debezium-change-data-capture-episode-114/)
  • Kafka (https://kafka.apache.org/)
  • Redpanda (https://redpanda.com/); Podcast Episode (https://www.dataengineeringpodcast.com/vectorized-red-panda-streaming-data-episode-152/)
  • Kinesis (https://aws.amazon.com/kinesis/)
  • PostgreSQL (https://www.postgresql.org/); Podcast Episode (https://www.dataengineeringpodcast.com/postgresql-with-jonathan-katz-episode-42/)
  • Snowflake (https://www.snowflake.com/en/); Podcast Episode (https://www.dataengineeringpodcast.com/snowflakedb-cloud-data-warehouse-episode-110/)
  • Databricks (https://www.databricks.com/)
  • Startree (https://startree.ai/)
  • Pinot (https://pinot.apache.org/); Podcast Episode (https://www.dataengineeringpodcast.com/pinot-embedded-analytics-episode-273/)
  • Rockset (https://rockset.com/); Podcast Episode (https://www.dataengineeringpodcast.com/rockset-serverless-analytics-episode-101/)
  • Druid (https://druid.apache.org/)
  • InfluxDB (https://www.influxdata.com/)
  • Samza (https://samza.apache.org/)
  • Storm (https://storm.apache.org/)
  • Pulsar (https://pulsar.apache.org/); Podcast Episode (https://www.dataengineeringpodcast.com/pulsar-fast-and-scalable-messaging-with-rajan-dhabalia-and-matteo-merli-episode-17)
  • ksqlDB (https://ksqldb.io/); Podcast Episode (https://www.dataengineeringpodcast.com/ksqldb-kafka-stream-processing-episode-122/)
  • dbt (https://www.getdbt.com/)
  • GitHub Actions (https://github.com/features/actions)
  • Airbyte (https://airbyte.com/)
  • Singer (https://www.singer.io/)
  • Splunk (https://www.splunk.com/)
  • Outbox Pattern (https://debezium.io/blog/2019/02/19/reliable-microservices-data-exchange-with-the-outbox-pattern/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Real-Time Analytics with Tim Berglund
What is a Streaming Database? ft Hubert Dulay and Ralph Debusmann | Ep. 20

Real-Time Analytics with Tim Berglund

Play Episode Listen Later Aug 21, 2023 32:29


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday!

Today Tim is joined by Ralph Debusmann (Enterprise Kafka Engineer, Migros) and Hubert Dulay (Developer Advocate, StarTree) to delve deep into the world of streaming databases. They explore the blend of traditional databases with streaming elements, aiming to make stream processing more user-friendly with SQL. Discussing tools like ksqlDB and Materialize, they touch upon Martin Kleppmann's theories of transforming databases and the pros and cons of current streaming platforms. Dive in to learn more about the future of data streaming!

Turning the database inside-out: https://martin.kleppmann.com/2015/11/05/database-inside-out-at-oredev.html
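As a rough illustration of the episode's theme of making stream processing user-friendly with SQL, here is a hedged sketch of submitting a statement to ksqlDB's REST endpoint (`POST /ksql`); the stream, table, and column names are invented for illustration, and a running ksqlDB server is assumed.

```python
import json

def ksql_request(statement, stream_properties=None):
    """Build the JSON body ksqlDB's REST API expects at `POST /ksql`."""
    return json.dumps({
        "ksql": statement,
        "streamsProperties": stream_properties or {},
    })

# A typical "streaming database" statement: a continuously maintained
# table derived from a stream of page-view events (names invented).
CREATE_STMT = (
    "CREATE TABLE views_per_page AS "
    "SELECT page_id, COUNT(*) AS views "
    "FROM pageviews GROUP BY page_id;"
)

def submit(server_url, statement):
    """POST the statement to a running ksqlDB server.
    Requires `pip install requests`."""
    import requests  # deferred: optional dependency
    return requests.post(
        f"{server_url}/ksql",
        data=ksql_request(statement),
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
    )
```

The appeal discussed in the episode is exactly this: the table stays up to date as new events arrive, with no consumer code to write.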

The GeekNarrator
Diving into Kafka Internals with David Jacot

The GeekNarrator

Play Episode Listen Later Aug 19, 2023 70:41


In this video I talk to David Jacot, who works as a Staff Software Engineer at @Confluent and has been a long-time Kafka user, committer, and PMC member. We covered how Kafka works internally in great depth. We use Kafka for various use cases and it works great, but going one level below the abstraction and truly understanding the protocols, techniques, and algorithms used is a fun ride.

Chapters:
  00:00 Kafka Internals with David Jacot
  03:33 Defining Kafka
  05:16 Kafka Architecture(s)
  11:39 Write Path - Producer sending data
  18:35 How does replication work?
  25:47 How do we track replication progress?
  30:42 Failure Modes: Leader fails
  38:18 Consumers: Push vs Pull
  40:54 Consumers: How does fetch work?
  49:03 Consuming number of bytes vs records
  50:50 Optimising consumption
  01:00:21 Offset management and choosing partitions
  01:09:10 Ending notes

I hope you like this episode and, more importantly, that you learned some of the amazing techniques Kafka uses to ensure durability, low latency, simplicity, and scalability in its architecture. Do give this episode a like and share it with your network. Also, please subscribe to the channel for content like this.

Other playlists:
  • Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
  • Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
  • Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
  • Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Other episodes:
  • ksqlDB: https://youtu.be/2yE86P6uD_0
  • Exactly-once semantics: https://youtu.be/twgbAL_EaQw

David's LinkedIn: https://www.linkedin.com/in/davidjacot/
Our website: www.geeknarrator.com

Cheers,
The GeekNarrator
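The replication chapters above (how replication works, how progress is tracked) boil down to an idea that fits in a toy model: followers pull records from the leader, and the high watermark, the offset up to which consumers may read, only advances once the slowest in-sync follower has caught up. This is an illustrative simplification, not Kafka's actual implementation.

```python
class ToyPartition:
    """Toy model of one Kafka partition: a leader log, follower fetch
    positions, and a high watermark that advances only when every
    in-sync follower has replicated a record."""
    def __init__(self, num_followers):
        self.log = []                        # leader's record log
        self.follower_pos = [0] * num_followers
        self.high_watermark = 0

    def append(self, record):
        """Producer write path: append to the leader, return the offset."""
        self.log.append(record)
        return len(self.log) - 1

    def follower_fetch(self, follower_id):
        """Followers pull-replicate everything past their last position."""
        pos = self.follower_pos[follower_id]
        batch = self.log[pos:]
        self.follower_pos[follower_id] = len(self.log)
        # High watermark = the slowest in-sync replica's position.
        self.high_watermark = min(self.follower_pos)
        return batch

    def consume(self, offset):
        """Consumers never read past the high watermark."""
        return self.log[offset:self.high_watermark]
```

Appending a record does not immediately make it consumable; only after every follower fetches does the high watermark move past it, which is how Kafka avoids exposing records that could be lost in a leader failover.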

The GeekNarrator
Understanding ksqlDB with Matthias J. Sax

The GeekNarrator

Play Episode Listen Later Jan 13, 2023 61:55


Hey everyone, in this episode Matthias and I talk about ksqlDB. We cover the topic in great depth, discussing its history, architecture, core concepts, use cases, limitations, how it compares to Kafka Streams, and so on.

References:
  • ksqlDB: https://ksqldb.io/
  • Exactly-once semantics podcast: https://youtu.be/twgbAL_EaQw
  • Matthias Sax: https://twitter.com/MatthiasJSax and https://www.linkedin.com/in/mjsax/

Cheers,
The GeekNarrator

Streaming Audio: a Confluent podcast about Apache Kafka
Learn How Stream-Processing Works The Simplest Way Possible

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 20, 2022 31:29 Transcription Available


Could you explain Apache Kafka® in ways that a small child could understand? When Mitch Seymour, author of Mastering Kafka Streams and ksqlDB, wanted a way to communicate the basics of Kafka and event-based stream processing, he decided to author a children's book on the subject, but it turned into something with a far broader appeal.

Mitch conceived the idea while writing a traditional manuscript for engineers and technicians interested in building stream processing applications. He wished he could explain what he was writing about to his 2-year-old daughter, and contemplated the best way to introduce the concepts in a way anyone could grasp.

Four months later, he had completed the illustration book: Gently Down the Stream: A Gentle Introduction to Apache Kafka. It tells the story of a family of forest-dwelling Otters, who discover that they can use a giant river to communicate with each other. When more Otter families move into the forest, they must learn to adapt their system to handle the increase in activity. This accessible metaphor for how streaming applications work is accompanied by Mitch's warm, painterly illustrations.

For his second book, Seymour collaborated with the researcher and software developer Martin Kleppmann, author of Designing Data-Intensive Applications. Kleppmann admired the illustration book and proposed that the next book tackle a gentle introduction to cryptography. Specifically, it would introduce the concepts behind symmetric-key encryption, key exchange protocols, and the Diffie-Hellman algorithm, a method for exchanging secret information over a public channel.

Secret Colors tells the story of a pair of Bunnies preparing to attend a school dance, who eagerly exchange notes on potential dates. They realize they need a way of keeping their messages secret, so they develop a technique that allows them to communicate without any chance of other Bunnies intercepting their messages.

Mitch's latest illustration book is A Walk to the Cloud: A Gentle Introduction to Fully Managed Environments. In the episode, Seymour discusses his process of creating the books from concept to completion, the decision to create his own publishing company to distribute these books, and whether a fourth book is on the way. He also discusses the experience of illustrating the books side by side with his wife, shares his insights on how editing is similar to coding, and explains why a concise set of commands is equally desirable in SQL queries and children's literature.

EPISODE LINKS:
  • Minimizing Software Speciation with ksqlDB and Kafka Streams
  • Gently Down the Stream: A Gentle Introduction to Apache Kafka
  • Secret Colors
  • A Walk to the Cloud: A Gentle Introduction to Fully Managed Environments
  • Apache Kafka On the Go: Kafka Concepts for Beginners
  • Apache Kafka 101 course
  • Watch the video
  • Join the Confluent Community
  • Learn more with Kafka tutorials, resources, and guides at Confluent Developer
  • Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
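The Diffie-Hellman exchange that Secret Colors gently introduces fits in a few lines of Python. This is a teaching toy with a deliberately small prime, not usable cryptography; real deployments use standardized large groups such as those in RFC 3526.

```python
import secrets

P = 0xFFFFFFFB  # a small 32-bit prime, for readability only -- not secure
G = 5           # generator (illustrative choice)

def keypair():
    """Pick a secret exponent and derive the value safe to share openly."""
    private = secrets.randbelow(P - 2) + 2
    public = pow(G, private, P)
    return private, public

# Each party publishes its public value and keeps the private one.
a_priv, a_pub = keypair()
b_priv, b_pub = keypair()

# Both sides mix the other's public value with their own private one and
# arrive at the same shared secret; an eavesdropper sees only a_pub and
# b_pub, from which recovering the secret is the hard part.
secret_a = pow(b_pub, a_priv, P)
secret_b = pow(a_pub, b_priv, P)
assert secret_a == secret_b
```

The shared secret can then seed a symmetric key, which is the link to the book's symmetric-key encryption theme.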

Streaming Audio: a Confluent podcast about Apache Kafka
Real-time Threat Detection Using Machine Learning and Apache Kafka

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 29, 2022 29:18 Transcription Available


Can we use machine learning to detect security threats in real-time? As organizations increasingly rely on distributed systems, it is becoming more important to analyze the traffic that passes through those systems quickly. Confluent Hackathon '22 finalist Géraud Dugé de Bernonville (Data Consultant, Zenika Bordeaux) shares how his team used TensorFlow (machine learning) and Neo4j (graph database) to analyze and detect network traffic data in real-time. What started as a research and development exercise turned into ZIEM, a full-blown internal project using ksqlDB to manipulate, export, and visualize data from Apache Kafka®.

Géraud and his team noticed that large amounts of data passed through their network, and they were curious to see if they could detect threats as they happened. As a hackathon project, they built ZIEM, a network mapping and intrusion detection platform that quickly generates network diagrams. Using Kafka, the system captures network packets, processes the data in ksqlDB, and uses a Neo4j Sink Connector to send it to a Neo4j instance. Using the Neo4j browser, users can see instant network diagrams showing who's on the network, allowing them to detect anomalies quickly in real time.

The ZIEM project was initially conceived as an experiment to explore the potential of using Kafka for data processing and manipulation. However, it soon became apparent that there was great potential for broader applications (banking, security, etc.). As a result, the focus shifted to developing a tool for exporting data from Kafka, which is helpful in transforming data for deeper analysis, moving it from one database to another, or creating powerful visualizations.

Géraud goes on to talk about how the success of this project has helped them better understand the potential of using Kafka for data processing. Zenika plans to continue working to build a pipeline that can handle more robust visualizations, expose more learning opportunities, and detect patterns.

EPISODE LINKS:
  • Ziem Project on GitHub
  • ksqlDB 101 course
  • ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury
  • Real-Time Stream Processing, Monitoring, and Analytics with Apache Kafka
  • Application Data Streaming with Apache Kafka and Swim
  • Watch the video version of this podcast
  • Kris Jenkins' Twitter
  • Streaming Audio Playlist
  • Join the Confluent Community
  • Learn more with Kafka tutorials, resources, and guides at Confluent Developer
  • Live demo: Intro to Event-Driven Microservices with Confluent
  • Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
How to Build a Reactive Event Streaming App - Coding in Motion

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Sep 20, 2022 1:26 Transcription Available


How do you build an event-driven application that can react to real-time data streams as they happen? Kris Jenkins (Senior Developer Advocate, Confluent) will be hosting another fun, hands-on programming workshop, Coding in Motion: Watching the River Flow, to demonstrate how you can build a reactive event streaming application with Apache Kafka® and ksqlDB, using Python.

As a developer advocate, Kris often speaks at conferences, and the presentation will be available on-demand through the organizer's YouTube channel. The desire to read comments and be able to interact with the community motivated Kris to set up a real-time event streaming application that would notify him on his mobile phone.

During the workshop, Kris will demonstrate the end-to-end process of using Python to process and stream data from YouTube's REST API into a Kafka topic, analyze the data with ksqlDB, and then stream data out via Telegram. After the workshop, you'll be able to use the recipe to build your own event-driven data application.

EPISODE LINKS:
  • Coding in Motion: Building a Reactive Data Streaming App
  • Watch the video version of this podcast
  • Kris Jenkins' Twitter
  • Streaming Audio Playlist
  • Join the Confluent Community
  • Learn more with Kafka tutorials, resources, and guides at Confluent Developer
  • Live demo: Intro to Event-Driven Microservices with Confluent
  • Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
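The first pipeline step described above, turning YouTube REST API responses into Kafka records, can be sketched as a small mapping function. The field paths below mirror YouTube's commentThreads resource but should be treated as assumptions, as should the record shape; this is not Kris's actual workshop code.

```python
import json

def comment_to_record(item):
    """Flatten one item from a YouTube-style comment listing into a
    (key, value) pair ready to hand to a Kafka producer. The nested
    field paths are assumed, modeled on the commentThreads resource."""
    snippet = item["snippet"]["topLevelComment"]["snippet"]
    key = item["id"]  # comment thread ID becomes the partition key
    value = json.dumps({
        "author": snippet["authorDisplayName"],
        "text": snippet["textOriginal"],
        "published_at": snippet["publishedAt"],
    })
    return key, value
```

In the full pipeline, a polling loop would call this for each API page and `produce()` the pairs to a topic, which ksqlDB then reads as a stream.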

Streaming Audio: a Confluent podcast about Apache Kafka
Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Sep 15, 2022 34:07 Transcription Available


Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events.

In this episode, Simon shares his Confluent Hackathon '22 winning project, a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges.

Open-source object detection models for TensorFlow, which appropriately are collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which would have made it untenable. Instead, he was able to utilize the open-source models, which are essentially neural nets pretrained on relevant data sets (in his case, backyard animals).

Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.)

On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector.

Simon's setup is extremely low-cost and can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long.

EPISODE LINKS:
  • Real-Time Wildlife Monitoring with Apache Kafka
  • Wildlife Monitoring GitHub
  • ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together
  • Event-Driven Architecture - Common Mistakes and Valuable Lessons
  • Watch the video version of this podcast
  • Kris Jenkins' Twitter
  • Join the Confluent Community
  • Learn more on Confluent Developer
  • Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
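The windowed smoothing step described above, looking back over a window and keeping the highest count per animal type, can be mimicked in plain Python to show what the ksqlDB query is doing conceptually. The event fields and the window size are illustrative, not Simon's actual schema.

```python
from collections import defaultdict

def max_per_window(events, window_secs=30):
    """Smooth noisy per-frame detections: bucket (timestamp, animal,
    count) events into tumbling windows and keep the highest
    simultaneous count seen per animal type in each window."""
    windows = defaultdict(dict)  # window_start -> {animal: max_count}
    for ts, animal, count in events:
        start = ts - (ts % window_secs)  # tumbling-window bucket start
        bucket = windows[start]
        bucket[animal] = max(bucket.get(animal, 0), count)
    return dict(windows)
```

A frame that briefly misidentifies or drops an animal no longer distorts the trend, because only the per-window maximum survives.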

Streaming Audio: a Confluent podcast about Apache Kafka
Capacity Planning Your Apache Kafka Cluster

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 30, 2022 61:54 Transcription Available


How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself, determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, as well as potential cost.

Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four: this means he has a leader, a follower, and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby; this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.

Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).

Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered: if the compression on the producer's side doesn't match compression on the consumer's side, for example, latency will increase.

Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to be well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.

Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective.

EPISODE LINKS:
  • Capacity Planning and Sizing for Kafka Streams
  • Tales from the Frontline of Apache Kafka DevOps
  • Watch the video version of this podcast
  • Kris Jenkins' Twitter
  • Streaming Audio Playlist
  • Join the Confluent Community
  • Learn more on Confluent Developer
  • Use PODCAST100 to get $100 of free Cloud usage (details)
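Jason's rule-of-thumb sizing lends itself to back-of-the-envelope arithmetic: total write bandwidth is producer rate times message size times replication factor, divided by what one broker can sustain, floored at his three-broker minimum. The per-broker throughput figure below is an assumed placeholder, not a number from the episode; measure your own hardware.

```python
import math

def min_brokers(msgs_per_sec, avg_msg_bytes, replication_factor=3,
                per_broker_mb_per_sec=50, floor=3):
    """Back-of-the-envelope broker count: replicated write bandwidth
    divided by assumed per-broker capacity, never below the
    rule-of-thumb minimum of three brokers."""
    total_mb = msgs_per_sec * avg_msg_bytes * replication_factor / 1e6
    return max(floor, math.ceil(total_mb / per_broker_mb_per_sec))
```

For instance, 200,000 msgs/s of 2 KB messages at replication factor 3 is 1,200 MB/s of replicated writes; at an assumed 50 MB/s per broker, that is 24 brokers. A tiny workload still gets the three-broker floor.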

Streaming Audio: a Confluent podcast about Apache Kafka
Automating Multi-Cloud Apache Kafka Cluster Rollouts

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 30, 2022 48:29 Transcription Available


To ensure safe and efficient deployment of Apache Kafka® clusters across multiple cloud providers, Confluent rolled out a large-scale cluster management solution.

Rashmi Prabhu (Staff Software Engineer & Eng Manager, Fleet Management Platform, Confluent) and her team have been building the Fleet Management Platform for Confluent Cloud. In this episode, she delves into what Fleet Management is, and how the cluster management service streamlines Kafka operations in the cloud while providing a seamless developer experience.

When it comes to performing operations at large scale on the cloud, manual processes work well if the scenario involves only a handful of clusters. However, as a business grows, a cloud footprint may potentially scale 10x, and will require upgrades to a significantly larger cluster fleet. Additionally, the process should be automated, in order to accelerate feature releases while ensuring safe and mature operations.

Fleet Management lets you manage and automate software rollouts and relevant cloud operations within the Kafka ecosystem at scale, including cloud-native Kafka, ksqlDB, Kafka Connect, Schema Registry, and other cloud-native microservices. The automation service can consistently operate applications across multiple teams, and can also manage Kubernetes infrastructure at scale. The existing Fleet Management stack can successfully handle thousands of concurrent upgrades in the Confluent ecosystem.

When building out the Fleet Management Platform, Rashmi and the team kept these key considerations in mind:
  • Rollout controls and DevX: Wide deployment and distribution of changes across the fleet of target assets; improved developer experience for ease of use, with rollout strategy support, deployment policies, a dynamic control workflow, and manual approval support on an as-needed basis.
  • Safety: Built-in features where security and safety of the fleet are the priority, with access control and audits on operations. There is active monitoring and paced rollouts, as well as automated pauses and resumes to reduce the time to react upon failure. There's also an error threshold, and controls to allow a healthy balance of risk vs. pace.
  • Visibility: A close to real-time, wide-angle view of the fleet state, along with insights into workflow progress, historical operations on the clusters, live notification on workflows, drift detection across assets, and so much more.

EPISODE LINKS:
  • Optimize Fleet Management
  • Software Engineer - Fleet Management
  • Watch the video version of this podcast
  • Kris Jenkins' Twitter
  • Streaming Audio Playlist
  • Join the Confluent Community
  • Learn more with Kafka tutorials, resources, and guides at Confluent Developer
  • Live demo: Intro to Event-Driven Microservices with Confluent
  • Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
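The paced-rollout safety behavior described above, automated pauses once an error threshold is crossed, can be sketched as a simple loop. All names, the batching policy, and the threshold value are illustrative; this is not Confluent's Fleet Management implementation.

```python
def paced_rollout(clusters, upgrade, batch_size=10, error_threshold=0.05):
    """Upgrade a fleet in fixed-size batches, pausing automatically as
    soon as the cumulative failure rate crosses the threshold.
    `upgrade` is a callable returning True on success."""
    done, failed = [], []
    for i in range(0, len(clusters), batch_size):
        for cluster in clusters[i:i + batch_size]:
            (done if upgrade(cluster) else failed).append(cluster)
        attempted = len(done) + len(failed)
        if failed and len(failed) / attempted > error_threshold:
            # Pause instead of plowing ahead; an operator can inspect,
            # fix, and resume from where the rollout stopped.
            return {"status": "paused", "done": done, "failed": failed}
    return {"status": "complete", "done": done, "failed": failed}
```

Tuning `batch_size` against `error_threshold` is the "risk vs. pace" balance the episode mentions: bigger batches move faster but expose more of the fleet before a pause can trigger.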

Streaming Audio: a Confluent podcast about Apache Kafka
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 26, 2022 55:55 Transcription Available


Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.Why Use Apache Flink?The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.Why use ksqlDB/Kafka Streams?Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.There are also other considerations beyond the strictly architectural. 
Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter. Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way, such as the tyranny of state and the Turing completeness of SQL.

EPISODE LINKS
The Future of SQL: Databases Meet Stream Processing
Building Real-Time Event Streams in the Cloud, On Premises
Kafka Streams 101 course
ksqlDB 101 course
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 for an additional $100 of Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 19, 2022 33:56 Transcription Available


Apache Kafka® isn't just for day jobs, according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too!

Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted from the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.

Danica's model Kafka pipeline begins with moisture sensors in her plants streaming data that is requested by an endless for-loop in a Python script on her Raspberry Pi. The Pi in turn connects to Kafka on Confluent Cloud, where the plant data is sent serialized as Avro. She carefully modeled her data, sending an ID along with a timestamp, a temperature reading, and a moisture reading. On Confluent Cloud, Danica enriches the streaming plant data, which enters as a ksqlDB stream, with metadata such as moisture threshold levels, which is stored in a ksqlDB table.

She windows the streaming data into 12-hour segments in order to avoid constant alerts when a threshold has been crossed. Alerts are sent at the end of the 12-hour period if a threshold has been traversed for a consistent time period within it (one hour, for example). These are sent to the Telegram API using Confluent Cloud's HTTP Sink Connector, which pings her phone when a plant's moisture level is too low.

Potential future project improvements include visualizations, adding another Telegram bot to register metadata for new plants, adding machine learning to anticipate watering needs, and potentially closing the loop by pushing data back to the Raspberry Pi, which could power a visual indicator on the plants themselves.

EPISODE LINKS
GitHub: raspberrypi-houseplants
Data Pipelines 101 course
Tips for Streaming Data Pipelines ft. Danica Fine
Watch the video version of this podcast
Danica Fine's Twitter
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
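The windowed alert logic can be illustrated with a small, self-contained Python sketch (plain Python rather than ksqlDB; the window sizes, threshold, and field layout here are illustrative assumptions, not Danica's actual schema):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=12)    # alert window size (assumption)
SUSTAINED = timedelta(hours=1)  # how long a low reading must persist (assumption)

def should_alert(readings, threshold):
    """readings: list of (datetime, moisture) sorted by time.
    Return True if moisture stayed below `threshold` for at least
    SUSTAINED somewhere within the most recent WINDOW."""
    if not readings:
        return False
    window_start = readings[-1][0] - WINDOW
    below_since = None
    for ts, moisture in readings:
        if ts < window_start:
            continue  # outside the 12-hour window
        if moisture < threshold:
            if below_since is None:
                below_since = ts  # start of a low-moisture run
            if ts - below_since >= SUSTAINED:
                return True
        else:
            below_since = None  # reading recovered; reset the run
    return False
```

In the real pipeline this check is expressed as a ksqlDB windowed aggregation and the alert is delivered by the HTTP Sink Connector; the sketch only shows the decision rule.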

Streaming Audio: a Confluent podcast about Apache Kafka
Streaming Analytics on 50M Events Per Day with Confluent Cloud at Picnic

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 5, 2022 34:41 Transcription Available


What are useful practices for migrating a system to Apache Kafka® and Confluent Cloud, and why use Confluent to modernize your architecture?

Dima Kalashnikov (Technical Lead, Picnic Technologies) is part of a small analytics platform team at Picnic, an online-only, European grocery store that processes around 45 million customer events and five million internal events daily. An underlying goal at Picnic is to make decisions as data-driven as possible, so Dima's team collects events on all aspects of the company—from new stock arriving at the warehouse, to customer behavior on their websites, to statistics related to delivery trucks. Data is sent to internal systems and to a data warehouse.

Picnic recently migrated from their existing solution to Confluent Cloud for several reasons:

Ecosystem and community: Picnic liked the tooling present in the Kafka ecosystem. Being a small team means they aren't able to devote extra time to building boilerplate-type code such as connectors for their data sources or functionality for extensive monitoring capabilities. Picnic also has analysts who use SQL, so they appreciated the processing capabilities of ksqlDB. Finally, they found that help isn't hard to locate if one gets stuck.

Monitoring: They wanted better monitoring; specifically, they found it challenging to measure SLAs with their former system, as they couldn't easily detect the positions of consumers in their streams.

Scaling and data retention times: Picnic is growing, so they needed to scale horizontally without having to worry about manual reassignment. They also hit a wall with their previous streaming solution with respect to the length of time they could save data, which is a serious issue for a company that makes data-first decisions.

Cloud: Another consequence of being a small team is that they don't have resources for extensive maintenance of their tooling.

Dima's team was extremely careful and took their time with the migration. They ran a pilot system simultaneously with the old system in order to make sure it could achieve their fundamental performance goals: complete stability, zero data loss, and no performance degradation. They also wanted to check it for costs.

The pilot was successful, and they actually have a second, IoT pilot in the works that uses Confluent Cloud and Debezium to track the robotics data emanating from their automatic fulfillment center. And it's a lot of data: Dima mentions that the robots in the center generate data sets as large as their customer event streams.

EPISODE LINKS
Picnic Analytics Platform: Migration from AWS Kinesis to Confluent Cloud
Picnic Modernizes Data Architecture with Confluent
Data Engineer: Event Streaming Platform
Watch this podcast in video
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka resources on Confluent Developer
Live demo: Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage
Building Data Streaming App | Coding In Motion

Streaming Audio: a Confluent podcast about Apache Kafka
Build a Data Streaming App with Apache Kafka and JS - Coding in Motion

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 3, 2022 2:03 Transcription Available


Coding is inherently enjoyable and experimental. With the goal of bringing fun into programming, Kris Jenkins (Senior Developer Advocate, Confluent) hosts a new series of hands-on workshops, Coding in Motion, to teach you how to use Apache Kafka® and data streaming technologies for real-life use cases.

In the first episode, Sound & Vision, Kris walks you through the end-to-end process of building a real-time, full-stack data streaming application from scratch using Kafka and JavaScript/TypeScript. During the workshop, you'll learn to stream musical MIDI data into fully managed Kafka using Confluent Cloud, then process and transform the raw data stream using ksqlDB. Finally, the enriched data streams will be pushed to a web server to display the data in a 3D graphical visualization.

Listen to Kris preview the first episode of Coding in Motion: Sound & Vision, and join him in the workshop premiere to learn more.

EPISODE LINKS
Coding in Motion Workshop: Build a Streaming App for Sound & Vision
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Confluent Platform 7.1: New Features + Updates

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Apr 12, 2022 10:01 Transcription Available


Confluent Platform 7.1 expands upon its already innovative features, adding improvements in key areas that benefit data consistency, allow for increased speed and scale, and enhance resilience and reliability.

Previously, the Confluent Platform 7.0 release introduced Cluster Linking, which enables you to bridge on-premises and cloud clusters, among other configurations. Maintaining data quality standards across multiple environments can be challenging, though. To assist with this problem, CP 7.1 adds Schema Linking, which lets you share consistent schemas across your clusters, synced in real time.

Confluent for Kubernetes lets you build your own private-cloud Apache Kafka® service. Now you can enhance the global resilience of your architecture by deploying to multiple regions. With the new release you can also configure custom volumes attached to Confluent deployments, and you can declaratively define and manage the new Schema Links. As of this release, Confluent for Kubernetes supports the full feature set of the Confluent Platform.

Tiered Storage was released in Confluent Platform 6.0, and it offers immense benefits for a cluster by allowing the offloading of older topic data out of the broker and into slower, long-term object storage. The reduced amount of local data makes maintenance, scaling out, recovery from failure, and adding brokers all much quicker. CP 7.1 adds compatibility for object storage using Nutanix, NetApp, MinIO, and Dell, integrations that have been put through rigorous performance and quality testing.

Health+, introduced in CP 6.2, offers intelligent cloud-based alerting and monitoring tools in a dashboard. New as of CP 7.1, you can choose to be alerted when anomalies in broker latency are detected, when there is an issue with your connectors linking Kafka and external systems, and when a ksqlDB query will interfere with a continuous, real-time processing stream.

Shipping with CP 7.1 is ksqlDB 0.23, which adds support for pull queries against streams as opposed to only against tables—a milestone development that greatly helps when debugging, since a subset of messages within a topic can now be inspected. ksqlDB 0.23 also supports custom schema selection, which lets you choose a specific schema ID when you create a new stream or table, rather than use the latest registered schema. A number of additional smaller enhancements are also included in the release.

EPISODE LINKS
Download Confluent Platform 7.1
Check out the release notes
Read the Confluent Platform 7.1 blog post
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Serverless Stream Processing with Apache Kafka ft. Bill Bejeck

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Mar 3, 2022 42:23 Transcription Available


What is serverless?

Having worked as a software engineer for over 15 years and as a regular contributor to Kafka Streams, Bill Bejeck (Integration Architect, Confluent) is an Apache Kafka® committer and author of “Kafka Streams in Action.” In today's episode, he explains what serverless is and the architectural concepts behind it. To clarify, serverless doesn't mean you can run an application without a server—there are still servers in the architecture, but they are abstracted away from your application development. In other words, you can focus on building and running applications and services without any concerns over infrastructure management. Using a cloud provider such as Amazon Web Services (AWS) enables you to allocate machine resources on demand while the provider handles provisioning, maintenance, and scaling of the server infrastructure.

There are a few important terms to know when implementing serverless functions with event stream processors:
Functions as a service (FaaS)
Stateless stream processing
Stateful stream processing

Serverless commonly falls into the FaaS cloud computing service category—for example, AWS Lambda is the classic definition of a FaaS offering. You have a greater degree of control to run a discrete chunk of code in response to certain events, and it lets you write code to solve a specific issue or use case. Stateless processing is simpler than stateful processing, which is more complex, as it involves keeping the state of an event stream and needs a key-value store. ksqlDB allows you to perform both stateless and stateful processing, but its strength lies in stateful processing to answer complex questions, while AWS Lambda is better suited for stateless processing tasks. By integrating ksqlDB and AWS Lambda, you can deliver serverless event streaming and analytics at scale.

EPISODE LINKS
Serverless Stream Processing with Apache Kafka, AWS Lambda, and ksqlDB
Stateful Serverless Architectures with ksqlDB and AWS Lambda
Serverless GitHub repository
Kafka Streams in Action
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
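The stateless/stateful distinction can be shown in a few lines of plain Python (illustrative only; real stream processors such as Kafka Streams or ksqlDB back this state with fault-tolerant, persistent stores):

```python
# Stateless: each event is handled on its own; no memory between events.
def mask_card(event):
    return {**event, "card": "****" + event["card"][-4:]}

# Stateful: the processor must remember something across events,
# here a running count per key (a toy stand-in for a key-value store).
def count_by_key(events):
    counts = {}
    for event in events:
        counts[event["key"]] = counts.get(event["key"], 0) + 1
    return counts
```

A stateless function like `mask_card` fits the FaaS model well; the per-key counter is the kind of aggregation that needs a stateful engine.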

Engenharia de Dados [Cast]
Use Cases and Field Experience with Apache Kafka

Engenharia de Dados [Cast]

Play Episode Listen Later Feb 24, 2022 63:34


In this episode we bring in specialist João Bosco Seixas, Community Catalyst, to talk about Apache Kafka. In this conversation we cover the Software Development and Data Engineering perspectives and how each area can use Apache Kafka most effectively.

* Apache Kafka for Software Development
* Data Engineering with Apache Kafka and Real-Time Analytics
* The Technology's Learning Curve
* Use Cases
* Field Experiences
* Tips for Beginners

The main goal is to show a Data Engineer how Apache Kafka is used not only for analytics but across the entire company, especially in building microservices.

Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Engenharia de Dados [Cast]
Is Apache Kafka a Relational Database?

Engenharia de Dados [Cast]

Play Episode Listen Later Feb 14, 2022 53:57


Apache Kafka is a data streaming platform capable of ingesting and processing millions of events per second. However, some important points usually go unexplained, such as:

* Is Apache Kafka a database?
* Transactions in Apache Kafka
* Decoupled storage and processing
* Comparing databases vs. Apache Kafka

This episode will once and for all demystify Apache Kafka, answering your questions about its strengths and weaknesses and how you can get the best out of this open-source technology from the Apache Software Foundation.

Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Streaming Audio: a Confluent podcast about Apache Kafka
Intro to Event Sourcing with Apache Kafka ft. Anna McDonald

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Feb 1, 2022 30:14 Transcription Available


What is event sourcing and how does it work?

Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern.

Anna is passionate about event-driven architectures and event patterns. She's a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage with Apache Kafka course on Confluent Developer. In this episode, she previews the course by providing an overview of what event sourcing is and what you need to know in order to build event-driven systems.

Event sourcing is an architectural design pattern that defines the approach to handling data operations driven by a sequence of events. The pattern ensures that all changes to an application state are captured and stored as an immutable sequence of events, known as a log of events. The events are persisted in an event store, which acts as the system of record. Unlike traditional databases, where only the latest status is saved, an event-based system saves all events into a database in sequential order. If you find a past event is incorrect, you can replay each event from a certain timestamp up to the present to recreate the latest status of the data.

Event sourcing is commonly implemented with a command query responsibility segregation (CQRS) system to perform data computation tasks in response to events. To implement CQRS with Kafka, you can use Kafka Connect along with a database, or alternatively use Kafka with the streaming database ksqlDB.

In addition, Anna also covers:
Data at rest and data in motion techniques for event modeling
The differences between event streaming and event sourcing
How CQRS, change data capture (CDC), and event streaming help you leverage event-driven systems
The primary qualities and advantages of an event-based storage system
Use cases for event sourcing and how it integrates with your systems

EPISODE LINKS
Event Sourcing course
Event Streaming in 3 Minutes
Introducing Derivative Event Sourcing
Meetup: Event Sourcing and Apache Kafka
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
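The core replay idea is easy to sketch in plain Python (a toy in-memory event log, not Anna's implementation; a real system would persist the log in Kafka or another event store):

```python
# A minimal event-sourced account: state is never stored directly;
# it is derived by replaying the immutable event log in order.
def apply(state, event):
    if event["type"] == "deposited":
        return state + event["amount"]
    if event["type"] == "withdrawn":
        return state - event["amount"]
    raise ValueError(f"unknown event type: {event['type']}")

def replay(events, initial=0):
    """Rebuild the current state from the full log of events."""
    state = initial
    for event in events:
        state = apply(state, event)
    return state

log = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]
# replay(log) == 75; replaying a prefix of the log reconstructs any past state
```

Because the log is the system of record, correcting history means appending a compensating event and replaying, never editing past entries.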

Streaming Audio: a Confluent podcast about Apache Kafka
From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jan 13, 2022 29:50 Transcription Available


Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares about the development process and how ksqlDB and Kafka Connect became instrumental to the implementation.

By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system. In order to transition from monolithic micro-batching applications to real-time microservices that can integrate with a legacy system that has been around for decades, Danica and her team started developing Kafka connectors to connect to various source and target systems.

Kafka connectors: Building two major connectors for the data pipeline, including a source connector to connect the legacy data source to stream data into Kafka, and another target connector to pipe data from Kafka back into the legacy architecture.
Algorithm: Implementing Kafka Streams applications to migrate data from a monolithic architecture to a stream processing architecture.
Data join: Leveraging Kafka Connect and the JDBC source connector to bring in all data streams to complete the pipeline.
Streams join: Using ksqlDB to join streams—the legacy data system continues to produce streams while the Kafka data pipeline is another stream of data.

As a final tip, Danica suggests breaking algorithms into process steps. She also describes how her experience relates to the data pipelines course on Confluent Developer and encourages anyone who is interested in learning more to check it out.

EPISODE LINKS
Data Pipelines course
Guided Exercise on Building Streaming Data Pipelines
Migrating from a Legacy System to Kafka Streams
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
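A stream join of the kind described above can be sketched in plain Python: a toy keyed inner join over two in-memory event lists (ksqlDB performs this continuously and windowed over live streams; all field names here are illustrative):

```python
# Toy inner join of two event streams on a shared key. A real ksqlDB
# stream-stream join is continuous and windowed; this is the batch analogue.
def join_streams(left, right, key="id"):
    right_by_key = {}
    for event in right:
        right_by_key.setdefault(event[key], []).append(event)
    joined = []
    for event in left:
        for match in right_by_key.get(event[key], []):
            joined.append({**match, **event})  # left-side fields win on conflict
    return joined

legacy = [{"id": 1, "balance": 250}, {"id": 2, "balance": 90}]
updates = [{"id": 1, "status": "active"}]
```

Calling `join_streams(legacy, updates)` pairs records sharing an `id`, dropping legacy rows with no matching update, the same shape of result a ksqlDB inner join would emit per matching pair.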

Streaming Audio: a Confluent podcast about Apache Kafka
Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 28, 2021 34:59 Transcription Available


It's been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience working on and designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector.

Previously, Kafka was often viewed as a simple pipe that connected databases together, allowing for easy and scalable data migration. As the Kafka ecosystem evolves with added components like ksqlDB, Kafka Streams, and Kafka Connect, the implementation of Kafka goes beyond being just a pipe—it's an intelligent pipe that enables real-time, actionable data insights.

Fotios shares a couple of use cases showcasing how Kafka solves the problems that many banks are facing today. One of his customers transformed retail banking by using Kafka as the architectural base for storing all data permanently and indefinitely. This approach enables data in motion and a better user experience for frontend users scrolling through their transaction history, by eliminating the need to download old statements that have been offloaded to the cloud or a data lake. Kafka also provides the best of both worlds: increased scalability plus strong message delivery guarantees comparable to queuing middleware like IBM MQ and TIBCO.

In addition to use cases, Tim and Fotios talk about deploying Kafka for banks within the cloud and drill into the profession of being a solutions engineer.

EPISODE LINKS
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Lessons Learned From Designing Serverless Apache Kafka ft. Prachetaa Raghavan

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 14, 2021 28:20 Transcription Available


You might call building and operating Apache Kafka® as a cloud-native data service synonymous with a serverless experience. Prachetaa Raghavan (Staff Software Developer I, Confluent) spends his days focused on this very thing. In this podcast, he shares his learnings from implementing a serverless architecture on Confluent Cloud using the Kubernetes Operator.

Serverless is a cloud execution model that abstracts away server management, letting you run code on a pay-per-use basis without infrastructure concerns. Confluent Cloud's major design goal was to create a serverless Kafka solution, including handling its distributed state and its performance requirements, and seamlessly operating and scaling the Kafka brokers and ZooKeeper. The serverless offering is built on top of an event-driven microservices architecture that allows services to be deployed independently, with their own release cadence, maintained at the team level.

Four subjects help create the serverless event streaming experience with Kafka:

Confluent Cloud control plane: This Kafka-based control plane provisions the resources required to run the application. It automatically scales resources for services such as managed Kafka, managed ksqlDB, and managed connectors. The control plane and data plane are decoupled—if a single data plane has issues, it doesn't affect the control plane or other data planes.

Kubernetes Operator: The operator is an application-specific controller that extends the functionality of the Kubernetes API to create, configure, and manage instances of complex applications on behalf of Kubernetes users. The operator looks at Kafka metrics before upgrading one broker at a time. It also reports the status of cluster rebalancing and, on shrink, of rebalancing data onto the remaining brokers.

Self-Balancing Clusters: Cluster balance is measured on several dimensions, including replica counts, leader counts, disk usage, and network usage. In addition to storage rebalancing, Self-Balancing Clusters are essential to making sure that the amount of available disk and network capacity is satisfied during any balancing decisions.

Infinite Storage: Enabled by Tiered Storage, Infinite Storage rebalances data quickly and efficiently—the most recently written data is stored directly on Kafka brokers, while older segments are moved off into a separate storage tier. This has the added bonus of reducing the shuffling of data due to regular broker operations, like partition rebalancing.

EPISODE LINKS
Making Apache Kafka Serverless: Lessons From Confluent Cloud
Cloud-Native Apache Kafka
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Watch the video version of this podcast

Streaming Audio: a Confluent podcast about Apache Kafka
ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 1, 2021 30:42 Transcription Available


What is ksqlDB, and how does Simon Aubury (Principal Data Engineer, Thoughtworks) use it to track down the plane that wakes his cat Snowy in the morning? Experienced in building real-time applications with ksqlDB since its genesis, Simon provides an introduction to ksqlDB by sharing some of his projects and use cases.

ksqlDB is a database purpose-built for stream processing applications, and it lets you build real-time data streaming applications with SQL syntax. ksqlDB reduces the complexity of having to code with Java, making it easier to achieve outcomes through declarative programming, as opposed to procedural programming.

Before ksqlDB, you could use the producer and consumer APIs to get data in and out of Apache Kafka®; however, when it came to data enrichment, such as joining, filtering, mapping, and aggregating data, you had to use the Kafka Streams API—a robust and scalable programming interface influenced by the JVM ecosystem that requires Java programming knowledge. This presented scaling challenges for Simon, who was at a multinational insurance company with a small team that needed to stream loads of data from disparate systems and enrich it for meaningful insights. Simon recalls discovering ksqlDB during a practice fire drill, and he considers it a memorable moment for turning a challenge into an opportunity.

Leveraging your familiarity with relational databases, ksqlDB abstracts away the complex programming that is required for real-time operations, both for stream processing and data integration, making it easy to read, write, and process streaming data in real time.

Simon is passionate about ksqlDB and Kafka Streams, as well as getting other people inspired by the technology. He's been using ksqlDB for projects such as taking a stream of information and enriching it with static data. One of Simon's first ksqlDB projects used a Raspberry Pi and a software-defined radio to process aircraft movements in real time to determine which plane wakes his cat Snowy up every morning. Simon highlights additional ksqlDB use cases, including e-commerce checkout interaction to identify where people are dropping out of a sales funnel.

EPISODE LINKS
ksqlDB 101 course
A Guide to ksqlDB Fundamentals and Stream Processing Concepts
ksqlDB 101 Training with Live Walkthrough Exercise
KSQL-ops! Running ksqlDB in the Wild
Articles from Simon Aubury
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
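The stream-plus-static-data enrichment pattern Simon mentions can be sketched in plain Python (a batch toy, not ksqlDB; the aircraft fields and lookup values are invented for illustration):

```python
# Toy stream-table enrichment: each streaming event is joined against a
# static lookup table, the in-memory analogue of a ksqlDB stream-table join.
aircraft_types = {  # static reference table (illustrative data)
    "A388": "Airbus A380",
    "B738": "Boeing 737-800",
}

def enrich(events, table, key="icao_type"):
    for event in events:
        yield {**event, "aircraft": table.get(event[key], "unknown")}

sightings = [
    {"flight": "QF1", "icao_type": "A388", "altitude": 2500},
    {"flight": "ZZ9", "icao_type": "C172", "altitude": 1200},
]
```

In ksqlDB the same idea is one declarative statement joining a stream to a table, with the engine handling the continuous, real-time part.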

airhacks.fm podcast with adam bien
Debezium, Server, Engine, UI and the Outbox

airhacks.fm podcast with adam bien

Play Episode Listen Later Nov 28, 2021 67:11


An airhacks.fm conversation with Gunnar Morling (@gunnarmorling) about: debezium as analytics enablement, enriching events with quarkus, ksqlDB and PrestoDB and trino, cloud migrations with Debezium, embedded Debezium Engine, debezium server vs. Kafka Connect, Debezium Server with sink connectors, Apache Pulsar, Redis Streams are supporting Debezium Server, Debezium Server follows the microservice architecture, pluggable offset stores, JDBC offset store is Apache Iceberg connector, DB2, MySQL, PostgreSQL, MongoDB change streams, Cassandra, Vitess, Oracle, Microsoft SQL Server scylladb is cassandra compatible and provides external debezium connector, debezium ui is written in React, incremental snapshots, netflix cdc system, DBLog: A Watermark Based Change-Data-Capture Framework, multi-threaded snapshots, internal data leakage and the Outbox pattern, debezium listens to the outbox pattern, OpenTracing integration and the outbox pattern, sending messages directly to transaction log with PostgreSQL, Quarkus outbox pattern extension, the transaction boundary topic Gunnar Morling on twitter: @gunnarmorling and debezium.io

Streaming Audio: a Confluent podcast about Apache Kafka
Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 23, 2021 29:28 Transcription Available


Many of us find ourselves in the position of equipping others to use Apache Kafka® after we've gained an understanding of what Kafka is used for. But how do you communicate and teach event streaming concepts to others effectively? As a Pluralsight instructor and business intelligence consultant, Eugene Meidinger shares tips for creating consumable training materials that convey event streaming concepts to developers and IT administrators who are trying to get on board with Kafka and stream processing. Eugene's background as a database administrator (DBA) and his deep knowledge of event streaming architecture and data processing show as he shares what he has learned from years of working with Microsoft Power BI, Azure Event Hubs, data processing, and event streaming with ksqlDB and Kafka Streams. Eugene stresses the importance of understanding your audience, their pain points, and their questions, such as: Why was Kafka invented? Why does ksqlDB matter? It also helps to use metaphors where appropriate. For example, when explaining what a processing topology is in Kafka Streams, Eugene uses the analogy of people boarding a bus for blocking operations: after the grace period the bus leaves even without passengers, just as the processor moves on once the window closes, even if no further events arrive. He also likes to inject a sense of humor into his training and keeps empathy in mind.
Here is the structure that Eugene uses when building courses:
- The first module is usually fundamentals, which lays out the groundwork and the objectives of the course
- It's critical to repeat and summarize core concepts or major points; for example, a key capability of Kafka is the ability to decouple data in both network space and in time
- Provide variety and different modalities that allow people to consume content through multiple avenues, such as screencasts, slides, and demos, wherever it makes sense

EPISODE LINKS
- Building ETL Pipelines from Streaming Data with Kafka and ksqlDB
- Don't Make Me Think | Steve Krug
- Design for How People Learn | Julie Dirksen
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
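The bus analogy maps onto windowed aggregation with a grace period. Below is a hypothetical, simplified Python model of tumbling-window counting (Kafka Streams itself is a Java library and tracks stream time per task); the window size, grace period, and event shapes are illustrative.

```python
from collections import defaultdict

def tumbling_window_counts(events, window_ms, grace_ms):
    """Count events per key in tumbling windows.

    `events` is a list of (timestamp_ms, key) pairs. An event is
    accepted only while stream time has not yet passed its window's
    end plus the grace period; once the "bus" has left, late
    arrivals for that window are dropped.
    """
    windows = defaultdict(lambda: defaultdict(int))
    stream_time = 0
    for ts, key in events:
        stream_time = max(stream_time, ts)       # stream time only advances
        window_start = (ts // window_ms) * window_ms
        if stream_time < window_start + window_ms + grace_ms:
            windows[window_start][key] += 1      # window open or within grace
        # else: too late, the record is discarded
    return {start: dict(counts) for start, counts in windows.items()}
```

With a 10 ms window and a 5 ms grace period, an event stamped t=8 that arrives after stream time has already reached 25 is dropped, because its window (0 to 10) stopped accepting records at 15.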

Engenharia de Dados [Cast]

Open-source big data products are the ones most widely used by mid-sized and large companies to operationalize large-scale data analysis for every area and department of a business. However, depending on a cloud provider for this task can create long-term problems, chiefly vendor lock-in and the high costs of PaaS and SaaS products; indeed, most companies understand why open-source products are the most used in the world. The third generation of big data was born to address these problems: it is now possible to build a self-healing, low-cost infrastructure to run the world's most widely used big data products. Cloud providers offer managed Kubernetes (AKS, GKE, and EKS) so that you can focus on what matters most to the business while remaining in, and collaborating on, the same kind of environment. Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Streaming Audio: a Confluent podcast about Apache Kafka
Confluent Platform 7.0: New Features + Updates

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 9, 2021 12:16 Transcription Available


Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and task, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements in the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking. Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without the risk of reprocessing or missing critical messages. This gives operators the flexibility to make changes to topic replication smoothly, byte for byte, without data loss. Additionally, Cluster Linking eliminates the need to deploy MirrorMaker2 for replication management while ensuring offsets are preserved.

Furthermore, the release of Confluent for Kubernetes 2.2 allows you to build your own private-cloud Kafka service. It completes the declarative API by adding cloud-native management of connectors, schemas, and cluster links to reduce the operational burden and manual processes so that you can instead focus on high-level declarations. Confluent for Kubernetes 2.2 also enhances elastic scaling through the Shrink API. Following ZooKeeper's removal in Apache Kafka 3.0, Confluent Platform 7.0 introduces KRaft in preview to make it easier to monitor and scale Kafka clusters to millions of partitions.
There are also several ksqlDB enhancements in this release, including foreign-key table joins and support for the new data types DATE and TIME, to account for time values that aren't TIMESTAMP. This allows consistent data ingestion from the source without having to convert data types.

EPISODE LINKS
- Download Confluent Platform 7.0
- Check out the release notes
- Read the Confluent Platform 7.0 blog post
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
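A foreign-key table join differs from a primary-key join in that the right-hand table is looked up through a field of the left-hand row rather than through the left row's own key. This hypothetical Python sketch only mimics the result such a join materializes; the table names and fields are made up.

```python
def foreign_key_join(orders, customers):
    """Join a table of orders (keyed by order_id, each row carrying a
    customer_id foreign key) against a table of customers (keyed by
    customer_id). The result stays keyed by order_id."""
    return {
        order_id: {**order, "customer": customers.get(order["customer_id"])}
        for order_id, order in orders.items()
    }

orders = {
    "o1": {"customer_id": "c1", "total": 30},
    "o2": {"customer_id": "c1", "total": 12},
}
customers = {"c1": {"name": "Ada"}}
enriched = foreign_key_join(orders, customers)
```

Note that several left rows may reference the same right row ("c1" above), which is exactly what a primary-key join cannot express.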

Streaming Audio: a Confluent podcast about Apache Kafka
Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 4, 2021 35:32 Transcription Available


Kafka Streams is a native streaming library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic's messages and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) has found his calling in sharing knowledge and authoring his book, "Kafka Streams in Action." As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Confluent Developer, where he delves into what Kafka Streams is, how to use it, and how it works.

Kafka Streams provides an abstraction over Kafka consumers and producers by minimizing administrative details like the framework code you would otherwise need to write and manage when processing streams with plain Kafka consumers and producers. Kafka Streams is declarative: you state what you want to do, rather than how to do it. Kafka Streams leverages the KafkaConsumer protocol internally; it inherits its dynamic scaling properties and the consumer group protocol to dynamically redistribute the workload. When Kafka Streams applications are deployed separately but share the same application.id, they are logically still one application.

Kafka Streams has two processing APIs. The declarative API, or domain-specific language (DSL), is a high-level language that enables you to build anything needed with a processor topology, whereas the Processor API lets you specify a processor topology node by node, providing the ultimate flexibility. To underline the differences between the two APIs, Bill says it's almost like using an object-relational mapping framework (ORM) versus SQL.
The Kafka Streams 101 course is designed to get you started with Kafka Streams and to help you learn the fundamentals of:
- How streams and tables work
- How stateless and stateful operations work
- How to handle time windows and out-of-order data
- How to deploy Kafka Streams

EPISODE LINKS
- Kafka Streams 101 course
- A Guide to Kafka Streams and Its Uses
- Kafka Streams 101 meetup
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use podcon19 to get 40% off "Kafka Streams in Action"
- Use podcon19 to get 40% off "Event Streaming with Kafka Streams and ksqlDB"
- Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
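The DSL-versus-Processor-API contrast can be pictured with a toy sketch. This is hypothetical Python (Kafka Streams is a Java library): the "DSL" version declares what should happen, while the "Processor" version wires nodes together by hand, the way the Processor API builds a topology node by node.

```python
# Declarative style: say *what* you want (filter, then transform).
def dsl_style(records):
    return [r.upper() for r in records if "error" in r]

# Processor style: build the topology node by node and wire children.
class FilterNode:
    def __init__(self, predicate, child):
        self.predicate, self.child = predicate, child

    def process(self, record):
        # Forward only matching records to the downstream node.
        if self.predicate(record):
            self.child.process(record)

class MapNode:
    def __init__(self, fn):
        self.fn, self.out = fn, []

    def process(self, record):
        self.out.append(self.fn(record))

def processor_style(records):
    sink = MapNode(str.upper)
    source = FilterNode(lambda r: "error" in r, sink)
    for r in records:
        source.process(r)
    return sink.out
```

Both produce the same result; the difference is whether you compose high-level operations or control every node and edge yourself.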

Streaming Audio: a Confluent podcast about Apache Kafka
Powering Event-Driven Architectures on Microsoft Azure with Confluent

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 14, 2021 38:42 Transcription Available


When you order a pizza, what if you knew every step of the process, from the moment it goes in the oven to its delivery at your doorstep? Event-driven architecture is a modern, data-driven approach built around "events" (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases for leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures.

As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He's worked with retailers and those in the IoT space to help them adopt processes for inventory management with Confluent. Having a cloud-native, real-time architecture that keeps an accurate record of supply and demand is important for maintaining inventory levels and customer satisfaction. Israel has also worked with customers that use Confluent to integrate with Cosmos DB, Microsoft SQL Server, Azure Cognitive Search, and other services within the Azure ecosystem. Another important use case is enabling real-time data accessibility in the public sector and healthcare while ensuring data security and regulatory compliance, such as HIPAA.

Alicia has a background in AI, and she stresses the importance of moving away from the monolithic, centralized data warehouse to a more flexible and scalable architecture like Kafka. Building a data pipeline on Kafka helps ensure data security and consistency with minimized risk. The Confluent and Azure integration enables quick Kafka deployment with out-of-the-box solutions within the Kafka ecosystem.
Confluent Schema Registry captures event streams with a consistent data structure, ksqlDB enables the development of real-time ETL pipelines, and Kafka Connect enables the streaming of data to multiple Azure services.

EPISODE LINKS
- Microsoft Azure at Kafka Summit Americas
- IzzyAcademy Kafka on Azure Learning Series by Alicia Moniz
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Sep 28, 2021 31:18 Transcription Available


Kafka Connect is a streaming integration framework between Apache Kafka® and external systems, such as databases and cloud services. With expertise in ksqlDB and Kafka Connect, Robin Moffatt (Staff Developer Advocate, Confluent) helps and supports the developer community in understanding Kafka and its ecosystem. Recently, Robin authored a Kafka Connect 101 course that will help you understand the basic concepts of Kafka Connect, its key features, and how it works.

What is Kafka Connect, and how does it work with Kafka and brokers? Robin explains that Kafka Connect is a Kafka API that runs separately from the Kafka brokers, on its own Java virtual machine (JVM) process known as the Kafka Connect worker. Kafka Connect is essential for streaming data from different sources into Kafka and from Kafka to various targets. With Connect, you don't have to write programs in Java; instead, you specify your pipeline using configuration.

As a pluggable framework, Kafka Connect has a broad set of more than 200 different connectors available on Confluent Hub, including but not limited to:
- NoSQL and document stores (Elasticsearch, MongoDB, and Cassandra)
- RDBMS (Oracle, SQL Server, DB2, PostgreSQL, and MySQL)
- Cloud object stores (Amazon S3, Azure Blob Storage, and Google Cloud Storage)
- Message queues (ActiveMQ, IBM MQ, and RabbitMQ)

Robin and Tim also discuss single message transforms (SMTs), as well as the distributed and standalone deployment modes of Kafka Connect. Tune in to learn more about Kafka Connect, and get a preview of the Kafka Connect 101 course.

EPISODE LINKS
- Kafka Connect 101 course
- Kafka Connect Fundamentals: What is Kafka Connect?
- Meetup: From Zero to Hero with Kafka Connect
- Confluent Hub: Discover Kafka connectors and more
- 12 Days of SMTs
- Why Kafka Connect? ft. Robin Moffatt
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
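"Specify your pipeline using configuration" looks roughly like the sketch below: a connector is a JSON document submitted to the Connect worker's REST API. This is a hedged illustration; the connection URL, connector name, and table settings are placeholder values, not a working pipeline.

```python
import json

# Hypothetical JDBC source connector configuration. A Kafka Connect
# worker would receive this via its REST API, e.g.
# POST http://connect:8083/connectors with this JSON body.
connector = {
    "name": "inventory-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/inventory",
        "mode": "incrementing",
        "incrementing.column.name": "id",
        "topic.prefix": "pg-",   # tables land in topics like pg-products
        "tasks.max": "1",
    },
}

payload = json.dumps(connector, indent=2)
```

No Java is involved on the user's side: the worker loads the named connector class and runs the tasks for you.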

Digital. Innovation. Engineers.
KsqlDB, the Reinvention of SQL

Digital. Innovation. Engineers.

Play Episode Listen Later Sep 27, 2021 31:12


This time we talk about one of the simplest options Confluent offers for streaming data: KsqlDB. Will you give it a try? Music: Aliaksei Yuknhevich - Background Ukulele; Makesound - Ambient Motivational Inspirational

Streaming Audio: a Confluent podcast about Apache Kafka
Using Apache Kafka and ksqlDB for Data Replication at Bolt

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 26, 2021 29:15 Transcription Available


What does a ride-hailing app that offers micromobility and food delivery services have to do with data in motion? In this episode, Ruslan Gibaiev (Data Architect, Bolt) shares Bolt's road to adopting Apache Kafka® and ksqlDB for stream processing to replicate data from transactional databases to analytical warehouses.

Rome wasn't built overnight, nor was the adoption of Kafka and ksqlDB at Bolt. Initially, Bolt noticed the need for system standardization and for replacing its unreliable query-based change data capture (CDC) process. As an experienced Kafka developer, Ruslan believed that Kafka was the solution for adopting change data capture as a company-wide event streaming solution. Persuading the team at Bolt to adopt and buy in was hard at first, but Ruslan made it possible. Eventually, the team replaced query-based CDC with log-based CDC from Debezium, built on top of Kafka. Shortly after the implementation, developers at Bolt began to see precise, correct, real-time data.

As Bolt continues to grow, they see the need for a data lake or data warehouse for OLTP system data replication and stream processing. After carefully considering several different solutions and frameworks such as ksqlDB, Apache Flink®, Apache Spark™, and Kafka Streams, ksqlDB shone brightest for their business requirements. Bolt adopted ksqlDB because it is native to the Kafka ecosystem and a perfect fit for their use case. They found ksqlDB to be a particularly good fit for replicating all their data to a data warehouse for a number of reasons, including:
- Easy to deploy and manage
- Linearly scalable
- Natively integrates with Confluent Schema Registry

Tune in to find out more about Bolt's adoption journey with Kafka and ksqlDB.
EPISODE LINKS
- Inside ksqlDB Course
- ksqlDB 101 Course
- How Bolt Has Adopted Change Data Capture with Confluent Platform
- Analysing Changes with Debezium and Kafka Streams
- No More Silos: How to Integrate Your Databases with Apache Kafka and CDC
- Change Data Capture with Debezium ft. Gunnar Morling
- Announcing ksqlDB 0.17.0
- Real-Time Data Replication with ksqlDB
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
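Log-based CDC of the kind Debezium provides emits one event per row change, and replaying those events reconstructs the table downstream. This hypothetical Python sketch uses simplified events with Debezium-style op codes (c = create, u = update, d = delete); the field names are illustrative, not Debezium's full envelope.

```python
def replay_change_events(events):
    """Apply an ordered change stream to rebuild table state."""
    table = {}
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("c", "u"):        # create/update carry the new row image
            table[key] = event["after"]
        elif op == "d":             # delete removes the row
            table.pop(key, None)
    return table

changes = [
    {"op": "c", "key": 1, "after": {"status": "requested"}},
    {"op": "u", "key": 1, "after": {"status": "completed"}},
    {"op": "c", "key": 2, "after": {"status": "requested"}},
    {"op": "d", "key": 2},
]
state = replay_change_events(changes)
```

Unlike query-based CDC, which polls and can miss intermediate updates and deletes, every change in the log is observed exactly as it happened.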

Streaming Audio: a Confluent podcast about Apache Kafka
Advanced Stream Processing with ksqlDB ft. Michael Drogalis

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 11, 2021 28:26 Transcription Available


ksqlDB makes it easy to read, write, process, and transform data on Apache Kafka®, the de facto event streaming platform. With simple SQL syntax, pre-built connectors, and materialized views, ksqlDB's powerful stream processing capabilities enable you to quickly start processing real-time data at scale. But how does ksqlDB work? In this episode, Michael Drogalis (Principal Product Manager, Product Management, Confluent) previews an all-new Confluent Developer course, Inside ksqlDB, in which he provides a full overview of ksqlDB's internal architecture and delves into advanced ksqlDB features.

When it comes to ksqlDB or Kafka Streams, there's one principle to keep in mind: ksqlDB and Kafka Streams share a runtime. ksqlDB runs its SQL queries by dynamically writing Kafka Streams topologies. Leveraging Confluent Cloud makes it even easier to use ksqlDB.

Once you are familiar with ksqlDB's basic design, you'll be able to troubleshoot problems and build real-time applications more effectively. The Inside ksqlDB course is designed to help you advance in ksqlDB and Kafka. Paired with hands-on exercises and ready-to-use code, the course covers topics including:
- ksqlDB architecture
- How stateless and stateful operations work
- Streaming joins
- Table-table joins
- Elastic scaling
- High availability

Michael also sheds light on ksqlDB's roadmap:
- Building out the query layer so that it is highly scalable and able to execute thousands of concurrent subscriptions
- Making Confluent Cloud the best place to run ksqlDB and process streams

Tune in to this episode to find out more about the Inside ksqlDB course on Confluent Developer. The all-new website provides diverse and comprehensive resources for developers looking to learn about Kafka and Confluent. You'll find free courses, tutorials, getting started guides, quick starts for 60+ event streaming patterns, and more, all in a single destination.
EPISODE LINKS
- Inside ksqlDB Course
- ksqlDB 101 Course
- How Real-Time Stream Processing Safely Scales with ksqlDB, Animated
- How Real-Time Materialized Views Work with ksqlDB, Animated
- How Real-Time Stream Processing Works with ksqlDB, Animated
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Intro to Event-Driven Microservices with Confluent
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
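The materialized views mentioned above can be pictured as a table that is continuously updated from a stream and served with point lookups. This is a hypothetical, minimal Python sketch; ksqlDB actually compiles such a view into a Kafka Streams topology backed by a state store.

```python
class CountView:
    """A per-key count, kept up to date as events arrive."""

    def __init__(self):
        self.counts = {}

    def apply(self, key):
        # Called for every event on the input stream.
        self.counts[key] = self.counts.get(key, 0) + 1

    def pull_query(self, key):
        # Point-in-time lookup against the materialized state,
        # in the spirit of a ksqlDB pull query.
        return self.counts.get(key, 0)

view = CountView()
for page in ["home", "pricing", "home"]:
    view.apply(page)
```

The key idea is that the view is never recomputed from scratch: each event incrementally updates the stored aggregate, so reads are cheap.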

Streaming Audio: a Confluent podcast about Apache Kafka
Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 5, 2021 31:32 Transcription Available


Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. Joining us in this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company's largest source of streaming data.

Almost like a post office, except instead of sending physical parcels, Mailchimp sends billions of emails per day. Monitoring the state of each email provides visibility into the core business function, and it also returns information about the health of both internal and remote message transfer agents (MTAs). Finding a way to track those MTA systems in real time is pivotal to the success of the business.

Mailchimp is an early Apache Kafka® adopter that started using the technology in 2014, before ksqlDB, Kafka Connect, and Kafka Streams came into the picture. The stream processing applications they were building faced many complexities and rough edges. As their use case evolved and scaled over time, a large number of applications deviated from the initial implementation and design, and different applications emerged that they had to maintain. To reduce cost and complexity and to standardize stream processing applications, adopting ksqlDB and Kafka Streams became the solution to their problems. This is what Mitch calls "minimizing software speciation": the phenomenon in which applications evolve into multiple distinct systems in response to failure-handling strategies, increased load, and the like. Using different scaling strategies and communication protocols creates system silos that are challenging to maintain.

Replacing the existing architecture that supported point-to-point communication, the new Mailchimp architecture uses Kafka as its foundation with scalable custom functions, such as reusable, highly functional user-defined functions (UDFs).
The reporting capabilities have also evolved from Kafka Streams' interactive queries into enhanced queries with Elasticsearch.

Turning experience into books, Mitch is also the author of O'Reilly's Mastering Kafka Streams and ksqlDB and the author and illustrator of Gently Down the Stream: A Gentle Introduction to Apache Kafka.

EPISODE LINKS
- The Exciting Frontier of Custom ksql Functions (Mitch Seymour, Mailchimp), Kafka Summit London
- Apache Kafka 101: Kafka Streams Course
- ksqlDB UDFs and UDAFs Made Easy
- Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures
- The Haiku Approach to Writing Software
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jul 22, 2021 29:00 Transcription Available


Stream processing has become an important part of the big data landscape as a new programming paradigm for implementing real-time, data-driven applications. One of the biggest challenges for streaming systems is providing correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in stream processing in Apache Kafka, in order to shed light on what a reimagined, modern infrastructure looks like.

In his white paper, titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics:
- Streaming correctness challenges
- Stream processing with Kafka
- Exactly-once in Kafka Streams

For context, accurate, real-time data stream processing is friendlier to modern organizations that are composed of vertically separated engineering teams. In the past, stream processing was considered an auxiliary system to normal batch-processing-oriented systems, and it often bore issues around consistency and completeness. Modern streaming engines such as ksqlDB and Kafka Streams, by providing strong correctness guarantees, are designed to be authoritative as the source of truth and are no longer treated as an approximation. There are two major umbrellas of correctness guarantees:
- Consistency: ensuring unique and extant records
- Completeness: ensuring the correct order of records, also referred to as exactly-once semantics

Guozhang also answers the question of why he wrote this academic paper, as he believes in the importance of knowledge sharing across the community and bringing industry experience back to academia (the paper is also published in SIGMOD 2021, one of the most important conference proceedings in the data management research area).
This will help foster the next generation of industry innovation and push the data streaming and data management industry one step forward. In Guozhang's own words, "Academic papers provide you this proof-of-concept design, which gets groomed into a big system."

EPISODE LINKS
- White Paper: Rethinking Distributed Stream Processing in Apache Kafka
- Blog: Rethinking Distributed Stream Processing in Apache Kafka
- Enabling Exactly-Once in Kafka Streams
- Why Kafka Streams Does Not Use Watermarks ft. Matthias Sax
- Streams and Tables: Two Sides of the Same Coin
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get $60 of free Confluent Cloud usage (details)
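The consistency guarantee (unique, extant records) can be approximated on the consumer side with idempotent processing. This hypothetical Python sketch is not how Kafka's transactional exactly-once machinery works internally (that relies on idempotent producers and transaction markers); it only shows why deduplicating by record ID makes redelivery harmless.

```python
def process_with_dedup(records, seen_ids):
    """Sum amounts from (record_id, amount) pairs, skipping any record
    whose ID was already processed, so at-least-once delivery still
    yields an effectively-once result."""
    total = 0
    for record_id, amount in records:
        if record_id in seen_ids:
            continue                 # duplicate delivery: ignore it
        seen_ids.add(record_id)
        total += amount
    return total

seen = set()
first_batch = process_with_dedup([(1, 10), (2, 5)], seen)
# The broker redelivers record 2 alongside a new record 3.
second_batch = process_with_dedup([(2, 5), (3, 7)], seen)
```

The redelivered record contributes nothing the second time, which is the behavioral contract exactly-once semantics promises, achieved here by the cheapest possible means.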

Streaming Audio: a Confluent podcast about Apache Kafka
Data-Driven Digitalization with Apache Kafka in the Food Industry at BAADER

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 29, 2021 27:53 Transcription Available


Coming out of university, Patrick Neff (Data Scientist, BAADER) was used to "perfect" example datasets. However, he soon realized that in the real world, data is often either unavailable or unstructured. This compelled him to learn more about collecting data, analyzing it in a smart and automatic way, and exploring Apache Kafka® as a core ecosystem while at BAADER, a global provider of food processing machines.

After Patrick began working with Apache Kafka in 2019, he developed several microservices with Kafka Streams and used Kafka Connect for various data analytics projects. Focused on the food value chain, Patrick's mission is to optimize processes, specifically around transportation and processing. While consulting for one customer, Patrick detected areas of improvement related to animal welfare, lost revenue, unnecessary costs, and carbon dioxide emissions. He also noticed that machines are often ready to send data into the cloud, but the correct presentation and/or analysis of the data is missing, and with it the possibility of optimization. As a result:
- Data is difficult to understand because of missing units
- Data has not been analyzed so far
- Comparison of machine/process performance for the same machine across different customers is missing

In response to this problem, he helped develop the Transport Manager. Based on data analytics results, the Transport Manager presents information like a truck's expected arrival time and its current poultry load. This leads to better planning, reduced transportation costs, and improved animal welfare. The Asset Manager is another solution that Patrick has been working on; it presents IoT data in real time and in an understandable way to the customer. Both are data analytics projects that use machine learning.

Kafka topics store data, provide insight, and detect dependencies, for example, related to why trucks are stopping along the route.
Kafka is also a real-time platform, meaning that alerts can be sent directly when a certain event occurs, using ksqlDB or Kafka Streams.

As a result of running Kafka on Confluent Cloud and creating a scalable data pipeline, the BAADER team is able to break data silos and produce live data from trucks via MQTT. They've even created an Android app for truck drivers, along with a desktop version that monitors the data entered by a truck driver in the app, in addition to other information such as expected time of arrival and weather information. And the best part: all of it is done in real time.

EPISODE LINKS
- Learn more about BAADER's data-in-motion use cases
- Read about how BAADER uses Confluent Cloud
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Drill to Detail
Drill to Detail Ep.90 'Apache Kafka, Real-Time Stream Processing and ksqlDB' with Special Guest Michael Drogalis

Drill to Detail

Play Episode Listen Later May 20, 2021 39:02


Mark Rittman is joined in this episode by Michael Drogalis, Product Manager at Confluent, to talk about the history of Apache Kafka and real-time stream processing, the origins of KSQL and ksqlDB, and the rationale and use cases addressed by these products.
ksqlDB product homepage
Introducing ksqlDB Blog Post
Kafka Streams vs. ksqlDB Blog Post


The Cloudcast
ksqlDB Database for Streaming Events

The Cloudcast

Play Episode Listen Later May 12, 2021 30:32


Michael Drogalis (@MichaelDrogalis, Product Manager @Confluent) talks about the evolution of Kafka, the challenges of data management with event-driven systems, and the ksqlDB database.

SHOW: 514

SHOW SPONSOR LINKS:
- Datadog Security Monitoring Homepage: Modern Monitoring and Analytics. Try Datadog yourself by starting a free, 14-day trial today. Listeners of this podcast will also receive a free Datadog T-shirt.
- Okta - Safe Identity for customers and workforce. Try Okta for FREE (Trial in 10 minutes)
- CBT Nuggets: Expert IT Training for individuals and teams. Sign up for a CBT Nuggets Free Learner account

CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw
CHECK OUT OUR NEW PODCAST - "CLOUDCAST BASICS"

SHOW NOTES:
- Confluent Homepage
- ksqlDB Homepage
- How Real-Time Stream Processing Works with ksqlDB
- How Real-Time Materialized Views Work with ksqlDB
- How Real-Time Stream Processing Safely Scales with ksqlDB

Topic 1 - Welcome to the show. Before we dive into this new database, let's talk a little bit about your background.
Topic 2 - Let's first talk about Kafka. What is it, what does it do, and what are common use cases that use Kafka?
Topic 3 - Kafka handles all these streams of data/events, but there are a bunch of other pieces needed to connect the data to the applications. What are those pieces, and where are the common challenges?
Topic 4 - Let's talk about this new project, ksqlDB. It's real-time, it's Kafka, and it's SQL. Walk us through how it was created and the types of challenges it solves for.
Topic 5 - How is ksqlDB different from other SQL databases?
Topic 6 - What are some of the initial ways that you're seeing ksqlDB being used? Is it beginning to allow new use cases to emerge?

FEEDBACK?
Email: show at thecloudcast dot net
Twitter: @thecloudcastnet

Software Daily
ksqlDB: Kafka Streaming Interface with Michael Drogalis

Software Daily

Play Episode Listen Later Apr 7, 2020


Kafka is a distributed stream processing system that is commonly used for storing large volumes of append-only event data. Kafka has been open source for almost a decade, and as the project has matured, it has been used for new kinds of applications. Kafka's pub/sub interface for writing and reading topics is not ideal for all of these applications, which has led to the creation of ksqlDB, a database system built for streaming applications that uses Kafka as the underlying infrastructure for storing data.

Michael Drogalis is a principal product manager at Confluent, where he helped develop ksqlDB. Michael joins the show to discuss ksqlDB, including its architecture, its query semantics, and the applications that might want a database focused on streams. We have done many great shows on Kafka in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about Kafka, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.