Podcasts about Kafka Connect

  • 17 PODCASTS
  • 63 EPISODES
  • 39m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Apr 29, 2025 LATEST

POPULARITY

Popularity trend by year, 2017–2024


Best podcasts about Kafka Connect

Latest podcast episodes about Kafka Connect

Oracle University Podcast
What is Oracle GoldenGate 23ai?

Apr 29, 2025 · 18:03


In a new season of the Oracle University Podcast, Lois Houston and Nikita Abraham dive into the world of Oracle GoldenGate 23ai, a cutting-edge software solution for data management. They are joined by Nick Wagner, a seasoned expert in database replication, who provides a comprehensive overview of this powerful tool.   Nick highlights GoldenGate's ability to ensure continuous operations by efficiently moving data between databases and platforms with minimal overhead. He emphasizes its role in enabling real-time analytics, enhancing data security, and reducing costs by offloading data to low-cost hardware. The discussion also covers GoldenGate's role in facilitating data sharing, improving operational efficiency, and reducing downtime during outages.   Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ---------------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston: Director of Innovation Programs. Lois: Hi everyone! Welcome to a new season of the podcast. This time, we're focusing on the fundamentals of Oracle GoldenGate. Oracle GoldenGate helps organizations manage and synchronize their data across diverse systems and databases in real time.  And with the new Oracle GoldenGate 23ai release, we'll uncover the latest innovations and features that empower businesses to make the most of their data. Nikita: Taking us through this is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. He's been doing database replication for about 25 years and has been focused on GoldenGate on and off for about 20 of those years.  01:18 Lois: In today's episode, we'll ask Nick to give us a general overview of the product, along with some use cases and benefits. Hi Nick! To start with, why do customers need GoldenGate? Nick: Well, it delivers continuous operations, being able to continuously move data from one database to another database or data platform in efficiently and a high-speed manner, and it does this with very low overhead. Almost all the GoldenGate environments use transaction logs to pull the data out of the system, so we're not creating any additional triggers or very little overhead on that source system. GoldenGate can also enable real-time analytics, being able to pull data from all these different databases and move them into your analytics system in real time can improve the value that those analytics systems provide. Being able to do real-time statistics and analysis of that data within those high-performance custom environments is really important. 02:13 Nikita: Does it offer any benefits in terms of cost?  Nick: GoldenGate can also lower IT costs. A lot of times people run these massive OLTP databases, and they are running reporting in those same systems. 
With GoldenGate, you can offload some of the data or all the data to a low-cost commodity hardware where you can then run the reports on that other system. So, this way, you can get back that performance on the OLTP system, while at the same time optimizing your reporting environment for those long running reports. You can improve efficiencies and reduce risks. Being able to reduce the amount of downtime during planned and unplanned outages can really make a big benefit to the overall operational efficiencies of your company.  02:54 Nikita: What about when it comes to data sharing and data security? Nick: You can also reduce barriers to data sharing. Being able to pull subsets of data, or just specific pieces of data out of a production database and move it to the team or to the group that needs that information in real time is very important. And it also protects the security of your data by only moving in the information that they need and not the entire database. It also provides extensibility and flexibility, being able to support multiple different replication topologies and architectures. 03:24 Lois: Can you tell us about some of the use cases of GoldenGate? Where does GoldenGate truly shine?  Nick: Some of the more traditional use cases of GoldenGate include use within the multicloud fabric. Within a multicloud fabric, this essentially means that GoldenGate can replicate data between on-premise environments, within cloud environments, or hybrid, cloud to on-premise, on-premise to cloud, or even within multiple clouds. So, you can move data from AWS to Azure to OCI. You can also move between the systems themselves, so you don't have to use the same database in all the different clouds. For example, if you wanted to move data from AWS Postgres into Oracle running in OCI, you can do that using Oracle GoldenGate. We also support maximum availability architectures. And so, there's a lot of different use cases here, but primarily geared around reducing your recovery point objective and recovery time objective. 04:20 Lois: Ah, reducing RPO and RTO. That must have a significant advantage for the customer, right? Nick: So, reducing your RPO and RTO allows you to take advantage of some of the benefits of GoldenGate, being able to do active-active replication, being able to set up GoldenGate for high availability, real-time failover, and it can augment your active Data Guard and Data Guard configuration. So, a lot of times GoldenGate is used within Oracle's maximum availability architecture platinum tier level of replication, which means that at that point you've got lots of different capabilities within the Oracle Database itself. But to help eke out that last little bit of high availability, you want to set up an active-active environment with GoldenGate to really get true zero RPO and RTO. GoldenGate can also be used for data offloading and data hubs. Being able to pull data from one or more source systems and move it into a data hub, or into a data warehouse for your operational reporting. This could also be your analytics environment too. 05:22 Nikita: Does GoldenGate support online migrations? Nick: In fact, a lot of companies actually get started in GoldenGate by doing a migration from one platform to another. Now, these don't even have to be something as complex as going from one database like a DB2 on-premise into an Oracle on OCI, it could even be simple migrations. 
A lot of times doing something like a major application or a major database version upgrade is going to take downtime on that production system. You can use GoldenGate to eliminate that downtime. So this could be going from Oracle 19c to Oracle 23ai, or going from application version 1.0 to application version 2.0, because GoldenGate can do the transformation between the different application schemas. You can use GoldenGate to migrate your database from on premise into the cloud with no downtime as well. We also support real-time analytic feeds, being able to go from multiple databases, not only those on premise, but being able to pull information from different SaaS applications inside of OCI and move it to your different analytic systems. And then, of course, we also have the ability to stream events and analytics within GoldenGate itself.  06:34 Lois: Let's move on to the various topologies supported by GoldenGate. I know GoldenGate supports many different platforms and can be used with just about any database. Nick: This first layer of topologies is what we usually consider relational database topologies. And so this would be moving data from Oracle to Oracle, Postgres to Oracle, Sybase to SQL Server, a lot of different types of databases. So the first architecture would be unidirectional. This is replicating from one source to one target. You can do this for reporting. If I wanted to offload some reports into another server, I can go ahead and do that using GoldenGate. I can replicate the entire database or just a subset of tables. I can also set up GoldenGate for bidirectional, and this is what I want to set up GoldenGate for something like high availability. So in the event that one of the servers crashes, I can almost immediately reconnect my users to the other system. And that almost immediately depends on the amount of latency that GoldenGate has at that time. So a typical latency is anywhere from 3 to 6 seconds. So after that primary system fails, I can reconnect my users to the other system in 3 to 6 seconds. And I can do that because as GoldenGate's applying data into that target database, that target system is already open for read and write activity. GoldenGate is just another user connecting in issuing DML operations, and so it makes that failover time very low. 07:59 Nikita: Ok…If you can get it down to 3 to 6 seconds, can you bring it down to zero? Like zero failover time?   Nick: That's the next topology, which is active-active. And in this scenario, all servers are read/write all at the same time and all available for user activity. And you can do multiple topologies with this as well. You can do a mesh architecture, which is where every server talks to every other server. This works really well for 2, 3, 4, maybe even 5 environments, but when you get beyond that, having every server communicate with every other server can get a little complex. And so at that point we start looking at doing what we call a hub and spoke architecture, where we have lots of different spokes. At the end of each spoke is a read/write database, and then those communicate with a hub. So any change that happens on one spoke gets sent into the hub, and then from the hub it gets sent out to all the other spokes. And through that architecture, it allows you to really scale up your environments. We have customers that are doing up to 150 spokes within that hub architecture. 
Within active-active replication as well, we can do conflict detection and resolution, which means that if two users modify the same row on two different systems, GoldenGate can actually determine that there was an issue with that and determine what user wins or which row change wins, which is extremely important when doing active-active replication. And this means that if one of those systems fails, there is no downtime when you switch your users to another active system because it's already available for activity and ready to go. 09:35 Lois: Wow, that's fantastic. Ok, tell us more about the topologies. Nick: GoldenGate can do other things like broadcast, sending data from one system to multiple systems, or many to one as far as consolidation. We can also do cascading replication, so when data moves from one environment that GoldenGate is replicating into another environment that GoldenGate is replicating. By default, we ignore all of our own transactions. But there's actually a toggle switch that you can flip that says, hey, GoldenGate, even though you wrote that data into that database, still push it on to the next system. And then of course, we can also do distribution of data, and this is more like moving data from a relational database into something like a Kafka topic or a JMS queue or into some messaging service. 10:24 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting visit oracle.com/education. 10:58 Nikita: Welcome back! Nick, does GoldenGate also have nonrelational capabilities?  Nick: We have a number of nonrelational replication events in topologies as well. This includes things like data lake ingestion and streaming ingestion, being able to move data and data objects from these different relational database platforms into data lakes and into these streaming systems where you can run analytics on them and run reports. We can also do cloud ingestion, being able to move data from these databases into different cloud environments. And this is not only just moving it into relational databases with those clouds, but also their data lakes and data fabrics. 11:38 Lois: You mentioned a messaging service earlier. Can you tell us more about that? Nick: Messaging replication is also possible. So we can actually capture from things like messaging systems like Kafka Connect and JMS, replicate that into a relational data, or simply stream it into another environment. We also support NoSQL replication, being able to capture from MongoDB and replicate it onto another MongoDB for high availability or disaster recovery, or simply into any other system. 12:06 Nikita: I see. And is there any integration with a customer's SaaS applications? Nick: GoldenGate also supports a number of different OCI SaaS applications. And so a lot of these different applications like Oracle Financials Fusion, Oracle Transportation Management, they all have GoldenGate built under the covers and can be enabled with a flag that you can actually have that data sent out to your other GoldenGate environment. So you can actually subscribe to changes that are happening in these other systems with very little overhead. 
And then of course, we have event processing and analytics, and this is the final topology or flexibility within GoldenGate itself. And this is being able to push data through data pipelines, doing data transformations. GoldenGate is not an ETL tool, but it can do row-level transformation and row-level filtering.  12:55 Lois: Are there integrations offered by Oracle GoldenGate in automation and artificial intelligence? Nick: We can do time series analysis and geofencing using the GoldenGate Stream Analytics product. It allows you to actually do real time analysis and time series analysis on data as it flows through the GoldenGate trails. And then that same product, the GoldenGate Stream Analytics, can then take the data and move it to predictive analytics, where you can run MML on it, or ONNX or other Spark-type technologies and do real-time analysis and AI on that information as it's flowing through.  13:29 Nikita: So, GoldenGate is extremely flexible. And given Oracle's focus on integrating AI into its product portfolio, what about GoldenGate? Does it offer any AI-related features, especially since the product name has “23ai” in it? Nick: With the advent of Oracle GoldenGate 23ai, it's one of the two products at this point that has the AI moniker at Oracle. Oracle Database 23ai also has it, and that means that we actually do stuff with AI. So the Oracle GoldenGate product can actually capture vectors from databases like MySQL HeatWave, Postgres using pgvector, which includes things like AlloyDB, Amazon RDS Postgres, Aurora Postgres. We can also replicate data into Elasticsearch and OpenSearch, or if the data is using vectors within OCI or the Oracle Database itself. So GoldenGate can be used for a number of things here. The first one is being able to migrate vectors into the Oracle Database. So if you're using something like Postgres, MySQL, and you want to migrate the vector information into the Oracle Database, you can. Now one thing to keep in mind here is a vector is oftentimes like a GPS coordinate. So if I need to know the GPS coordinates of Austin, Texas, I can put in a latitude and longitude and it will give me the GPS coordinates of a building within that city. But if I also need to know the altitude of that same building, well, that's going to be a different algorithm. And GoldenGate and replicating vectors is the same way. When you create a vector, it's essentially just creating a bunch of numbers under the screen, kind of like those same GPS coordinates. The dimension and the algorithm that you use to generate that vector can be different across different databases, but the actual meaning of that data will change. And so GoldenGate can replicate the vector data as long as the algorithm and the dimensions are the same. If the algorithm and the dimensions are not the same between the source and the target, then you'll actually want GoldenGate to replicate the base data that created that vector. And then once GoldenGate replicates the base data, it'll actually call the vector embedding technology to re-embed that data and produce that numerical formatting for you.  15:42 Lois: So, there are some nuances there… Nick: GoldenGate can also replicate and consolidate vector changes or even do the embedding API calls itself. This is really nice because it means that we can take changes from multiple systems and consolidate them into a single one. We can also do the reverse of that too. A lot of customers are still trying to find out which algorithms work best for them. 
How many dimensions? What's the optimal use? Well, you can now run those in different servers without impacting your actual AI system. Once you've identified which algorithm and dimension is going to be best for your data, you can then have GoldenGate replicate that into your production system and we'll start using that instead. So it's a nice way to switch algorithms without taking extensive downtime. 16:29 Nikita: What about in multicloud environments?  Nick: GoldenGate can also do multicloud and N-way active-active Oracle replication between vectors. So if there's vectors in Oracle databases, in multiple clouds, or multiple on-premise databases, GoldenGate can synchronize them all up. And of course we can also stream changes from vector information, including text as well into different search engines. And that's where the integration with Elasticsearch and OpenSearch comes in. And then we can use things like NVIDIA and Cohere to actually do the AI on that data.  17:01 Lois: Using GoldenGate with AI in the database unlocks so many possibilities. Thanks for that detailed introduction to Oracle GoldenGate 23ai and its capabilities, Nick.  Nikita: We've run out of time for today, but Nick will be back next week to talk about how GoldenGate has evolved over time and its latest features. And if you liked what you heard today, head over to mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course to learn more. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 17:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

GOTO - Today, Tomorrow and the Future
Kafka Connect: Build & Run Data Pipelines • Kate Stanley, Mickael Maison & Danica Fine

Feb 21, 2025 · 49:57 · Transcription Available


This interview was recorded for the GOTO Book Club: http://gotopia.tech/bookclub. Read the full transcription of the interview here.

Kate Stanley - Principal Software Engineer at Red Hat & Co-Author of "Kafka Connect"
Mickael Maison - Senior Principal Software Engineer at Red Hat & Co-Author of "Kafka Connect"
Danica Fine - Lead Developer Advocate, Open Source at Snowflake

RESOURCES
Kate: https://fosstodon.org/@katheris · https://www.linkedin.com
Mickael: https://bsky.app/profile/mickaelmaison.bsky.social · https://mas.to/@MickaelMaison · https://www.linkedin.com · https://mickaelmaison.com
Danica: https://bsky.app/profile/thedanicafine.bsky.social · https://data-folks.masto.host/@thedanicafine · https://www.linkedin.com · https://linktr.ee/thedanicafine
Links: https://kafka.apache.org · https://flink.apache.org · https://debezium.io · https://strimzi.io

DESCRIPTION
Danica Fine, together with the authors of "Kafka Connect," Kate Stanley and Mickael Maison, unpacks Kafka Connect's game-changing power for building data pipelines—no tedious custom scripts needed! Kate and Mickael discuss how they structured the book to help everyone, from data engineers to developers, tap into Kafka Connect's strengths, including Change Data Capture (CDC), real-time data flow, and fail-safe reliability.

RECOMMENDED BOOKS
Kate Stanley & Mickael Maison • Kafka Connect
Shapira, Palino, Sivaram & Petty • Kafka: The Definitive Guide
Viktor Gamov, Dylan Scott & Dave Klein • Kafka in Action

Bluesky · Twitter · Instagram · LinkedIn · Facebook

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
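The book and this conversation lean heavily on Change Data Capture. As a rough, hedged illustration of how a CDC pipeline gets wired up in Kafka Connect, the Python sketch below registers a Debezium MySQL source connector through Connect's REST API. The hostnames, credentials, table name, and Connect URL are placeholders, and property names such as topic.prefix vary between Debezium versions, so treat it as a sketch to adapt rather than an example from the book.

```python
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Kafka Connect worker address

# Hypothetical Debezium MySQL source connector: captures row-level changes
# from the inventory.customers table and writes them to Kafka topics.
connector = {
    "name": "inventory-cdc",
    "config": {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        "database.hostname": "mysql.example.internal",  # placeholder
        "database.port": "3306",
        "database.user": "cdc_user",                    # placeholder
        "database.password": "cdc_password",            # placeholder
        "database.server.id": "5400",
        "topic.prefix": "inventory",    # older Debezium releases use database.server.name
        "table.include.list": "inventory.customers",
        "tasks.max": "1",
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=connector, timeout=10)
resp.raise_for_status()
print("Connector created:", resp.json()["name"])

# Check that the connector and its task are RUNNING.
status = requests.get(f"{CONNECT_URL}/connectors/inventory-cdc/status", timeout=10).json()
print(status["connector"]["state"], [t["state"] for t in status["tasks"]])
```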

Engineering Kiosk
#177 Stream Processing & Kafka: Die Basis moderner Datenpipelines mit Stefan Sprenger

Jan 7, 2025 · 67:40


Data streaming and stream processing with Apache Kafka and its surrounding ecosystem.

A whole range of processes in software development, and in data processing in general, don't have to happen at request time; they can be handled asynchronously or in a decentralized way. Terms like batch processing or message queueing / pub-sub are the familiar answers. But there is a third player in this game: stream processing. Here, Apache Kafka is the flagship, the distributed event streaming platform that is usually the first one mentioned.

But what actually is stream processing, and how does it differ from batch processing or message queuing? How does Kafka work, and why is it so successful and performant? What are brokers, topics, partitions, producers, and consumers? What does change data capture mean, and what is a sliding window? What do you need to watch out for, and what can go wrong when you want to write and read a message?

Our guest Stefan Sprenger delivers the answers, and much more.

Bonus: how to describe stream processing to five-year-olds using a breakfast table.

You can find our current advertising partners at https://engineeringkiosk.dev/partners
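Since the episode keeps returning to brokers, topics, partitions, producers, and consumers, here is a minimal sketch of those moving parts using the confluent-kafka Python client. The bootstrap address, topic name, and group id are placeholders and the snippet assumes a locally reachable broker; it is only meant to make the vocabulary concrete.

```python
from confluent_kafka import Producer, Consumer

BOOTSTRAP = "localhost:9092"   # assumed broker address
TOPIC = "orders"               # placeholder topic

# Producer: the message key determines the partition, so all events for
# one order id land in the same partition and stay ordered.
producer = Producer({"bootstrap.servers": BOOTSTRAP})
for i in range(10):
    producer.produce(TOPIC, key=f"order-{i % 3}", value=f"event {i}")
producer.flush()

# Consumer: one member of the consumer group "billing"; the topic's
# partitions are split across all members of that group.
consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "billing",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])
try:
    for _ in range(10):
        msg = consumer.poll(timeout=5.0)
        if msg is None or msg.error():
            continue
        print(msg.partition(), msg.key(), msg.value())
finally:
    consumer.close()
```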

The Datanation Podcast - Podcast for Data Engineers, Analysts and Scientists
55 – Discussing the Apache Iceberg Kafka Connect Connector

May 16, 2024


In this episode, we delve into the Apache Iceberg Kafka Connector, a critical tool for streaming data into your data lakehouse. We’ll explore how this connector facilitates seamless data ingestion from Apache Kafka into Apache Iceberg, enhancing your real-time analytics capabilities and data lakehouse efficiency. We’ll cover: Join us to understand how the Apache Iceberg […]
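For readers who want a feel for the wiring, the sketch below registers an Iceberg sink connector through the Kafka Connect REST API from Python. The connector class name and the iceberg.* property names are assumptions based on the Iceberg sink connector's documentation and have changed across releases, so verify them against the version you deploy before using this.

```python
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker

# Hypothetical Iceberg sink: streams the `events` topic into an Iceberg table.
# Class name and iceberg.* keys are assumptions -- check your connector version.
config = {
    "name": "events-to-iceberg",
    "config": {
        "connector.class": "org.apache.iceberg.connect.IcebergSinkConnector",  # assumed class
        "topics": "events",
        "iceberg.tables": "lakehouse.events",               # assumed property
        "iceberg.catalog.type": "rest",                     # assumed property
        "iceberg.catalog.uri": "http://iceberg-rest:8181",  # placeholder catalog endpoint
        "tasks.max": "2",
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=config, timeout=10)
resp.raise_for_status()
print(requests.get(f"{CONNECT_URL}/connectors/events-to-iceberg/status", timeout=10).json())
```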

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates

Jun 15, 2023 · 11:25 · Transcription Available


Apache Kafka® 3.5 is here with the capability of previewing migrations from ZooKeeper clusters to KRaft mode. Follow along as Danica Fine highlights key release updates.

Kafka Core:
KIP-833 provides an updated timeline for KRaft.
KIP-866 is now in preview and allows migration from an existing ZooKeeper cluster to KRaft mode.
KIP-900 introduces a way to bootstrap the KRaft controllers with SCRAM credentials.
KIP-903 prevents a data loss scenario by preventing replicas with stale broker epochs from joining the ISR list.
KIP-915 streamlines the process of downgrading Kafka's transaction and group coordinators by introducing tagged fields.

Kafka Connect:
KIP-710 provides the option to use a REST API for internal server communication that can be enabled by setting `dedicated.mode.enable.internal.rest` equal to true.
KIP-875 offers support for native offset management in Kafka Connect. Connect cluster administrators can now read offsets for both source and sink connectors. This KIP adds a new STOPPED state for connectors, enabling users to shut down connectors and maintain connector configurations without utilizing resources.
KIP-894 makes the `IncrementalAlterConfigs` API available for use in MirrorMaker 2 (MM2), adding a new `use.incremental.alter.config` configuration which takes the values "requested," "never," and "required."
KIP-911 adds a new source tag for metrics generated by the `MirrorSourceConnector` to help monitor mirroring deployments.

Kafka Streams:
KIP-339 improves Kafka Streams' error-handling capabilities by addressing serialization errors that occur before message production and extending the interface for custom error handling.
KIP-889 introduces versioned state stores in Kafka Streams for temporal join semantics in stream-to-table joins.
KIP-904 simplifies table aggregation in Kafka by proposing a change in serialization format to enable one-step aggregation and reduce noise from events with old and new keys/values.
KIP-914 modifies how versioned state stores are used in Kafka Streams. Versioned state stores may impact different DSL processors in varying ways; see the documentation for details.

Kafka Client:
KIP-881 is now complete and introduces new client-side assignor logic for rack-aware consumer balancing for Kafka Consumers.
KIP-887 adds the `EnvVarConfigProvider` implementation to Kafka so custom configurations stored in environment variables can be injected into the system by providing the map returned by `System.getenv()`.
KIP-641 introduces the `RecordReader` interface to Kafka's clients module, replacing the deprecated MessageReader Scala trait.

EPISODE LINKS
See release notes for Apache Kafka 3.5
Read the blog to learn more
Download and get started with Apache Kafka 3.5
Watch the video version of this podcast
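KIP-875's offset management and the new STOPPED state are both exposed through Kafka Connect's REST API. The Python sketch below shows what reading a connector's offsets and stopping it might look like against a 3.5 worker; the connector name and worker URL are placeholders, and the exact endpoints and payloads should be double-checked against the 3.5 documentation.

```python
import requests

CONNECT_URL = "http://localhost:8083"   # assumed Connect worker
NAME = "my-source-connector"            # placeholder connector name

# KIP-875 (AK 3.5): read the offsets a source or sink connector has committed.
offsets = requests.get(f"{CONNECT_URL}/connectors/{NAME}/offsets", timeout=10)
offsets.raise_for_status()
print(offsets.json())

# KIP-875 also adds a STOPPED state: the connector shuts down and releases its
# tasks and resources, but its configuration stays registered with the cluster.
requests.put(f"{CONNECT_URL}/connectors/{NAME}/stop", timeout=10).raise_for_status()

state = requests.get(f"{CONNECT_URL}/connectors/{NAME}/status", timeout=10).json()
print(state["connector"]["state"])  # expected to report STOPPED
```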

The MongoDB Podcast
Ep. 168 Data In Motion with Kenny Gorman, Head of Streaming at MongoDB

Jun 14, 2023 · 23:23


In this enlightening episode, we have a conversation with Kenny Gorman, a key figure at MongoDB who focuses on data in motion and streaming data. We delve into the essential role of streaming data in the data-centric world of today, discussing its applications in diverse fields such as fraud detection, IoT devices, power grid management, and manufacturing.Kenny enlightens us about the three primary patterns related to streaming data and MongoDB's importance as both a source and a destination for this data. He also shares the challenges developers face in terms of making sense of high velocity data, distilling information, and adjusting their mental models to work effectively with streaming data.Kenny gives us a glimpse into MongoDB's roadmap, focusing on expanding its capabilities in the streaming data space to make developers' work easier and more efficient. He emphasizes MongoDB's efforts to enhance functionality, offer new features, and make things more accessible to their customers.For those interested in further learning, Kenny recommends checking out MongoDB's documentation on Kafka Connect. He also mentions his upcoming participation at MongoDB .local New York City.This episode is a must-listen for anyone involved in data management, particularly those keen on understanding and leveraging the power of streaming data. Don't miss out on Kenny's insightful thoughts and expert advice on this rapidly evolving field.Introduction (00:00 - 01:00): Introduction of the podcast and Kenny Gorman, an expert on data in motion and streaming data at MongoDB.Importance of Streaming Data (01:01 - 05:30): Kenny discusses the growing importance of streaming data, its applications in various fields including fraud detection, IoT devices, power grid management, and manufacturing, and how it's changing the way we view and use data.Three Patterns Related to Streaming Data (05:31 - 15:00): Kenny explains three primary patterns related to streaming data and the role of MongoDB as a source and destination for this data.Challenges in Streaming Data (15:01 - 23:00): Kenny delves into the challenges developers face when dealing with streaming data, including the difficulty in making sense of high velocity data, the need to distill meaningful information, and the necessary shift in mental models.Best Practices for Developers (23:01 - 29:30): Kenny shares some advice and best practices for developers working with streaming data and MongoDB, emphasizing the need to understand Kafka and how it can connect to MongoDB.MongoDB's Roadmap for Streaming Data (29:31 - 34:00): Kenny gives a glimpse into MongoDB's roadmap for streaming data, discussing their focus on enhancing functionality, introducing new features, and making things more accessible to their customers.Resources for Further Learning (34:01 - 36:00): Kenny recommends checking out MongoDB's documentation on Kafka Connect for those interested in learning more about streaming data and its applications.Upcoming Events (36:01 - 38:00): Kenny mentions his upcoming participation at dot local New York City and encourages listeners to attend.Conclusion (38:01 - End): The podcast host thanks Kenny for his time and the valuable insights he shared during the interview.
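As a companion to Kenny's pointer to the Kafka Connect documentation, here is a hedged Python sketch that registers the MongoDB source connector so change events from one collection flow into Kafka. The connection string, database, and collection names are placeholders, and the property names should be checked against the MongoDB Kafka connector docs for the version you run.

```python
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker

# MongoDB as a *source*: change-stream events from orders.purchases land in Kafka.
config = {
    "name": "mongo-orders-source",
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSourceConnector",
        "connection.uri": "mongodb://mongo.example.internal:27017",  # placeholder
        "database": "orders",       # placeholder
        "collection": "purchases",  # placeholder
        "tasks.max": "1",
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=config, timeout=10).raise_for_status()
print(requests.get(f"{CONNECT_URL}/connectors/mongo-orders-source/status", timeout=10).json())
```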

Data Engineering Podcast
Realtime Data Applications Made Easier With Meroxa

Apr 24, 2023 · 45:26


Summary Real-time capabilities have quickly become an expectation for consumers. The complexity of providing those capabilities is still high, however, making it more difficult for small teams to compete. Meroxa was created to enable teams of all sizes to deliver real-time data applications. In this episode DeVaris Brown discusses the types of applications that are possible when teams don't have to manage the complex infrastructure necessary to support continuous data flows. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) Your host is Tobias Macey and today I'm interviewing DeVaris Brown about the impact of real-time data on business opportunities and risk profiles Interview Introduction How did you get involved in the area of data management? Can you describe what Meroxa is and the story behind it? How have the focus and goals of the platform and company evolved over the past 2 years? Who are the target customers for Meroxa? What problems are they trying to solve when they come to your platform? Applications powered by real-time data were the exclusive domain of large and/or sophisticated tech companies for several years due to the inherent complexities involved. What are the shifts that have made them more accessible to a wider variety of teams? What are some of the remaining blockers for teams who want to start using real-time data? With the democratization of real-time data, what are the new categories of products and applications that are being unlocked? How are organizations thinking about the potential value that those types of apps/services can provide? With data flowing constantly, there are new challenges around oversight and accuracy. How does real-time data change the risk profile for applications that are consuming it? What are some of the technical controls that are available for organizations that are risk-averse? What skills do developers need to be able to effectively design, develop, and deploy real-time data applications? How does this differ when talking about internal vs. consumer/end-user facing applications? What are the most interesting, innovative, or unexpected ways that you have seen Meroxa used? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Meroxa? When is Meroxa the wrong choice? What do you have planned for the future of Meroxa? Contact Info LinkedIn (https://www.linkedin.com/in/devarispbrown/) @devarispbrown (https://twitter.com/devarispbrown) on Twitter Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. 
The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Meroxa (https://meroxa.com/) Podcast Episode (https://www.dataengineeringpodcast.com/meroxa-data-integration-episode-153/) Kafka (https://kafka.apache.org/) Kafka Connect (https://docs.confluent.io/platform/current/connect/index.html) Conduit (https://github.com/ConduitIO/conduit) - golang Kafka connect replacement Pulsar (https://pulsar.apache.org/) Redpanda (https://redpanda.com/) Flink (https://flink.apache.org/) Beam (https://beam.apache.org/) Clickhouse (https://clickhouse.tech/) Druid (https://druid.apache.org/) Pinot (https://pinot.apache.org/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.4 - New Features & Improvements

Feb 7, 2023 · 5:13 · Transcription Available


Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect.

In Kafka Core:
KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that consumer was a part of.
KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it's not being used.
KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs.
KIP-866 (early access) provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters, enabling the migration of existing metadata from ZooKeeper to KRaft.
KIP-876 adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is 1 hour.
KIP-881, an extension of KIP-392, makes it so that consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing.

In Kafka Streams:
KIP-770 updates some Kafka Streams configs and metrics related to the record cache size.
KIP-837 allows users to multicast result records to every partition of downstream sink topics and adds functionality for users to choose to drop result records without sending.

And finally, for Kafka Connect:
KIP-787 allows users to run MirrorMaker 2 with custom implementations for the Kafka resource manager and makes it easier to integrate with your ecosystem.

Tune in to learn more about the Apache Kafka 3.4 release!

EPISODE LINKS
See release notes for Apache Kafka 3.4
Read the blog to learn more
Download Apache Kafka 3.4 and get started
Watch the video version of this podcast
Join the Community
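KIP-881 builds on the rack awareness consumers already advertise through the client.rack setting. As a hedged illustration, the Python snippet below shows a consumer declaring its rack with the confluent-kafka client; the broker address, topic, group id, and rack id are placeholders, and rack-aligned assignments only materialize once brokers and assignors are configured for it.

```python
from confluent_kafka import Consumer

# Consumer that advertises which rack / availability zone it runs in, so a
# rack-aware assignor can prefer rack-aligned partition assignments.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # placeholder
    "group.id": "analytics",                # placeholder
    "client.rack": "us-east-1a",            # placeholder rack / AZ id
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["page-views"])          # placeholder topic
msg = consumer.poll(timeout=5.0)
if msg is not None and not msg.error():
    print(msg.partition(), msg.value())
consumer.close()
```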

Engenharia de Dados [Cast]
Confluent Community Catalysts Brazukas: Dissecando o Apache Kafka [Round 1]

Feb 2, 2023 · 77:12


In this episode, Luan Moreno & Mateus Oliveira interview João Bosco, currently a Software & Solution Strategist at Nubank, and Marcelo Costa, currently Head of IT at Cia. Hering. Both the guests and the hosts are Confluent Community Catalysts.

Confluent Community Catalysts are professionals who invest their time in spreading the word and contributing, whether in code or by actively answering questions about Apache Kafka on forums and Stack Overflow, and who are recognized by the community and by Confluent for that work.

In this round table we talk about the following topics:
Apache Kafka concepts
The evolution from messaging technologies to a streaming platform
War stories and curiosities about Apache Kafka
Challenges of an initial Apache Kafka implementation and its adoption

Learn from the experience of professionals who have worked with Apache Kafka day to day, using industry best practices to build the robust real-time streaming platform that leads the market today.

Marcelo Costa
João Bosco
Confluent Catalyst Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.3 - KRaft, Kafka Core, Streams, & Connect Updates

Oct 3, 2022 · 6:42 · Transcription Available


Apache Kafka® 3.3 is released! With over two years of development, KIP-833 marks KRaft as production ready for new AK 3.3 clusters only. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of this release, with KIPs from Kafka Core, Kafka Streams, and Kafka Connect.

Kafka Core:
To reduce request overhead and simplify client-side code, KIP-709 extends the OffsetFetch API requests to accept multiple consumer group IDs. This update has three changes, including extending the wire protocol, response handling changes, and enhancing the AdminClient to use the new protocol.
Log recovery is an important process that is triggered whenever a broker starts up after an unclean shutdown. And since there is no way to know the log recovery progress other than checking if the broker log is busy, KIP-831 adds metrics for the log recovery progress with `RemainingLogsToRecover` and `RemainingSegmentsToRecover` for each recovery thread. These metrics allow the admin to monitor the progress of the log recovery.
Additional Kafka Core updates include:
KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft.
KIP-835: Monitor KRaft Controller Quorum Health.
KIP-859: Add metadata log processing error-related metrics.

Kafka Streams:
KIP-834 adds the ability to pause and resume topologies. This feature lets you reduce resource usage when processing is not required, when modifying the logic of Kafka Streams applications, or when responding to operational issues. KIP-820 extends the KStream process method with a new processor API.

Kafka Connect:
Previously, KIP-98 added support for exactly-once delivery guarantees with Kafka and its Java clients. In the AK 3.3 release, KIP-618 brings exactly-once semantics support to Kafka Connect source connectors. To accomplish this, a number of new connector- and worker-level configurations have been introduced, including `exactly.once.source.support`, `transaction.boundary`, and more.

Image attribution: Apache ZooKeeper™: https://zookeeper.apache.org/ and Raft logo: https://raft.github.io/

EPISODE LINKS
See release notes for Apache Kafka 3.3.0 and Apache Kafka 3.3.1 for the full list of changes
Read the blog to learn more
Download Apache Kafka 3.3 and get started
Watch the video version of this podcast
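To make KIP-618 slightly more concrete, here is a hedged sketch of the two halves of the configuration: the Connect worker opts in with exactly.once.source.support, and each source connector can then declare its exactly-once expectations and transaction boundary. The connector shown and all values are placeholders; consult the 3.3 Connect documentation for the options your connector actually supports.

```python
import requests

# Worker-level opt-in (goes in the Connect worker's properties file, shown here
# as a comment because it is not set through the REST API):
#   exactly.once.source.support=enabled

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker

# Connector-level settings for a hypothetical source connector.
config = {
    "name": "orders-source",
    "config": {
        "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
        "file": "/var/log/orders.log",       # placeholder
        "topic": "orders",                   # placeholder
        "exactly.once.support": "required",  # fail fast if EOS can't be provided
        "transaction.boundary": "poll",      # one transaction per batch returned by poll()
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=config, timeout=10).raise_for_status()
```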

Streaming Audio: a Confluent podcast about Apache Kafka
Real-Time Stream Processing, Monitoring, and Analytics With Apache Kafka

Sep 15, 2022 · 34:07 · Transcription Available


Processing real-time event streams enables countless use cases big and small. With a day job designing and building highly available distributed data systems, Simon Aubury (Principal Data Engineer, Thoughtworks) believes stream-processing thinking can be applied to any stream of events.

In this episode, Simon shares his Confluent Hackathon '22 winning project—a wildlife monitoring system to observe population trends over time using a Raspberry Pi, along with Apache Kafka®, Kafka Connect, ksqlDB, TensorFlow Lite, and Kibana. He used the system to count animals in his Australian backyard and perform trend analysis on the results. Simon also shares ideas on how you can use these same technologies to help with other real-world challenges.

Open-source object detection models for TensorFlow, which appropriately are collected into "model zoos," meant that Simon didn't have to provide his own object identification as part of the project, which would have made it untenable. Instead, he was able to utilize the open-source models, which are essentially neural nets pretrained on relevant data sets—in his case, backyard animals.

Simon's system, which consists of around 200 lines of code, employs a Kafka producer running a while loop, which connects to a camera feed using a Python library. For each frame brought down, object masking is applied in order to crop and reduce pixel density, and then the frame is compared to the models mentioned above. A Python dictionary containing probable found objects is sent to a Kafka broker for processing; the images themselves aren't sent. (Note that Simon's system is also capable of alerting if a specific, rare animal is detected.) On the broker, Simon uses ksqlDB and windowing to smooth the data in case the frames were inconsistent for some reason (it may look back over thirty seconds, for example, and find the highest number of animals per type). Finally, the data is sent to a Kibana dashboard for analysis, through a Kafka Connect sink connector.

Simon's system is an extremely low-cost system that can simulate the behaviors of more expensive, proprietary systems. And the concepts can easily be applied to many other use cases. For example, you could use it to estimate traffic at a shopping mall to gauge optimal opening hours, or you could use it to monitor the queue at a coffee shop, counting both queued patrons as well as impatient patrons who decide to leave because the queue is too long.

EPISODE LINKS
Real-Time Wildlife Monitoring with Apache Kafka
Wildlife Monitoring Github
ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together
Event-Driven Architecture - Common Mistakes and Valuable Lessons
Watch the video version of this podcast
Kris Jenkins' Twitter
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
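Simon's capture loop is easy to picture in code. Below is a much-simplified, hedged sketch of the Raspberry Pi side: a loop that takes a detection result (stubbed out here instead of a real TensorFlow Lite model) and produces a small JSON dictionary of probable objects to a Kafka topic, mirroring the decision to send counts and labels rather than images. The broker address, topic name, and detect() stub are placeholders, not the project's actual code.

```python
import json
import time
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # placeholder broker
TOPIC = "backyard-detections"                                 # placeholder topic


def detect(frame_id: int) -> dict:
    """Stand-in for the TensorFlow Lite model-zoo inference step."""
    return {"frame": frame_id, "ts": int(time.time()), "animals": {"cockatoo": 2, "possum": 1}}


frame_id = 0
while frame_id < 100:  # the real loop runs forever against a camera feed
    result = detect(frame_id)
    # Only the probable-objects dictionary is sent to Kafka, never the image.
    producer.produce(TOPIC, key=str(result["frame"]), value=json.dumps(result))
    producer.poll(0)   # serve delivery callbacks
    frame_id += 1
    time.sleep(0.5)

producer.flush()
```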

Streaming Audio: a Confluent podcast about Apache Kafka
Capacity Planning Your Apache Kafka Cluster

Aug 30, 2022 · 61:54 · Transcription Available


How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data), begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself—determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, as well as potential cost.Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup.  A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).  Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered, for example if the compression on the producer's side doesn't match compression on the consumer's side, it will increase.Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective. EPISODE LINKSCapacity Planning and Sizing for Kafka StreamsTales from the Frontline of Apache Kafka DevOpsWatch the video version of this podcastKris Jenkins' TwitterStreaming Audio Playlist Join the Confluent CommunityLearn more on Confluent DeveloperUse PODCAST100 to get $100 of free Cloud usage (details)  
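Jason's rule-of-thumb numbers lend themselves to a quick back-of-envelope calculation. The sketch below is not his methodology, just a hedged illustration of the arithmetic involved: given an assumed message rate, message size, retention, and replication factor, it estimates raw and per-broker storage for a three-broker starting point.

```python
# Back-of-envelope Kafka sizing -- illustrative only, not a substitute for testing.
MSGS_PER_SEC = 5_000          # assumed peak message rate
AVG_MSG_BYTES = 1_024         # assumed average message size
RETENTION_DAYS = 7            # assumed topic retention
REPLICATION_FACTOR = 3        # leader plus followers
BROKERS = 3                   # Jason's minimum-sized cluster

ingest_mb_per_sec = MSGS_PER_SEC * AVG_MSG_BYTES / 1_000_000
raw_tb = ingest_mb_per_sec * 86_400 * RETENTION_DAYS / 1_000_000
replicated_tb = raw_tb * REPLICATION_FACTOR
per_broker_tb = replicated_tb / BROKERS

print(f"ingest:        {ingest_mb_per_sec:.1f} MB/s")
print(f"retained data: {raw_tb:.2f} TB (x{REPLICATION_FACTOR} replication = {replicated_tb:.2f} TB)")
print(f"per broker:    {per_broker_tb:.2f} TB before headroom")
```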

Streaming Audio: a Confluent podcast about Apache Kafka
What Could Go Wrong with a Kafka JDBC Connector?

Aug 4, 2022 · 41:10 · Transcription Available


Java Database Connectivity (JDBC) is the Java API used to connect to a database. The JDBC connector is one of the most popular Kafka connectors, so it's important to prevent issues with your integrations. In this episode, we'll cover how a JDBC connection works, and common issues with your database connection.

Why the Kafka JDBC Connector? When it comes to streaming database events into Apache Kafka®, the JDBC connector usually represents the first choice for its flexibility and the ability to support a wide variety of databases without requiring custom code. As an experienced data analyst, Francesco Tisiot (Senior Developer Advocate, Aiven) delves into his experience of streaming Kafka data pipelines with the JDBC source connector and explains what could go wrong. He discusses alternative options available to avoid these problems, including the Debezium source connector for real-time change data capture.

The JDBC connector is a Kafka Connect connector built on the Java JDBC API, which streams data between databases and Kafka. If you want to stream data from a relational database into Kafka, once per day or every two hours, the JDBC connector is a simple, batch-processing connector to use. You can tell the JDBC connector which query you'd like to execute against the database, and then the connector will take the data into Kafka.

The connector works well with basic data types out of the box; however, database-specific data types, such as geometrical columns and array columns in PostgreSQL, aren't represented well by the JDBC connector. You might not get any results in Kafka because the column type is outside the connector's supported capabilities. Francesco shares other cases that would cause the JDBC connector to go wrong, such as:
Infrequent snapshot times
Out-of-order events
Non-incremental sequences
Hard deletes

To help avoid these problems and set up a reliable source of events for your real-time streaming pipeline, Francesco suggests other approaches, such as the Debezium source connector for real-time change data capture. The Debezium connector has enhanced metadata, timestamps of the operation, access to all logs, and provides sequence numbers for you to speak the language of a DBA.

They also talk about the governance tool which Francesco has been building, and how streaming Game of Thrones sentiment analysis with Kafka started his current role as a developer advocate.

EPISODE LINKS
Kafka Connect Deep Dive – JDBC Source Connector
JDBC Source Connector: What could go wrong?
Metadata parser
Debezium Documentation
Database Migration with Apache Kafka and Apache Kafka Connect
Watch the video version of this podcast
Francesco Tisiot's Twitter
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
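For reference, a typical timestamp-plus-incrementing JDBC source configuration looks roughly like the Python sketch below, submitted to the Connect REST API. The connection URL, credentials, table, and column names are placeholders; the property names follow the Confluent JDBC source connector's documented options but should be checked against your connector version, and Francesco's caveat about unusual column types still applies.

```python
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker

config = {
    "name": "orders-jdbc-source",
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.internal:5432/shop",  # placeholder
        "connection.user": "connect_user",       # placeholder
        "connection.password": "connect_pass",   # placeholder
        "table.whitelist": "orders",              # placeholder table
        # timestamp+incrementing catches both updates and inserts, which helps with
        # the out-of-order and non-incremental-sequence issues discussed above.
        "mode": "timestamp+incrementing",
        "timestamp.column.name": "updated_at",    # placeholder column
        "incrementing.column.name": "order_id",   # placeholder column
        "topic.prefix": "jdbc-",
        "poll.interval.ms": "60000",
        "tasks.max": "1",
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=config, timeout=10).raise_for_status()
```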

Streaming Audio: a Confluent podcast about Apache Kafka
Automating Multi-Cloud Apache Kafka Cluster Rollouts

Jun 30, 2022 · 48:29 · Transcription Available


To ensure safe and efficient deployment of Apache Kafka® clusters across multiple cloud providers, Confluent rolled out a large-scale cluster management solution.

Rashmi Prabhu (Staff Software Engineer & Eng Manager, Fleet Management Platform, Confluent) and her team have been building the Fleet Management Platform for Confluent Cloud. In this episode, she delves into what Fleet Management is, and how the cluster management service streamlines Kafka operations in the cloud while providing a seamless developer experience.

When it comes to performing operations at large scale on the cloud, manual processes work well if the scenario involves only a handful of clusters. However, as a business grows, a cloud footprint may potentially scale 10x and will require upgrades to a significantly larger cluster fleet. Additionally, the process should be automated in order to accelerate feature releases while ensuring safe and mature operations.

Fleet Management lets you manage and automate software rollouts and relevant cloud operations within the Kafka ecosystem at scale—including cloud-native Kafka, ksqlDB, Kafka Connect, Schema Registry, and other cloud-native microservices. The automation service can consistently operate applications across multiple teams, and can also manage Kubernetes infrastructure at scale. The existing Fleet Management stack can successfully handle thousands of concurrent upgrades in the Confluent ecosystem.

When building out the Fleet Management Platform, Rashmi and the team kept these key considerations in mind:
Rollout controls and DevX: Wide deployment and distribution of changes across the fleet of target assets, plus an improved developer experience for ease of use, with rollout strategy support, deployment policies, a dynamic control workflow, and manual approval support on an as-needed basis.
Safety: Built-in features that make security and safety of the fleet the priority, with access control and audits on operations. There is active monitoring and paced rollouts, as well as automated pauses and resumes to reduce the time to react upon failure. There's also an error threshold, and controls to allow a healthy balance of risk vs. pace.
Visibility: A close to real-time, wide-angle view of the fleet state, along with insights into workflow progress, historical operations on the clusters, live notification on workflows, drift detection across assets, and much more.

EPISODE LINKS
Optimize Fleet Management
Software Engineer - Fleet Management
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Data on Kubernetes Community
DoK Talks #138 - Build your own social media analytics with Apache Kafka // Jakub Scholz

Jun 24, 2022 · 56:25


https://go.dok.community/slack https://dok.community ABSTRACT OF THE TALK Apache Kafka is more than just a messaging broker. It has a rich ecosystem of different components. There are connectors for importing and exporting data, different stream processing libraries, schema registries and a lot more. The first part of this talk will explain the Apache Kafka ecosystem and how the different components can be used to load data from social networks and use stream processing and machine learning to analyze them. The second part will show a demo running on Kubernetes which will use Kafka Connect to load data from Twitter and analyze them using the Kafka Streams API. After this talk, the attendees should be able to better understand the full advantages of the Apache Kafka ecosystem especially with focus on Kafka Connect and Kafka Streams API. And they should be also able to use these components on top of Kubernetes. BIO Jakub works at Red Hat as Senior Principal Software Engineer. He has long-term experience with messaging and currently focuses mainly on Apache Kafka and its integration with Kubernetes. He is one of the maintainers of the Strimzi project which provides tooling for running Apache Kafka on Kubernetes. Before joining Red Hat he worked as messaging and solution architect in the financial industry. KEY TAKE-AWAYS FROM THE TALK The key takeaway of this talk is that Apache Kafka is more than just a messaging broker. It is a platform and ecosystem of different components which can be used to solve complex tasks when dealing with events or processing data. The talk demonstrates this on loading tweets from Twitter and processing them using the different parts of the Kafka ecosystem. The whole talk and its demos are running on Kubernetes using the Strimzi project. So it also shows how to easily run all the different components on top of Kubernetes with the help of few simple YAML files.

Streaming Audio: a Confluent podcast about Apache Kafka
Data Mesh Architecture: A Modern Distributed Data Model

Jun 2, 2022 · 48:42 · Transcription Available


Data mesh isn't software you can download and install, so how do you build a data mesh? In this episode, Adam Bellemare (Staff Technologist, Office of the CTO, Confluent) discusses his data mesh proof of concept and how it can help you conceptualize the ways in which implementing a data mesh could benefit your organization.Adam begins by noting that while data mesh is a type of modern data architecture, it is only partially a technical issue. For instance, it encompasses the best way to enable various data sets to be stored and made accessible to other teams in a distributed organization. Equally, it's also a social issue—getting the various teams in an organization to commit to publishing high-quality versions of their data and making them widely available to everyone else. Adam explains that the four data mesh concepts themselves provide the language needed to start discussing the necessary social transitions that must take place within a company to bring about a better, more effective, and efficient data strategy.The data mesh proof of concept created by Adam's team showcases the possibilities of an event-stream based data mesh in a fully functional model. He explains that there is no widely accepted way to do data mesh, so it's necessarily opinionated. The proof of concept demonstrates what self-service data discovery looks like—you can see schemas, data owners, SLAs, and data quality for each data product. You can also model an app consuming data products, as well as publish your own data products.In addition to discussing data mesh concepts and the proof of concept, Adam also shares some experiences with organizational data he had as a staff data platform engineer at Shopify. His primary focus was getting their main ecommerce data into Apache Kafka® topics from sharded MySQL—using Kafka Connect and Debezium. He describes how he really came to appreciate the flexibility of having access to important business data within Kafka topics. This allowed people to experiment with new data combinations, letting them come up with new products, novel solutions, and different ways of looking at problems. Such data sharing and experimentation certainly lie at the heart of data mesh.Adam has been working in the data space for over a decade, with experience in big-data architecture, event-driven microservices, and streaming data platforms. He's also the author of the book “Building Event-Driven Microservices.”EPISODE LINKSThe Definitive Guide to Building a Data Mesh with Event StreamsWhat is data mesh? Saxo Bank's Best Practices for Distributed Domain-Driven Architecture Founded on the Data MeshWatch the video version of this podcastKris Jenkins' TwitterJoin the Confluent CommunityLearn more with Kafka tutorials at Confluent DeveloperLive demo: Intro to Event-Driven Microservices with ConfluentUse PODCAST100 to get an additional $100 of Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 26, 2022 55:55 Transcription Available


Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.

The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.

Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter. Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering employment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way such as the tyranny of state and the Turing completeness of SQL.

EPISODE LINKS
The Future of SQL: Databases Meet Stream Processing
Building Real-Time Event Streams in the Cloud, On Premises
Kafka Streams 101 course
ksqlDB 101 course
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 for additional $100 of Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 19, 2022 33:56 Transcription Available


Apache Kafka® isn't just for day jobs, according to Danica Fine (Senior Developer Advocate, Confluent). It can be used to make life easier at home, too!

Building out a practical Apache Kafka® data pipeline is not always complicated—it can be simple and fun. For Danica, the idea of building a Kafka-based data pipeline sprouted with the need to monitor the water level of her plants at home. In this episode, she explains the architecture of her hardware-oriented project and discusses how she integrates, processes, and enriches data using ksqlDB and Kafka Connect, a Raspberry Pi running Confluent's Python client, and a Telegram bot. Apart from the script on the Raspberry Pi, the entire project was coded within Confluent Cloud.

Danica's model Kafka pipeline begins with moisture sensors in her plants streaming data that is requested by an endless for-loop in a Python script on her Raspberry Pi. The Pi in turn connects to Kafka on Confluent Cloud, where the plant data is sent serialized as Avro. She carefully modeled her data, sending an ID along with a timestamp, a temperature reading, and a moisture reading. On Confluent Cloud, Danica enriches the streaming plant data, which enters as a ksqlDB stream, with metadata such as moisture threshold levels, which is stored in a ksqlDB table.

She windows the streaming data into 12-hour segments in order to avoid constant alerts when a threshold has been crossed. Alerts are sent at the end of the 12-hour period if a threshold has been traversed for a consistent time period within it (one hour, for example). These are sent to the Telegram API using Confluent Cloud's HTTP Sink Connector, which pings her phone when a plant's moisture level is too low.

Potential future project improvements include visualizations, adding another Telegram bot to register metadata for new plants, adding machine learning to anticipate watering needs, and potentially closing the loop by pushing data back to the Raspberry Pi, which could power a visual indicator on the plants themselves.

EPISODE LINKS
GitHub: raspberrypi-houseplants
Data Pipelines 101 course
Tips for Streaming Data Pipelines ft. Danica Fine
Watch the video version of this podcast
Danica Fine's Twitter
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
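For readers who want to try something similar, here is a minimal sketch of the Raspberry Pi side of such a pipeline using Confluent's Python client. The topic name, record fields, sampling interval, and the read_moisture() helper are all assumptions for illustration; Danica's actual project serializes records as Avro against Confluent Cloud rather than plain JSON against a local broker.

```python
# Minimal sketch: an endless loop that reads a (hypothetical) moisture sensor
# and produces readings to a Kafka topic, in the spirit of the episode above.
import json
import time

from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})  # assumed broker address


def read_moisture(plant_id: int) -> dict:
    """Hypothetical sensor read; replace with real GPIO/ADC code."""
    return {
        "plant_id": plant_id,
        "timestamp": int(time.time()),
        "moisture": 0.42,
        "temperature": 21.5,
    }


while True:
    reading = read_moisture(plant_id=1)
    producer.produce(
        "houseplant-readings",              # assumed topic name
        key=str(reading["plant_id"]),
        value=json.dumps(reading),
    )
    producer.poll(0)   # serve delivery callbacks without blocking
    time.sleep(30)     # arbitrary sampling interval for the sketch
```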

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.2 - New Features & Improvements

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 17, 2022 6:54 Transcription Available


Apache Kafka® 3.2 delivers new KIPs in three different areas of the Kafka ecosystem: Kafka Core, Kafka Streams, and Kafka Connect. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights.

More than half of the KIPs in the new release concern Kafka Core. KIP-704 addresses unclean leader elections by allowing for further communication between the controller and the brokers. KIP-764 takes on the problem of a large number of client connections in a short period of time during preferred leader election by adding the configuration `socket.listen.backlog.size`. KIP-784 adds an error code field to the response of the `DescribeLogDirs` API, and KIP-788 improves network traffic by allowing you to set the pool size of network threads individually per listener on Kafka brokers. Finally, in accordance with the imminent KRaft protocol, KIP-801 introduces a built-in `StandardAuthorizer` that doesn't depend on ZooKeeper.

There are five KIPs related to Kafka Streams in the AK 3.2 release. KIP-708 brings rack-aware standby assignment by tag, which improves fault tolerance. Then there are three projects related to Interactive Queries v2: KIP-796 specifies an improved interface for Interactive Queries; KIP-805 allows state to be queried over a specific range; and KIP-806 adds two implementations of the Query interface, `WindowKeyQuery` and `WindowRangeQuery`. The final Kafka Streams project, KIP-791, enhances `StateStoreContext` with `recordMetadata`, which may be accessed from state stores.

Additionally, this Kafka release introduces Kafka Connect-related improvements, including KIP-769, which extends the `/connector-plugins` API, letting you list all available plugins, and not just connectors as before. KIP-779 lets `SourceTasks` handle producer exceptions according to `errors.tolerance`, rather than instantly killing the entire connector by default. Finally, KIP-808 lets you specify precisions with respect to TimestampConverter single message transforms.

Tune in to learn more about the Apache Kafka 3.2 release!

EPISODE LINKS
Apache Kafka 3.2 release notes
Read the blog to learn more
Download Apache Kafka 3.2.0
Watch the video version of this podcast
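As a small illustration of the KIP-769 change, here is a hedged sketch of querying a Connect worker's REST API for its plugins. The worker URL is an assumption, and the `connectorsOnly` query parameter reflects the KIP as described above rather than a verified final API, so check your worker's documentation.

```python
# Sketch: ask a Kafka Connect 3.2+ worker for its plugins.
# By default /connector-plugins lists connectors; with KIP-769 it can also
# list converters, transforms, and predicates.
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker REST endpoint

# Default behaviour: connectors only.
connectors = requests.get(f"{CONNECT_URL}/connector-plugins").json()

# With KIP-769: every plugin type (parameter name per the KIP, may vary).
all_plugins = requests.get(
    f"{CONNECT_URL}/connector-plugins", params={"connectorsOnly": "false"}
).json()

for plugin in all_plugins:
    print(plugin.get("type"), plugin.get("class"))
```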

Python Podcast
Microservices

Python Podcast

Play Episode Listen Later Apr 7, 2022 115:55


Janis, Dominik, and Jochen talk about microservices. Last time we already touched on the topic a bit, and afterwards Janis got in touch and asked whether we wanted to do a whole episode about it with him. Of course we did :). And here, once more, the answer to all questions in the field of software development.

Shownotes
Our email for questions, suggestions & comments: hallo@python-podcast.de

News from the scene
Okta breach
PYPL PopularitY of Programming Language
Meta donates $300,000 to the Python Software Foundation | Łukasz Langa - #Programming
GitHub Issues Migration: status update
Cython is 20!
New programming languages: vlang | zig
April: PyCon DE & PyData Berlin 2022
July: EuroPython
September: DjangoCon EU 2022

Advertising
Ailio is hiring | Inquiries to this email address: business@ailio.de

Microservices
BoundedContext / Single source of truth
Book: Building Microservices, 2nd Edition
Sam Newman on Information Hiding, Ubiquitous Language, UI Decomposition and Building Microservices
Sam Newman: Monolith to Microservices (InfoQ Podcast)
Episode 99 - Sam Newman - Monolith to Microservices
ELK-Stack
Apache Kafka
Book: Software Architecture with Python
MonolithFirst
Benchmark Caddy / Nginx / Uvicorn
Benchmarking nginx vs caddy vs uvicorn for serving static files
Uvicorn / uvloop

Picks
bpytop / glances
Kafka Connect

Streaming Audio: a Confluent podcast about Apache Kafka
Intro to Event Sourcing with Apache Kafka ft. Anna McDonald

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Feb 1, 2022 30:14 Transcription Available


What is event sourcing and how does it work?

Event sourcing is often used interchangeably with event-driven architecture and event stream processing. However, Anna McDonald (Principal Customer Success Technical Architect, Confluent) explains it's a specific category of its own—an event streaming pattern. Anna is passionate about event-driven architectures and event patterns. She's a tour de force in the Apache Kafka® community and is the presenter of the Event Sourcing and Event Storage with Apache Kafka course on Confluent Developer. In this episode, she previews the course by providing an overview of what event sourcing is and what you need to know in order to build event-driven systems.

Event sourcing is an architectural design pattern, which defines the approach to handling data operations that are driven by a sequence of events. The pattern ensures that all changes to an application state are captured and stored as an immutable sequence of events, known as a log of events. The events are persisted in an event store, which acts as the system of record. Unlike traditional databases where only the latest status is saved, an event-based system saves all events into a database in sequential order. If you find a past event is incorrect, you can replay each event from a certain timestamp up to the present to recreate the latest status of data. Event sourcing is commonly implemented with a command query responsibility segregation (CQRS) system to perform data computation tasks in response to events. To implement CQRS with Kafka, you can use Kafka Connect, along with a database, or alternatively use Kafka with the streaming database ksqlDB.

In addition, Anna also shares about:
Data at rest and data in motion techniques for event modeling
The differences between event streaming and event sourcing
How CQRS, change data capture (CDC), and event streaming help you leverage event-driven systems
The primary qualities and advantages of an event-based storage system
Use cases for event sourcing and how it integrates with your systems

EPISODE LINKS
Event Sourcing course
Event Streaming in 3 Minutes
Introducing Derivative Event Sourcing
Meetup: Event Sourcing and Apache Kafka
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
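To make the replay idea concrete, here is a minimal sketch (not from the episode) of rebuilding state by reading an event topic from the beginning with Confluent's Python client. The topic name, event shape, and account/balance domain are invented for illustration; a real CQRS read side would more likely use Kafka Connect, Kafka Streams, or ksqlDB as described above.

```python
# Sketch of the core event-sourcing idea: the topic is the system of record,
# and current state is derived by replaying immutable events in order.
import json

from confluent_kafka import Consumer, TopicPartition

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed
    "group.id": "account-rebuilder",        # assumed
    "auto.offset.reset": "earliest",
    "enable.auto.commit": False,
})
# Replay partition 0 of a hypothetical event topic from offset 0.
consumer.assign([TopicPartition("account-events", 0, 0)])

balances = {}
while True:
    msg = consumer.poll(5.0)
    if msg is None:
        break          # no more events for now; good enough for a sketch
    if msg.error():
        continue
    event = json.loads(msg.value())
    # Apply each event to derive the current state.
    if event["type"] == "deposited":
        balances[event["account"]] = balances.get(event["account"], 0.0) + event["amount"]
    elif event["type"] == "withdrawn":
        balances[event["account"]] = balances.get(event["account"], 0.0) - event["amount"]

consumer.close()
print(balances)
```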

airhacks.fm podcast with adam bien
Kafka Connect CLI, JFR Unit, OSS Archetypes and JPMS

airhacks.fm podcast with adam bien

Play Episode Listen Later Jan 23, 2022 50:45


An airhacks.fm conversation with Gunnar Morling (@gunnarmorling) about: kcctl the CLI for Kafka Connect, kcctl comes with auto completion, kcctl uses picocli, quarkus as CLI, the quarkus extension for picocli, great quarkus command mode with picocli extension, using JPMS for command line interfaces, plugins with JPMS, tab completion with kcctl, the great jreleaser project by Andres Almiray, displaying the connector offsets, the great Java Flight Recorder, jfrunit provides assertions for avoidance of performance regressions, event streaming API in Java, JfrUnit annotations, JFR event streaming into Kafka, Keep Your SQL in Check With Flight Recorder, JMC Agent and JfrUnit, layrry - A Launcher and API for Modularized Java, ModiTect plugin, building application images, the Maven OSS quickstart archetype, Gunnar Morling on twitter: @gunnarmorling, Gunnar's blog

Streaming Audio: a Confluent podcast about Apache Kafka
From Batch to Real-Time: Tips for Streaming Data Pipelines with Apache Kafka ft. Danica Fine

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jan 13, 2022 29:50 Transcription Available


Implementing an event-driven data pipeline can be challenging, but doing so within the context of a legacy architecture is even more complex. Having spent three years building a streaming data infrastructure and being on the first team at a financial organization to implement Apache Kafka® event-driven data pipelines, Danica Fine (Senior Developer Advocate, Confluent) shares about the development process and how ksqlDB and Kafka Connect became instrumental to the implementation.

By moving away from batch processing to streaming data pipelines with Kafka, data can be distributed with increased data scalability and resiliency. Kafka decouples the source from the target systems, so you can react to data as it changes while ensuring accurate data in the target system. In order to transition from monolithic micro-batching applications to real-time microservices that can integrate with a legacy system that has been around for decades, Danica and her team started developing Kafka connectors to connect to various sources and target systems.

Kafka connectors: Building two major connectors for the data pipeline, including a source connector to connect the legacy data source to stream data into Kafka, and another target connector to pipe data from Kafka back into the legacy architecture.
Algorithm: Implementing Kafka Streams applications to migrate data from a monolithic architecture to a stream processing architecture.
Data join: Leveraging Kafka Connect and the JDBC source connector to bring in all data streams to complete the pipeline.
Streams join: Using ksqlDB to join streams—the legacy data system continues to produce streams while the Kafka data pipeline is another stream of data.

As a final tip, Danica suggests breaking algorithms into process steps. She also describes how her experience relates to the data pipelines course on Confluent Developer and encourages anyone who is interested in learning more to check it out.

EPISODE LINKS
Data Pipelines course
Guided Exercise on Building Streaming Data Pipelines
Migrating from a Legacy System to Kafka Streams
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 28, 2021 34:59 Transcription Available


It's been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience working and designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector.

Previously, Kafka was often viewed as a simple pipe that connected databases together, which allows for easy and scalable data migration. As the Kafka ecosystem evolves with added components like ksqlDB, Kafka Streams, and Kafka Connect, the implementation of Kafka goes beyond being just a pipe—it's an intelligent pipe that enables real-time, actionable data insights.

Fotios shares a couple of use cases showcasing how Kafka solves the problems that many banks are facing today. One of his customers transformed retail banking by using Kafka as the architectural base for storing all data permanently and indefinitely. This approach enables data in motion and a better user experience for frontend users while scrolling through their transaction history by eliminating the need to download old statements that have been offloaded in the cloud or a data lake. Kafka also provides the best of both worlds with increased scalability and strong message delivery guarantees that are comparable to queuing middleware like IBM MQ and TIBCO.

In addition to use cases, Tim and Fotios talk about deploying Kafka for banks within the cloud and drill into the profession of being a solutions engineer.

EPISODE LINKS
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

airhacks.fm podcast with adam bien
Debezium, Server, Engine, UI and the Outbox

airhacks.fm podcast with adam bien

Play Episode Listen Later Nov 28, 2021 67:11


An airhacks.fm conversation with Gunnar Morling (@gunnarmorling) about: debezium as analytics enablement, enriching events with quarkus, ksqlDB and PrestoDB and trino, cloud migrations with Debezium, embedded Debezium Engine, debezium server vs. Kafka Connect, Debezium Server with sink connectors, Apache Pulsar, Redis Streams are supporting Debezium Server, Debezium Server follows the microservice architecture, pluggable offset stores, JDBC offset store is Apache Iceberg connector, DB2, MySQL, PostgreSQL, MongoDB change streams, Cassandra, Vitess, Oracle, Microsoft SQL Server, scylladb is cassandra compatible and provides external debezium connector, debezium ui is written in React, incremental snapshots, netflix cdc system, DBLog: A Watermark Based Change-Data-Capture Framework, multi-threaded snapshots, internal data leakage and the Outbox pattern, debezium listens to the outbox pattern, OpenTracing integration and the outbox pattern, sending messages directly to transaction log with PostgreSQL, Quarkus outbox pattern extension, the transaction boundary topic, Gunnar Morling on twitter: @gunnarmorling and debezium.io

Streaming Audio: a Confluent podcast about Apache Kafka
Handling Message Errors and Dead Letter Queues in Apache Kafka ft. Jason Bell

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 16, 2021 37:41 Transcription Available


If you ever wondered what exactly dead letter queues (DLQs) are and how to use them, Jason Bell (Senior DataOps Engineer, Digitalis) has an answer for you. Dead letter queues are a feature of Kafka Connect that acts as the destination for failed messages due to errors like improper message deserialization and improper message formatting. Lots of Jason's work is around Kafka Connect and the Kafka Streams API, and in this episode, he explains the fundamentals of dead letter queues, how to use them, and the parameters around them.

For example, when deserializing an Avro message, the deserialization could fail if the message passed through is not Avro or is in a value that doesn't match the expected wire format, at which point the message will be rerouted into the dead letter queue for reprocessing. The Apache Kafka® topic will reprocess the message with the appropriate converter and send it back onto the sink. For a JSON error message, you'll need another JSON connector to process the message out of the dead letter queue before it can be sent back to the sink.

A dead letter queue is configurable for handling a deserialization exception or a producer exception. When deciding if this topic is necessary, consider if the messages are important and if there's a plan to read into and investigate why the error occurs. In some scenarios, it's important to handle the messages manually or have a manual process in place to handle error messages if reprocessing continues to fail. For example, payment messages should be dealt with in parallel for a better customer experience.

Jason also shares some key takeaways on the dead letter queue:
If the message is important, such as a payment, you need to deal with the message if it goes into the dead letter queue
To minimize message routing into the dead letter queue, it's important to ensure successful data serialization at the source
When implementing a dead letter queue, you need a process to consume the message and investigate the errors

EPISODE LINKS
Kafka Connect 101: Error Handling and Dead Letter Queues
Capacity Planning your Kafka Cluster
Tales from the Frontline of Apache Kafka DevOps ft. Jason Bell
Tweet: Morning morning (yes, I have tea)
Tweet: Kafka dead letter queues
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
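A hedged sketch of what those settings look like in practice: the errors.* properties below are the standard Kafka Connect error-handling options Jason refers to, while the worker URL, connector name, and connector class are placeholders invented for illustration.

```python
# Sketch: create a sink connector whose failed records go to a DLQ topic
# instead of killing the task, submitted via the Connect REST API.
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker REST endpoint

config = {
    "name": "orders-sink",                                   # hypothetical name
    "config": {
        "connector.class": "com.example.SomeSinkConnector",  # placeholder class
        "topics": "orders",
        # Tolerate bad records instead of failing the task.
        "errors.tolerance": "all",
        # Route failures to a dedicated DLQ topic for later investigation.
        "errors.deadletterqueue.topic.name": "dlq-orders",
        "errors.deadletterqueue.topic.replication.factor": "1",
        # Attach headers describing why each record failed.
        "errors.deadletterqueue.context.headers.enable": "true",
    },
}

resp = requests.post(f"{CONNECT_URL}/connectors", json=config)
resp.raise_for_status()
```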

Streaming Audio: a Confluent podcast about Apache Kafka
Powering Event-Driven Architectures on Microsoft Azure with Confluent

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 14, 2021 38:42 Transcription Available


When you order a pizza, what if you knew every step of the process from the moment it goes in the oven to being delivered to your doorstep? Event-driven architecture is a modern, data-driven approach that describes "events" (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases on leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures.

As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He's worked with retailers and those in the IoT space to help them adopt processes for inventory management with Confluent. Having a cloud-native, real-time architecture that can keep an accurate record of supply and demand is important in keeping up with the inventory and customer satisfaction. Israel has also worked with customers that use Confluent to integrate with Cosmos DB, Microsoft SQL Server, Azure Cognitive Search, and other integrations within the Azure ecosystem.

Another important use case is enabling real-time data accessibility in the public sector and healthcare while ensuring data security and regulatory compliance like HIPAA. Alicia has a background in AI, and she expresses the importance of moving away from the monolithic, centralized data warehouse to a more flexible and scalable architecture like Kafka. Building a data pipeline leveraging Kafka helps ensure data security and consistency with minimized risk.

The Confluent and Azure integration enables quick Kafka deployment with out-of-the-box solutions within the Kafka ecosystem. Confluent Schema Registry captures event streams with a consistent data structure, ksqlDB enables the development of real-time ETL pipelines, and Kafka Connect enables the streaming of data to multiple Azure services.

EPISODE LINKS
Microsoft Azure at Kafka Summit Americas
IzzyAcademy Kafka on Azure Learning Series by Alicia Moniz
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

The Cloud Pod
136: Take us to your Google Cloud Digital Leader

The Cloud Pod

Play Episode Listen Later Oct 4, 2021 36:58


On The Cloud Pod this week, the whole team definitely isn't completely exhausted. Meanwhile, Amazon releases MSK Connect, Google offers the Google Cloud Digital Leader certification, and DORA's 2021 State of DevOps report has arrived.  A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located.  This week's highlights

Streaming Audio: a Confluent podcast about Apache Kafka
Intro to Kafka Connect: Core Components and Architecture ft. Robin Moffatt

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Sep 28, 2021 31:18 Transcription Available


Kafka Connect is a streaming integration framework between Apache Kafka® and external systems, such as databases and cloud services. With expertise in ksqlDB and Kafka Connect, Robin Moffatt (Staff Developer Advocate, Confluent) helps and supports the developer community in understanding Kafka and its ecosystem. Recently, Robin authored a Kafka Connect 101 course that will help you understand the basic concepts of Kafka Connect, its key features, and how it works.

What's Kafka Connect, and how does it work with Kafka and brokers? Robin explains that Kafka Connect is a Kafka API that runs separately from the Kafka brokers, running in its own Java virtual machine (JVM) process known as the Kafka Connect worker. Kafka Connect is essential for streaming data from different sources into Kafka and from Kafka to various targets. With Connect, you don't have to write programs using Java and instead specify your pipeline using configuration.

As a pluggable framework, Kafka Connect has a broad set of more than 200 different connectors available on Confluent Hub, including but not limited to:
NoSQL and document stores (Elasticsearch, MongoDB, and Cassandra)
RDBMS (Oracle, SQL Server, DB2, PostgreSQL, and MySQL)
Cloud object stores (Amazon S3, Azure Blob Storage, and Google Cloud Storage)
Message queues (ActiveMQ, IBM MQ, and RabbitMQ)

Robin and Tim also discuss single message transforms (SMTs), as well as the distributed and standalone deployment modes of Kafka Connect. Tune in to learn more about Kafka Connect, and get a preview of the Kafka Connect 101 course.

EPISODE LINKS
Kafka Connect 101 course
Kafka Connect Fundamentals: What is Kafka Connect?
Meetup: From Zero to Hero with Kafka Connect
Confluent Hub: Discover Kafka connectors and more
12 Days of SMTs
Why Kafka Connect? ft. Robin Moffatt
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
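To illustrate the "pipeline as configuration" point, here is a hedged sketch of creating a JDBC source connector through the Connect worker's REST API. The connection details, table name, and topic prefix are placeholders, and exact property names can vary between connector versions, so treat this as the general shape rather than a copy-paste recipe.

```python
# Sketch: describe a source pipeline as configuration and hand it to Connect,
# instead of writing a Java program.
import requests

CONNECT_URL = "http://localhost:8083"  # assumed Connect worker REST endpoint

jdbc_source = {
    "name": "orders-jdbc-source",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
        "connection.url": "jdbc:postgresql://db.example.com:5432/shop",  # placeholder
        "connection.user": "connect",          # placeholder credentials
        "connection.password": "secret",
        "mode": "incrementing",                # pick up new rows by a growing id column
        "incrementing.column.name": "id",
        "table.whitelist": "orders",           # property name may differ by version
        "topic.prefix": "shop-",               # rows land in the topic shop-orders
    },
}

requests.post(f"{CONNECT_URL}/connectors", json=jdbc_source).raise_for_status()
```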

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.0 - Improving KRaft and an Overview of New Features

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Play 30 sec Highlight Listen Later Sep 21, 2021 15:17 Transcription Available


Apache Kafka® 3.0 is out! To spotlight major enhancements in this release, Tim Berglund (Apache Kafka Developer Advocate) provides a summary of what's new in the Kafka 3.0 release from Krakow, Poland, including API changes and improvements to the early-access Kafka Raft (KRaft).

KRaft is a built-in Kafka consensus mechanism that's replacing Apache ZooKeeper going forward. It is recommended to try out new KRaft features in a development environment, as KRaft is not advised for production yet. One of the major features in Kafka 3.0 is the efficiency for KRaft controllers and brokers to store, load, and replicate snapshots into a Kafka cluster for metadata topic partitioning. The Kafka controller is now responsible for generating a Kafka producer ID in both ZooKeeper and KRaft, easing the transition from ZooKeeper to KRaft on the Kafka 3.x version line. This update also moves us closer to the ZooKeeper-to-KRaft bridge release. Additionally, this release includes metadata improvements, exactly-once semantics, and KRaft reassignments.

To enable a stronger record delivery guarantee, Kafka producers now turn on idempotency by default, together with acknowledgment of delivery by all the replicas. This release also comprises enhancements to Kafka Connect task restarts, Kafka Streams timestamp-based synchronization, and more flexible configuration options for MirrorMaker2 (MM2). The first version of MirrorMaker has been deprecated, and MirrorMaker2 will be the focus for future developments.

Besides that, this release drops support for older message formats, V0 and V1, and deprecates Java 8 and Scala 2.12 across all components in Apache Kafka. The removal of Java 8 and Scala 2.12 support is anticipated to be completed in the future Apache Kafka 4.0 release.

Apache Kafka 3.0 is a major release and step forward for the Apache Kafka project!

EPISODE LINKS
Apache Kafka 3.0 release notes
Read the blog to learn more
Download Apache Kafka 3.0
Watch the video version of this podcast
Join the Confluent Community Slack

Streaming Audio: a Confluent podcast about Apache Kafka
Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 5, 2021 31:32 Transcription Available


Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. Joining us today in this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company's largest source of streaming data.

Almost like a post office, except instead of sending physical parcels, Mailchimp sends billions of emails per day. Monitoring the state of each email can provide visibility into the core business function, and it also returns information about the health of both internal and remote message transfer agents (MTAs). Finding a way to track those MTA systems in real time is pivotal to the success of the business.

Mailchimp is an early Apache Kafka® adopter that started using the technology in 2014, a time before ksqlDB, Kafka Connect, and Kafka Streams came into the picture. The stream processing applications that they were building faced many complexities and rough edges. As their use case evolved and scaled over time at Mailchimp, a large number of applications deviated from the initial implementation and design, so that different applications emerged that they had to maintain. To reduce cost and complexity and to standardize stream processing applications, adopting ksqlDB and Kafka Streams became the solution to their problems. This is what Mitch calls minimizing "software speciation" in their software: the idea that applications evolve into multiple systems to respond to failure-handling strategies, increased load, and the like. Using different scaling strategies and communication protocols creates system silos and can be challenging to maintain.

Replacing the existing architecture that supported point-to-point communication, the new Mailchimp architecture uses Kafka as its foundation with scalable custom functions, such as a reusable and highly functional user-defined function (UDF). The reporting capabilities have also evolved from Kafka Streams' interactive queries into enhanced queries with Elasticsearch.

Turning experiences into books, Mitch is also the author of O'Reilly's Mastering Kafka Streams and ksqlDB and the author and illustrator of Gently Down the Stream: A Gentle Introduction to Apache Kafka.

EPISODE LINKS
The Exciting Frontier of Custom ksql Functions (Mitch Seymour, Mailchimp) Kafka Summit London
Apache Kafka 101: Kafka Streams Course
ksqlDB UDFs and UDADs Made Easy
Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures
The Haiku Approach to Writing Software
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Collecting Data with a Custom SIEM System Built on Apache Kafka and Kafka Connect ft. Vitalii Rudenskyi

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Play 26 sec Highlight Listen Later Jul 27, 2021 25:14 Transcription Available


The best-informed business insights that support better decision-making begin with data collection, ahead of data processing and analytics. Enterprises nowadays are engulfed by data floods, with data sources ranging from cloud services and applications to thousands of internal servers. The massive volume of data that organizations must process presents data ingestion challenges for many large companies. In this episode, data security engineer Vitalii Rudenskyi discusses the decision to replace a vendor security information and event management (SIEM) system by developing a custom solution with Apache Kafka® and Kafka Connect for a better data collection strategy.

Having a data collection infrastructure layer is mission critical for Vitalii and the team in helping enterprises protect data and detect security events. Building on the base of Kafka, their custom SIEM infrastructure is configurable and designed to be able to ingest and analyze huge amounts of data, including personally identifiable information (PII) and healthcare data.

When it comes to collecting data, there are two fundamental choices: push or pull. But how about both? Vitalii shares that Kafka Connect API extensions are integral to data ingestion in Kafka. Three key components allow their SIEM system to collect and record data daily by pushing and pulling:
NettySource Connector: A connector developed to receive data from different network devices into Apache Kafka. It receives data using both the TCP and UDP transport protocols and can be adapted to receive anything from Syslog to SNMP and NetFlow.
PollableAPI Connector: A connector made to receive data from remote systems, pulling data from different remote APIs and services.
Transformations Library: Useful extensions to the existing out-of-the-box transformations, following a "tag and apply" approach that moves collected data into the right place in the right format.

Listen to learn more as Vitalii shares the importance of data collection and the building of a custom solution to address multi-source data management requirements.

EPISODE LINKS
Feed Your SIEM Smart with Kafka Connect
To Pull or to Push Your Data with Kafka Connect? That Is the Question.
Free Kafka Connect 101 Course
Syslog Source Connector for Confluent Platform
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Data-Driven Digitalization with Apache Kafka in the Food Industry at BAADER

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 29, 2021 27:53 Transcription Available


Coming out of university, Patrick Neff (Data Scientist, BAADER) was used to "perfect" examples of datasets. However, he soon realized that in the real world, data is often either unavailable or unstructured. This compelled him to learn more about collecting data, analyzing it in a smart and automatic way, and exploring Apache Kafka® as a core ecosystem while at BAADER, a global provider of food processing machines.

After Patrick began working with Apache Kafka in 2019, he developed several microservices with Kafka Streams and used Kafka Connect for various data analytics projects. Focused on the food value chain, Patrick's mission is to optimize processes specifically around transportation and processing. In consulting one customer, Patrick detected an area of improvement related to animal welfare, lost revenues, unnecessary costs, and carbon dioxide emissions. He also noticed that often machines are ready to send data into the cloud, but the correct presentation and/or analysis of the data is missing and thus the possibility of optimization. As a result:
Data is difficult to understand because of missing units
Data has not been analyzed so far
Comparison of machine/process performance for the same machine but different customers is missing

In response to this problem, he helped develop the Transport Manager. Based on data analytics results, the Transport Manager presents information like a truck's expected arrival time and its current poultry load. This leads to better planning, reduced transportation costs, and improved animal welfare. The Asset Manager is another solution that Patrick has been working on, and it presents IoT data in real time and in an understandable way to the customer. Both of these are data analytics projects that use machine learning.

Kafka topics store data, provide insight, and detect dependencies related to why trucks are stopping along the route, for example. Kafka is also a real-time platform, meaning that alerts can be sent directly when a certain event occurs using ksqlDB or Kafka Streams.

As a result of running Kafka on Confluent Cloud and creating a scalable data pipeline, the BAADER team is able to break data silos and produce live data from trucks via MQTT. They've even created an Android app for truck drivers, along with a desktop version that monitors the data inputted from a truck driver on the app in addition to other information, such as expected time of arrival and weather information—and the best part: All of it is done in real time.

EPISODE LINKS
Learn more about BAADER's data-in-motion use cases
Read about how BAADER uses Confluent Cloud
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Connecting Azure Cosmos DB with Apache Kafka - Better Together ft. Ryan CrawCour

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Apr 14, 2021 31:59 Transcription Available


When building solutions for customers in Microsoft Azure, it is not uncommon to come across customers who are deeply entrenched in the Apache Kafka® ecosystem and want to continue expanding within it. Thus, figuring out how to connect Azure first-party services to this ecosystem is of the utmost importance.

Ryan CrawCour is a Microsoft engineer who has been working on all things data and analytics for the past 10+ years, including building out services like Azure Cosmos DB, which is used by millions of people around the globe. More recently, Ryan has taken a customer-facing role where he gets to help customers build the best solutions possible using Microsoft Azure's cloud platform and development tools. In one case, Ryan helped a customer leverage their existing Kafka investments and persist event messages in a durable managed database system in Azure. They chose Azure Cosmos DB, a fully managed, distributed, modern NoSQL database service, as their preferred database, but the question remained as to how they would feed events from their Kafka infrastructure into Azure Cosmos DB, as well as how they could get changes from their database system back into their Kafka topics.

Although integration is in his blood, Ryan confesses that he is relatively new to the world of Kafka and has learned to adjust to what he finds in his customers' environments. Oftentimes this is Kafka, and for many good reasons, customers don't want to change this core part of their solution infrastructure. This has led him to embrace Kafka and the ecosystem around it, enabling him to better serve customers. He's been closely tracking the development and progress of Kafka Connect. To him, it is the natural step from Kafka as a messaging infrastructure to Kafka as a key pillar in an integration scenario. Kafka Connect can be thought of as a piece of middleware that can be used to connect a variety of systems to Kafka in a bidirectional manner. This means getting data from Kafka into your downstream systems, often databases, and also taking changes that occur in these systems and publishing them back to Kafka where other systems can then react.

One day, a customer asked him how to connect Azure Cosmos DB to Kafka. There wasn't a connector at the time, so he helped build two with the Confluent team: a sink connector, where data flows from Kafka topics into Azure Cosmos DB, as well as a source connector, where Azure Cosmos DB is the source of data pushing changes that occur in the database into Kafka topics.

EPISODE LINKS
Integrating Azure and Confluent: Ingesting Data to Azure Cosmos DB through Apache Kafka
Download the Azure Cosmos DB Connector (Source and Sink)
Join the Confluent Community
GitHub: Kafka Connect for Azure Cosmos DB
Watch the video version of this podcast
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Automated Cluster Operations in the Cloud ft. Rashmi Prabhu

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Apr 12, 2021 24:41 Transcription Available


If you've heard the term "clusters," then you might know it refers to Confluent components and features that we run in all three major cloud providers today, including an event streaming platform based on Apache Kafka®, ksqlDB, Kafka Connect, the Kafka API, data balancers, and Kafka API services. Rashmi Prabhu, a software engineer on the Control Plane team at Confluent, has the opportunity to help govern the data plane that comprises all these clusters and enables API-driven operations on them. But running operations on the cloud in a scaling organization can be time consuming, error prone, and tedious.

This episode addresses manual upgrades and rolling restarts of Confluent Cloud clusters during releases, fixes, experiments, and the like, and more importantly, the progress that's been made to switch from manual operations to an almost fully automated process. You'll get a sneak peek into the upcoming plans to make cluster operations a fully automated process using the Cluster Upgrader, a new microservice in Java built with Vertx. This service runs as part of the control plane and exposes an API to the user to submit their workflows and target a set of clusters. It performs state management on the workflow in the backend using Postgres.

So what's next? Looking forward, the selection phase will be improved to support policy-based deployment strategies that enable you to plan ahead and choose how you want to phase your deployments (e.g., first Azure followed by part of Amazon Web Services and then Google Cloud, or maybe Confluent internal clusters on all cloud providers followed by customer clusters on Google Cloud, Azure, and finally AWS)—the possibilities are endless! The process will become more flexible, more configurable, and more error tolerant so that you can take measured risks and experience a standardized way of operating Cloud. In addition, expanding operation automations to internal application deployments and other kinds of fleet management operations that fit the "Select/Apply/Monitor" paradigm is in the works.

EPISODE LINKS
Watch Project Metamorphosis videos
Learn about elastic scaling with Apache Kafka
Nick Carr: The Many Ways Cloud Computing Will Disrupt IT
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Engenharia de Dados [Cast]
Integração de Dados com Kafka Connect no Kubernetes [Strimzi Operator]

Engenharia de Dados [Cast]

Play Episode Listen Later Mar 19, 2021 52:14


In this episode we talk about one of the biggest difficulties data administrators, and even whole companies, face: connecting and integrating real-time data sources into Apache Kafka. Kafka Connect currently offers more than 180 connectors, most of them open source, so that with a single configuration file you can bring your data into Apache Kafka, which today is known as the real-time backbone or data lake of the company. We cover the most common scenarios for using it, and when not to use it, and share field experiences and recommendations for anyone who wants to integrate multiple data sources with their microservices in real time.

Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Streaming Audio: a Confluent podcast about Apache Kafka
Change Data Capture and Kafka Connect on Microsoft Azure ft. Abhishek Gupta

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jan 11, 2021 43:04 Transcription Available


What's it like being a Microsoft Azure Cloud advocate working with Apache Kafka® and change data capture (CDC) solutions? Abhishek Gupta would know! At Microsoft, Abhishek focuses his time on Kafka, databases, Kubernetes, and open source projects. His experience in a wide variety of roles ranging from engineering, consulting, and product management for developer-focused products has positioned him well for developer advocacy, where he is now.

Switching gears, Abhishek proceeds to break down the concept of CDC, starting off with some of the core concepts such as "commit logs." Abhishek then explains how CDC can turn data around when you compare it to the traditional way of querying the database to access data—you don't call the database; it calls you. He then goes on to discuss Debezium, which is an open source change data capture solution for Kafka. He also covers some of the Azure connectors on Confluent, Azure Data Explorer, and use cases powered by the Azure Data Explorer Sink connector for Kafka.

EPISODE LINKS
Streaming data from Confluent Cloud into Azure Data Explorer
Integrate Apache Kafka with Azure Data Explorer
Change Data Capture with Debezium ft. Gunnar Morling
Tales From The Frontline of Apache Kafka DevOps ft. Jason Bell
MySQL CDC Source (Debezium) Connector for Confluent Cloud
MySQL, Cassandra, BigQuery, and Streaming Analytics with Joy Gao
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
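As a rough illustration of the "the database calls you" idea, here is a hedged sketch of consuming Debezium-style change events with Confluent's Python client. The broker, topic name, and group id are assumptions, and the op/before/after fields reflect Debezium's usual event envelope assuming JSON serialization without embedded schemas; your connector configuration may differ.

```python
# Sketch: react to row-level change events emitted by a CDC connector,
# instead of polling the source database.
import json

from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed
    "group.id": "cdc-demo",                 # assumed
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["inventory.public.orders"])  # hypothetical Debezium topic

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    op = event.get("op")  # c=create, u=update, d=delete, r=snapshot read
    before, after = event.get("before"), event.get("after")
    if op in ("c", "r"):
        print("row added:", after)
    elif op == "u":
        print("row changed:", before, "->", after)
    elif op == "d":
        print("row deleted:", before)
```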

Streaming Audio: a Confluent podcast about Apache Kafka
Ask Confluent #18: The Toughest Questions ft. Anna McDonald

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 21, 2020 33:46 Transcription Available


It's the first work-from-home episode of Ask Confluent, where Gwen Shapira (Core Kafka Engineering Leader, Confluent) virtually sits down with Apache Kafka® expert Anna McDonald (Staff Technical Account Manager, Confluent) to answer questions from Twitter. Find out Anna's favorite Kafka Improvement Proposal (KIP), which will start to use racially neutral terms in the Kafka community and in our code base, as well as answers to the following questions:
If you could pick any one KIP from the backlog that hasn't yet been implemented and have it immediately available, which one would you pick?
Are we able to arrive at any formula for identifying the consumer/producer throughput rate in Kafka with the given hardware specifications (CPU, RAM, network, and disk)?
Does incremental cooperative rebalancing also work for general Kafka consumers in addition to Kafka Connect rebalancing?

They also answer how to determine throughput and achieve your desired SLA by using partitions.

EPISODE LINKS
Watch Ask Confluent #18: The Toughest Questions ft. Anna McDonald
From Eager to Smarter in Apache Kafka Consumer Rebalances
Streaming Heterogeneous Databases with Kafka Connect – The Easy Way
Keynote: Tim Berglund, Confluent | Closing Keynote Presentation | Kafka Summit 2020
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Top 6 Things to Know About Apache Kafka ft. Gwen Shapira

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Play 35 sec Highlight Listen Later Sep 15, 2020 47:27 Transcription Available


This year, Confluent turns six! In honor of this milestone, we are taking a very special moment to celebrate with Gwen Shapira by highlighting the top six things everyone should know about Apache Kafka®:
Clients have metrics
Bug fix releases/Kafka Improvement Proposals (KIPs)
Idempotent producers and how they work
Kafka Connect is part of Kafka, and Single Message Transforms (SMTs) are worth not missing out on
Cooperative rebalancing
Generating sequence numbers and how Kafka changes the way you think

Listen as Tim and Gwen talk through the importance of Kafka Connect, cooperative rebalancing protocols, and the promise (and warning) that your data architecture will never be the same. As Gwen puts it, "Kafka gives you the options, but it's up to you how you use it."

EPISODE LINKS
KIP-415: Incremental Cooperative Rebalancing in Kafka Connect
Why Kafka Connect? ft. Robin Moffatt
Confluent Hub
Creativity Inc
Fifth Discipline
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
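The idempotent-producer item above, in client-config form: a hedged sketch using Confluent's Python client. The broker address, topic, and payload are assumptions; the point is that enable.idempotence makes retries safe by having the broker de-duplicate sends using the producer id and per-partition sequence numbers.

```python
# Sketch: an idempotent producer, so client retries cannot create duplicates
# in the partition log (within the producer session).
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",  # assumed
    "enable.idempotence": True,             # implies acks=all and safe retries
})


def on_delivery(err, msg):
    # Even if the client retried internally, the record lands once.
    print("delivery failed:" if err else "delivered at offset:", err or msg.offset())


producer.produce("payments", key="order-42", value=b"charged", callback=on_delivery)
producer.flush()
```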

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 2.6 - Overview of Latest Features, Updates, and KIPs

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 6, 2020 10:37 Transcription Available


Apache Kafka® 2.6 is out! This release includes progress toward removing the ZooKeeper dependency, adding client quota APIs to the admin client, exposing disk read and write metrics, and adding support for Java 14. In addition, there are improvements to Kafka Connect, such as allowing source connectors to set topic-specific settings for new topics and expanding Connect worker internal topic settings. Kafka 2.6 also augments metrics for Kafka Streams and adds emit-on-change support for Kafka Streams, as well as other updates.

EPISODE LINKS
Watch the video version of this podcast
Read about what's new in Apache Kafka 2.6
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Modernizing Inventory Management Technology ft. Sina Sojoodi and Rohit Kelapure

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jul 20, 2020 41:32


Inventory management systems are crucial for reducing real-time inventory data drift, improving customer experience, and minimizing out-of-stock events. Apache Kafka®'s real-time data technology provides seamless inventory tracking at scale, saving billions of dollars in the supply chain and making modernized data architectures more important to retailers now than ever. In this episode, we'll discuss how Apache Kafka allows the implementation of stateful event streaming architectures on a cloud-native platform for application and architecture modernization.

Sina Sojoodi (Global CTO, Data and Architecture, VMware) and Rohit Kelapure (Principal Advisor, VMware) will discuss data modeling, as well as the architecture design needed to achieve data consistency and correctness while handling the scale and resilience needs of a major retailer in near real time. The implemented solution utilizes Spring Boot, Kafka Streams, and Apache Cassandra, and they explain the process of using several services to write to Cassandra instead of trying to use Kafka as a distributed log for enforcing consistency.

EPISODE LINKS
How to Run Kafka Streams on Kubernetes ft. Viktor Gamov
Machine Learning with Kafka Streams, Kafka Connect, and ksqlDB ft. Kai Waehner
Understand What's Flying Above You with Kafka Streams ft. Neil Buesing
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage*

Streaming Audio: a Confluent podcast about Apache Kafka
Connecting Snowflake and Apache Kafka ft. Isaac Kunen

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later May 20, 2020 31:46


Isaac Kunen (Senior Product Manager, Snowflake) and Tim Berglund (Senior Director of Developer Advocacy, Confluent) practice social distancing by meeting up in the virtual studio to discuss all things Apache Kafka® and Kafka Connect at Snowflake. Isaac shares what Snowflake is, what it accomplishes, and his experience with developing connectors. The pair discuss the Snowflake Kafka Connector and some of the unique challenges and adaptations it has had to undergo, as well as the interesting history behind the connector. In addition, Isaac talks about how they're taking on event streaming at Snowflake by implementing the Kafka connector and what he hopes to see in the future with Kafka releases.

EPISODE LINKS
Download the Snowflake Kafka Connector
Paving a Data Highway with Kafka Connect ft. Liz Bennett
Making Apache Kafka Connectors for the Cloud ft. Magesh Nandakumar
Machine Learning with Kafka Streams, Kafka Connect, and ksqlDB ft. Kai Waehner
Connecting to Apache Kafka with Neo4j
Contributing to Open Source with the Kafka Connect MongoDB Sink ft. Hans-Peter Grahsl
Connecting Apache Cassandra to Apache Kafka with Jeff Carpenter from DataStax
Why Kafka Connect? ft. Robin Moffatt
Join the Confluent Community Slack
Learn more with Kafka tutorials, resources, and guides at Confluent Developer

Streaming Audio: a Confluent podcast about Apache Kafka
IoT Integration and Real-Time Data Correlation with Kafka Connect and Kafka Streams ft. Kai Waehner

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Apr 29, 2020 40:55


There are two primary industries within the Internet of Things (IoT): industrial IoT (IIoT) and consumer IoT (CIoT), both of which can benefit from the Apache Kafka® ecosystem, including Kafka Streams and Kafka Connect. Kai Waehner, who works in the advanced tech group at Confluent with customers, defining their needs, use cases, and architecture, shares example use cases where he's seen IoT integration in action. He specifically focuses on Walmart and its real-time customer integration using the Walmart app. Kafka Streams helps fine-tune the Walmart app, optimizing the user experience, offering a seamless omni-channel experience, and contributing to business success. Other topics discussed in today's episode include integration from various legacy and modern IoT data sources, latency sensitivity, machine learning for quality control and predictive maintenance, and when event streaming can be more useful than traditional databases or data lakes.

EPISODE LINKS
Apache Kafka 2.5 – Overview of Latest Features, Updates, and KIPs
Machine Learning with Kafka Streams, Kafka Connect, and ksqlDB ft. Kai Waehner
Blog posts by Kai Waehner
Processing IoT Data from End to End with MQTT and Apache Kafka®
End-to-End Integration: IoT Edge to Confluent Cloud
Apache Kafka is the New Black at the Edge in Industrial IoT, Logistics, and Retailing
Apache Kafka, KSQL, and Apache PLC4X for IIoT Data Integration and Processing
Streaming Machine Learning at Scale from 100,000 IoT Devices with HiveMQ, Apache Kafka, and TensorFlow
Event-Model Serving: Stream Processing vs. RPC with Kafka and TensorFlow
Join the Confluent Community Slack
Learn about Kafka at Confluent Developer

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 2.5 – Overview of Latest Features, Updates, and KIPs

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Apr 16, 2020 10:28


Apache Kafka® 2.5 is here, and we’ve got some Kafka Improvement Proposals (KIPs) to discuss! Tim Berglund (Senior Director of Developer Advocacy, Confluent) shares improvements and changes to over 10 KIPs all within the realm of Core Kafka, Kafka Connect, and Kafka Streams, including foundational improvements to exactly once semantics, the ability to track a connector’s active topics, and adding a new co-group operator to the Streams DSL.
EPISODE LINKS
- Check out the Apache Kafka 2.5 release notes
- Read about what’s new in Apache Kafka 2.5
- Watch the video version of this podcast
- Join the Confluent Community Slack
- Learn about Kafka at Confluent Developer
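To make the Streams change concrete: the new co-group operator (KIP-150) lets several grouped streams share one aggregation and one state store. A minimal sketch follows, with made-up topic names, a toy String accumulator, and the assumption that default serdes are configured on the application.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KGroupedStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class CustomerActivity {

    // Two grouped streams are co-grouped into a single KTable / state store,
    // instead of aggregating each separately and joining the results.
    public static KTable<String, String> cogroupedActivity(StreamsBuilder builder) {
        KGroupedStream<String, String> pageViews = builder
                .stream("page-views", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()));

        KGroupedStream<String, String> purchases = builder
                .stream("purchases", Consumed.with(Serdes.String(), Serdes.String()))
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()));

        return pageViews
                .cogroup((key, view, agg) -> agg + " view:" + view)
                .cogroup(purchases, (key, purchase, agg) -> agg + " purchase:" + purchase)
                .aggregate(() -> "", Materialized.as("customer-activity"));
    }
}
```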

Streaming Audio: a Confluent podcast about Apache Kafka
Paving a Data Highway with Kafka Connect ft. Liz Bennett

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Feb 12, 2020 46:01


The Stitch Fix team benefits from a centralized data integration platform at scale using Apache Kafka and Kafka Connect. Liz Bennett (Software Engineer, Confluent) got to play a key role building their real-time data streaming infrastructure. Liz explains how she implemented Apache Kafka® at Stitch Fix, her previous employer, where she successfully introduced Kafka first through a Kafka hackathon and then by pitching it to the management team. Her first piece of advice? Give it a cool name like The Data Highway. As part of the process, she prepared a detailed document proposing a Kafka roadmap, which eventually landed her in a meeting with management on how they would successfully integrate the product (spoiler: it worked!). If you’re curious about the pros and cons of Kafka Connect, the self-service aspect, how it does with scaling, metrics, helping data scientists, and more, this is your episode! You’ll also get to hear what Liz thinks her biggest win with Kafka has been.
EPISODE LINKS
- Putting the Power of Apache Kafka into the Hands of Data Scientists
- Join the Confluent Community Slack
- Get 30% off Kafka Summit London registration with the code KSL20Audio

Streaming Audio: a Confluent podcast about Apache Kafka
Streaming Call of Duty at Activision with Apache Kafka ft. Yaroslav Tkachenko

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jan 27, 2020 46:43


Call of Duty: Modern Warfare is the most played Call of Duty multiplayer of this console generation with over $1 billion in sales and almost 300 million multiplayer matches. Behind the scenes, Yaroslav Tkachenko (Software Engineer and Architect, Activision) gets to be on the team behind it all, architecting, designing, and implementing their next-generation event streaming platform, including a large-scale, near-real-time streaming data pipeline using Kafka Streams and Kafka Connect. Learn about how his team ingests huge amounts of data, what the backend of their massive distributed system looks like, and the automated services involved for collecting data from each pipeline.
EPISODE LINKS
- Building a Scalable and Extendable Data Pipeline for Call of Duty Games
- Deploying Kafka Connect Connectors
- Join the Confluent Community Slack
- Get 30% off Kafka Summit London registration with the code KSL20Audio

The InfoQ Podcast
Gunnar Morling on Change Data Capture and Debezium

The InfoQ Podcast

Play Episode Listen Later Jan 17, 2020 29:15


Today, on The InfoQ Podcast, Wes Reisz talks with Gunnar Morling. Gunnar is a software engineer at Red Hat and leads the Debezium project. Debezium is an open-source distributed platform for change data capture (CDC). On the show, the two discuss the project and many of its use cases. Additional topics covered on the podcast include bootstrapping, configuration, challenges, debugging, and operational modes. The show wraps with long-term strategic goals for the project.
Why listen to this podcast:
- CDC is a set of software design patterns used to react to changing data in a data store. Used for things like internal changelogs, integrations, replication, and event streaming, CDC can be implemented with queries or against the database transaction log. Debezium leverages the transaction log to implement CDC and is extremely performant.
- Debezium has mature source connectors for MySQL, SQL Server, and MongoDB. In addition, there are incubating connectors for Cassandra, Oracle, and DB2. Community sink connectors have been created for Elasticsearch.
- In a standard deployment, Debezium leverages a Kafka cluster by deploying connectors into Kafka Connect. The connectors establish a connection to the source database and then write changes to a Kafka topic.
- Debezium can also run in embedded mode, where the Java library is imported into your own project and change events are delivered through callbacks. This library approach allows Debezium to feed other tools such as AWS Kinesis or Azure Event Hubs. Going forward, there are plans for a ready-made Debezium runtime.
- Out of the box, Debezium maps each table to its own Kafka topic. This default exposes the internal table structure to the outside. One way to avoid exposing database internals is the Outbox Pattern: a separate outbox table acts as the source, inserts into your normal business tables also write to the outbox, and change events are published to Kafka from that outbox table.
More on this: quick-scan our curated show notes on InfoQ: https://bit.ly/3737GZB
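To make embedded mode concrete, the sketch below runs a Debezium connector inside an ordinary Java process and receives change events through a callback, which is the hook you would use to forward events to Kinesis or Event Hubs. It is only an outline: the connection details are placeholders, and it uses the DebeziumEngine API with 1.x-era MySQL property names, which may differ from the release discussed in the episode.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import io.debezium.engine.ChangeEvent;
import io.debezium.engine.DebeziumEngine;
import io.debezium.engine.format.Json;

public class EmbeddedCdc {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("name", "embedded-cdc");
        props.setProperty("connector.class", "io.debezium.connector.mysql.MySqlConnector");
        props.setProperty("database.hostname", "localhost");
        props.setProperty("database.port", "3306");
        props.setProperty("database.user", "debezium");
        props.setProperty("database.password", "secret");
        props.setProperty("database.server.id", "5400");
        props.setProperty("database.server.name", "inventory-db");
        // Embedded mode has no Kafka Connect runtime, so offsets and schema history
        // must be stored locally by the engine itself.
        props.setProperty("offset.storage", "org.apache.kafka.connect.storage.FileOffsetBackingStore");
        props.setProperty("offset.storage.file.filename", "/tmp/dbz-offsets.dat");
        props.setProperty("database.history", "io.debezium.relational.history.FileDatabaseHistory");
        props.setProperty("database.history.file.filename", "/tmp/dbz-history.dat");

        DebeziumEngine<ChangeEvent<String, String>> engine = DebeziumEngine.create(Json.class)
                .using(props)
                .notifying(record -> {
                    // Called for every committed change event; forward it to any sink.
                    System.out.println(record.key() + " -> " + record.value());
                })
                .build();

        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.execute(engine); // DebeziumEngine implements Runnable
    }
}
```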

The Hoot from Humio
The Hoot - Episode 14 - Humio and Confluent with Viktor Gamov

The Hoot from Humio

Play Episode Listen Later Dec 26, 2019 34:37


Viktor Gamov is a Developer Advocate at Confluent, the company that makes an event streaming platform based on Apache Kafka. John and Viktor talk about the life of a developer advocate, and about the history of Confluent and Kafka.
“The cool thing is that Kafka actually enables a lot of modern businesses that you didn't think that you'll need until you have it — things like Uber and Uber Eats. The technology enabled them to do the things that they do right now, and specifically stream processing.”
Viktor Gamov, Developer Advocate, Confluent
Listen to this week's podcast to learn more about how Humio and Confluent make managing streaming data from distributed systems easier and more efficient for ITOps, DevOps, and Security professionals. Viktor describes how Kafka Connect works with your Humio data. They wrap up their conversation by discussing what organizations need to plan for in the coming year, and how to be better prepared.

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 2.4 – Overview of Latest Features, Updates, and KIPs

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 16, 2019 15:04


Apache Kafka 2.4 includes new Kafka Core developments and improvements to Kafka Streams and Kafka Connect, including MirrorMaker 2.0, RocksDB metrics, and more.
EPISODE LINKS
- Read about what's new in Apache Kafka 2.4
- Check out the Apache Kafka 2.4 release notes
- Watch the video version of this podcast

Streaming Audio: a Confluent podcast about Apache Kafka
Machine Learning with Kafka Streams, Kafka Connect, and ksqlDB ft. Kai Waehner

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Dec 4, 2019 38:30


In this episode, Kai Waehner (Senior Systems Engineer, Confluent) defines machine learning in depth, describes the architecture of his dream machine learning pipeline, explains its relevance to Apache Kafka®, Kafka Connect, ksqlDB, and the related ecosystem, and discusses the importance of security and fraud detection. He also covers Kafka use cases, including an example of how Kafka Streams and TensorFlow provide predictive analytics for connected cars.
EPISODE LINKS
- How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka
- Learn about Apache Kafka
- Learn about Kafka Connect
- Learn about ksqlDB, the successor to KSQL
- Join the Confluent Community Slack
- Fully managed Apache Kafka as a service! Try free.
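For a flavor of the pattern Kai describes, where a trained model is embedded directly in the stream processor, here is a minimal, hypothetical Kafka Streams sketch. The ScoringModel interface, topic names, and 0.8 threshold are stand-ins, not anything from the episode or from TensorFlow's actual API.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class FraudScoring {

    // Hypothetical stand-in for a model loaded from TensorFlow, H2O, etc.
    interface ScoringModel {
        double score(String payment);
    }

    // Every event is scored in flight; suspicious ones are routed to their own topic
    // for downstream investigation.
    public static StreamsBuilder buildTopology(ScoringModel model) {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("payments", Consumed.with(Serdes.String(), Serdes.String()))
                .filter((key, payment) -> model.score(payment) > 0.8)
                .to("suspected-fraud", Produced.with(Serdes.String(), Serdes.String()));
        return builder;
    }
}
```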

Streaming Audio: a Confluent podcast about Apache Kafka
Real-Time Payments with Clojure and Apache Kafka ft. Bobby Calderwood

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 27, 2019 58:00


Streamlining banking technology to help smaller banks and credit unions thrive among financial giants is top of mind for Bobby Calderwood (Founder, Evident Systems), who started out in programming, transitioned to banking, and recently launched Evident Real-Time Payments. Payments leverages Confluent Cloud to help banks of all sizes move from a traditionally batch-oriented, bankers’-hours operational mode to real-time banking services. This is achieved through Apache Kafka® and the Kafka Streams and Kafka Connect APIs with Clojure, using functional programming paradigms like transducers. Bobby also shares about his efforts to help financial services companies build their next-generation platforms on top of streaming events, including interesting use cases, addressing hard problems that come up in payments, and identifying solutions that make event streaming technology easy to use within established banking structures.
EPISODE LINKS
- Toward a Functional Programming Analogy for Microservices
- Event Modeling: Designing Modern Information Systems
- Finovate Fall/Evident Systems
- The REPL Podcast: 30: Bobby Calderwood on Kafka and Fintech
- Clojure Transducers
- Rich Hickey’s Twitter
- David Nolen's Twitter
- Stuart Halloway’s Twitter
- Chris Redinger’s Twitter
- Tim Ewald’s LinkedIn
- Join the Confluent Community Slack
- Fully managed Apache Kafka as a service! Try free.

Streaming Audio: a Confluent podcast about Apache Kafka
ETL and Event Streaming Explained ft. Stewart Bryson

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Nov 6, 2019 49:42


Migrating from traditional ETL tools to an event streaming platform is a process that Stewart Bryson (CEO and founder, Red Pill Analytics) is no stranger to. In this episode, he dispels misconceptions around what “streaming ETL” means, and explains why event streaming and event-driven architectures compel us to rethink old approaches:
- Not all data is corporate data anymore
- Not all data is relational data anymore
- The cost of storing data is now negligible
Supporting modern, distributed event streaming platforms, and the shift of focus from on-premises to the cloud, introduces new use cases that focus primarily on building new systems and rebuilding existing ones. From Kafka Connect and stack applications to the importance of tables, events, and logs, Stewart also discusses Gradle and how it’s being used at Red Pill Analytics.
EPISODE LINKS
- Deploying Kafka Streams and KSQL with Gradle – Part 1: Overview and Motivation
- Deploying Kafka Streams and KSQL with Gradle – Part 2: Managing KSQL Implementations
- Deploying Kafka Streams and KSQL with Gradle – Part 3: KSQL User-Defined Functions and Kafka Streams
- Join the Confluent Community Slack
- Fully managed Apache Kafka as a service! Try free.

Streaming Audio: a Confluent podcast about Apache Kafka
Kafka Screams: The Scariest JIRAs and How To Survive Them ft. Anna McDonald

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 30, 2019 46:32


In today's spooktacular episode of Streaming Audio, Anna McDonald (Technical Account Manager, Confluent) discusses six of the scariest Apache Kafka® JIRAs. Starting with KAFKA-6431: Lock Contention in Purgatory, Anna breaks down what purgatory is and how it’s not something to fear or avoid. Next, she dives into KAFKA-8522: Tombstones Can Survive Forever, where she explains tombstones, compacted topics, null values, and log compaction. Not to mention there’s KAFKA-6880: Zombie Replicas Must Be Fenced, which sounds like the spookiest of them all. KAFKA-8233, which focuses on the new TestTopology mummy (wrapper) class, provides one option for setting the topology through your Kafka Screams Streams application. As Anna puts it, "This opens doors for people to build better, more resilient, and more interesting topologies." To close out the episode, Anna talks about two more JIRAs: KAFKA-6738, which focuses on the Kafka Connect dead letter queue as a means of handling bad data, and the terrifying KAFKA-5925 on the addition of an executioner API.
EPISODE LINKS
- KAFKA-6431: Lock Contention in Purgatory
- KAFKA-8522: Tombstones Can Survive Forever
- KAFKA-6880: Zombie Replicas Must Be Fenced
- KAFKA-8233: Helper Classes to Make it Simpler to Write Test Logic with TopologyTestDriver
- KAFKA-6738: Kafka Connect Handling of Bad Data
- KAFKA-5925: Adding Records Deletion Operation to the New Admin Client API
- Streaming Apps and Poison Pills: Handle the Unexpected with Kafka Streams
- Data Modeling for Apache Kafka – Streams, Topics & More with Dani Traphagen
- Distributed Systems Engineering with Apache Kafka ft. Jason Gustafson
- Kafka Streams Topology Visualizer
- Follow Anna McDonald on Twitter
- Follow Mitch Henderson on Twitter
- Join the Confluent Community Slack
- Fully managed Apache Kafka as a service! Try free.
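The dead letter queue from KAFKA-6738 is worth seeing in config form. The error-handling properties below are the ones introduced for sink connectors by that work (KIP-298); the connector class and topic names are just placeholders for whatever sink you run.

```properties
# Appended to an otherwise ordinary sink connector config (placeholder class and topic).
connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
topics=orders

# Keep the connector running past bad records instead of failing the task...
errors.tolerance=all
# ...and route the poison pills to a dedicated topic, with failure context in headers.
errors.deadletterqueue.topic.name=dlq-orders
errors.deadletterqueue.topic.replication.factor=1
errors.deadletterqueue.context.headers.enable=true
errors.log.enable=true
```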

airhacks.fm podcast with adam bien
DBs-ium, CDC and Streaming

airhacks.fm podcast with adam bien

Play Episode Listen Later Oct 13, 2019 71:14


An airhacks.fm conversation with Gunnar Morling (@gunnarmorling) about: The first Debezium commit, Randall Hauch, DBs-ium, Java Content Repository (JCR) / ModeShape, exploring Change Data Capture (CDC), how Debezium started, the MySQL binlog, logical decoding in Postgres, Oracle Advanced Queuing, update triggers, Java Message Service (JMS), there is no read detection, switching the current user at the JDBC connection for audit purposes, helping Debezium with an additional metadata table, using Kafka Streams to join the metadata and the payload, installing the logical decoding plugins into PostgreSQL, the logical decoding plugin exposes the data from the write-ahead log, decoding into protocol buffers with decoderbufs, in cloud environments like e.g. Amazon RDS you are not allowed to install any plugins, wal2json is verbose but comes preinstalled on RDS, pgoutput is responsible for the actual decoding of the events, Debezium only sees committed transactions, Debezium is mainly written in Java, decoderbufs was written by the community and included in Debezium, Debezium communicates with Postgres via the JDBC / Postgres API, the pgoutput format is converted into the Kafka Connect source record format, Kafka Connect is a framework for running connectors, Kafka Connect comes with sink and source connectors, Kafka Connect comes with connector-specific converters like e.g. StringConverter, Converters are not Serializers, Debezium ships as a Kafka Connect plugin, Kafka Connect can run as a standalone process, running Debezium in embedded mode, JPA cache invalidation with Debezium, converting Debezium events into CDI events, converting database changes to WebSockets events, database polling vs. the Debezium approach, DB2 will support Debezium, Oracle support is "on the horizon", Oracle LogMiner, Oracle XStream, Debezium supports Microsoft SQL Server (starting with the Enterprise license), Apache Pulsar comes with Debezium out-of-the-box, Pulsar IO, running Debezium as a standalone service with outbound APIs, MongoDB supports the "Debezium Change Event Format", Kafka sink connectors are easy to implement, Debezium embedded mode and offsets, an embedded connector has to remember the offset, an offset API is available for embedded Debezium connectors, combining CDC with Kafka Streams, Quarkus supports Kafka Streams and Reactive Messaging, Quarkus and Kafka Streams, Quarkus supports Kafka Streams in dev mode, replacing Hibernate Envers with Debezium, Messaging vs. Streaming or JMS vs. Kafka, Kafka is a database, possible future Debezium features, Cassandra support is coming, the Outbox pattern is going to be better supported, transactional event grouping, a dedicated topic for transaction demarcations, commercial support for Debezium, Debezium exposes JMX metrics, Five Advantages of Log-Based Change Data Capture, Reliable Microservices Data Exchange With the Outbox Pattern, Automating Cache Invalidation With Change Data Capture. Gunnar Morling on Twitter: @gunnarmorling and GitHub: https://github.com/gunnarmorling. Gunnar's blog: https://morling.dev/.
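A Postgres source configured for the pgoutput plugin mentioned above would be registered with Kafka Connect roughly as follows. Treat it as a sketch: hostnames, credentials, slot, and table names are placeholders, and the property names are the 1.x-era ones (newer releases, for example, replace database.server.name with topic.prefix).

```json
{
  "name": "inventory-pg-cdc",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "plugin.name": "pgoutput",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "secret",
    "database.dbname": "inventory",
    "database.server.name": "inventory",
    "slot.name": "debezium_slot",
    "table.include.list": "public.orders,public.customers"
  }
}
```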

Streaming Audio: a Confluent podcast about Apache Kafka
MySQL, Cassandra, BigQuery, and Streaming Analytics with Joy Gao

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 2, 2019 43:59


Joy Gao chats with Tim Berglund about all things related to streaming ETL—how it works, its benefits, and the implementation and operational challenges involved. She describes the streaming ETL architecture at WePay from MySQL/Cassandra to BigQuery using Apache Kafka®, Kafka Connect, and Debezium.
EPISODE LINKS
- Cassandra Source Connector Documentation
- Streaming Databases in Real Time with MySQL, Debezium, and Kafka
- Streaming Cassandra at WePay
- Change Data Capture with Debezium ft. Gunnar Morling
- Join the Confluent Community Slack
- Fully managed Apache Kafka as a service! Try free.

Streaming Audio: a Confluent podcast about Apache Kafka
Contributing to Open Source with the Kafka Connect MongoDB Sink ft. Hans-Peter Grahsl

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 21, 2019 50:22


Sink and source connectors are important for getting data in and out of Apache Kafka®. Tim Berglund invites Hans-Peter Grahsl (Technical Trainer and Software Engineer, Netconomy Software & Consulting GmbH) to share about his involvement in the Apache Kafka project, spanning from several conference contributions all the way to his open source community sink connector for MongoDB, now part of the official MongoDB Kafka connector code base. Join us in this episode to learn what it’s like to be the only maintainer of a side project that’s been deployed into production by several companies!
EPISODE LINKS
- MongoDB Connector for Apache Kafka
- Getting Started with the MongoDB Connector for Apache Kafka and MongoDB
- Kafka Connect MongoDB Sink Community Connector
- Kafka Connect MongoDB Sink Community Connector (GitHub)
- Adventures of Lucy the Havapoo
- Join the Confluent Community Slack
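For a sense of what deploying such a sink looks like, a configuration for the official MongoDB sink connector (which absorbed the community connector) might be registered roughly like this. It is a sketch only: the URI, database, collection, topic, and converters are placeholder assumptions, and property names differ slightly between the community and official connectors.

```json
{
  "name": "mongodb-sink",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "topics": "orders",
    "connection.uri": "mongodb://localhost:27017",
    "database": "shop",
    "collection": "orders",
    "key.converter": "org.apache.kafka.connect.storage.StringConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false"
  }
}
```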

Streaming Audio: a Confluent podcast about Apache Kafka
Deploying Confluent Platform, from Zero to Hero ft. Mitch Henderson

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 18, 2019 32:30


Mitch Henderson (Technical Account Manager, Confluent) explains how to plan and deploy your first application running on Confluent Platform. He covers critical factors to consider, like the tools and skills you should have on hand, and how to make decisions about deployment solutions. Mitch also walks you through how to go about setting up monitoring and testing, the marks of success, and what to do after your first project launches successfully.

Drill to Detail
Drill to Detail Ep.68 ‘Confluent, Event-First Thinking and Streaming Real-Time Analytics' With Special Guests Robin Moffatt and Ricardo Ferreira and Special Host Stewart Bryson

Drill to Detail

Play Episode Listen Later Jun 17, 2019 40:46


In this special edition of the Drill to Detail Podcast hosted by Stewart Bryson, CEO and Co-Founder of Red Pill Analytics, he is joined by Robin Moffatt and Ricardo Ferreira, Developer Advocates at Confluent, to talk about Apache Kafka and Confluent, event-first thinking and streaming real-time analytics.
- Confluent Download: https://www.confluent.io/download/
- Demo: https://github.com/confluentinc/cp-demo/
- Slack group: http://cnfl.io/slack
- Mailing list: https://groups.google.com/forum/#!forum/confluent-platform
- From Zero to Hero with Kafka Connect: http://rmoff.dev/ksldn19l-kafka-connect-slides
- No More Silos: Integrating Databases and Apache Kafka: http://rmoff.dev/ksny19-no-more-silos
- The Changing Face of ETL: Event-Driven Architectures for Data Engineers: http://rmoff.dev/changing-face-of-etl

Streaming Audio: a Confluent podcast about Apache Kafka
Why Kafka Connect? ft. Robin Moffatt

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 12, 2019 46:42


In this episode, Tim talks to Robin Moffatt about what Kafka Connect is and why you should almost certainly use it if you're working with Apache Kafka®. Whether you're building database offload pipelines to Amazon S3, ingesting events from external datastores to drive your applications, or exposing messages from your microservices for audit and analysis, Kafka Connect is for you. Tim and Robin cover the motivating factors for Kafka Connect, why people end up reinventing the wheel when they're not aware of it, and Kafka Connect's capabilities, including scalability and resilience. They also talk about the importance of schemas in Kafka pipelines and programs, and how the Confluent Schema Registry can help.
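The "database offload to Amazon S3" pipeline Robin mentions is typically just a sink connector configuration. A hedged sketch using the Confluent S3 sink is below; the bucket, region, topic, flush size, and format choices are placeholder assumptions. Pairing it with Avro converters and Schema Registry is how the schema discussion in the episode usually plays out in practice.

```json
{
  "name": "s3-offload",
  "config": {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "db.public.orders",
    "s3.bucket.name": "my-offload-bucket",
    "s3.region": "us-east-1",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000"
  }
}
```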

Streaming Audio: a Confluent podcast about Apache Kafka
Ask Confluent #10: Cooperative Rebalances for Kafka Connect ft. Konstantine Karantasis

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Feb 20, 2019 21:29


Want to know how Kafka Connect distributes tasks to workers? Always thought Connect rebalances could be improved? In this episode of Ask Confluent, Gwen Shapira speaks with Konstantine Karantasis, software engineer at Confluent, about the latest improvements to Kafka Connect and how to run the Confluent CLI on Windows.
EPISODE LINKS
- Improved rebalancing for Kafka Connect
- Improved rebalancing for Kafka Streams
- The "what would Kafka do?" scenario from Mark Papadakis
- The future of retail at Nordstrom
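The rebalancing improvement discussed here (incremental cooperative rebalancing, KIP-415) is controlled at the Connect worker level. A minimal sketch of the relevant worker properties follows, assuming a Kafka 2.3+ worker; the delay value is illustrative, and the exact defaults vary by release.

```properties
# Distributed Connect worker settings (connect-distributed.properties).
# 'compatible' lets upgraded and non-upgraded workers coexist, enabling
# incremental cooperative rebalancing once all workers support it.
connect.protocol=compatible
# How long to wait for a departed worker to return before reassigning its tasks.
scheduled.rebalance.max.delay.ms=300000
```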

Google Cloud Platform Podcast
Confluent and Kafka with Viktor Gamov

Google Cloud Platform Podcast

Play Episode Listen Later Nov 13, 2018 37:46


Viktor Gamov is on the podcast today to discuss Confluent and Kafka with Mark and special first-time guest host, Michelle. Viktor spends time with Mark and Michelle explaining how Kafka allows you to stream and process data in real time, and how Kafka helps Confluent with its advanced streaming capabilities. Confluent Cloud helps connect Confluent and cloud platforms such as Google Cloud so customers don’t have to manage anything - Confluent takes care of it for you! To wrap up the show, Michelle answers our question of the week about Next 2019.
Viktor Gamov
Viktor Gamov is a Developer Advocate at Confluent, the company that makes a streaming platform based on Apache Kafka. Working in the field, Viktor developed comprehensive expertise in building enterprise application architectures using open source technologies. He enjoys helping different organizations design and develop low-latency, scalable, and highly available distributed systems. Back in his consultancy days, he co-authored O’Reilly’s «Enterprise Web Development». He is a professional conference speaker on distributed systems, Java, and JavaScript topics, and is a regular at events, including JavaOne, Devoxx, OSCON, QCon, and others. He blogs and produces the podcasts Razbor Poletov (in Russian) and co-hosts DevRelRad.io. Follow Viktor on Twitter, where he posts about gym life, food, open source, and, of course, Kafka and Confluent!
Cool things of the week
- Kubeflow published a leadership guide to inclusivity site
- Picture what the cloud can do: How the New York Times is using Google Cloud to find untold stories in millions of archived photos blog
- Click-to-deploy on Kubeflow site
- Containerd available for beta testing in Google Kubernetes Engine blog
- Introducing AI Hub and Kubeflow Pipelines: Making AI simpler, faster, and more useful for businesses blog
- Announcing Cloud Scheduler: a modern, managed cron service for automated batch jobs blog
Interview
- Kafka site
- Kafka Connect site
- Kafka Streams site
- KSQL site
- Confluent site
- Confluent Hub site
- Confluent Schema Registry site
- Confluent Cloud on Google Cloud Marketplace site
- Confluent Enterprise site
- Confluent Cloud site
- Confluent on Github site
- Confluent Blog blog
- How to choose the number of topics/partitions in a Kafka cluster? blog
- Publishing with Apache Kafka at The New York Times blog
- Google Cloud Platform and Confluent partner to deliver a managed Apache Kafka service blog
- Viktor’s Presentations site
- Confluent Community site
Question of the week
If I wanted to submit a CFP for Next 2019, how would I do it?
Where can you find us next?
Mark and Michelle will be at KubeCon in December. Michelle will be at Scale by the Bay on Friday. She’ll also be at YOW! Sydney, Brisbane, & Melbourne in Nov & December.

Roaring Elephant
Episode 51 – Roaring News

Roaring Elephant

Play Episode Listen Later Sep 5, 2017 38:48


In this news episode (our very first one), Dave is all-out on Artificial Intelligence and its use in naming "stuff"; for some subjects it apparently works very well, for other subjects not so much... Jhon brings a blog on deploying new Kerberos functionality and a tutorial for Kafka Connect for those that have not really looked at it. The ensuing discussion on NiFi vs. Kafka is purely coincidental.
Dave: AI naming
- Paint (May 2017): http://lewisandquark.tumblr.com/post/160776374467/new-paint-colors-invented-by-neural-network and https://arstechnica.co.uk/information-technology/2017/05/ai-paint-colour-names/
- Guinea Pigs (June 2017): http://gizmodo.com/this-is-what-happens-when-you-teach-an-ai-to-name-guine-1796172891
- Improved Paint (July 2017): https://arstechnica.co.uk/information-technology/2017/07/ai-paint-colours-reprogrammed/
- British sounding place names (July 2017): http://www.telegraph.co.uk/technology/2017/07/20/ai-trained-generate-incredibly-british-place-names/
- Beer (August 2017): http://gizmodo.com/weve-run-out-of-beer-names-and-ai-is-here-to-help-1797480178
Jhon
- Accessing Secure Cluster from Web Applications: http://blog.cloudera.com/blog/2017/08/accessing-secure-cluster-from-web-applications/
- The Simplest Useful Kafka Connect Data Pipeline In The World: https://www.confluent.io/blog/simplest-useful-kafka-connect-data-pipeline-world-thereabouts-part-1/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.