Podcasts about Kafka Streams

  • 29 PODCASTS
  • 102 EPISODES
  • 42m AVG DURATION
  • INFREQUENT EPISODES
  • May 4, 2025 LATEST

POPULARITY (episode activity chart, 2017–2024)


Best podcasts about Kafka Streams

Latest podcast episodes about Kafka Streams

airhacks.fm podcast with adam bien
LittleHorse Likes Sun

May 4, 2025 · 63:46


An airhacks.fm conversation with Colt McNealy (@coltmcnealy) about: first computing experience with Sun workstations and network computing, background in hockey and other sports, using System76 Linux laptops for development, starting programming in high school with Java and later learning C, Fortran, assembly, C++ and Python, working at a real estate company with Kubernetes and Kafka, the genesis of LittleHorse from experiencing challenges with distributed microservices and workflow management, LittleHorse as an open source workflow orchestration engine using Kafka as a commit log rather than a message queue, building a custom distributed database optimized for workflow orchestration, the recent move to fully open source licensing, comparison with AWS Step Functions but with more capabilities and open source benefits, using RocksDB and Kafka Streams for the underlying implementation, performance metrics of 12-40ms latency between tasks and hundreds of tasks per second, the multi-tenant architecture allowing for serverless offerings, integration with Kafka for event-driven architectures, the distinction between orchestration and choreography in distributed systems, using Java 21 with benefits from virtual threads and generational garbage collection, plans for Java 25 adoption, the naming story behind "LittleHorse" and its competition with MuleSoft, the Sun Microsystems legacy and innovation culture, recent adoption of Quarkus for some components, the "Know Your Customer" flow as the Hello World example for LittleHorse, the importance of observability and durability in workflow management, plans for serverless offerings and multi-tenant architecture, the balance between open source core and commercial offerings. Colt McNealy on Twitter: @coltmcnealy

Engineering Kiosk
#177 Stream Processing & Kafka: The Foundation of Modern Data Pipelines with Stefan Sprenger

Jan 7, 2025 · 67:40


Data streaming and stream processing with Apache Kafka and the surrounding ecosystem. Plenty of processes in software development, and in data processing generally, don't have to happen at request time; they can be handled asynchronously or decentrally. Batch processing and message queueing / pub-sub are the familiar terms here. But there is a third player in this game: stream processing. Apache Kafka is its flagship, the distributed event streaming platform that is usually named first. But what actually is stream processing, and how does it differ from batch processing or message queuing? How does Kafka work, and why is it so successful and performant? What are brokers, topics, partitions, producers and consumers? What does change data capture mean, and what is a sliding window? What do you need to watch out for, and what can go wrong, when you want to write and read a message? Our guest Stefan Sprenger delivers the answers and much more. Bonus: how to describe stream processing with a breakfast table for five-year-olds. You can find our current advertising partners at https://engineeringkiosk.dev/partners
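For readers new to the vocabulary this episode covers, here is a minimal sketch of the plain Java producer API; the broker address and topic name are placeholders, not from the episode:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class HelloProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // address of a Kafka broker (placeholder)
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The record key determines the partition, so all events with the
            // key "order-42" stay in order on the same partition for consumers.
            producer.send(new ProducerRecord<>("orders", "order-42", "created"));
        }
    }
}
```

Consumers subscribe to the same topic and read each partition in order, which is the property that stream processors such as Kafka Streams build on.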

javaswag
#61 - Grigoriy Skobelev - Kafka, Sharding, and the Tech Lead Role at a Startup

May 21, 2024 · 91:49


In episode 61 of the Javaswag podcast we talked with Grigoriy Skobelev about Kafka, sharding Postgres, and the tech lead role at a startup. 00:00:00 Introduction and working with shaders 00:03:49 Java development and working on billing 00:07:54 An off-the-shelf solution for rating and event processing 00:09:23 Requirements for working at telecom companies 00:13:04 Kafka Streams and working with streaming data 00:15:13 CDC (Change Data Capture) and using Kafka Streams 00:21:13 Public speaking and its role in a developer's growth 00:22:09 Engineering culture at Yandex.Money 00:25:54 Development tooling: plugins and utilities 00:28:36 Building plugins for Gradle and Maven 00:31:49 Useful tools for speeding up everyday work 00:36:34 Database sharding: problems and use cases 00:39:21 Sharding in PostgreSQL and its advantages 00:43:39 Using user identifiers to route requests 00:50:00 The tech lead's role in a company and their responsibilities 00:53:16 Translating business requirements into technical ones 00:56:33 Preparing an architecture for growth and increased load 00:57:57 Load testing and resource optimization 00:59:32 Cross-language collaboration in a team and choosing a programming language 01:06:32 Choosing technologies and tools for microservices 01:07:00 The database-per-service approach 01:09:43 Communication between microservices 01:11:09 The contract-based approach 01:14:29 Application warm-up 01:16:42 Sharing experience with other tech leads 01:19:56 Uptime problems and possible solutions 01:20:53 Evaluating a tech lead's work and their impact on the team 01:22:19 The importance of growing across different technologies 01:27:00 A response to the previous unpopular opinion 01:29:31 An unpopular opinion Guest - https://www.linkedin.com/in/grigoriy-skobelev-757030167/ Links: The "Mezhdu Skobok" podcast - https://youtube.com/@mezhdu_skobok Grisha's GitHub with his talks - https://github.com/GSkoba/talks Telegram group discussing books - https://t.me/backend_megdu_skobkah Gradle course - https://www.youtube.com/watch?v=Ajs8pTbg8as&list=PLWQK2ZdV4Yl2k2OmC_gsjDpdIBTN0qqkE Keep safe! 🖖

Developer Voices
Bytewax: Rust's Research Meets Python's Practicalities (with Dan Herrera)

May 8, 2024 · 61:54


Bytewax is a curious stream processing tool that blends a Python surface with a Rust core to produce something that's in a similar vein to Kafka Streams or Apache Flink, but with a fundamentally different implementation. This week we're going to take a look at what it does, how it works in theory, and how the marriage of Python and Rust works in practice…
The original Naiad Paper: https://dl.acm.org/doi/10.1145/2517349.2522738
Timely Dataflow: https://github.com/TimelyDataflow/timely-dataflow
Bytewax the Library: https://github.com/bytewax/bytewax
Bytewax the Service: https://bytewax.io/
PyO3, for calling Rust from Python: https://pyo3.rs/v0.21.2/
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Kris on Twitter: https://twitter.com/krisajenkins
#softwaredevelopment #dataengineering #apachekafka #timelydataflow

The Data Stack Show
184: Kafka Streams and Operationalizing Event Driven Applications with Apurva Mehta of Responsive

Apr 3, 2024 · 58:27


Highlights from this week's conversation include:
Apurva's background in streaming technology (0:48)
Developer experience and Kafka Streams (2:47)
Motivation to bootstrap a startup (4:09)
Meeting the Confluent founders and early work at Confluent (6:59)
Projects at Confluent and transition to engineering management (10:34)
Overview of Responsive and event-driven applications (12:55)
Defining event-driven applications (15:33)
Importance of latency and state in event-driven applications (18:54)
Low Latency and Stateful Processing (21:52)
In-Memory Storage and Evolution of Kafka (25:02)
Motivation for KSQL and Kafka Streams (29:46)
Category Creation and Database-like Interface (34:33)
Developer Experience with Kafka and Kafka Streams (38:50)
Kafka Streams Functionality and Operational Challenges (41:44)
Metrics and Tuning Configurations (43:33)
Architecture and Decoupling in Kafka Streams (45:39)
State Storage and Transition from RocksDB (47:48)
Future of Event-Driven Architectures (56:30)
Final thoughts and takeaways (57:36)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Developer Voices
Bringing Pure Python to Apache Kafka (with Tomáš Neubauer)

Apr 3, 2024 · 66:29


The “big data infrastructure” world is dominated by Java, but the data-analysis world is dominated by Python. So if you need to analyse and process huge amounts of data, chances are you're in for a less-than-ideal time: the impedance mismatch will probably make your life hard somehow. There are a lot of projects and companies trying to solve that problem and bridge those two worlds seamlessly, and many of the popular solutions see SQL as the glue. But this week we're going to look at another solution - ignore Java, treat Kafka as a protocol, and build up all the infrastructure tools you need with a pure Python library. It's a lot of work, but in theory it would make Python the one language for data storage, analysis and processing, at scale. Tempting, but is it feasible? Joining me to discuss the pros, cons, and massive scope of that approach is Tomáš Neubauer. He started off doing real-time data analysis for the McLaren F1 team, and is now deep in the Python mines effectively rewriting Kafka Streams in Python. But how? How much work is actually involved in porting those ideas to Python-land, and how do you even get started? And perhaps most fundamental of all - even if you succeed, will that be enough to make the job easy, or will you still have to scale the mountain of teaching people how to use the new tools you've built? Let's find out.
Quix Streams on GitHub: https://github.com/quixio/quix-streams
Quix Streams getting started guide: https://quix.io/get-started-with-quix-streams
Quix: https://quix.io/
Tomáš on LinkedIn: https://www.linkedin.com/in/tom%C3%A1%C5%A1-neubauer-a10bb144
Tomáš on Twitter: https://twitter.com/TomasNeubauer0
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Kris on Twitter: https://twitter.com/krisajenkins
#podcast #softwaredevelopment #datascience #apachekafka #streamprocessing

Real-Time Analytics with Tim Berglund
Kafka Streams Enhancements with Confluent's Matthias Sax | Ep. 45

Mar 18, 2024 · 30:16


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Today, Tim dives into the world of Kafka Streams with Matthias Sax, Software Engineer at Confluent and core contributor to Apache Kafka. Matthias updates us on the latest in Interactive Queries, their enhancements in recent releases, insights on stream processing and how Kafka Streams stands out in the real-time analytics landscape. Remember to use the 30% discount Tim mentioned for the Real-Time Analytics Summit: https://stree.ai/rtapod30 (Code: RTAPOD30)

Real-Time Analytics with Tim Berglund
Best of 2023: A Gentle Introduction to Kafka Streams with Anna McDonald

Jan 2, 2024 · 26:58


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, Tim Berglund chats with Anna McDonald about the fascinating world of Kafka Streams. Anna, a customer success technical architect at Confluent, shares her insights on the core concepts of Kafka Streams, including the all-important table and stream abstractions. They delve into the benefits of statefulness and durability, such as active and standby tasks, which ensure seamless failover, and how Kafka Streams stores state in RocksDB and in Kafka itself. New episodes every Monday resume on January 8, 2024!
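To make the stream/table duality concrete, here is a minimal, hedged Kafka Streams sketch (topic and store names are made up for illustration): the stream is the history of events, and the table is the latest state per key, materialized in RocksDB and backed by a changelog topic in Kafka.

```java
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;

public class ClickCounts {
    public static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        // A stream: every click event, in order, as it happened.
        KStream<String, String> clicks = builder.stream("page-clicks");

        // A table: the current count per user, materialized in a local
        // RocksDB store and replicated to a Kafka changelog topic, which is
        // what lets a standby task take over on failover.
        KTable<String, Long> clicksPerUser = clicks
                .groupByKey()
                .count(Materialized.as("clicks-per-user-store"));

        return builder.build();
    }
}
```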

The New Stack Podcast
How Apache Flink Delivers for Deliveroo

Sep 20, 2023 · 20:38


Deliveroo, a prominent food delivery company, relies on Apache Flink, a distributed processing engine, to enhance its three-sided marketplace, connecting delivery drivers, restaurants, and customers. Seeking to improve real-time data streaming and gain insights into customer behavior, Deliveroo transitioned to Flink, comparing it to alternatives like Apache Spark and Kafka Streams. Flink, with feature parity to their previous platform, offered stability and scalability. They initially experimented with Flink on Kubernetes but turned to the Amazon Managed Service for Flink (MSF) for enhanced support and maintenance. Engineers from Deliveroo, Felix Angell and Duc Anh Khu, emphasized the need for flexibility in data modeling to accommodate their fast-paced product development. However, flexibility can be complex, often requiring data model adjustments. They expressed the desire for a self-serve configuration feature in MSF, allowing easy customization of low-level settings and auto-scaling based on application metrics. This move to Flink and MSF has empowered Deliveroo to focus on core responsibilities like continuous integration and delivery while efficiently managing their data processing needs.
Learn more from The New Stack about Apache Flink and AWS:
Kinesis, Kafka and Amazon Managed Service for Apache Flink
Apache Flink for Real Time Data Analysis
Apache Flink for Unbounded Data Streams

The Cloud Pod
227: The Cloud Pod Peeps at Azure's Explicit Proxy

Sep 14, 2023 · 51:58


FINOS Open Source in Fintech Podcast
Enabling Real Time Regulatory Compliance with Kafka Streams and Morphir - Anna McDonald, Technical Voice of the Customer, Confluent

Aug 30, 2023 · 26:04


In this episode of the podcast, Grizz sits down with Anna McDonald, Technical Voice of the Customer at Confluent, to talk about her OSFF talk: "Enabling Real Time Regulatory Compliance with Kafka Streams and Morphir". We talk about Kafka Streams, Morphir, Open Regulation, and what it's like to figure out your passion for coding at 5 years old. She will be speaking at the Open Source in Finance Forum on November 1st in New York: https://sched.co/1PzH7
Anna McDonald LinkedIn: https://www.linkedin.com/in/jbfletch/
NYC November 1 - Open Source in Finance Forum: https://events.linuxfoundation.org/open-source-finance-forum-new-york/
2022 State of Open Source in Financial Services Download: https://www.finos.org/state-of-open-source-in-financial-services-2022
All Links on Current Newsletter Here: https://www.finos.org/newsletter - more show notes to come
A huge thank you to all our sponsors for Open Source in Finance Forum New York (https://events.linuxfoundation.org/open-source-finance-forum-new-york/) that will take place this November 1st at the New York Marriott Marquis. This event wouldn't be possible without our sponsors. A special thank you to our Leader sponsors: Databricks, where you can unify all your data, analytics, and AI on one platform. And Red Hat - Open to change—yesterday, today, and tomorrow. And our Contributor and Community sponsors: Adaptive/Aeron, Discover, FinOps Foundation, instaclustr, mend.io, Open Mainframe Project, OpenJS Foundation, OpenLogic by Perforce, Orkes, Red Hat, Sonatype, and Tidelift. If you would like to sponsor or learn more about this event, please send an email to sponsorships@linuxfoundation.org.
Grizz's Info | https://www.linkedin.com/in/aarongriswold/ | grizz@finos.org
►► Visit FINOS www.finos.org
►► Get In Touch: info@finos.org

SaaS for Developers
Building SaaS on Kafka Streams

Aug 7, 2023 · 53:48


Colt McNealy is re-imagining the future of microservices orchestration, and he decided to build it entirely on Kafka Streams. In this conversation we discuss how Kafka Streams provides the low latency, reliability, availability, and elasticity needed for the next generation of microservices orchestration. Colt also shares the most exciting up-and-coming improvements in the Kafka Streams community, and the roadmap he'd set if he were the benevolent dictator of Kafka Streams.

Real-Time Analytics with Tim Berglund
Digging Into Interactive Queries in Kafka Streams with Bill Bejeck | Ep. 16

Jul 24, 2023 · 30:01


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! On today's episode, Tim Berglund sits down for a chat with Bill Bejeck, a prominent figure in the world of Kafka and real-time analytics. They dive into topics around Apache Kafka, Kafka Streams, and interactive queries, going deep into each one. Bill describes interactive queries as a way to scrutinize the state of a Kafka Streams application, whether that's a simple key lookup or an analysis of complex aggregations. The conversation also explores the functionality of KTables and how Kafka Streams manages state. If you've ever wondered about interactive queries or Kafka Streams at large, this is the episode for you.
Anna's previous episodes: https://youtu.be/K14Kn0D-I4Y and https://youtu.be/nCLN15W_WOc
Bill's book, Kafka Streams in Action: https://www.manning.com/books/kafka-streams-in-action
Kafka Streams 101 course: https://developer.confluent.io/courses/kafka-streams/get-started/
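The kind of key lookup Bill describes looks roughly like this with the interactive-query API; a hedged sketch where the store name and key are made up, and `streams` is a running KafkaStreams instance:

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class InteractiveQueryExample {
    // Look up the current count for one user from local state, without
    // going back through Kafka: this is the "simple key lookup" case.
    public static Long clicksFor(KafkaStreams streams, String userId) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "clicks-per-user-store",
                        QueryableStoreTypes.<String, Long>keyValueStore()));
        return store.get(userId);
    }
}
```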

Real-Time Analytics with Tim Berglund
Kafka Streams and the Complexity of Time with Anna McDonald, Confluent | Ep. 12

Jun 20, 2023 · 29:46


In this episode of the Real-Time Analytics Podcast, host Tim Berglund continues his conversation with Anna McDonald about Kafka Streams and the complexities of stream processing related to time. They explore the different types of windows available in Kafka Streams, including hopping, tumbling, session, and sliding windows. Anna provides insightful explanations and examples of each window type, highlighting their unique features and use cases. Don't miss out on this informative and engaging conversation on real-time analytics and Kafka Streams.
Part 1 of Anna's episode: https://youtu.be/K14Kn0D-I4Y
Anna's Real-Time Analytics Summit 2023 presentation: https://youtu.be/tratRsV1TiI
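For reference, the four window types discussed map onto the Kafka Streams DSL roughly as follows; a hedged sketch with arbitrary durations, not tuning advice:

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.SessionWindows;
import org.apache.kafka.streams.kstream.SlidingWindows;
import org.apache.kafka.streams.kstream.TimeWindows;

public class WindowExamples {
    public static void main(String[] args) {
        // Tumbling: fixed-size, non-overlapping windows.
        TimeWindows tumbling = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5));

        // Hopping: fixed-size windows that advance by less than their size,
        // so each record can fall into several overlapping windows.
        TimeWindows hopping = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5))
                .advanceBy(Duration.ofMinutes(1));

        // Session: per-key windows that close after a gap of inactivity.
        SessionWindows sessions =
                SessionWindows.ofInactivityGapWithNoGrace(Duration.ofMinutes(30));

        // Sliding: windows defined by the maximum time difference
        // between two records, evaluated per pair of records.
        SlidingWindows sliding =
                SlidingWindows.ofTimeDifferenceWithNoGrace(Duration.ofMinutes(5));
    }
}
```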

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.5 - Kafka Core, Connect, Streams, & Client Updates

Jun 15, 2023 · 11:25 · Transcription Available


Apache Kafka® 3.5 is here with the capability of previewing migrations between ZooKeeper clusters to KRaft mode. Follow along as Danica Fine highlights key release updates.
Kafka Core:
KIP-833 provides an updated timeline for KRaft.
KIP-866 is now in preview and allows migration from an existing ZooKeeper cluster to KRaft mode.
KIP-900 introduces a way to bootstrap the KRaft controllers with SCRAM credentials.
KIP-903 prevents a data loss scenario by preventing replicas with stale broker epochs from joining the ISR list.
KIP-915 streamlines the process of downgrading Kafka's transaction and group coordinators by introducing tagged fields.
Kafka Connect:
KIP-710 provides the option to use a REST API for internal server communication that can be enabled by setting `dedicated.mode.enable.internal.rest` equal to true.
KIP-875 offers support for native offset management in Kafka Connect. Connect cluster administrators can now read offsets for both source and sink connectors. This KIP adds a new STOPPED state for connectors, enabling users to shut down connectors and maintain connector configurations without utilizing resources.
KIP-894 makes the `IncrementalAlterConfigs` API available for use in MirrorMaker 2 (MM2), adding a new `use.incremental.alter.config` configuration which takes the values "requested," "never," and "required."
KIP-911 adds a new source tag for metrics generated by the `MirrorSourceConnector` to help monitor mirroring deployments.
Kafka Streams:
KIP-339 improves Kafka Streams' error-handling capabilities by addressing serialization errors that occur before message production and extending the interface for custom error handling.
KIP-889 introduces versioned state stores in Kafka Streams for temporal join semantics in stream-to-table joins.
KIP-904 simplifies table aggregation in Kafka by proposing a change in serialization format to enable one-step aggregation and reduce noise from events with old and new keys/values.
KIP-914 modifies how versioned state stores are used in Kafka Streams. Versioned state stores may impact different DSL processors in varying ways; see the documentation for details.
Kafka Client:
KIP-881 is now complete and introduces new client-side assignor logic for rack-aware consumer balancing for Kafka Consumers.
KIP-887 adds the `EnvVarConfigProvider` implementation to Kafka so custom configurations stored in environment variables can be injected into the system by providing the map returned by `System.getEnv()`.
KIP-641 introduces the `RecordReader` interface to Kafka's clients module, replacing the deprecated MessageReader Scala trait.
EPISODE LINKS
See release notes for Apache Kafka 3.5
Read the blog to learn more
Download and get started with Apache Kafka 3.5
Watch the video version of this podcast
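KIP-889's versioned stores are wired in through a store supplier; a hedged sketch against the 3.5 API, with a made-up topic, store name, and retention:

```java
import java.time.Duration;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class VersionedTableExample {
    public static void build(StreamsBuilder builder) {
        // A versioned table keeps older values per key for the history
        // retention period, so a stream-table join can look up the value
        // that was current at the stream record's timestamp (KIP-889).
        KTable<String, String> prices = builder.table(
                "prices",
                Materialized.as(Stores.persistentVersionedKeyValueStore(
                        "prices-versioned", Duration.ofHours(1))));
    }
}
```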

Real-Time Analytics with Tim Berglund
A Gentle Introduction to Kafka Streams with Anna McDonald (Confluent) | Ep. 11

Jun 12, 2023 · 25:59 · Transcription Available


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Join Tim Berglund as he chats with Anna McDonald about the fascinating world of Kafka Streams. Anna, a customer success technical architect at Confluent, shares her insights on the core concepts of Kafka Streams, including the all-important table and stream abstractions. They delve into the benefits of statefulness and durability, such as active and standby tasks, which ensure seamless failover, and how Kafka Streams stores state in RocksDB and in Kafka itself. With a teaser for the next episode, this conversation promises an exciting exploration of data ingestion and time management in Kafka Streams. Don't miss out on this insightful discussion!
Starting with Apache Kafka: https://developer.confluent.io/learn-kafka/apache-kafka/events/
KIP-392 information: https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.4 - New Features & Improvements

Feb 7, 2023 · 5:13 · Transcription Available


Apache Kafka® 3.4 is released! In this special episode, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of the Apache Kafka 3.4 release. This release introduces new KIPs in Kafka Core, Kafka Streams, and Kafka Connect.
In Kafka Core:
KIP-792 expands the metadata each group member passes to the group leader in its JoinGroup subscription to include the highest stable generation that consumer was a part of.
KIP-830 includes a new configuration setting that allows you to disable the JMX reporter for environments where it's not being used.
KIP-854 introduces changes to clean up producer IDs more efficiently, to avoid excess memory usage. It introduces a new timeout parameter that affects the expiry of producer IDs and updates the old parameter to only affect the expiry of transaction IDs.
KIP-866 (early access) provides a bridge to migrate from existing ZooKeeper clusters to new KRaft mode clusters, enabling the migration of existing metadata from ZooKeeper to KRaft.
KIP-876 adds a new property that defines the maximum amount of time that the server will wait to generate a snapshot; the default is 1 hour.
KIP-881, an extension of KIP-392, makes it so that consumers can now be rack-aware when it comes to partition assignments and consumer rebalancing.
In Kafka Streams:
KIP-770 updates some Kafka Streams configs and metrics related to the record cache size.
KIP-837 allows users to multicast result records to every partition of downstream sink topics and adds functionality for users to choose to drop result records without sending.
And finally, for Kafka Connect:
KIP-787 allows users to run MirrorMaker2 with custom implementations for the Kafka resource manager and makes it easier to integrate with your ecosystem.
Tune in to learn more about the Apache Kafka 3.4 release!
EPISODE LINKS
See release notes for Apache Kafka 3.4
Read the blog to learn more
Download Apache Kafka 3.4 and get started
Watch the video version of this podcast
Join the Community

Engenharia de Dados [Cast]
Confluent Community Catalysts Brazukas: Dissecting Apache Kafka [Round 1]

Feb 2, 2023 · 77:12


In this episode, Luan Moreno & Mateus Oliveira interview João Bosco, currently a Software & Solution Strategist at Nubank, and Marcelo Costa, currently Head of IT at Cia. Hering. Both guests and hosts are Confluent Community Catalysts. Confluent Community Catalysts are professionals who invest their time in spreading the word and contributing, whether in code or by actively answering Apache Kafka questions in forums and on Stack Overflow, and who are recognized by the community and by Confluent for that work. In this round table we covered the following topics:
Apache Kafka concepts
The evolution from messaging technologies to a streaming platform
War stories and curiosities about Apache Kafka
Challenges of initial Apache Kafka implementation and adoption
Learn from the experience of professionals who have worked with Apache Kafka day to day, using industry best practices to build the robust real-time streaming platform that leads the market today.
Marcelo Costa
João Bosco
Confluent Catalyst Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Software Defined Talk
Episode 396: Aloha to your strategy

Jan 13, 2023 · 80:21


This week we discuss digital transformation at Southwest and Delta Airlines, Shopify cancels all meetings, Salesforce's M&A strategy, and A.I. is everywhere. Plus, thoughts on bike lanes… Watch the YouTube Live Recording of Episode 396 (https://youtu.be/tmm8rH9fZEE) Runner-up Titles Work trying to get on my personal calendar Traveling with an infant =BLACKSWAN(A1:G453) Socks in a Costco Can't do the business case on savings until you loose it. Pay transparency for you, not me We don't pay for things on the Internet Semper Nimbus Privatus Rundown Dutch residents are the most physically active on earth, (https://twitter.com/BrentToderian/status/1611901297552396289) Digital Transformation Travel Edition Delta plans to offer free Wi-Fi starting Feb. 1 (https://www.cnbc.com/2023/01/05/delta-plans-to-offer-free-wi-fi-starting-feb-1.html) The Southwest Airlines Meltdown (https://www.nytimes.com/2023/01/10/podcasts/the-daily/the-southwest-airlines-meltdown.html) Southwest's Meltdown Could Cost It Up to $825 Million (https://www.nytimes.com/2023/01/06/business/southwest-airlines-meltdown-costs-reimbursement.html) Southwest pilots union writes scathing letter to airline executives after holiday travel fiasco (https://www.yahoo.com/now/southwest-pilots-union-writes-scathing-011720946.html) Southwest makes frequent flyer miles offer while lots of luggage remains in limbo (https://www.cnn.com/travel/article/southwest-airlines-frequent-flyer-miles-meltdown/index.html) Point of Sale: Scan and Pay (https://twitter.com/pitdesi/status/1602843962602975233?s=20&t=YdGNYzReSf4r1twJ1hRfbA) Work Life Shopify Tells Employees to Just Say No to Meetings (https://www.bloomberg.com/news/articles/2023-01-03/shopify-ceo-tobi-lutke-tells-employees-to-just-say-no-to-meetings) Netflix Revokes Some Staff's Access to Other People's Salary Information (https://apple.news/A--bGmZgJTQCgHQ-9QdWu4w) U.S. Moves to Bar Noncompete Agreements in Labor Contracts (https://www.nytimes.com/2023/01/05/business/economy/ftc-noncompete.html) Gartner HR expert: Quiet hiring will dominate U.S. 
workplaces in 2023 (https://www.cnbc.com/2023/01/04/gartner-hr-expert-quiet-hiring-will-dominate-us-workplaces-in-2023.html) Netflix revokes some staff's access to other people's salary information (https://www.marketwatch.com/story/netflix-revokes-some-staffs-access-to-other-peoples-salary-information-11673384493) SFDC Salesforce: There's no more Slack left to cut (https://www.theregister.com/2023/01/10/salesforce_comment/) Salesforce to Lay Off 10 Percent of Staff and Cut Office Space (https://www.nytimes.com/2023/01/04/business/salesforce-layoffs.html) After layoffs, Salesforce CEO still blasts worker productivity (https://www.sfgate.com/tech/article/salesforce-ceo-blasts-worker-productivity-17708474.php) AI is everywhere Google execs warn company's reputation could suffer if it moves too fast on AI-chat technology (https://www.cnbc.com/2022/12/13/google-execs-warn-of-reputational-risk-with-chatgbt-like-tool.html) Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google (https://www.theinformation.com/articles/microsoft-and-openai-working-on-chatgpt-powered-bing-in-challenge-to-google?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) Microsoft eyes $10 billion bet on ChatGPT (https://www.semafor.com/article/01/09/2023/microsoft-eyes-10-billion-bet-on-chatgpt) Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT (https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/) Relevant to your Interests 2023 Bum Steer of the Year: Austin (https://www.texasmonthly.com/news-politics/2023-bum-steer-of-year-austin/) Twitter's Rivals Try to Capitalize on Musk-Induced Chaos (https://www.nytimes.com/2022/12/07/technology/twitter-rivals-alternative-platforms.html) On Organizational Structures and the Developer Experience (https://redmonk.com/sogrady/2022/12/13/org-structure-devx/) KubeCon + CloudNativeCon North America 2022 Transparency Report | Cloud Native Computing Foundation (https://www.cncf.io/reports/kubecon-cloudnativecon-north-america-2022-transparency-report/) Inside the chaos at Washington's most connected military tech startup (https://www.vox.com/recode/23507236/inside-disruption-rebellion-defense-washington-connected-military-tech-startup) Elon Musk Starts Week As World's Second Richest Person (https://www.forbes.com/sites/mattdurot/2022/12/12/elon-musk-starts-week-as-worlds-second-richest-person/) 10 Tesla Investors Lose $132.5 Billion From Musk's Twitter Fiasco (https://www.investors.com/etfs-and-funds/sectors/tesla-stock-investors-lose-132-5-billion-from-musks-twitter-fiasco/) Rackspace's ransomware messaging dilemma (https://www.axios.com/newsletters/axios-login-83146574-380f-4e37-965d-7fd79bce7278.html?chunk=2&utm_term=emshare#story2) Heads-Up: Amazon S3 Security Changes Are Coming in April of 2023 (https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/) A MultiCloud Rant (https://www.lastweekinaws.com/blog/a_multicloud_rant/) Great visualization of the revenue breakdown of the 4 largest tech companies.
(https://twitter.com/Carnage4Life/status/1603012861017862144?s=20&t=HC2UuMCHBB408xae6tZpbQ) AG Paxton's Google Suit Makes the Perfect the Enemy of the Good (https://truthonthemarket.com/2022/12/14/ag-paxtons-google-suit-makes-the-perfect-the-enemy-of-the-good/) AWS simplifies Simple Storage Service to prevent data leaks (https://www.theregister.com/2022/12/14/aws_simple_storage_service_simplified/) Creating the ultimate smart map with new map data initiative launched by Linux Foundation (https://venturebeat.com/virtual/creating-the-ultimate-smart-map-with-new-map-data-initiative-launched-by-linux-foundation/) Spotify's grand plan to monetize developers via its open source Backstage project (https://techcrunch.com/2022/12/15/spotifys-plan-to-monetize-its-open-source-backstage-developer-project/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cubGlua2VkaW4uY29tLw&guce_referrer_sig=AQAAAAlyOmdhogtX6nuQkNHQ7mVSyci6aMv7X6QwRTvS9PHGJmjO_wjCqsJXXPKI36A9MkIclSIQoHQ_dz7wJ-WzfaYQT_clMcUijiC28ZQhEau4NOcU-70wy5m0Q9LLmtvWuQbWQQEccEbQH2Lvg4_GqfnQBYNPZWRcgpx7XMLas_2R) VMware offers subs for server consolidation vSphere cut (https://www.theregister.com/2022/12/15/vsphere_plus_standard/) Senior execs to leave VMware before acquisition by Broadcom (https://www.bizjournals.com/sanjose/news/2022/12/13/three-senior-execs-to-leave-vmware.html#:~:text=Mark%20Lohmeyer%2C%20who%20heads%20cloud,Raghuram%20announced%20in%20a%20memo) China Bans Exports of Loongson CPUs to Russia, Other Countries: Report (https://www.tomshardware.com/news/china-bans-exports-of-its-loongson-cpus-to-russia-other-countries) Dropbox buys form management platform FormSwift for $95M in cash (https://techcrunch.com/2022/12/16/dropbox-buys-form-management-platform-formswift-for-95m-in-cash/) Sweep, a no-code config tool for Salesforce software, raises $28M (https://techcrunch.com/2022/12/15/sweep-a-no-code-config-tool-for-salesforce-software-raises-28m/) Twitter Aided the Pentagon in its Covert Online Propaganda Campaign (https://theintercept.com/2022/12/20/twitter-dod-us-military-accounts/) Okta's source code stolen after GitHub repositories hacked (https://www.bleepingcomputer.com/news/security/oktas-source-code-stolen-after-github-repositories-hacked/) Workday appoints VMware veteran as co-CEO (https://www.theregister.com/2022/12/21/workday_co_ceo/) Top Paying Tools (https://softwaredefinedtalk.slack.com/archives/C04EK1VBK/p1671635825838769) Winging It: Inside Amazon's Quest to Seize the Skies (https://www.wired.com/story/amazon-air-quest-to-seize-the-skies/) CIS Benchmark Framework Scanning Tools Comparison (https://www.armosec.io/blog/cis-kubernetes-benchmark-framework-scanning-tools-comparison/) MSG defends using facial recognition to kick lawyer out of Rockettes show (https://arstechnica.com/tech-policy/2022/12/facial-recognition-flags-girl-scout-mom-as-security-risk-at-rockettes-show/) OpenAI releases Point-E, an AI that generates 3D models (https://techcrunch.com/2022/12/20/openai-releases-point-e-an-ai-that-generates-3d-models/) No, You Haven't Won a Yeti Cooler From Dick's Sporting Goods (https://www.wired.com/story/email-scam-dicks-sporting-goods-yeti-cooler/) The Lastpass hack was worse than the company first reported (https://www.engadget.com/the-lastpass-hack-was-worse-than-the-company-first-reported-000501559.html?utm_source=facebook&utm_medium=news_tab) IRS delays tax reporting change for 1099-K on Venmo, Paypal business payments 
(https://www.cnbc.com/2022/12/23/irs-delays-tax-reporting-change-for-1099-k-on-venmo-paypal-payments.html) Cyber attacks set to become ‘uninsurable', says Zurich chief (https://www.ft.com/content/63ea94fa-c6fc-449f-b2b8-ea29cc83637d) Google Employees Brace for a Cost-Cutting Drive as Anxiety Mounts (https://www.nytimes.com/2022/12/28/technology/google-job-cuts.html) IBM beat all its large-cap tech peers in 2022 as investors shunned growth for safety (https://www.cnbc.com/2022/12/27/ibm-stock-outperformed-technology-sector-in-2022.html) Europe Taps Tech's Power-Hungry Data Centers to Heat Homes (https://www.wsj.com/articles/europe-taps-techs-power-hungry-data-centers-to-heat-homes-11672309944?mod=djemalertNEWS) List of defunct social networking services (https://en.wikipedia.org/wiki/List_of_defunct_social_networking_services) 2023 Predictions | No Mercy / No Malice (https://www.profgalloway.com/2023-predictions/) Twitter rival Mastodon rejects funding to preserve nonprofit status (https://arstechnica.com/tech-policy/2022/12/twitter-rival-mastodon-rejects-funding-to-preserve-nonprofit-status/) TSMC Starts Next-Gen Mass Production as World Fights Over Chips (https://www.bloomberg.com/news/articles/2022-12-29/tsmc-mass-produces-next-gen-chips-to-safeguard-global-lead) Microsoft and FTC pre-trial hearing set for January 3rd (https://www.engadget.com/pre-trial-hearing-between-microsoft-and-ftc-set-for-january-3rd-203320387.html) The infrastructure behind ATMs (https://www.bitsaboutmoney.com/archive/the-infrastructure-behind-atms/) Apple is increasing battery replacement service charges for out-of-warranty devices (https://techcrunch.com/2023/01/03/apple-is-increasing-battery-replacement-service-charges-for-out-of-warranty-devices/) Snowflake's business and how the weakening economy is impacting cloud vendors (https://twitter.com/LiebermanAustin/status/1607376944873754626) Shift Happens: A book about keyboards (https://shifthappens.site/) Amazon to cut 18,000 jobs (https://www.axios.com/2023/01/05/amazon-layoffs-18000-jobs) CircleCI security alert: Rotate any secrets stored in CircleCI (https://circleci.com/blog/january-4-2023-security-alert/) Video game workers form Microsoft's first U.S. labor union (https://www.nbcnews.com/tech/tech-news/video-game-workers-form-microsofts-first-us-labor-union-rcna64103) World's Premier Investors Line Up to Partner with Netskope as the SASE Security and Networking Platform of Choice (https://www.prnewswire.com/news-releases/worlds-premier-investors-line-up-to-partner-with-netskope-as-the-sase-security-and-networking-platform-of-choice-301712417.html) omg.lol - A lovable web page and email address, just for you (https://home.omg.lol/) Alphabet led a $100 million funding of Chronosphere, a startup that helps companies monitor and cut cloud bills. (https://twitter.com/theinformation/status/1611165698868367360) Confluent expands Kafka Streams capabilities, acquires Apache Flink vendor (https://venturebeat.com/enterprise-analytics/confluent-acquires-apache-flink-vendor-immerok-to-expand-data-stream-processing/) Excel & Google Sheets AI Formula Generator - Excelformulabot.com (https://excelformulabot.com/) Has the Internet Reached Peak Clickability? 
(https://tedgioia.substack.com/p/has-the-internet-reached-peak-clickability) Adobe's CEO Sizes Up the State of Tech Now (https://www.wsj.com/articles/adobes-ceo-sizes-up-the-state-of-tech-now-11673151167?mod=djemalertNEWS) Researchers Hacked California's Digital License Plates, Gaining Access to GPS Location and User Info (https://jalopnik.com/researchers-hacked-californias-digital-license-plates-1849966295) Microsoft's New AI Can Simulate Anyone's Voice With 3 Seconds of Audio (https://slashdot.org/story/23/01/10/0749241/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio?utm_source=slashdot&utm_medium=twitter) Observability platform Chronosphere raises another $115M at a $1.6B valuation (https://techcrunch.com/2023/01/10/observability-platform-chronosphere-raises-another-115m-at-a-1-6b-valuation/) Why IBM is no longer interested in breaking patent records–and how it plans to measure innovation in the age of open source and quantum computing (https://fortune.com/2023/01/06/ibm-patent-record-how-to-measure-innovation-open-source-quantum-computing-tech/) New research aims to analyze how widespread COBOL is (https://www.theregister.com/2022/12/14/cobol_research/) Companies are still waiting for their cloud ROI (https://www.infoworld.com/article/3675374/companies-are-still-waiting-for-their-cloud-roi.html) What TNS Readers Want in 2023: More DevOps, API Coverage (https://thenewstack.io/what-tns-readers-want-in-2023-more-devops-api-coverage/) Tech Debt Yo-Yo Cycle. (https://twitter.com/wardleymaps/status/1605860426671177728) How a single developer dropped AWS costs by 90%, then disappeared (https://scribe.rip/@maximetopolov/how-a-single-developer-dropped-aws-costs-by-90-then-disappeared-2b46a115103a) A look at the 2022 velocity of CNCF, Linux Foundation, and top 30 open source projects (https://www.cncf.io/blog/2023/01/11/a-look-at-the-2022-velocity-of-cncf-linux-foundation-and-top-30-open-source-projects/) The golden age of the streaming wars has ended (https://www.theverge.com/2022/12/14/23507793/streaming-wars-hbo-max-netflix-ads-residuals-warrior-nun) YouTube exec says NFL Sunday Ticket will have multiscreen functionality (https://awfulannouncing.com/youtube/nfl-sunday-ticket-multiscreen-mosaic-mode.html) Nonsense The $11,500 toilet with Alexa inside can now be put inside your home (https://www.theverge.com/2022/12/19/23510864/kohler-numi-smart-toilet-alexa-ces-2022) Starbucks updating its loyalty program starting in February (https://www.axios.com/2022/12/28/starbucks-rewards-program-changes-coming) The revenue model of a popular YouTube channel about Lego. (https://paper.dropbox.com/doc/SDT-396--BwhY9F5kpz_BI2kkdw63ZpJ~Ag-MVMKwqqBEH5SzYKqYO2Jc) Conferences THAT Conference Texas Speakers and Schedule (https://that.us/events/tx/2023/schedule/), Round Rock, TX Jan 15th-18th Use code SDT for 5% off SpringOne (https://springone.io/), Jan 24–26. Coté speaking at cfgmgmtcamp (https://cfgmgmtcamp.eu/ghent2023/), Feb 6th to 8th, Ghent.
State of Open Con 2023, (https://stateofopencon.com/sponsors/) London, UK, February 7th-8th 2023 CloudNativeSecurityCon North America (https://events.linuxfoundation.org/cloudnativesecuritycon-north-america/), Seattle, Feb 1 – 2, 2023 Southern California Linux Expo, (https://www.socallinuxexpo.org/scale/20x) Los Angeles, March 9-12, 2023 DevOpsDays Birmingham, AL 2023 (https://devopsdays.org/events/2023-birmingham-al/welcome/), April 20 - 21, 2023 SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Recommendations Brandon: Industrial Garage Shelves (https://www.homedepot.com/p/Husky-5-Tier-Industrial-Duty-Steel-Freestanding-Garage-Storage-Shelving-Unit-in-Black-90-in-W-x-90-in-H-x-24-in-D-N2W902490W5B/319132842) Matt: Oxide and Friends: Breaking it down with Ian Brown (https://oxide.computer/podcasts/oxide-and-friends/1150480) Wu Tang Saga (https://www.imdb.com/title/tt9113406/) Season 3 coming next month! Coté: Mouth to Mouth (https://www.goodreads.com/en/book/show/58438631-mouth-to-mouth) by Antoine Wilson (https://www.goodreads.com/en/book/show/58438631-mouth-to-mouth). Photo Credits Header (https://unsplash.com/photos/euaDCtB_jyw) CoverArt (https://unsplash.com/photos/9xdho4stJQ8)

The GeekNarrator
Understanding ksqlDB with Matthias J. Sax

Jan 13, 2023 · 61:55


Hey everyone, in this episode Matthias and I talk about ksqlDB. We cover the topic in great depth: its history, architecture, core concepts, use cases, limitations, how it compares to Kafka Streams, and more. References: ksqlDB - https://ksqldb.io/ | Exactly-once semantics podcast: https://youtu.be/twgbAL_EaQw | Matthias Sax: https://twitter.com/MatthiasJSax and https://www.linkedin.com/in/mjsax/ Cheers, The GeekNarrator

The GeekNarrator
Kafka Streams Exactly Once Semantics With Matthias Sax

Jan 3, 2023 · 85:57


Hey everyone, in this episode I am joined by Matthias Sax, who works at Confluent building the amazing world of Kafka. We discuss Kafka Streams in real depth and how exactly-once semantics (EOS) is implemented. This episode will give you all the details you need to understand how beautifully Kafka implements EOS. I hope you like the episode. Cheers, The GeekNarrator
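For context, turning on exactly-once processing in a Kafka Streams application is a single config; a minimal sketch (the application id and broker address are placeholders, and `topology` is assumed to be built elsewhere):

```java
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.Topology;

public class EosApp {
    public static KafkaStreams start(Topology topology) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "eos-demo-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Wraps the consume-process-produce cycle in Kafka transactions so
        // each input record affects output and state exactly once; since
        // Kafka 3.0, exactly_once_v2 supersedes the original exactly_once.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);

        KafkaStreams streams = new KafkaStreams(topology, props);
        streams.start();
        return streams;
    }
}
```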

The GeekNarrator
Kafka, Realtime analytics and Apache Pinot with Tim Berglund Part-2

Jan 3, 2023 · 39:38


Hey everyone, this is part 2 of our episode with Tim Berglund. We cover some advanced topics on Kafka, Kafka Streams, and Apache Pinot. I hope you like the discussion. Cheers, The GeekNarrator

Streaming Audio: a Confluent podcast about Apache Kafka
Build a Real Time AI Data Platform with Apache Kafka

Oct 20, 2022 · 37:18 · Transcription Available


Is it possible to build a real-time data platform without using stateful stream processing? Forecasty.ai is an artificial intelligence platform for forecasting commodity prices, imparting insights into the future valuations of raw materials for users. Nearly all AI models are batch-trained once, but precious commodities are linked to ever-fluctuating global financial markets, which require real-time insights. In this episode, Ralph Debusmann (CTO, Forecasty.ai) shares their journey of migrating from a batch machine learning platform to a real-time event streaming system with Apache Kafka® and delves into their approach to making the transition frictionless. Ralph explains that Forecasty.ai was initially built on top of batch processing; however, updating the models with batch-data syncs was costly and environmentally taxing. There was also the question of scalability—progressing from 60 commodities on offer to their eventual plan of over 200 commodities. Ralph observed that most real-time systems are non-batch, streaming-based real-time data platforms with stateful stream processing, using Kafka Streams, Apache Flink®, or even Apache Samza. However, stateful stream processing involves resources, such as teams of stream processing specialists, to solve the task. With the existing team, Ralph decided to build a real-time data platform without using any sort of stateful stream processing. They strictly keep to the out-of-the-box components, such as Kafka topics, the Kafka Producer API, the Kafka Consumer API, and other Kafka connectors, along with a real-time database to process data streams and implement the necessary joins inside the database. Additionally, Ralph shares the tool he built to handle historical data, kash.py—a Kafka shell based on Python; discusses issues the platform needed to overcome for success; and explains how they can make the migration from batch processing to stream processing painless for the data science team.
EPISODE LINKS
Kafka Streams 101 course
The Difference Engine for Unlocking the Kafka Black Box
GitHub repo: kash.py
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
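A rough sketch of the pattern Ralph describes, plain consumer API plus a database upsert, with the joins done later in SQL; the topic, table, and connection details are all hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class PriceSink {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "price-sink");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/forecasts")) {
            consumer.subscribe(List.of("commodity-prices"));
            PreparedStatement upsert = db.prepareStatement(
                    "INSERT INTO prices (commodity, price) VALUES (?, ?::numeric) " +
                    "ON CONFLICT (commodity) DO UPDATE SET price = EXCLUDED.price");
            while (true) {
                // Plain consumer loop: no local state, no stream processing
                // framework; the database holds state and performs the joins.
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    upsert.setString(1, rec.key());
                    upsert.setString(2, rec.value());
                    upsert.executeUpdate();
                }
            }
        }
    }
}
```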

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.3 - KRaft, Kafka Core, Streams, & Connect Updates

Oct 3, 2022 · 6:42 · Transcription Available


Apache Kafka® 3.3 is released! With over two years of development, KIP-833 marks KRaft as production-ready for new AK 3.3 clusters only. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares highlights of this release, with KIPs from Kafka Core, Kafka Streams, and Kafka Connect.
To reduce request overhead and simplify client-side code, KIP-709 extends the OffsetFetch API requests to accept multiple consumer group IDs. This update has three changes, including extending the wire protocol, response handling changes, and enhancing the AdminClient to use the new protocol.
Log recovery is an important process that is triggered whenever a broker starts up after an unclean shutdown. And since there is no way to know the log recovery progress other than checking if the broker log is busy, KIP-831 adds metrics for the log recovery progress with `RemainingLogsToRecover` and `RemainingSegmentsToRecover` for each recovery thread. These metrics allow the admin to monitor the progress of the log recovery.
Additionally, updates on Kafka Core also include KIP-841: fenced replicas should not be allowed to join the ISR in KRaft; KIP-835: monitor KRaft Controller Quorum health; and KIP-859: add metadata log processing error-related metrics.
KIP-834 for Kafka Streams added the ability to pause and resume topologies. This feature lets you reduce resource usage when processing is not required, when modifying the logic of Kafka Streams applications, or when responding to operational issues. Meanwhile, KIP-820 extends the KStream process with a new processor API.
Previously, KIP-98 added support for exactly-once delivery guarantees with Kafka and its Java clients. In the AK 3.3 release, KIP-618 brings exactly-once semantics support to Kafka Connect source connectors. To accomplish this, a number of new connector- and worker-based configurations have been introduced, including `exactly.once.source.support`, `transaction.boundary`, and more.
Image attribution: Apache ZooKeeper™: https://zookeeper.apache.org/ and Raft logo: https://raft.github.io/
EPISODE LINKS
See release notes for Apache Kafka 3.3.0 and Apache Kafka 3.3.1 for the full list of changes
Read the blog to learn more
Download Apache Kafka 3.3 and get started
Watch the video version of this podcast
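KIP-834's pause and resume are exposed directly on the KafkaStreams handle; a minimal, hedged sketch:

```java
import org.apache.kafka.streams.KafkaStreams;

public class PauseResumeExample {
    // Pause processing without tearing down threads or discarding local
    // state, then resume cheaply once the operational work is done.
    public static void maintenanceWindow(KafkaStreams streams) throws InterruptedException {
        streams.pause();      // topology stops consuming and processing (KIP-834)
        Thread.sleep(60_000); // stands in for the operational work being done
        streams.resume();     // picks up where it left off
    }
}
```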

Streaming Audio: a Confluent podcast about Apache Kafka
Capacity Planning Your Apache Kafka Cluster

Aug 30, 2022 · 61:54 · Transcription Available


How do you plan Apache Kafka® capacity and Kafka Streams sizing for optimal performance? When Jason Bell (Principal Engineer, Dataworks and founder of Synthetica Data) begins to plan a Kafka cluster, he starts with a deep inspection of the customer's data itself—determining its volume as well as its contents: Is it JSON, straight pieces of text, or images? He then determines if Kafka is a good fit for the project overall, a decision he bases on volume, the desired architecture, as well as potential cost.
Next, the cluster is conceived in terms of some rule-of-thumb numbers. For example, Jason's minimum number of brokers for a cluster is three or four. This means he has a leader, a follower and at least one backup. A ZooKeeper quorum is also a set of three. For other elements, he works with pairs, an active and a standby—this applies to Kafka Connect and Schema Registry. Finally, there's Prometheus monitoring and Grafana alerting to add. Jason points out that these numbers are different for multi-data-center architectures.
Jason never assumes that everyone knows how Kafka works, because some software teams include specialists working on a producer or a consumer, who don't work directly with Kafka itself. They may not know how to adequately measure their Kafka volume themselves, so he often begins the collaborative process of graphing message volumes. He considers, for example, how many messages there are daily, and whether there is a peak time. Each industry is different, with some focusing on daily batch data (banking), and others fielding incredible amounts of continuous data (IoT data streaming from cars).
Extensive testing is necessary to ensure that the data patterns are adequately accommodated. Jason sets up a short-lived system that is identical to the main system. He finds that teams usually have not adequately tested across domain boundaries or the network. Developers tend to think in terms of numbers of messages, but not in terms of overall network traffic, or in how many consumers they'll actually need, for example. Latency must also be considered; for example, if the compression on the producer's side doesn't match compression on the consumer's side, it will increase.
Kafka Connect sink connectors require special consideration when Jason is establishing a cluster. Failure strategies need to be well thought out, including retries and how to deal with the potentially large number of messages that can accumulate in a dead letter queue. He suggests that more attention should generally be paid to the Kafka Connect elements of a cluster, something that can actually be addressed with bash scripts.
Finally, Kris and Jason cover his preference for Kafka Streams over ksqlDB from a network perspective.
EPISODE LINKS
Capacity Planning and Sizing for Kafka Streams
Tales from the Frontline of Apache Kafka DevOps
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 to get $100 of free Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Blockchain Data Integration with Apache Kafka

Jul 7, 2022 · 50:59 · Transcription Available


How is Apache Kafka® relevant to blockchain technology and cryptocurrency? Fotios Filacouris (Staff Solutions Engineer, Confluent) has been working with Kafka for close to five years, primarily designing architectural solutions for financial services, and he also has expertise in blockchain. In this episode, he joins Kris to discuss how blockchain and Kafka are complementary, and he highlights some of the use cases he has seen emerging that use Kafka in conjunction with traditional distributed ledger technology (DLT) as well as blockchain technologies.
According to Fotios, Kafka and the notion of blockchain share many traits, such as immutability, replication, distribution, and the decoupling of applications. This complementary relationship means that they can function well together if you are looking to extend the functionality of a given DLT through sidechain or off-chain activities, such as analytics, integrations with traditional enterprise systems, or even the integration of certain chains and ledgers.
Based on Fotios' observations, Kafka has become an essential piece of the puzzle in many blockchain-related use cases, including settlement, logging, analytics, and risk and volatility calculations. For example, a bitcoin trading application may use Kafka Streams to provide analytics on top of the price action of various crypto assets. Fotios has also seen use cases where a crypto platform leverages Kafka as its infrastructure layer for real-time logging and analytics.
EPISODE LINKS
Modernizing Banking Architectures with Apache Kafka
New Kids On the Bloq
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
How I Became a Developer Advocate

Jun 9, 2022 · 29:48 · Transcription Available


What is a developer advocate, and how do you become one? In this episode, seasoned developer advocates Kris Jenkins (Senior Developer Advocate, Confluent) and Danica Fine (Senior Developer Advocate, Confluent) answer the question by diving into how they got into the world of developer relations, what they enjoy most about their roles, and how you can become one.
Developer advocacy is at the heart of a developer community—helping developers and software engineers get the most out of a given technology by providing support in the form of blog posts, podcasts, conference talks, video tutorials, meetups, and other mediums.
Before stepping into the world of developer relations, both Danica and Kris were hands-on developers. Beyond his professional work, Kris also devoted personal time to supporting fellow developers, such as running local meetups, writing blogs, and organizing hackathons. Danica found her calling after learning more about Apache Kafka® and successfully implementing a mission-critical application for a financial services company—transforming 2,000 lines of code into Kafka Streams. She enjoys building and sharing her knowledge with the community to make technology as accessible and as fun as possible.
Additionally, the duo previews their developer advocacy trip to Singapore and Australia in mid-June, where they will attend local conferences and host in-person meetups on Kafka and event streaming.
EPISODE LINKS
In-person meetup: Singapore | Sydney | Melbourne
Coding in Motion: Building a Data Streaming App with JavaScript
Practical Data Pipeline: Build a Plant Monitoring System with ksqlDB
How to Build a Strong Developer Community ft. Robin Moffatt and Ale Murray
Designing Event-Driven Systems
Watch the video version of this podcast
Danica Fine's Twitter
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

May 26, 2022 | 55:55 | Transcription Available


Stream processing can be hard or easy depending on the approach you take and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving the issues each presents.

The stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant—ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.

Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why Use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases—those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed in production stream processing applications requiring large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter. Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way, such as the tyranny of state and the Turing completeness of SQL.

EPISODE LINKS
The Future of SQL: Databases Meet Stream Processing
Building Real-Time Event Streams in the Cloud, On Premises
Kafka Streams 101 course
ksqlDB 101 course
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 for an additional $100 of Confluent Cloud usage (details)
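
For a feel of the design difference discussed in this episode, here is the same per-key count written both ways: as a ksqlDB statement (shown in a comment) and with the Kafka Streams DSL. The topic, stream, and column names are invented for illustration.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class CountsBothWays {

    // ksqlDB (declarative SQL, runs on a ksqlDB server):
    //   CREATE TABLE clicks_per_user AS
    //     SELECT user_id, COUNT(*) AS clicks
    //     FROM clickstream
    //     GROUP BY user_id;
    //
    // Kafka Streams (Java library, embedded in your own service):
    static Topology clicksPerUser() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("clickstream", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()   // the record key is assumed to be the user id
               .count()
               .toStream()
               .to("clicks-per-user", Produced.with(Serdes.String(), Serdes.Long()));
        return builder.build();
    }
}
```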

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.2 - New Features & Improvements

May 17, 2022 | 6:54 | Transcription Available


Apache Kafka® 3.2 delivers new KIPs in three different areas of the Kafka ecosystem: Kafka Core, Kafka Streams, and Kafka Connect. On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights.

More than half of the KIPs in the new release concern Kafka Core. KIP-704 addresses unclean leader elections by allowing for further communication between the controller and the brokers. KIP-764 takes on the problem of a large number of client connections in a short period of time during preferred leader election by adding the configuration `socket.listen.backlog.size`. KIP-784 adds an error code field to the response of the `DescribeLogDirs` API, and KIP-788 improves network traffic handling by allowing you to set the pool size of network threads individually per listener on Kafka brokers. Finally, in line with the ongoing move to the KRaft protocol, KIP-801 introduces a built-in `StandardAuthorizer` that doesn't depend on ZooKeeper.

There are five KIPs related to Kafka Streams in the AK 3.2 release. KIP-708 brings rack-aware standby assignment by tag, which improves fault tolerance. Then there are three projects related to Interactive Queries v2: KIP-796 specifies an improved interface for Interactive Queries; KIP-805 allows state to be queried over a specific range; and KIP-806 adds two implementations of the Query interface, `WindowKeyQuery` and `WindowRangeQuery`. The final Kafka Streams project, KIP-791, enhances `StateStoreContext` with `recordMetadata`, which may be accessed from state stores.

Additionally, this Kafka release introduces Kafka Connect-related improvements, including KIP-769, which extends the `/connector-plugins` API, letting you list all available plugins, not just connectors as before. KIP-779 lets `SourceTask`s handle producer exceptions according to `errors.tolerance`, rather than instantly killing the entire connector by default. Finally, KIP-808 lets you specify precision options for the TimestampConverter single message transform.

Tune in to learn more about the Apache Kafka 3.2 release!

EPISODE LINKS
Apache Kafka 3.2 release notes
Read the blog to learn more
Download Apache Kafka 3.2.0
Watch the video version of this podcast
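
As a rough sketch of what the Interactive Queries v2 interface from KIP-796/805/806 looks like in use: the store name, key, and time range below are assumptions for illustration, and the exact method shapes should be checked against the 3.2 Javadoc.

```java
import java.time.Instant;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.query.StateQueryRequest;
import org.apache.kafka.streams.query.StateQueryResult;
import org.apache.kafka.streams.query.WindowKeyQuery;
import org.apache.kafka.streams.state.WindowStoreIterator;

public class Iq2Sketch {

    // Query the last hour of windowed counts for one key from a running
    // Streams app. "trades-per-minute" and "BTC" are illustrative names.
    static void lastHourForKey(KafkaStreams streams) {
        Instant to = Instant.now();
        Instant from = to.minusSeconds(3600);

        WindowKeyQuery<String, Long> query =
            WindowKeyQuery.withKeyAndWindowStartRange("BTC", from, to);

        StateQueryResult<WindowStoreIterator<Long>> result =
            streams.query(StateQueryRequest.inStore("trades-per-minute").withQuery(query));

        // Assumes the store lives on this instance and in a single partition result.
        try (WindowStoreIterator<Long> it = result.getOnlyPartitionResult().getResult()) {
            it.forEachRemaining(kv ->
                System.out.printf("window start %d -> count %d%n", kv.key, kv.value));
        }
    }
}
```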

Quarkus Insights
Quarkus Insights #84: Quarkus Testing

Mar 23, 2022 | 60:13


Quarkus has full support for Kafka Streams, with the ability to run in JVM mode, native mode, and dev mode.
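
A minimal sketch of the Quarkus pattern: the kafka-streams extension picks up a CDI-produced `Topology` and manages the KafkaStreams lifecycle for you (in JVM, native, and dev mode alike). The topic names are illustrative, and the CDI package (`jakarta.*` versus `javax.*`) depends on your Quarkus version.

```java
import jakarta.enterprise.context.ApplicationScoped;
import jakarta.enterprise.inject.Produces;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

@ApplicationScoped
public class TopologyProducer {

    // Quarkus discovers this producer method and starts the Streams runtime;
    // no explicit KafkaStreams wiring is needed in application code.
    @Produces
    public Topology buildTopology() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("words", Consumed.with(Serdes.String(), Serdes.String()))
               .mapValues(v -> v.toUpperCase())
               .to("shouted-words", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }
}
```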

Data on Kubernetes Community
Dok Talks #119 - Cloud-Native Data Pipelines // Hakan Lofcali

Mar 4, 2022 | 53:25


https://go.dok.community/slack
https://dok.community

ABSTRACT OF THE TALK
This talk walks you through our stack, architecture, and processes. We develop tools to deploy and run data-driven applications in a cloud-native environment. We will give a whirlwind tour of developing a Java Quarkus application, a CI/CD stack powered by GitHub Actions and ArgoCD, and building and deploying containerized Kafka Streams applications at runtime with the Jib container builder. With that common understanding in place, we will give a high-level overview of how we use modern Kubernetes and cloud tooling to manage multiple clusters in different organizations together with our customers.

BIO
DataCater commoditizes the data pipeline development lifecycle by applying software engineering and cloud-native practices to data work. Hakan is a software and data engineer and the CTO of DataCater. He built his knowledge of software, data engineering, and cloud-native computing in widely different environments: from an early-stage start-up to the hyperscaler AWS, and from sports media companies to highly regulated FSI enterprises. The experiences gained, problems encountered, and solutions found led to him co-founding DataCater to improve tooling in the data space.

Streaming Audio: a Confluent podcast about Apache Kafka
Serverless Stream Processing with Apache Kafka ft. Bill Bejeck

Mar 3, 2022 | 42:23 | Transcription Available


What is serverless?

Having worked as a software engineer for over 15 years and as a regular contributor to Kafka Streams, Bill Bejeck (Integration Architect, Confluent) is an Apache Kafka® committer and author of "Kafka Streams in Action." In today's episode, he explains what serverless is and the architectural concepts behind it.

To clarify, serverless doesn't mean you can run an application without a server—there are still servers in the architecture, but they are abstracted away from your application development. In other words, you can focus on building and running applications and services without any concerns over infrastructure management. Using a cloud provider such as Amazon Web Services (AWS) enables you to allocate machine resources on demand while the provider handles provisioning, maintenance, and scaling of the server infrastructure.

There are a few important terms to know when implementing serverless functions with event stream processors:
Functions as a service (FaaS)
Stateless stream processing
Stateful stream processing

Serverless commonly falls into the FaaS cloud computing service category—for example, AWS Lambda is the classic definition of a FaaS offering. You have a greater degree of control to run a discrete chunk of code in response to certain events, and it lets you write code to solve a specific issue or use case. Stateless processing is simpler in comparison to stateful processing, which is more complex as it involves keeping the state of an event stream and needs a key-value store. ksqlDB allows you to perform both stateless and stateful processing, but its strength lies in stateful processing to answer complex questions, while AWS Lambda is better suited for stateless processing tasks. By integrating ksqlDB with AWS Lambda, you can deliver serverless event streaming and analytics at scale.

EPISODE LINKS
Serverless Stream Processing with Apache Kafka, AWS Lambda, and ksqlDB
Stateful Serverless Architectures with ksqlDB and AWS Lambda
Serverless GitHub repository
Kafka Streams in Action
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
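
As a hedged illustration of the stateless FaaS side of this discussion, here is a minimal AWS Lambda handler in Java for a Kafka event source. The handler and topic payloads are hypothetical; with the standard Kafka trigger, record values arrive base64-encoded.

```java
import java.util.List;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.KafkaEvent;

// Stateless processing: each invocation handles only the batch it is given
// and keeps nothing between calls, which is what makes FaaS a natural fit.
public class OrderEventHandler implements RequestHandler<KafkaEvent, Void> {

    @Override
    public Void handleRequest(KafkaEvent event, Context context) {
        event.getRecords().values().stream()
             .flatMap(List::stream)
             .forEach(record -> context.getLogger().log(
                 "topic=" + record.getTopic()
                     + " offset=" + record.getOffset()
                     + " value(base64)=" + record.getValue()));
        return null;
    }
}
```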

Engenharia de Dados [Cast]
Casos de Uso e Experiência de Campo com Apache Kafka

Feb 24, 2022 | 63:34


In this episode we bring in specialist João Bosco Seixas, Community Catalyst, to talk about Apache Kafka. In this conversation we cover the software development and data engineering perspectives and how each area can use Apache Kafka most effectively.

* Apache Kafka for software development
* Data engineering with Apache Kafka and real-time analytics
* The technology's learning curve
* Use cases
* Field experiences
* Tips for beginners

The main goal is to show a data engineer how Apache Kafka is used not only for analytics but across the whole company, especially in building microservices.

Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Engenharia de Dados [Cast]
Apache Kafka é um Banco de Dados Relacional?

Feb 14, 2022 | 53:57


Apache Kafka is a data streaming platform capable of ingesting and processing millions of events per second. Even so, some important points usually go unexplained, such as:

* Apache Kafka as a database
* Transactions in Apache Kafka
* Decoupled storage and processing
* Databases vs. Apache Kafka

This episode will demystify Apache Kafka once and for all and answer your questions about its strengths and weaknesses, and about how you can get the most out of this Apache Software Foundation open-source technology.

Luan Moreno = https://www.linkedin.com/in/luanmoreno/
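
One of the topics above, transactions, can be sketched with the standard Java producer API: writes to several topics either all become visible to `read_committed` consumers or none do. The topic names and transactional ID are illustrative.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalWrite {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // A stable transactional.id gives the producer an identity across restarts.
        props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "payments-writer-1");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("debits", "acct-1", "-100"));
                producer.send(new ProducerRecord<>("credits", "acct-2", "+100"));
                producer.commitTransaction(); // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();  // neither record is exposed to read_committed consumers
                throw e;
            }
        }
    }
}
```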

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.1 - Overview of Latest Features, Updates, and KIPs

Jan 24, 2022 | 4:43 | Transcription Available


Apache Kafka® 3.1 is here with exciting new features and improvements! On behalf of the Kafka community, Danica Fine (Senior Developer Advocate, Confluent) shares release highlights that you won't want to miss, including foreign-key joins in Kafka Streams and improvements that will provide consistency for Kafka latency metrics.

KAFKA-13439 deprecates the eager rebalancing protocol, which has been the default since Kafka 2.4—it's advised to upgrade your applications to the cooperative protocol, as the eager protocol will no longer be supported in future releases.

Previously, foreign-key joins in Kafka Streams only worked if both the primary-key and foreign-key tables used default partitioning. This release adds support for foreign-key joins on tables with custom partitioners, which are passed in as part of a new `TableJoined` object, comparable to the existing `Joined` and `StreamJoined` objects.

With the goal of making Kafka more intuitive, KIP-773 enhances naming consistency for three new client metrics with millis and nanos. For example, `io-waittime-total` is reintroduced as `io-wait-time-ns-total`. The previously introduced metrics without `ns` will be deprecated but remain available for backward compatibility.

KIP-768 continues the work started in KIP-255 to implement the necessary interfaces for a production-grade way to connect to an OpenID identity provider for authentication and token retrieval. This update provides an out-of-the-box implementation of an `AuthenticateCallbackHandler` that can be used to communicate with OAuth/OIDC.

Additionally, this Kafka release introduces two new metrics for active brokers specifically, `ActiveBrokerCount` and `FenceBrokerCount`. These two metrics expose the number of active brokers in the cluster known by the controller and the number of fenced brokers known by the controller.

Tune in to learn more about the Apache Kafka 3.1 release!

EPISODE LINKS
Apache Kafka 3.1 release notes
Read the blog to learn more
Download Apache Kafka 3.1
Watch the video version of this podcast
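
A sketch of how the new `TableJoined` parameter might appear in application code. The record types and partitioners here are invented for illustration; check the 3.1 Javadoc for the exact overloads.

```java
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.TableJoined;
import org.apache.kafka.streams.processor.StreamPartitioner;

public class FkJoinSketch {

    // Illustrative value types: orders reference customers by a foreign key.
    record Order(String customerId, double amount) {}
    record Customer(String name) {}

    // The new piece is the TableJoined argument, which carries custom
    // partitioners for both sides of the foreign-key join (KIP added in 3.1).
    static KTable<String, String> enrich(KTable<String, Order> orders,
                                         KTable<String, Customer> customers,
                                         StreamPartitioner<String, Void> orderPartitioner,
                                         StreamPartitioner<String, Void> customerPartitioner) {
        return orders.join(
            customers,
            order -> order.customerId(),                       // foreign-key extractor
            (order, customer) -> customer.name() + " spent " + order.amount(),
            TableJoined.with(orderPartitioner, customerPartitioner));
    }
}
```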

Streaming Audio: a Confluent podcast about Apache Kafka
Modernizing Banking Architectures with Apache Kafka ft. Fotios Filacouris

Dec 28, 2021 | 34:59 | Transcription Available


It's been said that financial services organizations have been early Apache Kafka® adopters due to the strong delivery guarantees and scalability that Kafka provides. With experience designing architectural solutions for financial services, Fotios Filacouris (Senior Solutions Engineer, Enterprise Solutions Engineering, Confluent) joins Tim to discuss how Kafka and Confluent help banks build modern architectures, highlighting key emerging use cases from the sector.

Previously, Kafka was often viewed as a simple pipe that connected databases together, allowing for easy and scalable data migration. As the Kafka ecosystem has evolved with added components like ksqlDB, Kafka Streams, and Kafka Connect, Kafka has become more than just a pipe—it's an intelligent pipe that enables real-time, actionable data insights.

Fotios shares a couple of use cases showcasing how Kafka solves the problems that many banks are facing today. One of his customers transformed retail banking by using Kafka as the architectural base for storing all data permanently and indefinitely. This approach enables data in motion and a better experience for frontend users scrolling through their transaction history, eliminating the need to download old statements that have been offloaded to the cloud or a data lake. Kafka also provides the best of both worlds: increased scalability along with strong message delivery guarantees comparable to queuing middleware like IBM MQ and TIBCO.

In addition to use cases, Tim and Fotios talk about deploying Kafka for banks within the cloud and drill into the profession of being a solutions engineer.

EPISODE LINKS
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Running Hundreds of Stream Processing Applications with Apache Kafka at Wise

Dec 21, 2021 | 31:08 | Transcription Available


What's it like building a stream processing platform with around 300 stateful stream processing applications based on Kafka Streams? Levani Kokhreidze (Principal Engineer, Wise) shares his experience building such a platform, one that the business depends on for multi-currency movements across the globe. He explains how his team uses Kafka Streams for real-time money transfers at Wise, a fintech organization that facilitates international currency transfers for 11 million customers.

Getting to this point and expanding the stream processing platform was not, however, without its challenges. One of the major challenges at Wise is to aggregate, join, and process real-time event streams to transfer currency instantly. To accomplish this, Wise relies on Apache Kafka® as an event broker, as well as Kafka Streams, the accompanying Java stream processing library. Kafka Streams lets you build event-driven microservices for processing streams, which can then be deployed alongside the Kafka cluster of your choice. Wise also uses the Interactive Queries feature in Kafka Streams to query internal application state at runtime.

The Wise stream processing platform has gradually moved the company away from a monolithic architecture to an event-driven microservices model, with around 400 total microservices working together. This has given Wise the ability to independently shape and scale each service to better serve evolving business needs. The stream processing platform includes a domain-specific language (DSL) that provides libraries and tooling, such as Docker images, for building your own stream processing applications with governance. With this approach, Wise stores 50 TB of stateful data based on Kafka Streams running in Kubernetes.

Levani shares his own experiences from this journey and provides guidance that may help you follow in Wise's footsteps. He covers how to properly delegate ownership and responsibilities for sourcing events from existing data stores, and outlines some of the pitfalls they encountered along the way. To cap it all off, Levani also shares some important lessons in organization and technology, with some best practices to keep in mind.

EPISODE LINKS
Kafka Streams 101 course
Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
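
Interactive Queries, which Wise uses to read application state at runtime, looks roughly like this with the standard Kafka Streams API. The store name and types are illustrative, not Wise's actual code.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class BalanceLookup {

    // Read a value straight out of a running Streams application's local
    // state store, without going back to the Kafka topics.
    static Long balanceFor(KafkaStreams streams, String accountId) {
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType(
                "account-balances", QueryableStoreTypes.keyValueStore()));
        return store.get(accountId);
    }
}
```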

Streaming Audio: a Confluent podcast about Apache Kafka
ksqlDB Fundamentals: How Apache Kafka, SQL, and ksqlDB Work Together ft. Simon Aubury

Dec 1, 2021 | 30:42 | Transcription Available


What is ksqlDB, and how does Simon Aubury (Principal Data Engineer, Thoughtworks) use it to track down the plane that wakes his cat Snowy in the morning? Experienced in building real-time applications with ksqlDB since its genesis, Simon provides an introduction to ksqlDB by sharing some of his projects and use cases.

ksqlDB is a database purpose-built for stream processing applications that lets you build real-time data streaming applications with SQL syntax. ksqlDB reduces the complexity of having to code in Java, making it easier to achieve outcomes through declarative programming, as opposed to procedural programming.

Before ksqlDB, you could use the producer and consumer APIs to get data in and out of Apache Kafka®; however, when it came to data enrichment, such as joining, filtering, mapping, and aggregating data, you would have to use the Kafka Streams API—a robust and scalable programming interface rooted in the JVM ecosystem that requires Java programming knowledge. This presented scaling challenges for Simon, who was at a multinational insurance company with a small team that needed to stream loads of data from disparate systems and enrich it for meaningful insights. Simon recalls discovering ksqlDB during a practice fire drill, and he considers it a memorable moment of turning a challenge into an opportunity.

Leveraging your familiarity with relational databases, ksqlDB abstracts away the complex programming required for real-time operations, both for stream processing and data integration, making it easy to read, write, and process streaming data in real time.

Simon is passionate about ksqlDB and Kafka Streams, as well as getting other people inspired by the technology. He's been using ksqlDB for projects such as taking a stream of information and enriching it with static data. One of Simon's first ksqlDB projects used a Raspberry Pi and a software-defined radio to process aircraft movements in real time to determine which plane wakes his cat Snowy up every morning. Simon highlights additional ksqlDB use cases, including e-commerce checkout interactions that identify where people drop out of a sales funnel.

EPISODE LINKS
ksqlDB 101 course
A Guide to ksqlDB Fundamentals and Stream Processing Concepts
ksqlDB 101 Training with Live Walkthrough Exercise
KSQL-ops! Running ksqlDB in the Wild
Articles from Simon Aubury
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
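
As a rough taste of the declarative style described here, the ksqlDB Java client can submit the SQL directly. The stream names and columns below are invented (a nod to Simon's plane-spotting project), and the client API shape should be verified against the ksqlDB docs.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class PlaneSpotter {
    public static void main(String[] args) throws Exception {
        Client client = Client.create(
            ClientOptions.create().setHost("localhost").setPort(8088));

        // Declare a stream over an existing topic, then derive a filtered
        // one: declarative SQL instead of a hand-written Streams topology.
        client.executeStatement(
            "CREATE STREAM aircraft (icao VARCHAR, altitude INT, lat DOUBLE, lon DOUBLE) "
          + "WITH (KAFKA_TOPIC='aircraft-positions', VALUE_FORMAT='JSON');").get();

        client.executeStatement(
            "CREATE STREAM low_flyers AS SELECT * FROM aircraft WHERE altitude < 3000;").get();

        client.close();
    }
}
```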

Streaming Audio: a Confluent podcast about Apache Kafka
Explaining Stream Processing and Apache Kafka ft. Eugene Meidinger

Nov 23, 2021 | 29:28 | Transcription Available


Many of us find ourselves in the position of equipping others to use Apache Kafka® after we've gained an understanding of what Kafka is used for. But how do you communicate and teach event streaming concepts to others effectively? As a Pluralsight instructor and business intelligence consultant, Eugene Meidinger shares tips for creating consumable training materials that convey event streaming concepts to developers and IT administrators who are trying to get on board with Kafka and stream processing.

Eugene's background as a database administrator (DBA) and his immense knowledge of event streaming architecture and data processing show as he reveals his learnings from years of working with Microsoft Power BI, Azure Event Hubs, data processing, and event streaming with ksqlDB and Kafka Streams.

Eugene mentions the importance of understanding your audience, their pain points, and their questions, such as: why was Kafka invented? Why does ksqlDB matter? It also helps to use metaphors where appropriate. For example, when explaining the processing topology in Kafka Streams, Eugene uses the analogy of a highway where people are boarding a bus as the blocking operation: after the grace period, the bus will leave even without passengers, meaning that after the window closes, the processor will continue even without events. He also likes to inject a sense of humor into his training and keeps empathy in mind.

Here is the structure that Eugene uses when building courses:
The first module is usually fundamentals, which lays out the groundwork and the objectives of the course
It's critical to repeat and summarize core concepts or major points; for example, a key capability of Kafka is the ability to decouple data in both network space and in time
Provide variety and different modalities that allow people to consume content through multiple avenues, such as screencasts, slides, and demos, wherever it makes sense

EPISODE LINKS
Building ETL Pipelines from Streaming Data with Kafka and ksqlDB
Don't Make Me Think | Steve Krug
Design for How People Learn | Julie Dirksen
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Confluent Platform 7.0: New Features + Updates

Nov 9, 2021 | 12:16 | Transcription Available


Confluent Platform 7.0 has launched and includes Apache Kafka® 3.0, plus new features introduced by KIP-630: Kafka Raft Snapshot, KIP-745: Connect API to restart connector and tasks, and KIP-695: Further improve Kafka Streams timestamp synchronization. Reporting from Dubai, Tim Berglund (Senior Director, Developer Advocacy, Confluent) provides a summary of new features, updates, and improvements in the 7.0 release, including the ability to create a real-time bridge from on-premises environments to the cloud with Cluster Linking.

Cluster Linking allows you to create a single cluster link between multiple environments from Confluent Platform to Confluent Cloud, which is available on public clouds like AWS, Google Cloud, and Microsoft Azure, removing the need for numerous point-to-point connections. Consumers reading from a topic in one environment can read from the same topic in a different environment without risks of reprocessing or missing critical messages. This gives operators the flexibility to make changes to topic replication smoothly and byte for byte, without data loss. Additionally, Cluster Linking eliminates any need to deploy MirrorMaker2 for replication management while ensuring offsets are preserved.

Furthermore, the release of Confluent for Kubernetes 2.2 allows you to build your own private-cloud Kafka service. It completes the declarative API by adding cloud-native management of connectors, schemas, and cluster links to reduce the operational burden and manual processes so that you can instead focus on high-level declarations. Confluent for Kubernetes 2.2 also enhances elastic scaling through the Shrink API.

Following the move toward ZooKeeper's removal begun in Apache Kafka 3.0, Confluent Platform 7.0 introduces KRaft in preview to make it easier to monitor and scale Kafka clusters to millions of partitions.

There are also several ksqlDB enhancements in this release, including foreign-key table joins and support for the new data types DATE and TIME, to account for time values that aren't TIMESTAMP. This enables consistent data ingestion from the source without having to convert data types.

EPISODE LINKS
Download Confluent Platform 7.0
Check out the release notes
Read the Confluent Platform 7.0 blog post
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Real-Time Stream Processing with Kafka Streams ft. Bill Bejeck

Nov 4, 2021 | 35:32 | Transcription Available


Kafka Streams is a native stream processing library for Apache Kafka® that consumes messages from Kafka to perform operations like filtering a topic's messages and producing output back into Kafka. After working as a developer in stream processing, Bill Bejeck (Apache Kafka Committer and Integration Architect, Confluent) found his calling in sharing knowledge and authored his book, "Kafka Streams in Action." As a Kafka Streams expert, Bill is also the author of the Kafka Streams 101 course on Confluent Developer, where he delves into what Kafka Streams is, how to use it, and how it works.

Kafka Streams provides an abstraction over Kafka consumers and producers, minimizing the administrative details and framework code you would otherwise have to write and manage when processing streams with plain Kafka consumers and producers. Kafka Streams is declarative—you can state what you want to do, rather than how to do it.

Kafka Streams leverages the KafkaConsumer protocol internally; it inherits its dynamic scaling properties and the consumer group protocol to dynamically redistribute the workload. When Kafka Streams applications are deployed separately but have the same `application.id`, they are logically still one application.

Kafka Streams has two processing APIs. The declarative API, or domain-specific language (DSL), is a high-level interface that enables you to build anything needed with a processor topology, whereas the Processor API lets you specify a processor topology node by node, providing the ultimate flexibility. To underline the differences between the two APIs, Bill says it's almost like using an object-relational mapping (ORM) framework versus SQL.

The Kafka Streams 101 course is designed to get you started with Kafka Streams and to help you learn the fundamentals of:
How streams and tables work
How stateless and stateful operations work
How to handle time windows and out-of-order data
How to deploy Kafka Streams

EPISODE LINKS
Kafka Streams 101 course
A Guide to Kafka Streams and Its Uses
Kafka Streams 101 meetup
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use podcon19 to get 40% off "Kafka Streams in Action"
Use podcon19 to get 40% off "Event Streaming with Kafka Streams and ksqlDB"
Use PODCAST100 to get $100 of free Confluent Cloud usage (details)
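
To illustrate the DSL-versus-Processor-API distinction from the course, here is one filter written both ways. This is a sketch: the topic names are illustrative, and it assumes String default serdes are set in the Streams configuration.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class TwoApis {

    // DSL: declare *what* you want.
    static Topology dsl() {
        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input", Consumed.with(Serdes.String(), Serdes.String()))
               .filter((k, v) -> v != null && !v.isBlank())
               .to("output", Produced.with(Serdes.String(), Serdes.String()));
        return builder.build();
    }

    // Processor API: wire the topology node by node.
    static Topology processorApi() {
        Topology topology = new Topology();
        topology.addSource("Source", "input");
        topology.addProcessor("Filter", DropBlanks::new, "Source");
        topology.addSink("Sink", "output", "Filter");
        return topology;
    }

    static class DropBlanks implements Processor<String, String, String, String> {
        private ProcessorContext<String, String> context;

        @Override
        public void init(ProcessorContext<String, String> context) {
            this.context = context;
        }

        @Override
        public void process(Record<String, String> record) {
            // Forward only non-blank values downstream.
            if (record.value() != null && !record.value().isBlank()) {
                context.forward(record);
            }
        }
    }
}
```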

Streaming Audio: a Confluent podcast about Apache Kafka
Getting Started with Spring for Apache Kafka ft. Viktor Gamov

Oct 19, 2021 | 32:44 | Transcription Available


What's the distinction between the Spring Framework and Spring Boot? If you are building a car, the Spring Framework is the engine, while Spring Boot gives you the vehicle that you ride in. With experience teaching and answering questions on how to use Spring and Apache Kafka® together, Viktor Gamov (Principal Developer Advocate, Kong) designed a free course on Confluent Developer and previews it in this episode. Not only that, but he also explains why the opinionated Spring Framework would be a good hero in Marvel.

Spring is an ever-evolving framework that embraces modern, cloud-native technologies with cross-language options, such as Kotlin integration. Unlike its predecessors, the Spring Framework supports a modern version of Java and the requirements of the Twelve-Factor App manifesto, letting you move an application between environments without changing the code. With that engine in place, Spring Boot introduces a microservices architecture. Spring Boot contains database and messaging system integrations, reducing development time and increasing overall productivity.

Spring for Apache Kafka applies best practices of the Spring community to the Kafka ecosystem, including features that abstract away infrastructure code so you can focus on the programming logic that is important for your application. Spring for Apache Kafka provides a wrapper around the producer and consumer to ease Kafka configuration, with APIs including KafkaTemplate, MessageListenerContainer, @KafkaListener, and TopicBuilder.

The Spring Framework and Apache Kafka course will equip you with the knowledge you need in order to build event-driven microservices using Spring and Kafka on Confluent Cloud. Tim and Viktor also discuss Spring Cloud Stream, as well as Spring Boot integration with Kafka Streams, and more.

EPISODE LINKS
Spring Framework and Apache Kafka course
Spring for Apache Kafka 101
Bootiful Stream Processing with Spring and Kafka
LiveStreams with Viktor Gamov
Use kafkaa35 to get 30% off "Kafka in Action"
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
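
A small sketch of the abstractions named above: `TopicBuilder` for declarative topic creation, `KafkaTemplate` as the producer wrapper, and `@KafkaListener` for the consumer side. The topic and group names are invented for illustration.

```java
import org.apache.kafka.clients.admin.NewTopic;
import org.springframework.context.annotation.Bean;
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.kafka.config.TopicBuilder;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Component;

@Component
public class OrderMessaging {

    private final KafkaTemplate<String, String> template;

    public OrderMessaging(KafkaTemplate<String, String> template) {
        this.template = template;
    }

    // Declarative topic creation, applied on startup by Spring's KafkaAdmin.
    @Bean
    public NewTopic ordersTopic() {
        return TopicBuilder.name("orders").partitions(6).replicas(3).build();
    }

    // KafkaTemplate wraps the producer, so sending is one line.
    public void placeOrder(String orderId, String payload) {
        template.send("orders", orderId, payload);
    }

    // The listener container handles polling, deserialization, and commits.
    @KafkaListener(topics = "orders", groupId = "order-audit")
    public void audit(String payload) {
        System.out.println("audited: " + payload);
    }
}
```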

Streaming Audio: a Confluent podcast about Apache Kafka
Apache Kafka 3.0 - Improving KRaft and an Overview of New Features

Sep 21, 2021 | 15:17 | Transcription Available


Apache Kafka® 3.0 is out! To spotlight major enhancements in this release, Tim Berglund (Apache Kafka Developer Advocate) provides a summary of what's new in the Kafka 3.0 release from Krakow, Poland, including API changes and improvements to the early-access Kafka Raft (KRaft).

KRaft is a built-in Kafka consensus mechanism that's replacing Apache ZooKeeper going forward. It is recommended to try out new KRaft features in a development environment, as KRaft is not yet advised for production. One of the major features in Kafka 3.0 is the ability for KRaft controllers and brokers to efficiently store, load, and replicate snapshots of the metadata topic partition. The Kafka controller is now responsible for generating producer IDs in both ZooKeeper and KRaft mode, easing the transition from ZooKeeper to KRaft on the Kafka 3.x version line. This update also moves us closer to the ZooKeeper-to-KRaft bridge release. Additionally, this release includes metadata improvements, exactly-once semantics, and KRaft reassignments.

To provide a stronger record delivery guarantee, Kafka producers now turn on idempotence by default, together with acknowledgment of delivery by all the replicas.

This release also comprises enhancements to Kafka Connect task restarts, Kafka Streams timestamp-based synchronization, and more flexible configuration options for MirrorMaker2 (MM2). The first version of MirrorMaker has been deprecated, and MirrorMaker2 will be the focus of future development.

Besides that, this release drops support for the older message formats V0 and V1, and it initiates the removal of Java 8 and Scala 2.12 across all components in Apache Kafka. The universal Java 8 and Scala 2.12 deprecation is anticipated to complete in the future Apache Kafka 4.0 release.

Apache Kafka 3.0 is a major release and a step forward for the Apache Kafka project!

EPISODE LINKS
Apache Kafka 3.0 release notes
Read the blog to learn more
Download Apache Kafka 3.0
Watch the video version of this podcast
Join the Confluent Community Slack
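
The new producer defaults can be spelled out explicitly. In Kafka 3.0+ the two properties below are already the defaults, so setting them is redundant and is shown here only to make the stronger guarantee visible.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class DefaultsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // Defaults as of Kafka 3.0: idempotent writes, acknowledged by all replicas.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Each send now gets idempotent, all-replica-acknowledged delivery.
            producer.send(new ProducerRecord<>("events", "key", "value"));
        }
    }
}
```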

Streaming Audio: a Confluent podcast about Apache Kafka
Using Apache Kafka and ksqlDB for Data Replication at Bolt

Aug 26, 2021 | 29:15 | Transcription Available


What does a ride-hailing app that offers micromobility and food delivery services have to do with data in motion? In this episode, Ruslan Gibaiev (Data Architect, Bolt) shares Bolt's road to adopting Apache Kafka® and ksqlDB for stream processing, replicating data from transactional databases to analytical warehouses.

Rome wasn't built in a day, nor was the adoption of Kafka and ksqlDB at Bolt. Initially, Bolt saw the need to standardize its systems and to replace an unreliable query-based change data capture (CDC) process. As an experienced Kafka developer, Ruslan believed that Kafka was the right foundation for adopting change data capture as a company-wide event streaming solution. Persuading the team at Bolt to adopt and buy in was hard at first, but Ruslan made it happen. Eventually, the team replaced query-based CDC with log-based CDC from Debezium, built on top of Kafka. Shortly after the implementation, developers at Bolt began to see precise, correct, and real-time data.

As Bolt continued to grow, they saw the need for a data lake or data warehouse for replicating and processing data from their OLTP systems. After carefully considering several different solutions and frameworks, such as ksqlDB, Apache Flink®, Apache Spark™, and Kafka Streams, ksqlDB shone brightest for their business requirements. Bolt adopted ksqlDB because it is native to the Kafka ecosystem and a perfect fit for their use case. They found ksqlDB to be a particularly good fit for replicating all their data to a data warehouse for a number of reasons, including:
Easy to deploy and manage
Linearly scalable
Natively integrates with Confluent Schema Registry

Tune in to find out more about Bolt's adoption journey with Kafka and ksqlDB.

EPISODE LINKS
Inside ksqlDB course
ksqlDB 101 course
How Bolt Has Adopted Change Data Capture with Confluent Platform
Analysing Changes with Debezium and Kafka Streams
No More Silos: How to Integrate Your Databases with Apache Kafka and CDC
Change Data Capture with Debezium ft. Gunnar Morling
Announcing ksqlDB 0.17.0
Real-Time Data Replication with ksqlDB
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
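
Log-based CDC of the kind Bolt adopted is typically wired up by registering a Debezium source connector with Kafka Connect's REST API. The sketch below does that from plain Java; the connector name and option keys are illustrative and vary by Debezium version, so treat the config fields as assumptions to be checked against the Debezium docs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterCdcConnector {
    public static void main(String[] args) throws Exception {
        // Illustrative Debezium MySQL connector config (option names differ
        // between Debezium releases; e.g. older versions use
        // "database.server.name" instead of "topic.prefix").
        String body = """
            {
              "name": "orders-cdc",
              "config": {
                "connector.class": "io.debezium.connector.mysql.MySqlConnector",
                "database.hostname": "mysql",
                "database.port": "3306",
                "database.user": "debezium",
                "database.password": "secret",
                "database.server.id": "5400",
                "topic.prefix": "orders-db",
                "table.include.list": "shop.orders"
              }
            }""";

        // POST to Kafka Connect's standard REST endpoint to create the connector.
        HttpResponse<String> response = HttpClient.newHttpClient().send(
            HttpRequest.newBuilder(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build(),
            HttpResponse.BodyHandlers.ofString());

        System.out.println(response.statusCode() + " " + response.body());
    }
}
```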

Streaming Audio: a Confluent podcast about Apache Kafka
Advanced Stream Processing with ksqlDB ft. Michael Drogalis

Aug 11, 2021 | 28:26 | Transcription Available


ksqlDB makes it easy to read, write, process, and transform data on Apache Kafka®, the de facto event streaming platform. With simple SQL syntax, pre-built connectors, and materialized views, ksqlDB's powerful stream processing capabilities enable you to quickly start processing real-time data at scale. But how does ksqlDB work? In this episode, Michael Drogalis (Principal Product Manager, Product Management, Confluent) previews an all-new Confluent Developer course, Inside ksqlDB, where he provides a full overview of ksqlDB's internal architecture and delves into advanced ksqlDB features.

When it comes to ksqlDB or Kafka Streams, there's one principle to keep in mind: ksqlDB and Kafka Streams share a runtime. ksqlDB runs its SQL queries by dynamically writing Kafka Streams topologies. Leveraging Confluent Cloud makes it even easier to use ksqlDB.

Once you are familiar with ksqlDB's basic design, you'll be able to troubleshoot problems and build real-time applications more effectively. The Inside ksqlDB course is designed to help you advance in ksqlDB and Kafka. Paired with hands-on exercises and ready-to-use code, the course covers topics including:
ksqlDB architecture
How stateless and stateful operations work
Streaming joins
Table-table joins
Elastic scaling
High availability

Michael also sheds light on ksqlDB's roadmap:
Building out the query layer so that it is highly scalable, making it able to execute thousands of concurrent subscriptions
Making Confluent Cloud the best place to run ksqlDB and process streams

Tune in to this episode to find out more about the Inside ksqlDB course on Confluent Developer. The all-new website provides diverse and comprehensive resources for developers looking to learn about Kafka and Confluent. You'll find free courses, tutorials, getting started guides, quick starts for 60+ event streaming patterns, and more—all in a single destination.

EPISODE LINKS
Inside ksqlDB course
ksqlDB 101 course
How Real-Time Stream Processing Safely Scales with ksqlDB, Animated
How Real-Time Materialized Views Work with ksqlDB, Animated
How Real-Time Stream Processing Works with ksqlDB, Animated
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
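
Since ksqlDB compiles SQL into Kafka Streams topologies, a push query is effectively a long-running subscription to such a topology. Here is a hedged sketch using the ksqlDB Java client; the stream name is invented, and the API shape should be verified against the client docs.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;
import io.confluent.ksql.api.client.Row;
import io.confluent.ksql.api.client.StreamedQueryResult;

public class PushQueryDemo {
    public static void main(String[] args) throws Exception {
        Client client = Client.create(
            ClientOptions.create().setHost("localhost").setPort(8088));

        // EMIT CHANGES makes this a push query: rows keep arriving as the
        // underlying Kafka Streams topology produces them.
        StreamedQueryResult result =
            client.streamQuery("SELECT * FROM orders EMIT CHANGES;").get();

        for (int i = 0; i < 10; i++) {
            Row row = result.poll();   // blocks until the next row arrives
            System.out.println(row.values());
        }
        client.close();
    }
}
```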

Screaming in the Cloud
A Non-Traditional Path into the SRE Folds with Serena Tiede

Aug 10, 2021 | 39:15


About Serena
Serena Tiede is an SRE at Optum, a healthcare technology company that manages everything from the delivery of care to the management of patient data. Prior to becoming an SRE they were a Kafka operator for real-time security logging and ingestion. In their off time, they moonlight as the proud admin of an incredibly over-engineered Minecraft server.

Links:
Optum: https://www.optum.com/
Twitter: https://twitter.com/SerenaTiede
Personal Blog: https://blog.serenacodes.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Your company might be stuck in the middle of a DevOps revolution without even realizing it. Lucky you! Does your company culture discourage risk? Are you willing to admit it? Does your team have clear responsibilities? Depends on who you ask. Are you struggling to get buy-in on DevOps practices? Well, download the 2021 State of DevOps report, brought to you annually by Puppet since 2011, to explore the trends and blockers keeping evolving firms stuck in the middle of their DevOps evolution. Because they fail to evolve or die like dinosaurs. The significance of organizational buy-in, and oh, it is significant indeed, and why team identities and interaction models matter. Not to mention whether the use of automation and the cloud translate to DevOps success. All that and more awaits you. Visit: www.puppet.com to download your copy of the report now!

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use those things. It's an awesome approach. I've used something similar for years. Check them out. But wait, there's more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It's awesome. If you don't do something like this, you're likely to find out that you've gotten breached, the hard way. Take a look at this. It's one of those few things that I look at and say, "Wow, that is an amazing idea. I love it." That's canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I'm a big fan of this. More from them in the coming weeks.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
A recurring theme of this show has been, for a while, where does the next generation of cloud engineers come from, because the path I walked of being a grumpy Unix admin isn't really as commonly available as it once was, and honestly, I wouldn't wish my path on anyone in good conscience. My guest today is Serena Tiede, who's a site reliability engineer at Optum and didn't start their career as a grumpy systems administrator. Serena, welcome to the show.

Serena: Hey, thanks for having me. I'm so pumped to be here.

Corey: Don't worry, that will soon pass. What I'm wondering is, you didn't come to be an SRE through a giant ops background of clawing your way up by dealing with hardware and data centers and driving at unsafe speeds in the middle of the night because someone tripped over a patch cable in the data center. You have a combination of traditional/non-traditional background. Tell me about that.

Serena: Yeah. So, it's funny you mention hardware. So, I went to school for electrical engineering, went to University of Minnesota because if you want to do engineering, you're pretty much going to one of the big state schools in the Midwest. So, I grew up and was like, "I want to be a hardware designer." I'm terrible at it. So terrible. [laugh].

Corey: Wait, I didn't realize that you could want to be things you were bad at. If somebody had told me that early on in my career, it's, "Huh. This might have taken a very different turn, and a far more productive one." I just assumed if I wasn't good at something I should give up and never try it again.

Serena: Oh, I took the courses and was like, "Whoa, this is circuit design? Not for me." Then I ended up just taking a bunch of engineering math courses. So, I took communications, digital signal processing, controls, and started programming. I was like, all right, let's do embedded systems. No one was hiring, and then come internship time, there's this little company that I've never heard of called Optum. And they're like, "We want software engineers." Well, I can write C. Does that count?

Corey: Oh, the question, of course, to really ask is, "Oh, can you really write C, having gone through it?" The more I talk to people who've been writing C for their entire career, and you ask them, "Can you write C?" The answer is, "Not really slash reliably. I can basically type and sometimes it works." And, "Oh, thank God, they're mortal, too," was my response.

Serena: Oh, my opinion: no one should learn C unless there are specific reasons why. And those reasons are: you're doing embedded systems, where I had to learn how to write in assembly for three weeks, and then my professor at the end said, "Hey, we're writing C. Be thankful; it's a high-level language."

Corey: That is terrifying. But let's get back to this idea of you going to school for electrical engineering, and you didn't just dabble in it; you graduated with a degree in electrical engineering, didn't you?

Serena: Oh, yes, I did. I graduated. It was fun even though, unfortunately, it still had my dead name on the diploma. So, I refer to that as my… Matrix, Mr. Smith moment. [laugh].

Corey: They won't go back and edit and reissue it under your actual name?

Serena: I haven't bothered to look, but I almost consider it just kind of hilarious and am just keeping it that way.

Corey: No. Again, I am not one to ever advise people how to deal with names. When I changed my name back in 2010 or so, I wound up getting a whole lot of strange looks over it. And honestly, it is no one's business, except how you interact with a name.
Not the direction that we need to go in on this. I'm more interested in understanding, on some level, how you got a degree as an electrical engineer and then immediately landed a job writing software. That one feels a little strange. Can you talk me through it?

Serena: Oh, yeah. So, pretty much I took a bunch of operating systems classes and was like, "Wow, this computer science thing is cool." But I was too far into the electrical engineering track to change degrees. So, I got the degree, ended up working at Optum. I originally started off in security, oddly enough, for my internship, then came back, did a—you know, we have a rotational program, so I did security for six months and then… I wound up on this team for my second rotation where their literal job description was, "Write RESTful APIs and streaming applications."

Corey: So, it wasn't even a software job that focused on the close-to-the-hardware stuff where you're doing embedded systems. Like, that would at least make a bit more intuitive sense to the way I see the world. No, this was full-on up-the-stack REST API stuff.

Serena: Oh, yeah. I tried embedded, but in my market, it was all medical devices, and between all of us listening here, I don't do well with medicine. I get very squicked out, very faint. So decided, all right, let's go up the stack, and turns out, it's, like, okay, Kafka Streams. And then we were trying to figure out, "Okay, why are our services—like, how do we know if it's saturated?" I'm like, "Oh, well, we have this Prometheus thing. This sounds cool." And it was deployed on, like, you know, a rudimentary Kubernetes cluster. "Oh, hey, there's this cool service discovery thing. Let's do that." And then one thing led to another. Thanos was coming out, and before it had a release candidate, I decided my claim to fame at the company was like, "All right, let's do this Thanos thing because it seems really cool. I read about it on Reddit." And the distinguished engineer in the room was like, "Oh, yeah, I heard about it on Hacker News. Do it." I did; it was rough, but it was so cool. And then I come back, like, a year later because I went back to security for a wee bit, and the same monitoring stack is still there. And they were like, "Hey, can you do more monitoring things and pivot to observability?"

Corey: Yeah, let's skip past the obvious joke that I could make about someone at a healthcare company saying, "Let's do it because I read about it last night on Hacker News," because it's just too easy at some point. It's odd, though, because I always held the belief, somewhat publicly, that an SRE role was not going to be a junior role. It was something that required quasi-significant experience to move into; it's always felt like a transition from traditional ops roles, or from folks who are deep in the weeds and have been doing software engineering at scale, to a point where they see how these systems fail over time in production scenarios. It doesn't sound like that was your path at all. Not to delegitimize your path by any stretch of the imagination. This is more to do with me reevaluating how I view SRE as a field that people get into and how they approach it.

Serena: I just fell into it. And the reason why I bring up my digital signal processing background is that for a lot of the SRE stuff, I look at all of our time-series metrics, and it's like, "Oh. Well, this is just a real-time stream of data that we scrape periodically." And it's like, "Oh, cool.
So, we can look at our averages, percentiles; I can eventually do some really cool fancy digital filtering." And I kind of was like, "Oh, wow. I, kind of, know the math behind a lot of this stuff and just have to brute force apply it in places."

Corey: Tell me a little bit more about that, because with my approach to SRE—which, let's be clear, was fairly crap—the math that I tended to do was mostly addition and subtraction, and for the really deep stuff, I used the most common tool to manage anything at scale, Microsoft Excel, and that mostly handled even the addition and subtraction for me. What math?

Serena: So, for me, a lot of it comes down to—I actually have my signals book in the other room—the big concept behind all these systems is the concept of sampling. You're not going to, real-time, get memory and CPU data every second. Processors are running at gigahertz of speed; you would need double that to recreate your signal with full fidelity. That's the Nyquist sampling theorem. But you kind of can fudge the numbers a little bit and just say, "Ehh, do we need that granular detail?" We're not trying to reproduce what happened in the past; we're just trying to see what's going on now. So, I say okay, a 15-second scrape interval, things are looking good, and then rolling into what I'm doing later of applying, like, "All right, let's do some fun control loops," because people wanted service-level objectives. People want service-level objectives; everyone loves them some SLOs and SLAs. No one wants to figure out, by hand, what their baseline is. But again, some fancy—this is more controls math—figure out what your baseline is just automatically and do some little magic in the frequency domain, courtesy of Laplace transforms, and that's it. I can just automate that for you and remove the human from the equation.

Corey: I'm still somewhat astounded by the fact that people calculate these things out mathematically instead of, you know, dead reckoning and confident-sounding estimation.

Serena: It's really just bringing that electr—like, controls background to software. Honestly, I'm kind of baffled that no one else has found this hack because I'm just thinking, "Oh, well, I can't be that unique. Someone else has to have done that." And then I talk to the people in the room and it's like, "Oh, wait, no, I am the only person here." [laugh]. So, that's my whole thing. Everything is just applied math. And all of our human dead reckoning, it's great, but it doesn't scale well. You know, my boss wanted me to figure out how to do our SLOs for the entire team, and turns out, realist—when it came time to hire, realistically, cloning myself was not an option. [laugh]. So—

Corey: For better or worse, it seems like it isn't. So, what was your first exposure to the SRE-style space? You started off in security, but looking at the timelines on this, it wasn't that long ago. It feels like you were probably not exposed, in many cases, to physical data centers as much as you would be cloud, or at least not having to image bare-metal systems. Were you up at the AMI level, or was it beyond that, in having virtual machines that moved around into full-on containers, or serverless?

Serena: So, I started my internship in 2016, and got my full-time offer in 2017. And we started having our—container platforms started becoming this up-and-coming thing.
You know, my lead engineers were like, "All right, you've got to learn this thing called 'Docker.'" And I had never heard of it, but I was just amazed that, "Wow, I can just run these little, little itty-bitty pods anywhere on this hardware." And later on, I did do some, like, virtual machine stuff, but I've had the luxury of all of these years of pain and toil, to be able to say, "Oh, yeah. I can just manage things with Ansible, create my Dockerfiles, and do everything from a code deploy pipeline style. And it was awesome." And I just can't fathom what it's like to work without those tools, but knowing… the past, it's kind of like, "Wow, we have gotten a lot farther. Things are abstracted. This is actually kind of nice."

Corey: It kind of is, on some level. I feel like my initial reticence towards containers—I gave a talk: "Heresy in the Church of Docker," which sort of put me on the speaker map once upon a time—and it was about all the things that Docker as a technology didn't really have good answers for. Honestly, the reason that I gave the talk was I assumed that it did have answers and I was just unaware of them, and I just gave the talk so I could publicly become the idiot who didn't know what they were talking about and then get "well actually'd" to death by [ducks 00:12:40] slash Googlers. And it turns out that no, no, at that point in time, these things were not well understood or solved for. The observability story, the metrics, the logging, the orchestration, the security story, the how you handle things like state, et cetera, et cetera, et cetera. And Kubernetes these days has largely solved a lot of those problems, but I don't dabble in those spaces just out of outright orneriness. Back then it was a weird problem, but these problems have largely gotten solved in some ways. But I sort of just skipped over the whole Kubernetes slash container renaissance, and personally, I went directly into the serverless world. What's your take on that?

Serena: Oh, so as someone who loved Kubernetes, I was a serverless skeptic, initially. I was like, "Well, I can just build my Dockerfile and write the deployment manifest. No big deal." And then I started working on my side project. For, I think, better purposes, my iCloud account is tied to my credit card and I have to actually be on the hook for cloud bills. And I use GCP for my home lab, and lo and behold, 1 million requests a month for free. And I love the sound of free when it's my money on the line.

Corey: Oh, yeah, company money versus enterprise money, radically different scales. I mean, if you try and sell me personally a $50 hamburger, I'm going to tell you to go to hell. If you try to sell me, as representative of my company, a $50 hamburger, I'm going to need a receipt.

Serena: Exactly. And then also, like, I'm just running through—I was redoing one of my serverless functions and watching the deploy steps. And then one of my coworkers introduced me, he's like, "Hey, Serena, you hear about this thing called 'Buildpacks'?" And I'm like, "No. What on earth is that thing?" And he's like, "Oh, well, you take your code, and then it just magically turns into a container." I'm like, "Well, crap. Show me." And lo and behold, code goes in one end, a nice little container comes out the other. And that crap was magic.

Corey: It really does change the world if you let it. I think.
I know it sounds like a ridiculous, I guess, hype-driven thing to say, but for the right use case, it's great because it removes the entire administrative burden from running services. Now, critics are going to say that well, that means you're just putting all of your reliability in the hands of your cloud provider. Yeah, we're kind of all doing that already; serverless just, sort of, forces us to be a little bit more honest with ourselves about that.Serena: Oh, yeah. I mean, even if you self-host things, you're relying on your data center ops people to, like, make sure, oh, I don't know, your machines don't literally catch fire. We literally had a bug one time where it's like, “Why is this one node bad?” “Oh, actually—hey, did you increase the fan speed?” Someone had to literally go increase the fan speed for whatever servers, which, again, in the serverless and cloud provider world, I don't think about that. The cloud is just infinite to me. It's just computers and APIs as far as the eye can see. It's wonderful.Corey: It really is. It's amazing, and it's high level, and on some level, you went from getting a degree that required you to write assembly and super low-level stuff and figure out how hardware works into, let's be honest, writing in your primary language, which for all of us in SRE-land is, of course, YAML.Serena: Oh, I am a very spicy YAML engineer. YAML and a little bit of Go for what I need to make things go.Corey: You ever notice there's never a language called ‘Stay,' or ‘Stop,' or anything like that? It's always about moving to the next thing. And we in engineering always have sprint after sprint after sprint. Never a, “It's a marathon, not a sprint. Relax. Walk. Enjoy the journey.” Nope, nope, nope. Faster, further, sooner.Serena: Yeah, it is honestly weird because of my relatively short career span, you know, it's 2021 and I graduated in 2017. The company is like, “Hey, you're a senior software engineer now.” Here's a program, here's a budget. Go forth.Corey: Oh, that's lucky. It must have been amazing to have an actual budget. When I started out, I was in one of those shops where it's, “Yeah, Palo Alto wants $4,000 for that appliance. That's okay. We have some crappy instances and pfSense, and you know, we could wind up spending eight weeks of your time to build something not as good. Get on it.”Serena: Well, the hilarious part is I'm stressing out about every single dollar I'm spending and then my boss is like, “Oh, you know, your budget is super small potatoes, right, compared to like our other stuff? Don't sweat it. It's fine.”Corey: I keep making this point to the cloud providers where their somewhat parsimonious free tiers are damaging longer-term adoption because I look at building something myself, in my spare time in my dorm room or whatnot, and I'm spinning up some instances that talk to each other and I want to use a load balancer and I want to use a managed NAT gateway—God forbid—and at the end of the month, I get a bill for $300. And it's, what the hell is this? I thought I was on the free tier and it scares the living hell out of us. So, we learn not to use those services that are higher level and differentiated. And then when we start working in environments that have budgeting and are corporate, we still remember that, and, “Oh, don't use that thing. 
It's expensive.” And you'll inadvertently spend 80 times as much in what your employer is paying for your time, rather than using the high-level thing because they could not care less about a $500 a month charge. And it's this weird thing that really serves as a drag on adoption.Serena: It's super wei—I actually literally had this conversation with one of my engineers who wanted to, “Hey, we're trying to expose a GRPC thing.” And I had issues getting it to work with an ingress. And he's like, “Do you want me to take a crack at that?” And I'm like, “Look at the price of the load balancer.” And I'm like, “Unless you can figure it out in half an hour… it is literally more expensive for you to continue tilting at that windmill than for us to just leave it be.” [laugh]. And it's also weird. I have my personal stuff where I'm trying to keep my cloud bill to, you know, maybe a humble $100 a month max, versus, “Oh, the enterprise? Oh, yeah. That's just logging that you're paying for.” Which is baffling to me.Corey: I feel like as engineers, we always, always, always fall into this trap. And maybe I fall into it worse than others because my entire business is actually lowering the bill. But when I started as an independent consultant, my bill was something like seven bucks a month, which yeah, I'm pretty content with that. And I started looking at ways to golf it lower, which in most cases is never worth the time, but in my case, I should really understand every penny of the AWS bill or I'm going to have a problem someday. And now I look at it recently because we have a number of engineers building things here, and our bill was over $2,000 a month.And true story, by the way, it turns out that your AWS bill is not so much a function of how many customers you have; it's how many engineers you have. And I look at this and, “Oh, my God, we need to fix that immediately.” And I spent a little bit of time on it and knocked 500 bucks off, and, “Whew, that's better.” And it still bugs me to see a $1500 bill; it feels like it's an awful lot of money. I mean, think of what you can buy for 1500 bucks a month.And then in the context of the larger business picture, compared to payroll, compared to all the other nonsense we use, like Tableau, for example, it's nothing. It is a rounding error that gets lost in the weeds. I never understood that before having access to company budgets. When I was an employee, this was never explained to me, so I was always optimizing for absolutely the wrong thing in hindsight. It feels like this is part of the problem that we run into as a culture when we don't give our staff context to make the right decisions.Serena: Yeah, I actually do appreciate the way my company does things because I am, like—not personally, my bank account, but I am, like, responsible if someone should ask, “Hey, what's this charge for?” I have to say, “Oh, well, it's for all of these things, and we need that.” But for the most part, it's been really weird to, kind of, learn, like, one of the ways I, kind of, sped up my, like, “Okay, I need to learn how business works. What do I do?” Well, quite honestly, a lot of my cloud cost tips I have learned from your various podcasts. 
[laugh].Corey: Uh-oh, that's a problem.Serena: No, but like, all of a sudden, all this stuff and just hanging out on tech Twitter and hearing all the advice of people and then… it was, kind of a weird way of, like, yeah, years-wise, yeah, some people might look at me askance and be, like, “You're really a senior engineer?” But then they hear me speak and it's all about like, “Oh, well, I”—again—“I stand on the shoulders of giants,” which is awesome, and I'm honestly just hoping that one day I will write something that is very cool and then someone will say, “Oh, well, they were right on these things, but not right on this. Let's edit this to make it a little bit better.” And the standing on the shoulders of giants trend continues.Corey: This episode is sponsored in part by Cribl Logstream. Cribl Logstream is an observability pipeline that lets you collect, reduce, transform, and route machine data from anywhere, to anywhere. Simple, right? As a nice bonus it not only helps you improve visibility into what the hell is going on, but also helps you save money almost by accident. Kind of like not putting a whole bunch of vowels and other letters that would be easier to spell in a company name. To learn more visit: cribl.ioCorey: I'm a little taken aback by the fact that you've learned a lot of this stuff from the podcast because I tend to envision, when I'm telling stories about this, companies that show ads, or my mythical Twitter for Pets startup. I have to remember that banks, like, are one of the examples of serious businesses that I use all the time. But you're in healthcare. I'm sorry, that's more serious than finance, just because—I hate to say this because it sounds incredibly privileged and I don't even care—it's only money. What is money compared to the worth of someone's life?I don't think that you can ever draw an equivalent and I feel dirty every time I try. When you're working with things that impact people's ability to access healthcare, that is more important than showing banner ads. And a lot of the stories I tell about, “Maybe it's okay to have downtime.” Because yeah, if AWS takes a region down for an afternoon and you can't show ads to people or your website isn't working, yeah, that's kind of sad and it's obviously not great for your business, but at the same time, the stories in the news are always about Amazon's issue, not about your specific issue. If you're in an environment where there's a possibility that people will die if what you have built is not available, we're having a radically different conversation.Serena: Exactly. Fortunately for me, I'm personally not working in the, like, kind of, care delivery space, but the stuff I'm working on right now is supporting, you know, that lovely end-of-the-year where it's open enrollment, all the employers are saying, “Hey, time to re-up your benefits.” Yeah, it's kind of a big deal that our site doesn't go down. Because—Corey: Yeah. And open enrollment, to my understanding, changes based upon what plan you're on. I've known companies that have open enrollment in the summertime. I believe ours winds up coinciding pretty closely with the calendar year, but I've certainly worked in environments where that wasn't true. So, being able to say, “Oh, it's fine. It's April; no one's doing open enrollment now.” Is it actually true?Serena: So, it totally depends on which part of your business. If you're going through the healthcare exchanges, that's usually more in the fall. 
I think the Medicare plans, those are a little bit before the individual enrollments. And there's a ton of these things that even though I just work tangentially, that I'm just not even in the know for. And then, of course, we talk about open enrollment, but the thing that a lot of people don't really talk about is, so what happens when your plan goes live on January first of the next year? Yep. Our site's still got to be up. And it's a responsibility I take really seriously because it impacts so many people.Corey: It really does. And it shouldn't, to be clear. I try to avoid getting overly political on this podcast, but the state of healthcare in the United States as of the time of this recording is barbaric. And I really, really, really hope there comes a day where someone's listening to this and laughing because it's such an antiquated and outmoded story that isn't true anymore. But I'm terrified that it won't be.And yeah, having access to a website lets you sign up for healthcare during a limited period of availability; if you miss that window, you don't have healthcare, in many cases, until the following year when open enrollment opens again, or honestly, you wind up changing jobs because that is a qualifying event to change healthcare. “Well, I missed the open enrollment window, so I have to quit and take a job somewhere else,” is a terrifying thing. It's bad for the business for a variety of reasons, but that pales in comparison to the fact that people have to make life-altering career decisions based upon a benefit that is routed through an employer when it should not be. Okay, I'll climb off my soapbox.Serena: Oh, it's bizarre to me. Honestly, for better or worse—I argue worse—but I'm honestly optimistic. One of the weirdest things I saw that stuck out from the most recent stimulus bill was, “Oh, hey. We're having a special enrollment period during a pandemic.” And I'm like, “You know, it's not a hundred percent. Maybe we should just extend it to the whole year.” But it's better than what was the previous state, where it's like I can't make—I mean, even in my work life, I can't make everything perfect. I can't make outages go away, but I can make things just a touch better. And that's all I can do.Corey: Sometimes all we can do, and I wish there were better ways to handle that. I don't know what the future is going to hold, but I also think that there are bright areas. There are aspects that are promising as far as the future being brighter than today. The overall trend—I hope—is for humanity to uplift itself.Serena: Totally.Corey: Again, I do want to highlight that you went in a very strange direction where you went from software engineering—a generally pleasant job—to SRE, which is horrible and would not be recommended to anyone. What guidance would you have for people who are, for some godforsaken reason, trying to figure out what their career trajectory is going to be like, and thinking that they might want to become an SRE—even if they're not in tech yet—because for some reason they hear the stories and think there's some nobility in suffering or whatnot?Serena: Well, for starters, for me, it kind of came down to: get real good with discrete math. It's boring, but that's kind of the bread and butter of the concepts I've learned. Also for junior people, if you're also just curious—say you've written an app, go over to OpenTelemetry. Go, like, instrument your stuff and see how many requests you get in a day. 
Start getting your hands dirty with instrumentation. Look at how cool it is, and then maybe you want to start structuring your logs; maybe you end up doing tracing. But at the end of the day, for me, I think the best learning is just experiential, and you know, one of the things is: how do you learn from production outages? Go to happy hour with some of the senior people and listen to the stories that they tell. With enough time they become funny, but they're also valuable learning things.Corey: The aspect I would push back on is the hard requirement around discrete math. I don't deny that it has been helpful for what you've done and how you do it. I don't know how any of that stuff works on paper; I have an eighth-grade education. That was never my path and never my strong suit. I would agree that knowing it would have made aspects of what I do easier, but the bulk of it I don't necessarily know that I would agree. I guess, my counterpoint slash pushback would be that if you thought you'd like this, but you don't want to deal with the math, it's not a hard requirement, and I don't think that I would frame it as one.Serena: Actually, that is a very good catch. It is not a hard requirement. I am not sitting here in my notebook, scribbling away at equations. But with the concepts that I've learned from a while back, the concepts are way more important than the actual computation itself. Because computers do that, and a computer will absolutely run circles around me.Corey: Most of us do, unless, you know, the computer is an overheating processor from Intel. But that's a little bit of a low blow. Not that it stopped me. But it was a low blow.Serena: Well, I mean, your local science supply shop might have some liquid nitrogen. Maybe.Corey: So, what's next for you? You started off in security slash software engineering, transitioned on over to SRE work. What's the next step? What's the beyond for you?Serena: Ohh, great question. So, I don't really know. I'm enjoying the SRE thing. At some point, I might write a book trying to make all the concepts I have learned from my electrical engineering degree maybe a bit more accessible, be it a series of blog posts, maybe a book. I would love to get a book published. And honestly, just writing more because knowledge should be shared, and if someone learns something from my nonsense experiments on my home lab, then cool; it's all worth it.Corey: I'd agree with that. I'm a big fan of learning in public. One of the, I guess, magical things that I do, for lack of a better term, is that I will stumble my way through learning a new concept that I have no idea what I'm doing, and when I get lost, I call it out because invariably, I'm not the only person who runs into that problem. But for folks who don't have—I don't know if it's the platform, the seniority, the perceived gravitas, the very intentional misdirection where I fooled the entire world into thinking I know what the hell I'm doing, whatever that is, most people have a problem with admitting they don't know something and learning in public, so anytime I can take up that mantle or that burden, I love doing it, just because I don't have any technical credibility to lose from my point of view. I wish that were more accepted and more common. That's why I'm so intentional about being able to talk, on some level, about the things I don't understand or the things that I don't get.Serena: I love that. 
I used to read a bunch of philosophy books, way back when, and my big thing, this great quote—I always get it confused, Plato or Socrates, but it's, “I know that I know nothing,” and I just run with that because I mean, even though fortunately, for me, in my corner of the internet, as a non-binary person, no one's really mean to me when I say, “Okay, I broke my DNS,” because, honestly, I knew DNS conceptually when I was setting up my Minecraft server for friends, but I never really got it until I, well, kind of, broke it, [laugh] and eventually fixed it. But I hope that over time, it becomes more acceptable to say, “I don't know things.” Within my team, I tell anyone that's working with me when they're asking me a question, say, “I don't know, but I have a feeling this rabbit hole, this trail of crumbs might lead us to an answer.” And then it's a fun little adventure.Corey: I miss the days when I could describe what I do as a fun little adventure. It's now, “Oh, dear Lord, it's this bullshit again.” [sigh]. That was my sign that I was burned out; it was time to find other things to do than keeping sites up.Now, I have no on-call responsibilities because there's no real site to keep up. Thank you, serverless, I get to sleep at night again. But there are times I miss aspects of working in the trenches, of being able to dive deep into a problem on a very large scale architecture. The grass is always greener, somehow.Serena: The grass is always greener. In a weird way, I actually, I complain about my on-call weeks, but I actually kind of love them. There's a weird camaraderie about all of us dealing with a shared thing. And on my team, it's really cool because we do this whole thing where, you know, I have these junior people asking, “Oh, am I going to go on call?” And we're like, “Well, unfortunately, you're not quite fully baked yet. Not quite ready. Once you're here longer with us, then yeah, we'll go walk you through a game day and make sure you can do all the things. But being on-call, it should not be a punishment for people.” Honestly, it's just the greatest feedback mechanism that guides me because I say, “Wow, this stinks. This could be better.” And then try to make it better.Corey: If people want to learn more about what you're up to, how you think about these things, or potentially even reach out for advice, where can they find you?Serena: So, I am on Twitter at @Serena—S-E-R-E-N-A—Tiede—T-I-E-D-E. DMs are open; come bug me. I got my lovely blog. It's just blog.serenacodes.com. It's pretty bare-bones, but I'll have some new content up there hopefully pretty soon, once I get around to writing it. And say hi. I like meeting new people and learning new things. Adventures await.Corey: And we will, of course, put a link to that in the [show notes 00:34:30]. Thank you so much for taking the time to speak with me. I really appreciate it, Serena.Serena: Hey, thank you. I am so happy to be here. This was one of my life goals, and now I don't know what to do now that I've gone up here.Corey: That's the problem with achieving these bucket list items. It's, “Oh, well, I wake up the following day. Now, what do I do?” And when life eventually returns to normal, on some level. [laugh]. Thanks so much for your time. I really appreciate it.Serena: Thank you. Have a great day.Corey: Serena Tiede, site reliability engineer at Optum. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment saying that if you think that C is a high-level language, oh, just wait until you explore the beauty and majesty of Rust.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Streaming Audio: a Confluent podcast about Apache Kafka
Minimizing Software Speciation with ksqlDB and Kafka Streams ft. Mitch Seymour

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 5, 2021 31:32 Transcription Available


Building a large, stateful Kafka Streams application that tracks the state of each outgoing email is crucial to marketing automation tools like Mailchimp. Joining us in this episode, Mitch Seymour, staff engineer at Mailchimp, shares how ksqlDB and Kafka Streams handle the company's largest source of streaming data. Almost like a post office, except instead of sending physical parcels, Mailchimp sends billions of emails per day. Monitoring the state of each email provides visibility into the core business function, and it also returns information about the health of both internal and remote message transfer agents (MTAs). Finding a way to track those MTA systems in real time is pivotal to the success of the business.

Mailchimp is an early Apache Kafka® adopter that started using the technology in 2014, before ksqlDB, Kafka Connect, and Kafka Streams came into the picture. The stream processing applications they were building faced many complexities and rough edges. As their use case evolved and scaled over time, a large number of applications deviated from the initial implementation and design, leaving divergent variants that all had to be maintained. To reduce cost and complexity and to standardize their stream processing applications, adopting ksqlDB and Kafka Streams became the solution. This is what Mitch calls "minimizing software speciation": the idea that applications evolve into multiple distinct systems in response to failure-handling strategies, increased load, and the like. Using different scaling strategies and communication protocols creates system silos that are challenging to maintain.

Replacing the existing architecture that supported point-to-point communication, the new Mailchimp architecture uses Kafka as its foundation with scalable custom functions, such as reusable, highly functional user-defined functions (UDFs). The reporting capabilities have also evolved from Kafka Streams' interactive queries into enhanced queries with Elasticsearch.

Turning experiences into books, Mitch is also the author of O'Reilly's Mastering Kafka Streams and ksqlDB and the author and illustrator of Gently Down the Stream: A Gentle Introduction to Apache Kafka.

EPISODE LINKS
The Exciting Frontier of Custom ksql Functions (Mitch Seymour, Mailchimp), Kafka Summit London
Apache Kafka 101: Kafka Streams Course
ksqlDB UDFs and UDAFs Made Easy
Using Apache Kafka as a Scalable, Event-Driven Backbone for Service Architectures
The Haiku Approach to Writing Software
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
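To make the stateful-tracking idea above concrete, here is a minimal, hypothetical Kafka Streams sketch (not Mailchimp's actual code): it keeps the most recent status seen for each email ID in a queryable state store. The topic names, value types, and store name are illustrative assumptions.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.common.utils.Bytes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.KeyValueStore;

public class EmailStateTopology {

    // Builds a topology that tracks the latest status event ("queued",
    // "sent", "bounced", "delivered", ...) observed per email ID.
    public static StreamsBuilder build() {
        StreamsBuilder builder = new StreamsBuilder();

        // Assumed input topic: key = email ID, value = a status event.
        KStream<String, String> events = builder.stream("email-status-events");

        // Keep only the most recent status per key, materialized into a
        // persistent state store that interactive queries (or a downstream
        // sink such as Elasticsearch) can read.
        KTable<String, String> latestStatus = events
                .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
                .reduce(
                        (previous, current) -> current,
                        Materialized.<String, String, KeyValueStore<Bytes, byte[]>>as("email-state-store"));

        // Re-publish state changes for other services to consume.
        latestStatus.toStream().to("email-state-changes");
        return builder;
    }
}
```

A ksqlDB version would express the same reduction as a declarative query over the events topic, which is exactly the kind of consolidation move the episode describes.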

Reversim Podcast
415 Bumpers 75

Reversim Podcast

Play Episode Listen Later Jul 28, 2021


[Link to the mp3 file] Episode 415 of Reversim Platform - Bumpers number 75.
So this is Bumpers: the monthly show - which sometimes turns bi-monthly - where we meet and talk about interesting things we've seen around the internet, related to software development or to other things that interest us: interesting GitHub repos, interesting open-source projects, blogs, sites, or anything else technological that interests us - and hopefully you too. So I'll start...

Ran - I'll tell you about a tool I ran into, called tuplex. A bit like Duplex, only starting with a T...
tuplex is a new framework for processing what's known as "Big Data".
What's special and interesting about it is that, API-wise, it resembles Spark - anyone who has worked with Spark, say in Python, knows its API - but under the hood the implementation actually compiles the Python code to C or C++ code, via LLVM, and runs that.
So that's the interesting part - it has a sort of "dual mode": it can run the code in Python, or it can run the compiled code.
And that, the authors claim, produces very interesting and impressive performance.
I don't know what its parallelization capabilities are - after all, the big, interesting part of Spark is exactly the ability to do everything with very high concurrency - so I don't know what tuplex itself can do there, but in terms of raw performance there are apparently real gains, since everything is compiled to native code, that is to C++, via LLVM.
(Alon) Wait, I have too many questions for the audience... First of all: you write in Python, and in the end you get C++?
(Ran) Yes... I think it also eventually compiles down to machine code, of course, but yes.
(Alon) So the question is whether...
(Dotan) No... it's built in C++ - you write in Python, and it emits LLVM.
(Ran) Ah, okay... fine, LLVM then.
(Alon) So LLVM, okay - now the next question is: if it works, why not write all Python this way? Regardless of parallelism, regardless of anything... if what I get out is LLVM? Like, what did they build here - essentially a compiler from Python to LLVM?
(Ran) Right... I think the question is fair, and the answer is... I don't know; it may be that the gap really is very small, but it may be that they focused not on everything Python can do, but on the specific things... on the API between Python and Spark. Good question though... got more?
(Alon) Yes!
(Dotan) That's what I'm seeing - I'm strolling through their code, and I see they did exactly that...
(Alon) Because if it's a compiler, then it's actually a generic solution - if it worked, that is. I have too many... I'm wary of this project, because it's the kind of thing that works for you in Python, and then you compile and nothing works, you have no idea what's going on, you get LLVM because there's a bug in the compiler - so I'm wary... And if the compiler does work - then we've basically solved Python's performance problems?! Like - something here doesn't add up...
(Dotan) There's... I don't think it will ever be that - if they had recompiled Python generically, for anything you can write in Python, I think that would have been an announcement of a different kind...
(Ran) Yes... I think it's much, much narrower - they presumably don't handle threading or multi-process or many other Python features. They focused specifically on the features of the API Python exposes over Spark. So it's a very narrow angle - but maybe from here, taking it further into generic Python would be easier... but they didn't do that.
(Dotan) I can guess that, say - if you write Java, then the JIT, over time, will turn your code into the best code in the world... here they said: "well, you write in Python, there's no JIT, it's not 'JI-thon' - let's do something similar"... I didn't read in depth, but I think that's what they wanted to do.
(Alon) Fine... interesting. Scary yet interesting...
(Ran) We'll file it under "scary yet interesting" [a pilot for a new segment?].
Personally, I haven't tried it, I have to say... I saw the project, read about it, it seemed interesting - so I wanted to bring it here, but I haven't used it yet. I also don't see myself using it soon, I have to say... in any case - I thought it would be interesting.

The next topic - I ran into a site called Gently Down The Stream.
I assume most of us remember the song in English from third grade - Row Row Row Your Boat, Gently Down The Stream [Rechov Sumsum had "let's sail a boat over valley and hill"...] - so there's a site called Gently Down The Stream and it's very, very nice; I recommend visiting it.
What it does, essentially, is teach us how Kafka works - how the messaging system called Kafka operates - but it does it in a very, very visual and charming way, with lots of inviting illustrations: you see fish and birds and various animals there by the stream, talking among themselves, and along the way you also absorb explanations of how Kafka works - so that's nice.
(Alon) It's a children's story that is Kafka... like, it's really cool - it's basically a bedtime book for kids, and if you read it to them, they'll know Kafka by the end...
(Ran) Totally... so if you wanted to teach your kid a little programming - you started with Logo? You started with BASIC? You were wrong! You should start with Kafka... from there they'll come out distinguished developers.
So it's really nice - truly pleasant visually; it's fun to wander around there and read this story...
(Dotan) Cool...
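[Editor's note: for anyone who wants to poke at the basics the book illustrates, here is a minimal, hypothetical Kafka Streams "hello world" in Java. The topic names, application ID, and broker address are illustrative assumptions, not anything mentioned in the episode:]

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class GentlyDownTheStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "gently-down-the-stream"); // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");      // assumed broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        // Read every record from the "river" topic, shout it, and float it on to "lake".
        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> river = builder.stream("river");
        river.mapValues(v -> v.toUpperCase()).to("lake");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```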
(Ran) And we move on to the next item - one I think everyone has already heard about: I wanted to say a few words about GitHub's Copilot, so...
(Alon) What's that?!
(Ran) So yes - for Alon, who hasn't heard about it yet - what is Copilot? Copilot is a capability released by GitHub - actually together with Microsoft Research - that helps you, as a developer, write code.
It's essentially "autocomplete" - but not just any autocomplete; autocomplete on steroids: Copilot goes and "learns" code bases that exist out there in the world, open source, and if you want to, say... the classic example is writing a function in Java that opens and reads a file - right, nobody remembers how to do that, because there are some three or four classes you need to instantiate and handle properly? So it goes and "writes" it for you, per the best practices it found "in the wild".
Now, we've seen examples in Java, in Bash, in Python and in other languages - that is, it can do this fairly generically.
And it's very interesting - lots and lots of questions come up here... First of all - I think it's a very fine and impressive scientific achievement.
Maybe worth saying that it's still in a closed beta - I requested access and haven't received it yet; I don't know, maybe one of you did and can tell us how it actually works, but I'm not in yet.
In any case, a few things here I find interesting. The perennial question comes up: "okay, so that's it - time to start looking for another profession? Someone will do the work for me?" I think the answer to that right now is pretty trivial - no. It's maybe a helper tool - and maybe a good one; there's a question mark over how good it is or isn't - but in any case it's a helper tool, and I think the answer is "no".
Another question is about the legitimacy of using the tool at all - or, actually, the legitimacy of Microsoft's or GitHub's use of the open source they trained on - since the claim came up that they actually used open source they weren't allowed to use... projects under what's called a "viral license" [for a refresher - 317 Zusammen with Zohar Sacks] - and if you now go and use their suggestions, your product may actually have to be open source too... In short - some non-trivial legal questions came up.
And maybe on the, let's call it, engineering or scientific side, the most interesting part is that the Copilot project is actually built on top of a language model called GPT-3 from OpenAI, which I believe we've already mentioned here on the podcast [indeed, we'd say here - 397 Bumpers 69].
It's a fairly generic language model developed at OpenAI. By the way - Microsoft is one of OpenAI's big backers, and therefore they also have access to it...
So not everyone has access to GPT-3, but Microsoft does. You can request access - I don't know by what criteria they grant it - but you can ask.
In any case - the interesting part is that Microsoft published that, in total, one data scientist worked on this project - but many engineers... What are they trying to say? They want to say that the GPT-3 engine is so generic that very little "scientific" work is needed to adapt it to, in this case, source code.
GPT-3 is a fairly generic language model - I assume you've seen examples out in the world of it writing prose, limericks and such; it's a very, very generic language model - and going and adapting it to source code is relatively little work.
There is "engineering" work here, of going and engineering the data pipeline, collecting all the data, etc. - but scientifically, only one data scientist worked on this project, per their publications.
Which is to say the engine is so generic that it's very easy to adapt it to different domains.
(Dotan) Or...
(Ran) And for the second theory...
(Dotan) I'll give the second half of this story - which is that there was a whole fiasco that set the internet on fire: people realized that Copilot was completing dangerous things for them, like secrets and all sorts of other people's API keys... among other things there was also copyrighted code, and actually... the other theory is that they simply did it "blind", ran the engine blindly and then...
(Alon) And scanned private repos... people got horrifying auto-completions...
(Ran) Yes - so I think this story brought out the best and the worst in humanity, in many senses - and it may very well be that much more significant preparation and cleaning work should have been done here than what they did. I think many can agree on that.
In any case - I think that, technologically and scientifically, there's a very, very interesting leap here [new trailer!].
By the way - it's not surprising they didn't give many people access... apparently what you mentioned, at least, is part of the reason they didn't... it may be they decided to stop it at some point.
But in any case - yes, it's a tool that I think stirred up a great many echoes and a great deal of interest, and gave some sense of direction about the capabilities of a language model in general, and maybe also - I'm a bit skeptical on this - "the future of our profession".
One caveat - even if this were open and working for everyone, I think it would be... let's put it carefully - "irresponsible" of us, as developers, to go and simply accept its suggestions as-is.
With all that, I have to say that the moment you get such a suggestion and it looks fine - it's very easy to be convinced that it is fine; that is, there's a danger here, to my taste, to the quality of the code developers will ultimately ship.
[And heaven help how many security folks just skipped a heartbeat at the phrase "very easy to be convinced that it is fine"...]
There's a difference between coming to understand the problem in depth and solving it yourself, and getting some suggestion and saying "okay, looks good to me, moving on".
(Alon) There are a few things here... it will be interesting when people start gaming it with deliberate vulnerabilities - throwing code with vulnerabilities into repos, on purpose, so it completes for others and thus creates vulnerabilities elsewhere; that's always the evil hat...
I think its good use will ultimately be saving you time reading APIs and learning documentation - because when you look at that example, I say I want to fetch something from a database, so you go "wait, how do I connect to the database, how do I..." - and it completes it for you, and spares you that whole bit of "wait, what does the documentation look like?", or trying to poke at autocomplete to see what shows up on the object...
So I think in that respect it's very strong - you simply say "this is the database, here's the template, I don't need to go to some Stack Overflow to grab the template and then start adapting it to me". That saves, in my view, a lot of precious time in that respect.
(Dotan) On the other hand - there are databases that lose data [two shout-outs to Zohar in the same item?], so what does it matter how you query them?
(Alon) Yes...
You know - I did a Select From Mongo, and got back Return NULL...
(Ran) I'm surprised we really don't have a single item on Mongo, but let's see - maybe later [by April 2022, at the latest].
(Ran) Fine, so that was Copilot - exhausted... I think it will probably continue to be a controversial topic... I have a feeling we'll hear about it again in the future, but let's move on to the next topic.

The next topic is an online book I ran into, called the Machine Learning Interviews Book.
This book was published by a data scientist named Chip Huyen - the name is Vietnamese so I'm not sure I'm pronouncing it properly [at least in text it reads exactly the same].
In any case - it's an online book, published and open to everyone to read - and although its title is Machine Learning Interviews Book, it's actually, I think, a book about job interviews in general. Very interesting and eye-opening.
Broadly, it's split into two parts - the first part is a generic piece on what job interviews are, how companies look for employees, what the job market looks like, etc. - which I think is a very interesting piece.
The second part is actually examples of interview questions in the field of machine learning - that's already something really very specific.
So I recommend reading the first part, or at least the parts of it that interest you - learn a bit about the job market - at least the American job market, the interview market. And of course there's a lot of similarity to things happening here in Israel - everything from how to negotiate, how to prepare for a job interview, what interviewers usually look at, what an interview pipeline looks like, etc.
I think the book itself, overall, is quite accessible and nicely written - so I recommend reading it.

And to the next topic - a tiny item: I recently bought a new monitor for home, a new computer screen - and I use... there are speakers in it, in this screen... so I actually hear my audio out of the screen - and that's more convenient for me than hearing it from the Mac, whose lid is usually closed anyway, so it sounds worse.
The thing is that to control the sound, to control the volume - the Mac doesn't do it... as-is, the Mac isn't able to control the volume of the external monitor, a Dell in this case. So if you use the volume up and down keys on the keyboard, they don't work...
[insert generic "white Apple adapter - only $99!" joke here...]
So I found a tool called MonitorControl, and basically once you install it - it gives you control over external monitors too; for me it controls the volume and the brightness of the screen.
So that's nice... you can do it from your keyboard, or of course with the mouse too - and you don't have to go to the monitor's buttons and do it there, where it's a bit more cumbersome.
So there - if you have a Mac and you want to control your monitor, and your monitor currently doesn't allow it - you're welcome to try MonitorControl; it may work for you...
I did see that it isn't always compatible - that is, it isn't compatible with every monitor, and it may also not be compatible with every version of the operating system... so try it carefully.
It is something relatively low-level and thus a bit fragile - it's enough for some protocol to change slightly and it can break - but at least for me it works, so I'm happy with it.
(Alon) First of all - congratulations on the screen... run and install it...
(Dotan) Buy Dell monitors...
(Alon) Yes... sponsored by... "this episode is brought to you by Dell!"
(Ran) Alt + Ctrl + Dell... and over to you, Dotan...

Dotan - OK, truth is, let's continue with our dangerous theme - for whoever heard this week about the story with NSO... I'm a bit afraid to talk about it because I want us to keep making a podcast, so we'll try to be, you know, gentle...
(Alon) There's someone behind you, Dotan - I can see it on the camera...
(Dotan) So we'll just take it from a completely different perspective - we won't talk about what happened, so we stay with the good guys...
(Alon) Say what happened, per foreign reports, for whoever isn't up to date...
(Dotan) Don't want to, I'm scared...
In short - there's a framework here that someone built - it's called the Mobile Verification Toolkit, and its purpose is: you take a phone and check whether it carries some spyware of a company we really love, or, I don't know...
...we don't want to say its name [Voldemort?].
Practically, from an engineering standpoint, it's just a bombastic name for "here's a pile of scripts that search for strings inside your device"...
The guy who built this toolkit - I think he works at Amnesty, which was part of all the publications this week, or he's some human rights activist - and he actually maintains a list of investigations, or rather of Amnesty investigations - which interested me.
I went into the repo - and all the information here is public: you go into the repo, and it's organized by dates... he mostly maintains lists of domains associated with all sorts of companies that, he claims, do surveillance.
There are all sorts of cases here - say in 2018, that same company [Who must not be named], and after that something that happened in Egypt, and after that in Morocco, and all sorts of incidents...
In the latest incident, there are all sorts of domains and emails here that that software uses - and essentially, if you find these things on your device, you can say that you're being spied on, or that you're infected with spyware.
And the way it looks, you actually need to combine the two tools - and he maintains these two things separately...
I have to say I dug around a bit in the list of domains and emails - and the list is crazy... there are something like 1,400 different domains that look completely, let's call it, "innocent".
Just for example - apigraphs.net or blogreseller.net and all sorts of things like that...
And that's it - so whoever is curious, whoever suspects, whoever is a journalist... can take these tools and scan their device.
And it's interesting to see this thing from the angle of that human rights organization, how they see it.
It may also all be nonsense; it's all "allegedly"...
(Alon) Look, allegedly they wrote here that it was... that it was developed following the "Pegasus Project", I don't know what that is [Try to Google it...], there's some three-letter company here, don't know...
(Dotan) I also don't know what that is...
(Alon) Starts with "N" and ends with "O", but don't know...
(Dotan) Don't know, never heard of it.
(Alon) ...it's all per the reports here.
So there... look - it's interesting, no doubt.
(Dotan) Yes... again - we'll stress that it's all "allegedly".

And we'll move on to another topic... truth is we'll talk a bit about Rust: there's a computer scientist here who did a PhD - and in his PhD he proved what Rust claims: Rust is a language that is safe.
Of course, "safe" can be in all sorts of senses - in this paper specifically he proved that it's safe in the sense that it blocks or prevents a whole class of developer mistakes.
I'll remind you a bit which developer mistakes chase us all our lives...
(Alon) Wait - what's new in that? Because actually...
(Dotan) Nothing new - he simply proved it "scientifically", mathematically.
(Alon) I was sure this had already been proven scientifically... it hadn't been proven scientifically until now? Because there was that claim that it was proven, that Rust is a safe language...
(Dotan) I don't think anyone did academic research on it and proved it from that angle.
(Alon) I was sure they already had...
(Dotan) Well, that's what he did - which is cool, it shows that...
(Ran) So which classes - memory?...
(Dotan) The usual things... I think one of the most painful, and most controversial, is all the null safety... and I say "controversial" to this day, because there are languages that say "we have no NULL in the language, there's something else" - and they're wrong too... they also lead to exactly the same class of mistakes...
In Rust it really is safe - and it works well.
He also got a sort of "commendation" for this paper, and all sorts of other good things - really nice for whoever wants Assurance with a capital A, or to send it to someone they want to convince so we can start working in Rust at the company...

So now - a crazy item: are you ready?
(Ran) Yes! Ready, sitting down...
(Alon) Wait! If I may, before this item, since I know it a little - for whoever is driving: pull over! Listen to Dotan, you won't regret it... 
Go on.
(Dotan) OK - so this item comes from Discourse, and I have to say this is the second time I've seen them step out of the screen and give me a slap, which is quite amazing - the post is essentially about how to produce faster uploads, faster file uploads, to Discourse.
And when you think about it and read the title, it says Rust and WebAssembly etc.... so you say "fine, they implemented something on the server", or, I don't know... "it's Rust, it has no VM, it has crazy payoffs... they probably replaced some upload server, like Google did with Go".
But what they did is build a WebAssembly module that sits in the browser, and when a simple person like us wants to upload an image file - this thing, client-side, in the browser, actually optimizes the image.
Basically, if I now feel like uploading a .png - for whoever knows, there are all sorts of PNG crushers and utilities that optimize .png files - then my .png, obviously, is not optimized; I upload it to Discourse, get crunching of my whole file, locally - using my CPU, costing them nothing - and then instead of some 2MB I actually need to upload 100KB...
And that's how they sped up all the uploads on Discourse.
I have to say this is one of the crazy plot twists I didn't think of - and it's amazing, that's it.
You can go back to driving, for whoever...
(Ran) You may start the engine...
(Alon) Whoever didn't faint on us, from what they did...
(Ran) So you have some warm spot in your heart for Rust, that's how I feel... if I read between the lines...
(Dotan) Certainly - first of all: Rust and WebAssembly are really close...
(Alon) Buddies...
(Dotan) Right... if you want to do WebAssembly and reap its benefits, then Rust is the place.
Of course the previous generations of this were scripts and all those shims, which of course had to happen for the world to advance - but Rust gives you all the... it ticks all the boxes for building WebAssembly that works efficiently and well.
And also - all the tooling is there: if you want to do it, you simply... you have lots of tools that "hug you" and let you produce, in the end, WebAssembly.
And that's it - so after this thing, I said "let's check what the state of WebAssembly is", and also to give a few ideas to whoever wants to try - so the next post talks about how to run Rust in Electron.
A reminder: Electron is the platform on which loads of... what shall we call them? "applications of the new kind" run - all sorts of text editors and so on.
(Alon) In short - Chrome... or "HTML renderers", let's put it that way...
(Dotan) Yes... in short, if you want to build an application and feel like using HTML or JavaScript or whatever you want, then you use Electron. Just reminding - it's pretty clear to everyone by now.
And if you want to run Rust there for some reason - if you're really building a client-side app and don't want to build it the traditional way, you want performance, and the good things users expect - then you can run Rust, pretty much by the same method, with WebAssembly.
Say - think of an image editor, or an audio editor.
Another thing along the same line that I thought could be cool is how to run Wasm - which is WebAssembly - easily on a Raspberry Pi.
So whoever prefers playing with hardware - it's a post that approaches this differently.

Next item - truth is I only included it so I'd have something to hang a joke on: so there's Windows 11 implemented in React... [the emojis are in the original, going with it...]
For whoever knows the story of Windows 11 - it's a story in installments, and, let's call it, "the prevailing concept" today is that it's a sort of facelift, and the UI changed...
So someone simply implemented the UI - built it in React, like a mock-up, that behaves and works the same.
Seems to me this is the only way there won't be a blue screen in this thing...
No?
(Ran) They turned it green, didn't they?
[What?! Our Microsoft?! No - they painted it black; surely because there hasn't been a Rolling Stones reference since 1995...]
[And this after "Windows 10 is the last version of Windows"...]
(Dotan) Yes, but it runs on Chrome, so all is well...
(Ran) It guarantees CPU usage, that's for sure.
(Dotan) Totally...
(Alon) Can it run Chrome, inside?
(Dotan) I tried, truth be told - there's Edge there, like - there are icons of Edge there and all sorts of things like that... I tried at least clicking, it doesn't respond... who knows, maybe it really is the real Windows 11 - that wouldn't respond either...
(Alon) You know, there's that talk, by that guy who does the one on JavaScript in 20-30, I don't remember...
(Dotan) Wat, no?
(Alon) Yes, the Wat guy - but there's the legendary talk on JavaScript where at the end, they run a... a browser that runs a browser that runs a browser... recursively.
So I thought - maybe we've advanced to there... but not yet.
(Dotan) Not yet... I don't think so.

The next item - truth is I simply like recommender systems... and this thing surprised me a bit: it looked like yet another recommender system written in Go, but it looks like one whole playground that combines advanced techniques [it's called gorse].
There are all sorts of other advanced things inside - I don't know how much of a challenge this topic is in the field today, but for whoever is interested it can be nice to dig through the code there.

The next item - we mentioned it here a bit: once, I think, we posted some AI project that clones voices - does voice cloning... [yep - Real-Time-Voice-Cloning, in 381 Bumpers 63]
So here's another one like it, and this one I tried - and it worked really nicely.
[It's the same one from back then - Real-Time-Voice-Cloning - and in that mention too the assumption was that the main use would be trolling people at work... Who would've thought...]
So basically, within five seconds, you can clone someone's voice and simply generate a conversation... so I don't know about you, I'm recording you...
(Ran) Right, so I assume Alon's next remark isn't really Alon's...
(Alon) Correct, Ran recorded me, and is now generating a bot that says what I do... it's pretty insane; today you came with tools of a certain company, and now this tool... pretty psycho, this thing.
(Dotan) Totally... look - try it at home, it can be funny. And maybe at work too - that can be even funnier...
(Alon) How complicated is it to work with? Well, let's see...
(Dotan) It's cute...
(Ran) OK - Alon... your items.

Alon - There's some repo on GitHub, which is about... like all the Awesome lists out there? So: Awesome Engineering Management.
It basically has links, like all the Awesome lists - what each thing is: Agile and Extreme Programming and Rapid Prototyping and Waterfall.
Basically, every buzzword related to software management is here, from managing processes to tools, learning... in short, like all the Awesome lists - they made an Awesome of Engineering Management...
(Dotan) I have a problem, though, with the word "Awesome"...
(Alon) Is it a reserved word already?
(Dotan) Is it really Awesome?
(Alon) Ah, is it really "Awesome"?... I don't know, but it's like all the Awesomes.
(Dotan) Awesome Waterfall?!...
(Alon) Truth is I didn't check what Awesome Waterfall is...
(Dotan) It's here...
(Alon) Yes... just so you know, if it's Waterfall anyway, let it at least be in an Awesome, I say... like, what's wrong with that?
And by the way - if you have a project of, say, three hours - I recommend doing it in Waterfall, no problem at all. It can be a pretty good practice - splitting it into two-week sprints, say, would work less well... so sometimes it fits.
(Ran) So there are, Alon, some fifty different topics here - we mentioned Waterfall, there's also Agile, there's also Project Charter and Project Management Plan, etc.... there are loads and loads of sub-sections here, and each of them has tools or explanations on how to do them - but it's loads of material...
Did you read something here? Did you find something really useful and interesting here?
(Alon) I have to say I haven't found anything interesting yet - but there's a lot of material here, if someone is looking for something interesting to dive into... so there's a whole list here, on loads of topics - and it may be possible to find something nice here.
Look - these are pretty basic things, right?... but there are also tools here for all sorts of things people look for, links and such... so whoever is missing something - can search here.
(Ran) OK...
(Alon) Seems like a good place to start, honestly... there are lots of topics here and lots of tools; sometimes you look for tools and you... seems to me it's a not-bad place to start.
(Ran) OK... 
Next?
(Alon) Next! You talked about Rust earlier, that they even proved it's a safe language... So now - we always talk about how cool it is to write in Rust, but by the time you finish compiling, you could already rewrite it in Go...
So now there's a post here explaining that Compiling Rust is NP-hard - that it's an NP-hard problem... so it is indeed hard to compile the language - and that's why it takes time...
(Dotan) Look, I was taught that in life, what's cheap costs you dearly...
(Alon) I was taught "only bread is hard, and NP-hard - well, we eat that too".
So there's a post here, which truthfully isn't long, it's quite short... not only is it not long, it's actually quite short - on why compiling Rust is NP-hard: on the rules, on the safety, on the booleans, everything that needs to be done - and a proof that it's NP-hard...
So maybe when we have quantum computers, or something like that, or someone else proves that P = NP, we'll be able to compile Rust fast... but until then, we have a problem.
(Ran) It may be that P = NP, in the case that N=1 or that P=0...
(Alon) Yes, but there's the general case, which hasn't been proven yet.
(Ran) In general I won't commit... but in those two cases, yes.
(Alon) Yes, with zero too - zero is best...
(Dotan) What's disappointing is that the comments didn't catch on... there are comments, people start telling him he's not right and such, but it's not really...
(Alon) He ignores them... that's fine, a good dictator - that's how a dictator should operate. I can't figure out what the problem is...
(Ran) But come on, between us - there are plenty of NP-hard problems people work with day to day; it doesn't disqualify the problem from being solvable.
Maybe you can't solve it, in principle, in efficient time when there's a great deal of data, but practically - day to day, we all engage with and solve problems that are also defined as NP-hard.
Either we do it in a, perhaps, inefficient way, but still solve them - or we find all sorts of heuristics and solve them approximately - but we do it all day.
(Dotan) And on a serious note - he built a sort of extreme case here, a code base of sorts that sends the compiler into loops. It exists, since Rust also has macros, and proc-macros, which are an evolution, or a "more enlightened version", of macros; no doubt it can be done.
I can say that really, if someone seriously wants to see compilation times, then ever since the blockchain era people have been working in Rust from day zero, and there are enormous projects, fully open source, where you can see how long they take to compile - and they're built in Rust.
(Alon) Fine... we didn't say that something being hard... so fine - we cope.
What's dear - costs more.
[As the well-known philosopher A. Fishof put it - cheap things cost less]
In short - come on, let's move on.

So the next thing I wanted is a framework called Fluvio - it's a "programmable platform for data in motion".
In short, it's a real-time data streaming framework - yes, yet another one - written in Rust.
It's in beta, or even alpha - really in diapers. It's open source.
But it's... first of all, it's been a while since we ran into something modern that isn't Apache-something, I think, old and annoying like that...
As I recall, they're also all ultimately written in Java, or on some JVM - and this is written in Rust, so that's a refreshing change.
A super-young project; I don't know if anything will come out of it - but I liked that a new era was opened with it... I don't know if the project is worth anything.
(Ran) So it's something in the style of, say, Kafka Streams, or something like that? Doing computations over streams of data?
(Alon) Yes - that's what I understand from this project, that it's "A New Kafka"...
(Ran) OK, interesting...
(Alon) Don't write in Production! Even they themselves write that it's alpha or beta or something like that.
But as a concept it's cool, and for side projects and such it can be great, especially if you live in the Rust ecosystem - for something with small systems, instead of starting to mess with some Kafka.
But - use it wisely... I wouldn't move my whole Kafka there right now, let's put it that way.
And it says alpha! Fine, we like alpha... 
Alpha is good.
OK, the next thing I wanted - Dropbox published their Dropbox Engineering Career Framework.
It's basically a very, very detailed write-up of all the levels of developers, SREs, Security Engineers... in short - everything related to R&D in some form.
All their levels - from IC1 through IC7, say for Software Engineer - basically detailing what's required, in really precise detail, at each level: what they need to do, what their Impact is, what their Ownership is, Decision Making, Direction, Talent, and Culture and Craft...
It's detailed to a crazy degree...
(Dotan) Crazy...
(Alon) Yes, really crazy.
Now - first of all, it exposes a bit of how... what goes on inside and what they expect of people, so if someone is heading to big companies, you can understand the general direction.
For whoever wants to build a promotion framework, you can take pieces from here and build something - because it's really detailed.
Obviously - it's enterprise-level, but I think you can derive a lot of nice things from it.
And also - whoever comes to work at the company can understand roughly where their level is, whether they're IC4 or IC5 or IC6...
(Ran) I think it's worth saying, for whoever has never worked at such a big company - what these levels mean...
So (1) it's something common at big companies, so naturally you'll see it at Facebook and at Microsoft and at Google and at very many other big companies.
By the way - I've seen internal ones of Google's, and it looks quite similar; that is, in terms of volume - it really is long and exhausting and very, very detailed - but justifiably, because there are very many things here that need clarifying.
Now, the meaning of these "levels" - let's call them that; every company calls it something slightly different, but we'll call them "levels" just for simplicity - is that (1) they state what you're expected to do, and (2) they determine your compensation... generally the pay will match your abilities.
And here there's some formal way to say how much you should be paid, roughly.
Now, true - there's play in it, there are ranges... within each level too - but most likely most IC2s will have higher compensation than most IC1s, okay?
So in general, when you join such a big company, you need to understand which slot you're entering... and afterwards, over the course of your life [at the company], you can of course advance upward and receive a higher slot, a higher level.
(Alon) Yes - and each such level has, as you said, a "price range" - from X to Y - so yes, it may be that someone at level 2 earns more than someone at level 3, because the minimum of 3 is smaller than the maximum of 2 - but the average is much higher.
I don't know about much higher, it depends on the level - but it's higher.
(Ran) Yes, and by the way - this connects a bit to one of the first items I sent, about the Interview Book I referenced.
There, for example, (a) there's a breakdown of levels - say, a comparison between levels, in the data scientist track in that case, but between levels... at Facebook and at Microsoft and at Google. Each of them has different names for these things, mind you... you won't find overlap - there's no standard in the industry... but there is some attempt there to map between the names; for our purposes, the "Facebookian", "Microsoftian" and "Googlian" names of the various levels.
And (2) - they show, very clearly, that compensation changes not only in total - the composition of the compensation also changes over time.
For our purposes, if you start relatively young inside a company - most of your compensation is salary, okay? Salary and a few options.
It of course depends on the kind of company, but generally that's how it is.
And as time passes, and as you become more and more senior, that balance shifts, and it reaches a state where a very significant part of your compensation is options or stock - and sometimes it's even bigger than the salary itself, okay?
And that's something you see mostly at big companies, but it can... that is, the consistent data comes mostly from the big companies; it can of course happen anywhere, but consistent data is easier to collect from big companies.
And there it's very, very clear that the composition of compensation changes as time passes, becoming more and more "stock-based" and less salary.
That's based on data we have from the United States... I'd guess that in Israel it's something similar, but it's hard... 
That is - we don't have exact numbers for Israel, at least I don't.
(Alon) Thanks for the clarification...
(Alon) OK, seems to me we've exhausted it... it's about Dropbox, but it probably represents every big corporate - Google, Facebook, Microsoft, etc.
(Dotan) I just added you, like, a response item to this - it reminded me, this week... I have a friend named Adir, a very talented guy, who published in Hebrew a sort-of-video-thing, about hiring in high-tech - something funny.
(Ran) Great... I saw it today, I really forgot to include it...
(Alon) But that belongs in the funnies... I was serious!
(Dotan) Look... I had to, after all the ladders and that, I just had to for a moment...
(Alon) Hold on... I think you didn't understand - the funnies come after this, and if you thought I was joking until now, I'm terribly offended...
(Dotan) Ah, got it, it wasn't, like...
(Alon) No, it wasn't a joke, it was serious... I'll now sit aside, turn off the microphone and work in the dark.
(Ran) I also put it in the funnies, enough with the...
(Dotan) Snakes and ladders...
(Alon) I ask the editor to move this bit to the funnies!
[Is there an editor? In any case - let's assume I passed the message along]
(Alon) The next thing - a video, about something that I may be the only person in the world who didn't know - and that makes sense, although I sent it to someone and he didn't know it either, so at least two of us... It's called Connected Sheets.
And it's basically - for whoever has BigQuery in the organization, and then you want to start issuing queries and start interrogating the data and producing reports, and then you want some engineer or some BI person to sit and produce reports... so it turns out there's a really simple way to connect BigQuery to a Spreadsheet... in a pretty insane way that kind of blows your mind - and you don't need to program anything: it simply moves the data over; you choose the dataset, it runs the query - and you do all the slicing and everything in the Spreadsheet.
It means that anyone who knows how to work with Excel or Spreadsheets suddenly knows how to do all their queries on BigQuery - and no engineer is needed... 
So Signal is looking for Infinidash developers . . . it gained traction; anyone with Infinidash experience is welcome.
I'll just note, on that same point, that I already know of a famous Hebrew podcast that once did an April 1st bit, after which people thought all sorts of things had happened in the industry that hadn't, and then some of the podcast's speakers got hate mail . . .
(Ran) Just wait for this coming April . . . [Werner will yet move to Israel, you'll see]
And today we have a lovely topic called Correlation and Causation: who isn't familiar with the tension between Correlation and Causation? So the first item - yours, Alon:
(Alon) Yes! . . . Someone set out to explain Correlation and Causation here, so they took all the planets in our solar system and checked how many died . . . how many people died on each planet.
On all the other planets the number is zero - and on Earth it's roughly 120 billion people to date . . . [nothing suspicious so far]
And then [and then!] they cross-referenced it with: on which planets is JavaScript used? And it turns out - only on Earth . . . [now I got suspicious!]
And therefore - JavaScript apparently caused the deaths of 120 billion people . . .
(Ran) Yes, the correlation here is conclusive, and therefore so is the "causation", if that's a word in Hebrew . . . [causality]
Yes - you heard it here first: JavaScript is responsible for the deaths of some 120 billion citizens of planet Earth . . . convincing, definitely.
[And for anyone not convinced - the text has pictures too, so it must be true:]
So let's move to something a bit more "buttoned-up" - xkcd, which shows two figures; I'll transcribe it for you, and you can of course also see the drawing on our blog [blog!] [an exclusive perk, only for those who read podcasts . . .]:
Two people are talking.
One of them says: "I used to think that correlation implied causation" . . .
"But then I took a statistics class - and now I no longer think so."
So his friend answers - "Sounds like the class helped . . ."
And he says, "Well, not sure."
Ba-dum-tss!
(Alon) Hit the drum-roll effect . . .
(Ran) Totally . . . That's it - and the last item in the Funnies is the one Dotan hinted at earlier: your friend Adir, who released a nice, humorous video about recruiters at a typical high-tech company.
I recommend going and watching it; funny, definitely.
I saw it . . . by the way, he's on Facebook and on LinkedIn too . . . here I specifically put the LinkedIn link, but you can find it on all the major platforms . . .
(Alon) You see - that's the timing! This is when you put in the Funnies . . . after Ran says "Funnies!" and there's a segment of funny bits - that's when you drop it in.
We'll rehearse before next time . . .
(Ran) That wasn't Alon's voice - that was Dotan impersonating him . . .
Alright, friends - thank you very much, you've been lovely, see you next time . . . Happy listening, and many thanks to Ofer Furer for the transcription!

Streaming Audio: a Confluent podcast about Apache Kafka
Consistent, Complete Distributed Stream Processing ft. Guozhang Wang

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jul 22, 2021 29:00 Transcription Available


Stream processing has become an important part of the big data landscape as a new programming paradigm to implement real-time data-driven applications. One of the biggest challenges for streaming systems is to provide correctness guarantees for data processing in a distributed environment. Guozhang Wang (Distributed Systems Engineer, Confluent) contributed to a leadership paper, along with other leaders in the Apache Kafka® community, on consistency and completeness in stream processing in Apache Kafka, in order to shed light on what a reimagined, modern infrastructure looks like. In his white paper titled Consistency and Completeness: Rethinking Distributed Stream Processing in Apache Kafka, Guozhang covers the following topics:
- Streaming correctness challenges
- Stream processing with Kafka
- Exactly-once in Kafka Streams
For context, accurate, real-time data stream processing is a better fit for modern organizations composed of vertically separated engineering teams. In the past, stream processing was treated as an auxiliary system to conventional batch-processing-oriented systems, and it often suffered from issues around consistency and completeness. Modern streaming engines such as ksqlDB and Kafka Streams, by contrast, are designed to be authoritative, serving as the source of truth rather than an approximation, because they provide strong correctness guarantees (a minimal exactly-once configuration sketch follows the episode links below). These guarantees fall under two major umbrellas:
- Consistency: ensuring unique and extant records
- Completeness: ensuring the correct order of records, also referred to as exactly-once semantics
Guozhang also answers the question of why he wrote this academic paper: he believes in the importance of knowledge sharing across the community and of bringing industry experience back to academia (the paper is also published in SIGMOD 2021, one of the most important conference proceedings in the data management research area). This will help foster the next generation of industry innovation and push the data streaming and data management industry one step forward. In Guozhang's own words, "Academic papers provide you this proof of concept design, which gets groomed into a big system."
EPISODE LINKS
- White Paper: Rethinking Distributed Stream Processing in Apache Kafka
- Blog: Rethinking Distributed Stream Processing in Apache Kafka
- Enabling Exactly-Once in Kafka Streams
- Why Kafka Streams Does Not Use Watermarks ft. Matthias Sax
- Streams and Tables: Two Sides of the Same Coin
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get $60 of free Confluent Cloud usage (details)
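To make the exactly-once discussion concrete: in Kafka Streams, the guarantee is enabled through a single configuration, processing.guarantee. Below is a minimal, self-contained Java sketch of a small counting topology with exactly-once turned on; it is an illustration, not code from the episode, and the application ID, broker address, and topic names are invented for the example.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class ExactlyOnceCounts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-counts-app");   // hypothetical app ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        // The key setting: exactly-once semantics. EXACTLY_ONCE_V2 is the
        // recommended mode on recent Kafka clusters; older setups used EXACTLY_ONCE.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Count events per key; with exactly-once enabled, the state-store update
        // and the write to the output topic are committed atomically.
        KStream<String, String> orders = builder.stream("orders");            // hypothetical input topic
        orders.groupByKey()
              .count()
              .toStream()
              .to("order-counts", Produced.with(Serdes.String(), Serdes.Long())); // hypothetical output topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The point of the single config line is that Streams then wraps consuming, state updates, and producing in Kafka transactions, so each input record affects the output exactly once even across failures and restarts.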

Streaming Audio: a Confluent podcast about Apache Kafka
Data-Driven Digitalization with Apache Kafka in the Food Industry at BAADER

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Jun 29, 2021 27:53 Transcription Available


Coming out of university, Patrick Neff (Data Scientist, BAADER) was used to "perfect" example datasets. He soon realized, however, that in the real world, data is often either unavailable or unstructured. This compelled him to learn more about collecting data, analyzing it in a smart and automatic way, and exploring Apache Kafka® as a core ecosystem at BAADER, a global provider of food processing machines. After Patrick began working with Apache Kafka in 2019, he developed several microservices with Kafka Streams and used Kafka Connect for various data analytics projects. Focused on the food value chain, Patrick works to optimize processes, specifically around transportation and processing. While consulting for one customer, Patrick identified room for improvement related to animal welfare, lost revenue, unnecessary costs, and carbon dioxide emissions. He also noticed that machines are often ready to send data into the cloud, but the proper presentation and/or analysis of the data is missing, and with it the opportunity for optimization. As a result:
- Data is difficult to understand because of missing units
- Data has not been analyzed so far
- Comparison of machine/process performance for the same machine across different customers is missing
In response to this problem, he helped develop the Transport Manager. Based on data analytics results, the Transport Manager presents information such as a truck's expected arrival time and its current poultry load. This leads to better planning, reduced transportation costs, and improved animal welfare. The Asset Manager is another solution Patrick has been working on; it presents IoT data in real time, in a form the customer can understand. Both are data analytics projects that use machine learning.
Kafka topics store data, provide insight, and help detect dependencies, for example around why trucks stop along the route. Kafka is also a real-time platform, meaning that alerts can be sent directly when a certain event occurs, using ksqlDB or Kafka Streams (a sketch of this pattern follows the episode links below).
As a result of running Kafka on Confluent Cloud and creating a scalable data pipeline, the BAADER team is able to break data silos and produce live data from trucks via MQTT. They've even created an Android app for truck drivers, along with a desktop version that monitors the data a driver enters in the app, together with other information such as expected time of arrival and weather. And the best part: all of it happens in real time.
EPISODE LINKS
- Learn more about BAADER's data-in-motion use cases
- Read about how BAADER uses Confluent Cloud
- Watch the video version of this podcast
- Join the Confluent Community
- Learn more with Kafka tutorials, resources, and guides at Confluent Developer
- Live demo: Kafka streaming in 10 minutes on Confluent Cloud
- Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
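As an illustration of the alerting pattern mentioned above (not BAADER's actual code): a Kafka Streams topology can watch a telemetry topic and forward matching events to an alerts topic the moment they arrive. The topic names and the "speedKmh=<n>" value format here are invented for the sketch.

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class TruckStopAlerts {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "truck-stop-alerts");  // hypothetical app ID
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");  // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Telemetry keyed by truck ID; values are assumed to be plain
        // "speedKmh=<n>" strings for this sketch. A real pipeline would use
        // a structured format such as JSON or Avro.
        KStream<String, String> telemetry = builder.stream("truck-telemetry"); // hypothetical topic

        telemetry
            .filter((truckId, value) -> parseSpeed(value) == 0) // alert condition: truck stopped
            .to("truck-alerts");                                // hypothetical alerts topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }

    // Extracts the speed from a "speedKmh=<n>" value; returns -1 if malformed.
    private static int parseSpeed(String value) {
        try {
            return Integer.parseInt(value.substring(value.indexOf('=') + 1).trim());
        } catch (RuntimeException e) {
            return -1;
        }
    }
}
```

In ksqlDB, the same idea would typically be a persistent CREATE STREAM ... AS SELECT query with a WHERE clause expressing the alert condition, which keeps running and emits matching rows continuously.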