Podcasts about Apache Flink

  • 53 PODCASTS
  • 89 EPISODES
  • 48m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • May 14, 2025 LATEST

POPULARITY (chart, 2017 to 2024)


Latest podcast episodes about Apache Flink

IIoT Use Case Podcast | Industrie
#168 | (EN) Real-Time Data with Apache Flink: How Ververica and Steadforce Drive IIoT Success | Steadforce & Ververica

May 14, 2025 · 34:31


www.iotusecase.com

#ApacheFlink #StreamProcessing #RealTimeData

In Episode 168 of the IoT Use Case Podcast, host Ing. Madeleine Mickeleit speaks with Ben Gamble, Product Manager at Ververica, and Stephan Schiffner, Head of Data + AI at Steadforce, about real-time data processing with Apache Flink. Ververica, co-creator of Flink, provides a production-ready platform for stream processing. Steadforce brings years of project experience from industrial environments. Together, they share insights into real-world IIoT projects and explain how production processes, anomaly detection, and supply chains can be optimized through streaming technologies, and why “evolution over revolution” is key to success.

Podcast episode summary

Real-Time over Downtime: Why Apache Flink Is Becoming a Key Technology for Industrial Data Projects

Whether it's predictive maintenance, anomaly detection, or adaptive production control, modern industrial companies face the challenge of not just collecting data, but acting on it in real time. Apache Flink has emerged as the leading tool for stream processing.

This episode dives into real-world applications in manufacturing, logistics, and infrastructure, from reducing work-in-progress and monitoring temperature trends to optimizing complex supply chains.

You'll learn how companies gradually extend their existing IT/OT architectures with Flink, what common mistakes to avoid, and why “evolution over revolution” is often the smarter strategy.

Also in focus: why investments in streaming technologies often pay off before the ROI becomes measurable in monetary terms, and how projects can get started efficiently using starter kits, Flink SQL, and the Ververica Cloud.

For OT/IT leaders, data architects, and decision-makers in Industrial IoT who want to build scalable, secure, and maintainable streaming use cases.

-----

Relevant links from this episode:
Madeleine (https://www.linkedin.com/in/madeleine-mickeleit/)
Stephan (https://www.linkedin.com/in/stephan-schiffner/)
Ben (https://www.linkedin.com/in/bengamble7/)
Ververica Cloud (https://www.ververica.com/deployment/managed-service)

Follow IoT Use Case on LinkedIn now

IBS Intelligence Podcasts
EP860: The value of stream processing and real-time analytics in financial services

Apr 11, 2025 · 13:34


Ben Gamble, Field CTO, Ververica

Investigating the use cases for stream processing technology, such as fraud and anomaly detection, we quantify the value of real-time data analytics for business intelligence and decision-making. Ben Gamble, Field CTO of Ververica, talks to Robin Amlôt of IBS Intelligence about the convergence of technologies such as agentic AI and stream processing, and how it can help in the drive towards personalisation.

DMRadio Podcast
Just In Time: How Streaming Architectures Enable Business

Mar 20, 2025 · 54:12


In an era where real-time decision-making is a serious competitive advantage, streaming-first architectures are revolutionizing how organizations process and act on data. Unlike traditional batch-oriented systems, streaming platforms like Apache Kafka, Redpanda, Apache Flink, and Apache Pulsar enable continuous data ingestion, transformation, and analysis at scale. These technologies empower businesses to break free from the limitations of periodic data updates, unlocking the ability to react instantly to events, personalize customer experiences in real-time, and drive automation with high-velocity insights. By decoupling producers and consumers through scalable, event-driven pipelines, streaming-first architectures not only enhance system resilience but also pave the way for a more agile, intelligence-driven enterprise. Register for this episode of DM Radio to learn how today's innovators are leveraging this rapidly evolving discipline.

Engineering Kiosk
#180 Scaling, but at what cost? (Papers We Love)

Jan 28, 2025 · 58:55


Scaling and distributed computation: are more CPUs really always faster?

Imagine you are a software developer and everyone is talking about scaling and distributed systems. But how efficient are they really? Does more compute power mean faster results?

In this episode we look at a scientific paper that claims to critically examine the true performance of distributed systems. We discuss at what point it pays off to use more resources, and what the mysterious metric COST (short for Configuration that Outperforms a Single Thread) is all about. Tune in if you want to know whether single-threaded algorithms are the better choice in many cases.

Bonus: Perhaps not all scientists do such scientific work.

You can find our current advertising partners at https://engineeringkiosk.dev/partners
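To make the COST metric from this episode concrete, here is a minimal sketch in Python. It assumes COST is read as the smallest hardware configuration (here simplified to a core count) at which a scalable system beats a single-threaded baseline. The function name `cost` and all measurements are hypothetical and invented for illustration; they are not taken from the paper or the episode.

```python
from typing import Mapping, Optional


def cost(single_thread_seconds: float,
         runtime_by_cores: Mapping[int, float]) -> Optional[int]:
    """Return the smallest measured core count whose runtime beats the
    single-threaded baseline, or None if no configuration does
    (the COST is then unbounded for the measured range)."""
    for cores in sorted(runtime_by_cores):
        if runtime_by_cores[cores] < single_thread_seconds:
            return cores
    return None


# Hypothetical runtimes in seconds, invented for illustration only.
baseline = 300.0
measurements = {1: 900.0, 16: 420.0, 64: 310.0, 128: 240.0}

print(cost(baseline, measurements))
```

With these made-up numbers the distributed system needs 128 cores before it is faster than one well-written thread, which is the kind of result the metric is meant to expose.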

Le Podcast AWS en Français
AWS news as of January 10

Jan 10, 2025 · 11:31


The first 'Quoi de neuf ?' (What's new?) episode of this year looks back at the technology predictions for 2025 and beyond from Werner Vogels, CTO of Amazon. We also cover the news of recent weeks: Amazon Bedrock and SageMaker welcome the new models from Meta (Llama 3.3) and Stability.ai (Stable Diffusion 3.5). We also talk about a new open source connector for Apache Flink and Amazon Kinesis Data Streams, and about Amazon WorkSpaces, which is now accessible via AWS Global Accelerator. Finally, we cover new features in Resource Explorer and in the billing and cost management console (AWS Billing and Cost Management).

The Ravit Show
Origin of Apache Flink and its impact on the market

Jan 2, 2025 · 12:09


Do you want to know how Apache Flink was formed and transformed the world of stream processing? You get to hear it from the co-creator himself! It was absolutely amazing to chat with Stephan Ewen, the visionary Co-creator of Apache Flink and Founder of Restate, at Flink Forward Berlin 2024, hosted by Ververica | Original creators of Apache Flink®.

We spoke about the origins of Apache Flink, exploring the inspiration behind its creation and how it transformed the world of stream processing. Stephan shared insights into the core challenges his team faced during Flink's development and how overcoming those hurdles shaped the product into what it is today.

We also discussed Flink's impact on the market, its evolution alongside other technologies like Hadoop and Spark, and the turning point that led to its widespread adoption. It was fascinating to hear about the ecosystem's growth and the unexpected ways it has expanded across industries.

In addition to his journey with Flink, Stephan gave us a glimpse into his new project, Restate. He shared the motivation behind this venture and how it aims to tackle emerging challenges in the data landscape.

For those venturing into real-time data processing, Stephan offered some valuable advice based on his experiences, a great takeaway for any developer or engineer in this field.

Thanks Stephan for talking with us on the journey! Stay tuned for more updates from Flink Forward!

Join our Newsletter to stay updated with such content with 137k+ subscribers here: https://lnkd.in/d3wssfcK

#data #ai #ververica #flinkforward #apacheflink #datastreaming #theravitshow

The Ravit Show
StreamNative's Partnership with Ververica

Jan 1, 2025 · 13:22


Thrilled to be covering Flink Forward Berlin 2024 hosted by Ververica | Original creators of Apache Flink®, where I had a fascinating conversation with Sijie Guo, CEO of StreamNative! As the first partner in Ververica's new “Powered by Ververica” program, StreamNative is making waves by integrating Ververica's VERA engine into its platform. Sijie shared exciting insights into how this collaboration brings real-time data processing to life across industries like financial services, automotive, and IoT. We discussed use cases where the blend of Apache Pulsar and Flink simplifies data processing while enhancing operational efficiency and scalability for their customers. With a shared commitment to open-source innovation, StreamNative and Ververica are pushing boundaries to bring advanced data capabilities to businesses worldwide. Thank you, Sijie, for a visionary conversation! #data #ververica #flinkforward #apacheflink #datastreaming #theravitshow

The Ravit Show
Booking.com on using Ververica

Dec 26, 2024 · 5:31


At Flink Forward Berlin 2024, I had the pleasure of interviewing Siddhartha Choudhury, Senior Product Manager at Booking.com, about the transformative impact of Ververica | Original creators of Apache Flink® on their systems. Siddhartha shared insights into how Ververica has enabled Booking.com to scale their Flink jobs seamlessly over the years. Since adopting Ververica, they've reached major milestones in data quality and operational scalability. From streamlined data quality management to achieving new heights in scalability, Ververica has become integral to their strategy. Big thanks to Siddhartha for sharing Booking.com's journey with Ververica and the impressive impact it's had! #data #ververica #flinkforward #apacheflink #datastreaming #theravitshow

The Ravit Show
BYOC, Apache Flink, Future Developments

Dec 18, 2024 · 10:23


Excited to be at Flink Forward 2024 by Ververica | Original creators of Apache Flink®. I had the pleasure of interviewing Igor Kersic, Head of Product at Ververica, where we discussed their exciting new BYOC (Bring Your Own Cloud) product and how it's shaping modern cloud-native architectures. Igor shared some incredible insights into how BYOC differentiates Ververica in the competitive streaming data market, empowering enterprises with unprecedented scalability and flexibility. We also explored real-world use cases that highlight the strengths of BYOC, and how Ververica's expertise with Apache Flink enhances this offering. Plus, we got a sneak peek into what's next for BYOC at Ververica! Big thanks to Igor for such a fascinating conversation. Stay tuned for more insights from the conference!

Open at Intel
AI, Community, and the Future of Generative Applications

Nov 27, 2024 · 20:53


In this engaging conversation at the All Things Open conference, Tim Spann, Principal Developer Advocate at Zilliz, discusses the importance of community collaboration in advancing AI technologies. He emphasizes the need for diverse perspectives in solving complex problems and highlights his work with the Milvus open source vector database. Tim also explains the evolving landscape of retrieval augmented generation (RAG) and its applications and shares insights into the future of AI development. The conversation concludes on a lighter note with Tim describing his creative use of Milvus in a fun Halloween project to catalog and identify ghosts.

00:00 Introduction
00:41 Meet Tim Spann: Principal Developer Advocate
01:35 The Importance of Community in AI
02:56 Advanced RAG and Multimodal Models
06:17 The Future of Agentic RAG
09:04 Challenges and Excitement in AI Development
13:35 Building AI the Right Way
17:50 Fun with AI: Capturing Ghosts
19:24 Conclusion and Final Thoughts

Guest: Tim Spann is a Principal Developer Advocate for Zilliz and Milvus. He works with Apache NiFi, Apache Kafka, Apache Pulsar, Apache Flink, Flink SQL, Apache Pinot, Trino, Apache Iceberg, DeltaLake, Apache Spark, Big Data, IoT, Cloud, AI/DL, machine learning, and deep learning. Tim has over ten years of experience with the IoT, big data, distributed computing, messaging, streaming technologies, and Java programming. Previously, he was a Principal Developer Advocate at Cloudera, Developer Advocate at StreamNative, Principal DataFlow Field Engineer at Cloudera, a Senior Solutions Engineer at Hortonworks, a Senior Solutions Architect at AirisData, a Senior Field Engineer at Pivotal and a Team Leader at HPE. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton & NYC on Big Data, Cloud, IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as ApacheCon, DeveloperWeek, Pulsar Summit and many more. He holds a BS and MS in computer science.

Contributor
No PhD Required: Restate with Stephan Ewen

Sep 25, 2024 · 32:34


Stephan Ewen (@StephanEwan) is the co-founder of Restate, the open-source workflow-as-code engine. Restate is lightweight, simple, and provides durable execution. Before Restate, Stephan co-created Apache Flink, the open-source stream processing framework. Lessons learned from Flink have heavily influenced the development of Restate, although Stephan says they have exact opposite use cases.

Contributor is looking for a community manager! If you want to know more, shoot us an email at eric@scalevp.com. Subscribe to Contributor on Substack for email notifications!

In this episode we discuss:
  • The history of Flink and the impact of the 2016 U.S. election
  • Why tooling for real-time transactional problems has historically had room for improvement
  • What constitutes “modern” workflow engines
  • Can you use Restate for any use case?
  • Moving from a large company to a small startup as an open-source developer

Links:
  • Restate
  • Apache Flink

People mentioned:
  • Kostas Tzoumas (@kostas_tzoumas)

Other episodes:
  • Temporal with Maxim Fateev

The New Stack Podcast
How Apache Iceberg and Flink Can Ease Developer Pain

Sep 12, 2024 · 47:08


In the New Stack Makers episode, Adi Polak, Director, Advocacy and Developer Experience Engineering at Confluent, discusses the operational and analytical estates in data infrastructure. The operational estate focuses on fast, low-latency event-driven applications, while the analytical estate handles long-running data crunching tasks. Challenges arise due to the "schema evolution" from upstream operational changes impacting downstream analytics, creating complexity for developers. Apache Iceberg and Flink help mitigate these issues.

Iceberg, a table format developed by Netflix, optimizes querying by managing file relationships within a data lake, reducing processing time and errors. It has been widely adopted by major companies like Airbnb and LinkedIn. Apache Flink, a versatile data processing framework, is driving two key trends: shifting some batch processing tasks into stream processing and transitioning microservices into Flink streaming applications. This approach enhances system reliability, lowers latency, and meets customer demands for real-time data, like instant flight status updates. Together, Iceberg and Flink streamline data infrastructure, addressing developer pain points and improving efficiency.

Learn more from The New Stack about Apache Iceberg and Flink:
  • Unfreeze Apache Iceberg to Thaw Your Data Lakehouse
  • Apache Flink: 2023 Retrospective and Glimpse into the Future
  • 4 Reasons Why Developers Should Use Apache Flink

Join our community of newsletter subscribers to stay on top of the news and at the top of your game.

Data Engineering Podcast
X-Ray Vision For Your Flink Stream Processing With Datorios

Jun 9, 2024 · 42:22


Summary

Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. To address this shortcoming Datorios created an observability platform for Flink that brings visibility to the internals of this popular stream processing system. In this episode Ronen Korman and Stav Elkayam discuss how the increased understanding provided by purpose built observability improves the usefulness of Flink.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. Search for "Code Comments" in your podcast player or go to dataengineeringpodcast.com/codecomments (https://www.dataengineeringpodcast.com/codecomments) today to subscribe. My thanks to the team at Code Comments for their support.
  • Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
  • Your host is Tobias Macey and today I'm interviewing Ronen Korman and Stav Elkayam about pulling back the curtain on your real-time data streams by bringing intuitive observability to Flink streams

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Datorios is and the story behind it?
  • Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses. What are some of the unique challenges posed by Flink?
  • How much of the complexity is due to the nature of streaming data vs. the architectural realities of Flink?
  • How has the lack of visibility into the flow of data in Flink impacted the ways that teams think about where/when/how to apply it?
  • How have the requirements of generative AI shifted the demand for streaming data systems?
  • What role does Flink play in the architecture of generative AI systems?
  • Can you describe how Datorios is implemented?
  • How has the design and goals of Datorios changed since you first started working on it?
  • How much of the Datorios architecture and functionality is specific to Flink and how are you thinking about its potential application to other streaming platforms?
  • Can you describe how Datorios is used in a day-to-day workflow for someone building streaming applications on Flink?
  • What are the most interesting, innovative, or unexpected ways that you have seen Datorios used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Datorios?
  • When is Datorios the wrong choice?
  • What do you have planned for the future of Datorios?

Contact Info

  • Ronen: LinkedIn (https://www.linkedin.com/in/ronen-korman/)
  • Stav: LinkedIn (https://www.linkedin.com/in/stav-elkayam-118a2795/?originalSubdomain=il)

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
  • Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.

Links

  • Datorios (https://datorios.com/)
  • Apache Flink (https://flink.apache.org/)
  • Podcast Episode (https://www.dataengineeringpodcast.com/apache-flink-with-fabian-hueske-episode-57)
  • ChatGPT-4o (https://openai.com/index/hello-gpt-4o/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Developer Voices
ByteWax: Rust's Research Meets Python's Practicalities (with Dan Herrera)

May 8, 2024 · 61:54


Bytewax is a curious stream processing tool that blends a Python surface with a Rust core to produce something that's in a similar vein to Kafka Streams or Apache Flink, but with a fundamentally different implementation. This week we're going to take a look at what it does, how it works in theory, and how the marriage of Python and Rust works in practice…

The original Naiad Paper: https://dl.acm.org/doi/10.1145/2517349.2522738
Timely Dataflow: https://github.com/TimelyDataflow/timely-dataflow
Bytewax the Library: https://github.com/bytewax/bytewax
Bytewax the Service: https://bytewax.io/
PyO3, for calling Rust from Python: https://pyo3.rs/v0.21.2/
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Kris on Twitter: https://twitter.com/krisajenkins

#softwaredevelopment #dataengineering #apachekafka #timelydataflow

Open Source Startup Podcast
E126: RisingWave's Take on Launching a New Database

Apr 15, 2024 · 40:27


Yingjun Wu is Founder of RisingWave, a new open source stream processing database. RisingWave has raised $40M from investors including Yunqi Partners. In this episode, we discuss RisingWave's approach versus Apache Flink and other stream processing frameworks, why stream processing is important for real-time monitoring use cases, why they initially focused on startups and how free support helped develop trust with these early users, key decisions around their product and why Postgres compatibility was crucial & more!

The GeekNarrator
Restate - making distributed systems simple with Stephan Ewen

Mar 22, 2024 · 65:33


In this video, I talk to Stephan Ewen from Restate, who is well known from the world of Apache Flink. We talk about the problems of distributed systems and the complex solutions developers have to deal with. This complexity makes architectures so complicated that it eventually creates reliability, observability, and delivery velocity problems. Restate aims to solve this by making resilience and durability for your services, functions, and RPC a lot simpler.

Chapters:
00:00 Introduction
00:45 Introducing Restate: A Solution for Distributed System Challenges
01:22 Deep Dive into Restate with Stephan: From Apache Flink to Building Resilient Systems
06:04 The Complexities of Distributed Systems and How Restate Addresses Them
15:49 The Vision of Restate: Simplifying Developer Experience in Distributed Systems
24:42 Integrating Restate into Your Architecture: A User's Perspective
33:16 Exploring Restate: The Durable Service Mesh
33:32 The Power of Restate in Handling Transactions
34:26 Restate's Role in Service Communication and Durability
35:40 Deep Dive into Restate's Mechanisms and Benefits
38:04 Practical Example: Email Pipeline with Restate
39:40 Understanding Restate's Log and Event Handling
58:43 Restate's Unique Features and Programming Model
01:04:22 Final Thoughts on Restate's Impact and Deployment

Restate: https://restate.dev/

For a discount on the courses below:
Appsync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003
Use the Add Discount button and enter the "geeknarrator" discount code to get a 20% discount.

Follow me on LinkedIn and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator

If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet.

Database internals series: https://youtu.be/yV_Zp0Mi3xs

Popular playlists:
Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA-
Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17
Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d
Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN

Stay Curious! Keep Learning!

#distributedsystems #faulttolerance #reliability #resilience

CZPodcast
CZ Podcast 314 - Stream processing and Apache Flink

Mar 2, 2024 · 68:00


Data processing is starting to shift from batch processing to stream processing. Apache Flink is becoming the go-to solution for this type of data processing. In this episode we met with David Moravek and Jan Svoboda from Confluent.io, who have experience with Apache Flink. Joining Dagi in asking the questions was Vaclav Brodec, who is responsible at Ataccama for the next generation engine for ensuring data quality.

GOTO - Today, Tomorrow and the Future
Designing A Data-Intensive Future • Martin Kleppmann & Jesse Anderson

Feb 16, 2024 · 28:36 · Transcription Available


This interview was recorded at GOTO Amsterdam for GOTO Unscripted.
http://gotopia.tech

Read the full transcription of this interview here

Martin Kleppmann - Researcher at the Technical University of Munich & Author of "Designing Data-Intensive Applications"
Jesse Anderson - Managing Director of Big Data Institute, Host of The Data Dream Team Podcast

RESOURCES
Jesse Anderson: https://youtu.be/cWSCI1LpoGY
Martin Kleppmann: https://youtu.be/esMjP-7jlRE
Prag. Dave Thomas: https://youtu.be/ug8XX2MpzEw
https://automerge.org

Martin
https://martin.kleppmann.com
https://twitter.com/martinkl
https://nondeterministic.computer/@martin
https://linkedin.com/in/martinkleppmann

Jesse
https://twitter.com/jessetanderson
https://www.jesse-anderson.com
https://sodapodcast.libsyn.com/site
https://linkedin.com/in/jessetanderson
https://www.jesse-anderson.com/category/blog

DESCRIPTION
Jesse Anderson, director at Big Data Institute, and Martin Kleppmann, author of "Designing Data-Intensive Applications", explore together the evolving data landscape. They start with the origins of Martin's book, emphasizing the crucial art of asking the right questions. Martin unveils industry shifts since 2017, spotlighting the transformative rise of cloud services.

The conversation then takes a twist as Martin delves into academia, sharing insights on local-first collaboration software and the fascinating world of Automerge. Aspiring software engineers are treated to some advice on how to navigate the delicate balance between simplicity and adaptability.

The interview concludes with a glimpse into diverse career paths in the dynamic realm of data engineering, making it a must-watch for professionals at every stage of their journey.

RECOMMENDED BOOKS
Martin Kleppmann • Designing Data-Intensive Applications
Martin Kleppmann • Secret Colors: A Gentle Introduction to Cryptography
Jesse Anderson • Data Teams
Jesse Anderson • Data Engineering Teams
Jesse Anderson • The Ultimate Guide to Switching Careers to Big Data
Viktor Gamov, Dylan Scott & Dave Klein • Kafka in Action
Fabian Hueske & Vasiliki Kalavri • Stream Processing with Apache Flink

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!

Software Huddle
Durable Async/Await with Stephan Ewen of Restate

Jan 30, 2024 · 64:56


Today's guest is a legend in the distributed systems community. Stephan Ewen was one of the creators of Apache Flink, a stream processing engine that took off with the rise of Apache Kafka. Stephan is now working on core transactional problems by building a durable async/await system that integrates with any programming language. It's designed to help with a number of difficult problems in transactional processing, including idempotency, dual writes, distributed locks, and even simple retries and cancellation. In this episode, we get into the details of how Restate works and what it does. We cover core use cases and how people are solving these problems today. Then, we dive into the core of the Restate engine to learn why they're building on a log-based system. Finally, we cover lessons learned from Stephan's time with Flink and what's next for Restate.
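The idea of durable execution on top of a log can be illustrated with a small sketch. This is not Restate's API or implementation; it is a generic, assumed pattern in Python where each step's result is recorded in a journal so that a retry replays completed steps instead of repeating their side effects. The names `Journal`, `charge_card`, and `checkout` are hypothetical, and the journal is an in-memory stand-in for a durable log.

```python
import json
from typing import Any, Callable, Dict


class Journal:
    """In-memory stand-in for a durable log (hypothetical; a real system
    would persist entries so they survive a process crash)."""

    def __init__(self) -> None:
        self.entries: Dict[str, str] = {}

    def run(self, step_id: str, action: Callable[[], Any]) -> Any:
        # Replay: a step that already completed returns its recorded result
        # instead of executing its side effect a second time.
        if step_id in self.entries:
            return json.loads(self.entries[step_id])
        result = action()
        self.entries[step_id] = json.dumps(result)
        return result


calls = {"charge": 0}  # counts how often the side effect really ran


def charge_card() -> dict:
    calls["charge"] += 1
    return {"status": "charged", "amount": 42}


def checkout(journal: Journal, fail_after_charge: bool) -> str:
    receipt = journal.run("charge", charge_card)
    if fail_after_charge:
        raise RuntimeError("simulated crash after charging")
    return journal.run("email", lambda: f"receipt sent for {receipt['amount']}")


journal = Journal()
try:
    checkout(journal, fail_after_charge=True)  # first attempt fails midway
except RuntimeError:
    pass
result = checkout(journal, fail_after_charge=False)  # retry replays "charge"
print(result, calls["charge"])
```

The retry finishes the workflow while the card is charged only once. In this simplified form a crash between the side effect and the journal write would still repeat the step, which is one reason idempotency remains part of the problem space the episode describes.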

Rust in Production
Rust in Production Ep 4 - Arroyo's Micah Wylde

Jan 25, 2024 · 55:50


In this episode, we have Micah Wylde from Arroyo as our guest. Micah introduces us to Arroyo, a real-time data processing engine that simplifies stream processing for data engineers using Rust. They explain how Arroyo enables users to write SQL queries with Rust user-defined functions on top of streaming data, highlighting the advantages of real-time data processing and discussing the challenges posed by competitors like Apache Flink.

Moving on, we dive into the use of Rust in Arroyo and its benefits in terms of performance and memory safety. We explore the complementarity of workflow engines and stream processors and examine Arroyo's approach to real-time SQL and its compatibility with Postgres. Micah delves into memory and lifetime concerns and elaborates on how Arroyo manages them in its storage layer.

Shifting gears, we explore the use of the Tokio framework in the Arroyo system and how it has enhanced speed and efficiency. Micah shares insights into the challenges and advantages of utilizing Rust, drawing from their experiences with Arroyo projects.

Looking ahead, we discuss the future of the Rust ecosystem, addressing the current state of the Rust core and standard library, as well as the challenges of interacting with other languages using FFI or dynamically loading code. We touch upon Rust's limitations regarding a stable ABI and explore potential solutions like WebAssembly. We also touch upon industry perceptions of Rust, investor perspectives, and the hiring process for Rust engineers.

The conversation takes us through the crates used in the Arroyo system, our wishlist for Rust ecosystem improvements, and the cost-conscious nature of companies that make Rust an attractive choice in the current macroeconomic environment.

As we wrap up, we discuss the challenges Rust faces in competing with slower Java systems and ponder the potential for new languages to disrupt the trend in the future. We touch upon efficiency challenges in application software and the potential for a new language to emerge in this space. We delve into the increasing interest in using Rust in data science and the promising prospects of combining Rust with higher-level languages. Finally, we discuss the importance of fostering a welcoming and drama-free Rust community.

I would like to thank Micah for joining us today and sharing their insights. To find more resources related to today's discussion, please refer to the show notes. Stay tuned for our next episode, and thank you for listening!

Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
(Part 1/N): Confluent Cloud (Managed Kafka as a Service) - Create a cluster, generate API keys, create topics, publish messages

Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)

Jan 11, 2024 45:53


In this podcast, the host explores Confluent Cloud, a fully managed Kafka service. The host shares their experience with RabbitMQ and Kafka and explains the value of using a managed service like Confluent Cloud. They walk through the process of signing up for an account, creating a cluster, generating API keys, and creating topics. The host also discusses the use of connectors and introduces ksqlDB and Apache Flink. They explore cluster settings, message consumption, and additional features of Confluent Cloud. The podcast concludes with a summary of the topics covered.

Takeaways
  - Confluent Cloud is a fully managed Kafka service that provides added value through pre-built connectors and ease of use.
  - Creating a cluster, generating API keys, and creating topics are essential steps in getting started with Confluent Cloud.
  - ksqlDB and Apache Flink offer stream processing capabilities and can be integrated with Confluent Cloud.
  - Cluster settings, message consumption, and additional features like stream lineage and stream designer enhance the functionality of Confluent Cloud.
  - Using a managed service like Confluent Cloud allows developers to focus on solving customer problems rather than managing infrastructure.

Chapters
  - 00:00 Introduction
  - 02:25 Exploring Confluent Cloud
  - 09:14 Creating a Cluster and API Keys
  - 11:00 Creating Topics
  - 13:20 Sending Messages to Topics
  - 15:12 Introduction to ksqlDB and Apache Flink
  - 17:03 Exploring Connectors
  - 25:44 Cluster Settings and Configuration
  - 28:05 Consuming Messages
  - 35:20 Stream Lineage and Stream Designer
  - 38:44 Exploring Additional Features
  - 44:21 Summary and Conclusion

Snowpal Products:
  - Backends as Services on AWS Marketplace
  - Mobile Apps on App Store and Play Store
  - Web App
  - Education Platform for Learners and Course Creators
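As a rough illustration of where the API key and secret from this walkthrough end up, the sketch below builds the kind of client configuration a Kafka producer would typically use against a managed cluster. The bootstrap server, key, secret, and topic name are placeholders, and the helper name `build_client_config` is hypothetical. The property names follow the librdkafka-style settings used by the `confluent-kafka` Python client, which is an assumption about the client a listener might pick, not something the episode specifies.

```python
def build_client_config(bootstrap_servers: str, api_key: str, api_secret: str) -> dict:
    """Return librdkafka-style settings for a SASL/PLAIN connection over TLS.

    Hypothetical helper: the episode does not prescribe a client library.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": api_key,     # the generated API key
        "sasl.password": api_secret,  # the generated API secret
    }


# Placeholder values; substitute the ones shown for your own cluster.
config = build_client_config("<bootstrap-host>:9092", "<API_KEY>", "<API_SECRET>")

# With the confluent-kafka package installed, publishing would look roughly like:
#   from confluent_kafka import Producer
#   producer = Producer(config)
#   producer.produce("demo-topic", key="k1", value="hello")
#   producer.flush()
```

Keeping the configuration in one small function makes it easy to load the key and secret from environment variables instead of hard-coding them.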

Real-Time Analytics with Tim Berglund
Best of 2023: Inside Apache Flink: A Conversation with Robert Metzger

Real-Time Analytics with Tim Berglund

Dec 18, 2023 24:09


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | Looking back at our favorite episodes from 2023, we have Apache Flink's PMC Chair, Robert Metzger, on the show, who provides a friendly introduction to the world of Flink. Like a tour guide, he navigates listeners through Flink's role as a handy tool for building applications that process data in real-time. Metzger illustrates Flink's unique ability to work smoothly with both batch and streaming data, making it a nifty sidekick for anyone dealing with everything from historical data to real-time processing. New episodes every Monday resume on January 8, 2024!

Developer Voices
Is Flink the answer to the ETL problem? (with Robert Metzger)

Developer Voices

Nov 22, 2023 64:26


Integration is probably the last, hardest, and least well thought-out part of any large software project, so anything that makes the data-streaming job easier is worth knowing about. This week we turn our attention to Apache Flink, a flexible system for grabbing, transforming and shipping data between systems using Java, Python or good ol' SQL. Robert Metzger, Apache Flink expert and PMC member, joins us to explain what problems Flink solves and how it solves them reliably. We cover the range from simple use cases to realtime aggregations & joins to its high availability strategy.

If you're working on systems that include more than one database, then you're definitely going to face the kinds of problems that Flink tackles.

Apache Flink: https://flink.apache.org/
Robert on Twitter: https://twitter.com/rmetzger_
Robert on LinkedIn: https://www.linkedin.com/in/metzgerrobert/
Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

#software #programming #podcast #flink #apacheflink #dataintegration

Tech Disruptors
Confluent CEO on the Motion to Fast-Data World

Tech Disruptors

Nov 9, 2023 45:27


Confluent's platform provides infrastructure for enterprises to connect, stream and process data across applications and systems in real time. In this episode of the Tech Disruptors podcast, Confluent's cofounder and CEO Jay Kreps joins Bloomberg Intelligence senior software analyst Sunil Rajgopal to discuss the origins of Apache Kafka and Confluent, the flow of enterprise data and future of software architecture. The two also talk about the opportunity arising from the shift toward real-time data streaming from batch processing, budding artificial intelligence workloads and the company's new products such as Confluent Cloud for Apache Flink, Kora Engine and KSQL database.

Developer Voices
Why did Redpanda rewrite Apache Kafka? (with Christina Lin)

Developer Voices

Nov 8, 2023 49:27


Would you ever take on a rewrite of one of the largest and most popular Apache projects? And if so, what would you keep the same and what would you change?

This week we're talking to Christina Lin, who's part of Redpanda, a company that's rewriting parts of the Apache Kafka ecosystem in C++, with the aim of getting performance gains that aren't feasible in Java. It seems like a huge mountain to climb, and a fascinating journey to be on, so let's ask why and how they've taken on this challenge…

Christina on Twitter: https://twitter.com/Christina_wm
Kris on Twitter: https://twitter.com/krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/
Redpanda: https://redpanda.com/
Redpanda University: https://university.redpanda.com/
Seastar framework: https://seastar.io/
Apache Flink: https://flink.apache.org/

#redpanda #kafka #apachekafka #streaming #python

The New Stack Podcast
How Apache Flink Delivers for Deliveroo

The New Stack Podcast

Sep 20, 2023 20:38


Deliveroo, a prominent food delivery company, relies on Apache Flink, a distributed processing engine, to enhance its three-sided marketplace, connecting delivery drivers, restaurants, and customers. Seeking to improve real-time data streaming and gain insights into customer behavior, Deliveroo transitioned to Flink, comparing it to alternatives like Apache Spark and Kafka Streams. Flink, with feature parity to their previous platform, offered stability and scalability. They initially experimented with Flink on Kubernetes but turned to the Amazon Managed Service for Flink (MSF) for enhanced support and maintenance.

Engineers from Deliveroo, Felix Angell and Duc Anh Khu, emphasized the need for flexibility in data modeling to accommodate their fast-paced product development. However, flexibility can be complex, often requiring data model adjustments. They expressed the desire for a self-serve configuration feature in MSF, allowing easy customization of low-level settings and auto-scaling based on application metrics. This move to Flink and MSF has empowered Deliveroo to focus on core responsibilities like continuous integration and delivery while efficiently managing their data processing needs.

Learn more from The New Stack about Apache Flink and AWS:
  - Kinesis, Kafka and Amazon Managed Service for Apache Flink
  - Apache Flink for Real Time Data Analysis
  - Apache Flink for Unbounded Data Streams

The New Stack Podcast
Kinesis, Kafka and Amazon Managed Service for Apache Flink

The New Stack Podcast

Sep 12, 2023 27:07


Apache Flink is an open-source framework and distributed processing engine designed for data analytics. It excels at handling tasks such as data joins, aggregations, and ETL (Extract, Transform, Load) operations. Moreover, it supports advanced real-time techniques like complex event processing.

In this episode, Deepthi Mohan and Nagesh Honnalii from AWS discussed Apache Flink and the Amazon Managed Service for Apache Flink (MSF) with our host, Alex Williams. MSF is a service that caters to customers with varying infrastructure preferences. Some prefer complete control, while others want AWS to handle all infrastructure-related aspects.

Use cases for MSF can be grouped into three categories. First, there's streaming ETL, which involves tasks like log aggregation for later auditing. Second, it supports real-time analytics, enabling customers to create dashboards for tasks like fraud detection. Third, it handles complex event processing, where data from multiple sources is joined and aggregated to extract meaningful insights.

The origins of MSF trace back to the evolution of real-time data services within AWS. In 2013, AWS introduced Amazon Kinesis, while the open-source community developed Apache Kafka. These services paved the way for MSF by highlighting the need for real-time data processing.

To provide more flexibility, AWS launched Kinesis Data Analytics in 2016, allowing customers to write code in JVM-based languages like Java and Scala. In 2018, AWS decided to incorporate Apache Flink into its Kinesis Data Analytics offering, leading to the birth of MSF.

Today, thousands of customers use MSF, and AWS continues to enhance its offerings in the real-time data processing space, including the launch of Amazon MSK (Managed Streaming for Apache Kafka). To align with its foundation on Flink, AWS rebranded Kinesis Data Analytics for Apache Flink to Amazon Managed Service for Apache Flink, making it clearer for customers.

Learn more from The New Stack about AWS and Apache Flink:
  - Apache Flink for Real Time Data Analysis
  - Apache Flink for Unbounded Data Streams
  - 3 Reasons Why You Need Apache Flink for Stream Processing

The Cloud Pod
226: Duet, Co-Pilot, and a Code Whisperer Walk into a bar in San Francisco

The Cloud Pod

Sep 8, 2023 65:42


Welcome to episode 226 of the Cloud Pod podcast - where the forecast is always cloudy! This week Justin, Matt and Ryan chat about all the news and announcements from Google Next, including - surprise surprise - the hot topic of AI, GKE Enterprise, Duet, Co-Pilot, Code Whisperer and more! There's even some non-Next news thrown into the episode. So whether you're interested in BART or Bard, we've got the news from SF just for you. Titles we almost went with this week:

AWS Podcast
#619: Amazon Managed Service for Apache Flink

AWS Podcast

Sep 7, 2023 15:49


Amazon Managed Service for Apache Flink makes it easy to build and run real-time streaming applications using Apache Flink. Amazon Managed Service for Apache Flink takes care of everything required to run streaming applications. There are no servers and clusters to manage, no compute and storage infrastructure to set up, and you only pay for the resources you use. You can easily set up and integrate data sources or destinations with minimal code, process data continuously with sub-second latencies, and respond to events in real-time.

The New Stack Podcast
Apache Flink for Real Time Data Analysis

The New Stack Podcast

Sep 5, 2023 23:52


This episode delves into Apache Flink, a versatile platform for executing both batch and real-time streaming data analysis tasks. This session marks the beginning of a three-part series unveiling Amazon Web Services' (AWS) new managed service built on Flink. Future episodes will explore this service in detail and examine customer experiences.

The podcast features insights from Danny Cranmer, a principal engineer at AWS and an Apache Flink PMC and Committer, along with Hong Teoh, a software development engineer at AWS.

Flink stands out as a high-level framework for defining data analytics jobs, accommodating both batch and streaming data sets. It offers APIs for building analysis jobs in various languages, including Java, Python, and SQL. Flink also provides a distributed job execution engine with fault tolerance and horizontal scaling capabilities.

One prominent use case is Extract-Transform-Load (ETL), where raw data is swiftly processed for specific workloads. Flink excels in delivering low-latency transformations for unbounded data streams. Additionally, Flink supports event-driven applications, responding immediately to triggers such as user requests for weather data.

Flink ensures exactly-once processing, critical for scenarios like financial transactions. It employs checkpoints to maintain data integrity in case of node failures.

The podcast also touches on AWS's role in supporting the open-source Flink project and the future outlook for this powerful data processing framework.

Learn more from The New Stack about Apache Flink:
  - 3 Reasons Why You Need Apache Flink for Stream Processing
  - Apache Flink for Unbounded Data Streams
  - 8 Real-Time Data Best Practices
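The checkpointing idea mentioned in this description can be illustrated with a toy model. The sketch below is plain Python, not Flink code, and every name in it (`run_with_checkpoints`, `checkpoint_every`, `fail_at`) is hypothetical. It snapshots the running state together with the input offset, and after a simulated failure it restores the last snapshot and replays from that offset, so each record affects the final state exactly once.

```python
from typing import List, Optional, Tuple


def run_with_checkpoints(
    events: List[int],
    checkpoint_every: int = 3,
    fail_at: Optional[int] = None,
) -> int:
    """Sum the events, surviving one simulated crash via snapshot and replay."""
    checkpoint: Tuple[int, int] = (0, 0)  # (next offset to read, running total)
    offset, total = checkpoint
    crashed = False

    while offset < len(events):
        if fail_at is not None and offset == fail_at and not crashed:
            # Simulated node failure: in-flight state is lost, so roll back
            # to the last snapshot and replay the input from its offset.
            crashed = True
            offset, total = checkpoint
            continue

        total += events[offset]
        offset += 1

        if offset % checkpoint_every == 0:
            # Offset and state are saved together, which is what keeps
            # replayed records from being counted twice.
            checkpoint = (offset, total)

    return total


events = [5, 1, 4, 2, 8, 3, 7]
assert run_with_checkpoints(events) == sum(events)
assert run_with_checkpoints(events, fail_at=5) == sum(events)
```

A real engine coordinates snapshots across many parallel operators and durable storage; this single-process version only shows why saving state and input position together makes replay safe.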

Enterprise Java Newscast
Stackd 66: Streams, Messages, Events, and a Java User Group

Enterprise Java Newscast

Aug 11, 2023 121:43


Ian, Kito, and Josh are joined by Java Champion, Streaming Developer Advocate at DataStax, and President of Chicago-JUG, Mary Grygleski. They discuss news about Capacitor, Angular, PrimeNG Designer for Tailwind, JetBrains Compose Multiplatform for iOS, JDK 21, AI developer tools, Jakarta EE 10, and more. Kito announces the work he is doing on the Jakarta EE Tutorial, and then they delve into Mary's background and event streaming with Apache Pulsar, plus tools like Apache Pinot, Apache Flink, RisingWave, ByteWax and Apache Cassandra.

We thank DataDog for sponsoring this podcast! https://www.pubhouse.net/datadog

Front End
  - Announcing Capacitor 5.0 - Ionic Blog (https://ionic.io/blog/announcing-capacitor-5)
  - Angular v16 is here! (https://blog.angular.io/angular-v16-is-here-4d7a28ec680d)
  - Compose Multiplatform (https://blog.jetbrains.com/kotlin/2023/05/compose-multiplatform-for-ios-is-in-alpha/)
  - PrimeNG Designer - Tailwind (Q3 2023) (https://www.primefaces.org/primeng-theme-designer-with-tailwind/)

Server Side Java
  - Kito is working with Bauke Scholtz and Arjan Tijms to refresh the Jakarta EE Tutorial
    - Eclipse Documentation for Jakarta EE (https://projects.eclipse.org/projects/ee4j.jakartaee-documentation)
    - Antora (https://antora.org)
    - Asciidoc (http://asciidoc.org)
  - Jakarta EE 10; MicroProfile 6; Java SE 20; Open Liberty (https://openliberty.io/blog/2023/04/04/23.0.0.3.html)
  - Jakarta EE Starter (https://start.jakarta.ee/)

AI/ML
  - Phind - AI search engine for developers (https://www.phind.com/)
  - 92% of devs using AI coding assistants (https://www.zdnet.com/article/github-developer-survey-finds-92-of-programmers-using-ai-tools/)

Java Platform
  - JDK 21, the next LTS release, due out in September (https://www.infoworld.com/article/3689880/jdk-21-the-new-features-in-java-21.html)

IDE and Tools
  - Grazie Professional - IntelliJ IDEs Plugin | Marketplace (https://plugins.jetbrains.com/plugin/16136-grazie-professional)

Chat w/Mary
  - Twitter: @mgrygles (https://twitter.com/mgrygles)
  - Discord server: https://discord.gg/RMU4Juw
  - LinkedIn: https://www.linkedin.com/in/mary-grygleski/
  - Apache Pulsar (https://pulsar.apache.org/)
  - Apache Pinot (https://pinot.apache.org/)
  - Apache Flink (https://flink.apache.org/)
  - RisingWave (https://www.risingwave.dev/)
  - ByteWax (https://bytewax.io/)
  - Apache Cassandra (https://cassandra.apache.org/)
  - Apache Kafka (https://kafka.apache.org/)

Picks
  - Quantum Energy Squares (Kito) (https://quantumsquares.com/)
  - JBOSS EAP on Azure (Josh) (https://learn.microsoft.com/en-us/azure/developer/java/ee/jboss-on-azure)
  - Interstellar (Mary) (https://www.imdb.com/title/tt0816692/)
  - Black Mirror Season 6 Episode 1 - Joan Is Awful - Netflix (Ian) (https://www.rottentomatoes.com/tv/black_mirror/s06/e01)

Other Pubhouse Network podcasts
  - Breaking into Open Source (https://www.pubhouse.net/breaking-into-open-source)
  - OffHeap (https://www.javaoffheap.com/)
  - Java Pubhouse (https://www.javapubhouse.com/)

Events
  - Lone Star Software Symposium - July 14 - 15, Austin, TX, USA (https://nofluffjuststuff.com/austin)
  - ÜberConf - July 18 - 21, Denver, CO, USA (https://uberconf.com/)
  - Nebraska.code() - July 19-20, Lincoln, NE, USA (https://nebraskacode.amegala.com/)

Tales at Scale
Confluent, Kafka, Druid, and Flink: The Future of Streaming Data with Kai Waehner

Tales at Scale

Aug 8, 2023 32:22


Apache Kafka® is a streaming platform that can handle large-scale, real-time data streams reliably. It's used for real-time data pipelines, event sourcing, log aggregation, stream processing, and building analytics applications. Apache® Druid is a database designed to provide fast, interactive, and scalable analytics on time-series and event-based data, empowering organizations to derive insights, monitor real-time metrics, and build analytics applications. Naturally, these two things just go together and are often both key parts of a company's data architecture. Confluent is one of those companies. On this episode, Kai Waehner, Field CTO at Confluent walks us through how they use Kafka and Druid together, where Apache Flink fits into the mix and shares insights and trends from the world of data streaming.

Real-Time Analytics with Tim Berglund
Diving Deep into Apache Flink with Robert Metzger | Ep. 14

Real-Time Analytics with Tim Berglund

Jul 10, 2023 30:48


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! In part two of the "Real-Time Analytics" podcast, Robert Metzger, the PMC chair of Apache Flink, elaborates on using Flink as a developer. Metzger discusses the spectrum of APIs in Flink, ranging from expressive APIs to easy-to-use APIs. He mentions the process function, a low-level, flexible API that exposes basic building blocks of Flink, such as real-time events, state, and event time. Metzger also speaks about the windowing API of Flink and the Async I/O operator. He further details how Flink users can work with a combination of SQL and Java code in the DataStream API. You won't want to miss this episode!

Flink Deployments At Decodable: https://www.decodable.co/blog/flink-deployments-at-decodable
3 Reasons Why You Need Apache Flink for Stream Processing: https://thenewstack.io/3-reasons-why-you-need-apache-flink-for-stream-processing/#:~:text=For%20example%2C%20Uber%20uses%20Flink,streaming%20data%20at%20massive%20scale.
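To make the event-time windowing mentioned in this episode more concrete, here is a small sketch in plain Python, not the Flink API. It assigns each event to a fixed-size (tumbling) window based on the timestamp carried by the event rather than its arrival order, and sums values per key and window. The function name `tumbling_window_sums` and the `(key, timestamp_ms, value)` tuple layout are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[str, int, float]  # (key, event timestamp in ms, value)


def tumbling_window_sums(
    events: Iterable[Event], window_ms: int
) -> Dict[Tuple[str, int], float]:
    """Sum values per (key, window start), using each event's own timestamp."""
    sums: Dict[Tuple[str, int], float] = defaultdict(float)
    for key, timestamp_ms, value in events:
        # Align the timestamp down to the start of its window.
        window_start = timestamp_ms - (timestamp_ms % window_ms)
        sums[(key, window_start)] += value
    return dict(sums)


# Events arrive out of order; event time still places them in the right window.
events = [
    ("sensor-a", 1_500, 2.0),
    ("sensor-a", 400, 1.0),
    ("sensor-b", 900, 5.0),
    ("sensor-a", 1_900, 3.0),
]
result = tumbling_window_sums(events, window_ms=1_000)
assert result[("sensor-a", 0)] == 1.0
assert result[("sensor-a", 1_000)] == 5.0
assert result[("sensor-b", 0)] == 5.0
```

This version sees all events before answering. A streaming engine additionally has to decide when a window is complete, typically with watermarks, which the sketch leaves out.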

airhacks.fm podcast with adam bien
FPGAs, GPUs or Data Science with Java

airhacks.fm podcast with adam bien

Jul 10, 2023 61:01


An airhacks.fm conversation with Juan Fumero (@snatverk) about: 8088 an IBM clone, joining a cross country running team at school, Zoran previously at "#169 Deep Learning with Modern Java Code", learning Turbo Pascal, C and C++, working on particle detection at CERN, working on GraalVM to GPU compilation and optimization, using direct memory access to communicate with the GPU, vector types in Java, Apache Flink acceleration on FPGA and GPUs, working on FPGAs, using RTL for FPGA programming, transparent acceleration for Java, astrophysics analytics with Java, DeepNetts on TornadoVM, the relation between TornadoVM and GraalVM, using Panama to access native memory, GPU-less TornadoVM, contributing to TornadoVM

Juan Fumero on twitter: @snatverk

Engenharia de Dados [Cast]
Conferência Kafka Summit 2023 London

Engenharia de Dados [Cast]

Jun 27, 2023 58:42


In today's episode, Luan Moreno and Mateus Oliveira talk about attending Kafka Summit London 2023. Kafka Summit is one of the largest technology conferences in the world, where streaming technology companies announce what's new and where we can learn more about how companies use these technologies day to day.

The conference had three parts:
  - Keynote (announcements)
  - Vendor Hall (where the sponsors are)
  - Sessions (rooms where speakers give their talks)

In this conversation we also cover the following topics:
  - Open-source announcements
  - Confluent announcements
  - Overview of the sessions
  - Sponsor hall
  - Main impressions of the conference

Learn more about technologies such as Apache Kafka, Apache Flink, and other streaming tools. We also look at how companies such as European financial institutions, Apple, Uber, Netflix, and others are using Apache Kafka to solve business problems.

Kafka Summit 2023 London: https://www.confluent.io/events/kafka-summit-london-2023/
Luan Moreno: https://www.linkedin.com/in/luanmoreno/

Real-Time Analytics with Tim Berglund
Inside Apache Flink: A Conversation with Robert Metzger | Ep. 13

Real-Time Analytics with Tim Berglund

Jun 26, 2023 23:03


Follow: https://stree.ai/podcast | Sub: https://stree.ai/sub | New episodes every Monday! Today we have Apache Flink's PMC Chair, Robert Metzger, on the show, who provides a friendly introduction to the world of Flink. Like a tour guide, he navigates listeners through Flink's role as a handy tool for building applications that process data in real-time. Metzger illustrates Flink's unique ability to work smoothly with both batch and streaming data, making it a nifty sidekick for anyone dealing with everything from historical data to real-time processing. Make sure to tune into Part 2 next week, where Robert will dive even deeper into this technology.

Engenharia de Dados [Cast]
Cloudera CDP & Stream Processing para Real-Time Analytics com André Araújo, Field Engineer, Data in Motion na Cloudera

Engenharia de Dados [Cast]

Jun 22, 2023 58:00


In today's episode, Luan Moreno & Mateus Oliveira interview André Araújo, currently Field Engineer, Data in Motion at Cloudera.

CDP is Cloudera's enterprise data platform, focused on versatility across use cases such as a streaming platform, with technologies like Apache Kafka and Apache Flink.

With CSP, you get the following benefits:
  - Apache Kafka - market-leading platform for storing streaming data
  - Apache Flink - data processing platform

In this conversation we talk about:
  - Cloudera's data platform
  - Cloudera's streaming platform

Cloudera has always been one of the most widely used platforms on the market, and it now comes with a new version and use cases that serve a variety of scenarios, as is the case with CSP (Cloudera Stream Platform).

André Araújo: LinkedIn
Cloudera: webpage
Luan Moreno: https://www.linkedin.com/in/luanmoreno/

AWS - Il podcast in italiano
Il data streaming: benefici, tecnologie e progetti di riferimento (ospite: Francesco Tisiot)

AWS - Il podcast in italiano

Mar 13, 2023 55:28


What is data streaming? How does it differ from batch data analysis, and why is it an interesting topic for developers and cloud architects too? In this episode I host Francesco Tisiot, Sr. Developer Advocate at Aiven, to talk about technologies such as Apache Kafka and Apache Flink, what kinds of problems they solve, which managed alternatives exist in the cloud, and some interesting projects and initiatives for anyone who wants to dig deeper into these topics. Link: Apache Kafka. Link: Apache Flink. Link: Aiven. Link: Python fake data producer for Apache Kafka. Link: Open Source Data Infrastructure Meetup.

Tales at Scale
Accurate, Validated, and Real Time: Diving into Reddit's Druid-powered Ad Platform with Lakshmi Ramasubramanian

Tales at Scale

Feb 28, 2023 19:49


How do ads work on the “front page of the internet?” On today's episode, staff software engineer at Reddit Lakshmi Ramasubramanian discusses Reddit's ad platform, including how it handles ad pacing, real-time data, and more. We'll dive into the challenges they needed to solve and why Apache Druid was the right database for the job.

Data Engineering Podcast
The View Below The Waterline Of Apache Iceberg And How It Fits In Your Data Lakehouse

Data Engineering Podcast

Feb 19, 2023 55:06


Summary

Cloud data warehouses have unlocked a massive amount of innovation and investment in data applications, but they are still inherently limiting. Because of their complete ownership of your data they constrain the possibilities of what data you can store and how it can be used. Projects like Apache Iceberg provide a viable alternative in the form of data lakehouses that provide the scalability and flexibility of data lakes, combined with the ease of use and performance of data warehouses. Ryan Blue helped create the Iceberg project, and in this episode he rejoins the show to discuss how it has evolved and what he is doing in his new business Tabular to make it even easier to implement and maintain.

Announcements
  - Hello and welcome to the Data Engineering Podcast, the show about modern data management
  - Hey there podcast listener, are you tired of dealing with the headache that is the 'Modern Data Stack'? We feel your pain. It's supposed to make building smarter, faster, and more flexible data infrastructures a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it's all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to do its thing. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation. By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70-80% on costs. If you're fed up with the 'Modern Data Stack', give TimeXtender a try. Head over to timextender.com/dataengineering where you can do two things: watch us build a data estate in 15 minutes and start for free today.
  - Your host is Tobias Macey and today I'm interviewing Ryan Blue about the evolution and applications of the Iceberg table format and how he is making it more accessible at Tabular

Interview
  - Introduction
  - How did you get involved in the area of data management?
  - Can you describe what Iceberg is and its position in the data lake/lakehouse ecosystem?
  - Since it is fundamentally a specification, how do you manage compatibility and consistency across implementations?
  - What are the notable changes in the Iceberg project and its role in the ecosystem since our last conversation in October of 2018?
  - Around the time that Iceberg was first created at Netflix a number of alternative table formats were also being developed. What are the characteristics of Iceberg that lead teams to adopt it for their lakehouse projects?
  - Given the constant evolution of the various table formats it can be difficult to determine an up-to-date comparison of their features, particularly earlier in their development. What are the aspects of this problem space that make it so challenging to establish unbiased and comprehensive comparisons?
  - For someone who wants to manage their data in Iceberg tables, what does the implementation look like?
  - How does that change based on the type of query/processing engine being used?
  - Once a table has been created, what are the capabilities of Iceberg that help to support ongoing use and maintenance?
  - What are the most interesting, innovative, or unexpected ways that you have seen Iceberg used?
  - What are the most interesting, unexpected, or challenging lessons that you have learned while working on Iceberg/Tabular?
  - When is Iceberg/Tabular the wrong choice?
  - What do you have planned for the future of Iceberg/Tabular?

Contact Info
  - LinkedIn (https://www.linkedin.com/in/rdblue/)
  - rdblue (https://github.com/rdblue) on GitHub

Parting Question
  - From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
  - Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning.
  - Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes.
  - If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
  - To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers

Links
  - Iceberg (https://iceberg.apache.org/), Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/)
  - Hadoop (https://hadoop.apache.org/)
  - Data Lakehouse (https://www.forbes.com/sites/bernardmarr/2022/01/18/what-is-a-data-lakehouse-a-super-simple-explanation-for-anyone/)
  - ACID == Atomic, Consistent, Isolated, Durable (https://en.wikipedia.org/wiki/ACID)
  - Apache Hive (https://hive.apache.org/)
  - Apache Impala (https://impala.apache.org/)
  - Bodo (https://www.bodo.ai/), Podcast Episode (https://www.dataengineeringpodcast.com/bodo-parallel-data-processing-python-episode-223/)
  - StarRocks (https://www.starrocks.io/)
  - Dremio (https://www.dremio.com/), Podcast Episode (https://www.dataengineeringpodcast.com/dremio-open-data-lakehouse-episode-333/)
  - DDL == Data Definition Language (https://en.wikipedia.org/wiki/Data_definition_language)
  - Trino (https://trino.io/)
  - PrestoDB (https://prestodb.io/)
  - Apache Hudi (https://hudi.apache.org/), Podcast Episode (https://www.dataengineeringpodcast.com/hudi-streaming-data-lake-episode-209/)
  - dbt (https://www.getdbt.com/)
  - Apache Flink (https://flink.apache.org/)
  - TileDB (https://tiledb.com/), Podcast Episode (https://www.dataengineeringpodcast.com/tiledb-universal-data-engine-episode-146/)
  - CDC == Change Data Capture (https://en.wikipedia.org/wiki/Change_data_capture)
  - Substrait (https://substrait.io/)

The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Engenharia de Dados [Cast]
A Day in a Life of a Co-Founder, Commiter & PMC Member of Apache Flink with Timo Walther

Engenharia de Dados [Cast]

Jan 30, 2023 57:03


In this episode Luan Moreno & Mateus Oliveira interview Timo Walther, currently Principal Software Engineer at Confluent following the recent acquisition of Immerok by Confluent.

Apache Flink is a unified data processing engine that handles both batch and real-time workloads. It has gained wide adoption among large companies because it offers an extremely efficient computation model, especially for streaming and stateful computation. It is also an open-source platform that can effectively meet requirements such as:
  - In-Memory Processing
  - Graph Processing
  - Batch Processing
  - Real-Time Stream Processing

In this conversation we cover the following topics:
  - State Backend & RocksDB
  - Real-time data processing
  - Communication between the high-level and low-level APIs
  - Checkpoint & EOS (Exactly-Once Semantics)
  - Resources and best practices for implementation

Learn how Apache Flink can be added to your data pipelines and how it stands out as a real-time processing platform for handling large data demands.

Apache Flink
Confluent Cloud + Immerok
Timo Walther
Luan Moreno: https://www.linkedin.com/in/luanmoreno/

Software Defined Talk
Episode 396: Aloha to your strategy

Jan 13, 2023 80:21


This week we discuss digital transformation at Southwest and Delta Airlines, Shopify cancels all meetings, Salesforce's M&A strategy, and A.I. is everywhere. Plus, thoughts on bike lanes… Watch the YouTube Live Recording of Episode 396 (https://youtu.be/tmm8rH9fZEE) Runner-up Titles Work trying to get on my personal calendar Traveling with an infant =BLACKSWAN(A1:G453) Socks in a Costco Can't do the business case on savings until you loose it. Pay transparency for you, not me We don't pay for things on the Internet Semper Nimbus Privatus Rundown Dutch residents are the most physically active on earth, (https://twitter.com/BrentToderian/status/1611901297552396289) Digital Transformation Travel Edition Delta plans to offer free Wi-Fi starting Feb. 1 (https://www.cnbc.com/2023/01/05/delta-plans-to-offer-free-wi-fi-starting-feb-1.html) The Southwest Airlines Meltdown (https://www.nytimes.com/2023/01/10/podcasts/the-daily/the-southwest-airlines-meltdown.html) Southwest's Meltdown Could Cost It Up to $825 Million (https://www.nytimes.com/2023/01/06/business/southwest-airlines-meltdown-costs-reimbursement.html) Southwest pilots union writes scathing letter to airline executives after holiday travel fiasco (https://www.yahoo.com/now/southwest-pilots-union-writes-scathing-011720946.html) Southwest makes frequent flyer miles offer while lots of luggage remains in limbo (https://www.cnn.com/travel/article/southwest-airlines-frequent-flyer-miles-meltdown/index.html) Point of Sale: Scan and Pay (https://twitter.com/pitdesi/status/1602843962602975233?s=20&t=YdGNYzReSf4r1twJ1hRfbA) Work Life Shopify Tells Employees to Just Say No to Meetings (https://www.bloomberg.com/news/articles/2023-01-03/shopify-ceo-tobi-lutke-tells-employees-to-just-say-no-to-meetings) Netflix Revokes Some Staff's Access to Other People's Salary Information (https://apple.news/A--bGmZgJTQCgHQ-9QdWu4w) U.S. 
Moves to Bar Noncompete Agreements in Labor Contracts (https://www.nytimes.com/2023/01/05/business/economy/ftc-noncompete.html) Gartner HR expert: Quiet hiring will dominate U.S. workplaces in 2023 (https://www.cnbc.com/2023/01/04/gartner-hr-expert-quiet-hiring-will-dominate-us-workplaces-in-2023.html) Netflix revokes some staff's access to other people's salary information (https://www.marketwatch.com/story/netflix-revokes-some-staffs-access-to-other-peoples-salary-information-11673384493) SFDC Salesforce: There's no more Slack left to cut (https://www.theregister.com/2023/01/10/salesforce_comment/) Salesforce to Lay Off 10 Percent of Staff and Cut Office Space (https://www.nytimes.com/2023/01/04/business/salesforce-layoffs.html) After layoffs, Salesforce CEO still blasts worker productivity (https://www.sfgate.com/tech/article/salesforce-ceo-blasts-worker-productivity-17708474.php) AI is everywhere Google execs warn company's reputation could suffer if it moves too fast on AI-chat technology (https://www.cnbc.com/2022/12/13/google-execs-warn-of-reputational-risk-with-chatgbt-like-tool.html) Microsoft and OpenAI Working on ChatGPT-Powered Bing in Challenge to Google (https://www.theinformation.com/articles/microsoft-and-openai-working-on-chatgpt-powered-bing-in-challenge-to-google?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) Microsoft eyes $10 billion bet on ChatGPT (https://www.semafor.com/article/01/09/2023/microsoft-eyes-10-billion-bet-on-chatgpt) Wolfram|Alpha as the Way to Bring Computational Knowledge Superpowers to ChatGPT (https://writings.stephenwolfram.com/2023/01/wolframalpha-as-the-way-to-bring-computational-knowledge-superpowers-to-chatgpt/) Relevant to your Interests 2023 Bum Steer of the Year: Austin (https://www.texasmonthly.com/news-politics/2023-bum-steer-of-year-austin/) Twitter's Rivals Try to Capitalize on 
Musk-Induced Chaos (https://www.nytimes.com/2022/12/07/technology/twitter-rivals-alternative-platforms.html) On Organizational Structures and the Developer Experience (https://redmonk.com/sogrady/2022/12/13/org-structure-devx/) KubeCon + CloudNativeCon North America 2022 Transparency Report | Cloud Native Computing Foundation (https://www.cncf.io/reports/kubecon-cloudnativecon-north-america-2022-transparency-report/) Inside the chaos at Washington's most connected military tech startup (https://www.vox.com/recode/23507236/inside-disruption-rebellion-defense-washington-connected-military-tech-startup) Elon Musk Starts Week As World's Second Richest Person (https://www.forbes.com/sites/mattdurot/2022/12/12/elon-musk-starts-week-as-worlds-second-richest-person/) 10 Tesla Investors Lose $132.5 Billion From Musk's Twitter Fiasco (https://www.investors.com/etfs-and-funds/sectors/tesla-stock-investors-lose-132-5-billion-from-musks-twitter-fiasco/) Rackspace's ransomware messaging dilemma (https://www.axios.com/newsletters/axios-login-83146574-380f-4e37-965d-7fd79bce7278.html?chunk=2&utm_term=emshare#story2) Heads-Up: Amazon S3 Security Changes Are Coming in April of 2023 (https://aws.amazon.com/blogs/aws/heads-up-amazon-s3-security-changes-are-coming-in-april-of-2023/) A MultiCloud Rant (https://www.lastweekinaws.com/blog/a_multicloud_rant/) Great visualization of the revenue breakdown of the 4 largest tech companies. 
(https://twitter.com/Carnage4Life/status/1603012861017862144?s=20&t=HC2UuMCHBB408xae6tZpbQ) AG Paxton's Google Suit Makes the Perfect the Enemy of the Good (https://truthonthemarket.com/2022/12/14/ag-paxtons-google-suit-makes-the-perfect-the-enemy-of-the-good/) AWS simplifies Simple Storage Service to prevent data leaks (https://www.theregister.com/2022/12/14/aws_simple_storage_service_simplified/) Creating the ultimate smart map with new map data initiative launched by Linux Foundation (https://venturebeat.com/virtual/creating-the-ultimate-smart-map-with-new-map-data-initiative-launched-by-linux-foundation/) Spotify's grand plan to monetize developers via its open source Backstage project (https://techcrunch.com/2022/12/15/spotifys-plan-to-monetize-its-open-source-backstage-developer-project/?guccounter=1&guce_referrer=aHR0cHM6Ly93d3cubGlua2VkaW4uY29tLw&guce_referrer_sig=AQAAAAlyOmdhogtX6nuQkNHQ7mVSyci6aMv7X6QwRTvS9PHGJmjO_wjCqsJXXPKI36A9MkIclSIQoHQ_dz7wJ-WzfaYQT_clMcUijiC28ZQhEau4NOcU-70wy5m0Q9LLmtvWuQbWQQEccEbQH2Lvg4_GqfnQBYNPZWRcgpx7XMLas_2R) VMware offers subs for server consolidation vSphere cut (https://www.theregister.com/2022/12/15/vsphere_plus_standard/) Senior execs to leave VMware before acquisition by Broadcom (https://www.bizjournals.com/sanjose/news/2022/12/13/three-senior-execs-to-leave-vmware.html#:~:text=Mark%20Lohmeyer%2C%20who%20heads%20cloud,Raghuram%20announced%20in%20a%20memo) China Bans Exports of Loongson CPUs to Russia, Other Countries: Report (https://www.tomshardware.com/news/china-bans-exports-of-its-loongson-cpus-to-russia-other-countries) Dropbox buys form management platform FormSwift for $95M in cash (https://techcrunch.com/2022/12/16/dropbox-buys-form-management-platform-formswift-for-95m-in-cash/) Sweep, a no-code config tool for Salesforce software, raises $28M (https://techcrunch.com/2022/12/15/sweep-a-no-code-config-tool-for-salesforce-software-raises-28m/) Twitter Aided the Pentagon in its Covert Online Propaganda Campaign 
(https://theintercept.com/2022/12/20/twitter-dod-us-military-accounts/) Okta's source code stolen after GitHub repositories hacked (https://www.bleepingcomputer.com/news/security/oktas-source-code-stolen-after-github-repositories-hacked/) Workday appoints VMware veteran as co-CEO (https://www.theregister.com/2022/12/21/workday_co_ceo/) Top Paying Tools (https://softwaredefinedtalk.slack.com/archives/C04EK1VBK/p1671635825838769) Winging It: Inside Amazon's Quest to Seize the Skies (https://www.wired.com/story/amazon-air-quest-to-seize-the-skies/) CIS Benchmark Framework Scanning Tools Comparison (https://www.armosec.io/blog/cis-kubernetes-benchmark-framework-scanning-tools-comparison/) MSG defends using facial recognition to kick lawyer out of Rockettes show (https://arstechnica.com/tech-policy/2022/12/facial-recognition-flags-girl-scout-mom-as-security-risk-at-rockettes-show/) OpenAI releases Point-E, an AI that generates 3D models (https://techcrunch.com/2022/12/20/openai-releases-point-e-an-ai-that-generates-3d-models/) No, You Haven't Won a Yeti Cooler From Dick's Sporting Goods (https://www.wired.com/story/email-scam-dicks-sporting-goods-yeti-cooler/) The Lastpass hack was worse than the company first reported (https://www.engadget.com/the-lastpass-hack-was-worse-than-the-company-first-reported-000501559.html?utm_source=facebook&utm_medium=news_tab) IRS delays tax reporting change for 1099-K on Venmo, Paypal business payments (https://www.cnbc.com/2022/12/23/irs-delays-tax-reporting-change-for-1099-k-on-venmo-paypal-payments.html) Cyber attacks set to become ‘uninsurable', says Zurich chief (https://www.ft.com/content/63ea94fa-c6fc-449f-b2b8-ea29cc83637d) Google Employees Brace for a Cost-Cutting Drive as Anxiety Mounts (https://www.nytimes.com/2022/12/28/technology/google-job-cuts.html) IBM beat all its large-cap tech peers in 2022 as investors shunned growth for safety (https://www.cnbc.com/2022/12/27/ibm-stock-outperformed-technology-sector-in-2022.html) 
Europe Taps Tech's Power-Hungry Data Centers to Heat Homes (https://www.wsj.com/articles/europe-taps-techs-power-hungry-data-centers-to-heat-homes-11672309944?mod=djemalertNEWS) List of defunct social networking services (https://en.wikipedia.org/wiki/List_of_defunct_social_networking_services) 2023 Predictions | No Mercy / No Malice (https://www.profgalloway.com/2023-predictions/) Twitter rival Mastodon rejects funding to preserve nonprofit status (https://arstechnica.com/tech-policy/2022/12/twitter-rival-mastodon-rejects-funding-to-preserve-nonprofit-status/) TSMC Starts Next-Gen Mass Production as World Fights Over Chips (https://www.bloomberg.com/news/articles/2022-12-29/tsmc-mass-produces-next-gen-chips-to-safeguard-global-lead) Microsoft and FTC pre-trial hearing set for January 3rd (https://www.engadget.com/pre-trial-hearing-between-microsoft-and-ftc-set-for-january-3rd-203320387.html) The infrastructure behind ATMs (https://www.bitsaboutmoney.com/archive/the-infrastructure-behind-atms/) Apple is increasing battery replacement service charges for out-of-warranty devices (https://techcrunch.com/2023/01/03/apple-is-increasing-battery-replacement-service-charges-for-out-of-warranty-devices/) Snowflake's business and how the weakening economy is impacting cloud vendors (https://twitter.com/LiebermanAustin/status/1607376944873754626) Shift Happens: A book about keyboards (https://shifthappens.site/) Amazon to cut 18,000 jobs (https://www.axios.com/2023/01/05/amazon-layoffs-18000-jobs) CircleCI security alert: Rotate any secrets stored in CircleCI (https://circleci.com/blog/january-4-2023-security-alert/) Video game workers form Microsoft's first U.S. 
labor union (https://www.nbcnews.com/tech/tech-news/video-game-workers-form-microsofts-first-us-labor-union-rcna64103) World's Premier Investors Line Up to Partner with Netskope as the SASE Security and Networking Platform of Choice (https://www.prnewswire.com/news-releases/worlds-premier-investors-line-up-to-partner-with-netskope-as-the-sase-security-and-networking-platform-of-choice-301712417.html) omg.lol - A lovable web page and email address, just for you (https://home.omg.lol/) Alphabet led a $100 million funding of Chronosphere, a startup that helps companies monitor and cut cloud bills. (https://twitter.com/theinformation/status/1611165698868367360) Confluent expands Kafka Streams capabilities, acquires Apache Flink vendor (https://venturebeat.com/enterprise-analytics/confluent-acquires-apache-flink-vendor-immerok-to-expand-data-stream-processing/) Excel & Google Sheets AI Formula Generator - Excelformulabot.com (https://excelformulabot.com/) Has the Internet Reached Peak Clickability? 
(https://tedgioia.substack.com/p/has-the-internet-reached-peak-clickability) Adobe's CEO Sizes Up the State of Tech Now (https://www.wsj.com/articles/adobes-ceo-sizes-up-the-state-of-tech-now-11673151167?mod=djemalertNEWS) Researchers Hacked California's Digital License Plates, Gaining Access to GPS Location and User Info (https://jalopnik.com/researchers-hacked-californias-digital-license-plates-1849966295) Microsoft's New AI Can Simulate Anyone's Voice With 3 Seconds of Audio (https://slashdot.org/story/23/01/10/0749241/microsofts-new-ai-can-simulate-anyones-voice-with-3-seconds-of-audio?utm_source=slashdot&utm_medium=twitter) Observability platform Chronosphere raises another $115M at a $1.6B valuation (https://techcrunch.com/2023/01/10/observability-platform-chronosphere-raises-another-115m-at-a-1-6b-valuation/) Why IBM is no longer interested in breaking patent records–and how it plans to measure innovation in the age of open source and quantum computing (https://fortune.com/2023/01/06/ibm-patent-record-how-to-measure-innovation-open-source-quantum-computing-tech/) New research aims to analyze how widespread COBOL is (https://www.theregister.com/2022/12/14/cobol_research/) Companies are still waiting for their cloud ROI (https://www.infoworld.com/article/3675374/companies-are-still-waiting-for-their-cloud-roi.html) What TNS Readers Want in 2023: More DevOps, API Coverage (https://thenewstack.io/what-tns-readers-want-in-2023-more-devops-api-coverage/) Tech Debt Yo-Yo Cycle. 
(https://twitter.com/wardleymaps/status/1605860426671177728) How a single developer dropped AWS costs by 90%, then disappeared (https://scribe.rip/@maximetopolov/how-a-single-developer-dropped-aws-costs-by-90-then-disappeared-2b46a115103a) A look at the 2022 velocity of CNCF, Linux Foundation, and top 30 open source projects (https://www.cncf.io/blog/2023/01/11/a-look-at-the-2022-velocity-of-cncf-linux-foundation-and-top-30-open-source-projects/) The golden age of the streaming wars has ended (https://www.theverge.com/2022/12/14/23507793/streaming-wars-hbo-max-netflix-ads-residuals-warrior-nun) YouTube exec says NFL Sunday Ticket will have multiscreen functionality (https://awfulannouncing.com/youtube/nfl-sunday-ticket-multiscreen-mosaic-mode.html) Nonsense The $11,500 toilet with Alexa inside can now be put inside your home (https://www.theverge.com/2022/12/19/23510864/kohler-numi-smart-toilet-alexa-ces-2022) Starbucks updating its loyalty program starting in February (https://www.axios.com/2022/12/28/starbucks-rewards-program-changes-coming) The revenue model of a popular YouTube channel about Lego. (https://paper.dropbox.com/doc/SDT-396--BwhY9F5kpz_BI2kkdw63ZpJ~Ag-MVMKwqqBEH5SzYKqYO2Jc) Conferences THAT Conference Texas Speakers and Schedule (https://that.us/events/tx/2023/schedule/), Round Rock, TX Jan 15th-18th Use code SDT for 5% off SpringOne (https://springone.io/), Jan 24–26. Coté speaking at cfgmgmtcamp (https://cfgmgmtcamp.eu/ghent2023/), Feb 6th to 8th, Ghent. 
State of Open Con 2023, (https://stateofopencon.com/sponsors/) London, UK, February 7th-8th 2023 CloudNativeSecurityCon North America (https://events.linuxfoundation.org/cloudnativesecuritycon-north-america/), Seattle, Feb 1 – 2, 2023 Southern California Linux Expo, (https://www.socallinuxexpo.org/scale/20x) Los Angeles, March 9-12, 2023 DevOpsDays Birmingham, AL 2023 (https://devopsdays.org/events/2023-birmingham-al/welcome/), April 20 - 21, 2023 SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get a SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Recommendations Brandon: Industrial Garage Shelves (https://www.homedepot.com/p/Husky-5-Tier-Industrial-Duty-Steel-Freestanding-Garage-Storage-Shelving-Unit-in-Black-90-in-W-x-90-in-H-x-24-in-D-N2W902490W5B/319132842) Matt: Oxide and Friends: Breaking it down with Ian Brown (https://oxide.computer/podcasts/oxide-and-friends/1150480) Wu Tang Saga (https://www.imdb.com/title/tt9113406/) Season 3 coming next month! Coté: Mouth to Mouth (https://www.goodreads.com/en/book/show/58438631-mouth-to-mouth) by Antoine Wilson (https://www.goodreads.com/en/book/show/58438631-mouth-to-mouth). Photo Credits Header (https://unsplash.com/photos/euaDCtB_jyw) CoverArt (https://unsplash.com/photos/9xdho4stJQ8)

Startup Insider
Investments & Exits - with Otto Birnbaum of Revent

Jan 10, 2023 29:15


In the “Investments & Exits” segment, we welcome Otto Birnbaum, General Partner at Revent. Otto discussed the acquisition of Immerok and Enpal's funding round: US company Confluent has bought the Berlin software startup Immerok for about 100 million US dollars. Both companies focus on processing and analyzing data streams from various sources in real time. Confluent is a LinkedIn spin-off focused on Apache Kafka, while Immerok concentrates on applications for Apache Flink. With the acquisition, Confluent wants to offer both technologies to customers together. Immerok was founded in May 2022 by Holger Temme, Johannes Moser, and Konstantin Knauf and entered an early test phase in October. The Berlin solar company Enpal has raised 215 million euros in a Series D funding round. The round was led by TPG Rise Climate, a fund focused on renewable energy and climate tech. Other new investors include Westly Group and Activate Capital; existing investors such as HV Capital, SoftBank Vision Fund II Capital, and Princeville Climate Tech also participated again. Enpal plans to use the additional funds to work on new products and to drive expansion into further European markets. Enpal expects revenue of more than 400 million euros for 2022. As recently as December 2022, the company announced debt financing of 855 million euros. Enpal leases solar systems, energy storage units, and wallboxes.

Streaming Audio: a Confluent podcast about Apache Kafka
Build a Real Time AI Data Platform with Apache Kafka

Oct 20, 2022 37:18 Transcription Available


Is it possible to build a real-time data platform without using stateful stream processing? Forecasty.ai is an artificial intelligence platform for forecasting commodity prices, imparting insights into the future valuations of raw materials for users. Nearly all AI models are batch-trained once, but precious commodities are linked to ever-fluctuating global financial markets, which require real-time insights. In this episode, Ralph Debusmann (CTO, Forecasty.ai) shares their journey of migrating from a batch machine learning platform to a real-time event streaming system with Apache Kafka® and delves into their approach to making the transition frictionless. Ralph explains that Forecasty.ai was initially built on top of batch processing, however, updating the models with batch-data syncs was costly and environmentally taxing. There was also the question of scalability—progressing from 60 commodities on offer to their eventual plan of over 200 commodities. Ralph observed that most real-time systems are non-batch, streaming-based real-time data platforms with stateful stream processing, using Kafka Streams, Apache Flink®, or even Apache Samza. However, stateful stream processing involves resources, such as teams of stream processing specialists to solve the task. With the existing team, Ralph decided to build a real-time data platform without using any sort of stateful stream processing. They strictly keep to the out-of-the-box components, such as Kafka topics, Kafka Producer API, Kafka Consumer API, and other Kafka connectors, along with a real-time database to process data streams and implement the necessary joins inside the database. Additionally, Ralph shares the tool he built to handle historical data, kash.py—a Kafka shell based on Python; discusses issues the platform needed to overcome for success, and how they can make the migration from batch processing to stream processing painless for the data science team. 
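The approach described above (plain consumers that write records to a database, with the joins done in SQL rather than in a stream processor) can be sketched roughly as follows. This is only an illustration under stated assumptions, not Forecasty.ai's actual code: the in-memory lists stand in for Kafka topics, SQLite stands in for the real-time database, and the table names, field names, and sample values are all hypothetical.

```python
import sqlite3

# Hypothetical stand-ins for two Kafka topics. In a real setup these records
# would arrive through the Kafka Consumer API instead of Python lists.
price_events = [
    {"commodity": "copper", "ts": 1, "price": 8300.0},
    {"commodity": "copper", "ts": 2, "price": 8350.0},
    {"commodity": "nickel", "ts": 1, "price": 21000.0},
]
forecast_events = [
    {"commodity": "copper", "horizon_days": 30, "forecast": 8500.0},
    {"commodity": "nickel", "horizon_days": 30, "forecast": 20500.0},
]


def ingest(conn, prices, forecasts):
    """Stateless consumer step: write each record straight into the database."""
    conn.executemany(
        "INSERT INTO prices VALUES (:commodity, :ts, :price)", prices
    )
    conn.executemany(
        "INSERT INTO forecasts VALUES (:commodity, :horizon_days, :forecast)",
        forecasts,
    )
    conn.commit()


def latest_price_vs_forecast(conn):
    """The join lives in SQL, so no stream processor has to hold join state."""
    return conn.execute(
        """
        SELECT p.commodity, p.price, f.forecast
        FROM prices AS p
        JOIN forecasts AS f ON f.commodity = p.commodity
        WHERE p.ts = (SELECT MAX(ts) FROM prices WHERE commodity = p.commodity)
        ORDER BY p.commodity
        """
    ).fetchall()


# SQLite in memory is an assumption standing in for the real-time database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (commodity TEXT, ts INTEGER, price REAL)")
conn.execute(
    "CREATE TABLE forecasts (commodity TEXT, horizon_days INTEGER, forecast REAL)"
)
ingest(conn, price_events, forecast_events)
rows = latest_price_vs_forecast(conn)
print(rows)
```

The trade-off in this style is that the database carries the load a stream processor would otherwise handle, which may suit a small team without stream processing specialists.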
EPISODE LINKS
Kafka Streams 101 course
The Difference Engine for Unlocking the Kafka Black Box
GitHub repo: kash.py
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Open||Source||Data
Stream Processing, Observability, and the User Experience with Eric Sammer

Sep 28, 2022 42:51


This episode features an interview with Eric Sammer, CEO of Decodable. Eric has been in the tech industry for over 20 years, holding various roles as an early Cloudera employee. He also was the co-founder and CTO of Rocana, which was acquired by Splunk in 2017. During his time at Splunk, Eric served as the VP and Senior Distinguished Engineer responsible for cloud platform services.
In this episode, Sam and Eric discuss the gap between operating infrastructure and the analytical world, stream processing innovations, and why it's important to work with people who are smarter than you.
-------------------
"The thing about Decodable was just like let's connect systems, let's process the data between them. Apache Flink is the right engine and SQL is the language for programming the engine. It doesn't need to be any more complicated. The trick is getting it right, so that people can think about that part of the data infrastructure, the way they think about the network. They don't question whether the packet makes it to the other side because that infrastructure is so burned in and it scales reasonably well these days. You don't even think about it, especially in the cloud." – Eric Sammer
-------------------
Episode Timestamps:
(01:09): What open source data means to Eric
(06:57): What led Eric to Cloudera and Hadoop
(12:48): What inspired Eric to create Rocana
(20:29): The problem Eric is trying to solve at Flink
(29:54): What problems in stream processing we'll have to solve in the next 5 years
(36:58): Eric's advice for advancing your career
-------------------
Links:
LinkedIn - Connect with Eric
Twitter - Follow Eric
Twitter - Follow Decodable
Decodable

Data on Kubernetes Community
Serverless Event Streaming Applications as Functions on K8 (DoK Day EU 2022) // Timothy Spann

May 27, 2022 8:43


https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) We will walk through how to build serverless event streaming applications as functions running in a function mesh on Kubernetes with cloud native messaging via Apache Pulsar. In this talk, you will deploy ML functions to transform real-time data on Kubernetes. Tim Spann is a Developer Advocate @ StreamNative where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData and a senior field engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science. https://www.datainmotion.dev/p/about-me.html https://dzone.com/users/297029/bunkertor.html https://conferences.oreilly.com/strata/strata-ny-2018/public/schedule/speaker/185963

Streaming Audio: a Confluent podcast about Apache Kafka
Flink vs Kafka Streams/ksqlDB: Comparing Stream Processing Tools

May 26, 2022 55:55 Transcription Available


Stream processing can be hard or easy depending on the approach you take, and the tools you choose. This sentiment is at the heart of the discussion with Matthias J. Sax (Apache Kafka® PMC member; Software Engineer, ksqlDB and Kafka Streams, Confluent) and Jeff Bean (Sr. Technical Marketing Manager, Confluent). With immense collective experience in Kafka, ksqlDB, Kafka Streams, and Apache Flink®, they delve into the types of stream processing operations and explain the different ways of solving for their respective issues.

The best stream processing tools they consider are Flink along with the options from the Kafka ecosystem: Java-based Kafka Streams and its SQL-wrapped variant, ksqlDB. Flink and ksqlDB tend to be used by divergent types of teams, since they differ in terms of both design and philosophy.

Why Use Apache Flink?
The teams using Flink are often highly specialized, with deep expertise, and with an absolute focus on stream processing. They tend to be responsible for unusually large, industry-outlying amounts of both state and scale, and they usually require complex aggregations. Flink can excel in these use cases, which potentially makes the difficulty of its learning curve and implementation worthwhile.

Why use ksqlDB/Kafka Streams?
Conversely, teams employing ksqlDB/Kafka Streams require less expertise to get started and also less expertise and time to manage their solutions. Jeff notes that the skills of a developer may not even be needed in some cases; those of a data analyst may suffice. ksqlDB and Kafka Streams seamlessly integrate with Kafka itself, as well as with external systems through the use of Kafka Connect. In addition to being easy to adopt, ksqlDB is also deployed on production stream processing applications requiring large scale and state.

There are also other considerations beyond the strictly architectural. Local support availability, the administrative overhead of using a library versus a separate framework, and the availability of stream processing as a fully managed service all matter. Choosing a stream processing tool is a fraught decision partially because switching between them isn't trivial: the frameworks are different, the APIs are different, and the interfaces are different. In addition to the high-level discussion, Jeff and Matthias also share lots of details you can use to understand the options, covering deployment models, transactions, batching, and parallelism, as well as a few interesting tangential topics along the way such as the tyranny of state and the Turing completeness of SQL.

EPISODE LINKS
The Future of SQL: Databases Meet Stream Processing
Building Real-Time Event Streams in the Cloud, On Premises
Kafka Streams 101 course
ksqlDB 101 course
Watch the video version of this podcast
Kris Jenkins' Twitter
Streaming Audio Playlist
Join the Confluent Community
Learn more on Confluent Developer
Use PODCAST100 for additional $100 of Confluent Cloud usage (details)
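To make "state" and "aggregations" in the discussion above a little more concrete, here is a minimal sketch in plain Python of a keyed tumbling-window count. It does not use Flink, Kafka Streams, or ksqlDB; the function name, the event shape, and the window size are assumptions chosen for illustration. The dictionary plays the role of the state that a real stream processor would have to partition, checkpoint, and restore.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (key, window start).

    `events` is an iterable of (key, timestamp) pairs. The `state` dict is
    the part a stream processor must keep alive between events.
    """
    state = defaultdict(int)
    for key, timestamp in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = (timestamp // window_size) * window_size
        state[(key, window_start)] += 1
    return dict(state)


# Hypothetical click events: (user, event time in seconds).
events = [
    ("user_a", 1),
    ("user_b", 3),
    ("user_a", 4),
    ("user_a", 11),
    ("user_b", 12),
]
counts = tumbling_window_counts(events, window_size=10)
print(counts)
```

In this toy version the state fits in one process. The tools compared in the episode differ largely in how they manage that state once it no longer does.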

MLOps.community
Real-Time Exactly-Once Event Processing with Apache Flink, Kafka, and Pinot //Jacob Tsafatinos // MLOps Coffee Sessions #97

May 5, 2022 54:12


MLOps Coffee Sessions #97 with Jacob Tsafatinos, Real-Time Exactly-Once Event Processing with Apache Flink, Kafka, and Pinot, co-hosted by Mihail Eric.

// Abstract
A few years ago Uber set out to create an ads platform for the Uber Eats app that relied heavily on three pillars: Speed, Reliability, and Accuracy. Some of the technical challenges they were faced with included exactly-once semantics in real-time. To accomplish this goal, they created the architecture diagram above with lots of love from Flink, Kafka, Hive, and Pinot. You can dig into the whole paper (https://go.mlops.community/k8gzZd) to see all the reasoning for their design decisions.

// Bio
Jacob Tsafatinos is a Staff Software Engineer at Elemy. He led the efforts of the Ad Events Processing system at Uber and has previously worked on a range of problems including data ingestion for search and machine learning recommendation pipelines. In his spare time, he can be found playing lead guitar in his band Good Kid.

// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// Related Links
Uber blog https://eng.uber.com/author/jacob-tsafatinos/
https://eng.uber.com/real-time-exactly-once-ad-event-processing/

--------------- ✌️Connect With Us ✌️ -------------
Join our slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Mihail on LinkedIn: https://www.linkedin.com/in/mihaileric/
Connect with Jacob on LinkedIn: https://www.linkedin.com/in/jacobtsaf/

Timestamps:
[00:00] Introduction to Jacob Tsafatinos
[00:40] Takeaways
[04:25] Jacob's band
[05:29] Lyrics about software engineers or artistic stuff
[06:20] Connection of hobby and real-time system
[08:43] How to game Spotify Algorithm?
[10:00] Data stack for analytics
[13:28] Uber blog
[16:28] Video mess up
[17:04] Considerations and importance of the Uber System
[21:22] Challenges encountered through the Uber System journey
[26:06] Crucial to building the system
[28:13] Not exactly real-time
[30:22] Design decisions main questions
[34:23] Testament to OSS
[36:58] Real-time processing systems for analytical use cases vs Real-time processing systems for predictive use cases
[38:46] Real-time systems necessity
[41:04] Potential that opens up new doors
[41:40] Runaway or learn it?
[46:09] Real-time use case target
[49:31] Resource constrained
[50:48] ML Oops stories
[52:45] Wrap up
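As a rough illustration of what an exactly-once result means for ad events, the sketch below deduplicates events by a unique ID so that redelivered records do not inflate the counts. This is one common technique and is not taken from Uber's system or the linked paper: the function name, the event fields (`id`, `ad_id`), and the in-memory set are all assumptions. A production pipeline would keep the seen IDs in durable, checkpointed state rather than a Python set.

```python
def process_exactly_once(events, seen_ids, totals):
    """Apply each event at most once, even if it is delivered several times.

    `seen_ids` and `totals` are mutated in place so that a later replay
    can reuse the same state.
    """
    for event in events:
        if event["id"] in seen_ids:
            # Duplicate delivery (for example after a retry): skip it.
            continue
        seen_ids.add(event["id"])
        totals[event["ad_id"]] = totals.get(event["ad_id"], 0) + 1
    return totals


# Hypothetical click stream in which event "e1" is delivered twice.
delivered = [
    {"id": "e1", "ad_id": "ad-1"},
    {"id": "e2", "ad_id": "ad-2"},
    {"id": "e1", "ad_id": "ad-1"},
    {"id": "e3", "ad_id": "ad-1"},
]

seen, totals = set(), {}
process_exactly_once(delivered, seen, totals)
# Simulate an upstream replay of the first two events after a failure.
process_exactly_once(delivered[:2], seen, totals)
print(totals)
```

Deduplication only gives an exactly-once effect if the ID set and the totals are updated atomically and survive restarts, which is where checkpointing and transactional writes come in.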

Google Cloud Platform Podcast
Apache Beam with Kenneth Knowles and Pablo Estrada

Apr 6, 2022 39:28


On the podcast this week, your hosts Stephanie Wong and Mark Mirchandani talk about the data processing tool Apache Beam with guests Pablo Estrada and Kenneth Knowles. Kenn starts us off with an overview of how Apache Beam began and how Cloud Dataflow was involved. The unique batch and stream method and emphasis on correctness garnered support from developers early on and continues to attract users. Pablo helps us understand why Beam is a better option for certain projects looking to process large amounts of data. Our guests describe how Beam may be a better fit than microservices that could become obsolete as company needs change. Next, we step back and take a look at why batch and stream is the gold standard of data processing because of its balance between low latency and ease of “being done” with data collection. Beam's focus on the correctness of data and correctness in processing that data is a core component. With good data, processing becomes easier, more reliable, and cheaper. Kenn gives examples of how things can go wrong with bad data processing. Beam strives for the perfect combination of low latency, correct data, and affordability. Users can choose where to run Beam pipelines, from other Apache software offerings to Dataflow, which means excellent flexibility. Our guests talk about the pros and cons of some of these options and we hear examples of how companies are using Beam along with supporting software to solve data processing challenges. To get started with Beam, check out Beam College or attend Beam Summit 2022.

Kenneth Knowles
Kenn Knowles is chair of the Apache Beam Project Management Committee. Kenn has been working on Google Cloud Dataflow, Google's Beam backend, since 2014. Kenn holds a PhD in programming languages from the University of California, Santa Cruz.

Pablo Estrada
Pablo is a Software Engineer at Google, and a management committee member for Apache Beam. Pablo is big into working on an open source project, and has worked all across the Apache Beam stack.

Cool things of the week
Under the sea: Building the world's fiber optic internet video
Discovering Data Centers videos
Google Data Cloud Summit site
It's official—Google Distributed Cloud Edge is generally available blog
GCP Podcast Episode 228: Fastly with Tyler McMullen podcast
Save big by temporarily suspending unneeded Compute Engine VMs—now GA blog

Interview
Apache Beam site
Apache Beam Documentation site
Dataflow site
Apache Flink site
Apache Spark site
Apache Samza site
Apache Nemo site
Spanner site
BigQuery site
Beam College site
Beam College on Github site
Beam Developer Mailing List email
Beam User Mailing List email
Beam Summit site

What's something cool you're working on?
Mark is working on a new Apache Beam video series Getting Started With Apache Beam

Hosts
Stephanie Wong and Mark Mirchandani

DMRadio Podcast
You *Can* Step in the Same Stream Twice

DMRadio Podcast

Play Episode Listen Later Dec 18, 2021 53:58


Streaming technology has upended the data business, largely thanks to Apache Kafka, but also because of other technologies such as Apache Flink and Apache Pulsar. In this episode, Host @eric_kavanagh interviews Paul Brebner, Instaclustr; along with Tim Spann and David Kjerrumgaard of StreamNative. 

Streaming Audio: a Confluent podcast about Apache Kafka
Powering Event-Driven Architectures on Microsoft Azure with Confluent

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Oct 14, 2021 38:42 Transcription Available


When you order a pizza, what if you knew every step of the process from the moment it goes in the oven to being delivered to your doorstep? Event-Driven Architecture is a modern, data-driven approach that describes “events” (i.e., something that just happened). A real-time data infrastructure enables you to provide such event-driven data insights in real time. Israel Ekpo (Principal Cloud Solutions Architect, Microsoft Global Partner Solutions, Microsoft) and Alicia Moniz (Cloud Partner Solutions Architect, Confluent) discuss use cases on leveraging Confluent Cloud and Microsoft Azure to power real-time, event-driven architectures. As an Apache Kafka® community stalwart, Israel focuses on helping customers and independent software vendor (ISV) partners build solutions for the cloud and use open source databases and architecture solutions like Kafka, Kubernetes, Apache Flink, MySQL, and PostgreSQL on Microsoft Azure. He's worked with retailers and those in the IoT space to help them adopt processes for inventory management with Confluent. Having a cloud-native, real-time architecture that can keep an accurate record of supply and demand is important in keeping up with the inventory and customer satisfaction. Israel has also worked with customers that use Confluent to integrate with Cosmos DB, Microsoft SQL Server, Azure Cognitive Search, and other integrations within the Azure ecosystem. Another important use case is enabling real-time data accessibility in the public sector and healthcare while ensuring data security and regulatory compliance like HIPAA. Alicia has a background in AI, and she expresses the importance of moving away from the monolithic, centralized data warehouse to a more flexible and scalable architecture like Kafka. Building a data pipeline leveraging Kafka helps ensure data security and consistency with minimized risk. The Confluent and Azure integration enables quick Kafka deployment with out-of-the-box solutions within the Kafka ecosystem. Confluent Schema Registry captures event streams with a consistent data structure, ksqlDB enables the development of real-time ETL pipelines, and Kafka Connect enables the streaming of data to multiple Azure services.

EPISODE LINKS
Microsoft Azure at Kafka Summit Americas
IzzyAcademy Kafka on Azure Learning Series by Alicia Moniz
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

Streaming Audio: a Confluent podcast about Apache Kafka
Using Apache Kafka and ksqlDB for Data Replication at Bolt

Streaming Audio: a Confluent podcast about Apache Kafka

Play Episode Listen Later Aug 26, 2021 29:15 Transcription Available


What does a ride-hailing app that offers micromobility and food delivery services have to do with data in motion? In this episode, Ruslan Gibaiev (Data Architect, Bolt) describes Bolt's road to adopting Apache Kafka® and ksqlDB for stream processing to replicate data from transactional databases to analytical warehouses. Rome wasn't built overnight, nor was the adoption of Kafka and ksqlDB at Bolt. Initially, Bolt saw the need to standardize its systems and replace its unreliable query-based change data capture (CDC) process. As an experienced Kafka developer, Ruslan believed that Kafka was the right foundation for adopting change data capture as a company-wide event streaming solution. Persuading the team at Bolt to buy in was hard at first, but Ruslan made it possible. Eventually, the team replaced query-based CDC with log-based CDC from Debezium, built on top of Kafka. Shortly after the implementation, developers at Bolt began to see precise, correct, and real-time data. As Bolt continues to grow, they see the need to implement a data lake or a data warehouse for OLTP system data replication and stream processing. After carefully considering several solutions and frameworks, including ksqlDB, Apache Flink®, Apache Spark™, and Kafka Streams, they found that ksqlDB best met their business requirements. Bolt adopted ksqlDB because it is native to the Kafka ecosystem and fits their use case. They found ksqlDB to be a particularly good fit for replicating all their data to a data warehouse for a number of reasons, including:
Easy to deploy and manage
Linearly scalable
Natively integrates with Confluent Schema Registry
Tune in to find out more about Bolt's adoption journey with Kafka and ksqlDB.

EPISODE LINKS
Inside ksqlDB Course
ksqlDB 101 Course
How Bolt Has Adopted Change Data Capture with Confluent Platform
Analysing Changes with Debezium and Kafka Streams
No More Silos: How to Integrate Your Databases with Apache Kafka and CDC
Change Data Capture with Debezium ft. Gunnar Morling
Announcing ksqlDB 0.17.0
Real-Time Data Replication with ksqlDB
Watch the video version of this podcast
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)

The Cloud Pod
116: The Cloud Pod is positively charged for AWS Proton

The Cloud Pod

Play Episode Listen Later May 12, 2021 60:20


This week on The Cloud Pod, Yahoo is back and cheaper than ever. Just kidding, it's Ryan who is back and the team is curious as to how he managed to extricate himself out from under that kitten.

A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure.

This week's highlights
Amazon has been doing yoga and the results are paying off.
Google bought a hard hat and is getting into the construction business.
If you need to get your kid to sleep, let them read this from Azure.

General News: Yahoo's Renaissance
Verizon dumps Yahoo-AOL for rock-bottom price. But they're not dead yet!
Amazon posts record profits as AWS hits $54B annual run rate. That's pretty good!
Microsoft beats Q3 revenue expectations, spurred by strong cloud sales. Get on the bandwagon, Azure.
Alphabet announces first quarter results for 2021. It does include GCP and G-Suite revenue.
Cloud infrastructure spending grew 35% to $41.8B in Q1 2021. These numbers boggle our minds.

JEDI: Just Keeps Getting Better
Court snubs Microsoft and the U.S. government's request to throw out Amazon’s complaint against JEDI cloud contract decision. We can't wait to hear what Trump says under oath.

Amazon Web Services: Bring Your Own Talent
AWS is launching Amazon FinSpace, a data management and analytics solution. Step one, invent the universe.
AWS Proton introduces customer-managed environments. We had to look up what Proton actually is.
AWS Proton allows adding and removing instances from an existing service. We're looking forward to some re:Invent sessions on this.
Amazon launches CloudFront Functions for the lowest possible latency. A great solution that can reduce your costs quite a bit.
Happy 10th birthday to AWS Identity and Access Management. Ten years on and still a pain in the ass.
Introducing Amazon Nimble, a new service that creative studios can use to produce visual effects, animations and interactive content entirely in the cloud. More verticalization!

Google Cloud Platform: If You Hate Money
Google wants customers to move their vSphere 5.5+ to Google Cloud VMware Engine. Taking the responsibility away from engineering teams.
Databricks on Google Cloud is now generally available. A good play by Google.
Google has released its Liquibase Cloud Spanner extension. In theory, you should be able to roll back…
Google Cloud and the DORA research team are excited to launch the 2021 state of DevOps survey. We highly recommend you check this out.
Google announces the Google Kubernetes Engine Gateway Controller is now in preview. Check this out if you're tired of service mesh.
Google is here to tell you six more reasons GKE is the best K8 service. Stay tuned for more announcements from Kubecon EU 2021 next week.
Google Cloud announces a new region to support growing customer base in Israel. Although this is great, it hasn't told us when or where it will be built.

Azure: The Best We Could Do
Azure is announcing the preview of Azure Web PubSub service for building real-time web applications with websockets. Welcome to the club — you're a little late, Microsoft.

TCP Lightning Round
Jonathan is winning with waffles and takes this week's point, leaving scores at Justin (7), Ryan (3), Jonathan (7).

Other headlines mentioned:
Amazon Redshift announces support for hierarchical data queries with Recursive CTE
Amazon Connect Customer Profiles launches Identity Resolution in Preview to detect and merge duplicate customer profiles
Amazon Kinesis Data Analytics for Apache Flink introduces custom maintenance windows in preview
Amazon ECS on AWS Fargate now allows you to configure the size of ephemeral storage for your Tasks
Announcing support for linear interpolation in AWS IoT SiteWise
Easily clean up unused resources in Amazon Forecast using hierarchical deletion
Amazon CloudWatch Monitoring Framework for Apache is generally available
AWS Snow Family now enables you to order, track, and manage long-term pricing Snow jobs
AWS Glue DataBrew announces native console integration with Amazon AppFlow to connect to data from SaaS (Software as a Service) applications and AWS services (in Preview)
Introducing AWS for media and entertainment
AWS Identity and Access Management (IAM) now makes it easier for you to manage permissions for AWS services accessing your resources
General availability: Azure Site Recovery now supports cross-continental disaster recovery for 3 region pairs
Google Introducing Open Saves: Open-source cloud-native storage for game

Things Coming Up
Announcing Google Cloud 2021 Summits [frequently updated]
Save the date: AWS Containers events in May
AWS Regional Summits — May 10–19
AWS Summit Online Americas — May 12–13
Microsoft Build — May 19–21 (Digital)
Google Financial Services Summit — May 27th
Harness Unscripted Conference — June 16–17
Google Cloud Next — Not announced yet (one site says Moscone is reserved June 28–30)
Google Cloud Next 2021 — October 12–14, 2021
AWS re:Invent — November 29–December 3 — Las Vegas
Oracle Open World (no details yet)

Software Crafts Podcast
Interview with James Urquhart

Software Crafts Podcast

Play Episode Listen Later Apr 27, 2021 35:56


In this episode, we host James Urquhart. James is challenged with the heuristic “Use of standard interfaces and protocols for event-driven integration”, based on his recent work. We discuss how the behaviour of teams creating software changes when they embrace event-driven integration and combine it with engineering practices like continuous delivery. James also shares his experiences with value streams and their impact on software architecture, and explains how event-driven integration and flow architectures can unlock value both within and between organisations. He predicts that the next decade will be very dynamic in this space, and that technology and cost of ownership can be potential roadblocks.

James recommends:
Scale: The Universal Laws of Life, Growth, and Death in Organisms, Cities, and Companies by Geoffrey West
Open resources such as Kafka, Apache Flink and Swin.AI
Flow Architectures by James Urquhart
Wardley Maps
Promise Theory

James Urquhart is a Strategic Executive Advisor for VMware Tanzu customers. James brings almost 30 years of experience in distributed applications development, deployment, and operations, focusing on software as a complex adaptive system, cloud-native applications and platforms, and automation. Prior to joining VMware, via Pivotal, James ran product and engineering teams for AWS, SOASTA, and Dell (via Enstratius). James has also written and spoken extensively about software agility and the business opportunities it affords. James was named one of the ten most influential people in cloud computing by both the MIT Technology Review and the Huffington Post and is a former contributing author to GigaOm and CNET. He recently completed a book on event-driven integration for O'Reilly Publishing titled "Flow Architectures: The Future of Event-Driven Integration". James graduated from Macalester College with a Bachelor of Arts in Mathematics and Physics.

Eventador Streams: All Things Streaming Data
Crushing Apache Flink-based Streaming Systems with special guest Gyula Fora

Eventador Streams: All Things Streaming Data

Play Episode Listen Later Aug 27, 2020 39:58


In this episode, we dive into the intricacies and fun of not only developing Apache Flink but also of deploying and managing it in one of the world's most sophisticated streaming pipelines with special guest Gyula Fóra.

Eventador Streams: All Things Streaming Data
Getting More Out of Streaming SQL with the Blink Planner in Apache Flink

Eventador Streams: All Things Streaming Data

Play Episode Listen Later Aug 6, 2020 35:57


In this episode, we take a look at all the ways streaming SQL has evolved—especially with the most recent releases of Apache Flink and the Blink planner. From rank to last value and more, we talk about all the cool new functions that the Blink planner unlocks for streaming data.

DevZen Podcast
Freshly Squeezed Flink - Episode 0296

DevZen Podcast

Play Episode Listen Later Jul 12, 2020 132:31


In this episode: we promote ICFPC 2020, which is being run by a team from Russia!; SaaS services (YouTube) are unreliable again; we discuss various questions related to motivation; we give a juicy account of the new Apache Flink release; and other topics; plus listener questions. Show notes: [00:04:02] ICFPC 2020 ICFP Programming Contest 2020 https://twitter.com/icfpcontest2020 https://codeforces.com/blog/entry/79846 [00:07:05] Mikhail Goncharov folded a square! A programming puzzle, "assemble…

Datacast
Episode 36: Machine Learning Bookcamp with Alexey Grigorev

Datacast

Play Episode Listen Later Jul 3, 2020 65:37


Show Notes
(2:00) Alexey studied Information Systems and Technologies at a local university in his hometown in eastern Russia.
(4:54) Alexey commented on his experience working as a Java developer in the first three years after college in Russia and Poland, along with his initial exposure to Machine Learning thanks to Coursera.
(7:55) Alexey talked about his decision to pursue the IT4BI Master Program specializing in Large-Scale Business Intelligence in 2013.
(9:42) Alexey discussed his time working as a Research Assistant on Apache Flink at the DIMA Group at TU Berlin.
(12:28) Alexey’s Master Thesis is called Semantification of Identifiers in Mathematics for Better Math Information Retrieval, which was later presented at the SIGIR conference on R&D in Information Retrieval in 2016.
(14:35) Alexey discussed his first job as a Data Scientist at Searchmetrics, working on projects to help content marketers improve SEO ranking for their articles.
(18:54) Alexey’s next role was with the ad-tech company Simplaex. There, he designed, developed, and maintained the ML infrastructure for processing 3+ billion events per day with 100+ million unique daily users, working with tools like Spark for data engineering tasks.
(22:17) Alexey reflected on his journey participating in Kaggle competitions.
(25:35) Alexey also participated in other competitions at academic conferences: winning 2nd place at the Web Search and Data Mining 2017 challenge on Vandalism Detection and winning 1st place at the NIPS 2017 challenge on Ad Placement.
(29:59) Alexey authored his first book called Mastering Java for Data Science, which teaches readers how to create data science applications with Java.
(31:40) Alexey then transitioned to a Data Scientist role at OLX Group, a global marketplace for online classified advertisements.
(33:23) Alexey explained the ML system that detects duplicates of images submitted to the OLX marketplace, which he presented at PyData Berlin 2019. Read his two-part blog series: The first post presents a two-step framework for duplicate detection, and the second post explains how his team served and deployed this framework at scale.
(38:12) Alexey was recently involved in building an infrastructure for serving image models at OLX. Read his two-part blog series on this evolution of image model serving at OLX, including the transition from AWS SageMaker to Kubernetes for model deployment, as well as the utilization of AWS Athena and MXNet for design simplification.
(42:39) Alexey is in the process of writing a technical book called Machine Learning Bookcamp, which encourages readers to learn machine learning by doing projects.
(46:17) Alexey discussed common struggles during data science interviews, referring to his talk on Getting a Data Science Job.
(48:32) Alexey has put together a neat GitHub page that includes both theoretical and technical questions for people who are preparing for interviews.
(52:19) Alexey elaborated on the steps needed to become a better data scientist, in connection with his LinkedIn post a while back.
(56:40) Alexey gave his advice for software engineers looking to transition into data science.
(58:32) Alexey shared his opinion on the data science community in Berlin.
(01:01:53) Closing segment.

His Contact Info
Website, Twitter, LinkedIn, GitHub, Kaggle, Quora, Google Scholar, Medium

His Recommended Resources
Apache Flink
Kubeflow
Data Science Interviews GitHub Repo
PyData Berlin
Berlin Buzzwords
Andrew Ng
Designing Data-Intensive Applications by Martin Kleppmann
Machine Learning Bookcamp
Permanent 40$ discount code: poddcast19
5 free eBook codes (each good for one sample of the book): mlbdrt-D452, mlbdrt-5922, mlbdrt-2C4D, mlbdrt-3034, mlbdrt-1DD1

Eventador Streams: All Things Streaming Data
The Fun of Stateful Functions and Apache Flink with special guest Stephan Ewen

Eventador Streams: All Things Streaming Data

Play Episode Listen Later Jun 23, 2020 35:10


In this episode, we had the opportunity to learn about Stateful Functions as well as dig in more on the beginnings—and the future—of Apache Flink with Ververica co-founder and CTO Stephan Ewen.

Software Engineering Daily
Data Infrastructure Investing with Eric Anderson

Software Engineering Daily

Play Episode Listen Later Feb 20, 2020 72:07


In a modern data platform, distributed streaming systems are used to read data coming off of an application in real time. There is a wide variety of streaming systems, including Kafka Streams, Apache Samza, Apache Flink, Spark Streaming, and more. When Eric Anderson joined the show back in 2016, he was working at Google on Google […]

Splunk [All Products] 2019 .conf Videos w/ Slides
Data Stream Processor: Architecture and SDKs [Splunk Data Fabric Search and Data Stream Processor]

Splunk [All Products] 2019 .conf Videos w/ Slides

Play Episode Listen Later Dec 23, 2019


Popular stream processing frameworks (such as Apache Spark Streaming, Apache Flink, and Apache Kafka Streams) make stream processing accessible to developers with language bindings typically in Java, Scala, and Python. These frameworks also include some variant of streaming SQL support to further expand the accessibility of large-scale, low-latency, high-throughput stream processing. What's missing is bringing the world of stream processing to the Business Intelligence user. At Splunk we've built a tool called Splunk Data Stream Processor (DSP) to fill this gap. In this session, Max and Sharon will present the design and architecture of DSP. We will compare it with other stream processing frameworks to show you how DSP allows users to visually author and preview stream processing pipelines and instantly deploy them at scale. We will also present our developer SDKs, allowing third-party custom functions to be developed and integrated for data processing. With its high-level abstractions for business users and extensible framework for developers, Data Stream Processor makes stream processing accessible to the widest possible audience.
Speaker(s): Sharon Xie, Sr. Software Engineer, Splunk; Max Feng, Software Engineer, Splunk
Slides PDF link: https://conf.splunk.com/files/2019/slides/DEV1317.pdf?podcast=1577146224
Product: Splunk Data Fabric Search and Data Stream Processor
Track: Developer
Level: Intermediate

Data Engineering Podcast
Stateful, Distributed Stream Processing on Flink with Fabian Hueske - Episode 57

Data Engineering Podcast

Play Episode Listen Later Nov 18, 2018 48:01


Modern applications and data platforms aspire to process events and data in real time at scale and with low latency. Apache Flink is a true stream processing engine with an impressive set of capabilities for stateful computation at scale. In this episode Fabian Hueske, one of the original authors, explains how Flink is architected, how it is being used to power some of the world's largest businesses, where it sits in the landscape of stream processing tools, and how you can start using it today.

Software Engineering Radio - The Podcast for Professional Software Developers
SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

Software Engineering Radio - The Podcast for Professional Software Developers

Play Episode Listen Later Nov 14, 2018 62:56


Edaena Salinas talks with Stephan Ewen about streaming architecture. Stephan is one of the original creators of Apache Flink. Topics discussed: stream processing vs batch processing, architecture components of stream architectures, Apache Flink...

Software Engineering Radio - The Podcast for Professional Software Developers
SE-Radio Episode 346: Stephan Ewen on Streaming Architecture

Software Engineering Radio - The Podcast for Professional Software Developers

Play Episode Listen Later Nov 14, 2018 62:56


Stephan Ewen, one of the original creators of Apache Flink, discusses streaming architecture. Streaming architecture has become more important because it enables real-time computation on big data. Edaena Salinas spoke with Stephan Ewen about how batch processing and stream processing compare. Stephan explained the architecture components and the types of applications that can be […]

The InfoQ Podcast
Streaming: Danny Yuan on Real-Time, Time Series Forecasting @Uber

The InfoQ Podcast

Play Episode Listen Later Mar 31, 2018 26:59


On this week’s podcast, Danny Yuan, Uber’s Real-time Streaming/Forecasting Lead, lays out a thorough recipe book for building a real-time streaming platform with a major focus on forecasting. In this podcast, Danny discusses everything from the scale Uber operates at to the major steps for training and deploying models in an iterative (almost Darwinistic) fashion, and wraps up with his advice for software engineers who want to begin applying machine learning in their day-to-day job.
Why listen to this podcast:
* Uber processes 850,000 - 1.3 million messages per second in their streaming platform with about 12 TB of growth per day. The system’s queries scan 100 million to 4 billion documents per second.
* Uber’s frontend is mobile. The frontend talks to an API layer. All services generate events that are shuffled into Kafka. The real-time forecasting pipeline taps into Kafka to process events and stores the data in Elasticsearch.
* There is a federated query layer in front of Elasticsearch to provide OLAP query capabilities.
* Apache Flink’s advanced windowing features, programming model, and checkpointing convinced Uber to move away from the simplicity of Apache Samza.
* The forecasting system allows Uber to remove the notion of delay by using recent signals plus historical data to project what is happening now and what will happen in the future.
* Uber’s pipeline for deploying ML models: HDFS, feature engineering, organizing into data structures (similar to data frames), deploy mostly offline training models, train models, and store into a container-based model manager.
* A model serving layer is used to pick which model to use, forecasting results are stored in an OLAP data store, a validation layer compares real results against forecast results to verify the model is working as desired, and a rollback feature enables poorly performing models to be automatically replaced by the previous one.
* “Without output, you don’t have input.” If you want to start leveraging machine learning, you just need to start doing. Start with intuition and practice. Over time ask questions and learn what you need, then apply a laser focus to gain that knowledge.
You can also subscribe to the InfoQ newsletter to receive weekly updates on the hottest topics from professional software development. bit.ly/24x3IVq
Subscribe: www.youtube.com/infoq
Like InfoQ on Facebook: bit.ly/2jmlyG8
Follow on Twitter: twitter.com/InfoQ
Follow on LinkedIn: www.linkedin.com/company/infoq
Check the landing page on InfoQ: https://bit.ly/2GJQbUo

THE ARCHITECHT SHOW
Ep. 50: Apache Flink creator Stephan Ewen on stream processing and making a name in open source

THE ARCHITECHT SHOW

Play Episode Listen Later Mar 28, 2018 46:16


In this episode of the ARCHITECHT Show, Stephan Ewen discusses the open source stream-processing system Apache Flink (which he helped create) and the commercial platform data Artisans (of which he's co-founder and CTO) is building around it. Ewen covers some of the major use cases for stream processing and some of Flink's biggest users -- including Alibaba -- and the challenges of standing out against, and sometimes working alongside, established projects such as Apache Spark, Apache Storm and Apache Kafka.

Big Data Beard
Changing the Processing Paradigm with Pravega and Apache Flink

Big Data Beard

Play Episode Listen Later Mar 27, 2018 37:35


Live from Strata Data San Jose, the team dives into the world of batch and stream processing with guests Fabian from Data Artisans and Flavio from Dell EMC. Our guests detail an open source streaming data stack consisting of Pravega (stream storage) an...

Roaring Elephant
Episode 73 – Roaring News

Roaring Elephant

Play Episode Listen Later Feb 6, 2018 34:58


In this edition of the Roaring News series, we talk about delivering business value and how to build an analytics team. For machine learning aficionados, we cover the top ML algorithms, and we round off with an article on sizing an Apache Flink cluster, which fits nicely with the previous and next episode!
Breaking News
Delivering Business Value with Big Data Projects
https://www.techrepublic.com/article/4-tips-for-delivering-more-business-value-with-short-term-big-data-projects/
Sizing Flink (and other streaming?)
https://data-artisans.com/blog/how-to-size-your-apache-flink-cluster-general-guidelines
Building The Analytics Team At Wish
Part 1 - Rebuilding The Foundation
Part 2 - Scaling Data Engineering
Part 3 - Scaling Data Analysis
Part 4 - Recruiting
A Tour of The Top 10 Algorithms for Machine Learning Newbies
https://towardsdatascience.com/a-tour-of-the-top-10-algorithms-for-machine-learning-newbies-dde4edffae11
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to c

Bigdata Hebdo
Episode 52 : Cosmos speaks Cassandra

Bigdata Hebdo

Play Episode Listen Later Dec 15, 2017 72:15


Reaper 1.0 Has Been Released!
http://thelastpickle.com/blog/2017/11/14/reaper-10-announcement.html
Dear Cassandra Developers, welcome to Azure #CosmosDB!
https://azure.microsoft.com/en-us/blog/dear-cassandra-developers-welcome-to-azure-cosmosdb/
https://venturebeat.com/2017/11/15/microsoft-updates-cosmos-db-with-cassandra-support-better-availability-guarantees/
Introducing Azure Databricks
https://databricks.com/blog/2017/11/15/introducing-azure-databricks.html?utm_content=63154906&utm_medium=social&utm_source=twitter
Google Cloud Spanner goes multi-region
http://www.zdnet.com/article/google-cloud-spanner-goes-multi-region/
Transactions in Apache Kafka
https://www.confluent.io/blog/transactions-apache-kafka/
KSQL Developer Preview November update
https://www.confluent.io/blog/november-update-ksql-developer-preview-available/
Looking Ahead to Apache Flink 1.4.0 and 1.5.0
https://data-artisans.com/blog/looking-ahead-apache-flink-1-4-1-5
Elasticsearch 6.0: not that new, but quite improved
http://www.zdnet.com/article/elasticsearch-6-0-not-that-new-but-quite-improved/
Meeting the "raters", the unseen hands of "big data"
http://theconversation.com/a-la-rencontre-des-raters-petites-mains-des-big-data-86484
http://books.openedition.org/cdf/5013
The Washington Post Is A Software Company Now
https://www.fastcompany.com/40495770/the-washington-post-is-a-software-company-now
Devoxx 2017
https://www.youtube.com/playlist?list=PLRsbF2sD7JVqZ4RpHYkqSuCNhxumGP5eo
Read the Affini-Tech blog
http://blog.affini-tech.com
-------------------------------------------------------------
http://www.bigdatahebdo.com
https://twitter.com/bigdatahebdo
Vincent : https://twitter.com/vhe74
Alexander : https://twitter.com/alexanderdeja

Lightbend
Running Fast Data Applications - Resilient Production Tools And One-Stop Support

Lightbend

Play Episode Listen Later Dec 12, 2017 14:50


In this Lightbend Fast Data Platform interview for Operations teams, Michael Nash and Oliver White of Lightbend discuss the value of resilient production tooling and the benefits of one-stop-shop support for the various technologies integrated with Fast Data Platform. This conversation focuses on:
• Why is it difficult to maintain always-on, dynamic workloads without resilient tooling, and to manage multiple support contracts with various vendors?
• What new challenges (and risks) are presented by the significantly more complex streaming Fast Data ecosystem?
• How can Lightbend Fast Data Platform set your mind at ease by providing a one-stop shop for support that covers Akka Streams, Apache Spark, Apache Kafka, Apache Flink, Mesosphere DC/OS, Lightbend Reactive Platform and more?
Learn more at: https://lightbend.com/fast-data

Roaring Elephant
Episode 61 – Roaring News

Roaring Elephant

Play Episode Listen Later Nov 14, 2017 31:09


In this episode of Roaring News, we talk about the seemingly inevitable blockchain, fraud detection in banking, and a celebration of the DevOps engineer.
Dave: The continued journey to understand enterprise usage of blockchain
http://fortune.com/2017/10/17/blockchain-berners-lee/
https://www.hyperledger.org/blog/2017/10/17/qa-does-blockchain-alleviate-security-concerns-or-create-new-challenges
Jhon: StreamING Machine Learning Models: How ING Adds Fraud Detection Models at Runtime with Apache Flink®
https://data-artisans.com/blog/real-time-fraud-detection-ing-bank-apache-flink
DevOps might be the key to your Big Data project success
https://datahub.packtpub.com/big-data/devops-for-big-data-success/
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Bigdata Hebdo
Episode 47 : Kafka, SQL, Beam and co

Bigdata Hebdo

Play Episode Listen Later Sep 8, 2017 77:59


Exactly-once Semantics are Possible: Here’s How Kafka Does it
https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/?utm_content=buffer9b1b6&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
https://blog.ippon.fr/2017/07/11/kafka-0-11-0-%E2%99%A5/
Confluent KSQL
https://www.confluent.io/blog/ksql-open-source-streaming-sql-for-apache-kafka/
https://www.youtube.com/watch?v=A45uRzJiv7I&feature=youtu.be
Kafka + Prestodb.io
https://prestodb.io/docs/current/connector/kafka.html
Streaming SQL in Apache Flink, KSQL, and Stream Processing for Everyone
https://data-artisans.com/blog/flink-streaming-sql-ksql-stream-processing
Kafka Wakes Up And Is Metamorphosed Into A Database
https://www.nextplatform.com/2017/08/30/kafka-wakes-metamorphosed-database/amp/
(Editor’s Note: It would have been far funnier, of course, if Kafka woke up one morning and had been turned into CockroachDB.)
Open sourcing Kafka cruise control
https://engineering.linkedin.com/blog/2017/08/open-sourcing-kafka-cruise-control
https://github.com/linkedin/cruise-control
Yahoo’s New Pulsar: A Kafka Competitor?
https://www.datanami.com/2016/09/07/yahoos-new-pulsar-kafka-competitor/
Apache Beam 2.1
https://beam.apache.org/get-started/downloads/
Apache Beam splittable DoFn
https://beam.apache.org/blog/2017/08/16/splittable-do-fn.html
Instaclustr Dynamic Resizing for Apache Cassandra
https://www.instaclustr.com/instaclustr-dynamic-resizing-for-apache-cassandra/?utm_content=buffer624e7&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer
Riak devs giddy over gambling biz's vow to set code free
https://www.theregister.co.uk/2017/08/25/bet365_to_buy_basho_release_code/?mt=1503782778086
Spark Release 2.2.0
http://spark.apache.org/releases/spark-release-2-2-0.html
[mooc] Google Cloud Data Engineering specialization on Coursera
https://fr.coursera.org/specializations/gcp-data-machine-learning
[podcast] Is there a brain in the machine? An interview with Yann Le Cun, director of FAIR
https://www.franceculture.fr/emissions/la-methode-scientifique/y-t-il-un-cerveau-dans-la-machine
[podcast] DREMEL, DRUID AND DATA MODELING ON GOOGLE BIGQUERY
https://www.drilltodetail.com/podcast/2017/6/19/drill-to-detail-ep31-dremel-druid-and-data-modeling-on-google-bigquery-with-special-guest-dan-mcclary
[privacy] How the Figaro, L’Équipe and Closer apps take part in tracking 10 million French people
http://www.numerama.com/politique/282934-enquete-comment-les-apps-figaro-lequipe-ou-closer-participent-au-pistage-de-10-millions-de-francais.html
How artificial intelligence is upending the media industry
http://www.latribune.fr/opinions/tribunes/comment-l-intelligence-artificielle-bouleverse-l-industrie-des-medias-746917.html
Cédric Villani is put in charge of a parliamentary fact-finding mission on AI.
http://www.numerama.com/politique/286341-le-gouvernement-fait-appel-a-cedric-villani-pour-une-mission-sur-lia.html
-------------------------------------------------------------
http://www.bigdatahebdo.com
https://twitter.com/bigdatahebdo
Vincent : https://twitter.com/vhe74
Alexander : https://twitter.com/alexanderdeja
This episode is sponsored by Affini-Tech ( http://affini-tech.com https://twitter.com/affinitech )
We're hiring! Come crunch data with us! Write to us at recrutement@affini-tech.com

Roaring Elephant
Episode 50 – Alan Gates Wrap Up (Part 4)

Roaring Elephant

Play Episode Listen Later Aug 29, 2017 34:37


This is the final part of our long interview with Alan Gates. In this part, Alan talks more about ODPI, Cloud First, Apache Flink, Apache Pig and we finish off with a little bit of Philosophy. A big thank you to Alan for sharing his pearls of wisdom with us! [Image from Linux.com]
00:00 Recent events
Our vacation is almost over but this episode too was pre-recorded ahead of time. Because of this, we do not have any recent events to talk about.
02:10 Alan Gates Wrap Up (Part 4)
34:37 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Bigdata Hebdo
Episode 44 : Apachecon, and assorted news...

Bigdata Hebdo

Play Episode Listen Later Jul 9, 2017 53:42


Apache BigData
A look back at Apache BigData
DataStax announces availability of ‘white glove’ managed cloud service
http://diginomica.com/2017/05/23/datastax-announces-availability-white-glove-managed-cloud-service/amp/
CockroachDB 1.0 is Production-Ready
https://www.cockroachlabs.com/blog/cockroachdb-1-0-release/
Local and distributed query processing in CockroachDB
https://www.cockroachlabs.com/blog/local-and-distributed-processing-in-cockroachdb/
#Azure Cosmos DB
https://speakerdeck.com/dharmashukla/azure-cosmos-db-lessons-learnt-from-building-a-globally-distributed-database-from-the-ground-up
https://channel9.msdn.com/Events/Build/2017/KEY01#time=1h27m20s
https://softwareengineeringdaily.com/2017/06/01/cosmosdb-with-andrew-hoh/
A Vision for Making Deep Learning Simple
https://databricks.com/blog/2017/06/06/databricks-vision-simplify-large-scale-deep-learning.html
Spark gets automation: Analyzing code and tuning clusters in production
http://www.zdnet.com/article/spark-gets-automation-analyzing-code-and-tuning-clusters-in-production/
https://www.pepperdata.com/press-releases/pr_052317/
What’s New in Hadoop 3.0 – Enhancements in Apache Hadoop 3
https://www.edureka.co/blog/hadoop-3/
Apache Flink® 1.3.0 and the Evolution of Stream Processing with Flink
https://data-artisans.com/blog/apache-flink-1-3-0-evolution-stream-processing
You are not Google
https://blog.bradfieldcs.com/you-are-not-google-84912cf44afb
Master time with Kibana’s new time series visual builder
https://www.elastic.co/blog/master-time-with-kibanas-new-time-series-visual-builder?blade=tw
Teradata doubles down
http://www.zdnet.com/google-amp/article/teradata-doubles-down/

Bigdata Hebdo
Episode 43 : DevoxxFr, Kafka, AWS, Microsoft CosmosDB, AML

Bigdata Hebdo

Play Episode Listen Later May 15, 2017 77:10


Kafka
Confluent Cloud: Managed Apache Kafka by Confluent
https://www.confluent.io/confluent-cloud/
https://www.forbes.com/sites/alexkonrad/2017/05/08/confluent-brings-kafka-to-cloud-and-challenges-aws/amp/
Kafka with Docker: A Docker introduction
https://ngeor.wordpress.com/2017/03/25/kafka-with-docker-a-docker-introduction/amp/
Apache Flink and Apache Kafka Streams: a comparison and guideline for users
https://www.confluent.io/blog/apache-flink-apache-kafka-streams-comparison-guideline-users/
The Continued Rise of Apache Kafka
https://redmonk.com/fryan/2017/05/07/the-continued-rise-of-apache-kafka/
Kafka Summit - Introduction to Kafka Streams with a Real-Life Example by Alexis Seigneurin
https://speakerdeck.com/aseigneurin/kafka-summit-introduction-to-kafka-streams-with-a-real-life-example
Boontadata webinar with @benjguin on 10/05/17 (replay available soon)
https://aka.ms/wp-boontadata

Microsoft
Serving AI with data: A summary of Build 2017 data innovations
https://blogs.technet.microsoft.com/dataplatforminsider/2017/05/10/serving-ai-with-data-a-summary-of-build-2017-data-innovations/
Azure Cosmos DB: The industry’s first globally-distributed, multi-model database service
https://azure.microsoft.com/en-us/blog/azure-cosmos-db-microsofts-globally-distributed-multi-model-database-service/
Using Jupyter notebooks and Pandas with Azure Data Lake Store
https://medium.com/azure-data-lake/using-jupyter-notebooks-and-pandas-with-azure-data-lake-store-48737fbad305
End-to-End Scenarios Enabled by the Data Science Virtual Machine: Webinar Video
https://blogs.technet.microsoft.com/machinelearning/2017/05/02/end-to-end-scenarios-enabled-by-the-data-science-virtual-machine-video/

AWS
AWS now lets you migrate MongoDB databases to DynamoDB
https://venturebeat.com/2017/04/10/aws-now-lets-you-migrate-mongodb-databases-to-dynamodb/amp/
Deep Dive on Amazon EC2 Instances - January 2017 Online Tech Talks
https://www.youtube.com/watch?v=29QZPttiKJA

Datascience
Automated Machine Learning: A Paradigm Shift That Accelerates Data Scientist Productivity @ Airbnb
https://medium.com/airbnb-engineering/automated-machine-learning-a-paradigm-shift-that-accelerates-data-scientist-productivity-airbnb-f1f8a10d61f8

Miscellaneous
Managed Service for Elassandra provided by Instaclustr
https://www.instaclustr.com/blog/2017/05/09/managed-service-elassandra-provided-instaclustr/
The new BigData file format for Faster Data analysis
http://carbondata.apache.org/
Elasticsearch succumbs to machine learning
http://www.silicon.fr/elasticsearch-succombe-au-machine-learning-174421.html

GDPR
Compliance as a competitive advantage
http://www.zdnet.fr/actualites/la-conformite-un-avantage-competitif-39850544.htm
Privacy by design
http://www.zdnet.fr/actualites/privacy-by-design-kezako-39850666.htm

Roaring Elephant
Episode 40 – Dataworks Summit Europe – Day 2

Roaring Elephant

Play Episode Listen Later Apr 6, 2017 67:51


In this episode of the Roaring Elephant podcast, Dave and I continue to share our Dataworks summit experience, meet yet more listeners, sit in on a few more sessions and give our overall view of the day and the summit as a whole! It will make you wish you were here.
00:00:00 Intro
Roaring Elephant Roadshow Day 2 - The night after the party!
00:04:14 Session Discussions
Our review of the sessions, what we liked, what we learned, what we'd recommend you go and check out afterwards:
Keynote
Meet HBase 2.0
Bridle your Flying Islands and Castles in the Sky
HBase in Practice
Solving Cyber at Scale
Achieving Realtime Ingestion and Analysis of Security Events through Kafka and Metron
Row/Column-Level Security in SQL for Apache Spark
Apache Kafka Best Practices
Mool - Automated Log Analysis using Data Science and ML
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and Spark
Backup and Disaster Recovery in Hadoop
01:02:15 Wrap up
Some final overall observations and looking forward to the next summit news from Dataworks San Jose!
01:07:51 End
Please use the Contact Form on this blog or our twitter feed to send us your questions, or to suggest future episode topics you would like us to cover.

Bigdata Hebdo
Episode 31 : Some news, and the Datastax / Cassandra relationship

Bigdata Hebdo

Play Episode Listen Later Nov 7, 2016 72:15


Datastax vs Apache Software Foundation
http://sdtimes.com/apache-foundation-board-reining-datastax/
http://www.datastax.com/2016/11/serving-customers-serving-the-community
Teradata MPP on AWS and Teradata (re)embarks on a solutions journey
http://www.vldbsolutions.com/blog/teradata-mpp-aws/
http://www.zdnet.com/article/teradata-reembarks-on-a-solutions-journey/#ftag=RSSbaffb68
Pricing : https://aws.amazon.com/marketplace/pp/B01LW1R13T
Announcing the dA Platform, our distribution of Apache® Flink®
http://data-artisans.com/announcing-the-da-platform-our-distribution-of-apache-flink/
Unifying Stream Processing and Interactive Queries in Apache Kafka
http://www.confluent.io/blog/unifying-stream-processing-and-interactive-queries-in-apache-kafka/
Apache Kafka: Online Talk Series
http://www.confluent.io/apache-kafka-talk-series
Boontadata streams
https://github.com/boontadata/boontadata-streams
Comparing ORC vs Parquet Data Storage Formats using Hive
http://www.thecloudavenue.com/2016/10/comparing-orc-vs-parquet-data-storage.html
WHERE IS APACHE HIVE GOING? TO IN-MEMORY COMPUTING
http://fr.hortonworks.com/blog/apache-hive-going-memory-computing/
http://hortonworks.com/blog/announcing-apache-hive-2-1-25x-faster-queries-much/
Palantir Sued By Department of Labor For Race Discrimination
http://gizmodo.com/palantir-sued-by-department-of-labor-for-race-discrimin-1787103451/amp
MongoDB 3.4: new features
http://www.zdnet.com/article/mongodb-3-4-fills-some-enterprise-database-gaps/
Simba Drivers for Google BigQuery
https://cloud.google.com/bigquery/partners/simba-drivers/
How Bayesian Inference Works
Bayesian inference is a way to get sharper predictions from your data.
http://www.datasciencecentral.com/profiles/blogs/how-bayesian-inference-works
Announcing RStudio v1.0!
https://blog.rstudio.org/2016/11/01/announcing-rstudio-v1-0/
Classifying handwritten digits using TensorFlow
http://blog.yhat.com/posts/handwriting-classifier-updated.html
Traffic in London episode I: processing 100 billion IoT events
http://blog.datatonic.com/2016/10/traffic-in-london-episode-i-live.html
https://code.visualstudio.com/
