The Distributed Data Podcast is your weekly source for the latest news and technical expertise to help you succeed in building large-scale distributed systems. Brought to you by the Developer Advocate team, we go in-depth with DataStax engineers and special guests from the broader data community. Ne…
The Distributed Data Show brought you the latest news and technical expertise to help you succeed in building large-scale distributed systems. In the series FINALE (yes finale
Ben Covi joins the show to discuss dse-pronto, a framework he helped create at Intuit for automating deployment and management of DataStax Enterprise clusters. We learn about the origins and capabilities of dse-pronto, the process of inner-sourcing and open-sourcing the framework, and Ben's invitation to the community for suggestions and improvements. See omnystudio.com/listener for privacy information.
Cedrick Lunven talks with Mark Paluch, Spring Data Lead at VMware, about the origins of the Spring Framework, how he got involved with Spring Data, and the 3.0 release of the Spring Data Cassandra module, which is based on the DataStax Java Driver 4.x series. See omnystudio.com/listener for privacy information.
When will Cassandra 4.0 be ready? DataStax VP of Engineering Josh McKenzie joins Aleks Volochnev to discuss his background as a member of the Project Management Committee for Apache Cassandra and the community collaboration involved in delivering the next major release of our favorite open source database. See omnystudio.com/listener for privacy information.
Russ Spitzer returns to the show to discuss the benefits of combining Cassandra and Spark, the new release of the DataStax Spark-Cassandra connector, and why the proprietary features that used to ship with DataStax Enterprise have now been merged into the open-source project. We also check in on the Spark 3.0 Beta and discuss the tradeoffs in various ways you can deploy Spark and Cassandra together. See omnystudio.com/listener for privacy information.
As infrastructure engineers, we have plenty to do. Even with all the cool DevOps tooling out there helping us do more with less, we still have situations that demand our attention. Raghavendra Prabhu and Yelp have been working on this problem with their Kubernetes-enabled tool, PaaSTA. Yelp is able to assemble stacks quickly to give developers the fast track to being productive. We have a great conversation about the future of Cassandra and Kubernetes and maybe you'll be convinced that the robot uprising may not be all bad. See omnystudio.com/listener for privacy information.
Chris Bradford is a product manager at DataStax whose focus is on Kubernetes (K8s) and how users can utilize Cassandra in this fast-moving project. He gives us some deep insights into how we got here and some important details on being successful when deploying Cassandra in your Kubernetes environment. We finish with some predictions on what the next 5-10 years look like for cloud-native databases. See omnystudio.com/listener for privacy information.
Alan Gibson has been working with the rest of the team at Sky UK to join the awesome forces of Apache Cassandra and Kubernetes. Sky took a methodical engineering approach to understanding the problem and how to make Cassandra truly cloud native. From CI/CD and test to running production across multiple clouds, you will find his insights invaluable as you begin your own journey into Cassandra and Kubernetes. See omnystudio.com/listener for privacy information.
Join David Jones-Gilardi as he interviews Jonathan "Shooky" Shook from DataStax, and Mick Semb Wever from TheLastPickle as they delve into the new NoSQLBench open source benchmarking tool for Apache Cassandra. If you weren't testing your data models before, now you have no excuse. See omnystudio.com/listener for privacy information.
Kathryn Erickson joins the show to double-click on a few of the exciting possibilities she's tracking with the potential for continued improvement in Cassandra performance. Hardware options on the horizon include persistent memory, FPGAs, and networked storage. At the operating system level, she's tracking asynchronous IO support in Linux via the io_uring library. Java performance continues to become more predictable with the Z Garbage Collector (ZGC), which is already available as an experimental option in Java 12. Finally, Cassandra-specific improvements include adding a pluggable storage engine API, and automated operations (AIOps) for more efficient tuning and collaboration between nodes. See omnystudio.com/listener for privacy information.
In this day of stay-at-home orders and social distancing, the advocate team at DataStax had to rethink how to provide quality workshops while completely online. We share the inside story of how we scaled the registration process, the tools we use to deliver the workshops, and some surprising benefits of going fully remote. See omnystudio.com/listener for privacy information.
Rahul Singh of Anant talks with Cedrick Lunven about open source communities in the COVID-19 era, and why Cassandra's 4.0 release will be the springboard for an even brighter future. See omnystudio.com/listener for privacy information.
Rahul Singh of Anant talks with Cedrick Lunven about his background building a consultancy around Cassandra, why he loves Lucene, and how he got involved in maintaining Awesome Cassandra, an aggregation of the best Cassandra resources. See omnystudio.com/listener for privacy information.
In Part 2 of our discussion with Mike Loukides of O'Reilly Media, continued from last week's episode, we dive into data privacy and how we know when we've crossed the line of violating the anonymity of people's social and physical fingerprints. We also discuss how interoperability and APIs are essential for securing data pipelines from the user all the way down to the database. See omnystudio.com/listener for privacy information.
One of the hardest areas in getting AI projects into production is operationalizing data. In this episode, Mike Loukides of O'Reilly Media joins Dr. Denise Gosnell and Jeff Carpenter to discuss how data provenance impacts our ability to get the most out of our data, using COVID-19 as an example. See omnystudio.com/listener for privacy information.
Longtime Cassandra community member Erick Ramirez shares how he got started on this journey, why he answers so many questions on the Cassandra Slack channel, email list, and forums at community.datastax.com, and the hardest Cassandra questions he's been asked. See omnystudio.com/listener for privacy information.
Continuing the conversation from last week's episode, DataStax Chief Strategy Officer Sam Ramji shares how open source projects can increase the pace of innovation through multiple speed lanes, the moral imperative of using AI to automate database operations, and why the 2020s will be the decade of scale-out data. See omnystudio.com/listener for privacy information.
DataStax Chief Strategy Officer Sam Ramji shares his experience working with multiple open source communities including the Cloud Native Computing Foundation, how DataStax is changing our posture toward open source, and his thoughts on how we can improve interaction in the broader Cassandra community. We also offer a sneak peek of part two of our conversation, which will continue in next week's episode. See omnystudio.com/listener for privacy information.
David talks with Heidi Waterhouse about her role as a Developer Advocate at LaunchDarkly, explains how LaunchDarkly helps developers up their feature flag game, and discusses how cleaning up our big data can actually lead to more insights, not fewer. See omnystudio.com/listener for privacy information.
We are joined today by Jesse Anderson who is in pre-press on his book, Data Teams. Jesse has been wringing value out of data for a long time and is ready to share his insights on how you can join the can't-to-can club. We go over some of the important questions you should be asking as you create or improve your data teams. Is your team worth 1 trillion dollars? Let's find out. See omnystudio.com/listener for privacy information.
Jeff Carpenter sits down with Dr. Denise Gosnell to discuss how they work to incorporate the human element and storytelling in their writing about distributed data topics. Denise and Jeff both have books coming out with O'Reilly Media this spring: The Practitioner's Guide to Graph Data: http://shop.oreilly.com/product/0636920205746.do and Cassandra: The Definitive Guide, 3rd Edition: http://shop.oreilly.com/product/0636920299837.do. See omnystudio.com/listener for privacy information.
David catches up with Jonathan Ellis about his DataDay Texas presentation of predictions for the future of machine learning, the cloud, open source, hardware, and graph databases within the next 5 years. You won't want to miss this! See omnystudio.com/listener for privacy information.
Join us on this podcast as Patrick McFadin and Denise Gosnell talk about Python and the real surge in popularity it has seen in the past few years. Pandas, TensorFlow and killer robots? Data Engineering is standardizing around Python and making lives better. Find out what it is, where to get started and maybe a few things you never knew. See omnystudio.com/listener for privacy information.
Chris Splinter, Product Manager at DataStax, joins the show to share the story behind improved open source support in the DataStax Kafka Connector, the bulk loader DSBulk, and the DataStax Cassandra drivers. See omnystudio.com/listener for privacy information.
Patrick McFadin and Jeff Carpenter discuss how to know when you've run out of capacity on your relational database and provide a recipe for migrating to Cassandra: 1) adapting your data model, 2) adapting your application, 3) planning your Cassandra deployment, 4) executing the migration. See omnystudio.com/listener for privacy information.
Patrick and Jeff share why DBAs are the biggest cheerleaders for our Cassandra Developer Workshops and some of the top things DBAs wish developers knew about Cassandra. 0:00 - Jeff and Patrick talk about our upcoming developer workshops and how we get into the code as quickly as possible: https://www.datastax.com/events/cassandra-developer-workshop-new-york https://www.datastax.com/events/cassandra-developer-workshop-london https://www.datastax.com/events/cassandra-developer-workshop-paris https://www.data See omnystudio.com/listener for privacy information.
Recently KubeCon 2019 was held in San Diego where we released the DataStax Kubernetes Operator. With KubeCon finished up I caught up with Christopher Bradford, Product Manager for the operator, about this release. 00:30 - 01:12 The not secret name of the DataStax Enterprise Operator for Kubernetes, or as I call it the DSE K8s Operator. 01:13 - 02:28 Discussing KubeCon, which reached 12k attendees this year! 02:29 - 04:19 What is the DataStax mission in creating a K8s Operator? How to squelch the complexity See omnystudio.com/listener for privacy information.
We discussed microservices almost two years ago (https://www.youtube.com/watch?v=z1Kanef1QTk). Now we're revisiting the topic with Cedrick and Aleks to ask: have microservices evolved since then? You can find related links and a webinar, including a demo, on our website. Slides: https://drive.google.com/drive/u/1/folders/1sjgR2dnB4Hivcd00TLpPBi9KHpracamB Recordings: https://www.datastax.com/resources/webinar/scaler-vos-microservices-avec-apache-cassandratm (FR, more advanced) https://www.datastax.com/resources/ See omnystudio.com/listener for privacy information.
In this episode, Cristina and David talk about Apollo, what it is, and how this can benefit frontend developers by bypassing the backend altogether. Highlights: 1:36: What is Apollo? 3:15: How does Apollo work on the Operational side? 5:40: How does Apollo approach security? 8:00: Any other features that could benefit Frontend devs? 9:09: The user experience of Apollo 9:34: Orders of magnitude - Setting up Apollo 11:12: A UI can make or break a product 12:00: Easy & obvious is our motto 12:08: Cristina's F See omnystudio.com/listener for privacy information.
David Gilardi turns the tables on Jeff Carpenter to ask him the questions about what it took to port a Java microservice from Apache Cassandra to run on Apollo, DataStax's Apache Cassandra as a Service which is in public, free Beta. See omnystudio.com/listener for privacy information.
Graph database veteran Marko Rodriguez joins the show to discuss how he got involved with graph databases and the Apache TinkerPop project, what's coming in TinkerPop 4.0, his work in distributed computing research and stream ring theory, and the origin of that lovable Gremlin character. Highlights: 0:00 - Jeff welcomes Marko to the show to discuss a decade of work in the Graph database community starting at the Los Alamos National Laboratory when he needed a better way to store data. Graph databases were See omnystudio.com/listener for privacy information.
Jeff Carpenter talks with Jean-Armel Luce about usage of Cassandra at French telecommunications provider Orange, all the way from their first work with Cassandra in 2012 to their open-source Kubernetes operator for Cassandra, which is expected to be production-ready in 2020. Highlights: 0:00 - Jeff welcomes Jean-Armel to the show. Orange has been using Cassandra since 2012. They started using it on their customer dataset, which was multiple TB with 1000s of requests per second, and were able to achieve higher See omnystudio.com/listener for privacy information.
Instagram engineer Michaël Figuière talks with Jeff Carpenter about the Cassandra abstraction layer his team maintains, how it helps development teams move faster, and when companies should consider creating their own abstractions. Highlights: 0:00 - Welcoming Michaël to the show and recapping past conversations on Cassandra usage at Instagram including the RocksDB storage engine we discussed with Dikang Gu and the geographic replication approach that Andrew Whang presented at the 2019 DataStax Accelera See omnystudio.com/listener for privacy information.
Matija Gobec shares with Patrick McFadin why he started working on a new compaction strategy for Apache Cassandra and how the Cassandra community can collaborate more effectively to introduce new capabilities such as partition-based compaction. Highlights: 0:00 - Patrick welcomes Matija Gobec to the show. 1:30 - Matija introduces the concept of compaction in Cassandra and some of the challenges with existing compaction strategies 2:53 - Existing strategies include size-tiered Compaction (the default) and See omnystudio.com/listener for privacy information.
Jeff Carpenter talks with Hiroyuki Yamada of Scalar to learn how his team have built Scalar DB, which provides a transaction capability on top of Apache Cassandra. To find out more, visit https://github.com/scalar-labs/scalardb Highlights: 0:00 - Jeff Carpenter welcomes Hiroyuki Yamada to the show. Hiroyuki gave a very interesting talk at ApacheCon on building transactions on top of Cassandra 1:40 - Hiroyuki shares his background in distributed systems and analytics 2:10 - Got involved with Cassandra look See omnystudio.com/listener for privacy information.
Patrick McFadin connects with Carlos Rolo from Pythian at ApacheCon NA to recap the talk Carlos gave on some of the most common issues he sees in production Cassandra clusters and how to avoid them. You can listen to the full talk at https://feathercast.apache.org/2019/09/12/day-to-day-with-cassandra-the-weirdest-and-complex-situations-we-found-carlos-rolo/ 0:00 - Patrick welcomes Carlos back to the show to recap his talk at ApacheCon about some of the worst cases he's seen with Cassandra clusters. New use See omnystudio.com/listener for privacy information.
Bonus episode this week! David talks with Seb and Jake about the DynamoDB Cassandra Proxy, learns its history and what it's all about, and gets all the important details for developers to understand how it works. Highlights 00:28 - Introductions with Seb and Jake 00:47 - What is the Cassandra DynamoDB Proxy? 1:53 - What’s the backstory here? 3:44 - Give Dynamo superpowers it doesn’t already have 4:10 - DataStax culture fosters autonomy for coding up ideas 5:52 - What is the use case for the Cassandra Dyna See omnystudio.com/listener for privacy information.
Patrick McFadin and Jeff Carpenter recap their favorite talks and hallway conversations from ApacheCon North America 2019 including DataStax announcements from the keynote by DataStax CTO Jonathan Ellis. Highlights: 0:00 - Enough talk - let's fight! 1:53 - Next Generation Cassandra Conference (NGCC) - the conference within a conference. Thanks to the Apache Software Foundation for making space. 3:42 - NGCC was focused around Cassandra 4.0 including the release of the first alpha and the testing that will be See omnystudio.com/listener for privacy information.
Marc Selwan, Product Manager for core database at DataStax, joins the show to discuss why we recently updated our recommended maximum data density for DataStax Enterprise nodes from 1 TB to 2 TB, the engineering behind this new guidance, and how this can save users money. Highlights: 0:00 - Intro 0:27 - We learn about Marc's background which includes everything from gaming to network operations, presales engineering to product management 3:03 - In 2013, Marc got involved with the Cassandra community and en See omnystudio.com/listener for privacy information.
Patrick and Jeff cover some of the universal questions that come up on "day 2" for Cassandra users around batches, lightweight transactions, secondary indexes, and materialized views. They also challenge some of the biases around using Cassandra for use cases such as banking. Highlights: 0:23 - Recapping a great day hosted by Yahoo! Japan 1:20 - Introducing some of the universal questions we hear no matter where we are in the world 4:11 - Many of the questions we hear most frequently have to do with skil See omnystudio.com/listener for privacy information.
Patrick talks with longtime Cassandra contributor and committer Yuki about his background with Cassandra, Change Data Capture, and the state of the Cassandra community in Japan. 0:00 - Patrick welcomes longtime Cassandra contributor and committer Yuki to the show and we learn what attracted Yuki to Cassandra 2:55 - Yuki and Patrick talk about what it was like working with early Cassandra releases before CQL 4:03 - Yuki's efforts on translating documentation into Japanese led to his becoming a Cassandra com See omnystudio.com/listener for privacy information.
It's always interesting to discuss technologies with real experts like Carlos Rolo, so we can't let him go without getting some questions answered. What are the most awaited version four features, who gets the biggest win from ZCS, and why is the sidecar project so important? All of these, plus the best and worst use cases for Cassandra, are discussed in our new release of the Distributed Data Show! See omnystudio.com/listener for privacy information.
Listen as Stephen Mallette gives us the drop on the latest with TinkerPop 3.4 while elevating our Gremlin game with tips on becoming an advanced user and using application-based DSLs. Highlights: 00:54 - what's going on with TinkerPop 3.4 01:49 - what is version 3, what does that mean? 02:22 - lots of new contributions from the TinkerPop community 03:31 - discussing changes for 3.4 05:21 - 3.4 is out and ready to go 05:51 - developing parity across different language variants 07:19 - A better serializatio See omnystudio.com/listener for privacy information.
This week the EMEA DataStax crew takes over the DDS to share feedback on the DataStax Conference and the announcements made during the keynotes. This was also an occasion to highlight the talk "Timeseries at scale" given by Alice and Patrick. See omnystudio.com/listener for privacy information.
Host Aleks Volochnev sits down with Netflix Cloud Database Architect Vinay Chella to discuss Full Query Logging, how Sidecar makes ops people happy, and why Netflix already plans to migrate to version 4. A lot is discussed, so stay tuned! Timeline: 00:00 Welcome 00:25 Introduction 01:53 Vinay's Talk I @ Accelerate 02:00 Full Query Logging 03:00 Vinay's Talk II @ Accelerate 03:20 What are you working on right now? 03:25 Sidecar 05:40 Performance Monitoring 06:40 Netflix' Technical Blog 07:05 Version Four 0 See omnystudio.com/listener for privacy information.
Adron and Kat (Kathryn Erickson) kick off this episode with a focused look at the DataStax Accelerate 2019 Conference! Adron and Kat elaborate on DataStax Desktop and also AppStax! The conversation wraps up with details around DataStax Enterprise Graph and the future direction of that technology. Afterwards Amanda joins Mattias Broecheler for more discussion around the Desktop and AppStax technology. Mattias explains the focus, ideas behind, and core features th See omnystudio.com/listener for privacy information.
TheLastPickle, DataStax Accelerate, and exciting updates coming in Apache Cassandra 4.0. Cedrick talks with John Haddad and Alex Dejanovski from TheLastPickle to discuss their presentations at DataStax Accelerate along with Apache Cassandra tools managed by TLP and new updates coming with Cassandra 4.0. See omnystudio.com/listener for privacy information.
Donnie Roberson of the DataStax Partner team joins the show to talk about the amazing performance results observed in running DataStax Enterprise 6 on Intel's latest generation hardware including the Xeon processors and Optane DCPMM, and when and where you might be able to get your hands on this technology. See omnystudio.com/listener for privacy information.
Many DSE users have very long upgrade cycles due to time and complexity concerns. Using the CI/CD methodology, Christopher Bradford has taken up the challenge to make the upgrade path both faster and lower risk. Today we get to dive in and take a look at what he has been up to. See omnystudio.com/listener for privacy information.
In this industry, showing the drive to expand one's technological skills is crucial. Valerie personifies this drive, and then some. We met with Valerie to discuss all things Cassandra documentation, her love of open source contributions, and some interesting projects she's working on at Pythian. See omnystudio.com/listener for privacy information.
Not many developers joined the Cassandra community at the very beginning and never left. Jake is one of them: he has been working with the Cassandra community as a PMC Member and leading a team at DataStax Enterprise for almost a dozen years. Of course, he attended the DataStax Accelerate conference, and we didn't miss the chance to ask him a few questions about the past and future of Cassandra & DSE! See omnystudio.com/listener for privacy information.