SQLite is embedded everywhere - phones, browsers, IoT devices. It's reliable, battle-tested, and feature-rich. But what if you want concurrent writes? Or CDC for streaming changes? Or vector indexes for AI workloads? The SQLite codebase isn't accepting new contributors, and the test suite that makes it so reliable is proprietary. So how do you evolve an embedded database that's effectively frozen?

Glauber Costa spent a decade contributing to the Linux kernel at Red Hat, then helped build Scylla, a high-performance rewrite of Cassandra. Now he's applying those lessons to SQLite. After initially forking SQLite (which produced a working business but failed to attract contributors), his team is taking the bolder path: a complete rewrite in Rust called Turso. The project already has features SQLite lacks - vector search, CDC, browser-native async operation - and is using deterministic simulation testing (inspired by TigerBeetle) to match SQLite's legendary reliability without access to its test suite.

The conversation covers why rewrites attract contributors where forks don't, how the Linux kernel maintains quality with thousands of contributors, why Pekka's "pet project" jumped from 32 to 64 contributors in a month, and what it takes to build concurrent writes into an embedded database from scratch.

--

Support Developer Voices on Patreon: https://patreon.com/DeveloperVoices
Support Developer Voices on YouTube: https://www.youtube.com/@DeveloperVoices/join
Turso: https://turso.tech/
Turso GitHub: https://github.com/tursodatabase/turso
libSQL (SQLite fork): https://github.com/tursodatabase/libsql
SQLite: https://www.sqlite.org/
Rust: https://rust-lang.org/
ScyllaDB (Cassandra rewrite): https://www.scylladb.com/
Apache Cassandra: https://cassandra.apache.org/
DuckDB (analytical embedded database): https://duckdb.org/
MotherDuck (DuckDB cloud): https://motherduck.com/
dqlite (Canonical distributed SQLite): https://canonical.com/dqlite
TigerBeetle (deterministic simulation testing): https://tigerbeetle.com/
Redpanda (Kafka alternative): https://www.redpanda.com/
Linux Kernel: https://kernel.org/
Datadog: https://www.datadoghq.com/
Glauber Costa on X: https://x.com/glcst
Glauber Costa on GitHub: https://github.com/glommer
Kris on Bluesky: https://bsky.app/profile/krisajenkins.bsky.social
Kris on Mastodon: http://mastodon.social/@krisajenkins
Kris on LinkedIn: https://www.linkedin.com/in/krisjenkins/

--

0:00 Intro
3:16 Ten Years Contributing to the Linux Kernel
15:17 From Linux to Startups: OSv and Scylla
26:23 Lessons from Scylla: The Power of Ecosystem Compatibility
33:00 Why SQLite Needs More
37:41 Open Source But Not Open Contribution
48:04 Why a Rewrite Attracted Contributors When a Fork Didn't
57:22 How Deterministic Simulation Testing Works
1:06:17 70% of SQLite in Six Months
1:12:12 Features Beyond SQLite: Vector Search, CDC, and Browser Support
1:19:15 The Challenge of Adding Concurrent Writes
1:25:05 Building a Self-Sustaining Open Source Community
1:30:09 Where Does Turso Fit Against DuckDB?
1:41:00 Could Turso Compete with Postgres?
1:46:21 How Do You Avoid a Toxic Community Culture?
1:50:32 Outro
AWS Morning Brief for the week of November 17th, with Corey Quinn.

Links:
Custom domain names for VPC Lattice resources
AWS Lambda networking over IPv6
AWS Control Tower supports automatic enrollment of accounts
Amazon Braket Notebook Environments Now Support CUDA-Q Natively
Amazon MSK Express brokers now support Intelligent Rebalancing for 180 times faster operation performance
Amazon Keyspaces now supports logged batches for atomic, multi-statement operations
Amazon CloudWatch Composite Alarms adds threshold-based alerting
Amazon Keyspaces (for Apache Cassandra) now supports Logged Batches
Amazon Elastic Kubernetes Service gets independent affirmation of its zero operator access design
AWS Fault Injection Service (FIS) launches new test scenarios for partial failures
AWS CloudFormation Hooks adds granular invocation details for Hooks invocation summary
Introducing structured output for Custom Model Import in Amazon Bedrock
Learn how DataStax transformed customer feedback into a hybrid search solution that powers Fortune 500 companies through their partnership with AWS.

Topics Include:
AWS and DataStax discuss how quality data powers AI workloads and applications.
DataStax built on Apache Cassandra powers Starbucks, Netflix, and Uber at scale.
Their TIL app collects outside-in customer feedback to drive product development decisions.
Hybrid search and BM25 kept trending in customer requests for several months.
Customers wanted to go beyond pure vector search, not specifically BM25 itself.
Research showed hybrid search improves accuracy up to 40% over single methods.
ML-based re-rankers substantially outperform score-based ones despite added latency and cost.
DataStax repositioned their product as a knowledge layer above the data layer.
Developer-first design prioritizes simple interfaces and eliminates manual data modeling headaches.
Hybrid search API uses simple dollar-sign parameters and integrates with Langflow automatically.
AWS PrivateLink ensures security while Graviton processors boost efficiency and tenant density.
Graviton reduced total platform operating costs by 20-30% with higher throughput.

Participants:
Alejandro Cantarero - Field CTO, AI, DataStax
Ruskin Dantra - Senior ISV Solution Architect, AWS, Amazon Web Services

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
In this episode we dive into the world of DataStax with Michel de Ru, pre-sales specialist and expert in enterprise data solutions. DataStax, now part of IBM, offers powerful technologies built around the Apache Cassandra database: Astra DB, Astra Streaming, the Hyper-Converged Database, DataStax Enterprise, and the innovative Langflow. Michel walks us through what exactly DataStax is, why it is such an important player in the data landscape, and what its solutions can mean for organizations that want to accelerate with data, scalability, and AI. We discuss the unique value of their packages, the link with artificial intelligence, and how these technologies contribute to modern, intelligent automation.
Fredrik talks to Matt Topol about Arrow and how the Arrow ecosystem is evolving. Arrow is an open source, columnar in-memory data format designed for efficient data processing and analytics - which means passing data between things without needing to transform it, and ideally even without needing to copy it. What makes the ecosystem grow, and why is it very cool to have Arrow on the GPU? What is the connection between Arrow, machine learning, and Hugging Face? Matt emphasizes the value of open standards: even as they work with or within more closed systems, they can help open things up and bring about more modular solutions, so that developers can focus on doing their core area really well. This episode can be seen as a follow-up to episode 567, where Matt first joined to discuss everything Arrow. Recorded during Øredev 2024. Thank you Cloudnet for sponsoring our VPS! Comments, questions or tips? We are @kodsnack, @tobiashieta, @oferlund and @bjoreman on Twitter, have a page on Facebook and can be emailed at info@kodsnack.se if you want to write longer. We read everything we receive. If you enjoy Kodsnack we would love a review in iTunes! You can also support the podcast by buying us a coffee (or two!) through Ko-fi. Links Matt Matt’s Øredev 2023 talks: State of the Apache Arrow ecosystem: How your project can leverage Arrow! and Leveraging Apache Arrow for ML workflows Previous episodes with Matt Øredev 2024 Matt’s Øredev 2024 talks - on Arrow ADBC and Composable and modular data systems ADBC - Arrow database connectivity Arrow Snowflake Snowflake drivers for ADBC Bigquery The Bigquery driver Microsoft Fabric Duckdb Postgres SQLite Arrow flight - RPC framework for services based on Arrow data Arrow flight SQL Microsoft Power BI Velox Apache datafusion Query planning Substrait - query IR Polaris Libcudf Nvidia RAPIDS Pytorch Tensorflow Arrow device interface DLPack - in-memory tensor structure Tensors Nanoarrow Voltron data - where Matt used to work. 
He’s now at Columnar Theseus GPU compute engine The composable data management system manifesto Support us on Ko-fi! Matt’s book - In-memory analytics with Apache Arrow Spark Spark connect RPC UDFs Photon Datafusion Apache Cassandra ODBC JDBC R - programming language for statistical computing Hugging face Ray Stringview - “German-style strings” Scaling up with R and Arrow - the book on using Arrow with R Titles It’s gotten a lot bigger The bones of it are in the repo (Powered by ADBC) Individual compute components Feed it substrate Where the ecosystem is going Arrow on the GPU The data stays on the GPU A forced copy Leverage that device interface Without forcing the copy Shy of that last mile Turtles all the way down The guy who said yes German-style strings
In this episode, Lois Houston and Nikita Abraham continue their deep dive into Oracle GoldenGate 23ai, focusing on its evolution and the extensive features it offers. They are joined once again by Nick Wagner, who provides valuable insights into the product's journey. Nick talks about the various iterations of Oracle GoldenGate, highlighting the significant advancements from version 12c to the latest 23ai release. The discussion then shifts to the extensive new features in 23ai, including AI-related capabilities, UI enhancements, and database function integration. Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ----------------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hi everyone! Last week, we introduced Oracle GoldenGate and its capabilities, and also spoke about GoldenGate 23ai. In today's episode, we'll talk about the various iterations of Oracle GoldenGate since its inception. And we'll also take a look at some new features and the Oracle GoldenGate product family. 00:57 Lois: And we have Nick Wagner back with us. Nick is a Senior Director of Product Management for GoldenGate at Oracle. Hi Nick! 
I think the last time we had an Oracle University course was when Oracle GoldenGate 12c was out. I'm sure there's been a lot of advancements since then. Can you walk us through those? Nick: GoldenGate 12.3 introduced the microservices architecture. GoldenGate 18c introduced support for Oracle Autonomous Data Warehouse and Autonomous Transaction Processing Databases. In GoldenGate 19c, we added the ability to do cross-endian remote capture for Oracle, making it easier to set up the GoldenGate OCI service to capture from environments like Solaris, SPARC, and HP-UX and replicate into the Cloud. Also, GoldenGate 19c introduced a simpler process for upgrades and installation of GoldenGate where we released something called a unified build. This means that when you install GoldenGate for a particular database, you don't need to worry about the database version when you install GoldenGate. Prior to this, you would have to install a version-specific and database-specific version of GoldenGate. So this really simplified that whole process. In GoldenGate 23ai, which is where we are now, this really is a huge release. 02:16 Nikita: Yeah, we covered some of the distributed AI features and high availability environments in our last episode. But can you give us an overview of everything that's in the 23ai release? I know there's a lot to get into but maybe you could highlight just the major ones? Nick: Within the AI and streaming environments, we've got interoperability for database vector types, heterogeneous capture and apply as well. Again, this is not just replication between Oracle-to-Oracle vector or Postgres to Postgres vector, it is heterogeneous just like the rest of GoldenGate. The entire UI has been redesigned and optimized for high speed. And so we have a lot of customers that have dozens and dozens of extracts and replicats and processes running and it was taking a long time for the UI to refresh those and to show what's going on within those systems. 
So the UI has been optimized to be able to handle those environments much better. We now have the ability to call database functions directly from COLMAP. And so when you do transformation with GoldenGate, we have about 50 or 60 built-in transformation routines for string conversion, arithmetic operation, date manipulation. But we never had the ability to directly call a database function. 03:28 Lois: And now we do? Nick: So now you can actually call that database function, database stored procedure, database package, return a value and that can be used for transformation within GoldenGate. We have integration with identity providers, being able to use token-based authentication and integrate in with things like Azure Active Directory and your other single sign-on for the GoldenGate product itself. Within Oracle 23ai, there's a number of new features. One of those cool features is something called lock-free reservation columns. So this allows you to have a row, a single row within a table and you can identify a column within that row that's like an inventory column. And you can have multiple different users and multiple different transactions all updating that column within that same exact row at that same time. So you no longer have row-level locking for these reservation columns. And it allows you to do things like shopping carts very easily. If I have 500 widgets to sell, I'm going to let any number of transactions come in and subtract from that inventory column. And then once it gets below a certain point, then I'll start enforcing that row-level locking. 04:43 Lois: That's really cool… Nick: The one key thing that I wanted to mention here is that because of the way that the lock-free reservations work, you can have multiple transactions open on the same row. This is only supported for Oracle to Oracle. You need to have that same lock-free reservation data type and availability on that target system if GoldenGate is going to replicate into it. 
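Editor's sketch: the lock-free reservation idea Nick describes can be pictured with a small in-memory model. This is a conceptual illustration only, not Oracle's implementation or API; the class `ReservableColumn` and its methods are hypothetical names invented for the example. It also simplifies Nick's description: where he says row-level locking starts being enforced once the inventory gets low, this toy model simply refuses a reservation that would take the column below its minimum.

```python
from dataclasses import dataclass, field


@dataclass
class ReservableColumn:
    """Toy model of an inventory column updated through reservations.

    Hypothetical illustration only; not Oracle's implementation or API.
    """
    committed: int                                # value visible after commits
    minimum: int = 0                              # value must never drop below this
    pending: dict = field(default_factory=dict)   # transaction id -> reserved amount

    def reserve(self, txn: str, amount: int) -> bool:
        """Reserve `amount` units for `txn` without waiting on other open transactions."""
        outstanding = sum(self.pending.values())
        if self.committed - outstanding - amount < self.minimum:
            return False                          # would oversell, so refuse (simplification)
        self.pending[txn] = self.pending.get(txn, 0) + amount
        return True

    def commit(self, txn: str) -> None:
        """Apply the transaction's reservation to the committed value."""
        self.committed -= self.pending.pop(txn, 0)

    def rollback(self, txn: str) -> None:
        """Release the reservation; the committed value is unchanged."""
        self.pending.pop(txn, None)


widgets = ReservableColumn(committed=500)         # "500 widgets to sell"
ok_a = widgets.reserve("txn_a", 200)              # two transactions are open
ok_b = widgets.reserve("txn_b", 250)              # on the same row at the same time
ok_c = widgets.reserve("txn_c", 100)              # only 50 units remain unreserved
widgets.commit("txn_a")
widgets.rollback("txn_b")
remaining = widgets.committed
```

The point of the sketch is that each transaction records how much it intends to subtract instead of holding the row for the whole transaction, so several open transactions can coexist on one inventory row as long as their combined reservations keep the value within bounds.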
05:05 Nikita: Are there any new features related to the diagnosability and observability of GoldenGate? Nick: We've improved the AWR reports in Oracle 23ai. There are now seven sections that are specific to Oracle GoldenGate to allow you to really go in and see exactly what the GoldenGate processes are doing and how they're behaving inside the database itself. And there's a Replication Performance Advisor package inside that database, and that's been integrated into the Web UI as well. So now you can actually get information out of the replication advisor package in Oracle directly from the UI without having to log into the database and try to run any database procedures to get it. We've also added the ability to support a per-PDB Extract. So in the past, when GoldenGate would run on a multitenant database in Oracle, all the redo data from any pluggable database gets sent to that one redo stream. And so you would have to configure GoldenGate at the container or root level and it would be able to access anything at any PDB. Now, there's better security and better performance by doing what we call per-PDB Extract. And this means that for a single pluggable database, I can have an extract that runs at that database level that's going to capture information just from that pluggable database. 06:22 Lois: And what about non-Oracle environments, Nick? Nick: We've also enhanced the non-Oracle environments as well. For example, in Postgres, we've added support for precise instantiation using Postgres snapshots. This eliminates the need to handle collisions when you're doing Postgres to Postgres replication and initial instantiation. On the GoldenGate for big data side, we've renamed that product more aptly to Distributed Applications and Analytics, which is really what it does, and we've added a whole bunch of new features here too. The ability to move data into Databricks, doing Google Pub/Sub delivery. 
We now have support for XAG within the GoldenGate for distributed applications and analytics. What that means is that now you can follow all of our MAA best practices for GoldenGate for Oracle, but it also works for the DAA product as well, meaning that if it's running on one node of a cluster and that node fails, it'll restart itself on another node in the cluster. We've also added the ability to deliver data to Redis, Google BigQuery, stage and merge functionality for better performance into the BigQuery product. And then we've added a completely new feature, and this is something called streaming data and apps and we're calling it AsyncAPI and CloudEvent data streaming. It's a long name, but what that means is that we now have the ability to publish changes from a GoldenGate trail file out to end users. And so this allows through the Web UI or through the REST API, you can now come into GoldenGate and through the distributed applications and analytics product, actually set up a subscription to a GoldenGate trail file. And so this allows us to push data into messaging environments, or you can simply subscribe to changes and it doesn't have to be the whole trail file, it can just be a subset. You can specify exactly which tables and you can put filters on that. You can also set up your topologies as well. So, it's a really cool feature that we've added here. 08:26 Nikita: Ok, you've given us a lot of updates about what GoldenGate can support. But can we also get some specifics? Nick: So as far as what we have, on the Oracle Database side, there's a ton of different Oracle databases we support, including the Autonomous Databases and all the different flavors of them, your Oracle Database Appliance, your Base Database Service within OCI, your of course, Standard and Enterprise Edition, as well as all the different flavors of Exadata, are all supported with GoldenGate. This is all for capture and delivery. And this is all versions as well. 
GoldenGate supports Oracle 23ai and below. We also have a ton of non-Oracle databases in different Cloud stores. On the non-Oracle side, we support everything from application-specific databases like FairCom DB, all the way to more advanced applications like Snowflake, which there's a vast user base for. We also support a lot of different cloud stores and these again, are non-Oracle, nonrelational systems, or they can be relational databases. We also support a lot of big data platforms and this is part of the distributed applications and analytics side of things where you have the ability to replicate to different Apache environments, different Cloudera environments. We also support a number of open-source systems, including things like Apache Cassandra, MySQL Community Edition, a lot of different Postgres open source databases along with MariaDB. And then we have a bunch of streaming event products, NoSQL data stores, and even Oracle applications that we support. So there's absolutely a ton of different environments that GoldenGate supports. There are additional Oracle databases that we support and this includes the Oracle Metadata Service, as well as Oracle MySQL, including MySQL HeatWave. Oracle also has Oracle NoSQL, Spatial and Graph, and TimesTen products, which again are all supported by GoldenGate. 10:23 Lois: Wow, that's a lot of information! Nick: One of the things that we didn't really cover was the different SaaS applications, which we've got like Cerner, Fusion Cloud, Hospitality, Retail, MICROS, Oracle Transportation, JD Edwards, Siebel, and on and on and on. And again, because of the nature of GoldenGate, it's heterogeneous. Any source can talk to any target. And so it doesn't have to be, oh, I'm pulling from Oracle Fusion Cloud, that means I have to go to an Oracle Database on the target, not necessarily. 10:51 Lois: So, there's really a massive amount of flexibility built into the system. 
11:00 Unlock the power of AI Vector Search with our new course and certification. Get more accurate search results, handle complex datasets easily, and supercharge your data-driven decisions. From now through May 15, 2025, we are waiving the certification exam fee (valued at $245). Visit mylearn.oracle.com to enroll. 11:26 Nikita: Welcome back! Now that we've gone through the base product, what other features or products are in the GoldenGate family itself, Nick? Nick: So we have quite a few. We've kind of touched already on GoldenGate for Oracle databases and non-Oracle databases. We also have something called GoldenGate for Mainframe, which right now is covered under the GoldenGate for non-Oracle, but there is a licensing difference there. So that's something to be aware of. We also have the OCI GoldenGate product. We are announcing and we have announced that OCI GoldenGate will also be made available as part of the Oracle Database@Azure and Oracle Database@Google Cloud partnerships. And then you'll be able to use that vendor's cloud credits to actually pay for the OCI GoldenGate product. One of the cool things about this is it will have full feature parity with OCI GoldenGate running in OCI. So all the same features, all the same sources and targets, all the same topologies, and you'll be able to migrate data in and out of those clouds at will, just like you do with OCI GoldenGate today running in OCI. We have Oracle GoldenGate Free. This is a completely free edition of GoldenGate to use. It is limited on the number of platforms that it supports as far as sources and targets and the size of the database. 12:45 Lois: But it's a great way for developers to really experience GoldenGate without worrying about a license, right? What's next, Nick? Nick: We have GoldenGate for Distributed Applications and Analytics, which was formerly called GoldenGate for big data, and that allows us to do all the streaming. That's also where the GoldenGate AsyncAPI integration is done. 
So in order to publish the GoldenGate trail files or allow people to subscribe to them, it would be covered under the Oracle GoldenGate Distributed Applications and Analytics license. We also have OCI GoldenGate Marketplace, which allows you to run essentially the on-premises version of GoldenGate but within OCI. So a little bit more flexibility there. It also has a hub architecture. So if you need that 99.99% availability, you can get it within the OCI Marketplace environment. We have GoldenGate for Oracle Enterprise Manager Cloud Control, which used to be called Oracle Enterprise Manager. And this allows you to use Enterprise Manager Cloud Control to get all the statistics and details about GoldenGate. So all the reporting information, all the analytics, all the statistics, how fast GoldenGate is replicating, what's the lag, what's the performance of each of the processes, how much data am I sending across a network. All that's available within the plug-in. We also have Oracle GoldenGate Veridata. This is a nice utility and tool that allows you to compare two databases, whether or not GoldenGate is running between them and actually tell you, hey, these two systems are out of sync. And if they are out of sync, it actually allows you to repair the data too. 14:25 Nikita: That's really valuable…. Nick: And it does this comparison without locking the source or the target tables. The other really cool thing about Veridata is it does this while there's data in flight. So let's say that the GoldenGate lag is 15 or 20 seconds and I want to compare this table that has 10 million rows in it. The Veridata product will go out, run its comparison once. Once that comparison is done the first time, it's then going to have a list of rows that are potentially out of sync. Well, some of those rows could have been moved over or could have been modified during that 10 to 15 second window. And so the next time you run Veridata, it's actually going to go through. 
It's going to check just those rows that were potentially out of sync to see if they're really out of sync or not. And if it comes back and says, hey, out of those potential rows, there's two out of sync, it'll actually produce a script that allows you to resynchronize those systems and repair them. So it's a very cool product. 15:19 Nikita: What about GoldenGate Stream Analytics? I know you mentioned it in the last episode, but in the context of this discussion, can you tell us a little more about it? Nick: This is the ability to essentially stream data from a GoldenGate trail file, and then do real-time analytics on it, along with things like geofencing or real-time series analysis. 15:40 Lois: Could you give us an example of this? Nick: If I'm working in tracking stock market information and stocks, it's not really that important how much or how far down a stock goes. What's really important is how quickly did that stock rise or how quickly did that stock fall. And that's something that the GoldenGate Stream Analytics product can do. Another thing that it's very valuable for is the geofencing. I can have an application on my phone and I can track where the user is based on that application and all that information goes into a database. I can then use the geofencing tool to say that, hey, if one of those users on that app gets within a certain distance of one of my brick-and-mortar stores, I can actually send them a push notification to say, hey, come on in and you can order your favorite drink just by clicking Yes, and we'll have it ready for you. And so there's a lot of things that you can do there to help upsell your customers and to get more revenue just through GoldenGate itself. And then we also have a GoldenGate Migration Utility, which allows customers to migrate from the classic architecture into the microservices architecture. 16:44 Nikita: Thanks Nick for that comprehensive overview. 
Lois: In our next episode, we'll have Nick back with us to talk about commonly used terminology and the GoldenGate architecture. And if you want to learn more about what we discussed today, visit mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course. Until next time, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 17:10 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
On this episode of Alexa's Input (AI), we're diving deep into the world of distributed databases with Patrick McFadin, Principal Technical Strategist at DataStax and a leading voice in the Apache Cassandra community. Patrick shares his journey into tech and how he became one of the foremost experts on Cassandra, an open-source, highly scalable NoSQL database that powers mission-critical applications across the globe.

We explore Cassandra's unique architecture, its approach to the CAP theorem, real-world use cases, and how it continues to evolve in the era of AI and real-time analytics. Whether you're a developer, architect, or just database-curious, this episode offers a clear, insightful look at how Cassandra handles scale, availability, and open-source innovation.

Links:
LinkedIn: https://www.linkedin.com/in/patrick-mcfadin-53a8046/
DataStax: https://www.datastax.com/our-people/patrick-mcfadin
X: https://x.com/patrickmcfadin
GitHub: https://github.com/pmcfadin

You can support this podcast on the creators page. Make sure to subscribe and follow Alexa's Input Twitter account to get notified when a new podcast episode comes out.
DataStax is known for its expertise in scalable data solutions, particularly for Apache Cassandra, a leading NoSQL database. Recently, the company has focused on enhancing platform support for AI-driven applications, including vector search capabilities. Jonathan Ellis is the Co-founder of DataStax. He maintains a technical role at the company and has recently worked on developing The post DataStax and the Future of Real-Time Data Applications with Jonathan Ellis appeared first on Software Engineering Daily.
An airhacks.fm conversation with Jake Luciani (@tjake) about: from Commodore 64 to cloud databases, early programming experiences with Basic and Excel macros, studying cognitive science and its influence on his career, transition to computer science, working at Bell Labs on R language, developing open-source projects like Night Rider MP3 player, creating a NoSQL database that led to involvement with Cassandra, building search API on top of Cassandra, joining DataStax as an early employee, working on various aspects of Cassandra including compaction and streaming, challenges of byte buffer implementation, development of CQL (Cassandra Query Language), transition from NoSQL to SQL-like interfaces, separation of compute and storage in cloud databases, using S3 as the source of truth for Astra DB, implementing a Java file system abstraction for S3 integration, using etcd as a transactional cache for metadata, offering multiple APIs including REST and CQL drivers for Astra DB, implementing JSON document storage and querying capabilities, cross-AZ cost considerations in cloud deployments, Java as a language for database development, future plans for Jlama (Java-based LLM inference engine), the importance of open-source in cloud technologies, cost-driven architectures in cloud deployments, serverless vs. traditional deployments trade-offs, integration of Astra DB with cloud marketplaces and security considerations Jake Luciani on twitter: @tjake
DataStax is a generative AI data company that provides tools and services to build AI and other data-intensive applications. Ed Anuff is the Chief Product Officer at DataStax. He joins the show to talk about making Apache Cassandra accessible, adding vector support at DataStax, envisioning the future application stack for AI, and more. Full Disclosure: The post DataStax with Ed Anuff appeared first on Software Engineering Daily.
An airhacks.fm conversation with Jonathan Ellis (@spyced) about: Jonathan's first computer experiences with IBM PC 8086 and Thinkpad laptop with Red Hat Linux, becoming a key contributor to Apache Cassandra and founding DataStax to provide commercial support for Cassandra, early experiences with Java, C++, and Python, discussion about the evolution of Java and its ecosystem, the importance of vector databases for semantic search and retrieval augmented generation, the development of JVector for high-performance vector search in Java, the potential of integrating JVector with LangChain for Java / LangChain4j and Quarkus for serverless deployment, the advantages of Java's productivity and performance for building concurrent data structures, the shift from locally installed software to cloud-based services, the challenges of being a manager and the benefits of taking a sabbatical to focus on creative pursuits, the importance of separating storage and compute in cloud databases, Cassandra's write-optimized architecture and improvements in read performance, DataStax's investment in Apache Pulsar for stream processing, the llama2java project for high-performance language models in Java Jonathan Ellis on twitter: @spyced
A slightly different The Business of Open Source episode today! I spoke with Patrick McFadin and Mick Semb Wever about the relationship between Apache Cassandra and DataStax: how it was at the beginning and how the relationship has evolved over the years. We talked about: - How there was a dynamic around Cassandra where many of the contributors ended up being pulled into the DataStax orbit, simply because it allowed those contributors to work on Cassandra full-time - How there can be tensions between different stakeholders simply because everyone involved ultimately has their own interests at heart, and those interests are not always aligned - How it is actually hard to have open discussions about new features, and how often a new feature that had clearly been developed behind closed doors for some time would be dropped into the project, which sometimes created tension in the community - Some open source projects are just too complex to be hobby projects; Cassandra is so complex that you won't become a code contributor unless you're working full-time on Cassandra, because that's the level of skill you need to keep up - How the relationship between a company and a project often changes as the technology matures - The importance of addressing tensions between company and community head-on, as adults, when they occur, as well as why you need to treat people as humans and remember that they have good days, bad days, goals and interests. Patrick on LinkedIn. Mick on LinkedIn.
Ian, Kito, and Josh are joined by Java Champion, Streaming Developer Advocate at DataStax, and President of Chicago-JUG, Mary Grygleski. They discuss news about Capacitor, Angular, PrimeNG Designer for Tailwind, JetBrains Compose Multiplatform for iOS, JDK 21, AI developer tools, Jakarta EE 10, and more. Kito announces the work he is doing on the Jakarta EE Tutorial, and then they delve into Mary's background and event streaming with Apache Pulsar, plus tools like Apache Pinot, Apache Flink, RisingWave, ByteWax and Apache Cassandra. We thank DataDog for sponsoring this podcast! https://www.pubhouse.net/datadog Front End - Announcing Capacitor 5.0 - Ionic Blog (https://ionic.io/blog/announcing-capacitor-5) - Angular v16 is here! (https://blog.angular.io/angular-v16-is-here-4d7a28ec680d) - Compose Multiplatform (https://blog.jetbrains.com/kotlin/2023/05/compose-multiplatform-for-ios-is-in-alpha/) - PrimeNG Designer - Tailwind (Q3 2023) (https://www.primefaces.org/primeng-theme-designer-with-tailwind/) Server Side Java - Kito is working with Bauke Scholtz and Arjan Tijms to refresh the Jakarta EE Tutorial - Eclipse Documentation for Jakarta EE (https://projects.eclipse.org/projects/ee4j.jakartaee-documentation) - Antora (https://antora.org) - Asciidoc (http://asciidoc.org) - Jakarta EE 10; MicroProfile 6; Java SE 20; Open Liberty (https://openliberty.io/blog/2023/04/04/23.0.0.3.html) - Jakarta EE Starter (https://start.jakarta.ee/) AI/ML - Phind - AI search engine for developers (https://www.phind.com/) - 92% of devs using AI coding assistants (https://www.zdnet.com/article/github-developer-survey-finds-92-of-programmers-using-ai-tools/) Java Platform - JDK 21, the next LTS release, due out in September (https://www.infoworld.com/article/3689880/jdk-21-the-new-features-in-java-21.html) IDE and Tools - Grazie Professional - IntelliJ IDEs Plugin | Marketplace (https://plugins.jetbrains.com/plugin/16136-grazie-professional) Chat w/Mary - Twitter: @mgrygles 
(https://twitter.com/mgrygles) - Discord server: https://discord.gg/RMU4Juw - LinkedIn: https://www.linkedin.com/in/mary-grygleski/ - Apache Pulsar (https://pulsar.apache.org/) - Apache Pinot (https://pinot.apache.org/) - Apache Flink (https://flink.apache.org/) - RisingWave (https://www.risingwave.dev/) - ByteWax (https://bytewax.io/) - Apache Cassandra (https://cassandra.apache.org/) - Apache Kafka (https://kafka.apache.org/) Picks - Quantum Energy Squares (Kito) (https://quantumsquares.com/) - JBOSS EAP on Azure (Josh) (https://learn.microsoft.com/en-us/azure/developer/java/ee/jboss-on-azure) - Interstellar (Mary) (https://www.imdb.com/title/tt0816692/) - Black Mirror Season 6 Episode 1 - Joan Is Awful - Netflix (Ian) (https://www.rottentomatoes.com/tv/black_mirror/s06/e01) Other Pubhouse Network podcasts - Breaking into Open Source (https://www.pubhouse.net/breaking-into-open-source) - OffHeap (https://www.javaoffheap.com/) - Java Pubhouse (https://www.javapubhouse.com/) Events - Lone Star Software Symposium - July 14 - 15, Austin, TX, USA (https://nofluffjuststuff.com/austin) - ÜberConf - July 18 - 21, Denver, CO, USA (https://uberconf.com/) - Nebraska.code() - July 19-20, Lincoln, NE, USA (https://nebraskacode.amegala.com/)
Welcome to the newest episode of The Cloud Pod podcast! Justin, Ryan and Matthew are your hosts this week as we discuss all the latest news and announcements in the world of the cloud and AI, including what's new with Google DeepMind, as well as goings-on over at the FinOps X Conference. Join us! Titles we almost went with this week:
Patrick McFadin, VP of Developer Relations at DataStax and Chief Evangelist for Apache Cassandra, joins the Hacking Open Source Business Podcast on Episode 26 to deep dive into open source. In this episode Patrick talks about: - His time working in the open source database community, including Apache Cassandra's journey and upcoming developments. - The role of evangelism and contributors in driving adoption and getting people to try your project. - The challenges and mistakes companies make when commercializing open source, with lessons he has learned from his time in the database community. - How new features are chosen, based on his experience with Cassandra, highlighting features such as transactions and the open-source tool Guardrails. - Does open source innovation slow down as products mature? - What is cloud-native anyway? And what does it mean in the database context? - Building a diverse and global team by building trust. - DevRel best practices, including how to measure DevRel success. - Patrick McFadin's LinkedIn profile: https://www.linkedin.com/in/patrick-mcfadin-53a8046/ - Learn more about Apache Cassandra: https://cassandra.apache.org/ Check out our other interviews, clips, and videos: https://l.hosbp.com/Youtube Don't forget to visit the open-source business community at: https://opensourcebusiness.community/ Visit our primary sponsor, Scarf, for tools to help analyze your #opensource growth and adoption: https://about.scarf.sh/ Subscribe to the podcast on your favorite app: Spotify: https://l.hosbp.com/Spotify Apple: https://l.hosbp.com/Apple Google: https://l.hosbp.com/Google Buzzsprout: https://l.hosbp.com/Buzzsprout
On this episode of The Cloud Pod, the team discusses Amazon Pi Day, Google's upcoming I/O conference, the agricultural data manager by Microsoft, and the downturn in net profits of Oracle. They also round up cloud migrations by highlighting tools from different cloud service providers that are useful for the process. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights
An airhacks.fm conversation with Dave Johnson (@snoopdave) about: PDP-8 with a paper tape reader, airhacks.tv questions and answers, TRS-80, playing Asteroids, Asteroids, Defender and Battlezone were based on vector graphics, learning Pascal and C, Data General Eclipse MV/8000, Geographic Resources Analysis Support System (GRASS GIS), working for University of Kingston, working on jfactory for Rogue Wave, HAHT Software, The Soul of a New Machine, distributed Visual Basic application server, using XDoclet to generate EJB, using Castor for persistence, Apache Roller started as sample application, Sun hires Dave, working on Lotus Notes social, starting at Wayin, Roller supports Pingback, Lotus is using Roller, using Rightscale to deploy Java software to AWS, using Jenkins and CloudFormation, episode with Scott McNealy "#19 SUN, JavaSoft, Java, Oracle", Roller uses Apache Velocity, working on RSS parser Rome, switching from MongoDB to Apache Cassandra, UserGrid data store, Oracle acquires Apiary, starting at CloudBees, episode with Kohsuke Kawaguchi "#143 How Hudson and Jenkins happened", starting at Apollo, several thousand blogs on Roller Dave Johnson on twitter: @snoopdave
This special episode of Open||Source||Data features an interview with Patrick McFadin. Patrick has been a distributed systems hacker since he first plugged a modem into his Atari computer. Looking for adventure, he joined the US Navy, working on the Naval Tactical Data System (NTDS), which cemented his love of distributed systems. He is now an Apache Cassandra Committer, and is the Vice President of Developer Relations at DataStax. Sam catches up with Patrick at Data Day Texas to discuss his book Managing Cloud Native Data on Kubernetes, Cassandra Forward, and the future of Apache Cassandra. ------------------- “I can now use my Parquet file in Iceberg or DuckDB, and this is data that I created with Cassandra. And we're not getting to the point where we have to reinvent an entire database. We can just connect the Lego parts together and if they're open, then I don't have these encumbrances. I'm not like, ‘Well, I can connect that if I call a salesperson and get a license.' [...] That's what's exciting to me about Cassandra, the way that the ecosystem is evolving around Cassandra. It's not, ‘Cassandra's at the center, it's just a player.' It's at the party." – Patrick McFadin ------------------- Episode Timestamps: (01:06): What open source data means to Patrick (02:11): Patrick discusses his book Managing Cloud Native Data on Kubernetes (10:02): Patrick discusses Cassandra Forward (11:09): The future of Apache Cassandra ------------------- Links: LinkedIn - Connect with Patrick; Cassandra Forward
An airhacks.fm conversation with Mary Grygleski (@mgrygles) about: 808X as first computer, Hong Kong was high tech, enjoying space missions, Star Trek and Star Wars, the intriguing registration terminal, writing code in Pascal, 3GL programming languages and SQL, set theory and SQL, the seven layers of OSI, OSI model, IBM MVS, AS/400 is the opposite of microservices, developers get bored too early, learning X-Windows, working with early Oracle databases, using dBASE, Clipper and FoxPro, transarc, stratos tx, Transarc the transaction file system, Transaction Processing: Concepts and Techniques, working on SMTP / MTA, CouchDB and Lotus Notes, the Sun Ultra 30 workstation, starting at Sybase, EA server Sybase / Jaguar, using Emacs for Java development, then NetBeans, Java EE and the hierarchical class loaders, working on EJB 3 specs, mobile apps with Apache Cordova, reactive systems at IBM, using Akka, Eclipse Vert.x and MicroProfile, working for DataStax and Pulsar, DataStax provides support for Apache Cassandra and Apache Pulsar, separating the compute from the storage, Astra the managed cloud platform Mary Grygleski on twitter: @mgrygles
An airhacks.fm conversation with John Ceccarelli (@jceccarelli1) about: Macintosh 512K, writing short stories and playing Dark Castle, studying European politics, enjoying Brno and Prague, learning Czech from a communist book, technical writing for Sun Microsystems, working on NetBeans Matisse, WYSIWYG precision is challenging, NetBeans Visual Web Pack was extremely popular, Sun's JSF woodstock, separation of generated and implemented code is challenging, explaining AWS Lambdas with EJBs, visual representation of complex code is challenging, NetBeans vs. IntelliJ strategies, Installing Java Support in Visual Studio Code, working on JVM internals at Azul Systems, Azul JVMs Zulu vs. Prime, the Falcon JIT, optimising JVM for Apache Cassandra, the Renaissance Suite, memento and OpenJDK CRaC, Azul's CRaC optimization, crowdsourcing the optimizations, Quarkus on Azul's CRaC, Azul Prime is based on LLVM, Foojay and Azul John Ceccarelli on twitter: @jceccarelli1
From the DoK Day North America 2022 (https://youtu.be/YWTa-DiVljY) ABSTRACT In the software industry we're fond of terms that define major trends, like “cloud native”, “Kubernetes native” and “serverless”. As more and more organizations move stateful workloads to Kubernetes, we've started to see these terms applied to data infrastructure, where they can get overtaken by marketing hype unless we work to define them. In this talk, we'll examine two different databases, TiDB and Apache Cassandra, in order to identify what it means for a database to be Kubernetes native and why it matters. We'll look at points including: - The differences between cloud native, Kubernetes native, and serverless - How databases become Kubernetes native - Benefits of Kubernetes native databases - How Kubernetes can better support databases
Apache Cassandra paved the way for today's biggest digital platforms to scale to a global audience. Patrick McFadin of DataStax is one of the people involved in this open-source project and saw first-hand how it burst into the world. He joins Ben Rometsch to share how Cassandra was developed, the many challenges they faced in its optimization, its relationship with DataStax, and how it changed database engine creation and data modeling. Patrick also talks about the measures they are implementing to continuously improve Cassandra and limit open-source access to ensure quality.
GumGum is a company whose platform serves up online ads related to the context in which potential customers are already shopping or searching. (For instance: it will send ads for Zurich restaurants to someone who's booked travel to Switzerland.) To handle that granular targeting, it relies on its proprietary machine learning platform, Verity. “For all of our publishers, we send a list of URLs to Verity,” according to Keith Sader, GumGum's director of engineering. “Verity goes in and basically categorizes those URLs as different [internal bus] categories. So the IB has tons of taxonomies, based on autos, based upon clothing based upon entertainment. And then that's how we do our targeting.” Verity's targeting data is stored in DynamoDB, but the rest of GumGum's data is stored in managed MySQL and its daily tracking data is stored in ScyllaDB, a database designed for data-intensive applications. Scylla, Sader said, helps his company avoid serving audiences the same ads over and over again, by keeping track of which ads customers have already seen. “That's where Scylla comes into the picture for us,” he said. “Scylla is our rate limiter on ad serving.” In this episode of The New Stack's Makers podcast, Sader and Dor Laor, CEO and co-founder of Scylla, explained how GumGum has used ScyllaDB to shift more IT resources to its core business and to keep from repeating ads to audiences that have already seen them, no matter where they travel. This case study episode of Makers, hosted by Heather Joslyn, TNS features editor, was sponsored by ScyllaDB. ‘Where Do We Spend Our Limited Funds?' Before adding ScyllaDB to its stack, Sader said, “We had a Cassandra-based system that some very smart people put in. But Cassandra relies upon you to have an engineering staff to support it. “That's great. 
But like many types of systems, managing Cassandra databases is not really what our business makes money at.” GumGum was hosting its Cassandra database, installed on Amazon Web Services, by itself, and the drain on resources brought the company's teams to a crossroads, Sader said. “Where do we spend our limited funds? Do we spend it on Cassandra maintenance? Or do we hire someone to do it for us? And that's really what determined the switch away from a sort of self-installed, self-managed Cassandra to another provider.” A core issue for GumGum, Sader said, was making sure that it wasn't over-serving consumers, even as they moved around the globe. “If you see an ad in one place, we need to make sure, if you fly across the country, you don't see it again,” he said. That's an issue Cassandra solved for his company, he said. Because ScyllaDB is a drop-in replacement for Apache Cassandra, it also helped prevent over-serving in all regions of the globe, thus preventing GumGum from losing money. In addition to managing its database for GumGum and other customers, Laor said that an advantage ScyllaDB brings is an “always on” guarantee. “We have a big legacy of infrastructure that's supposed to be resilient,” he said. “For example, every implementation of ours has consistent configurable consistency, so you can have multiple replicas.” Laor added, “Many many times organizations have multiple data centers. Sometimes it's for disaster recovery, sometimes it's also to shorten the latency and be closer to the client.” Replica databases located in data centers that are geographically distributed, he said, protect against failure in any one data center. Seeing Results Bringing ScyllaDB to GumGum was not without challenges, both Sader and Laor said. When ScyllaDB is added to an organization's stack, Laor said, it likes to start with as small a deployment as possible. “But in the GumGum case, all of these clients were new processes,” Laor said. 
“So hundreds or thousands of processes, all trying to connect to the database, it's really a connection storm.” Scylla's team created a private version of its database to work on the problem and eventually solved it: “We had to massage the algorithm and make sure that all of the [open source] code committers upstream are summing it up.” It ultimately designed an admission control mechanism that measures the number of parallel requests the distributed database is handling and slows down requests that arrive for the first time from a new process. “We tried to have the complexity on our end,” Laor said. GumGum has seen the results of handing off that complexity and toil to a managed database. “We have pretty much reduced our entire operations effort with Scylla, to almost nothing,” Sader said. He added, “We're coming into our busy point of the year, ads really get picked up in Q4. So we reach out so we go, ‘Hey, we need more nodes in these regions, can you make that happen for us?' They go, ‘Yep.' Give us the things, we pay the money. And it happens.” In 2021, Sader said, “we increased our volume by probably 75% plus 50%, over our standard. The toughest thing to do in this industry is make things look easy. And Scylla helped us make ad serving look easy.” Check out the podcast to get more detail about GumGum's move to a managed database.
Hey Everyone, In this episode I invited Patrick McFadin, who is an expert in the world of Cassandra and data modelling. Patrick currently works for DataStax as VP of DevRel. Patrick has given several tech talks on Cassandra and the ecosystem around it. We covered the architecture of Cassandra in depth. Here's what we covered: 00:00 Introduction 04:00 History of Cassandra 07:18 Patrick Apache Cassandra? 14:30 How writes work in Cassandra? 21:30 How many copies are written on a single write? 25:44 How does replication work? 32:00 How do reads work? (Read consistency levels) 39:00 Why is Allow Filtering not recommended? 43:00 Data Modelling in Cassandra 50:45 Modeling a Chat Application 01:05:00 How does the CAP theorem fit Cassandra? 01:07:06 New features in Cassandra? References: Patrick McFadin: https://www.linkedin.com/in/patrick-m... Kaivalya Apte: https://www.linkedin.com/in/kaivalya-... Astra: astra.datastax.com Cassandra: https://cassandra.apache.org/_/index.... Webinar on Data Modeling: https://www.youtube.com/watch?v=4D39w... Playlist on Distributed Systems and Databases: https://www.youtube.com/playlist?list... I hope you enjoyed our discussion and learned from it. Please like, share and subscribe to the channel and keep supporting. Cheers, The GeekNarrator
We once again welcome specialist Samuel Matioli to talk about the Fortune 500's favorite columnar database. Apache Cassandra is the database used by large companies such as Uber, Facebook, Netflix, Instagram, Spotify and Instacart. In this chat about NoSQL databases we cover the following topics: growth of NoSQL adoption in the market; the difference between HBase and Apache Cassandra; what Apache Cassandra is; deployment types and usage options; use cases; which problems Apache Cassandra solves. Apache Cassandra = https://cassandra.apache.org/ Samuel Matioli = https://www.linkedin.com/in/samuelmatioli/ On YouTube we have a Data Engineering channel covering the most important topics in this field, with live streams every Wednesday. https://www.youtube.com/channel/UCnErAicaumKqIo4sanLo7vQ Want to keep up with this field through weekly posts and updates? Then head to LinkedIn so you don't miss any news. https://www.linkedin.com/in/luanmoreno/ Available on Spotify and Apple Podcasts: https://open.spotify.com/show/5n9mOmAcjra9KbhKYpOMqY https://podcasts.apple.com/br/podcast/engenharia-de-dados-cast/ Luan Moreno = https://www.linkedin.com/in/luanmoreno/
In this episode of the Backend Engineering Show I discuss consistent hashing, a very important algorithm in distributed computing, especially in database systems such as Apache Cassandra and DynamoDB. 0:00 Intro 2:00 Problem of Distributed Systems 5:00 When to Distribute 7:00 Simple Hashing 9:30 Where Simple Hashing Breaks 11:40 Consistent Hashing 18:00 Adding a Server 21:15 Removing a Server 22:30 Limitations --- Support this podcast: https://anchor.fm/hnasr/support
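The chapter list above (simple hashing, where it breaks, consistent hashing, adding and removing a server) can be illustrated with a minimal sketch. This is not taken from the episode: the `HashRing` class, the use of MD5, and the `vnodes` parameter are assumptions chosen for illustration, and real systems such as Cassandra or DynamoDB differ in hash function and token assignment.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes per server."""

    def __init__(self, servers=(), vnodes=100):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> server name
        for server in servers:
            self.add(server)

    def add(self, server):
        # Place `vnodes` points for this server on the ring.
        for i in range(self.vnodes):
            point = _hash(f"{server}#{i}")
            self._owner[point] = server
            bisect.insort(self._points, point)

    def remove(self, server):
        # Drop only this server's points; all other points stay where they are.
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def lookup(self, key):
        # A key belongs to the first point clockwise from its hash (with wrap-around).
        if not self._points:
            raise LookupError("ring is empty")
        i = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[i]]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.lookup("user:42")  # one of the three node names
```

With simple `hash(key) % n` placement, changing `n` remaps most keys. On the ring, adding a server only moves the keys that now fall just before one of its points, and removing it hands exactly those keys back to their previous owners.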
More than three years ago we already did an episode about databases. Since that was a while back, we thought it might be time to talk about the topic again. This time we (Dominik and Jochen) sat down with Susanne, who has offered consulting and training on the subject for many years. The old database episode was our longest episode so far, and somehow this one also turned out longer than usual. Apparently there is more to say about databases than about other topics
https://go.dok.community/slack https://dok.community ABSTRACT OF THE TALK What about your streaming and analytic workloads? If you are all-in on Kubernetes you can't forget about these important parts of your infrastructure. I'll talk about the current state of the art, why organizations may hesitate to go beyond deploying databases in Kubernetes and, most importantly, some key things you need to be successful. BIO Patrick McFadin is the co-author of the upcoming O'Reilly book “Managing Cloud-Native Data on Kubernetes”. He currently works at DataStax in Developer Relations and as a contributor to the Apache Cassandra project. Patrick has worked as Chief Evangelist for Apache Cassandra and as a consultant for DataStax, where he had a great time building some of the largest deployments in production. Prior to DataStax, he held positions as Chief Architect, Engineering Lead and Database DBA/Developer. KEY TAKE-AWAYS FROM THE TALK People should walk away with a better understanding of what it takes to deploy streaming and analytic workloads in Kubernetes.
https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) What does Kubernetes provide that allows us to reduce the complexity of Apache Cassandra while making it better suited for cloud native deployments? That was the question we started with as we began a mission to bring Cassandra closer to Kubernetes and eliminate the redundancy. Many great open source databases have been adapted to run on Kubernetes, without relying on the deep ecosystem of projects that it takes to run in Kubernetes (there is a difference). This talk will discuss the design and implementation of the Astra Serverless Database, which re-architected Apache Cassandra to run only on Kubernetes infrastructure. Built to be optimized for multi-tenancy and auto-scaling, we set out with a design goal to completely separate compute and storage. Decoupling different aspects of Cassandra into scalable services and relying on the benefits of Kubernetes and its ecosystem created a simpler, more powerful database service than a standalone, bare-metal Cassandra cluster. The entire system is now built on Apache Cassandra, Stargate, Etcd, Prometheus, and object storage like Minio or Ceph. In this talk we will discuss the downstream changes coming to several open source projects based on the work we have done. Jake is a lead developer and software architect at DataStax with over 20 years of experience in the areas of distributed systems, finance, and manufacturing. He is a member of the Apache Foundation and is on the project committee of the Apache Cassandra, Arrow, and Thrift projects. Jake has a reputation for developing creative solutions to solve difficult problems and fostering a culture of trust and innovation. He believes the best software is built by small diverse teams who are encouraged to think freely. Jake received his B.S. in Computer Science from Lehigh University along with a minor in Cognitive Science.
https://go.dok.community/slack https://dok.community/ From the DoK Day EU 2022 (https://youtu.be/Xi-h4XNd5tE) The Rap God project acts as a great entry point for incoming open-source enthusiasts who are interested in learning about the cloud native ecosystem. The project uses Kubernetes orchestration for a stateful use case, which is an emerging topic, and acts as a demonstration of how to use such features of Kubernetes. The project will use StatefulSets that deploy Apache Cassandra (for its first cycle), and eventually it will implement the same API endpoints for various databases running on Kubernetes. We in the community intend to do this with PersistentVolumes and PersistentVolumeClaims. Keeping in mind the issues various developers face, we will also provide options for storage classes. The project will allow members to explore how they can customize the whole storage class setup according to their environment. The project will bring Helm, Cassandra, Kubernetes and Argo together and will actively expand its implementation in further iterations. Abhijith Ganesh is an undergraduate computer science major, currently in his freshman year. His areas of interest include DevOps, Kubernetes and open source projects. He is an active member of the DoK Community, where he is currently an intern. He is also a member of the Pyrsia and SeaQL communities.
In this episode, Ryan and Bhavin interview Patrick McFadin, VP of Developer Relations at Datastax, who is a co-author of the upcoming O'Reilly book “Managing Cloud-Native Data on Kubernetes” and a contributor to the Apache Cassandra project. The discussion dives into how K8ssandra helps users deploy Cassandra on Kubernetes clusters, and how customers are using Cassandra as the NoSQL, Distributed DB backend for their applications. We talk about the challenges, benefits, and best practices for running Cassandra on Kubernetes, and what users can look forward to in the near future. Show links: Patrick McFadin - LinkedIn - Twitter K8ssandra.io - https://k8ssandra.io Introduction to Cassandra - Crash Course - Youtube series - https://youtube.com/playlist?list=PL2g2h-wyI4SqCdxdiyi8enEyWvACcUa9R AWS Marketplace - https://aws.amazon.com/marketplace/pp/prodview-iy7gagaxm2foa Cassandra Discord community - https://discord.com/invite/qP5tAt6Uwt Data On Kubernetes - https://www.meetup.com/Data-on-Kubernetes-community/events/ Managing Cloud-Native Data on Kubernetes - https://portworx.com/resource/ebook-managing-cloud-native-data-on-kubernetes/ Cloud-Native News: Docker raises Series-C funding Garden.io raises Series A - $16M funding to combat waste in cloud development Are you Ready for K8s 1.24 NetApp acquires InstaClustr Spring4Shell - Zero Day Remote Code Execution Vulnerability Portworx Enterprise 2.10 Etcd v3.5.[0-2] is not recommended for production Announcing Postgres container apps: Easy deploy Postgres apps
In today's episode of KubernetesBytes, hosts Ryan Wallner and Bhavin Shah discuss the basics of running distributed databases like Apache Cassandra and Kafka along with Mongo, CockroachDB and others on Kubernetes. There are various capabilities of Kubernetes that were designed for these types of data services, and this podcast should help you get a basic understanding of the landscape as well as WHY you may want to run them on Kubernetes. Show Links: https://thenewstack.io/new-tools-for-optimizing-data-resilience-in-kubernetes/ https://awesome-kubernetes.readthedocs.io/ / https://nubenetes.com/ https://www.containiq.com/post/should-you-run-a-database-on-kubernetes Log4j recap - https://blog.aquasec.com/log4j-vulnerabilities-overview IPv6 support for EKS - https://aws.amazon.com/blogs/aws/amazon-elastic-kubernetes-service-adds-ipv6-networking/ https://thenewstack.io/testkube-a-new-approach-to-cloud-native-testing/ GigaOM DP report 2 https://gigaom.com/report/gigaom-radar-for-kubernetes-data-protection-2/ https://portworx.com/blog/kubernetes-failover-mongodb/ https://thenewstack.io/the-perfect-pair-kubernetes-and-distributed-sql/ https://www.purestorage.com/docs.html?item=/type/pdf/subtype/doc/path/content/dam/pdf/en/white-papers/wp-kafka-on-kubernetes-with-portworx.pdf https://docs.datastax.com/en/cassandra-oss/3.0/cassandra/dml/dmlAboutDataConsistency.html https://developer.ibm.com/tutorials/ba-multi-data-center-cassandra-cluster-kubernetes-platform/ https://thenewstack.io/the-perfect-pair-kubernetes-and-distributed-sql/ https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/
Jonathan Ellis, CTO and co-founder of DataStax, has always had a startup mindset. In this episode, Jonathan joins me to discuss his journey and entrepreneurial roadmap thus far. In our conversation, Jonathan shares how he became involved with the Apache Cassandra project and his transition to founding DataStax. He also shares insight on the importance of hiring a go-to-market team, why hiring executives proves to be more challenging than hiring engineers, building a company based around an open-source project, and more. Highlights: Jonathan's views on his identity as a founder and scratching his coding itch through art. (00:23) A look at Jonathan's journey from Mozy to the Apache Cassandra project. (05:40) The history of DataStax - and Jonathan explores the benefits of building a company around open source. (11:33) Lessons learned: the importance of implementing a go-to-market team, DataStax Kubernetes adoption, and why hiring executives is a challenge. (15:58) Jonathan's advice to technical founders - and his perspective and insight on remote work. (27:39) Links: Jonathan LinkedIn: https://www.linkedin.com/in/jbellis/ Twitter: https://twitter.com/spyced DataStax: https://www.datastax.com/
How important is it to find a hosting partner that's a good fit for your company? Rock de Vocht is the Director of Technology, CTO, and Co-Founder at SimSage, the AI powered search platform designed to make finding information more efficient. There are plenty of obstacles to overcome when building a search engine but finding a hosting solution that's suitable and flexible shouldn't be one of them. Rock joined episode three, season two, of our Craft of Code podcast to discuss the technology, infrastructure, processing, and even the language theory behind SimSage's development. We also talked to Rock about his partnership with Linode, including why he switched from Google to Linode, the benefits of cloud hosting, and why human customer support is fundamental to success. In this episode, we discussed: How SimSage connects people with information in the workplace How Rock's background in computational linguistics and languages impacted SimSage's development Where many go wrong with neural networks Why the cloud is a natural fit for SimSage The technology infrastructure behind SimSage SimSage's roadmap for scaling Why Rock made the switch from GCP to Linode Rock's advice for future technologists You can find out more by visiting https://www.linode.com/craft-of-code/ (https://www.linode.com/craft-of-code/) Important Links & Mentions https://simsage.ai/ (SimSage) https://kotlinlang.org/ (Kotlin) https://cassandra.apache.org/_/index.html (Apache Cassandra) https://kubernetes.io/ (Kubernetes) https://www.docker.com/ (Docker) Follow Us https://github.com/linode/ (GitHub) https://www.instagram.com/linode/ (Instagram) https://www.linkedin.com/company/linode/ (LinkedIn) https://twitter.com/linode (Twitter) https://www.youtube.com/linode (YouTube) If you enjoyed our show, then please rate and review us on your podcast app of choice.
DataStax is the open, multi-cloud stack for modern data apps. DataStax gives enterprises the freedom of choice, simplicity, and true cloud economics to deploy massive data, delivered via APIs, powering rich interactions on multi-cloud, open source and Kubernetes. DataStax is built on proven Apache Cassandra™ and the Stargate™ open source API platform. DataStax Astra is the new stack for modern data apps as-a-service, built on the scale-out, cloud-native, open source K8ssandra™. DataStax powers modern data apps for 500 of the world's most demanding enterprises including The Home Depot, T-Mobile, Intuit and half of the Fortune 100. https://www.datastax.com/
On this episode of This Week in Linux, we'll cover the modular laptop that respects your Right to Repair called the Framework laptop. In the Distro News, we've got updates from Linux Mint and a very interesting potential plan for a Rolling Release edition of Pop!_OS from System76. We're going to jump into an enterprise grade tool with Apache Cassandra 4.0. Then in the Linux Mobile News, we've got an interview with Rudi Timmermans of the WayDroid project which is looking to make it possible to run Android Apps on Linux Phones like Ubuntu Touch. We've also got news from NVIDIA, they've released new Drivers with a lot of great features and they've even Open Sourced some content. All that and so much more on episode 162 of This Week in Linux, recorded live on July 31, 2021. Your Weekly Source for Linux GNews! SPONSORED BY: Digital Ocean ►► https://do.co/dln-mongo Bitwarden ►► https://bitwarden.com/dln TWITTER ►► https://twitter.com/michaeltunnell MASTODON ►► https://mastodon.social/@MichaelTunnell DLN COMMUNITY ►► https://destinationlinux.network/contact FRONT PAGE LINUX ►► https://frontpagelinux.com MERCH ►► https://dlnstore.com BECOME A PATRON ►► https://tuxdigital.com/contribute This Week in Linux is produced by the Destination Linux Network: https://destinationlinux.network SHOW NOTES ►► https://tuxdigital.com/twil162 00:00 = Welcome to TWIL 162 01:24 = Framework Modular Laptop Respects Your Right to Repair 09:37 = Element Raises $30 Million 13:50 = Linux Mint Getting New Website 16:59 = Digital Ocean: Managed MongoDB ( https://do.co/dln-mongo ) 18:16 = Pop!_OS Rolling Release? 
22:16 = Apache Cassandra 4.0 Released 25:08 = WayDroid: Android Apps On Linux Phones 32:00 = Bitwarden Password Manager ( https://bitwarden.com/dln ) 33:59 = NVIDIA Drivers Security Bugs & Open Source 37:47 = K-9 Mail 5.800 Released 40:27 = Humble Bundles: Programming Games & More 42:58 = Outro Other Videos: 7 Reasons Why Firefox Is My Favorite Web Browser: https://youtu.be/bGTBH9yr8uw How To Use Firefox's Best Feature, Multi-Account Containers: https://youtu.be/FfN5L5zAJUo 5 Reasons Why I Use KDE Plasma: https://youtu.be/b0KA6IsO1M8 6 Cool Things You Didn't Know About Linux's History: https://youtu.be/u9ZY41mNB9I Thanks For Watching! Linux #TechNews #Podcast
Kirill Gavrylyuk and friends join Scott Hanselman to discuss Azure Cosmos DB updates: integrated cache, serverless for MongoDB API, and Managed Instance for Apache Cassandra with dual write proxy. [0:00:00] Opening; [0:01:33] Integrated cache with Tim Sander; [0:17:36] Serverless for MongoDB API with Gahl Levy; [0:24:00] Managed Instance for Apache Cassandra with Theo van Kraay; [0:37:35] Wrap-up. Links: How to configure the Azure Cosmos DB integrated cache (Preview); Azure Cosmos DB serverless; Azure Managed Instance for Apache Cassandra documentation; GitHub - Azure-Samples / cassandra-proxy; Learning path: Work with NoSQL data in Azure Cosmos DB; Create a free account (Azure)
Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
A flexible API is key to database accessibility and developer friendliness today. Apache Cassandra was lacking in that department, and DataStax is trying to address this with the release of a new API layer called Stargate. A discussion with Ed Anuff, formerly of Apigee and Google Cloud, and currently DataStax Chief Product Officer, on the rationale behind Stargate, its architecture and operation, how it compares to GraphQL, and a roadmap for the future. Article published on ZDNet
Welcome to our 5th episode. This is the second part of a two-part series where we go deep into the internals of Yugabyte with Karthik and Kannan. Yugabyte is a highly scalable and developer-friendly open source distributed SQL database. Yugabyte is built by an ex-Facebook team that wanted to bring what they learnt running one of the largest databases on the planet out into the open source world. One thing I find really fascinating with Yugabyte is that they are fully compatible with Postgres, Redis and Apache Cassandra, which makes it easy to replace a lot of infrastructure with just Yugabyte. Hope you enjoy the listen, and remember to subscribe for many more of these deep technical discussions. Our guests for this episode are: Kannan Muthukkaruppan, Founder & President, Product Dev. @ Yugabyte; Karthik Ranganathan, Founder & CTO @ Yugabyte. Links: Kudu: Storage for Fast Analytics on Fast Data - https://kudu.apache.org/kudu.pdf Under the Hood: Building and open-sourcing RocksDB - https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-and-open-sourcing-rocksdb/10151822347683920/ The Log-Structured Merge-Tree (LSM-Tree) - https://www.cs.umb.edu/~poneil/lsmtree.pdf Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases - https://dl.acm.org/doi/epdf/10.1145/3035918.3056101
Orchestrate all the Things podcast: Connecting the Dots with George Anadiotis
Your good old on-premises SQL database is in terminal decline. A pure-play open-source cloud-native PostgreSQL, with support for Apache Cassandra and GraphQL interfaces, is what you need. Or at least, this is what the Yugabyte crew thinks. The company, founded by Facebook data infrastructure veterans, announced that it has raised $30 million in an oversubscribed Series B round to double down on community and team growth. This is a crowded market, but big enough to be a non-zero-sum game. We connected with Yugabyte founders Kannan Muthukkaruppan and Karthik Ranganathan, and newly recruited CEO Bill Cook, previously of Sun Microsystems and Pivotal, for a deep dive into the company, the funding, and the market. Article published on ZDNet in June 2020
In episode 25 of EnterpriseReady, Grant speaks with Jonathan Ellis of DataStax about Apache Cassandra and the complexity of distributed storage systems, as well as recruiting, interviewing, and hiring executives and engineers.
Kirill Gavrylyuk returns to Azure Friday to update Scott Hanselman on what's new in Azure Cosmos DB, such as the Cassandra API for applications that are written for Apache Cassandra, updates to the Azure Table storage API, the Apache Spark Connector, the Graph API, partitioned collections, 99.999% (five 9s) SLA, and more. Links: Dear Cassandra Developers, welcome to Azure #CosmosDB!; Introduction to Azure Cosmos DB Table API; Apache Spark to Azure #CosmosDB Connector is now generally available; Azure #CosmosDB Graph API now generally available; Partition and scale in Azure Cosmos DB; Create a Free Account (Azure). Follow @SHanselman, @AzureFriday, @kirillg_msft