Podcasts about OLTP

  • 56 podcasts
  • 129 episodes
  • 37m average duration
  • 1 episode every other week
  • Latest: Apr 29, 2025

POPULARITY

(Popularity trend chart, 2017–2024)


Best podcasts about OLTP

Latest podcast episodes about OLTP

Oracle University Podcast
What is Oracle GoldenGate 23ai?

Apr 29, 2025 · 18:03


In a new season of the Oracle University Podcast, Lois Houston and Nikita Abraham dive into the world of Oracle GoldenGate 23ai, a cutting-edge software solution for data management. They are joined by Nick Wagner, a seasoned expert in database replication, who provides a comprehensive overview of this powerful tool.   Nick highlights GoldenGate's ability to ensure continuous operations by efficiently moving data between databases and platforms with minimal overhead. He emphasizes its role in enabling real-time analytics, enhancing data security, and reducing costs by offloading data to low-cost hardware. The discussion also covers GoldenGate's role in facilitating data sharing, improving operational efficiency, and reducing downtime during outages.   Oracle GoldenGate 23ai: Fundamentals: https://mylearn.oracle.com/ou/course/oracle-goldengate-23ai-fundamentals/145884/237273 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode. ---------------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Team Lead: Editorial Services with Oracle University, and with me is Lois Houston: Director of Innovation Programs. Lois: Hi everyone! Welcome to a new season of the podcast. This time, we're focusing on the fundamentals of Oracle GoldenGate. Oracle GoldenGate helps organizations manage and synchronize their data across diverse systems and databases in real time.  And with the new Oracle GoldenGate 23ai release, we'll uncover the latest innovations and features that empower businesses to make the most of their data. Nikita: Taking us through this is Nick Wagner, Senior Director of Product Management for Oracle GoldenGate. He's been doing database replication for about 25 years and has been focused on GoldenGate on and off for about 20 of those years.  01:18 Lois: In today's episode, we'll ask Nick to give us a general overview of the product, along with some use cases and benefits. Hi Nick! To start with, why do customers need GoldenGate? Nick: Well, it delivers continuous operations, being able to continuously move data from one database to another database or data platform in efficiently and a high-speed manner, and it does this with very low overhead. Almost all the GoldenGate environments use transaction logs to pull the data out of the system, so we're not creating any additional triggers or very little overhead on that source system. GoldenGate can also enable real-time analytics, being able to pull data from all these different databases and move them into your analytics system in real time can improve the value that those analytics systems provide. Being able to do real-time statistics and analysis of that data within those high-performance custom environments is really important. 02:13 Nikita: Does it offer any benefits in terms of cost?  Nick: GoldenGate can also lower IT costs. A lot of times people run these massive OLTP databases, and they are running reporting in those same systems. 
With GoldenGate, you can offload some of the data or all the data to a low-cost commodity hardware where you can then run the reports on that other system. So, this way, you can get back that performance on the OLTP system, while at the same time optimizing your reporting environment for those long running reports. You can improve efficiencies and reduce risks. Being able to reduce the amount of downtime during planned and unplanned outages can really make a big benefit to the overall operational efficiencies of your company.  02:54 Nikita: What about when it comes to data sharing and data security? Nick: You can also reduce barriers to data sharing. Being able to pull subsets of data, or just specific pieces of data out of a production database and move it to the team or to the group that needs that information in real time is very important. And it also protects the security of your data by only moving in the information that they need and not the entire database. It also provides extensibility and flexibility, being able to support multiple different replication topologies and architectures. 03:24 Lois: Can you tell us about some of the use cases of GoldenGate? Where does GoldenGate truly shine?  Nick: Some of the more traditional use cases of GoldenGate include use within the multicloud fabric. Within a multicloud fabric, this essentially means that GoldenGate can replicate data between on-premise environments, within cloud environments, or hybrid, cloud to on-premise, on-premise to cloud, or even within multiple clouds. So, you can move data from AWS to Azure to OCI. You can also move between the systems themselves, so you don't have to use the same database in all the different clouds. For example, if you wanted to move data from AWS Postgres into Oracle running in OCI, you can do that using Oracle GoldenGate. We also support maximum availability architectures. And so, there's a lot of different use cases here, but primarily geared around reducing your recovery point objective and recovery time objective. 04:20 Lois: Ah, reducing RPO and RTO. That must have a significant advantage for the customer, right? Nick: So, reducing your RPO and RTO allows you to take advantage of some of the benefits of GoldenGate, being able to do active-active replication, being able to set up GoldenGate for high availability, real-time failover, and it can augment your active Data Guard and Data Guard configuration. So, a lot of times GoldenGate is used within Oracle's maximum availability architecture platinum tier level of replication, which means that at that point you've got lots of different capabilities within the Oracle Database itself. But to help eke out that last little bit of high availability, you want to set up an active-active environment with GoldenGate to really get true zero RPO and RTO. GoldenGate can also be used for data offloading and data hubs. Being able to pull data from one or more source systems and move it into a data hub, or into a data warehouse for your operational reporting. This could also be your analytics environment too. 05:22 Nikita: Does GoldenGate support online migrations? Nick: In fact, a lot of companies actually get started in GoldenGate by doing a migration from one platform to another. Now, these don't even have to be something as complex as going from one database like a DB2 on-premise into an Oracle on OCI, it could even be simple migrations. 
A lot of times doing something like a major application or a major database version upgrade is going to take downtime on that production system. You can use GoldenGate to eliminate that downtime. So this could be going from Oracle 19c to Oracle 23ai, or going from application version 1.0 to application version 2.0, because GoldenGate can do the transformation between the different application schemas. You can use GoldenGate to migrate your database from on premise into the cloud with no downtime as well. We also support real-time analytic feeds, being able to go from multiple databases, not only those on premise, but being able to pull information from different SaaS applications inside of OCI and move it to your different analytic systems. And then, of course, we also have the ability to stream events and analytics within GoldenGate itself.  06:34 Lois: Let's move on to the various topologies supported by GoldenGate. I know GoldenGate supports many different platforms and can be used with just about any database. Nick: This first layer of topologies is what we usually consider relational database topologies. And so this would be moving data from Oracle to Oracle, Postgres to Oracle, Sybase to SQL Server, a lot of different types of databases. So the first architecture would be unidirectional. This is replicating from one source to one target. You can do this for reporting. If I wanted to offload some reports into another server, I can go ahead and do that using GoldenGate. I can replicate the entire database or just a subset of tables. I can also set up GoldenGate for bidirectional, and this is what I want to set up GoldenGate for something like high availability. So in the event that one of the servers crashes, I can almost immediately reconnect my users to the other system. And that almost immediately depends on the amount of latency that GoldenGate has at that time. So a typical latency is anywhere from 3 to 6 seconds. So after that primary system fails, I can reconnect my users to the other system in 3 to 6 seconds. And I can do that because as GoldenGate's applying data into that target database, that target system is already open for read and write activity. GoldenGate is just another user connecting in issuing DML operations, and so it makes that failover time very low. 07:59 Nikita: Ok…If you can get it down to 3 to 6 seconds, can you bring it down to zero? Like zero failover time?   Nick: That's the next topology, which is active-active. And in this scenario, all servers are read/write all at the same time and all available for user activity. And you can do multiple topologies with this as well. You can do a mesh architecture, which is where every server talks to every other server. This works really well for 2, 3, 4, maybe even 5 environments, but when you get beyond that, having every server communicate with every other server can get a little complex. And so at that point we start looking at doing what we call a hub and spoke architecture, where we have lots of different spokes. At the end of each spoke is a read/write database, and then those communicate with a hub. So any change that happens on one spoke gets sent into the hub, and then from the hub it gets sent out to all the other spokes. And through that architecture, it allows you to really scale up your environments. We have customers that are doing up to 150 spokes within that hub architecture. 
Within active-active replication as well, we can do conflict detection and resolution, which means that if two users modify the same row on two different systems, GoldenGate can actually determine that there was an issue with that and determine what user wins or which row change wins, which is extremely important when doing active-active replication. And this means that if one of those systems fails, there is no downtime when you switch your users to another active system because it's already available for activity and ready to go. 09:35 Lois: Wow, that's fantastic. Ok, tell us more about the topologies. Nick: GoldenGate can do other things like broadcast, sending data from one system to multiple systems, or many to one as far as consolidation. We can also do cascading replication, so when data moves from one environment that GoldenGate is replicating into another environment that GoldenGate is replicating. By default, we ignore all of our own transactions. But there's actually a toggle switch that you can flip that says, hey, GoldenGate, even though you wrote that data into that database, still push it on to the next system. And then of course, we can also do distribution of data, and this is more like moving data from a relational database into something like a Kafka topic or a JMS queue or into some messaging service. 10:24 Raise your game with the Oracle Cloud Applications skills challenge. Get free training on Oracle Fusion Cloud Applications, Oracle Modern Best Practice, and Oracle Cloud Success Navigator. Pass the free Oracle Fusion Cloud Foundations Associate exam to earn a Foundations Associate certification. Plus, there's a chance to win awards and prizes throughout the challenge! What are you waiting for? Join the challenge today by visiting visit oracle.com/education. 10:58 Nikita: Welcome back! Nick, does GoldenGate also have nonrelational capabilities?  Nick: We have a number of nonrelational replication events in topologies as well. This includes things like data lake ingestion and streaming ingestion, being able to move data and data objects from these different relational database platforms into data lakes and into these streaming systems where you can run analytics on them and run reports. We can also do cloud ingestion, being able to move data from these databases into different cloud environments. And this is not only just moving it into relational databases with those clouds, but also their data lakes and data fabrics. 11:38 Lois: You mentioned a messaging service earlier. Can you tell us more about that? Nick: Messaging replication is also possible. So we can actually capture from things like messaging systems like Kafka Connect and JMS, replicate that into a relational data, or simply stream it into another environment. We also support NoSQL replication, being able to capture from MongoDB and replicate it onto another MongoDB for high availability or disaster recovery, or simply into any other system. 12:06 Nikita: I see. And is there any integration with a customer's SaaS applications? Nick: GoldenGate also supports a number of different OCI SaaS applications. And so a lot of these different applications like Oracle Financials Fusion, Oracle Transportation Management, they all have GoldenGate built under the covers and can be enabled with a flag that you can actually have that data sent out to your other GoldenGate environment. So you can actually subscribe to changes that are happening in these other systems with very little overhead. 
And then of course, we have event processing and analytics, and this is the final topology or flexibility within GoldenGate itself. And this is being able to push data through data pipelines, doing data transformations. GoldenGate is not an ETL tool, but it can do row-level transformation and row-level filtering.  12:55 Lois: Are there integrations offered by Oracle GoldenGate in automation and artificial intelligence? Nick: We can do time series analysis and geofencing using the GoldenGate Stream Analytics product. It allows you to actually do real time analysis and time series analysis on data as it flows through the GoldenGate trails. And then that same product, the GoldenGate Stream Analytics, can then take the data and move it to predictive analytics, where you can run MML on it, or ONNX or other Spark-type technologies and do real-time analysis and AI on that information as it's flowing through.  13:29 Nikita: So, GoldenGate is extremely flexible. And given Oracle's focus on integrating AI into its product portfolio, what about GoldenGate? Does it offer any AI-related features, especially since the product name has “23ai” in it? Nick: With the advent of Oracle GoldenGate 23ai, it's one of the two products at this point that has the AI moniker at Oracle. Oracle Database 23ai also has it, and that means that we actually do stuff with AI. So the Oracle GoldenGate product can actually capture vectors from databases like MySQL HeatWave, Postgres using pgvector, which includes things like AlloyDB, Amazon RDS Postgres, Aurora Postgres. We can also replicate data into Elasticsearch and OpenSearch, or if the data is using vectors within OCI or the Oracle Database itself. So GoldenGate can be used for a number of things here. The first one is being able to migrate vectors into the Oracle Database. So if you're using something like Postgres, MySQL, and you want to migrate the vector information into the Oracle Database, you can. Now one thing to keep in mind here is a vector is oftentimes like a GPS coordinate. So if I need to know the GPS coordinates of Austin, Texas, I can put in a latitude and longitude and it will give me the GPS coordinates of a building within that city. But if I also need to know the altitude of that same building, well, that's going to be a different algorithm. And GoldenGate and replicating vectors is the same way. When you create a vector, it's essentially just creating a bunch of numbers under the screen, kind of like those same GPS coordinates. The dimension and the algorithm that you use to generate that vector can be different across different databases, but the actual meaning of that data will change. And so GoldenGate can replicate the vector data as long as the algorithm and the dimensions are the same. If the algorithm and the dimensions are not the same between the source and the target, then you'll actually want GoldenGate to replicate the base data that created that vector. And then once GoldenGate replicates the base data, it'll actually call the vector embedding technology to re-embed that data and produce that numerical formatting for you.  15:42 Lois: So, there are some nuances there… Nick: GoldenGate can also replicate and consolidate vector changes or even do the embedding API calls itself. This is really nice because it means that we can take changes from multiple systems and consolidate them into a single one. We can also do the reverse of that too. A lot of customers are still trying to find out which algorithms work best for them. 
How many dimensions? What's the optimal use? Well, you can now run those in different servers without impacting your actual AI system. Once you've identified which algorithm and dimension is going to be best for your data, you can then have GoldenGate replicate that into your production system and we'll start using that instead. So it's a nice way to switch algorithms without taking extensive downtime. 16:29 Nikita: What about in multicloud environments?  Nick: GoldenGate can also do multicloud and N-way active-active Oracle replication between vectors. So if there's vectors in Oracle databases, in multiple clouds, or multiple on-premise databases, GoldenGate can synchronize them all up. And of course we can also stream changes from vector information, including text as well into different search engines. And that's where the integration with Elasticsearch and OpenSearch comes in. And then we can use things like NVIDIA and Cohere to actually do the AI on that data.  17:01 Lois: Using GoldenGate with AI in the database unlocks so many possibilities. Thanks for that detailed introduction to Oracle GoldenGate 23ai and its capabilities, Nick.  Nikita: We've run out of time for today, but Nick will be back next week to talk about how GoldenGate has evolved over time and its latest features. And if you liked what you heard today, head over to mylearn.oracle.com and take a look at the Oracle GoldenGate 23ai Fundamentals course to learn more. Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 17:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
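A rough SQL sketch of the dimension-matching constraint Nick describes for replicating vectors, assuming an Oracle Database 23ai target with an in-database ONNX embedding model; the table, column, and model names are hypothetical:

```sql
-- Hypothetical target table. Incoming vectors must match this dimension and
-- format (384 x FLOAT32); vectors generated with a different algorithm or
-- dimension cannot simply be copied across.
CREATE TABLE product_docs (
  doc_id      NUMBER PRIMARY KEY,
  description VARCHAR2(4000),
  embedding   VECTOR(384, FLOAT32)
);

-- When source and target embeddings differ, replicate the base text instead
-- and re-embed on the target, for example with an ONNX model previously loaded
-- into the database (the model name doc_model is an assumption for illustration).
UPDATE product_docs
   SET embedding = VECTOR_EMBEDDING(doc_model USING description AS data);
```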

BlueDragon Podcast
S02E04 Azure Integration for Business - Josh Garverick

Apr 29, 2025 · 51:36


Joshua Garverick, co-author of the Azure Integration Guide for Business, discusses his journey into the tech industry, his experiences with Azure, and the importance of cloud integration for IT leaders. The conversation covers various themes including the benefits of moving to Azure, the cultural shifts required for cloud adoption, architectural considerations for cloud migration, the significance of network design, and the financial implications of cloud services through FinOps. In this conversation, Jetro and Josh discuss the critical aspects of cloud operations, focusing on Cloud FinOps, automation, cybersecurity, and the Azure ecosystem. They emphasize the importance of investing in skills for IT operations, the role of automation in enhancing security, and best practices for OLTP systems in Azure. The discussion also covers the significance of governance and security in cloud operations, the reality of serverless computing, and the future of Azure with technological innovations.

CHAPTERS
(00:00:00) INTRO
(00:00:42) Introduction to Azure Integration and Author Background
(00:05:33) Unlocking Opportunities with Azure for IT Leaders
(00:10:09) Cultural Shifts in Cloud Adoption
(00:12:04) Architectural Considerations for Cloud Migration
(00:16:39) The Importance of Network Design in Azure
(00:21:50) Understanding Cloud Costs and FinOps
(00:25:12) Understanding Cloud FinOps and Cost Management
(00:25:45) The Importance of Automation in Cloud Operations
(00:30:33) Investing in Skills for IT Operations
(00:31:38) The Role of Automation in Cybersecurity
(00:32:09) Best Practices for OLTP Systems in Azure
(00:35:07) Exploring the Azure Ecosystem for Data Analytics
(00:37:33) Serverless Computing: Hype or Reality?
(00:43:28) Governance and Security in Cloud Operations
(00:45:47) The Future of Azure and Technological Innovations

MLOps.community
Streaming Ecosystem Complexities and Cost Management // Rohit Agarwal // #302

Apr 4, 2025 · 48:51


Streaming Ecosystem Complexities and Cost Management // MLOps Podcast #302 with Rohit Agarwal, Director of Engineering at Tecton.

Join the Community: https://go.mlops.community/YTJoin
Get the newsletter: https://go.mlops.community/YTNewsletter

// Abstract
Demetrios talks with Rohit Agarwal, Director of Engineering at Tecton, about the challenges and future of streaming data in ML. Rohit shares his path at Tecton and insights on managing real-time and batch systems. They cover tool fragmentation (Kafka, Flink, etc.), infrastructure costs, managed services, and trends like using S3 for storage and Iceberg as the GitHub for data. The episode wraps with thoughts on BYOC solutions and evolving data architectures.

// Bio
Rohit Agrawal is an Engineering Manager at Tecton, leading the Real-Time Execution team. Before Tecton, Rohit was a Lead Software Engineer at Salesforce, where he focused on transaction processing and storage in OLTP relational databases. He holds a Master's Degree in Computer Systems from Carnegie Mellon University and a Bachelor's Degree in Electrical Engineering from the Birla Institute of Technology and Science in Pilani, India.

// Related Links
~~~~~~~~ ✌️ Connect With Us ✌️ ~~~~~~~
Catch all episodes, blogs, newsletters, and more: https://go.mlops.community/TYExplore
Join our Slack community: https://go.mlops.community/slack
Follow us on X/Twitter: @mlopscommunity (https://x.com/mlopscommunity) or LinkedIn: https://go.mlops.community/linkedin
Sign up for the next meetup: https://go.mlops.community/register
MLOps Swag/Merch: https://shop.mlops.community/
Connect with Demetrios on LinkedIn: /dpbrinkm
Connect with Rohit on LinkedIn: /agrawalrohit10

The Changelog
The 1000x faster financial database (Interview)

Apr 2, 2025 · 100:28


In July of 2020, Joran Dirk Greef stumbled into a fundamental limitation in the general-purpose database design for transaction processing. This sent him on a path that ended with TigerBeetle, a redesigned distributed database for financial transactions that yielded three orders of magnitude faster OLTP performance over the usual (general-purpose) suspects. On this episode, Joran joins Jerod to explain how TigerBeetle got so fast, to defend its resilience and durability claims as a new market entrant, and to stake his claim at the intersection of open source and business. Oh, plus the age old question: Why Zig?

Changelog Master Feed
The 1000x faster financial database (Changelog Interviews #635)

Apr 2, 2025 · 100:28


In July of 2020, Joran Dirk Greef stumbled into a fundamental limitation in the general-purpose database design for transaction processing. This sent him on a path that ended with TigerBeetle, a redesigned distributed database for financial transactions that yielded three orders of magnitude faster OLTP performance over the usual (general-purpose) suspects. On this episode, Joran joins Jerod to explain how TigerBeetle got so fast, to defend its resilience and durability claims as a new market entrant, and to stake his claim at the intersection of open source and business. Oh, plus the age old question: Why Zig?

What's New In Data
Scaling Databases in the AI Era: Insights from Andy Pavlo (Carnegie Mellon University)

Mar 18, 2025 · 70:59 · Transcription available


Join us for a deep dive into the world of databases with CMU professor Andy Pavlo. We discuss everything from OLTP vs. OLAP to the challenges of distributed databases, and why cloud-native databases require a fundamentally different approach than legacy systems. We also discuss modern vector databases, RAG, embeddings, text-to-SQL, and industry trends.

You can follow Andy's work on Bluesky and YouTube.

What's New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What's New In Data hosts industry practitioners to discuss the latest trends, common patterns for real-world data, and analytics success stories.

Scrum Master Toolbox Podcast
BONUS Implementing Agile Practices for Data and Analytics Teams | Henrik Reich

Mar 14, 2025 · 37:49


Global Agile Summit Preview: Implementing Agile Practices for Data and Analytics Teams with Henrik Reich

In this BONUS Global Agile Summit preview episode, we dive into the world of Agile methodologies specifically tailored for data and analytics teams. Henrik Reich, Principal Architect at twoday Data & AI Denmark, shares his expertise on how data teams can adapt Agile principles to their unique needs, the challenges they face, and practical tips for successful implementation.

The Evolution of Data Teams
"Data and analytics work is moving more and more to be like software development."
The landscape of data work is rapidly changing. Henrik explains how data teams are increasingly adopting software development practices, yet there remains a significant knowledge gap in effectively using certain tools. This transition creates both opportunities and challenges for organizations looking to implement Agile methodologies in their data teams. Henrik emphasizes that as data projects become more complex, the need for structured yet flexible approaches becomes critical.

Dynamic Teams in the Data and Analytics World
"When we do sprint planning, we have to assess who is available. Not always the same people are available."
Henrik introduces the concept of "dynamic teams," particularly relevant in consulting environments. Unlike traditional Agile teams with consistent membership, data teams often work with fluctuating resources. This requires a unique approach to sprint planning and task assignment. Henrik describes how this dynamic structure affects team coordination, knowledge sharing, and project continuity, offering practical strategies for maintaining momentum despite changing team composition.

Customizing Agile for Data and Analytics Teams
"In data and analytics, tools have ignored agile practices for a long time."
Henrik emphasizes that Agile isn't a one-size-fits-all solution, especially for data teams. He outlines the unique challenges these teams face:
  • Team members have varying expectations based on their backgrounds
  • Experienced data professionals sometimes skip quality practices
  • Traditional data tools weren't designed with Agile methodologies in mind
When adapting Agile for data teams, Henrik recommends focusing on three key areas:
  • People and their expertise
  • Technology selection
  • Architecture decisions
The overarching goal remains consistent: "How can we deliver as quickly as possible, and keep the good mood of the team?"

Implementing CI/CD in Data Projects
"Our first approach is to make CI/CD available in the teams."
Continuous Integration and Continuous Deployment (CI/CD) practices are essential but often challenging to implement in data teams. Henrik shares how his organization creates "Accelerators" - tools and practices that enable teams to adopt CI/CD effectively. These accelerators address both technological requirements and new ways of working. Through practical examples, he demonstrates how teams can overcome common obstacles, such as version control challenges specific to data projects. In this segment, we refer to the book How to Succeed with Agile Business Intelligence by Raphael Branger.

Practical Tips for Agile Adoption
"Start small. Don't ditch scrum, take it as an inspiration."
For data teams looking to adopt Agile practices, Henrik offers pragmatic advice:
  • Begin with small, manageable changes
  • Use established frameworks like Scrum as inspiration rather than rigid rules
  • Practice new methodologies together as a team to build collective understanding
  • Adapt processes based on team feedback and project requirements
This approach allows data teams to embrace Agile principles while accounting for their unique characteristics and constraints.

The Product Owner Challenge
"CxOs are the biggest users of these systems."
A common challenge in data teams is the emergence of "accidental product owners" - individuals who find themselves in product ownership roles without clear preparation. Henrik explains why this happens and offers solutions:
  • Clearly identify who owns the project from the outset
  • Consider implementing a "Proxy PO" role between executives and Agile data teams
  • Recognize the importance of having the right stakeholder engagement for requirements gathering and feedback
Henrik also highlights the diversity within data teams, noting there are typically "people who code for living, and people who live for coding." This diversity presents both challenges and opportunities for Agile implementation.

Fostering Creativity in Structured Environments
"Use sprint goals to motivate a team, and help everyone contribute."
Data work often requires creative problem-solving - something that can seem at odds with structured Agile frameworks. Henrik discusses how to balance these seemingly conflicting needs by:
  • Recognizing individual strengths within the team
  • Organizing work to leverage these diverse abilities
  • Using sprint goals to provide direction while allowing flexibility in approach
This balanced approach helps maintain the benefits of Agile structure while creating space for the creative work essential to solving complex data problems.

About Henrik Reich
Henrik is a Principal Architect and developer in the R&D Department at twoday Data & AI Denmark. With deep expertise in OLTP and OLAP, he is a strong advocate of Agile development, automation, and continuous learning. He enjoys biking, music, technical blogging, and speaking at events on data and AI topics. You can link with Henrik Reich on LinkedIn and follow Henrik Reich's blog.

Oracle University Podcast
Monitoring MySQL and HeatWave

Feb 25, 2025 · 21:02


In this episode, Lois Houston and Nikita Abraham chat with MySQL expert Perside Foster on the importance of keeping MySQL performing at its best. They discuss the essential tools for monitoring MySQL, tackling slow queries, and boosting overall performance.   They also explore HeatWave, the powerful real-time analytics engine that brings machine learning and cross-cloud flexibility into MySQL.   MySQL 8.4 Essentials: https://mylearn.oracle.com/ou/course/mysql-84-essentials/141332/226362 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X: https://x.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Kris-Ann Nansen, Radhika Banka, and the OU Studio Team for helping us create this episode.   ----------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Lois: Welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Team Lead: Editorial Services. Nikita: Hey everyone! In our last two episodes, we spoke about MySQL backups, exploring their critical role in data recovery, error correction, data migration, and more. Lois: Today, we're switching gears to talk about monitoring MySQL instances. We'll also explore the features and benefits of HeatWave with Perside Foster, a MySQL Principal Solution Engineer at Oracle. 01:02 Nikita: Hi, Perside! We're thrilled to have you here for one last time this season. So, let's start by discussing the importance of monitoring systems in general, especially when it comes to MySQL. Perside: Database administrators face a lot of challenges, and these sometimes appear in the form of questions that a DBA must answer. One of the most basic question is, why is the database slow? To address this, the next step is to determine which queries are taking the longest. Queries that take a long time might be because they are not correctly indexed. Then we get to some environmental queries or questions. How can we find out if our replicas are out of date? If lag is too much of a problem? Can I restore my last backup? Is the database storage likely to fill up any time soon? Can and should we consider adding more servers and scaling out the system? And when it comes to users and making sure they're behaving correctly, has the database structure changed? And if so, who did it and what did they do? And more generally, what security issues have arisen? How can I see what has happened and how can I fix it? Performance is always at the top of the list of things a DBA worries about. The underlying hardware will always be a factor but is one of the things a DBA has the least flexibility with changing over the short time. The database structure, choice of data types and the overall size of retained data in the active data set can be a problem. 03:01 Nikita: What are some common performance issues that database administrators encounter? Perside: The sort of SQL queries that the application runs can be an issue. 90% of performance problems come from the SQL index and schema group.  03:18 Lois: Perside, can you give us a checklist of the things we should monitor? Perside: Make sure your system is working. Monitor performance continually. 
Make sure replication is working. Check your backup. Keep an eye on disk space and how it grows over time. Check when long running queries block your application and identify those queries. Protect your database structure from unauthorized changes. Make sure the operating system itself is working fine and check that nothing unusual happened at that level. Keep aware of security vulnerabilities in your software and operating system and ensure that they are kept updated. Verify that your database memory usage is under control. 04:14 Lois: That's a great list, Perside. Thanks for that. Now, what tools can we use to effectively monitor MySQL?     Perside: The slow query log is a simple way to monitor long running queries. Two variables control the log queries. Long_query_time. If a query takes longer than this many seconds, it gets logged. And then there's min_exam_row_limit. If a query looks at more than this many rows, it gets logged. The slow query log doesn't ordinarily record administrative statements or queries that don't use indexes. Two variables control this, log_slow_admin_statements and log_queries_not_using_indexes. Once you have found a query that takes a long time to run, you can focus on optimizing the application, either by limiting this type of query or by optimizing it in some way. 05:23 Nikita: Perside, what tools can help us optimize slow queries and manage data more efficiently? Perside: To help you with processing the slow query log file, you can use the MySQL dump slow command to summarize slow queries. Another important monitoring feature of MySQL is the performance schema. It's a system database that provides statistics of how MySQL executes at a low level. Unlike user databases, performance schema does not persist data to disk. It uses its own storage engine that is flushed every time we start MySQL. And it has almost no interaction with the storage media, making it very fast. This performance information belongs only to the specific instance, so it's not replicated to other systems. Also, performance schema does not grow infinitely large. Instead, each row is recorded in a fixed size ring buffer. This means that when it's full, it starts again at the beginning. The SYS schema is another system database that's strongly related to performance schema. 06:49 Nikita: And how can the SYS schema enhance our monitoring efforts in MySQL? Perside: It contains helper objects like views and stored procedures. They help simplify common monitoring tasks and can help monitor server health and diagnose performance issues. Some of the views provide insights into I/O hotspots, blocking and locking issues, statements that use a lot of resources in various statistics on your busiest tables and indexes. 07:26 Lois: Ok… can you tell us about some of the features within the broader Oracle ecosystem that enhance our ability to monitor MySQL? Perside: As an Oracle customer, you also have access to Oracle Enterprise Manager. This tool supports a huge range of Oracle products. And for MySQL, it's used to monitor performance, system availability, your replication topology, InnoDB performance characteristics and locking, bad queries caught by the MySQL Enterprise firewall, and events that are raised by the MySQL Enterprise audit. 08:08 Nikita: What would you say are some of the standout features of Oracle Enterprise Manager? Perside: When you use MySQL in OCI, you have access to some really powerful features. HeatWave MySQL enables continuous monitoring of query statistics and performance. 
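Before the discussion moves on to the OCI-side tooling, here is a minimal sketch of the slow query log setup and a few sys schema checks Perside describes; the server variables involved are long_query_time, min_examined_row_limit, log_slow_admin_statements, and log_queries_not_using_indexes, and the threshold values below are arbitrary examples:

```sql
-- Enable the slow query log and set logging thresholds (example values).
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 2;              -- log statements running longer than 2 seconds
SET GLOBAL min_examined_row_limit = 10000;   -- log statements that examine 10,000+ rows
SET GLOBAL log_slow_admin_statements = ON;   -- also log administrative statements
SET GLOBAL log_queries_not_using_indexes = ON;

-- A few sys schema helper views for common monitoring tasks.
SELECT * FROM sys.statement_analysis LIMIT 10;          -- most expensive normalized statements
SELECT * FROM sys.innodb_lock_waits;                    -- current blocking and locking issues
SELECT * FROM sys.io_global_by_file_by_bytes LIMIT 10;  -- I/O hotspots by file
```

The mysqldumpslow utility can then summarize the resulting slow query log file by query pattern.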
The health monitor is part of the MySQL server and gathers raw data about the performance of queries. You can see summaries of this information in the Performance Hub in the OCI Console. For example, you can see average statement latency or top 100 statements executed. MySQL metrics lets you drill in with your own custom monitoring queries. This works well with existing OCI features that you might already know. The observability and management framework lets you filter by resource type and across several dimensions. And you can configure OCI alarms to be notified when some condition is reached. 09:20 Lois: Perside, could you tell us more about MySQL metrics? Perside: MySQL metrics uses the raw performance data gathered by the health monitor to measure the important characteristic of your servers. This includes CPU and storage usage and information relevant to your database connection and queries executed. With MySQL metrics, you can create your own custom monitoring queries that you can use to feed graphics. This gives you an up to the minute representation of all the performance characteristics that you're interested in. You can also create alarms that trigger on some performance condition. And you can be notified through the OCI alarms framework so that you can be aware instantly when you need to deal with some issue.  10:22 Are you keen to stay ahead in today's fast-paced world? We've got your back! Each quarter, Oracle rolls out game-changing updates to its Fusion Cloud Applications. And to make sure you're always in the know, we offer New Features courses that give you an insider's look at all of the latest advancements. Don't miss out! Head over to mylearn.oracle.com to get started. 10:47 Nikita: Welcome back! Now, let's dive into the key features of HeatWave, the cloud service that integrates with MySQL. Can you tell us what HeatWave is all about? Perside: HeatWave is the cloud service for MySQL. MySQL is the world's leading database for web applications. And with HeatWave, you can run your online transaction processing or OLTP apps in the cloud. This gives you all the benefits of cloud deployments while keeping your MySQL-based web application running just like they would on your own premises. As well as OLTP applications, you need to run reports with Business Intelligence and Analytics Dashboards or Online Analytical Processing, or OLAP reports. The HeatWave cluster provides accelerated analytics queries without requiring extraction or transformation to a separate reporting system. This is achieved with an in-memory analytics accelerator, which is part of the HeatWave service. In addition, HeatWave enables you to create Machine Learning models to embed artificial intelligence right there in the database. The ML accelerator performs classification, regression, time-series forecasting, anomaly detection, and other functions provided by the various models that you can embed in your architecture. HeatWave can also work directly with storage outside the database. With HeatWave Lakehouse, you can run queries directly on data stored in object storage in a variety of formats without needing to import that data into your MySQL database. 12:50 Lois: With all of these exciting features in HeatWave, Perside, what core MySQL benefits can users continue to enjoy? Perside: The reason why you chose MySQL in the first place, it's still a relational database and with full transactional support, low latency, and high throughput for your online transaction processing app. 
It has encryption, compression, and high availability clustering. It also has the same large database support with up to 256 terabytes support. It has advanced security features, including authentication, data masking, and database firewall. But because it's part of the cloud service, it comes with automated patching, upgrades, and backup. And it is fully supported by the MySQL team. 13:50 Nikita: Ok… let's get back to what the HeatWave service entails. Perside: The HeatWave service is a fully managed MySQL. Through the web-based console, you can deploy your instances and manage backups, enable high availability, resize your instances, create read replicas, and perform many common administration tasks without writing a single line of SQL. It brings with it the power of OCI and MySQL Enterprise Edition. As a managed service, many routine DBA tests are automated. This includes keeping the instances up to date with the latest version and patches. You can run analytics queries right there in the database without needing to extract and transform your databases, or load them in another dedicated analytics system. 14:52 Nikita: Can you share some common use cases for HeatWave? Perside: You have your typical OLTP workloads, just like you'd run on prem, but with the benefit of being managed in the cloud. Analytic queries are accelerated by HeatWave. So your reporting applications and dashboards are way faster. You can run both OLTP and analytics workloads from the same database, keeping your reports up to date without needing a separate reporting infrastructure. 15:25 Lois: I've heard a lot about HeatWave AutoML. Can you explain what that is? Perside: HeatWave AutoML enables in-database artificial intelligence and Machine Learning. Externally sourced data stores, such as sensor data exported to CSV, can be read directly from object store. And HeatWave generative AI enables chatbots and LLM content creation. 15:57 Lois: Perside, tell us about some of the key features and benefits of HeatWave. Perside: Autopilot is a suite of AI-powered tools to improve the performance and applicability of your HeatWave queries. Autopilot includes two features that help cut costs when you provision your service. There's auto provisioning and auto shape prediction. They analyze your existing use case and tell you exactly which shape you must provision for your nodes and how many nodes you need. Auto parallel loading is used when you import data into HeatWave. It splits the import automatically into an optimum number of parallel streams to speed up your import. And then there's auto data placement. It distributes your data across the HeatWave cluster node to improve your query retrieval performance. Auto encoding chooses the correct data storage type for your string data, cutting down storage and retrieval time. Auto error recovery automatically recovers a fail node and reloads data if that node becomes unresponsive. Auto scheduling prioritizes incoming queries intelligently. An auto change propagation brings data optimally from your DB system to the acceleration cluster. And then there's auto query time estimation and auto query plan improvement. They learn from your workload. They use those statistics to perform on node adaptive optimization. This optimization allows each query portion to be executed on every local node based on that node's actual data distribution at runtime. Finally, there's auto thread pooling. It adjusts the enterprise thread pool configuration to maximize concurrent throughput. 
It is workload-aware, and minimizes resource contention, which can be caused by too many waiting transactions. 18:24 Lois: How does HeatWave simplify analytics within MySQL and with external data sources? Perside: HeatWave in Oracle Cloud Infrastructure provides all the features you need for analytics, all in one system. Your classic OLTP application run on the MySQL database that you know and love, provision in a DB system. On-line analytical processing is done right there in the database without needing to extract and load it to another analytic system. With HeatWave Lakehouse, you can even run your analytics queries against external data stores without loading them to your DB system. And you can run your machine learning models and LLMs in the same HeatWave service using HeatWave AutoML and generative AI. HeatWave is not just available in Oracle Cloud Infrastructure. If you're tied to another cloud vendor, such as AWS or Azure, you can use HeatWave from your applications in those cloud too, and at a great price. 19:43 Nikita: That's awesome! Thank you, Perside, for joining us throughout this season on MySQL. These conversations have been so insightful. If you're interested in learning more about the topics we discussed today, head over to mylearn.oracle.com and search for the MySQL 8.4: Essentials course.  Lois: This wraps up our season on the essentials of MySQL. But before we go, we just want to remind you to write to us if you have any feedback, questions, or ideas for future episodes. Drop us an email at ou-podcast_ww@oracle.com. That's ou-podcast_ww@oracle.com. Nikita: Until next time, this is Nikita Abraham… Lois: And Lois Houston, signing off! 20:33 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
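As a small follow-on sketch of the in-database analytics flow Perside describes, this is what offloading a table to the HeatWave cluster can look like; the table and column names are hypothetical, and a HeatWave cluster is assumed to be attached to the DB system:

```sql
-- Load an existing OLTP table into the HeatWave in-memory analytics cluster.
ALTER TABLE orders SECONDARY_ENGINE = RAPID;
ALTER TABLE orders SECONDARY_LOAD;

-- Analytic queries against the loaded table are offloaded to HeatWave
-- automatically; no extract/transform/load into a separate reporting system.
SELECT customer_id, SUM(total_amount) AS lifetime_value
FROM orders
GROUP BY customer_id
ORDER BY lifetime_value DESC
LIMIT 10;
```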

Oracle University Podcast
Introduction to MySQL

Jan 7, 2025 · 26:21


Join hosts Lois Houston and Nikita Abraham as they kick off a new season exploring the world of MySQL 8.4. Together with Perside Foster, a MySQL Principal Solution Engineer, they break down the fundamentals of MySQL, its wide range of applications, and why it's so popular among developers and database administrators. This episode also covers key topics like licensing options, support services, and the various tools, features, and plugins available in MySQL Enterprise Edition.   ------------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative  podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Lois: Hello and welcome to the Oracle University Podcast! I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me is Nikita Abraham, Team Lead: Editorial Services. Nikita: Happy New Year, everyone! Thank you for joining us as we begin a new season of the podcast, this time focused on the basics of MySQL 8.4. If you're a database administrator or want to become one, this is definitely for you. It's also great for developers working with data-driven apps or IT professionals handling MySQL installs, configurations, and support. 01:03 Lois: That's right, Niki. Throughout the season, we'll be delving into MySQL Enterprise Edition and covering a range of topics, including installation, security, backups, and even MySQL HeatWave on Oracle Cloud.  Nikita: Today, we're going to discuss the Oracle MySQL ecosystem and its various components. We'll start by covering the fundamentals of MySQL and the different licenses that are available. Then, we'll explore the key tools and features to boost data security and performance. Plus, we'll talk a little bit about MySQL HeatWave, which is the cloud version of MySQL.  01:39 Lois: To take us through all of this, we've got Perside Foster with us today. Perside is a MySQL Principal Solution Engineer at Oracle. Hi Perside! For anyone new to MySQL, can you explain what it is and why it's so widely used? Perside: MySQL is a relational database management system that organizes data into structured tables, rows, and columns for efficient programming and data management. MySQL is transactional by nature. When storing and managing data, actions such as selecting, inserting, updating, or deleting are required. MySQL groups these actions into a transaction. The transaction is saved only if every part completes successfully. 02:29 Lois: Now, how does MySQL work under the hood? Perside: MySQL is a high-performance database that uses its default storage engine, known as InnoDB. InnoDB helps MySQL handle complex operations and large data volumes smoothly. 02:49 Nikita: For the unversed, what are some day-to-day applications of MySQL? How is it used in the real world? Perside: MySQL works well with online transaction processing workloads. It handles transactions quickly and manages large volumes of transaction at once. OLTP, with low latency and high throughput, makes MySQL ideal for high-speed environments like banking or online shopping. MySQL not only stores data but also replicates it from a main server to several replicas. 03:31 Nikita: That's impressive! And what are the benefits of using MySQL?  Perside: It improves data availability and load balancing, which is crucial for businesses that need up-to-date information. 
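A minimal sketch of pointing a replica at its main server, matching the source-to-replicas setup just described; the hostname and user are placeholders, and GTID-based replication is assumed to be enabled on both servers:

```sql
-- On the replica (MySQL 8.0.23+ syntax). Assumes a user 'repl' with the
-- REPLICATION SLAVE privilege already exists on the source server.
CHANGE REPLICATION SOURCE TO
  SOURCE_HOST = 'primary.example.com',
  SOURCE_USER = 'repl',
  SOURCE_PASSWORD = '***',
  SOURCE_AUTO_POSITION = 1;   -- GTID-based positioning
START REPLICA;

-- Check replication health and lag.
SHOW REPLICA STATUS;
```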
MySQL replication supports read scale-out by distributing queries across servers, which increases high availability. MySQL is the most popular database on the web. 04:00 Lois: And why is that? What makes it so popular? What sets it apart from the other database management systems? Perside: First, it is a relational database management system that supports SQL. It also works as a document store, enabling the creation of both SQL and NoSQL applications without the need for separate NoSQL databases. Additionally, MySQL offers advanced security features to protect data integrity and privacy. It also uses tablespaces for better disk space management. This gives database administrators total control over their data storage. MySQL is simple, solid in its reliability, and secure by design. It is easy to use and ideal for both beginners and professionals. MySQL is proven at scale by efficiently handling large data volumes and high transaction rates. MySQL is also open source. This means anyone can download and use it for free. Users can modify the MySQL software to meet their needs. However, it is governed by the GNU General Public License, or GPL. GPL outlines specific rules for its use. MySQL offers two major editions. For developers and small teams, the Community Edition is available for free and includes all of the core features needed. For large enterprises, the Commercial Edition provides advanced features, management tools, and dedicated technical support. 05:58 Nikita: Ok. Let's shift focus to licensing. Who is it useful for?  Perside: MySQL licensing is essential for independent software vendors. They're called ISVs. And original manufacturers, they're called OEMs. This is because these companies often incorporate MySQL code into their software products or hardware system to boost the functionality and performance of their product. MySQL licensing is equally important for value-added resellers. We call those VARs. And also, it's important for other distributors. These groups bundle MySQL with other commercially licensed software to sell as part of their product offering. The GPL v.2 license might suit Open Source projects that distribute their products under that license.   07:02 Lois: But what if some independent software vendors, original manufacturers, or value-add resellers don't want to create Open Source products. They don't want their source to be publicly available and they want to keep it private? What happens then? Perside: This is why Oracle provides a commercial licensing option. This license allows businesses to use MySQL in their products without having to disclose their source code as required by GPL v2. 07:33 Nikita: I want to bring up the robust support services that are available for MySQL Enterprise. What can we expect in terms of support, Perside?  Perside: MySQL Enterprise Support provides direct access to the MySQL Support team. This team consists of experienced MySQL developers, who are experts in databases. They understand the issues and challenges their customers face because they, too, have personally tackled these issues and challenges. This support service operates globally and is available in 29 languages. So no matter where customers are located, Oracle Support provides assistance, most likely in their preferred language. MySQL Enterprise Support offers regular updates and hot fixes to ensure that the MySQL customer systems stays current with the latest improvements and security patches. MySQL Support is available 24 hours a day, 7 days a week. 
This ensures that whenever there is an issue, Oracle Support can provide the needed help without any delay. There are no restrictions on how many times customers can receive help from the team because MySQL Enterprise Support allows for unlimited incidents. MySQL Enterprise Support goes beyond simply fixing issues. It also offers guidance and advice. Whether customers require assistance with performance tuning or troubleshooting, the team is there to support them every step of the way.  09:27 Lois: Perside, can you walk us through the various tools and advanced features that are available within MySQL? Maybe we could start with MySQL Shell. Perside: MySQL Shell is an integrated client tool used for all MySQL database operations and administrative functions. It's a top choice among MySQL users for its versatility and powerful features. MySQL Shell offers multi-language support for JavaScript, Python, and SQL. These naturally scriptable languages make coding flexible and efficient. They also allow developers to use their preferred programming language for everything, from automating database tasks to writing complex queries. MySQL Shell supports both document and relational models. Whether your project needs the flexibility of NoSQL's document-oriented structures or the structured relationships of traditional SQL tables, MySQL Shell manages these different data types without any problems. Another key feature of MySQL Shell is its full access to both development and administrative APIs. This ability makes it easy to automate complex database operations and do custom development directly from MySQL Shell. MySQL Shell excels at DBA operations. It has extensive tools for database configuration, maintenance, and monitoring. These tools not only improve the efficiency of managing databases, but they also reduce the possibility for human error, making MySQL databases more reliable and easier to manage.  11:37 Nikita: What about the MySQL Server tool? I know that it is the core of the MySQL ecosystem and is available in both the community and commercial editions. But how does it enhance the MySQL experience? Perside: It connects with various devices, applications, and third-party tools to enhance its functionality. The server manages both SQL for structured data and NoSQL for schemaless applications. It has many key components. The parser, which interprets SQL commands. Optimizer, which ensures efficient query execution. And then the queue cache and buffer pools. They reduce disk usage and speed up access. InnoDB, the default storage engine, maintains data integrity and supports robust transaction and recovery mechanism. MySQL is designed for scalability and reliability. With features like replication and clustering, it distributes data, manage more users, and ensure consistent uptime. 13:00 Nikita: What role does MySQL Enterprise Edition play in MySQL server's capabilities? Perside: MySQL Enterprise Edition improves MySQL server by adding a suite of commercial extensions. These exclusive tools and services are designed for enterprise-level deployments and challenging environments. These tools and services include secure online backup. It keeps your data safe with efficient backup solutions. Real-time monitoring provides insight into database performance and health. The seamless integration connects easily with existing infrastructure, improving data flow and operations. Then you have the 24/7 expert support. It offers round the clock assistance to optimize and troubleshoot your databases. 
14:04 Lois: That's an extensive list of features. Now, can you explain what MySQL Enterprise plugins are? I know they're specialized extensions that boost the capabilities of MySQL server, tools, and services, but I'd love to know a little more about how they work. Perside: Each plugin serves a specific purpose. Firewall plugin protects against SQL injection by allowing only pre-approved queries. The audit plugin logs database activities, tracking who accesses databases and what they do. Encryption plugin secures data at rest, protecting it from unauthorized access. Then we have the authentication plugin, which integrates with systems like LDAP and Active Directory for control access. Finally, the thread pool plugin optimizes performance in high load situation by effectively controlling how many execution threads are used and how long they run. The plugin and tools are included in the MySQL Enterprise Edition suite. 15:32 Join the Oracle University Learning Community and tap into a vibrant network of over 1 million members, including Oracle experts and fellow learners. This dynamic community is the perfect place to grow your skills, connect with likeminded learners, and celebrate your successes. As a MyLearn subscriber, you have access to engage with your fellow learners and participate in activities in the community. Visit community.oracle.com/ou to check things out today! 16:03 Nikita: Welcome back! We've been going through the various MySQL tools, and another important one is MySQL Enterprise Backup, right?  Perside: MySQL Enterprise Backup is a powerful tool that offers online, non-blocking backup and recovery. It makes sure databases remain available and performs optimally during the backup process. It also includes advanced features, such as incremental and differential backup. Additionally, MySQL Enterprise Backup supports compression to reduce backups and encryptions to keep data secure. One of the standard capabilities of MySQL Enterprise Backup is its seamless integration with media management software, or MMS. This integration simplifies the process of managing and storing backups, ensuring that data is easily accessible and secure. Then we have the MySQL Workbench Enterprise. It enhances database development and design with robust tools for creating and managing your diagram and ensuring proper documentation. It simplifies data migration with powerful tools that makes it easy to move databases between platforms. For database administration, MySQL Workbench Enterprise offers efficient tools for monitoring, performance tuning, user management, and backup and recovery. MySQL Enterprise Monitor is another tool. It provides real-time MySQL performance and availability monitoring. It helps track database's health and performance. It visually finds and fixes problem queries. This is to make it easy to identify and address performance issues. It offers MySQL best-practice advisors to guide users in maintaining optimal performance and security. Lastly, MySQL Enterprise Monitor is proactive and it provides forecasting. 18:40 Lois: Oh that's really going to help users stay ahead of potential issues. That's fantastic! What about the Oracle Enterprise Manager Plugin for MySQL? Perside: This one offers availability and performance monitoring to make sure MySQL databases are running smoothly and efficiently. It provides configuration monitoring. This is to help keep track of the database settings and configuration. 
Finally, it collects all available metrics to provide comprehensive insight into the database operation. 19:19 Lois: Are there any tools designed to handle higher loads and improve security? Perside: MySQL Enterprise Thread Pool improves scalability as concurrent connections grows. It makes sure the database can handle increased loads efficiently. MySQL Enterprise Authentication is another tool. This one integrates MySQL with existing security infrastructures. It provides robust security solutions. It supports Linux PAM, LDAP, Windows, Kerberos, and even FIDO for passwordless authentication. 20:02 Nikita: Do any tools offer benefits like customized logging, data protection, database security? Perside: The MySQL Enterprise Audit provides out-of-the-box logging of connections, logins, and queries in XML or JSON format. It also offers simple to fine-grained policies for filtering and log rotation. This is to ensure comprehensive and customizable logging. MySQL Enterprise Firewall detects and blocks out of policy database transactions. This is to protect your data from unauthorized access and activities. We also have MySQL Enterprise Asymmetric Encryption. It uses MySQL encryption libraries for key management signing and verifying data. It ensures data stays secure during handling. MySQL Transparent Data Encryption, another tool, provides data-at-rest encryption within the database. The Master Key is stored outside of the database in a KMIP 1.1-compliant Key Vault. That is to improve database security. Finally, MySQL Enterprise Masking offers masking capabilities, including string masking and dictionary replacement. This ensures sensitive data is protected by obscuring it. It also provides random data generators, such as range-based, payment card, email, and social security number generators. These tools help create realistic but anonymized data for testing and development. 22:12 Lois: Can you tell us about HeatWave, the MySQL cloud service? We're going to have a whole episode dedicated to it soon, but just a quick introduction for now would be great. Perside: MySQL HeatWave offers a fully managed MySQL service. It provides deployment, backup and restore, high availability, resizing, and read replicas, all the features you need for efficient database management. This service is a powerful union of Oracle Infrastructure and MySQL Enterprise Edition 8. It combines robust performance with top-tier infrastructure. With MySQL HeatWave, your systems are always up to date with the latest security fixes, ensuring your data is always protected. Plus, it supports both OLTP and analytics/ML use cases, making it a versatile solution for diverse database needs. 23:22 Nikita: So to wrap up, what are your key takeways when it comes to MySQL? Perside: When you use MySQL, here is the bottom line. MySQL Enterprise Edition delivers unmatched performance at scale. It provides advanced monitoring and tuning capabilities to ensure efficient database operation, even under heavy loads. Plus, it provides insurance and immediate help when needed, allowing you to depend on expert support whenever an issue arises. Regarding total cost of ownership, TCO, this edition significantly reduces the risk of downtime and enhances productivity. This leads to significant cost savings and improved operational efficiency. On the matter of risk, MySQL Enterprise Edition addresses security and regulatory compliance. This is to make sure your data meets all necessary standards. 
Additionally, it provides direct contact with the MySQL team for expert guidance. In terms of DevOps agility, it supports automated scaling and management, as well as flexible real-time backups, making it ideal for agile development environments. Finally, concerning customer satisfaction, it enhances application performance and uptime, ensuring your customers have a reliable and smooth experience. 25:18 Lois: Thank you so much, Perside. This is really insightful information. To learn more about all the support services that are available, visit support.oracle.com. This is the central hub for all MySQL Enterprise Support resources.  Nikita: Yeah, and if you want to know about the key commercial products offered by MySQL, visit mylearn.oracle.com and search for the MySQL 8.4: Essentials course. Join us next week for a discussion on installing MySQL. Until then, this is Nikita Abraham… Lois: And Lois Houston signing off! 25:53 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

Oracle University Podcast
Unrestricted Parallel DMLs and Direct Loads

Oracle University Podcast

Play Episode Listen Later Oct 1, 2024 14:31


In this episode, hosts Lois Houston and Nikita Abraham discuss new features in Oracle Database 23ai related to Data Manipulation Language (DML). They are joined by Senior Principal Database & MySQL Instructor, Bill Millar, who explains the concept of unrestricted parallel DMLs and their importance in speeding up large operations and maintaining summary tables. The discussion then turns to unrestricted direct loads, examining the evolution of direct loads with 23ai and the broader impact of these changes.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/140830/   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00  Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we discussed a ground-breaking caching solution in Oracle Database 23ai, known as True Cache. We spoke about its configuration and deployment, and explored how to apply True Cache to our applications. Nikita: Today, we're going to talk about two Oracle Database 23ai new features related to Data Manipulation Language, or DML. The first is Unrestricted Parallel DMLs and then we'll move on to Unrestricted Direct Loads. We'll talk about the situation prior to 23ai, identify the improvements that have been made, and look at their benefits. 01:15 Lois: And returning for another episode is Bill Millar, our Senior Principal Database & MySQL Instructor with Oracle University. Hi Bill! So, to start, can you explain what unrestricted parallel DMLs are and why they are important, especially in the context of Oracle Database? Bill: The Oracle Database allows DML statements such as inserts, updates, deletes, and merges to be executed in parallel by breaking those statements into smaller tasks. These transactions can contain multiple DML statements. And they can modify multiple different tables. So transactions with parallel DML use this execution method, breaking up those large operations to execute the transaction in parallel. It helps speed up large operations. And it's advantageous in data warehouse environments where we're maintaining summary tables and historical tables. And even in OLTP systems, it can be beneficial for long-running batch jobs. The scale-up? Well, it's basically dividing the SQL executing against those large tables and indexes into those smaller units of work. 02:36 Nikita: So, what were the limitations prior to 23ai? Bill: So once an object was modified by a parallel DML statement, the object could not be read or modified later in the same transaction. After a parallel DML modifies a table, there is no follow-on DML or query allowed on the same table within that same transaction. If there was any attempt to access a table modified by that parallel statement, the transaction would be rejected. 
You're only allowed to query on those tables prior to that DML on that object itself. 03:16 Lois: Ok… So with these new improvements, I'm guessing some of these restrictions have been removed? Bill: In this case, in the same session, you can query the table multiple times. You can perform conventional DML on the same table within the same session. And you can also have multiple direct loads in the same session without having to do that commit. But there are still some restrictions with it. Heap tables only. You can't do it with any clustered tables or IOT, Index Organized Tables. Non-ASSM, the Automatic Segment Space Management tables. The temp table is not under ASSM. Why? Because it has to have uniform extents or any other tablespaces that you created with the uniform extents. So those restrictions still apply. So some of the improvements are some of the restrictions can help reduce the overhead. We can enable Parallel DML within that session. Allows the multiple operations on the same object. And it doesn't require that commit for each separate operation. Makes it a little bit easier to use by removing some of these limitations. Now users can run parallel DMLs and any combination of statements within that same transaction. And it can help simplify and speed up data loading analytic processes by making the database, the parallel execution and parallel queries, at the same time within that same session, again, eliminating having to do commits. 04:58 Nikita: Thanks for that summary of all the improvements, Bill. Now, how do you enable this? Is it enabled by default? Bill: To enable the Parallel DML mode, it is required for a session. It is disabled by default. That's because the Parallel DML and Serial DML, they have different locking, different ways to handle the transactions, different disk space requirements. When Parallel DML is enabled in a session, all DML statements are considered for parallel execution. Only a statement is considered for parallel execution when the Enable Parallel DML hint is used if I don't set it for a session. The sessions DML mode does not influence any parallelism of DDL statements. When the Parallel DML is disabled, no DML is executed in parallel, even if the hint is used. 05:59 Lois: Bill, I would like to dig a little deeper into the benefits. How do these lifted restrictions improve the overall performance and reduce overhead? Bill: There's no longer that requirement to commit everything separately. So that's going to reduce the overhead, not having to do the commit all the time. The scalability of accessing those large objects, executing parallel makes the decision support systems, those data warehouses and batch OLTP jobs or any other larger DML operation execute faster. By removing that one touch limitation, it allows the parallel DML statements to be read or modified by later statements of the same transaction in the same session. It's very similar to the non-parallel statements. And even OLTP systems can also benefit, for example, maintaining a larger operation, such as the creation of indexes, refreshing tables, or even creating summary tables. 07:14 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. 
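To make Bill's description concrete, here is a minimal, hedged sketch of enabling parallel DML for a session and then running a follow-on statement in the same transaction, which 23ai now allows. The table names are hypothetical.

ALTER SESSION ENABLE PARALLEL DML;   -- parallel DML is disabled by default

-- Parallelize a large insert into a (hypothetical) summary table.
INSERT /*+ PARALLEL(sales_summary, 8) */ INTO sales_summary
SELECT region, product_id, SUM(amount)
FROM sales
GROUP BY region, product_id;

-- In 23ai, a follow-on query or DML on the same table in the same
-- transaction is allowed; no intermediate COMMIT is required.
SELECT COUNT(*) FROM sales_summary;

COMMIT;

Without the session setting, a single statement can still opt in with the ENABLE_PARALLEL_DML hint, as Bill mentions.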
Visit mylearn.oracle.com to get started. 07:42 Nikita: Welcome back! Let's move on to the next new feature, which is unrestricted direct loads. Bill, what was the situation with direct loads like, prior to 23ai? Bill: After a direct load, and prior to a commit, queries and additional DMLs were not allowed on that same table. You might encounter the ORA-12838 error, saying, hey, you can't read or modify this object after modifying it in parallel. That's because the direct load still had hold of that object in that session. So you might have received that error. You might also see enq contention wait events: a statement issued in a different session during the direct load has to wait, because the transaction doing the direct load locks the table to keep it from being modified until that direct load has actually committed. The same problem appeared within the same transaction and session when trying to do additional DMLs while the table was being modified by the direct load itself. Unlike conventional loads, with direct loads, as the new blocks and extents are added to the segment, the high water mark does not actually get moved until the actual commit itself. So that's why there were restrictions, in the same session and even in other sessions, on being able to do anything. So to prevent the errors, applications had to do a commit immediately after that direct load. To recap the restrictions before that commit: on the same table in the same session, you couldn't query, couldn't do any additional DMLs, couldn't do any additional parallel DMLs. And even in other sessions, queries were not allowed on the same tables that were in use by the other session. So no additional conventional DMLs and no additional parallel DMLs were allowed. 
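A minimal sketch of the pattern Bill is describing, with hypothetical table names. Before 23ai, the query below raised ORA-12838 unless a commit was issued first; with unrestricted direct loads, as the next part of the conversation explains, it succeeds inside the same transaction.

-- Direct-path (direct load) insert using the APPEND hint.
INSERT /*+ APPEND */ INTO orders_archive
SELECT * FROM orders
WHERE order_date < DATE '2024-01-01';

-- Follow-on query in the same transaction: previously ORA-12838,
-- now allowed without an intermediate commit.
SELECT COUNT(*) FROM orders_archive;

COMMIT;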
Your application DML might need to query the data after that direct load without committing, applications that might need to modify data within that same transaction as that direct load. You can enable multiple append hint. So you can specify the hint in addition to pending hint to disable. You can specify the no multi-append hint to disable it. 12:27 Lois: Bill, what's the broader impact of these changes. How do these improvements make things more development-friendly? Bill: So changes to the direct load make things a little bit more development friendly by removing those directions after that direct load itself. So previous restrictions when loading-- querying the data kept us from doing multiple things at the same time. So now I can query on that table direct load from the same session, from a different session. I can do conventional DMLs on the table within the same session. It allows me to do a rollback on it. I can do direct loads on the same table within the same session. Again, I can also allow rollback to a save point. As long as my compatibility is set to 21.0.0.0, I will be able to go ahead and benefit from this feature. And there is no increase with it as far as the space usage or causing any fragmentation to the table. So that will not be an issue. 13:35 Nikita: Well, that's the end of our time together, but I want to thank you, Bill, for sharing your expertise with us. Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 14:03 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

Oracle University Podcast
Enhancements in SQL Plan Management, SecureFiles LOB Write Performance, and Column Width

Oracle University Podcast

Play Episode Listen Later Sep 17, 2024 17:56


Join Lois Houston and Nikita Abraham, along with Senior Principal Database & Security Instructor Ron Soltani, as they discuss how the new Automatic SQL Plan Management feature in Oracle Database 23ai improves performance consistency and simplifies management. Then, Senior Principal Database & MySQL Instructor Bill Millar shares insights into two new features: one that enhances SecureFiles LOB Write Performance, improving read and write speeds, and another that increases the column limit in a table to 4,096, making it easier to handle complex data.   Oracle MyLearn: https://mylearn.oracle.com/ou/course/oracle-database-23ai-new-features-for-administrators/137192/207062   Oracle University Learning Community: https://education.oracle.com/ou-community   LinkedIn: https://www.linkedin.com/showcase/oracle-university/   X: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode.   --------------------------------------------------------   Episode Transcript:   00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Principal Technical Editor with Oracle University, and joining me is Lois Houston, Director of Innovation Programs. Lois: Hi there! Last week, we looked at the Oracle Database 23ai enhancements that have been made to Hybrid Columnar Compression and Fast Ingest. In today's episode, we'll talk about the 23ai new feature for Automatic SQL Plan Management with Ron Soltani, a Senior Principal Database & Security Instructor with Oracle University.  01:01 Nikita: And later on, we'll be joined by Bill Millar, another Senior Principal Database & MySQL Instructor, who will tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance. We'll also get him to talk about the Wide Columns update. So, let's get started. Hi Ron! What have been the common challenges with SQL plans and database performance? Ron: One of the problems that we have always had, if you remember, was when data changes, database setting configuration, parameter changes, SQL that were operating very well could now behave badly using the SQL plan that were associated to them. And remember, the same SQL plan generally Oracle likes to continuously reuse.  So the SQL plans were put in the baseline in the past, and we could have those SQL plan baseline, which are a set of approved plans to be used for a SQL from the SQL history stored in AWR, then could be used for the optimizer to choose from. However, which plan to choose and which one would be the best one to use, this is what the problem has been in managing the SQL plan baselines, and a lot of the operation would have been done manually.  02:22 Lois: And what have we done to overcome this?  Ron: So now this new system will going to perform all of those operations automatically for us. Now it can search the Automatic Workload Repository. It can find SQL plans for a particular SQL statement, then look for any alternative plans that may available in alternate sources like SQL tuning sets. And then validate those plans and see if those plans are going to be good and to be used as SQL plan baseline for executing SQL statement by the optimizer. 
03:00 Nikita: So we now have the Automatic SQL Plan Management Evolve Advisor to help manage operations automatically, right? Can you tell us a little more about it? How does it ensure optimal performance? Ron: This is an automatic advisor that is created that can go look for different plans and validate the plans by examining them, making sure that they are not causing any regression compared to the previous operation, and then evolve that plan into a good baseline.  This simplifies management of the baseline repository for a SQL statement. So as data changes, as parameters changes, optimizer could come up with different type of plans that are set within this baseline that has been validated to be good baseline for each situational operation. So this way you reduce a lot of hard parsing operations.  04:00 Lois: And how does the SQL Evolve Advisor work, Ron? Ron: First, it will check the AWR to find what are the top SQLs that has been found. Then it will look to see if these top SQLs who did not perform well with the plan that they have, that's why they're top SQL, have other alternative plans that are stored in the SQL plan history, in AWR, or available in any other sources.  Then if it finds any additional plans, it will go ahead and add all of those plans into the plan history. So in the plan history, now you have accumulation of all the plans available in AWR and anything that has been brought from other sources. Then it will test every one of those plans and validate that by use of the plan, the SQL statement will not deprivate and get slower. The performance is either similar or actually better. So normally, there is a percentage that the SQL should improve. So we will then validate these baselines.  And finally, once the baselines or those plans have been validated, they will be accepted, and then they will be added as SQL plan baselines. They will remain in the statement history, in the AWR, and will be available for optimizer for the future use.  05:28 Nikita: What are the benefits of this? Ron: Number one is Autonomous Database. As you know, they want to automate all management, including management of the SQL execution due to changes that are happening for the application, for the data, or the database and its environment.  It totally eliminates any manual intervention for management of the statement, and it can transparently repair any statement that had been affected by a major change.  06:00 Lois: What sort of problems does this feature solve for us? Ron: Of course, this is a performance consistency. We want to make sure that every statement performed to its best performance and any specific changes that may impact those SQL statements would be taken into an account, and a better plan, if available, would then be available for use.  It also improves the application performance level, therefore database service level will get much improvement. And the SQL execution plans will be automatically managed behind the scene by expanding these baselines, by managing all of these baseline history and all of that that is managed by this automatic SQL plan management environment automatically.  06:50 Nikita: And when do we use this?  Ron: If there is a change in a database environment, like you add SGA, the change into the shared pool, change in the size of the buffer cache or any type of storage effects. So all of those can actually affect the SQL execution.  
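For those who want to try the feature, here is a hedged sketch of switching on the automatic evolve task that Ron describes a little later in the episode, and then inspecting the resulting baselines. The DBMS_SPM.CONFIGURE parameter name reflects current documentation as best recalled here; verify it against your own release.

BEGIN
  DBMS_SPM.CONFIGURE('AUTO_SPM_EVOLVE_TASK', 'AUTO');  -- enable automatic plan evolution
END;
/

-- Review baselines: where each plan came from and whether it is accepted.
SELECT sql_handle, plan_name, origin, enabled, accepted, fixed
FROM dba_sql_plan_baselines
ORDER BY last_modified DESC;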
Now all of those changes, including data changes, can cause a SQL plan to not behave very well or behave as well as it was doing before. Therefore, if particular plans do not perform as well as they did before, that affects the performance of the application. This also affects the performance of the database and the instance.  07:35 Lois: So, how do we use this environment?  Ron: Well, best news that I have for you in that is that there is nothing manual needs to be done. All we need to do is, number one, make sure that we enable foreground automatic SQL plan management that we done through the package for the DBMS SPM for SQL plan management.  You will use the package with the configure option, and you enable the auto SPM evolve task, and you set it to auto. Once this is done, now the SQL evolve plan management and advisor are enabled, and they will then monitor your statements, review all of the top SQLs as they are found with all of the ADDM operation, and then do their work in looking for better plans and being able to maintain the SQL plan baselines we talked about.  Now for you to be able to view, monitor, and see how these operations are going, if it is enabled, you can take a look at the DBA SQL plan baseline's view. There are many, many columns in that particular baseline, and there are also columns that has been added that tell you where is the plan generated from, if a plan is approved, and any other user interaction with the plan or settings can then be verified using that DBA SQL plan baseline view.  09:13 Are you looking for practical use cases to help you plan and apply configurations that solve real-world challenges? With the new Applied Learning courses for Cloud Applications, you'll be able to practically apply the concepts learned in our implementation courses and work through case studies featuring key decisions and configurations encountered during a typical Oracle Cloud Applications implementation. Applied learning scenarios are currently available for General Ledger, Payables, Receivables, Accounting Hub, Global Human Resources, Talent Management, Inventory, and Procurement, with many more to come! Visit mylearn.oracle.com to get started. 09:54 Nikita: Welcome back! Let's bring Bill into the conversation. Hi  Bill! Can you tell us about the 23ai automatic feature that enhances SecureFiles LOB Write Performance?  Bill: The key here is that it is automatic and transparent. There's no parameters set. Nothing to configure in table, no hints, and nothing that you have to do with these improvements. It is tightly integrated with SecureFiles LOB infrastructure.  So now, multiple LOBs can be handled in a single transaction and can be buffered simultaneously. This will help with mixed workloads, switching between the LOBs that are writing in a single transaction. The PGA will adaptively resize based off the size for these large writes for the LOBs if you're using the No Cache option. Remember, no cache is going to bypass the buffer cache and does direct reads and writes from the PGA.  JSON type will be transformed into the OSON Oracle data type. It is an optimized native binary storage format for JSON data.  11:15 Lois: Ok. So, going forward, there will be better read and write performance for LOBs. Bill: Multiple LOBs in a single transaction can be buffered simultaneously, improving mixed workloads. We just talked about the PGA. Automatically, the buffer is automatically resized.  And the improved JSON support. The reason it will recognize, hey, this is a JSON data type. 
But traditionally, JSON data types were small. So they were small to medium size. The range from 32K to 32 MB was considered small to medium, whereas LOBs were designed for data types larger than 100 MB. So by recognizing this as a JSON data type, it can take advantage of the LOB architecture. Other enhancements also include the acceleration of compressed LOBs, with compression caching that improves the previously poor performance of reads and writes to compressed LOBs. It's faster than before.  12:24 Nikita: Bill, what do you think about the recent increase in the column limit? Previously, the limit was 1,000 columns per table, which sometimes posed issues when migrating from other systems that allowed more than 1,000 columns, right?  Bill: Maybe because of workload requirements — machine learning and internet of things (IoT) workloads can have hundreds of thousands of attributes, dimensional attribute columns. And even our very own blockchain tables reserve up to 40 hidden virtual columns, so that takes away from the total amount. Virtual columns count towards the column limit, and when applications drop columns, those columns are simply converted to unused — and they still count towards the limit on the number of columns you can have. There were workarounds, like column switching or table splitting, but they were most likely not the best way to do it. And big data use cases really did see files that have, or require, more than 1,000 columns. 
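Bill walks through the mechanics next; as a preview, here is a minimal, hedged sketch of widening the limit with the initialization parameter he describes. The exact statement form should be verified against your release, and the table is purely hypothetical.

-- Requires COMPATIBLE at the 23ai level; the parameter is dynamic.
ALTER SYSTEM SET max_columns = EXTENDED;

-- A wide table can now go past the old 1,000-column limit, up to 4,096
-- columns (virtual and unused columns count towards that total).
CREATE TABLE sensor_readings (
  reading_id NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  device_id  NUMBER,
  attr_0001  NUMBER,
  attr_0002  NUMBER
  -- ... further attribute columns, up to the limit
);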
One thing that might get you, when you drop a table that has more 1,000 columns and you try to set it back to standard, it might tell you, hey, you have tables that have more than 1,000 columns. Don't forget your recycle bin unless you did a drop table purge.  16:09 Lois: Are there any performance considerations to keep in mind, Bill? Bill: There's really no DML or query performance degradation for the tables. However, it might require, as you would expect, the increase in memory when we have the new column limits. It might require additional shared pool, additional SGA with the additional columns, more buffer cache as we're bringing blocks in.  So that's shared pool along with the PGA. And also we can add in buffer cache in there, because that increased column count is going to be increase in the total PGA memory usage. And those are kind of expected for that. But the big advantage is it gives us the ability to eliminate some of these suboptimal workarounds that we had in the past.  17:02 Nikita: Ok! We covered a lot today so thank you Bill and Ron.  Lois: To learn more about what we discussed today, visit mylearn.oracle.com and search for the Oracle Database 23ai New Features for Administrators course. Join us next week for a discussion on some more Oracle Database 23ai new features. Until then, this is Lois Houston… Nikita: And Nikita Abraham signing off! 17:27 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

Software Engineering Daily
Building a Fast Financial Transactions Database with Joran Greef

Software Engineering Daily

Play Episode Listen Later Sep 12, 2024 59:18


Online transaction processing, or OLTP, is designed for managing high volumes of short, fast, and concurrent transactions, such as data entry and retrieval operations. OLTP systems solve the problem of efficiently handling numerous simultaneous transactions, making them essential for sectors like banking and retail. Joran Greef is the Founder and CEO of TigerBeetle, which is The post Building a Fast Financial Transactions Database with Joran Greef appeared first on Software Engineering Daily.

Data Engineering Podcast
Reconciling The Data In Your Databases With Datafold

Data Engineering Podcast

Play Episode Listen Later Mar 17, 2024 58:14


Summary A significant portion of data workflows involve storing and processing information in database engines. Validating that the information is stored and processed correctly can be complex and time-consuming, especially when the source and destination speak different dialects of SQL. In this episode Gleb Mezhanskiy, founder and CEO of Datafold, discusses the different error conditions and solutions that you need to know about to ensure the accuracy of your data. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Dagster offers a new approach to building and running data platforms and data pipelines. It is an open-source, cloud-native orchestrator for the whole development lifecycle, with integrated lineage and observability, a declarative programming model, and best-in-class testability. Your team can get up and running in minutes thanks to Dagster Cloud, an enterprise-class hosted solution that offers serverless and hybrid deployments, enhanced security, and on-demand ephemeral test deployments. Go to dataengineeringpodcast.com/dagster (https://www.dataengineeringpodcast.com/dagster) today to get started. Your first 30 days are free! Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst (https://www.dataengineeringpodcast.com/starburst) and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino. Join us at the top event for the global data community, Data Council Austin. From March 26-28th 2024, we'll play host to hundreds of attendees, 100 top speakers and dozens of startups that are advancing data science, engineering and AI. Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors and community organizers who are all working together to build the future of data and sharing their insights and learnings through deeply technical talks. As a listener to the Data Engineering Podcast you can get a special discount off regular priced and late bird tickets by using the promo code dataengpod20. Don't miss out on our only event this year! Visit dataengineeringpodcast.com/data-council (https://www.dataengineeringpodcast.com/data-council) and use code dataengpod20 to register today! Your host is Tobias Macey and today I'm welcoming back Gleb Mezhanskiy to talk about how to reconcile data in database environments Interview Introduction How did you get involved in the area of data management? Can you start by outlining some of the situations where reconciling data between databases is needed? What are examples of the error conditions that you are likely to run into when duplicating information between database engines? When these errors do occur, what are some of the problems that they can cause? 
When teams are replicating data between database engines, what are some of the common patterns for managing those flows? How does that change between continual and one-time replication? What are some of the steps involved in verifying the integrity of data replication between database engines? If the source or destination isn't a traditional database engine (e.g. data lakehouse) how does that change the work involved in verifying the success of the replication? What are the challenges of validating and reconciling data? Sheer scale and cost of pulling data out, have to do in-place Performance. Pushing databases to the limit, especially hard for OLTP and legacy Cross-database compatibilty Data types What are the most interesting, innovative, or unexpected ways that you have seen Datafold/data-diff used in the context of cross-database validation? What are the most interesting, unexpected, or challenging lessons that you have learned while working on Datafold? When is Datafold/data-diff the wrong choice? What do you have planned for the future of Datafold? Contact Info LinkedIn (https://www.linkedin.com/in/glebmezh/) Parting Question From your perspective, what is the biggest gap in the tooling or technology for data management today? Closing Announcements Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com)) with your story. Links Datafold (https://www.datafold.com/) Podcast Episode (https://www.dataengineeringpodcast.com/datafold-proactive-data-quality-episode-205/) data-diff (https://github.com/datafold/data-diff) Podcast Episode (https://www.dataengineeringpodcast.com/data-diff-open-source-data-integration-validation-episode-303) Hive (https://hive.apache.org/) Presto (https://prestodb.io/) Spark (https://spark.apache.org/) SAP HANA (https://en.wikipedia.org/wiki/SAP_HANA) Change Data Capture (https://en.wikipedia.org/wiki/Change_data_capture) Nessie (https://projectnessie.org/) Podcast Episode (https://www.dataengineeringpodcast.com/nessie-data-lakehouse-data-versioning-episode-416) LakeFS (https://lakefs.io/) Podcast Episode (https://www.dataengineeringpodcast.com/lakefs-data-lake-versioning-episode-157) Iceberg Tables (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/iceberg-with-ryan-blue-episode-52/) SQLGlot (https://github.com/tobymao/sqlglot) Trino (https://trino.io/) GitHub Copilot (https://github.com/features/copilot) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)
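Because the show notes above are mostly in outline form, a small illustration may help: the simplest way teams reconcile a table copied between engines is to run the same aggregates on both sides and diff the output, which quickly runs into the scale, performance, and dialect problems Gleb lists. The table and columns below are hypothetical, and this generic SQL is only a simplification of what a tool like data-diff automates.

-- Run the same bucketed aggregate on the source and on the destination,
-- then compare the result sets; mismatched buckets localize the divergence.
SELECT FLOOR(order_id / 100000) AS key_bucket,
       COUNT(*)                 AS row_count,
       SUM(amount)              AS amount_sum,
       MAX(updated_at)          AS last_update
FROM orders
GROUP BY FLOOR(order_id / 100000)
ORDER BY key_bucket;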

The GeekNarrator
TigerBeetle: World's Fastest Financial Transactions Database

The GeekNarrator

Play Episode Listen Later Feb 23, 2024 79:39


In an enlightening episode of the GeekNarrator Podcast, host Kaivalya Apte and TigerBeetle's CEO, Joran, delve deep into the world of online transaction processing (OLTP). They discuss the origin, unique architecture, and innovative methodologies behind TigerBeetle, a database tailored to efficiently handle high-volume transaction systems. The podcast explores the system's key features such as efficient scalability, performance-oriented design, and optimized memory usage, demonstrating its robustness in handling business transactions and accounting. It also elucidates TigerBeetle's adaptability to various domains beyond finance, like energy management and gaming, while highlighting the rigorous testing it undergoes for impeccable quality assurance. Chapters: 00:00 Introduction 01:19 Joran's Journey into Databases 03:59 Understanding Financial Transaction Databases 07:41 The Evolution of OLTP and OLAP 16:13 The Need for a New Database: TigerBeetle 16:53 Performance and Safety Features of TigerBeetle 28:49 The Importance of Safety in Financial Transactions 36:49 Changing Developer Experience with TigerBeetle 41:43 Understanding the CPU and Memory Bandwidth 42:12 The Importance of Data Format Language 43:27 The Concept of Serialization and its Impact 46:23 The Architecture of TigerBeetle 46:29 The Role of Replicated State Machine 48:18 The Importance of Consensus in Replication 50:20 The Structure of TigerBeetle 50:37 The Importance of Log in Systems 50:51 Understanding the State in Replicated State Machine 52:55 The Role of LSM in TigerBeetle 53:55 The Impact of Compaction Process on Performance 57:06 The Importance of Predictability in Software 01:06:15 The Read and Write Path in TigerBeetle 01:14:46 Potential Use Cases for TigerBeetle 01:17:09 Understanding the Limitations of TigerBeetle =============================================================================== For discount on the below courses: Appsync: https://appsyncmasterclass.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Testing serverless: https://testserverlessapps.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Production-Ready Serverless: https://productionreadyserverless.com/?affiliateId=41c07a65-24c8-4499-af3c-b853a3495003 Use the button, Add Discount and enter "geeknarrator" discount code to get 20% discount. =============================================================================== Follow me on Linkedin and Twitter: https://www.linkedin.com/in/kaivalyaapte/ and https://twitter.com/thegeeknarrator If you like this episode, please hit the like button and share it with your network. Also please subscribe if you haven't yet. Database internals series: https://youtu.be/yV_Zp0Mi3xs Popular playlists: Realtime streaming systems: https://www.youtube.com/playlist?list=PLL7QpTxsA4se-mAKKoVOs3VcaP71X_LA- Software Engineering: https://www.youtube.com/playlist?list=PLL7QpTxsA4sf6By03bot5BhKoMgxDUU17 Distributed systems and databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4sfLDUnjBJXJGFhhz94jDd_d Modern databases: https://www.youtube.com/playlist?list=PLL7QpTxsA4scSeZAsCUXijtnfW5ARlrsN Stay Curios! Keep Learning! #tigerbeetledb #databases #acid #olap #oltp #postgres #mysql
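The episode keeps returning to debit-and-credit transfers as the core OLTP workload. As a point of reference, here is what a single double-entry transfer looks like in plain, generic SQL; TigerBeetle exposes its own client API rather than SQL, so this hypothetical schema only illustrates the invariant every transfer must preserve.

START TRANSACTION;

UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;  -- debit
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;  -- credit

INSERT INTO transfers (debit_account_id, credit_account_id, amount)
VALUES (1, 2, 100);

COMMIT;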

Oracle University Podcast
Everything You Need to Know About the MySQL HeatWave Implementation Associate Certification

Oracle University Podcast

Play Episode Listen Later Feb 13, 2024 14:33


What is MySQL HeatWave? How do I get certified in it? Where do I start? Listen to Lois Houston and Nikita Abraham, along with MySQL Developer Scott Stroz, answer all these questions and more on this week's episode of the Oracle University Podcast. MySQL Document Store: https://oracleuniversitypodcast.libsyn.com/mysql-document-store Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this  series of informative podcasts, we'll bring you foundational training on the most popular  Oracle technologies. Let's get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! For the last two weeks, we've been having really exciting discussions on everything AI. We covered the basics of artificial intelligence and machine learning, and we're taking a short break from that today to talk about the new MySQL HeatWave Implementation Associate Certification with MySQL Developer Advocate Scott Stroz. 00:59 Nikita: You may remember Scott from an episode last year where he came on to discuss MySQL Document Store. We'll post the link to that episode in the show notes so you can listen to it if you haven't already. Lois: Hi Scott! Thanks for joining us again. Before diving into the certification, tell us, what is MySQL HeatWave?  01:19 Scott: Hi Lois, Hi Niki. I'm so glad to be back. So, MySQL HeatWave Database Service is a fully managed database that is capable of running transactional and analytic queries in a single database instance. This can be done across data warehouses and data lakes. We get all the benefits of analytic queries without the latency and potential security issues of performing standard extract, transform, and load, or ETL, operations. Some other MySQL HeatWave database service features are automated system updates and database backups, high availability, in-database machine learning with AutoML, MySQL Autopilot for managing instance provisioning, and enhanced data security.  HeatWave is the only cloud database service running MySQL that is built, managed, and supported by the MySQL Engineering team. 02:14 Lois: And where can I find MySQL HeatWave? Scott: MySQL HeatWave is only available in the cloud. MySQL HeatWave instances can be provisioned in Oracle Cloud Infrastructure or OCI, Amazon Web Services (AWS), and Microsoft Azure. Now, some features though are only available in Oracle Cloud, such as access to MySQL Document Store. 02:36 Nikita: Scott, you said MySQL HeatWave runs transactional and analytic queries in a single instance. Can you elaborate on that? Scott: Sure, Niki. So, MySQL HeatWave allows developers, database administrators, and data analysts to run transactional queries (OLTP) and analytic queries (OLAP).  OLTP, or online transaction processing, allows for real-time execution of database transactions. A transaction is any kind of insertion, deletion, update, or query of data. Most DBAs and developers work with this kind of processing in their day-to-day activities. 
  OLAP, or online analytical processing, is one way to handle multi-dimensional analytical queries typically used for reporting or data analytics. OLTP system data must typically be exported, aggregated, and imported into an OLAP system. This procedure is called ETL as I mentioned – extract, transform, and load. With large datasets, ETL processes can take a long time to complete, so analytic data could be “old” by the time it is available in an OLAP system. There is also an increased security risk in moving the data to an external source. 03:56 Scott: MySQL HeatWave eliminates the need for time-consuming ETL processes. We can actually get real-time analytics from our data since HeatWave allows for OLTP and OLAP in a single instance. I should note, this also includes analytic from JSON data that may be stored in the database. Another advantage is that applications can use MySQL HeatWave without changing any of the application code. Developers only need to point their applications at the MySQL HeatWave databases. MySQL HeatWave is fully compatible with on-premise MySQL instances, which can allow for a seamless transition to the cloud. And one other thing. When MySQL HeatWave has OLAP features enabled, MySQL can determine what type of query is being executed and route it to either the normal database system or the in-memory database. 04:52 Lois: That's so cool! And what about the other features you mentioned, Scott? Automated updates and backups, high availability… Scott: Right, Lois. But before that, I want to tell you about the in-memory query accelerator. MySQL HeatWave offers a massively parallel, in-memory hybrid columnar query processing engine. It provides high performance by utilizing algorithms for distributed query processing. And this query processing in MySQL HeatWave is optimized for cloud environments.  MySQL HeatWave can be configured to automatically apply system updates, so you will always have the latest and greatest version of MySQL. Then, we have automated backups. By this, I mean MySQL HeatWave can be configured to provide automated backups with point-in-time recovery to ensure data can be restored to a particular date and time. MySQL HeatWave also allows us to define a retention plan for our database backups, that means how long we keep the backups before they are deleted. High availability with MySQL HeatWave allows for more consistent uptime. When using high availability, MySQL HeatWave instances can be provisioned across multiple availability domains, providing automatic failover for when the primary node becomes unavailable. All availability domains within a region are physically separated from each other to mitigate the possibility of a single point of failure. 06:14 Scott: We also have MySQL Lakehouse. Lakehouse allows for the querying of data stored in object storage in various formats. This can be CSV, Parquet, Avro, or an export format from other database systems. And basically, we point Lakehouse at data stored in Oracle Cloud, and once it's ingested, the data can be queried just like any other data in a database. Lakehouse supports querying data up to half a petabyte in size using the HeatWave engine. And this allows users to take advantage of HeatWave for non-MySQL workloads. MySQL AutoPilot is a part of MySQL HeatWave and can be used to predict the number of HeatWave nodes a system will need and automatically provision them as part of a cluster. AutoPilot has features that can handle automatic thread pooling and database shape predicting. 
A “shape” is one of the many different CPU, memory, and ethernet traffic configurations available for MySQL HeatWave. MySQL HeatWave includes some advanced security features such as asymmetric encryption and automated data masking at query execution. As you can see, there are a lot of features covered under the HeatWave umbrella! 07:31 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all free to subscribers. So, what are you waiting for? Pick a topic, leverage the Oracle University Learning Community to ask questions, and then sit for your certification. Visit mylearn.oracle.com to get started.  08:02 Nikita: Welcome back! Now coming to the certification, who can actually take this exam, Scott? Scott: The MySQL HeatWave Implementation Associate Certification Exam is designed specifically for administrators and data scientists who want to provision, configure, and manage MySQL HeatWave for transactions, analytics, machine learning, and Lakehouse. 08:22 Nikita: Can someone who's just graduated, say an engineering graduate interested in data analytics, take this certification? Are there any prerequisites? What are the career prospects for them? Scott: There are no mandatory prerequisites, but anyone who wants to take the exam should have experience with MySQL HeatWave and other aspects of OCI, such as virtual cloud networks and identity and security processes. Also, the learning path on MyLearn will be extremely helpful when preparing for the exam, but you are not required to complete the learning path before registering for the exam. The exam focuses more on getting MySQL HeatWave running (and keeping it running) than accessing the data. That doesn't mean it is not helpful for someone interested in data analytics. I think it can be helpful for data analysts to understand how the system providing the data functions, even if it is at just a high level. It is also possible that data analysts might be responsible for setting up their own systems and importing and managing their own data. 09:23 Lois: And how do I get started if I want to get certified on MySQL HeatWave? Scott: So, you'll first need to go to mylearn.oracle.com and look for the “Become a MySQL HeatWave Implementation Associate” learning path. The learning path consists of over 10 hours of training across 8 different courses.  These courses include “Getting Started with MySQL HeatWave Database Service,” which offers an introduction to some Oracle Cloud functionality such as security and networking, as well as showing one way to connect to a MySQL HeatWave instance. Another course demonstrates how to configure MySQL instances and copy that configuration to other instances. Other courses cover how to migrate data into MySQL HeatWave, set up and manage high availability, and configure HeatWave for OLAP. You'll find labs where you can perform hands-on activities, student and activity guides, and skill checks to test yourself along the way. And there's also the option to Ask the Instructor if you have any questions you need answers to. You can also access the Oracle University Learning Community and discuss topics with others on the same journey. The learning path includes a practice exam to check your readiness to pass the certification exam. 
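Before moving on to exam logistics, here is a hedged sketch of the kind of HeatWave operation the learning path covers: loading a table into the in-memory engine so that analytic queries are offloaded automatically, with no application SQL changes. The statements follow the HeatWave documentation as best recalled here, and the table is hypothetical, so verify against your own environment.

-- Make the table a candidate for HeatWave and load it into the cluster.
ALTER TABLE orders SECONDARY_ENGINE = RAPID;
ALTER TABLE orders SECONDARY_LOAD;

-- Analytic queries like this one can now be routed to HeatWave
-- automatically, while OLTP statements keep running in InnoDB.
SELECT o_custkey, SUM(o_totalprice) AS total_spend
FROM orders
GROUP BY o_custkey
ORDER BY total_spend DESC
LIMIT 10;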
10:33 Lois: Yeah, and remember, access to the entire learning path is free so there's nothing stopping you from getting started right away. Now Scott, what does the certification test you on? Scott: The MySQL HeatWave Implementation exam, which is an associate-level exam, covers various topics. It will validate your ability to identify key features and benefits of MySQL HeatWave and describe the MySQL HeatWave architecture; identify Virtual Cloud Network (VCN) requirements and the different methods of connecting to a MySQL HeatWave instance; manage the automatic backup process and restore database systems from these backups; configure and manage read replicas and inbound replication channels; import data into MySQL HeatWave; configure and manage high availability and clustering of MySQL HeatWave instances. I know this seems like a lot of different topics. That is why we recommend anyone interested in the exam follow the learning path. It will help make sure you have the exposure to all the topics that are covered by the exam. 11:35 Lois: Tell us more about the certification process itself. Scott: While the courses we already talked about are valuable when preparing for the exam, nothing is better than hands-on experience. We recommend that candidates have hands-on experience with MySQL HeatWave with real-world implementations. The format of the exam is Multiple Choice. It is 90 minutes long and consists of 65 questions. When you've taken the recommended training and feel ready to take the certification exam, you need to purchase the exam and register for it. You go through the section on things to do before the exam and the exam policies, and then all that's left to do is schedule the date and time of the exam according to when is convenient for you. 12:16 Nikita: And once you've finished the exam? Scott: When you're done your score will be displayed on the screen when you finish the exam. You will also receive an email indicating whether you passed or failed. You can view your exam results and full score report in Oracle CertView, Oracle's certification portal. From CertView, you can download and print your eCertificate and even share your newly earned badge on places like Facebook, Twitter, and LinkedIn. 12:38 Lois: And for how long does the certification remain valid, Scott? Scott: There is no expiration date for the exam, so the certification will remain valid for as long as the material that is covered remains relevant.  12:49 Nikita: What's the next step for me after I get this certification? What other training can I take? Scott: So, because this exam is an associate level exam, it is kind of a stepping stone along a person's MySQL training. I do not know if there are plans for a professional level exam for HeatWave, but Oracle University has several other training programs that are MySQL-specific. There are learning paths to help prepare for the MySQL Database Administrator and MySQL Database Developer exams. As with the HeatWave learning paths, the learning paths for these exams include video tutorials, hands-on activities, skill checks, and practice exams. 13:27 Lois: I think you've told us everything we need to know about this certification, Scott. Are there any parting words you might have? Scott: We know that the whole process of training and getting certified may seem daunting, but we've really tried to simplify things for you with the “Become a MySQL HeatWave Implementation Associate” learning path. 
It not only prepares you for the exam but also gives you experience with features of MySQL HeatWave that will surely be valuable in your career. 13:51 Lois: Thanks so much, Scott, for joining us today. Nikita: Yeah, we've had a great time with you. Scott: Thanks for having me. Lois: Next week, we'll get back to our focus on AI with a discussion on deep learning. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off. 14:07 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University  Podcast.
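To make the HeatWave discussion above a bit more concrete, here is a minimal sketch of pointing a stock MySQL client at a HeatWave DB system, offloading a table to the in-memory engine, and then running both an OLTP-style and an OLAP-style query against the same instance. The host, credentials, and table names are placeholders, and the secondary-engine statements follow MySQL's documented HeatWave (RAPID) syntax; treat it as an illustration rather than a tested recipe.

# Sketch: OLTP lookup and OLAP aggregate against the same MySQL HeatWave DB system.
# Host, credentials, and table names are placeholders.
import mysql.connector  # pip install mysql-connector-python

conn = mysql.connector.connect(
    host="10.0.1.8",        # private endpoint of the DB system (placeholder)
    user="admin",
    password="...",
    database="sales",
)
cur = conn.cursor()

# One-time step: offload a table to the HeatWave (RAPID) secondary engine.
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# OLTP-style point query: served by InnoDB.
cur.execute("SELECT status FROM orders WHERE order_id = %s", (42,))
print(cur.fetchone())

# OLAP-style aggregate: the optimizer can route this to the HeatWave cluster.
cur.execute(
    "SELECT customer_id, SUM(total) FROM orders "
    "GROUP BY customer_id ORDER BY SUM(total) DESC LIMIT 10"
)
print(cur.fetchall())

cur.close()
conn.close()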

Web3 Galaxy Brain
Ryan Smith, Founder of IndexSupply

Web3 Galaxy Brain

Play Episode Listen Later Jan 13, 2024 93:15


My guest today is Ryan Smith, founder of Index Supply. Index Supply is a company dedicated to open-source indexing of the EVM. Index Supply builds on Ryan's years of experience building backends and indexing services at Heroku, Chain, and Mint.fun, and through consulting with Zora, Reservoir, and more. On this episode we discuss Shovel, Index Supply's open-source indexer that transcribes every event emitted by an EVM node into an OLTP Postgres database. We discuss the ins and outs of indexing, Index Supply's companion Block API, and the company's unconventional approach to funding open-source software development for long-term sustainability. It was a pleasure catching up with Ryan, who is generous with his knowledge and opinionated in his indexing. I hope you enjoy the show. As always, this show is provided as entertainment and does not constitute legal, financial, or tax advice or any form of endorsement or suggestion. Crypto has risks and you alone are responsible for doing your research and making your own decisions. Links: Hosted by @nnnnicholas, Index Supply, Ryan Smith, Henry de Valence, Oleg Andreev, CockroachDB, Pebble, QuickNode, Simple Made Easy, Shovel, ClickHouse, Thoughts on funding
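As an illustration of what "every event in an OLTP Postgres database" buys you, here is a small sketch of querying indexed ERC-20 transfers with an ordinary Postgres client. The table and column names are hypothetical, invented for this example rather than anything Shovel itself prescribes.

# Illustration only: once an indexer has written decoded ERC-20 Transfer events
# into Postgres, they can be queried with plain SQL. The table and column names
# below are hypothetical.
import psycopg2  # pip install psycopg2-binary

conn = psycopg2.connect("dbname=index user=postgres host=localhost")
cur = conn.cursor()

cur.execute(
    """
    SELECT "to" AS holder, SUM(value) AS received
    FROM erc20_transfers
    WHERE block_num > %s
    GROUP BY "to"
    ORDER BY received DESC
    LIMIT 10
    """,
    (18_000_000,),
)
for holder, received in cur.fetchall():
    print(holder, received)

cur.close()
conn.close()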

Oracle University Podcast
Best of 2023: Getting Started with Oracle Cloud Infrastructure

Oracle University Podcast

Play Episode Listen Later Nov 28, 2023 13:26


Oracle's next-gen cloud platform, Oracle Cloud Infrastructure, has been helping thousands of companies and millions of users run their entire application portfolio in the cloud. Today, the demand for OCI expertise is growing rapidly. Join Lois Houston and Nikita Abraham, along with Rohit Rahi, as they peel back the layers of OCI to discover why it is one of the world's fastest-growing cloud platforms.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, Kiran BR, Rashmi Panda, David Wright, the OU Podcast Team, and the OU Studio Team for helping us create this episode.   ------------------------------------------------------   Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. 00:26 Lois: Welcome to the Oracle University Podcast. I'm Lois Houston, Director of Innovation Programs with Oracle University, and with me today is Nikita Abraham, Principal Technical Editor. Nikita: Hi there! You're listening to our Best of 2023 series, where over the next few weeks, we'll be revisiting six of our most popular episodes of the year. 00:47 Lois: Today is episode 2 of 6, and we're throwing it back to our very first episode of the Oracle University Podcast. It was a conversation that Niki and I had with Rohit Rahi, Vice President, CSS OU Cloud Delivery. During this episode, we discussed Oracle Cloud Infrastructure's core coverage on different tiers. Nikita: But we began by asking Rohit to explain what OCI is and tell us about its key components. So, let's jump right in. 01:14 Rohit: Some of the world's largest enterprises are running their mission-critical workloads on Oracle's next generation cloud platform called Oracle Cloud Infrastructure. To keep things simple, let us break them down into seven major categories: Core Infrastructure, Database Services, Data and AI, Analytics, Governance and Administration, Developer Services, and Application Services.  But first, the foundation of any cloud platform is the global footprint of regions. We have many generally available regions in the world, along with multi-cloud support with Microsoft Azure and a differentiated hybrid offering called Dedicated Region Cloud@Customer.  01:57 Rohit: We have building blocks on top of this global footprint, the seven categories we just mentioned. At the very bottom, we have the core primitives: compute, storage, and networking. Compute services cover virtual machines, bare metal servers, containers, a managed Kubernetes service, and a managed VMWare service.  These services are primarily for performing calculations, executing logic, and running applications. Cloud storage includes disks attached to virtual machines, file storage, object storage, archive storage, and data migration services. 02:35 Lois: That's quite a wide range of storage services. So Rohit, we all know that networking plays an important role in connecting different services. These days, data is growing in size and complexity, and there is a huge demand for a scalable and secure approach to store data. 
In this context, can you tell us more about the services available in OCI that are related to networking, database, governance, and administration? 03:01 Rohit: Networking features let you set up software defined private networks in Oracle Cloud. OCI provides the broadest and deepest set of networking services with the highest reliability, most security features, and highest performance.  Then we have database services, we have multiple flavors of database services, both Oracle and open source. We are the only cloud that runs Autonomous Databases and multiple flavors of it, including OLTP, OLAP, and JSON.  And then you can run databases and virtual machines, bare metal servers, or even Exadata in the cloud. You can also run open source databases, such as MySQL and NoSQL in the Oracle Cloud Infrastructure.  03:45 Rohit: Data and AI Services, we have a managed Apache Spark service called Dataflow, a managed service for tracking data artifacts across OCI called Data Catalog, and a managed service for data ingestion and ETL called Data Integration.  We also have a managed data science platform for machine learning models and training. We also have a managed Apache Kafka service for event streaming use cases.  Then we have Governance and Administration services. These services include security, identity, and observability and management. We have unique features like compartments that make it operationally easier to manage large and complex environments. Security is integrated into every aspect of OCI, whether it's automatic detection or remediation, what we typically refer as Cloud Security Posture Management, robust network protection or encryption by default.  We have an integrated observability and management platform with features like logging, logging analytics, and Application Performance Management and much more.  04:55 Nikita: That's so fascinating, Rohit. And is there a service that OCI provides to ease the software development process? Rohit: We have a managed low code service called APEX, several other developer services, and a managed Terraform service called Resource Manager.  For analytics, we have a managed analytics service called Oracle Analytics Cloud that integrates with various third-party solutions.  Under Application services, we have a managed serverless offering, call functions, and API gateway and an Events Service to help you create microservices and event driven architectures.  05:35 Rohit: We have a comprehensive connected SaaS suite across your entire business, finance, human resources, supply chain, manufacturing, advertising, sales, customer service, and marketing all running on OCI.  That's a long list. And these seven categories and the services mentioned represent just a small fraction of more than 80 services currently available in OCI.  Fortunately, it is quick and easy to try out a new service using our industry-leading Free Tier account. We are the first cloud to offer a server for just a penny per core hour.  Whether you're starting with Oracle Cloud Infrastructure or migrating your entire data set into it, we can support you in your journey to the cloud.   06:28 Have an idea and want a platform to share your technical expertise? Head over to the new Oracle University Learning Community. Drive intellectual, free-flowing conversations with your peers. Listen to experts and learn new skills. If you are already an Oracle MyLearn user, go to MyLearn to join the Community. You will need to log in first. 
If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started.  Join the conversation today! 07:04 Nikita: Welcome back! Now let's listen to Rohit explain the core constructs of OCI's physical architecture, starting with regions. Rohit: Region is a localized geographic area comprising of one or more availability domains.  Availability domains are one or more fault tolerant data centers located within a region, but connected to each other by a low latency, high bandwidth network. Fault domains is a grouping of hardware and infrastructure within an availability domain to provide anti-affinity. So think about these as logical data centers.  Today OCI has a massive geographic footprint around the world with multiple regions across the world. And we also have a multi-cloud partnership with Microsoft Azure. And we have a differentiated hybrid cloud offering called Dedicated Region Cloud@Customer.  08:02 Lois: But before we dive into the physical architecture, can you tell us…how does one actually choose a region?  Rohit: Choosing a region, you choose a region closest to your users for lowest latency and highest performance. So that's a key criteria. The second key criteria is data residency and compliance requirements. Many countries have strict data residency requirements, and you have to comply to them. And so you choose a region based on these compliance requirements.  08:31 Rohit: The third key criteria is service availability. New cloud services are made available based on regional demand at times, regulatory compliance reasons, and resource availability, and several other factors. Keep these three criteria in mind when choosing a region.  So let's look at each of these in a little bit more detail. Availability domain. Availability domains are isolated from each other, fault tolerant, and very unlikely to fail simultaneously. Because availability domains do not share physical infrastructure, such as power or cooling or the internal network, a failure that impacts one availability domain is unlikely to impact the availability of others.  A particular region has three availability domains. One availability domain has some kind of an outage, is not available. But the other two availability domains are still up and running.  09:26 Rohit: We talked about fault domains a little bit earlier. What are fault domains? Think about each availability domain has three fault domains. So think about fault domains as logical data centers within availability domain.  We have three availability domains, and each of them has three fault domains. So the idea is you put the resources in different fault domains, and they don't share a single point of hardware failure, like physical servers, physical rack, top of rack switches, a power distribution unit. You can get high availability by leveraging fault domains.  We also leverage fault domains for our own services. So in any region, resources in at most one fault domain are being actively changed at any point in time. This means that availability problems caused by change procedures are isolated at the fault domain level. And moreover, you can control the placement of your compute or database instances to fault domain at instance launch time. So you can specify which fault domain you want to use.  10:29 Nikita: So then, what's the general guidance for OCI users?  Rohit: The general guidance is we have these constructs, like fault domains and availability domains to help you avoid single points of failure. 
We do that on our own. So we make sure that the servers, the top of rack switch, all are redundant. So you don't have hardware failures or we try to minimize those hardware failures as much as possible. You need to do the same when you are designing your own architecture.  So let's look at an example. You have a region. You have an availability domain. And as we said, one AD has three fault domains, so you see those fault domains here.  11:08 Rohit: So first thing you do is when you create an application you create this software-defined virtual network. And then let's say it's a very simple application. You have an application tier. You have a database tier.  So first thing you could do is you could run multiple copies of your application. So you have an application tier which is replicated across fault domains. And then you have a database, which is also replicated across fault domains.  11:34 Lois: What's the benefit of this replication, Rohit?  Rohit: Well, it gives you that extra layer of redundancy. So something happens to a fault domain, your application is still up and running.  Now, to take it to the next step, you could replicate the same design in another availability domain. So you could have two copies of your application running. And you can have two copies of your database running.  11:57 Now, one thing which will come up is how do you make sure your data is synchronized between these copies? And so you could use various technologies like Oracle Data Guard to make sure that your primary and standby-- the data is kept in sync here. And so that-- you can design your application-- your architectures like these to avoid single points of failure. Even for regions where we have a single availability domain, you could still leverage fault domain construct to achieve high availability and avoid single points of failure.  12:31 Nikita: Thank you, Rohit, for taking us through OCI at a high level.  Lois: For a more detailed explanation of OCI, please visit mylearn.oracle.com, create a profile if you don't already have one, and get started on our free training on OCI Foundations.  Nikita: We hope you enjoyed that conversation. Join us next week for another throwback episode. Until then, this is Nikita Abraham... Lois: And Lois Houston, signing off! 12:57 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
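To tie Rohit's fault-domain guidance to something concrete, here is a hedged sketch of launching two instances of the same application tier into different fault domains with the OCI Python SDK. Every OCID, the availability-domain name, and the shape are placeholders, and exact model fields can vary across SDK versions, so treat this as a sketch rather than a drop-in script.

# Sketch: pinning two web-tier instances to different fault domains in the same
# availability domain with the OCI Python SDK. All OCIDs are placeholders.
import oci  # pip install oci

config = oci.config.from_file()          # reads ~/.oci/config
compute = oci.core.ComputeClient(config)

for fd in ("FAULT-DOMAIN-1", "FAULT-DOMAIN-2"):
    details = oci.core.models.LaunchInstanceDetails(
        compartment_id="ocid1.compartment.oc1..example",
        availability_domain="Uocm:PHX-AD-1",
        fault_domain=fd,                  # explicit placement, as discussed above
        shape="VM.Standard.E4.Flex",
        shape_config=oci.core.models.LaunchInstanceShapeConfigDetails(
            ocpus=1, memory_in_gbs=16
        ),
        display_name=f"web-{fd.lower()}",
        create_vnic_details=oci.core.models.CreateVnicDetails(
            subnet_id="ocid1.subnet.oc1..example"
        ),
        source_details=oci.core.models.InstanceSourceViaImageDetails(
            image_id="ocid1.image.oc1..example"
        ),
    )
    instance = compute.launch_instance(details).data
    print(instance.id, instance.fault_domain)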

Data Engineering Podcast
Surveying The Market Of Database Products

Data Engineering Podcast

Play Episode Listen Later Oct 30, 2023 47:12


Summary Databases are the core of most applications, whether transactional or analytical. In recent years the selection of database products has exploded, making the critical decision of which engine(s) to use even more difficult. In this episode Tanya Bragin shares her experiences as a product manager for two major vendors and the lessons that she has learned about how teams should approach the process of tool selection. Announcements Hello and welcome to the Data Engineering Podcast, the show about modern data management Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack (https://www.dataengineeringpodcast.com/rudderstack) You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize (https://www.dataengineeringpodcast.com/materialize) today to get 2 weeks free! This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs in your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold (https://www.dataengineeringpodcast.com/datafold) Data projects are notoriously complex. With multiple stakeholders to manage across varying backgrounds and toolchains even simple reports can become unwieldy to maintain. Miro is your single pane of glass where everyone can discover, track, and collaborate on your organization's data. I especially like the ability to combine your technical diagrams with data documentation and dependency mapping, allowing your data engineers and data consumers to communicate seamlessly about your projects. Find simplicity in your most complex projects with Miro. Your first three Miro boards are free when you sign up today at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). That's three free boards at dataengineeringpodcast.com/miro (https://www.dataengineeringpodcast.com/miro). 
Your host is Tobias Macey and today I'm interviewing Tanya Bragin about her views on the database products market.

Interview
- Introduction
- How did you get involved in the area of data management?
- What are the aspects of the database market that keep you interested as a VP of product?
- How have your experiences at Elastic informed your current work at Clickhouse?
- What are the main product categories for databases today?
- What are the industry trends that have the most impact on the development and growth of different product categories?
- Which categories do you see growing the fastest?
- When a team is selecting a database technology for a given task, what are the types of questions that they should be asking?
- Transactional engines like Postgres, SQL Server, Oracle, etc. were long used as analytical databases as well. What is driving the broad adoption of columnar stores as a separate environment from transactional systems?
- What are the inefficiencies/complexities that this introduces?
- How can the database engine used for analytical systems work more closely with the transactional systems?
- When building analytical systems there are numerous moving parts with intricate dependencies. What is the role of the database in simplifying observability of these applications?
- What are the most interesting, innovative, or unexpected ways that you have seen Clickhouse used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on database products?
- What are your predictions for the future of the database market?

Contact Info
- LinkedIn (https://www.linkedin.com/in/tbragin/)

Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements
Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ (https://www.pythonpodcast.com) covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast (https://www.themachinelearningpodcast.com) helps you go from idea to production with machine learning. Visit the site (https://www.dataengineeringpodcast.com) to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com (mailto:hosts@dataengineeringpodcast.com) with your story.
To help other people find the show please leave a review on Apple Podcasts (https://podcasts.apple.com/us/podcast/data-engineering-podcast/id1193040557) and tell your friends and co-workers Links Clickhouse (https://clickhouse.com/) Podcast Episode (https://www.dataengineeringpodcast.com/clickhouse-data-warehouse-episode-88/) Elastic (https://www.elastic.co/) OLAP (https://en.wikipedia.org/wiki/Online_analytical_processing) OLTP (https://en.wikipedia.org/wiki/Online_transaction_processing) Graph Database (https://en.wikipedia.org/wiki/Graph_database) Vector Database (https://en.wikipedia.org/wiki/Vector_database) Trino (https://trino.io/) Presto (https://prestodb.io/) Foreign data wrapper (https://wiki.postgresql.org/wiki/Foreign_data_wrappers) dbt (https://www.getdbt.com/) Podcast Episode (https://www.dataengineeringpodcast.com/dbt-data-analytics-episode-81/) OpenTelemetry (https://opentelemetry.io/) Iceberg (https://iceberg.apache.org/) Podcast Episode (https://www.dataengineeringpodcast.com/tabular-iceberg-lakehouse-tables-episode-363) Parquet (https://parquet.apache.org/) The intro and outro music is from The Hug (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/Love_death_and_a_drunken_monkey/04_-_The_Hug) by The Freak Fandango Orchestra (http://freemusicarchive.org/music/The_Freak_Fandango_Orchestra/) / CC BY-SA (http://creativecommons.org/licenses/by-sa/3.0/)

Engenharia de Dados [Cast]
The Data Lakehouse Paradigm with Bill Inmon - The Father of Data Warehouse

Engenharia de Dados [Cast]

Play Episode Listen Later Oct 12, 2023 43:19


In today's episode, Luan Moreno, Mateus Oliveira, and Orlando Marley interview Bill Inmon, creator of the data warehouse concept and author of several books on data topics. The data warehouse is the concept of centralizing an organization's analytical data in order to build a structured 360° view of the business. In this episode, you will learn: the differences between OLTP and OLAP; the history of using data for decision-making; and how to build a resilient process for understanding the facts in your data. In this conversation we also cover the following topics: Bill Inmon's story; the pillars of analytical systems; and the new generation of analytical data platforms. Learn more about data analysis and how to use technology to make your analytics environment reliable and resilient, in the words of the father of the data warehouse. Bill Inmon = Linkedin Luan Moreno = https://www.linkedin.com/in/luanmoreno/
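For readers who want the OLTP-versus-OLAP distinction from this conversation in code form, here is a tiny, self-contained sketch. It uses SQLite purely for illustration: the first queries are the small transactional reads and writes an OLTP system is tuned for, while the last one is the scan-and-aggregate shape that a data warehouse answers.

# Tiny illustration of the OLTP vs. OLAP distinction discussed in the episode:
# OLTP touches a few rows per transaction; OLAP scans and aggregates many rows.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, region TEXT, total REAL)"
)

# OLTP-style work: small, frequent, transactional writes and point reads.
with conn:
    conn.execute(
        "INSERT INTO orders (customer, region, total) VALUES (?, ?, ?)",
        ("acme", "emea", 99.50),
    )
print(conn.execute("SELECT total FROM orders WHERE id = ?", (1,)).fetchone())

# OLAP-style work: scan everything, aggregate, answer a business question.
# In a real warehouse this runs over years of history, usually loaded via ETL.
print(conn.execute(
    "SELECT region, COUNT(*) AS orders, SUM(total) AS revenue "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall())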

Oracle University Podcast
MySQL Database Service and HeatWave

Oracle University Podcast

Play Episode Listen Later Jul 11, 2023 14:20


In this episode, Lois Houston and Nikita Abraham are joined by Autumn Black to discuss MySQL Database, a fully-managed database service powered by the integrated HeatWave in-memory query accelerator.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Deepak Modi, Ranbir Singh, and the OU Studio Team for helping us create this episode.   ---------------------------------------------------------   Episode Transcript: 00;00;00;00 - 00;00;39;08 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. Hello and welcome to the Oracle University Podcast. You're listening to our second season Oracle Database Made Easy. I'm Lois Houston, Director of Product Innovation and Go to Market Programs with Oracle University.   00;00;39;10 - 00;01;08;03 And with me is Nikita Abraham, Principal Technical Editor. Hi, everyone. In our last episode, we had a really fascinating conversation about Oracle Machine Learning with Cloud Engineer Nick Commisso. Do remember to catch that episode if you missed it. Today, we have with us Autumn Black, who's an Oracle Database Specialist. Autumn is going to take us through MySQL, the free version and the Enterprise Edition, and MySQL Data Service.   00;01;08;05 - 00;01;39;16 We're also going to ask her about HeatWave. So let's get started. Hi, Autumn. So tell me, why is MySQL such a popular choice for developers? MySQL is the number one open-source database and the second most popular database overall after the Oracle Database. According to a Stack Overflow survey, MySQL has been for a long time and remains the number one choice for developers, primarily because of its ease of use, reliability, and performance.   00;01;39;17 - 00;02;08;22 And it's also big with companies? MySQL is used by the world's most innovative companies. This includes Twitter, Facebook, Netflix, and Uber. It is also used by students and small companies. There are different versions of MySQL, right? What are the main differences between them when it comes to security, data recovery, and support? MySQL comes in two flavors: free version or paid version.   00;02;08;24 - 00;02;45;05 MySQL Community, the free version, contains the basic components for handling data storage. Just download it, install it, and you're ready to go. But remember, free has costs. That stored data is not exactly secure and data recovery is not easy and sometimes impossible. And there is no such thing as free MySQL Community support. This is why MySQL Enterprise Edition was created, to provide all of those missing important pieces: high availability, security, and Oracle support from the people who build MySQL.   00;02;45;10 - 00;03;09;24 You said MySQL is open source and can be easily downloaded and run. Does it run on-premises or in the cloud? MySQL runs on a local computer, company's data center, or in the cloud. Autumn, can we talk more about MySQL in the cloud? Today, MySQL can be found in Amazon RDS and Aurora, Google Cloud SQL, and Microsoft Azure Database for MySQL.   00;03;09;27 - 00;03;35;23 They all offer a cloud-managed version of MySQL Community Edition with all of its limitations. 
These MySQL cloud services are expensive and it's not easy to move data away from their cloud. And most important of all, they do not include the MySQL Enterprise Edition advanced features and tools. And they are not supported by the Oracle MySQL experts.   00;03;35;25 - 00;04;07;03 So why is MySQL Database Service in Oracle Cloud Infrastructure better than other MySQL cloud offerings? How does it help data admins and developers? MySQL Database Service in Oracle Cloud Infrastructure is the only MySQL database service built on MySQL Enterprise Edition and 100% built, managed, and supported by the MySQL team. Let's focus on the three major categories that make MySQL Database Service better than the other MySQL cloud offerings: ease of use, security, and enterprise readiness.   00;04;07;03 - 00;04;44;24 MySQL DBAs tend to be overloaded with mundane database administration tasks. They're responsible for many databases, their performance, security, availability, and more. It is difficult for them to focus on innovation and on addressing the demands of lines of business. MySQL is fully managed on OCI. MySQL Database Service automates all those time-consuming tasks so they can improve productivity and focus on higher value tasks.   00;04;44;26 - 00;05;07;13 Developers can quickly get all the latest features directly from the MySQL team to deliver new modern apps. They don't get that on other clouds that rely on outdated or forked versions of MySQL. Developers can use the MySQL Document Store to mix and match SQL and NoSQL content in the same database as well as the same application.   00;05;07;19 - 00;05;30;26 Yes. And we're going to talk about MySQL Document Store in a lot more detail in two weeks, so don't forget to tune in to that episode. Coming back to this, you spoke about how MySQL Database Service or MDS on OCI is easy to use. What about its security? MDS security first means it is built on Gen 2 cloud infrastructure.   00;05;30;28 - 00;05;57;13 Data is encrypted for privacy. Data is on OCI block volume. So what does this Gen 2 cloud infrastructure offer? Is it more secure? Oracle Cloud is secure by design and architected very differently from the Gen 1 clouds of our competitors. Gen 2 provides maximum isolation and protection. That means Oracle cannot see customer data and users cannot access our cloud control computer.   00;05;57;15 - 00;06;27;09 Gen 2 architecture allows us to offer superior performance on our compute objects. Finally, Oracle Cloud is open. Customers can run Oracle software, third-party options, open source, whatever you choose without modifications, trade-offs, or lock-ins. Just to dive a little deeper into this, what kind of security features does MySQL Database Service offer to protect data? Data security has become a top priority for all organizations.   00;06;27;12 - 00;06;55;17 MySQL Database Service can help you protect your data against external attacks, as well as internal malicious users with a range of advanced security features. Those advanced security features can also help you meet industry and regulatory compliance requirements, including GDPR, PCI, and HIPPA. When a security vulnerability is discovered, you'll get the fix directly from the MySQL team, from the team that actually develops MySQL.   00;06;55;19 - 00;07;22;16 I want to talk about MySQL Enterprise Edition that you brought up earlier. Can you tell us a little more about it? 
MySQL Database Service is the only public cloud service built on MySQL Enterprise Edition, which includes 24/7 support from the team that actually builds MySQL, at no additional cost. All of the other cloud vendors are using the Community Edition of MySQL, so they lack the Enterprise Edition features and tools.   00;07;22;22 - 00;07;53;24 What are some of the default features that are available in MySQL Database Service? MySQL Enterprise scalability, also known as the thread pool plugin, data-at-rest encryption, native backup, and OCI built-in native monitoring. You can also install MySQL Enterprise Monitor to monitor MySQL Database Service remotely. MySQL works well with your existing Oracle investments like Oracle Data Integrator, Oracle Analytics Cloud, Oracle GoldenGate, and more.   00;07;53;27 - 00;08;17;20 MySQL Database Service customers can easily use Docker and Kubernetes for DevOps operations. So how much of this is managed by the MySQL team and how much is the responsibility of the user? MySQL Database Service is a fully managed database service. A MySQL Database Service user is responsible for logical schema modeling, query design and optimization, define data access and retention policies.   00;08;17;22 - 00;08;44;26 The MySQL team is responsible for providing automation for operating system installation, database and OS patching, including security patches, backup, and recovery. The system backs up the data for you, but in an emergency, you can restore it to a new instance with a click. Monitoring and log handling. Security with advanced options available in MySQL Enterprise Edition.   00;08;44;28 - 00;09;01;18 And of course, maintaining the data center for you. To use MDS, users must have OCI tenancy, a compartment, belong to a group with required policies.   00;09;01;21 - 00;09;28;28 Did you know that Oracle University offers free courses on Oracle Cloud Infrastructure? You'll find training on everything from cloud computing, database, and security to artificial intelligence and machine learning, all of which is available free to subscribers. So get going. Pick a course of your choice, get certified, join the Oracle University Learning Community, and network with your peers. If you're already an Oracle MyLearn user, go to MyLearn to begin your journey.   00;09;29;03 - 00;09;40;24 If you have not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started.   00;09;40;27 - 00;10;05;20 Welcome back! Autumn, tell us about the system architecture of MySQL Database Service. A database system is a logical container for the MySQL instance. It provides an interface enabling management of tasks, such as provisioning, backup and restore, monitoring, and so on. It also provides a read and write endpoint, enabling you to connect to the MySQL instance using the standard protocols.   00;10;05;28 - 00;10;31;27 And what components does a MySQL Database Service DB system consist of? A computer instance, an Oracle Linux operating system, the latest version of MySQL server Enterprise Edition, a virtual network interface card, VNIC, that attaches the DB system to a subnet of the virtual cloud network, network-attached higher performance block storage. Is there a way to monitor how the MySQL Database Service is performing?   00;10;31;29 - 00;10;59;29 You can monitor the health, capacity, and performance of your Oracle Cloud Infrastructure MySQL Database Service resources by using metrics, alarms, and notifications. 
The MySQL Database Service metrics enable you to measure useful quantitative data about your MySQL databases such as current connection information, statement activity, and latency, host CPU, memory, and disk I/O utilization, and so on.   00;11;00;03 - 00;11;23;15 You can use metrics data to diagnose and troubleshoot problems with MySQL databases. What should I keep in mind about managing the SQL database? Stopped MySQL Database Service system stops billing for OCPUs, but you also cannot connect to the DB system. During MDS automatic update, the operating system is upgraded along with patching of the MySQL server.   00;11;23;17 - 00;11;49;15 Metrics are used to measure useful data about MySQL Database Service system. Turning on automatic backups is an update to MDS to enable automatic backups. MDS backups can be removed by using the details pages and OCI and clicking Delete. Thanks for that detailed explanation on MySQL, Autumn. Can you also touch upon MySQL HeatWave? Why would you use it over traditional methods of running analytics on MySQL data?   00;11;49;18 - 00;12;18;01 Many organizations choose MySQL to store their valuable enterprise data. MySQL is optimized for Online Transaction Processing, OLTP, but it is not designed for Online Analytic Processing, OLAP. As a result, organizations that need to efficiently run analytics on data stored in MySQL database move their data to another database to run analytic applications such as Amazon Redshift.   00;12;18;04 - 00;12;41;22 MySQL HeatWave is designed to enable customers to run analytics on data that is stored in MySQL database without moving data to another database. What are the key features and components of HeatWave? HeatWave is built on an innovative in-memory analytics engine that is architected for scalability and performance, and is optimized for Oracle Cloud Infrastructure, OCI.   00;12;41;24 - 00;13;05;29 It is enabled when you add a HeatWave cluster to a MySQL database system. A HeatWave cluster comprises a MySQL DB system node and two or more HeatWave nodes. The MySQL DB system node includes a plugin that is responsible for cluster management, loading data into the HeatWave cluster, query scheduling, and returning query results to the MySQL database system.   00;13;06;02 - 00;13;29;15 The HeatWave nodes store data and memory and processed analytics queries. Each HeatWave node contains an instance of the HeatWave. The number of HeatWave nodes required depends on the size of your data and the amount of compression that is achieved when loading the data into the HeatWave cluster. Various aspects of HeatWave use machine-learning-driven automation that helps to reduce database administrative costs.   00;13;29;18 - 00;13;52;11 Thanks, Autumn, for joining us today. We're looking forward to having you again next week to talk to us about Oracle NoSQL Database Cloud Service. To learn more about MySQL Data Service, head over to mylearn.oracle.com and look for the Oracle Cloud Data Management Foundations Workshop. Until next time, this is Nikita Abraham and Lois Houston signing off.   00;13;52;14 - 00;16;33;05 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.

Oracle University Podcast
Exadata Cloud Service

Oracle University Podcast

Play Episode Listen Later Jun 6, 2023 18:25


Hear Lois Houston and Nikita Abraham, along with Alex Bouchereau, talk about Exadata Cloud Service, and more specifically, Exadata Cloud Service X8M, which is the latest release of Oracle's premier Database Cloud Service.   They also take a look at how advanced cloud automation, dynamic resource scaling, and flexible subscription pricing enable customers to run database workloads faster and with lower costs.   Oracle MyLearn: https://mylearn.oracle.com/ Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ Twitter: https://twitter.com/Oracle_Edu   Special thanks to Arijit Ghosh, David Wright, Ranbir Singh, and the OU Studio Team for helping us create this episode.

GOTO - Today, Tomorrow and the Future
Unlocking the Power of Real-Time Analytics • Tim Berglund & Adi Polak

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Jun 2, 2023 44:07 Transcription Available


This interview was recorded for GOTO Unscripted. gotopia.tech
Read the full transcription of this interview here.
Tim Berglund - VP DevRel at StarTree & Author of "Gradle Beyond the Basics"
Adi Polak - VP of Developer Experience at Treeverse & Contributing to lakeFS OSS

RESOURCES
Tim: timberglund.com, twitter.com/tlberglund, linkedin.com/in/tlberglund
Adi: twitter.com/AdiPolak, instagram.com/polak.code, linkedin.com/in/polak-adi
Tools & companies: pinot.apache.org, twitter.com/startreedata, linkedin.com/company/startreedata, dev.startree.ai, stree.ai/slack
YT videos: Data Mesh • Zhamak Dehghani; Beyond Microservices • Gwen Shapira

DESCRIPTION
Adi Polak and Tim Berglund explore the concept of analytics and what it truly means in the software development world. They delve into the benefits of real-time analytics for product development, highlighting the fine line between compute and storage and the technical requirements for achieving effective real-time analytics. They also discuss the applications of real-time analytics through the lens of Apache Pinot and StarTree Cloud, exploring use cases such as the popular "Who's Watched My Profile on LinkedIn" feature powered by Apache Pinot.

RECOMMENDED BOOKS
Adi Polak • Scaling Machine Learning with Spark
Tim Berglund • Gradle Beyond the Basics
Tim Berglund & Matthew McCullough • Building and Testing with Gradle
Mark Needham • Building Real-Time Analytics Systems
Gwen Shapira, Todd Palino, Rajini Sivaram & Krit Petty • Kafka: The Definitive Guide

Twitter, LinkedIn, Facebook
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted almost daily
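To give a feel for the kind of query behind a "who viewed my profile" feature, here is a hedged sketch of posting SQL to an Apache Pinot broker from Python. The table, columns, and broker URL are invented for illustration, and while Pinot brokers commonly expose a JSON SQL endpoint like the one used here, check your own deployment before relying on the exact path or functions.

# Sketch of a "profile views in the last hour" style query against Apache Pinot.
# Table, columns, and broker URL are made up for illustration.
import requests

sql = """
    SELECT viewerId, COUNT(*) AS views
    FROM profileViewEvents
    WHERE viewedProfileId = 'tim' AND eventTimeMillis > ago('PT1H')
    GROUP BY viewerId
    ORDER BY views DESC
    LIMIT 20
"""
resp = requests.post("http://pinot-broker:8099/query/sql", json={"sql": sql})
resp.raise_for_status()
print(resp.json().get("resultTable"))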

OnBoard!
EP 27. 对话 PingCAP CTO 黄东旭:中国开源走向世界,数据库与未来基础软件的脑洞

OnBoard!

Play Episode Listen Later Feb 22, 2023 86:20


This episode has been a long time coming: Monica sits down for a hardcore conversation with PingCAP co-founder and CTO Huang Dongxu. On the fourth day of the Lunar New Year, at a campsite back home in Guangxi, with a warm 15°C breeze blowing, we spent two hours talking about databases, the developer markets in China and the US, and the philosophy of technical products. What a treat! Hello World, who is OnBoard?! If you follow open source at all, you surely know the open-source distributed database TiDB and the company behind it, PingCAP. PingCAP is arguably a pioneer among China's commercial open-source companies. Since its founding in 2015, TiDB has grown from zero to more than 30,000 stars on GitHub and more than 800 contributors from around the world; beyond open-source users that include many top-tier internet companies, PingCAP also serves more than 3,000 customers across more than 20 countries. As co-founder and CTO, Huang Dongxu is not only a seasoned infrastructure software engineer and architect, but also a passionate open-source enthusiast and open-source author — Codis, the high-performance cluster architecture solution for the in-memory database Redis, is one of his works. PingCAP is also a trailblazer among Chinese infrastructure software companies going global. For more than a year, Dongxu has been almost entirely focused on Silicon Valley — on the differences between the Chinese and US markets, on what a database built for developers really is, on what the development paradigm of the future looks like, and, of course, on some wilder questions: how will modern AI affect the data space? This heart-to-heart conversation is sure to give you plenty of inspiration! One last thing worth mentioning: outside of work, Dongxu is also a rock musician. In the Easter egg at the end of this episode, you'll hear him show off the instrument he has most recently been learning! Packed with substance and ready to board — enjoy!

[Thanks to AroundDeal for sponsoring this episode!] As more and more IT SaaS and smart-manufacturing companies expand into global markets, precisely sourcing overseas B2B customer leads becomes the first problem to solve. AroundDeal provides a global business-information SaaS platform for enterprises, with more than 100 million contact, company, and business-intelligence records covering 200+ countries and regions and 3,000+ industry segments, continuously updated — an essential tool for companies going global. There's a bonus for OnBoard! listeners: visit AroundDeal.com and mention Onboard in Contact Sales to get a seven-day free trial, and find your next ideal overseas customer right away!

What we talked about
02:30 Opening: an introduction to PingCAP and a few key milestones in its history
07:00 OLAP and OLTP explained: how do the scenarios and development challenges differ?
12:12 "All databases of the future will be HTAP databases"?! Why the core HTAP capability is really the TP capability
21:35 What's long divided must converge: will the long tail of database needs keep consolidating?
24:03 Why now: why has the HTAP concept only gained broad acceptance in recent years?
27:19 Why did HTAP catch on later in the US, where infra is more mature?
29:55 What do mainstream HTAP architectures look like? How should users choose?
34:26 Every major vendor is following up with HTAP products — what does that mean for early-stage companies?
37:11 How to understand "everything can be SQL"? What does it mean for database vendors?
41:17 What is the "first principle of databases"?
46:25 How does Vercel get developer experience right? Why, to do infra well, should your focus not be on infra?
53:23 For new database companies, are you even at the table without Serverless?
59:45 The mind-bending future of serverless! The ultimate answer to data silos
64:38 A good database vs. a good database product
66:12 Why do new database companies need a new kind of engineering organization? What organizational challenges and changes has PingCAP gone through?
71:18 How are database users' organizational structures changing?
74:43 How does the open-source community's role differ at different stages of the company? Why shouldn't you be too optimistic about open-source-to-commercial conversion?
80:27 What new infra companies and technology trends are emerging in the US?
84:46 Breaking out beyond tech-industry customers in the North American market: what are the challenges, including for building the team?
92:21 Overseas markets beyond the US: surprises in Southeast Asia, and the order of attack matters
98:01 Similarities and differences between the Chinese and overseas markets, and why startups should tackle the hard problems first
100:11 Looking back on eight years of company building: lessons and takeaways
104:11 Exciting opportunities ahead
108:40 The unavoidable topic: ChatGPT and wild ideas about generative AI
111:58 Quick-fire questions — with an Easter egg!

Companies & key terms we mentioned: Snowflake, SingleStore, Neon, Vercel, Supabase, OSS Insight, Snowflake Unistore, Google Spanner, ZeroETL, Reverse ETL, OLAP, OLTP, Serverless

The guest's recommendations: Books: Zen and the Art of Motorcycle Maintenance, by Robert M. Pirsig; The Art of Unix Programming, by Eric S. Raymond. Rob Pike, father of the Go language. Werner Vogels, Amazon CTO. Bansuri, an Indian instrument. Favorite band: Sonic Youth

Follow Ms. M's WeChat official account for more in-depth content on the software industry in China and the US: M小姐研习录 (ID: MissMStudy). Your likes, comments, and shares are the best encouragement for us! If there are topics you'd like us to discuss or guests you'd like us to invite, let us know in the comments ❤️
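For readers curious what the HTAP idea discussed in this episode looks like in practice, here is a hedged sketch against a MySQL-compatible TiDB cluster: the same table serves a transactional point read and an analytical aggregate, with a columnar TiFlash replica requested for the analytical side. Host, credentials, and table names are placeholders, and the replica statement follows TiDB's documented syntax, but treat it as an illustration rather than a production recipe.

# Sketch: the HTAP idea on a MySQL-compatible TiDB cluster -- one table serving
# a transactional point read and an analytical aggregate. Placeholders throughout.
import mysql.connector  # TiDB speaks the MySQL protocol

conn = mysql.connector.connect(host="tidb.example.internal", port=4000,
                               user="root", password="...", database="shop")
cur = conn.cursor()

# One-time: ask TiDB to keep a columnar (TiFlash) replica for analytics.
cur.execute("ALTER TABLE orders SET TIFLASH REPLICA 1")

# TP: low-latency point read, served from the row store (TiKV).
cur.execute("SELECT status FROM orders WHERE id = %s", (42,))
print(cur.fetchone())

# AP: large aggregate the optimizer can push to the columnar replica.
cur.execute("SELECT region, SUM(amount) FROM orders GROUP BY region")
print(cur.fetchall())

cur.close()
conn.close()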

Der Data Analytics Podcast
ER- Diagramm - OLTP Datenbank-Modellieren

Der Data Analytics Podcast

Play Episode Listen Later Jan 30, 2023 6:04


Database modeling with an ER diagram. https://larsmuellensiefen.substack.com/

Der Data Analytics Podcast
DWH vs OLTP Datenbanken - was ist der Unterschied und wie werden sie eingesetzt?

Der Data Analytics Podcast

Play Episode Listen Later Dec 10, 2022 6:15


Data warehouses vs. transactional databases: how do they differ from one another, and how are they used in practice?

Screaming in the Cloud
The Art and Science of Database Innovation with Andi Gutmans

Screaming in the Cloud

Play Episode Listen Later Nov 23, 2022 37:07


About AndiAndi Gutmans is the General Manager and Vice President for Databases at Google. Andi's focus is on building, managing and scaling the most innovative database services to deliver the industry's leading data platform for businesses. Before joining Google, Andi was VP Analytics at AWS running services such as Amazon Redshift. Before his tenure at AWS, Andi served as CEO and co-founder of Zend Technologies, the commercial backer of open-source PHP.Andi has over 20 years of experience as an open source contributor and leader. He co-authored open source PHP. He is an emeritus member of the Apache Software Foundation and served on the Eclipse Foundation's board of directors. He holds a bachelor's degree in Computer Science from the Technion, Israel Institute of Technology.Links Referenced: LinkedIn: https://www.linkedin.com/in/andigutmans/ Twitter: https://twitter.com/andigutmans TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is brought to us by our friends at Google Cloud, and in so doing, they have gotten a guest to appear on this show that I have been low-key trying to get here for a number of years. Andi Gutmans is VP and GM of Databases at Google Cloud. Andi, thank you for joining me.Andi: Corey, thanks so much for having me.Corey: I have to begin with the obvious. Given that one of my personal passion projects is misusing every cloud service I possibly can as a database, where do you start and where do you stop as far as saying, “Yes, that's a database,” so it rolls up to me and, “No, that's not a database, so someone else can deal with the nonsense?”Andi: I'm in charge of the operational databases, so that includes both the managed third-party databases such as MySQL, Postgres, SQL Server, and then also the cloud-first databases, such as Spanner, Big Table, Firestore, and AlloyDB. So, I suggest that's where you start because those are all awesome services. And then what doesn't fall underneath, kind of, that purview are things like BigQuery, which is an analytics, you know, data warehouse, and other analytics engines. And of course, there's always folks who bring in their favorite, maybe, lesser-known or less popular database and self-manage it on GCE, on Compute.Corey: Before you wound up at Google Cloud, you spent roughly four years at AWS as VP of Analytics, which is, again, one of those very hazy type of things. Where does it start? Where does it stop? It's not at all clear from the outside. 
But even before that, you were, I guess, something of a legendary figure, which I know is always a weird thing for people to hear.But you were partially at least responsible for the Zend Framework in the PHP world, which I didn't realize what the heck that was, despite supporting it in production at a couple of jobs, until after I, for better or worse, was no longer trusted to support production environments anymore. Which, honestly, if you can get out, I'm a big proponent of doing that. You sleep so much better without a pager. How did you go from programming languages all the way on over to databases? It just seems like a very odd mix.Andi: Yeah. No, that's a great question. So, I was one of the core developers of PHP, and you know, I had been in the PHP community for quite some time. I also helped ideate. The Zend Framework, which was the company that, you know, I co-founded Zend Technologies was kind of the company behind PHP.So, like Red Hat supports Linux commercially, we supported PHP. And I was very much focused on developers, programming languages, frameworks, IDEs, and that was, you know, really exciting. I had also done quite a bit of work on interoperability with databases, right, because behind every application, there's a database, and so a lot of what we focused on is a great connectivity to MySQL, to Postgres, to other databases, and I got to kind of learn the database world from the outside from the application builders. We sold our company in I think it was 2015 and so I had to kind of figure out what's next. And so, one option would have been, hey, stay in programming languages, but what I learned over the many years that I worked with application developers is that there's a huge amount of value in data.And frankly, I'm a very curious person; I always like to learn, so there was this opportunity to join Amazon, to join the non-relational database side, and take myself completely out of my comfort zone. And actually, I joined AWS to help build the graph database Amazon Neptune, which was even more out of my comfort zone than even probably a relational database. So, I kind of like to do different things and so I joined and I had to learn, you know how to build a database pretty much from the ground up. I mean, of course, I didn't do the coding, but I had to learn enough to be dangerous, and so I worked on a bunch of non-relational databases there such as, you know, Neptune, Redis, Elasticsearch, DynamoDB Accelerator. And then there was the opportunity for me to actually move over from non-relational databases to analytics, which was another way to get myself out of my comfort zone.And so, I moved to run the analytic space, which included services like Redshift, like EMR, Athena, you name it. So, that was just a great experience for me where I got to work with a lot of awesome people and learn a lot. And then the opportunity arose to join Google and actually run the Google transactional databases including their older relational databases. And by the way, my job actually have two jobs. One job is running Spanner and Big Table for Google itself—meaning, you know, search ads and YouTube and everything runs on these databases—and then the second job is actually running external-facing databases for external customers.Corey: How alike are those two? Is it effectively the exact same thing, just with different API endpoints? Are they two completely separate universes? 
It's always unclear from the outside when looking at large companies that effectively eat versions of their own dog food, where their internal usage of these things starts and stops.Andi: So, great question. So, Cloud Spanner and Cloud Big Table do actually use the internal Spanner and Big Table. So, at the core, it's exactly the same engine, the same runtime, same storage, and everything. However, you know, kind of, internally, the way we built the database APIs was kind of good for scrappy, you know, Google engineers, and you know, folks are kind of are okay, learning how to fit into the Google ecosystem, but when we needed to make this work for enterprise customers, we needed a cleaner APIs, we needed authentication that was an external, right, and so on, so forth. So, think about we had to add an additional set of APIs on top of it, and management, right, to really make these engines accessible to the external world.So, it's running the same engine under the hood, but it is a different set of APIs, and a big part of our focus is continuing to expose to enterprise customers all the goodness that we have on the internal system. So, it's really about taking these very, very unique differentiated databases and democratizing access to them to anyone who wants to.Corey: I'm curious to get your position on the idea that seems to be playing it's—I guess, a battle that's been playing itself out in a number of different customer conversations. And that is, I guess, the theoretical decision between, do we go towards general-purpose databases and more or less treat every problem as a nail in search of a hammer or do you decide that every workload gets its own custom database that aligns the best with that particular workload? There are trade-offs in either direction, but I'm curious where you land on that given that you tend to see a lot more of it than I do.Andi: No, that's a great question. And you know, just for the viewers who maybe aren't aware, there's kind of two extreme points of view, right? There's one point of view that says, purpose-built for everything, like, every specific pattern, like, build bespoke databases, it's kind of a best-of-breed approach. The problem with that approach is it becomes extremely complex for customers, right? Extremely complex to decide what to use, they might need to use multiple for the same application, and so that can be a bit daunting as a customer. And frankly, there's kind of a law of diminishing returns at some point.Corey: Absolutely. I don't know what the DBA role of the future is, but I don't think anyone really wants it to be, “Oh, yeah. We're deciding which one of these three dozen manage database services is the exact right fit for each and every individual workload.” I mean, at some point it feels like certain cloud providers believe that not only every workload should have its own database, but almost every workload should have its own database service. It's at some point, you're allowed to say no and stop building these completely, what feel like to me, Byzantine, esoteric database engines that don't seem to have broad applicability to a whole lot of problems.Andi: Exactly, exactly. 
And maybe the other extreme is what folks often talk about as multi-model where you say, like, “Hey, I'm going to have a single storage engine and then map onto that the relational model, the document model, the graph model, and so on.” I think what we tend to see is if you go too generic, you also start having performance issues, you may not be getting the right level of abilities and trade-offs around consistency, and replication, and so on. So, I would say Google, like, we're taking a very pragmatic approach where we're saying, “You know what? We're not going to solve all of customer problems with a single database, but we're also not going to have two dozen.” Right?So, we're basically saying, “Hey, let's understand that the main characteristics of the workloads that our customers need to address, build the best services around those.” You know, obviously, over time, we continue to enhance what we have to fit additional models. And then frankly, we have a really awesome partner ecosystem on Google Cloud where if someone really wants a very specialized database, you know, we also have great partners that they can use on Google Cloud and get great support and, you know, get the rest of the benefits of the platform.Corey: I'm very curious to get your take on a pattern that I've seen alluded to by basically every vendor out there except the couple of very obvious ones for whom it does not serve their particular vested interests, which is that there's a recurring narrative that customers are demanding open-source databases for their workloads. And when you hear that, at least, people who came up the way that I did, spending entirely too much time on Freenode, back when that was not a deeply problematic statement in and of itself, where, yes, we're open-source, I guess, zealots is probably the best terminology, and yeah, businesses are demanding to participate in the open-source ecosystem. Here in reality, what I see is not ideological purity or anything like that and much more to do with, “Yeah, we don't like having a single commercial vendor for our databases that basically plays the insert quarter to continue dance whenever we're trying to wind up doing something new. We want the ability to not have licensing constraints around when, where, how, and how quickly we can run databases.” That's what I hear when customers are actually talking about open-source versus proprietary databases. Is that what you see or do you think that plays out differently? Because let's be clear, you do have a number of database services that you offer that are not open-source, but are also absolutely not tied to weird licensing restrictions either?Andi: That's a great question, and I think for years now, customers have been in a difficult spot because the legacy proprietary database vendors, you know, knew how sticky the database is, and so as a result, you know, the prices often went up and was not easy for customers to kind of manage costs and agility and so on. But I would say that's always been somewhat of a concern. I think what I'm seeing changing and happening differently now is as customers are moving into the cloud and they want to run hybrid cloud, they want to run multi-cloud, they need to prove to their regulator that it can do a stressed exit, right, open-source is not just about reducing cost, it's really about flexibility and kind of being in control of when and where you can run the workloads. 
So, I think what we're really seeing now is a significant surge of customers who are trying to get off legacy proprietary database and really kind of move to open APIs, right, because they need that freedom. And that freedom is far more important to them than even the cost element.And what's really interesting is, you know, a lot of these are the decision-makers in these enterprises, not just the technical folks. Like, to your point, it's not just open-source advocates, right? It's really the business people who understand they need the flexibility. And by the way, even the regulators are asking them to show that they can flexibly move their workloads as they need to. So, we're seeing a huge interest there and, as you said, like, some of our services, you know, are open-source-based services, some of them are not.Like, take Spanner, as an example, it is heavily tied to how we build our infrastructure and how we build our systems. Like, I would say, it's almost impossible to open-source Spanner, but what we've done is we've basically embraced open APIs and made sure if a customer uses these systems, we're giving them control of when and where they want to run their workloads. So, for example, Big Table has an HBase API; Spanner now has a Postgres interface. So, our goal is really to give customers as much flexibility and also not lock them into Google Cloud. Like, we want them to be able to move out of Google Cloud so they have control of their destiny.Corey: I'm curious to know what you see happening in the real world because I can sit here and come up with a bunch of very well-thought-out logical reasons to go towards or away from certain patterns, but I spent years building things myself. I know how it works, you grab the closest thing handy and throw it in and we all know that there is nothing so permanent as a temporary fix. Like, that thing is load-bearing and you'll retire with that thing still in place. In the idealized world, I don't think that I would want to take a dependency on something like—easy example—Spanner or AlloyDB because despite the fact that they have Postgres-squeal—yes, that's how I pronounce it—compatibility, the capabilities of what they're able to do under the hood far exceed and outstrip whatever you're going to be able to build yourself or get anywhere else. So, there's a dataflow architectural dependency lock-in, despite the fact that it is at least on its face, Postgres compatible. Counterpoint, does that actually matter to customers in what you are seeing?Andi: I think it's a great question. I'll give you a couple of data points. I mean, first of all, even if you take a complete open-source product, right, running them in different clouds, different on-premises environments, and so on, fundamentally, you will have some differences in performance characteristics, availability characteristics, and so on. So, the truth is, even if you use open-source, right, you're not going to get a hundred percent of the same characteristics where you run that. But that said, you still have the freedom of movement, and with I would say and not a huge amount of engineering investment, right, you're going to make sure you can run that workload elsewhere.I kind of think of Spanner in the similar way where yes, I mean, you're going to get all those benefits of Spanner that you can't get anywhere else, like unlimited scale, global consistency, right, no maintenance downtime, five-nines availability, like, you can't really get that anywhere else. 
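[To make the Postgres-interface point above concrete, here is a minimal sketch. It assumes a Spanner database created with the PostgreSQL dialect and PGAdapter, Google's local proxy for that interface, already running on localhost:5432; the database and table names are made up for illustration.]

```python
# Minimal sketch: querying a PostgreSQL-dialect Spanner database through a
# standard Postgres driver. Assumes PGAdapter is already running locally and
# pointed at the Spanner instance; "example_db" and "orders" are hypothetical.
import psycopg2

conn = psycopg2.connect(host="localhost", port=5432, dbname="example_db")
conn.autocommit = True

with conn.cursor() as cur:
    # Plain SQL over a plain Postgres connection; nothing Spanner-specific
    # leaks into the application layer.
    cur.execute("SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id")
    for customer_id, order_count in cur.fetchall():
        print(customer_id, order_count)

conn.close()
```

[The only dependency the application takes here is a vanilla Postgres driver, which is what makes the portability argument above credible.]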
That said, not every application necessarily needs it. And you still have that option, right, that if you need to, or want to, or we're not giving you a reasonable price or reasonable price performance, but we're starting to neglect you as a customer—which of course we wouldn't, but let's just say hypothetically, that you know, that could happen—that you still had a way to basically go and run this elsewhere. Now, I'd also want to talk about some of the upsides something like Spanner gives you. Because you talked about, you want to be able to just grab a few things, build something quickly, and then, you know, you don't want to be stuck.The counterpoint to that is with Spanner, you can start really, really small, and then let's say you're a gaming studio, you know, you're building ten titles hoping that one of them is going to take off. So, you can build ten of those, you know, with very minimal spend on Spanner and if one takes off overnight, it's really only the database where you don't have to go and re-architect the application; it's going to scale as big as you need it to. And so, it does enable a lot of this innovation and a lot of cost management as you try to get to that overnight success.Corey: Yeah, overnight success. I always love that approach. It's one of those, “Yeah, I became an overnight success after only ten short years.” It becomes this idea people believe it's in fits and starts, but then you see, I guess, on some level, the other side of it where it's a lot of showing up and doing the work. I have to confess, I didn't do a whole lot of admin work in my production years that touched databases because I have an aura and I'm unlucky, and it turns out that when you blow away some web servers, everyone can laugh and we'll reprovision stateless things.Get too close to the data warehouse, for example, and you don't really have a company left anymore. And of course, in the world of finance that I came out of, transactional integrity is also very much a thing. A question that I had [centers 00:17:51] really around one of the predictions you gave recently at Google Cloud Next, which is your prediction for the future is that transactional and analytical workloads from a database perspective will converge. What's that based on?Andi: You know, I think we're really moving from a world where customers are trying to make real-time decisions, right? If there's model drift from an AI and ML perspective, want to be able to retrain their models as quickly as possible. So, everything is fast moving into streaming. And I think what you're starting to see is, you know, customers don't have that time to wait for analyzing their transactional data. Like in the past, you do a batch job, you know, once a day or once an hour, you know, move the data from your transactional system to analytical system, but that's just not how it is always-on businesses run anymore, and they want to have those real-time insights.So, I do think that what you're going to see is transactional systems more and more building analytical capabilities, analytical systems building, and more transactional, and then ultimately, cloud platform providers like us helping fill that gap and really making data movement seamless across transactional analytical, and even AI and ML workloads. And so, that's an area that I think is a big opportunity. I also think that Google is best positioned to solve that problem.Corey: Forget everything you know about SSH and try Tailscale. 
Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH.Basically you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduce latency, and there's a lot more, but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive?Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscaleCorey: On some level, I've found that, at least in my own work, that once I wind up using a database for something, I'm inclined to try and stuff as many other things into that database as I possibly can just because getting a whole second data store, taking a dependency on it for any given workload tends to be a little bit on the, I guess, challenging side. Easy example of this. I've talked about it previously in various places, but I was talking to one of your colleagues, [Sarah Ellis 00:19:48], who wound up at one point making a joke that I, of course, took way too far. Long story short, I built a Twitter bot on top of Google Cloud Functions that every time the Azure brand account tweets, it simply quote-tweets that translates their tweet into all caps, and then puts a boomer-style statement in front of it if there's room. This account is @cloudboomer.Now, the hard part that I had while doing this is everything stateless works super well. Where do I wind up storing the ID of the last tweet that it saw on his previous run? And I was fourth and inches from just saying, “Well, I'm already using Twitter so why don't we use Twitter as a database?” Because everything's a database if you're either good enough or bad enough at programming. And instead, I decided, okay, we'll try this Firebase thing first.And I don't know if it's Firestore, or Datastore or whatever it's called these days, but once I wrap my head around it incredibly effective, very fast to get up and running, and I feel like I made at least a good decision, for once in my life, involving something touching databases. But it's hard. I feel like I'm consistently drawn toward the thing I'm already using as a default database. I can't shake the feeling that that's the wrong direction.Andi: I don't think it's necessarily wrong. I mean, I think, you know, with Firebase and Firestore, that combination is just extremely easy and quick to build awesome mobile applications. And actually, you can build mobile applications without a middle tier which is probably what attracted you to that. So, we just see, you know, huge amount of developers and applications. We have over 4 million databases in Firestore with just developers building these applications, especially mobile-first applications. So, I think, you know, if you can get your job done and get it done effectively, absolutely stick to them.And by the way, one thing a lot of people don't know about Firestore is it's actually running on Spanner infrastructure, so Firestore has the same five-nines availability, no maintenance downtime, and so on, that has Spanner, and the same kind of ability to scale. 
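[Since the last-tweet-ID problem described above is exactly the kind of tiny state Firestore gets used for, here is a minimal sketch of that pattern. It assumes the google-cloud-firestore client library and default credentials; the collection and document names are made up.]

```python
# Minimal sketch of using Firestore as the small bit of state a stateless
# Cloud Function needs: remember the last tweet ID seen on the previous run.
# Assumes google-cloud-firestore is installed and default credentials are set;
# "bot_state" and "cloudboomer" are hypothetical names.
from google.cloud import firestore

db = firestore.Client()
state_ref = db.collection("bot_state").document("cloudboomer")

def get_last_tweet_id():
    # Returns the last tweet ID seen on a previous run, or None on first run.
    snapshot = state_ref.get()
    return snapshot.get("last_tweet_id") if snapshot.exists else None

def set_last_tweet_id(tweet_id):
    # merge=True leaves any other fields on the document untouched.
    state_ref.set({"last_tweet_id": tweet_id}, merge=True)
```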
So, it's not just that it's quick, it will actually scale as much as you need it to and be as available as you need it to. So, that's on that piece. I think, though, to the same point, you know, there's other databases that we're then trying to make sure kind of also extend their usage beyond what they've traditionally done. So, you know, for example, we announced AlloyDB, which I kind of call it Postgres on steroids, we added analytical capabilities to this transactional database so that as customers do have more data in their transactional database, as opposed to having to go somewhere else to analyze it, they can actually do real-time analytics within that same database and it can actually do up to 100 times faster analytics than open-source Postgres.So, I would say both Firestore and AlloyDB, are kind of good examples of if it works for you, right, we'll also continue to make investments so the amount of use cases you can use these databases for continues to expand over time.Corey: One of the weird things that I noticed just looking around this entire ecosystem of databases—and you've been in this space long enough to, presumably, have seen the same type of evolution—back when I was transiting between different companies a fair bit, sometimes because I was consulting and other times because I'm one of the greatest in the world at getting myself fired from jobs based upon my personality, I found that the default standard was always, “Oh, whatever the database is going to be, it started off as MySQL and then eventually pivots into something else when that starts falling down.” These days, I can't shake the feeling that almost everywhere I look, Postgres is the answer instead. What changed? What did I miss in the ecosystem that's driving that renaissance, for lack of a better term?Andi: That's a great question. And, you know, I have been involved in—I'm going to date myself a bit—but in PHP since 1997, pretty much, and one of the things we kind of did is we build a really good connector to MySQL—and you know, I don't know if you remember, before MySQL, there was MS SQL. So, the MySQL API actually came from MS SQL—and we bundled the MySQL driver with PHP. And so, kind of that LAMP stack really took off. And kind of to your point, you know, the default in the web, right, was like, you're going to start with MySQL because it was super easy to use, just fun to use.By the way, I actually wrote—co-authored—the tab completion in the MySQL client. So like, a lot of these kinds of, you know, fun, simple ways of using MySQL were there, and frankly, was super fast, right? And so, kind of those fast reads and everything, it just was great for web and for content. And at the time, Postgres kind of came across more like a science project. Like the folks who were using Postgres were kind of the outliers, right, you know, the less pragmatic folks.I think, what's changed over the past, how many years has it been now, 25 years—I'm definitely dating myself—is a few things: one, MySQL is still awesome, but it didn't kind of go in the direction of really, kind of, trying to catch up with the legacy proprietary databases on features and functions. Part of that may just be that from a roadmap perspective, that's not where the owner wanted it to go. So, MySQL today is still great, but it didn't go into that direction. In parallel, right, customers wanting to move more to open-source. 
And so, what they found this, the thing that actually looks and smells more like legacy proprietary databases is actually Postgres, plus you saw an increase of investment in the Postgres ecosystem, also very liberal license.So, you have lots of other databases including commercial ones that have been built off the Postgres core. And so, I think you are today in a place where, for mainstream enterprise, Postgres is it because that is the thing that has all the features that the enterprise customer is used to. MySQL is still very popular, especially in, like, content and web, and mobile applications, but I would say that Postgres has really become kind of that de facto standard API that's replacing the legacy proprietary databases.Corey: I've been on the record way too much as saying, with some justification, that the best database in the world that should be used for everything is Route 53, specifically, TXT records. It's a key-value store and then anyone who's deep enough into DNS or databases generally gets a slightly greenish tinge and feels ill. That is my simultaneous best and worst database. I'm curious as to what your most controversial opinion is about the worst database in the world that you've ever seen.Andi: This is the worst database? Or—Corey: Yeah. What is the worst database that you've ever seen? I know, at some level, since you manage all things database, I'm asking you to pick your least favorite child, but here we are.Andi: Oh, that's a really good question. No, I would say probably the, “Worst database,” double-quotes is just the file system, right? When folks are basically using the file system as regular database. And that can work for, you know, really simple apps, but as apps get more complicated, that's not going to work. So, I've definitely seen some of that.I would say the most awesome database that is also file system-based kind of embedded, I think was actually SQLite, you know? And SQLite is actually still very, very popular. I think it sits on every mobile device pretty much on the planet. So, I actually think it's awesome, but it's, you know, it's on a database server. It's kind of an embedded database, but it's something that I, you know, I've always been pretty excited about. And, you know, their stuff [unintelligible 00:27:43] kind of new, interesting databases emerging that are also embedded, like DuckDB is quite interesting. You know, it's kind of the SQLite for analytics.Corey: We've been using it for a few things around a bill analysis ourselves. It's impressive. I've also got to say, people think that we had something to do with it because we're The Duckbill Group, and it's DuckDB. “Have you done anything with this?” And the answer is always, “Would you trust me with a database? I didn't think so.” So no, it's just a weird coincidence. But I liked that a lot.It's also counterintuitive from where I sit because I'm old enough to remember when Microsoft was teasing the idea of WinFS where they teased a future file system that fundamentally was a database—I believe it's an index or journal for all of that—and I don't believe anything ever came of it. But ugh, that felt like a really weird alternate world we could have lived in.Andi: Yeah. Well, that's a good point. And by the way, you know, if I actually take a step back, right, and I kind of half-jokingly said, you know, file system and obviously, you know, all the popular databases persist on the file system. 
But if you look at what's different in cloud-first databases, right, like, if you look at legacy proprietary databases, the typical setup is wright to the local disk and then do asynchronous replication with some kind of bounded replication lag to somewhere else, to a different region, or so on. If you actually start to look at what the cloud-first databases look like, they actually write the data in multiple data centers at the same time.And so, kind of joke aside, as you start to think about, “Hey, how do I build the next generation of applications and how do I really make sure I get the resiliency and the durability that the cloud can offer,” it really does take a new architecture. And so, that's where things like, you know, Spanner and Big Table, and kind of, AlloyDB databases are truly architected for the cloud. That's where they actually think very differently about durability and replication, and what it really takes to provide the highest level of availability and durability.Corey: On some level, I think one of the key things for me to realize was that in my own experiments, whenever I wind up doing something that is either for fun or I just want see how it works in what's possible, the scale of what I'm building is always inherently a toy problem. It's like the old line that if it fits in RAM, you don't have a big data problem. And then I'm looking at things these days that are having most of a petabyte's worth of RAM sometimes it's okay, that definition continues to extend and get ridiculous. But I still find that most of what I do in a database context can be done with almost any database. There's no reason for me not to, for example, uses a SQLite file or to use an object store—just there's a little latency, but whatever—or even a text file on disk.The challenge I find is that as you start scaling and growing these things, you start to run into limitations left and right, and only then it's one of those, oh, I should have made different choices or I should have built-in abstractions. But so many of these things comes to nothing; it just feels like extra work. What guidance do you have for people who are trying to figure out how much effort to put in upfront when they're just more or less puttering around to see what comes out of it?Andi: You know, we like to think about ourselves at Google Cloud as really having a unique value proposition that really helps you future-proof your development. You know, if I look at both Spanner and I look at BigQuery, you can actually start with a very, very low cost. And frankly, not every application has to scale. So, you can start at low cost, you can have a small application, but everyone wants two things: one is availability because you don't want your application to be down, and number two is if you have to scale you want to be able to without having to rewrite your application. And so, I think this is where we have a very unique value proposition, both in how we built Spanner and then also how we build BigQuery is that you can actually start small, and for example, on Spanner, you can go from one-tenth of what we call an instance, like, a small instance, that is, you know, under $65 a month, you can go to a petabyte scale OLTP environment with thousands of instances in Spanner, with zero downtime.And so, I think that is really the unique value proposition. 
We're basically saying you can hold the stick at both ends: you can basically start small and then if that application doesn't need to scale, does need to grow, you're not reengineering your application and you're not taking any downtime for reprovisioning. So, I think that's—if I had to give folks, kind of, advice, I say, “Look, what's done is done. You have workloads on MySQL, Postgres, and so on. That's great.”Like, they're awesome databases, keep on using them. But if you're truly building a new app, and you're hoping that app is going to be successful at some point, whether it's, like you said, all overnight successes take at least ten years, at least you built in on something like Spanner, you don't actually have to think about that anymore or worry about it, right? It will scale when you need it to scale and you're not going to have to take any downtime for it to scale. So, that's how we see a lot of these industries that have these potential spikes, like gaming, retail, also some use cases in financial services, they basically gravitate towards these databases.Corey: I really want to thank you for taking so much time out of your day to talk with me about databases and your perspective on them, especially given my profound level of ignorance around so many of them. If people want to learn more about how you view these things, where's the best place to find you?Andi: Follow me on LinkedIn. I tend to post quite a bit on LinkedIn, I still post a bit on Twitter, but frankly, I've moved more of my activity to LinkedIn now. I find it's—Corey: That is such a good decision. I envy you.Andi: It's a more curated [laugh], you know, audience and so on. And then also, you know, we just had Google Cloud Next. I recorded a session there that kind of talks about database and just some of the things that are new in database-land at Google Cloud. So, that's another thing that if folks more interested to get more information, that may be something that could be appealing to you.Corey: We will, of course, put links to all of this in the [show notes 00:34:03]. Thank you so much for your time. I really appreciate it.Andi: Great. Corey, thanks so much for having me.Corey: Andi Gutmans, VP and GM of Databases at Google Cloud. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment, then I'm going to collect all of those angry, insulting comments and use them as a database.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Der Data Analytics Podcast
OLTP - Online Transactional Processing Begriffserklärung

Der Data Analytics Podcast

Play Episode Listen Later Nov 17, 2022 3:08


Online Transactional Processing. Source for the first section: Scrum Implementation for Online Transaction Processing (OLTP) in Hospital Management, International Conference on Telecommunication Systems, Services, and Applications (TSSA), Setiadi et al., pp. 1-2. Source for the second section: Basiswissen Wirtschaftsinformatik (book), Weber et al., 2022, 4th edition, pp. 152-153.

Engineering Kiosk
#45 Datengetriebene Entscheidungen und der perfekte Dashboard Stack

Engineering Kiosk

Play Episode Listen Later Nov 15, 2022 51:08


Data-driven decisions, or "never trust a statistic you didn't fake yourself". Making decisions and planning the next steps is not easy. Relevant data can make those decisions easier. But how do you actually get started with data-driven or data-supported decisions? How do you know whether you have the right data? What is needed to prepare the data, and how can it be visualized? In this episode we give an insight into the field of data-driven decisions. We answer why pie charts are nonsense, what the architecture can look like, whether gut feeling is still relevant at all, and why you no longer have to build your own JavaScript frontend. Bonus: what warm beer has to do with coughs and how Oktoberfest affects our podcast statistics.
Feedback (voice messages welcome): Email: stehtisch@engineeringkiosk.dev, Twitter: https://twitter.com/EngKiosk, WhatsApp: +49 15678 136776. We are also happy to cover your audio feedback in one of the next episodes - just send an audio file by email or as a WhatsApp voice message to +49 15678 136776.
Links:
Wartungsfenster podcast with "Make or Buy": https://wartungsfenster.podigee.io/20-make-or-buy
Engineering Kiosk #43 Cloud vs. On-Premise: Die Entscheidungshilfe: https://engineeringkiosk.dev/podcast/episode/43-cloud-vs-on-premise-die-entscheidungshilfe/
Engineering Kiosk #12 Make oder Buy: https://engineeringkiosk.dev/podcast/episode/12-make-oder-buy/
ClickHouse database: https://clickhouse.com/
Google BigQuery: https://cloud.google.com/bigquery?hl=de
QlikView: https://www.qlik.com/de-de
Tableau: https://www.tableau.com/de-de
PowerBI: https://powerbi.microsoft.com/de-de/
Amazon QuickSight: https://aws.amazon.com/de/quicksight/
GCP Looker Studio: https://cloud.google.com/looker-studio
Metabase: https://github.com/metabase/metabase
Apache Superset: https://github.com/apache/superset
Redash: https://github.com/getredash/redash
Grafana: https://github.com/grafana/grafana
Open Podcast: https://openpodcast.dev/
Engineering Kiosk episodes on databases: https://engineeringkiosk.dev/tag/datenbanken/
@EngKiosk tweet with Metabase statistics: https://twitter.com/EngKiosk/status/1590373145793396736
Chapter marks:
(00:00:00) Intro
(00:01:00) Cloud vs. On-Premise in the Wartungsfenster podcast
(00:04:32) Today's topic: data-driven and data-supported decisions
(00:05:16) What do we mean by data-driven decisions?
(00:08:18) Faked statistics and the right data visualization
(00:10:25) Who has access to the data, and what does data transparency look like?
(00:14:05) Does every employee need to be able to write SQL queries?
(00:15:55) The architecture for data-driven decisions
(00:18:53) Pre-processing, OLAP, OLTP, and database normal forms
(00:21:46) What is ClickHouse, and which tools are on the market?
(00:22:59) Are pie charts nonsense?
(00:23:46) Visualization: how do I find out which questions we actually want the data to answer?
(00:25:53) How do I use data visualization without a dedicated team?
(00:28:30) Fast dashboards and query performance
(00:29:28) What is optimized in databases with respect to analytics?
(00:31:03) Do you still have to build your own dashboard frontend in JavaScript?
(00:36:21) What tips would you give newcomers for building dashboards?
(00:39:17) Is gut feeling still relevant?
(00:41:30) When are data meaningful (statistically significant)?
(00:45:51) Which companies are pioneers in data-driven decisions?
(00:47:29) Can you be too data-driven?
(00:48:21) How do I know which data I am looking at?
(00:50:10) Outro: podcast statistics
Hosts: Wolfgang Gassler (https://twitter.com/schafele), Andy Grunwald (https://twitter.com/andygrunwald)
Feedback (voice messages welcome): Email: stehtisch@engineeringkiosk.dev, Twitter: https://twitter.com/EngKiosk, WhatsApp: +49 15678 136776
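Since ClickHouse and OLAP-style dashboard queries are at the heart of this episode, here is a minimal sketch of the kind of aggregation such a dashboard typically issues. It is illustrative only: it assumes the clickhouse-driver Python package, a locally reachable ClickHouse server, and a hypothetical page_views(event_time, user_id) table; a tool like Metabase or Grafana would normally generate and run this query for you.

```python
# Daily unique visitors over the last 30 days: a scan-and-aggregate query that
# a dashboard tile would typically refresh on a schedule.
from clickhouse_driver import Client

client = Client(host="localhost")

rows = client.execute(
    """
    SELECT toDate(event_time) AS day,
           uniqExact(user_id) AS unique_visitors
    FROM page_views
    WHERE event_time >= now() - INTERVAL 30 DAY
    GROUP BY day
    ORDER BY day
    """
)

for day, unique_visitors in rows:
    print(day, unique_visitors)
```

The point is that the heavy lifting (scanning and aggregating many rows) happens in the column store, and the dashboard only receives a handful of result rows.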

TheMummichogBlog - Malta In Italiano
"Analytics Engineer Slice North Macedonia What is Slice? Slice is the leading technology and marketing platform made exclusively for local pizzerias, making it super easy to order delicious, au

TheMummichogBlog - Malta In Italiano

Play Episode Listen Later Nov 15, 2022 9:01


"Analytics Engineer Slice North Macedonia What is Slice? Slice is the leading technology and marketing platform made exclusively for local pizzerias, making it super easy to order delicious, authentic local pizza anywhere, anytime. We serve the $45 billion U.S. pizzeria market in two ways: by pro" "--START AD- #TheMummichogblogOfMalta Amazon Top and Flash Deals(Affiliate Link - You will support our translations if you purchase through the following link) - https://amzn.to/3CqsdJH Compare all the top travel sites in just one search to find the best hotel deals at HotelsCombined - awarded world's best hotel price comparison site. (Affiliate Link - You will support our translations if you purchase through the following link) - https://www.hotelscombined.com/?a_aid=20558 “So whatever you wish that others would do to you, do also to them, for this is the Law and the Prophets."" #Jesus #Catholic. END AD---" "viding a pizza-centric mobile and web ordering experience for consumers, and by empowering local restaurants with the technology, tools, and marketing to grow their business, while helping them compete with Big Pizza. Yes, we have a pizza philosophy Across the country, pizza is made by people who care about craft, history, and culture. We created Slice to champion the small pizzerias by connecting those proud makers with customers who are just as passionate about their pizza. We celebrate pizza as the ultimate shareable food that brings people together for more than just a bite. We're growing our team every day — so, if you've got a passion for local, authentic pizza and the drive to help share it with the world, we'd love to have you on the team! You will be responsible for building and maintaining the data pipelines that are providing business insight here at slice. The role involves productionizing analytics and optimizing query performance on our chosen technologies. You'll work closely with the data engineering team, analysts and data scientists to provide Slice with data products that can be relied on throughout the business. The analytics engineers will be responsible for owning the development and optimization of analytic pipelines and building out clear dashboards to allow executives to understand the impact of the analysis. You'll ensure that data is accurate and timely to meet the needs of Slice. You'll have the opportunity to grow this role within Slice and have input into the correct technologies to deliver this function What You'll Do Work with delta lake in DataBricks analytical platform, extend existing delta lake by adding tables with aggregations needed for Data Science and Data Analysts team Analyze requirements, prepare technical documentation, ER diagram, and implement them in the DWH schema by creating appropriate fact and dimension tables Schedule DWH incremental loads in DataBricks and Airflow Maintain and optimize existing data extraction transformation processes for calculating metrics Be able to analyze and understand existing Python (Airflow) solution and extend according needs Build and maintain Looker Dashboards Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery Research technologies and build automated solutions to support daily work of Data Scientists and Data Analysts Act as an intermediary between Data Scientists and Data Engineers and find solutions for data processing challenges Manage, monitor, and troubleshoot the analytics and machine learning infrastructure. 
Work with complex data infrastructure, data lake (AWS S3 bucket), delta lake DataBricks Essentials More than 3 years work experience in DWH/DB Development/Business Intelligence/ETL Advanced SQL development knowledge and experience working with OLTP and OLAP databases Experience with Python in data related context Experience with data visualization tools Performance optimization (query tuning, indexing data, optimization practices, querying
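To make the posting's "schedule DWH incremental loads in DataBricks and Airflow" line more concrete, here is a minimal, hypothetical Airflow sketch. It is not Slice's actual pipeline: the DAG id, schedule, and the stubbed load function are all made up, and a real task would read the changed rows from the source and merge them into Delta tables on Databricks.

```python
# Minimal sketch of scheduling a daily incremental DWH load with Airflow 2.x.
# The actual load logic is stubbed out; names and schedule are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def load_increment():
    # A real task would pull the rows that changed since the last run and
    # merge them into the warehouse's fact tables (e.g. a Delta table).
    print("loading increment")

with DAG(
    dag_id="dwh_incremental_load",
    start_date=datetime(2022, 11, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    PythonOperator(task_id="load_increment", python_callable=load_increment)
```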

Der Data Analytics Podcast
#DE MustKnow => OLAP, OLTP & Data Warehouse Begriffe erläutert

Der Data Analytics Podcast

Play Episode Listen Later Nov 13, 2022 7:31


OLAP, OLTP & data warehouse terms explained. What is the difference between the individual data stores?

Der Data Analytics Podcast
OLTP vs OLAP Praxis-Beispiel & direkt analytische Fragestellung beantworten

Der Data Analytics Podcast

Play Episode Listen Later Oct 23, 2022 6:05


What is OLTP vs OLAP - a practical example & answering an analytical question
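As a small illustration of the contrast this episode walks through, the sketch below runs an OLTP-style point lookup and an OLAP-style aggregation over the same toy data. SQLite and DuckDB stand in for a transactional and an analytical engine; the table and values are made up.

```python
# OLTP vs OLAP in miniature: the same orders data, touched in two different ways.
import sqlite3
import duckdb

# OLTP-style: write and read individual rows by key.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "alice", 19.90), (2, "bob", 5.50)])
print(oltp.execute("SELECT amount FROM orders WHERE id = 2").fetchone())

# OLAP-style: scan everything and aggregate, as a report or dashboard would.
olap = duckdb.connect()
olap.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount DOUBLE)")
olap.execute("INSERT INTO orders VALUES (1, 'alice', 19.90), (2, 'bob', 5.50)")
print(olap.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer").fetchall())
```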

Der Data Analytics Podcast
Short OLTP vs OLAP

Der Data Analytics Podcast

Play Episode Listen Later Oct 9, 2022 2:34


OLTP vs OLAP - Online Transactional Processing System vs Online Analytical Processing System

Screaming in the Cloud
HeatWave and the Latest Evolution of MySQL with Nipun Agarwal

Screaming in the Cloud

Play Episode Listen Later Oct 6, 2022 38:43


About NipunNipun Agarwal is a Senior Vice President, MySQL HeatWave Development, Oracle. His interests include distributed data processing, machine learning, cloud technologies and security. Nipun was part of the Oracle Database team where he introduced a number of new features. He has been awarded over 170 patents.Links Referenced: Oracle: https://oracle.com MySQL HeatWave info: https://www.oracle.com/mysql/ MySQL Service on AWS and OCI login (Oracle account required): https://cloud.mysql.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog's SaaS monitoring and security platform that enables full stack observability for developers, IT operations, security, and business teams in the cloud age. Datadog's platform, along with 500 plus vendor integrations, allows you to correlate metrics, traces, logs, and security signals across your applications, infrastructure, and third party services in a single pane of glass.Combine these with drag and drop dashboards and machine learning based alerts to help teams troubleshoot and collaborate more effectively, prevent downtime, and enhance performance and reliability. Try Datadog in your environment today with a free 14 day trial and get a complimentary T-shirt when you install the agent.To learn more, visit datadoghq.com/screaminginthecloud to get. That's www.datadoghq.com/screaminginthecloudCorey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode is sponsored by our friends at Oracle, and back for a borderline historic third round going out and telling stories about these things, we have Nipun Agarwal, who is, as opposed to his first appearance on the show, has been promoted to senior vice president of MySQL HeatWave. Nipun, thank you for coming back. Most people are not enamored enough with me to subject themselves to my slings and arrows a second time, let alone a third. So first, thanks. And are you okay, over there?Nipun: Thank you, Corey. Yeah, very happy to be back.Corey: [laugh]. So, since the last time we've spoken, there have been some interesting developments that have happened. It was pre-announced by Larry Ellison on a keynote stage or an earnings call, I don't recall the exact format, that HeatWave was going to be coming to AWS. Now, you've conducted a formal announcement, this usual media press blitz, et cetera, talking about it with an eye toward general availability later this year, if I'm not mistaken, and things seem to be—if you'll forgive the term—heating up a bit.Nipun: That is correct. So, as you know, we have had MySQL HeatWave on OCI for just about two years now. 
Very good reception, a lot of people who are using MySQL HeatWave, are migrating from other clouds, specifically from AWS, and now we have announced availability of MySQL HeatWave on AWS.Corey: So, for those who have not done the requisite homework of listening to the entire back catalog of nearly 400 episodes of this show, what exactly is MySQL HeatWave, just so we make sure that we set the stage for what we're going to be talking about? Because I sort of get the sense that without a baseline working knowledge of what that is, none of the rest of this is going to make a whole lot of sense.Nipun: MySQL HeatWave is a managed MySQL service provided by Oracle. But it is different from other MySQL-based services in the sense that we have significantly enhanced the service such that it can very efficiently process transactions, analytics, and in-database machine learning. So, what customers get with the service, with MySQL HeatWave, is a single MySQL database which can process OLTP, transaction processing, real-time analytics, and machine learning. And they can do this without having to move the data out of MySQL into some other specialized database services who are running analytics or machine learning. And all existing tools and applications which work with MySQL work as is because this is something that enhances the server. In addition to that, it provides very good performance and very good price performance compared to other similar services out there.Corey: The idea historically that some folks were pushing around the idea of multi-cloud was that you would have workloads that—oh, they live in one cloud, but the database was going to be all the way across the other side of the internet, living in a different provider. And in practice, what we generally tend to see is that where the data lives is where the compute winds up living. By and large, it's easier to bring the compute resources to the data than it is to move the data to the compute, just because data egress in most of the cloud providers—notably exempting yours—is astronomically expensive. You are, if I recall correctly, less than 10% of AWS's data egress charge on just retail pricing alone, which is wild to me. So first, thank you for keeping that up and not raising prices because I would have felt rather annoyed if I'd been saying such good things. And it was, haha, it was a bait and switch. It was not. I'm still a big fan. So, thank you for that, first and foremost.Nipun: Certainly. And what you described is absolutely correct that while we have a lot of customers migrating from AWS to use MySQL HeatWave and OCI, a class of customers are unable to, and the number one reason they're unable to is that AWS charges these customers all very high egress fees to move the data out of AWS into OCI for them to benefit from MySQL HeatWave. And this has definitely been one of the key incentives for us, the key motivation for us, to offer MySQL HeatWave on AWS so that customers don't need to pay this exorbitant data egress fees.Corey: I think it's fair to disclose that I periodically advise a variety of different cloud companies from a perspective of voice-of-the-customer feedback, which essentially distills down to me asking really annoying slash obnoxious questions that I, as a customer, legitimately want to know, but people always frown at me when I asked that in vendor pitches. For some reason, when I'm doing this on an advisory basis, people instead nod thoughtfully and take notes, so that at least feels better from my perspective. 
Oracle Cloud has been one of those, and I've been kicking the tires on the AWS offering that you folks have built out for a bit of time now. I have to say, it is legitimate. I was able to run a significant series of tests on this, and what I found going through that process was interesting on a bunch of different levels.I'm wondering if it's okay with you, if we go through a few of them, just things that jumped out to me as we went through a series of conversations around, “So, we're going to run a service on AWS.” And my initial answer was, “Is this Oracle? Are you sure?” And here we are today; we are talking about it and press releases.Nipun: Yes, certainly fine with me. Please go ahead.Corey: So, I think one of the first questions I had when you said, “We're going to run a database service on AWS itself,” was, if I'm true to type, is going to be fairly sarcastic, which is, “Oh, thank God. Finally, a way to run a MySQL database on AWS. There's never been one of those before.” Unless you count EC2 or Aurora or Redshift depending upon how you squint at it, or a variety of other increasingly strange things. It feels like that has been a largely saturated market in many respects.I generally don't tend to advise on things that I find patently ridiculous, and your answer was great, but I don't want to put words in your mouth. What was it that you saw that made you say, “Ah, we're going to put a different database offering on AWS, and no, it's not a terrible decision.”Nipun: Got it. Okay, so if you look at it, the value proposition which MySQL HeatWave offers is that customers of MySQL or customers have MySQL compatible databases, whether Aurora, or whether it's RDS MySQL, right, or even, like, you know, customers of Redshift, they have been migrating to MySQL HeatWave on OCI. Like, for the reasons I said: it's a single database, customers don't need to have multiple databases for managing different kinds of workloads, it's much faster, it's a lot less expensive, right? So, there are a lot of value propositions. So, what we found is that if you were to offer MySQL HeatWave on AWS, it will significantly ease the migration of other customers who might be otherwise thinking that it will be difficult for them to migrate, perhaps because of the high egress cost of AWS, or because of the high latency some of the applications in the AWS incur when the database is running somewhere else.Or, if they really have an ecosystem of applications already running on AWS and they just want to replace the database, it'll be much easier for them if MySQL HeatWave was offered on AWS. Those are the reasons why we feel it's a compelling proposition, that if existing customers of AWS are willing to migrate the cloud from AWS to OCI and use MySQL HeatWave, there is clearly a value proposition we are offering. And if we can now offer the same service in AWS, it will hopefully increase the number of customers who can benefit from MySQL HeatWave.Corey: One of the next questions I had going in was, “Okay, so what actually is this under the hood?” Is this you effectively just stuffing some software into a machine image or an AMI—or however they want to mispronounce that word over an AWS-land—and then just making it available to your account and running that, where's the magic or mystery behind this? Like, it feels like the next more modern cloud approach is to stuff the whole thing into a Docker container. But that's not what you wound up doing.Nipun: Correct. 
So, HeatWave has been designed and architected for scale-out processing, and it's been optimized for the cloud. So, when we decided to offer MySQL HeatWave on AWS, we have actually gone ahead and optimize our server for the AWS architecture. So, the processor we are running on, right, we have optimized our software for that instance types in AWS, right? So, the data plane has been optimized for AWS architecture.The second thing is we have a brand new control plane layer, right? So, it's not the case that we're just taking what we had in OCI and running it on AWS. We have optimized the data plane for AWS, we have a native control plane, which is running on AWS, which is using the respective services on AWS. And third, we have a brand new console which we are offering, which is a very interactive console where customers can run queries from the console. They can do data management from the console, they're able to use Autopilot from the console, and we have performance monitoring from the console, right? So, data plane, control plane, console. They're all running natively in AWS. And this provides for a very seamless integration or seamless experience for the AWS customers.Corey: I think it's also a reality, however much we may want to pretend otherwise, that if there is an opportunity to run something in a different cloud provider that is better than where you're currently running it now, by and large, customers aren't going to do it because it needs to not just be better, but so astronomically better in ways that are foundational to a company's business model in order to justify the tremendous expense of a cloud migration, not just in real, out of pocket, cost in dollars and cents that are easy to measure, but also in terms of engineering effort, in terms of opportunity cost—because while you're doing that you're not doing other things instead—and, on some level, people tend to only do that when there's an overwhelming strategic reason to do it. When folks already have existing workloads on AWS, as many of them do, it stands to reason that they are not going to want to completely deviate from that strategy just because something else offers a better database experience any number of axes. So, meeting customers where they are is one of the, I guess, foundational shifts that we've really seen from the entire IT industry over the last 40 years, rather than you will buy it from us and you will tolerate it. It's, now customers have choice, and meeting them where they are and being much more, I guess, able to impedance-match with them has been critical. And I'm really optimistic about what the launch of this service portends for Oracle.Nipun: Indeed, but let me give you another data point. We find a very large number of Aurora customers migrating to MySQL HeatWave on OCI, right? And this is the same workload they were running on Aurora, but now they want to run the same workload on MySQL HeatWave on OCI. They are willing to undertake this journey of migration because their applications, they get much faster, and for a lot less price, but they get much faster. 
Then the second aspect is, there's another class of customers who are for instance running, on Aurora or other transactions or workloads, but then they have to keep moving the data, they'll keep performing the ETL process into some other service, whether it's Snowflake, or whether it's Redshift for analytics.Now, with this migration, when they move to MySQL HeatWave, customers don't need to, like, have multiple databases, and they get real-time analytics, meaning that if any data changes inside the server inside the OLTP as a database service, right? If they were to run a query, that query is giving them the latest results, right? It's not stale. Whereas with an ETL process, it gets to be stale. So, given that we already found that there were so many customers migrating to OCI to use MySQL HeatWave, I think there's a clear value proposition of MySQL HeatWave, and there's a lot of demand.But like, as I was mentioning earlier, by having MySQL HeatWave be offered on AWS, it makes the proposition even more compelling because, as you said, yes, there is some engineering work that customers will need to do to migrate between clouds, and if they don't want to, then absolutely now they have MySQL HeatWave which they can now use in AWS itself.Corey: I think that one of the things I continually find myself careening into, perhaps unexpectedly, is a failure to really internalize just how vast this entire industry really is. Every time I think I've seen it all, all I have to do is talk to one more cloud customer and I learn something completely new and different. Sometimes it's an innovative, exciting use of a thing. Other times, it's people holding something fearfully wrong and trying to use it as a hammer instead. And you know, if it's dumb and it works, is it really dumb? There are questions around that.And this in turn gave rise to one of my next obnoxious questions as I was looking at what you were building at the time because a lot of your pricing and discussions and framing of this was targeting very large enterprise-style customers, and the price points reflected that. And then I asked the question that Big E enterprise never quite expects, for whatever reason, it's like, “That looks awesome if I have a budget with many commas in it. What can I get for $4?” And as of this recording, pricing has not been finalized slash published for the service, but everything that you have shown me so far absolutely makes developing on this for a proof of concept or an evening puttering around, completely tenable: it is not bound to a fixed period of licensing; it's, use it when you want to use it, turn it off when you're done; and the hourly pricing is not egregious. I think that is something that historically, Oracle Database offerings have not really aligned with.OCI very much has, particularly with an eye toward its extraordinarily awesome free tier that's always free. But this feels like it's a weird blending of the OCI model versus historical Oracle Database pricing models in a way that, honestly I'm pretty excited about.Nipun: So, we react to what the customer requirements and needs are. So, for this class of customers who are using, say, RDS, MySQL, Aurora, we understand that they are very cost sensitive, right? So, one of the things which we have done in addition to offering MySQL HeatWave on AWS is based on the customer feedback and such. We are now offering a small shape of HeatWave instance in addition to the regular large shape. 
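[For readers who want to picture what trying the service looks like from an application's point of view, here is a minimal sketch using the standard mysql-connector-python driver, reflecting the "existing MySQL tools work as is" point above. The host, credentials, and orders table are made up, and the SECONDARY_ENGINE/SECONDARY_LOAD statements are my reading of how a table is made available to HeatWave; verify them against the current documentation before relying on them.]

```python
# Minimal sketch: a plain MySQL driver talking to a MySQL HeatWave instance.
# Loading the table into the HeatWave cluster is a one-time step; afterwards
# ordinary SELECTs can be offloaded to HeatWave. Names are hypothetical.
import mysql.connector

conn = mysql.connector.connect(
    host="heatwave.example.com", user="app", password="...", database="shop"
)
cur = conn.cursor()

# One-time step: make the table available to the HeatWave cluster (syntax per
# my reading of the HeatWave docs; verify before relying on it).
cur.execute("ALTER TABLE orders SECONDARY_ENGINE = RAPID")
cur.execute("ALTER TABLE orders SECONDARY_LOAD")

# The analytic query itself is ordinary MySQL SQL; the optimizer can offload
# it to HeatWave once the table has been loaded.
cur.execute("SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id")
for customer_id, total in cur.fetchall():
    print(customer_id, total)

cur.close()
conn.close()
```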
So, if customers want to just, you know, kick the tires, if developers just want to get started, they can get a MySQL node with HeatWave for less than ten cents an hour. So, for less than ten cents an hour, they get the ability to run transaction processing, analytics, and machine learning.And if you were to compare the corresponding cost of Aurora for the same, like, you know, core count, it's, like, you know, 12-and-a-half cents. And that's just Aurora, without Redshift or without SageMaker. So yes, you're right that based on the feedback and we have found that it would be much more attractive to have this low-end shape for the AWS developers. We are offering this smaller shape. And yeah, it's very, very affordable. It's about just shy of ten cents an hour.Corey: This brings up another question that I raised pretty early on in the process because you folks kept talking about shapes, and it turns out that is the Oracle Cloud term that applies to instance size over an AWS-land. And as we dug into this a bit further, it does make sense for how you think about these things and how you build them to customers. Specifically, if I want to run this, I log into cloud.oracle.com and sign up for it there, and pay you over on that side of the world, this does not show up on my AWS bill. What drove that decision?Nipun: Okay, so a couple of things. One clarification is that the site people log in to is cloud.mysql.com. So, that's where they come to: cloud.mysql.com.Corey: Oh, my apologies. I keep forgetting that you folks have multiple cloud offerings and domains. They're kind of a thing. How do they work? Given I have a bad domain by habit myself, I have no room to judge.Nipun: So, they come to cloud.mysql.com. From there, they can provision an instance. And we, as, like, you know, Oracle or MySQL, go ahead and create an instance in AWS, in the Oracle tenancy. From there, customers can then, you know, access their data on AWS and such. Now, what we want to provide the customers is a very seamless experience, that they just come to cloud.mysql.com, and from there, they can do everything: provisioning an instance, running the queries, payment and such. So, this is one of the reasons that we want customers just to be able to come to the site, cloud.mysql.com, and take care of the billing and such.Now, the other thing is that, okay, why not allow customers to pay from AWS, right? Now, one of the things over there is that if you were to do that and there's a customer, they'll be like, “Hey, I got to pay something to AWS, something to Oracle, so we'd prefer, it'd be better to have a one-stop shop.” And since many of these are already Oracle customers, it's helpful to do it this way.Corey: Another approach you could have taken—and I want to be very clear here that I am not suggesting that this would have been a good idea—but an approach that you could have taken would have been to go down the weird AWS partner rabbit hole, and we're going to provide this to customers on the AWS Marketplace. Because according to AWS, that's where all of their customers go to discover new softwares. Yeah, first, that's a lie. They do not. But aside from that, what was it about that Marketplace model that drove you to a decision point where okay, at launch, we are not going to be offering this on the AWS Marketplace? And to be clear, I'm not suggesting that was the wrong decision.Nipun: Right. The main reason is we want to offer the MySQL HeatWave service at the least expensive cost to the user, right, or like, the least cost. 
If you were to, like, have MySQL HeatWave in the Marketplace, AWS charges a premium. This the customers would need to pay. So, we just didn't want the customers to have to pay this additional premium just because they can now source this thing from the Marketplace. So, it's really to, like, save costs for the customer.Corey: The value of the Marketplace, from my perspective, has been effectively not having to deal as much with customer procurement departments because well, AWS is already on the procurement approved list, so we're just going to go ahead and take the hit to wind up making it accessible from that perspective and calling it good. The downside to this is that increasingly, as customers are making larger and longer-term commitments that are tied to certain levels of spend on AWS, they're increasingly trying to drag every vendor with whom they do business into the your AWS bill so they can check those boxes off. And the problem that I keep seeing with that is vendors who historically have been doing just fine, have great working relationships with a customer are reporting that suddenly customers are coming back with, “Yeah, so for our contract renewal, we want to go through the AWS Marketplace.” In return, effectively, these companies are then just getting a haircut off whatever it is they're able to charge their customers but receiving no actual value for any of this. It attenuates the relationship by introducing a third party into the process, and it doesn't make anything better from the vendor's point of view because they already had something functional and working; now they just have to pay a commission on it to AWS, who, it seems, is pathologically averse to any transaction happening where they don't get a cut, on some level. But I digress. I just don't like that model very much at all. It feels coercive.Nipun: That's absolutely right. That's absolutely right. And we thought that, yes, there is some value to be going to Marketplace, but it's not worth the additional premium customers would need to pay. Totally agree.Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your Features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.Corey: It's also worth pointing out that in Oracle's historical customer base, by which I mean the last 40 years that you folks have been in business, you do have significant customers with very sizable estates. A lot of your cloud efforts have focused around, I guess, we'll call it an Oracle-specific currency: Oracle Credits. Which is similar to the AWS style of currency just for a different company in different ways. 
One of the benefits that you articulated to me relatively early on was that by going through cloud.mysql.com, customers with those credits—which can be in sizable amounts based upon various differentiating variables that change from case to case—can apply those credits to their use of MySQL HeatWave on AWS.
Nipun: Right. So, in fact, just for starters, right, what we give to customers is we offer some free credits for customers to try a service on OCI of, you know, $300. And that's the same thing, the same experience you would like customers who are trying HeatWave on AWS to get. Yes, so you're right, this is the kind of consistency we want to have, and yet another reason why cloud.mysql.com makes sense as the entry point for customers to try the service.
Corey: There was a time where I would have struggled not to laugh in your face at the idea that we're talking about something in the context of an Oracle database, and well, there's $300 in credit. That's, "What can I get for that? Hung up on?" No. A surprising amount, when it comes to these things. I feel like that opens up an entirely new universe of experimentation. And, "Let's see how this thing actually works with this workload," and lets people kick the tires on it for themselves in a way that, "Oh, we have this great database. Well, can I try it? Sure, for $8 million, you absolutely can." "Well, it can stay great and awesome over there because who wants to take that kind of a bet?" It feels like it's a new world in a bunch of different respects, and I just can't make enough noise about how glad I am to see this transformation happening.
Nipun: Yeah. Absolutely, right? So, just think about it. So, you're getting MySQL and HeatWave together for just shy of ten cents an hour, right? So, what you could get for $300 is 3000 hours for a MySQL HeatWave instance, which is very good for people to try for free. And then, you know, decide if they want to go ahead with it.
Corey: One other, I guess, obnoxious question that I love to ask—it's not really a question so much as a statement; that's part of the first thing that makes it really obnoxious—but it always distills down to the following that upsets product people left and right, which is, "I don't get it." And one of the things that I didn't fully understand at the outset of how you were structuring things was the idea of separating out HeatWave from its constituent components. I believe it was Autopilot if I'm not mistaken, and it was effectively different SKUs that you could wind up opting to go for. And okay, if I'm trying to kick the tires on this and contextualize it as someone for whom the world's best database is Route 53, then it really felt like an additional decision point that I wasn't clear on the value of. And I'm still not entirely sure on the differentiation point and the value there, but now you offer it bundled as a default, which I think is so much better, from the user experience perspective.
Nipun: Okay, so let me clarify a couple of things.
Corey: Please. Databases are not my forte, so expect me to wind up getting most of the details hilariously wrong.
Nipun: Sure. So, MySQL Autopilot provides machine-learning-based automation for various aspects of the MySQL service; very popular. There is no charge for it. It is built into MySQL HeatWave; there is no additional charge for it, right, so there never was any SKU for it.
What you're referring to is, we have had a SKU for the MySQL node or the MySQL instance, and there's a separate SKU for HeatWave.The reason there is a need to have a different SKU for these two is because you always only have one node of MySQL. It could be, like, you know, running on one core, or like, you know, multiple cores, but it's always, like, you know, one node. But with HeatWave, it's a scale-out architecture, so you can have multiple nodes. So, the users need to be able to express how many nodes of HeatWave are they provisioning, right? So, that's why there is a need to have two SKUs, and we continue to have those two SKUs.What we are doing now differently is that when users instantiate a MySQL instance, by default, they always get the HeatWave node associated with it, right? So, they don't need to, like, you know, make the decision to—okay when to add HeatWave; they always get HeatWave along with the MySQL instance, and that's what I was saying a combination of both of these is, you know, like, just about ten cents an hour. If for whatever reason, they decide that they do not want HeatWave, they can turn it off, and then the price drops to half. But what we're providing is the AWS service that HeatWave is turned on by default.Corey: Which makes an awful lot of sense. It's something that lets people opt out if they decide they don't need this as they continue to scale out, but for the newcomer who does not, in many cases—in my particular case—have a nuanced understanding of where this offering starts and stops, it's clearly the right decision of—rather than, “Oh, yeah. The thing you were trying and it didn't work super well? Well, yeah. If you enable this other thing, it would have been awesome.” “Well, great. Please enable it for me by default and let me opt out later in time as my level of understanding deepens.”Nipun: That's right. And that's exactly what we are doing. Now, this was a feedback we got because many, if not most, of our customers would want to have HeatWave, and we just kind of, you know, mitigating them from going through one more step, it's always enabled by default.Corey: As far as I'm aware, you folks are running this effectively as any other AWS customer might, where you establish a private link connection to your customers, in some cases, or give them a public or private endpoint where they can wind up communicating with this service. It doesn't require any favoritism or special permissions from AWS themselves that they wouldn't give to any other random customer out there, correct?Nipun: Yes, that is correct. So, for now, we are exposing this thing as a public endpoint. In the future, we have plans to support the private endpoint as well, but for now, it's public.Corey: Which means that foundationally what you're building out is something that fits into a model that could work extraordinarily well across a variety of different environments. How purpose-tuned is the HeatWave installation you have running on AWS for the AWS environment, versus something that is relatively agnostic, could be dropped into any random cloud provider, up to and including the terrifyingly obsolete rack I have in the spare room?Nipun: So, as I mentioned, when we decided to offer MySQL HeatWave on AWS, the idea was that okay, for the AWS customers, we now want to have an offering which is completely optimized for AWS, provides the best price-performance on AWS. 
So, we have determined which instance types underneath will provide the best price performance, and that's what we have optimized for, right? So, I can tell you, like, in terms of many of—for instance, take the case of the cache size of the underlying processor that we're using on AWS is different than what we're using for OCI. So, we have gone ahead, made these optimizations in our code, and we believe that our code is really optimized now for the AWS infrastructure.Corey: I think that makes a fair deal of sense because, again, one of the big problems AWS has had is the proliferation of EC2 instance types to the point now where the answer is super easy, too, “Are you using the correct instance type for your workload?” Because that answer now is, “Of course not. Who could possibly say that they were with any degree of confidence?” But when you take the time to look at a very specific workload that's going to be scaled out, it's worth the time investment to figure out exactly how to optimize things for price and performance, given the constraints. Let's be very clear here, I would argue that the better price performance for HeatWave is almost certainly not going to be on AWS themselves, if for no other reason than the joy that is their data transfer pricing, even for internal things moving around from time to time.Personally, I love getting charged data transfer for taking data from S3, running it through AWS Glue, putting it into a different S3 bucket, accessing it with Athena, then hooking that up to Tableau as we go down and down and down the spiraling rabbit hole that never ends. It's not exactly what I would call well-optimized economically. Their entire system feels almost like it's a rigged game, on some level. But given those constraints, yeah, dialing in it and making it cost-effective is absolutely something that I've watched you folks put significant time and effort into.Nipun: So, I'll make two points, right, to the questions. First is yes, I just want to, like, be clear about it, that when a user provisions MySQL HeatWave via cloud.mysql.com and we create an instance in AWS, we don't give customers a multitude of things to, like, you know, choose from.We have determined which instance type is going to provide the customer the best price performance, and that's what we provision. So, the customer doesn't even need to know or care, is it going to be, like, you know, AMD? Is it going to be Intel? Is it going to be, like, you know, ARM, right? So, it's something which we have predetermined and we have optimized for it. That's first.The second point is in terms of the price performance. So, you're absolutely right, that for the class of customers who cannot migrate away from AWS because of the egress costs or because of the high latency because of AWS, right, sure, MySQL HeatWave on AWS will provide the best price-performance compared to other services out in AWS like Redshift, or Aurora, or Snowflake. But if customers have the flexibility to choose a cloud of their choice, it is indeed the case that customers are going to find that running MySQL HeatWave on OCI is going to provide them, by far, the best price performance, right? So, the price performance of running MySQL HeatWave on OCI is indeed better than MySQL HeatWave on AWS. And just because of the fact that when we are running the service in AWS, we are paying the list price, right, on AWS; that's how we get the gear. 
Whereas with OCI, like, you know, things are a lot less expensive for us.But even when you're running on AWS, we are very, very price competitive with other services. And you know, as you've probably seen from the performance benchmarks and such, what I'm very intrigued about is that we're able to run a standard workload, like some, like, you know, TPC-H and offer seven times better price-performance while running in AWS compared to Redshift. So, what this goes to show is that we are really passing on the savings to the customers. And clearly, Redshift is not doing a good job of performance or, like, you know, they're charging too much. But the fact that we can offer seven times better price performance than Redshift in AWS speaks volumes, both about architecture and how much of savings we are passing to our customers.Corey: What I love about this story is that it makes testing the waters of what it's like to run MySQL HeatWave a lot easier for customers because the barrier to entry is so much lower. Where everything you just said I agree with it is more cost-effective to run on Oracle Cloud. I think there are a number of workloads that are best placed on Oracle Cloud. But unless you let people kick the tires on those things, where they happen to be already, it's difficult to get them to a point where they're going to be able to experience that themselves. This is a massive step on that path.Nipun: Yep. Right.Corey: I really want to thank you for taking time out of your day to walk us through exactly how this came to be and what the future is going to look like around this. If people want to learn more, where should they go?Nipun: Oh, they can go to oracle.com/mysql, and there they can get a lot more information about the capabilities of MySQL HeatWave, what we are offering in AWS, price-performance. By the way, all the price performance numbers I was talking about, all the scripts are available publicly on GitHub. So, we welcome, we encourage customers to download the scripts from GitHub, try for themselves, and all of this information is available from oracle.com/mysql where they can get this detailed information.Corey: And we will, of course, put links to that in the show notes. Thank you so much for your time. I appreciate it.Nipun: Sure thing, Corey. Thank you for the opportunity.Corey: Nipun Agarwal, Senior Vice President of MySQL HeatWave. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment. You will then be overcharged for the data transfer to submit that insulting comment, and then AWS will take a percentage of that just because they're obnoxious and can.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Der Data Analytics Podcast
Data Engineers vs Software Engineers OLAP vs OLTP

Der Data Analytics Podcast

Play Episode Listen Later Jun 27, 2022 5:21


Data engineers work in OLAP systems; software engineers work in OLTP systems.
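To make the OLTP/OLAP split the episode describes concrete, here is a minimal SQL sketch; the orders table and its columns are hypothetical, not taken from the episode. An OLTP system serves the narrow, record-level reads and writes an application issues, while an OLAP system scans and aggregates many rows at once for analysis.

    -- OLTP-style query an application backend might run: a point lookup on one customer's rows.
    SELECT order_id, order_total, status
    FROM orders
    WHERE customer_id = 8731;

    -- OLAP-style query a data engineer might run: a scan-and-aggregate over the whole table.
    SELECT order_date, SUM(order_total) AS daily_revenue, COUNT(*) AS order_count
    FROM orders
    GROUP BY order_date
    ORDER BY order_date;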

Red Hat X Podcast Series
Red Hat X Tech Talk: LINBIT SDS and OpenShift

Red Hat X Podcast Series

Play Episode Listen Later Jun 14, 2022 34:52


LINBIT SDS is software-defined storage that fits perfectly with Red Hat OpenShift. It provides persistent volumes exposed through the CSI interface, letting you build persistent volumes for your cloud-native container workloads out of internal storage devices. While it was initially designed for the on-prem data center, it also fits multi-cloud and edge deployments, and it is well suited to IO-demanding workloads such as OLTP and OLAP databases, message queuing, and AI.

Screaming in the Cloud
Data Analytics in Real Time with Venkat Venkataramani

Screaming in the Cloud

Play Episode Listen Later Apr 27, 2022 38:41


About VenkatVenkat Venkataramani is CEO and co-founder of Rockset. In his role, Venkat helps organizations build, grow and compete with data by making real-time analytics accessible to developers and data teams everywhere. Prior to founding Rockset in 2016, he was an Engineering Director for the Facebook infrastructure team that managed online data services for 1.5 billion users. These systems scaled 1000x during Venkat's eight years at Facebook, serving five billion queries per second at single-digit millisecond latency and five 9's of reliability. Venkat and his team also created and contributed to many noted data technologies and open-source projects, including Facebook's TAO distributed data store, RocksDB, Memcached, MySQL, MongoRocks, and others. Prior to Facebook, Venkat worked on tools to make the Oracle database easier to manage. He has a master's in computer science from the University of Wisconsin-Madison, and bachelor's in computer science from the National Institute of Technology, Tiruchirappalli.Links Referenced: Company website: https://rockset.com Company blog: https://rockset.com/blog TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and its spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? 
What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today's promoted guest episode is one of those questions I really like to ask because it can often come across as incredibly, well, direct, which is one of the things I love doing. In this case, the question that I am asking is, when you look around at the list of colossal blunders that people make in the course of careers in technology and the rest, it's one of the most common is, “Oh, yeah. I don't like the way that this thing works, so I'm going to build my own database.” That is the siren call to engineers, and it is often the prelude to horrifying disasters. Today, my guest is Venkat Venkataramani, co-founder and CEO at Rockset. Venkat, thank you for joining me.Venkat: Thanks for having me, Corey. It's a pleasure to be here.Corey: So, it is easy for me to sit here in my beautiful ivory tower that is crumbling down around me and use my favorite slash the best database imaginable, which is TXT records shoved into Route 53. Now, there are certainly better databases than that for most use cases. Almost anything really, to be honest with you, because that is a terrifying pattern; good joke, terrible practice. What is Rockset as we look at the broad landscape of things that store data?Venkat: Rockset is a real-time analytics platform built for the cloud. Let me break that down a little bit, right? I think it's a very good question when you say does the world really need another database? Don't we have enough already? SQL databases, NoSQL databases, warehouses, and lake houses now.So, if you really break it down, the first digital transformation that happened in the '80s was when people actually retired pen and paper records and started using a relational database to actually manage their business records and what have you instead of ledgers and books and what have you. And that was the first digital transformation. That was—and Oracle called the rows in a table ‘records' for a reason. They're called records to this date. And then, you know, 20 years later, when all businesses were doing system of record and transactions and transactional databases, then analytics was born, right?This was, like, the whole reason why I wanted to make better data-driven business decisions, and BI was born, warehouses and data lakes started becoming more and more mainstream. And there was really a second category of database management systems because the first category it was very good at to be a system of record, but not really good at complex analytics that businesses are asking to be able to guide their decisions. Fast-forward 20 years from then, the nature of applications are changing. The world is going from batch to real-time, your data never stops coming, advent of Apache Kafka and technologies like that, 5G, IoTs, data is coming from all sorts of nooks and corners within an enterprise, and now customers in enterprises are acquiring the data in real-time at a scale that the world has never seen before.Now, how do you get analytics out of that? And then if you look at the database market—entire market—there are still only two large categories of databases: OLTP databases for transaction processing, and warehouses and data lakes for batch analytics. 
Now suddenly, you need the speed of OLTP at the scale of batch, right, in terms of, like, complexity of compute, complexity of storage. So, that is really why we thought the data management space needs that third leg, and we call it real-time analytics platform or real-time analytics processing. And this is where the data never stops coming; the queries never stopped coming.You need the speed and the scale, and it's about time we innovate and solve the problem well because in 2015, 2016, when I was researching for this, every company that was looking to solve build applications that were real-time applications was building a custom Rube Goldberg machine of sorts. And it was insanely complex, it was insanely expensive. Fast-forward now, you can build a real-time application in a matter of hours with the simplicity of the cloud using Rockset.Corey: There's a lot to be said that the way we used to do things after the first transformation and we got into the world of batch processing, where—in the days of punch cards, which was a bit before my time and I believe yours as well—where they would drop them off and then the next day, or two days, they would come back later after the run, they would get the results only to figure out syntax error because you put the wrong card first or something like that. And it was maddening. In time, that got better, but still, nightly runs have become a thing to the point where even now, by default, if you wind up looking at the typical timing of a default Linux install, for example, you see that in the middle of the night is when a bunch of things will rotate when various cleanup jobs get done, et cetera, et cetera. And that seemed like a weird direction to go in. One of the most famous Google April Fools Day jokes was when they put out their white paper on MapReduce.And then Yahoo fell for it hook, line, and sinker, built out Hadoop, and we've been stuck with this idea of performing these big query jobs on top of existing giant piles of data, where ideally, you can measure it with a wall clock; in practice, you often measure the calendar in some cases. And as the world continues to evolve, being able to do streaming processing and understand in real-time what is going on, is unlocking different approaches, at least by all accounts. Do you have an example you can give me of a problem that real-time analytics solves for a customer? Because I can sit here and talk all day about how things might theoretically work, but I have to get out of my Route 53-based ivory tower over here, what are customers seeing?Venkat: That's a great question. And I want one hundred percent agree. I think Google did build MapReduce, and I think it's a very nice continuation of what happened there and what is happening in the world now. And built MapReduce and they quickly realized re-indexing the whole world [laugh] every night, as the size of the internet is exploding is a bad idea. And you know how Google index is now? They do real-time indexing.That is how they index the wor—you know, web. And they look for the changes that are happening in the internet, and they only index the changes. And that is exactly the same principle behind—one of the core principles behind Rockset's real-time analytics platform. So, what is the customer story? 
So, let me give you one of my favorite ones.So, the world's number one or number two buy now, pay later company, they have hundreds of millions of users, they have 300,000-plus merchants, they operate in, like, maybe 100-plus countries, so many different payment methods, you can imagine the complexity. At any given point in time, some part of the product is broken, well, Apple Pay stopped working in Switzerland for this e-commerce merchant. Oh God, like, we got to first detect that. Forget even debugging and figuring out what happened and having an incident response team. So, what did they do as they scale the number of payments processed in the system across the world—it's, like, in millions; first, it was millions in the day, and there was millions in an hour—so like everybody else, they built a batch-based system.So, they would accumulate all these payment records, and every six hours—so initially, it was a day, and then afterwards, you know, you try to see how far I can push it, and they couldn't push it beyond every six hours. Every six hours, some batch job would come and process through all the payments that happened, have some statistical models to detect, hey, here are some of the things that you might want to double-click and follow up on. And as they were scaling, the batch job that they will kick off every six hours was starting to take more than six hours. So, you can see how the story goes. Now, fast-forward, they came to us and say—it's almost like Rockset has, like, a big red button that says, “Real-time this.”And then they kind of like, “Can you make this real-time? Because not only that we are losing millions of potential revenue dollars in a year because something stops working and we're not processing payments, and we don't find out about that up to, like, three hours later, five hours later, six hours later, but our merchants are also very unhappy. We are also not able to protect our customers' business because that is all we are about.” And so fast-forward, they use Rockset, and simply using SQL now they have all the metrics and statistical computation that they want to do, happens in real-time, that are accurate up to the second. All of their anomaly detectors run every minute and the anomaly detectors take, like, hundreds of milliseconds to run.And so, now they've cut down the business observability, I would say. It's not metrics and machine observability is actually the—you know, they have now business observability in real-time. And that not only actually saves them a lot of potential revenue loss from downtimes, that's also allowing them to build a better product and give their customers a better experience because they are now telling their merchants and their customers that something is not working in some part of your e-commerce footprint before even the customers notice that something is wrong. And that allows them to build a better product and a better customer experience than their competitors. So, this is a very real-world example of why companies and enterprises are moving from batch to real-time.Corey: With the stories that you, and frankly, a lot of other data analytics companies tend to fall back on all the time has been stories of the ones you're telling, where you're talking about the largest buy now, pay later lender, for example. These are companies operating at massive scale who have tremendous existing transaction volume, and they're built out already. That's great, but then I wanted to try to cut to the truth of some of these things. 
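As a rough illustration of the business-observability queries Venkat describes, here is a minimal SQL sketch; the payments table and its columns are hypothetical stand-ins rather than the customer's actual schema. The idea is that an anomaly detector running every minute can compute failure rates per country and payment method over the last few minutes and flag segments that look off.

    -- Hypothetical real-time check: payment failure rate per country and payment method
    -- over the last five minutes, recomputed every minute by the anomaly detector.
    SELECT country,
           payment_method,
           COUNT(*) AS attempts,
           SUM(CASE WHEN status = 'FAILED' THEN 1 ELSE 0 END) * 1.0 / COUNT(*) AS failure_rate
    FROM payments
    WHERE event_time >= CURRENT_TIMESTAMP - INTERVAL '5' MINUTE
    GROUP BY country, payment_method
    HAVING COUNT(*) > 50;  -- skip segments with too little traffic to judge

A detector would then compare these failure rates against a trailing baseline and alert only on the segments that deviate.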
And when I visit your pricing page at Rockset, it doesn't have what I would expect if that were the only use case. And what that would be is, “Great. Call here to conta—open up a sales quote, and we'll talk to you et cetera, et cetera, et cetera.”And the answer then is, “Okay, I know it's going to have at least two commas in it, ideally, not three, but okay, great.” Instead, you have a free tier where it's, “Hey, we'll give you a pile of credits, here's some limits on our free account, et cetera, et cetera.” Great. That is awesome. So, it tells me that there is a use case here for folks who have not already, on some level, made a good show of starting the process of conquering the world.Rather, someone with an idea some evening at two in the morning can wind up diving in and getting started. What is the Twitter for Pets, in my garage, spare-time side project story for using something like Rockset? What problem will I have as I wind up building those things out, when I don't have any user traffic or data yet, but I want to, you know for once in my life, do the smart thing in advance rather than building an impressive tower of technical debt?Venkat: That is the first thing we built, by the way. When we finish our product, the first thing we built was self-service. The first thing we built was a free forever tier, which has certain limits because somebody has to pay the bill, right? And then we also have compute instances that are very, very affordable that cost you, like, approximately $1 a day. And so, we built all of that because real-time analytics is not a need that only, like, the large-scale companies have. And I'll give you a very, very simple example.Let's say you're building a game, it's a mobile game. You can use Amazon DynamoDB and use AWS Lambdas and have a serverless stack and, like, you're really only paying… you're kind of keeping your footprint very, very small, and you're able to build a very lively game and see if it gets [wider 00:12:16], and it's growing. And once it grows, you can have all the big company scaling problems. But in the early days, you're just getting started. Now, if you think about DynamoDB and Lambdas and whatnot, you can build almost every part of the game except probably the leaderboard.So, how do I build a leaderboard when thousands of people are playing and all of their individual gameplays and scores and everything is just another simple record in DynamoDB. It's all serverless. But DynamoDB doesn't give me a SQL SELECT *, order by score, limit 100, distinct by the same player. No, this is a analytical question, and it has to be updated in real-time, otherwise, you really don't have this thing where I just finished playing. I go to the leaderboard, and within a second or two, if it doesn't update, you kind of lose people along the way. 
So, this is one of actually a very popular use case, when the scale is much smaller, which is, like, Rockset augments NoSQL database like a Dynamo or a Mongo where you can continue to use that for—or even a Postgres or MySQL for that case where you can use that as your system of record and keep it small, but cover all of your compute-heavy and analytical parts of your application with Rockset.So, it's almost like kind of a CQRS pattern where you use your OLTP database as your system of record, you connect Rockset to it, and so—Rockset comes in with built-in connectors, by the way, so you don't have to write a single line of code for your inserts and updates and deletes in your transactional database to get reflected in Rockset within one to two seconds. And so now, all of a sudden you have a fully indexed, fast SQL replica of your transactional database that on which you can do all sorts of analytical queries and that's fully isolated with your transactional database. So, this is the pattern that I'm talking about. The mobile leaderboard is an example of that pattern where it comes in very handy. But you can imagine almost everybody building some kind of an application has certain parts of it that is very analytical in nature. And by augmenting your transactional database with Rockset, you can have your cake and eat it too.Corey: One of the challenges I think that at least I've run into when it comes to working with data—and let's be clear, I tend to deal with data in relatively small volumes, mostly. The stuff that's significantly large, like, oh, I don't know, AWS bills from large organizations, the format of those is mostly predefined. When I'm building something out, we're using, I don't know, DynamoDB or being dangerous with SQLite or whatnot, invariably I find that even at small-scale, I paint myself into a corner by data model design or how I wind up structuring access or the rest, and the thing that I'm doing that makes perfect sense today winds up being incredibly challenging to change later. And I still, in production and have a DynamoDB table that has the word ‘test' in its name because of course I do.It's not a great place to find yourself in some cases. And I'm curious as to what you've seen, as you've been building this out and watching customers, especially ones who already had significant datasets as they move to you. Do you have any guidance around how to avoid falling down that particular well?Venkat: I will say a lot of the complexity in this world is by solving the right problems using the wrong tool, or by solving the right problem on the wrong part of the stack. I'll unpack this a little bit, right? So, when your patterns change, your application is getting more complex, it is demanding more things, that doesn't necessarily mean the first part of the application you build—and let's say DynamoDB was your solution for that—was the wrong choice. That is the right choice, but now you're expanded the scope of your application and the demand that you have on your backend transactional database. And now you have to ask the question, now in the expanded scope, which ones are still more of the same category of things on why I chose Dynamo and which ones are actually not at all?And so, instead of going and abusing the GSIs and other really complex and expensive indexing options and whatnot, that Dynamo, you know, has built, and has all sorts of limitations, instead of that, what do I really need and what is the best tool for the job, right? What is the best system for that? 
And how do I augment? And how do I manage these things? And this goes to the first thing I said, which is, like, this tremendous complexity when you start to build a Rube Goldberg machine of sorts.Okay, now, I'm going to start making changes to Dynamo. Oh, God, like, how do I pick up all of those things and not miss a single record? Now, replicate that to another second system that is going to be search-centric or reporting-centric, and do I have to rethink this once in a while? Do I have to build and manage these pipelines? And suddenly, instead of going from one system to two system, you actually end up going from one system to, like, four different things that with all the pipes and tubes going into the middle.And so, this is what we really observed. And so, when you come in to Rockset and you point us at your DynamoDB table, you don't write a single line of code, and Rockset will automatically scan your Dynamo tables, move that into Rockset, and in real-time, your changes, insert, updates, deletes to Dynamo will be reflected in Rockset. And this is all using Dynamo Streams API, Dynamo Scan API, and whatnot, behind the scenes. And this just gives you an example of if you use the right tool for the job here, when suddenly your application is demanding analytical queries on Dynamo, and you do the right research and find the right tool, your complexity doesn't explode at all, and you can still, again, continue to use Dynamo for what it is very, very good at while augmenting that with a system built for analytics with full-featured SQL and other capabilities that I can talk about, for the parts of your application for which Dynamo is not a good fit. And so, if you use the right tool for the job, you should be in very good place.The other thing is part about this wrong part of the stack. I'll give a very kind of naive example, and then maybe you can extrapolate that to, like, other patterns on how people could—you know, accidental complexities the worst. So, let's just say you need to implement access control on your data. Let's say the best place to implement access control is at the database level, just happens to be that is the right thing. But this database that I picked, doesn't really have role-based access control or what have you, it doesn't really give me all the security features to be able to protect the data the way I want it.So, then what I'm going to do is, I'm going to go look at all the places that is actually having business logic and querying the database and I'm going to put a whole bunch of permission management and roles and privileges, and you can just see how that will be so error-prone, so hard to maintain, and it will be impossible to scale. And this is what is the worst form of accidental complexity because if you had just looked at it that one week or two weeks, how do I get something out, or the database I picked doesn't have it, and then the two weeks, you feel like you made some progress by, kind of like, putting some duct tape if conditions on all the access paths. But now, [laugh] you've just painted yourself into a really, really bad corner.And so, this is another variation of the same problem where you end up solving the right problems in the wrong part of the stack, and that just introduces tremendous amount of accidental complexity. And so, I think yeah, both of these are the common pitfalls that I think people make. I think it's easy to avoid them. 
I would say there's so much research, there's so much content, and if you know how to search for these things, they're available on the internet. It's a beautiful place. [laugh]. But I guess you have to know how to search for these things. But in my experience, these are the two common pitfalls a lot of people fall into and paint themselves in a corner.
Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.
Corey: A question I have, though, as an extension, is this—and I want to give some flavor to it—why is there a market for real-time analytics? And what I mean by that is, early on in my tenure of fixing horrifying AWS bills, I saw a giant pile of money being hurled over at effectively a MapReduce cluster for Elastic MapReduce. Great. Okay, well, stream-processing is kind of a thing; what about migrating to that? Well, that was a complete non-starter because it wasn't just the job running on those things; there were downstream jobs, with their own downstream jobs. There were thousands of business processes tied to that thing. And similarly, the idea of real-time analytics, we don't have any use for that because of, oh I don't know, I only wind up pulling these reports on a once-a-week basis, and that's fine, so what do I need that updated for in real-time if I'm looking at them once a week? In practice, the answer is often something aligned with the, "Well, yeah, but if you had a real-time updating dashboard, you would find that more useful than those reports." But people's expectations and business processes have shaped themselves around constraints that now can be removed, but how do you get them to see that? How do you get them to buy in on that? And then how do you untangle that enormous pile of previous constraints into something that leverages the technology that's now available for a brighter future?
Venkat: I think [unintelligible 00:21:40] a really good question, who are the people moving to real-time analytics? What do they see? And why can't they do it with other tech? Like, you know, as you say… EMR, you know, it's just MapReduce; can't I just run it in sort of every twenty-four hours, every six hours, every hour? How about every five minutes? It doesn't work that way.
Corey: How about I spin up a whole bunch of parallel clusters on different timescales so I constantly—
Venkat: [laugh].
Corey: Have a new report coming in. It's real-time, except—
Venkat: Exactly.
Corey: You're constantly putting out new ones, but they're just six hours delayed every time.
Venkat: Exactly. So, you don't really want to do this. And so, let me unpack it one at a time, right? I mean, we talked about a very good example of a business team which is building business observability at the buy now, pay later company.
That's a very clear value-prop on why they want to go from batch to real-time because it saves their company tremendous losses—potential losses—and also allows them to build a better product. So, it could be a marketing operations team looking to get more real-time observability to see what campaigns are working well today and how do I double down and make sure my ad budget for the day is put to good use? I don't have to mention security operations, you know, needing real-time. Don't tell me I got owned three days ago. Tell me—[laugh] somebody is, you know, breaking glass and might be, you know, entering into your house right now. And tell me then and not three days later, you know—
Corey: "Yeah, what alert system do you have for security intrusion?" "So, I read the front page of The New York Times every morning and wait to see my company's name." Yeah, there probably are better ways to reduce that cycle time.
Venkat: Exactly, right. And so, that is really the need, right? Like, I think more and more business teams are saying, "I need operational intelligence and not business intelligence." Don't make me play Monday morning quarterback. My favorite analogy is it's the middle of the third quarter. I'm six points down. A couple of people, star players on my team and my opponent's team, are injured, but there's some in offense, some in defense. What plays do I call and how do I play the game slightly differently to change the outcome of the game and win this game as opposed to losing by six points? So, that I think is kind of really what is driving businesses. You know, I want to be more agile, I want to be more nimble, and take, kind of, data-driven decision-making to another level. So that, I think, is the real force in play. So, now the real question is, why can't they do it already? Because if you go ask a hundred people, "Do you want fast analytics on real-time data or slow analytics on stale data?" how many people are going to say give me slow and stale? Zero, right? Exactly zero people. So, but then why hasn't it happened yet? I think it goes back to the fact that the world has only seen two kinds of databases: transaction processing systems, built for system of record, don't-lose-my-data kind of systems; and then batch analytics, you know, all these warehouses and data lakes. And so, in real-time analytics use cases, the data never stops coming, so you actually need a system that is running 24/7. And then what happens is, as soon as you build a real-time dashboard, like this example that you gave, which is, like, I just want all of these dashboards to automatically update all the time, immediately people respond and say, "But I'm not going to be like Clockwork Orange, you know, toothpicks in my eyelids, and be staring at this 24/7. Can you do something to alert or detect some anomalies and tap on my shoulder when something off is going on?"
And so, now what happens is somebody—a program more than a person—is actually actively monitoring all of these metrics and graphs and doing some analysis, and only bringing this to your attention when you really need it because something is off, right? So, then suddenly what happens is you went from, accumulate all the data and run a batch report, to [unintelligible 00:25:16], like, the data never stops coming, the queries never stop coming, I never stop asking questions; it's just a programmatic way of asking those things. And at that point, you have a data app. This is not an analytics dashboard report anymore.
You have a full-fledged application.In fact, that application is harder to build and scale than any application you've ever built before [laugh] because in those situations, again, you don't have this torrent of data coming in all the time and complex analytical questions you're asking on the data 24/7, you know? And so, that I think is really why real-time analytics platform has to be built as almost a third leg. So, this is what we call data apps, which is when your data never stops coming and your queries never stop coming. So, this is really, I think, what is pushing all the expensive EMR clusters or misusing your warehouse, misusing your data lakes. At the end of the day, is what is I think blowing up your Snowflake bills, is what blowing up your warehouse builds because you somehow accidentally use the wrong tool for the job [laugh] going back to the one that we just talked about.You accidentally say, “Oh, God, like, I just need some real-time.” With enough thrust, pigs can fly. Is that a good idea? Probably not, right? And so, I don't want to be building a data app on my warehouse just because I can. You should probably use the best tool for the job, and really use something that was built ground up for it.And I'll give you one technical insight about how real-time analytics platforms are different than warehouses.Corey: Please. I'm here for this.Venkat: Yes. So really, if you think about warehouses and data lakes, I call them storage-optimized systems. I've been building databases all my life, so if I have to really build a database that is for batch analytics, you just break down all of your expenses in terms of let's say, compute and storage. What I'm burning 24/7 is storage. Compute comes and goes when I'm doing a batch data load, or I'm running—an analyst who logs in and tries to run some queries.But what I'm actually burning 24/7 is storage, so I want to compress the heck out of the data, and I want to store it in very cheap media. I want to store it—and I want to make the storage as cheap as possible, so I want to optimize the heck out of the storage use. And I want to make computation on that possible but not efficient. I can shuffle things around and make the analysis possible, but I'm not trying to be compute-efficient. And we just talked about how, as soon as you get into real-time analytics, you very quickly get into the data app business. You're not building a real-time dashboard anymore, you're actually building your application.So, as soon as you get into that, what happens is you start burning both storage and compute 24/7. And we all know, relatively, [laugh] compute and RAM is about a hundred to a thousand times more expensive than storage in the grand scheme of things. And so, if you actually go and look at your Snowflake bill, if you go look at your warehouse bill—BigQuery, no matter what—I bet the computational part of it is about 90 to 95% of the bill and not the storage. And then, if you again, break down, okay, who's spending all the compute, and you'll very quickly narrow down all these real-time-y and data app-y use cases where you can never turn off the compute on your warehouse or your BigQuery, and those are the ones that are blowing up your costs and complexity. And on the Rockset side, we are actually not storage-optimized; we're compute-optimized.So, we index all the data as it comes in. 
And so, the storage actually goes slightly higher because the, you know, we stored the data and also the indexes of those data automatically, but we usually fold the computational cost to a quarter of what a typical warehouse needs. So, the TCO for our customers goes down by two to four folds, you know? It goes down by half or even to a quarter of what they used to spend. Even though their storage cost goes up in net, that is a very, very small fraction of their spend.And so really, I think, good real-time analytics platforms are all compute-optimized and not storage-optimized, and that is what allows them to be a lot more efficient at being the backend for these data applications.Corey: As someone who spends a lot of time staring into the depths of AWS bills, I think that people also lose sight of the reality that it doesn't matter what you're spending on AWS; it invariably pales in comparison to what you're spending on people to work with these things. The reason to go to cloud is not because it is the cheapest possible way to get computers to do things; it's because it's a capability story. It's about unlocking capacity and capabilities you do not have otherwise. And that dramatically increases your feature velocity and it lets you to achieve things faster, sooner, with better results. And unlocking a capability is always going to be more interesting to a company than saving money on it. When a company cares first, last, and always about just save money, make the bill lower, the end, it's usually a company in decline. Or alternately, something very strange is going on over there.Venkat: I agree with that. One of our favorite customers told us that Rockset took their six-month roadmap and shrunk it to a single afternoon. And their supply chain SaaS backend for heavy construction, 80% of concrete that are being delivered and tracked in North America follows through their platform, and Rockset powers all of their real-time analytics and reporting. And before Rockset, what did they have? They had built a beautiful serverless stack using DynamoDB, even have AWS Lambdas and what-have-you.And why did they have to do all serverless? Because the entire team was two people. [laugh]. And maybe a third person once in a while, they'll get, so 2.5. Brilliant people, like, you know, really pioneers of building an entire data stack on AWS in a serverless fashion; no pipes, no ETL.And then they were like, oh God, finally, I have to do something because my business demands and my customers are demanding real-time reporting on all of these concrete trucks and aggregate trucks delivering stuff. And real-time reporting is the name of the game for them, and so how do I power this? So, I have to build a whole bunch of pipes, deliver it to, like, some Elasticsearch or some kind of like a cluster that I had to keep up in real-time. And this will take me a couple of months, that will take me a couple of months. They came into Rockset on a Thursday, built their MVP over the weekend, and they had the first working version of their product the following Tuesday.So—and then, you know, there was no turning back at that point, not a single line of code was written. You know, you just go and create an account with Rockset, point us at your Dynamo, and then off you go. You know, you can use start using SQL and go start building your real-time application. So again, I think the tremendous value, I think a lot of customers like us, and a lot of customers love us. 
And if you really ask them what is one thing about Rockset that you really like, I think it'll come back to the same thing, which is, you gave me a lot of time back.What I thought would take six months is now a week. What I thought would be three weeks, we got that in a day. And that allows me to focus on my business. I want to spend more time with my stakeholders, you know, my CPO, my sales teams, and see what they need to grow our business and succeed, and not build yet another data pipeline and have data pipelines and other things coming out of my nose, you know? So, at the end of the day, the simplicity aspects of it is very, very important for real-time analytics because, you know, we can't really realize our vision for real-time being the new default in every enterprise for whenever analytics concern without making it very, very simple and accessible to everybody.And so, that continues to be one of our core thing. And I think you're absolutely right when you say the biggest expense is actually the people and the time and the energy they have to spend. And not having to stand up a huge data ops team that is building and managing all of these things, is probably the number one reason why customers really, really like working with our product.Corey: I want to thank you for taking so much time to talk me through what you're working on these days. If people want to learn more, where's the best place to find you?Venkat: We are Rockset, I'll spell it out for your listeners ROCKSET—rock set—rockset.com. You can go there, you can start a free trial. There is a blog, rockset.com/blog has a prolific blog that is very active. We have all sorts of stories there, and you know engineers talking about how they implemented certain things, to customer case studies.So, if you're really interested in this space, that's one on space to follow and watch. If you're interested in giving this a spin, you know, you can go to rockset.com and start a free trial. If you want to talk to someone, there is, like, a ‘Request Demo' button there; you click it and one of our solutions people or somebody that is more familiar with Rockset would get in touch with you and you can have a conversation with them.Corey: Excellent. And links to that will of course go in the [show notes 00:34:20]. Thank you so much for your time today. I appreciate it.Venkat: Thanks, Corey. It was great.Corey: Venkat Venkataramani, co-founder and CEO at Rockset. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an insulting crappy comment that I will immediately see show up on my real-time dashboard.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

Google Cloud Platform Podcast
Spanner Myths Busted with Pritam Shah and Vaibhav Govil

Google Cloud Platform Podcast

Play Episode Listen Later Apr 20, 2022 35:47


This week, we're busting myths around Cloud Spanner with our guests Pritam Shah and Vaibhav Govil. Mark Mirchandani and Max Saltonstall host this episode and learn about the fantastic capabilities of Cloud Spanner. Our guests give us a quick run-down of Spanner database software and its fully managed offerings. Spanner's unique take on the relational database has sparked some myths. We start by addressing cost and the idea that Spanner is expensive. With its high availability, achieved through synchronously replicating data, failures are virtually a non-issue, making the cost well worth it. Our guests describe other features that add to the value of Spanner as well. Workloads of any size are a good fit for Spanner because of its scalability and pricing based on use. Despite rumors, Spanner is now very easy to start using. New additions like the PostgreSQL interface and ORM support have made the usability of Spanner much more familiar. Regional and multi-regional instances are supported, busting the myth that Spanner is only good for global workloads. Our guests offer examples of projects using local and global configurations with Spanner. In the database world, Vaibhav sees trends like the convergence of non-relational and relational databases, as well as convergence in OLTP and OLAP database semantics, and he tells us how Spanner is adapting and growing with these trends. Pritam points out that customers are paying more attention to total cost of ownership, the importance of scalable and reliable database solutions, and the peace of mind that comes with a managed database system. Spanner helps customers with these, freeing up business resources for other things. This year, Spanner has announced many new capabilities coming soon, like the PostgreSQL interface on Spanner GA, Query Insights visualization tools, cross-regional backups GA, and more. We hear all about these awesome updates.

Pritam Shah: Pritam is the Director of Engineering for Cloud Spanner. He has been with Google for about four and a half years. Before Spanner, he was the Engineering Lead for observability libraries at Google. That included Distributed Tracing and Metrics at Google scale. His mission was to democratize the instrumentation libraries. That is when he launched OpenCensus and then took on Cloud Spanner.

Vaibhav Govil: Vaibhav is the Product Lead for Spanner. He has been in this role for the past three years, and before this he was a Product Manager in Google Cloud Storage. Overall, he has spent close to four years at Google, and it has been a great experience.

Cool things of the week: Our plans to invest $9.5 billion in the U.S. in 2022 blog; A policy roadmap for 24/7 carbon-free energy blog; SRE Prodcast site; Meet the people of Google Cloud: Grace Mollison, solutions architect and professional problem solver blog; GCP Podcast Episode 224: Solutions Engineering with Grace Mollison and Ann Wallace podcast.

Interview: Spanner site; Cloud Spanner myths busted blog; PostgreSQL interface docs; Cloud Spanner Ecosystem site; Spanner: Google's Globally-Distributed Database white paper; Spanner Docs docs; Spanner Qwiklabs site; Using the Cloud Spanner Emulator docs; GCP Podcast Episode 62: Cloud Spanner with Deepti Srivastava podcast; GCP Podcast Episode 248: Cloud Spanner Revisited with Dilraj Kaur and Christoph Bussler podcast; Cloud Spanner federated queries docs.

What's something cool you're working on? Max is working on a new podcast platform and some spring break projects.
Hosts Mark Mirchandani and Max Saltonstall
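As a companion to the episode description above, here is a minimal sketch of reading from Cloud Spanner with the google-cloud-spanner Python client. The instance ID, database ID, and table name are assumptions for illustration, not anything confirmed by the episode.

```python
# Minimal read sketch using the google-cloud-spanner client library.
# The instance/database/table names here are hypothetical.
from google.cloud import spanner

client = spanner.Client()                    # uses Application Default Credentials
instance = client.instance("test-instance")  # assumed instance ID
database = instance.database("example-db")   # assumed database ID

# Snapshots give strongly consistent reads without holding locks,
# which is where Spanner's synchronous replication pays off.
with database.snapshot() as snapshot:
    results = snapshot.execute_sql(
        "SELECT SingerId, FirstName, LastName FROM Singers LIMIT 10"
    )
    for row in results:
        print(row)
```

Since the episode also covers the PostgreSQL interface, the same read could instead be issued through a standard PostgreSQL driver pointed at that interface rather than the native client.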

Screaming in the Cloud
Throwing Houlihans at MongoDB with Rick Houlihan

Screaming in the Cloud

Play Episode Listen Later Mar 24, 2022 40:44


About Rick
I lead the developer relations team for strategic accounts at MongoDB. My responsibilities include defining technical standards for the global strategic accounts team and consulting with the largest customers and opportunities for the business. My role spans technology sectors, and as part of my engagements I routinely provide guidance on industry best practices, technology transformation, distributed systems implementation, cloud migration, and more. I led the architecture and design effort at Amazon for migrating thousands of relational workloads from RDBMS to NoSQL and built the center of excellence team responsible for defining the best practices and design patterns used today by thousands of Amazon internal service teams and AWS customers. I currently operate as the technical leader for our global strategic account teams to build the market for MongoDB technology by facilitating center of excellence capabilities within our customer organizations through training, evangelism, and direct design consultation activities. 30+ years of software and IT expertise. 9 patents in Cloud Virtualization, Complex Event Processing, Root Cause Analysis, Microprocessor Architecture, and NoSQL Database technology.

Links: MongoDB: https://www.mongodb.com/ Twitter: https://twitter.com/houlihan_rick

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: The company 0x4447 builds products to increase standardization and security in AWS organizations. They do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain and fail-tolerant solutions, one of which is their VPN product built on top of the popular OpenVPN project which has no license restrictions; you are only limited by the network card in the instance. To learn more visit: snark.cloud/deployandgo

Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. Visit snark.cloud/oci-free; that's snark.cloud/oci-free.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. A year or two before the pandemic hit, I went on a magical journey to a mythical place called Australia. I know—I was as shocked as anyone to figure out that this was in fact real.
And while I was there, I gave the opening keynote at a conference that was called Latency Conf, which is great because there's a heck of a timezone shift, and I imagine that's what it's talking about. The closing keynote was delivered by someone I hadn't really heard of before, and he started talking about single table design with respect to DynamoDB, which—okay, great; let's see what he's got to say. And the talk started off engaging and entertaining and a high-level overview, and then got deeper and deeper and deeper, and I felt, "Can I please be excused? My brain is full." That talk was delivered by Rick Houlihan, who now is the Director of Developer Relations for Strategic Accounts over at MongoDB, and I'm fortunate enough to be able to get him here to more or less break down some of what he was saying back then, catch up with what he's been up to, and more or less suffer my slings and arrows. Rick, thank you for joining me.

Rick: Great. Thanks, Corey. I really appreciate—you brought back some memories, you know, a trip down memory lane there. And actually, interestingly enough, that was the world's introduction to single table design. That was my dry-run rehearsal for re:Invent 2018, which is where I delivered that talk, and it has since become the most positive—

Corey: This was two weeks before re:Invent, which was just a great thing. I'd been invited to go; why not? I figured I'd see a couple of clients I had out in that direction. And I learned things like Australia is a big place. So, doing a one-week trip including Sydney, Melbourne, and Perth? Don't do that.

Rick: I had no idea that it took so long to fly from one side to the other, right? I mean, that's a long plane [laugh] [crosstalk 00:02:15]—

Corey: Oh, yeah. And you were working at AWS at the time—

Rick: Absolutely.

Corey: —so I can only assume that they basically stuffed you into a dog kennel and threw you underneath the seating area, given their travel policy?

Rick: Well, you know, [clear throat] actually at the time, they had just upgraded the policy to allow the intermediate seating, right? So, if you wanted to get the—

Corey: Ohhh—

Rick: I know—

Corey: Big spender. Big spender.

Rick: Yes, yes. I could get a little bit of extra legroom, so I didn't have my knees shoved into someone's seatback. But it was good.

Corey: So, let's talk about, I guess… we'll call it the elephant in the room. You were at MongoDB, where you were a big proponent of the whole NoSQL side of the world. Then you went to go work at AWS and you carried the good word of DynamoDB far and wide. It made an impression; I built my entire newsletter pipeline production system on top of DynamoDB. It has the same data in three different tables because I'm not good at listening or at computers. But now you're back at Mongo. And it's easy to jump to the conclusion of, "Oh, you're just shilling for whoever it is that happens to sign your paycheck." And at this point—what's the authenticity story? But I've been paying attention to what you've been saying, and I think that's a bad take, because you have been saying the same things all along, since before you were on the Dynamo side of it. I do some research for this show, and you've been advocating for outcomes and the right ways to do things. How do you view it?

Rick: That's basically the story here, right? I've always been a proponent of NoSQL.
You know, what I took—the knowledge—it was interesting: the knowledge I took from MongoDB evolved as I went to AWS, where I delivered thousands of applications and deployed workloads that I'd never even imagined I would have my hands on before I went there. I mean, honestly, what a great place it was to cut your teeth on data modeling at scale, right? There is no greater scale. That's when you learn where things break. And honestly, a lot of the lessons I took from MongoDB—when I applied them at scale at AWS, they worked with varying levels of success, and we had to evolve those into the sets of design patterns which I started to propose for DynamoDB customers, which have been highly effective. I still believe in all those patterns. I would never tell somebody that they need to drop everything and run to MongoDB, but, you know, again, all those patterns apply to MongoDB, too, right? Well—I wouldn't say all of them, but many of them, right? So, I'm a proponent of NoSQL. And I think we talked before the call a little bit about, you know, if I was out there hawking relational technology right now and saying RDBMS is the future, then everybody who criticizes anything I say would have some validity there. But I'm not saying anything different than I've ever said. MongoDB announced Serverless, if you remember, in July, and that was a big turning point for me, because the API that we offer—the developer experience for MongoDB—is unmatched, and this is what I talk to people about now. And it's the patterns that I've always proposed; I still model data the same way, I don't do it any different, and I've always said, if you go back to my earlier sessions on NoSQL, it's all the same. It doesn't matter if it's MongoDB, DynamoDB, or any other technology. I've always shown people how to model their data in NoSQL and I don't care what database you're using; I've actually helped MongoDB customers do their job better over the years as well. So.

Corey: Oh, yeah. And looking back at some of your early talks as well, you passed my test for, "Is this person a shill?" Because you wound up in those talks addressing head-on: when is a relational model the right thing to do? And then you put the answers up on a slide, and it didn't distill down to, "If you're a fool."

Rick: [laugh].

Corey: Because there are use cases where if you don't [unintelligible 00:05:48] your access patterns, if you have certain constraints and requirements, then yeah. You have always been an advocate for doing the right thing for the workload. And in my experience, for my use cases, when I looked at MongoDB previously, it was not a fit for me. It was very much a you-run-this-on-an-instance basis; you have to handle all this stuff. Like—you know, keeping it in triplicate in three different DynamoDB tables, my newsletter production pipeline now, including backups and the rest, the DynamoDB portion has climbed to the princely sum of $1.30 a month, give or take.

Rick: A month. Yes, exactly.

Corey: So, there's no answer for that there. Now that Mongo Serverless is coming out into the world—oh, okay, this starts to be a lot more compelling. It starts to be a lot more flexible.

Rick: I was just going to say, for your use case there, Corey, you're probably looking at a very similar pricing experience now with MongoDB Serverless. Especially when you look at the pricing model, it's very close to the on-demand table model.
It actually has discounted tiering above it, which I haven't really broken down yet against a provisioned capacity model, but you know, there's a lot of complexity in DynamoDB pricing. And they're working on this—they'll get better at it as well—but right now you have on-demand, you have provisioned throughput, you have [clear throat] reserved capacity allocations. And, you know, there's a time and place for all of those, but it's just complexity, right? This is the problem that I've always had with DynamoDB. I just wish that we'd spent more time on improving the developer experience, right—enhancing the API, implementing some of these features that, you know, help. Let's make single table design a first-class citizen of the DynamoDB API. Right now it's a—I don't want to say redheaded stepchild; I have two [laugh] I have two redhead children and my wife is a redhead, but yeah. [laugh].

Corey: [laugh]. That's—it's—

Rick: That's the way it's treated, right? It's treated like a stepchild. You know, it's like, come on, we're fully funding the solutions within our own umbrella that are competing with ourselves, and at the same time, we're letting the DynamoDB API languish while our competitors are moving ahead. And eventually, it just becomes, you know—okay, guys, I want to work with the best tooling on the market, and that's really what it came down to. As long as DynamoDB was the king of serverless, yes, absolutely; best tooling on the market. And they still are [clear throat] the leader, right? There's no doubt that DynamoDB is ahead in the serverless landscape, and the MongoDB solution is in its nascency. It's going to be here, it's going to be great; that's part of what I'm here for. And that's, again, getting back to why did you make the move: I want to be part of this, right? That's really what it comes down to.

Corey: One of the things that I know has been my own bias is that when I'm looking at my customer environments to see what's there, I can see DynamoDB because it has its own line item in the bill. MongoDB is generally either buried in marketplace charges, or it's running on a bunch of EC2 instances, or it just shows up as data transfer. So, it's not as top-of-mind for the way that I view things through the lens of, you know, billing. So, that does inform my perception, but I also know that when I'm talking to large-scale companies about what they're doing, when they're going all-in on AWS, a large number of them still choose things like Mongo. When I've asked them why that is, sometimes you get the answer of, "Oh, legacy. It's what we built on before." Cool—

Rick: Sure.

Corey: —great. Other times, it's a, "We're not planning to leave, but if we ever wanted to go somewhere else, it's nice to not have to reimagine the entire data architecture and change the integration points start to finish, because migrations are hard enough without that." And there is validity to the idea of a strategic exodus being possible, even if it's not something you're actively building for all the time, which I generally advise people not to do.

Rick: Yeah. There are a couple of things that have occurred over the last couple of years that have changed the enterprise CIO's and CTO's assessment of risk, right? Risk is the number one decision factor in a CTO's portfolio and a CIO's, you know, decision-making process, right? What is the risk? What is the impact of that risk? Do I need to mitigate that risk, or do I accept that risk?
Okay? So, right now, what you've seen is, with Covid, people have realized that on-prem infrastructure is a risk, right? It used to be an asset; now it's a risk. Those personnel that have to run that on-prem infrastructure—hey, what happens when they're not available? The infrastructure is at risk. Okay. So, offloading that to cloud providers is the natural solution. Great. So, what happens when you offload to a cloud provider and IAD goes down—or, you know, us-east-1 goes down? We used to call it IAD internally at AWS when I was there because, you know, the regions were named by airport codes, but it's us-east-1. How many times has us-east-1 had problems? Do you really want to be the guy that every time us-east-1 goes down, you're in trouble? What happens when people in us-east-1 have trouble? Where do they go?

Corey: Down, generally speaking.

Rick: [crosstalk 00:10:37]—well, if they're well-architected, right, if they're well-architected, what do they do? They go to us-west-2. How much infrastructure does us-west-2 have? So, if everybody in us-east-1 is well-architected, then they all go to us-west-2. What happens in us-west-2? And I guarantee you—and I've been warning about this at AWS for years—there's a cascade failure coming, and it's going to be coming because we're well-architecting everybody to fail over from our largest region to our smaller regions. And those smaller regions cannot take the load, and nobody's doing any of that planning, so, you know, sooner or later, what you're going to see is dominoes fall, okay? [clear throat]. And it's not just going to be us-east-1; it's going to be us-east-1 failed, and the rollover caused a cascade failure in us-west-2, which caused a cascade—

Corey: Because everyone's failing over during—

Rick: That's right. That's right.

Corey: —this event the same way. And also—again, not to dunk on them unnecessarily, but when—

Rick: No, I'm not dunking.

Corey: —us-east-1 goes down, a lot of the control plane services freeze up—

Rick: Oh, of course they do.

Corey: —like [unintelligible 00:11:25].

Rick: Exactly. Oh, no single point of failure, right? Uh-huh, exactly. There you go, Route 53—and what actually surprised me is that DynamoDB instead of Route 53 is your primary database. So, I must have actually had some impact on you—

Corey: To move one workload off of Dynamo to Route 53 [crosstalk 00:11:39] because I have to practice what I preach.

Rick: That's right. Exactly.

Corey: It was weird; it made the thing slower and a little bit less, uh—

Rick: [laugh]. I love it when [crosstalk 00:11:45]—yeah, yeah—

Corey: —and a little bit [crosstalk 00:11:45] cache-y. But yeah.

Rick: —sure. Okay, I can understand that. [laugh].

Corey: But it made the architecture diagram a little bit more head-scratching, and really, that's what it's all about. Getting a high score.

Rick: Right. So, if you think about your data—I mean, would you rather be running on an infrastructure that's tied to a cloud provider that could experience these kinds of regional failures and cascade failures, or would you rather have your data infrastructure go across cloud providers so that when one provider has problems, you can just go ahead and switch the light bulb over to the other one and ramp right back up? And honestly, if you're running active-active configurations and that kind of [clear throat] deployment design, you're never going to go down.
You're always going—

Corey: The challenge I've had—

Rick: —to be the one that stays up.

Corey: The theory is sound, but the challenge I've had in production with trying these things is that, one, the thing that winds up handling the failover piece often causes more outages than the underlying stuff itself.

Rick: Well, sure. Yeah.

Corey: Two, when you're building something to run a workload in multiple cloud providers, you're forced to use a lot of—

Rick: Lowest common denominator?

Corey: Lowest common denominator stuff. Yeah.

Rick: Yeah, yeah, totally. I hear that all the time.

Corey: Unless you're actively running it in both places, it looks like a DR plan, which doesn't survive the next commit to the codebase. It's the—

Rick: I totally buy that. You're talking about the stack, stack duplication, all that kind of—that's an overhead and complexity I don't worry about at the data layer, right?

Corey: Oh, yeah.

Rick: The data layer—

Corey: If you're talking about—

Rick: —[crosstalk 00:12:58]

Corey: —[crosstalk 00:12:58] the data layer, oh, everything you're saying makes perfect sense.

Rick: Makes perfect sense, right? And honestly, you know, let's put it this way: if this is what you want to do—

Corey: What do you mean, identity management and security handover working differently? Oh, that's a different team's problem. Oh, I miss those days.

Rick: Yeah, you know, totally right. It's not ideal. But, you know, honestly, moving that data around is not something that somebody wants to manage themselves. The data is the lock-in. The data is the thing that ties you to—

Corey: And the cost of moving it around, in some cases, too.

Rick: That's exactly right. So, you know, having infrastructure that spans providers and spans both on-prem and cloud, potentially, that can span multiple on-prem locations—man, that's just power. And MongoDB provides that; DynamoDB can't. And that's really one of the biggest limitations that it will always have, right? And we've talked about it, and I still believe in the power of global tables and multi-region deployments and everything; it's all real. But these types of scenarios—I think this is the next generation of failure that the cloud providers are not really prepared for. They haven't experienced it, they don't know what it's even going to look like, and I don't think you want to be tied to a single provider when these things start happening, right, if you have a large amount of infrastructure deployed someplace. It just seems like [clear throat] that's a risk that you're running these days, and you can mitigate that risk somewhat by going with a MongoDB Atlas. I agree with all those other considerations. But you know, it's a lot of fun, too, right? There's a lot of fun in that, right? Because if you think about it, I can deploy technologies on any cloud provider in ways that are going to be cloud-provider agnostic, right? I can use containerized technologies, Kubernetes—hell, I'm not even afraid to use Lambda functions: just put a wrapper around that code and deploy it both as a Lambda or a Cloud Function in GCP. The code's almost the same in many cases, right? What it's doing with the data—you can code this stuff in a way—I used to do it all the time—you abstract the data layer, right? Create a DAL. How about a CAL? A cloud [laugh] cloud access layer, right, you know? [laugh].
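Rick's DAL/CAL idea can be made concrete with a small sketch. This is a hypothetical interface, not anything from MongoDB's or AWS's SDKs beyond standard pymongo/boto3 calls; the table, collection, and store names are assumptions for illustration.

```python
# A minimal data access layer (DAL) sketch: the application codes against one
# interface, and the backing store can be swapped per cloud provider.
# Table/collection names and the order shape are illustrative assumptions.
from abc import ABC, abstractmethod
from typing import Optional

import boto3                      # AWS SDK, for the DynamoDB-backed store
from pymongo import MongoClient   # for the MongoDB-backed store


class OrderStore(ABC):
    """The only surface the application sees."""

    @abstractmethod
    def put_order(self, order: dict) -> None: ...

    @abstractmethod
    def get_order(self, order_id: str) -> Optional[dict]: ...


class DynamoOrderStore(OrderStore):
    def __init__(self, table_name: str = "orders"):
        # Assumes a table whose partition key is the string attribute "PK".
        self.table = boto3.resource("dynamodb").Table(table_name)

    def put_order(self, order: dict) -> None:
        self.table.put_item(Item={"PK": order["order_id"], **order})

    def get_order(self, order_id: str) -> Optional[dict]:
        return self.table.get_item(Key={"PK": order_id}).get("Item")


class MongoOrderStore(OrderStore):
    def __init__(self, uri: str = "mongodb://localhost:27017"):
        self.coll = MongoClient(uri)["shop"]["orders"]

    def put_order(self, order: dict) -> None:
        self.coll.replace_one({"_id": order["order_id"]}, order, upsert=True)

    def get_order(self, order_id: str) -> Optional[dict]:
        return self.coll.find_one({"_id": order_id})
```

With this shape, failing over to another provider is a configuration change in one constructor rather than a rewrite of every call site—exactly the lock-in trade-off Corey and Rick are weighing here.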
Corey: I wish, on some level, we could go down some of these paths. And someone asked me a while back, "Well, you seem to have a lot of opinions on this. Do you think you could build a better cloud than AWS?" And my answer—

Rick: Hell yes.

Corey: —took them a bit by surprise: "Absolutely. Step one, I want similar resources, so give me $20 billion to spend"—

Rick: I was going to say, right?

Corey: —"then I'm going to hire the smart people." Not that we're somehow smarter or better or anything else than the people who built AWS originally, but now—

Rick: We have all those lessons learned.

Corey: —we have fifteen years of experience to fall back on.

Rick: Exactly.

Corey: "Oh. I wouldn't make that mistake again."

Rick: Exactly. Don't need to worry about that. Yeah, exactly.

Corey: You can't just turn off a cloud service and relaunch it with a completely different interface and API and the rest.

Rick: People who criticize services like DynamoDB and other AWS services—look, any kind of retooling of these services is like rebuilding the engine on the airplane while it's flying.

Corey: Oh, yeah.

Rick: And you have to do it with a level of service assurance that—I mean, come on. DynamoDB provides four nines out of the box, right? Five nines if you turn on global tables. And they're doing this at the same time as they have pipeline releases dropping regularly, right? So, you can imagine what kind of unit testing goes on there, what kind of canary deployments are happening. It's just an amazing infrastructure that they maintain—incredibly complex, you know? In some ways, these are lessons that we need to learn at MongoDB if we're going to be successful operating a shared-backplane serverless processing fabric. We have to look at what DynamoDB does right, and we need to build our own infrastructure that mirrors those things, right? In some ways these things are there, in some ways they're being worked on, and in some ways we've got a long way to go. But this is the exciting part of that journey for me. Now, in my case, I focus on strategic accounts, right? Strategic accounts are big—they have the potential to be our whale customers, right? These are probably not customers who would be all that interested in serverless; they're customers that would be more interested in provisioned infrastructure, because they're the people that I talked to when I was at DynamoDB—I would be talking to customers who are interested in, like, reserved capacity allocations, right? If you're talking about—

Corey: Yeah, I wanted to ask you about that. You're doing developer advocacy—which I get—for strategic accounts.

Rick: Right.

Corey: And I'm trying to wrap my head around—

Rick: Why [crosstalk 00:17:19]—

Corey: [crosstalk 00:17:19] strategic accounts are the big ones, with the potential to spend lots of money. Why do they need special developer advocacy?

Rick: [laugh]. Well, yeah, it's funny, because one of the reasons why I started talking to Mark Porter about this was the fact that the overlap is really around [clear throat] the engagements that I ran when I was doing the Amazon retail migration, right? When Amazon retail started to move to NoSQL, we deprecated 3,000 Oracle server instances and we moved a large percentage of those workloads to NoSQL.
The vast majority were probably just lift-and-shift into RDS and whatnot, because they were too small, too old, not worth upgrading, whatnot—but every single tier-one service, right, every money-making service, was redesigned and redeployed on DynamoDB. So, we're talking about 25,000 developers that we had to ramp. This was back four years ago; now they have, like, 75,000. But back then we had 25,000 global developers, we had [clear throat] a technology shift—a fundamental paradigm shift between relational modeling and NoSQL modeling—and the whole entire organization needed to get up to speed, right? So, it was about creating a center of excellence; it was about operating as an office of the CTO within the organization to drive this technology into the DNA of our company. And that exercise was incredibly informative and educational in the process of executing a technology transformation in a major enterprise. And this is something that we want to reproduce. And it's actually what I did for Dynamo as well, really, more than anything. Yes, I was on Twitter, I was on Twitch, I did a lot of these things that were kind of developer-advocate activities, but my primary job at AWS was working with large strategic customers—enabling their teams, teaching them how to model their data in NoSQL, and helping them cross the chasm, right, from relational. And that is advocacy, right? The way I do it is I use their workloads, [clear throat] the customers' own project teams. I break down their models, I break down their access patterns—essentially, with a whole day of design reviews, we'll walk through 12 or 15 workloads—and when I leave, these guys have an idea: how would I do it if I wanted to use NoSQL, right? Give them enough breadcrumbs so that they can actually say, "Okay, if I want to take it to the next step, I can do it without calling up and saying, 'Hey, can we get a professional services team in here?'" right? So, it's kind of developer advocacy and it's kind of not, right? We're recognizing that these are whales, these are customers with internal resources that are so huge they could suck our developer advocacy team in and chew it up, right? So, what we're trying to do is form a focused team that can hit hard and move the needle inside the accounts. That's what I'm doing. Essentially, it's the same work I did at AWS for DynamoDB; I'm just doing it for—you know, they traded for a new quarterback. Let's put it that way. [laugh].

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.

Corey: So, one thing that I find appealing about the approach maps to what I do in the world of cloud economics, where—like, in my own environment, our AWS bill is creeping up again—we have 14 AWS accounts—and that's a little over $900 a month now. Which, yeah, big money, big money.

Rick: [laugh].

Corey: In the context of running a company, that's something no one notices or cares about.
And our customers spend hundreds of millions a year, pretty commonly. So, I see the stuff in the big accounts and I see the stuff in the tiny account here. Honestly, the more interesting stuff is generally on the smaller side of the scale, just because you're not going to have a misconfiguration costing a third of your bill when a third of your bill is $80 million a year. So—

Rick: That's correct. If you do, then that's a real problem, right?

Corey: Oh yeah.

Rick: [laugh].

Corey: It's very much two opposite ends of a very broad spectrum. And advice for folks in one of those situations is often disastrous to folks on the other side of that.

Rick: That's right. That's right. I mean, at some scale, managing granularity hurts you, right? The overhead of trying to keep your costs down—but at the same time, it's just a different measure of cost. There's a different granularity that you're looking at, right? I mean, things below a certain level stop being important when the budget starts to get to a certain scale or a certain size, right? Theoretically—

Corey: Yeah, for certain workloads—things that I care about with my dollar-a-month Dynamo spend—if I were to move that to Mongo Serverless, great, but my considerations are radically different than a company that is spending millions a month on their database structure.

Rick: That's right. Really, that's what it comes down to.

Corey: Yeah, we don't care about the pennies. We care about: is it going to work? How do we back it up? What's the replication factor?

Rick: And that—but also, it's more than that. From my perspective, it really comes down to the fact that companies spending millions of dollars a year on database services are companies that are spending five or ten times that in developer expense, right? Building services, maintaining the code that the services run on. You know, the biggest problem I had with MongoDB is the level of code complexity. It's a cut after cut after cut, right? And the way I kind of describe the experience—and other people have described it to me; I didn't come up with this analogy. I had a customer tell me this as they were leaving DynamoDB: "DynamoDB is death by a thousand cuts. You love it, you start using it, you find a little problem, you start fixing it. You start fixing it. You start fixing—you come up with a pattern. Talk to Rick, he'll come up with something. He'll tell you how to do that." Okay? And you know, how many customers did I do this with? And honestly, they're 15-minute phone calls for me, but every single one of those 15-minute phone calls turns into eight hours of developer time writing the code, debugging it, deploying it over and over again, making sure it's going the way it's [crosstalk 00:23:02]—

Corey: Have another 15-minute call with Rick, et cetera, et cetera. Yeah.

Rick: Another 15—exactly. And it's like, okay—eventually, they just get tired of it, right? And I actually had a customer—a big customer—tell me flat out, "Yeah, you proved that DynamoDB can support our workload and it'll probably do it cheaper, but I don't have a half-a-dozen Ricks on my team, right? I don't have any Ricks on my team. I can't be getting you in here every single time we have to do a complex data model overhaul, right?" And this was—granted—one of the more complex implementations that I've ever done. In order to make it work,
I had to overload the fricking table with multiple access patterns on the partition key, something I'd never done in my life. I made it work, but honestly, that was an exercise that taught me something: if I have to do this, it's unnatural, okay? And that's—[laugh] you know what I mean? And honestly, there are API improvements that we could have done to make that less of a problem. It's not like we haven't known since, I don't know, I joined the company that a thousand WCUs per storage partition was pretty small. Okay? We've kind of known that since DynamoDB was invented. As a matter of fact, from what I know from talking to people who were around back then, that was a huge bone of contention back in the day, right? A thousand WCUs, ten gigabytes—there were a lot of PEs on the team going, "No way. No way. That's way too small." And then there were other people that were like, "Nah, nobody's ever going to need more than that." And you know, a lot of this was based on the analysis of [crosstalk 00:24:28]—

Corey: Oh, nothing ever survives first contact from—

Rick: Of course.

Corey: —the customer, particularly a customer who is not themselves deeply familiar with what's happening under the hood. Like, I had this problem back when I was a traveling trainer for Puppet for a while. It was, "Great. Well, Puppet is obviously a piece of crap because everyone I talked to has problems with it." So, I was one of the early developers behind SaltStack—

Rick: Oh, nice.

Corey: —and, "Ah, this is going to be a thing of beauty and it'll be awesome." And that lasted until the first time I saw what somebody had done with it in the wild. It was, "Oh, okay, that's an [unintelligible 00:25:00] choice."

Rick: Okay, that's how—"Yeah, I never thought about that," right? Happy path. We all love the happy path, right? As we're working with technologies, we figure out how we like to use them, and we all use them that way. Of course, you can solve any problem you want the way that you'd like to solve it. But as soon as someone else takes that clay, they mold a different statue and you go, "Oh, I didn't realize it could look like that." Right, exactly.

Corey: So, here's one for you that I still struggle with from time to time: why would I, if I'm building something out—well, first off, why on earth would I do that? I have people for that who are good at things—but if I'm building something out and it has a database layer, why would someone choose NoSQL over—

Rick: Oh, sure.

Corey: —over SQL?

Rick: [crosstalk 00:25:38] question.

Corey: —and let me be clear here—I'm coming at this from the perspective of someone—basically, me a few years ago—who has no real understanding of what databases are. So, my mental model of a database is Microsoft Excel, where I can fire up a [unintelligible 00:25:51] table of these things—

Rick: Sure. [laugh]. Hey, well then, you know what? Then you should love NoSQL, because that's kind of the best analogy for what NoSQL is. It's like a spreadsheet, right? Whereas a relational database is like a bunch of spreadsheets, each with their own types of rows, right? So—[laugh].

Corey: Oh, my mind was blown with relational stuff: [unintelligible 00:26:07] wait, you could have multiple tables? It's, "What do you think relational meant there, buddy?" My map of NoSQL was always key and value, and that was it. And that's all it could be. And sure, for some things, that's what I use, but not everything.

Rick: That's right.
So, you know, the bottom line is, when you think about the relational database, it all goes back to the first paper ever written on the relational model, by Edgar Codd—I can't remember the exact title, but he wrote about the relational data model for distributed systems, something like that. He discussed the concept of normalization, the power of normalization, and why you would want this. And the reason why he thought this was important—this actually demonstrates how, boy, they used to write killer abstracts to papers, right? The very first sentence is: this is why I'm writing this paper. You read the first sentence, and you know: "Future users of modern computer systems must have a way to be able to ask questions of the data without knowing how to write code." I mean, I don't know if those were the exact words, but that was basically what he said; that was why he invented the normalized data model. Because with the hierarchical management systems at the time, everyone had to know everything about the data in order to be able to get any answers, right? And he was like, "No, I want to be able to just write a question and have the system answer it." Now, at the time, a lot of people felt like that's great, and they agreed with his normalized model—it was elegant—but they all believed that the CPU overhead at the time was way too high, right? To generate these views of data on the fly? No freaking way. Storage is expensive. But it ain't that expensive, right? Well, this little thing called Moore's Law, right? Moore's Law balanced the relational database's checkbook for 40, 50 years, okay? As the CPUs got faster and faster, crunching the data became less and less of a problem. And so we crunched bigger and bigger data sets, and we got very, very happy with this. Up until about 2014. In 2014, a really interesting thing happened. If you look at the TOP500—the top 500 supercomputing clusters around the world—and you look at their performance increases year to year after 2014, it went off a cliff. No longer beating Moore's Law. Ever since, they've been—and per-core performance, you know, CPU instructions executed per second, everything—it's just flattening. Those curves are flattening. Moore's Law is broken. Now, you'll get people arguing about it, but the reality is, if it wasn't broken, the TOP500 would still be cruising away. They're not. Okay? So, what this is telling us is that the relational database is losing its horsepower. Okay? Why is it happening? Because gate length has an absolute minimum—it's called zero, right? We can't have a logic gate with negative distance, right? [laugh]. So, you know, these things—but storage? Storage just keeps on getting cheaper and cheaper, right? We're going the other way with storage: it's gigabytes, it's terabytes, it's petabytes. With CPUs, we're going smaller and smaller and smaller, and the fab cost is increasing. It's going to take a next-generation CPU technology to get back on track with Moore's Law.

Corey: Well, here's the challenge. Everything you're saying makes perfect sense from where your perspective is. I reiterate: you are working with strategic accounts, which means 'big.'
When I'm building something out in the evenings because I want to see if something is possible, performance considerations and those sorts of characteristics do not factor into it. When I'm at a very small scale, I care about cost to some extent—sure, whatever—but the far more expensive aspect of it, in the ways that matter—the big expensive piece is—

Rick: We've talked about it.

Corey: —engineering time—

Rick: That's what we just talked about, right?

Corey: —where it's, "What am I familiar with?"

Rick: As a developer, right, why would I use MongoDB over DynamoDB? Because the developer experience [crosstalk 00:29:33]—

Corey: Exactly. Sure, down the road there are performance characteristics, and yeah, at the time I have this super-large, scaled-out, complex workload—yeah—but most workloads will not get to that.

Rick: Will not ever get there. Ever get there. [crosstalk 00:29:45]—

Corey: Yeah, so optimizing for [crosstalk 00:29:45], how's it going to work when I'm Facebook-scale? It's—

Rick: So, first of—no, exactly, Facebook scale is irrelevant here. What I'm talking about is actually a cost ratchet that's going to lever on midsize workloads soon, right? Within the next four to five years, you're going to see mid-level workloads start to suffer from significant performance-cost deficiencies compared to NoSQL workloads running on the same. Now—hell, you see it right now, but you don't really experience it, like you said, until you get to scale, right? But in midsize workloads, [clear throat] that's going to start showing up, right? This cost overhead cannot go away. Now, the other thing here that you've got to understand is, just because it's new technology doesn't make it harder to use. Just because you don't know how to use something doesn't mean that it's more difficult. And NoSQL databases are not more difficult than the relational database. I can express every single relationship in a NoSQL database that I can express in a relational database. If you think about modern OLTP applications, we've done the analysis ad nauseam: 70% of access patterns are for a single object—a single row of data from a single table; another 20% are for a range of rows from a single table. Okay, that leaves only 10% of your access patterns involving any kind of complex table traversal or entity traversal. Okay? And most of those are simple one-to-many hierarchies. So, let's put that into perspective: 99% of the access patterns in an OLTP application can be modeled without denormalization in a single table. Because just because I put all the objects in one place doesn't mean that it's denormalized. Denormalization requires strong redundancies in the stored set—duplication of data. Okay? Edgar Codd himself said that the normalized data model does not depend on storage—that it's irrelevant. I could put all the objects in the same document; as long as there's no duplication of data, there's no denormalization. I know, I can see your head going, "Wow," but it's true, right? Because as long as I can clearly express the relationships of the data without strong redundancies, it is a normalized data model. That's what most people don't understand. NoSQL does not require denormalization.
That's a decision you make, and it usually happens when you have many-to-many relationships; then we need to start duplicating the data.

Corey: In many cases—at least in my own experience, because again, I am bad at computers—I find that the data model is not something that you sit down and consciously plan very often. It's rather something—

Rick: Oh yeah.

Corey: —that happens to you instead. I mean—

Rick: That's right. [laugh].

Corey: —realistically, like, using DynamoDB for this is aspirational. I just checked, and—so I started this newsletter back in March of 2017. I spun up the DynamoDB table that backs it, and I know it's the one that's in production because it has the word 'test' in its name, because of course it does. And I'm looking into it, and it has 8,700 items in it now and it's 3.7 megabytes. It's—

Rick: Sure—oh boy. Nothing, right?

Corey: —not for nothing, this could just as easily—and probably with less complexity for my level of understanding at the time—have been a CSV file that I—

Rick: Right. Exactly, right.

Corey: —grabbed from a Lambda out of S3, did the thing to it, and then put it back.

Rick: [unintelligible 00:32:45]. Right.

Corey: And then, from a performance perspective on my side, it would make no discernible difference.
And you know what, that is a problem because this is exactly what developers today think. They think know the relational database, but they don't.You talk to any DBA out there who's coming in after the fact and cleaned up all the crappy SQL that people like me wrote, okay? I mean, honestly, I wrote some stuff in the day that I thought, “This is perfect. There's no way that could be anything better than this,” right? Nice derived table joins insi—and you know what? Then here comes the DBA when the server is running at 90% CPU and 100% percent memory utilization and page swapping like crazy, and you're saying we got to start sharding the dataset.And you know, my director of engineering at the time said, “No, no, no. What we need is somebody to come in and clean up our SQL.” I said, “What do you mean? I wrote that SQL.” He's like, “Like I said, we need someone to come and clean up our SQL.”I said, “Okay, fine.” We brought the guy in. 1500 bucks an hour, we paid this guy, I was like, “There's no way that this guy is going to be worth that.” A day and a half later, our servers are running at 50% CPU and 20% memory utilization. And we're thinking about, you know, canceling orders for additional hardware. And this was back in the day before cloud.So, you know, developers think they know what they're doing. [clear throat]. They don't know what they're doing when it comes to the database. And don't think just because it's a relational database and they can hack it easier that it's better, right? Yeah, it's, there's no substitute for knowing what you're doing; that's what it comes down to.So, you know, if you're going to use a relational database, then learn it. And honestly, it's a hell of a lot more complicated to learn a relational database and do it well than it is to learn how to model your data in NoSQL. So, if you sit two developers down, and you say, “You learn NoSQL, you learn relational,” two months later, this guy is still going to be studying. This guy's going to be writing code for seven weeks. Okay? [laugh]. So, you know, that's what it comes down to. You want to go fast, use NoSQL and you won't have any problems.Corey: I think that's a good place to leave it. If people want to learn more about how you view these things, where's the best place to find you?Rick: You know, always hit me up on Twitter, right? I mean, @houlihan_rick, that's my—underbar rick, that's my Twitter handle. And you know, I apologize to folks who have hit me up on Twitter and gotten no response. My Twitter as you probably have as well, my message request box is about 3000 deep.So, you know, every now and then I'll start going in there and I'll dig through, and I'll reply to somebody who actually hit me up three months ago if I get that far down the queue. It is a Last In, First Out, right? I try to keep things as current as possible. [laugh].Corey: [crosstalk 00:36:51]. My DMs are a trash fire. Apologies as well. And we will, of course, put links to it in the [show notes 00:36:55].Rick: Absolutely.Corey: Thank you so much for your time. I really do appreciate it. It's always an education talking to you about this stuff.Rick: I really appreciate being on the show. Thanks a lot. Look forward to seeing where things go.Corey: Likewise.Rick: All right.Corey: Rick Houlihan Director of Developer Relations, Strategic Accounts at MongoDB. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an upset comment talking about how we didn't go into the proper and purest expression of NoSQL non-relational data: DNS TXT records.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
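As a postscript to Rick's point that most OLTP access patterns fit a single table without denormalization, here is a small sketch of that single-table style in Python with boto3. It is an illustrative layout with hypothetical table and attribute names, not a pattern taken verbatim from the episode: one customer and their orders stored as normalized items (no duplicated fields) under a shared partition key, so the 70% point-lookup and 20% range-read cases each become one call.

```python
# Single-table DynamoDB sketch: a customer and their orders share a partition
# key, so "get customer" is a point read and "get this customer's orders" is
# a single range Query. Table and attribute names are hypothetical.
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("app-single-table")

# One customer item plus several order items, normalized: no field is
# duplicated across items; the relationship lives entirely in the keys.
table.put_item(Item={"PK": "CUST#42", "SK": "PROFILE", "name": "Pat"})
table.put_item(Item={"PK": "CUST#42", "SK": "ORDER#2022-03-01", "total": 120})
table.put_item(Item={"PK": "CUST#42", "SK": "ORDER#2022-03-15", "total": 80})

# ~70% case: single-object lookup.
profile = table.get_item(Key={"PK": "CUST#42", "SK": "PROFILE"})["Item"]

# ~20% case: a range of rows from one table (this customer's orders).
orders = table.query(
    KeyConditionExpression=Key("PK").eq("CUST#42")
    & Key("SK").begins_with("ORDER#")
)["Items"]

print(profile["name"], len(orders))
```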

Software Engineering Radio - The Podcast for Professional Software Developers

Frank McSherry, Chief Scientist at Materialize, talks to host Akshay Manchale about Materialize, a SQL database that maintains incremental views over streaming data. Frank talks about how Materialize can complement analytical systems...

Outrageous Love the Podcast: Our Journeys to Responsiveness
Ben Kingsbury's Journey to Responsiveness, the Educrats Series, 3 of 3

Outrageous Love the Podcast: Our Journeys to Responsiveness

Play Episode Listen Later Mar 16, 2022 58:16


OLTP ends the "educrats series" in California with a look inside the belly of the beast: the California Department of Education (CDE). This time, we hear the perspective of educator Ben Kingsbury. Ben serves as a program consultant in the CDE. He offers deep insights, not only on the bureaucracy but also on his journey, which takes us to Saudi Arabia, and we land with a dope reference to the GAP Band. Ben's journey reminds us of the old adage that you cannot judge a book by its cover...ever. Fascinating and fun! Dr. Hollie's two cents raises the question: why do we continue to believe in an educational system that has perpetually not equitably served all students? How many chances does the system get? Listen in for the answer.

Web Dev 101 - Front End, Back End, Full Stack
#1 – Intro to DataNation, what is OLTP and OLAP

Web Dev 101 - Front End, Back End, Full Stack

Play Episode Listen Later Feb 5, 2022 9:36


Alex Merced introduces the new DataNation podcast, which will be available on all podcast providers, then discusses what OLTP and OLAP are and why they matter. Join the DataNation community at https://www.DataNation.click Register for the Subsurface Conference at - https://www.dremio.com/subsurface/live/winter2022/?utm_medium=social&utm_source=dremio&utm_term=alexmercedsocial&utm_content=na&utm_campaign=event-subsurface-2022 Join the Subsurface Slack Community at - https://join.slack.com/t/subsurfaceworkspace/shared_invite/zt-ghkyk4ox-8FmydCM_6xGdx9Li0qo3Jg
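To ground the episode's OLTP/OLAP distinction, here is a hedged sketch using Python's built-in sqlite3: the same table serves a transactional point-write and an analytical aggregate. The orders schema is invented for illustration; in practice these two workload shapes usually live in separate systems (an OLTP store and a warehouse).

```python
# OLTP vs. OLAP in miniature: same data, two very different query shapes.
# The orders schema here is invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, total REAL)")

# OLTP: short, single-row transactions — insert one order, read one order.
with con:  # commits (or rolls back) the transaction
    con.execute("INSERT INTO orders (region, total) VALUES (?, ?)", ("west", 42.0))
    con.execute("INSERT INTO orders (region, total) VALUES (?, ?)", ("east", 19.5))
one = con.execute("SELECT * FROM orders WHERE id = ?", (1,)).fetchone()

# OLAP: a scan-and-aggregate query across the whole table — the kind of
# workload you would normally run in a warehouse, not the OLTP database.
report = con.execute(
    "SELECT region, COUNT(*), SUM(total) FROM orders GROUP BY region"
).fetchall()

print(one, report)
```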

Engenharia de Dados [Cast]
Data Warehouse vs. Data Lakehouse - Casos de Uso e Comparações com Orlando Marley

Engenharia de Dados [Cast]

Play Episode Listen Later Aug 19, 2021 70:46


Would you like to understand the difference between a Data Warehouse and a Data Lakehouse, and go further to understand what actually happens inside the companies that adopt these solutions? Orlando Marley is one of the leading specialists in this area, and with him we share tips on how to better understand these two paradigms and how you can combine the two solutions to deliver remarkable analytics for your company. The Data Lakehouse is a new concept that is rapidly gaining traction, and to stand out as a data engineer you need to learn about it. Luan Moreno = https://www.linkedin.com/in/luanmoreno/

Engenharia de Dados [Cast]
Postgres como Plataforma de Dados Estruturados e Semi-Estruturados com Raul Oliveira

Engenharia de Dados [Cast]

Play Episode Listen Later Aug 14, 2021 71:30


Relational databases are present in every company, mainly because these systems can guarantee properties such as atomicity, consistency, isolation, and durability. Beyond that, storing data securely is a key factor for any company; yet in this new era of Big Data, contrary to what many think, open-source databases such as MySQL and Postgres are widely used for both transactional and analytics scenarios. In this highly technical conversation, we bring in Raul Oliveira, a standout professional in the market, to explain what Postgres can deliver for Big Data and Analytics scenarios, as well as the product's vision for the future. Luan Moreno = https://www.linkedin.com/in/luanmoreno/
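Postgres handles the structured-plus-semi-structured mix this episode describes largely through its JSONB column type. Below is a minimal sketch with psycopg2; the connection string and the events table are assumptions for illustration.

```python
# Structured and semi-structured data side by side in Postgres via JSONB.
# Connection details and the events table are hypothetical.
import psycopg2

conn = psycopg2.connect("dbname=demo user=postgres host=localhost")
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id      serial PRIMARY KEY,   -- structured, relational column
        kind    text NOT NULL,        -- structured
        payload jsonb NOT NULL        -- semi-structured document
    )
""")
cur.execute(
    "INSERT INTO events (kind, payload) VALUES (%s, %s::jsonb)",
    ("click", '{"page": "/pricing", "user": {"plan": "pro"}}'),
)

# Query inside the document with JSONB operators: ->> extracts a text field,
# @> tests containment — both can be indexed with a GIN index.
cur.execute(
    "SELECT payload->>'page' FROM events "
    "WHERE payload @> '{\"user\": {\"plan\": \"pro\"}}'"
)
print(cur.fetchall())
conn.commit()
```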

The Cloud Migration Podcast
Episode 9 - Rules Migration and Controlling Risk

The Cloud Migration Podcast

Play Episode Listen Later May 24, 2021 19:04


In a cloud data migration, an organization must move large volumes of data into a cloud data warehouse. The work includes source analysis and mapping, building a data pipeline for each table, and tracking changes across all data sources, among other tasks. However, there are several ways to eliminate inefficiencies through automation. For example, selecting ELT over ETL: in Extract, Load, Transform, data is extracted from sources such as OLTP databases A and B and deposited directly into the destination staging area; the data is then cleaned and transformed inside the data warehouse. By comparison, ETL requires the data to pass through a secondary transformation server before it is loaded.
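As a rough illustration of the ELT flow described above, here is a minimal sketch in which SQLite stands in for both the OLTP source and the cloud warehouse; real pipelines would use the appropriate drivers and a genuine warehouse engine, and the tables are made up.

```python
# Illustrative ELT sketch: extract raw rows, load them into staging as-is,
# then transform inside the warehouse itself.
import sqlite3

source = sqlite3.connect(":memory:")     # stand-in for OLTP database A
warehouse = sqlite3.connect(":memory:")  # stand-in for the cloud warehouse

source.execute("CREATE TABLE orders (id INTEGER, amount REAL, status TEXT)")
source.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10.0, "paid"), (2, 99.0, "refunded"), (3, 5.0, "paid")],
)

# Extract + Load: copy raw rows straight into a staging table, no cleansing yet.
warehouse.execute("CREATE TABLE stg_orders (id INTEGER, amount REAL, status TEXT)")
warehouse.executemany(
    "INSERT INTO stg_orders VALUES (?, ?, ?)",
    source.execute("SELECT id, amount, status FROM orders").fetchall(),
)

# Transform: the "T" happens inside the warehouse, whereas classic ETL would do
# this on a separate intermediate server before loading.
warehouse.execute("""
    CREATE TABLE fct_orders AS
    SELECT id, amount FROM stg_orders WHERE status = 'paid'
""")
print(warehouse.execute("SELECT * FROM fct_orders").fetchall())
```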

Engenharia de Dados [Cast]
YugaByteDB - Banco de Dados Distribuído com Consistência e Transações [ACID]

Engenharia de Dados [Cast]

Play Episode Listen Later Feb 19, 2021 53:44


In this episode you will discover a new database that performs distributed transactions and is fully ready for Kubernetes. We walk through the many characteristics and features of this open-source database, which was initially developed inside Facebook. YugabyteDB supports the Postgres and Cassandra APIs, meaning you can transparently move your workloads onto the platform with minimal effort. Distributed [ACID] Transactions = http://bit.ly/3ui2flx Raft-Consensus = http://bit.ly/3uie7Eq Luan Moreno = https://www.linkedin.com/in/luanmoreno/
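Because YugabyteDB's YSQL layer speaks the Postgres wire protocol, an ordinary Postgres driver can talk to it. The sketch below assumes a local cluster with the usual YSQL defaults (port 5433, yugabyte/yugabyte credentials); the accounts table and transfer are made up for illustration, not taken from the episode.

```python
# Hedged sketch: using a standard Postgres driver against YugabyteDB's YSQL API.
# Host, port, credentials, and schema are placeholder assumptions.
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=5433, dbname="yugabyte",
                        user="yugabyte", password="yugabyte")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance NUMERIC)")
cur.execute("INSERT INTO accounts VALUES (1, 100), (2, 100) ON CONFLICT (id) DO NOTHING")

# A distributed ACID transaction: both updates commit atomically even if the
# rows live on different tablets/nodes, coordinated via Raft consensus.
cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
cur.execute("UPDATE accounts SET balance = balance + 10 WHERE id = 2")
conn.commit()

cur.execute("SELECT * FROM accounts ORDER BY id")
print(cur.fetchall())
conn.close()
```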

The Podlets - A Cloud Native Podcast
The Past, Present and Future of Kubernetes with Craig McLuckie (Ep 13)

The Podlets - A Cloud Native Podcast

Play Episode Listen Later Jan 20, 2020 46:56


Today on The Podlets Podcast, we are joined by VMware's Vice President of Research and Development, Craig McLuckie! Craig is also a founder of Heptio, who were acquired by VMware and during his time at Google he was part of bringing Kubernetes into being. Craig has loads of expertise and shareable experience in the cloud native space and we have a fascinating chat with him, asking about his work, Heptio and of course, Kubernetes! Craig shares some insider perspective on the space, the rise of Kubernetes and how the increase in Kubernetes' popularity can be managed. We talk a lot about who can use Kubernetes and the prerequisites for implementation; Craig insists it is not a one-size-fits-all scenario. We also get into the lack of significantly qualified minds and how this is impacting competition in the hiring pool. Craig comments on taking part in the open source community and the buy-in that is required to meaningfully contribute as well as sharing his thoughts on the need to ship new products and services regularly. We finish off the episode with some of Craig's perspectives on the future of Kubernetes, dangers it poses to code if neglected and the next phase of its lifespan. For this amazing chat with a true expert in his field, make sure to join us on for this episode! Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feeback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Special guest: Craig McLuckie Hosts: Carlisia Campos Duffie Cooley Josh Rosso Key Points From This Episode: • A brief introduction to Craig's history and his work in the cloud native space. • The questions that Craig believes more people should be asking about Kubernetes. • Weighing the explosion of the Kubernetes space; fragmentation versus progress. • The three pieces of enterprise software and aiming to enlarge the 'crystalline core'.• Craig's thoughts on specialized Kubernetes operating systems and their tradeoffs. • Quantifying the readiness of an organization to implement Kubernetes. • Craig's reflections on Heptio and the lessons he feels he learned in the process.• The skills shortage for Kubernetes and how companies are approaching this issue. • Balancing the needs and level of the community and shipping products regularly.• Involvement in the open source community and the leap of faith that is inherent in the process. • The question of microliths; making monoliths more complex and harder to manage. • Masking problems with Kubernetes and how detrimental this can be to your code. • Craig's thoughts on the future of the Kubernetes space and possible changes.• The two duty cycles of any technology; the readiness phase that follows the hype. Quotes: “I think Kubernetes has opened it up, not just in terms of the world of applications that can run Kubernetes, but also this burgeoning ecosystem of supporting technologies that can create value.” — @cmcluck [0:06:20] “You're not a cool mainstream enterprise software provider if you don’t have a Kubernetes story today. I think we’ll start to see continued focus and consolidation around a set of the larger organizations that are operating in this space.” — @cmcluck [0:06:39] “We are so much better served as a software company if we can preserve consistency from environment to environment.” — @cmcluck [0:09:12] “I’m a fan of rendered down, container-optimized operating system distributions. 
There’s a lot of utility there, but I think we also need to be practical and recognize that enterprises have gotten comfortable with the OS landscape that they have.” — @cmcluck [0:14:54] Links Mentioned in Today’s Episode: Craig McLuckie on LinkedIn Craig McLuckie on Twitter The Podlets on Twitter Kubernetes VMware Brendan Burns Cloud Native Computing Foundation Heptio Mesos Valero vSphere Red Hat IBM Microsoft Amazon KubeCon Transcript: EPISODE 13 [INTRODUCTION] [0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically-minded decision maker, this podcast is for you. [INTERVIEW] [00:00:41] CC: Hi, everybody. Welcome back to The Podlets podcast, and today we have a special guest, Craig McLuckie. Craig, I have the hardest time pronouncing your last name. You will correct me, but let me just quickly say, well, I’m Carlisia Campos and today we also have Duffy Colley and Josh Rosso on the show. Say that three times fast, Craig McLuckie. Please help us say your last name and give us a brief introduction. You are super well-known in the Kubernetes community and inside VMware, but I’m sure there are not enough people that should know about you that didn’t know about you. [00:01:20] CM: All right. I’ll do a very quick intro. Hi, I’m Craig McLuckie. I’m a Vice President of Research and Development here at VMware. Prior of VMware, I spent a fair amount of time at Google where my friend Joe and I were responsible for building and shipping Google Compute Engine, which was an interesting exercise in bringing traditional enterprise virtualized workloads into the very sophisticated Google data center. We then went ahead and as our next project with Brendan Burns, started Kubernetes, and that obviously worked out okay, and I was also responsible for the ideation and formation of the Cloud Native Computing Foundation. I then wanted to work with Joe again. So we started Heptio, a little startup in the Kubernetes ecosystem. Almost precisely a year ago, we were acquired by VMware. So I’m now part of the VMware company and I’m working on our broader strategy around cloud native apps under the brand [inaudible 00:02:10]. [00:02:11] CC: Let me start off with a question. I think it is going to be my go-to first question for every guest that we have in the show. Some people are really well-versed in the cloud native technologies and Kubernetes and some people are completely not. Some people are asking really good questions out there, and I try to too as I’m one of those people who are still learning. So my question for you is what do you think people are asking that they are not asking the right frame, that you wish they would be asking that question in a different way. [00:02:45] CM: It’s a very interesting question. I don’t think there’s any bad questions in the world, but one question I encountered a fair bit is, “Hey, I’ve heard about this Kubernetes thing and I want one.” I’m not sure it’s actually the right question, right? Kubernetes is a powerful technology. I definitely think we’re in this sort of peak hype phase of the project. 
There are a set of opportunities that Kubernetes really brings a much more robust ability to manage, it abstracts a way infrastructure — there are some very powerful things. But to be able to be really successful with Kubernetes project, there’re a number of additional ingredients that really need to be thought through. The questions that ought to be asked are, "I understand the utility of Kubernetes and I believe that it would bring value to my organization, but do I have the skills and capabilities necessary to stand up and run a successful Kubernetes program?" That’s something to really think about. It’s not just about the nature of the technology, but it really brings in a lot of new concepts that challenge organizations. If we think about applications that exist in Kubernetes, there’s challenges with observability. When you think the mechanics of delivering into a containerized sort of environment, there are a lot of dos and don’ts that make a ton of sense there. A lot of organizations I’ve worked with are excited about the technology, but they don’t necessarily have the depth of understanding of where it's best used and then how to operate it. The second addendum to that is, “Okay, I’m able to deploy Kubernetes, but what happens the next day? What happens if I need to update it? When I need to maintain it? What happens when I discover that I need not one Kubernetes cluster or even 10 Kubernetes clusters, but a hundred or a thousand or 10,000.” Which is what we are starting to see out there in the industry. “Have I taken the right first step on that journey to set me up for success in the long-term?” I do think there’s just a tremendous amount of opportunity and excitement around the technology, but also think it’s something that organizations really need to look at as not just about deploying a platform technology, but introducing the necessary skills that are necessary to operate and maintain it and the supporting technologies that are necessary to get the workloads on to it in a sustainable way. [00:04:42] JR: You’ve raised a number of assumptions around how people think about it I think, which are interesting. Even just starting with the idea of the packaging problem that represents containerization is a reasonable start. So infrequently, do we describe like the context of the problems that — all of the problems that Kubernetes solve that frequently I think people just get way ahead of themselves. It’s a pretty good description. [00:05:04] DC: So maybe in a similar vein, Craig, we had mentioned all the pieces that go into running Kubernetes successfully. You have to bolt some things on maybe for security or do some things to ensure observability as adequate, and it seems like the ecosystem has taken notice of all those needs and has built a million projects and products around that space. I’m curious of your thoughts on that because it’s like in one way it’s great because it shows it’s really healthy and thriving. In another way, it causes a lot of fragmentation and confusion for people who are thinking whether they can or cannot run Ku, because there are so many options out there to accomplish those kinds of things. So I was just curious of your general thoughts on that and where it’s headed. [00:05:43] CM: It’s fascinating to see the sort of burgeoning ecosystem around Kubernetes, and I think it’s heartening, because if you think at the very highest level, the world is going to go one of two ways with the introduction of the hyper-scale public cloud. 
It’s either going to lead us into a world which feels like mainframe era again, where no one ever got [inaudible 00:06:01] Amazon in this case, or by Microsoft, whatever the case. Whoever sort of merges over time as the dominant force. But it also represents some challenges where you have these vertically integrated closed systems, innovation becomes prohibitively difficult. It’s hard to innovate in a closed system, because you’re innovating only for organizations that have already taken that dependancy. I think Kubernetes has opened it up, not just in terms of the world of applications that can run Kubernetes, but also this burgeoning ecosystem of supporting technologies that can create value. There’s a reason why startups are building around Kubernetes. There’s a reason they’re looking to solve these problems. I do think we’ll see a continued period of consolidation. You're not a cool mainstream enterprise software provider if you don’t have a Kubernetes story today. I think we’ll start to see continued focus and consolidation around a set of the larger organizations that are operating in this space. It’s not accidental that Heptio is a part of VMware at this point. When I looked at the ecosystem, it was pretty clear we need to take a boat to fully materialize the value of Kubernetes and I am pleased to be part of this organization. So I do think you’ll start to see a variety of different vendors emerging with a pretty clear, well-defined opinions and relatively turnkey solutions that address the gamut of capabilities. One organization needs to get into Kubernetes. One of the things that delights me about Kubernetes is that if you are a sophisticated organization that is self-identifying as a software company, and this is sort of manifest in the internet space if you’re running a sort of hyper-scale internet service, you are kind of by definition a software company. You probably have the skills on hand to make great choices around what projects, follow the communities, identify when things are reaching point of critical mass. You’re running in a space where your system is relatively homogenous. You don’t have just the sort of massive gamut of workloads, a lot of dimension enterprise organizations have. There’s going to be different approaches to the ecosystem depending on which organization is looking at the problem space. I do think this is prohibitively challenging for a lot of organizations that are not resourced at the level of a hyper-scale internet company from a technology perspective, where their day job isn’t running a production service for millions or billions of users. I do think situations like that, it makes a tremendous amount of sense to identify and work with someone you trust in the ecosystem, that can help you just navigate the wild map that is the Kubernetes landscape, that can participate in a number of these emerging communities that has the ability to put their thumb on the scale where necessary to make sure that things converge. I think it’s situational. I think the lovely thing about Kubernetes is that it does give organizations a chance to cut their teeth without having to dig into like a deep procurement cyclewith a major vendor. We see a lot of self-service Kubernetes projects getting initiated. But at some point, almost inevitably, people need a little bit more help, and that’s the role of a lot of these vendors. 
I think that I truly hope that I’m personally committed to, is that as we start to see the convergence of this ecosystem, as we start to see the pieces falling into place, that we retain an emphasis on the value of community that we also sort of avoid the balkanization and fragmentation, which sometimes comes out of these types of systems. We are so much better served as a software company if we can preserve consistency from environment to environment. The reality is as we start looking at large organizations, enterprises that are consuming Kubernetes, it’s almost inevitable that they’re going to be consuming Kubernetes from a number of different sources. Whether the sources are cloud provider delivering Kubernetes services or whether they handle Kubernetes clusters that are dedicated centralized IT team is delivering or whether it’s vendor provided Kubernetes. There’s going to be a lot of different flavors and variants on it. I think working within the community not as king makers, but as concerned citizens that are looking to make sure that there are very high-levels of consistency from offering to offering, means that our customers are going to be better served. We’re right now in a time where this technology is burgeoning. It’s highly scrutinized, but it’s not necessarily very widely deployed. So I think it’s important to just keep an eye on that sort of community centricity. Stay as true to our stream as possible. Avoid balkanization, and I think everyone will benefit from that. [00:10:16] DC: Makes sense. One of the things I took away from my year, I was just looking kind of back at my year and learning, consolidating my thoughts on what had happened. One of the big takeaways for me in my customer engagements this year was that a number of customers outright came out explicitly and said, “Our success as a company is not going to be measured by our ability to operate Kubernetes, which is true and obvious.” But at the same time, I think that that’s a really interesting moment of awareness for a lot of the people that I work with out there in the field, where they realized, you know what, Kubernetes may be the next best thing. It may be an incredible technology, but fundamentally, it’s not going to be the measure by which we are graded success. It’s going to be what we do on top of that that is more interesting. So I think that your point about that ecosystem is large enough that people will be consuming Kubernetes for multiple searches is sort of amplified by that, because people are going to look for that easy button as inroad. They’re going to look for some way to get the Kubernetes thing so that they can actually start exploring what will happen on top of it as their primary goal rather than how to get Kubernetes from an operational perspective or even understand the care and feeding of it because they don’t see that as the primary measure of success. [00:11:33] CM: That is entirely true. When I think about enterprise software, there’s sort of these three pieces of it. The first piece is the sort of crystaline core of enterprise software. That’s consistent from enterprise to enterprise to enterprise. It’s purchased from primary vendors or it’s built by open source communities. It represents a significant basis for everything. There’s the sort of peripheral, the sort of sea of applications that exist around that enterprises built that are entirely unique to their environment, and they’re relatively fluid. 
Then there’s this weird sort of interstitial layer, which is the integration glue that exists between their crystalline core and those applications and operating practices that enterprises create. So I think from my side, we benefit if that crystalline core is as large as possible so that enterprises don’t have to rely on bespoke integration practices as much possible. We also need to make allowances for the idea that that interstitial layer between the sort of core of a technology like Kubernetes and the applications may be modular or sort of extended by a variety of different vendors. If you’re operating in this space, like the telco space, your problems are going to be unique to telco, but they’re going to be shared by every other telco provider. One of the beautiful things about Kubernetes is it is sufficiently modular, it is a pretty well-thought resistant. So I think we will start to see a lot of specialization in terms of those integration pieces. A lot of specialization in terms of how Kubernetes is fit to a specific area, and I think that represents an awful opportunity for the community to continue to evolve. But I also think it means that we as contributors to the project need to make allowances for that. We can’t hold opinion to the point where it precludes massive significant value for organizations as they look at modularized and extending the platform. [00:13:19] CC: What is your opinion on people making specialized Kubernetes operating systems? For example, we’re talking about telcos. I think there’s a Kubernetes OSS specifically for telcos that strip away things that kind of industry doesn’t need. What are the tradeoffs that you see? [00:13:39] CM: It’s almost inevitable that you’re going to start to see specialized operating system distributions that are tailored to container-based workloads. I think as we start looking at like the telco space with network function virtualization, Kubernetes promises to be something that we never really saw before. At the end of the day, telco is very broadly deployed open stack as this primary substrate for network function virtualization. But at the end of the day, they ended up not just deploying one rendition of open stack. But in many cases, three, four, five, depending on what functions they wanted to run, and there wasn’t a sufficient commonality in terms of the implementations. It became very sort of vendor-centric and balkanized in many ways. I think there’s an opportunity here to work hard as a community to drive convergence around a lot of those Kubernetes constructs so that, sure, the operating system is going to be different. If you’re running an NFV data plane implementation, doing a lot of bit slinging, it’s going to look fundamentally different to anything else in the industry, right? But that shouldn’t necessarily mean that you can’t use the same tools to organize, manage and reason about the workloads. A lot of the innovations that happen above that shouldn’t necessarily be tied to that. I think there’s promise there and it’s going to be an amazing test for Kubernetes itself to see how well it scales into those environments. By and large, I’m a fan of rendered down, container-optimized operating system distributions. There’s a lot of utility there, but I think we also need to be practical and recognize that enterprises have gotten comfortable with the OS landscape that they have. 
So we have to make allowances that as part of containerizing and distributing your application, maybe you don’t necessarily need to and hopefully re-qualify the underlying OS and challenge a lot of the assumptions. So I think we just need to pragmatic about it. [00:15:19] DC: I know that’s a dear topic to Josh and I. We’ve fought that battle in the past as well. I do think it’s another one of those things where it’s a set of assumptions. It’s fascinating to me how many different ecosystems are sort of collapsing, maybe not ecosystems. How many different audiences are brought together by a technology like container orchestration. That you are having that conversation with, “You know what? Let’s just change the paradigm for operating systems.” That you are having that conversation with, “Let’s change the paradigm for observability and lifecycle stuff. Let’s change the paradigm for packaging. We’ll call it containers.” You know what I mean? It’s so many big changes in one idea. It’s crazy. [00:15:54] CM: It’s a little daunting if you think about it, right? I always say, change is easiest across one dimension, right? If I’m going to change everything all at once across all the dimensions, life gets really hard. I think, again, it’s one of these things where Kubernetes represents a lot of value. I walk into a lot of customer accounts and I spend a lot of time with customers. I think based on their experiences, they sort of make one of two assumptions. There’s a set of vendors that will come into an environment and say, “Hey, just run this tool against your virtual machine images – and Kubernetes, right?” Then they have another set of vendors that will come in and say, “Yeah. Hey, you just need to go like turn this thing into 12 factor cloud native service mesh-linked applications driven through CICD, and your life is magic.” There are some cases where it makes sense, but there’re some cases where it just doesn’t. Hey, what uses a 24 gigabyte container? Is that really solving the problems that you have in some systematic way? At the other end of the spectrum, like there’s no world in which an enterprise organization is rewriting 3,000, 5,000 applications to be cloud native from the ground up. It just is not going to happen, right? So just understanding the return investment associated with the migration into Kubernetes. I’m not saying where it make sense and where it doesn’t. It’s such an important part of this story. [00:17:03] JR: On that front, and this is something Duffy and I talk to our customers about all the time. Say you’re sitting with someone and you’re talking about potentially using Kubernetes or they’re thinking about it, are there like some key indicators that you see, Craig, as like, “Okay. Maybe Kubernetes does have that return on investment pretty soon to justify it." Or maybe even in the reverse, like some things where you think, “Okay, these people are just going to implement Kubernetes and it’s going to become shelf weary.” How do you qualify as an org, “I might be ready to bring on something like Kubernetes.” [00:17:32] CM: It’s interesting. For me, it’s almost inevitably – as much about the human skills as anything else. I mean, the technology itself isn’t rocket science. I think the sort of critical success criteria, when I start looking at engagement, is there a cultural understanding of what Kubernetes represents? Kubernetes is not easy to use. That initial [inaudible 00:17:52] to the face is kind of painful for people that are used to different experiences. 
Making sure that the basic skills and expectations are met is really important. I think there’s definitely some sort of acid test around workloads fit as you start looking at Kubernetes. It’s an evolving ecosystem and it’s maturing pretty rapidly, but there are still areas that need a little bit more heavy lifting, right? So if you think about like, “Hey, I want to run a vertically-scaled OLTP database in Kubernetes today.” I don’t know. Maybe not the best choice. If the customer knows that, if they have enough familiarity or they’re willing to engage, I think it makes a tremendous amount of sense. By and large, the biggest challenge I see is not so much in the Kubernetes space. It’s easy enough to get to a basic cluster. There’re sort of two dimensions to this, there is day two operations. I see a lot of organizations that have worked to create scale up programs of platform technologies. Before Kubernetes there was Mesos and there’s obviously PCF that we’ll be coming more increasingly involved in. Organizations that have chewed on creating and deploying a standardized platform often have the operational skills, but you also need to look at like why did that previous technology really meet sort of criteria, and do you have the skills to operate it on a day two basis? Often there’s not – They’ve worked out the day two operational issues, but they still haven’t figured out like what it means to create a modern software supply chain that can deliver into the Kubernetes space. They haven’t figured out necessarily how to create the right incentive structures and experiences for the developers that are looking to build, package and deliver into that environment. That’s probably the biggest point of frustration I see with enterprises, is, “Okay. I got to Kubernetes. Now what?” That question just hasn’t been answered. They haven’t really thought through, “These are the CICD processes. This is how you engage your cyber team to qualify the platform for these classes of workloads. This is how you set up a container repo and run scans against it. This is how you assign TTL on images, so you don’t just get massive repo.” There’s so much in the application domain that just needs to exist that I think people often trivialize and it’s really taking the time and picking a couple of projects being measured in the investments. Making sure you have the right kind of cultural profile of teams that are engaged. Create that sort of celebratory moment of success. Make sure that the team is sort of metricking and communicating the productivity improvements, etc. That really drives the option and engagement with the whole customer base. [00:20:11] CC: It sounds to me like you have a book in the making. [00:20:13] CM: Oh! I will never write a book. It just seems like a lot of work. Brendan and a buch of my friends write books. Yeah, that seems like a whole lot of work. [00:20:22] DC: You had mentioned that you decided you wanted to work with Joe again. You formed Heptio. I was actually there for a year. I think I was around for a bit longer than that obviously. I’m curious what your thoughts about that were as an experiment win. If you just think about it as that part of the journey, do you think that was a success and what did you learn from that whole experiment that you wished everybody knew, just from a business perspective? It might have been business or it might have been running a company, any of that stuff. [00:20:45] CM: So I’m very happy with the way that Heptio went. 
There were a few things that sort of stood out for me as things that folks should think about if they’re going to start a startup or they want to join a startup. The first and foremost I would say is design the culture to the problem at hand. Culture isn’t accidental. I think that Heptio had a pretty distinct and nice culture, and I don’t want to sound self-congratulatory. I mean, as with anything, a certain amount of this is work, but a lot of it is luck as well. Making sure that the cultural identity of the company is well-suited to the problem at-hand. This is critical, right? When I think about what Heptio embodied, it was really tailored to the specific journey that we were setting ourselves up for. We were looking to be passionate advocates for Kubernetes. We were looking to walk the journey with our customers in an authentic way. We were looking to create a company that was built around sustainability. I think the culture is good and I encourage folks either the thing you’re starting is a startup or looking to join one, to think hard about that culture and how it’s going to map to the problems they’re trying to solve. The other thing that I think really motivated me to do Heptio, and I think this is something that I’m really excited to continue on with VMware, was the opportunity to walk the journey with customers. So many startups have this massive reticence to really engage deeply in professional services. In many ways, Google is fun. I had a blast there. It’s a great company to work for. We were able to build out some really cool tech and do good things. But I grew kind of tired of writing letters from the future. I was, “Okay, we are flying cars." When you're interacting with the customer. I can’t start my car and get to work. It’s great that you have flying cars, but right now I just need to get in my car, drive down the block and get out and get to work. So walking the journey with customers is probably the most important learning from Heptio and it’s one of the things I’m kind of most proud of. That opportunity to share the pain. Get involved from day one. Look at that as your most valuable apparatus to not just build your business, but also to learn what you need to build. Having a really smart set of people that are comfortable working directly with customers or invested in the success of those customers is so powerful. So if you’re in the business or in the startup game, investors may be leery of building out a significant professional service as a function, because that’s just how Silicon Valley works. But it is absolutely imperative in terms of your ability to engage with customers, particularly around nascent technologies, filled with gaps where the product doesn’t exist. Learn from those experiences and bring that back into the core product. It’s just a huge part of what we did. If I was ever in a situation where I had to advice a startup in the sort of open source space, I’d say lean into the professional service. Lean into field engineering. It’s a critical way to build your business. Learn what customers need. Walk the journey with them and just develop a deep empathy. [00:23:31] CC: With new technology, that was a concern about having enough professionals in the market who are knowledgeable in that new technology. There is always a gap for people to catch up with that. So I’m curious to know what customers or companies, prospective customers, how they are thinking in terms of finding professionals to help them? 
Are they’re concerned that there’s enough professionals in the market? Are they finding that the current people who are admins and operators are having an easy time because their skills are transferable, if they’re going to embark on the Kubernetes journey? What are they telling you? [00:24:13] CM: I mean, there’s a huge skills shortage. This is one of the kind of primary threats to the short term adoption of Kubernetes. I think Kubernetes will ultimately permeate enterprise organizations. I think it will become a standard for distributed systems development. Effectively emerging as an operating system for distributed systems, is people build more natively around Kubernetes. But right now it’s like the early days of Linux, where you deploy Linux, you’d have to kind of build it from scratch type of thing. It is definitely a challenge. For enterprise organizations, it’s interesting, because there’s a war for talent. There’s just this incredible appetite for Kubernetes talent. There’s always that old joke around the job description for like 10 years of Kubernetes experience on a five-year project. That certainly is something we see a lot. I’d take it from two sides. One is recognizing that as an enterprise organization, you are not going to be able to hire this talent. Just accept that sad truth. You can hire a seed crystal for it, but you really need to look at that as something that you’re going to build out as an enablement function for your own consumption. As you start assessing individuals that you’re going to bring on in that role, don’t just assess for Kubernetes talent. Assess for the ability to teach. Look for people that can come in and not just do, but teach and enable others to do it, right? Because at the end of the day, if you need like 50 Kubernauts at a certain level, so does your competitor and all of your other competitors. So does every other function out there. There’s just massive shortage of skills. So emphasizing your own – taking on the responsibility of building your own expertise. Educating your own organization. Finding ways to identify people that are motivated by this type of technology and creating space for them and recognizing and rewarding their work as they build this out. Because it’s far more practical to hire into existing skillset and then create space so that the people that have the appetite and capability to really absorb these types of disruptive technologies can do so within the parameters of your organization. Create the structures to support them and then make it their job to help permeate that knowledge and information into the organization. It’s just not something you can just bring in. The skills just don’t exist in the broader world. Then for professionals that are interested in Kubernetes, this is definitely a field that I think we’ll see a lot of job security for a very long-time. Taking on that effort, it’s just well worth the journey. Then I’d say the other piece of this is for vendors like VMware, our job can’t be just delivering skills and delivering technology. We need to think about our role as an enablers in the ecosystem as folks that are helping not just build up our own expertise of Kubernetes that we can represent to customers, but we’re well-served by our customers developing their own expertise. It’s not a threat to us. It actually enables them to consume the technologies that we provide. 
So focusing on that enablement through us as integration partners and [inaudible] community, focusing on enablement for our customers and education programs and the things that they need to start building out their capacity internally, is going to serve us all well. [00:27:22] JR: Something going back to maybe the Heptio conversation, I’m super interested in this. Being a very open source-oriented company, at VMware this is of course this true as well. We have to engage with large groups of humans from all different kinds of companies and we have to do that while building and shipping product to some degree. So where I’m going with this is like – I remember back in the Heptio days, there was something with dynamic audit logging that we were struggling with, and we needed it for some project we were working on. But we needed to get consensus in a designed approve at like a bigger community level. I do know to some degree that did limit our ability to ship quickly. So you probably know where I’m going with this. When you’re working on projects or products, how do you balance, making sure the whole community is coming along with you, but also making sure that you can actually ship something? [00:28:08] DM: That harkens back to that sort of catch phrase that Tim Sinclair always uses. If you want to go fast, go alone. If you want to go far, go together. I think as with almost everything in the world, these things are situational, right? There are situations where it is so critical that you bring the community along with you that you don’t find yourself carrying the load for something by yourself that you just have to accept and absorb that it’s going to be pushing string. Working with an engaged community necessitates consensus, necessitates buy-in not just from you, but from potentially your competitors. The people that you’re working with and recognizing that they’ll be doing their own sort of mental calculus around whether this advantages them or not and whatnot. But hopefully, I think certainly in the Kubernetes community, this is general recognition that making the underlying technology accessible. Making it ubiquitous, making it intrinsically supportable profits everyone. I think there’re a couple of things that I look at. Make the decision pretty early on as to whether this is something you want to kind of spark off and sort of stride off on your own an innovate around, whether it’s something that’s critical to bring the community along with you around. I’ll give you two examples of this, right? One example was the work we did around technologies like Valero, which is a backup restore product. It was an urgent and critical need to provide a sustainable way to back up and recover Kubernetes. So we didn’t have the time to do this through Kubernetes. But also it didn’t necessarily matter, because everything we’re doing was build this addendum to Kubernetes. That project created a lot of value and we’ve donated to open source project. Anyone can use it. But we took on the commitment to drive the development ourselves. It’s not just we need it to. Because we had to push very quickly in that space. Whereas if you look at the work that we’re doing around things like cluster API and the sort of broader provisioning of Kubernetes, it’s so important that the ecosystem avoids the tragedy of the commons around things like lifecycle management. It’s so important that we as a community converge on a consistent way to reason about the deployment upgrade and scaling of Kubernetes clusters. 
For any single vendor to try to do that by themselves, they’re going to take on the responsibility of dealing with not just one or two environments if you’re a hyperscale cloud provider [inaudible 00:30:27] many can do that. But we think about doing that for, in our case, “Hey, we only deploy into vSphere. Not just what’s coming next, but also earlier versions of vSphere. We need to be able to deploy into all of the hyper-scalers. We need to deploy into some of the emerging cloud providers. We need to start reasoning about edge. We need to start thinking about all of these. We’re a big company and we have a lot of engineers. But you’re going to get stretched very thin, very quickly if you try to chew that off by yourself. So I think a lot of it is situational. I think there are situations where it does pay for organizations to kind of innovate, charge off in a new direction. Run an experiment. See if it sticks. Over time, open that up to the community as it makes sense. The thing that I think is most important is that you just wear your heart on your sleeve. The worst thing you can do is to present a charter that, “Hey, we’re doing this as a community-centric, open project with open design, open community, open source,” and then change your mind later, because that just creates dramas. I think it’s situational. Pick the path that makes sense to the problem at-hand. Figure out how long your customer can wait for something. Sometimes you can bring things back to communities that are very open and accepting community. You can look at it as an experiment, and if it makes sense in that experiment perform factor, present it back to the Kubernetes communities and see if you can kind of get it back in. But in some case it just makes sense to work within the structure and constraints of the community and just accept that great things from a community angle take a lot of time. [00:31:51] CC: I think too, one additional thing that I don’t think was mentioned is that if a project grows too big, you can always break it off. I mean, Kubernetes is such a great example of that. Break it off into separate components. Break it off into separate governance groups, and then parts can move with different speeds. [00:32:09] CM: Yeah, and there’s all kinds of options. So the heart of it is no one rule, right? It’s entirely situational. What are you trying to accomplish on what arise and acknowledge and accept that the evolution of the core of Kubernetes is slowing as it should. That’s a signal that the project is maturing. You cannot deliver value at a longer timeline that your business or your customers can absorb then maybe it makes sense to do something on the outside. Just wear your heart on your sleeve and make sure your customers and your partners know what you’re doing. [00:32:36] DC: One of your earlier points about how do companies – I think Josh's question and was around how do companies attract talent. You’re basically pointing, and I think that there are some relation to this particular topic because, frequently, I’ve seen companies find some success by making room for open source or upstream engineers to focus on the Kubernetes piece and to help drive that adoption internally. 
So if you’re going to adopt something like a Kubernetes strategy as part of a larger company goal, if you can actually make room within your organization to bring people who are – or to support people who want to focus on that up stream, I think that you get a lot of ancillary benefits from that, including it makes it easier to adopt that technology and understand it and actually have some more skin in the game around where the open source project itself is going. [00:33:25] CM: Yeah, absolutely. I think one of the lovely things about the Kubernetes community is this idea of your position is earned, not granted, right? The way that you earn influence and leadership and basically the good will of everyone else in that community is by chopping wood, carrying water. Doing the things that are good for the community. Over time, any organization, any human being can become influential and lead based on their merits of their contributions. It’s important that vendors think about that. But at the same time, I have a hard time taking exception with practically any use of open source. At the end of the day, open source by its nature is a leap of faith. You’re making that technology accessible. If someone else can take it, operationalize it well and deliver value for organizations, that’s part of your contract. That’s what you absorb as a vendor when you start the thing. So people shouldn’t feel like they have to. But if you want to influence and lead, you do need to. Participate in these communities in an open way. [00:34:22] DC: When you were helping form the CNCF and some of those projects, did you foresee it being like a driving goal for people, not just vendors, but also like consumers of the technologies associated with those foundations? [00:34:34] CM: Yeah, it was interesting. Starting the CNCF, I can speak from the position of where I was inside Google. I was highly motivated by the success of Kubernetes. Not just personally motivated, because it was a project that I was working on. I was motivated to see it emerge as a standard for distributed systems development that attracts the way the infrastructure provider. I’m not ashamed of it. It was entirely self-serving. If you looked at Google’s market position at that time, if you looked at where we were as a hyper-scale cloud provider. Instituting something that enabled the intrinsic mobility of workloads and could shuffle around the cards on the deck so to speak [inaudible 00:35:09]. I also felt very privileged that that was our position, because we didn’t necessarily have to create artificial structures or constraints around the controls of the system, because that process of getting something to become ubiquitous, there’s a natural path if you approach it as a single provider. I’m not saying who couldn’t have succeeded with Kubernetes as a single provider. But if Red Hat and IBM and Microsoft and Amazon had all piled on to something else, it’s less obvious, right? It’s less obvious that Kubernetes would have gone as far as it did. So I was setting up CNCF, I was highly motivated by preserving the neutrality. Creating structures that separated the various sort of forms of governance. I always joke that at the time of creating CNCF, I was motivated by the way the U.S. Constitution is structured. Where you have these sort of different checks and balances. So I wanted to have something that would separate vendor interests from things that are maintaining taste on the discreet project. 
The sort of architecture integrity, and maintain separation from customer segments, so that you’d create the sort of natural self-balancing system. It was definitely in my thinking, and I think it worked out pretty well. Certainly not perfect, but it did lead down a path which I think has supported the success of the project a fair bit. [00:36:26] DC: So we talked a lot about Kubernetes. I’m curious, do you have some thoughts, Carlisia? [00:36:31] CC: Actually, I know you have a question about microliths. I was very interested in exploring that. [00:36:37] CM: There’s an interesting pattern that I see out there in the industry and this manifests in a lot of different ways, right? When you think about the process of bringing applications and workloads into Kubernetes, there’s this sort of pre-dispositional bias towards, “Hey, I’ve got this monolithic application. It’s vertically scaled. I’m having a hard time with the sort of team structure. So I’m going to start tuning it up into a set of microservices that I can then manage discretely and ideally evolve on a separate cadence. This is an example of a real customer situation where someone said, “Hey, I’ve just broken this monolith down into 27 microservices.” So I was sort of asking a couple of questions. The first one was when you have to update those 27 – if you want to update one of those, how many do you have to touch? The answer was 27. I was like, “Ha! You just created a microlith.” It’s like a monolith, except it’s just harder to live with. You’re taking a packaging problem and turn it into a massively complicated orchestration problem. I always use that jokingly, but there’s something real there, which is there’s a lot of secondary things you need to think through as you start progressing on this cloud native journey. In the case of microservice development, it’s one thing to have API separated microservices. That’s easy enough to institute. But instituting the organization controls around an API versioning strategy such you can start to establish stable API with consistent schema and being able to sort of manage the dependencies to consuming teams requires a level of sophistication that a lot of organizations haven’t necessarily thought through. So it’s very easy to just sort of get caught up in the hype without necessarily thinking through what happens downstream. It’s funny. I see the same thing in functions, right? I interact with organizations and they’re like, “Wow! We took this thing that was running in a container and we turned it into 15 different functions.” I’m like, “Ha! Okay.” You start asking questions like, “Well, do you have any challenges with state coherency?” They’re like, “Yeah! It’s funny you say that. Because these things are a little bit less transactionally coherent, we have to write state watches. So we try and sort of watermark state and watch this thing." I’m like, “You’re building a distributed transaction coordinator on your free time. Is this really the best use of your resources?" Right? So it really gets back to that idea that there’s a different tool for a different job. Sometimes the tool is a virtual machine. Sometimes it’s not. Sometimes the tool is a bare metal deployment. If you’re building a quantitative trading application that’s microsecond latency sensitive, you probably don’t want to hypervisor there. Sometimes a VM is the natural destination and there’s no reason to move from a VM. Sometimes it’s a container. 
Sometimes you want to start looking at that container and just modularizing it so you can run a set of things next to each other in the same process space. Sometimes you’re going to want to put APIs between those things and separate them out into separate containers. There’s an ROI. There’s a cause and there’s a benefit associated with each of those transitions. More importantly, there are a set of skills that you have to have as you start looking at their continuum and making sure that you’re making good choices and being wise about it. [00:39:36] CC: That is a very good observation. Design is such an important part of software development. I wonder if Kubernetes helps mask these design problems. For example, the ones you are mentioning, or does Kubernetes sort of surfaces them even more? [00:39:53] CM: It’s an interesting philosophical question. Kubernetes certainly masks some problems. I ran into an early – this is like years ago. I ran into an early customer, who confided in me, "I think we’re writing worse code now." I was like, ”What do you mean?” He was like, “Well, it used to be when we went out of memory on something, we get paged. Now we’ve set out that we go and it just restarts the container and everything continuous.” There’s no real incentive for the engineers to actually go back and deal with the underlying issues and recourse it, because the system is just more intrinsically robust and self-healing by nature. I think there's definitely some problems that Kubernetes will compound. If you’re very sloppy with your dependencies, if you create a really large, vertically scaled monolith that’s running at VM today, putting it in a container is probably strictly going to make your life worse. Just be respectful of that. But at the same time, I do think that the discipline associated with transition to Kubernetes, if you walk it a little bit further along. If you start thinking about the fact that you’re not running a lot of imperative processes during a production in a push, where deployment container is effectively a bin copy with some minimal post-deployment configuration changes that happen. It sort of leads you on to a much happier path naturally. I think it can mask some issues, but by and large, the types of systems you end up building are going to be more intrinsically operationally stable and scalable. But it is also worth recognizing that it’s — you are going to encounter corner cases. I’ve run into a lot of customers that will push the envelope in a direction that was unanticipated by the community or they accidentally find themselves on new ground that’s just unstable, because the technology is relatively nascent. So just recognizing that if you’re going to walk down a new path, I’m not saying don’t, just recognize that you’re probably going to encounter some stuff that’s going to take over to working through. [00:41:41] DC: We get an earlier episode about API contracts, which I think highlights some of these stuff as well, because it sort of gets into some of those sharp edges of like why some of those things are super important when you start thinking about microservices and stuff. We’re coming to the end of our time, but one of the last questions I want to ask you, we’ve talked a lot about Kubernetes in this episode, I’m curious what the future holds. We see a lot of really interesting things happening in the ecosystem around moving more towards serverless. 
There are a lot of people who are like — thinking that perhaps a better line would be to move away from like infrastructure offering and just basically allow cloud providers in this stuff to manage your nodes for you. We have a few shots on goal for that ourselves. It’s been really an interesting evolution over the last year in that space. I’m curious, what sort of lifetime would you ascribe to it today? What do you think that this is going to be the thing in 10 years? Do you think it will be a thing in 5 years? What do you see coming that might change it? [00:42:32] CM: It’s interesting. Well, first of all, I think 2018 was the largest year ever for mainframe sales. So we have these technologies, once they’re in enterprise, it tends to be pretty durable. The duty cycle of enterprise software technology is pretty long-lived. The real question is we’ve seen a lot of technologies in this space emerge, ascend, reach a point of critical mass and then fade and they’re disrupted by the technologies. Is Kubernetes going to be a Linux or is Kubernetes going to be a Mesos, right? I mean, I don’t claim to know the answer. My belief, and I think this is probably true, is that it’s more like a Linux. When you think about the heart of what Kubernetes is doing, is it’s just providing a better way to build and organized distributed systems. I’m sure that the code will evolve rapidly and I’m sure there will be a lot of continued innovation enhancement. But when you start thinking about the fact that what Kubernetes has really done is brought controller reconciler based management to distributed systems developed everywhere. When you think about the fact that pretty much every system these days is distributed by nature, it really needs something that supports that model. So I think we will see Kubernetes sticking. We’ll see it become richer. We’ll start to see it becoming more applicable for a lot of things that we’re starting to just running in VMs. It may well continue to run in VMs and just be managed by Kubernetes. I don’t have an opinion about how to reason about the underlying OS and virtualization structure. The thing I do have opinion about is it makes a ton of sense to be able to use a declarative framework. Use a set of well-structured controllers and reconcilers to drive your world into a non-desired state. I think that pattern will be – it’s been quite successful. It can be quite durable. I think we’ll start to see organizations embrace a lot of these technologies over time. It is possible that something brighter, shinier, newer, comes along. Anyone will tell you that we made enough mistakes during the journey and there is stuff that I think everyone regret some of the Kubernetes train. I do think it’s likely to be pretty durable. I don’t think it’s a silver bullet. Nothing is, right? It’s like any of these technologies, there’s always the cost and there’s a benefit associated with it. The benefits are relatively well-understood. But there’s going to be different tools to do different jobs. There’s going to be new patterns that emerge that simplify things. Is Kubernetes the best framework for running functions? I don’t know. Maybe. Kind of like what the [inaudible] people are doing. But are there more intrinsically optimal ways to do this, maybe. I don’t know. [00:45:02] JR: It has been interesting watching Kubernetes itself evolve in that moving target. Some of the other technologies I’ve seen kind of stagnate on their one solution and don’t grow further. 
[00:45:02] JR: It has been interesting watching Kubernetes itself evolve as that moving target. Some of the other technologies I've seen kind of stagnate on their one solution and don't grow further. But that's definitely not what I see within this community. It's always coming up with something new. Anyway, thank you very much for your time. That was an incredible session.

[00:45:22] CM: Yeah. Thank you. It's always fun to chat.

[00:45:24] CC: Yeah. We'll definitely have you back, Craig. Yes, we are coming up on the end, but I do want to ask if you have any thoughts that you haven't brought up, or we haven't brought up, that you'd like to share with the audience of this podcast.

[00:45:39] CM: I guess the one thing that was going through my head earlier that I didn't say is that as you look at these technologies, there are sort of these two duty cycles. There's the hype duty cycle, where a technology ascends in awareness and everyone looks at it as the answer to everything. Then there's the readiness duty cycle, which is sometimes offset. I do think we're certainly at peak hype right now in Kubernetes if you attended KubeCon, and I do think there's perhaps a gap between the promise and the reality for a lot of organizations. I'd always just counsel caution and be judicious about how you approach this. It's a very powerful technology and I see a very bright future for it. Thanks for your time.

[00:46:17] CC: Really, thank you so much. It's so refreshing to hear from you. You have great thoughts. With that, thank you very much. We will see you next week.

[00:46:28] JR: Thanks, everybody. See you.

[00:46:29] DC: Cheers, folks.

[END OF INTERVIEW]

[00:46:31] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing.

[END] See omnystudio.com/listener for privacy information.

Drill to Detail
Drill to Detail Ep.66 'ETL, Incorta and the Death of the Star Schema' with Special Guest Matthew Halliday

Drill to Detail

Play Episode Listen Later May 27, 2019 50:24


Mark Rittman is joined by Matthew Halliday to talk about the challenge of ETL and analytics on complex relational OLTP data models, previous attempts to solve these problems with products such as Oracle Essbase and Oracle E-Business Suite Extensions for Oracle Endeca, and how those experiences, and others, led to his current role as co-founder and VP of Products at Incorta.

The Death of the Star Schema: 3 Key Innovations Driving the Rapid Demise
Accelerating Analytics with Direct Data Mapping
Accelerating Operational Reporting & Analytics for Oracle E-Business Suite (EBS)
The Good, the Bad, and the Ugly of Extract Transform Load (ETL)
E-Business Suite Extensions for Endeca: Technical Considerations
The Pain of Operational Reporting Solutions for Oracle E-Business Suite (EBS)

The Future of Data Podcast | conversation with leaders, influencers, and change makers in the World of Data & Analytics

In this podcast @AndyPalmer from @Tamr sat down with @Vishaltx from @AnalyticsWeek to talk about the emergence of, need for, and market for Data Ops, a specialized capability that merges data engineering and the DevOps ecosystem in response to increasingly convoluted data silos and complicated processes. Andy shared what some businesses and their leaders are doing wrong and how businesses need to rethink their data silos to future-proof themselves. This is a good podcast for any data leader thinking about cracking the code on getting high-quality insights from data.

Timelines:
0:28 Andy's journey.
4:56 What's Tamr?
6:38 What's Andy's role in Tamr?
8:16 What's data ops?
13:07 Right time for business to incorporate data ops.
15:56 Data exhaust vs. data ops.
21:05 Tips for executives in dealing with data.
23:15 Suggestions for businesses working with data.
25:48 Creating buy-in for experimenting with new technologies.
28:47 Using data ops for the acquisition of new companies.
31:58 Data ops vs. dev ops.
36:40 Big opportunities in data science.
39:35 AI and data ops.
44:28 Parameters for a successful start-up.
47:49 What still surprises Andy?
50:19 Andy's success mantra.
52:48 Andy's favorite reads.
54:25 Final remarks.

Andy's Recommended Reads:
Enlightenment Now: The Case for Reason, Science, Humanism, and Progress by Steven Pinker https://amzn.to/2Lc6WqK
The Three-Body Problem by Cixin Liu and Ken Liu https://amzn.to/2rQyPvp

Andy's BIO:
Andy Palmer is a serial entrepreneur who specializes in accelerating the growth of mission-driven startups. Andy has helped found and/or fund more than 50 innovative companies in technology, health care, and the life sciences. Andy's unique blend of strategic perspective and disciplined tactical execution is suited to environments where uncertainty is the rule rather than the exception. Andy has a specific passion for projects at the intersection of computer science and the life sciences. Most recently, Andy co-founded Tamr, a next-generation data curation company, and Koa Labs, a start-up club in the heart of Harvard Square, Cambridge, MA.

Specialties: Software, Sales & Marketing, Web Services, Service Oriented Architecture, Drug Discovery, Database, Data Warehouse, Analytics, Startup, Entrepreneurship, Informatics, Enterprise Software, OLTP, Science, Internet, eCommerce, Venture Capital, Bootstrapping, Founding Team, Venture Capital firm, Software companies, early-stage venture, corporate development, venture-backed, venture capital fund, world-class, stage venture capital

About #Podcast:
The #FutureOfData podcast is a conversation starter that brings together leaders, influencers, and lead practitioners to discuss their journeys toward creating a data-driven future.

Podcast link: https://futureofdata.org/emergence-of-dataops-age-andypalmer-futureofdata-podcast/

Wanna Join? If you or anyone you know wants to join in, register your interest and email us at info@analyticsweek.com

Want to sponsor? Email us @ info@analyticsweek.com

Keywords: #FutureOfData #DataAnalytics #Leadership #Podcast #BigData #Strategy

Life After Business
3 IPOs – How to Look at Your Company Through Wall Street's Eyes

Life After Business

Play Episode Listen Later Jul 5, 2018 60:59


Roger Sippl is the founder and former CEO of Informix Software. Roger began his career in computer science during the early days of the computer age. He was diagnosed with Hodgkin's Lymphoma during college, and Roger explains how this life-and-death situation changed how he approaches business. After the health scare, Roger switched his major to computer science and found a need for cleaner and more efficient database management software. We discuss what the software business was like in the early days and how it has changed over time. Roger has a lot of experience with running public companies and building companies to sell. He shares what he liked about his time in the software business and why he decided to focus more on investing and business mentoring. Roger also has some useful advice for new entrepreneurs who want to build a lasting company.

You will learn about:
Roger's business background.
The cancer diagnosis and how it changed his life.
Why Roger switched to the software business.
His goals for the company in the early days.
Why Informix became a public company.
What it was like running a public company.
Why Roger left Informix.
How the software industry has changed over the years.
What Roger considers when looking to invest in a company.
The common red flags Roger sees when he evaluates a business.
Roger's parting advice for the audience.

Takeaways: Today's biggest takeaway is to be aware that every business has a relevance window. Your company's value will change with the market. Make sure you are prepared to sell your company when it is the most valuable.

Links and Resources:
Roger Sippl Creative Writing

About Roger Sippl:
Roger Sippl is a Silicon Valley software pioneer, entrepreneur, and innovator. His 30 years of contributions have helped shape the enterprise software technology landscape of today. In 1980 he founded Informix Software and was CEO for 10 years, taking it public in 1986. Under his leadership, Informix pioneered SQL relational databases, report generators, screen data entry packages, 4GL application development tools, and scalable OLTP database technology. It is now a part of IBM, after peaking at a $4B market cap as a public company. Sippl was also co-founder and Chairman of The Vantive Corporation. Vantive became a leader in CRM, became a public company, peaked at a $1B market cap, and is now a part of PeopleSoft/Oracle. In 1993, he founded and was CEO of Visigenic Software, helping pioneer distributed object computing and the concept of the application server (based on CORBA, prior to the J2EE standard) in enterprises. Visigenic was acquired by Borland after becoming a public company. After the Visigenic IPO, Mr. Sippl earned the "Golden Hat Trick Award" from Cristina Morgan at JP Morgan/Hambrecht and Quist for three Silicon Valley IPOs. In the mid-nineties, Sippl became a founding partner of Sippl Macdonald Ventures. He invested in several successful software companies, including Illustra (acquired by Informix), Broadvision (IPO), SupportSoft (IPO) and Red Pepper (acquired by PeopleSoft). In 2002, Sippl founded Above All Software, a composite application platform that used web services and service-oriented architecture (SOA).