Cloud security threats continue to evolve at an alarming pace, with state-sponsored actors developing increasingly sophisticated attack strategies. We dive into the emergence of China's Silk Typhoon group, which represents the concerning evolution of previous Salt Typhoon attacks. While initially targeting service provider infrastructure, these attackers are now leveraging stolen credentials to compromise enterprise cloud accounts through password spraying and API key theft. This progression demonstrates why encryption through provider networks is essential and why organizations must remain vigilant even when threats initially appear to target only their service providers.

Europe is making bold moves toward cloud standardization with the Sovereign European Cloud API (SECA) initiative. This collaborative effort between European cloud providers aims to create true interoperability across cloud platforms, potentially ending vendor lock-in for organizations operating in the EU. Drawing parallels to the USB-C standardization for mobile devices, this regulatory approach could force major cloud service providers to adapt their proprietary interfaces to maintain access to the European market. While technical challenges remain significant given the diverse service offerings across providers, the economic importance of Europe means this initiative deserves close attention as it could fundamentally change how organizations interact with cloud infrastructure globally.

The Kubernetes security landscape is evolving beyond traditional cluster protection with Aviatrix's launch of its Kubernetes Cloud Firewall. Rather than competing in the crowded space of intra-cluster security, this solution addresses the often-overlooked challenge of securing egress traffic and integrations between Kubernetes workloads and legacy systems. By reading the Kubernetes API to build security policies based on native attributes like pods and namespaces, the firewall helps organizations manage the reality that few environments are purely containerized.

Looking to enhance your cloud security posture across hybrid environments? Subscribe to our podcast for more insights and visit https://www.cables2clouds.com for comprehensive show notes and resources.

Check out our book! https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/
Check out the Fortnightly Cloud Networking News: https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
As the original architect and API design lead of Kubernetes, Brian joins the show to chat about why "APIs are forever", the keys to evangelizing impactful projects, being an Uber Tech at Google, and more.

Segments:
(00:03:01) Internship with Mark Ewing
(00:07:10) “Mark and Brian's Excellent Environment” manual
(00:11:58) Poker on VT100 terminals
(00:14:46) Grad school and research
(00:17:23) The value of studying computer science
(00:21:07) Intuition and learning
(00:24:06) Reflecting on career patterns
(00:26:37) Hypergrowth and learning at Transmeta
(00:28:37) Debugging at the atomic level
(00:34:27) Evangelizing multithreading at Google
(00:39:56) The humble beginnings of Borg and Kubernetes
(00:47:10) The concept of inertia in system design
(00:50:07) The genesis of Kubernetes
(00:53:45) The open-source proposal
(00:57:25) The Unified Compute Working Group
(01:02:16) Designing the Kubernetes API
(01:05:03) AIP.dev and API design conventions
(01:08:02) The vision for a declarative model in Kubernetes
(01:17:25) Kubernetes as a DIY platform
(01:19:07) The evolution of Kubernetes
(01:21:40) The complexity of building a platform
(01:25:11) Style guides?
(01:28:23) Gotchas in Kubernetes workload APIs
(01:32:02) Understanding your thinking style
(01:35:37) Reflections on Kubernetes design choices
(01:44:08) The importance of getting it right the first time
(01:48:13) Designing for flexibility
(01:51:16) Collaboration and leadership
(01:52:21) The role of an Uber tech lead at Google
(01:56:33) “Giving away the Legos”
(02:02:29) Picking the right person to hand off
(02:06:41) Overcoming writer's block

Show Notes:
API Design conventions: https://google.aip.dev/
Brian's blog: https://medium.com/@bgrant0607
Stay in touch:
In the season's final episode, hosts Lois Houston and Nikita Abraham interview senior OCI instructor Mahendra Mehra about the security practices that are vital for OKE clusters on OCI. Mahendra shares his expert insights on the importance of Kubernetes security, especially in today's digital landscape where the integrity of data and applications is paramount. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. --------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:26 Nikita: Welcome to the Oracle University Podcast! I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! In our last episode, we spoke about self-managed nodes and how you can manage Kubernetes deployments. Nikita: Today is the final episode of this series on OCI Container Engine for Kubernetes. We're going to look at the security side of things and discuss how you can implement vital security practices for your OKE clusters on OCI, and safeguard your infrastructure and data. 00:59 Lois: That's right, Niki! We can't overstate the importance of Kubernetes security, especially in today's digital landscape, where the integrity of your data and applications is paramount. With us today is senior OCI instructor, Mahendra Mehra, who will take us through Kubernetes security and compliance practices. Hi Mahendra! It's great to have you here. I want to jump right in and ask you, how can users add a service account authentication token to a kubeconfig file? Mahendra: When you set up the kubeconfig file for a cluster, by default, it contains an Oracle Cloud Infrastructure CLI command to generate a short-lived, cluster-scoped, user-specific authentication token. The authentication token generated by the CLI command is appropriate to authenticate individual users accessing the cluster using kubectl and the Kubernetes Dashboard. However, the generated authentication token is not appropriate to authenticate processes and tools accessing the cluster, such as continuous integration and continuous delivery tools. To ensure access to the cluster, such tools require long-lived non-user-specific authentication tokens. One solution is to use a Kubernetes service account. Having created a service account, you bind it to a cluster role binding that has cluster administration permissions. You can create an authentication token for this service account, which is stored as a Kubernetes secret. You can then add the service account as a user definition in the kubeconfig file itself. Other tools can then use this service account authentication token when accessing the cluster. 02:47 Nikita: So, as I understand it, adding a service account authentication token to a kubeconfig file enhances security and enables automated tools to interact seamlessly with your Kubernetes cluster. 
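(Editor's note: for listeners who want to try the pattern Mahendra just described, here is a minimal sketch. It assumes Kubernetes 1.24 or later, where service account token Secrets are created explicitly; every name in it is illustrative, not from the course.)

```yaml
# Sketch of the long-lived, non-user-specific token approach described above.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-cd-bot                # illustrative name
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ci-cd-bot-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin            # cluster administration permissions, as described
subjects:
  - kind: ServiceAccount
    name: ci-cd-bot
    namespace: kube-system
---
apiVersion: v1
kind: Secret
metadata:
  name: ci-cd-bot-token
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: ci-cd-bot
type: kubernetes.io/service-account-token   # Kubernetes populates .data.token
# After applying, read the token and add it as a user in the kubeconfig:
#   TOKEN=$(kubectl -n kube-system get secret ci-cd-bot-token -o jsonpath='{.data.token}' | base64 -d)
#   kubectl config set-credentials ci-cd-bot --token="$TOKEN"
```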
So, let's talk about the permissions users need to access clusters they have created using Container Engine for Kubernetes. Mahendra: For most operations on Container Engine for Kubernetes clusters, IAM leverages the concept of groups. A user's permissions are determined by the IAM groups they belong to, including dynamic groups. The access rights for these groups are defined by policies. IAM provides granular control over various cluster operations, such as the ability to create or delete clusters, add, remove, or modify node pools, and dictate the Kubernetes object create, delete, and view operations a user can perform. All these controls are specified at the group and policy levels. In addition to IAM, the Kubernetes role-based access control authorizer can enforce additional fine-grained access control for users on specific clusters via Kubernetes RBAC Roles and ClusterRoles. 04:03 Nikita: What are Kubernetes RBAC Roles and ClusterRoles, Mahendra? Mahendra: A Role here defines permissions for resources within a specific namespace, while a ClusterRole is a global object that provides access to global objects as well as non-resource URLs, such as the API version and health endpoints on the API server. Kubernetes RBAC also includes RoleBindings and ClusterRoleBindings. A RoleBinding grants permissions to subjects, which can be a user, service, or group interacting with the Kubernetes API. It specifies the operations allowed for a given subject in the cluster. A RoleBinding is always created in a specific namespace. When associated with a Role, it provides the user the permissions specified within that Role for objects within that namespace. When associated with a ClusterRole, it provides access only to the namespaced objects defined within that ClusterRole, scoped to the RoleBinding's namespace. A ClusterRoleBinding, on the other hand, is a global object. It associates ClusterRoles with users, groups, and service accounts, but it cannot be associated with a namespaced Role. A ClusterRoleBinding is used to provide access to global objects, non-namespaced objects, or to namespaced objects in all namespaces. 05:36 Lois: Mahendra, what's IAM's role in this? How do IAM and Kubernetes RBAC work together? Mahendra: IAM provides broader permissions, while Kubernetes RBAC offers fine-grained control. Users authorized either by IAM or Kubernetes RBAC can perform Kubernetes operations. When a user attempts to perform any operation on a cluster, except for create role and create cluster role operations, IAM first determines whether a group or dynamic group to which the user belongs has the appropriate and sufficient permissions. If so, the operation succeeds. If the attempted operation also requires additional permissions granted via a Kubernetes RBAC Role or ClusterRole, the Kubernetes RBAC authorizer then determines whether the user or group has been granted the appropriate Kubernetes Role or ClusterRole. 06:41 Lois: OK. What kind of permissions do users need to define custom Kubernetes RBAC Roles and ClusterRoles? Mahendra: It's common to define custom Kubernetes RBAC Roles and ClusterRoles for precise control. To create these, a user must have existing Roles or ClusterRoles with equal or higher privileges. By default, users don't have any RBAC roles assigned, but there are default ClusterRoles, such as cluster-admin, which grants superuser privileges.
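(Editor's note: to make the Role and RoleBinding objects Mahendra describes concrete, here is a minimal sketch; every name in it is invented for illustration and is not from the course.)

```yaml
# A namespaced Role granting read access to pods, and a RoleBinding that
# grants it to a group within the same namespace only.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: team-a            # Roles are always namespaced
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: team-a            # permissions apply only in this namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role                   # referencing a ClusterRole here would still be
  name: pod-reader             # scoped to this RoleBinding's namespace
subjects:
  - kind: Group
    name: dev-team             # illustrative subject
    apiGroup: rbac.authorization.k8s.io
```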
07:12 Nikita: I want to ask you about securing and handling sensitive information within Kubernetes clusters, and ensuring a robust security posture. What can you tell us about this? Mahendra: When creating Kubernetes clusters using OCI Container Engine for Kubernetes, there are two fundamental approaches to store application secrets. We can opt for storing and managing secrets in an external secrets store, accessed seamlessly through the Kubernetes Secrets Store CSI driver. Alternatively, we have the option of storing Kubernetes secret objects directly in etcd. 07:53 Lois: OK, let's tackle them one by one. What can you tell us about the first method, storing secrets in an external secret store? Mahendra: This integration allows Kubernetes clusters to mount multiple secrets, keys, and certificates into pods as volumes. The Kubernetes Secrets Store CSI driver facilitates seamless integration between our Kubernetes clusters and external secret stores. With the Secrets Store CSI driver, our Kubernetes clusters can mount and manage multiple secrets, keys, and certificates from external sources. These are accessible as volumes, making it easy to incorporate them into our application containers. OCI Vault is a notable external secrets store, and Oracle provides the Oracle Secrets Store CSI driver provider to enable Kubernetes clusters to seamlessly access secrets stored in Vault. 08:54 Nikita: And what about the second method? How can we store secrets as Kubernetes secret objects in etcd? Mahendra: In this approach, we store and manage our application secrets using Kubernetes secret objects. These objects are directly managed within etcd, the distributed key-value store used for Kubernetes cluster coordination and state management. In OKE, etcd reads and writes data to and from block storage volumes in the OCI Block Volume service. By default, OCI ensures the security of our secrets and etcd data by encrypting it at rest. Oracle handles this encryption automatically, providing a secure environment for our secrets. Oracle takes responsibility for managing the master encryption key for data at rest, including etcd and Kubernetes secrets. This ensures the integrity and security of our stored secrets. If needed, there are options for users to manage the master encryption key themselves. 10:06 Lois: OK. We understand that managing secrets is a critical aspect of maintaining a secure Kubernetes environment, and one that users should not take lightly. Can we talk about OKE Container Image Security? What essential characteristics should container images possess to fortify the security posture of a user's applications? Mahendra: In the dynamic landscape of containerized applications, ensuring the security of container images is paramount. It is not uncommon for the operating system packages included in images to have vulnerabilities. Managing these vulnerabilities enables you to strengthen the security posture of your system and respond quickly when new vulnerabilities are discovered. You can set up Oracle Cloud Infrastructure Registry, also known as Container Registry, to scan images in a repository for security vulnerabilities published in the publicly available Common Vulnerabilities and Exposures Database. 11:10 Lois: And how is this done? Is it automatic? Mahendra: To perform image scanning, Container Registry makes use of the Oracle Cloud Infrastructure Vulnerability Scanning Service and Vulnerability Scanning REST API. When new vulnerabilities are added to the CVE database, the container registry initiates automatic rescanning of images in repositories that have scanning enabled.
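(Editor's note: before the break, a quick sketch of the two secret-storage approaches Mahendra contrasted above. The SecretProviderClass shape comes from the upstream Secrets Store CSI driver API; the provider name and parameter keys for OCI Vault are assumptions here, so check Oracle's provider documentation before relying on them.)

```yaml
# Approach 1 (sketch): mount secrets from an external store through the
# Secrets Store CSI driver. Provider name and parameters are assumed.
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: app-vault-secrets
spec:
  provider: oci                # assumed provider name for the Oracle driver
  parameters: {}               # provider-specific; see Oracle's driver docs
---
# Approach 2: a plain Kubernetes Secret object, stored in etcd
# (encrypted at rest by OCI, as described above).
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:
  password: example-only       # illustrative value
```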
11:41 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we're offering both the course and certification for free! So, don't miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That's https://education.oracle.com/genai. 12:20 Nikita: Welcome back! Mahendra, what are the benefits of image scanning? Mahendra: You can gain valuable insights into each image scan conducted over the past 13 months. This includes an overview of the number of vulnerabilities detected and an overall risk assessment for each scan. Additionally, you can delve into comprehensive details of each scan, featuring descriptions of individual vulnerabilities, their associated risk levels, and direct links to the CVE database for more comprehensive information. This historical and detailed data empowers you to monitor, compare, and enhance image security over time. You can also disable image scanning on a particular repository by removing the image scanner. 13:11 Nikita: Another characteristic that container images should have is unaltered integrity, right? Mahendra: For compliance and security reasons, system administrators often want to deploy software into a production system only when they are satisfied that the software has not been modified since it was published, which would compromise its integrity. Ensuring the unaltered integrity of software is paramount for compliance and security in production environments. 13:41 Lois: Mahendra, what are the mechanisms that guarantee this integrity within the context of Oracle Cloud Infrastructure? Mahendra: Image signatures play a pivotal role in not only verifying the source of an image but also ensuring its integrity. Oracle's Container Registry facilitates this process by allowing users or systems to push images and sign them using a master encryption key sourced from the OCI Vault. It's worth noting that an image can have multiple signatures, each associated with a distinct master encryption key. These signatures are uniquely tied to an image OCID, providing granularity to the verification process. Furthermore, the process of image signing mandates the use of an RSA asymmetric key from the OCI Vault, ensuring a robust and secure validation of the image's unaltered integrity. 14:45 Nikita: In the context of container images, how can users ensure the use of trusted sources within OCI? Mahendra: System administrators need the assurance that the software being deployed in a production system originates from a source they trust. Signed images play a pivotal role, providing a means to verify both the source and the integrity of the image. To further strengthen this, administrators can create image verification policies for clusters, specifying which master encryption keys must have been used to sign images. This enhances security by configuring Container Engine for Kubernetes clusters to allow the deployment of images signed with specific encryption keys from Oracle Cloud Infrastructure Registry. Users or systems retrieving signed images from OCIR can trust the source and be confident in the image's integrity. 15:46 Lois: Why is it imperative for users to use signed images from Oracle Cloud Infrastructure Registry when deploying applications to a Container Engine for Kubernetes cluster?
Mahendra: This practice is crucial for ensuring the integrity and authenticity of the deployed images. To achieve this enforcement, it's important to note that an image in OCIR can have multiple signatures, each linked to a different master encryption key. This multi-key association adds layers of security to the verification process. A cluster's image verification policy comes into play, allowing administrators to specify up to five master encryption keys. This policy serves as a guideline for the cluster, dictating which keys are deemed valid for image signatures. If a cluster's image verification policy doesn't explicitly specify encryption keys, any signed image can be pulled regardless of the key used, and any unsigned image can also be pulled, potentially compromising the security measures. 16:56 Lois: Mahendra, can you break down the essential permissions required to bolster security measures within a user's OKE clusters? Mahendra: To enable clusters to include master encryption keys in image verification policies, you must give clusters permission to use keys from OCI Vault. For example, to grant this permission to a particular cluster in the tenancy, we must use a policy along the lines of: Allow any-user to use keys in tenancy where request.user.id = '<cluster OCID>'. Additionally, for clusters to seamlessly pull signed images from Oracle Cloud Infrastructure Registry, it's vital to provide permissions for accessing repositories in OCIR. 17:43 Lois: I know this may sound like a lot, but OKE container image security is vital for safeguarding your containerized applications. Thank you so much, Mahendra, for being with us through the season and taking us through all of these important concepts. Nikita: To learn more about the topics covered today, visit mylearn.oracle.com and search for the OCI Container Engine for Kubernetes Specialist course. Join us next week for another episode of the Oracle University Podcast. Until then, this is Nikita Abraham… Lois: And Lois Houston, signing off! 18:16 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
Curious about how OCI Container Engine for Kubernetes (OKE) can transform the way your development team builds, deploys, and manages cloud-native applications? Listen to hosts Lois Houston and Nikita Abraham explore OKE's key features and benefits with senior OCI instructor Mahendra Mehra. Mahendra breaks down complex concepts into digestible bits, making it easy for you to understand the magic behind OKE. OCI Container Engine for Kubernetes Specialist: https://mylearn.oracle.com/ou/course/oci-container-engine-for-kubernetes-specialist/134971/210836 Oracle University Learning Community: https://education.oracle.com/ou-community LinkedIn: https://www.linkedin.com/showcase/oracle-university/ X (formerly Twitter): https://twitter.com/Oracle_Edu Special thanks to Arijit Ghosh, David Wright, Radhika Banka, and the OU Studio Team for helping us create this episode. -------------------------------------------------------- Episode Transcript: 00:00 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started! 00:25 Nikita: Hello and welcome to the Oracle University Podcast. I'm Nikita Abraham, Principal Technical Editor with Oracle University, and with me is Lois Houston, Director of Innovation Programs. Lois: Hi there! If you've been listening to us these last few weeks, you'll know we've been discussing containerization, the Oracle Cloud Infrastructure Registry, and the basics of Kubernetes. Today, we'll dive into the world of OCI Container Engine for Kubernetes, also referred to as OKE. Nikita: We're joined by Mahendra Mehra, a senior OCI instructor with Oracle University, who will take us through the key features and benefits of OKE and also talk about working with managed nodes. Hi Mahendra! Thanks for joining us today. 01:09 Lois: So, Mahendra, what is OKE exactly? Mahendra: Oracle Cloud Infrastructure Container Engine for Kubernetes is a fully managed, scalable, and highly available service that empowers you to effortlessly deploy your containerized applications to the cloud. But that's just the beginning. OKE can transform the way you and your development team build, deploy, and manage cloud native applications. 01:36 Nikita: What would you say are some of its most defining features? Mahendra: One of the defining features of OKE is the flexibility it offers. You can specify whether you want to run your applications on virtual nodes or opt for managed nodes. Regardless of your choice, Container Engine for Kubernetes will efficiently provision them within your existing OCI tenancy on Oracle Cloud Infrastructure. Creating an OKE cluster is a breeze, and you have a couple of fantastic tools at your disposal--the console and the REST API. These make it super easy to get started. OKE relies on Kubernetes, which is an open-source system that simplifies the deployment, scaling, and management of containerized applications across clusters of hosts. Kubernetes is an incredible system that groups containers into logical units known as pods. And these pods make managing and discovering your applications very simple. Not to mention, Container Engine for Kubernetes uses Kubernetes versions that are certified as conformant by the Cloud Native Computing Foundation, also abbreviated as CNCF. And here's the icing on the cake. Container Engine for Kubernetes is ISO-compliant, meeting the ISO/IEC standards 27001, 27017, and 27018.
That's your guarantee of a secure and reliable platform. 03:08 Lois: That's great. But how do you access all this power? Mahendra: You can define and create your Kubernetes cluster using the intuitive console and the robust REST API. Once your clusters are up and running, you can manage them using the Kubernetes command line, also known as kubectl, the user-friendly Kubernetes Dashboard, and the powerful Kubernetes API. 03:32 Nikita: I love the idea of an intuitive console and being able to manage everything from a centralized place. Lois: Yeah, that's fantastic! Mahendra, can you talk us through the magic that happens behind the scenes? What's Oracle's role in all this? Mahendra: All the master nodes, or control plane nodes, are managed by Oracle. This includes components like etcd, the API server, and the controller manager, among others. To ensure reliability, we make sure multiple copies of these master components are distributed across different availability domains. And we don't stop there. We also manage the Kubernetes Dashboard and even handle the self-healing mechanism of both the cluster and the worker nodes. All of these are meticulously created and managed within your Oracle tenancy. 04:19 Lois: And what happens at the user's end? What is their responsibility? Mahendra: At your end, you have the power to manage your worker nodes. Using different compute shapes, you can create and control them in your own user tenancy. So, as you can see, it's a perfect blend of Oracle's expertise and your control. 04:38 Nikita: So, in your opinion, why should users consider OKE their go-to solution for all things Kubernetes? Mahendra: Imagine a world where building and maintaining Kubernetes environments, be it master nodes or worker nodes, is no longer complex, costly, or time-consuming. OKE is here to make your life easier by seamlessly integrating Kubernetes with various container lifecycle management products, which include container registries, CI/CD frameworks, networking solutions, storage options, and top-notch security features. And speaking of security, OKE gives you the tools you need to manage and control team access to production clusters, ensuring granular access to Kubernetes clusters in a straightforward process. It empowers developers to deploy containers quickly, provides DevOps teams with visibility and control for seamless Kubernetes management, and brings together Kubernetes container orchestration with Oracle's advanced cloud infrastructure. This results in robust control, top-tier security, IAM, and consistent performance. 05:50 Nikita: OK…a lot of benefits! Mahendra, I know there have been ongoing enhancements to the OKE service. So, when creating a new cluster with Container Engine for Kubernetes, what are the cluster types we can specify? Mahendra: The first type is the basic cluster. Basic clusters support all the core functionality provided by Kubernetes and Container Engine for Kubernetes. Basic clusters come with a service-level objective, but not a financially backed service-level agreement. This means that Oracle guarantees a certain level of availability for the basic cluster, but there is no monetary compensation if that level is not met. On the other hand, we have enhanced clusters. Enhanced clusters support all available features, including features not supported by basic clusters. 06:38 Lois: OK. So, can you tell us more about the features supported by enhanced clusters?
Mahendra: As we move towards a more digitized world, the demand for infrastructure continues to rise. However, with virtual nodes, managing the infrastructure of your cluster becomes much simpler. The burden of manually scaling, upgrading, or troubleshooting worker nodes is removed, giving you more time to focus on your applications rather than the underlying infrastructure. Virtual nodes provide a great solution for managing large clusters with a high number of nodes that require frequent updates or scaling. With this feature, you can easily simplify the management of your cluster and focus on what really matters, that is, your applications. Managing cluster add-ons can be a daunting task, but with enhanced clusters, you can now deploy and configure them in a more granular way. This means that you can manage both essential add-ons like CoreDNS and kube-proxy as well as a growing portfolio of optional add-ons like the Kubernetes Dashboard. With enhanced clusters, you have complete control over the add-ons you install or disable, the ability to select specific add-on versions, and the option to opt in or out of automatic updates by Oracle. You can also manage add-on-specific customizations to tailor your cluster to meet the needs of your application. 08:05 Lois: Do users need to worry about deploying add-ons themselves? Mahendra: Oracle manages the lifecycle of add-ons so that you don't have to worry about deploying them yourself. This level of control over add-ons gives you the flexibility to customize your cluster to meet the unique needs of your applications, making managing your cluster a breeze. 08:25 Lois: What about scaling? Mahendra: Scaling your clusters to meet the demands of your workload can be a challenging task. However, with enhanced clusters, you can now provision more worker nodes in a single cluster, allowing you to deploy larger workloads on the same cluster, which can lead to better resource utilization and lower operational overhead. Having fewer, larger environments to secure, monitor, upgrade, and manage is generally more efficient and can help you save on cost. Remember, there are limits to the number of worker nodes supported on an enhanced cluster, so you should review the Container Engine for Kubernetes limits documentation and consider the additional considerations when defining enhanced clusters with a large number of managed nodes. 09:09 Nikita: Ensuring the security of my cluster would be of utmost importance to me, right? How would I do that with enhanced clusters? Mahendra: With enhanced clusters, you can now strengthen cluster security through the use of workload identity. Workload identity enables you to define OCI IAM policies that authorize specific pods to make OCI API calls and access OCI resources. By scoping the policies to the Kubernetes service account associated with application pods, you can allow the applications running inside those pods to directly access the API based on the permissions provided by the policies. 09:48 Nikita: Mahendra, what type of uptime and server availability benefits do enhanced clusters provide? Mahendra: You can now rely on a financially backed service-level agreement tied to Kubernetes API server uptime and availability. This means that you can expect a certain level of uptime and availability for your Kubernetes API server, and if it degrades below the stated SLA, you'll receive compensation. This provides an extra level of assurance and helps ensure that your cluster is highly available and performant.
10:20 Lois: Mahendra, do you have any tips for us to remember when creating basic and enhanced clusters? Mahendra: When using the console to create a cluster, a new cluster is created as an enhanced cluster by default unless you explicitly choose to create a basic cluster. If you don't select any enhanced features during cluster creation, you have the option to create the new cluster as a basic cluster. When using the CLI or API to create a cluster, you can specify whether to create a basic cluster or an enhanced cluster. If you don't explicitly specify the type of cluster to create, a new cluster is created as a basic cluster by default. Creating a new cluster as an enhanced cluster enables you to easily add enhanced features later, even if you didn't select any enhanced features initially. If you do choose to create a new cluster as a basic cluster, you can still upgrade the basic cluster to an enhanced cluster later on. However, you cannot downgrade an enhanced cluster to a basic cluster. These points are really important when you're choosing between a basic cluster and an enhanced cluster for your usage. 11:34 Do you want to stay ahead of the curve in the ever-evolving AI landscape? Look no further than our brand-new OCI Generative AI Professional course and certification. For a limited time only, we're offering both the course and certification for free! So, don't miss out on this exclusive opportunity to get certified on Generative AI at no cost. Act fast because this offer is valid only until July 31, 2024. Visit https://education.oracle.com/genai to get started. That's https://education.oracle.com/genai. 12:13 Nikita: Welcome back! I want to move on to serverless Kubernetes with virtual nodes. But I think before we do that, we first need to have a basic understanding of what managed nodes are. Mahendra: Managed nodes run on compute instances within your tenancy and are at least partly managed by you. In the context of Kubernetes, a node is a compute host that can be either a virtual machine or a bare metal host. As you are responsible for managing managed nodes, you have the flexibility to configure them to meet your specific requirements. You are responsible for upgrading Kubernetes on managed nodes and for managing cluster capacity. Nodes are responsible for running a collection of pods or containers, and they are composed of two system components: the kubelet, which is the host brain, and the container runtime, such as CRI-O or containerd. 13:07 Nikita: Ok… so what are virtual nodes, then? Mahendra: Virtual nodes are fully managed and highly available nodes that look and act like real nodes to Kubernetes. They are built using the open-source CNCF Virtual Kubelet project, which provides the translation layer between OCI and Kubernetes. 13:25 Lois: So, what makes Oracle's managed virtual Kubernetes product different? Mahendra: OCI is the first major cloud provider to offer a fully managed virtual kubelet product that provides a serverless Kubernetes experience through virtual nodes. Virtual nodes are configured by customers and are located within a single availability and fault domain within OCI. Virtual nodes have two main components: pod management and container instance management. Virtual nodes delegate all the responsibility of managing the lifecycle of pods to the virtual kubelet, while on a managed node, the kubelet is responsible for managing all the lifecycle state.
The key distinction of virtual nodes is that they support up to 1,000 pods per virtual node, with the expectation of supporting more in the future. 14:15 Nikita: What are the other benefits of virtual nodes? Mahendra: Virtual nodes offer a fully managed experience where customers don't have to worry about managing the underlying infrastructure of their containerized applications. Virtual nodes simplify scaling patterns for customers. Customers can scale their containerized applications up or down quickly without worrying about the underlying infrastructure, and they can focus solely on their applications. With virtual nodes, customers only pay for the resources that their containerized applications use. This allows customers to optimize their costs and ensures that they are not paying for any unused resources. Virtual nodes can support over 10 times the number of pods that a normal node can. This means that customers can run more containerized applications on virtual nodes, which reduces operational burden and makes it easier to scale applications. Customers can leverage container instances, a serverless offering from OCI, to take advantage of many OCI functionalities natively. These functionalities include strong isolation and ultimate elasticity with respect to compute capacity. 15:26 Lois: When creating a cluster using Container Engine for Kubernetes, we have the flexibility to customize the worker nodes within the cluster, right? Could you tell us more about this customization? Mahendra: This customization includes specifying two key elements. Firstly, you can select the operating system image to be used for worker nodes. This image serves as a template for the worker node's virtual hard drive and determines the operating system and other software installed. Secondly, you can choose the shape for your worker nodes. The shape defines the number of CPUs and the amount of memory allocated to each instance, ensuring it meets your specific requirements. This customization empowers you to tailor your OKE cluster to your exact needs. It is important to note that you can define and create OKE clusters using both the console and the REST API. This level of control is especially valuable for your development team when building, deploying, and managing cloud native applications. You have the option to specify whether applications should run on virtual nodes or managed nodes, and Container Engine for Kubernetes efficiently provisions them on Oracle Cloud Infrastructure within your existing OCI tenancy. This flexibility ensures that you can adapt your OKE cluster to suit the specific requirements of your projects and workloads. 16:56 Lois: Thank you so much, Mahendra, for giving us your time today. For more on the topics we discussed, visit mylearn.oracle.com and look for the OCI Container Engine for Kubernetes Specialist course. Join us next week as we dive deeper into working with OKE virtual nodes. Until then, this is Lois Houston… Nikita: And Nikita Abraham, signing off! 17:18 That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.
#268: Kubernetes has become the de facto standard for container orchestration, but its true strength lies in its API. Today, containers are prevalent, but tomorrow we might be dealing with a completely different runtime. The Kubernetes API, with its robust and flexible design, is poised to support these transitions seamlessly. In this episode, Darin and Viktor talk about a few of the different ways that Kubernetes is currently being used and also where it might be headed in the future. Today's sponsor: Save 25% on your first Barbaro Mojo order using the code DevOps25 https://barbaromojo.com/discount/DevOps25 YouTube channel: https://youtube.com/devopsparadox Review the podcast on Apple Podcasts: https://www.devopsparadox.com/review-podcast/ Slack: https://www.devopsparadox.com/slack/ Connect with us at: https://www.devopsparadox.com/contact
We talk with Nikhita Raghunath, Nabarun Pal, and Paco Xu. Nikhita, Nabarun, and Paco have each held various leadership positions related to the Kubernetes project. They talk about their journeys, the various leadership roles they've been in, and offer advice for new contributors and those who want to move into leadership in the project. Nikhita is a Staff Software Engineer at Broadcom. She is currently a member of the CNCF Technical Oversight Committee (TOC), overseeing all technical matters of the CNCF. In the past, she was a member of the Kubernetes Steering Committee and a technical lead for SIG Contributor Experience, and has also won the CNCF Top Committer Award. Currently, she is also a co-chair of the KubeCon+CloudNativeCon conference. Nabarun is a Staff Software Engineer at Broadcom, a maintainer of the Kubernetes project, a member of the Kubernetes Steering Committee, and a chair of Kubernetes SIG Contributor Experience. In the past, he was the release lead for Kubernetes 1.21 and has served on eight release teams. Nabarun also works actively with the Python community by organizing PyCon India and has been recognized in media publications for his work. Paco is an open source team lead at DaoCloud. He started working on containers and Docker in 2016 and began participating in the Kubernetes community in 2018. He is a current member of the Kubernetes Steering Committee and works mainly on kubeadm and sig-node. He is co-chair of KubeCon+CloudNativeCon China 2024. Do you have something cool to share? Some questions? Let us know:
- web: kubernetespodcast.com
- mail: kubernetespodcast@google.com
- twitter: @kubernetespod

News of the week
Blog: 10 Years of Kubernetes
CNCF-Hosted Co-Located Events Overview
CFP for CNCF-hosted Co-located Events
Kubernetes Community Days

Links from the interviews
CNCF Technical Oversight Committee
SIG ContribEx
Google Summer of Code
CNCF Top Committer Award 2021 - Nikhita Raghunath
Blog Post: Google Summer of Code with Kubernetes by Nikhita Raghunath
Kubernetes Docs: Extend the Kubernetes API with CustomResourceDefinitions
SIG API Machinery
SIG Testing
SIG Release
CNCF Chop Wood Carry Water Award 2018 - Nikhita Raghunath
Kubernetes Steering Committee
KubeCon India
KubeCon NA
Kubernetes 1.21: Power to the Community
Pycon India
Kubernetes Python Client on GitHub
Kubernetes Contributor Summit 2019 YouTube Playlist
Kubernetes Release Team
KubeCon NA 2024 Scholarships (applications due by September 1, 2024)
Kubeadm
SIG Node
KubeCon China 2024
Kubelet
Kubernetes Production Readiness Review Process
Kubernetes Release Team CI Signal Lead Runbook
Bret and Nirmal are joined by Neil Cresswell and Steven Kang from Portainer to look at K2D, a new project that enables us to leverage Kubernetes tooling to manage Docker containers on tiny devices at the far edge.

K2D stands for Kubernetes to Docker, which is a bit of a crazy idea -- it's a partial Kubernetes API running on top of Docker Engine without needing a full Kubernetes control plane. If you work with very small devices, including older Raspberry Pis, 32-bit machines, maybe industrial sensors and the infrastructure we now call 'edge', it's often hard to make the container setup on that hardware simple, reliable, and automated all at the same time. This project uses fewer resources than a single-node K3s and still allows you to use Kubernetes tools to deploy and manage your containers, which are in fact just running on a Docker Engine with no full-fledged Kubernetes distribution going on there.

We get into far more detail on the architecture, the Portainer team's motivations for this new open source project, and what its limitations are -- because it's not real Kubernetes, it can't do everything.

Be sure to check out the live recording of the complete show from March 28, 2024 on YouTube (Ep. 260). Includes demos.

★Topics★
K2D website
K2D Docs

Creators & Guests
Cristi Cotovan - Editor
Beth Fisher - Producer
Bret Fisher - Host
Neil Cresswell - Guest
Nirmal Mehta - Host
Steven Kang - Guest

(00:00) - Intro
(02:40) - Introducing the guests
(03:56) - Why K2D? Architecture and Motivations
(05:55) - How Efficient is K2D?
(10:25) - K2D Architecture Explained: Components and Operations
(20:42) - What Happens When Resources are Exhausted?
(23:18) - K2D for Edge Deployment with Portainer or Argo CD
(28:22) - K2D Future Roadmap
(30:36) - Getting Started with K2D

You can also support my free material by subscribing to my YouTube channel and my weekly newsletter at bret.news!
Grab the best coupons for my Docker and Kubernetes courses.
Join my cloud native DevOps community on Discord.
Grab some merch at Bret's Loot Box
Homepage bretfisher.com
KubeVirt, a relatively new capability within Kubernetes, signifies a shift in the virtualization landscape, allowing operations teams to run KVM virtual machines nested in containers behind the Kubernetes API. This integration means that the Kubernetes API now encompasses the concept of virtual machines, enabling VM-based workloads to operate seamlessly within a cluster behind the API. This development addresses the challenge of transitioning traditional virtualized environments into cloud-native settings, where certain applications may resist containerization or require substantial investments for adaptation.

The emerging era of virtualization simplifies the execution of virtual machines without concern for the underlying infrastructure, presenting various opportunities and use cases. Noteworthy advantages include simplified migration of legacy applications without the need for containerization, thereby reducing associated costs.

KubeVirt 1.1, discussed at KubeCon in Chicago by Red Hat's Vladik Romanovsky and Nvidia's Ryan Hallisey, introduces features like memory hotplug and vCPU hotplug, emphasizing the stability of KubeVirt. The platform's stability now allows for the implementation of features that were previously constrained.

Learn more from The New Stack about KubeVirt and the Cloud Native Computing Foundation:
The Future of VMs on Kubernetes: Building on KubeVirt
A Platform for Kubernetes
Scaling Open Source Community by Getting Closer to Users
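(Editor's note: to make "the Kubernetes API encompasses virtual machines" concrete, here is a minimal sketch of a KubeVirt VirtualMachine manifest; the name and disk image are illustrative, not from the episode.)

```yaml
# A tiny VM declared through the Kubernetes API and managed like any other
# object (kubectl apply / get / delete). Assumes KubeVirt is installed.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: demo-vm                # illustrative name
spec:
  running: true                # start the VM when the object is created
  template:
    spec:
      domain:
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
        resources:
          requests:
            memory: 1Gi
      volumes:
        - name: rootdisk
          containerDisk:       # boot disk shipped as a container image
            image: quay.io/containerdisks/fedora:latest   # illustrative image
```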
MLOps Coffee Sessions #178 with LLMs in Production Conference part 2 LLM on K8s Panel, Manjot Pahwa, Rahul Parundekar, and Patrick Barker hosted by Outerbounds, Inc.'s Shrinand Javadekar. // Abstract Large Language Models require a new set of tools... or do they? K8s is a beast and we like it that way. How can we best leverage all the battle-hardened tech that K8s has to offer to make sure that our LLMs go brrrrrrr. Let's talk about it in this chat. // Bio Shrinand Javadekar Shri Javadekar is currently an engineer at Outerbounds, focused on building a fully managed, large-scale platform for running data-intensive ML/AI workloads. Earlier, he spent time trying to start an MLOps company for which he was a co-founder and head of engineering. He led the design, development, and operations of Kubernetes-based infrastructure at Intuit, running thousands of applications, built by hundreds of teams and transacting billions of $$. He has been a founding engineer of the Argo open-source project and also spent precious time at multiple startups that were acquired by large organizations like EMC/Dell and VMWare. Manjot Pahwa Manjot is an investor at Lightspeed India and focuses on SaaS and enterprise tech. She has had an operating career of over a decade within the space of fintech, SaaS, and developer tools spanning various geos such as the US, Singapore, and India. Before joining Lightspeed, Manjot headed Stripe in India, successfully obtaining the payment aggregator license, growing the team from ~10 to 100+, and driving acquisitions in the region during that time. Rahul Parundekar Rahul has 13+ years of experience building AI solutions and leading teams. He is passionate about building Artificial Intelligence (A.I.) solutions for improving the Human Experience. He is currently the founder of A.I. Hero - a platform to help you fix and enrich your data with ML. At AI Hero, he has also been a big proponent of declarative MLOps - using Kubernetes to operationalize the training and serving lifecycle of ML models and has published several tutorials on his Medium blog. Before AI Hero, he was the Director of Data Science (ML Engineering) at Figure-Eight (acquired by Appen), a data annotation company, where he built out a data pipeline and ML model serving architecture serving 36 models (NLP, Computer Vision, Audio, etc.) and traffic of up to 1M predictions per day. Patrick Barker Patrick started his career in Big Data back when that was cool, then moved into Kubernetes near its inception. He has put major features into the Kubernetes API and built several platforms on top of it. In recent years he has moved into AI, with a focus on distributed machine learning. He is now working with a startup to reshape the world of AI agents.
// MLOps Jobs board https://mlops.pallet.xyz/jobs // MLOps Swag/Merch https://mlops-community.myshopify.com/ // Related Links Website: https://www.angellist.com/venture/relay Foundation by Isaac Asimov: https://www.amazon.com/Foundation-Isaac-Asimov/dp/0553293354 AngelList Relay blog: https://www.angellist.com/blog/introducing-angellist-relay --------------- ✌️Connect With Us ✌️ ------------- Join our slack community: https://go.mlops.community/slack Follow us on Twitter: @mlopscommunity Sign up for the next meetup: https://go.mlops.community/register Catch all episodes, blogs, newsletters, and more: https://mlops.community/ Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/ Connect with Shri on LinkedIn: https://www.linkedin.com/in/shrijavadekar/ Connect with Manjot on LinkedIn: https://www.linkedin.com/in/manjotpahwa/ Connect with Rahul on LinkedIn: https://www.linkedin.com/in/rparundekar/ Connect with Patrick on LinkedIn: https://www.linkedin.com/in/patrickbarkerco/
Bret and his co-host, Matt, are joined by Jason Dellaluce and Luca Guerra from Sysdig to talk about Falco, a tool I recommend for production clusters and knowing about any bad behavior on your servers.

-------------------------------------
★ Enroll now for my next Live course, GitHub Actions + Argo CD, scheduled for July 10-21. Go to bret.courses/autodeploy to sign up. ★
------------------------------------

Falco is a security tool I've mentioned multiple times on this show, because I mostly think that a low-level, security-focused logging product is something that every production server needs. The ability to log unexpected events and behaviors on your Linux host is powerful and necessary to be able to audit what's really happening on your infrastructure outside of your app itself. Falco has been a CNCF incubating project for over four years, and I was immediately drawn to it in its early days, because it was container- and Kubernetes-aware and it could log and alert with default rules for everything, from someone starting a shell inside a container, to a bash history file being deleted, to a container trying to talk to the Kubernetes API. This episode will be useful for those of you new to tools like Falco, and also for those familiar with its basics who want to learn about newer features and use cases, which I did some learning on myself in this episode.

Live recording of the complete show from April 6, 2023 is on YouTube (Ep. #210).

★Topics★
Falco website
Falco on CNCF

Support this show and get exclusive benefits on Patreon, YouTube, or bretfisher.com!

★Join my Community★
New live course on CI automation and gitops deployments
Best coupons for my Docker and Kubernetes courses
Chat with us and fellow students on our Discord Server DevOps Fans
Grab some merch at Bret's Loot Box
Homepage bretfisher.com

Creators & Guests
Bret Fisher - Host
Cristi Cotovan - Editor
Beth Fisher - Producer
Matt Williams - Host
Jason Dellaluce - Guest
Luca Guerra - Guest

(00:00) - Intro
(04:18) - Introducing the guests
(07:19) - What is Falco? Why do we need it?
(09:54) - What can Falco monitor?
(19:05) - How are events logged?
(32:53) - Does Falco classify alerts by severity?
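(Editor's note: for readers new to Falco, a rule is just YAML: a condition over syscall events plus an output template. Below is a minimal custom-rule sketch in the spirit of the default rules mentioned above; it uses stock Falco fields and macros, and the rule name is invented for illustration.)

```yaml
# Alert whenever an interactive shell starts inside any container.
# `spawned_process` and `container` are macros shipped with Falco's default rules.
- rule: Shell Spawned in Container (example)
  desc: Detect a shell process starting inside a container
  condition: spawned_process and container and proc.name in (bash, sh, zsh)
  output: >
    Shell spawned in container
    (user=%user.name container=%container.name image=%container.image.repository
    cmdline=%proc.cmdline)
  priority: WARNING
```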
Eswar Bala, Director of Amazon EKS at AWS, joins Corey on Screaming in the Cloud to discuss how and why AWS built a Kubernetes solution, and what customers are looking for out of Amazon EKS. Eswar reveals the concerns he sees from customers about the cost of Kubernetes, as well as the reasons customers adopt EKS over ECS. Eswar gives his reasoning on why he feels Kubernetes is here to stay and not just hype, as well as how AWS is working to reduce the complexity of Kubernetes. Corey and Eswar also explore the competitive landscape of Amazon EKS, and the new product offering from Amazon called Karpenter.

About Eswar
Eswar Bala is a Director of Engineering at Amazon and is responsible for Engineering, Operations, and Product strategy for Amazon Elastic Kubernetes Service (EKS). Eswar leads the Amazon EKS and EKS Anywhere teams that build, operate, and contribute to the services customers and partners use to deploy and operate Kubernetes and Kubernetes applications securely and at scale. With a 20+ year career in software, spanning multimedia, networking and container domains, he has built greenfield teams and launched new products multiple times.

Links Referenced:
Amazon EKS: https://aws.amazon.com/eks/
kubernetesthemuchharderway.com: https://kubernetesthemuchharderway.com
kubernetestheeasyway.com: https://kubernetestheeasyway.com
EKS documentation: https://docs.aws.amazon.com/eks/
EKS newsletter: https://eks.news/
EKS GitHub: https://github.com/aws/eks-distro

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own! Mission Cloud un **BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to missioncloud.com for the AWS expertise you need.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. Today's promoted guest episode is brought to us by our friends at Amazon. Now, Amazon is many things: they sell underpants, they sell books, they sell books about underpants, and underpants featuring pictures of books, but they also have a minor cloud computing problem. In fact, some people would call them a cloud computing company with a gift shop that's attached. Now, the problem with wanting to work at a cloud company is that their interviews are super challenging to pass. If you want to work there, but can't pass the technical interview for a long time, the way to solve that has been, “Ah, we're going to run Kubernetes so we get to LARP as if we worked at a cloud company but don't.” Eswar Bala is the Director of Engineering for Amazon EKS and is going to basically suffer my slings and arrows about one of the most complicated, and I would say overwrought, best practices that we're seeing industry-wide. Eswar, thank you for agreeing to subject yourself to this nonsense.

Eswar: Hey, Corey, thanks for having me here.

Corey: [laugh]. So, I'm a little bit unfair to Kubernetes because I wanted to make fun of it and ignore it. But then I started seeing it in every company that I deal with in one form or another.
So yes, I can still sit here and shake my fist at the tide, but it's turned into, “Old Man Yells at Cloud,” which I'm thrilled to embrace, but everyone's using it. So, EKS has recently crossed, I believe, the five-year mark since it was initially launched. What is EKS other than Amazon's own flavor of Kubernetes?

Eswar: You know, the best way I can define EKS is, EKS is just Kubernetes. Not Amazon's version of Kubernetes. It's just Kubernetes that we get from the community and offer to customers to make it easier for them to consume. So, EKS. I've been with EKS from the very beginning when we thought about offering a managed Kubernetes service in 2017. And at that point, the goal was to bring Kubernetes to enterprise customers. So, we have many customers telling us that they want us to make their life easier by offering a managed version of Kubernetes that they'd actually begun to [erupt 00:02:42] at that time period, right? So, my goal was to figure out what that service should look like and which customer base we should be targeting the service towards.

Corey: Kelsey Hightower has a fantastic learning tool out there in a GitHub repo called, “Kubernetes the Hard Way,” where he talks you through building the entire thing, start to finish. I wound up forking it and doing that on top of AWS, and you can find that at kubernetesthemuchharderway.com. And that was fun. And I went through the process and my response at the end was, “Why on earth would anyone ever do this more than once?” And we got that sorted out, but now it's—customers aren't really running these things from scratch. It's like the Linux from Scratch project. Great learning tool; probably don't run this in production in the same way that you might otherwise because there are better ways to solve for the problems that you will have to solve yourself when you're building these things from scratch. So, as I look across the ecosystem, it feels like EKS stands in the place of the heavy, undifferentiated lifting of running the Kubernetes control plane so customers functionally don't have to. Is that an effective summation of this?

Eswar: That is precisely right. And I'm glad you mentioned, “Kubernetes the Hard Way,” I'm a big fan of that when it came out. And anyone who did that tutorial, or your tutorial, “Kubernetes the Harder Way,” would walk away thinking, “Why would I pick this technology when it's super complicated to set up?” But then you see that customers love Kubernetes and you see that reflected in the adoption, even in the 2016, 2017 timeframes. And the reason is, it made life easier for application developers in terms of offering web services that they wanted to offer to their customer base. And because of all the features that Kubernetes brought on, application lifecycle management, service discovery, and then it evolved to support various application architectures, right, in terms of stateless services, stateful applications, and even daemon sets, right, like for running your logging and metrics agents. And these are powerful features, at the end of the day, and that's what drove Kubernetes. And because it's super hard to get going to begin with and then to operate, the day-two operator experience is super complicated.

Corey: And the day one experience is super hard and the day two experience of, “Okay, now I'm running it and something isn't working the way it used to. Where do I start,” has been just tremendously overwrought. And frankly, more than a little intimidating.

Eswar: Exactly. Right?
And that exactly was our opportunity when we started in 2017. And when we started, there was a question of, okay, should we really build a service when you have an existing service like ECS in place? And by the way, like, I did work in ECS before I started working in EKS from the beginning.So, the answer then was, it was about giving what customers want. And there was space for many container orchestration systems, right; ECS was the AWS service at that point in time. And our thinking was, how do we give customers what they wanted? They wanted a Kubernetes solution. Let's go build that. But we built it in a way that we removed the undifferentiated heavy lifting of managing Kubernetes.Corey: One of the weird things that I find is that everyone's using Kubernetes, but I don't see it in the way that I contextualize the AWS universe, which of course, is on the bill. That's right. If you don't charge for something in AWS, and preferably a fair bit, I don't tend to know it exists. Like, “What's an IAM and what might that possibly do?” Always a reassuring thing to hear from someone who's often called an expert in this space. But you know, if it doesn't cost money, why do I pay attention to it?The control plane is what EKS charges for, unless you're running a bunch of Fargate-managed pods and containers to wind up handling those things. So, it mostly just shows up as an addendum to the actual big, meaty portions of the bill. It just looks like a bunch of EC2 instances with some really weird behavior patterns, particularly with regard to auto-scaling and crosstalk between all of those various nodes. So, it's a little bit of a murder mystery, figuring out, “So, what's going on in this environment? Do you folks use containers at all?” And the entire Kubernetes shop is looking at me like, “Are you simple?”No, it's just I tend to disregard the lies that customers say, mostly to themselves, because everyone has this idea of what's going on in their environment, but the bill speaks. It's always been a little bit of an investigation to get to the bottom of anything that involves Kubernetes at significant points of scale.Eswar: Yeah, you're right. Like if you look at EKS, right, like, we started with managing the control plane to begin with. And managing the control plane is a drop in the bucket when you actually look at the costs in terms of operating or running a Kubernetes cluster. When you look at how our customers use it and where they spend most of their cost, it's about where their applications run; it's actually the Kubernetes data plane, and the amount of compute and memory that the applications end up using ends up driving 90% of the cost. Beyond that is the storage, beyond that is the networking costs, right, and then after that is the actual control plane costs. So, the problem right now is figuring out, how do we optimize our costs for the application to run on?Corey: On some level, it requires a little bit of understanding of what's going on under the hood. There have been a number of cost optimization efforts that have been made in the Kubernetes space, but they tend to focus around stuff that I find relatively, well, I call it banal because it basically is. You're looking at the idea of, okay, what size instances should you be running, and how well can you fill them and make sure that all the resources per node wind up being taken advantage of? But that's also something that, I guess from my perspective, isn't really the interesting architectural point of view.
Whether or not you're running a bunch of small instances or a few big ones or some combination of the two, that doesn't really move the needle on any architectural shift, whereas ingesting a petabyte a month of data and passing 50 petabytes back and forth between availability zones, that's where it starts to get really interesting as far as tracking that stuff down.But what I don't see is a whole lot of energy or effort being put into that. And I mean, industry-wide, to be clear. I'm not attempting to call out Amazon specifically on this. That's [laugh] not the direction I'm taking this in. For once. I know, I'm still me. But it seems to be just an industry-wide issue, where zone affinity for Kubernetes has been a very low priority item, even on project roadmaps on the Kubernetes project.Eswar: Yeah, Kubernetes does provide the ability for customers to restrict their workloads to a particular AZ, right? Like, there are constraints that you can place on your pod specs that end up driving applications towards a particular AZ if they want, right? You're right, it's still left to the customers to configure. Just because there's a configuration available doesn't mean the customers use it. If it's not defaulted, most of the time, it's not picked up.That's where it's important for service providers—like EKS—to not only provide visibility by means of reporting, using tools like Kubecost and AWS Cost Explorer, but also provide insights and recommendations on what customers can do. I agree that there's a gap today in EKS, for example, in terms of that. Like, we're slowly closing that gap and it's something that we're actively exploring. How do we provide insights across all the resources customers end up using from within a cluster? That includes not just compute and memory, but also storage and networking, right? And that's where we are actually moving towards at this point.Corey: That's part of the weird problem I've found is that, on some level, you get to play almost data center archaeologist when you start exploring what's going on in these environments. I found one of the only reliable ways to get answers to some of this stuff has been oral tradition of, “Okay, this Kubernetes cluster just starts hurling massive data quantities at 3 a.m. every day. What's causing that?” And it leads to, “Oh, no no, have you talked to the data science team,” like, “Oh, you have a data science team. A common AWS billing mistake.” And exploring down that particular path sometimes pays dividends. But there's no holistic way to solve that globally. Today. I'm optimistic about tomorrow, though.Eswar: Correct. And that's where we are spending our efforts right now. For example, we recently launched our partnership with Kubecost, and Kubecost is now available as an add-on from the Marketplace that you can easily install and provision on EKS clusters, for example. And that is a start. And Kubecost is amazing in terms of features, in terms of the insight it offers, right—looking into compute, the memory, and the optimizations and insights it provides you.And we are also working with the AWS Cost and Usage Reporting team to provide a native AWS solution for the cost reporting and the insights aspect as well in EKS. And it's something that we are going to be working on really closely to solve the networking gaps in the near future.
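For listeners wondering what those pod spec constraints actually look like, here's a minimal sketch. The Deployment name, image, and zone value are illustrative placeholders; the zone key shown is the standard well-known Kubernetes topology label:

```yaml
# Illustrative sketch: pin a workload to one availability zone to avoid
# cross-zone traffic. Names, image, and zone are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zone-pinned-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: zone-pinned-app
  template:
    metadata:
      labels:
        app: zone-pinned-app
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone   # well-known node label
                operator: In
                values:
                - us-east-1a                       # assumed zone; use your own
      containers:
      - name: app
        image: nginx   # placeholder image
```

As Eswar notes, nothing defaults this for you: unless someone writes the constraint (or a softer topologySpreadConstraints policy), the scheduler happily spreads pods across zones.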
Corey: What are you seeing as far as customer concerns go, with regard to cost and Kubernetes? I see some things, but let's be very clear here: I have a certain subset of the market that I spend an inordinate amount of time speaking to, and I always worry that what I'm seeing is not holistically what's going on in the broader market. What are you seeing customers concerned about?Eswar: Well, let's start from the fundamentals here, right? Customers really want to get to market faster, whatever services and applications they want to offer. And they want it to be cheaper to operate. And if they're adopting EKS, they want it to be cheaper to operate Kubernetes in the cloud. They also want high performance, they also want scalability, and they want security and isolation.There are so many parameters that they have to deal with before they put their service on the market and continue to operate. And there's a fundamental tension here, right? Like, they want cost efficiency, but they also want to be available in the market quicker, and they want performance and availability. Developers have uptime, SLOs, and SLAs to consider, and they want the maximum possible resources that they can get. And on the other side, you've got financial leaders and the business leaders who want to look at the spending and worry about, like, okay, are we allocating our capital wisely? And are we allocating it where it makes sense? And are we doing it in a manner that there's very little wastage and aligned with our customer use, for example? And this is where the actual problems arise from [unintelligible 00:13:00].Corey: I want to be very clear that for a long time, one of the most expensive parts about running Kubernetes has not been the infrastructure itself. It's been the people to run this responsibly, where it's the day two, day three experience where for an awful lot of companies like, oh, we're moving to Kubernetes because, I don't know, we read it in an in-flight magazine or something and all the cool kids are doing it, which honestly during the pandemic is why suddenly everyone started making better IT choices because their execs were not being exposed to airport ads. I digress. The point, though, is that as customers are figuring this stuff out and playing around with it, it's not sustainable that every company that wants to run Kubernetes can afford a crack SRE team that is individually incredibly expensive and collectively staggeringly so. It seems the real cost is the complexity tied to it.And EKS has been great in that it abstracts an awful lot of the control plane complexity away. But I still can't shake the feeling that running Kubernetes is mind-bogglingly complicated. Please argue with me and tell me I'm wrong.Eswar: No, you're right. It's still complicated. And it's a journey towards reducing the complexity. When we launched EKS, we launched only with managing the control plane to begin with. And that's where we started, but customers had the complexity of managing the worker nodes.And then we evolved to manage the Kubernetes worker nodes in terms of two products: we've got Managed Node Groups and Fargate. And then customers moved on to installing more agents in their clusters before they actually installed their business applications—things like Cluster Autoscaler, things like Metrics Server—critical components that they have come to rely on, but that don't drive their business logic directly. They are supporting aspects of driving core business logic.And that's how we evolved into managing the add-ons to make life easier for our customers.
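As a rough illustration of what consuming those managed pieces looks like, here is a hedged sketch of an eksctl cluster config that asks EKS to manage node groups and core add-ons rather than hand-installing them. The cluster name, region, instance type, and add-on list are assumptions for the example:

```yaml
# Sketch of an eksctl ClusterConfig: a managed node group plus
# EKS-managed add-ons. Name, region, and sizes are placeholders.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: demo-cluster
  region: us-east-1
managedNodeGroups:
  - name: default
    instanceType: m5.large
    desiredCapacity: 2
addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
```

Feeding a file like this to eksctl create cluster -f is one common way to get a cluster where the control plane, worker nodes, and core components are all someone else's undifferentiated heavy lifting.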
And it's a journey where we continue to reduce the complexity of making it easier for customers to adopt Kubernetes. And once you cross that chasm—and we are still trying to cross it—once you cross it, you have the problem of, okay so, adopting Kubernetes is easy. Now, we have to operate it, right, which means that we need to provide better reporting tools, not just for costs, but also for operations. Like, how easy it is for customers to get to the application-level metrics, how easy it is for customers to troubleshoot issues, how easy it is for customers to actually upgrade to newer versions of Kubernetes. All of these challenges come out beyond day one, right? And those are initiatives that we have in flight to make it easier for customers [unintelligible 00:15:39].Corey: So, one of the things I see when I start going deep into the Kubernetes ecosystem is, well, Kubernetes will go ahead and run the containers for me, but now I need to know what's going on in various areas around it. One of the big booms in the observability space, in many cases, has come from the fact that you now need to diagnose something in a container you can't log into that, incidentally, stopped existing 20 minutes before you got the alert about the issue, so you'd better hope your telemetry is up to snuff. Now, yes, that does act as a bit of a complexity burden, but on the other side of it, we don't have to worry about things like failed hard drives taking systems down anymore. That has successfully been abstracted away by Kubernetes, or you know, your cloud provider, but that's neither here nor there these days. What are you seeing as far as, effectively, the sidecar pattern, for example of, “Oh, you have too many containers and need to manage them? Have you considered running more containers?” Sounds like something a container salesman might say.Eswar: So, running containers demands that you have really solid observability tooling, things that let you troubleshoot and—successfully—debug without the need to log into the containers themselves. In fact, that's an anti-pattern, right? You really don't want to have the ability to SSH into a particular container, for example. And to be successful at it demands that you publish your metrics and you publish your logs. All of these are things that a developer needs to worry about today in order to adopt containers, for example.And it's on the service providers to actually make it easier for the developers not to worry about these. And all of these should be available automatically when you adopt a Kubernetes service. For example, in EKS, we are working with our managed Prometheus service teams inside Amazon, right—and also CloudWatch teams—to easily enable metrics and logging for customers without having to do a lot of heavy lifting.
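To make the publish-your-metrics side concrete, here is one minimal sketch. The prometheus.io annotations below are a widespread community convention rather than a core Kubernetes API, and they only do anything if your Prometheus scrape configuration is set up to honor them; the names and port are invented for the example:

```yaml
# Sketch: a workload that exposes metrics for scraping instead of relying
# on shell access. The prometheus.io/* annotations are a common community
# convention, not a core API; the scraper must be configured to honor them.
apiVersion: v1
kind: Pod
metadata:
  name: metrics-demo          # hypothetical pod name
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "8080"
spec:
  containers:
  - name: app
    image: nginx              # placeholder; a real app would serve /metrics on 8080
    ports:
    - containerPort: 8080
```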
Corey: Let's talk a little bit about the competitive landscape here. One of my biggest competitors in optimizing AWS bills is Microsoft Excel—specifically, people are going to go ahead and run it themselves because, “Eh, hiring someone who's really good at this, that sounds expensive. We can screw it up for half the cost.” Which is great. It seems to me that one of your biggest competitors is people running their own control plane, on some level.I don't tend to accept the narrative that, “Oh, EKS is expensive”—that winds up being, what, 35 bucks or 70 bucks or whatever it is per control plane per cluster on a monthly basis. Okay, yes, that's expensive if you're trying to stay completely within a free tier perhaps, but if you're running anything that's even slightly revenue-generating or a for-profit company, you will spend far more than that just on people's time. I have no problems—for once—with the EKS pricing model, start to finish. Good work on that. You've successfully nailed it. But are you seeing significant pushback from the industry of, “Nope, we're going to run our own Kubernetes management system instead because we enjoy pain, corporately speaking.”Eswar: Actually, we are in a good spot there, right? Like, at this point, customers who choose to run Kubernetes on AWS by themselves and not adopt EKS just fall into one main category, so—or two main categories: number one, they have an existing technical stack built on running Kubernetes themselves and they'd rather maintain that and not move to EKS. Or they demand certain custom configurations of the Kubernetes control plane that EKS doesn't support. And those are the only two reasons why we see customers not moving into EKS and preferring to run their own Kubernetes clusters on AWS.[midroll 00:19:46]Corey: It really does seem, on some level, like there's going to be a… I don't want to say reckoning because that makes it sound vaguely ominous and that's not the direction that I intend for things to go in, but there has to be some form of collapsing of the complexity that is inherent to all of this because the entire industry has always done that. An analogy that I fall back on because I've seen this enough times to have the scars to show for it is that in the '90s, running a web server took about a week of spare time and an in-depth knowledge of GCC compiler flags. And then it evolved to ah, I could just unzip a tarball of precompiled stuff, and then RPM or Deb became a thing. And then Yum, or something else, or I guess apt over in the Debian land to wind up wrapping around that. And then you had things like Puppet where it was ensure installed. And now it's Docker Run.And today, it's a checkbox in the S3 console that proceeds to yell at you because you're making a website public. But that's neither here nor there. Things don't get harder with time. But I've been surprised by how I haven't yet seen that sort of geometric complexity collapse around Kubernetes to make it easier to work with. Is that coming or are we going to have to wait for the next cycle of things?Eswar: Let me think. I actually don't have a good answer to that, Corey.Corey: That's good, at least, because if you did, I'd worry that I was just missing something obvious. That's kind of the entire reason I ask. Like, “Oh, good. I get to talk to smart people and see what they're picking up on that I'm absolutely missing.” I was hoping you had an answer, but I guess it's cold comfort that you don't have one off the top of your head. But man, is it confusing.Eswar: Yeah. So, there are some discussions in the community out there, right? Like, is Kubernetes the right layer to interact with? And there is some tooling built on top of Kubernetes—for example, Knative, which tries to provide a serverless layer on top of Kubernetes.
There are also attempts at abstracting Kubernetes completely and providing tooling that just removes any sort of Kubernetes API from the picture—maybe a specific CI/CD-based solution that takes it from the source and deploys the service without even showing you that there's Kubernetes underneath, right?All of these are evolutions that are being tested out there in the community. Time will tell whether these end up sticking. But what's clear here is the gravity around Kubernetes: all sorts of tooling gets built on top of Kubernetes—all the operators, all sorts of open-source initiatives that are built to run on Kubernetes. For example, Spark, for example, Cassandra—so many of these big, large-scale, open-source solutions are now built to run really well on Kubernetes. And that is the gravity that's pushing Kubernetes at this point.Corey: I'm curious to get your take on one other space I would consider interestingly competitive. Now, because I have a domain problem, if you go to kubernetestheeasyway.com, you'll wind up on the ECS marketing page. That's right, the worst competition in the world: the people who work down the hall from you.If someone's considering using ECS, Elastic Container Service, versus EKS, Elastic Kubernetes Service, what is the deciding factor when a customer's making that determination? And to be clear, I'm not convinced there's a right or wrong answer. But I am curious to get your take, given that you have a vested interest, but also presumably don't want to talk complete smack about your colleagues. But feel free to surprise me.Eswar: Hey, I love ECS, by the way. Like I said, I started my life in AWS in ECS. So look, ECS is a hugely successful container orchestration service. I know we talk a lot about Kubernetes, I know there's a lot of discussion around Kubernetes, but I would make it a point that, like, ECS is a hugely successful service. Now, what determines where customers go?If customers are… if the customer's tech stack is entirely on AWS, right, they use a lot of AWS services and they want an easy way to get started in the container world that has really tight integration with other AWS services without them having to configure a lot, ECS is the way, right? And customers have actually seen terrific success adopting ECS for that particular use case. Whereas EKS customers, they start with, “Okay, I want an open-source solution. I really love Kubernetes. I lo—or, I have tooling that I really like in the open-source land that really works well with Kubernetes. I'm going to go that way.” And those kinds of customers end up picking EKS.Corey: I feel like, on some level, Kubernetes has become the default API across a wide variety of environments. AWS obviously, but on-prem and other providers. It seems like even the traditional VPS companies out there that offer just rent-a-server in the cloud somewhere are all also offering, “Oh, and we have a Kubernetes service as well.” I wound up backing a Kickstarter project that runs a Kubernetes cluster with a shared backplane across a variety of Raspberries Pi, for example. And it seems to be almost everywhere you look.Do you think that there's some validity to that approach of effectively whatever it is that we're going to wind up running in the future, it's going to be done on top of Kubernetes, or do you think that that's mostly hype-driven these days?Eswar: It's definitely not hype. Like, we see the proof in the kind of adoption we see.
It's becoming the de facto container orchestration API. And with all the open-source tooling that's continuing to build on top of Kubernetes—the CNCF tooling ecosystem that's spawned to support Kubernetes adoption—all of this is solid proof that Kubernetes is here to stay and is a really strong, powerful API for customers to adopt.Corey: So, four years ago, I had a prediction on Twitter, and I said, “In five years, nobody will care about Kubernetes.” And it was in February, I believe, and every year, I wind up updating and incrementing a link to it, like, “Four years to go,” “Three years to go,” and I believe it expires next year. And I have to say, I didn't really expect when I made that prediction for it to outlive Twitter, but yet, here we are, which is neither here nor there. But I'm curious to get your take on this. But before I wind up just letting you savage the naive interpretation of that, my impression has been that it will not be that Kubernetes has gone away. That is ridiculous. It is clearly in enough places that even if they decided to rip it out now, it would take them ten years, but rather that it's going to slip below the surface level of awareness.Once upon a time, there was a whole bunch of energy and drama and debate around the Linux virtual memory management subsystem. And today, there's, like, a dozen people on the planet who really have to care about that, but for the rest of us, it doesn't matter anymore. We are so far past having to care about that having any meaningful impact in our day-to-day work that it's just, it's the part of the iceberg that's below the waterline. I think that's where Kubernetes is heading. Do you agree or disagree? And what do you think about the timeline?Eswar: I agree with you; that's a perfect analogy. It's going to go the way of Linux, right? It's here to stay; it's just going to get abstracted away if any of the abstraction efforts stick around. And that's where we're testing the waters there. There are many, many open-source initiatives out there trying to abstract Kubernetes. All of these are yet to gain ground, but there are some reasonable efforts being made.And if they are successful, they just end up being a layer on top of Kubernetes. Many of the customers, many of the developers, won't have to worry about Kubernetes at that point, but a certain subset of us in the tech world will need to deal with Kubernetes—most likely teams like mine that end up managing and operating Kubernetes clusters.Corey: So, one last question I have for you is that if there's one thing that AWS loves, it's misspelling things. And you have an open-source offering called Karpenter spelled with a K that is an extension of that tradition. What does Karpenter do and why would someone use it?Eswar: Thank you for that. Karpenter is one of my favorite launches in the last year.Corey: Presumably because you were terrible at the spelling bee back when you were a kid. But please tell me more.Eswar: [laugh]. So, Karpenter is an open-source, flexible, and high-performance cluster auto-scaling solution. So basically, when your cluster needs more capacity to support your workloads, Karpenter automatically scales the capacity as needed. For people that know the Kubernetes space well, there's an existing component called Cluster Autoscaler that fills this space today. And it's our take on, okay, what if we could reimagine the capacity management solution available in Kubernetes? And can we do something better?
Especially for cases where we expect terrific performance at scale, to enable cost efficiency and optimization use cases for our customers, and most importantly, to provide a way for customers not to pre-plan a lot of capacity to begin with.Corey: This is something we see a lot, in the sense of very bursty workloads where, okay, you've got a steady state load. Cool. Buy a bunch of savings plans, get things set up the way you want them, and call it a day. But when it's bursty, there are challenges with it. Folks love using Spot, but in the event of a sudden capacity shortfall, the question is, can we spin up capacity to backfill it within the two minutes of warning that we have? And if the answer is no, then it becomes a bit of a non-starter.Customers have had to build an awful lot of those things around EC2 instances that handle a lot of that logic for them in ways that are tuned specifically for their use cases. I'm encouraged to see there's a Kubernetes story around this that starts to remove some of that challenge from the customer side.Eswar: Yeah. So, the burstiness is where complexity comes in, right? Like, many customers for steady state, they know what their capacity requirements are, they set up the capacity, they can also reason out what is the effective capacity needed for good utilization for economical reasons, and they can actually pre-plan that and set it up. But once burstiness comes in, which it inevitably does for [unintelligible 00:30:05] applications, customers worry about, “Okay, am I going to get the capacity that I need in the time that I need it, to be able to service my customers? And am I confident in it?”If I'm not confident, I'm going to actually allocate capacity beforehand, assuming that I'm going to actually get the burst that I needed. Which means you're paying for resources that you're not using at the moment. And the burstiness might happen, and then you're on the hook to actually reduce the capacity for it once the peak subsides at the end of the day. And this is a challenging situation. And this is one of the use cases that we targeted Karpenter towards.Corey: I find the idea that you're open-sourcing this fascinating, for two reasons. One, it does show a willingness to engage with the community that… again, it's difficult. When you're a big company, people love to wind up taking issue with almost anything that you do. But for another, it also puts it out in the open, on some level, where, especially when you're talking about cost optimization and decisions that affect cost, it's all out in public. So, people can look at this and think, “Wait a minute, it's not—what is this line of code that means if it's toward the end of the month, crank it up because we might need to hit our numbers.” Like, there's nothing like that in there. At least I'm assuming. I'm trusting that other people have read this code because honestly, that seems like a job for people who are better at that than I am. But that does tend to breed a certain element of trust.Eswar: Right. It's one of the first things that we thought about when we said, okay, so we have some ideas here to actually improve the capacity management solution for Kubernetes. Okay, should we do it out in the open? And the answer was a resounding yes, right?
I think there's a good story here that actually enables not just AWS to offer these ideas out there, right—we want to bring it to all sorts of Kubernetes customers.And one of the first things we did is to architecturally figure out all the core business logic of Karpenter—which is, okay, how to schedule better, how quickly to scale, what are the best instance types to pick for this workload. All of that business logic was abstracted out from the actual cloud provider implementation. And the cloud provider implementation is super simple. It's just creating instances, deleting instances, and describing instances. And it's something that we baked in from the get-go so it's easier for other cloud providers to come in and add their support to it. And we as a community can actually take these ideas forward in a much faster way than just AWS doing it.Corey: I really want to thank you for taking the time to speak with me today about all these things. If people want to learn more, where's the best place for them to find you?Eswar: The best place to learn about EKS, right, as EKS evolves, is our documentation; we have an EKS newsletter that you can go subscribe to, and you can also find us on GitHub where we share our product roadmap. So, those are great places to learn about how EKS is evolving and also to share your feedback.Corey: Which is always great to hear, as opposed to, you know, in the AWS Console, where we live, waiting for you to stumble upon us, which, yeah. No, it's good you have a lot of different places for people to engage with you. And we'll put links to that, of course, in the show notes. Thank you so much for being so generous with your time. I appreciate it.Eswar: Corey, really appreciate you having me.Corey: Eswar Bala, Director of Engineering for Amazon EKS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice telling me why, when it comes to tracking Kubernetes costs, Microsoft Excel is in fact the superior experience.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
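For the curious, the Karpenter provisioning policy discussed above is itself just a Kubernetes object. The sketch below uses the v1alpha5 Provisioner API from around the time of this conversation; the resource has since been renamed (later releases use a NodePool API), so treat this as illustrative rather than current:

```yaml
# Illustrative Karpenter Provisioner (karpenter.sh/v1alpha5 API; later
# releases renamed this resource). All values are example placeholders.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot", "on-demand"]   # let Karpenter prefer Spot when available
  limits:
    resources:
      cpu: "1000"                     # cap total provisioned CPU
  ttlSecondsAfterEmpty: 30            # scale empty nodes down quickly
```

The cloud-provider split Eswar describes is visible here: the policy says nothing about EC2 APIs; the provider plugin handles creating, deleting, and describing instances.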
Scott Fanning, Senior Director of Product Management, Cloud Security at CrowdStrike, sits down to talk about the first-ever Dero cryptojacking operation targeting Kubernetes infrastructure. The research defines Dero as "a cryptocurrency that claims to offer improved privacy, anonymity and higher and faster monetary rewards compared to Monero, which is a commonly used cryptocurrency in cryptojacking operations." CrowdStrike was the first organization to discover this Dero operation and has been observing the cryptojacking campaign since the beginning of February 2023. The operation focuses mainly on locating Kubernetes clusters with anonymous access enabled on the Kubernetes API, listening on non-standard ports accessible from the internet. The research can be found here: CrowdStrike Discovers First-Ever Dero Cryptojacking Campaign Targeting Kubernetes
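For defenders wondering what "anonymous access enabled" means in practice: the kube-apiserver can map unauthenticated requests to the system:anonymous user, and whether that user can do anything depends on your RBAC rules and apiserver flags. A hedged sketch of the hardening side, assuming a kubeadm-style static-pod control plane layout:

```yaml
# Fragment of a kube-apiserver static pod manifest (commonly found at
# /etc/kubernetes/manifests/kube-apiserver.yaml on kubeadm-style clusters).
# Setting --anonymous-auth=false rejects unauthenticated requests outright.
spec:
  containers:
  - name: kube-apiserver
    command:
    - kube-apiserver
    - --anonymous-auth=false
    # ...remaining flags unchanged...
```

From the audit side, kubectl auth can-i --list --as=system:anonymous (run by someone with impersonation rights) shows what an unauthenticated caller would be allowed to do, which is a quick way to spot the misconfiguration this campaign hunts for.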
About KelseyKelsey Hightower is the Principal Developer Advocate at Google, the co-chair of KubeCon, the world's premier Kubernetes conference, and an open source enthusiast. He's also the co-author of Kubernetes Up & Running: Dive into the Future of Infrastructure.Links: Twitter: @kelseyhightower Company site: Google.com Book: Kubernetes Up & Running: Dive into the Future of Infrastructure TranscriptAnnouncer: Hello and welcome to Screaming in the Cloud, with your host Cloud Economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined this week by Kelsey Hightower, who claims to be a principal developer advocate at Google, but based upon various keynotes I've seen him in, he basically gets on stage and plays video games like Tetris in front of large audiences. So I assume he is somehow involved with e-sports. Kelsey, welcome to the show.Kelsey: You've outed me. Most people didn't know that I am a full-time e-sports Tetris champion at home. And the technology thing is just a side gig.Corey: Exactly. It's one of those things you do just to keep the lights on, like you're waiting to get discovered, but in the meantime, you're waiting tables. Same type of thing. Some people wait tables; you more or less sling Kubernetes, for lack of a better term.Kelsey: Yes.Corey: So let's dive right into this. You've been a strong proponent for a long time of Kubernetes and all of its intricacies and all the power that it unlocks, and I've been pretty much the exact opposite of that, as far as saying it tends to be overcomplicated, that it's hype-driven, and a whole bunch of other, shall we say, criticisms that are sometimes grounded in reality and sometimes just because I think it'll be funny when I put them on Twitter. Where do you stand on the state of Kubernetes in 2020?Kelsey: So, I want to make sure it's clear what I do. Because when I started talking about Kubernetes, I was not working at Google. I was actually working at CoreOS where we had a competitor to Kubernetes called Fleet. And Kubernetes coming out kind of put this fork in our roadmap, like where do we go from here? What people saw me doing with Kubernetes was basically learning in public. Like, I was really excited about the technology because it's attempting to solve a very complex thing. I think most people will agree building a distributed system is what cloud providers typically do, right? With VMs and hypervisors. Those are very big, complex distributed systems.
And before Kubernetes came out, the closest I'd gotten to a distributed system before working at CoreOS was just reading the various white papers on the subject and hearing stories about how Google had systems like Borg, and tools like Mesos were being used by some of the largest hyperscalers in the world, but I was never going to have the chance to ever touch one of those unless I went to work at one of those companies.So when Kubernetes came out, and the fact that it was open source and I could read the code to understand how it was implemented, to understand how schedulers actually work, and then bonus points for being able to contribute to it—those early years, what you saw me doing was just being so excited about systems that I'd attempted to build on my own becoming this new thing, just like Linux came up. So I kind of agree with you that a lot of people look at it as more of a hype thing. They're looking at it regardless of their own needs, regardless of understanding how it works and what problems it's trying to solve. My stance on it: it's a really, really cool tool for the level that it operates in, and in order for it to be successful, people can't know that it's there.Corey: And I think that might be where part of my disconnect from Kubernetes comes into play. I have a background in ops, more or less, the grumpy Unix sysadmin, because it's not like there's a second kind of Unix sysadmin you're ever going to encounter. Where everything in development works in theory, but in practice things pan out a little differently. I always joke that ops is the difference between theory and practice. In theory, devs can do everything and there's no ops needed. In practice, well, it's been a burgeoning career for a while. The challenge with this is Kubernetes at times exposes certain levels of abstraction that—sorry, certain levels of detail that generally people would not want to have to think about or deal with, while papering over other things with other layers of abstraction on top of it. That obscures valuable troubleshooting information from running something in an operational context. It absolutely is a fascinating piece of technology, but it feels today like it is overly complicated for the use a lot of people are attempting to put it to. Is that a fair criticism from where you sit?Kelsey: So I think the reason why it's a fair criticism is because there are people attempting to run their own Kubernetes cluster, right? So when we think about the cloud, unless you're in OpenStack land, for the people who look at the cloud, you say, "Wow, this is much easier." There's an API for creating virtual machines and I don't see the distributed state store that's keeping all of that together. I don't see the farm of hypervisors. So we don't necessarily think about the inherent complexity of a system like that, because we just get to use it. So on one end, if you're just a user of a Kubernetes cluster, maybe using something fully managed or you have an ops team that's taking care of everything, your interface to the system becomes this Kubernetes configuration language where you say, "Give me a load balancer, give me three copies of this container running." And if we do it well, then you'd think it's a fairly easy system to deal with because you say, "kubectl apply," and things seem to start running.Just like in the cloud where you say, "AWS, create this VM," or "gcloud compute instances create." You just submit API calls and things happen.
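Kelsey's "give me a load balancer, give me three copies of this container running" translates almost literally into the configuration language he's describing. A minimal sketch, with placeholder names and image:

```yaml
# Sketch of the declarative request Kelsey describes: three copies of a
# container, fronted by a load balancer. Names and image are placeholders.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 3
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: nginx   # placeholder image
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  type: LoadBalancer   # the cloud provider integration supplies the actual LB
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 80
```

A single kubectl apply -f on a file like this is the whole workflow, which is exactly the point he makes next about the complexity staying under the hood.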
I think the fact that Kubernetes is very transparent to most people is, now you can see the complexity, right? Imagine everyone driving with the hood off the car. You'd be looking at a lot of moving things, but we have hoods on cars to hide the complexity, and all we expose is the steering wheel and the pedals. That car is super complex but we don't see it. So therefore we don't attribute that complexity to the driving experience.Corey: This to some extent feels like it's on the same axis as serverless, with just a different level of abstraction piled onto it. And while I am a large proponent of serverless, and I think it's fantastic for a lot of Greenfield projects, the constraints inherent to the model mean that it is almost completely untenable for a tremendous number of existing workloads. Some developers like to call it legacy, but when I hear the term legacy I hear, "it makes actual money." So just treating it as, "Oh, it's a science experiment we can throw into a new environment, spend a bunch of time rewriting it for minimal gains," is just not going to happen as companies undergo digital transformations, if you'll pardon the term.Kelsey: Yeah, so I think you're right. So let's take Amazon's Lambda, for example: it's a very opinionated, high-level platform that assumes you're going to build apps a certain way. And if that's you, look, go for it. Now, one or two levels below that there is this distributed system. Kubernetes decided to play in that space because everyone that's building other platforms needs a place to start. The analogy I like to think of is like in the mobile space: iOS and Android deal with the complexities of managing multiple applications on a mobile device, security aspects, app stores, that kind of thing. And then you as a developer, you build your thing on top of those platforms and APIs and frameworks. Now, it's debatable; someone would say, "Why do we even need an open-source implementation of such a complex system? Why not just have everyone move to the cloud?" And then everyone that's not in a cloud, on-premises, gets left behind.
But it's not something that people have to think about on an ongoing basis the way it feels like we do today.Kelsey: Yeah, I mean, to me, I kind of see this as the natural evolution, right? It's new, it gets a lot of attention, and kind of the assumption you make in that statement is there's something better that should be able to arise, given that checkpoint. If this is what people think is hot, within five years surely we should see something else that can be deserving of that attention, right? Docker comes out and almost four or five years later you have Kubernetes. So it's obvious that there should be a progression here that steals some of the attention away from Kubernetes, but it's so new, right? It's only five years in; Linux is over 20 years old now at this point, and it's still top of mind for a lot of people, right? Microsoft is still porting a lot of Windows-only things into Linux, so we still discuss the differences between Windows and Linux.The idea that the cloud, for the most part, is driven by Linux virtual machines—I think the majority of workloads run on virtual machines still to this day—so it's still front and center, especially if you're a system administrator managing VMs, right? You're dealing with tools that target Linux, you know the sysctl interface, and you're thinking about how to secure it and lock it down. Kubernetes is just at the very first part of that life cycle where it's new. We're all interested in even what it is and how it works, and now we're starting to move into that next phase, which is the distro phase. Like in Linux, you had Red Hat, Slackware, Ubuntu, special-purpose distros.Some will consider Android a special-purpose distribution of Linux for mobile devices. And now that we're in this distro phase, that's going to go on for another 5 to 10 years where people start to align themselves around, maybe it's OpenShift, maybe it's GKE, maybe it's Fargate for EKS. These are now distributions built on top of Kubernetes that start to add a little bit more opinionation about how Kubernetes should be put together. And then we'll enter another phase where you'll build a platform on top of Kubernetes, but it won't be worth mentioning that Kubernetes is underneath, because people will be more interested in the thing above.Corey: I think we're already seeing that now, in terms of people no longer really caring that much what operating system they're running, let alone what distribution of that operating system. The things that you have to care about slip below the surface of awareness, and we've seen this for a long time now. Originally, to install a web server, it wound up taking a few days and an intimate knowledge of GCC compiler flags, then RPM or dpkg, and then yum on top of that, then ensure installed once we had configuration management that was halfway decent.Then Docker run, whatever it is. And today, with serverless technologies being what they are, it's effectively push a file to S3 or its equivalent somewhere else and you're done. The things that people have to be aware of shrink, and the barrier to entry continually lowers. The downside to that, of course, is that things that people specialize in today and effectively make very lucrative careers out of are not going to be front and center in 5 to 10 years the way that they are today. And that's always been the way of technology.
It's a treadmill to some extent.Kelsey: And on the flip side of that, look at all of the new jobs that are centered around these cloud-native technologies, right? So you know, we're just going to make up some numbers here: imagine if there were only 10,000 jobs around just Linux system administration. Now when you look at this whole Kubernetes landscape, where people are saying we can actually do a better job with metrics and monitoring—observability is now a thing culturally that people assume you should have, because you're dealing with these distributed systems—the ability to start thinking about multi-regional deployments, when I think that would've been infeasible with the previous tools, or you'd have to build all those tools yourself. So I think now we're starting to see a lot more opportunities, where instead of 10,000 people, maybe you need 20,000 people, because now you have the tools necessary to tackle bigger projects where you didn't see that before.Corey: That's what's going to be really neat to see. But the challenge is always to people who are steeped in existing technologies: what does this mean for them? I mean, I spent a lot of time early in my career fighting against cloud because I thought that it was taking away a cornerstone of my identity. I was a large-scale Unix administrator, specifically focusing on email. Well, it turns out that there aren't nearly as many companies that need to have that particular skill set in house as there were 10 years ago. And what we're seeing now is this sort of forced evolution of people's skillsets, or they hunker down on a particular area of technology or particular application to try and make a bet that they can ride that out until retirement. It's challenging, but at some point it seems that some folks like to stop learning, and I don't fully pretend to understand that. I'm sure I will someday where, "No, at this point technology has come far enough. We're just going to stop here, and anything after this is garbage." I hope not, but I can see a world in which that happens.Kelsey: Yeah, and I also think one thing that we don't talk a lot about in the Kubernetes community is that Kubernetes makes hyper-specialization worth doing, because now you start to have a clear separation of concerns. Now the OS can be hyperfocused on security and system calls and not necessarily packaging every programming language under the sun into a single distribution. So we can kind of move part of that layer out of the core OS and start to just think about the OS being a security boundary where we try to lock things down. And for some people that play at that layer, they have a lot of work ahead of them in locking down these system calls, improving the idea of containerization, whether that's something like Firecracker or some of the work that you see VMware doing; that's going to be a whole class of hyper-specialization. And the reason why they're going to be able to focus now is because we're starting to move into a world, whether that's serverless or the Kubernetes API.We're saying we should deploy applications that don't target machines. I mean, just that step alone is going to allow for so much specialization at the various layers, because even the networking front, which arguably has been a specialization up until this point, can truly specialize, because now the IP assignments, how networking fits together, has also been abstracted away one more step, where you're not asking for interfaces or binding to a specific port or playing with port mappings.
You can now let the platform do that. So I think for some of the people who may not be as interested in moving up the stack, they need to be aware that the number of people we need being hyper-specialized at Linux administration will definitely shrink. And a lot of that work will move up the stack, whether that's Kubernetes or managing a serverless deployment and all the configuration that goes with that. But if Linux is your bread and butter, I think there's going to be an opportunity to go super deep, but you may have to expand into things like security and not just things like configuration management.Corey: Let's call it the unfulfilled promise of Kubernetes. On paper, I love what it hints at being possible. Namely, if I build something that runs well on top of Kubernetes, then we truly have a write once, run anywhere type of environment. Stop me if you've heard that one before, 50,000 times in our industry... or history. But in practice, as has happened before, it seems like it tends to fall down for one reason or another. Now, Amazon is famous for many reasons, but the one that I like to pick on them for is, you can't say the word multi-cloud at their events. Right. That'll change people's perspective; good job. People tend to see multi-cloud through a couple of different lenses.I've been rather anti-multi-cloud from the perspective of the idea that you're setting out day one to build an application with the idea that it can be run on top of any cloud provider, or even on-premises if that's what you want to do, is generally not the way to proceed. You wind up having to make certain trade-offs along the way, you have to rebuild anything that isn't consistent between those providers, and it slows you down. Kubernetes, on the other hand, hints at, if it works and fulfills this promise, you can suddenly abstract an awful lot beyond that and just write generic applications that can run anywhere. Where do you stand on the whole multi-cloud topic?Kelsey: So I think we have to make sure we talk about the different layers that are kind of ready for this thing. So for example, like multi-cloud networking, we just call that networking, right? What's the IP address over there? I can just hit it. So we don't make a big deal about multi-cloud networking. Now there's an area where people say, how do I configure the various cloud providers? And I think the healthy way to think about this is, in your own data centers, right—so we know a lot of people have investments on-premises. Now, if you were to take the mindset that you only need one provider, then you would try to buy everything from HP, right? You would buy HP storage devices, you buy HP racks, power. Maybe HP doesn't sell air conditioners. So you're going to have to buy an air conditioner from a vendor who specializes in making air conditioners, hopefully for a data center and not your house.So now you've entered this world where one vendor doesn't make every single piece that you need. Now in the data center, we don't say, "Oh, I am multi-vendor in my data center." Typically, you just buy the switches that you need, you buy the power racks that you need, you buy the ethernet cables that you need, and they have common interfaces that allow them to connect together, and they typically have different configuration languages and methods for configuring those components. The cloud, on the other hand, also represents the same kind of opportunity.
There are some people who really love DynamoDB and S3, but then they may prefer something like BigQuery to analyze the data that they're uploading into S3. Now, if this was a data center, you would just buy all three of those things and put them in the same rack and call it good.But the cloud presents this other challenge. How do you authenticate to those systems? And then there's usually this additional networking cost, egress or ingress charges, that makes it prohibitive to say, "I want to use two different products from two different vendors." And I think that's-Corey: ...winds up causing serious problems.Kelsey: Yes, so that data gravity, the associated cost, becomes a little bit more in your face. Whereas in a data center you kind of feel that the cost has already been paid. I already have a network switch with enough bandwidth, I have an extra port on my switch to plug this thing in, and they're all standard interfaces. Why not? So I think the multi-cloud gets lost in the glue problem, which is the barrier to entry of leveraging things across two different providers because of networking and configuration practices.Corey: That's often the challenge, I think, that people get bogged down in. On an earlier episode of this show we had Mitchell Hashimoto on, and his entire theory around using Terraform to wind up configuring various bits of infrastructure was not the idea of workload portability, because that feels like the windmill we all keep tilting at and failing to hit, but instead the idea of workflow portability, where different things can wind up being interacted with in the same way. So if this one division is on one cloud provider, the others are on something else, then you at least can have some points of consistency in how you interact with those things. And in the event that you do need to move, you don't have to effectively redo all of your CI/CD process, all of your tooling, et cetera. And I thought that there was something compelling about that argument.Kelsey: And that's actually what Kubernetes does for a lot of people. For Kubernetes, if you think about it, when we start to talk about workflow consistency: if you want to deploy an application, kubectl apply, some config. You want the application to have a load balancer in front of it—regardless of the cloud provider, because Kubernetes has an extension point we call the cloud provider. And that's where Amazon, Azure, Google Cloud—we do all the heavy lifting of mapping the high-level ingress object that specifies, "I want a load balancer, maybe a few options," to the actual implementation detail. So maybe you don't have to use four or five different tools, and that's where that kind of workload portability comes from. Like, if you think about Linux, right? It has a set of system calls, for the most part, even if you're using a different distro at this point—Red Hat or Amazon Linux or Google's container-optimized Linux.If I build a Go binary on my laptop, I can SCP it to any of those Linux machines and it's probably going to run. So you could call that multi-cloud, but that doesn't make a lot of sense, because it's just because of the way Linux works. Kubernetes does something very similar, because it sits right on top of Linux, so you get the portability just from the previous example, and then you get the other portability in workload, like you just stated, where I'm calling kubectl apply, and I'm using the same workflow to get resources spun up on the various cloud providers.
Even if that configuration isn't one-to-one identical.Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: One thing I'm curious about is, you wind up walking through the world and seeing companies adopting Kubernetes in different ways. How is the adoption of Kubernetes looking inside of big-E enterprise style companies? I don't have as much insight into those environments as I probably should. That's sort of a focus area for the next year for me. But in startups, it seems that it's either someone goes in and rolls it out and suddenly it's fantastic, or they avoid it entirely and do something serverless. In large enterprises, I see a lot of Kubernetes and a lot of Kubernetes stories coming out of it, but what isn't usually told is, what's the tipping point where they say, "Yeah, let's try this." Or, "Here's the problem we're trying to solve for. Let's chase it."Kelsey: What I see is enterprises buy everything. If you're big enough and you have a big enough IT budget, most enterprises have a POC of everything that's for sale, period. There's some team in some pocket, maybe they came through via acquisition. Maybe they live in a different state. Maybe it's just a new project that came out. And what you tend to see, at least from my experiences, if I walk into a typical enterprise, they may tell me something like, "Hey, we have a POC of Pivotal Cloud Foundry, OpenShift, and we want some of that new thing that we just saw from you guys. How do we get a POC going?" So there's always this appetite to evaluate what's for sale, right? So, that's one case. There's another case where, when you start to think about an enterprise, there's a big range of skillsets. Sometimes I'll go to some companies like, "Oh, my insurance is through that company, and there's ex-Googlers that work there." They used to work on things like Borg, or something else, and they kind of know how these systems work.And they have a slightly better edge at evaluating whether Kubernetes is any good for the problem at hand. And you'll see them bring it in. Now that same company, I could drive over to the other campus, maybe it's five miles away, and that team doesn't even know what Kubernetes is. And for them, they're going to be chugging along with what they're currently doing. So then the challenge becomes, if Kubernetes is a great fit, how wide of a fit is it? How many teams at that company should be using it? So what I'm currently seeing is there are some enterprises that have found a way to make Kubernetes the place where they do a lot of new work, because that makes sense. A lot of enterprises, to my surprise though, are actually stepping back and saying, "You know what? We've been stitching together our own platform for the last five years. We had the Netflix stack, we got some Spring Boot, we got Consul, we got Vault, we got Docker.
And now this whole thing is getting a little more fragile because we're doing all of this glue code." We've been trying to build our own Kubernetes, and now that we know what it is and we know what it isn't, we know that we can probably get rid of this kind of bespoke stack, just because of the ecosystem, right? If I go to HashiCorp's website, I would probably find the word Kubernetes as much as I find the word Nomad on their site, because they've made things like Consul and Vault first-class offerings inside of the world of Kubernetes. So I think it's that momentum that you see across even people like Oracle, Juniper, Palo Alto Networks—they all seem to have a Kubernetes story. And this is why you start to see the enterprise able to adopt it: because it's so much in their face and it's where the ecosystem is going.
Corey: It feels like a lot of the excitement and the promise, and even the same problems that Kubernetes is aimed at today, could have just as easily been talked about half a decade ago in the context of OpenStack. And for better or worse, OpenStack is nowhere near where it once was. It felt like it had such promise and such potential, and when it didn't pan out, that left a lot of people feeling relatively sad, burnt out, depressed, et cetera. And I'm seeing a lot of parallels today, at least, between what was said about OpenStack and what is said about Kubernetes. How do you see those two diverging?
Kelsey: I will tell you the big difference that I saw, personally—just from my personal journey outside of Google, just having that option. I remember I was working at a company and we were like, "We're going to roll our own OpenStack. We're going to buy a FreeBSD box and make it a file server. We're going all open source"—like, do whatever you want to do. And that was just having so many issues in terms of first-class integrations, education, people with the skills to even do that. And I was like, "You know what, let's just cut the check for VMware." We want virtualization; VMware, for the cost and what it does, is good enough. Or we can just actually use a cloud provider. That space, in many ways, was a purely solved problem. Now, let's fast forward to Kubernetes. Also, when you get OpenStack finished, you're just back where you started: you've got a bunch of VMs and now you've got to go figure out how to build the real platform that people want to use, because no one just wants a VM. If you think Kubernetes is low-level, with OpenStack—even if OpenStack were perfect—you're still at square one for the most part. Maybe you can say, "Now I'm paying a little less money for my stack in terms of software licensing costs," but from an abstraction and automation and API standpoint, I don't think OpenStack moved the needle in that regard. Now, in the Kubernetes world, it's solving a huge gap. Lots of people had virtual machine sprawl, then they had Docker sprawl, and when you bring in a thing like Kubernetes, it says, "You know what? Let's rein all of that in. Let's build some first-class abstractions, assuming that the layer below us is a solved problem." You've got to remember, when Kubernetes came out, it wasn't trying to replace the hypervisor; it assumed it was there. It also assumed that the hypervisor had APIs for creating virtual machines and attaching disks and creating load balancers, so Kubernetes came out as a complementary technology, not one looking to replace.
And I think that's why it was able to stick: because it solved a problem at another layer where there was not a lot of competition.
Corey: I think a more cynical take, at least one of the ones that I've heard articulated and I tend to agree with, was that OpenStack originally seemed super awesome because there were a lot of interesting people behind it, fascinating organizations. But then you wound up looking through the backers of the foundation behind it and the rest, and there were something like 500 companies behind it, and an awful lot of them were these giant organizations—big corporate IT enterprise software vendors, and I'm not going to name anyone, because at that point, oh, will we get letters. But at that point, you start seeing so many of their patterns being worked into it that it almost feels like it has to collapse under its own weight. I don't, for better or worse, get the sense that Kubernetes is succumbing to the same thing, despite the CNCF having an awful lot of those same backers behind it and, as far as I can tell, significantly more money—they seem to have all the money to throw at these sorts of things. So I'm wondering how Kubernetes has managed to effectively sidestep, I guess, the open-source miasma that OpenStack didn't quite manage to avoid.
Kelsey: Kubernetes gained its own identity before the foundation existed. Its purpose—if you think back to the Borg paper, almost eight years prior, maybe even ten years prior—defined this problem really, really well. I think Mesos came out and also had a slightly different take on this problem. And you could just see, at that time, there was a real need: you had choices between Docker Swarm, Nomad. It seems like everybody was trying to fill in this gap, because across most verticals or industries, this was a true problem worth solving. What Kubernetes did was play in the exact same sandbox, but it got put out with experience. It's not like, "Oh, let's just copy this thing that already exists, but let's just make it open." In that case, you don't really have your own identity: it's you versus Amazon; in the case of OpenStack, it's you versus VMware. And that's just a really hard place to be in, because you don't have an identity that stands alone. Kubernetes itself had an identity that stood alone. It comes from the experience of running a system like this. It comes from research and white papers. It comes after previous attempts at solving this problem. So we agree that this problem needs to be solved, we know what layer it needs to be solved at—we just didn't get it right yet—so Kubernetes didn't necessarily try to get it right. It tried to start with only the primitives necessary to focus on the problem at hand. Now, to your point, the extension interface of Kubernetes is what keeps it small. Years ago, I remember plenty of meetings where we all got in rooms and said, "This thing is done." It doesn't need to be a PaaS. It doesn't need to compete with serverless platforms. The core of Kubernetes, like Linux, is largely done. Here are the core objects, and we're going to make a very great extension interface. We're going to make one at the container runtime level, so people can swap that out if they really want to, and we're going to make one that lets other APIs be as first-class as the ones we have, and we don't need to try to boil the ocean in every Kubernetes release.
Everyone else has the ability to deploy extensions, just like Linux, and I think that's why we're avoiding some of this tension in the vendor world: because you don't have to change the core to get something that feels like a native part of Kubernetes.
Corey: What do you think is currently the most misinterpreted or misunderstood aspect of Kubernetes in the ecosystem?
Kelsey: I think the biggest thing that's misunderstood is what Kubernetes actually is. And the thing that made it click for me, especially when I was writing the tutorial Kubernetes the Hard Way: I had to sit down and ask myself, "Where do you start trying to learn what Kubernetes is?" So I start with the database, right? The configuration store isn't Postgres, it isn't MySQL, it's etcd. Why? Because we're not trying to be a generic data store platform; we just need to store configuration data. Great. Now, do we let all the components talk to etcd? No. We have this API server, and between the API server and the chosen data store, that's essentially what Kubernetes is. You can stop there. At that point, you have a valid Kubernetes cluster and it can understand a few things. Like, I can say, using the Kubernetes command-line tool, "Create this config map that stores configuration data," and I can read it back. Great. Now I can't do a lot of interesting things with that—maybe I just use it as a configuration store—but then if I want to build a container platform, I can install the Kubernetes kubelet agent on a bunch of machines and have it talk to the API server looking for other objects; you add in the scheduler and all the other components. So what that means is that Kubernetes' most important component is its API, because that's how the whole system is built. It's actually a very simple system when you think about just those two components in isolation. If you want a container management tool, you need a scheduler, controller manager, cloud provider integrations, and now you have a container tool. But let's say you want a service mesh platform. Well, in a service mesh you have a data plane—that can be NGINX or Envoy—and that's going to handle routing traffic. And you need a control plane: something that takes in configuration and uses it to configure all the things in the data plane. Well, guess what? Kubernetes is 90% there in terms of a control plane with just those two components, the API server and the data store. So now, when you want to build control planes, if you start with the Kubernetes API—we call it the API machinery—you're going to be 95% there. And then what do you get? You get a distributed system that can handle failures on the backend, thanks to etcd. You get RBAC, so you can have permissions on top of your schemas. And there's a built-in framework—we call it custom resource definitions—that allows you to articulate a schema, and then your own control loops provide meaning to that schema. And once you do those two things, you can build any platform you want. And I think that's one thing that takes a while for people to understand about Kubernetes: that the thing we talk about today, for the most part, is just the first system that we built on top of this.
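As a rough illustration of that "API server plus etcd is already a useful system" point, here is a hedged client-go sketch in Go; the ConfigMap name and data are made-up examples, and it assumes a kubeconfig pointing at a cluster. It stores configuration data and reads it back through the API machinery, with no kubelet or scheduler required:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// Store configuration data: the API server validates the object
	// and persists it to etcd.
	cm := &corev1.ConfigMap{
		ObjectMeta: metav1.ObjectMeta{Name: "app-settings"},
		Data:       map[string]string{"feature-flag": "on"},
	}
	if _, err := client.CoreV1().ConfigMaps("default").Create(context.TODO(), cm, metav1.CreateOptions{}); err != nil {
		panic(err)
	}

	// Read it back through the same API machinery.
	got, err := client.CoreV1().ConfigMaps("default").Get(context.TODO(), "app-settings", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println(got.Data["feature-flag"])
}
```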
Corey: I think that's a very far-reaching story, with implications that I'm not entirely sure I'm able to wrap my head around. I hope to see it, I really do. I mean, you mentioned writing Kubernetes the Hard Way, your tutorial, which I'll link to in the show notes. My, of course, sarcastic response to that recently was to register the domain Kubernetes the Easy Way and just point it at Amazon's ECS, which is in no way, shape, or form Kubernetes, and basically has the effect of irritating absolutely everyone, as is my typical pattern of behavior on Twitter. But I have been meaning to dive into Kubernetes on a deeper level, and the stuff that you've written—not just the online tutorial; both the books—have always been my first port of call when it comes to that. The hard part, of course, is there's just never enough hours in the day.
Kelsey: And one thing that I think about, too, is the web. We have the internet: there are web pages, there are web browsers. Web browsers talk to web servers over HTTP. There are verbs, there are bodies, there are headers. And if you look at it, that's a very big, complex system. If I were to extract out the protocol pieces—this concept of HTTP verbs (GET, PUT, POST, and DELETE), this idea that I can put stuff in a body and I can give it headers for other meaning and semantics—if I just take those pieces, I can build RESTful APIs. Hell, I can even build GraphQL. And those are just different systems built on the same API machinery that we call the internet, or the web, today. But you have to really dig into the details and pull that part out, and then you can build all kinds of other platforms, and I think that's what Kubernetes is. It's probably going to take people a little while longer to see that piece, but it's hidden in there, and that's the piece that's going to be, like you said, the foundation for building more control planes. And when people build control planes—if you think about it, maybe Fargate for EKS represents another control plane for making a serverless platform that talks the Kubernetes API, even though the implementation isn't what you find on GitHub.
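A small sketch of that web analogy, under the assumption that `kubectl proxy` is running locally on its default port (the proxy handles authentication for you): the Kubernetes API is then plain REST over HTTP, and the standard verbs behave exactly as Kelsey describes. The namespace and resource path below are the standard core-v1 ConfigMaps collection endpoint:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// GET is "read"; POST, PUT, PATCH, and DELETE map onto
	// create/replace/update/remove, just like any other RESTful API
	// built on the web's machinery.
	resp, err := http.Get("http://127.0.0.1:8001/api/v1/namespaces/default/configmaps")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body)) // a JSON ConfigMapList object
}
```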
Corey: That's the truth. Whenever you see something as broadly adopted as Kubernetes, there's always the question of, "Okay, there's an awful lot of blog posts"—getting-started guides, learn-it-in-10-minutes pieces. I mean, at some point, I'm sure there are some people still convinced Kubernetes is, in fact, a breakfast cereal, based upon some of the stuff the CNCF has gotten up to. I wouldn't necessarily bet against it: socks today, breakfast cereal tomorrow. But it's hard to find a decent level of quality; finding a certain quality bar, a trusted source to get started with, is important. Some people believe in the hero's journey style of narrative building. I always prefer to go with the moron's journey, because I'm the moron. I touch technologies, I have no idea what they do, I figure it out, and I go careening into edge and corner cases constantly. And by the end of it, I have something that vaguely sort of works and my understanding has improved. But I've gone down so many terrible paths just by picking a bad point to get started. So everyone I've talked to who's actually good at things has pointed to your work in this space as being something that is authoritative and largely correct, and, given some of these people, that's high praise.
Kelsey: Awesome. I'm going to put that on my next performance review as evidence of my success and impact.
Corey: Absolutely. Grouchy people say, "It's all right." You know, from the right people, that counts. If people want to learn more about what you're up to and see what you have to say, where can they find you?
Kelsey: I aggregate most of my outward interactions on Twitter, so I'm @KelseyHightower, and my DMs are open, so I'm happy to field any questions, and I attempt to answer as many as I can.
Corey: Excellent. Thank you so much for taking the time to speak with me today. I appreciate it.
Kelsey: Awesome. I was happy to be here.
Corey: Kelsey Hightower, Principal Developer Advocate at Google. I'm Corey Quinn. This is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple Podcasts. If you've hated this podcast, please leave a five-star review on Apple Podcasts and then leave a funny comment. Thanks.
Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Corey at screaminginthecloud.com or wherever fine snark is sold.
Announcer: This has been a HumblePod production. Stay humble.
About Chen
Chen Goldberg is GM and Vice President of Engineering at Google Cloud, where she leads the Cloud Runtimes (CR) product area, helping customers deliver greater value, effortlessly. The CR portfolio includes both serverless and Kubernetes-based platforms on Google Cloud, private cloud, and other public clouds. Chen is a strong advocate for customer empathy, building products and solutions that matter. Chen has been core to Google Cloud's open-core vision since she joined the company six years ago. During that time, she has led her team to focus on helping development teams increase their agility and modernize workloads. Prior to joining Google, Chen wore different hats in the tech industry, including leadership positions in IT organizations, SI teams, and software product development, contributing to Chen's broad enterprise perspective. She enjoys mentoring IT talent both in and outside of Google. Chen lives in Mountain View, California, with her husband and three kids. Outside of work she enjoys hiking and baking.
Links Referenced:
Twitter: https://twitter.com/GoldbergChen
LinkedIn: https://www.linkedin.com/in/goldbergchen/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Forget everything you know about SSH and try Tailscale. Imagine if you didn't need to manage PKI or rotate SSH keys every time someone leaves. That'd be pretty sweet, wouldn't it? With Tailscale SSH, you can do exactly that. Tailscale gives each server and user device a node key to connect to its VPN, and it uses the same node key to authorize and authenticate SSH. Basically, you're SSHing the same way you manage access to your app. What's the benefit here? Built-in key rotation, permissions as code, connectivity between any two devices, reduced latency, and there's a lot more—but there's a time limit here. You can also ask users to reauthenticate for that extra bit of security. Sounds expensive? Nope, I wish it were. Tailscale is completely free for personal use on up to 20 devices. To learn more, visit snark.cloud/tailscale. Again, that's snark.cloud/tailscale.
Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. When I get bored and the power goes out, I find myself staring at the ceiling, figuring out how best to pick fights with people on the internet about Kubernetes. Because, well, I'm basically sad and have a growing collection of personality issues. My guest today is probably one of the best people to have those arguments with. Chen Goldberg is the General Manager of Cloud Runtimes and VP of Engineering at Google Cloud. Chen, thank you for joining me today.
Chen: Thank you so much, Corey, for having me.
Corey: So, Google has been doing a lot of very interesting things in the cloud, and the more astute listener will realize that "interesting" is not always necessarily a compliment. But from where I sit, I am deeply invested in the idea of a future where we do not have a cloud monoculture. As I've often said, I want "What cloud should I build something on in five to ten years?" to be a hard question to answer, and not just because everything is terrible.
I think that Google Cloud is absolutely a bright light in the cloud ecosystem and has been for a while, particularly with this emphasis around developer experience. All of that said, Google Cloud is sort of a big, unknowable place, at least from the outside. What is your area of responsibility? Where do you start? Where do you stop? In other words, what can I blame you for?
Chen: Oh, you can blame me for a lot of things if you want to. I [laugh] might not agree with that, but that's—
Corey: We strive for accuracy in these things, though.
Chen: But that's fine. Well, first of all, I joined Google about seven years ago to lead the Kubernetes and GKE team, and I've continued in the same area ever since. So that evolved, of course—Kubernetes and Google Kubernetes Engine, and leading our hybrid and multi-cloud strategy as well, with technologies like Anthos. And now I'm responsible for the entire container runtime, which includes Kubernetes and the serverless solutions.
Corey: A while back, I, in fairly typical sarcastic form, wound up doing a whole inadvertent start of a meme where I joked about there being 17 ways to run containers on AWS. And then, as that caught on, I wound up listing out 17 services you could use to do that. A few months went past, and then I published a sequel of 17 more services you can use to run Kubernetes. And while that was admittedly tongue-in-cheek, it does lead to an interesting question that's ecosystem-wide. If I look at Google Cloud, I have Cloud Run, I have GKE, I have GCE if I want to do some work myself. It feels like more and more services are supporting Docker in a variety of different ways. How should customers and/or people like me—though I am sort of a customer as well, since I do pay you folks every month—how should we think about containers and the services in which to run them?
Chen: First of all, I think there's a lot of credit that needs to go to Docker, which made containers approachable. Google has been running containers forever—everything within Google is running on containers, even our VMs, even our cloud is running on containers—but what Docker did was create a packaging mechanism to improve developer velocity. That, on its own, is great. And one of the things, by the way, that I love about Google Cloud's approach to containers and Docker is that you can take your Docker container and run it anywhere. That's actually really important to ensure what we call interoperability, or a low barrier to entry to a new technology: I can take my Docker container, I can move it from one platform to another, and so on. So, that's just to start with, on containers. Between the different solutions—first of all, I'm all about managed services. You are right, there are many ways to run Kubernetes. I'm taking a lot of pride—
Corey: The best way is always to have someone else run it for you. Problem solved. Great, the best kinds of problems are always someone else's.
Chen: Yes. And I'm taking a lot of pride in what our team is doing with Kubernetes. I mean, we've been working on that for so long. And it's something that, you know—we coined that term, I think back in 2016—so there is "success disaster," but there's also what we call "sustainable success": thinking about how to set ourselves up for success and scale. Very proud of that service. Saying that, not everybody, and not all your workloads, needs the flexibility that Kubernetes and its ecosystem give you. So, if you're starting with containers for your first time, you should start with Cloud Run. It's the easiest way to run your containers. That's one. If you are already in love with Kubernetes, we won't take it away from you: start with GKE. Okay [laugh]? Go all-in. We are all-in, loving Kubernetes as well. But what my team and I are working on is making sure that those work really well together. And we actually see a lot of customers do that.
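For a sense of how low that Cloud Run bar is, here is a minimal sketch of the kind of container it expects: a stateless HTTP server listening on the port passed in the PORT environment variable, which is Cloud Run's documented contract. Everything else here—the handler, the message—is illustrative, and the same image runs unchanged on GKE, which is the mix-and-match Chen describes:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"os"
)

func main() {
	// Cloud Run injects PORT; default to 8080 for local runs.
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "Hello from a container")
	})

	log.Printf("listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, nil))
}
```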
Corey: I'd like to go back a little bit in history to the rise of Docker. I agree with you it was transformative, but containers had been around in various forms—depending upon how you want to define them—dating back to the '70s, with logical partitions on mainframes. Well, is that a container? Is it not? Well, sort of. We'll assume yes for the sake of argument. The revelation that I found from Docker was the developer experience, start to finish. Suddenly, it was a couple of commands and you were just working, where previously it had taken tremendous amounts of time and energy to get containers working in that same context. And I don't even know today whether the right way to contextualize containers is as a sort of lite version of a VM, as a packaging format, or as a number of other things you could reasonably call them. How do you think about containers?
Chen: So, I'm going to do, first of all, a small [unintelligible 00:06:31]. I actually started my career as a mainframe systems engineer—
Corey: Hmm.
Chen: And I will share that when I learned Kubernetes, I thought, "Huh, we already did all of that—orchestration, workload management—on mainframes." Just as an aside. The way I think about containers is as two things: one, it is the packaging of an application; but the other thing, which is also critical, is the decoupling between your application and the OS. So, having that kind of abstraction allows you to be portable and move between environments. Those are the two things I think about when I think about containers. And what technologies like Kubernetes and serverless give you on top of that is manageability: making sure that we take care of everything else that is needed for you to run your application.
Corey: I've been, how do I put this, getting some grief over the past few years, in the best ways possible, around an almost off-the-cuff prediction that I made, which was that in five years—which is now a lot closer to two—basically nobody is going to care about Kubernetes. And I could have phrased that slightly more directly, because people think I was trying to say, "Oh, Kubernetes is just hype. It's going to go away. Nobody's going to worry about it anymore." And I think that is a wildly inaccurate prediction. My argument is that people are not going to have to think about it in the same way that they do today. Today, if I go out and want to go back to my days of running production services in anger—and by 'anger,' I of course mean in production—then it would be difficult for me to find a role that did not at least touch upon Kubernetes. But people who can work with that technology effectively are in high demand, and they tend to be expensive—not to mention all of the intricacies and complexities that Kubernetes brings to the foreground. That is what doesn't feel sustainable to me. The idea that it's going to have to collapse down into something else is, by necessity, going to have to emerge. How are you seeing that play out? And also, feel free to disagree with the prediction.
I am thrilled to wind up being told that I'm wrong; it's how I learn the most.
Chen: I don't know if I agree with the time horizon of when that will happen, but I will actually say it's a failure on us if that doesn't become the truth—that the majority of people will not need to know about Kubernetes and its internals. And you know, we keep saying that: hey, we need to make it more boring and easy, and I've just said, "Hey, you should use managed." And we have lots of customers that say they're just using GKE and it scales on their behalf and they don't need to do anything for that, and it's just like magic. But from a technology perspective, there is still a way to go until we can make that disappear. And there will be two things that push us in that direction. One is—you mentioned this as well—the talent shortage is real. All the customers that I speak with, even if they can find those great people who are experts, there are actually more interesting things for them to work on, okay? You don't need to take all the people in your organization and put them on building the infrastructure. You don't care about that. You want to build innovation and promote your business. So, that's one. The second thing is that I do expect that the technology will continue to evolve, and our managed solutions will be better and better. So hopefully, with these two things happening together, people will not care that what's under the hood is Kubernetes. Or maybe not even, right? I don't know exactly how things will evolve.
Corey: From where I sit, the early criticisms I had about Docker—which I guess translate pretty well to Kubernetes—fade against the few extraordinarily painful problems it solved. In the case of Docker, it was, "Well, it works on my machine." As a grumpy sysadmin, the way I used to be, the only real response we had to that was, "Well. Time to back up your email, Skippy, because your laptop is going into production, then." Now, you can effectively have a high-fidelity copy of production basically anywhere, and we've solved the problem of making your Mac laptop look like a Linux server. Great, okay, awesome. With Kubernetes, it also feels, on some level, like it solves for very large-scale, Google-type problems, where you want to run things past at least a certain point of scale. It feels like even today, it suffers from not having an easy Hello World-style application to deploy on top of it. Using it for WordPress, or some other form of blogging software, for example, is stupendous overkill as far as the Hello World story tends to go. Increasingly, as a result, it feels like it's great for the large-scale, enterprise-y applications, but the getting-started story of "how do I have a service I could reasonably run in production?"—how do you contextualize that in the world of Kubernetes? How do you respond to that type of perspective?
Chen: We'll start with maybe a short story. I started my career in the Israeli army. I was head of a department in one of the leading technology units, and I was responsible for building a PaaS. In essence—it was 20-plus years ago, so we didn't really call it a PaaS, but that's what it was. And at some point, it was amazing: developers were very productive, we got innovation again and again. And then there was some new innovation, right at the beginning of the web [laugh], at some point. And there were actually two things I noticed back then.
One, it was really hard to evolve the platform to allow new technologies and innovation; and second, from a developer perspective, it was like a black box. The other development teams couldn't really troubleshoot the environment; they were not empowered to make decisions or [unintelligible 00:12:29] in the platform. And you know, when we just started with Kubernetes—by the way, at the beginning it only supported 100 nodes, and then 1,000 nodes—it was actually not about scale; it actually solved those two problems, and this is where I spend most of my time. So, the first one: we don't want magic, okay? Be clear on what's happening; I want to make sure that things are consistent and I can get the right observability. So, that's one. The second thing is that we invested so much in the extensibility of the environment that it's—I wouldn't say easy, but doable—to evolve Kubernetes. You can change the model, you can extend it—there is an ecosystem. And you know, when we were building it, I remember I used to tell my team there won't be a Kubernetes 2.0, which, for a developer, is [laugh] frightening. But if you think about it and you prepare for that, you're like, "Huh. Okay, what does that mean for how I build my APIs? What does that mean for how we build the system?" So, that was one. The second thing I keep telling my team: "Please don't get too attached to your code, because if it's still there in 5, 10 years, we did something wrong." And you can see areas within Kubernetes—again, all the extensions. I'm very proud of all the interfaces that we've built. But let's take networking: it keeps evolving all the time, the API and the surface area that allow us to introduce new technologies. I love it. So, those are the two things that have nothing to do with scale, are unique to Kubernetes, and I think are very empowering and critical for its success.
Corey: One thing that you said that resonates most deeply with me is the idea that you don't want there to be magic, where I just hand it to this thing and it runs it as if by magic. Because, again, we've all run things in anger in production, and what happens when the magic breaks? When you're sitting around scratching your head with no idea how it starts or how it stops, that is scary. I mean, I recently wound up re-implementing Google Cloud Distinguished Engineer Kelsey Hightower's "Kubernetes the Hard Way" because he gave a terrific tutorial that I ran through in about 45 minutes on top of Google Cloud. It's like, "All right, how do I make this harder?" And the answer is to do it on AWS—re-implement it there. My experiment there can be found at kubernetesthemuchharderway.com, because I have a vanity domain problem. And it taught me an awful lot, but one of the challenges I had as I went through that process was, at one point, the nodes were not registering with the controller. And I ran out of time that day and turned everything off—because surprise bills are kind of what I spend my time worrying about—turned it on the next morning to continue, and then it just worked. And that was sort of the spidey-sense-tingling moment of, "Okay, something wasn't working, and now it is, and I don't understand why. But I just rebooted it and it started working." Which is terrifying in the context of a production service.
It was understandable—kind of—and I think that's the sort of thing you understand a lot better the more you work with it in production. But a counterargument to that is—and I've talked about it on this show before—for this podcast, I wind up having sponsors from time to time who want to give me fairly complicated links to go check them out, so I have the snark.cloud URL redirector. That's running as a production service on top of Google Cloud Run. It took me half an hour to get that thing up and running; I haven't had to think about it since, aside from a three-second latency that was driving me nuts and turned out to be a sleep hidden in the code, which I can't really fault Google Cloud Run for so much as my own crappy nonsense. But it just works. It's clearly running atop Kubernetes, but I don't have to think about it. That feels like the future. It feels like a glimpse of a world to come that we're just starting to dip our toes into. That, at least to me, feels like a lot more of the abstractions being collapsed into something easily understandable.
Chen: [unintelligible 00:16:30], I'm happy you say that. When talking with customers—you know, they're all-in on Kubernetes and we're talking about Cloud Run and serverless—I feel there is a confidence level that they need to overcome. And that's why it's really important for us in Google Cloud to make sure that you can mix and match. Because sometimes—you know, for a big retail customer of ours, some of their teams—it's really important for them to use a Kubernetes-based platform, because they have their workloads also running on-prem and they want to use the same playbooks, for example, right? How do I address issues? How do I troubleshoot? And so on. So, that's one set of things. But some want cloud-only, as simple as possible. So, can I use both of them and still have a similar developer experience, and so on? I do think that we'll see more of that in the coming years, and as the technology evolves, we'll have more and more serverless solutions, of course. By the way, it doesn't end there. We also see, you know, databases and machine learning—there are so many more managed services that are making things easy. And that's what excites me. I mean, that's what's awesome about what we're doing in cloud: we are building platforms that enable innovation.
Corey: I think there's an awful lot of power behind unlocking innovation from a customer perspective: the idea that I can use a cloud provider to run an experiment to build something in the course of an evening, and if it works, great, I can continue to scale up without having to replace, you know, the crappy Raspberry Pi-level hardware in my spare room with serious enterprise servers in a data center somewhere. The on-ramp and the capability and the lack of long-term commitments is absolutely magical. What I'm also seeing that is contributing to that is the de facto standard that's emerged of most things these days supporting Docker, for better or worse. There are many open-source tools where I see, "Oh, how do I get this up and running?" "Well, you can go over the river and through the woods and way past grandmother's house to build this from source, or run this Dockerfile." I feel like that is the direction the rest of the world is going. And as much fun as it is to sit on the sidelines and snark, I'm finding a lot more capability stories emerging across the board.
Does that resonate with what you're seeing, given that you are inherently working at very large scale, given the [laugh] nature of where you work?
Chen: I do see that. And I actually want to double down on the open standards, which I think is also something that is happening. At the beginning, we talked about how I want it to be very hard to choose a cloud provider. But innovation doesn't only come from cloud providers; there are a lot of companies, and a lot of innovation happening, building new technologies on top of those cloud providers, and I don't think this is going to stop. Innovation is going to come from many places, and it's going to be very exciting. And by the way, things are moving super fast in our space, so the investment in open standards is critical for our industry. Docker is one example. Google, generally speaking, is investing a lot in building those open standards. So, we have Docker, we have things like, of course, Kubernetes, but we are also investing in open standards for security, so we are working with other partners around [unintelligible 00:19:58], defining how you can secure the software supply chain, which is also critical for innovation. So, all of those things that reduce the barrier to entry are something that I'm personally passionate about.
Corey: Scaling containers and scaling Kubernetes is hard, but a whole 'nother level of difficulty is scaling humans. You've been at Google for, as you said, seven years, and you did not start as a VP there. Getting promoted from Senior Director to VP at Google is, shall we say, a heavy lift. You also mentioned that you previously started with, I believe it was, a seven-person team at one point. How have you been able to do that? Because I can see a world in which, "Oh, we just write some code and we can scale the computers pretty easily"—I've never found a way to do that for people.
Chen: So yes, I started actually—well, not seven; the team was 30 people [laugh]. And you can imagine how surprised I was when I joined Google Cloud with Kubernetes and GKE and it was a pretty small team, in those early days. But the team was already on the edge of burning out. You know, pings on Slack, the GitHub issues—there were so many things happening 24/7. And the team was just doing everything. Everybody was doing everything. One of the things I did in my second month on the team—I did an off-site, right? All managers—that's what we do, we do off-sites—and I brought the leadership team in to talk about our team values. And in the beginning, they were a little bit pissed, I would say: "Okay, Chen. What's going on? You're wasting two days of our lives to talk about those things. Why are we not doing other things?" And I was like, "You know, guys, this is really important. Let's talk about what's important for us." And amazingly, it worked. By the way, that work is still the foundation of the culture of the team. We talked about the three values that we care about and what they would look like. And the reason it's important is that when you scale teams, the key thing is actually to scale decision-making. So, how do you scale decision-making? I think there are two things there. One is what you're trying to achieve: people should know and understand the vision and know where we want to get to. But the second thing is how we work: What's important for us? How do we prioritize? How do we make trade-offs?
And when you have both—the what we're trying to do and the how—you build that team culture. And when you have that, I find you're set up better for success in scaling the team, because then the storyteller is not just the leader or the manager; the entire team tells the story of how things work in this team, how we work, what we're trying to achieve, and so on. So, that's been critical. That's, you know, the methodology of how I think about scaling teams. Specifically with Kubernetes, there were more issues we needed to work on. For example, building or [recoding 00:23:05] different functions: it cannot be just engineering doing everything. So, hiring the first product managers and information engineers and marketing people—oh my God, yes, you have to have marketing people, because there are so many events. So, that was one thing, just from a people-and-skills perspective. And the second thing is that it was an open-source project and a product, and what I was personally doing with the team was bringing some product engineering practices into the open source. Can we say, for example, that we are going to focus on user experience this next release, and we're not going to do all the rest? And I remember my team was worried: "Hey, what about that, and what about this?" They were juggling everything together. And I remember telling them, "Imagine that everything is on the floor. All the balls are on the floor. I know they're on the floor, you know they're on the floor. It's okay. Let's just make sure that every time we pick something up, it never falls again." And that idea is a principle that evolved into 'No Heroics,' and it evolved into 'Sustainable Success.' Building things toward sustainable success is a principle that has been very helpful for us.
Corey: This episode is sponsored in part by our friend at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly? Meet Uptycs—the first unified solution that prioritizes risk across your modern attack surface, all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself and visit uptycs.com. That's U-P-T-Y-C-S.com. My thanks to them for sponsoring my ridiculous nonsense.
Corey: When I take a look back, it's very odd to me to see the current reality that is Google, where you're talking about empathy and No Heroics and the rest, because that is not the reputation that Google enjoyed back when a lot of this stuff got started. It was always: engineers should be extraordinarily bright and gifted, and therefore it felt at the time like our customers should be as well. There was almost an arrogance built into it—well, if you wrote your code more like Google would, then maybe your code wouldn't be so terrible in the cloud. And somewhat cynically, I thought for a while that, oh, Kubernetes is Google's attempt to wind up making the rest of the world write software in a way that's more Google-y. I don't think that observation has aged very well. I think it's solved a tremendous number of problems for folks. But the complexity has absolutely been high throughout most of Kubernetes' life. I would argue, on some level, that it feels like it's become successful almost in spite of that, rather than because of it. But I'm curious to get your take.
Why do you believe that Kubernetes has been as successful as it clearly has?
Chen: [unintelligible 00:25:34] two things. One, about empathy: so yes, Google engineers are brilliant and amazing and all great, and our customers are amazing and brilliant as well. And going back to the point before: everyone has their job and where they need to be successful, and we, as you say, need to make things simpler and enable innovation. Our customers are driving innovation on top of our platform. That's the way I think about it. And yes, it's not as simple as it can be—probably—yet, but since the early days of Kubernetes, we have been investing a lot in what we call empathy, in the customer empathy workshop, for example. I partnered with Kelsey Hightower—and you mentioned yourself trying to start a cluster. The first time we did a workshop with my entire team—so, back then it was like 50 people [laugh]—their task was to spin up a cluster without using any scripts that we had internally. And unfortunately, not many folks succeeded in that task. And out of that came—what do you call it—an OKR, our goal for that quarter: that you are able to spin up a cluster in three commands and troubleshoot if something goes wrong. That came out of that workshop. So, I do think that there is a lot of foundation in that empathetic engineering, and the open source and the community helped our Google teams to be more empathetic and understand the different use cases people are trying to solve. And that actually brings me to why I think Kubernetes is so successful. People might be surprised, but the amount of investment we're making on orchestration, or the placement of containers within Kubernetes, is actually pretty small, and it's been very small for the last seven years. Where do we invest time? One is, as I mentioned before, on what we call the API machinery. Kubernetes introduced an approach that is really suitable for cloud-native technologies: the idea of the reconciliation loop. Kubernetes is, like, a powerful automation machine, which can automate, of course, workload placement, but can automate other things as well. Think of the Kubernetes API machinery as observing the current state, comparing it to the desired state, and working toward it. Think about a thermostat—which is a different kind of automation versus the 'if this, then that,' where you need to anticipate different events. This idea of the API machinery, and the way you can extend it, made it possible for different teams to use that mechanism to automate other things in that space. So, that has been one very powerful mechanism of Kubernetes, and it enabled all of this innovation. Even if you think about things like Istio, as an example, that's how it started: by leveraging that kind of mechanism to separate concerns, and so on. So, there are a lot of operators—the way people are managing their databases or stateful workloads on top of Kubernetes—extending this mechanism. So, that's one thing that I think is key and built that ecosystem.
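A plain-Go sketch of that thermostat-style reconciliation loop may help here—no Kubernetes libraries, and the desired/observed functions are placeholders standing in for the API server and the real world—just to show the shape: observe, compare, act, repeat:

```go
package main

import (
	"fmt"
	"time"
)

type State struct {
	Replicas int
}

func desired() State  { return State{Replicas: 3} } // what the user declared
func observed() State { return State{Replicas: 1} } // what actually exists

// reconcile compares observed state to desired state and acts to converge,
// rather than reacting to individual "if this, then that" events.
func reconcile(cur, want State) {
	switch {
	case cur.Replicas < want.Replicas:
		fmt.Printf("creating %d replica(s)\n", want.Replicas-cur.Replicas)
	case cur.Replicas > want.Replicas:
		fmt.Printf("deleting %d replica(s)\n", cur.Replicas-want.Replicas)
	default:
		fmt.Println("in sync; nothing to do")
	}
}

func main() {
	// A real controller loops forever, driven by watch events from the
	// API server; three iterations are enough to show the pattern.
	for i := 0; i < 3; i++ {
		reconcile(observed(), desired())
		time.Sleep(time.Second)
	}
}
```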
The second thing: I am very proud of the community of Kubernetes.
Corey: Oh, it's a phenomenal community success story.
Chen: It's not easy to build a community, definitely not in open source. I feel that the idea of values, you know, that I was talking about within my team, was actually a big deal for us as we were building the community: how we treat each other, how we help people start. You know, and we were talking before, like, am I going to talk about DEI and inclusivity, and so on. One of the things that I love about Kubernetes is that it's a new technology. There is actually—[unintelligible 00:29:39]—no, even today, there is no one with ten years' experience in Kubernetes. And if anyone says they have that, then they are lying.
Corey: Time machine. Yes.
Chen: That creates an opportunity for a lot of people to become experts in this technology. And by having it in open source and making everything available, you can actually do it from your living room sofa. That excites me—the idea that you can become an expert in this new technology and you can get involved, and you'll find people who will mentor you and help you through your first PR. And there are some roles within the community where you can start dipping your toes in the water. It's exciting. So, that makes me really happy, and I know that this community has changed the trajectory of many people's careers, which I love.
Corey: I think that's probably one of the most impressive things that it's done. One last question I have for you: we've talked a fair bit about the history and how we see it progressing through the view toward the somewhat recent past. What do you see coming in the future? What does the future of Kubernetes look like to you?
Chen: It will continue to be more and more boring. The promise of hybrid and multi-cloud, for example, is only possible with technologies like Kubernetes. So, I do think that, as a technology, it will continue to be important by ensuring portability and interoperability of workloads. I see a lot of edge use cases. If you think about it, the edge is just lagging a bit behind the innovation we've seen in the cloud—can we bring that innovation to the edge? This will require more development within the Kubernetes community as well. And that actually really excites me; I think there are a lot of things that we're going to see there. And by the way, you've seen it also at KubeCon—I mean, there were some announcements in that space. In Google Cloud, we just announced, with customers like Wendy's and Rite Aid as well, taking advantage of this technology to allow innovation everywhere. But beyond that, my hope is that we'll continue to hide the complexity. And our challenge will be to not make it a black box, because that would be, in my opinion, a failure pattern; it doesn't help those kinds of platforms. So, that will be the challenge: can we scope the project and ensure that we have the right observability? And from a use-case perspective, I do think edge is super interesting.
Corey: I would agree. There are a lot of workloads out there that are simply never going to be hosted in a cloud provider region, for a variety of reasons of varying validity, but it is the truth. I think the focus on addressing customers where they are has been an emerging best practice for cloud providers, and I'm thrilled to see Google leading the charge on that.
Chen: Yeah. And you just reminded me—the other thing we see more and more is definitely AI and ML workloads running on Kubernetes, which is part of that, right? So, Google Cloud is investing a lot in making AI/ML easy. And I don't know if many people know, but even Vertex AI, our own platform, is running on GKE.
So, that's part of seeing how we make sure that the platform is suitable for these kinds of workloads and really helps customers do the heavy lifting. And that's another set of workloads that is very relevant at the edge. One of our customers—MLB, for example—two things are interesting there. The first one: I think a lot of people sometimes say, "Okay, I'm going to move to the cloud, and I want to know right now exactly how everything will evolve." And one of the things that has been really exciting about working with MLB for the last four years is the journey and the iterations: they started at one phase, then they saw what was possible, then moved to the next one, and so on. That's one. The other thing is that they have so much ML running at the stadium with Google Cloud technology, which is very exciting.
Corey: I'm looking forward to seeing how this continues to evolve and progress, particularly in light of the recent correction we're seeing in the market, where a lot of hype-driven ideas are being stress-tested—maybe not in the way we might have hoped they would be—but it'll be really interesting to see what shakes out as far as things that deliver business value and are clear wins for customers, versus a lot of the speculative stories we've been hearing for a while now. Maybe I'm totally wrong on this, and this is going to be a temporary bump in the road, and we'll see no abatement in the ongoing excitement around so many of these emerging technologies, but I'm curious to see how it plays out. But that's the beautiful part about getting to be a pundit—or whatever it is people call me these days that's at least polite enough to say on a podcast—when I'm right, people think I'm a visionary, and when I'm wrong, people don't generally hold it against you. It seems like futurist is the easiest job in the world, because if you predict and get it wrong, no one remembers; predict and get it right, and you look like a genius.
Chen: So, first of all, I'm optimistic, so usually my predictions are positive. I will say that what we are seeing—also what I'm hearing from our customers—is that technology is not for the sake of technology. Actually, nobody cares [laugh]. Even today, nobody cares about Kubernetes. They need to care, unfortunately, but what I'm hearing from our customers is, "How do we create new experiences? How do we make things easy?" The talent shortage is not just with tech people; it's also with people working in the warehouse or working in the store. Can we use technology to help with inventory management? There are so many amazing things. So, when there is a real business opportunity, things are so much simpler. People have the right incentives to make it work. Because one thing we didn't talk about—right, we talked about all these new technologies and we talked about scaling teams and so on—a lot of the time, the challenge is not the technology. A lot of the time, the challenge is the process. A lot of the time, the challenge is the skills, the culture—there are so many things. But when you have something—going back to what I said before, how you unite teams—when there's a clear goal, a clear vision that everybody's excited about, they will make it work. So, I think having a purpose for the innovation is critical for any successful project.
Corey: I think and I hope that you're right. I really want to thank you for spending as much time with me as you have.
If people want to learn more, where's the best place for them to find you?
Chen: So, first of all, on Twitter—I'm there—or on LinkedIn. I will say that I'm happy to connect with folks. Generally speaking, at some point in my career, I recognized that I have a voice that can help people, and experience that can also help people build their careers. I'm happy to share that and [unintelligible 00:36:54] folks both in the company and outside of it.
Corey: I think that's one of the obligations on a lot of us, once we've gotten into a certain position in our careers: to send the ladder back down, for lack of a better term. I've never appreciated the perspective of, "Well, screw everyone else. I got mine." The whole point is that the next generation should have it easier than we did.
Chen: Yeah, definitely.
Corey: Chen Goldberg, General Manager of Cloud Runtimes and VP of Engineering at Google. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry rant of a comment talking about how LPARs on mainframes are absolutely not containers, making sure it's at least far too big to fit in a reasonably sized Docker container.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
Chris: For those not familiar with Kubernetes, can you tell us what it is and why there is so much buzz around it?
Chris: Kubernetes, while it has many benefits, is also a very complex technology. What are some of the key things organizations should keep in mind when using Kubernetes securely?
Nikki: What kind of role do you see RBAC playing with Kubernetes? I don't hear a lot of talk around this subject, and I'm curious what you think the importance of RBAC around Kubernetes may be.
Chris: Any nuances or recommendations for those rolling their own versus using managed Kubernetes offerings?
Nikki: What does governance look like around Kubernetes, specifically around large, multi-cluster environments?
Chris: From a compliance perspective, what are some resources organizations can use to securely provision and operate Kubernetes?
Nikki: Can we also chat about Kubernetes API logs when it comes to auditing and assessments?
Chris: You lead the Kubernetes Top 10 project with OWASP. Can you tell us a bit about that?
Nikki: Where do you think Kubernetes, clusters, etc. are heading? What does the future look like for security teams to not only understand these new technology areas, but to understand how to secure them properly?
Chris: Do you feel like security practitioners are keeping pace with the rate of innovative technologies like Kubernetes, and if not, how can we fix that?
Chris: We know you are the CTO and Co-Founder of KSOC. Tell us a bit about the firm, what you all specialize in, and what led you to founding it?
In today's episode we talk about a tool called Teleport, and I'll tell you how it can help you with the way you share files. It's the easiest and most secure way to access all of your infrastructure. Teleport is an identity-aware, multi-protocol access proxy that understands the SSH, HTTPS, RDP, Kubernetes API, MySQL, MongoDB, and PostgreSQL wire protocols.
The basics: Teleport is a certificate authority and an identity-aware, multi-protocol access proxy that implements protocols such as SSH, RDP, HTTPS, the Kubernetes API, and a variety of SQL and NoSQL databases. It is completely transparent to client-side tools and is designed to work with everything in today's DevOps ecosystem. Inside the downloaded tarball, you will find three binaries: the teleport daemon, the tsh client, and the tctl administration tool. They are dependency-free and written in a compiled language. Teleport is open source, and the source code is available on GitHub.
Teleport architecture: The key concept in Teleport's architecture is the cluster. A Teleport cluster is made up of the Teleport Auth Service, the Teleport Proxy Service, Teleport agents, and the resources you want to connect to, such as Linux or Windows servers, databases, Kubernetes clusters, Windows desktops, and internal web applications.
How a Teleport cluster works: The cluster concept is the foundation of Teleport's security model. Users and servers must join the same cluster before access can be granted. To join a cluster, both users and servers must authenticate and receive certificates. The Teleport Auth Service is the cluster's certificate authority, issuing certificates for both users and servers for all supported protocols.
How authentication works: The Teleport Proxy Service serves the login screen at https://proxy.example.com:443, where users are asked for their username, password, and a second factor. If a third-party identity provider such as GitHub is used, the Proxy Service forwards the user to GitHub using OAuth2. The Proxy Service sends the user's identity to the Teleport Auth Service. In turn, the Auth Service issues certificates for SSH, Kubernetes, and other resources in a cluster, and sends them back to the client via the Proxy Service.
Access for edge networks: Teleport lets users access resources running on devices located anywhere in the world—for example, devices on third-party networks, servers behind NAT, or devices connected over a cellular connection. Examples include self-driving vehicles, network equipment, retail locations, and medical devices.
- First off, for those not familiar with containers and Kubernetes, what are they?
- Why are organizations increasingly adopting these technologies over traditional forms of compute?
- How does cybersecurity change with Kubernetes, and what are some things practitioners should be sure to keep an eye on?
- When organizations are adopting Kubernetes, they often are faced with options such as rolling their own or using managed Kubernetes offerings. Any thoughts there?
- I recently read a report that researchers found 380,000 publicly exposed Kubernetes API servers. Do you think people simply are spinning up these new technologies with security as an afterthought?
- Kubernetes is incredibly complex. Do you think this leads to challenges around properly configuring and securing it?
- Any thoughts on software supply chain security as it relates to Kubernetes and containers?
- For those looking to learn more about Kubernetes and container security, do you have any recommended resources?
Full Description / Show Notes
Steren and Corey talk about how Google Cloud Run got its name (00:49) Corey talks about his experiences using Google Cloud (02:42) Corey and Steren discuss Google Cloud Run's custom domains (10:01) Steren talks about Cloud Run's high developer satisfaction and scalability (15:54) Corey and Steren talk about Cloud Run releases at Google I/O (23:21) Steren discusses where the majority of developer and customer interest in Cloud Run lies (25:33) Steren talks about his 20% projects around sustainability (29:00)
About Steren
Steren is a Senior Product Manager at Google Cloud. He is part of the serverless team, leading Cloud Run. He is also working on sustainability, leading the Google Cloud Carbon Footprint product. Steren is an engineer from École Centrale (France). Prior to joining Google, he was CTO of a startup building connected objects and multi-device solutions.
Links Referenced: Google Cloud Run: https://cloud.run sheets-url-shortener: https://github.com/ahmetb/sheets-url-shortener snark.cloud/run: https://snark.cloud/run Twitter: https://twitter.com/steren
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined today by Steren Giannini, who is a senior product manager at Google Cloud, specifically on something called Google Cloud Run. Steren, thank you for joining me today.Steren: Thanks for inviting me, Corey.Corey: So, I want to start at the very beginning of, “Oh, a cloud service. What are we going to call it?” “Well, let's put the word cloud in it.” “Okay, great. Now, it is cloud, so we have to give it a vague and unassuming name. What does it do?” “It runs things.” “Genius. Let's break and go for work.” Now, it's easy to imagine that you spent all of 30 seconds on a name, but it never works that way. How easy was it to get to Cloud Run as a name for the service?Steren: [laugh]. Such a good question because originally it was not named Cloud Run at all. The original name was Google Serverless Engine. But a few people know that because they've been helping us since the beginning, but originally it was Google Serverless Engine. Nobody liked the name internally, and I think at one point, we wondered, “Hey, can we drop the engine structure and let's just think about the name. And what does this thing do?” “It runs things.” We already have Cloud Build. Well, wouldn't it be great to have Cloud Run to pair with Cloud Build so that after you've built your containers, you can run them?
And that's how we ended up with this very simple Cloud Run, which today seems so obvious, but it took us a long time to get to that name, and we actually had a lot of renaming to do because we were about to ship with Google Serverless Engine.Corey: That seems like a very interesting last-minute change because it's not just a find and replace at that point, it's—Steren: No.Corey: —“Well, okay, if we call it Cloud Run, which can also be a verb or a noun, depending, is that going to change the meaning of some sentences?” And just doing a find and replace without a proofread pass as well, well, that's how you wind up with funny things on Twitter.Steren: API endpoints needed to be changed, adding weeks of delays to the launch. That is why we—you know, [laugh] announced in 2018 and publicly launched in 2019.Corey: I've been doing a fair bit of work in cloud for a while, and I wound up going down a very interesting path. So, the first native Google Cloud service—not things like WP Engine that ride on top of GCP—but my first native Google Cloud Service was done in service of this podcast, and it is built on Google Cloud Run. I don't think I've told you part of this story yet, but it's one of the reasons I reached out to invite you onto the show. Let me set the stage here with a little bit of backstory that might explain what the hell I'm talking about.As listeners of this show are probably aware, we have sponsors whom we love and adore. In the early days of this show, they would say, “Great, we want to tell people about our product”—which is the point of a sponsorship—“And then send them to a URL.” “Great. What's the URL?” And they would give me something that was three layers deep, then with a bunch of UTM tracking parameters at the end.And it's, “You do realize that no one is going to be sitting there typing all of that into a web browser?” At best, you're going to get three words or so. So, I built myself a URL redirector, snark.cloud. I can wind up redirecting things in there anywhere it needs to go.And for a long time, I did this on top of S3 and then put CloudFront in front of it. And this was all well and good until, you know, things happened in the fullness of time. And now holy crap, I have an operations team involved in things, and maybe I shouldn't be the only person that knows how to work on all of these bits and bobs. So, it was time to come up with something that had a business user-friendly interface that had some level of security, so I don't wind up automatically building out a spam redirect service for anything that wants to, and it needs to be something that's easy to work with. So, I went on an exploration.So, at first it showed that there were—like, I have an article out that I've spoken about before that there are, “17 Ways to Run Containers on AWS,” and then I wrote the sequel, “17 More Ways to Run Containers on AWS.” And I'm keeping a list, I'm almost to the third installation of that series, which is awful. So, great. There's got to be some ways to build some URL redirect stuff with an interface that has an admin panel. And I spent three days on this trying a bunch of different things, and some were running on deprecated versions of Node that wouldn't build properly and others were just such complex nonsense things that had got really bad. 
I was starting to consider something like just paying for Bitly or whatnot and making it someone else's problem.And then I stumbled upon something on GitHub that really was probably one of the formative things that changed my opinion of Google Cloud for the better. And within half an hour of discovering this thing, it was up and running. I did the entire thing, start to finish, from my iPad in a web browser, and it just worked. It was written by—let me make sure I get his name correct; you know, messing up someone's name is a great way to say that we don't care about them—Ahmet Balkan used to work at Google Cloud; now he's over at Twitter. And he has something up on GitHub that is just absolutely phenomenal about this, called sheets-url-shortener.And this is going to sound wild, but stick with me. The interface is simply a Google Sheet, where you have one column that has the shorthand slug—for example, run; if you go to snark.cloud/run, it will redirect to Google Cloud Run's website. And the second column is where you want it to go. The end.And whenever that gets updated, there's of course some caching issues, which means it can take up to five seconds from finishing that before it will actually work across the entire internet. And as best I can tell, that is fundamentally magic. But what made it particularly useful and magic, from my perspective, was how easy it was to get up and running. There was none of this oh, but then you have to integrate it with Google Sheets and that's a whole ‘nother team so there's no way you're going to be able to figure that out from our Docs. Go talk to them and then come back in the day.They were the get started, click here to proceed. It just worked. And it really brought back some of the magic of cloud for me in a way that I hadn't seen in quite a while. So, all which is to say, amazing service, I continue to use it for all of these sponsored links, and I am still waiting for you folks to bill me, but it fits comfortably in the free tier because it turns out that I don't have hundreds of thousands of people typing it in every week.Steren: I'm glad it went well. And you know, we measure tasks success for Cloud Run. And we do know that most new users are able to deploy their apps very quickly. And that was the case for you. Just so you know, we've put a lot of effort to make sure it was true, and I'll be glad to tell you more about all that.But for that particular service, yes, I suppose Ahmet—who I really enjoyed working with on Cloud Run, he was really helpful designing Cloud Run with us—has open-sourced this side project. And basically, you might even have clicked on a deploy to Cloud Run button on GitHub, right, to deploy it?Corey: That is exactly what I did and it somehow just worked and—Steren: Exactly.Corey: And it knew, even logging into the Google Cloud Console because it understands who I am because I use Google Docs and things, I'm already logged in. None of this, “Oh, which one of these 85 credential sets is it going to be?” Like certain other clouds. It was, “Oh, wow. Wait, cloud can be easy and fun? When did that happen?”Steren: So, what has happened when you click that deploy to Google Cloud button, basically, the GitHub repository was built into a container with Cloud Build and then was deployed to Cloud Run. And once on Cloud Run, well, hopefully, you have forgotten about it because that's what we do, right? 
We—give us your code, in a container if you know containers, if you don't just—we support, you know, many popular languages, and we know how to build them, so don't worry about that. And then we run it. And as you said, when there is low traffic or no traffic, it scales to zero.When there is low traffic, you're likely going to stay under the generous free tier. And if you have more traffic for, you know, Screaming in the Cloud suddenly becoming a high destination URL redirects, well, Cloud Run will scale the number of instances of this container to be able to handle the load. Cloud Run scales automatically and very well, but only—as always—charging you when you are processing some requests.Corey: I had to fork and make a couple of changes myself after I wound up doing some testing. The first was to make the entire thing case insensitive, which is—you know, makes obvious sense. And the other was to change the permanent redirect to a temporary redirect because believe it or not, in the fullness of time, sometimes sponsors want to change the landing page in different ways for different campaigns and that's fine by me. I just wanted to make sure people's browser cache didn't remember it into perpetuity. But it was easy enough to run—that was back in the early days of my exploring Go, which I've been doing this quarter—and in the couple of months this thing has been running it has been effectively flawless.It's set-it-and-forget-it. The only challenges I had with it are it was a little opaque getting a custom domain set up—which is still in beta, to be clear—and I've heard some horror stories of people saying it got wedged. In my case, no, I deployed it and I started refreshing it and suddenly, it started throwing an SSL error. And it's like, “Oh, that's not good, but I'm going to break my own lifestyle here and be patient for ten minutes.” And sure enough, it cleared itself and everything started working. And that was the last time I had to think about any of this. And it just worked.Steren: So first, Cloud Run is HTTPS only. Why? Because it's 2020, right? It's 2022, but—Corey: [laugh].Steren: —it launched in 2020. And so basically, we have made a decision that let's just not accept HTTP traffic; it's only HTTPS. As a consequence, we need to provision a cert for your custom domain. That is something that can take some time. And as you said, we keep it in beta or in preview because we are not yet satisfied with the experience or even the performance of Cloud Run custom domains, so we are actively working on fixing that with a different approach. So, expect some changes, hopefully, this year.Corey: I will say it does take a few seconds when people go to a snark.cloud URL for it to finish resolving, and it feels on some level like it's almost like a cold start problem. But on subsequent visits, the same thing also feels a little on the slow and pokey side. And I don't know if that's just me being wildly impatient, if there's an optimization opportunity, or if that's just inherent to the platform that is not under current significant load.Steren: So, it depends. If the Cloud Run service has scaled down to zero, well of course, your service will need to be started. But what we do know, if it's a small Go binary, like something that you mentioned, it should really take less than, let's say, 500 milliseconds to go from zero to one of your container instances. Latency can also be due to the way the code is running.
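To make the container contract being described concrete, here is a minimal sketch (not Ahmet's actual sheets-url-shortener code) of a redirect service shaped the way Cloud Run expects: it listens on the port passed in the PORT environment variable, loads its redirect table once at startup, and serves temporary 302 redirects. The hard-coded table is a stand-in for the spreadsheet fetch.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the spreadsheet: the real service maps slug -> destination URL.
# Loading the table once at startup keeps per-request latency down.
REDIRECTS = {"/run": "https://cloud.run"}

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        target = REDIRECTS.get(self.path.lower())  # case-insensitive slugs
        if target:
            self.send_response(302)  # temporary, so browser caches don't keep it forever
            self.send_header("Location", target)
        else:
            self.send_response(404)
        self.end_headers()

# Cloud Run's contract: serve HTTP on the port named in the PORT env var.
port = int(os.environ.get("PORT", "8080"))
HTTPServer(("", port), Redirector).serve_forever()
```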
If the code is fetching things from Google Sheets at every startup, that is something that could add to the startup latency.So, I would need to take a look, but in general, we are not spinning up a virtual machine anytime we need to scale horizontally. Like, our infrastructure is a multi-tenant, rapidly scalable infrastructure that can materialize a container in literally 300 milliseconds. The rest of the latency comes from what does the container do at startup time?Corey: Yeah, I just ran a quick test of putting time in front of a curl command. It looks like it took 4.83 seconds. So, enough to be perceptible. But again, for just a quick redirect, it's generally not the end of the world and there's probably something I'm doing that is interesting and odd. Again, I did not invite you on the show to file a—Steren: [laugh].Corey: Bug report. Let's be very clear here.Steren: Seems on the very high end of startup latencies. I mean, I would definitely expect under a second. We should deep-dive into the code to take a look. And by the way, building stuff on top of spreadsheets. I've done that a ton in my previous lives as a CTO of a startup because well, that's the best administration interface, right? You just have a CRUD UI—Corey: [unintelligible 00:12:29] world and all business users understand it. If the people at Microsoft decided they were going to change the Microsoft Excel interface, even a bit, they would revert the change before noon of the same day after an army of business users grabbed pitchforks and torches and marched on their headquarters. It's one of those things that is how the world runs; it is the world's most common IDE. And it's great, but I still think of databases through the lens of thinking about it as a spreadsheet as my default approach to things. I also think of databases as DNS, but that's neither here nor there.Steren: You know, if you have maybe 100 redirects, that's totally fine. And by the way, the beauty of Cloud Run in a spreadsheet, as you mentioned, is that Cloud Run services run with a certain identity. And this identity, you can grant it permissions. And in that case, what I would recommend if you haven't done so yet, is to give an identity to your Cloud Run service that has the permission to read that particular spreadsheet. And how you do that is you invite the email of the service account as a reader of your spreadsheet, and that's probably what you did.Corey: The click-button workflow on Google Cloud automatically did that—Steren: Oh, wow.Corey: —and taught me how to do it. “Here's the thing to look at. The end.” It was a flawless user-onboarding experience.Steren: Very nicely done. But indeed, you know, there is this built-in security which is the principle of minimal permission, like each of your Cloud Run services should basically only be able to read and write to the backing resources that they should. And by default, we give you a service account which has a lot of permissions, but our recommendation is to narrow those permissions to basically only look at the cloud storage buckets that the service is supposed to look at. And the same for a spreadsheet.Corey: Yes, on some level, I feel like I'm going to write an analysis of my own security approach. It would be titled, “My God, It's Full Of Stars” as I look at the IAM policies of everything that I've configured. The idea of least privilege is great. What I like about this approach is that it made it easy to do it so I don't have to worry about it.
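The identity mechanism described here can be sketched as well. Assuming the google-auth and google-api-python-client libraries and a hypothetical spreadsheet ID, a Cloud Run service can read the sheet as itself: on Cloud Run, Application Default Credentials resolve to the service's service account, the same email you invite as a reader on the spreadsheet.

```python
import google.auth
from googleapiclient.discovery import build

# ADC picks up the Cloud Run service account; no key file needed.
creds, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"]
)
sheets = build("sheets", "v4", credentials=creds)

# Hypothetical spreadsheet ID and range; column A holds the slug,
# column B the destination URL.
rows = (
    sheets.spreadsheets()
    .values()
    .get(spreadsheetId="YOUR_SHEET_ID", range="A:B")
    .execute()
    .get("values", [])
)
redirects = {slug.lower(): url for slug, url in rows}
```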
At one point, I want to go back and wind up instrumenting it a bit further, just so I can wind up getting aggregate numbers of all right, how many times has someone visited this particular link? It'll be good to know.And I don't know… if I have to change permissions to do that yet, but that's okay. It's the best kind of problem: future Corey. So, we'll deal with that when the time comes. But across the board, this has just been a phenomenal experience and it's clear that when you were building Google Cloud Run, you understood the assignment. Because I was looking for people saying negative things about it and by and large, all of it seems to come from a perspective of, “Well, this isn't going to be the most cost-effective or best way to run something that is hyperscale, globe-spanning.”It's yes, that's the thing that Kubernetes was originally built to run and for some godforsaken reason people run their blog on it instead now. Okay. For something that is small, scales to zero, and has long periods where no one is visiting it, great, this is a terrific answer and there's absolutely nothing wrong with that. It's clear that you understood who you were aiming at, and the migration strategy to something that is a bit more, I want to say robust, but let's be clear what I mean when I'm saying that: if you want something that's a little bit more impressive on your SRE resume as you're trying a multi-year project to get hired by Google or pretend you got hired by Google, yeah, you can migrate to something else in a relatively straightforward way. But that this is up, running, and works without having to think about it, and that is no small thing.Steren: So, there are two things to say here. The first is yes, indeed, we know we have high developer satisfaction. You know, we measure this—in Google Cloud, you might have seen those small satisfaction surveys popping up sometimes on the user interface, and you know, we are above 90% satisfaction score. We hire third parties to help us understand how usable and what satisfaction score would users get out of Cloud Run, and we are constantly getting very, very good results, in absolute terms but also compared to the competition.Now, the other thing that you said is that, you know, Cloud Run is for small things, and here while it is definitely something that allows you to be productive, something that strives for simplicity, it also scales a lot. And contrary to other systems, you do not have any pre-provisioning to make. So, we have done demos where we go from zero to 10,000 container instances in ten seconds because of the infrastructure on which Cloud Run runs, which is fully managed and multi-tenant, we can offer you this scale on demand. And many of our biggest customers have actually not switched to something like Kubernetes after starting with Cloud Run because they value the low maintenance, the no infrastructure management that Cloud Run brings them.So, we have like Ikea, ecobee… for example ecobee, you know, the smart thermostats are using Cloud Run to ingest events from the thermostat. I think Ikea is using Cloud Run more and more for more of their websites. You know, those companies scale, right? This is not, like, a scale-to-zero hobby project. This is actually production e-commerce and connected smart objects production systems that have made the choice of being on a fully-managed platform in order to reduce their operational overhead.[midroll 00:17:54]Corey: Let me be clear.
When I say scale—I think we might be talking past each other on a small point here. When I say scale, I'm talking less about oh, tens or hundreds of thousands of containers running concurrently. I'm talking in a more complicated way of, okay, now we have a whole bunch of different microservices talking to one another and affinity as far as location to each other for data transfer reasons. And as you start getting into service-discovery-style areas of things, where we build really complicated applications because we hired engineers and failed to properly supervise them, and that type of convoluted complex architecture.That's where it feels like Cloud Run increasingly, as you move in that direction, starts to look a little bit less like the tool of choice. Which is fine, I want to be clear on that point. The sense that I've gotten of it is a great way to get started, it's a great way to continue running a thing you don't have to think about because you have a day job that isn't infrastructure management. And it is clear that—as your needs change—you can either remain with the service or pivot to a very close service without a whole lot of retooling, which is key. There's not much of a lock-in story to this, which I love.Steren: That was one of the key principles when we started to design Cloud Run was, you know, we realized the industry had agreed that the container image was the standard for the deployment artifact of software. And so, we just made the early choice of focusing on deploying containers. Of course, we are helping users build those containers, you know, we have things called build packs, we can continuously deploy from GitHub, but at the end of the day, the thing that gets auto-scaled on Cloud Run is a container. And that enables portability, as you said. You can literally run the same container, nothing proprietary in it, I want to be clear. Like, you're just listening on a port for some incoming requests. Those requests can be HTTP requests, events, you know, we have products that can push events to Cloud Run like Eventarc or Pub/Sub. And this same container, you can run it on your local machine, you can run it on Kubernetes, you can run it on another cloud. You're not locked in, in terms of API of the compute.We even went above and beyond by having the Cloud Run API look like a Kubernetes API. I think that was an extra effort that we made. I'm not sure people care that much, but if you look at the Cloud Run API, it is actually exactly looking like Kubernetes, even if there is no Kubernetes at all under the hood; we just made it for portability. Because we wanted to address this concern of serverless, which was lock-in. Like, when you use a Function as a Service product, you are worried that the architecture that you are going to develop around this product is going to be only working in this particular cloud provider, and you're not in control of the language, the version that this provider has decided to offer you, you're not in control of more of the complexity that can come as you want to scan this code, as you want to move this code between staging and production or test this code.So, containers are really helping with that. So, I think we made the right choice with this new artifact, to build Cloud Run around the container artifact. And you know, at the time when we launched, it was a little bit controversial because back in the day, you know, 2018, 2019, serverless really meant Functions as a Service. So, when we launched, we redefined serverless a little bit.
And we basically said serverless containers. Which at the time were two words that in the same sentence were incompatible. Like, many people, including internally, had concerns around—Corey: Oh, the serverless versus container war was a big thing for a while. Everyone was on a different side of that divide. It's… containers are effectively increasingly—and I know, I'll get email for this, and I don't even slightly care, they're a packaging format—Steren: Exactly.Corey: —where it solves the problem of how do I build this thing to deploy on Debian instances? And Ubuntu instances, and other instances, God forbid, Windows somewhere, you throw a container over the wall. The end. DevOps is about breaking down the walls between Dev and Ops; that's why containers are here: to make them silos that don't have to talk to each other.Steren: A container image is a glorified zip file. Literally. You have a set of layers with files in them, and basically, we decided to adopt that artifact standard, but not the perceived complexity that existed at the time around containers. And so, we basically merged containers with serverless to make something as easy to use as a Function as a Service product but with the power of bringing your own container. And today, we are seeing—you mentioned, what kind of architecture would you use Cloud Run for?So, I would say now there are three big buckets. The obvious one is anything that is a website or an API, serving public internet traffic, like your URL redirect service, right? This is, you have an API, takes a request and returns a response. It can be a REST API, GraphQL API. We recently added support for WebSockets, which is pretty unique for a service offering to natively support WebSockets.So, what I mean by natively is, my client can open a socket connection—a bi-directional socket connection—with a given instance, for up to one hour. This is pretty unique for something that is as fully managed as Cloud Run.Corey: Right. As we're recording this, we are just coming off of Google I/O, and there were a number of announcements around Cloud Run that were touching it because of, you know, strange marketing issues. I only found out that Google I/O was a thing and featured cloud stuff via Twitter at the time it was happening. What did you folks release around Cloud Run?Steren: Good question, actually. As part of the Google I/O developer keynote, I pitched a story around how Cloud Run helps developers, and the I/O team liked the story, so we decided to include that story as part of the live developer keynote. So, on stage, we announced Cloud Run jobs. So now, I talked to you about Cloud Run services, which can be used to expose an API, but also to do, like, private microservice-to-microservice communication—because Cloud Run services don't have to be public—and in that case, we support gRPC and, you know, a very strong security mechanism where only Service A can invoke Service B, for example, but Cloud Run jobs are about non-request-driven containers. So, today—I mean, before Google I/O a few days ago, the only requirement that we imposed on your container image was that it started to listen for requests, or events, or gRPC—Corey: Web requests—Steren: Exactly—Corey: It speaks [unintelligible 00:24:35] you want as long as it's HTTP. Yes.Steren: That was the only requirement we asked you to have on your container image. And now we've changed that. Now, if you have a container that basically starts and executes to completion, you can deploy it on a Cloud Run job.
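For a rough sense of that "starts and executes to completion" shape, here is a sketch of a job task in Python. The environment variable names follow Cloud Run's documented job contract, where each parallel task receives its index and the total task count; the workload itself is a made-up placeholder.

```python
import os

# Cloud Run jobs run a container to completion instead of serving requests.
# Each parallel task gets an index so it can claim its shard of the work.
task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

items = list(range(1000))                 # placeholder for the real batch input
my_share = items[task_index::task_count]  # stride-partition across tasks

for item in my_share:
    pass  # process one item: the daily batch, admin, or data-processing step

print(f"task {task_index}/{task_count} processed {len(my_share)} items")
# Exiting with status 0 marks this task as succeeded.
```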
So, you will use Cloud Run jobs for, like, daily batch jobs. And you have the same infrastructure, so on-demand, you can go from zero to, I think for now, the maximum is a hundred tasks in parallel—of course, you can run many tasks in sequence, but in parallel, you can go from zero to a hundred, right away, to run your daily batch job, daily admin job, data processing.But this is more in the batch mode than in streaming mode. If you would like to do more, like, streaming data processing, then a Cloud Run service would still be the best fit because you can literally push events to it, and it will auto-scale to handle any number of events that it receives.Corey: Do you find that the majority of customers are using Cloud Run for one-off jobs that barely will get more than a single container, like my thing, or do you find that they're doing massively parallel jobs? Where's the lion's share of developer and customer interest?Steren: It's both, actually. We have both individual developers, small startups—which really value the scale to zero and pay-per-use model of Cloud Run. Your URL redirect service probably is staying below the free tier, and there are many, many, many users in your case. But at the same time, we have big, big, big customers who value the on-demand scalability of Cloud Run. And for these customers, of course, they will probably very likely not scale to zero, but they value the fact that—you know, we have a media company who uses Cloud Run for TV streaming, and when there is a soccer game somewhere in the world, they have a big spike of usage of requests coming in to their Cloud Run service, and here they can trust the rapid scaling of Cloud Run so they don't have to pre-provision things in advance to be able to serve that sudden traffic spike.But for those customers, Cloud Run is priced in a way so that if you know that you're going to consume a lot of Cloud Run CPU and memory, you can purchase Committed Use Discounts, which will lower your bill overall: if you know you are going to spend one dollar per hour on Cloud Run, well, purchase a Committed Use Discount, because you will only spend 83 cents instead of one dollar. And also, Cloud Run comes with two pricing models, one of which is the default, which is the request-based pricing model, which is basically you only have CPU allocated to your container instances if you are processing at least one request. But as a consequence of that, you are not paying outside of the processing of those requests. Those containers might stay warm for you, ready to receive new requests, but you're not paying for them. And so, that is—you know, your URL redirect service is probably in that mode where, yes, when you haven't used it for a while, it will scale down to zero, but if you send one request to it, it will serve that request and then it will stay up for a while until it decides to scale down. But you, the user, only pay when you are processing these specific requests, a little bit like a Function as a Service product.Corey: Scale to zero is one of the fundamental tenets of serverless. I think that companies call something serverless, but it always charges you per hour anyway. Yeah, that doesn't work. Storage, let's be clear, is a separate matter entirely. I'm talking about compute. Even if your workflow doesn't scale down to zero ever as a workload, that's fine, but if the workload does, you don't get to keep charging me for it.Steren: Exactly.
And so, in that other mode where you decide to always have CPU allocated to your Cloud Run container instances, then you pay for the entire lifecycle of these container instances. You still benefit from the auto-scaling of Cloud Run, but you will pay for the lifecycle and in that case, the price points are lower because you pay for a longer period of time. But that's more the price model that those bigger customers will take because at their scale, they basically always receive requests, so they are already paying all the time, basically.Corey: I really want to thank you for taking the time to chat with me. Before you go, one last question that we'll be using as a teaser for the next episode that we record together. It seems like this is a full-time job being the product manager on Cloud Run, but no, Google, contrary to popular opinion, does in fact still support 20% projects. What's yours?Steren: So, I've been lucky to work on Cloud Run since it was a prototype, and you know, for a long time, we've been iterating privately on Cloud Run, launching it, seeing it grow, seeing it adopted, it's great. It's my full-time job. But on Fridays, I still find the time to have a 20% project, which also had quite a bit of impact. And I work on some sustainability efforts for Google Cloud. And notably, we released two things last year.The first one is that we are sharing some carbon characteristics of Google Cloud regions. So, if you have seen those small leaves in the Cloud Console next to the regions that are emitting the least carbon, that's something that I helped bring to life. And the second one, which is something quite big, is we are helping customers report and reduce the gross carbon emissions of their Google Cloud usage by providing an out-of-the-box reporting tool called Google Cloud Carbon Footprint. So, that's something that I was able to bootstrap with a team a little bit on the side of my Cloud Run project, but I was very glad to see it launched by our CEO at the last Cloud Next Conference. And now it is a fully-funded project, so we are very glad that we are able to help our customers better meet their sustainability goals.Corey: And we will be talking about it significantly on the next episode. We're giving a teaser, not telling the whole story.Steren: [laugh].Corey: I really want to thank you for being as generous with your time as you are. If people want to learn more, where can they find you?Steren: Well, if they want to learn more about Cloud Run, we talked about how simple that name was. It was obviously not simple to find this simple name, but the domain is https://cloud.run.Corey: We will also accept snark.cloud/run, I will take credit for that service, too.Steren: [laugh]. Exactly.Corey: There we are.Steren: And then, people can find me on Twitter at @steren, S-T-E-R-E-N. I'll be happy—I'm always happy to help developers get started or answer questions about Cloud Run. And, yeah, thank you for having me. As I said, you successfully deployed something in just a few minutes to Cloud Run. I would encourage the audience to—Corey: In spite of myself. I know, I'm as surprised as anyone.Steren: [laugh].Corey: The only snag I really hit was the fact that I was riding shotgun when we picked up my daughter from school and went through a dead zone. It's like, why is this thing not loading in the Google Cloud Console? Yeah, fix the cell network in my area, please.Steren: I'm impressed that you did all of that from an iPad. But yeah, to the audience: give Cloud Run a try.
You can really get started by connecting your GitHub repository or deploying your favorite container image. And we've worked very hard to ensure that usability was there, and we know we have pretty strong usability scores. Because simplicity is a lot of work, and product excellence and developer experience are a lot of work to get right, and we are very proud of what we've achieved with Cloud Run and proud to see that the developer community has been very supportive and likes this product.Corey: I'm a big fan of what you've built. And we'll, of course, link to all of that in the show notes. I just want to thank you again for being so generous with your time. And thanks again for building something that I think in many ways showcases the best of what Google Cloud has to offer.Steren: Thanks for the invite.Corey: We'll talk again soon. Steren Giannini is a senior product manager at Google Cloud, on Cloud Run. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice. If it's on YouTube, hit the thumbs up and subscribe buttons as well, but in the event that you hated it, also include an angry comment explaining why your 20% project is being a shithead on the internet.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
This week we discuss work life balance, the State of Continuous Delivery Survey and recap WWDC. Plus, some thoughts on Buddha and parenting… Runner-up Titles The Buddha had no kids The Air Fryer is a PaaS. Rundown Work vs. Life Office workers get little reward for returning to the office – an idle factory is taboo (https://cote.io/2022/06/08/office-workers-get-little-reward-for-returning-to-work-an-idle-factory-is-taboo/) CEOs had a phenomenal year. Workers, less so (https://thehustle.co/05312022-CEO-vs-Worker-Pay/) Tesla monitored its employees on Facebook with help of PR firm during 2017 union push (https://www.cnbc.com/2022/06/02/tesla-paid-pr-firm-to-surveil-employees-on-facebook-in-2017-union-push.html) Elon Musk asks all Tesla employees to come back to the office or quit (https://electrek.co/2022/06/01/elon-musk-tesla-employees-come-back-office-or-quit/) Ford factory workers get 40-hour week (https://www.history.com/this-day-in-history/ford-factory-workers-get-40-hour-week) Survey Says State of Continuous Delivery (https://cd.foundation/wp-content/uploads/sites/78/2022/06/The-State-of-CD-Q1-2022.pdf) Chainguard raises $50M Series A for supply chain security (https://techcrunch.com/2022/06/02/chainguard-raises-50m-to-guard-supply-chains/) WWDC Apple WWDC 2022: the 16 biggest announcements (https://www.theverge.com/2022/6/6/23141939/apple-wwdc-2022-biggest-announcements-ios-16-macbook-air-macos-watchos) Create macOS or Linux virtual machines - WWDC22 - Videos (https://developer.apple.com/videos/play/wwdc2022/10002/) Apple will allow Linux VMs to run Intel apps with Rosetta in macOS Ventura (https://arstechnica.com/gadgets/2022/06/macos-ventura-will-extend-rosetta-support-to-linux-virtual-machines/) All the New Features Coming to Your Mac This Fall (https://www.wired.com/story/apple-ventura-macos-13-preview/) EU reaches deal to make USB-C a common charger for most electronic devices (https://www.engadget.com/eu-reaches-deal-to-make-usb-c-a-common-charger-for-most-electronic-devices-104605067.html) Relevant to your Interests Earnings HashiCorp quarter (https://twitter.com/jaminball/status/1532457687778312213?s=21&t=FiXLrZJc1LtYPQyeU27CEg) MongoDB quarter (https://twitter.com/jaminball/status/1532094080418607104) GitLab quarter (https://twitter.com/jaminball/status/1533906440695316480?s=21&t=K30ROu7mTJp1DgbvYxhDCA) Salesforce stock jumps as it raises profit forecast (https://www.cnbc.com/2022/05/31/salesforce-crm-earnings-q1-2023.html) Tech Valuations Tumble, but Business Software Stocks Are Cushioned by the Cloud (https://www.wsj.com/articles/tech-valuations-tumble-but-business-software-stocks-are-cushioned-by-the-cloud-11654164000?mod=djemalertNEWS) A Framework for Navigating Down Markets (https://future.com/framework-valuation-navigating-down-markets/) VMware Good thread (VMware history) (https://twitter.com/jdooley_clt/status/1528688334394077184) Broadcom buying VMware makes sense for IoT infrastructure (https://www.theregister.com/2022/05/26/broadcom_buying_vmware_makes_sense/) Broadcom plans 'rapid subscription transition' for VMware (https://www.theregister.com/2022/05/27/broadcom_vmware_subscriptions/) Brian Madden's brutal and unfiltered thoughts on the Broadcom / VMware deal
(https://www.linkedin.com/pulse/brian-maddens-brutal-unfiltered-thoughts-broadcom-vmware-brian-madden/?trackingId=m%2FeClBkjQxSyYPzRVcnpHQ%3D%3D) Broadcom will tame the VMware beast (https://siliconangle.com/2022/05/27/broadcom-will-tame-vmware-beast/) VMware Blockchain (https://www.vmware.com/products/blockchain.html) Bolt, the payments start-up, has begun laying off employees. (https://www.nytimes.com/2022/05/25/business/bolt-layoffs.html) Layoffs.fyi - Tech Layoff Tracker and Startup Layoff Lists (https://layoffs.fyi/) Proton Is Trying to Become Google—Without Your Data (https://www.wired.com/story/proton-mail-calendar-drive-vpn/) OpenStack, except it's outer space, (https://twitter.com/Kemp/status/1530198772872933377) Microsoft confirms it's taking a 'new approach' with its game streaming device | Engadget (https://www.engadget.com/microsoft-confirms-its-taking-a-new-approach-to-its-game-streaming-device-090144247.html) How to do fun and interesting executive dinners, round tables, etc. – online and in-person (https://cote.io/2022/05/27/how-to-do-executive-dinners/) Over 380 000 open Kubernetes API servers | The Shadowserver Foundation (https://www.shadowserver.org/news/over-380-000-open-kubernetes-api-servers/) Twitter fined $150M for misusing 2FA data (https://www.techtarget.com/searchsecurity/news/252520746/Twitter-fined-150M-for-misusing-2FA-data) First she documented the alt-right. Now she's coming for crypto. (https://www.washingtonpost.com/technology/2022/05/29/molly-white-crypto/) Exclusive: Microsoft continues to iterate on an Xbox cloud streaming device codenamed 'Keystone' (https://www.windowscentral.com/gaming/xbox/exclusive-microsoft-continues-to-iterate-on-an-xbox-cloud-streaming-stick-codenamed-keystone) Microsoft won't lower software costs on AWS, Google clouds (https://www.techtarget.com/searchenterprisedesktop/news/252520735/Microsoft-wont-lower-software-costs-on-AWS-Google-clouds) A researcher's avatar was sexually assaulted on a metaverse platform owned by Meta, making her the latest victim of sexual abuse on Meta's platforms, watchdog says (https://www.businessinsider.com/researcher-claims-her-avatar-was-raped-on-metas-metaverse-platform-2022-5) Forget LinkedIn—Your Next Job Offer Could Come via Slack (https://www.wsj.com/articles/job-hunters-workers-use-slack-to-find-job-offers-fast-11653918510) Sheryl Sandberg will leave Meta after 14 years this fall (https://www.protocol.com/sheryl-sandberg-meta-coo) This crypto startup believes 'sex-to-earn' is the future of web3 (https://www.inputmag.com/tech/sexn-crypto-startup-sex-to-earn-web3-nfts) ExpressVPN rejects CERT-In directives, removes its India servers (https://economictimes.indiatimes.com/tech/technology/expressvpn-rejects-cert-in-directives-suspends-india-ops/articleshow/91956961.cms) MongoDB CTO on (no)SQL, Superapps, and Southeast Asia (https://future.com/mongodb-cto-cloud-providers-southeast-asia/) Google is combining Meet and Duo into a single app for voice and video calls (https://www.theverge.com/2022/6/1/23149832/google-meet-duo-combination-voice-video) This VR headset will measure a user's brain activity (https://www.pcgamer.com/this-vr-headset-will-measure-a-users-brain-activity) Tesla has to respond to increase in phantom braking complaints (https://electrek.co/2022/06/03/tesla-respond-increase-phantom-braking-complaints/) Amazon's retail CEO is resigning after 23 years (https://www.theverge.com/2022/6/3/23153327/amazon-ceo-consumer-retail-businesses-dave-clark-resigning) Zoom Hires Greg Tomb as 
President (https://www.globenewswire.com/news-release/2022/06/06/2457166/0/en/Zoom-Hires-Greg-Tomb-as-President.html?utm_source=newsletter&utm_medium=email&utm_campaign=newsletter_axioslogin&stream=top) Peloton hires Amazon Web Services executive Liz Coddington as new CFO in latest shakeup (https://techcrunch.com/2022/06/07/peloton-hires-amazon-executive-liz-coddington-new-cfo-latest-shakeup/) Musk accuses Twitter of 'resisting and thwarting' his right to information on fake accounts (https://www.cnbc.com/2022/06/06/musk-says-twitter-is-refusing-to-share-data-on-spam-accounts.html) ‘A new IBM': How the tech giant simplified its marketing (https://www.marketingweek.com/ibm-simplifying-marketing/) Coinbase extends hiring pause for 'foreseeable future' and plans to rescind some offers (https://www.cnbc.com/2022/06/02/coinbase-hiring-pause-for-foreseeable-future-and-will-rescind-offers.html) Evading the Big Blue Name Police (https://www.itjungle.com/2022/06/08/evading-the-big-blue-name-police/) IBM CEO explains why company offloaded Watson Health (https://www.theregister.com/2022/06/08/ibm_ceo_arvind_krishna_explains/) MongoDB fires up new cloud, on-premises releases (https://venturebeat.com/2022/06/07/mongodb-fires-up-new-cloud-on-premise-releases/) In reversal, Twitter plans to comply with Musk's demands for data (https://www.washingtonpost.com/technology/2022/06/08/elon-musk-twitter-bot-data/) OpenCost: Open Source Collaboration on Kubernetes Cost Standards (https://thenewstack.io/opencost-open-source-collaboration-on-kubernetes-cost-standards/) Kubecost launches open-source OpenCost project (https://siliconangle.com/2022/06/02/kubecost-launches-open-source-opencost-project-keep-lid-kubernetes-spending/) Datadog's 2022 State of Serverless report (https://www.datadoghq.com/state-of-serverless/) The IRS needs digital transformation (https://twitter.com/josephzeballos/status/1534189391328976897?s=21&t=uPoXtZtzX-q_GAtodVVbsg) Oracle quietly closes $28B deal to buy electronic health records company Cerner (https://techcrunch.com/2022/06/07/oracle-quietly-closes-28b-deal-to-buy-electronic-health-records-company-cerner/) Nonsense The Cast of HBO's 'Silicon Valley' Cast Explains What Real Startups Do (NSFW) (https://www.youtube.com/watch?v=5Y64UeNeiOM) WSJ News Exclusive | Justin Timberlake Sells Song Catalog to Blackstone-Backed Fund (https://www.wsj.com/articles/justin-timberlake-sells-song-catalog-to-blackstone-backed-fund-11653557400) Every person in the U.S. now receives an average of 65 packages a year. (https://twitter.com/mims/status/1529222322686672896) Spotify Podcasters Are Making $18,000 a Month With Nothing But White Noise (https://www.bloomberg.com/news/articles/2022-06-01/how-to-make-money-on-spotify-a-white-noise-podcast-could-bring-you-big-bucks) Flying ice cream? Unilever links with drone delivery service Flytrex (https://www.fooddive.com/news/flying-ice-cream-unilever-links-with-drone-delivery-service-flytrex/624541/) Texas to reclaim home of the largest Buc-ee's (https://www.kxan.com/news/texas/texas-to-reclaim-home-of-the-largest-buc-ees/) Sponsors Teleport — The easiest, most secure way to access infrastructure.
(https://goteleport.com/?utm_campaign=eg&utm_medium=partner&utm_source=sdt) Listener Feedback / Jobs Tim wants you to work at Biogen as a Global DevOps Lead, Commercial & Medical IT (https://jobs.smartrecruiters.com/Biogen/743999821251393-global-devops-lead-commercial-medical-it) Walmart is hiring Principal Software Engineer - Linux Kernel in Sunnyvale, California (https://www.linkedin.com/jobs/view/2945555862) Ryan wants you to work at DataDog as the Vice President, Events and Field Marketing (https://www.datadoghq.com/careers/detail/?gh_jid=4252681) J&J Senior Algorithm Analytics Engineer in Redwood City, California | Medical Devices (https://jobs.jnj.com/jobs/2206008429W?lang=en-us) NYTimes is hiring a Staff Software Engineer - CI/CD Platform (https://nytimes.wd5.myworkdayjobs.com/Tech/job/New-York-NY/Staff-Software-Engineer---CI-CD-Platform_REQ-012710) Conferences FinOps X (https://events.linuxfoundation.org/finops-x/), June 20-21, 2022, Matt's there! DevOps Loop (https://devopsloop.io), June 22nd. Free! Coté put the agenda together. Open Source Summit North America (https://events.linuxfoundation.org/open-source-summit-north-america/), June 21-24, 2022, Matt's there! DevOpsDayLA (https://www.socallinuxexpo.org/scale/19x/devops-day-la) is happening at SCALE19x (https://www.socallinuxexpo.org/scale/19x), July 29th, 2022 Discount code: DEVOP THAT Conference Wisconsin (https://that.us/call-for-counselors/wi/2022/), July 25, 2022 Discount code: SDTFriendsWI50 - $50 off 4-Day everything ticket Discount code: SDTFriendsWI25 - $25 off 3-Day Camper ticket VMware Explore 2022, August 29 – September 1, 2022 (https://www.vmware.com/explore.html?src=so_623a10693ceb7&cid=7012H000001Kb0hQAC) SpringOne Platform (https://springone.io/?utm_source=cote&utm_medium=podcast&utm_content=sdt), SF, December 6–8, 2022 THAT Conference Texas Call For Counselors (https://that.us/call-for-counselors/tx/2023/) Jan 16-19, 2023, SDT news & hype Join us in Slack (http://www.softwaredefinedtalk.com/slack). Get an SDT Sticker! Send your postal address to stickers@softwaredefinedtalk.com (mailto:stickers@softwaredefinedtalk.com) and we will send you free laptop stickers! Follow us on Twitch (https://www.twitch.tv/sdtpodcast), Twitter (https://twitter.com/softwaredeftalk), Instagram (https://www.instagram.com/softwaredefinedtalk/), LinkedIn (https://www.linkedin.com/company/software-defined-talk/) and YouTube (https://www.youtube.com/channel/UCi3OJPV6h9tp-hbsGBLGsDQ/featured). Use the code SDT to get $20 off Coté's book, Digital WTF (https://leanpub.com/digitalwtf/c/sdt), so $5 total. Become a sponsor of Software Defined Talk (https://www.softwaredefinedtalk.com/ads)! Recommendations Brandon: Apple Watch SE (https://www.apple.com/apple-watch-se/?afid=p238%7CsZvcBV5q2-dc_mtid_1870765e38482_pcrid_584606532877_pgrid_117189313172_pntwk_g_pchan__pexid__&cid=aos-us-kwgo-watch--slid---product-) for Tweens Coté: Matt Levine interview on The Longform podcast (https://longform.org/posts/longform-podcast-490-matt-levine). Photo Credits Banner (https://unsplash.com/photos/88IMbX3wZmI) ArtWork (https://unsplash.com/photos/5cFwQ-WMcJU)
Ransomware gang threatens to overthrow Costa Rica government Majority of Kubernetes API Servers Exposed to the Public Internet Your iPhone Is Vulnerable to a Malware Attack Even When It's Off Hacker Finds Way to Unlock Tesla Models, Start Cars Training to beat a bad cybersecurity culture Ben Verde-Chapman, Chief Growth Officer of GigWage talks about the financial needs of the gig worker and reducing the impact of income volatility. Hosts: Louis Maresca, Brian Chee, and Curt Franklin Guest: Ben Verde-Chapman Download or subscribe to this show at https://twit.tv/shows/this-week-in-enterprise-tech. Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsors: linode.com/twiet Compiler - TWIET plextrac.com/twit
About AB
AB Periasamy is the co-founder and CEO of MinIO, an open-source provider of high-performance object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu, where he serves on the board, to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART). AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software-defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat's Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to the scaling of commodity cluster computing to supercomputing-class performance. His work there resulted in the development of Lawrence Livermore Laboratory's “Thunder” supercomputer, which, at the time, was the second fastest in the world. AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India. AB is one of the leading proponents and thinkers on the subject of open source software, articulating the difference between the philosophy and the business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.
Links: MinIO: https://min.io/ Twitter: https://twitter.com/abperiasamy MinIO Slack channel: https://minio.slack.com/join/shared_invite/zt-11qsphhj7-HpmNOaIh14LHGrmndrhocA LinkedIn: https://www.linkedin.com/in/abperiasamy/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.Corey: This episode is sponsored in part by our friends at Rising Cloud, which I hadn't heard of before, but they're doing something vaguely interesting here. They are using AI, which is usually where my eyes glaze over and I lose attention, but they're using it to help developers be more efficient by reducing repetitive tasks. So, the idea being that you can run stateless things without having to worry about scaling, placement, et cetera, and the rest. They claim significant cost savings, and they're able to wind up taking what you're running as it is, in AWS, with no changes, and run it inside of their data centers that span multiple regions.
I'm somewhat skeptical, but their customers seem to really like them, so that's one of those areas where I really have a hard time being too snarky about it because when you solve a customer's problem, and they get out there in public and say, “We're solving a problem,” it's very hard to snark about that. Multus Medical, Construx.ai, and Stax have seen significant results by using them, and it's worth exploring. So, if you're looking for a smarter, faster, cheaper alternative to EC2, Lambda, or batch, consider checking them out. Visit risingcloud.com/benefits. That's risingcloud.com/benefits, and be sure to tell them that I sent you because watching people wince when you mention my name is one of the guilty pleasures of listening to this podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by someone who's doing something a bit off the beaten path when we talk about cloud. I've often said that S3 is sort of a modern wonder of the world. It was the first AWS service brought into general availability. Today's promoted guest is the co-founder and CEO of MinIO, Anand Babu Periasamy, or AB as he often goes, depending upon who's talking to him. Thank you so much for taking the time to speak with me today.AB: It's wonderful to be here, Corey. Thank you for having me.Corey: So, I want to start with the obvious thing, where you take a look at what is the cloud and you can talk about AWS's ridiculous high-level managed services, like Amazon Chime. Great, we all see how that plays out. And those are the higher-level offerings, ideally aimed at problems customers have, but then they also have the baseline building block services, and it's hard to think of a more baseline building block than an object store. That's something every cloud provider has, regardless of how many scare quotes there are around the word cloud; everyone offers the object store. And your solution is to look at this and say, “Ah, that's a market ripe for disruption. We're going to build, through an open-source community, software that emulates an object store.” I would be sitting here, more or less poking fun at the idea except for the fact that you're a billion-dollar company now.AB: Yeah.Corey: How did you get here?AB: So, when we started, right, we did not actually think about cloud that way, right? “Cloud, it's a hot trend, let's go disrupt it. It will lead to a lot of opportunity.” Certainly, it's true, it led to the M&A, right, but that's not how we looked at it, right? It's a bad idea to build startups for M&A.When we looked at the problem, when we got back into this—my previous background, some may not know that it's actually a distributed file system background in the open-source space.Corey: Yeah, you were one of the co-founders of Gluster—AB: Yeah.Corey: —for which I have only begrudgingly forgiven you. But please continue.AB: [laugh]. And back then we got the idea right, but the timing was wrong. And I had—while the data was beginning to grow at a crazy rate, end of the day, GlusterFS has to still look like an FS, it has to look like a file system like NetApp or EMC, and it was hugely limiting what we can do with it. The biggest problem for me was legacy systems. If I have to build a modern system that is compatible with a legacy architecture, you cannot innovate.And that is where when Amazon introduced S3, back then, like, when S3 came, cloud was not big at all, right?
When I look at it, the most important message of the cloud was Amazon basically threw away everything that is legacy. It's not [iSCSI 00:03:21] as a Service; it's not even FTP as a Service, right? They came up with a simple, RESTful API to store your blobs, whether it's a JavaScript, Android, iOS, or [AAML 00:03:30] application, or even a Snowflake-type application.Corey: Oh, we spent ten years rewriting our apps to speak object store, and then they released EFS, which is NFS in the cloud. It's—AB: Yeah.Corey: —I didn't realize I could have just been stubborn and waited, and the whole problem would solve itself. But here we are. You're quite right.AB: Yeah. And even EFS and EBS are more so legacy stuff can come in, buy some time, but that's not how you should stay on AWS, right? When Amazon did that, for me, that was the opportunity. I saw that… while the world is going to continue to produce lots and lots of data, if I built a brand around that, I'm not going to go wrong.The problem is data at scale. And what do I do there? The opportunity I saw was, Amazon solved one of the largest problems for a long time. All the legacy systems, legacy protocols, they convinced the industry, throw them away and then start all over from scratch with the new API. While it's not compatible, it's not standard, it is ridiculously simple compared to anything else.No fstabs, no [unintelligible 00:04:27], no [root 00:04:28], nothing, right? That you could access it from any application, anywhere, was a big deal. When I saw that, I was like, “Thank you Amazon.” And I also knew Amazon would convince the industry that rewriting their application is going to be better and faster and cheaper than retrofitting legacy applications.Corey: I wonder how much that's retconned because talking to some of the people involved in the early days, they were not at all convinced they [laugh] would be able to convince the industry to do this.AB: Actually, if you talk to the analyst reporters, the IDCs, Gartners of the world, to the enterprise IT, the VMware community, they would say, “Hell no.” But if you talk to the actual application developers, data infrastructure, data architects, the actual consumers of data, for them, it was so obvious. They actually did not know how to write an fstab. The iSCSI and NFS, you can't even access across the internet, and the modern applications, they ran across the globe, in JavaScript, and all kinds of apps on the device. From [Snap 00:05:21] to Snowflake, today everything is built on object store. It was more natural for the applications team, but not for the infrastructure team. So, who you asked mattered.But nevertheless, Amazon convinced the rest of the world, and our bet was that if this is going to be the future, then this is also our opportunity. S3 is going to be limited because it only runs inside AWS. The bulk of the world's data is produced everywhere and only a tiny fraction will go to AWS. And where will the rest of the data go? Not SAN, NAS, HDFS, or other blob stores, Azure Blob, or GCS; it's not going to be fragmented. And if we built a better object store, lightweight, faster, simpler, but fully compatible with the S3 API, we could sweep and consolidate the market. And that's what happened.Corey: And there is a lot of validity to that. We take a look across the industry, when we look at various standards—I mean, one of the big problems with multi-cloud in many respects is the APIs are not quite similar enough.
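The S3 API itself is the big exception to that incompatibility, and it is the one AB is betting on: the same client code can talk to AWS S3 or to any S3-compatible store such as MinIO just by swapping the endpoint. A minimal sketch in Python with boto3; the endpoint, bucket name, and credentials are placeholders, and the buckets are assumed to already exist:

```python
import boto3

def make_client(endpoint_url=None):
    # With endpoint_url=None, boto3 talks to AWS S3 itself; pointing it
    # at an S3-compatible server (e.g., MinIO) exercises the same API.
    return boto3.client(
        "s3",
        endpoint_url=endpoint_url,          # e.g., "http://localhost:9000" for a local MinIO
        aws_access_key_id="ACCESS_KEY",     # placeholder credential
        aws_secret_access_key="SECRET_KEY", # placeholder credential
    )

for endpoint in (None, "http://localhost:9000"):
    s3 = make_client(endpoint)
    # The exact same calls work against either backend.
    s3.put_object(Bucket="demo-bucket", Key="hello.txt", Body=b"hello, object store")
    body = s3.get_object(Bucket="demo-bucket", Key="hello.txt")["Body"].read()
    print(endpoint or "AWS S3", body)
```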
And worse, the failure patterns are very different, where I don't just need to know how the load balancer works, I need to know how it breaks so I can detect and plan for that. And then you've got the whole identity problem as well, where you're trying to manage across different frames of reference as you go between providers, and it leads to a bit of a mess. What is it that makes MinIO not just something that has endured since it was created, but clearly been thriving?AB: The real reason, actually, is not the multi-cloud compatibility, all that, right? Like, while today, it is a big deal for the users because the deployments have grown into 10-plus petabytes, and now the infrastructure team is taking it over and consolidating across the enterprise, so now they are talking about which key management server for storing the encrypted keys, which key management server should I talk to? Look at AWS, Google, or Azure, everyone has their own proprietary API. Outside, they have [YAML2 00:07:18], HashiCorp Vault, and, like, there is no standard here. It is supposed to be a [KMIP 00:07:23] standard, but in reality, it is not. Even different versions of Vault, there are incompatibilities for us.That is where—like, from Key Management Server, Identity Management Server, right, like, everything that you speak around, how do you talk to the different ecosystems? That is where MinIO provides connectors; having the large ecosystem support and a large community, we are able to address all that. Once you bring MinIO into your application stack like you would bring Elasticsearch or MongoDB or anything else as a container, your application stack is just a Kubernetes YAML file, and you roll it out on any cloud, it becomes easier for them, they're able to go to any cloud they want. But the real reason why it succeeded was not that. They actually wrote their applications as containers on Minikube, then they would push it to a CI/CD environment.They never wrote code on EC2 or ECS writing objects on S3, and they don't like the idea of [PaaS 00:08:15], where someone is telling you just—like you saw Google App Engine never took off, right? They liked the idea, here are my building blocks. And then I would stitch them together and build my application. We were part of their application development since the early days, and when the application matured, it was hard to remove. It is very much like Microsoft Windows when it grew: even though the desktop was Microsoft Windows, the server was NetWare, and NetWare lost the game, right?We got the ecosystem, and it was actually developer productivity, convenience, that really helped. The simplicity of MinIO, today, they are arguing that deploying MinIO inside AWS is easier through their YAML and containers than going to the AWS Console and figuring out how to do it.Corey: As you take a look at how customers are adopting this, it's clear that there is some shift in this because I could see the story for something like MinIO making an awful lot of sense in a data center environment because otherwise, it's, “Great. I need to make this app work with my SAN as well as an object store.” And that's sort of a non-starter for obvious reasons. But now you're available through cloud marketplaces directly.AB: Yeah.Corey: How are you seeing adoption patterns and interactions from customers changing as the industry continues to evolve?AB: Yeah, actually, that is how my thinking was when I started. If you are inside AWS, I would myself tell them, why don't you use AWS S3?
And it made a lot of sense: if it's on a colo or your own infrastructure, then there is an object store. It even made a lot of sense if you are deploying on Google Cloud, Azure, Alibaba Cloud, Oracle Cloud; it made a lot of sense because you wanted an S3-compatible object store. Inside AWS, why would you do it, if there is AWS S3?Nowadays, I hear funny arguments, too. They're like, “Oh, I didn't know that I could use S3. Is S3 MinIO compatible?” Because they will be like, “It came along with GitLab or GitHub Enterprise, as part of the application stack.” They didn't even know that they could actually switch it over.And otherwise, most of the time, they developed it on MinIO, and now they are too lazy to switch over. That also happens. But the real reason why it became serious for me—I ignored the public cloud commercialization; I encouraged the community adoption. And it grew to more than a million instances, like, across the cloud, like, small and large, but when they started talking about paying us serious dollars, then I took it seriously. And then when I started asking them why they would do it, I got to know the real reason they wanted to do it was they want to be detached from the cloud infrastructure provider.They want to look at cloud as CPU, network, and drive as a service. And running their own enterprise IT was more expensive than adopting public cloud; it was productivity for them, and reducing the infrastructure and people cost was a lot. It made economic sense.Corey: Oh, people always cost more than the infrastructure itself does.AB: Exactly right. 70, 80%, like, goes into people, right? And enterprise IT is too slow. They cannot innovate fast, and all of those problems. But what I found was, for us, while we actually built the community and customers, if you're on AWS, if you're running MinIO on EBS, EBS is three times more expensive than S3.Corey: Or a single copy of it, too, where if you're trying to go multi-AZ and you have the replication traffic, and not to mention you have to over-provision it, which is a bit of a different story as well. So, like, it winds up being something on the order of 30 times more expensive, in many cases, to do it right. So, I'm looking at this going, the economics of running this purely by itself in AWS don't make sense to me—long experience teaches me the next question of, “What am I missing?” Not, “That's ridiculous and you're doing it wrong.” There's clearly something I'm not getting. What am I missing?AB: I was telling them until we made some changes, right—because we saw a couple of things happen. I was initially like, [unintelligible 00:12:00] does not make 30 copies. It makes, like, 1.4x, 1.6x.But still, the underlying block storage is not only three times more expensive than S3, it's also slow. It's network storage. Trying to put an object store on top of it, another, like, software-defined SAN, like EBS, made no sense to me. Smaller deployments, it's okay, but you should never scale that on EBS. So, it did not make economic sense. I would never take it seriously because it would never help them grow to scale.But what changed in recent times? Amazon saw that this was not only a problem for MinIO-type players. Every database out there today, every modern database, even the message queues like Kafka, they all have gone scale-out. And they all depend on local block store, and putting scale-out distributed databases, data processing engines on top of EBS would not scale. And Amazon introduced storage optimized instances.
Essentially, that removed the need for the data infrastructure guy, data engineer, or application developer to ask IT, “I want a SuperMicro or Dell server, or even virtual machines.” That's too slow, too inefficient.They can provision these storage machines on demand, and then I can do it through Kubernetes. These were two changes: all the public cloud players now adopted Kubernetes as the standard, and they have to stick to the Kubernetes API standard. If they are incompatible, they won't get adopted. And storage optimized, that is, local drives—these are machines, like, [i3en 00:13:23], like, 24 drives, they have SSDs, and fast network—like, 25-gigabit, 200-gigabit type network—availability of these machines, like, what typically would run any database, HDFS cluster, MinIO, all of them, those machines are now available just like any other EC2 instance.They are efficient. You can actually put MinIO side by side with S3 and still be price competitive. And Amazon wants to—like, just like their retail marketplace, they want to compete and be open. They have enabled it. In that sense, Amazon is actually helping us. And it turned out that now I can help customers build multiple-petabyte infrastructure on Amazon and still stay efficient, still stay price competitive.Corey: I would have said for a long time that if you were to ask me to build out the lingua franca of all the different cloud providers into a common API, the S3 API would be one of them. Now, you are building this out, multi-cloud, you're in all three of the major cloud marketplaces, and the way that you do that and do those deployments seems like it is the modern multi-cloud API of Kubernetes. When you first started building this, Kubernetes was very early on. What was the evolution of getting there? Or were you one of the first early-adoption customers in the Kubernetes space?AB: So, when we started, there was no Kubernetes. But the problem we saw was very clear. And there were containers, and then came Docker Compose and Swarm. Then there was Mesos, Cloud Foundry, you name it, right? Like, there were many solutions, all the way up to even VMware trying to get into that space.And what did we do? Early on, I couldn't choose. I couldn't—it's not in our hands, right, who is going to be the winner, so we just simply embraced everybody. It was also tiring to implement native connectors to all of these different orchestrators; like, Pivotal Cloud Foundry alone, they have their own standard, Open Service Broker, that's only popular inside their system. Go anywhere else, everybody was incompatible.And outside that, even Chef, Ansible, Puppet scripts, too. We just simply embraced everybody until the dust settled down. When it settled down, clearly the declarative model of Kubernetes became easier. Also, the Kubernetes developers understood the community well. And coming from Borg, I think they understood the right architecture. And it was also written in Go, unlike Java, right?It actually matters, these minute details, resonating with the infrastructure community. It took off, and then that helped us immensely. Now, not only is Kubernetes popular, it has become the standard, from VMware to OpenShift to all the public cloud providers, GKS, AKS, EKS, whatever, right—GKE. All of them now are basically Kubernetes standard. It made not only our life easier, it made every other [ISV 00:16:11], other open-source project, everybody now can finally write code once that can be operated portably.It is a big shift.
It is not because we chose; we just watched all this, we were riding along the way. And then we resonated with the infrastructure community; modern infrastructure is dominated by open-source. We were also the leading open-source object store, and as the Kubernetes community adopted us, we were naturally embraced by the community.Corey: Back when AWS first launched with S3 as its first offering, there were a bunch of folks who were super excited, but object stores didn't make a lot of sense to them intrinsically, so they looked into this and, “Ah, I can build a file system in user space on top of S3.” And the reaction was, “Holy God don't do that.” And the way that AWS decided to discourage that behavior is a per-request charge, which for most workloads is fine, whatever, but there are some where it causes a significant burden. With running something like MinIO in a self-hosted way, suddenly that costing doesn't exist in the same way. Does that open the door again to, “so now I can use it as a file system,” in which case that just seems like using the local file system, only with extra steps?AB: Yeah.Corey: Do you see patterns that are emerging with customers' use of MinIO that you would not see with the quote-unquote, “Provider's” quote-unquote, “Native” object storage option, or do the patterns mostly look the same?AB: Yeah, if you took an application that ran on file and block and brought it over to object storage, that makes sense. But something that is competing with object store or a layer below object store, that is—end of the day, the drives are block devices, you have a block interface, right—trying to bring SAN or NAS on top of object store is actually a step backwards. They completely missed the message that Amazon told: if you brought a file system interface on top of object store, you missed the point, that you are now bringing the legacy things that Amazon intentionally removed from the infrastructure. Trying to bring them on top doesn't make it any better. If you are arguing for compatibility with some legacy applications, sure, but writing a file system on top of object store will never be better than NetApp, EMC, like EMC Isilon, or anything else. Or even GlusterFS, right?But if you want a file system, I always tell the community, they ask us, “Why don't you add an FS option and do a multi-protocol system?” I tell them that the whole point of S3 is to remove all those legacy APIs. If I added POSIX, then I'll be a mediocre object storage and a terrible file system. I would never do that. But why not write a FUSE file system, right? Like, S3Fs is there.In fact, initially, for legacy compatibility, we wrote MinFS and I had to hide it. We actually archived the repository because immediately people started using it. Even simple things like, end of the day, can I use Unix [Coreutils 00:19:03] like [cp, ls 00:19:04], all these tools I'm familiar with? If it's not a file system, then object storage tools like [s3cmd 00:19:08] or the AWS CLI are, like, too bloated. And it's not really a Unix-like feeling.Then what I told them: “I'll give you a BusyBox-like single static binary, and it will give you all the Unix tools that work for the local filesystem as well as object store.” That's where the [MC tool 00:19:23] came in; it gives you all the Unix-like programmability, all the core tools, object storage compatible, speaking native object store.
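The same object-native philosophy carries over to code: rather than mounting a bucket as a filesystem, you talk to it through an S3-style SDK. A rough programmatic analogue of mc-style ls and cp using the MinIO Python SDK; the server address, credentials, bucket, and object names are placeholders:

```python
from minio import Minio

client = Minio(
    "play.min.io",            # placeholder server address
    access_key="ACCESS_KEY",  # placeholder credential
    secret_key="SECRET_KEY",  # placeholder credential
    secure=True,
)

# "ls" equivalent: stream object listings instead of stat()-ing files.
for obj in client.list_objects("demo-bucket", prefix="logs/", recursive=True):
    print(obj.object_name, obj.size)

# "cp" equivalent: download one object to a local file.
client.fget_object("demo-bucket", "logs/2022-01-01.log", "/tmp/2022-01-01.log")
```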
But if I have to make object store look like a file system so UNIX tools would run, it would not only be inefficient; Unix tools never scaled for this kind of capacity.So, it would be a bad idea to take a step backwards and bring legacy stuff back inside. For some very small cases, if there are simple POSIX calls, using [ObjectiveFs 00:19:49], S3Fs, and a few others for legacy compatibility reasons makes sense, but in general, I would tell the community, don't bring file and block. If you want file and block, leave those on virtual machines and leave that infrastructure in a silo and gradually phase them out.Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's v-u-l-t-r.com slash screaming.Corey: So, my big problem, when I look at what S3 has done, is in its name because of course, naming is hard. It's, “Simple Storage Service.” The problem I have is with the word simple because over time, S3 has gotten more and more complex under the hood. It automatically tiers data the way that customers want. And integrated with things like Athena, you can now query it directly; whenever an object appears, you can wind up automatically firing off Lambda functions and the rest.And this is increasingly looking a lot less like a place to just dump my unstructured data, and increasingly, a lot like this is sort of a database, in some respects. Now, understand my favorite database is Route 53; I have a long and storied history of misusing services as databases. Is this one of those scenarios, or is there some legitimacy to the idea of turning this into a database?AB: Actually, there is now the S3 Select API: if you're storing unstructured data like CSV, JSON, Parquet, without even downloading a compressed CSV, you can actually send a SQL query into the system. In MinIO particularly, the S3 Select is [SIMD 00:21:16] optimized. We can load, like, every 64k worth of CSV lines into registers and do SIMD operations. It's the fastest SQL filter out there. Now, bringing these kinds of capabilities, we are just a little bit away from a database; should we do a database? I would say definitely no.The very strength of the S3 API is to actually limit all the mutations, right?
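For reference, the S3 Select call described above looks roughly like this with boto3; since MinIO implements the same API, the sketch applies to either backend, and the bucket, key, and query are illustrative:

```python
import boto3

s3 = boto3.client("s3")  # or point endpoint_url at an S3-compatible server

# Filter rows server-side: only matching records cross the network,
# not the whole object.
resp = s3.select_object_content(
    Bucket="demo-bucket",    # placeholder bucket
    Key="measurements.csv",  # placeholder object
    ExpressionType="SQL",
    Expression="SELECT s.city, s.temp FROM s3object s WHERE CAST(s.temp AS FLOAT) > 30",
    InputSerialization={"CSV": {"FileHeaderInfo": "USE"}},
    OutputSerialization={"CSV": {}},
)

# The response is an event stream; records arrive in chunks.
for event in resp["Payload"]:
    if "Records" in event:
        print(event["Records"]["Payload"].decode(), end="")
```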
Particularly if you look at databases, they're dealing with metadata and querying; the biggest value they bring is indexing the metadata. But if I'm dealing with that, then I'm dealing with really small blocks, lots of mutations; the separation is that object storage should be dealing with persistence and not mutations. Mutations are [AWS 00:21:57] problem. Separation of the database work function and the persistence function is where object storage got the storage right.Otherwise, they will make the mistake of doing POSIX-like behavior, and then not only bringing back all those capabilities, but doing IOPS-intensive workloads across HTTP; it wouldn't make sense, right? So, object storage got the API right. But now should it be a database? So, it definitely should not be a database. In fact, I actually hate the idea of Amazon yielding to the file system developers and giving a [file tree 00:22:29] hierarchical namespace so they can write nice file managers.That was a terrible idea. A hierarchical namespace that's also sorted now puts a tax on how the metadata is indexed and organized. Amazon should have left the core API very simple and told them to solve these problems outside the object store. Many application developers don't need it. Amazon was trying to satisfy everybody's needs. Saying no to some of these file-system-type, file-manager-type users would have been the right way.But nevertheless, adding those capabilities, eventually, now you can see, S3 is no longer simple. And we had to keep that compatibility, and I hate that part. I actually don't mind compatibility, but then all the wrong things that Amazon is adding, now I have to add because it has to be compatible. I kind of hate that, right?But now going to a database would be pushing it to a whole new level. Here is the simple reason why that's a bad idea. The right way to do a database—in fact, the database industry is already going in the right direction. Unstructured data, the key-value or graph, different types of data, you cannot possibly solve all that even in a single database. They are trying to be multimodal databases; even they are struggling with it.You can never be a Redis, Cassandra, like, a SQL all-in-one. They tried to say that, but in reality, you will never be better than any one of those focused database solutions out there. Trying to bring that into object store would be a mistake. Instead, let the databases focus on query language implementation and query computation, and leave the persistence to object store. So, object store can still focus on storing your database segments, the table segments, but the index is still in the memory of the database.Even the index can be snapshotted once in a while to object store, but using object store for persistence and the database for query is the right architecture. And almost all the modern databases now, from Elasticsearch to [unintelligible 00:24:21] to even Kafka, like, message queues. They all have gone that route. Even Microsoft SQL Server, Teradata, Vertica, name it, Splunk, they all have gone the object storage route, too. Snowflake itself is a prime example, BigQuery and all of them.That's the right way. Databases can never be consolidated. There will be many different kinds of databases. Let them specialize on GraphQL or Graph API, or key-value, or SQL. Let them handle the indexing; for persistence, they cannot handle petabytes of data.
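A toy sketch of that division of labor: immutable table segments persisted to object storage, the index kept in memory, and the index itself only checkpointed occasionally. The storage client, bucket, and formats are placeholders, and real engines are far more involved:

```python
import json

class SegmentStore:
    def __init__(self, s3_client, bucket):
        self.s3 = s3_client   # any S3-style client (boto3, MinIO SDK, ...)
        self.bucket = bucket
        self.index = {}       # in-memory index: row id -> segment name

    def write_segment(self, name, rows):
        # Segments are immutable: written once, never mutated in place.
        body = b"\n".join(json.dumps(r).encode() for r in rows)
        self.s3.put_object(Bucket=self.bucket, Key=f"segments/{name}", Body=body)
        for row in rows:
            self.index[row["id"]] = name  # index mutations stay in memory

    def snapshot_index(self):
        # Once in a while, checkpoint the index to object storage too.
        body = json.dumps(self.index).encode()
        self.s3.put_object(Bucket=self.bucket, Key="index/snapshot.json", Body=body)
```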
That [unintelligible 00:24:51] to object store is how the industry is shaping up, and it is going in the right direction.Corey: One of the ways I learn the most about various services is by talking to customers. Every time I think I've seen something amazing, something I completely understand, all I have to do is talk to one more customer. And when I was doing a bill analysis project a couple of years ago, I looked into a customer's account and saw a bucket that, okay, has 280 billion objects in it. And wait, was that billion with a B?And I asked them, “So, what's going on over there?” And they're like, “Well, we built our own columnar database on top of S3. This may not have been the best approach.” It's, “I'm going to stop you there. With no further context, it was not, but please continue.”It's the sort of thing that would never have occurred to me to even try. Do you tend to see similar—I would say they're anti-patterns, except somehow they're made to work—in some of your customer environments, as they are using the service in ways that are very different than the ways encouraged or even allowed by the native object store options?AB: Yeah, when I first started seeing the database-type workloads coming on to MinIO, I was surprised, too. That was exactly my reaction. In fact, they were storing these 256k, sometimes 64k table segments because they need to index them, right, and the table segments were anywhere between 64k to 2MB. And when they started writing table segments, it was more often an [IOPS-type 00:26:22] I/O pattern than a throughput-type pattern. Throughput is an easier problem to solve, and MinIO always saturated these 100-gigabyte NVMe-type drives; they were I/O intensive, throughput optimized.When I started seeing the database workloads, I had to optimize for small-object workloads, too. We actually did all that because eventually I got convinced the right way to build a database was to actually leave the persistence out of the database; they actually made a compelling argument. Historically, I thought of metadata and data: data being very big, coming to object store makes sense. Metadata should be stored in a database, and that's only the index pages. Take any book: the index pages are only a few. The database can continue to run adjacent to object store; it's a clean architecture.But why would you put the database itself on object store? When I saw a transactional database like MySQL changing the [InnoDB 00:27:14] to [RocksDB 00:27:15], and making changes at that layer to write the SSTables [unintelligible 00:27:19] to MinIO, I was like, where do you store the memtable, the journal? They said, “That will go to Kafka.” And I was like—I thought that was insane when it started. But it continued to grow and grow.Nowadays, I see most of the databases have gone to object store, but their argument is, the databases also saw explosive growth in data. And they couldn't scale the persistence part. That is where they realized that they were still very good at the indexing part, which object storage would never give. There is no API to do sophisticated query of the data. You cannot peek inside the data, you can just do streaming read and write.And that is where the databases were still necessary. But databases were also growing in data. One thing that triggered this was the use case moved from data that was generated by people to now data generated by machines. Machines mean applications, all kinds of devices.
Now, it's like going from seven billion people to a trillion devices is how the industry is changing. And this led to lots of machine-generated, semi-structured, structured data at giant scale, coming into databases. The databases need to handle scale. There was no other way to solve this problem other than leaving the—[unintelligible 00:28:31] if you are looking at columnar data, most of it is machine-generated data; where else would you store it? If they tried to build their own object storage embedded into the database, it would make the database immensely complicated. Let them focus on what they are good at: indexing and mutations. Pulling the table segments, which are immutable, mutating in memory, and then committing them back gives the right mix. What you saw is what happened fastest; we saw that consistently across the board. Now, it is actually the standard.Corey: So, you started working on this in 2014, and here we are—what is it—eight years later now, and you've just announced a Series B of $100 million on a billion-dollar valuation. So, it turns out this is not just one of those things people are using for test labs; there is significant momentum behind using this. How did you get there from—because everything you're saying makes an awful lot of sense, but it feels, at least from where I sit, to be a little bit of a niche. It's a bit of an edge case that is not the common case. Obviously, I'm missing something because your investors are not the types of sophisticated investors who see something ridiculous and go, “Yep. That's the thing we're going to go for.” They're right more than they're not.AB: Yeah. The reason for that was they saw what we had set out to do. In fact, these are—if you see the lead investor, Intel, they watched us grow. They came into the Series A and they saw, every day, how we operated and grew. They believed in our message.And it was actually not about object store, right? Object storage was a means for us to get into the market. When we started, our idea was, ten years from now, what will be a big problem? A lot of times, it's hard to see the future, but if you zoom out, it's hidden in plain sight.These are simple trends. Every major trend pointed to the world producing more data. No one would argue with that. If I solved one important problem that everybody is suffering from, I won't go wrong. And when you solve the problem, it's about building a product with fine craftsmanship, attention to details, connecting with the user, all of that standard stuff.But I picked object storage as the problem because the industry was fragmented across many different data stores, and I knew that won't be the case ten years from now. Applications are not going to adopt different APIs across different clouds: S3 to GCS to Azure Blob to HDFS, everything is incompatible. I saw that if I built a data store for persistence, the industry would consolidate around the S3 API. Amazon S3, when we started, it looked like they were the giant; there was only one cloud, and the industry believed in mono-cloud. Almost everyone was talking to me like AWS will be the world's data center.I certainly saw that possibility, Amazon is capable of doing it, but my bet was the other way: that AWS S3 will be one of many solutions, but not—if it's all incompatible, it's not going to work; the industry will consolidate. Our bet was, if the world is producing so much data, and you build an object store that is S3 compatible and it ends up as the leading data store of the world and owns the application ecosystem, you cannot go wrong.
We kept our heads low and focused for the first six years on massive adoption, building the ecosystem to a scale where we could say our ecosystem is now equal to or larger than Amazon's; then we are in business. We didn't focus on commercialization; we focused on convincing the industry that this is the right technology for them to use. Once they are convinced, once you solve business problems, making money is not hard because they are already sold, they are in love with the product; then convincing them to pay is not a big deal because data is so critical, a central part of their business.We didn't worry about commercialization, we worried about adoption. And once we got the adoption, now customers are coming to us and they're like, “I don't want open-source license violations. I don't want a data breach or data loss.” They are trying to sell to me, and it's an easy relationship game. And it's about long-term partnership with customers.And so the business started growing, accelerating. That was the reason; now is the time to fill up the gas tank, and investors were quite excited about the commercial traction as well. And all the intangibles, right, how big we grew in the last few years.Corey: It really is an interesting segment that has always been something I've mostly ignored, like, “Oh, you want to run your own? Okay, great.” I get it; some people want to cosplay as cloud providers themselves. Awesome. There's clearly a lot more to it than that, and I'm really interested to see what the future holds for you folks.AB: Yeah, I'm excited. I think end of the day, if I solve real problems, every organization is moving from compute-technology-centric to data-centric, and they're all looking at data warehouse, data lake, and whatever name they give data infrastructure. Data is now the centerpiece. Software is a commodity. That's how they are looking at it. And it is translating to each of these large organizations—actually, even the mid-size companies, even startups nowadays have petabytes of data—and I see a huge potential here. The timing is perfect for us.Corey: I'm really excited to see this continue to grow. And I want to thank you for taking so much time to speak with me today. If people want to learn more, where can they find you?AB: I'm always on the community, right. Twitter and, like, I think the Slack channel, it's quite easy to reach out to me. LinkedIn. I'm always excited to talk to our users or community.Corey: And we will of course put links to this in the [show notes 00:33:58]. Thank you so much for your time. I really appreciate it.AB: Again, wonderful to be here, Corey.Corey: Anand Babu Periasamy, CEO and co-founder of MinIO. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with what starts out as an angry comment but eventually turns into you, in your position on the S3 product team, writing a thank-you note to MinIO for helping validate your market.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
You might call building and operating Apache Kafka® as a cloud-native data service synonymous with a serverless experience. Prachetaa Raghavan (Staff Software Developer I, Confluent) spends his days focused on this very thing. In this podcast, he shares his learnings from implementing a serverless architecture on Confluent Cloud using Kubernetes Operator. Serverless is a cloud execution model that abstracts away server management, letting you run code on a pay-per-use basis without infrastructure concerns. Confluent Cloud's major design goal was to create a serverless Kafka solution, including handling its distributed state, its performance requirements, and seamlessly operating and scaling the Kafka brokers and ZooKeeper. The serverless offering is built on top of an event-driven microservices architecture that allows you to deploy services independently with your own release cadence, maintained at the team level.
There are 4 subjects that help create the serverless event streaming experience with Kafka:
Confluent Cloud control plane: This Kafka-based control plane provisions resources required to run the application. It automatically scales resources for services, such as managed Kafka, managed ksqlDB, and managed connectors. The control plane and data plane are decoupled—if a single data plane has issues, it doesn't affect the control plane or other data planes.
Kubernetes Operator: The operator is an application-specific controller that extends the functionality of the Kubernetes API to create, configure, and manage instances of complex applications on behalf of Kubernetes users. The operator looks at Kafka metrics before upgrading one broker at a time. It also updates the status on cluster rebalancing and on shrink to rebalance data onto the remaining brokers.
Self-Balancing Clusters: Cluster balance is measured on several dimensions, including replica counts, leader counts, disk usage, and network usage. In addition to storage rebalancing, Self-Balancing Clusters are essential to making sure that the amount of available disk and network capacity is satisfied during any balancing decisions.
Infinite Storage: Enabled by Tiered Storage, Infinite Storage rebalances data fast and efficiently—the most recently written data is stored directly on Kafka brokers, while older segments are moved off into a separate storage tier. This has the added bonus of reducing the shuffling of data due to regular broker operations, like partition rebalancing.
EPISODE LINKS
Making Apache Kafka Serverless: Lessons From Confluent Cloud
Cloud-Native Apache Kafka
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Intro to Event-Driven Microservices with Confluent
Use PODCAST100 to get an additional $100 of free Confluent Cloud usage (details)
Watch the video version of this podcast
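As a rough illustration of what that serverless experience means on the application side, here is a hedged sketch using the confluent-kafka Python client: the code needs only a bootstrap endpoint and credentials, while brokers, balancing, and storage tiers stay on the provider's side. The endpoint, API key, and topic are placeholders:

```python
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "BOOTSTRAP_ENDPOINT:9092",  # placeholder cluster endpoint
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "API_KEY",      # placeholder API key
    "sasl.password": "API_SECRET",   # placeholder API secret
})

def on_delivery(err, msg):
    # Fires once the broker acknowledges (or rejects) the write.
    if err:
        print("delivery failed:", err)
    else:
        print("delivered to", msg.topic())

producer.produce("orders", key="order-123", value=b'{"amount": 42}', callback=on_delivery)
producer.flush()  # block until outstanding messages are delivered
```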
Unedited live recording on YouTube (Ep 146) Datree Kubeconform pre-commit https://github.com/yannh/kubeconform https://pre-commit.com/ Eyar's article about K8s schema validation Open an issue for questions on k8s schema kubectl --dry-run=client bug Datree's CLI tool to ensure K8s manifests and Helm charts follow best practices Check CRDs and schema with Datree ★ Support this podcast on Patreon ★
Abstract of the talk… A typical user's journey with Crossplane starts with provisioning infrastructure using the Kubernetes API, then evolves to composing infrastructure into higher level abstractions, and culminates with building a complete platform using packages. Crossplane packages are distributed as OCI images, meaning that a platform API can easily be reproduced in any cluster, and they can declare dependencies, which specify the lower level services that support the higher level abstractions. This functionality allows for companies to distribute their product in an infrastructure provider-agnostic manner, and for infrastructure admins to build internal platforms made up of both generic and organization-specific components. Bio… Daniel Mangum is a senior software engineer at Upbound where he is a maintainer of Crossplane, an open source CNCF project. He has held leadership positions in the Kubernetes community, and is an active participant in multiple other open source efforts. When not working in the Cloud Native space, Daniel spends his time writing, speaking, and building tooling for the RISC-V ISA. Key take-aways from the talk… This talk will be useful for folks building an internal infrastructure platform, as well as folks that build a product that depends on some form of infrastructure (databases, caches, blob storage, etc.). We will cover how to both build and consume packages, paving the way for advanced usage of Crossplane.
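To make the provisioning step of that journey concrete, here is a hedged sketch that creates a Crossplane-style claim through the Kubernetes API with the official Python client. The API group, kind, plural, and spec fields are illustrative placeholders; the real names depend on the packages and compositions installed in your cluster:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod
api = client.CustomObjectsApi()

claim = {
    "apiVersion": "example.org/v1alpha1",  # placeholder group/version
    "kind": "PostgreSQLInstance",          # placeholder claim kind
    "metadata": {"name": "my-db", "namespace": "default"},
    "spec": {"parameters": {"storageGB": 20}},  # placeholder parameters
}

# Creating the custom resource is all it takes; controllers watching the
# cluster reconcile it into real infrastructure.
api.create_namespaced_custom_object(
    group="example.org",
    version="v1alpha1",
    namespace="default",
    plural="postgresqlinstances",
    body=claim,
)
```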
1. Threat matrix for Kubernetes. The application of the ATT&CK methodology to Kubernetes is subject matter that everyone using Kubernetes should know.
2. 5 Objectives for Establishing an API-First Security Strategy. The five objectives are a good reminder that when using APIs (and we all are), think security first.
3. Izar Tarandach and Matt Coles, Threat Modeling: A Practical Guide for Development Teams. Threat model all the things!
4. Deep dive in CORS: History, how it works, and best practices. Put in the work to enable CORS properly for your web applications.
5. NSA: Top 5 vulnerabilities actively abused by Russian govt hackers. Application security is more than just the application. We must build a strong foundation across all the other layers of our system.
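On item 4, putting in the work usually means scoping CORS deliberately instead of reaching for a wildcard. A minimal sketch in Python with Flask; the origin, headers, and route are illustrative:

```python
from flask import Flask

app = Flask(__name__)
ALLOWED_ORIGIN = "https://app.example.com"  # placeholder trusted origin

@app.after_request
def add_cors_headers(response):
    # Allow one known origin rather than "*", and keep the method and
    # header lists as tight as the application permits.
    response.headers["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
    response.headers["Access-Control-Allow-Methods"] = "GET, POST, OPTIONS"
    response.headers["Access-Control-Allow-Headers"] = "Content-Type, Authorization"
    return response

@app.route("/api/data")
def data():
    return {"ok": True}
```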
About Jason McGee
Jason McGee, IBM Fellow, is VP and CTO at IBM Cloud Platform. Jason is currently responsible for technical strategy and architecture for all of IBM’s Cloud Platform, across public, dedicated, and local delivery models. Previously, Jason served as CTO of Cloud Foundation Services, Chief Architect of PureApplication System, WebSphere Extended Deployment, WebSphere sMash, and WebSphere Application Server on distributed platforms. Twitter: @jrmcgee LinkedIn: https://www.linkedin.com/in/jrmcgee/ IBM Cloud Code Engine: Learn more during this live virtual event on April 14th (also available on-demand after April 14th) Read more: https://www.ibm.com/cloud/code-engine Get started today: https://cloud.ibm.com/docs/codeengine?topic=codeengine-getting-started Watch this episode on YouTube: https://youtu.be/yH_mgW2kGzU
This episode is sponsored by IBM Cloud.
Transcript:
Jeremy: Hi, everyone. I'm Jeremy Daly and this is Serverless Chats. Today I'm joined by Jason McGee. Hey Jason, thanks for joining me.Jason: Thanks for having me.Jeremy: So you are an IBM fellow and the VP and CTO of the IBM Cloud platform. So I'd love it if you could tell our guests a little bit about yourself and what it is that you do at IBM.Jason: Sure. I spend my day at IBM worried about developers and platform services on our public cloud. So I'm responsible for both the technical strategy and the delivery of our Kubernetes and OpenShift platforms, our serverless environments, and kind of all the things that surround that space: logging and monitoring and other developer tools that kind of make up the developer platform for IBM Cloud.Jeremy: And what about yourself? What's your background?Jason: Been a software, kind of middleware, guy my whole life. I used to be the chief architect for WebSphere app server. So I spent the last 20-plus years working on enterprise application platforms and helping companies be able to build mission-critical business systems.Jeremy: Awesome. So I had Michael Behrendt on the show not too long ago and it was great. We talked about a whole bunch of different things: IBM's point of view on serverless. We talked a little bit about the future of serverless and we talked about the IBM Cloud Code Engine, which I want to get into, but for the benefit of our listeners and just because I'm so fascinated by some of the things that IBM is doing now with serverless, it's just super interesting. So could you sort of give me your point of view or IBM's point of view on serverless and just sort of refresh the listener's memory sort of about how IBM is thinking about serverless and how they're probably thinking about it maybe differently than some of the other cloud providers?Jason: Yeah, sure. I mean, it's such a fascinating space and it's really changed a lot, I think, over the last five years or so from its kind of maybe beginnings in being very aligned with serverless functions and kind of event-driven computing and becoming a more general concept about how developers especially can consume cloud platforms. I think if you look at the IBM perspective on serverless, there's a couple layers to the problem that we think about. First is we've been pretty clear that we think Kubernetes and distributions of Kubernetes like OpenShift are kind of the key foundation compute environment for developers to use going forward. And we've done a ton of work in kind of building out our Kubernetes and OpenShift platforms and delivering them as a service on our public cloud.
And that's an incredibly flexible platform on which you can really build any kind of application. I think over the last five years, we've proven we can run anything on Kubernetes: databases and AI and stateless apps and whatever you want.Jeremy: Right.Jason: So very, very flexible. However, sometimes flexible also means complicated and it means that there's lots to manage and there's lots of concepts to get your head around. And so we've been thinking a lot about, well, how do you actually consume a platform like Kubernetes more easily? How does the developer stay more focused on what they're really trying to do, which is like build application logic, solve problems? Now they don't really want to stand up Kube clusters and configure security policies. They just want to write code and run code and they want to get the power of cloud to do that. Right? And so I think serverless has kind of morphed to be, for us, more about the experience that we can build on top of that container platform that's more oriented around how developers get work done and allows them to kind of more easily take advantage of the scale and power of public clouds without having to kind of take on the burden of a lot of that kind of work and management.And so the work that we've been doing is really aligned in that direction, that we've been working in projects like Knative, in the open source community to build simpler abstractions on top of Kubernetes. And we've been starting to deliver those in our cloud through things like Code Engine.Jeremy: Yeah. And I think that's interesting too because I always have, this is probably the wrong way to say it, but it's sort of a chip on my shoulder about Kubernetes because it just got so complicated. Right? It's just so many things that you have to do, so hard to manage. And as a serverless guy myself, I love just the simplicity of being able to write some code and just get it out there, have it auto scale, tie into all those events. So I think that a lot of cloud providers have sort of moved that way to say like, "Well, we're going to manage your Kubernetes cluster for you." Right? Which essentially is just, I think moving backwards, but also moving forwards at the same time, if that makes sense. But so in terms of the use cases that this opens up because now you're not necessarily limited to a sort of bespoke implementation of some serverless platform, you have a lot more capabilities. So what types of use cases does this open up?Jason: Yeah. I mean, I may have a couple of comments on that. I mean, so I think with Kubernetes, you have the complexity of managing the Kubernetes environment, but even if that's totally taken care of for you, and even if you're using a managed Kubernetes service like the things we offer on IBM Cloud, you still have that kind of resource burden of using Kubernetes. You have services and pods and replica sets and namespaces and all kinds of concepts that you have to kind of wrap your head around and know how to use in the right way. And so there's a value in like, "Can we abstract that? Can we move away from that?" And it's not like this idea hasn't been tried before. I mean, we've had PaaS platforms, like kind of Cloud Foundry style, Heroku, very opinionated PaaS environments in the past and they definitely simplify the user experience. However, they came with this negative, which is if you don't fit within the box of the opinion ...Jeremy: Right.Jason: ... then you can't do what you want to do. And the cost of going outside the box was super high.
Maybe you had to completely switch platforms. You were completely blocked. You had to switch to some other approach. And so part of what's informing us as we think about this is: how do you have more of a continuum? You have a simple model. It's aligned around what you're doing. Just run my source code, just run my container image. I want to run a batch job, but it's all running on one platform. They're running next to each other. You can drop down a layer into Kubernetes if you want to. If what you're trying to accomplish needs some of that flexibility, you should have access to it without having to kind of start over. And so that's kind of how we've approached the problem a little bit differently is bringing this all together into kind of one unified serverless environment on top of Kubernetes.And that lets us handle different use cases. That lets us handle kind of stateless data processing and functions. That lets us handle simple web apps. That lets us handle very data-intensive, high-scale computation and data processing, async processing like batch, all in one combined way.Jeremy: Right. Yeah. And I think it's interesting because there are artificial limitations that may be put in place sometimes on serverless platforms. If you think about AWS Lambda, for example, you get 15 minutes of compute and they bumped things up. So now, and again, I've just sort of grown up in the AWS environment, but they have things like 10 gigs for a function or something like that. And so they've increased these things, but they are sort of artificial limits that I think, depending on the type of workload that you're doing, they can really get in your way, especially if, like you said, you're doing these data-intensive things. So from an IBM perspective, I mean that's sort of gone, right?Jason: Right. Exactly. That's a great, very concrete way to look at the problem. The approach that has been taken in some of the other cloud environments is that these different use cases, like serverless functions, single containers, batch processing, are different services. And every service has its own kind of limitations or rules about what you can and cannot do. How long your thing can execute, how big your code can be, how much data you can transfer. We've taken a different approach to say, "Let's eliminate all those limits and let's have one logical service, one environment that supports all those styles." We can still expose a simplified kind of consumption model for the developer like just give me your source code or just give me your image, but I can run it in a way that doesn't have those computational limits, and therefore I can do more. Right? I can run more kinds of workloads. I don't run up against some of those walls that kind of stop me from getting my work done.Jeremy: Right. Right. Yeah. And I like that approach too because I'm a big fan of managed services. I think that if you have a service that does image recognition for you, that's great. And if you have a service that does queuing for you? That's great. But in some cases, you start stringing together so many different services and I feel like you lose a lot of that control. So I like that idea of just basically being able to say, "Look, I've got the compute. I can do whatever I need to do with it. It will scale to whatever I need it to scale to." And I think that's where this idea of IBM Cloud Code Engine comes in, which just became GA, so I'd love it if you could tell the listeners exactly what that is.Jason: Yeah, absolutely.
So, Code Engine is the new service that we launched that makes some of these concepts I've been talking about real. It is a service that allows developers to deploy functions, containers, source code, batch jobs into IBM Cloud. The entire environment behind that application is managed for you. So we handle it: you don't manage clusters, you don't provision infrastructure. You can scale all the way to zero. So you can literally only pay for what you're using. You can scale up to thousands of cores that are processing your application in parallel, and we manage that entire runtime environment for you. So you can think of it as a multi-tenant, shared, Kubernetes-based runtime environment that you can run your workloads on that presents to you the personality that you need for different workloads. And because it's all in one service, if you have an application that's like a mix of some single containers and batch jobs, they can actually talk to each other, they can talk to each other over a private network connection. They can work together instead of being kind of siloed in these completely different environments.Jeremy: Right. Yeah. And so from the developer, I guess, perspective, you had mentioned that you can deploy just code or you could deploy a container if you want to. So what does that developer experience look like? So is this something where I could just say, "Look, I don't need to have a whole ops team now managing this for me. If I just want to write code, deploy it into these things, I'm sure there's some things I need to know," but for the most part, what does that developer experience look like?Jason: Yeah. So you absolutely could do it without a whole ops team. The experience right now, there's like maybe kind of three basic entry points. You can give me source code and we will take care of compiling that source code, combining it with a runtime, executing it for you, giving it a web end point, scaling it. You can give me some hints about kind of how much resource you think you need and things like that and we can scale that up and down and manage it for you, including all the way down to zero. That's nice if you're coming from maybe a historical PaaS background or it's just like, "Here's my code, run it for me." You can have that experience with Code Engine. You could also start with a container image. So lots of developers now, because of things like Kubernetes and Docker, are very familiar and comfortable with packaging up their application as a container image, but you don't want to then deal with creating a cluster and dealing with Kube.So you can just say like, "Here's my image, run it for me." And one of the advantages we have with Code Engine is we can really do that with any container image. You don't have to have a container image that follows some particular framework that's built in a very special way. We can take any container image and you can just literally point me at the image and say, "Run this for me," and Code Engine will execute it and scale it and manage it for you. Or you can start with a batch job interface. So, like, more of an async kind of parallel job submission model. So maybe I'm doing Monte Carlo simulations or data processing and I want to parallelize that across a whole bunch of machines and cores; Code Engine gives you an interface for that. So as a developer, you kind of start with one of those three entry points and let Code Engine take care of how to run that and scale it and keep it highly available and things like that.Jeremy: Right.
So I love the idea of the batch jobs. I want to talk about that a little bit more, but let's go back to some of the use cases here. So what if I was building just, like, a REST API, which seems to be a very popular serverless use case? What would I do for that? Do I need to have some sort of an API gateway type thing in front of it? Or how does that work?Jason: No, Code Engine provides all that for you. So you would literally either just take your implementation and package it in a container or point us at your source code directory. If you have source code, we use things like Paketo Buildpacks to build a runtime around that source code. And so you can use different languages. So, with our CLI tool, you point us at the source code directory and we'll build it and package it in a runtime and run it for you. Or you point us at a container image that you've uploaded to our container registry or to your container registry of choice and then Code Engine will execute that for you. It will give you that web end point, right? So it'll give you an HTTP end point that you can use to access that service. And it will watch the demand on that system and scale it up and down as needed. And by default, we'll just scale it to zero. So it'll just be kind of registered in the system and it'll take care of scaling it up as needed to handle the demand on the app.Jeremy: All right. Cool. And then what about these batch jobs? So I talked a little bit about this with Michael and this idea of being able to run massively parallel execution. So how does that all work?Jason: Yeah. So similar, obviously with batch, there's a little bit more kind of metadata that you have to provide to describe the job and what you want to execute and how things relate to each other. So there's some input data you provide along with the implementation of the batch job, which itself could just be like a container image, and you submit that job. So the CLI interface is a little bit different. You're not standing up a long-running REST end point, you're submitting a job to Code Engine for execution, and it will go take that job and execute it and parallelize it for you. You can also use frameworks on top. One of the things we've been doing a lot of work on, maybe Michael talked about it a little bit when he was here, is some work we're doing around Ray. Ray is a really interesting new project that lets you do kind of distributed computing, especially around data workloads, in a really easy way.And so you can actually stand up Ray on top of Code Engine, and so Ray acts as kind of the application interface for the developer to be able to easily parallelize their code, particularly Python code, and then Code Engine acts as the runtime below it. And you can take a simple function in Python, mark it as Ray remote, and it'll now execute on the cloud and distribute itself across a thousand cores. And you get your answer back 20 times faster than you would have running it locally. And so you can have those kinds of async environments as well.Jeremy: Awesome. And so what about some customers? So do you have customers that are having success with this now?Jason: Yeah, we have a number. I mean, we have the European Molecular Biology Laboratory, which is using it to do science processing and provide access for scientists to the large-scale compute environments of the cloud. We have some airlines that are leveraging this.
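To ground the Ray pattern Jason describes, here is a minimal sketch; the workload is a stand-in, and the connection details for a managed environment such as Code Engine would differ:

```python
import random

import ray

ray.init()  # locally; for a remote cluster you would pass its address

@ray.remote
def simulate(seed):
    # Stand-in for one Monte Carlo trial or one data-processing shard.
    rng = random.Random(seed)
    return sum(rng.random() for _ in range(1_000_000))

# Fan out 1,000 tasks; Ray schedules them across available cores/nodes.
futures = [simulate.remote(i) for i in range(1000)]
results = ray.get(futures)
print(sum(results) / len(results))
```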
Jason: The airline scenario, I think, is actually kind of interesting because it shows the power of combining REST endpoints, more interactive workloads, with batch workloads. In their case, they're exploring using it to do dynamic pricing. So if you think about how you do dynamic pricing, there are kind of two dimensions. There's a very interactive side: somebody is getting a price on a ticket or a route, and you want to be able to present them with dynamic price information as part of that web interaction. But then there's a data processing angle. You're looking at all kinds of data coming from your backend systems, from route data, from the fleet and historical information, and you're trying to decide what the right price table is for that route. And so you're doing batch processing in the background, and then you're doing this interactive processing. You can implement both halves on serverless with Code Engine, and they scale as needed. If you're getting a lot of traffic on the web front end, it scales up as needed without you having to do anything. So they can combine both halves in one environment.
Jeremy: Right. Right. And so in terms of, I think we kind of talked about this a little bit, but when you see all these different services, right, and no matter what it is, whether it's Google's Kubernetes Engine that they run or it's EKS on AWS or something like that, I think a lot of people look at these and think, "Oh, it's just another managed Kubernetes cluster." Right? So what are the major differences? I know we talked about it a little bit, but maybe you could just be a little bit more succinct and talk about why it is so different than previous generations of tools or some of the other competing products out there.
Jason: Yeah. So if you looked kind of behind the curtain on Code Engine, you'd see a couple of things. One is there is Kubernetes there; there is a Kubernetes environment there. The difference is that the Kubernetes environment is completely managed by the Code Engine service. If you look at IBM Cloud, we have the IBM Cloud Kubernetes Service and our Red Hat OpenShift service. In those services, we're managing a cluster on your behalf, but we give you the cluster. It's like, "Here's your Kube cluster. We'll manage its life cycle, but you have direct access to it." With Code Engine, we have a Kube cluster there, but we completely manage it in all respects. You have no direct access to it. That allows us to manage scale and capacity. We run that in a multi-tenant way. I mean, we have security and isolation between tenants, but logically you can think of it as a big Kube cluster that lots of users are sharing, which is how the pay-as-you-go model ultimately works, because we're keeping track of what you're actually running and just charging you for that.
So one part of it is fully managing that runtime environment. We've layered on top of that things like Knative, so that we have that developer abstraction, a simpler way to define services, to do the source code and image stuff that I talked about. That's coming largely through things like Knative, which, again, we're completely running for you, but it gives you some of that simpler interface that we talked about, and we're doing that in an open-source way with the community. So it's not proprietary to IBM Cloud. And then on top of that, we built the batch processing system.
So batch scheduling and some of these unique interfaces, the command line interface and the user experience to get into that environment for the different workflows that I talked about. And one of the cool things is, because we built it on top of that Kubernetes layer, we can also expose the Kubernetes API if we want.
So, like the Ray example I gave you: Ray doesn't really know anything about Code Engine, but Ray knows how to deploy to and leverage a Kube cluster. So we're able to actually hand Ray the Kubernetes API server endpoint inside of Code Engine for your instance, and that framework can use Kubernetes to stand itself up. And then you can use the simple abstractions on top, and that's still all in Code Engine. It's still pay as you go, and it still scales to zero. And so that's what I meant by blending the lines: you, or the framework, can drop down to something like Kubernetes as needed to give you that flexibility.
Jeremy: Yeah, that's awesome. So you mentioned you have a fully managed Kubernetes service, and then you also have a bunch of other serverless services that run within the IBM Cloud. So OpenWhisk or, I guess, IBM Cloud Functions now. And then also, I mean, you mentioned Cloud Foundry, which is sort of a PaaS, but it's also sort of an easy-to-use serverless environment in a sense. Right? And so I guess, is this an evolution? Is this where you suggest people go?
Jason: Yeah. Yeah. So I think the simplest way to think about it is yes, Code Engine is the evolution of those ideas. It doesn't necessarily have a direct technical lineage, always, between those projects, but the problem that IBM Cloud Functions, that OpenWhisk, was trying to solve, and the problem that Cloud Foundry was trying to solve with the start-from-source-code PaaS, are both represented in what we're doing in Code Engine. So Code Engine will be the natural evolution path for those workloads and for the problems that those users are using those platforms for. The Cloud Foundry one, I think, is super interesting, in the sense that the rise of Kubernetes has clearly pivoted many people who were doing Cloud Foundry into doing Kubernetes.
Jeremy: Yeah.
Jason: And people are using Kubernetes as their foundation, and the Cloud Foundry project, which we're deeply involved in, has done a lot of work to realign Cloud Foundry with Kubernetes in a better way. But what never went away, what people always still saw value in with Cloud Foundry, was the simple push-my-source-code developer experience. Right? And so that still carries forward. And with Code Engine, we're taking that same experience that we had in Cloud Foundry, and we're bringing it into this new service and bringing it onto Kubernetes, so the developer still gets that similar experience, but without the boundaries that we talked about. The challenge with Cloud Foundry was always: as soon as you want to do stateful things, or you want to do async jobs, Cloud Foundry didn't solve that problem. Go use a Kube cluster or go use some completely different environment. And so it's kind of the same experience with the boundaries removed, and that's where we would see people go.
Jeremy: Right. So if I'm in one of those services now, if I've got things written in Cloud Functions or in Cloud Foundry, and I've hit some of those limits, or I just want to take advantage of some of the cooler things that Code Engine does, is there a simple migration path for those?
Jason: Yeah.
In general, yes. For Cloud Foundry, for sure. It's pretty straightforward to take the same source code directory that you have and just push it to Code Engine instead. Right? So I think the path for Cloud Foundry, I mean, there are edge cases with everything obviously, but the basic workflow is the same. You can use the same source input directories. We mapped to Paketo Buildpacks, and a lot of that stuff came out of Cloud Foundry. And so that has a really clean path. For Cloud Functions, there's a little bit of a timing thing. In general, yeah, you can take your same functions and you can run them on Code Engine. OpenWhisk has some advantages still that we haven't quite gotten built into Code Engine yet. It's got faster startup times, for example, right? In the runtime model behind Code Engine, we're still starting a container, like a full container.
In OpenWhisk we had done a bunch of work on warm start of containers and container pooling, so we can get startup times of a small number of milliseconds on those functions, and some of that hasn't worked its way into Code Engine yet. So there are still some cases with Cloud Functions where it has some capability that doesn't quite exist in Code Engine yet, but over time that will get filled in, and there'll be a simple path there to move all those workloads over to Code Engine as well.
Jeremy: Right. So with Code Engine, because you mentioned this idea of the cold starts, does Code Engine keep containers warm for a certain amount of time, or is it always a cold start?
Jason: It is, in general, a cold start. It can keep some of them around: in the scale-up, scale-down cycle it may keep them around for a while, so it isn't overly aggressive about scaling them down and bringing them right back. But it's not doing some of the warm start tricks yet that OpenWhisk was doing, where we have a pool of primed container instances and then we're injecting code into them and running them. That's work in progress. There's work to do both in Knative to improve that stack and then stuff to do in Code Engine. There's a balancing act there too ...
Jeremy: Yeah, definitely.
Jason: ... on things like network isolation and getting on customer VPC networks and other things which are harder to do in that warm start model.
Jeremy: Yeah, definitely. All right. So if somebody wanted to get started with Code Engine, what's the best way for them to do that? Just sign up and start writing some code, or how do they do that?
Jason: Yeah, kind of. I mean, obviously, we've been talking a lot about how developers use these things, and so I always think the best way to get started is either to build something on it or to try out some specific source code project. We have a lot of things that we've done to try to make that easy. So there's a Code Engine landing page on IBM Cloud. It has some great examples to guide you through those three starting points I talked about: start from source code, start from an image, and do batch. We have some really nice tutorials, like specific text analysis tutorials, for example, that'll show you how to build applications on Code Engine. And we actually have a pretty cool Git repo, which will take you through tons of samples of how to use Code Engine to solve all kinds of problems.
So there are a lot of really good code assets out there that a developer could go to and actually try something real on Code Engine, and the getting started experience is super easy.
You've got IBM Cloud, you log in and you go to Code Engine, you create a project, you push an image, and in a couple of minutes you'll have something up and running that you can play with.
Jeremy: Amazing. All right. So I love watching the evolution of things and, again, just this different way that IBM is thinking about serverless and, again, trying to make it easier. Because I always look back and I think of Lambda when it first came out, I was like, "Oh, it's so easy. You just put some code there and it's just done for you." And then we got more and more complex and more and more complex. And not that we didn't need to, I mean, some of this complexity is absolutely necessary. But I'm just curious, seeing the evolution and where things have gone. I talked to a bunch of people earlier about this, Rodric Rabbah, for example, who was one of the first people involved with the IBM OpenWhisk project, I guess it was Apache OpenWhisk or it became Apache OpenWhisk, whatever it was. Seeing that evolution, seeing the changes that these different cloud providers have gone through, seeing the changes that IBM has gone through and where you are now with Cloud Code Engine, I'd love to get your perspective here on where you think this is going. Not just maybe what the future is for IBM, but what you think the future of serverless is, and just cloud computing maybe in general. I know that's a lot of question.
Jason: I'll give you a long answer.
Jeremy: Perfect.
Jason: So that brings to mind two things. First, let me talk about the complexity thing for a second. Managing complexity is always hard. You are so right. Many things start out with a value prop of "this is easy," and then as people use it you add more, and three years later we're like, "We need a new thing that's easy, because that other thing is too hard now." And there's no magic pill for that. That's always a hard problem to manage. However, one of the things I like about the approach that we're trying to take with Code Engine is, because we've layered it on Kubernetes, it gives us a way to decide where we want that complexity to show up. When we had a Cloud Functions OpenWhisk stack and we had a Cloud Foundry stack and you had a Kubernetes stack, you had to try to solve all problems within each stack.
So each stack was getting more complex, because you were trying to say, "Oh, I need storage. And I need private networking. And I need all these things." With Code Engine, I think we have an opportunity to say, once you cross some line, we're just going to ask you to drop down a layer and go use it directly in Kubernetes, right? You can push some of the complexity down, and that allows us to hold a harder line on complexity in the developer layer on top. So the balancing act we're trying to play is: because we built it on a common platform, we don't have to solve all problems in Code Engine directly.
Jeremy: Right.
Jason: So that's kind of my viewpoint on the complexity problem. On the evolution, it's really interesting. So one of the other things that my team's working on and launched recently is this thing called IBM Cloud Satellite, which is about distributing cloud outside of cloud data centers, so you can consume cloud services anywhere you want. So cloud computing in general, and this is not just an IBM thing, in the industry cloud computing is diversifying to be kind of omnipresent. You can consume cloud on-prem, at the edge, in our cloud data centers, wherever you want.
There's a programming model dimension to that problem, too. As you specially go to the edge, you kind of want some of these simple to consume, easy to deploy, scale to zero, resource-efficient, you need some kind of model like that because at the edge, especially, you don't have 2000 cores worth of compute to go deal with.You have one box in a retail store, or you have two servers in the back of the distribution center. And so I think things like Code Engine layered on top of distributed cloud and in our case, things like Satellite, is actually a really powerful combination. I think we're going to see serverless become the dominant application development and deployment model, especially for these edge use cases, because it combines ease of deployment and management with efficiency and scale to zero footprint, which are all really attractive when you get outside of a mega data center like you have in cloud.Jeremy: Right. Right. So I love this idea, too, about sort of expose the complexity when the complexity needs to be exposed. I love this idea of sort of creating same defaults, right? If you could default Kubernetes to do all the optimal things that you would need it to do for use case X, if you could just do that for me and then if I say, "Oh, I want to tweak this one thing," then be able to kind of go down to that level. But I love this idea of you mentioned about edge too because that's one of those things that I think, from a programming model, as you said, how do you write code that's sort of, I guess, environment-aware? How does it know what's running at the edge versus running in a data center versus running maybe in a hybrid cloud and partially in your own private cloud or your own private data center? That model, just wrapping your head around it from a developer standpoint, I think is incredibly complex right there.Jason: Yeah. It is. And sometimes it's like, how do they know? And then sometimes it's like, how do I just operate at a high enough level of abstraction that like the differences between those environments can get handled below me? If I'm consuming Kubernetes clusters directly, the shape of that Kubernetes cluster in like a retail store or a telco data center in Atlanta somewhere or in the cloud are going to all be different because you have a different amount of capacity. You have a different networking arm. So you're going to have to deal with the differences. If I'm giving you a container image and saying, "Run this," the developer doesn't have to deal with those differences. The provider might have to deal with those differences but the developer doesn't have to deal with those differences. So that's where I think things like serverless and approaches like Code Engine really come to be much more valuable because you're just dealing at this higher level of abstraction and then Satellite and Code Engine and other services can kind of magically deal with the complexity for you.Jeremy: Yeah. And so I know we talked a lot about Kubernetes and what's running underneath a lot of these services. Is that something you see, though, as being that sort of common format across all these different services, or do you think that something will evolve beyond Kubernetes to become a standard?Jason: Right now, I really think that Kubernetes will become the base platform. What Kubernetes is will probably keep evolving. 
And I'm not saying it's Kubernetes forever, but I don't think we should underestimate the power of the industry-wide alignment that exists around containerization and Kubernetes as the next infrastructure platform, if you will, because that's really what it is. And I told you at the beginning, I used to build WebSphere app servers. So I was very involved in the whole Java app server era, the late 90s and early 2000s. And at that time, the industry aligned around two platforms, Java and .NET, as the two dominant, at least enterprise, application platforms. Now we have everyone aligned on Kube. Literally, there's nobody in the industry who's not saying, "Kubernetes is the platform." So I think it will be the abstraction for infrastructure in all these environments. The question will be: how do you consume it? Who manages it? How's it delivered? How does it optimize itself? And then at what level do you consume it?
And I don't think Code Engine is the end of it at all. I think there's lots of room for improving the consumption experience on top of Kubernetes for these developer use cases.
Jeremy: Yeah. Yeah. And that actually was going to be my next question: where do you see the next evolution of Code Engine, right? So is that going to be driving into specific use cases more and trying to solve those, or becoming more flexible? How do you see developers, I don't know, in five years, and this is probably a hard question, but in five years, how are we going to be writing cloud applications?
Jason: Yeah. It's a great and super hard question, but I think projects like Ray are an interesting forward look into where this might go. One of the things that I've always felt, if I look at the whole history of PaaS in particular over the last five, six, seven years: PaaS has always been about simplifying the experience for the developers, but fundamentally, most PaaS environments don't change anything about how you write the code. They change how you package the code, how you deploy the code, how the code is executed, and how the dependencies of the code are satisfied. But the actual code you write probably wasn't any different. Right? And that's where I think the next step is: how do we actually get into the languages, into the code structure itself, to be able to take advantage of cloud capacity, to be able to take advantage of scale? And there are lots of projects that have taken attempts at that.
Ray, as an example, I think is a particularly interesting one, because there are some good examples where you can take a Python function, you literally add one annotation to it in the language, and now it becomes remotely executable and horizontally scalable for you.
Jeremy: Right.
Jason: It's that kind of stuff that I think three or four years from now there'll be a lot more of, where we're actually changing how code is written, because that code can assume there's some containerized, scalable fabric out there somewhere that it can go execute on top of.
Jeremy: Right. Yeah. And I think that pendulum swing for developers, especially, well, developers in the cloud, who used to be writing a bunch of code, whether it was JavaScript or Python or Java, whatever it was, and then all of a sudden now they have to switch context and be like, "All right, now I have to write a YAML file in order to configure my cloud resources," and that sort of back and forth.
So yeah, that marrying, basically the idea of a programming language for the cloud, is a really interesting concept.
Jason: And I think the distributed cloud notion, funnily enough, is a big enabler of that. Because the other tension I see right now is, let's say you wanted to use Lambda, or you want to use serverless functions. That only works in your cloud environment, but you're also running something at the edge or you're running something in your data center, so you're forced to use different approaches, which tends to force you toward some common-denominator models.
Jeremy: Right. Right.
Jason: And so you're kind of holding back from really adopting some of these newer models because of the diversity. Well, if cloud goes everywhere and those services go everywhere, then I can just say, "Well, I'll use the serverless model everywhere, and so I can really deeply adopt it." So I think the distributed cloud thing will open up the opportunity to embed these approaches more deeply in day-to-day development activities.
Jeremy: Yeah. No, I love that. I'm all for that approach, because I think this split-brain sort of approach is getting very complex, and it's not super easy. So is there anything else that you'd like to let the listeners know about IBM Cloud Code Engine?
Jason: No. I mean, I think we touched on a lot of the motivation behind it and the core capabilities. I would just encourage you to go check it out, go check out the space, go give it a try, and I'd love to hear people's feedback as they do that.
Jeremy: Awesome. Well, first of all, I've got to make sure I thank IBM Cloud for sponsoring this episode, because the team over there and everything that all of you are working on is amazing stuff, and I appreciate the support. We appreciate the support in the community for what you're doing. So if people want to find out more about you or more about Cloud Code Engine, how do they do that?
Jason: Yeah. You can find me on Twitter, JRMcGee, or LinkedIn. For me personally, I love to talk to people. For Code Engine, I think the best place to start is the product page, which is ibm.com/cloud/code-engine, and from there you can get to all of the code examples I talked about.
Jeremy: Awesome. All right. Well, I will put all that stuff in the show notes. Thanks again, Jason.
Jason: Yeah. Great. Thanks, Jeremy.
#96: With the advent of software like Crossplane, we are beginning to see the Kubernetes API coming more to the forefront. In today's episode, we attempt to tackle why it appears that events are still not completely understood. Crossplane: https://crossplane.io/ Transcript: https://www.devopsparadox.com/episodes/the-kubernetes-api-is-becoming-omnipresent-96/#transcript YouTube channel: https://youtube.com/devopsparadox/ Books and Courses: Catalog, Patterns, And Blueprints https://www.devopstoolkitseries.com/posts/catalog/ Kubernetes Chaos Engineering With Chaos Toolkit And Istio https://www.devopstoolkitseries.com/posts/chaos/ Canary Deployments To Kubernetes Using Istio and Friends https://www.devopstoolkitseries.com/posts/canary/ Review the podcast on Apple Podcasts: https://www.devopsparadox.com/review-podcast/ Slack: https://www.devopsparadox.com/slack/ Connect with us at: https://www.devopsparadox.com/contact/
The final — and raddest — Kubernetes release of 2020 is 1.20. This week, Craig and Adam talk to its release team lead, Jeremy Rickard from VMware. Jeremy talks about migrating to newer Kubernetes versions, sooner or later; what was added, what was deprecated, and what that really means; and what happens when you Google your own name. Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com mail: kubernetespodcast@google.com twitter: @kubernetespod Chatter of the week Ready Player Two News of the week Kubernetes 1.20: Release Don’t panic about Docker Dockershim deprecation FAQ Mirantis will support the Dockershim etcd graduates in the CNCF Episode 95, with Xiang Li CNCF launches Cloud Native Security Whitepaper Istio 1.8 Kuma 1.0 Linkerd doesn’t use Envoy AWS re:Invent: ECS Anywhere EKS Distro and EKS Anywhere EKS add-ons, console and spot instance support Lambda containers AWS Proton ECR Public Registry Anthos on bare metal is now GA IBM acquires Instana Opstrace public launch Weaveworks Kubernetes Platform (WKP) 2.4 Spectro Cloud anywhere Improving the Kubernetes API docs by Phillipe Martin Participate in the Chinese Cloud Native survey How David Anderson would reboot Kubernetes Episode 32, with David Anderson Links from the interview Episode 61, with Jeremy Rickard and Ralph Squillace Porter Jeremy’s beard Release team for 1.20 1.12, 1.17, 1.18 and 1.19 Enhancements sub-project The Raddest Release Enhancements sheet #1769: NUMA memory manager Up or out: the deprecation clock starts for Alpha/Beta features #1985: Dockershim deprecation KEP Kat Cosgrove’s Twitter thread Stephen Augustus’s issue in kubernetes/community Sitting this release out: Sidecar containers Not in 1.20: Distroless images 1.21 lead: Nabarun Pal Kubernetes on an F-16 jet Other Rickards: Matt Rickard (our guest on episode 6) Jeremy Rickard the mathematician Jeremy Rickard on Twitter
In this TCP Talks episode, Justin Brodley and Jonathan Baker chat with Liz Rice, VP of open source engineering for Aqua Security, which provides tools to secure cloud-native deployments. Liz describes Aqua's evolution over the years: from a provider of container security, to its acquisition of CloudSploit, to its development of open-source security solutions. Most customers are using cloud native software, and Aqua wants to secure those workloads and engage that community. "As a business, we have to be where the discussions are. Having open-source tools that are genuinely useful gives us a good way to participate in that community," Liz explains. In addition to her role at Aqua Security, she is the chair of the Cloud Native Computing Foundation's (CNCF) Technical Oversight Committee. During the conversation, Liz gives an overview of how they handle projects.
Running Kubernetes on conventional operating systems is time-consuming and labor-intensive. Today’s guests Andrew Rynhard and Timothy Gerla have engineered a product that attempts to provide a solution to this problem. They call it Talos, and it is a modern OS designed specifically to host Kubernetes clusters, managed by a flexible and powerful API. Talos is completely stripped down to the bare components required to run Kubernetes and get information from the system. It stays updated by keeping pace with Kubernetes, but also provides the user with a large degree of control in the event that they might need to update a flag. In this episode, Andrew and Timothy get into some of the mechanics and thought processes behind Talos, telling us why they went with a read-only API, how they handle security concerns on the OS, and how a system like theirs might get adopted by the Kubernetes community and the layperson more broadly. They get into the advantages provided by a stripped-down solution for systematizing the use of Kubernetes across communities and running new components through clusters rather than on the OS itself. In a space where most participants are largely operating in the dark, it is a pleasure to see innovations like this display such lasting power, so make sure you check out this episode. Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Guests: Andrew Rynhard https://twitter.com/andrewrynhard Tim Gerla https://twitter.com/tybstar Hosts: Carlisia Campos Bryan Liles Olive Power Key Points From This Episode: What a Kubernetes OS is: a stripped-down OS that integrates with Kubernetes. The difficulties of managing and getting Kubernetes installed on regular OSs. Why a Kubernetes OS? A smaller attack surface and fewer OS compatibility issues. What Talos does: quickly makes nodes part of a Kubernetes cluster by being stripped down. How replacing SSH with an API alleviates some users’ security concerns. A command-line interface called OSCTL that allows users to explore the API. What does ‘stripped-down’ mean? Talos runs the kubelet and gets information from the OS. The ability to run new components through clusters rather than from the OS. How the Kubernetes OS evolves with Kubernetes but gets separately controlled too. Better integrating into Kubernetes by abstracting OS features into Kubernetes as operators. Security precautions: kernel hardening, SSH and Bash removal, and a read-only OS. Usability of Talos for the average Joe, and its consistency across base platforms. Possibilities for interacting with deeper levels of an OS through an API-managed OS. How Talos might become appealing to laypeople: decreasing costs for porting to it. Value gained from switching to a purpose-built OS as something which could outweigh costs. Tendencies to hang onto tried and trusted tech even if its successors are superior.
Quotes:
“To me, it’s just about abstracting away the operating system and not even having to worry about it anymore, and looking at Kubernetes and the entire cluster as an operating system.” — Andrew Rynhard [0:05:00]
“As rapid as the technology is changing, you need an operating system that is going to evolve with it or at least the operations intelligence to evolve with Kubernetes right alongside it.” — Andrew Rynhard [0:13:08]
“The challenge I think for us and for anybody changing the way that operating systems work is: is it better enough than what I have today or what I had before?” — @tybstar [0:26:50]
“There’s a lot of companies out there who got us to this point in tech that don’t exist anymore, but if they didn’t do what they did, we would not be here right now.” — @bryanl [0:33:41]
Links Mentioned in Today’s Episode:
Talos — https://www.talos-systems.com/
Timothy Gerla — http://www.gerla.net/
Timothy Gerla on Twitter — https://twitter.com/tybstar
Andrew Rynhard on LinkedIn — https://www.linkedin.com/in/andrewrynhard/
Andrew Rynhard on GitHub — https://github.com/andrewrynhard
Jed Salazar on LinkedIn — https://www.linkedin.com/in/jedsalazar/
Bryan Liles on LinkedIn — https://www.linkedin.com/in/bryanliles/
Carlisia Campos on LinkedIn — https://www.linkedin.com/in/carlisia/
Red Hat — https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux
Arch — https://www.archlinux.org/
Debian — https://www.debian.org/
Linux — https://www.linux.org/
Bell Labs — http://www.bell-labs.com/
AT&T — https://www.att.com/
Transcript: EPISODE 20 [INTRODUCTION]
[0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your Cloud Native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you.
[INTERVIEW]
[00:00:41] CC: Hi, everybody. Welcome back to the Podlets. Today we have a special episode, and on the show, we have a special guest, Andrew Rynhard. Say hi, Andrew.
[00:00:53] AR: Hello, how are you?
[00:00:55] CC: We also have Timothy Gerla. Say hi, Tim.
[00:00:58] TG: Hi. Thanks for having me.
[00:01:00] CC: Yeah. Andrew and Timothy are from Talos. Andrew dropped an issue on our GitHub repo and here we are. It was a great suggestion. What we’re going to talk about today is what they are working on, which is a Kubernetes operating system. We have tons of questions for them for sure. We also have a special participant on the episode today as a co-host, Jed Salazar. Hi, Jed.
[00:01:28] JS: Hey, everyone. Jed Salazar here from the CRE team here at VMware.
[00:01:31] CC: And Bryan Liles.
[00:01:32] BL: Hi.
[00:01:33] CC: Hi. And me, Carlisia Campos. Who’d like to get the party started and kick this off?
[00:01:41] BL: Oh, I’m here. Let’s throw the gauntlet down. We’re talking about Kubernetes operating systems today. I have an operating system, a Mac, or I have Linux. I can run Kubernetes. What is a Kubernetes operating system and why should I even be thinking about this?
[00:01:58] AR: Sure. I’d like to think about a Kubernetes operating system as an operating system that has been stripped down to the absolute bare minimum to run Kubernetes. Everything that is required to run the kubelet, and essentially that’s it, at least in my opinion. It should be super minimal to start with.
Second of all, I also think that it should integrate with Kubernetes as well. The combination of being able to strip Linux, as we know it, down as small as possible, and then actually integrating with Kubernetes itself, using APIs to figure out things about itself. I think that that, in my opinion, is what I would call a Kubernetes OS.
[00:02:42] BL: Interesting. Okay. Now that we know a little bit about Kubernetes operating systems, and like I said, I'm starting early today as the devil's advocate. Now, like I said before, I have a Mac and I have Linux and I have Windows on my desktop. There've been lots of efforts from lots of people trying to get Kubernetes running on Ubuntu or Fedora, and it's cool that you're trying to slim this down, but really, why would I look at a Kubernetes operating system over the Linux that I'm familiar with? I like Ubuntu and Debian.
[00:03:17] AR: Sure. That's a great question. It's one we get a lot. I like to think that you actually just get less operational overhead when you have a Kubernetes-specific operating system. I think that Kubernetes itself is a job: managing it, getting it installed, unfortunately. It's getting better, but it's still a job at the end of the day. Having to manage Kubernetes and the operating system, everything that you need to pass compliance on the operating system, getting all the packages installed: these are all things that we kind of know Kubernetes needs already, and yet we're still having to go in and apt install whatever we might need to get Kubernetes up and running. The idea with a Kubernetes operating system, in my mind, is that we should stop worrying about the individual node, the underlying operating system, and start looking at Kubernetes as a whole as a giant machine, and we just add machines, nodes, to this giant machine to give us extra resources. The less we have to care about the machine or the underlying operating system, the better, in my mind. We get to focus on Kubernetes. Not only that, but because it's minimal, you get a smaller attack surface. There are just not things there that you would otherwise have to worry about. I've done Kubernetes for three years now, and having to go in and worry about updating packages that are completely unrelated is something that I think we shouldn't have to do anymore. If you're dedicated to running your apps and your stack in Kubernetes, then why are we going in and managing the nodes on an individual basis? For that matter, managing things that don't really have any relevance for running Kubernetes. To me, it's just about abstracting away the operating system and not even having to worry about it anymore, and looking at Kubernetes and the entire cluster as an operating system. We can't really get there if we're having to worry about the two jobs of managing both at the same time.
[00:05:17] JS: Andrew, can I ask a follow up question?
[00:05:18] AR: Sure.
[00:05:20] JS: I fully agree with all of those statements. I think a general purpose operating system might not be the best fit for a specific role, like being a Kubernetes node. As you mentioned, you have to deal with all the various packages that might be beneficial to you if you're running it for some general purpose, when it's really supposed to be running a workload as a Kubernetes node, so you can scope that down.
I'm just wondering, when you make this pitch or let these folks know, how do you get folks to relinquish their desire to have full control over their operating system, from being able to install their own security management processes on it, or being a little bit shy about not being able to SSH, or use their common patterns of operating system management?
[00:06:09] AR: Oh, that's a great question. I think the biggest thing that I always answer back is, and I can take this in two parts. Let me first talk about that: people do want to run things on the host. My answer always back is, can you run it in Kubernetes? Kubernetes is sort of your package manager, if you will. They sit back usually and they're like, "Hmm. Yeah, I probably could." If you need to run something on every host, Kubernetes has something for that: it's a DaemonSet. Run it on Kubernetes and call it a day. This isn't something that's going to work for absolutely everything, I imagine. Nothing in the world is like that. But I think for the majority of the use cases out there, and for the things that people want to run on the host, you could actually just run it in Kubernetes itself. As far as SSH, and for those that don't really know what we've done in Talos: in Talos we've actually stripped things down to just the kernel and a small Golang binary whose whole goal is to create a Kubernetes cluster, or make a node part of a Kubernetes cluster, as fast as possible, and that's really it. We've gone so far as ripping out Bash and SSH, and we've actually replaced that with an API. My answer always back to the SSH question is: what is it that you're really trying to get out of SSH? 9 times out of 10, it's, "I want to get information about what's wrong. I want to do troubleshooting." If our answer back to them is, "Oh! We have an API for that," then at the end of the day it's really the information that you're after. It's not necessarily that you need SSH to do that. You need a way to get this information, and not necessarily have to sit there and wait for a Prometheus metric that gets pulled every minute. You want something right on the spot. You want to ask a question and you want to get an immediate answer. I feel like we can answer that with an API. That tends to satisfy the desire for SSH most of the time. I mean, as you said, people are still going to want to hold on to it, but I think over time we're going to have to educate people that this is a better way. It's a read-only API that gives operations engineers a way to get the information that they would otherwise get by SSH-ing and asking, via Unix utilities, what they want to know.
[00:08:27] CC: When you say an API, are you also giving them a command line tool, like in the case of Talos, or only an actual API?
[00:08:37] TG: Yeah, we do provide a command line interface to the API. It's called OSCTL and it basically wraps our API, and our intention is that it will be used for exploration of the system, automation through scripting languages, etc. Then as you get more sophisticated with your environment, you might begin to build your own tools that interact directly with that API.
[00:08:56] CC: Cool. Yeah, this is a really cool subject. I wasn't even aware that a Kubernetes operating system was a thing until really recently, and I don't remember how I came across it.
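Andrew's "run it on every host" answer refers to a Kubernetes DaemonSet, which schedules one copy of a pod on every node. Here is a minimal sketch using the official Kubernetes Python client; the agent name and image are hypothetical placeholders, not anything Talos ships.

    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() inside a pod

    # One pod per node; the DaemonSet controller adds copies as nodes join
    ds = client.V1DaemonSet(
        api_version="apps/v1",
        kind="DaemonSet",
        metadata=client.V1ObjectMeta(name="node-agent"),
        spec=client.V1DaemonSetSpec(
            selector=client.V1LabelSelector(match_labels={"app": "node-agent"}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": "node-agent"}),
                spec=client.V1PodSpec(containers=[
                    client.V1Container(
                        name="agent",
                        image="registry.example.com/node-agent:1.0",
                    ),
                ]),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_daemon_set(
        namespace="kube-system", body=ds)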
CC: One question I have is, Andrew, you were saying, "Well, we strip down Kubernetes to the bare minimum." How opinionated is it in your case specifically? When you say it's stripped down to the bare minimum, would there be a consensus in the community that, yes, this set of functionality is the bare minimum? Is it your opinion of what the bare minimum should be?
[00:09:38] AR: Sure. I think at the absolute bare minimum, we need to run the kubelet. In my mind, that's really all we need. But you still have this practical issue of, like you said, needing to get information off that machine. You need to be able to manage Kubernetes without needing Kubernetes; it's a chicken-and-egg problem. That's where the API was actually born. When I started Talos, I actually just built a very minimal, stripped-down rootfs whose only goal was to run the kubelet. But figuring out why the kubelet wasn't running successfully obviously was not very easy. I figured, "You know what? Let's put an API in front of this. I want to keep this as minimal as possible. I want to keep this read-only." I threw an API in front of it. I think you need two things, really. You need what's required by the kubelet. You need a CNI. You need all the utilities that the kubelet will run, and you also need a way to query the system. Where other minimal operating systems have decided to do SSH and all the classic utilities that we all know and love, we went another route with an API. But I don't think the operating system, the rootfs, should have any more than what's required by the kubelet. That would be the pie-in-the-sky dream right there.
[00:11:01] CC: The two questions that come to my mind are: if I wanted to add Kubernetes components to that, would it be possible? And if I wanted to add anything to the operating system, would it be possible? I think the second question you already answered, which is, well, correct me if I'm wrong, if you need to run something on the operating system that's not there, you can run it in the actual cluster.
[00:11:27] AR: Yeah, that's the idea. Kubernetes gives us the APIs to do that. We can schedule to specific nodes. We can schedule to a class of nodes. We can schedule to every single node. I think that you can actually handle a lot of the use cases out there for any kind of application with Kubernetes itself. I think that that's really strong, because you get one single consistent API for managing your infrastructure. I want to deploy applications for this team or that team. At the end of the day, everything is just declarative and Kubernetes will make it happen. You don't have to worry about the scheduling and all of these different things. The only thing that the operating system is concerned about is making that machine available to the Kubernetes cluster.
[00:12:10] BL: This idea of slimmed-down operating systems, it's not a new one. CoreOS was doing this years ago. One issue that CoreOS ran into was, "Well, what's current?" Well, it depends on what stream you're on. How do you manage keeping everything up-to-date?
[00:12:28] AR: Our goal is to keep pace with Kubernetes essentially. I know that, traditionally, there's long-term support and there are all these different ways of releasing different versions of an operating system, but Kubernetes isn't really there yet. There is no notion of LTS in Kubernetes yet, that I know of.
There's just, I believe, N-2 or something like that, where they actually offer official support. I think that the operating system is bound to that. I think that it needs to follow Kubernetes as closely as possible. There are constantly different feature gates being opened up; there are things being graduated to GA. I think, especially at this time right now, as rapidly as the technology is changing, you need an operating system that is going to evolve with it, or at least the operations intelligence to evolve with Kubernetes right alongside it.
[00:13:20] BL: That brings up an interesting point. I mean, there are two things here. There's the operating system itself and there's Kubernetes. Do they upgrade in lockstep or are they upgraded separately?
[00:13:29] AR: I can only speak for ourselves. There are people that, I think, actually have upgrades kind of be one and the same, where the operating system and a Kubernetes upgrade both happen. We've decided to go the other route, where we actually want to evolve our APIs independently, but then give you a way to still manage Kubernetes on its own. We've actually done self-hosted Kubernetes. In Talos, we'll bootstrap a lightweight control plane, a small control plane, and then we'll spin up another control plane using the Kubernetes API. Then Kubernetes upgrades simply look like a kubectl edit: I'm going to update my daemon set for my API server. Then from there, you basically have to update the kubelet. We use hyperkube for the kubelet. You have to tell Talos, "Use this kubelet image next time you boot." We've separated the two, I think, for good reason. I think that the two should be able to evolve independently to give a little bit more power back to the user. If you couple them really closely, it becomes really, really opinionated. I think we should at least support what Kubernetes supports, and that's the N-2, and leave it up to the user to configure Kubernetes, but we still have sane best practices out of the box.
[00:14:54] BL: Yeah, that makes sense, because yesterday, what did we get? We got Kubernetes 1.15.10, and I don't know about 16, but we got 1.17.3 yesterday too. You might not want to move yet. 1.17 introduced a whole bunch of deprecations for custom resource definitions. You're not ready to move yet; we were on beta 1 for a while for CRDs. I totally see why you moved in that direction.
[00:15:20] AR: Yeah, that's exactly it. We can't impose too much opinion, but I think that we should drive the opinion at least up to, "Hey, don't worry about what's on this machine. I'm going to make it a Kubernetes node for you. Just tell me which version you want." I think that's where we should draw the boundary, and then we should still give the controls back to the user as far as what flags I want to specify, what kind of feature gates, all these various things that you don't get out of a lot of the different managed products out there. Hopefully we'll be teetering right on the line of having that convenience of managed, but still giving you that power and flexibility to update a flag if you need to.
[00:16:04] CC: This episode is so in the style of an interrogation. It's hilarious.
[00:16:08] BL: That's me. I'm digging in.
[00:16:09] CC: I feel like, no, we are all digging in. It's just because, at least speaking for myself, I'm super curious.
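The upgrade flow Andrew describes, editing the self-hosted control plane and then pointing Talos at a new kubelet image, can be pictured with the official Kubernetes Python client. The object name, namespace, and image below are illustrative guesses, not Talos's actual resource names.

    from kubernetes import client, config

    config.load_kube_config()

    # Upgrading a self-hosted control-plane component looks like patching the
    # image on its workload object, e.g. a daemon set running the API server
    patch = {"spec": {"template": {"spec": {"containers": [
        {"name": "kube-apiserver",
         "image": "k8s.gcr.io/kube-apiserver:v1.17.3"},
    ]}}}}
    client.AppsV1Api().patch_namespaced_daemon_set(
        name="kube-apiserver", namespace="kube-system", body=patch)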
CC: I wanted to ask you, Andrew: at the beginning you were saying that a Kubernetes operating system needs to integrate with Kubernetes, and I was sitting here thinking, "Integrate? It's supposed to be Kubernetes." What did you have in mind when you said that? Did you mean to be able to interface with another Kubernetes cluster? Was that what you meant?
[00:16:36] AR: Not quite. What I meant by that is there's this really powerful thing that Kubernetes gives us in CRDs and this idea of operators or controllers. You can have an operator or controller, say, for upgrading your operating system, which we have in Talos: it's just an upgrade operator that lives in Kubernetes and knows how to talk to Kubernetes, and it knows how to talk to our API and orchestrate upgrades across the board. Part of that is, for example, when the upgrade API is called on a Talos node, the node is actually aware: "Hey, I'm running Kubernetes. I'm going to cordon myself, because I know I've gotten this and I know that I'm not going to be able to schedule workloads on me." I think that that's just one example, but we could probably take that a lot farther one day. But I would like to see everything that we know and love about our operating systems today essentially be abstracted and pushed up into Kubernetes as operators. There's a lot of power in that, where you can orchestrate things like, as I said, upgrades. I think that's one example of how an operating system can integrate better with Kubernetes.
[00:17:45] CC: Got you.
[00:17:46] JS: I was wondering if we can pivot a little bit, maybe just to satisfy my own curiosities, but I was hoping we could talk a little bit about some of the selling features. Imagine if I'm a hardened sysadmin or security team, and basically someone comes up and says, "Hey, I want to run this Kubernetes operating system." Knowing what I know about the state of security today and operating systems, there's a lot of effort to basically contain things. No pun intended, but we have user space operating out of some type of sandbox. We have seccomp to limit syscalls. How does Talos approach security, maybe philosophically or maybe even down to the implementation details? What does security in Talos look like?
[00:18:33] AR: Yeah. Again, our goal is that we want people to forget about the operating system. But to forget about the operating system, you have to know it's secure. You have to go to great lengths to secure it, because you can't forget about it otherwise. We actually go down to the kernel: we apply what's called the Kernel Self Protection Project. We basically try to harden the kernel, and at boot time we do a bunch of checks to make sure that your kernel is running at least most of those configurations. I think we have a little bit of work to do as far as enforcing all of them, but we do some checks to ensure that your kernel is compatible with KSPP, for example. That alone has a ton of benefits to it. It's a statically compiled kernel, so it can't do any kernel module loading and stuff like that. That's completely prohibited. That alone cuts off a lot of security issues in itself. Then, going up the stack further, we've actually stripped out SSH. We stripped out Bash. So you have nothing that you can really log on to anymore. Again, that just flat out removes a whole category of potential attacks.
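An aside on the upgrade operator Andrew mentioned a moment ago: "cordon myself" amounts to a one-line patch against the Kubernetes API. Here is a sketch with the official Python client; the node name is a placeholder, and a real operator would typically also drain the node's pods before rebooting it.

    from kubernetes import client, config

    config.load_incluster_config()  # an operator runs as a pod in the cluster

    # Cordon: mark the node unschedulable so nothing new lands on it mid-upgrade
    client.CoreV1Api().patch_node(
        "worker-1", {"spec": {"unschedulable": True}})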
AR: Going even further on the security side, we actually have Talos running completely out of RAM, and it's a SquashFS, so it's a read-only file system. The only thing that actually uses a disk is the kubelet. The idea is that we want to make the operating system, again, just go away. Having it read-only I think is a really strong thing, and SquashFS in particular, because you can't remount it read-write if you're a user or something like that. Then up in Kubernetes, out of the box, we try to deploy it with all of the security best practices, the CIS benchmarks and all of that. We go all the way from the kernel, to our userland, and even to Kubernetes itself. We try to bring out security best practices out of the box. That's something I'd love to see for Kubernetes itself upstream, but for now that's what we're doing.
[00:20:33] BL: Can we go back to the interrogation? No. Let's not go back to an interrogation. Thinking about it: if we take the concept of a Kubernetes operating system that can be updated on a different cadence than the Kubernetes running on it, who is Talos for? For Joe the neophyte, someone who doesn't really know the space, will this make his life any easier, or is there a special set of expertise that we would need to be fruitful with this?
[00:21:06] TG: I think from our perspective, we hope that everybody who uses Kubernetes would find something useful in Talos, or a system like Talos. Number one, I think Talos would be a great way to get started on your laptop or workstation. It's got some basic features to stand up a small Kubernetes cluster there. That's one place to start. As you move further into production, I think that a Kubernetes-OS-based platform would be particularly useful in an environment where you might have multiple clusters spread across different geographical locations, spread across different teams, maybe spread across different hosting environments. We've talked to a number of folks who have been running Kubernetes in production for a couple of years now, and these clusters kind of come up organically within a larger organization, in different areas, doing different things for the business, managed by different teams. Now that a little bit of time has passed, these organizations are realizing, "Hey, we've got kind of a Kubernetes sprawl problem. We have this team over here on Amazon managing and running Kubernetes one way. We have a separate team managing and running Kubernetes a different way over here on a different kind of platform." I think anywhere we can drive some consistency across the tooling, consistency across the base platforms, would be useful. We also think that the minimal aspect of our system and some of the design decisions we've made around security make it particularly useful in a regulated environment. I think that claim would hold true for any sort of special-purpose or minimal operating system designed for a specific task.
[00:22:35] BL: Interesting. Just thinking about the concept of a Kubernetes operating system: what's next? I'm not asking what's next for Talos, but given all the opportunity, all the time and all the knowledge, what should we be doing that we're not doing right now?
[00:22:49] AR: Specifically around operating systems or Kubernetes?
[00:22:52] BL: Well, you know what? You can start with operating systems. I mean, you can go to Kubernetes and then we'll see if our lists match.
[00:22:57] AR: That's a good question.
Right off the bat, I'm going to say I don't really know. I think this is a new space. I think that we have a big task in front of us already in getting people to use these kinds of operating systems, hopefully not too big of a task. Because you find these big companies saying, "Oh! We can't do this. We can't do that," because getting a new OS in is hard. I think we first of all need to win people over on just these even more minimal operating systems, beyond what CoreOS has done. Personally, I don't know if I could answer that question honestly without just making something up.
[00:23:33] TG: I've got a thought here. One of the things that I'm really interested in, beyond just Kubernetes and beyond just the operating system: what is computing going to look like in 5, 10, 15 years? I don't know if Kubernetes is going to be around. I'm kind of a tech cynic, right? I've seen a lot of fads in my career, things that pop up and are very popular for a couple of years and then sort of disappear. I don't think Kubernetes is one of those. I think Kubernetes and the concepts and the layers of abstraction that Kubernetes has provided will all remain useful and powerful in the distant future, whether it's called Kubernetes or something else, some new paradigm. But what I'm really interested in is seeing what we can do with this idea of an API-managed OS. If you look at the general purpose operating systems out there, some aspects of the system might expose an API, but for the most part, you're still interacting and interfacing with the system like you were 30 years ago, 35, 40 years ago even. That's fine. What works works. But everything else today has an API. Kubernetes has a powerful and extensible API, and I think that your operating system should have something similar, something comparable, something that you can interact with using the same tools and the same processes and the same ideas that you use at the top of the stack, moving some of those concepts down to the host OS level that we're talking about today.
[00:24:51] CC: This brings up a point that I'm so curious about, not only for the idea of a Kubernetes operating system, but for any new idea like the ones you were just talking about, Tim. So, what works works. For example, every year or every couple of years, I am evaluating a new code editor, or a new note-taking app, or a to-do-list app, those three things. I'm continuously finding something to reevaluate because what I have has never worked for me just the way I think. Actually, recently I found a couple of things that are really good. In any case, the thing is they just never worked for years. They're very limited. They don't match my thinking. But an operating system, I would never... well, I'm not an administrator either, but just from having my own laptops forever, I was going to say I'm not going to go out there and try a new operating system to see if it offers what I already have but might be better for me. But that's not true, because I have done that many times too. So never mind. But I think the point of my question stands: how are you communicating to people out there that, "Hey, there is this new thing. Maybe you think what you have is working for you, but you just don't know that there is a new, different way of doing things"? When you do try to do that, how are people responding?
I mean, of course, there are those cases where people just get it and it immediately resonates with them. But I’m talking about the people who might benefit from this but don’t quite grasp it. How do you break through that barrier? [00:26:38] TG: Sure. Maybe the lay majority. [00:26:40] CC: Yeah, and how are people responding? [00:26:42] TG: Yeah. The great thing about Talos is that people understand pretty immediately what it is, how it works and why we’ve done it. The challenge, I think, for us and for anybody changing the way that operating systems work, is: is it better enough than what I have today or what I had before? Is it worth the switching cost? I think that switching cost is something that’s pretty well understood in the industry. People have gone through this process as they’ve moved from virtualization to containers, from Docker to Kubernetes, etc. They understand that process, and they understand there’s a technical cost, there’s a people cost, etc. We have to show that value. I think that progress in our industry is incremental. Our industry is young. We’re not building bridges. We’re not at the level of the internal combustion engine, where the engineering is understood and we know how to build it and we know how to make it so that it doesn’t fall over and explode. Clearly, we’re not quite there yet in the broader world of computing. I think anywhere we can show a little bit of incremental improvement – where we can tackle one narrow slice of a problem and make it a little bit better, and get to a point where computing is just a little bit safer and a little bit easier and a little bit faster – that’ll be a pretty compelling argument. There are a lot of details involved, and we have to talk about how you get your applications from one operating system to the next. 15 years ago, it may have been a very big ask to have someone port their enterprise application from one operating system to another. They were so inextricably linked; there were a lot of connections between the OS and the applications. But today, we have these levels of abstraction. We have containers. We have the Kubernetes orchestration mechanisms. That switching cost is going down with every release of Kubernetes, and every step along the way, as people change the way that applications are deployed, that switching cost gets a little bit cheaper. It will be easier for us to prove that the value you gain by moving to a purpose-built operating system is greater than the switching cost. [00:28:41] CC: Very good points. [00:28:42] JS: I feel like there’s a lot of emphasis and focus on the move over, the first steps toward migrating to something new. There’s a lot of emphasis on bootstrapping a cluster. There’s a lot of emphasis on how do I get started. I’m part of a team called customer reliability engineering, and we see operators running Kubernetes environments that are durable and have been in the field for many years. I think that there’s kind of a hidden cost in these day-two operations where, today, to effectively be a Kubernetes operator, you need to also have a great deal of understanding of Linux operating system internals. These are abstractions on top, but sometimes those abstractions are leaky. So you need to be able to parse iptables rules. You need to be able to understand how traffic gets routed – all of these aspects of it.
I’m just wondering how we get folks shifted from this mindset of, “I’m going to start with something that’s general purpose and then basically make it do what I want by making all of these configuration changes and installing things on top of it” – making it not general purpose after all, but specific – and get people to move back more fundamentally and think, “Well, what if we just started with something that is strictly for running workloads?” We don’t have to worry about installing a security suite on top of it, or making this configuration change, or hardening requirements, or what have you. We’re fundamentally in a better place because we’ve started with something that’s arguably more secure. [00:30:21] BL: You know that – I mean, I’m old. I’m old now. I’m realizing this. When I started – back in my day when we started with Linux, we went through this whole thing of Linux installers, and there were many iterations of Linux installers, and it depended on, “Well, did you like what Red Hat was doing? Did you like what the Debian project was doing? Oh! Did you like what Arch was doing? Oh! Did you want to do it yourself? Do you want to emerge world with Gentoo?” Really, we’ve come to this point now where no one ever talks about Linux installers anymore. You just put it on there. I think what I’m getting at is that we don’t actually know what we want. I mean, we say that we want it to be simpler. We say we want it to be more secure, but we don’t know. Only time will tell, and I think it’s going to be a lot of chipping away at problems. Then the people who want to have the bold ideas are saying, “I’m going to go out there and create a Kubernetes operating system.” In reality, it may work. We hope it works, or it may not work, but at least we gained a little tiny bit more knowledge on how we want to run this thing. I think – and I’ll just say one more last thing – is that if you look at Bell Labs: Bell Labs created the vacuum tube, and then 20 or 30 years later, they created the transistor – twice, actually. It took a long time to move past the vacuum tube because it kind of just worked, and they said, “We just can’t throw that away.” Maybe we’re seeing a lot of that in Kubernetes. We’re holding on to some good things even though some greater things are going to come, but it might not be here this year or next year. It might be 18 months. It might be 24 months. We just got to really pay attention to that. [00:32:02] AR: Brian, when you said you were old, I was going to shake my head internally, and then you brought up the vacuum tube and I’m like, “Okay.” [00:32:07] BL: I mean, I’m not that old. [00:32:09] JS: Yeah, I think that’s a good point, Brian. The thing I like to point out is the allegory of the cave. People have been living a certain way for so long they think that these shadows are real, and they just know that way of life, until some crazy person comes along and says, “Hey, there’s a whole world out there,” and no one believes him. I think we just need to do – like you said, we just need to do it. You just create and make it happen and hopefully educate people in the process and just keep chipping away at it. Do the good work. [00:32:38] BL: That’s the important piece, and that was the power of Bell Labs. You probably can tell, I just read a book about Bell Labs. I’m an expert now. But that was the power of Bell Labs. They didn’t just focus on making product for AT&T.
They focused on changing the world, literally. Who creates a transistor before anyone even knew what one was? You just don’t create that. That’s some really crazy stuff. I try to bring the parallel back to what we’re doing here. We can’t just create this perfect Kubernetes thing, because really, we don’t know what it is. I mean, we can be smart and say, “Well, it needs to be secure. It needs networking,” and all this stuff. But you know what? We don’t even have cgroups v2 support yet. We don’t even know where we are. Let’s just keep going down the path, and we will suss out these better patterns. [00:33:23] CC: Yeah, I like that. [00:33:24] BL: That’s it. It is incremental. Here’s the crazy part, and this is the real tough part: it is incremental, and reality says that not everybody can win. Don’t take your failures as a loss. Take them as, “Well, maybe we shouldn’t have done that,” and keep on moving forward, because there are a lot of companies out there that got us to this point in tech that don’t exist anymore – but if they hadn’t done what they did, we would not be here right now. It’s not [inaudible 00:33:52]. [00:33:53] CC: Why are we talking about failures? [00:33:55] BL: I’m sorry. It’s the ultimate success. [00:34:01] CC: Oh gosh! Let’s not end the show on such a downer. [00:34:04] BL: No. That’s a happy point though. Let me put the bow on the happy point and then I will stop talking. The thing is, it’s not that the glass is half empty; the glass is half full. The path to success is littered with failure, and it’s not a bad thing. It’s a good thing, because it’s good that we can continue making those failures, because we know they lead to successes. That is actually a happy thing. [00:34:29] CC: I wonder if Andrew and Tim want to do a little bit of interrogating of us. I think that would be fair. [00:34:36] AR: I wouldn’t know what to interrogate you guys about. [00:34:40] CC: Well, we are coming up at the top of the hour, so it’s time to say goodbye. It was great having you, Andrew, and you, Tim, on the call. Jed, thank you for participating as well. I think it was very informative. With that, I will say, until next week. Bye everybody. [00:34:59] TG: Bye. Thanks for having us. [00:35:00] CC: My pleasure. [00:35:00] AR: Bye-bye. Thank you. [00:35:02] JS: Thank you. Bye. [END OF INTERVIEW] [0:35:05.3] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END] See omnystudio.com/listener for privacy information.
The question of diving into Kubernetes is something that faces us all in different ways. Whether you are already on the platform, are considering transitioning, or are thinking about what is best for your team moving forward, the possibilities and the learning curve make it a somewhat difficult question to answer. In this episode, we discuss the topic and ultimately believe that an individual is the only one who can answer that question well. That being said, the capabilities of Kubernetes can be quite persuasive, and if you are tempted then it is most definitely worth considering very seriously, at least. In our discussion, we cover some of the problems that Kubernetes solves, as well as some of the issues that might arise when moving into the Kubernetes space. The panel shares their thoughts on learning a new platform and compare it with other tricky installations and adoption periods. From there, we look at platforms and how Kubernetes fits and does not fit into a traditional definition of what a platform constitutes. The last part of this episode is spent considering the future of Kubernetes and how fast that future just might arrive. So for all this and a bunch more, join us on The Podlets Podcast, today! Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Hosts: Carlisia Campos Josh Rosso Duffie Cooley Bryan Liles Key Points From This Episode: The main problems that Kubernetes solves and poses. Why you do not need to understand distributed systems in order to use Kubernetes. How to get around some of the concerns about installing and learning a new platform. The work that goes into readying a Kubernetes production cluster. What constitutes a platform and can we consider Kubernetes to be one? The two ways to approach the apparent value of employing Kubernetes. Making the leap to Kubernetes is a personal question that only you can answer. Looking to the future of Kubernetes and its possible trajectories. The possibility of more visual tools in the UI of Kubernetes. Understanding the concept of conditions in Kubernetes and its objects. Considering appropriate times to introduce a team to Kubernetes. Quotes: “I can use different tools and it might look different and they will have different commands but what I’m actually doing, it doesn’t change and my understanding of what I’m doing doesn’t change.” — @carlisia [0:04:31] “Kubernetes is a distributed system, we need people with expertise across that field, across that whole grouping of technologies.” — @mauilion [0:10:09] “Kubernetes is not just a platform.
Kubernetes is a platform for building platforms.” — @bryanl [0:18:12] Links Mentioned in Today’s Episode: Weave — https://www.weave.works/docs/net/latest/overview/ AWS — https://aws.amazon.com/ DigitalOcean — https://www.digitalocean.com/ Heroku — https://www.heroku.com/ Red Hat — https://www.redhat.com/en Debian — https://www.debian.org/ Canonical — https://canonical.com/ Kelsey Hightower — https://github.com/kelseyhightower Joe Beda — https://www.vmware.com/latam/company/leadership/joe-beda.html Azure — https://azure.microsoft.com/en-us/ CloudFoundry — https://www.cloudfoundry.org/ JAY Z — https://lifeandtimes.com/ OpenStack — https://www.openstack.org/ OpenShift — https://www.openshift.com/ KubeVirt — https://kubevirt.io/ VMware — https://www.vmware.com/ Chef and Puppet — https://www.chef.io/puppet/ tgik.io — https://www.youtube.com/playlist?list=PL7bmigfV0EqQzxcNpmcdTJ9eFRPBe-iZa Matthias Endler: Maybe You Don't Need Kubernetes - https://endler.dev/2019/maybe-you-dont-need-kubernetes Martin Tournoij: You (probably) don’t need Kubernetes - https://www.arp242.net/dont-need-k8s.html Scalar Software: Why most companies don't need Kubernetes - https://scalarsoftware.com/blog/why-most-companies-dont-need-kubernetes GitHub: Kubernetes at GitHub - https://github.blog/2017-08-16-kubernetes-at-github Debugging network stalls on Kubernetes - https://github.blog/2019-11-21-debugging-network-stalls-on-kubernetes/ One year using Kubernetes in production: Lessons learned - https://techbeacon.com/devops/one-year-using-kubernetes-production-lessons-learned Kelsey Hightower Tweet: Kubernetes is a platform for building platforms. It's a better place to start; not the endgame - https://twitter.com/kelseyhightower/status/935252923721793536?s=2 Transcript: EPISODE 18 [INTRODUCTION] [0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you. [EPISODE] [0:00:41.9] JR: Hello everyone and welcome to The Podlets Podcast, where we are going to be talking about “Should I Kubernetes?” My name is Josh Rosso and I am very pleased to be joined by Carlisia Campos. [0:00:55.3] CC: Hi everybody. [0:00:56.3] JR: Duffie Cooley. [0:00:57.6] DC: Hey folks. [0:00:58.5] JR: And Bryan Liles. [0:01:00.2] BL: Hi. [0:01:03.1] JR: All right everyone. I’m really excited about this episode because I feel like as Kubernetes has been gaining popularity over time, it’s been getting its fair share of promoters and detractors. That’s fair for any piece of software, right? I’ve pulled up some articles and we put them in the show notes about some of the different perspectives on both success and perhaps failures with Kube. But before we dissect some of those, I was thinking we could open it up more generically and think about, based on our experience with Kubernetes, what are some of the most important things that we think Kubernetes solves for? [0:01:44.4] DC: All right. My list is very short, and what Kubernetes solves, from my point of view, is that it presents an interface that knows how to run software – and the best part about it is that it’s a standard interface. I can target Kubernetes rather than targeting the underlying hardware.
I know certain things are going to be there, I know certain networking is going to be there. I know how to control memory, and actually, that’s the main reason I would give for Kubernetes: we need that standardization, and you don’t want to set up VMs – I mean, assuming you already have a cluster. This simplifies so much. [0:02:29.7] BL: For my part, I think it’s the life cycle stuff that’s really the biggest driver for my use of it and for my particular fascination with it. I’ve been in roles in the past where I was responsible for ensuring that some magical set of applications on a thousand machines would magically work, and I would have all the dependencies necessary, and they would all agree on what those dependencies were, and it would actually just work – and that was really hard. I mean, getting to a known state in that situation is very difficult. Having something where both the abstraction of containers and the abstraction of container orchestration give you the ability to deploy those applications and all those dependencies together, and the ability to change that application and its dependencies using an API – that’s the killer part for me. [0:03:17.9] CC: For me, from the perspective of a developer, it’s very much what Duffy just said, but more so the uniformity that comes with all those bells and whistles that we get by having that API and all of the features of Kubernetes. We get such uniformity across such a really large surface, and so if I’m going to deploy apps, if I’m going to run containers, what I have to do for one application is the same for another application. If I go work for another company that uses Kubernetes, it is the same, and whether that Kubernetes is hosted or self-managed, it will be the same. I love that consistency and that uniformity. Even so, there are many tools that help, tools that are customized; there’s help for installing and composing specific things for your needs. But the understanding of what you’re doing is the same, right? I can use different tools, and it might look different, and they will have different commands, but what I’m actually doing doesn’t change, and my understanding of what I’m doing doesn’t change. I love that. Being able to do my work in the same way – you know, that alone for me makes it worthwhile. [0:04:56.0] JR: Yeah, I think my perspective is pretty much the same as what you all said, and one way that I kind of look at it too is that Kubernetes does a better job of solving the concerns you just listed than I would probably be able to build myself, or my team would be able to solve for ourselves, in a lot of cases. I’m not trying to say that specialization around your business case or your teams isn’t appropriate at times. It’s just, at least for me – to your point, Carlisia – I love that abstraction that’s consistent across environments. It handles a lot of the things, like Brian was saying, about CPU, memory, resources, and thinking through all those different pieces. I wanted to take what we just said and maybe turn it a bit toward some of the common things that people run into with Kubernetes, and maybe hit on a piece of low-hanging fruit that I think is oftentimes a really fair perspective: Kubernetes is really hard to operate. Sure, it gives you all the benefits we just talked about, but managing a Kubernetes cluster? That is not a trivial task. And I just wanted to kind of open that perspective up to all of us, you know?
What are your thoughts on that? [0:06:01.8] DC: Well, the first thought is it doesn’t have to be that way. I think that’s a fallacy that a lot of people fall into. It’s hard. Guess what? That’s fine; we’re in the sixth year of Kubernetes, we’re not in the sixth year of a stable release. It’s hard to get started with Kubernetes, and what happens is we use that as an excuse to say, well, you know what? It’s hard to get started with, so it’s a failure. You know something else that was hard to get started with, when I started with it in the 90s? Linux. You downloaded it on 30 floppy disks. There was download corruption – real things: ZMODEM, XMODEM, YMODEM. This is real; a lot of people don’t know about this. And then you had to find 30 working floppy disks, and you had to transfer 30 one-and-a-half-megabyte images – and it still took a long time – to floppy disk, and then you had to run the installer. And then, most likely, you had to build a kernel. Downloading, transferring, installing, building a kernel – there were four places that could fail, and this was before you even had windowing; this was just to get you to a login prompt. With Kubernetes, we had this issue. People were installing Kubernetes, there are cloud vendors who are installing it, and then there are people who were installing it on who knows what hardware. Guess what? That’s hard, and it’s not even the physical servers – it’s networking. Well, how are you going to create a network that works across all your servers? You’re going to need an overlay. Which one are you going to use? Calico? Weave? You’re going to need something else that you created, or something else, and hope it works. Yeah, we’re still figuring out where we need to be, but these problems are getting solved. This will go away. [0:07:43.7] BL: I’m living that life right now. I just got a new laptop, and I’m a Linux desktop kind of guy, so I’m doing it right now. What does it take to actually get a recent enough kernel so that the hardware that shipped with this laptop is supported, you know? Those problems continue. Even though Linux has been around and is considered stable, and it’s the underpinning of much of what we do on the internet today, we still run into these things. It’s still very much a thing. [0:08:08.1] CC: I think there’s also a factor of experience. For example, this is not the first time you have had to deal with this problem, right, Duffy? You’ve been using Linux on a desktop, so this is not the first hardware you’ve had to set Linux up on. So you know where to go to find that information. Yeah, it’s sort of a pain, but it’s manageable. I think a lot of us are suffering from, “Gosh, I’ve never seen Kubernetes before, where do I even start?” – or, “I learned Kubernetes, but it’s quite burdensome to keep up with everything” – as opposed to, let’s say, if 10 years from now we are still doing Kubernetes, you’ll be like, “Yeah, okay, whatever. This is no big deal.” Once we have done these things for a few years, we would not possibly say that it’s hard. I don’t think we would describe it that way. [0:09:05.7] DC: I think there will still be some difficulty to it, but to your point, it’s interesting. If I look back five years ago, I was telling all of my friends: look, if you’re a systems administrator, go learn how to do other things, go learn an API-centric model, go play with AWS, go play with tools like this, right?
If you’re a network administrator, learn to be a systems administrator, but you’ve got to branch out. You’ve got to figure out how to ensure that you’re relevant in the coming time, with all the things that are changing, right? I was telling my friends this five years ago, 10 years ago, and I continue to tell my friends that today. If I look at the Kubernetes platform, the complexity it represents in operating it is almost tailor-made to the people who did do that, who decided to branch out and understand why APIs are interesting, who have enough of an understanding in a generalist way to be a reasonable systems administrator and a network administrator, and who started actually understanding the paradigms around distributed systems – because those people are what we need to operate this stuff right now. I mean, Kubernetes is a distributed system; we need people with expertise across that field, across that whole grouping of technologies. [0:10:17.0] BL: Or, don’t. Don’t do any of that. [0:10:19.8] CC: Brian, let me follow up on that, because I think it’s great that you pointed that out, Duffy. I was thinking precisely in terms of being a generalist and understanding how Kubernetes works and being able to do most of it, but it is so true that some parts of it will always be very complex and will require expertise. For example, security – dealing with certificates and making sure that that’s working – or if you have particular needs for networking. But understanding the whole idea of these systems as they sit on top of Kubernetes – grasping that, for someone with years of experience under their belt, will become relatively simple. Sorry, Brian, that I cut you off. [0:11:10.3] BL: That’s fine, but now you gave me something else to say in addition to what I was going to say before. Here’s the killer: you don’t need to know distributed systems to use Kubernetes. Not at all. You can use a deployment, you can use a [inaudible] set, you can run a job. You can get workloads up on Kubernetes without having to understand that. But Kubernetes also gives you some good constructs, either in the Kubernetes APIs themselves or in its client libraries, with which you could build distributed systems in an easier way. But what I was going to say before that, though, is: “I can’t build a cluster.” Well, don’t. You know what you should do? Use a cloud vendor. Use AWS, use Google, use Microsoft – or no, I mean, did I say Microsoft? Google and Microsoft. Use DigitalOcean. There are other people out there that do it as well. They can take care of all the hard things for you, and in three, four minutes – or 10 minutes if you’re on certain clouds – you can have Kubernetes up and running, and you don’t even have to think about a lot of these networking concerns to get started. I think that’s a little bit of the FUD that we hear: “It’s hard to install.” Well, don’t install it. You install it when you have to manage your own data centers. Guess what? When you have to manage your own data centers and you’re managing networking and storage, there’s a set of expertise that you already have on staff, and if they don’t want to learn a new thing, that’s a personal problem, that’s not really a Kubernetes problem. Let’s separate those concerns and not let our lack of knowledge, or not wanting to learn, stop us from actually moving forward.
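As a concrete version of Bryan’s “just use a cloud vendor” point, here is roughly what that looks like from the command line. These are sketches rather than exact recipes – flags and defaults shift between releases, and the cluster names are placeholders:

    # Google Kubernetes Engine:
    gcloud container clusters create demo --num-nodes=3

    # DigitalOcean:
    doctl kubernetes cluster create demo --count 3

    # A few minutes later, either way:
    kubectl get nodes

None of the overlay-network, hardware, or installer concerns from earlier in the conversation show up in that workflow; the provider handles them.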
[0:12:39.2] DC: Yeah. Maybe even taking that example a step forward. I think where this problem compounds – or this perspective sometimes compounds – about Kubernetes being hard to operate, is that it’s coming from some shops whose operational concerns today aren’t that complex. Why are we introducing this overhead, this thing that we maybe don’t need? And you know, to your point, Brian, I wonder if we’d all entertain the idea – I’m sure we would – that maybe, speaking of the cloud vendors, maybe even just a Heroku or something. Something that doesn’t even concern itself with Kube but can get your workload up and running and successful as quickly as possible. Especially if you’re, like, maybe a small-startup type persona, even that’s adequate, right? It could be not a failure of Kubernetes but more so choosing the wrong tool for the job. Does that resonate with you all as well? Does that make sense? [0:13:32.9] DC: Yeah, you know, you can’t build a house with a screwdriver. I mean, you probably could; it would hurt and it would take a long time. That’s what we’re running into. What you’re really feeling is that, operationally, you cannot bridge the gap between running your application and running your application in Kubernetes, and I think that’s fair. That’s actually a great thing. We’ve proven that the foundations are stable enough that now we can actually do research and figure out the best ways to run things. Because, guess what? You have RPMs from Red Hat, and then you have debs from the Debian project – different ways of getting things. You have Snap from Canonical; it works and sometimes it doesn’t. We need to actually figure out those constructs in Kubernetes. They’re not free. These things did not come to exist because someone said, “Hey, I think we should do this.” They took many years. I was using RPM in the 90s, and we need to remember that. [0:14:25.8] JR: On that front, I want to maybe point a question to you, Duffy, if you don’t mind. Another big concern that I know you deal with a lot is that Kubernetes is great – maybe I can get it up, no problem. But to make it a viable deployment target at my organization, there’s a lot of work that goes into making a Kubernetes cluster production-ready, right? That could involve how you integrate storage and networking and security and on and on. I feel like we end up at this tradeoff: it’s so great that Kubernetes is super extensible and customizable, but there is a certain amount of work that that kind of comes with, right? I’m curious, Duff, what’s your perspective on that? [0:15:07.3] DC: I want to make a point that brings back something Brian mentioned earlier, real quick, before I go on to that one. The point is that I completely agree that you do not have to actually be a distributed systems person to understand how to use Kubernetes, and if that were the bar, we would have set that bar in an incredibly inappropriate place. But from the operational perspective, that’s what we were referring to. I also completely agree that, especially when we think about productionalizing clusters, if you’re just getting into this Kubernetes thing, it may be that you want to farm that out to another entity to create and productionalize those clusters, right? You have a choice to make, just like you had a choice to make when AWS came along, just like you had a choice to make when we were thinking of virtual machines, right? You have a choice, and you continue to have a choice, about how far down that rabbit hole, as an engineering team, as an engineering effort, your company wants to go, right?
Do you want to farm everything out to the cloud and not have to deal with the day-to-day operations of those virtual machines, and take the constraints that have been defined by that platform? Or do you want to operate that stuff locally? Are you required by law to operate locally? What does production really mean to you, and what are the constraints that you actually have to satisfy, right? I think that, given that choice, when we think about how to productionalize Kubernetes, it comes down to exactly that same set of things, right? I’ve seen a number of different takes on this, and it’s interesting, because I think it’s actually going to move us on to our next topic in line here. Frequently I see that productionalizing Kubernetes means providing some set of constraints around the consumption of the platform, such that your developers, or the folks that are consuming that platform, have to operate within those rails, right? They can only define deployments, and they can only define deployments that look like this. We’re going to ask them a small subset of questions and then fill out all the rest of it for them on top of Kubernetes. The entry point might be CI/CD, it might be a repository, it might be a code repository, very similar to a Heroku, right? The entry point could be anywhere along that line, and I’ve seen a number of different enterprises explore different ways to implement that. [0:17:17.8] JR: Cool. Another concept that I wanted to have us define and think about, because I’ve heard the term platform quite a bit, right? I was thinking a little bit about what the term platform means exactly, and then eventually whether Kubernetes itself should be considered a platform. Backing up, maybe we could just start with a simple question for all of us: what makes something a platform, exactly? [0:17:46.8] BL: Well, a platform is something that provides something. That is a Bryan Liles exclusive. But really, what a platform is, is something that provides some kind of service that can be used to accomplish some task, and Kubernetes is a platform. It provides constructs through its API to allow you to perform tasks. But Kubernetes is not just a platform. Kubernetes is a platform for building platforms. The things that Kubernetes provides – the workload APIs, the networking APIs, the configuration and storage APIs – what they provide is a facility for you to build higher-level constructs that control how you want to run the code and how you want to connect the applications. Yeah, Kubernetes is actually a platform for platforms. [0:18:42.4] CC: Wait, just to make sure, Brian. Because Kelsey Hightower, for example, is someone who says Kubernetes is a platform of platforms. Now, is Kubernetes both a platform of platforms, at the same time that it’s also a platform to run apps on? [0:18:59.4] BL: It’s both. Kelsey tweeted that – there is some controversy on who said it first; it could have been Joe Beda, it could have been Kelsey. I think it was one of those two, so I want to give a shout-out to both of them for thinking along the same line and really thinking about this problem. But to go back to what you said, Carlisia: is it a platform for providing platforms, and a platform? Yes, and I will explain how. If you have Kubernetes running, what you can do is talk to the API and create a deployment. That is a platform for running a workload.
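To ground that for readers, here is roughly the kind of object Bryan means by “talk to the API, create a deployment.” A minimal sketch – the name and image below are placeholders, not anything from the episode:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello-web              # hypothetical name
    spec:
      replicas: 2                  # ask Kubernetes to keep two copies running
      selector:
        matchLabels:
          app: hello-web
      template:
        metadata:
          labels:
            app: hello-web
        spec:
          containers:
          - name: web
            image: nginx:1.25      # any container image works here
            ports:
            - containerPort: 80

Posting that to the API server – for example with kubectl apply -f deployment.yaml – is the whole “platform for running a workload” interaction; Kubernetes creates the replica set and pods from there.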
But what you can also do, through Kubernetes API mechanisms – i.e. CRDs, custom resource definitions – is create custom resources: say I want to have something called an application. You can basically extend the Kubernetes API. Not only is Kubernetes allowing you to run your workloads, it’s allowing you to extend the API, which in turn can be served by another controller running on your platform that gives you this thing when you create an application. Now, it creates a deployment, which creates a replica set, which creates a pod, which creates containers, which download images from a container registry. It actually is both.
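A hedged sketch of the mechanism Bryan just walked through: a CustomResourceDefinition is itself just another object you post to the API. Everything below – the example.com group, the Application kind, its fields – is invented for illustration:

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: applications.example.com   # must be <plural>.<group>
    spec:
      group: example.com               # hypothetical API group
      scope: Namespaced
      names:
        plural: applications
        singular: application
        kind: Application
      versions:
      - name: v1alpha1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            type: object
            properties:
              spec:
                type: object
                properties:
                  image:
                    type: string       # which container image to run
                  replicas:
                    type: integer      # how many copies to run

Once that is registered, kubectl get applications works like any built-in type, and a controller watching Application objects is what would create the deployment, replica set and pods in the cascade Bryan describes.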
[0:20:17.8] DC: Yeah, I agree with that. Another quote that I remember being fascinated by, which I think also helps define what a platform is – Kelsey put out a quote that said, “Everybody wants platform as a service, with the only requirement being that they’ve built it themselves.” Which I think is awesome, and it also speaks, in my opinion, to what the definition of a platform is, right? It’s an interface through which we can define services or applications, and that interface typically will have some set of constraints, or some set of workflows, or some defined user experience on top of it. To Brian’s point, I think that Kubernetes is a platform because it provides you a bunch of primitives on the back end that you can use to express what that user experience might be. As we were talking about earlier, you might move the entry point into this platform from the API – the Kubernetes API server – back down into CI/CD, right? Perhaps you’re not actually defining a thing called a deployment; you’re just saying, “I want so many instances of this, and I don’t want it to be able to communicate with this other thing,” right? So, in my opinion, the definition of a platform is that user-experience interface. It’s the constraints and the workflows that you’re going to put on top of that platform. [0:21:33.9] BL: I like that. I want to throw out a disclaimer right here, because we’re talking about platforms: Kubernetes is not a platform as a service. That is different. A platform as a service, the way that we look at it, is basically a platform that can run your code, can actually make your code available to external users, can scale it up, can scale it down, and manages all the nuances required for that operation to happen. Kubernetes does not do that out of the box, but you can build a platform as a service on Kubernetes. That’s actually, I think, where we’ll be going next: people stepping out of the onesy-twosy “I can deploy a workload,” and actually thinking at this level. And I’ll tell you what – Deis, who got bought by Microsoft a few years ago, actually did that. They built a PaaS that looks like Heroku. Microsoft and Azure thought that was a good idea, so they purchased them, and they’re still over there thinking about great ideas. But I think as we move forward, we will definitely see different types of PaaSes on Kubernetes. The best thing is that I don’t think we’ll see them in the conventional sense of what we think of now. We have Heroku, which is the git push heroku master model, where we share code through git. And then we have the Cloud Foundry idea of a PaaS, where you can run cf push, and that actually is more of an extension of our old-school Java applications, where we could just push [inaudible] here. But I think – at least I am hoping, and this is something that I am actually working on, not to toot my own horn too much – what I’m actually thinking about is: can we build a platform-as-a-service toolkit? Can I actually just build something that’s tailored to my operation? And that is something that I think we’ll see a lot more of in the next 18 months. At least you will see it from me and people that I am influencing. [0:23:24.4] CC: One thing I wanted to mention before we move on to anything else: in answering “Is Kubernetes right for me?”, we are so biased. We need to play devil’s advocate at some point. But answering that question is the same as answering “Is technology X right for me?”, and I think at a higher level there are two camps. One camp is very much of the thinking, “I need to deliver value. I need to ship my software, and if the tools I have are solving my problem, I don’t need to use something else. I don’t need to use the fancy, shiny thing that’s the hype, the new thing.” And that is so right: you definitely shouldn’t chase the new thing for its own sake. I am divided on this way of thinking, because at the same time that it is so right – you do have to be conscious of how much money you’re spending on things, and you have to be efficient with your resources – I think a lot of people don’t fully understand what Kubernetes really can do. If you are listening to this, maybe rewind and listen to what Brian and Duffy were just saying in terms of workflows and the Kubernetes primitives, because those things are so powerful. They allow you to be so creative with what you can do, right? With your development process, with your rollout process. Maybe you don’t need it now, because you are not using those things, but once you understand what it is and what it can do for your use case, you might start having ideas like, “Wow, that could actually make X, Y and Z better, or I could create something else that could use these things and therefore add value to my enterprise, and I didn’t even think about this before.” So, two ways of looking at things. [0:25:40.0] BL: Actually, so the topic of this session was “Should I Kubernetes?”, and my answer to that is: I don’t know. That is something for you to figure out. If you have to ask somebody else, I would probably say no. But on the other side: if you are looking for great networking across a lot of servers, if you are looking for service discovery, if you are looking for a system that can restart workloads when they fail – well, now you should probably start thinking about Kubernetes, because Kubernetes provides all of these things out of the box. Are they easy to get started with, though? Some of these things are harder. Service discovery is really easy, but some of these things are a little bit harder. But what Kubernetes does is – here comes my hip-hop quote – Jay-Z said this: difficult takes a day, impossible takes a week. Basically, making difficult things easy, and making things that you could not even imagine doing attainable.
And I think that is what Kubernetes brings to the table. Then I’ll go back and say this one more time: should you use Kubernetes? I don’t know – that is a personal question, that is something you need to answer. But if you’re looking for what Kubernetes provides, yes, definitely, you should use it. [0:26:58.0] DC: Yeah, I agree with that. I think it is a good summary there. But I also think, coming back to the “whether you should Kubernetes” part – from my perspective, the reason that I Kubernetes, if you will (I love that as a verb), is that when I look around at the different projects in the infrastructure space, as an operations person, one of the first things I look for is that API, that pattern around consumption: what’s actually out there, and who’s developing that API. Is it a business that is interested in selling me a new thing, or is it an API that’s being developed by people who are actually trying to solve real problems? Is there a reasonable way to go about this? I mean, when I look at OpenStack – OpenStack was exactly the same sort of model, right? OpenStack existed as an API to help you consume infrastructure, and I look at Kubernetes and I realize, “Wow, okay, well now we are developing an API that allows us to think about the life cycle and management of applications,” which moves us up the stack, right? So, for my part, the reason I am in this community, the reason I am interested in this product, the reason I am totally Kubernetes-ing, is because of that. I realized that, fundamentally, infrastructure has to change to be able to support the kind of load that we are seeing. So, whether you should Kubernetes: is the API valuable to you? Do you see the value in that, or is there more value in continuing in whatever paradigm you’re in currently, right? And judging that equally, I think, is important. [0:28:21.2] JR: Two schools of thought that I run into a lot on the API side of things: one is whether, over time, Kubernetes will become this implementation detail, where 99% of users aren’t even aware of the API to any extent; and another one talks about the API as a consistent abstraction with tons of flexibility. I think companies are going in both directions – OpenShift from Red Hat is perhaps a good example. Maybe that is one of those layer-two platforms, more so, Brian, that you were talking about, right? Where Kubernetes is the platform that was used to build it, but the average person that interacts with it might not actually be aware of some of the Kubernetes primitives and things like that. So, if we could all get out our crystal balls for a second here: what do you all think? In the future, do you see the Kubernetes API becoming a more prevalent industry standard, or do you see it fading away in favor of some other abstraction that makes it easier? [0:29:18.3] BL: Oh wow. Well, I already see it – I don’t have to look too far into the future, right? I can see the Kubernetes API being used in ways that we could not imagine. The example I think of is KubeVirt. KubeVirt allows you to boot, basically, pods on whatever implements something that looks like a kubelet. But the neat thing is that you can use something like KubeVirt with a virtual kubelet, and now you can boot them on other things. So, ideas in that space – I know VMware is actually working on that: “Wow, what if we can make virtual machines look like pods inside of Kubernetes? Pretty neat.”
Azure has definitely led work on this as well – now we can bring up containers, we can bring up VMs, and you don’t actually need a kubelet server anymore. But the crazy part is that you can still use the workload APIs and storage APIs with Kubernetes, and it does not matter what backs them. And I’ll throw out one more suggestion. There are also projects like the AWS operators and [inaudible] point, and what they allow you to do is use the Kubernetes API – or actually Cluster API; I’ll use all three. I can use the Kubernetes API to boot things that aren’t even in the cluster, and these could be AWS services, or databases across multiple clouds, or – guess what? – more Kubernetes services. Yeah, so we are on that path, and I just can’t wait to see what people are going to do with it. The power of Kubernetes is this API; it is just so amazing. [0:30:50.8] DC: For my part, I agree that the API itself is being extended in all kinds of amazing ways, but as I look into the crystal ball, I think that the API will continue to be foundational to what is happening. If I look at the level-two or level-three platforms that are coming, I think those will continue to be a thing for enterprises, because they will continue to innovate in that space, and they will continue to consume the underlying API structure and that portability Kubernetes exposes to define what that platform might look like for their own purpose, right? Giving them the ability to effectively have a platform as a service that they define themselves, but using a foundational layer that’s consistent and extensible. I think that’s where things are headed. [0:31:38.2] CC: And also more visual tools, I think, are in our future – better, actual visual UIs that people can use. [0:31:54.0] BL: So can I talk about that for a second? [0:31:55.9] CC: Please, Brian. [0:31:56.8] BL: I am wearing my Octant hoodie today – Octant being a visual tool for Kubernetes – and I will talk now as someone who has gone down this path to actually figure this problem out. As a prediction for the future, I think we’ll start creating better APIs in Kubernetes to allow for more visual things, and the reason I say this is going to happen, and can’t really happen now, is that inside of Octant, whenever we’re creating new views, we pretty much have to know what that object is. But what is going to happen – and I see the rumblings from the community, I see the rumblings from the Knative community as well – is that we are going to start standardizing on conditions, and using conditions as a way that we can actually say what’s going on. So let me back up for a second so I can explain to people what conditions are. We think of Kubernetes as YAML, and in a typical object in Kubernetes, you are going to have your type metadata – what is this? You are going to have your object metadata – what is this named? Then you are going to have a spec – how is this thing configured? And then you are going to have a status, and the status generally will say, “Well, what is the status of this object? If it is a deployment, how many replicas are out? If it is a pod, am I ready to go?” But there is also this concept in status called conditions, which are a list of things that say how your object is working. And right now, Kubernetes uses them in two ways: in a negative way and in a positive way.
I think we are actually going to figure out which one we want to use, and we are going to see more APIs just using conditions. And now, from a UI developer’s point of view, I can just say, “I don’t really care what your object is. You are going to give me conditions in a format that I know, and I can basically report on those in the status and tell you if the thing is working or not.” That is going to come too, and that will be neat, because it means we can start building UIs basically for free, because we just have to learn the pattern. [0:33:52.2] CC: Can you talk a little bit more about conditions? Because this is not something I hear frequently, and it might be something I know but don’t recognize by this name. [0:34:01.1] BL: Oh yeah, I will give you the most popular one. Everything in Kubernetes is an object, and that even means that the nodes your workloads run on are objects. If you run kubectl – kube-control, kube-cuddle, kube-whatever – get nodes, it will show you all the nodes in your cluster, if you have permission to see them. And if you do kubectl get node with the node name and look at the YAML output, what you will see at the bottom is a field called ‘conditions’. Inside of there, it will be things like: is there sufficient memory, is the node – I actually don’t remember all of them, but really what they are is line items that say how this particular object is working. Do I have enough memory? Do I have enough storage? Am I out of actual pods that can be launched on me? That’s what conditions are. It is basically saying, “Hey Brian, what is the weather outside?” I could say it’s nice, or I could be like, “Well, it’s 75 degrees, the wind is light but variable, it is not humid – these are what the conditions are.” They allow the object to specify things about itself that might be useful to someone who is consuming it.
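For readers who want to see exactly what Bryan is describing, this is roughly the shape of the conditions block you would find under a node’s status with kubectl get node <node-name> -o yaml – trimmed here, with the timestamp fields omitted:

    status:
      conditions:
      - type: MemoryPressure             # a "negative" condition:
        status: "False"                  # False is the healthy state
        reason: KubeletHasSufficientMemory
        message: kubelet has sufficient memory available
      - type: DiskPressure
        status: "False"
        reason: KubeletHasNoDiskPressure
        message: kubelet has no disk pressure
      - type: Ready                      # a "positive" condition:
        status: "True"                   # True is the healthy state
        reason: KubeletReady
        message: kubelet is posting ready status

The mix of the Ready condition (where True is good) and the pressure conditions (where False is good) is exactly the positive/negative inconsistency Bryan says the community is trying to standardize.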
[0:35:11.1] CC: All right, that was useful. I am actually trying to bring one up here. I never paid attention to that. [0:35:18.6] BL: Yeah, and you will see it. The two places they’re most common right now – there are some competing ideas going on in Kubernetes architecture, trying to figure out how they are going to standardize on this – but with pods and nodes, you will see conditions on there, and those are just telling you what is going on. The problem is that a condition is a type, a message, a status – oh, and a reason – and the status can be true or false, but sometimes the type is a negative type, where it would be something like “node not ready,” and then it will say false, because the node is ready. Now, whenever you’re inspecting that with automated code, you really want the positive condition to be true and the negative condition to be false, and this is something that the Knative community is really working on now. They have this whole facility of a thing called duck typing, with which they can now pattern-match inside of objects to find all of these neat things. It is actually pretty intriguing. [0:36:19.5] CC: All right. It is interesting, because status is everything for objects, and that is very much a part of my workflow, but I never noticed that some of the objects had conditions. And just a plug: we are very much going to have the Knative folks here to talk about duck typing. I am really excited about that. [0:36:39.9] BL: Yeah, they’re on my team. They’ll be happy to come. [0:36:42.2] CC: Oh yes, they are awesome. [0:36:44.5] JR: So I was thinking maybe we could wrap this conversation up, and I think we have acknowledged that “Should I Kubernetes?” is a ridiculously hard question for us to answer for you, and we should clearly not be the ones answering it for you. But I was wondering if we could give some thoughts around – for the Podlets listener who is sitting at their desk right now thinking, “Is now the right time for my organization to bring this in?” I will start with a thought and then open it up to all of you. One common thing I run into a lot: you know your current state and you know your desired state, to steal a Kubernetes concept for a moment. The desired state might be more decoupled services that are more scalable and so on, and I think oftentimes at orgs we get so obsessed with the desired state that we forget how far the gap is between the current state and the desired state. As an example, maybe your shop’s biggest issue is that the primary revenue-generating application is a massive .NET Framework monolith, which isn’t exactly easy to just port over into Kubernetes, right? So if a lot of your friction right now is teams collaborating on this tool, updating this tool, scaling this tool – maybe, before even thinking about Kubernetes, be honest with the fact that a lot of value can be derived right now from some amount of application architecture change, or even – sorry to use a buzzword – some amount of modernization of aspects of that application, before you even get to the part of introducing Kubernetes. So that is one common one I run into with orgs. What are some other suggestions you have for people who are thinking about, “Is it the right time to introduce Kube?” [0:38:28.0] BL: So here is my thought: if you work for a small startup and you’re working on shipping value, and you have no Kubernetes experience on staff, and for some reason you don’t want to use the cloud – you know, go figure out your other problems, then come back. But if you are an enterprise, and especially if you work in a central enterprise group and you are thinking about “modernization,” I actually do suggest you look at Kubernetes, and here is the reason why. My guess is that if you’re a business of a certain size, you run VMware in your data center. I am just guessing that, because I haven’t been to a company that doesn’t. We learned a long time ago that using virtual machines is, in many cases, way more efficient than just running hardware, because we weren’t using all of our compute capacity. So if you are working for a big company, or even a medium-sized company – I am not telling you to run to it, but I am telling you to at least have someone go look at it and investigate whether this could ultimately be something that makes your stack easier to run. [0:39:31.7] DC: I think I am going to take the operations perspective. If you are in the business of coming up with a way to deploy applications on servers, and you are looking at trying to handle the lifecycle of that, and you’re pretty fed up with the tooling that is out there – things like Puppet and Chef and tooling like that – and you are looking to try to understand: is there something in Kubernetes for me?
Is there some model that could help me improve the way I handle the lifecycle of those applications, be they databases or monoliths or composable services? Any which way you want to look at it – are there tools there? Is the API expressive enough to help me solve some of those problems? In my opinion, the answer is yes. I look at things like DaemonSet and the scheduling [inaudible] that are exposed by Kubernetes, and there is actually quite a lot of power there, quite a lot of capability, in just the traditional model of: how do I get this set of applications onto that set of servers, or some subset therein. So I think it is worth evaluating, if that is the place you’re in as an organization, and if you are looking at fleets of equipment and trying to handle that magical recipe of multiple applications and dependencies and stuff: see what the water is like on this side. It is not so bad.
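To give one concrete taste of the primitives Duffie mentions: a DaemonSet asks Kubernetes to run one copy of a pod on every node (or a labeled subset of nodes), which is precisely the old “get this agent onto that whole fleet” problem. A minimal, hypothetical sketch – the name and image are placeholders:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: node-agent               # hypothetical agent name
    spec:
      selector:
        matchLabels:
          app: node-agent
      template:
        metadata:
          labels:
            app: node-agent
        spec:
          containers:
          - name: agent
            image: example/agent:1.0 # placeholder image

Kubernetes then keeps that pod running on each node as nodes join and leave – the kind of lifecycle bookkeeping you would otherwise script in Puppet or Chef.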
We are not just doing applications — we do applications, low-level topics, mapping applications onto Kubernetes, new things that just came out. We have been doing this for 101 episodes now. Wow. So you can go look at that if you need some examples of things you could do on Kubernetes. [0:43:51.4] CC: That is tgik.io — maybe somebody, an English speaker, should repeat that because of my accent — but let me just say I am so glad you mentioned that, Bryan, because I was sitting here as we were talking, thinking there should be a catalog of use cases of what Kubernetes can do — not just the rice and beans, but a lot of different use cases, maybe things that are unique, that people don’t think to use because they haven’t run into that need yet. But they could look at it and pause: okay, that would enable me to do this thing that I didn’t even think about. That is such a great catalog of use cases. It is probably the best resource. Somebody say the website again? Duffie, what is it? [0:44:38.0] DC: tgik.io and it is every Friday at 1 PM Pacific. [0:44:43.2] CC: And it is live. It’s live and it’s recorded, so it is uploaded to the VMware Cloud Native YouTube channel, and everything is going to be in the show notes too. [0:44:52.4] DC: It’s neat — you can come ask us questions. There is a live chat inside of that, and you can use that live chat to ask us questions or give us ideas, all kinds of crazy things, just like you can with The Podlets. If you have an idea for an episode, or something that you want us to cover, or if you have something that you are interested in, you can go to thepodlets.io, which will link you to our GitHub pages, where you can actually open an issue about things you’d love to hear more about. [0:45:15.0] JR: Awesome and then maybe on that note, Podlets, is there anything else you all would like to add on “Should I Kubernetes?” or do you think we’ve – [0:45:22.3] BL: As best as our bias will allow it, I would say. [0:45:27.5] JR: As best as we can. [0:45:27.9] CC: We could go another hour. [0:45:29.9] JR: It’s true. [0:45:30.8] CC: Maybe we’ll have “Should I Kubernetes?” Part 2. [0:45:34.9] JR: All right everyone, well that wraps it up for at least Part 1 of “Should I Kubernetes?” and we appreciate you listening. Thanks so much. Be sure to check out the show notes, as Duffie mentioned, for some of the articles we read preparing for this episode and TGIK links and all that good stuff. So again, I am Josh Rosso, signing out. With us also, Carlisia Campos. [0:45:55.8] CC: Bye everybody, it was great to be here. [0:45:57.7] JR: Duffie Cooley. [0:45:58.5] DC: Thanks, you all. [0:45:59.5] JR: And Bryan Liles. [0:46:00.6] BL: Until next time. [0:46:02.1] JR: Bye. [END OF EPISODE] [0:46:03.5] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END]
#39: Is it possible that the biggest contribution from the Kubernetes project isn't container scheduling, but the Kubernetes API itself? Transcript: https://www.devopsparadox.com/39#transcript Books and Courses: https://www.devopstoolkitseries.com/ Canary Deployments To Kubernetes Using Istio and Friends is still on sale for $13.99 with the link below (coupon and price expires 27-Jan-2020 10:01 AM PT) https://www.udemy.com/course/canary-deployments-to-kubernetes-using-istio-and-friends/?couponCode=7F311AD2C040117054AB Review the podcast on Apple Podcasts: https://www.devopsparadox.com/review-podcast Leave us a message on Voxer: https://web.voxer.com/u/devopsparadox Find our contact information at: https://www.devopsparadox.com/contact
A warm welcome to John Harris who will be joining us for his first time on the show today to discuss our exciting topic, CI and CD in cloud native! CI and CD are two terms that usually get spoken about together but are actually two different things entirely if you think about them. We begin by getting into exactly what these differences are, highlighting the regulatory aspects of CD in contrast to the future-focused nature of CI. We then move on to a deep exploration of their benefits in optimizing processes in the cloud native space through automation and surveillance from development to production environments. You’ll hear about the benefits of automatic building in container orchestration, the value of make files and local test commands, and the evolution of CI from its ‘rubber chicken’ days with Martin Fowler and Jez Humble. We take a deep dive into the many ways that containers differ from regular binaries as far as deployment methods, build speed, automation, run targets, real-time reflections of changes, and regulation. Moreover, we talk about the challenges of transitioning between testing and production environments, getting past human error through automation, and using sealed secrets to manage clusters. We also discuss the benefits and drawbacks of different CI tools such as Kubebuilder, Argo, Jenkins X, and Tekton. Our conversation gets wrapped up by looking at some of the exciting developments on the horizon of CI and CD, so make sure to tune in! Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Hosts: Bryan Liles Nicholas Lane Key Points From This Episode: • The difference between CI and CD.• Understanding the meaning of CD: ‘continuous delivery’ and ‘continuous deployment’.• Building an artifact that can be deployed in the future is termed ‘continuous integration’.• The benefits of continuous integration for container orchestration: automatic building.• What to do before starting a project regarding make files and local test commands.• Kubebuilder is a tool that scaffolds out the creation of controllers and web hooks.• Where CI has got to as far as location since its ‘rubber chicken’ co-located days.• The prescience of Martin Fowler and Jez Humble regarding continuous integration.• The value of running tests in a CI process for quality maintenance purposes.• What makes containers great as far as architecture, output, deployment, and speed.• The benefits of CD regarding deployment automation, reflection, and regulation.• Transitioning between testing and production environments using targets, clusters, pipelines.• Getting past human error through automation via continuous deployment.• What containers mean for the traditional idea of environments.• How labeling factors into the simplicity of transitioning from development to production.• What GitOps means for keeping track of changes in environments using tags.• How sealed secrets stop the need to change an app when managing clusters.• The tools around CD and what a good CD system should look like.• Using Argo and Spinnaker to take better advantage of hardware.• How Jenkins X helps mediate YAML when installing into clusters.• Why the customizable nature of CI tools can be seen as negative.• The benefits of using cloud native-built tools like Tekton.• Perspectives on what is missing in the cloud native space.• A definition of blue-green deployments and how they operate in service meshes.• The business abstraction elements of CI tools that
are lacking.• Testing and data storage-related aspects of CI/CD that need to be developed. Quotes: “With the advent of containers, now it’s as simple as identifying the images you want and basically running that image in that environment.” — @bryanl [0:18:32] “The whole goal whenever you’re thinking about continuous delivery or continuous deployment is that any human intervention on the actual moving of code is a liability and is going to break.” — @bryanl [0:21:27] “Any time you’re in developer tooling, everyone wants to do something slightly differently. All of these tools are so tweak-able that they become so general.” — @johnharris85 [0:34:23] Links Mentioned in Today’s Episode: John Harris — https://www.linkedin.com/in/johnharris85/Jenkins — https://jenkins.io/CircleCI — https://circleci.com/Drone — https://drone.io/Travis — https://travis-ci.org/GitLab — https://about.gitlab.com/Docker — https://www.docker.com/Go — https://golang.org/Rust — https://www.rust-lang.org/Kubebuilder — https://github.com/kubernetes-sigs/kubebuilderMartin Fowler — https://martinfowler.com/Jez Humble — https://continuousdelivery.com/about/David Farley — https://dfarley.com/index.htmlAMD — https://www.amd.com/enIntel — https://www.intel.com/content/www/us/en/homepage.htmlWindows — https://www.microsoft.com/en-za/windowsLinux — https://www.linux.org/Intel 386 — http://www.computinghistory.org.uk/det/6192/Introduction-of-Intel-386/386SX — https://www.computerworld.com/article/2475341/flashback--remembering-the-386sx.html386DX — https://en.wikipedia.org/wiki/Intel_80386Pentium — https://www.intel.com/content/www/us/en/products/processors/pentium.htmlAMD64 — https://www.webopedia.com/TERM/A/AMD64.htmlARM — https://en.wikipedia.org/wiki/ARM_architectureTomcat — http://tomcat.apache.org/Netflix — https://www.netflix.com/za/GitOps — https://www.weave.works/technologies/gitops/Weave — https://www.weave.works/Argo — https://www.intuit.com/blog/technology/introducing-argo-flux/Spinnaker — https://www.spinnaker.io/Google X — https://x.company/Jenkins X — https://jenkins.io/projects/jenkins-x/YAML — https://yaml.org/Tekton — https://github.com/tektonConcourse CI — https://concourse-ci.org/ Transcript: EPISODE 11 [INTRODUCTION] [0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically-minded decision maker, this podcast is for you. [EPISODE] [00:00:41] BL: Back to The Podlets Podcast, episode 11. I’m Bryan Liles, and today we have Nicholas Lane. [00:00:50] NL: Hello! [00:00:51] BL: And joining us for the first time, we have John Harris. [00:00:55] JH: Hey everyone. How is it going? [00:00:56] BL: All right! So today we’re going to talk about CI and CD in cloud native. I want to start this off with this whole term CI and CD. We talk about them together, but they are two different things almost entirely if you think about them. CI stands for continuous integration, and then we have CD. What does CD stand for? [00:01:19] NL: Compact disk. [00:01:20] BL: Right. True, and actually I’ve used that term before. I actually do agree. But what else does CD stand for? [00:01:28] NL: It’s continuous deployment right? [00:01:30] BL: Yeah, and? [00:01:31] JH: Continuous delivery. [00:01:32] NL: Oh!
I forgot about that one. [00:01:35] BL: Yeah, that’s the interesting thing: as we talk about tech and we give things acronyms, CD is just a great one. Changing directories, compact disk, continuous delivery and continuous deployment. Here’s the bonus question: does anyone here know the difference between continuous delivery and continuous deployment? [00:01:58] NL: Now that’s interesting. [00:01:59] JH: I would go ahead and say continuous delivery is the ability to move changes through the pipeline, but you still have the ability to do human intervention at any stage — and usually deployment to production in continuous delivery would be a business decision — whereas continuous deployment has no gating and everything just goes straight to production. [00:02:18] BL: Oh, John! Gold star for you, because that is one of the common ones. I just like to bring that up because we always talk about CI and CD as if they are just one thing, but they’re actually way bigger topics, and we’ve already introduced three things here. Let’s start at the beginning and let’s talk about continuous integration, a.k.a. CI. I’ll start off. We have CI, and what is the goal of CI? I think that we always get bogged down with tech terms and all this technology and all these packages from all these companies. But I’d like to boil CI down to one simple thing. The process of continuous integration is to build an artifact that can be deployed somewhere at some future date, at some future time, by some future person or process. Everything else is a detail of the system you choose to use — whether you use Jenkins, or CircleCI, or Drone, or you built your own thing, or you’re using Travis, or any of the other online CI tools. At the end of the day, you’re building — if you’re doing web development, maybe you’re building out Docker files — I mean, Docker images — because we’re in cloud native. But if you’re not, maybe you’re just building JARs, WARs, or EARs, or a ZIP file, or a binary, or something. I’d just like to start this off there. Any more thoughts on continuous integration? [00:03:48] NL: Yeah. I think the only times that I’ve ever used something like continuous integration is when I’ve been doing container orchestration development — things on top of things like Kubernetes, for instance. The thing I really like about it is the concept of being able to, from my computer, save and push to a repo and have all of the pieces get built for me automatically somewhere else, and I just love that so much, because it saves so much brain thinky juice over running every command to make the binary you need. [00:04:28] BL: So did you actually create those scripts yourself? [00:04:30] NL: Some of them. When I’ve used things like GitLab, I use the pipeline that exists there and just fiddled around with a little bit of code, like some bash, but not too much, because GitLab has a pretty robust pipeline. Travis — I don’t think I needed to, actually. Travis had pretty good Go, make, and Docker build scripts already templated out for you. [00:04:53] BL: Yeah. I’d like to tell people, whenever you start any project, whether it’s big or small, especially if it’s on – Not on Windows. I’ll tell you something different if it’s on Windows.
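[EDITOR'S SIDEBAR] A minimal sketch of the kind of CI setup Nicholas describes above: every push runs the tests and then builds and pushes an image — the deployable artifact Bryan is talking about. This uses GitLab CI syntax since GitLab came up in the conversation; the Go toolchain, image names, and registry variables are illustrative assumptions, not details from the episode.

# .gitlab-ci.yml — every push runs tests, then builds and pushes an
# image (the CI "artifact") that a later CD step can deploy.
stages:
  - test
  - build

run-tests:
  stage: test
  image: golang:1.13
  script:
    - go test ./...

build-image:
  stage: build
  image: docker:19.03
  services:
    - docker:19.03-dind
  script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - docker build -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" .
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"

The $CI_* values are variables GitLab injects for its built-in registry; the point is simply that the pipeline's output is a versioned image, not a running deployment. [END SIDEBAR]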
But if you’re developing on a Mac or developing on Linux, the first thing you should do in your project is create a make file, or your programming language’s equivalent of a make file, and in that make file write a command that will build your software and run its tests locally — whatever the process is. I mean, if you’re running in Go, you do a Go build. If you’re using Rust, build with Rust, or C++, or whatever, before you even write any code. The reason why is because the hardest part is making your code build, and if you leave that to the end, you’re actually making it harder on yourself. If your code build works from the beginning, all you have to do is change it to fit what you’re doing, rather than thinking about it when it’s crunch time. [00:05:57] NL: I actually ran into that exact scenario recently, because I’ve been building some tooling around some Kubernetes stuff, and the first one I did, I built it all manually by hand. Then at the end I gave it to the person who wanted it and they were like, “So, where’s the make file?” I’m like, “Where’s the what?” So I had to go in and fill in the make file, and that was a huge pain in the butt. Then recently the other thing I’ve been using is Kubebuilder. John, you and I have been talking about Kubebuilder quite a bit, and one of the things it does for you is scaffold out a make file for you, and going from doing it all by myself to having it already exist at the beginning was so much better. I totally agree with you, Bryan. [00:06:42] BL: So quick point of order here, for those of us who don’t know what Kubebuilder is: what is Kubebuilder? [00:06:48] NL: Kubebuilder is a tool that was created by members of the Kubernetes community to scaffold out the creation of controllers and webhooks. A controller in Kubernetes is a piece of software that watches a specific object, or many specific objects, and reconciles them. If it notices that something has changed and you want to take an action based on that change, the controller does that for you. [00:07:17] JH: Okay. So it actually makes working with CRDs and Kubernetes much easier than creating it all yourself. [00:07:26] NL: Correct. Yeah. For instance, the one that I made for myself was a tool that updated and watched a specific CRD, but it wasn’t necessarily a controller. It was just flagging on whether or not a change occurred, and I used the dynamic client, and that was a huge headache in and of itself. Kubebuilder has the ability to watch not just CRDs, but any object in Kubernetes, and then reconcile them based on changes. [00:07:53] NL: It’s pretty great. [00:07:54] BL: All right. So back to CI. John, do you have any opinions on CI, or anecdotes, or anything like that? [00:07:59] JH: Yeah. I think one of the interesting things about the original philosophy of CI, outside of tooling, was trunk-based development — that every developer’s changes get integrated into trunk as soon as possible, so you don’t get into integration hell and rebasing. I guess it’s kind of interesting when you apply that to the cloud native landscape, because when that stuff came out with Martin Fowler and Jez Humble, probably 10, 15 years ago almost now, a lot of dev teams were co-located. You could do CI. I think there was a rubber chicken method where you didn’t use a tool.
It was just: whoever had the chicken was responsible for the build, for pulling in everyone else’s changes. But now it seems like everything is branch-based. When you look at a project like Kubernetes, there’s a huge number of contributors, all geographically dispersed, different time zones, lots of different branches and features going on at the same time. It’s interesting how these original principles of continuous integration from the beginning now apply to these huge projects in the cloud native landscape. [00:08:56] BL: Yeah, that’s actually a great point of how prescient Martin Fowler has been for many, many years, and even with Jez Humble, being able to see these problems 10, 15 years ago and be able to describe them. I believe Jez Humble wrote the CD book, the continuous delivery book. [00:09:15] JH: Yeah, with David Farley, I think. [00:09:18] BL: Yeah. Yeah, he did. So, John, you brought up some good things about CI. I try to simplify everything. I think the mark of someone who really knows what they’re talking about is being able to explain everything in the simplest words possible, and then you can work backwards once people understand. I started off by saying that CI produces an artifact. I didn’t talk about branches or anything like that, or even the integration piece. But now let’s go into that a little bit. There are a lot of misconceptions about CI in general, but one of the things that we talk about is that you have to run tests. No, you don’t have to run tests — but should you? Yes, 100% of the time. Your CI process, your integration process, should actually build your software and run the tests, because running the tests on a dedicated service or hardware, wherever it is, ensures that the quality of your software is there, at least as much as your developers have ensured quality in the tests. It’s very important that those run, and a lot of bugs, of course, can be spotted by running CI. I mean, we are all sorts of developers here, and I tell you what, sometimes I forget to run the tests locally, and CI catches me before a commit makes it into master with a huge typo or a whole bunch of print lines in there. Moving on here, thinking about CI and cloud native: whenever you’re creating a cloud native app, have you ever thought about the differences between, let’s say, creating just a regular binary that maybe runs on a server, but not in a container on somebody’s cloud native stack, i.e. Kubernetes? Have you thought about the differences, the things to think about? [00:11:04] BL: Yeah. So part of it, I would imagine — or I believe — is things like resources: what resources you need, or what architecture you’re deploying into. With containerization it’s easy, because you’re like, “I know that the container is going to be this architecture,” but you can’t necessarily guarantee that outside of a containerized world. I mean, I suppose you can — with the right tooling set up, you can be like, “I only want to run on this.” But that isn’t necessarily guaranteed, because whatever computer it runs on could be whatever architecture it happens to land on, right? Also, something I think of is: how do you start processes on disparate computers in a controlled fashion? Again, with containers, you can trust that the container runtime will run it for you. But without that, it seems like a much harder task. [00:12:01] NL: Yeah, I would agree.
Then I said that containers in general just help us out, because most of our workloads go on some AMD or Intel 64-bit, and it’s Linux. We know what our output is going to be. It’s not like in the old days where you had to actually figure out what your run target was — and that’s even on Intel stacks. I mean, I’m dating myself here: when the 386 was out, you had the 386SX and the 386DX, there were different things there, and you actually compiled your code differently. Then when the 486 came out, and then when we had the introduction of Pentium chips, things were different. But now we can pretty much all target AMD64 — and in some cases, I mean, there are some chip things, like the bigger encryption features that are in the newer chips. But for the most part, we know what our deploy target is going to be. The cool thing is also that we don’t have to have Intel or AMD64. It could be ARM32 or ARM64, and with the addition of a lot of the work that has been going on in Windows land lately, we can have Windows images. I don’t know how many people are doing that yet — I’m not out in the field — but I like that the opportunity is there. [00:13:25] JH: Oh! I think one of the interesting things is the deployment method as well. Now with containers, everything is kind of an immutable rip-and-replace. If we deploy an application, we know that the old container is going to stop when we deploy a new one. I think Netflix were doing a little bit of this before containers, and some other folks, with baking AMIs and using that immutable method. But before that, if we had a WAR file, we had to throw it back into Tomcat and let Tomcat pick it up or whatever. Everything was a little bit more flaky in terms of deployment. We had to do a lot of checks around deployment, rather than just bringing something out, bringing something back in, blue/green, whatever. [00:13:59] BL: Well, I actually like that you brought that up, because that’s actually one of the greatest parts of this whole cloud native thing: when we’re using containers and we’re deploying with containers, we know what our file system is going to look like, because we created it. There would not be some rogue file or another configuration there that would trip up our deployment, because at build time, we’ve created the environment. It’s much better than that facility that Netflix was doing with baking AMIs. In a previous life, I actually ran the facility for baking AMIs at a large company where we had thousands of developers on more than a thousand dev teams, and we had a lot of software. Whenever you had to build an image, it was fine in one account, but if you had, let’s say, a thousand accounts — with the way that AWS works and encrypted images, you actually had to copy all the images to all the accounts. You couldn’t actually boot it from your account. That process would literally take all night to get done across all of our accounts. If you made a mistake, guess what? You get to do it again. So I am glad that we actually have this thing called a container, and all these things based on CRI, the Container Runtime Interface, and that we are able to quickly build containers. I don’t want to just limit this conversation to continuous integration. Let’s get into the other parts too, with deployment and delivery. What is so novel about CD in the cloud native world? [00:15:35] NL: I think to me it’s the ability to have your code or your artifact or whatever it is, whatever you’re working on.
When you make a change, you can see the change reflected in reality, whatever your reality looks like, without your intervention. I mean, you might have had to set up all the pipelines and all that jargon, but when you press save in VS Code and it creates a branch and runs all your tests and then deploys it for you, or delivers it for you, into what you’d define as reality, that’s just so nice. Because it really kind of sucks having to do the, “Okay, I’ve got a new deployment. Destroy the old deployment. Put in the new one, or rev the new image tag or whatever in the deployment you’re doing.” All these manual steps — again, thinky-brain juice — take pieces of your attention away, and having these pieces handled for you is just so nice. [00:16:30] BL: Yeah, what do you think, John? [00:16:32] JH: Yeah. I think the State of DevOps report found that one of the best predictors of a company’s success is the cycle time of a feature from ideation to production. I think the faster we can get that cycle – It kind of gets me interested: how long does an application take to build? If it takes two hours, how good are you at getting features out there quickly? Maybe that’s one of the drivers with microservices — smaller pieces independently deployed, so we can get features out to production quicker — because I think the name of the game is just about enabling developers to put the decision in the hands of the business to decide when the customer should see that feature. I think the tighter we can make that cycle, the better for everyone. [00:17:14] BL: Oh, no! I agree. I love and hate microservices, but what I do like is the idea of making these abstractions smaller, and if the abstractions are smaller, it’s less code. A lot of the languages we use now compile faster than, let’s say, a large C++ project, which could take literally two hours to compile. But now we have languages like Go — and Rust, which is not as fast, but not slow either. Then we have all of our interpreted languages, whether it be Python, or JavaScript, or TypeScript, where we can actually go from an idea, run the tests in a few minutes, and build this image that we can actually run and see almost in real time. Now with the complexity of the tools — I mean, the features that are built into the tools — we can easily manage multiple deployment environments. Because think about before: you would have a dev environment, and that would be the Wild West. That would be literally where it would be awful. You might have to rebuild it every couple of months. Then you would have staging, and then maybe you would have some kind of pre-prod environment just as your final smoke test, and then you would have your production. Maintaining all the software on all those was extremely hard. But now with the advent of containers, now it’s as simple as identifying the images you want and basically running that image in that environment. I like where we’ve ended up. But with all power comes new problems, and just because we can deploy quicker means we run into a lot of different problems we didn’t run into before. The first one that I’ll bring up is the complexity of auto-promotion between environments — moving code between test, staging and production. How do we do that? Any ideas before I throw some out there? [00:19:11] NL: I guess you would have different, or maybe the same, pipelines but different targets if, say, you’re using something like Kubernetes.
You could have one part of your pipeline deploy initially to this Kubernetes context, which points to one cluster. You’re building up clusters by environment type and then deploying into those, running your tests to see if it runs properly, and then switching over to the next context to apply that image tag and that information, and then just going down the chain until you get to production. [00:19:44] BL: Well, that’s interesting. One thing I’d like to throw out there — and I’m not advocating any particular product — the idea of having pipelines for continuous integration and your CD process is great, where you can now have gates and you can basically automate the whole thing. Code goes into CI and we build an artifact, and a message can go out automatically to an approver or not, and that message could say, “Hey! This code is going to be integrated into our trunk or our master branch.” They can either do it themselves manually, as a lot of people do, or they can actually maybe click on a link or check a checkbox and this gets integrated in. Then what could automatically happen at this point — and I’ve seen a lot of companies doing this — is we take that software and we spin up a whole new environment and we just install that software. For that one particular feature that you worked on, you can actually get an automatic environment. Then we can take that environment itself and now merge this maybe into a staging branch, or tag it with a staging label, and that automatically gets moved to staging. Depending on how complicated you are, how advanced you are, now you can actually have it go out to your product people or people who make decisions, maybe your executives, and they can view the software in whatever context it happens to be in. Then they can say, “Okay,” and hit okay, and the software just keeps on moving through the pipeline and it gets into production. The whole goal here — and this is actually where your goal should be just in general — whenever you’re thinking about continuous delivery or continuous deployment is that any human intervention on the actual moving of code is a liability and is going to break. And it’s going to break because on Friday afternoon at 5:25 PM, someone’s thinking about the weekend and they’re not thinking about code, and they’re going to break your build. Our goal is to build these delivery systems that are Friday-afternoon proof. We can push code anytime. It doesn’t matter. We trust our process. [00:22:03] JH: I think it’s a great point about environments. I think back in the day, an environment used to be a set of machines, and then test used to be – staging was where there were more stable versions of APIs, and folks were more coordinated pushing things into them. What really is an environment? Like you said, when we push microservices or whatever service, we can spin up an entire Kubernetes cluster just for that service. We can set it up. We can run whatever tests we want. We can tear it down. With the advent of elastic compute, and now containers, we’ve really enabled this world where the traditional idea of an environment, and what constitutes an environment, is starting to get a bit sloppy, and they blend into each other. [00:22:42] BL: I like it though. I think it’s progress. [00:22:45] NL: I totally agree. The one that scares me, but that I also find really interesting, is the idea of having all of your environments on one set of machines. So, clusters.
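[EDITOR'S SIDEBAR] A sketch of the gated promotion Bryan describes: the same artifact moves through environments automatically, and the only human touch point is an approval, not a manual deploy. GitLab CI syntax again for continuity; the cluster contexts, deployment name, and image variables are illustrative and assume kubeconfig credentials are already wired into the runner.

# Promotion pipeline sketch: dev -> staging -> production.
stages:
  - deploy

deploy-dev:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl --context dev set image deployment/my-app my-app="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  environment: dev

deploy-staging:
  stage: deploy
  needs: [deploy-dev]
  image: bitnami/kubectl:latest
  script:
    - kubectl --context staging set image deployment/my-app my-app="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  environment: staging

deploy-production:
  stage: deploy
  needs: [deploy-staging]
  when: manual   # the sign-off step; delete this line for full continuous deployment
  image: bitnami/kubectl:latest
  script:
    - kubectl --context production set image deployment/my-app my-app="$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
  environment: production

Note how "Friday afternoon proof" falls out of this shape: no step involves a person moving code, only a person deciding whether code moves. [END SIDEBAR]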
Having a multi-tenanted set of machines for dev, staging, and production — they’re all running in the same place, and they’re just separated by configuration: networking, connectivity, things like that. When a user hits your website, bryanliles.com, they should go to the production images; but those are binaries, and those binaries should be running in essentially the same space as the development ones. It’s scary, but it also allows for some really fast testing and integration. I find it to be very fascinating. [00:23:33] BL: I mean, that’s where we want to be. I find, more often than not, that people have separate clusters for dev and staging and production. But using the Kubernetes API, you don’t have to do that, because what we can do is force a deployment or workload onto a set of machines based on their labels. That’s actually one of the very strong positives for Kubernetes. Forget all the complexity. One of the things that makes it easy is to say that I want this particular deployment to only live on my development machines. Well, which development machine? I don’t care. What if we increase our development pool size? We just re-label nodes. It doesn’t matter. Now we can just control that. When it comes down to controlling cost and complexity, this is actually one area where Kubernetes is leading, just making it easier to actually use more of your hardware. [00:24:31] NL: Yeah. Absolutely. That’s so great, because if you think about it from a CI/CD standpoint, at that point all you have to do is just change the label for where you’re applying this piece of code. So you’re like, “Node selector, label equals dev. Okay, now it’s staging. Okay, now it’s prod.” [00:24:47] BL: So this brings me into the next part of what I want to talk about, or introduce to you all today. We’re on a journey, as you probably can tell. Now, whenever we have our CI process and we’re building and we’re deploying, where do we store our configurations? [00:25:04] NL: [inaudible 00:25:04]. [00:25:06] BL: Ever thought about that? [00:25:08] NL: Okay. I mean, from a Kubernetes perspective, you might be using something like etcd to sort of – But like everything else, what if you’re using Travis? [inaudible 00:25:16] store everything. Everything should be versioned, right? Everything should be – [00:25:20] BL: Yeah, 100%. [00:25:24] NL: I would version everything as much as possible. Now, do I do that all the time? God, no! Absolutely not. I’m a human being after all. [00:25:32] BL: I mean, that’s what I actually want to bring up: this concept of GitOps. GitOps is a term coined by my friend Alexis, who works at Weave. I think Weave created this. Really what it’s about is — basically, Kubernetes is declarative, and our configurations can be declarative too, because what we can do is have text-based configurations, and for one, text-based means it can be versioned. It can be diffed. We take those text versions and we put them in the same repository we put our code in. How do we know what’s in production at any given time, or at any given time in the past? We just look at the tags of what we did. We had a push at 5:15 on August 13th. Of course, that’s 5:15 UTC, because any other time zone doesn’t exist in computer land. So what we could do is just tag that particular version as something like 2019-08-13-05-17-55-01 — with a counter on the end so we could have up to 100 deploys in a day.
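[EDITOR'S SIDEBAR] A sketch of the label-based placement Bryan and Nicholas walked through just above: nodes are labeled into pools, and the workload selects its pool, so "which machines are dev?" becomes a labeling decision rather than a separate cluster. The names and labels here are illustrative.

# First, put nodes in a pool:  kubectl label node worker-1 env=dev
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
      env: dev
  template:
    metadata:
      labels:
        app: my-app
        env: dev
    spec:
      nodeSelector:
        env: dev        # schedule only onto nodes labeled env=dev
      containers:
        - name: my-app
          image: registry.example.com/my-app:1.0.0

Re-labeling nodes grows or shrinks the pool without touching the Deployment, which is the "we just re-label nodes" point above. [END SIDEBAR]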
If we started doing that, not only can we control what we have, but we can also know what was running in any given environment at any given time. Because with Git, and with Mercurial, and any other of these – Well, only the popular ones; with Git and Mercurial you can definitely do this: any given commit can have multiple tags. You could actually have a tag that hit dev, and then a tag that, let’s say, hit staging, and then a tag that hit production — the exact same code, but three different tags. So you know at any given time what happened. [00:27:18] JH: Yeah, the config thing is so important. I think that was another Jez Humble quote, where it was like, “Give me three hours of access to your code and I’ll break it, but give me five minutes with your configuration and I’ll break it.” Almost every big bug is that, right? Someone was accidentally pointing the prod server at the staging database — “Oops! The API was pointing to the wrong port, and everything came down” — or we changed the wrong versions, or whatever. I think that’s one of the intersections of developers and operations folks. We kind of talked about DevOps and things like that. I really love the idea of everything being kept in Git and using GitOps, but then we’ve got things like secrets and configuration that shouldn’t be seen or editable by developers, but need to be for ops folks — and we still want to keep the single point of truth. Things like sealed secrets have really enabled us to move along in this area, where we can keep everything in text-based version control. [00:28:08] BL: All right. Quick point of order here. Sealed secrets is a controller/CRD created by Bitnami. What it allows you to do is — John? [00:28:23] JH: It creates a CRD, SealedSecret, which is a special resource type in your cluster, and it also creates a key, which is only available to the operator running in your cluster. You can submit a secret in plain text and it will throw it back out as an encrypted sealed secret with that key, and then you can check that into version control. Then, when you go to deploy your software, you can deploy that encrypted secret into the cluster. The operator will pick it up, decrypt it using the key that only it has access to, and then put it back in the cluster as a regular secret. Your application just interacts with regular Kubernetes secrets. You don’t need to change your app. It deals with all the encryption, without user intervention. [00:29:03] BL: I think the most important part of what you said is that this allows us to have no excuses about what we can store in our repositories for our configuration, because someone is going to make the argument, “No, we can’t store secrets, because someone’s going to be able to see them.” Well, guess what? We never even stored an unencrypted secret in our repository. They’re all encrypted, and they’re still secret. It’s [inaudible 00:29:25]. I don’t know if anyone’s cracked it yet. I’m sure maybe a state-level actor has thought about it, but for us regular people, even our companies — even at VMware, or even at Google — they have not done it yet. So it’s still pretty safe. Thinking even further now — really, what I’m trying to paint the picture of is not just how you do CD, but what CD could look like, and how it can actually make you happy rather than sad. The next item I wanted to think about was tools around CD, and creating tools, and what a good continuous delivery system looks like.
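[EDITOR'S SIDEBAR] What the sealed-secrets flow John describes looks like on disk. A plain Secret is run through the kubeseal CLI, which encrypts it against the in-cluster controller's public key; the resulting object is what you commit. The name, namespace, and (truncated) ciphertext below are placeholders.

apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: db-credentials
  namespace: production
spec:
  encryptedData:
    # Only the controller's private key inside the cluster can decrypt
    # this, so the file is safe to store in the same Git repo as the code.
    password: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEq...

When this is applied, the controller decrypts it and writes out a regular Secret named db-credentials; the application consumes that Secret and never knows sealed secrets were involved. [END SIDEBAR]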
I kind of hinted at this earlier whenever I was talking about pipelines: the ability to take advantage of your hardware. So we’re deploying to, let’s say, 100 servers, or pushing 5 or 6 services to a 100-node cluster. We can do those all at once, and you want to have a system that can actually run like that. I can think of a couple. From Intuit, there is Argo, and they have Argo CD. There is the tool created by Google and maybe Netflix — I’d have to look that one up. It’s funny, because they quoted – [00:30:40] JH: Spinnaker? [00:30:42] BL: Spinnaker. They quoted me in their book, and I don’t remember the name. I’m sorry to anyone from the Spinnaker product listening. Once again, not advocating any products, but they have the concept of doing pipelines. Then you also have other things for your projects — like, if you’re using open source, Drone. Another ex-Google thing; I think it was an ex-Googler that made it. Basically, they have ways you can do more than one thing at a time. The most important piece about this is that not only can you do more than one thing at a time, but you have programmatic checks, so you can make sure and verify that whatever you did was successful. We deployed to staging, or we deployed to our smoke test servers for our smoke test, and that requires our testing people and an executive sign-off. It can actually just wait until it gets that sign-off, or maybe, if it goes over a day or so, it just fails, and now the build is done. That part is pretty neat. Any other topics over here before I start throwing out more? [00:31:45] NL: I think I just have thoughts on some of the tools that we’ve used. Everyone knows Jenkins. Jenkins can do anything that you want it to do, but you really have to tighten the screws on it. It is super powerful. It’s kind of like Bash, like Bash scripting: it’s super powerful, but you have to know precisely what you’re doing, otherwise it can really hurt you. Actually, I have used Spinnaker in the past, and I really liked it. It has a good UI, very good pipelines, an easy blue/green or canary deployment mechanism — I thought that was great. I’ve looked at Drone, believe it or not, and Drone is actually pretty cool. Check out Drone. I really liked it. [00:32:25] BL: Well, since we’re throwing out products, Jenkins does have Jenkins X. I have not given it the full rundown yet. But what I do like about it — and I think everyone should pay attention to this if you’re doing a product in this space — is that when you install Jenkins X, you install it locally on your machine. You basically get this binary called jx, and you then tell jx to install it into your cluster. Instead of just doing kubectl apply -f on a whole bunch of YAML, it actually asks you questions, and it sets up GitHub repositories, or wherever you need these repositories. It sets up [inaudible 00:33:01] spaces for you. There’s no just [inaudible 00:33:05] kubectl apply -f https://… and I just owned your system — because that’s actually a problem. Then it solves the YAML sprawl, because YAML in Kubernetes is something that is complained about a lot, but it’s how things are configured. But it’s also just a detail of what we’re supposed to be doing — we actually work with Joe Beda, and we talk about this all the time — the YAML is the implementation, but it’s not the idea. The idea is that we build tools on top of that which create YAML, so users have to see less YAML.
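[EDITOR'S SIDEBAR] Since Bryan mentioned Argo CD alongside the earlier GitOps discussion, here is a sketch of how that style ties the two together: instead of a pipeline pushing into the cluster, the cluster continuously pulls whatever is committed at a Git path, so the repo's history and tags become the record of what ran where. The repo URL, path, and namespaces are illustrative.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-app-config.git
    targetRevision: staging      # the branch (or tag) this environment tracks
    path: overlays/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: staging
  syncPolicy:
    automated:
      prune: true                # remove resources that were deleted from Git

[END SIDEBAR]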
I think that’s a problem with Jenkins: it’s so powerful, and they’re like, “Well, we want powerful people or smart people to be able to do smart things, so here you go.” The problem with that is: where do I start? It’s a little daunting. So I do think that they came with a much stronger game with this jx command. Just as a little sidebar, we do this as well with our Velero project, and I think that should be the bar for anything: if you’re installing something into a cluster, you should come up with a command line tool that helps you manage the lifecycle of whatever you’re installing — the operator, the YAML, whatever. [00:34:18] JH: I think what’s interesting about the options — this is definitely one area where there’s so much nuance. Any time you’re in developer tooling, everyone wants to do something slightly differently. All of these tools are so tweak-able that they become so general. I think it’s probably one of the criticisms that could be leveled against Jenkins, that you can do everything, and that’s actually a negative as well as a positive. Sometimes it’s too overwhelming. There are too many ways of doing things. I’m a fan of some of the more kind of opinionated tools in that space. [00:34:45] BL: Yeah. I like opinionated tools as well, but the problem that we’re having in this cloud native space is that, yeah, Kubernetes is five years old now. We are just getting to the point where we actually understand what a good decision is, because there were a lot of guesses before, and we’ve done a lot of things, and some of these have been good ideas, but in some cases they have not been great ideas. I ran the ksonnet project: a great idea on paper, but in implementation it required people to know too many things. We learned a lot of lessons from that. That’s what I think we’re going to find in this space: we’re going to learn little lessons. The last project that I was going to bring up is something that I think has learned some of these lessons. Google sponsors a project called Tekton, and if you go look at it, they have some continuous delivery stuff in there, I believe, and they implement pipelines. But the neat part — and this is actually the best part — is that it’s actually a cloud-native-built service. Every step of your delivery process, from creating images to actually putting them on clusters, is backed by a Docker image or a container, and I think that part is pretty neat. So now you can define your steps. What is a step? Well, you can use one of their pre-baked ones — run this command — or, if you have something special, like the example I was giving before, where you would say that you need an approval, maybe it’s a Slack approval. You send something with Slack and it has a checkbox: check yes if you like me. What we can do now is actually control that, and it’s easy to write a little Docker image that can make that call, get the response, and then move things on. If you’re looking for more of a toolkit full of good ideas, I do think that Tekton definitely has some industry momentum. People are looking at it, and it’s probably the best example of getting it right in the cloud native way. Because a lot of the products we have now are not cloud native. We’re talking about Jenkins. We’re talking about Spinnaker, and we talk about Drone and Travis, which is totally a SaaS product. They’re not cloud native.
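[EDITOR'S SIDEBAR] A sketch of the "every step is backed by a container" idea Bryan is describing, as a Tekton Task. Each step is just an image run in order; the Go image and the kaniko builder here are illustrative assumptions, and the workspace wiring that would fetch the source is omitted for brevity.

apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: test-and-build
spec:
  steps:
    - name: run-tests
      image: golang:1.13               # this step is just this container
      workingDir: /workspace/source
      script: |
        go test ./...
    - name: build-image
      image: gcr.io/kaniko-project/executor:latest
      args:
        - --context=/workspace/source
        - --destination=registry.example.com/my-app:latest

A custom step — say, the Slack-approval image Bryan imagines — would just be one more container in that list. [END SIDEBAR]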
Actually, the neat part about Tekton is that it comes with its own controllers and its own CRDs, so you can build these things up using your familiar Kubernetes tooling, which means in theory we could manage the tooling we deploy with in the same way as our applications, because it’s just yet another object that goes in our cluster. [00:37:21] NL: That does sound pretty cool. One other one that I meant to bring up was Concourse. Have you checked out Concourse yet? [00:37:27] BL: Concourse CI. I have not — I have used it, but never in a way where I would have a big opinion on it. [00:37:34] NL: I’m kind of in the same place. I think it’s a good idea. It seems really neat, but I need to kick the tires a little more. I will say that I really like the UI. The structure of the UI is really nice. Everything makes sense, and anything you can click on drills into something a bit deeper. I think that’s pretty cool. But that is one of the shout-outs I wanted to give as well, as another tool that I’m aware of. [00:37:52] BL: Yeah, that’s pretty interesting. So we’ve gone about 40 minutes now. Let’s actually start winding this down, and the way that I’m going to suggest we wind this down is by thinking about where we are now. What’s missing in this space, and what else could we actually be doing in the cloud native space to make this work out better? [00:38:12] NL: I think I’d like to see better-structured or better examples of blue-green or canary deployments with tests associated — and that might just be me not looking hard enough at this problem. But any time I begin looking at blue-green, I get the idea of what someone’s done, but I would love to see some implementation details, or any of these opinionated tools having opinions around blue-green and what they specifically do to test it. I feel like I’m just not seeing that. [00:38:41] BL: Blue-green is hard to do in Kubernetes without an external tool. For everyone: a blue-green deployment is, I have a software deployment and we’ll give it a color — we’ll call it blue — and I have the next version, and we’ll call it green. Really, what I can do is have two versions of my application deployed, and I can use my load balancer — or, in this case, my service — to just change the label selector in my service, and now I point at my green instead of my blue. Then, when I want to deploy again, I can just deploy another blue and change my label selector again. The problem with this is that you can do it in Kubernetes just fine, but out of the box with Kubernetes, you will drop traffic — because, guess what? What happens to a connection or a session that was initiated on the blue side when you went to green? Actually, this is a whole conversation in itself about service meshes, and this is actually one of the reasons service mesh is a big topic: because you can do this blue-green — or another example would be Netflix and red/black — or you get the creative people who do rainbow deployments, because just having two is not good enough for them, so they want to have any number of deployments going at one time. I agree with that 100%. [00:39:57] JH: I think, yeah, integrating tools like launch – [inaudible 00:40:01] and I think there are more which enable – I think we’re missing the business abstractions on this stuff so far.
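[EDITOR'S SIDEBAR] The selector flip Bryan just described, concretely. Two Deployments run side by side, labeled color=blue and color=green; the Service decides which one receives traffic. Names are illustrative.

apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
    color: blue        # change to "green" to cut traffic over
  ports:
    - port: 80
      targetPort: 8080

The cutover itself can be a one-liner, e.g. kubectl patch service my-app -p '{"spec":{"selector":{"app":"my-app","color":"green"}}}' — but, per Bryan's caveat, connections in flight on the blue side can be dropped, which is where service meshes come in. [END SIDEBAR]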
Like you said, it’s kind of hard to do — you need to go into the nitty-gritty of it right now — but I think the business abstractions of: if we deploy a different version to a certain subset of customers, can we get all of those metrics? Can we get those traces back in? Can we automate rolling it out? Can we increase the percentage of customers that are seeing those things? Have that all controlled in a Kubernetes-native way, but have it roll up to the business as more of an abstraction. I think that stuff is currently missing. I think the underpinning technologies are coming up — stuff like service mesh — but I think it’s the abstraction that’s really going to make it useful, which doesn’t exist today. [00:40:39] BL: Yeah. Actually, that’s pretty close to what I was going to say. We’ve built all this tooling that helps us as technologists, but really what it comes down to is the business. A lot of the things we’re talking about when we’re talking about CD are important to the business, but when we’re talking about metrics or trace collection, that’s not important to the business, because they only care about the SLA; this is on the SLO side. What we really need to do is mature our processes enough that we can actually marry our outputs to something that other people can understand, something with no jargon: sales going up, sales going down. Everything else is just a detail. So, anything else? [00:41:20] NL: Something I’d like to see in our testing is a good way to accurately show the effect of something under load as a CI/CD component. Because one of the things that I’ve run into is: I’ve got this great idea for how this code should work, and when I deploy it, it works great. Then a thousand people touch it all at once, and it doesn’t work right anymore. I’d love to have some tool along the way that can test things under load and show me something that I could fix before all those people touch it. [00:41:57] BL: Yes, that would be a good tool to have. So, John, anything else for you? [00:42:02] JH: I’ll open a can of worms right at the end and say the biggest problem here is probably going to be data: when we have a lot of systems that need to talk to each other, we need the data to align between those systems, and we now have a proliferation of environments and clusters. How do we get that data reliably into the place that it needs to be, to make our testing robust enough to get things out there? It’s probably an episode of its own – [00:42:23] BL: Yeah, that’s a big conversation — one that, if we could answer it, we wouldn’t be working at VMware. We would have our own companies doing all these great things. But we can definitely iterate on it. So with that, I think we’re going to wrap it up. Thanks for listening to The Podlets. I’m Bryan Liles, and with me today was Nicholas Lane and John – Yeah, and John Harris. [00:42:47] JH: Thanks everyone. [00:42:47] BL: All right, we’ll see you next time. [END OF EPISODE] [00:42:50] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END]
This week on The Podlets Cloud Native Podcast we have Josh, Carlisia, Duffie, and Nick on the show, and are also happy to be joined by a newcomer, Bryan Liles, who is a senior staff engineer at VMware! The purpose of today’s show is coming to a deeper understanding of the meaning of ‘stateful’ versus ‘stateless’ apps, and how they relate to the cloud native environment. We cover some definitions of ‘state’ initially and then move to consider how ideas of data persistence and coordination across apps complicate or elucidate understandings of ‘stateful’ and ‘stateless’. We then think about the challenging practice of running databases within Kubernetes clusters, which effectively results in an ephemeral system becoming stateful. You’ll then hear some clarifications of the meaning of operators and controllers, the role they play in mediating and regulating states, and also how important they are in a rapidly evolving but skills-scarce environment. Another important theme in this conversation is the CAP theorem, or the impossibility of having consistency, availability and partition tolerance all at once, and the way different databases allow for different combinations of two out of the three. We then move on to chat about the fundamental connection between workloads and state and then end off with a quick consideration of how ideas of stateful and stateless play out in the context of networks. Today’s show is a real deep dive offering perspectives from some of the most knowledgeable in the cloud native space, so make sure to tune in! Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues Hosts: Carlisia Campos Duffie Cooley Bryan Liles Josh Rosso Nicholas Lane Key Points From This Episode: • What ‘stateful’ means in comparison to ‘stateless’.• Understanding ‘state’ as a term referring to data which must persist.• Examples of stateful apps such as databases or apps that revolve around databases.• The idea that ‘persistence’ is debatable, which then problematizes the definition of ‘state’. • Considerations of the push for cloud native to run stateless apps.• How inter-app coordination relates to definitions of stateful and stateless applications.• Considering stateful data as data outside of a stateless cloud native environment.• Why it is challenging to run databases in Kubernetes clusters.• The role of operators in running stateful databases in clusters.• Understanding CRDs and controllers, and how they relate to operators.• Controllers mediate between actual and desired states.• Operators are codified system administrators.• The importance of operators as app numbers grow in a skills-scarce environment.• Mechanisms around stateful apps are important because they ensure data integrity.• The CAP theorem: the impossibility of consistency, availability, and partition tolerance all at once.• Why different databases allow for different iterations of the CAP theorem.• When partition tolerance can and can’t get sacrificed.• Recommendations on when to run stateful or stateless apps through Kubernetes.• The importance of considering models when thinking about how to run a stateful app.• Varying definitions of workloads.• Pods can run multiple workloads.• Workloads create states, so you can’t have one without the other.• The term ‘workloads’ can refer to multiple processes running at once.• Why the ephemerality of Kubernetes systems makes it hard to run stateful applications.
• Ideas of stateful and stateless concerning networks.• The shift from server to browser in hosting stateful sessions. Quotes: “When I started envisioning this world of stateless apps, to me it was like, ‘Why do we even call them apps? Why don’t we just call them a process?’” — @carlisia [0:02:60] “‘State’ really is just that data which must persist.” — @joshrosso [0:04:26] “From the best that I can surmise, the operator pattern is the combination of a CRD plus a controller that will operate on events from the Kubernetes API based on that CRD’s configuration.” — @bryanl [0:17:00] “Once again, don’t let developers name them anything.” — @bryanl [0:17:35] “Data integrity is so important” — @apinick [0:22:31] “You have to really be careful about the different models that you’re evaluating when trying to think about how to manage a stateful application like a database.” — @mauilion [0:31:34] Links Mentioned in Today’s Episode: KubeCon+CloudNativeCon — https://events19.linuxfoundation.org/events/kubecon-cloudnativecon-north-america-2019/Google Spanner — https://cloud.google.com/spanner/CockroachDB — https://www.cockroachlabs.com/CoreOS — https://coreos.com/Red Hat — https://www.redhat.com/enMetacontroller — https://metacontroller.app/Brandon Philips — https://www.redhat.com/en/blog/authors/brandon-phillipsMySQL — https://www.mysql.com/ Transcript: EPISODE 009 [INTRODUCTION] [0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you. [INTERVIEW] [00:00:41] JR: All right! Hello, everybody, and welcome to episode 6 of The Podlets Podcast. Today we are going to be discussing the concept of stateful and stateless and what that means in this crazy cloud native landscape that we all work in. I am Josh Rosso. Joined with me today is Carlisia. [00:00:59] CC: Hi, everybody. [00:01:01] JR: We also have Duffie. [00:01:03] D: Hey, everybody. [00:01:04] JR: Nicholas. [00:01:05] NL: Yo! [00:01:07] JR: And a newcomer to the podcast, we also have Bryan. Bryan, you want to give us a little intro about yourself? [00:01:12] BL: Hi! I’m Bryan. I work at VMware. I do lots of community stuff, including chairing KubeCon+CloudNativeCon. [00:01:22] JR: Awesome! Cool. All right. We’ve got a pretty good cast this week. So let’s dive right into it. I think one of the first things that we’ve been talking a bit about is the concept of what makes an application stateful — and, of course, in reverse, what makes an application stateless? Maybe we could try to start by discerning those two, maybe starting with stateless, if that makes sense? Does someone want to take that on? [00:01:45] CC: Well, I’m going to jump right in. I have always been a developer, as opposed to some of you — or all of you — who have system admin backgrounds. The first time that I heard of a stateless app, I was like, “What?” That wasn’t recent, okay? It was a long time ago, but that was a knot in my head. Why would you have a stateless app? If you have an app, you’re going to need state. I couldn’t imagine what that was. But of course it makes a lot of sense now. That was also when we were more in the monolithic world. [00:02:18] BL: Actually that’s a good point.
Before you go into that, it’s a great point. Whenever we start with apps or we start developing apps, we think of an application. An application does everything. It takes input and it does stuff and it gives output. But now in this new world where we have lots of apps, big apps, small apps, we start finding that there’s apps that only talk and coordinate with other apps. They don’t do anything else. They don’t save any data. They don’t do anything. That’s what – where we get into this thing called stateless apps: apps that don’t have any type of data that they store locally. [00:02:53] CC: Yeah. It’s more like when I envision it in my head. You said it brilliantly, Brian. It’s almost like a process. When I started envisioning this world of stateless apps, to me it was like, “Why do we even call them apps? Why don’t we just call them a process?” They’re just shifting data back and forth but they’re not – To me, at the beginning, apps were always stateful. They went together. [00:03:17] D: I think, frequently, people think of applications that have only locally relevant stuff that is actually not going to persist to disk, but maybe held in memory or maybe only relevant to the type of connection that’s coming through that application, also as stateless, which is interesting, because there’s still some state there, but the premise is that you could lose that state and not lose the functionality of that code. [00:03:42] NL: Something that we might want to dive into really quickly when talking about stateless and stateful apps. What do we mean by the word state? When I first learned about these things, that was what always screwed me up. I’m like, “What do you mean state? Like Washington? Yeah. We got it over here.” [00:03:57] JR: Oh! State. That’s that word. State is one of those words that we use to sound smarter than we actually are 95% of the time, and that’s a number I just made up. When people are talking about state, they mean databases. Yeah. But there are other types of state as well. If you maintain local cache that needs to be persistent, if you have local files that you’re dealing with, like you’re opening files. That’s still state. State really is just that: it’s data that must persist. [00:04:32] D: I agree with that definition. I think that state, whether persisted to memory or persisted to disk or persisted to some external system, that’s still what we refer to as state. [00:04:41] JR: All right. Makes sense and sounds about like what I got from it as well. [00:04:45] CC: All right. So now we have this world where we talk about stateless apps and stateful apps. Are there even stateful apps? Do we call a database an app? If we have a distributed system where we have one stateless app over here, another stateless app over there and then we have the database that’s connected to the two of them, are we calling the database a stateful app or is that whole thing – How do we call this? [00:05:15] NL: Yeah. The database is very much an app with state. I’m very much – [00:05:19] D: That’s a close definition. Yeah. [00:05:21] NL: Yeah. Literally, it’s the epitome of a stateful app. But then you also have these apps that talk to databases as well and they might have local data, like data that – they start a transaction and then complete it or they have a long distributed type transaction. Any apps that revolve around a database, if they store local data, whether it’s within a transaction or something else, they’re still stateful apps. [00:05:46] D: Yup.
I think anything where you can modify and input data, or modify state that has to be persisted in some way, is a stateful app, even though I do think it’s confusing because of what – As I said before, I think that there are a bunch of applications that we think of, like not everybody considers Spark jobs to be stateful. Spark jobs, for example, are something that would bring data in, mutate that data in some way, produce some output and go away. The definition there is that Spark would generally push the resulting data into some other external system. It’s interesting, because in that model, Spark is not considered to be a stateful app because the Spark job could fail, go away, get recreated, pick up the pieces where it left off or just redo that work until all of the work is done. In many cases, people consider that to be a stateless application. That I think is like the crux – In my opinion, the crux of the confusion around what a stateful and stateless application is, is that people frequently – I think it’s more about where you store – what you mean by persistence and how that actually realizes in your application. If you’re pushing your state to an external database, is your application still stateful? [00:06:58] NL: I think it’s a good question, or if you are gathering data from an external source and mutating it in some way, but you don’t need data to be present when you start up, is that a stateful app or a stateless app? Even though you are taking in data, modifying it and checking it, sending it out to some other mechanism or serving it in your own way, does that become like a stateless app? If that app gets killed and it comes back and it’s able to recover, is it stateful or stateless? That’s a bit of a gray area, I think. [00:07:26] JR: Yeah. I feel like a lot of the customers I work with, if the application can get killed even if it has some type of local state, they still refer to it as stateless usually, to me at least, when we talk about it, because they think, “I can kind of restart this application and I’m not too worried about losing whatever it may have had.” Let’s say cache for simplicity, right? I think that kind of leads us into an interesting question. We’ve talked a lot on this podcast about cloud native infrastructure and cloud native applications and it seems like since the inception of cloud native, there’s always been this push that a stateless app is the best candidate to run or the easiest candidate to run. I’m just curious if we could dive into that for a moment. Why in the cloud native infrastructure area has there always been this push for running stateless applications? Why is it simpler? Those kinds of things. [00:08:15] BL: Before we dive into that, we have to realize – And this is just a problem of our whole ecosystem, this whole cloud native. We’re very hand-wavy in our descriptions for things. There’re a lot of ambiguous descriptions, and state is one of those. Just keep that in mind, that when we’re talking today, we’re really just talking about these things that store data, and that’s the state. Just keep that in mind as you’re listening to this. But when it comes to distributed systems in general, the easiest system is a system that doesn’t need coordination with any other system. If it happens to die, that’s okay. We can just restart it. People like to start there. It’s the easiest thing to start with. [00:08:58] NL: Yeah, that was basically what I was going to say.
If your application needs to tie into other applications, it becomes significantly more complicated to implement it, at least for your first time and in your system. These small applications that only – They don’t care about anybody else, they just take in data or not, they just do whatever. Those are super easy to start with because they’re just like, “Here. Start this up. Who cares? Whatever happens, it happens.” [00:09:21] CC: That could be a good boundary to define – I don’t want to jump back too far, but to define whether an app is stateless: to me, look at it as part of a system and ask what it depends on for it to come back up. Does it depend on something else that has state? [00:09:39] BL: I’ll give you an example. I can give you a good example of a stateless app that we use every day, every single one of us, none of us on this call, but when you search Google. You go to google.com and you go to the bar and you type in a search, what’s happening is there is a service at the beginning that collects that search and it federates the search over many different, probably clusters of, computers so they can actually do the search concurrently. That app that actually coordinates all that work is a stateless app most likely. All it does is just split it up and allow more CPUs to do the work. Probably, that goes away. Probably not a problem. You probably have 10 more of them. That’s what I consider stateless. It doesn’t really own any of the data. It’s the coordinator. [00:10:25] CC: Yeah. If it goes down, it comes back up. It doesn’t need to reset itself to the state where it was before. It can truly be considered stateless because it can just say, “Okay. I reset. I’m starting from the beginning from this clear state.” [00:10:43] BL: Yes. That’s a good summary of that. [00:10:45] CC: Because another way to think about stateless – What makes an app a stateful app? Does it have to be combined or like deployed and shipped together with the part that maintains the state? That’s a more clear-cut definition. Then that app is definitely a stateful app. [00:11:05] D: What we frequently talk about in like the cloud native space is like you know that you have a stateless app if you can just create 20 of them and not have to worry about the coordination of them. They are all workers. They are all going to take input. You could spread the load across those 20 in an identical way and not worry about which one you landed on. That’s a stateless application. A stateful application is a very different thing. You have to have some coordination. You have to say how many databases can you have on a backend? Because you’re persisting data there, you have to be really careful that you only write to the master database, or to the writing database, and you can read off any other members of that database cluster, that sort of stuff. [00:11:44] CC: It might seem that we are going so deep into this differentiation between stateful and stateless, but this is so important because clusters are usually designed to be ephemeral. Ephemeral means obviously they die down, they are brought back up – the nodes – and you should worry as little as possible about the state of things. Then going back to what Joshua is saying, when we are in this cloud native world, usually we are talking about stateless apps, stateless workloads, and then we’re going to just talk about what workload means. But then if that’s the case, where are the stateful apps? It’s like we have this vision that the stateful apps live outside the cloud native world? How does it work?
But it’s supposed to work. [00:12:36] BL: Yup. This is the question that keeps a lot of people employed. Making sure my state is available when I need it. You know what? I’m not going to even use that word state. Making sure my data is available wherever I need it and when I need it. I don’t want to go too deep in right now, but this is actually a huge problem in the Kubernetes community in general, and we see it because there’s been lots of advice given, “Don’t run things like databases in your clusters.” This is why we see people taking the ideas of Google Spanner and like CockroachDB and actually going through a lot of work to make sure that you can run databases in Kubernetes clusters. The interesting piece about this is that we’re actually to the point where we can run these types of workloads in our clusters, but with a caveat, big star at the end: it’s very difficult and you have to know what you’re doing. [00:13:34] JR: Yeah. I want to dovetail on that, Brian, because it’s something that we see all the time. I feel like when we first started setting up, let’s call them clusters, but in our case it was Kubernetes, right? We always saw that data layer being delegated to, like if you’re in Amazon, some service that they hosted and so on. But now I think more and more of the customers that at least I’m seeing – I’m sure Nicholas and Duffie too – they’re interested in doing exactly what you just described. Cockroach is an example I literally just worked with recently, and it’s just interesting how much more thoughtful they have to be about their cluster operations. Going back to what you said, Carlisia, it’s not as easy as just like trashing a cluster and instantiating a new one anymore, like they’re used to. They need to be more thoughtful about keeping that data integrity intact through things like upgrades and disaster recovery. [00:14:18] D: Another interesting point, kind of to your point, Brian, is that, frequently, people are starting to have conversations and concerns around data gravity, which means that I have a whole bunch of data that I need to work with, like for a Spark job, which I mentioned earlier. I need to basically put my compute where that data is – whether I store that data inside the cluster and use Kubernetes to manage it, or whether I just have to make sure that I have some way of bringing up compute workloads close to that data. It’s actually kind of introducing a whole new layer to this whole thing. [00:14:48] BL: Yeah! Whole new layer of work and a whole new layer of complexity, because that’s actually – The crux of all this is like where we slide the complexity to, but this is interesting, and I don’t want to go too far into this one definitely. This is why we’re seeing more people creating operators around managing data. I’ve seen operators who are bringing databases up inside of Kubernetes. I’ve seen operators that actually can bring up resources outside of Kubernetes using the Kubernetes API. The interesting thing about this is that I looked at both solutions and I said, “I still don’t know what the answer is,” and that’s great. That means that we have a lot to learn about the problem, and at least we have some paths for it. [00:15:29] NL: Actually, that kind of reminds me of the first time I ever heard the word stateful or stateless – I’m an infrastructure guy. It was around the discussion of operators, which was only a couple of years ago, when operators were first introduced at CoreOS and some people were like, “Oh!
Well, this is how you now operate a stateful mechanism inside of Kubernetes. This is the way forward that we want to propose.” I was just like, “Cool! What is that? What’s state? What do you mean stateful and stateless?” I had no idea. Josh, you were there. You’re like, “Your frontend doesn’t care about state and your backend does.” I’m like, “Does it? I don’t know. I’m not a developer.” [00:16:10] JR: Let’s talk about exactly that, because I think these patterns we’re starting to see are coming out of the needs that we’re all talking about, right? We’ve seen at least in the Kubernetes community a lot of push for these different constructs, like something called a stateful [inaudible 00:16:21], which isn’t that important right now, but then also like an operator. Maybe we can start by defining what is an operator? What is that pattern and why does it relate to stateful apps? [00:16:31] CC: I think that would be great. I am not clear what an operator is. I know there’s going to be a controller involved. I know it’s not a CRD. I am not clear on that at all, because I only work with CRDs and we don’t define – like the project I work on, Velero, we don’t categorize it as an operator. I guess an operator uses a specific framework that exists out there. Is it a Kubernetes library? I have no idea. [00:16:56] BL: We did it to ourselves again. We’re all doing this to ourselves. From the best that I can surmise, the operator pattern is the combination of a CRD plus a controller that will operate on events from the Kubernetes API based on that CRD’s configuration. That’s what an operator is. [00:17:17] NL: That’s exactly right. [00:17:18] BL: To conflate this, Red Hat created the operator SDK, and then you have [inaudible 00:17:23] and you have Metacontroller, which can help you build operators. Then we actually sometimes conflate and call CRDs operators, and that’s pretty confusing for everyone. Once again, don’t let developers name anything. [00:17:41] CC: Wait. So let’s back up a little. Okay. There is an actual library that’s called an operator. [00:17:46] BL: Yes. There’s an operator SDK. [00:17:47] CC: Referred to as an operator. I heard that. Okay. Great. But let me back up a little because – [00:17:49] D: The word operator can – [00:17:50] CC: Because if you are developing an app for Kubernetes, if you’re extending Kubernetes, you are – Okay, you might not use CRDs, but if you are using CRDs, you need a controller, right? Because how will you do actions? Then every app that has a CRD – because the alternative to having CRDs is just using the API directly without creating CRDs to reflect your resources. If you’re creating CRDs to reflect your resources, you need controllers. All of those apps that have CRDs are operators. [00:18:24] D: Yip. [inaudible 00:18:25] is an operator. [00:18:26] CC: [inaudible 00:18:26] not an operator. How can you extend Kubernetes and not be qualified [inaudible 00:18:31] operator? [00:18:32] BL: Well, there’s a way. There is a way. You can actually just create a CRD and use a CRD for data storage, you know, store state, and you can actually query the Kubernetes API for that information. You don’t need a controller, but we couple them with controllers a lot to perform actions based on that state we’ve saved to etcd. [00:18:50] CC: Duffie. [00:18:51] D: I want to back up just for a moment and talk about the controller pattern and what it is and then go from there to operators, because I think it makes it easier to get it in your head.
A control pattern is effectively a way to understand desired state and real state and provide some logic or business code that will allow you to converge those two states, your actual state and your desired state. This is a pattern that we see used in almost everything within a distributed system. It’s like within Kubernetes, within most of the kind of more interesting systems that are out there. This control pattern describes a pretty good way of actually managing application flow across distributed systems. Now, operators, when they were initially introduced, we were talking about that this is a slightly different thing. Operators, when we introduced the idea, came more from like the operational burden of these stateful applications, things like databases and those sorts of stuff. With a database, etcd for example, you have a whole bunch of operational and runtime concerns around managing the lifecycle of that system. How do I add a new member to the cluster? What do I do when a member dies? How do I take action? Right now, that’s somebody like myself waking up at 2 in the morning and working through a run book to basically make sure that that service remains operational through the night. But the idea of an operator was to take that control pattern that we described earlier and make it wake up at 2 in the morning to fix this stuff. We’re going to actually codify the operational knowledge of managing the burden of these stateful applications so that we don’t have to wake up at 2 in the morning and do it anymore. Nobody wants to do that. [00:20:32] BL: Yeah. That makes sense. Remember back at KubeCon years ago, I know it was the one in Seattle where Brandon Philips was on stage talking about operators. He basically was saying if we think about SysOps, system operators, it was a way to basically automate or capture the knowledge of our system administrators in scripts or in a process or in code a la operators. [00:20:57] D: The last part that I’ll add to this thing, which I think is actually what really describes the value of this idea to me, is that there are only so many people on the planet that do what the people on this podcast do. Maybe you’re one of them listening to this podcast. People who are operating software or operating infrastructure at scale, there just aren’t that many of us on the planet. So as we add more applications, as more people adopt the cloud native regime or start coming to a place where they can crank out more applications more quickly, we’re going to have to get to a place where we are able to automate the burden of managing those applications, because there just aren’t enough of us to be able to support the load that is coming. There just aren’t enough people on the planet that do this to be able to support that. That’s the thing that excites me most about the operator pattern, is that it gives us a place to start. It gives us a place to actually start thinking about managing that burden over time, because if we don’t start changing the way we think about managing that burden, we’re going to run out of people. We’re not going to be able to do it. [00:22:05] NL: Yeah. It’s interesting. With stateful apps, we keep kind of bringing them – coming back to stateful apps, because stateful apps are hard and stateless apps are easy, and we’ve created all these mechanisms around operating things with state because of how just complicated it is to make sure that your data is ready, accessible and has integrity.
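For readers who want the control pattern Duffie describes in concrete terms, here is a minimal, hypothetical sketch in Go. It is a toy model only, not the actual Kubernetes controller machinery; the cluster type and its desiredReplicas/actualReplicas fields are invented for illustration:

```go
package main

import "fmt"

// cluster is a stand-in for state a real controller would read from the
// Kubernetes API: what the user declared (e.g. via a CRD) versus reality.
type cluster struct {
	desiredReplicas int // desired state
	actualReplicas  int // actual state
}

// reconcile nudges actual state toward desired state, one step at a time.
// Real controllers run logic like this in response to API events, not a loop.
func reconcile(c *cluster) {
	switch {
	case c.actualReplicas < c.desiredReplicas:
		c.actualReplicas++ // e.g. create a pod, or join a new database member
		fmt.Println("scaled up to", c.actualReplicas)
	case c.actualReplicas > c.desiredReplicas:
		c.actualReplicas-- // e.g. delete a pod
		fmt.Println("scaled down to", c.actualReplicas)
	default:
		fmt.Println("converged at", c.actualReplicas)
	}
}

func main() {
	c := &cluster{desiredReplicas: 3}
	for i := 0; i < 4; i++ {
		reconcile(c)
	}
}
```

An operator, in the sense discussed above, is this same convergence loop plus codified operational knowledge: the reconcile step might also verify data integrity or tell a new database member how to join an existing cluster.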
That’s the big one that I keep not thinking about as a SysOps person coming into the Dev world. Data integrity is so important, and making sure that your data is exactly what it needs to be, and what it was the last time you checked it, is super important. It’s only something I’m really starting to grasp. That’s why I was like these things, like operators and all these mechanisms that we keep creating and recreating and recreating keep coming about, because making sure that your stateful apps have the right data at the right time is so important. [00:22:55] BL: Since you brought this up, and we just talked about why state is so hard, I want to introduce a new term to this conversation, the whole CAP theorem, where – in a distributed system at least, your data will be consistent or your data can be available, or if your distributed system falls into multiple parts, you can have partition tolerance. This is one of those computer science things where you can actually only pick two. You can have it be available and have partition tolerance, but your data won’t be consistent, or you can have consistency and availability, but you won’t have partition tolerance. If your cluster splits into two for some reason, the data will be bad. This is why it’s hard, this is why people have written basically lots of PhD dissertations on this subject, and this is why we are talking about this here today, is because managing state, and particularly managing distributed state, is actually a very, very hard problem. But there’s software out there that will help us, and Kubernetes is definitely part of that and stateful sets are definitely part of that as well. [00:24:05] JR: I was just going to say on those three points, consistency, availability and partition tolerance. Obviously, we’d want all three if we could have them. Is there one that we most commonly trade off and give up, or does it go case-by-case? [00:24:17] BL: Actually, it’s been proven. You can’t have all three. It’s literally impossible. It depends. If you have a MySQL server and you’re using MySQL to actually serve data out of this, you’re going to most likely get consistency and availability. If you have it replicated, you might not have partition tolerance. That’s something to think about, and there are different databases, and this is actually one of the reasons why there are different databases. This is why people use things like relational databases and they use key value stores, not because we really like the interfaces, but because they have different properties around the data. [00:24:55] NL: That’s an interesting point and something that I had recently just been thinking about, like why are there so many different types of databases. I just didn’t know. I had also only recently heard of the CAP theorem, just before you mentioned it. I’m like, “Wow! That’s so fascinating.” The whole thing where you only pick two. You can’t get three. Josh, to kind of go back to your question really quickly, I think that partition tolerance is the one that we throw away the most. We’re willing to not be able to segregate our database as much as possible because C and A are just too important, I think. At least that’s what I’m seeing, like I am wearing an [inaudible 00:25:26] shirt and [inaudible 00:25:27] is not partition tolerant. It’s bad at it. [00:25:31] BL: This is why Google introduced Spanner, and Spanner in some situations can get all three, with tradeoffs and a lot of really, really smart stuff, but most people can’t run at this scale.
But we do need to think about partition tolerance, especially with data whenever – Let’s say you run a store and you have multiple instances across the world and someone buys something from inventory, what does your inventory look like at any particular point? You don’t have to answer my question, of course, but think about that. These are still very important problems: if fiber gets cut across the Atlantic, now I’ve sold more things than I have. Carlisia, speaking to you as someone who’s only been a developer, have you moved your thoughts on state any further? [00:26:19] CC: Well, I feel that I’m clear on – Well, I think you need to clarify your question better for me. If you’re asking if I understand what it means, I understand what it means. But I actually was thinking to ask this question to all of you, because I don’t know the answer, if that’s the question you’re asking me. I want to put that to the group. Do you recommend people, as in like now-ish, to run stateful workloads? We need to talk about what workloads mean. Run stateful apps or databases inside, if they’re running a Kubernetes cluster or if they’re planning for that – do you all as experts recommend that they should already be looking into doing that, or should they for now be running their stateful apps or databases outside of the cloud native ecosystem and just connecting the two? Because if that’s what your question was, I don’t know. [00:27:21] BL: Well, I’ll take this first. I think that we should be spending lots more time than we are right now on coming up with community-tested solutions around using stateful sets to their best ability. What that means is, let’s say if you’re running a database inside of Kubernetes and you’re using a stateful set to manage this, what we do need to figure out is what happens when my database goes down? The pod just gets killed? When I bring up a new version, I need to make sure that I have the correct software to verify integrity, rebuild things, so that when it comes back up, it comes back up correctly. That’s what I think we should be doing. [00:27:59] JR: For me, I think working with customers, at least Kubernetes-oriented folks, when they’re trying to introduce Kubernetes as the orchestration part of their overall platform, I’m usually just trying to kind of meet them where they’re at. If they’re new to Kubernetes and distributed systems as a whole, if we have stateless, let’s call them maybe simpler applications to start with, I generally have them lean into that first, because we already have so much in front of us to learn about. I think it was either Brian or Duffie, you said it introduces a whole bunch more complexity. You have to know what you’re doing. You have to know how to operate these things. If they’re new to Kubernetes, I generally will advise start with stateless still. But that being said, so many of our customers that we work with are very interested in running stateful workloads on Kubernetes. [00:28:42] CC: But just to clarify what you said, Josh, because you spoke like an expert, but I still have beginner’s ears. You said something that sounded to me like you recommend that you go stateless. It sounded to me like that. What you’re really saying is that they take the stateless part of what they have, which they might already have or they might have to change, and put that in. You’re not suggesting that, “Oh! You can’t do stateful anymore.
You need to just do everything stateless.” What you’re saying is take the stateless part of your system, put that in Kubernetes, because that is really well-tested, and keep the stateful part outside of that ecosystem. Is that right? [00:29:27] JR: I think that’s a better way to put it. Again, it’s not that Kubernetes can’t do stateful. It’s more of a concept of biting off more than you can chew. We still work with a lot of people who are very new to these distributed systems concepts, and to take on running stateful workloads – if we could just delegate that to some other layer, like outside of the cluster, that could be a better place to start, at least in my experience. Nicholas and Duff might have different – [00:29:51] NL: Josh, you basically nailed it, what I was going to say, where it’s like if the team that I’m working with is interested in taking on the complexity of maintaining their databases, their stateful sets and making sure that they have data integrity and availability, then I’m all for them using Kubernetes for a stateful set. Kubernetes can run stateful applications, but there is all this complexity that we keep talking about, and maintaining data and all that. If they’re willing to take on that complexity, great, it’s there for you. If they’re not, if they’re a little bit kind of behind as – Not behind, but if they’re kind of starting out their Kubernetes journey or their distributed systems journey, I would recommend them to move that complexity to somebody else and start with something a little bit easier, like a stateless application. There are a lot of good services that provide data as a service, right? You’ve got things like RDS, which is great for backing a stateful application. You can leverage that anytime, and you’ve got like dedicated options too. I would point them there first if they don’t want to take on that complexity. [00:30:51] D: I completely agree with that. An important thing I would add, which is in response to the stateful set piece here, is that as we’ve already described, managing a stateful application like a database does come with some complexity. So you should really carefully look at just what these different models provide you. Whether that model is making use of a stateful set, which provides you like ordinality, ensuring that things start up in a particular order, and some of the other capabilities around that stuff. But it won’t, for example, manage some of the complexity. A stateful set won’t, for example, try and issue a command to the new member to make sure that it’s part of an existing database cluster. It won’t manage that kind of stuff. So you have to really be careful about the different models that you’re evaluating when trying to think about how to manage a stateful application like a database. I think that’s actually why the topic of an operator came up kind of earlier, which was that like there are a lot of primitives within Kubernetes in general that provide you a lot of capability for managing things like stateful applications, but they may not entirely suit your needs. Because of the complexity with stateful applications, you have to really kind of be really careful about what you adopt and where you jump in. [00:32:04] CC: Yeah. I know just from working with Velero, which is a tool for doing backup, recovery and migration of Kubernetes clusters. I know that we back up volumes. So if you have something mounted on a volume, we can back that up. I know for a fact that people are using that to back up stateful workloads.
We need to talk about workloads. But in any case, one thing – I think one of you mentioned – is that you definitely also need to look at a backup and recovery strategy, which is ever more important if you’re doing stateful workloads. [00:32:46] NL: That’s the only time it’s important. If you’re doing stateless, who cares? [00:32:49] BL: Have we defined what a workload is? [00:32:50] CC: Yeah. But let me say something. Yeah, I think we should do an episode on that, maybe, maybe not. We should do an episode on GitOps and related things, because even though you – Things are stateless, but I don’t want to get into it. Your cluster will change state. You can recover stuff from like a fresh version. But as it goes through a lifecycle, it will change state and you might want to keep that state. I don’t know. I’m not the expert in that area, but let’s talk about workloads, Brian. Okay. Let me start talking about workloads. I never heard the term workload until I came into the cloud native world, and that was about a year ago, or when I started looking at this space more closely. Maybe a little bit before a year ago. It took me forever to understand what a workload was. Now I understand, especially today – we were talking about it a little bit before we started recording. Let me hear from you all what it means to you. [00:34:00] BL: This is one of those terms, and I’m sure – ask any ex-Googlers about this, they’ll probably agree. This is a Google term that we actually have zero context about why it’s a term. I’m sure we could ask somebody and they would tell us, but workloads to me personally are anything that ultimately creates a pod. Deployments create replica sets, create pods. That whole thing is a workload. That’s how I look at it. [00:34:29] CC: Before there were pods, were there workloads, or is a workload a new thing that came along with pods? [00:34:35] BL: Once again, these words don’t make any sense to us, because they’re Google terms. I think that a pod is a part of a workload, like a deployment is a part of a workload, like a replica set is part of a workload. Workload is the term that encompasses an entire set of objects. [00:34:52] D: I think of a workload as a subset of an application. When I think of an application or a set of microservices, I might think of each of the services that make up that entire application as a workload. I think of it that way because that’s generally how I would divide it up, to Brian’s point, into different deployments or different stateful sets or different – That sort of stuff. Thinking of them each as their own autonomous piece, and altogether they form an application. That’s my way of thinking about it. [00:35:20] CC: To connect to what Brian said, a deployment will always run in pods, which is super confusing if you’re not looking at these things – just so people understand, because it took me forever to understand that: the connection between a workload, a deployment and a pod. If you have a deployment that you’re going to shift to Kubernetes – I don’t know if shift is the right word – you’re going to need it to run on Kubernetes. That deployment needs to run somewhere, in some artifact, and that artifact is called a pod. [00:35:56] NL: Yeah. Going back to what Duffie said really quickly. A workload to me was always a process, kind of like not just a pod necessarily, but like whatever it is that if you’re like, “I just need to get this to run,” whatever that is. To me that was always a workload, but I think I’m wrong.
I think I’m oversimplifying it. I’m just like whatever your process is. [00:36:16] BL: Yeah. I would give you – The reason why I would not say that is because a pod can run multiple containers at once, which ergo is multiple processes. That’s why I say it that way. [00:36:29] NL: Oh! You changed my mind. [00:36:33] BL: The reason I bring this up, and this is probably a great idea for a future show, is about all the jargon and terminology that we use in this land that we just take as everyone knows it, but we don’t all know it, and it should be a great conversation to have around that. But the reason I always bring up the whole workload thing is because when we think about workloads – and you can’t have state without workloads, really – I just wanted to make sure that we tied those two things together. [00:36:58] CC: Why can you not have state without workloads? What does that mean? [00:37:01] BL: Well, the reason you can’t have state without workloads is because something is going to have to create that state, whether that workload is running in or out of a cluster. Something is going to have to create it. It just doesn’t come out of nowhere. [00:37:11] CC: That goes back to what Nick was saying, that he thinks a workload is a process. Was that what you said, Nick? [00:37:18] NL: It is, yeah, but I’m reneging on that. [00:37:23] CC: At least I could see why you said that. Sorry, Brian. I cut you off. [00:37:28] BL: What I was saying is a workload ultimately is one or more processes. It’s not just a process. It’s not a single process. It could be 10, it could be 1. [00:37:39] JR: I have one final question, and we can bail on this and edit it out if it’s not a good one to end with. I hope it’s not too big, but I think maybe one thing we overlooked is just why it’s hard to run stateful workloads in these new systems like Kubernetes. We talked about how there’s more complexity and stuff, but there might be some room to talk about – People have been spinning up an EC2 server, a server on the web, and running MySQL on it forever. Why in like the Kubernetes world of like pods and things is it a little bit harder to run, say, MySQL just [inaudible 00:38:10]. Is that something worth diving into? [00:38:13] NL: Yeah, I think so. I would say that for things like, say, applications, like databases particularly, they are less resilient to outages. While Kubernetes itself – or most container orchestrators, but Kubernetes specifically – is dedicated to running your pods continuously as long as they will, it is still somewhat of a shifting landscape. You do have priority and preemption. If you don’t set those things up properly, or if there’s just like a total failure of your system at large, your stateful application can just go down at any time. Then how do you reconcile the outage and the data, whatever data might have gotten lost? Those sorts of things become significantly more complicated in an environment like Kubernetes where you don’t necessarily have access to a command line to run the commands to recover as easily. You may, but it’s not the same. [00:39:01] BL: Yes. You’ve got to understand what databases do. Disk is slow, whether you have spinning disk or you have disk on chip, like SSD. What databases do in a lot of cases is they store things in memory. So if it goes away, it didn’t get stored. In other cases, what databases do is they have these huge transactional logs, maybe they write them out in files, and then they process the transaction log whenever they have CPU time.
If a database dies just suddenly, maybe its state is inconsistent because it had items that were to be processed in a queue that haven’t been processed. Now it doesn’t know what’s going on, which is why – [00:39:39] NL: That’s interesting. I didn’t know that. [00:39:40] BL: If you kill MySQL, like kill mysqld with a -9, that’s why it might not come back up. [00:39:46] JR: Yeah. Going back to Kubernetes as an example, we are living in this newer world where things can get rescheduled and moved around and killed and their IPs changed and things. It seems like this environment is, should I say, more ephemeral, and those types of considerations become more complex. [00:40:04] NL: I think that really nails it. Yeah. I didn’t know that there were transactional logs in databases. I should, I feel like, have known that, but I just had no idea. [00:40:11] D: There’s one more part to the whole stateful, stateless thing that I think is important to cover, but I don’t know if we’ll be able to cover it entirely in the time that we have left, and that is from the network perspective. If you think about the types of connections coming into an application, we refer to some of those connections as stateful and stateless. I think that’s something we could tackle in our remaining time, or what’s everybody’s thought? [00:40:33] JR: Why don’t you try giving us maybe a quick summary of it, Duffie, and then we can end on that. [00:40:36] CC: Yeah. I think it’s a good idea to talk about networks and then address this in the context of networks. I’m just thinking that’s an idea for an episode. But give us like a quick rundown. [00:40:45] D: Sure. A lot of the kind of older monolithic applications, the way that you would scale these things is you would have multiple of them, and then you would have some intelligence in the way that you’re routing connections down to those applications that would describe the ability to ensure that when Bob accesses a website and he authenticates, he’s going to authenticate to one specific instance of this application, and the intelligence up in the frontend is going to handle the routing to make sure that Bob’s connection always comes back to that same instance. This is an older pattern. It’s been around for a very long time and it’s certainly the way that we first kind of learned to scale applications before we decided to break into microservices and kind of handle a lot of this routing in a more resilient way. That was kind of one of the early versions of how we do this, and that is a pretty good example of a stateful session, in that there is actually some – Perhaps Bob has authenticated and he has a cookie that allows him, that when he comes back to that particular application, a lot of the settings, his browser settings, whether he’s using the dark theme or the light theme, that sort of stuff, is persisted on the server side rather than on the client side. That’s kind of what I mean by stateful sessions. Stateless sessions mean it doesn’t really matter that the user is terminating to the same endpoint, because we’ve managed to keep the state with the client. We’re handling state on the browser side of things rather than on the server side of things. So you’re not necessarily gaining anything by pushing that connection back to the same specific instance, but just to a service that is more widely available. There are lots of examples of this. I mean, Brian’s example of Google earlier. Obviously, when I come back to Google, there are some things I want it to remember.
I want it to remember that I’m logged in as myself. I want it to remember that I’ve used a particular – I want it to remember my history. I want it to remember that kind of stuff so that I could go back and find things that I looked at before. There are a ton of examples of this when we think about it. [00:42:40] JR: Awesome! All right, everyone. Thank you for joining us in episode 6, Stateful and Stateless. Signing off. I’m Josh Rosso, and going across the line, thank you, Nicholas Lane. [00:42:54] NL: Thank you so much. This was really informative for me. [00:42:56] JR: Carlisia Campos. [00:42:57] CC: This was a great conversation. Bye, everybody. [00:42:59] JR: Our newcomer, Bryan Liles. [00:43:01] BL: Until next time. [00:43:03] JR: And Duffie Cooley. [00:43:05] D: Thank you so much, everybody. [00:43:06] JR: Thanks all. [00:43:07] CC: Bye! [END OF EPISODE] [0:50:00.3] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END]
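As a reader’s addendum to this episode: the stateful-versus-stateless session distinction Duffie closes with can be sketched in a few lines of Go. The handler names and the in-memory map are invented for illustration; real stateful deployments would keep sessions in a shared or replicated store:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

// Stateful session: the server remembers the user's theme, so the client
// must keep landing on this same instance (or the map must be shared).
var themes = map[string]string{} // session ID -> theme

func statefulTheme(w http.ResponseWriter, r *http.Request) {
	c, err := r.Cookie("sid")
	if err != nil {
		http.Error(w, "no session", http.StatusUnauthorized)
		return
	}
	fmt.Fprintln(w, "theme:", themes[c.Value])
}

// Stateless session: the theme travels with the client in the cookie itself,
// so any replica can answer and none of them holds state.
func statelessTheme(w http.ResponseWriter, r *http.Request) {
	c, err := r.Cookie("theme")
	if err != nil {
		fmt.Fprintln(w, "theme: light (default)")
		return
	}
	fmt.Fprintln(w, "theme:", c.Value)
}

func main() {
	http.HandleFunc("/stateful/theme", statefulTheme)
	http.HandleFunc("/stateless/theme", statelessTheme)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The stateful handler only works if Bob keeps landing on the same instance; the stateless handler works on any of 20 identical replicas, which is exactly why load balancing stateless apps is so much easier.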
In this episode of The Podlets Podcast, we are talking about the very important topic of recovery from a disaster! A disaster can take many forms, from errors in software and hardware to natural disasters and acts of God. That being said, there are better and worse ways of preparing for and preventing the inevitable problems that arise with your data. The message here is that issues will arise, but through careful precaution and the right kind of infrastructure, the damage to your business can be minimal. We discuss some of the different ways that people are backing things up to suit their individual needs, recovery time objectives and recovery point objectives, what high availability can offer your system and more! The team offers a bunch of great safety tips to keep things from falling through the cracks, and we get into keeping things simple, avoiding too much mutation of infrastructure, and why testing your backups can make all the difference. We naturally look at this question with an added focus on Kubernetes and go through a few tools that are currently available. So for anyone wanting to ensure safe data and a safe business, this episode is for you!

Follow us: https://twitter.com/thepodlets
Website: https://thepodlets.io
Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues

Hosts:
https://twitter.com/carlisia
https://twitter.com/bryanl
https://twitter.com/joshrosso
https://twitter.com/opowero

Key Points From This Episode:
• A little introduction to Olive and her background in engineering, architecture, and science.
• Disaster recovery strategies and the portion of customers who are prepared.
• What is a disaster? What is recovery? The fundamentals of the terms we are using.
• The physicality of disasters; replication of storage for recovery.
• The simplicity of recovery and keeping things manageable for safety.
• What high availability offers in terms of failsafes and disaster avoidance.
• Disaster recovery for Kubernetes; safety on declarative systems.
• The state of the infrastructure and its interaction with good and bad code.
• Mutating infrastructure and the complications in terms of recovery and recreation.
• Plug-ins and tools for Kubernetes such as Velero.
• Fire drills, testing backups and validating your data before a disaster!
• The future of backups and considering what disasters might look like.
Quotes:
“It is an exciting space, to see how different people are figuring out how to back up distributed systems in a reliable manner.” — @opowero [0:06:01]
“I can assure you, careers and fortunes have been made on helping people get this right!” — @bryanl [0:07:31]
“Things break all the time, it is how that affects you and how quickly you can recover.” — @opowero [0:23:57]
“We do everything through the Kubernetes API, that’s one reason why we can do selective backups and restores.” — @carlisia [0:32:41]

Links Mentioned in Today’s Episode:
The Podlets — https://thepodlets.io/
The Podlets on Twitter — https://twitter.com/thepodlets
VMware — https://www.vmware.com/
Olive Power — https://uk.linkedin.com/in/olive-power-488870138
Kubernetes — https://kubernetes.io/
PostgreSQL — https://www.postgresql.org/
AWS — https://aws.amazon.com/
Azure — https://azure.microsoft.com/
Google Cloud — https://cloud.google.com/
Digital Ocean — https://www.digitalocean.com/
SoftLayer — https://www.ibm.com/cloud
Oracle — https://www.oracle.com/
HackIT — https://hackit.org.uk/
Red Hat — https://www.redhat.com/
Velero — https://blog.kubernauts.io/backup-and-restore-of-kubernetes-applications-using-heptios-velero-with-restic-and-rook-ceph-as-2e8df15b1487
CockroachDB — https://www.cockroachlabs.com/
Cloud Spanner — https://cloud.google.com/spanner/

Transcript:

EPISODE 08

[INTRODUCTION]

[0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you.

[EPISODE]

[00:00:41] CC: Hi, everybody. We are back. This is episode number 8. Today we have on the show myself, Carlisia Campos, and Josh. [00:00:51] JR: Hello, everyone. [00:00:52] CC: That was Josh Rosso. And Olive Power. [00:00:55] OP: Hello. [00:00:57] CC: And also Bryan Liles. [00:00:59] BL: Hello. [00:00:59] CC: Olive, this is your first time, and I didn’t even give you a heads-up. But tell us a little bit about your background. [00:01:06] OP: Yeah, sure. I’m based in the UK. I joined VMware as part of the Heptio acquisition – I joined Heptio way back last year in October. The acquisition happened pretty quickly for me. Before that, I was at Red Hat working on some of their cloud management tooling and a bit of OpenShift as well. Before that, I worked with HP and Fujitsu. I kind of work in enterprise management a lot, so things like desired state and automation are kind of things that have followed me around through most of my career. Coming in here to VMware, working in the cloud native applications business unit, is kind of a good fit for me. I’m a mom of two and I’m based in the UK, which, I have to point out, is currently undergoing a heat wave. We’ve had about like 3 weeks of 25 to 30 degrees, which is warm, very warm for us. Everybody is in a great mood. [00:01:54] CC: You have a science background, right? [00:01:57] OP: Yeah, I studied chemistry in university and then I went on to do a PhD in cancer research.
I was trying to figure out ways where we could predict how different people were going to respond to radiation treatments, with a view to tailoring everybody’s treatment to make it unique for them, rather than giving the same treatment to different people who presented with the same disease but whose responses were very, very different. Yeah, that was really, really interesting. [00:02:22] CC: What is your role at VMware? [00:02:23] OP: I’m a cloud native architect. I help customers predominantly focus on their Kubernetes platforms and how to build them, either from scratch or to help them get more production-ready, depending on where they are in their Kubernetes journey. It’s been a really exciting part of being part of Heptio and following through into the VMware acquisition. We get to speak to customers a lot at very exciting times for them. They’re kind of embarking on their Kubernetes journey, a lot of them. We’re with them from the start and every step of the way. That’s really rewarding and exciting. [00:02:54] CC: Let me pick up on that thread actually, because that’s one thing that I love about this group – I don’t get to do that. You all meet customers and you know what they are doing. You get that knowledge first-hand. What would you say is the percentage of the clients that you see that have a disaster recovery strategy – which, by the way, is the topic of today’s show? [00:03:19] OP: I speak to customers a lot. As I mentioned earlier, a lot of them are like in different stages of their journey in terms of automation, in terms of infrastructure as code, in terms of where they want to go for their next platform. But there is generally in the room a team that is responsible for backup and recovery, and that generally sort of leads into the storage team really, because you’re trying to back up state predominantly. When we’re speaking to customers, we’ll have the automation people in the room. We’ll have the developers in the room and we’ll have the storage people in the room, and they are the ones that are primarily – Out of those three sort of folks I’ve mentioned, they’re the ones that are primarily concerned about backup. How to back up their data. How to restore it in a way that satisfies the SLAs, or the time to get your systems back online in a timely manner. They are the ones most concerned with that. [00:04:10] JR: I think it’s interesting, because it’s almost scary how many of our customers don’t actually have a disaster recovery strategy of any sort. I think it’s oftentimes just based on the maturity of the platform. A lot of the applications and such, they’re worried about downtime, but not necessarily like it’s going to devastate the business in a lot of these apps. I’m not trying to say that people don’t run mission critical apps on things like Kubernetes. It’s just a lot of people are very new and they’re just kind of ramping up. It’s a really complicated thing that we work with our customers on, and there’re so many like layers to this. I’m sure layers that we’ll get into. There are things like disaster recovery of the actual platform: if Kubernetes, as an example, goes down, getting it back up, backing up its data store that we call etcd. There’s obviously like the applications’ disaster recovery: if a cluster of some sort goes down, be it Kubernetes or otherwise, shifting some CI system and redeploying that into some B cluster to bring it back up. Then, to Olive’s point, what she said, it all comes back to storage. Yeah. I mean, that’s where it gets extremely complicated.
Well, at least in my mind, it’s complicated for me, I should say. When you’re thinking about, “Okay, I’m running this PostgreSQL as a service thing on this cluster,” it’s not that simple to just move the app from cluster A to cluster B anymore. I have to consider what do I do with the data? How do I make sure I don’t lose it? Then that’s a pretty complicated question to answer. [00:05:32] OP: I think a lot of the storage providers, vendors playing in that storage space, are kind of looking at novel ways to solve that and have adapted their current thinking – maybe slightly older thinking – to new ways of interacting with a Kubernetes cluster, to provide that ongoing replication of data around different systems outside of Kubernetes and then allowing it to be ported back in when a Kubernetes cluster – if we’re talking about Kubernetes in this instance as a platform – needs that data ported back in. There’re a lot of vendors playing in that space. It’s kind of an exciting space really, to see how different people are figuring out how to back up distributed systems in a reliable manner, because different people want different levels of backup. Because of the microservices nature of the cloud native architectures that we predominantly deal with, your application is not just one thing anymore. Certain parts of that application need to be recovered fairly quickly, and other parts don’t need to recover that quickly. It’s all about functionality, ultimately, that your end customers or your end users see. If you think about it visually, like a banking application, for example, where if you’re looking at things like – The customer is interacting with that and they can check their financial details and they can check the current state of their account, then they are two different services. But if the actual service to transfer money into their account is down, it’s still a pretty functional system to the end user. In the background, all those great systems are in place to recover that transfer-of-money functionality, but it’s not detrimental to your business if that’s down. There’ll be different SLAs and different objectives in terms of recovery, in terms of the amount of time that it takes for you to restore. All of that has to be factored into disaster recovery plans, and it’s up to the company – and we can help as much as possible – to figure out which features of the application and which features of your business need to conform to certain SLAs in terms of recovery, because different features will have different standards and different times in and around that space. It’s a complicated thing. It definitely is. [00:07:29] BL: I want to take a step back and unpack this term, disaster recovery, because I can assure you, careers and fortunes have been made on helping people get this right. Before we get super deep into this, what’s a disaster, and then what’s a recovery for that? Have you thought about that at a fundamental level? [00:07:45] OP: Just for me, if we kind of take it at face value: a disaster – they could be physical ones or software-based ones. Physical ones can be like earthquakes or floods, fires, things like that, that are happening either in your region or can be fairly widespread across the area that you’re in; or software – cyber attacks that are perhaps against your own internal systems, like your system has been compromised. That’s fairly local to you. There are two different design strategies there.
Physical disaster – you have to have a recovery plan that is outside of that physical boundary, so that you can recover your system from somewhere that’s not affected by that physical disaster. For the recovery in terms of software, in terms of your system having been compromised, the recovery from that is different. I’m not an expert on cyber attacks and vulnerabilities, but for companies trying to recover from that, they plan for it as much as possible. So they down their systems and try and get patches and fixes to them as quickly as possible and spin the systems back up. [00:08:49] BL: I’m understanding what you’re saying. I’m trying to unpack it for those of us listening who don’t really understand it. I’m going to go through what you said and we’ll unpack it a little bit. Physical, from my assumption, is we’re running workloads. Let’s say we’re just going to say in a cloud, not on-premise. We’re running workloads in, let’s say, AWS, and in the United States we can take care of local diversity by running in East and West regions. Also, we can take care of local diversity by running in availability zones, because AWS guarantees that AZ1 and AZ3 have different network connections, are not in the same building, and things like that. Would you agree? Do you see that? I mean, this is for everyone out there. I’m going to go from super high-level down to more specific. [00:09:39] OP: I personally wouldn’t argue that, except not everybody is on AWS. [00:09:43] BL: Okay. AWS, or Azure, or Google Cloud, DigitalOcean, or SoftLayer, or Oracle, or Packet. If I thought about this, probably we could do 20 more. [00:09:55] JR: IBM. [00:09:56] BL: IBM. That’s why I said SoftLayer. They all practice this physical diversity. They all have different regions that you can deploy software to. Whether it be for data locality, but also for data protection. If you’re thinking about creating a plan for this, this would be something you could think about. Where does my data rest? What could happen to that data? The building could actually just fall over onto itself. All the hard drives are gone. What do I do? [00:10:21] OP: You’re saying that replication is a form of backup? [00:10:26] BL: I’m actually saying way more than that. Before you even think about things when it comes to disaster recovery, you’ve got to define what a disaster is. Some applications can actually run out of multiple physical locations. Let’s go back to my AWS example, because it’s everywhere and everyone understands how AWS works at a high level. Sometimes people are running things out of US-East-1 and US-West-2, and they could run the application out of both. The reason they can do that is because the individual transactions of whatever they’re doing don’t need to talk to one another. They’re just serving websites out of both places. To your point, when you talk about – now you have the issue where maybe you’re doing inventory management, because you have a large store and you’re running it out of multiple countries. You’re in the EU and you’re somewhere in APAC as well. What do you do about that? Well, there are a couple of ways that – I could think about how we would do that. We could actually just have all the database connections go back to one single main service. Then what we could do with that main service is that we could have it replicated in its local place, and then we can replicate it in a remote place too. If the local place goes down, at least you can point all the other sites back to this one.
That's the simplest way. The reason I wanted to bring this up is because I don't like acronyms all that much, but disaster recovery has two of my favorite ones: RPO and RTO. Really, what it comes down to is that when you have a disaster, no matter what that disaster is or how you define it, you have RTO, which is basically the time that you can be down before there's a huge issue. Then you have RPO, which, without going into all the names, is how far you can go back since your last backup before you have business problems. Thinking about those two things is how we should think about our backup and disaster recovery, and it's all based on how your business or your project works: how long you can be down, and how much data you can afford to lose.

[00:12:27] CC: Which goes to what Olive was saying. Please spell out for us what RTO and RPO stand for.

[00:12:35] BL: I'm going to look them up real quick, because I literally pushed those acronym meanings out of my head. I just know what they mean.

[00:12:40] OP: I think it's recovery time objective and recovery data objective.

[00:12:45] BL: Yeah. I don't know what the P stands for, but it is for data.

[00:12:49] OP: Recovery.

[00:12:51] BL: It's the recovery point. Yeah. That's what it is: recovery point objective, RPO; and recovery time objective, RTO. You can tell I've spent a lot of time in enterprise, because we don't even define the words; the acronym means what it is. Does anyone know what the acronym stands for anymore?

[00:13:09] OP: How far back in terms of data can we go and still be okay? How long can we be down, basically, until we're not okay?

[00:13:17] CC: It is true though, as Josh was saying: some teams or companies or products, especially companies that are just starting their cloud native journey, don't have a backup, because there are many complicated things to deal with, and backup, I mean the disaster recovery strategy, is super complicated. Doing that is not trivial. But shouldn't you start with it, precisely because it is so complex? It's funny to me when people say they don't have that kind of strategy. Then again, like Bryan said, utilizing regions and spreading your data across them is a strategy in itself, and there's more to it than that.

[00:14:00] JR: Yeah. I think I oversimplified too much. Disaster recovery could theoretically be anything, I suppose. Going back to what you were saying, Bryan, on the recovery aspect of it: recovery for some of the customers I work with is literally to stand up a brand-new cluster, whatever that cluster is, the cluster that is their platform, and then redeploy all the applications on top of it. That is a recovery strategy. It might not be the most elegant, and it might make assumptions about the apps that run on it, but it is a recovery strategy that's somewhat simple, simple to conceptualize and get started with. A lot of the customers I work with, when they're first getting their bearings with a distributed system of sorts, are a lot more concerned with solving for high availability, which is what you just said, Carlisia: spreading across multiple sites. There's the notion of different parts of the world, but there's also the idea of what I think Amazon has coined availability zones. Making sure that if there is a disaster, you're somewhat resilient to it, like Bryan was saying with moving connections over and so on.
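To make those two acronyms concrete, here is a minimal sketch of the arithmetic in Python. The interval and restore-time figures are hypothetical examples, not recommendations:

    # Hypothetical RPO/RTO arithmetic; both input values are made-up examples.
    backup_interval_hours = 24.0  # nightly backups
    restore_drill_hours = 4.0     # measured by actually doing a restore, not guessed

    # RPO: worst case, you lose everything written since the last backup.
    worst_case_data_loss_hours = backup_interval_hours
    # RTO: worst case, you are down for as long as a full restore takes.
    worst_case_downtime_hours = restore_drill_hours

    print(f"RPO (worst-case data loss): {worst_case_data_loss_hours}h")
    print(f"RTO (worst-case downtime):  {worst_case_downtime_hours}h")

Run the logic in reverse to derive requirements: if the business says the RPO must be one hour, then backups (or replication) have to run at least hourly.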
Then once we've done high availability somewhat well, depending on the workloads that are running, we might try to get a fancier recovery solution in place, one that isn't just "rebuild everything and redeploy," because the downtime might not be acceptable.

[00:15:19] BL: I'm actually going to give some advice to all the people out there who might be listening to this and thinking about disaster recovery. First of all, all that complex stuff, that book you read: forget about it. Not because you don't need to know it, but because you should only think about what's in scope at any given time. When you're starting an application, and I'm making a huge assumption here that you're using someone else's cloud, public cloud (when you're in your own data center, it's a different problem), think about what you already have. All the major public clouds have durable object storage: many 9s of durability, and fewer 9s, but still a lot of 9s, of availability too. The canonical example is S3. When you're designing your applications and you know you're going to have disaster issues, realize that S3 is almost always going to be there, unless it's 2017 and it goes down, or the other two failures that it's had. Pretty much, it will be there. Think about how to get your data into S3. I'm just saying, you can use it for storage. It's fairly cheap for how much storage you get. You can make sure it's encrypted, and using IAM, you can make sure that only people who have the right privileges can see it. The same goes for Azure, and the same goes for Google. That's the first phase. The second phase is that now you're going to say, "Well, what about my relational database?" Once again, use your cloud provider. All the major cloud providers have great relational databases, and actually key-value stores as well. The neat thing about them is that you can sometimes set them up to run across a whole region, and you can set them up to do automated backups. So at the very minimum, use your cloud provider for what it's valuable for. Now, if you're not using a cloud provider and you're doing this on-premises, I'm going to tell you: the simple answer is I hope you have a little bit of money, because you're going to have to pay somebody, either one of your Kubernetes architects or somebody else, to do it. There's no easy button for this kind of solution. To close out this little mini-rant, I'm going to leave everyone with the best piece of advice I can ever give you if you're running relational databases. If you are running a relational database, whether it be Postgres, MySQL, or Aurora, have it replicated. But here's the kicker: have another replica that you delay. Make it lag 10 minutes, 15 minutes, not much longer than that. Because here's what's going to happen, especially in a young company, especially if you're using Rails or something like that: somebody who has access to production, because you're a small company and you haven't really federated this out yet, is going to drop your main database table. They're just going to do it, it's going to happen, and you're going to panic. If that database change flows to a replica, and you have a 10-minute delayed replica, you have 10 minutes to figure it out before the world ends when somebody deletes the master database.
You're going to know pretty quickly, and you can just cut that replica out and pull the other one over. I'm not going to say where I learned this trick. We had to employ it multiple times, and it saved our butts multiple times. That's my favorite thing to share.

[00:18:24] OP: Is that replica on a separate system?

[00:18:26] BL: It was on a separate system. I actually won't say more, because it would be telling on who did it. Let's just say that it was physically separate from the other one, in a different location as well.

[00:18:37] OP: I think we've all been there. We've all deleted something that maybe –

[00:18:41] CC: I'm going to tell who did it. It was me.

[00:18:45] BL: Oh no! It definitely wasn't me.

[00:18:46] OP: We mentioned HA. Does the panel think that there's now a slightly inverse relationship between the amount of HA that you architect for and the disaster recovery plan that you implement on the back of it? The more you architect around HA, the less you architect or plan for DR, without eliminating either of them.

[00:19:08] BL: I see it more like this. I mean, think about how it used to be 15 years ago.

[00:19:11] CC: Sorry. HA: we're talking about high availability.

[00:19:15] BL: When you think about high availability, think about how a lot of sites were hosted. This was really before you had public cloud, and a lot of people were hosting things on a web host or hosting it themselves. Even if you were a company that had a big cage at an Equinix or a Level 3, you probably didn't have two facilities at two different Equinix or Level 3 sites. You probably had one big cage, and you just had diversity in the systems in there. You found people had these huge tape backups and were very diligent about swapping their tapes out. One thing we did was practice: lots of practice of bringing this huge system down, because we assumed that the database would die and we would just spend a few hours, or days, bringing it back up. Now with high availability, we can architect systems where that is less of a problem, because we can run more things that manage our data. Then we can also do high availability on the backend, on the database side, too. We can do things like multi-writes and multi-reads. We can actually write our data in multiple places. What we find when we do this is that the loss of a single database, or a slice of our processing or web hosts, just means that our service is degraded, which means we don't really have a disaster at that point, and we're trying to avoid disasters.

[00:20:28] JR: I think on that point, and I'll admit this is super oversimplified, the way I've always thought about it is that successful high availability can make your need to perform disaster recovery less likely. Can, maybe, right? It's possible.

[00:20:45] BL: Also realize that not everybody is running in public cloud. In that case, well, you can still back your stuff up to public cloud even if you're not running in it. There are still people out there running big tape arrays, and I've seen them. I've seen tape arrays wider than this 80-inch table I'm sitting at, with robotic arms that fetch the tapes, and you had to make sure you loaded the right tapes for that particular day's rotation. I guess what I'm saying is that there is a balance. With high availability, if you're doing it in a truly highly available way, you can avoid whole classes of disaster.
But I'm not saying that you will never have a disaster, because if that were the case, we wouldn't be having this discussion right now. I'd like to move the conversation a little more toward cloud native. If you're running on Kubernetes, what should you think about for disaster recovery? What are the types of disasters we could have, and how could we recover from them?

[00:21:39] JR: Yeah. One thing that comes to mind: I was actually reading the Kubernetes Best Practices book last night, because I just got an O'Reilly membership. Awesome. Really cool book. One of the things they recommend early on, which I thought was a really good callout, is that since Kubernetes is a declarative system, where we write manifests to describe the desired state of our application and how it should run, we should make sure to keep that declarative state in source control, just like we would our code. That way, if something goes wrong, it's somewhat trivial to redeploy the application should we need to recover. That does assume we're not worried about data and things like that, but I think the book made a good callout.

[00:22:22] OP: Being a declarative system, and being able to bring your systems back up exactly the way they were before, kind of in itself adds comfort to the whole notion that there could be a disaster, because if there was one, we could spin things back up relatively quickly. That goes back to the days of automation. I came from Red Hat, so for me it was Ansible. We were trying to do infrastructure as code: being able to deploy, redeploy, and redeploy again in the same manner as the previous installation. I've been in this game a long time now, and I've spent a lot of time working with the processes around building physical servers. That process would get handed over to lots of different teams. It was a huge thing to get one of these servers built and signed off, because it literally had to pass through the different teams, each doing their own bit. The idea was that you would get a language with functionality that suited the needs of all those different teams. The storage team could automate their piece, which they were doing; they just weren't interacting with any of the other teams. The network people would automate theirs, the application install people would do their bit, the server OS people would do their bit. So a process that could tie those teams together in terms of a language: Ansible, Puppet, Chef, those kinds of tools tried to unite those teams. Everyone still does their own automation, but you have a tool that can take that code and run it as one system, end to end. At the end of that, you get an up-and-running system. If you run it again, you get another system exactly the same as the previous one. If you run it again, you get another one. Reducing the time to build these things plays very importantly into this space. A disaster is only a disaster in terms of time, because things break all the time; what matters is how that affects you and how quickly you can recover. If you can recover in seconds or minutes and it hasn't affected your business at all, then it wasn't really a disaster. The time it takes you to recover, to build your things back, is key. All that automation, leading on to Kubernetes as the next step, with this whole declarative, self-healing model that implements the desired state on a regular basis, plays really well into this space.
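As an illustration of that "declare the desired state and let the system converge" idea, here is a toy reconciliation loop in Python. It is a sketch of the pattern, not Kubernetes source, and the field names are invented:

    import time

    def reconcile(desired: dict, actual: dict) -> None:
        """One pass of a toy control loop: push actual state toward desired state."""
        for key, want in desired.items():
            if actual.get(key) != want:
                print(f"drift on {key!r}: have {actual.get(key)!r}, want {want!r} -> correcting")
                actual[key] = want  # a real controller would create/update/delete resources here

    desired = {"replicas": 3, "image": "example/app:1.2"}  # what you keep in source control
    actual = {"replicas": 1, "image": "example/app:1.2"}   # e.g., after losing two pods

    while actual != desired:
        reconcile(desired, actual)
        time.sleep(1)  # real controllers re-check continuously; that is the self-healing part

Because the desired state here is just data, losing the cluster does not lose the intent, which is exactly why keeping those manifests in source control is such cheap insurance.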
[00:24:25] CC: That makes me think, and I don't completely understand this, because I'm not out there architecting people's systems; the one thing that I do is build this backup tool, which happens to be for Kubernetes, so I don't completely get the limitations and use cases. My question is: is it enough to have the declarations of how your infrastructure should be in source control? Because what if you're running applications on the platform and your applications are interacting with the platform, changing the state of the platform? Is that not something that happens? Of course, having those declarations in source control is a great backup, but don't you also want to back up the changes to state as they keep happening?

[00:25:14] BL: Yeah, of course. That technique has been used for a long time. That's how replication works: you literally take the change, push it over the wire, and it gets applied to the remote system. The problem is that there isn't just one way to do this, because if you only do transaction-based replication, if you only capture the changes, you need a good base to start with, because you have to apply those changes to something. How do you get that piece? I'm not asking you to answer that. It's just something to think about.

[00:25:44] JR: I think you've hit on the fatal flaw too, Carlisia, where that simplified "just keep it in source control" model kind of falls over. Having that declarative state stamped out, "this is the ideal state of the world for this deployment," in source control has benefits beyond the disaster recovery scenario, right? For stateless applications especially, like we talked about in the previous podcast, it can potentially be all you need, which is so great. Move your CI system over to cluster B. Boom! You're back up and running. That's really neat. But for a lot of the customers we work with, once we get them to that stage, they then go, "Well, what about all these persistent volumes?" (PersistentVolume, by the way, being the Kubernetes term.) What about all this data on disk that I don't want to lose if I lose my cluster? That totally feeds into why tools like the one you work on are so helpful. I don't know if now would be a good time, but maybe, Carlisia, you could expand on that tool and what it tries to solve for.

[00:26:41] CC: I want to back up a little first, though. Let's put aside stateful workloads and volumes and databases. I was talking about the infrastructure itself, the state of the infrastructure. Isn't it common to develop a cloud native application that changes the state of the infrastructure, or is that something that's not good to do? I don't know the answer to this; I might be completely off.

[00:27:05] JR: It's possible to write applications that can change infrastructure, but think about that. What happens when you have bad code? We all have bad code. So people like to separate those two things. You can still have infrastructure as code, but it's separated from the application itself, and that's just to protect your app people from your not-app people, and vice versa. A lot of that is being handled through systems that people are writing right now. You have Ansible from IBM. You have things like HashiCorp and everything they're doing: they have their hosted offering, their on-premises offering, their local tool. People are looking at that problem.
The good thing, and the bad thing at the same time, is that that problem hasn't been solved yet, which means someone can still solve it better. But the bad part is that if we're looking for good infrastructure-as-code software, it doesn't fully exist yet.

[00:27:57] OP: If we're talking about containerized applications, I think any systems that interacted with or changed the infrastructure would be separate from the applications. As you were saying, Bryan, you just expanded on it a little bit: [inaudible 00:28:11] containerized or sandboxed processes that run separate from the main application. You're separating out what actually runs and performs the function of the application from the systems that have to modify the infrastructure before that main application runs. They're two separate things. Suppose you had to restore the infrastructure back to the way it was without rebuilding it: if you have something that edits the infrastructure, you should always have something that can edit it back. If you have a process that stops something, you also have a process that starts it. If you're trying to [inaudible 00:28:45] your applications and they need to interact with other things, then the application design should include the consideration of what you need to do to interact with the infrastructure. If I do something one way, I have to be able to do the equal and opposite the other way, to have an effectively clean application. That's the kind of thing I've seen, anyway.

[00:29:04] JR: I think this maybe even folds into a whole other topic we could cover on another podcast, which is the notion of mutating infrastructure. If you have a ton of hands in those cookie jars, changing things all over the place, you lose that potential single source of declarative truth, right? It just becomes very complicated. Maybe to the crux of your original point, Carlisia, and hopefully I'm not super off: if that is happening a lot, I think it could actually make recovery more complicated. Or maybe recovery isn't the way to put it; it makes recreating the infrastructure more complicated, if that makes sense.

[00:29:36] BL: Your infrastructure should be deterministic, and that's why I said you could. I know we talked about this before, about having applications modify infrastructure. Think about that. Can and should are two different things. If it happens within your application due to input of any kind, then you're no longer deterministic, unless you can figure out what that input is going to be. Be very careful with that. That's why people split infrastructure as code from their other code. You can still have CI, continuous integration and continuous delivery/deployment, for both, but they're on different pipelines, with different release metrics, different monitoring, and different validation to make sure they work correctly.

[00:30:18] OP: Application design plays a very important role now, especially in terms of cloud native architecture. We're talking a lot about microservices, and a lot of companies are looking to re-architect their applications. Maybe mistakes were made in the past, or maybe "mistakes" is too strong a word; maybe things that were allowed in the past are giving way to better practices going forward.
If we're looking to run things independently of each other, and, by definition, applications independent of the infrastructure, that should be factored into the architecture of those applications going forward.

[00:30:50] CC: Josh asked me to talk a little bit about Velero, so I will touch on it quickly. First of all, we'd love to have a whole show just about infrastructure as code and GitOps; maybe that would even be two episodes. Velero doesn't do any backup of the infrastructure itself. It works at the Kubernetes level. We back up the Kubernetes cluster, including the volumes: if you have any sort of stateful app attached to a pod, that can get backed up as well. If you want to restore that to a different service provider than the one you backed up from, we have a restic plugin that you can use. It's embedded in the Velero tool, so you can do that using the plugin. There are a few things I find really cool about Velero. One, you can do selective backups, which we really, really don't recommend; we recommend you always back up everything. But you can do selective restores, because if you don't need to restore a whole cluster, why would you? You can restore just parts of it. It's super simple to use. Why would you not have a backup? Because this is ridiculously simple. You do it through a command line, and we have a scheduler: you can put your backups on a schedule and determine the expiration date of each backup. A lot of neat, simple features, and we are actively developing new things all the time. Velero is not the only one. It's fair to mention the others, though I'm not super well versed in the tools out there. etcd itself has a backup tool. One thing to highlight is that we do everything through the Kubernetes API; that's one reason, for example, why we can do selective backups or restores. Yes, you can back up etcd completely yourself, but you have to back up the whole thing, and if you're on a managed service, you wouldn't be able to do that, because you just wouldn't have access: tools that back up etcd directly need access that only the service provider has. There's PX-motion; I'm not sure what this is, I'm reading the documentation here. There is this K10 from [inaudible 00:33:13], Canister. I haven't used any of these tools. [inaudible 00:33:16].

[00:33:17] OP: I just want to say, on Velero: the last customer I worked with wanted to use Velero in its capacity to back up a whole cluster and then restore that whole cluster on a different cloud provider, as you mentioned. Well, they were using it as backup, but the primary function they wanted was to populate the cluster, as it was, onto a brand-new cloud provider.

[00:33:38] CC: Yeah. It's a migration. One thing that, like I said, Velero does is back up the cluster, all the Kubernetes objects. Why would we want to do that? Someone explain it to everybody who's listening, including myself, because some people bring this up and say, "Well, I don't need to back up the Kubernetes objects if all of that is declared and I have the declaration in source control. If something happens, I can just apply it again."

[00:34:10] BL: Untrue, because for any given Kubernetes object, there is the configuration that you created. Say you're creating a Deployment: you need spec.replicas, you need the spec template, you need labels and selectors.
But if you actually go and pull down that object afterwards, you'll see there are other things inside it. If you didn't specify the number of replicas, you got the default, along with other fields you should get defaults for. You don't want a lousy backup and restore, because then you get into a place where, if I back this thing up and restore it to a different cluster to actually test that it works, it will be different. Just keep that in mind when you're doing this.

[00:34:51] JR: I think it just comes down to knowing exactly what Bryan just said. There certainly are times when I'm working with a customer where the use case is so simple that, at the notion of redeploying the application and potentially losing some of those fields that may have mutated over time, they just shrug and go, "Whatever." It is so awesome that tools like Velero are bridging that gap, and, to the point Olive made, not only backing that stuff up and capturing its state as it was in the cluster, but giving us a good way to section out one namespace, or one group of applications, and move just those over, and so on. It comes down to knowing exactly what you're going to have to solve for and how complex your solution should be.

[00:35:32] BL: Yeah. We're getting towards the end, and I wanted to make sure we talked about testing your backups, because that's a popular thing here. People take backups; whether I dump to S3, or I have Velero dumping to S3, or I have some other method, that is an invalid backup. It's not valid until someone takes that backup, restores it somewhere, and actually verifies that it works. There's nothing worse than being in some kind of disaster, whether small or large, needing your backup, and finding out, "Oh my gosh! We didn't even back up the important thing."

[00:36:11] CC: That is so true. I have only been in this backup world for a minute, but I've needed to back things up before, and I don't think I only learned this concept after coming here. I think I've known it; it just became stronger in my mind. So I always tell people: if you haven't done the restore, you don't have a backup.

[00:36:29] JR: One thing I love to add on to that concept is having my customers run fire drills, if they're open to it. Effectively, you have a list of potential terrible things that can happen, from losing a cluster to losing just one important component. Then one person on the team, say once a week or once a month depending on their tolerance, chooses something from that list and does it. Not in production, but does it. It gives you the opportunity to test everything end to end. Did your alerting fire off? When you did the restore, to your point, was the backup valid? Did the application come back online? There are a lot of semi-fun, using the word fun loosely, ways you can approach it, and it really is a good way to stress test.

[00:37:09] BL: I do have one small follow-up on that. When you're doing backups, no matter how you're doing them, think about your retention strategy and how long to keep data, whether that's due to regulation or just because storage takes physical space and costs money. You don't just back up yesterday and then back up again today.
Back up every day and keep the last eight days, and then, old school, also take a full backup every so often and keep that for a while just in case, because you never know.

[00:37:37] CC: Good point too. Yeah. I think a lot of what we said goes back to what Olive, I think, said first: you have to understand your needs.

[00:37:46] OP: Yeah, which bits have varying degrees of importance in terms of application functionality for your end user: which bits are absolutely critical, and which bits buy you a little more time to recover.

[00:37:58] CC: Yeah, and that will definitely vary from product to product. As we get into this idea of ephemeral clusters and automation, and we get really good at automating things and bringing things back up, is it possible we get to a point where we don't even talk about disasters anymore? You just go bring up this cluster or this system, and does it even matter why [inaudible 00:38:25]? What I'm thinking is: in the past, a long time ago, or maybe not so long ago, when I was working with an application and there was a disaster, it was a disaster because it felt like a disaster. Somebody had to go in manually, find out what happened and what to fix, and fix it manually. It was complete chaos and stress. Now things just keep rolling and it's automated: something goes down, you bring it back up. Do you know what I mean? It won't matter why. Are we still going to talk about this as a disaster? Does it even matter what caused it? Recovery from a disaster might not look any different than a planned update, for example.

[00:39:12] BL: I think we're getting to a place, and I don't know whether we're 5 years away or 10 years away or 20 years away, where we won't have the same classes of disaster that we have now. Think about where we've come over the past 20 years. Over the past 20 years, we've basically made hardware in a rack replaceable. I think about 1998, 1999, 2000: we'd rack a whole bunch of servers, and each server would be special. Now, at these scales, we don't care about that anymore. When a server goes away, we have 50 more just like it. The reason we were able to do that across large platforms is because of Linux. Now with Kubernetes, if Kubernetes keeps going on the same trajectory, we're going to codify the patterns that make hardware loss a non-event. We don't really care if we lose a server; you have 50 more nodes that look just like it. The software side is heading the same way: the software is always available. Think about Google Spanner. Spanner is multi-location, it can lose nodes without losing data, and it's relational as well. That's what CockroachDB is about too; it's modeled on Spanner. We're getting to the place where this kind of technology is available to anyone, and we're going to see that we won't have the kinds of disasters we're having now. What we'll have instead is bigger distributed-systems problems: timing issues, leader election issues, things like that. But I think the old classes of disaster can be phased out, at least over the next computing generation.

[00:40:39] OP: It's maybe more about architecture these days: application designers and infrastructure architects in the container space, with Kubernetes orchestrating and maintaining your desired state.
You're thinking that things will fail, and that's okay, because the system will go back to the way it was before. The concept of something stopping mid-run is not so scary anymore, because it gets put back into its desired state. You might need to investigate if it keeps stopping and starting and Kubernetes keeps bringing it back, but the system is still fully functional in terms of end users. You, as the operator, might need to investigate why that's happening, but the end result is that your application is still up and running. Things fail, and it's okay. That's maybe the thing that's changed from 5 or 10 years ago.

[00:41:25] CC: This was a great conversation. I want to thank everybody: Olive Power, Josh Rosso, Bryan Liles. I'm Carlisia Campos, signing off. Make sure to subscribe. This was Episode 8. We'll be back next week. See you.

[END OF EPISODE]

[0:50:00.3] KN: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the http://thepodlets.io/ website, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing.

[END]
Today on the show we have esteemed Kubernetes thought-leader, Kelsey Hightower, with us. We did not prepare a topic, as we know that Kelsey presents talks and features on podcasts regularly, so we thought it best to pick his brain and see where the conversation takes us. We end up covering a mixed bag of super interesting Kubernetes-related topics. Kelsey begins by telling us what he has been doing and shares with us his passion for learning in public and why he has chosen to follow this path. From there, we talk about the issue of how difficult many people still think Kubernetes is. We discover that while there is no doubting that it is complicated, at one point, Linux was the most complicated thing out there. Now, we install Linux servers without even batting an eyelid, and we think we can reach the same place with Kubernetes in the future if we shift our thinking! We also cover other topics such as APIs and the debates around them, and common questions Kelsey gets, before finally ending with a brief discussion on KubeCon. From the attendance and excitement, we saw that this burgeoning community is simply growing and growing. Kelsey encourages us all to enjoy this spirited community and the innovation happening in this space before it simply becomes boring again. Tune in today!

Follow us: https://twitter.com/thepodlets Website: https://thepodlets.io Feedback: info@thepodlets.io https://github.com/vmware-tanzu/thepodlets/issues

Hosts: Carlisia Campos, Duffie Cooley, Bryan Liles, Michael Gasch

Key Points From This Episode: Learn more about Kelsey Hightower, his background and why he teaches Kubernetes! The purpose of Kelsey's course, Kubernetes the Hard Way. Why making the Kubernetes cluster disappear will change the way Kubernetes works. There is a need for more ops-minded thinking for the current Kubernetes problems. Find out why Prometheus is a good example of ops-thinking applied to a system. An overview of the diverse ops skillsets that Kelsey has encountered. Being ops-minded is not the end: you should be thinking about the next big thing! Discover the kinds of questions Kelsey is most often asked and how he responds. Some interesting thinking and developments in the backup space of Kubernetes. Is it better to back up or to have replicas? If the cost of losing data is very high, then backing up cannot be the primary solution. Debates around which instances are not the right ones to use Kubernetes in. The Kubernetes API is the part everyone wants to use, but it comes with the cluster. Why the Kubernetes API is only useful when building a platform. Can the Kubernetes control theory be applied to software? Protocols are often forgotten about when thinking about APIs. Some insights into the interesting work Akihiro Suda is doing. Learn whether Kubernetes can run on the Edge or not. Verizon: how they are changing the Edge game and what the future trajectory is. The interesting dichotomy that Edge presents and what this means. Insights into the way that KubeCon is run and why it's structured the way it is. How Spotify can teach us a lesson in learning new skills!

Quotes:

"The real question to come to mind: there is so much of that work that how are so few of us going to accomplish it unless we radically rethink how it will be done?" — @mauilion [0:06:49]

"If ops were to put more skin in the game earlier on, they would definitely be capable of building these systems.
And maybe they even end up more mature as more operations people put ops-minded thinking into these problems." — @kelseyhightower [0:04:37]

"If you're in operations, you should have been trying to abstract away all of this stuff for the last 10 to 15 years." — @kelseyhightower [0:12:03]

"What are you backing up and what do you hope to restore?" — @kelseyhightower [0:20:07]

"Istio is a protocol for thinking about service mesh, whereas Kubernetes provides the API for building such a protocol." — @kelseyhightower [0:41:57]

"Go to sessions you know nothing about. Be confused on purpose." — @kelseyhightower [0:51:58]

"Pay attention to the fundamentals. That's the people stuff. Fundamentally, we're just some people working on some stuff." — @kelseyhightower [0:54:49]

Links Mentioned in Today's Episode:
API Design conventions — https://google.aip.dev/
Interaction Protocols: It's All about Good Manners — https://www.infoq.com/presentations/history-protocols-distributed-systems
The Podlets on Twitter — https://twitter.com/thepodlets
Kelsey Hightower — https://twitter.com/kelseyhightower
Kelsey Hightower on GitHub — https://github.com/kelseyhightower
Akihiro Suda — https://twitter.com/_AkihiroSuda_
Carlisia Campos on LinkedIn — https://www.linkedin.com/in/carlisia/
Kubernetes — https://kubernetes.io/
Duffie Cooley on LinkedIn — https://www.linkedin.com/in/mauilion/
Bryan Liles on LinkedIn — https://www.linkedin.com/in/bryanliles/
KubeCon North America — https://events19.linuxfoundation.org/events/kubecon-cloudnativecon-north-america-2019/
Linux — https://www.linux.org/
Amazon Fargate — https://aws.amazon.com/fargate/
Go — https://golang.org/
Docker — https://www.docker.com/
Vagrant — https://www.vagrantup.com/
Prometheus — https://prometheus.io/
Kafka — https://kafka.apache.org/
OpenStack — https://www.openstack.org/
Verizon — https://www.verizonwireless.com/
Spotify — https://www.spotify.com/

Transcript:

EPISODE 7

[INTRODUCTION]

[0:00:08.7] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores Cloud Native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn't reinvent the wheel. If you're an engineer, operator or technically minded decision maker, this podcast is for you.

[INTERVIEW]

[00:00:41] CC: Hi, everybody. Welcome back to The Podlets, and today we have a special guest with us, Kelsey Hightower. A lot of people listening to us today will know Kelsey, but as usual, there are a lot of newcomers in this space. So Kelsey, please give us an introduction.

[00:01:00] KH: Yeah. I consider myself a minimalist, so I want to keep this short. I work at Google, on Google Cloud stuff. I've been involved with the Kubernetes community for, what, 3, 4, 5 years, ever since it's been out, and one main goal: learning in public and helping other people do the same.

[00:01:16] CC: There you go. You do have a repo on your GitHub that's about learning Kubernetes the hard way. Are you still maintaining that?

[00:01:26] KH: Yeah, every six months or so. So Kubernetes the Hard Way, for those that don't know, is a guide, a tutorial. You can copy and paste; it takes about three hours, and the whole goal of the guide is to teach people how to stand up a Kubernetes cluster from the ground up. Starting from scratch, six VMs: you install etcd, all the components, the nodes, and then you run a few test workloads so you can get a feel for Kubernetes.
The history behind that: when I first joined Google, we were all concerned about the adoption of a system as complex as Kubernetes, right? Docker Swarm was out at the time, and a lot of people were using Mesos, and a lot of the feedback then was that Kubernetes was too complex. So Kubernetes the Hard Way was built on the idea that if people understood how it worked, just like they understand how Linux works, because that's also complex, if people just saw how the moving pieces fit together, then they would complain less about the complexity and have a way to grasp it.

[00:02:30] DC: I'm back. This is Duffie Cooley. I'm back this week, and we also have Michael and Bryan with us. So, looking forward to this session talking through this stuff.

[00:02:40] CC: Yeah. Thank you for doing that. I totally forgot to introduce who else is on this show; and me, Carlisia. We didn't plan what the topic is going to be today. I will take a wild guess: we are going to touch on Kubernetes. I have so many questions for you, Kelsey. But first and foremost, why don't you tell us what you would love to talk about? One thing that I love about you is that every time I hear an interview of you, you're always talking about something different, or you're talking about the same thing in a different way. I love that about the way you speak. I know you offer to be on a lot of podcast shows, which is how we ended up here, and I was thinking, "Oh my gosh! We're going to talk about what everybody else talks about," but I know that's not going to happen. So feel free to get a conversation started. And we are VMware engineers here, so come at us with questions, but also tell us what you would like to talk about on our show today.

[00:03:37] KH: Yeah. I mean, we're all just coming straight off the heels of KubeCon, right? This big event, 12,000 people getting together, super excited about Kubernetes, and the re:Invent event, where things are wrapping up as well. When we start to think about Kubernetes and what's going to happen, a lot of people saw Amazon jump in with Fargate for EKS. For those unfamiliar with that offering: over the years, all the cloud providers have been providing some hosted Kubernetes offering, the idea being that the cloud provider, just like we do with hypervisors and virtual machines, would provide the base infrastructure so you can focus on using Kubernetes. You've seen this flow down on-prem with VMware too, right? VMware saying, "Hey, Kubernetes is going to be a part of this control plane that you can use, the Kubernetes API, to manage virtual machines and containers on-prem." So at some point, where do we go from here? There's a big serverless movement, which is trying to eliminate infrastructure for all kinds of components, whether that's compute, databases, or storage. But even in the Kubernetes world, I think there's an appetite, and we saw this with Fargate, to make the Kubernetes cluster disappear. If we can make it disappear, then we can focus on building new platforms that extend the API or, hell, just using Kubernetes as is without thinking about managing nodes, operating systems, and autoscalers. I think that's the topic I'm pretty interested in talking about, because that future means lots of things disappear, right? Programming languages and compilers made assembly disappear for a lot of developers. Assembly is still there. I think people get caught up on "nothing goes away." They're right.
Nothing goes away, but the number of people who have to interact with that thing is greatly reduced.

[00:05:21] BL: You know what, Kelsey? I'm going to have to ask you to get out of my brain, because that was the exact example I was going to use. I was on a bus today thinking about all the hubbub about the whole Fargate-EKS thing, and then I was thinking, "Well, Go, for example, generates assembler and then compiles that down." No one complains about the length of the assembler that Go generates. Who cares? That's how we should think about this problem. That's a wholly solvable problem. Let's think about bigger things.

[00:05:51] KH: I think it's because in operations we tend to identify ourselves as the people responsible for running the nodes. We're the people responsible for tuning the API server. When someone says it's going to go away; well, in ops, you see this in some parts, right? Some ops people focus a lot more on observability. They could care less what machine something runs on; they're still going to observe and tune it. You see this in SRE and various practices. But a lot of people who came up in a world like I did, with a traditional ops background: you were the one who PXE-booted the server. You installed that Linux OS. You configured it with Puppet. When someone tells you, "We're going to move on from that," as if it's a good thing, you're going to be like, "Hold up. That's my job."

[00:06:36] DC: Definitely. We've touched on this topic a couple of different times on this show as well, and it definitely comes back to understanding that, in my opinion, it's not about whether there will be work for people who are in operations, people who want to focus on that. The real question that comes to mind: there is so much of that work, so how are so few of us going to accomplish it unless we radically rethink how it will be done? We're vastly outnumbered. The number of people walking onto the internet for the first time every day is mind-boggling.

[00:07:08] KH: In the early days, we had this goal of abstracting or automating ourselves out of a job, and anyone that has tried that a number of times knows that you're always going to have something else to do. I think if we carry that to the infrastructure, I want to see the ops folks involved. I was very surprised that Docker didn't come from operations folks. It came from developer folks. Same thing for Vagrant, and the same thing for Kubernetes. These are developer-minded folks who want to tackle infrastructure problems. I think if ops were to put more skin in the game earlier on, they would definitely be capable of building these systems, and maybe these systems would even end up more mature as more operations people put ops-minded thinking into these problems.

[00:07:48] BL: Well, that's exactly what we should do. Like you said, Kelsey, we will always have a job. Whenever we solve one problem, we can think about more interesting problems. We don't think about Linux on servers anymore; we just put Linux on servers and run it. We don't think about the 15 years when it was a little rocky. That's gone now. So think about what we did there, and let's do that again with what we're doing now.

[00:08:12] KH: Yeah. I think the Prometheus community is a good example of operations-minded folks producing a system.
When you meet the originators of Prometheus, you see they took a lot of their operational knowledge and built this metrics and monitoring standard that we all think about now when we talk about observability. I think that's what happens when you have good operations people take prior experience and knowledge, which these days can be expressed in code: this is the kind of system they produce, a very robust and extensible API that is seeing a lot of adoption.

[00:08:44] BL: One more thing on Prometheus. Prometheus is six years old. Just think about that, and it's not done yet; it's just gotten better and better and better. We have to give up our old things so we can get better and better and better. That's just what I want to add.

[00:08:58] MG: Kelsey, looking at your own history of coming from ops, as I understood it, and now being one of the poster children in the Kubernetes world: you see the world changing to serverless, to higher abstractions, to more complex systems on one hand, but on the other side we have ops. Looking beyond or outside the world of Silicon Valley into traditional ops, the traditional large enterprise, and I don't mean to put anyone down here, I'm just throwing this out as a question: where do you think the current maturity level of these ops people is, and where do they need to go to keep up with these evolving, higher-level abstractions, where we don't really care about the nitty-gritty details?

[00:09:39] KH: Yes. This is a good question. I spend half of my time onsite with customers; I probably spend time with at least 100 customers a year, globally. I fly out and visit them on their home turf, and you definitely meet people at various skill levels and with various areas of responsibility. I want to be clear about the areas of responsibility: sometimes you're hired into an area of responsibility that's below your skillset. Some people are hired to manage batch jobs or to translate files from XML to JSON. That really doesn't say a lot about their skillset; it just describes their area of responsibility. So, shout out to all the people dealing with mainframes and having to handle that kind of work. But when you look at it, you have the opportunity to rise up to whatever level you want in terms of your education. When we talk about this particular question, some people really do see themselves as operators, and there's nothing wrong with that. Meaning, they come in, they get a system, and they turn the knobs. You give me a mainframe, I will tell you how to turn the knobs on that mainframe. You buy me a microwave, I'll tell you how to pop popcorn. They're not all that interested in building the microwave. Maybe other things are more important to them, and that is totally okay. Then you have people who are always trying to push the boundaries. Before Kubernetes, if I think back 10 years, maybe 8, when I was working in a traditional enterprise, like the ones you're talking about or hinting at, the goal was always to abstract away all the stuff it takes to deploy an application the right way in a specific environment for that particular company. The way I managed to do it was to say, "Hey, look. We have a very complex change management process." I worked in finance at that time.
So everything had to have a ticket, no matter how good the automation was. So I decided to make JIRA, the ticketing system, the front door to do everything. You go to JIRA; there's a custom field that says, "Hey, here are all the RPMs that have been QA'd by the QA team. Here are all the available environments." You put those two fields in, that ticket goes to change management for approval, and then something behind the scenes automates everything. In that case it was Puppet, Red Hat, and VMware, right? So I think, for most people who have been in the world of abstracting this stuff away and making it easier for the company to adapt, you've already been pushing the ideas that we call serverless now. The cloud providers put these labels on platforms to describe the contract between us and the consumer of the APIs that we present. But if you're in operations, you should have been trying to abstract away all of this stuff for the last 10 to 15 years.

[00:12:14] BL: I 100% agree. Then also, think about other verticals. 23 years ago, I did [inaudible 00:12:22] work. That was my job. But we learned how to program in C and C++ because we were on old Suns, not even SPARC machines; we were on the old Suns, and we wanted to write things in CVE and we wanted to write our own window managers. That is what we're doing right now, and that's why you see, say, Mitchell Hashimoto with Vagrant, and you're seeing how we keep pushing this thing. We have barely scratched the surface of what we're trying to do. For a lot of people who are just ops-minded, understand that being ops-minded is not the end. You have to be able to think outside of your boundaries so you can create the next big thing.

[00:12:58] KH: Or you may not care about creating the next big thing. There are parts of my life where I just don't care. For example, I pay Comcast for internet access, and my ops involvement was going to Best Buy, buying a modem, and screwing it into the wall, and I troubleshoot the thing every once in a while when someone in the household complains the internet is down. That's as far as I'm ever going to push the internet boundaries, right? I'm not really interested in pushing that forward. I'm assuming others will, and I think that's one thing in our industry: sometimes we believe that we all need to contribute to pushing things forward. Look, there's a lot of value in being a great operations person. Just be open to the idea that what we operate will change over time.

[00:13:45] DC: Yeah, that's fair. Very fair. For me, personally, I definitely identify as an operations person. I don't consider it my life's goal to create new work necessarily, but to expand on the work that has been identified and to help people understand the value of it. I find I sit in between two roles, personally. One is to help figure out all of the different edges and pieces and parts of Kubernetes or some other thing in the ecosystem. Second, to educate others on those things, right? Take what I've learned and amplify it; having that amplifying effect.

[00:14:17] CC: One thing that I wanted to ask you, Kelsey: I work on the Velero project, which does backup and recovery of Kubernetes clusters. Some people ask me, "Okay, so tell me about the people who are doing backups." And I'm like, "I don't want to talk about that. That's boring. I want to talk about the people who are not doing backups. Okay, let's talk about why you should maybe be thinking about that." Well, anyway.
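For anyone who hasn't seen Velero in action, a minimal session looks roughly like the sketch below, driven from Python for illustration. The backup name is made up, and it assumes the velero CLI is installed and already configured against a cluster and an object store:

    import subprocess

    # Sketch of a minimal Velero workflow via its CLI; "demo-backup" is a made-up name.
    subprocess.run(["velero", "backup", "create", "demo-backup"], check=True)
    subprocess.run(["velero", "backup", "get"], check=True)  # list backups and their status
    # Later, possibly on a different cluster, restore from that backup:
    subprocess.run(["velero", "restore", "create", "--from-backup", "demo-backup"], check=True)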
I wonder if you get a lot of questions in the area of Kubernetes operations, or cloud native in general, infrastructure, and so on, where in the back of your mind you go, "That's the wrong question," or "Those are the wrong questions." Do you get that?

[00:14:54] KH: Yeah. So let's use your backup example. When I hear questions, at least it lets me know what people are thinking and where they're at, and if I hear enough questions, I can get a pulse on the trend of where the majority of people are. Take the backup question. When I hear people say, "I want to back up my Kubernetes cluster," I rewind the clock in my mind and say, "Wow! I remember when we used to back up Linux servers," because we didn't know what config files were on the disk, and we didn't know what processes were running. So we used to take these ps snapshots, and we used to tar up the whole file system and store it somewhere so we could recover it. Remember Norton Ghost? You'd take a machine and ghost it so you could make it again. Then we said, "You know what? That's a bad idea. What we should be doing is having a tool that can make any machine look the way we want." Config management. That's boring now, so we don't back those machines up anymore. So when I hear that question, I ask, "Hmm, what is happening in the community that's leading people to ask these questions?" Because if I hear a bunch of questions that already have good answers, that means those answers aren't visible enough and not enough people are sharing these ideas. That should be my next keynote. Maybe we need to make sure other people know this is a solved thing now; even though it's boring to me, it's not boring to the industry in general. So when I hear these questions, it keeps me up to date, keeps me grounded. I hear stuff like, "How many Kubernetes clusters should I have?" I don't think there's a best practice around that answer. It depends on how your company segregates things, on how you understand Kubernetes, on the way you think about things. But I know why they're asking that question: because Kubernetes presents itself as a solution to a much broader problem set than it really addresses. Kubernetes manages a group of machines, typically backed by IaaS APIs. If you have that, that's what it does. It doesn't do everything else. It doesn't tell you exactly how you should run your business. It doesn't tell you how you should compartmentalize your product teams. Those decisions you have to make independently, and once you do, you can serialize them into Kubernetes. So that's the way I think about those questions when I hear them: "Wow! Yeah, it is a crazy thing that you're still asking this question six years later. But now I know why you're asking it."

[00:17:08] CC: That is such a great take on this, because, yes, in the area of backups, the people who are doing backups: whichever way they do it is independent of Kubernetes or not. But let's talk about the people who are not doing backups. What motivates you to not do backups? Obviously, backups can be done in many different ways. But, yes.
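As a concrete picture of the old practice Kelsey describes, snapshotting the process table and tarring up the file system because nobody knew what was on the box, here's a toy version. It's Unix-only, purely illustrative of the historical approach rather than a recommendation, and the file names are arbitrary:

    import subprocess
    import tarfile
    import time

    # Toy "ghost the box" backup: record running processes and archive config files.
    # Reading all of /etc may require elevated privileges.
    stamp = time.strftime("%Y%m%d-%H%M%S")

    with open(f"ps-snapshot-{stamp}.txt", "w") as out:
        subprocess.run(["ps", "aux"], stdout=out, check=True)  # process table snapshot

    with tarfile.open(f"etc-backup-{stamp}.tar.gz", "w:gz") as tar:
        tar.add("/etc")  # archive the config files we are not sure about

Config management flipped this around: instead of backing up what a machine happens to look like, you keep the recipe that can make any machine look that way.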
So that’s where people come from when it comes to hard things like backups. [00:17:52] KH: There’s a trust element too, right? Because we don’t know if the effort we’re putting in is worth it. When people do unit testing, a lot of the time unit testing can be seen as a proactive activity, where you write unit tests to catch bugs in the future. Some people only write unit tests when there’s a problem. Meaning, “Wow! There are odd things in the database. Maybe we should write a test to prove that our code is putting odd things in. Fix the code, and now the tests pass.” I think it’s really about trusting that the investment is worth it. I think when you start to think about backups – I’ve seen people back up a lot of stuff, like every day or every couple of hours they’re backing up their database, but they’ve never restored the database. Then when you read their root cause analysis, they’re like, “Everything was going fine until we tried to restore a 2 terabyte database over a 100 meg link. Yeah, we never exercised that part.” [00:18:43] CC: That is very true. [00:18:44] DC: Another really fascinating thing to think about with the backup piece, especially in the Kubernetes space with Velero and stuff, is that we’re so used to having the conversation around stateless applications and being able to ensure that you can redeploy in the case of a failure. You’re not trying to actually get back to a known state the way a backup traditionally would get you. You’re just trying to get back to a running state. So there’s a bit of a dichotomy there, I think, for most folks. Maybe they’re not conceptualizing the need to deal with some of those stateful applications when they start trying to think about how Velero fits into the puzzle, because they’ve been told over and over again, “This is about immutable infrastructure. This is about getting back to running. This is not about restoring some complex state.” So it’s kind of interesting. [00:19:30] MG: I think part of this is also that for the stateful services, which are why we do backups in the first place, things have changed a lot lately, right? With those new databases, scale-out databases, cloud services. Thinking about backup has also changed in the new world of being cloud native, which for most people is also a new learning experience, to understand: how should I back up Kafka? It’s replicated, but can I back it up? What about etcd and all those things? Slightly different things than backing up a SQL database on a more traditional system. So backup, I think, as systems become more complex, stays as needed for [inaudible 00:20:06]. [00:20:06] KH: Yeah. The question is what are you backing up and what do you hope to restore? So replication, global replication, like we do with cloud storage and S3: the goal is to give people 11 9s of reliability and replicate that data across almost as many geographies as you can. So it’s almost like this active backup. You’re always backing up and restoring as a part of the system design, versus it being an explicit action. Some people would say the type of replication we do for object stores is much closer to actively restoring and backing up on a continuous basis, versus a one-time checkpoint. [00:20:41] BL: Yeah. Just a little bit of a note: you can back up two terabytes over a 100 meg link in about 44 and a half hours. So just putting it out there, it’s possible. Just about two days. But you’re right. When it comes to backups, especially for like – Let’s say you’re doing MySQL or Postgres.
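Bryan’s estimate is easy to check. A minimal sketch of the arithmetic in Go, using only the numbers from the conversation (and ignoring protocol overhead, so a real restore would be slower):

package main

import "fmt"

func main() {
	const (
		backupBytes    = 2e12  // 2 terabytes, decimal bytes
		linkBitsPerSec = 100e6 // 100 megabit link
	)
	// Transfer time = payload in bits divided by link speed.
	seconds := backupBytes * 8 / linkBitsPerSec
	fmt.Printf("%.0f seconds, about %.1f hours\n", seconds, seconds/3600)
	// Prints: 160000 seconds, about 44.4 hours
}

Roughly a two-day restore window, which is exactly why the untested 2 terabyte restore in Kelsey’s root cause story hurt so much.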
These days, is it better to back it up, or is it better to have a replica right next to it, and then have like a 10-minute delayed replica right next to that, and then replicate to Europe or Asia, then constantly query the data that you’re replicating? That’s still a backup. What I’m saying here is that we can change the way that we talk about it. Backups don’t have to be as conventional as they used to be. There are definitely other ways to protect your data. [00:21:25] KH: Yeah. Also, I think the other part of the backup thing is: what is the price of data loss? When you take a backup, you’re saying, “I’m willing to lose this much data between the last backup and the next.” If that cost is too high, then backing up cannot be your primary mode of operation. Because the cost of losing data is way too high, replication becomes a complementing factor in the whole discussion of backups versus real-time replication and shorter times to recovery. I have a couple of questions. When should people not use Kubernetes? Do you know what I mean? I visit a lot of customers, I work with a lot of eng teams, and I am in the camp of Kubernetes is not for everything, right? That’s a very obvious thing to say. But some people don’t actually practice it that way. They’re trying to jam more and more into Kubernetes. So I’d love to get your insights on where you see Kubernetes being the wrong direction for some folks or workloads. [00:22:23] MG: I’m going to scratch this one from my question list to Kelsey. [00:22:26] KH: I’ll answer it too then. I’ll answer it after you all answer it. [00:22:29] MG: Okay. Who wants to go first? [00:22:30] BL: All right. I’ll go first. There are cases when I’m writing a piece of software where I don’t care about service discovery. I don’t care about ingress. It’s just software that needs to run. When I’m running it locally, I don’t need it. If it’s simple enough where I could basically throw it into a VM through a cloud-init script, I think that is actually lower friction than Kubernetes. Now, I’m also a little bit jaded here, because I work for the dude who created Kubernetes, and I’m paid to create solutions for Kubernetes, but I’m also really pragmatic about it as well. It’s all about effort for me. If I can do it faster with cloud-init, I will. [00:23:13] DC: For my part, I have a couple of follow-on questions to this real quick. But I do think that if you’re not actively trying to develop a distributed system, something where you’re actually making use of the primitives that Kubernetes provides, then that already would kind of be a red flag for me. If you’re building a monolithic application, or if you’re in that place where you’re just rapidly iterating on a SaaS product and you’re just trying to get as many commits on this thing until it works, and really rapidly prototype or even create this thing, maybe Kubernetes isn’t the right thing. Because although we’ve come a long way in improving the tools that allow for that iteration, I certainly wouldn’t say that we’re all the way there yet. [00:23:53] BL: I would debate you on that, Duffie. [00:23:55] DC: All right. Then the other part of it is, Kubernetes aside, I’m curious about the same question as it relates to containerization. Is containerization the right thing for everyone, or have we made that pronouncement, for example?
[00:24:08] KH: I’m going to jump in and answer this one, because I definitely think we need a way to transport applications in some way, right? We used to do it on floppy disks. We used to do it on [inaudible 00:24:18]. The container, I treat as a glorified [inaudible 00:24:23]. That’s the way I’ve been seeing it for years. Registries store them. They replace [inaudible 00:24:28]. Great. Now we kind of have a more universal packaging format that can handle the simple use cases, scratch containers where it’s just your binary, and the more complex use cases where you have to compose multiple layers to get the output, right? RPM spec files used to do something very similar when you start to build those things in [inaudible 00:24:48]. “All right. We got that piece.” Do people really need them? The thing I get wary about is when people believe they have to have Kubernetes on their laptop to build an app that will eventually deploy to Kubernetes, right? If we applied that thinking to the cloud, then everyone would be trying to install OpenStack on their laptop just to build an app. Does that even make sense? Does that make sense in that context? Because you don’t need the entire cloud platform on your laptop to build an app that’s going to take a request and respond. With Kubernetes, I guess because it’s easier to put it on your laptop, people believe that it needs to be there. So I think Kubernetes is overused, because people just don’t quite understand what it does. I think there’s a case where you don’t use Kubernetes, like: I need to read a file from a bucket. Someone uploaded an XML file and my app is going to translate it into JSON. That’s it. In that case, this is where I think functions as a service, something like Cloud Run or even Heroku, makes a lot more sense to me, because the operational complexity is kind of hidden within a provider and is linked almost like an SDK to the overall service, which is the object store, right? The compute part I don’t want to make a big deal about, because it’s only there to process the file that got uploaded, right? It’s almost like a plug-in to an FTP server, if you will. Those are the cases where I start to see Kubernetes become less of a need, because why would I need a custom platform to do such an obvious operation? [00:26:16] DC: For those applications that do require the primitives that Kubernetes provides, service discovery, the ability to define ingress in a normal way: when you’re actually starting to figure out how you’re going to platform that application with regard to those primitives, I do see the argument for having Kubernetes locally, because you’re going to be using those tools locally and remotely. You have some way of defining what that platforming requirement is. [00:26:40] KH: So let me pull on that thread. If you have an app that depends on another app, typically we used to just have a command line flag that says, “This app is over there.” Localhost when it’s on my laptop, some DNS name when it’s in the cluster; or a config file can satisfy that need. So the need for service discovery usually arises when you don’t know where things are. But if you’re literally on your laptop, you know where the things are. You don’t really have that problem. So when you bring that problem space to your laptop, I think you’re actually making things worse. I’ve seen people depend on Kubernetes service discovery for the app to work. Meaning, they just assume they can call a thing by name, and they don’t support IPs and ports.
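A minimal sketch of the flag pattern Kelsey describes, in Go; the flag name and endpoint are illustrative, not from the episode. On a laptop the localhost default just works, and in a cluster the pod spec passes a Kubernetes service DNS name, so the application itself never calls the Kubernetes API:

package main

import (
	"flag"
	"fmt"
	"net/http"
)

func main() {
	// Laptop: the default works as-is.
	// Cluster: pass -backend-addr=http://backend.default.svc.cluster.local:8080
	// in the pod spec's args; the binary is identical in both places.
	backend := flag.String("backend-addr", "http://localhost:8080", "address of the backend service")
	flag.Parse()

	resp, err := http.Get(*backend + "/healthz")
	if err != nil {
		fmt.Println("backend unreachable:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("backend status:", resp.Status)
}

The same need can be met with an environment variable or a config file entry; the point is that the dependency’s address arrives as plain configuration rather than through cluster-only name resolution the app cannot live without.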
Those apps don’t support anything else, because they say, “Oh! No, no, no. You’ll always be running in Kubernetes.” You know what’s going to happen? In 5 or 10 years, we’re going to be talking like, “Oh my God! Do you remember when we used to use Kubernetes? Man! That legacy thing. I built my whole career porting apps away from Kubernetes to the next thing.” The number one thing we’ll talk about is where people leaned too hard on service discovery, or people who built apps that talk to config maps directly. Why are you calling the Kubernetes API from your app? That’s not a good design. I think we’ve got to be careful coupling ourselves too much to the infrastructure. [00:27:58] MG: That’s a fair point too. Two answers from my end to your question. One is, I just built an appliance, which basically tries to bring an AWS Lambda experience to the vSphere ecosystem. My approach is that I don’t want ops people who need to do some one-off thing, like connecting this guy to another guy, to have to learn Kubernetes for that. It should be as simple as writing a function. So for that appliance, we had to decide how we build it, because it should be scalable, and we might have some function-as-a-service component running on there. So we looked around and we decided to put it on Kubernetes: we built the appliance as a traditional VM with Kubernetes on top. For me as a developer, it gave me a lot of capabilities, like the self-healing capabilities. But it’s also a fair point that you wrote about, Kelsey: how much do we depend on, or write our applications to depend on, those auxiliary features from Kubernetes, like self-healing and restarts, for example? [00:28:55] KH: Well, in your case, you’re building a platform. I would hate for you to tell me that you rebuilt a Kubernetes-like thing just for that appliance. In your case, it’s a great use case. I think the problem that we have as platform builders is: what happens when things start leaking up to the user? You tell a user all they have to care about is functions. Then they get some error saying, “Oh! There’s some Kubernetes security context that doesn’t work.” They’re like, “What the hell is Kubernetes?” That leakage is the problem, and I think that’s the part where we have to be careful. It will take time, but we can’t start leaking the underlying platform and making the original goal untrue. [00:29:31] MG: The point where I wanted to throw this question back was: now these functions are being written as simple scripts, whatever, and the operators put them in. They run on Kubernetes, but the operators don’t know that they run on Kubernetes. Going back to your question of when we should not use Kubernetes: is it me writing in a higher-level abstraction, like a function, not using Kubernetes in the first sense, because I don’t actually know I’m using it, while under the covers I’m still using it? So it’s kind of an answer and not an answer to your question, because – [00:29:58] KH: I’ve seen these single-node appliances. There’s only one node, right? They’re only there to provide, say, email at a grocery store. You don’t have a distributed system. Now, what people want is the Kubernetes API: the way it deploys things, the way it swaps out a running container for the next one. We want that Kubernetes API. Today, the only way to get it is by essentially bringing up a whole Kubernetes cluster. I think the K3s project is trying to simplify that by re-implementing Kubernetes. No etcd, SQLite instead.
A single binary that has everything. So I think when we start to say what is Kubernetes, there’s the implementation, which is a big distributed system, and then there’s the API. I think what’s going to happen is, if you want the Kubernetes API, you’re going to have so many more choices on the implementation side that make better sense for the target platform. So if you’re building an appliance, you’re going to look at K3s. If you’re a cloud provider, you’re probably going to look at something like what we see on GitHub, right? You’re going to modify it and integrate it into your cloud platform. [00:31:00] BL: Or maybe what happens with Kubernetes over the next few years is what happened with the Linux API, the syscall interface. Firecracker and gVisor did this, and WSL did this. We can basically swap out Linux from the backend, because we can just reimplement the calls. Maybe that will happen with Kubernetes as well. So maybe Kubernetes will become a standard, where there’s the Kubernetes standard and then there’s the Kubernetes implementation that we have right now. I don’t even know about that one. [00:31:30] KH: We’re starting to see it, right? When you say, here is my pod, we can just look at Fargate for EKS as an example. When you give them a pod, their implementation is definitely different than what most people are thinking about running these days, right? One pod per VM, not using virtual kubelet. So they’ve taken that pod spec and tried to uphold its meaning. But the problem with that is you get leaks. For example, they don’t allow you to bind to a host port. Well, the pod spec says you can bind to a host port. Their implementation doesn’t allow you to do it, and we see the same problem with gVisor. It doesn’t implement all the system calls. You couldn’t run the Docker daemon on top of gVisor. It wouldn’t work. So I think we’re fine as long as we don’t leak, because when we leak, then we start breaking stuff. [00:32:17] BL: So we’re doing the same thing with Project Pacific here at VMware, where this concept of a pod is actually a virtual machine that boots in like a tenth of a second. It’s pretty crazy how they’ve been able to figure that out. If we can get this right, that’s huge for us. That means we can move out of our appliance and we can create better things that actually work. If I’m VMware-specific, I can use that. If I’m on AWS and I want this namespace, I can use Fargate and EKS. That’s actually a great idea. [00:32:45] MG: I remember this presentation, Kelsey, that you gave, I think two or three years ago, where you took the Kubernetes architecture and you removed the boxes, and the only thing remaining was the API server. This is where it clicked for me, like, “This is right,” because I had been focused on the scheduler. I wanted to understand the scheduler. But then you zoomed out, or you stripped off all these pieces, and the only thing remaining was the API server. It’s like [inaudible 00:33:09], or like the syscall interface. It’s basically my API to do some crazy things that I would have had to write on my own, in assembly almost, before I could even get started. That was the breakthrough moment for me, that specific presentation. [00:33:24] KH: I’m working on an analogy to talk about what’s happening with the Kubernetes API, and I haven’t refined it yet. But when the web came out, we had all of these HTTP verbs: PUT, POST, GET. We have a body. We have headers. You can extract that out of the whole web, the web browser plus the web server.
If you extract out that one piece, then instead of building web pages, we can build APIs and GraphQL, because we can reuse many of those mechanisms, and we just call those RESTful interfaces. Kubernetes is going through the same evolution, right? The first thing we built was this container orchestration tool. But if you look at the CRDs, the way we do RBAC, the way we think about the status field in a custom object – if you extract those components out, then you end up with these Kubernetes-style APIs where we start to treat infrastructure not as code, but as data. That will be the RESTful moment for Kubernetes, right? With the web, we extracted it out, and then we had REST interfaces. With Kubernetes, once we extract it out, we’ll end up with this declarative way of describing maybe any system. But right now, the perfect match is infrastructure: infrastructure as data, and using these CRDs to allow us to manipulate that data. So maybe you start with Helm, and then Helm gets piped into something like Kustomize. That then gets piped into an admission controller. That’s how Kubernetes actually works, and that data-model approach to API development, I think, is going to be the unique thing that lasts longer than the Kubernetes container platform does. [00:34:56] CC: But if you’re talking about – correct me if I misinterpret it – platform as data: data to me is meant to be consumed, and I actually have been thinking, since you said, “Oh, developers should not be developing apps that connect directly to Kubernetes,” or I think you said the Kubernetes API. Then I was thinking, “Wait. I’ve heard so many times people saying that that’s one great benefit of Kubernetes, that the apps have that access.” Now, if you see my confusion, please clarify it. [00:35:28] KH: Yeah. Right. I remember early on when we were doing config maps, there was a big debate about how config maps should be consumed by the average application. One way could be: let’s just make a config map API and tell every developer that they need to import a Kubernetes library to call the API server, right? Now everybody’s app no longer works on your laptop. So we were like, “Of course not.” What we should do is have config maps be injected into the file system. That’s why you can actually describe a config map as a volume and say, “Take these key values from the config map, write them as normal files, and inject them into the container so you can just read them from the file system.” The other option was environment variables. You can take a config map and translate it into environment variables. Lastly, you can take those environment variables and put them into command line flags. The whole point of that is to cover the three most popular ways of configuring an app: environment variables, command line flags, and files. Kubernetes molded itself into that world so that developers would never tightly couple themselves to the Kubernetes API. Now, let’s say you’re building a platform: you’re building a workflow engine like Argo, or you’re building a network control plane like Istio. Of course you should use the Kubernetes API. You’re building a platform on top of a platform. I would say that’s the exception to the rule, if you’re building a platform. But for a general application that’s leveraging the platform, I really think you should stay away from the Kubernetes API directly. You shouldn’t be making syscalls directly [inaudible 00:37:04] of your runtime, or using the unsafe package in Go.
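To make the config map point concrete, here is a minimal sketch, assuming hypothetical names, of an app consuming configuration the three ways Kelsey lists, with no Kubernetes client library anywhere in sight; the flag, variable, and mount path are illustrative, not from the episode:

package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	// 1. Command line flag: Kubernetes can feed this from a ConfigMap
	//    key through the pod spec's args.
	logLevel := flag.String("log-level", "info", "logging verbosity")
	flag.Parse()

	// 2. Environment variable: populated from a ConfigMap key via
	//    the pod spec's env section, or exported by hand on a laptop.
	dbHost := os.Getenv("DB_HOST")

	// 3. File: a ConfigMap mounted as a volume shows up as plain files;
	//    on a laptop it's just a file on disk, or absent.
	cfg, err := os.ReadFile("/etc/myapp/config.yaml")
	if err != nil {
		cfg = []byte("(no config file; using defaults)")
	}

	fmt.Println("log level:", *logLevel)
	fmt.Println("db host:", dbHost)
	fmt.Println("config file contents:", string(cfg))
}

The app runs unchanged on a laptop with plain files, env vars, and flags, and in a cluster where Kubernetes materializes the same values from a ConfigMap, which is exactly the decoupling being argued for.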
Once you start doing that kind of thing, making syscalls directly or reaching for unsafe, Go can’t really help you anymore. You start pinning yourself to specific threads. You’re going to have a bad time. [00:37:15] CC: Right. Okay. I think I get it. But you can still use Kubernetes to decouple your app from the machine by using objects to declare those dependencies. [00:37:25] KH: Exactly. That was the whole benefit of Kube, and Docker even, saying, “You know what? Don’t worry too much about cgroups and namespaces. Don’t even try to do that yourself.” Because remember, there was a period of time when people were actually trying to build cgroups and network namespaces into the runtime. There were a bunch of Ruby and Python projects that were trying to containerize themselves within the runtime. Whoa! What are we doing? Having that second layer now, with containerd and runc, we don’t have to implement that 10,000 times for every programming language. [00:37:56] DC: One of the things I want to come back to is the point you made about the Kubernetes API being one of the more attractive parts of the project, and people needing that to kind of move forward in some of these projects. I wonder if it’s more abstract than that. I wonder if it’s abstract enough to think about in terms of level-triggered versus edge-triggered stuff: taking the control theory that basically makes Kubernetes such a stable project and applying that to software architecture, rather than necessarily bringing the entire API with you. Perhaps what you should take from this is the lessons that we’ve learned in developing Kubernetes, and apply those to your software. [00:38:33] KH: Yeah. I had the fortune of spending some time with Mark Burgess. He came up with Promise Theory, and Promise Theory is the underpinning of Puppet, Chef, Ansible, CFEngine: this idea that we would make promises about something and eventually converge to that state. The problem was that with Puppet, Chef and Ansible, we were basically doing this with shell scripts and Ruby. We were trying to write all of these if, and, else statements. When those didn’t work, what did you do? You made an exec statement at the bottom and then you’re like, “Oh! Just run some bash, and who knows what’s going to happen?” In those early implementations of Promise Theory, we didn’t own the resource that we were making promises about. Anyone could go behind us and remove the user, or the user could have a different user ID on different systems but mean the same thing. In the Kubernetes world, we push a lot of those if, else statements into the controller. Now, we force the API not to have any code. That’s the big difference. If you look at the Kubernetes API, you can’t do if statements. In Terraform, you can do if statements, so you kind of fall into the imperative trap at the worst moments, when you’re doing dry runs or something like that. It does a really good job of it, don’t get me wrong. But the Kubernetes API says, “You know what? We’re going to go all-in on this idea.” You have to change the controller first and then update the API. There are no escape hatches in the API. So it forces a discipline that I think gets us closer to the promises, because we know that the controller owns everything. There’s no way to escape in the API itself. [00:40:07] DC: Exactly. That’s exactly what I was pushing for. [00:40:09] MG: I have a somewhat related question and I’m just not sure how to frame it correctly.
So yesterday I saw a good talk by someone talking about protocols, the somewhat forgotten power of protocols in the world of APIs. We’ve got Swagger. We’ve got API definitions. But he made the very simple point that if I give you an open, a close, and a write and read method in an API, you still don’t know how to call them: in which sequence, and which one to call first. It’s the same for the [inaudible 00:40:36] library if you look at that. So I always have to ask myself, “Should I do anything [inaudible 00:40:40] or am I not leaking some stuff?” So I look it up. Whereas with protocols, if you look at the RFC definitions, they are very, very precise and they plainly outline what you should do, how you should behave, how you should communicate between these systems. This is more about communication and less about the actual implementation of an API. I still have to go through that talk again, and I’m going to put it in the show notes. But it opened my mind again a little bit to thinking more about communication between systems and contracts and promises, as you said, Carlisia. Because we make so many assumptions in our code, especially as we have to write a lot of stuff very quickly, and I think that will make things brittle over time. [00:41:21] KH: So the gift and the curse of Kubernetes is that it tries to do both all the time. For some things, like a pod or a deployment, we all feel that. If I give any Kubernetes cluster a deployment object, I’m going to get back a running pod. This is what we all believe. But the thing is, it may not necessarily run on the same kernel. It may not run on the same OS version. It may not even run on the same type of infrastructure, right? This is where I think Kubernetes ends up leaking some of those protocol promises. A deployment gets you a set of running pods. But then we drop down to a point where you can actually do your own API and build your own protocol. I think you’re right. Istio is a protocol for thinking about service mesh, whereas Kubernetes provides the API for building such a protocol. [00:42:03] MG: Yeah, good point. [inaudible 00:42:04]. [00:42:04] DC: On the Fargate stuff, there was a really interesting article, or actually, an interesting project by [inaudible 00:42:10], and I want to give him a shout-out on this, because I thought it was really interesting. He wrote an admission controller that leverages the autoscaler, node affinity and pod affinity to effectively do the same thing, so that whenever a new pod is created, it will spin up a new machine and associate only that pod with that machine. I was like, “What a fascinating project.” But also just seeing this come up from the whole Fargate ECS stuff. I was like – [00:42:34] KH: I think that’s the thread that virtual kubelet is pulling on, right? This idea that you can simplify autoscaling if you remove that layer, right? Because right now we’re trying to do this musical chairs dance, right? Like in a cloud: imagine if someone gave you the hypervisor and told you you’re responsible for attaching hypervisor workers to the VMs. It would be a nightmare. Yet that’s how we talk about autoscaling in the cloud. I think Kubernetes is moving into a world of one pod per resource envelope. Today we call them VMs, but I think at some point we’re going to drop the VM and we’ll just call it a resource envelope. A VM is the way we think about that today. Firecracker is like, “Hey, does it really need to be a complete VM?” Firecracker is saying, “No. It doesn’t.
It just needs to be a resource envelope that allows you to run that particular workload.” [00:43:20] DC: Yeah. It’s the same thing we’re doing here. It’s just enough VM to get you to the point where you can drop those containers onto it. [00:43:25] CC: Kelsey, question. Edge? Kubernetes on edge. Yes or no? [00:43:29] KH: Again, compute on edge has been a topic of discussion forever. The problem is, when some people say compute on edge, they mean go buy some servers from Dell and put them in some building as close to your property as you can. But then you have to go build the APIs to deploy to that edge. What people want, and I don’t know how far off it is, is this: Kubernetes has set the bar so high that the Kubernetes API comes with a way to load balance, attach storage, all of these things, by just writing a few YAML files. What I hear people saying is, “I want that as close to my data center or store as possible.” When they say Kubernetes on the edge, that’s what they’re saying: what we currently have at the edge is not enough. We’ve been providing edge for a very long time. Remember OpenStack? “Oh! We’re going to do OpenStack on the edge.” But then you’re a pseudo cloud provider without the APIs. I think what Kubernetes is bringing to the table is that we have to have a default load balancer, we have to have a default block store, we have to have a default everything, in order for it to mean Kubernetes at the edge the way it does today, centralized. [00:44:31] BL: Well, stores have been doing this forever in some form or another. 20 years ago I worked for a duty-free place, and literally traveled all over the world replacing point of sale. You might think of point of sale as a cash register. There was a computer in the back, and there were RS-232 links from the cash registers to the computer in the back. Then there was dial-up, or a [inaudible 00:44:53] line to our central thing. We’ve been doing edge for a long time, but now we can do edge where the central facility actually manages the compute infrastructure. All they care about is basically CPU and memory and network storage now, and it’s a lot more flexible. The journey is long, but I think we’re going to do it. It’s going to happen, and I think we’re almost there – people are definitely experimenting. [00:45:16] KH: You know what, Carlisia? You know what’s interesting now, though? I was watching the re:Invent announcements. Verizon is starting to allow these edge components to leverage 5G for the last mile, and that’s a game-changer, because most people are very skeptical about 5G being able to provide the same coverage as 4G, because of the wavelength and point-to-point, all of these things. But for edge, this thing is a game-changer. Higher bandwidth, but shorter distance. This is exactly what edge wants, right? Now you don’t have to dig up the ground and run fiber from point to point. So if you combine these Kubernetes APIs with concepts like 5G and get that closer to people, yeah, I think that’s going to change the way we think about regions and zones. That kind of goes away. We move closer to CDNs, like Cloudflare has been experimenting with their Workers technology. [00:46:09] DC: On the edge stuff, I think that there’s also an interesting dichotomy happening, right?
There’s the definition of edge that we referred to, which is the storage stuff, and the one that you’re alluding to, which is that there may be some way of actually having some edge capability in a point of presence at a 5G tower, or some point like that. In some cases, edge means data gravity. You’re actually taking a bunch of data from sensors and you’re trying to store it in a place where you don’t have to pay the cost of moving all of the data from one point to another where you could centralize compute. So in those edge cases, you’re actually willing to invest in high-end compute to allow for the manipulation of that data where that data lake is, so that you can afford to move it into some centralized location later. But I think that whole space is so complex right now, because there are so many different definitions and so many different levels of constraints that you have to solve for under one umbrella term, which is the edge. [00:47:04] KH: I think Bryan was pulling on that with the POS stuff, right? Because instead of you going to buy your own cash register and gluing everything together, that whole space got so optimized that you can just buy a Square terminal, plug it into some Wi-Fi, and there you go, right? You now have that thing. So once we start to do this for ML capabilities, security capabilities, I think you’re going to see that POS-like thing expand, and that computer get a little bit more robust to do exactly what you’re saying, right? Keep the data local. Maybe you ship models to that thing so that it can get smarter over time, and then upload the data from various stores over time. [00:47:40] DC: Yup. [00:47:40] MG: One last question from my end, switching gears a bit, if you allow it. KubeCon. I left KubeCon with some mixed feelings this year. But my perspective is different, because I’m not the typical one of the 12,000 people; most of them were newcomers, actually. So I looked at them and I asked myself, “If I were new to this huge, big world of the CNCF and Kubernetes and all this stuff, what would I take from it?” I would be confused. Confused by how, from the [inaudible 00:48:10] talks, which make it sound like it’s so complex to run all these things, through the keynotes, which seemed to be just a lineup of different projects that I all have to get through and install and run. I was missing some perspective and some clarity at KubeCon this year, especially for newcomers. Because I’m afraid that if we don’t retain them, attract them, and maybe make them contributors – because that’s another big problem – I’m afraid that we’ll lose the base that is using Kubernetes. [00:48:39] BL: Before Kelsey says anything – and Kelsey was a KubeCon co-chair before I was, but I was a KubeCon co-chair this time – I can tell you exactly why everything is like it is. Fortunately and unfortunately, this cloud native community is huge now. There’s lots of money. There are lots of people. There are lots of interests. If we went back to KubeCon when it was in San Francisco years ago, or even the first Seattle one, that was a community event. We could make the event for the community. Now, there’s the community, the people who are creating the products. There are the end users, the people who are consuming the products. And there are these big corporations and companies, the people who are actually financing this whole entire thing. We have to balance all three of those. As a person who just wants to learn, what are you trying to learn from?
Are you learning from the consumption piece? Are you learning to be a vendor? Are you learning to be a contributor? We have to think about that. At a certain point, that’s good for Kubernetes. That means we’ve been able to do the whole chasm thing. We’ve crossed the chasm. This thing is real. It’s big. It’s going to make a lot of people a lot of money one day. But I do see the issue for the person who’s trying to come in and say, “What do I do now?” Well, unfortunately, it’s like anything else. Where do you start? Well, you’ve got to take it all in. You need to figure out where you want to be. I’m not going to be the person that tells you, “Well, go do a SIG.” That’s not it. What I want to tell you is that, like anything else we have to learn, it’s real hard, whether it’s a programming language or a new technique. Figure out where you want to be, and you’re going to have to do some research. Then hopefully you can contribute. I’m sure Kelsey has opinions on this as well. [00:50:19] KH: I think Bryan is right. I mean, I think it’s just like a pyramid happening. At the very bottom, we’re new. We need to get everybody together in one space, and it becomes more of a tradeshow, like an introduction, like a tasting, right? When you’re hungry, you go and just taste everything. Then when you figure out what you want, that will be your focus, and that’s going to change every year for a lot of people. Some people go from consumer to contributor, and then they’re going to want something else out of the conference. They’re only going to want to go to the contributor day and maybe some of the deep-dive technical tracks. You’re trying to serve everybody in two or three days, so you’re going to have everything pulling at your attention. I think what you’ve got to do is commit. If you go and you’re a contributor, or you’re someone who’s building on top, you may have to find a separate event to go with it, right? Someone told me, “Hey, when you go to all of these conferences, make sure you don’t forget to invest in the one-on-one time.” Me going to Oslo and spending an evening with Mark Burgess to really talk about Promise Theory, outside of competing for attention with the rest of the conference. When I go, I like to meet new people. Sit down with them. Out of the 12,000 people, I call it a win if I can meet three new people that I’ve never met before. You know what? I’ll do a follow-up hangout with them to go deeper in some areas. So I think it’s more of a catch-all. It definitely has a tradeshow feel now, because it’s big and there’s a lot of money and opportunity involved. But at the same time, you’ve got to know that, “Hey, you’ve got to go and seek out.” You go to Spotif
Welcome to the first episode of The Podlets Podcast! On the show today we’re kicking it off with some introductions to who we all are, how we got involved with VMware and a bit about our career histories up to this point. We share our vision for this podcast and explain the unique angle from which we will approach our conversations, a way that will hopefully illuminate some of the concepts we discuss in a much greater way. We also dive into our various experiences with open source, share what some of our favorite projects have been and then we define what the term “cloud native” means to each of us individually. The contribution that the Cloud Native Computing Foundation (CNCF) is making in the industry is amazing, and we talk about how they advocate for the programs they adopt and just generally impact the community. We are so excited to be on this podcast and to get feedback from you, so do follow us on Twitter and be sure to tune in for the next episode! Note: our show changed its name to The Podlets. Follow us: https://twitter.com/thepodlets Hosts: Carlisia Campos, Kris Nóva, Josh Rosso, Duffie Cooley, Nicholas Lane
Key Points from This Episode:
An introduction to us, our career histories and how we got into the cloud native realm.
Contributing to open source and everyone’s favorite project they have worked on.
What the purpose of this podcast is and the unique angle we will approach topics from.
The importance of understanding the “why” behind tools and concepts.
How we are going to be interacting with our audience and create a feedback loop.
Unpacking the term “cloud native” and what it means to each of us.
Differentiating between cloud native apps and cloud native infrastructure.
The ability to interact with APIs as the heart of cloud native.
More about the Cloud Native Computing Foundation (CNCF) and their role in the industry.
Some of the great things that happen when a project is donated to the CNCF.
The code of conduct that you need to adopt to be part of the CNCF.
And much more!
Quotes: “If you tell me the how before I understand what that even is, I'm going to forget.” — @carlisia [0:12:54] “I firmly believe that you can't – that you don't understand a thing if you can't teach it.” — @mauilion [0:13:51] “When you're designing software and you start your main function to be built around the cloud, or to be built around what the cloud enables us to do and the services the cloud offers you, that is when you start to look at cloud native engineering.” — @krisnova [0:16:57]
Links Mentioned in Today’s Episode:
Kubernetes — https://kubernetes.io/
The Podlets on Twitter — https://twitter.com/thepodlets
VMware — https://www.vmware.com/
Nicholas Lane on LinkedIn — https://www.linkedin.com/in/nicholas-ross-lane
Red Hat — https://www.redhat.com/
CoreOS — https://coreos.com/
Duffie Cooley on LinkedIn — https://www.linkedin.com/in/mauilion
Apache Mesos — http://mesos.apache.org/
Kris Nova on LinkedIn — https://www.linkedin.com/in/kris-nova
SolidFire — https://www.solidfire.com/
NetApp — https://www.netapp.com/us/index.aspx
Microsoft Azure — https://azure.microsoft.com/
Carlisia Campos on LinkedIn — https://www.linkedin.com/in/carlisia
Fastly — https://www.fastly.com/
FreeBSD — https://www.freebsd.org/
OpenStack — https://www.openstack.org/
Open vSwitch — https://www.openvswitch.org/
Istio — https://istio.io/
The Kubelets on GitHub — https://github.com/heptio/thekubelets
Cloud Native Infrastructure on Amazon — https://www.amazon.com/Cloud-Native-Infrastructure-Applications-Environment/dp/1491984309
Cloud Native Computing Foundation — https://www.cncf.io/
Terraform — https://www.terraform.io/
KubeCon — https://www.cncf.io/community/kubecon-cloudnativecon-events/
The Linux Foundation — https://www.linuxfoundation.org/
Sysdig Falco — https://sysdig.com/opensource/falco/
OpenEBS — https://openebs.io/
Aaron Crickenberger — https://twitter.com/spiffxp
Transcript: [INTRODUCTION] [0:00:08.1] ANNOUNCER: Welcome to The Podlets Podcast, a weekly show that explores cloud native one buzzword at a time. Each week, experts in the field will discuss and contrast distributed systems concepts, practices, tradeoffs and lessons learned to help you on your cloud native journey. This space moves fast and we shouldn’t reinvent the wheel. If you’re an engineer, operator or technically minded decision maker, this podcast is for you. [EPISODE] [0:00:41.3] KN: Welcome to the podcast. [0:00:42.5] NL: Hi. I’m Nicholas Lane. I’m a cloud native architect. [0:00:45.0] CC: Who do you work for, Nicholas? [0:00:47.3] NL: I work for VMware, formerly of Heptio. [0:00:50.5] KN: I think we’re all VMware, formerly Heptio, aren’t we? [0:00:52.5] NL: Yes. [0:00:54.0] CC: That is correct. It just happened that way. Now Nick, why don’t you tell us how you got into this space? [0:01:02.4] NL: Okay. I originally got into the cloud native realm working for Red Hat as a consultant. At the time, I was doing OpenShift consultancy. Then my boss, Paul, Paul London, left Red Hat and I decided to follow him to CoreOS, where I met Duffie and Josh. We were on the field engineering team there and the sales engineering team. Then from there, I found myself at Heptio and now with VMware. Duffie, how about you? [0:01:30.3] DC: My name is Duffie Cooley. I’m also a cloud native architect at VMware, also recently Heptio and CoreOS. I’ve been working in technologies like cloud native for quite a few years now. I started my journey moving from virtual machines into containers with Mesos.
I spent some time working on Mesos and actually worked with a team of really smart individuals to try and develop an API in front of that crazy Mesos thing. Then we realized, “Well, why are we doing this? There is one that’s called Kubernetes. We should jump on that.” That’s the direction my time with containerization and cloud native stuff has taken. How about you, Josh? [0:02:07.2] JR: Hey, I’m Josh. Similar to Duffie and Nicholas, I came from CoreOS and then to Heptio and then eventually VMware. I actually got my start in the middleware business, oddly enough, where we worked on the Egregious Spaghetti Box, or the ESB as it’s formally known. I got to see over time how folks were doing a lot of these, I guess, more legacy monolithic applications, and that sparked my interest in learning a bit about some of the cloud native things that were going on. At the time, CoreOS was at the forefront of that. It was a natural progression based on those interests, and I had a really good time working at Heptio with a lot of the folks that are on this call right now. Kris, you want to give us an intro? [0:02:48.4] KN: Sure. Hi, everyone. Kris Nova. I’ve been doing SRE, DevOps, infrastructure for about a decade now. I used to live in Boulder, Colorado. I came out of a couple startups there. I worked at SolidFire; we went to NetApp, where I used to work on the Linux kernel some. Then I was at Deis for a while, when I first started contributing to Kubernetes. We got bought by Microsoft. I left Microsoft, the Azure team, where I was working on the original managed Kubernetes. Left that team, joined up with Heptio, met all of these fabulous folks. I think I wrote a book, and I’ve been doing a lot of public speaking and some other junk along the way. Yeah. Hi. What about you, Carlisia? [0:03:28.2] CC: All right. I think it’s really interesting that all the guys are lined up on one call and all the girls on another call. [0:03:34.1] NL: We should have probably broken it up more. [0:03:36.4] CC: I am a developer and have always been a developer. Before joining Heptio, I was working for Fastly, which is a CDN company, helping them build the latest generation of their TLS management system. At some point during my stay there, Kevin Stuart, who had joined Heptio, was posting on Twitter. At this point, Heptio was about, I don’t know, between six months and a year old. I saw those tweets go by and I’m like, “Yeah, that sounds interesting, but I’m happy where I am.” I have a very good friend, Kennedy, actually. He saw those tweets and he kept saying to me, “You should apply. You should apply, because they are great people. They do great things. Kubernetes is so hot.” I’m like, “I’m happy where I am.” Eventually, I contacted Kevin and he also said, “Yeah, it would be a perfect match.” Two months later I decided to apply. The people are amazing. I did think that Kubernetes was really hard, but my decision-making came down to two things. The first was the people, who are amazing, and some of whom I already knew from previous opportunities. I mean, I love everyone. The other thing was that it was an opportunity for me to work with open source. I definitely could not pass that up. I could not be happier to have made that decision. Now, with VMware acquiring Heptio, like everybody here I’m at VMware. Still happy. [0:05:19.7] KN: Has everybody here contributed to open source before? [0:05:22.9] NL: Yup, I have. [0:05:24.0] KN: What’s everybody’s favorite project they’ve worked on?
[0:05:26.4] NL: That's an interesting question. From a business aspect, I really like Dex. Dex is an identity provider, or a middleware for identity provider. It provides an OIDC endpoint for multiple different identity providers. You can absorb them into Kubernetes. Since Kubernetes only has an OIDC – only accepts OIDC job tokens for authentication, that functionality that Dex provides is probably my favorite thing. Although, if I'm going to be truly honest, I think right now the thing that I'm the most excited about working on is my own project, which is starting to join like me, joining into my interest in doing Chaos engineering. What about you guys? What’s your favorite? [0:06:06.3] KN: I understood some of those words. NL: Those are things we'll touch on on different episodes. [0:06:12.0] KN: Yeah. I worked on FreeBSD for a while. That was my first welcome to open source. I mean, that was back in the olden days of IRC clients and writing C. I had a lot of fun, and still I'm really close with a lot of folks in the FreeBSD community, so that always has a special place in my heart, I think, just that was my first experience of like, “Oh, this is how you work on a team and you work collaboratively together and it's okay to fail and be open.” [0:06:39.5] NL: Nice. [0:06:40.2] KN: What about you, Josh? [0:06:41.2] JR: I worked on a project at CoreOS. Well, a project that's still out there called ALB Ingress controller. It was a way to bring the AWS ALBs, which are just layer 7 load balancers and take the Kubernetes API ingress, attach those two together so that the ALB could serve ingress. The reason that it was the most interesting, technology aside, is just it went from something that we started just myself and a colleague, and eventually gained community adoption. We had to go through the process of just being us two worrying about our concerns, to having to bring on a large community that had their own business requirements and needs, and having to say no at times and having to encourage individuals to contribute when they had ideas and issues, because we didn't have the bandwidth to solve all those problems. It was interesting not necessarily from a technical standpoint, but just to see what it actually means when something starts to gain traction. That was really cool. Yeah, how about you Duffie? [0:07:39.7] DC: I've worked on a number of projects, but I find that generally where I fit into the ecosystem is basically helping other people adopt open source technologies. I spent a quite a bit of my time working on OpenStack and I spent some time working on Open vSwitch and recently in Kubernetes. Generally speaking, I haven't found myself to be much of a contributor to of code to those projects per se, but more like my work is just enabling people to adopt those technologies because I understand the breadth of the project more than the detail of some particular aspect. Lately, I've been spending some time working more on the SIG Network and SIG-cluster-lifecycle stuff. Some of the projects that have really caught my interest are things like, Kind which is Kubernetes in Docker and working on KubeADM itself, just making sure that we don't miss anything obvious in the way that KubeADM is being used to manage the infrastructure again. [0:08:34.2] KN: What about you, Carlisia? [0:08:36.0] CC: I realize it's a mission what I'm working on at VMware. That is coincidentally the project – the open source project that is my favorite. 
I didn’t have a lot of experience with open source, just minor contributions here and there before this project. I’m working on Velero. It’s a disaster recovery tool for Kubernetes. Like I said, it’s open source. We’re coming up on version 1 pretty soon. The other maintainers are amazing, super knowledgeable and very experienced, mature. It’s such a joy to work with them. My favorite. [0:09:13.4] NL: That’s awesome. [0:09:14.7] DC: Should we get into the concept of cloud native and start talking about what we each think of this thing? It seems like a pretty loaded topic. There are a lot of people who would think of cloud native as just a generic term; we should probably try and nail it down here. [0:09:27.9] KN: I’m excited for this one. [0:09:30.1] CC: Maybe we should talk about what this podcast show is going to be? [0:09:34.9] NL: Sure. Yeah. Totally. [0:09:37.9] CC: Since this is our first episode. [0:09:37.8] NL: Carlisia, why don’t you tell us a little bit about the podcast? [0:09:40.4] CC: I will be glad to. The idea that we had was to have a show where we can discuss cloud native concepts. As opposed to talking about particular tools or particular projects, we are going to aim to talk about the concepts themselves and approach them from the perspective of a distributed system idea, or issue, or a cloud native concept. From there, we can talk about what really is this problem, what people or companies have this problem, what usually are the solutions, and what are the alternative ways to solve it. Then we can talk about tools that are out there that people can use. I don’t think there is a show that approaches things from this angle. I’m really excited about bringing this to the community. [0:10:38.9] KN: It’s almost like TGIK, but turned inside out, or flipped around. In TGIK, we do tools first, and we talk about what exactly is this tool and how do you use it. I think with this one, we’re spinning that around and saying, “No, let’s pick a broader idea and then let’s explore all the different possibilities with this broader idea.” [0:10:59.2] CC: Yeah, I would say so. [0:11:01.0] JR: From the field standpoint, I think this is something we oftentimes run into with people who are just getting started with larger projects, like Kubernetes perhaps, or anything really, where a lot of times they hear something like the word Istio come out, or some technology. Oftentimes, the why behind it isn’t really considered upfront; it’s just: this tool exists, it’s being talked about, clearly we need to start looking at it. Really diving into the concepts and the why behind it hopefully will bring some light to a lot of these things that we’re all talking about day-to-day. [0:11:31.6] CC: Yeah. Really focusing on the what and the why. The how is secondary. That’s what my vision of this show is. [0:11:41.7] KN: I like it. [0:11:43.0] NL: That’s something that really excites me, because there are a lot of these concepts that I talk about in my day-to-day life, but some of them I don’t actually think I understand that well. It’s those words that you’ve heard a million times, so you know how to use them, but you don’t actually know the definition of them. [0:11:57.1] CC: I’m super glad to hear you say that, mister, because as a developer, not having a sysadmin background – of course I did sysadmin things as a developer, but it wasn’t my day-to-day thing ever.
When I started working with Kubernetes, a lot of things I didn’t quite grasp, and that’s a super understatement. I noticed that – I mean, I can ask questions. No problem. I will dig through and find out and learn. The problem is that in talking to experts – well, let me talk about myself: a lot of the time when I ask a question, the experts jump right to the how. What is this? “Oh, this is how you do it.” I don’t know what this is. Back off a little bit, right? Back up. I don’t know what this is. Why is it doing this? I don’t know. If you tell me the how before I understand what that even is, I’m going to forget. That’s what’s going to happen. I mean, it’s great that you’re trying to make an effort and show me how to do something, but this is personal, the way I learn. I need to understand the what first. This is why I’m so excited about this show. It’s going to be awesome. This is what we’re going to talk about. [0:13:19.2] DC: Yeah, I agree. This is definitely one of the things that excites me about this topic as well. I find my secret superpower is troubleshooting. That means that I can actually understand what the expected relationships between things should be, right? Without really digging into the actual problem and what the people who were developing the code were trying to solve, or how they thought about it, it’s hard to get to the point where you fully understand that distributed system. I think this is a great place to start. The other thing I’ll say is that I firmly believe that you don’t understand a thing if you can’t teach it. This podcast for me is about that. Let’s bring up all the questions, and we should enable our audience to ask us questions somehow, and get to a place where we can get as many perspectives on a problem as we can, such that we can really dig into the detail of what the problem is before we ever talk about how to solve it. Good stuff. [0:14:18.4] CC: Yeah, absolutely. [0:14:19.8] KN: Speaking of a feedback loop from our audience, and taking the problem first and the solution second, how do we plan on interacting with our audience? Do we want to maybe start a GitHub repo, or what are we thinking? [0:14:34.2] NL: I think a GitHub repo makes a lot of sense. I also wouldn’t mind doing some social media malarkey, maybe having a Twitter account that we run or something like that, where people can ask questions too. [0:14:46.5] CC: Yes. Yes to all of that. Yeah. Having an issue list in a repo where people can just add comments, praises, thank-yous, questions, suggestions for concepts to talk about, and say, “Hey, I have no clue what this means. Can you all talk about it?” Yeah, we’ll talk about it. Twitter, yes: interact with us on Twitter. I believe our Twitter handle is TheKubelets. [0:15:12.1] KN: Oh, we already have one. Nice. [0:15:12.4] NL: Yes. See, I’m learning something new already. [0:15:15.3] CC: We already have. I thought you all were joking. We have the Kubernetes repo. We have a GitHub repo called – [0:15:22.8] NL: Oh, perfect. [0:15:23.4] CC: heptio/thekubelets. [0:15:27.5] DC: The other thing I like that we do in TGIK is this HackMD thing. Although, I’m trying to figure out how we could really make that work for us in a show that’s recorded every week like this one.
I think maybe what we could do is have it so that when people listen to the recording, they can go to the HackMD document and put in questions, or comments around things they would like to hear more about, or maybe share their perspectives on these topics. Then in the following week, we could just go back and review what came in during that period of time, or during the next session. [0:15:57.7] KN: Yeah. Maybe we're merging the HackMD on the next recording. [0:16:01.8] DC: Yeah. [0:16:03.3] KN: Okay. I like it. [0:16:03.6] DC: Josh, you have any thoughts? Friendster, MySpace, anything like that? [0:16:07.2] JR: No. I think we can pass on MySpace for now, but everything else sounds great. [0:16:13.4] DC: Do we want to get into the meat of the episode? [0:16:15.3] KN: Yeah. [0:16:17.2] DC: Our true topic: what does cloud native mean to all of us? Kris, I'm interested to hear your thoughts on this. You might have written a book about this? [0:16:28.3] KN: I co-authored a book called Cloud Native Infrastructure. It means a lot of things to a lot of people. It's one of those umbrella terms, like DevOps. It's up to you to interpret it. Over the past couple of years of working in the cloud native space and working directly with the CNCF as a CNCF ambassador – the Cloud Native Computing Foundation, the open source nonprofit folks behind this term cloud native – the best definition I've been able to come up with is: when you're designing software and you start your main function to be built around the cloud, or around what the cloud enables you to do and the services a cloud can offer you, that is when you start to look at cloud native engineering. I think all cloud native infrastructure is, is designing software that manages and mutates infrastructure in that same way. I think the underlying theme here is we're no longer copying configuration to disk and doing systemd restarts. Now we're just sending HTTPS API requests and getting messages back. Hopefully, if the cloud has done what we expect it to do, that broadcasts some broader change. As software engineers, we can count on those guarantees to design our software around. I really think that you need to understand that it's starting with the main function first and completely engineering your app around these new ideas and these new paradigms, and not necessarily a migration of a non-cloud native app. I mean, you technically could go through and do it. Sure, we've seen a lot of people do it, but I don't think that's technically cloud native. That's cloud alien. Yeah. I don't know. That's just my thought. [0:18:10.0] DC: Are you saying that the cloud native approach is generally a greenfield approach? To be a cloud native application, you're going to take that into account in the DNA of your application? [0:18:20.8] KN: Right. That's exactly what I'm saying. [0:18:23.1] CC: It's interesting that Nova mentioned cloud alien, because that plays into the way I would describe the meaning of cloud native. I mean, what it is – I think Nova described it beautifully, and it really shows her know-how. For me, if I have to describe it, I will just parrot things that I have read, including her book. What it means to me, really – I'm going to use a metaphor to explain what it means to me. Given my accent, I'm obviously not American born, and so I'm a foreigner. Although I do speak English pretty well, I'm not native. English is not my native tongue.
I speak English really well, but there are certain hiccups that I'm going to have every once in a while. There are things that I'm not going to know how to say, or it's going to take me a bit long to remember. I rarely run into not understanding something in English, but it happens sometimes. It's the same with a cloud native application. If it hasn't been built to run on cloud native platforms and systems, you can migrate an application to a cloud native environment, but it's not going to fully utilize the environment like a native app would. That's my take. [0:19:56.3] KN: Cloud immigrant. [0:19:57.9] CC: Cloud immigrant. Is Nick a cloud alien? [0:20:01.1] KN: Yeah. [0:20:02.8] CC: Are they cloud native aliens? Yeah. [0:20:07.1] JR: On that point, I'd be curious if you all feel there is a need to discern the notion of cloud native infrastructure, or platforms, from the notion of cloud native apps themselves. Where I'm going with this – it's funny hearing the greenfield thing and what you said, Carlisia, with the immigration notion, if you will. Oftentimes, you see these very cloud native platforms, things like Kubernetes, or even Mesos, or whatever it might be. Then you see the applications themselves. Some people are using these platforms that are cloud native as a forcing function, to make a lot of their legacy stuff adopt more cloud native principles, right? There's this push and pull. It's like, "Do I make my app more cloud native? Do I make my infrastructure more cloud native? Do I do them both at the same time?" I'd be curious what your thoughts are on that, or if that resonates with you at all. [0:21:00.4] KN: I've got a response here, if I can jump in. Of course, Nova with opinions. Who would have thought? I think what I'm hearing here, Josh, is that as we're using these cloud native platforms, we're forcing the hand of our engineers. In a world where we may be used to just sending this blind DNS request out to whatever, and we would be ignorant of where that was going, now in the cloud native world, we know there's a specific DNS implementation that we can count on. It has a feature set with guarantees that we can build our software around. I think it's a little bit of both, and I think that there is definitely an art to understanding: yes, this is a good idea to do for both applications and infrastructure. I think that's where you get into what it means to be a cloud native engineer. Just as in a traditional legacy infrastructure stack, there are going to be good engineering choices you can make and there are going to be bad ones, and there are many different schools of thought: do I go minimalist? Do I go all in at once? What does that mean? I think we're seeing a lot of folks try a lot of different patterns here. I think there are pros and cons though. [0:22:03.9] CC: Do you want to talk about these pros and cons? Do you see patterns that are more successful for some kinds of companies versus others? [0:22:11.1] KN: I mean, going back to the greenfield thing that we were talking about earlier, I think if you are lucky enough to build out a greenfield application, you're able to bake in greenfield infrastructure management as well. That's where you get these really interesting hybrid applications, just like Kubernetes, that span infrastructure and application.
If we were to go into Kubernetes and say, "I want to define a service of type load balancer," it's actually going to go and create a load balancer for you and actually mutate that underlying infrastructure. The only way we were able to get that power and that paradigm is because on day one, we said we're going to do that as software engineers; take the infrastructure that was hidden behind the firewall, or hidden behind the load balancer, in the past – the software had no way to reason about it; it was blind – and greenfield really is going to make or break your ability to even utilize those infrastructure layers. [0:23:04.3] NL: I think that's a good distinction to make, because something that I've been seeing in the field a lot is that users will follow cloud native practices, but they'll use a tool to do the cloud native part for them, right? They'll use something along the lines of HashiCorp's Terraform to create the VMs and the load balancers for them. Something I think people forget is that the applications themselves can ask for these resources as well. Terraform is just using an API, and your code can use the same API, in fact. I think that's an important distinction. It forces the developer to think a little bit like a sysadmin sometimes. I think that's a good melding of dev and operations into a new word. Regrettably, that word doesn't exist right now. [0:23:51.2] KN: That word can be cloud native. [0:23:53.3] DC: Cloud native to me breaks down into a different set of topics as well. I remember seeing a talk by Brandon Philips a few years ago. In his talk, he had some numbers up on the screen and he was talking about the fact that we were going to quickly become overwhelmed by the desire to continue to develop and put out more applications for our users. His point was that every day, there's another 10,000 new users of the Internet, new consumers showing up on the Internet, right? Globally, though, there's something to the tune of only about 350,000 people like the people in this room, right? People who understand infrastructure, people who understand how to interact with applications, or how to build them, those sorts of things. There really aren't a lot of people in that space today, right? We're surrounded by them all the time, but globally there really just aren't that many. His point is that if we don't radically change the way that we think about the development, the deployment and the management of all of these applications that we're looking at today, we're going to quickly be overrun, right? There aren't going to be enough people on the planet to solve that problem without thinking about the problem in a fundamentally different way. For me, that's where the cloud native piece comes in. With that comes a set of primitives, right? You need some way to automate, or to write software that will manage other software. You need the ability to manage the lifecycle of that software in a resilient way. There are lots of platforms out there that have thought about this problem, right? There are things like Mesos, there are things like Kubernetes. There's a number of different shots on goal here – lots of things that have really tried to think about that problem in a fundamentally different way.
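Kris's service-of-type-LoadBalancer example, and Nick's point that your code can call the same API Terraform does, can be made concrete. Below is a minimal Go sketch using client-go that creates such a Service from inside a cluster. The namespace, service name, and app label are illustrative, and the sketch assumes in-cluster credentials that are allowed to create Services.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// Assumes this runs in a pod whose service account may create Services.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "demo", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			// Asking for a real cloud load balancer, declaratively.
			Type:     corev1.ServiceTypeLoadBalancer,
			Selector: map[string]string{"app": "demo"},
			Ports:    []corev1.ServicePort{{Port: 80}},
		},
	}

	created, err := client.CoreV1().Services("default").Create(context.TODO(), svc, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created service:", created.Name)
}
```

On a cluster backed by a cloud provider, the controller watching Services reacts by provisioning an actual load balancer and writing its address back into the object's status; on a cluster with no such integration, the Service simply stays pending.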
I think of those primitives as being able to actually manage the lifecycle of software; being able to package that software in such a way that it can be truly portable; the idea that you have some API abstraction that brings, again, that portability, such that you can make use of resources that may not be hosted on your own personal infrastructure, but also in the cloud. How do we actually make that API contract so complete that you can just take that application anywhere? These are all part of that cloud native definition, in my opinion. [0:26:08.2] KN: This is so fascinating, because the human race totally already learned this lesson with the Linux kernel in the 90s, right? We had all these hardware manufacturers coming out and building all these different hardware components with different interfaces. Somebody said, "Hey, you know what? There's a lot of noise going on here. We should standardize these and build a contract." That contract then implemented control loops, just like in Kubernetes and Mesos, and poof, we have the Linux kernel now. We're just building the distributed Linux kernel, version 2.0. The human race is here repeating itself all over again. [0:26:41.7] NL: Yeah. It seems like the blast radius of Linux kernel 2.0 is significantly higher than the Linux kernel itself. That made it sound like I was pooh-poohing what you're saying. It's more like, we're learning the same lesson, but at a grander scale now. [0:27:00.5] KN: Yeah. I think that's a really elegant way of putting it. [0:27:03.6] DC: You do raise a good point. If you are embarking on a cloud native infrastructure, remember that little changes are big changes, right? Because you're thinking about managing the lifecycle of a thousand applications now, right? If you're going full-on cloud native, you're thinking about operating at scale; it's a byproduct of that. Little changes that you might be able to make on your laptop are now big changes that are going to affect a fleet of a thousand machines, right? [0:27:30.0] KN: We see this in Kubernetes all the time, where a new version of Kubernetes comes out and something totally unexpected happens when it is run at scale. Maybe it worked on 10 nodes, but when we need to fire up a thousand nodes, what happens then? [0:27:42.0] NL: Yeah, absolutely. That actually brings up something that, to me, defines cloud native as well. A lot of my definition of cloud native follows suit with Kris Nova's book – or Kris Nova, because your book was what introduced me to the phrase cloud native. It makes sense that your opinion informs my opinion. Something that we were just starting to talk about is also the concept of stability. Cloud native applications and infrastructure mean coding with instability in mind. You're not guaranteed that your VM will live forever, because it's on somebody else's hardware, right? Their hardware could go down, and so what do you do? Your application has to move over really quickly, and the guarantees of its API and its endpoints all have to stay the same no matter what. All of these things have to exist for your code, or your application, to live in the cloud. That's something that I find to be very fascinating and that really excites me: not trying to make a barge, but rather trying to make a schooner when you're making an app. Something that, instead of taking over the waves, can be buffeted by the waves and still continue. [0:28:55.6] KN: Yeah.
It's a little more reactive. I think we see this in Kubernetes a lot. When I interviewed Joe Beda a couple years ago for the book, to get a quote from him, he said this magic phrase that has stuck with me over the past few years: "goal-seeking behavior." If you look at Kubernetes objects, they all use this concept in Go called embedding. Every Kubernetes object has a status and a spec. All that is: the status is what's actually going on, and the spec is what I told it, what I want to be going on. Then, just like you said with your analogy, all we're doing is trying to be reactive to that and build to that. [0:29:31.1] JR: That's something I wonder if people think about a lot. They think about the spec, but not the status part. I think the status part is as important, or maybe more important, than the spec. [0:29:41.3] KN: It totally is. Because, I mean, if you have one potentiality for status, your control loop is going to be relatively trivial. As you start understanding more of the problems that you could see, those statuses get more complex, you get more edge cases, and your code matures and hardens. Then we can take that and apply it globally in these larger cloud native patterns. It's really cool. [0:30:06.6] NL: Yeah. Carlisia, you're a developer who's now just getting into the cloud native ecosystem. What are your thoughts on developing with cloud native practices in mind? [0:30:17.7] CC: I'm not sure I can answer that. When I started developing for Kubernetes, I was like, "What is a pod?" What comes first? How does this all fit together? I joined the project [inaudible 00:30:24]. I don't have to think about that. It's basically moving the project along. I don't have to think about what I have to do differently from the way I did things before. [0:30:45.1] DC: One thing that I think you probably ran into in working with the application is the management of state, and how that relates to where you actually end up coupling that state. Before, in development, you might just assume that there is a database somewhere that you would have to interact with. That database is a way of actually pushing that state off of the code that you're actually going to work with. In this way, you might think of being able to write multiple consumers of state, or multiple things that are going to mutate state, all sharing that same database. This is one of the patterns that comes up all the time when we start talking about cloud native architectures, because we have to be really careful about how we manage that state – mainly because one of the other big benefits is the ability to horizontally scale things that are going to mutate, or consume, state. [0:31:37.5] CC: My brain is in its infancy as it relates to Kubernetes. All that I see is APIs all the way down. It's just APIs all the way down. For me as a developer, it's not very much more complex than developing against the database that sits behind it. Ask me again a year from now and I will have a more interesting answer. [0:32:08.7] KN: This is so fascinating, right? I remember a couple years ago when Kubernetes was first coming out, listening to some of the original "Elders of Kubernetes," and even some of the stuff that we were working on at the time. One of the things that they said was: we hope one day somebody doesn't have to care about what's past these APIs, and gets to look at Kubernetes as APIs only.
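Kris's description of spec, status, and goal-seeking behavior maps directly onto a control loop. The following is a toy, self-contained Go sketch rather than actual Kubernetes controller code: Spec holds the declared goal, Status holds observed reality, and each reconcile pass nudges reality toward the goal.

```go
package main

import (
	"fmt"
	"time"
)

// ReplicaSet is a toy stand-in for a Kubernetes object: Spec is what the
// user asked for, Status is what's actually running.
type ReplicaSet struct {
	Spec   struct{ Replicas int }
	Status struct{ Replicas int }
}

// reconcile is one pass of a goal-seeking control loop: observe status,
// compare against spec, and take one step toward the goal.
func reconcile(rs *ReplicaSet) {
	switch {
	case rs.Status.Replicas < rs.Spec.Replicas:
		rs.Status.Replicas++ // stands in for "start a pod"
	case rs.Status.Replicas > rs.Spec.Replicas:
		rs.Status.Replicas-- // stands in for "stop a pod"
	}
}

func main() {
	var rs ReplicaSet
	rs.Spec.Replicas = 3 // the declared goal
	for i := 0; i < 5; i++ {
		reconcile(&rs)
		fmt.Printf("want %d, have %d\n", rs.Spec.Replicas, rs.Status.Replicas)
		time.Sleep(10 * time.Millisecond) // real controllers re-run on events and timers
	}
}
```

The richer the set of statuses the loop can observe, the more edge cases it handles, which is exactly the hardening process described above.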
Then to hear that come from you authentically – it's like, "Hey, that's our success statement there. We nailed it." It's really cool. [0:32:37.9] CC: Yeah. I don't understand the patterns, though, and I probably should be more cognizant about what these patterns are, even if it's just to articulate them. To me, my day-to-day challenge is understanding the API, understanding what library call I make to make this happen – which is just programming 101, almost. Not different from any other regular project. [0:33:10.1] JR: Yeah. That is something that's nice about programming with Kubernetes in mind, because a lot of times you can use the source code as documentation. I hate to say that, particularly as a non-developer. I'm a sysadmin first getting into development, and documentation is key in my mind. There have been more than a few times where I'm like, "How do I do this?" You can look in the source code for pretty much any application that you're using that's in Kubernetes, or around the Kubernetes ecosystem. The API for that application is there and it'll tell you what you need to do, right? It's like, "Oh, this is how you format your config file. Got it." [0:33:47.7] CC: At the same time, I don't want to minimize that knowing what the patterns are is very useful. I haven't had to do any design for Velero, for our project. Maybe if I had, I would have been forced to look into that. I'm still getting to know the codebase and developing features, but no major design that I've had to lead, at least. I think with time, I will recognize those patterns and it will make it easier for me to understand what is happening. What I was saying is that not understanding the patterns behind the design of those APIs doesn't preclude me at all from coding against them. [0:34:30.0] KN: I feel this is the heart of cloud native. I think we totally nailed it. The heart of cloud native is in the APIs and your ability to interact with the APIs. That's what makes it programmable, and that's what gives you the interface for you and your software to interact with it. [0:34:45.1] DC: Yeah, I agree with that. API first. On the topic of cloud native, what about the Cloud Native Computing Foundation? What are our thoughts on the CNCF and what is the CNCF? Josh, you have any thoughts on that? [0:35:00.5] JR: Yeah. I haven't really been as close to the CNCF as I probably should be, to be honest with you. One of the great things that the CNCF has put together are programs around getting projects into this – I don't know if you would call it a vendor-neutral type program. Maybe somebody can correct me on that. Effectively, there are a lot of different categories, like networking and storage and runtimes for containers and things of that nature. There's a really cool landscape that shows off a lot of these different technologies. A lot of the categories, I'm guessing, we'll be talking about on this podcast too, right? Things like: what does it mean to do cloud native networking, and so on and so forth. That's my view of the CNCF. Of course, they put on KubeCon, which is the most important thing to me. I'm sure someone else on this call can talk at a deeper, organizational level about what they do. [0:35:50.5] KN: I'm happy to jump in here. I've been working with them for, I think, three years now. I think first, it's important to know that they are a subsidiary of the Linux Foundation.
The Linux Foundation is the original open source nonprofit here, and the CNCF is one of many – like Apache is another one – that sit underneath the broader Linux Foundation umbrella. I think the whole point of the CNCF is to be this neutral party that can help us as we start to grow and mature the ecosystem. Obviously, money is going to be involved here. Obviously, companies are going to be looking out for their best interests. It makes sense to have somebody managing the software who is outside, or external to, these revenue-driven companies. That's where I think the CNCF comes into play. I think that's what its main responsibility is. What happens when somebody from company A and somebody from company B disagree on the direction the software should go? The CNCF can come in and say, "Hey, you know what? Let's find a happy medium here, let's find a solution that works for both folks, and let's try to do this the best we can." I think a lot of this came from lessons we learned the hard way with Linux. In a weird way, we are in version 2.0, but we were able to take advantage of some of the prior art here. [0:37:05.4] NL: Do you have any examples of a time when the CNCF jumped in and mediated between two companies? [0:37:11.6] KN: Yeah. I think the steering committee, the Kubernetes steering committee, is a great example of this. It's a relatively new thing. It hasn't been around for a very long time. You look at the history of Kubernetes, and we used to have this incubation process that has since been retired. We've tried a lot of solutions, and the CNCF has been pretty instrumental in guiding the shape of how we're going to manage and solve governance for such a monolithic project. As Kubernetes grows, the problem space grows and more people get involved. We're having to come up with new ways of managing that. That's not necessarily a concrete example of two specific companies, but it's more that as people get involved, the things that used to work for us in the past no longer work. The CNCF is able to recognize that and guide us out of it. [0:37:57.2] DC: Cool. That's such a good perspective on the CNCF that I didn't have before. Because like Josh, my perspective on the CNCF was: well, they put on that really cool party three times a year. [0:38:07.8] KN: I mean, they definitely are great at throwing parties. [0:38:12.6] NL: They are that. [0:38:14.1] CC: My perspective on the CNCF is from participating in the Kubernetes meetup here in San Diego. I'm trying to revive our meetup, which is really hard to do, but that's a different topic. I know that they try to make it easier for people to find meetups, because on meetup.com, they have an organization. I don't know what the proper name is, but if you go there and you put in your zip code, you'll find any meetup that's associated with them. My meetup here in San Diego is associated and can be easily found. They try to give a little bit of money for swag. We also get ads for the meetup. They offer help with finding speakers, and they have a speaker catalog on their website. They try to help in those ways, which I think is very helpful, very valuable. [0:39:14.9] DC: Yeah, I agree. I know about the CNCF mostly just from interacting with folks who are working on its behalf, and from meeting a bunch of the people who are working on the Kubernetes project on behalf of the CNCF – folks like Ihor and people like that, who constantly amaze me with the amount of work that they do on behalf of the CNCF.
I think it's been really good seeing what it means to provide governance over a project. I think that's really highlighted by the way that Kubernetes itself is managed. I think a lot of us on the call have probably worked with OpenStack and remember some of the crazy battles that went on between vendors around particular components in that stack. I've yet to actually see that level of noise creep into the Kubernetes situation. I put that squarely on the CNCF: around managing governance, and also around running the community in a way that makes it accessible enough that people can plug into it, without actually having to get into a battle about taking ownership of CNI, for example. Nobody should own CNI. That should be its own project under its own governance. How you satisfy the needs of something like container networking should be a project that you develop as a company, and you can make the very best one you can, and attract as many customers to it as you want. Fundamentally, the way that you interface with that larger project should be abstracted in such a way that it isn't owned by any one company. There should be a contract and an API, that sort of thing. [0:40:58.1] KN: Yeah. I think the best analogy I ever heard was like, "We're just building USB plugs." [0:41:02.8] DC: That's actually really great. [0:41:05.7] JR: To that point, Duffie, I think what's interesting is more and more companies are looking to the CNCF to determine what they're going to place their bets on from a technology perspective, right? Because they've been so burned historically by some project owned by one vendor, where they don't really know where it's going to end up, and so on and so forth. It's really become a very serious thing when people consider the technologies they're going to bet their business on. [0:41:32.0] DC: Yeah. When a project is absorbed into the CNCF – or donated to the CNCF, I guess – there are a number of projects that this has happened to. Obviously, if you see that eye chart that is the CNCF landscape, there are just tons of things happening inside of there. It's a really interesting process. For my part, I remember recently seeing Sysdig Falco show up in that list, and seeing Sysdig donate Falco to the CNCF was probably one of the first times that I've actually really tried to watch what happens when that happens. I think some of the neat stuff here is that now this is an open source project under the governance of the CNCF. It feels to me like a more approachable project, right? I don't feel I have to deal with Sysdig directly to interact with Falco, or to contribute to it. It opens that ecosystem up around the idea, or the genesis of the idea, that they built around Falco, which I think is really powerful. What do you all think of that? [0:42:29.8] KN: To look at it from a different perspective, that's one example of the CNCF helping a project liberate itself. There are plenty of other examples out there where the CNCF is an opt-in feature that is only there if we need it. Take Cluster API, which I'm sure we're going to talk about in a later episode. Just a quick overview: it's a lot of different vendors implementing the same API and making that composable and modular. Nowhere along the way in the history of that project has the CNCF had to come and step in. We've been able to operate independently of that.
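Duffie's point about a contract and an API, and Kris's USB-plug analogy, come down to coding against a neutral interface rather than a vendor. Here is a toy Go illustration, with hypothetical vendor types standing in for real implementations of a contract like CNI:

```go
package main

import "fmt"

// NetworkPlugin is a toy stand-in for a vendor-neutral contract like CNI:
// the community owns the interface, vendors own the implementations.
type NetworkPlugin interface {
	Setup(podName string) (ip string, err error)
}

// vendorA and vendorB are hypothetical competing implementations.
type vendorA struct{}

func (vendorA) Setup(pod string) (string, error) { return "10.0.0.1", nil }

type vendorB struct{}

func (vendorB) Setup(pod string) (string, error) { return "192.168.1.1", nil }

func main() {
	// The orchestrator codes against the contract, never a specific vendor.
	for _, p := range []NetworkPlugin{vendorA{}, vendorB{}} {
		ip, err := p.Setup("demo-pod")
		if err != nil {
			panic(err)
		}
		fmt.Println("pod IP:", ip)
	}
}
```

Swapping vendorA for vendorB changes nothing in the calling code, which is the whole point of keeping the contract neutral.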
I think because the CNCF is even there, we are all under this working agreement: we're going to take everybody's concerns into consideration, we're going to take everybody's use cases into consideration, and we're going to work together as an ecosystem. Just having that in place matters, whether you use it or not. [0:43:23.4] CC: Do you all know any projects under the CNCF? [0:43:26.1] KN: I have one. [0:43:27.7] JR: Well, I've heard of this one. It's called Kubernetes. [0:43:30.1] CC: Is it called Kubernetes or Kubernetes? [0:43:32.8] JR: It's called Kubernetes. [0:43:36.2] CC: Wow. That's not what Duffie thinks. [0:43:38.3] DC: I don't say it that way. No, it's been pretty fascinating seeing just the breadth of projects that are under there. In fact, I was just recently noticing that OpenEBS is up for joining the CNCF. It's fascinating that the things being generated through the CNCF, going through that lifecycle as projects, sometimes overlap with one another, and it seems it's a delicate balance that the CNCF has to strike to keep from playing favorites. Because part of the charter of the CNCF is to promote the projects, right? I'm always curious, and fascinated, to see how this plays out as we see projects that are normally competitive with one another under the auspices of the same organization, like the CNCF. How do they play this in such a way that they remain neutral? It seems like it would take a lot of intention. [0:44:39.9] KN: Yeah. Well, there's a difference between just being a CNCF project and being an official project, or a graduated project. There are different tiers. For instance, with Kubicorn, a tool that I wrote, we just adopted the CNCF code of conduct – I think there was another file I had to include in the repo – and poof, we're magically CNCF now. It's easy to get on board. Once you're on board, there are legal implications that come with that. There totally is this tiered ladder structure that I'm not even super familiar with. That determines how officially CNCF you can be as your project grows and matures. [0:45:14.7] NL: What are some of the things in the code of conduct that you have to follow to be part of the CNCF? [0:45:20.8] KN: There's a repo on it. I can maybe find it and add it to the notes after this, but there's this whole tutorial that you can go through, and it tells you everything you need to add, what the expectations are, and what the implications are for everything. [0:45:33.5] NL: Awesome. [0:45:34.1] CC: Well, Velero is a CNCF project. We follow the – what is it? The Covenant? [0:45:41.2] KN: Yeah, I think that's what it is. [0:45:43.0] CC: Yes. Which is the same one that Kubernetes follows. I am not sure if there are others that can be adopted, but this is definitely one. [0:45:53.9] NL: Yeah. According to Aaron Crickenberger, who was the release lead for Kubernetes 1.14, the CNCF code of conduct can be summarized as "don't be a jerk." [0:46:06.6] KN: Yeah. I mean, there's more to it than that, but – [0:46:10.7] NL: That was him. [0:46:12.0] KN: Yeah. This is something that I've seen in open source my entire career: open source comes with this implication that you need to be well-rounded and polite, and listen, and be able to take others' thoughts and concerns into consideration. I think we're just getting used to working like that as an engineering industry. [0:46:32.6] NL: Agreed. Yeah. Which is a great point. It's something that I hadn't really thought of.
The idea of development back in the day – it seems like before there was such a thing as the CNCF or cloud native, things were combative, or people were just trying to push their agenda as much as possible. Bully their way through. That doesn't seem to happen as much anymore. Do you guys have any thoughts on that? [0:46:58.9] DC: I think what you're highlighting is more the open source piece than the cloud native piece. Open source has been described a few times as a force multiplier for software development and software adoption, and I think both of those things are very true. If you look at a lot of the big successful closed source projects, the way that people in this room, and maybe people listening to this podcast, might perceive them is just fundamentally different from some open source project. Mainly because it feels like it's more of a community-driven thing, and it also feels like you're not beholden to a set of developers that you don't know, who are not interested in what's best for you, or your organization, to achieve whatever they set out to do. With open source, you can be a part of the voice of that project, right? You can jump in and say, "You know, it would really be great if this thing had this feature," or, "I really like how you did this thing." It really feels a lot more interactive and inclusive. [0:48:03.6] KN: I think that is a natural segue to this idea of: we build everything behind the scenes, and then – hey, it's this new open source project, and everything is already done. I don't really think that's open source. We see some of these open source projects out there where, if you go look at the git commit history, it's all everybody from the same company, or the same organization. To me, that's saying that while, granted, the source code might technically be open source, the actual act of engineering and architecting the software is not done as a group with multiple parties bought into it. [0:48:37.5] NL: Yeah, that's a great point. [0:48:39.5] DC: Yeah. One of the things I really appreciate about Heptio, actually, is that for all of the projects that we developed there, the developer chat was kept in some neutral space, like the Kubernetes Slack, which I thought was really powerful. Because it means that not only is it open source and you can contribute code to a project, but if you want to talk to people who are also being paid to develop that project, you can just go to the channel and talk to them, right? It's more than open source. It's open community. I thought that was really great. [0:49:08.1] KN: Yeah. That's a really great way of putting it. [0:49:10.1] CC: With that said, though, I hate to be a party pooper, but I think we need to say goodbye. [0:49:16.9] KN: Yeah. I think we should wrap it up. [0:49:18.5] JR: Yeah. [0:49:19.0] CC: I would like to re-emphasize that you can go to the issues list and add requests for what you want us to talk about. [0:49:29.1] DC: We should also probably link our HackMD from there, so that if you want to comment on something that we talked about during this episode, feel free to leave comments in it and we'll try to revisit those comments, maybe in our next episode. [0:49:38.9] CC: Exactly. That's a good point. We will drop a link to the HackMD page on the corresponding issue. There is going to be an issue for each episode, so just look for that. [0:49:51.8] KN: Awesome.
Well, thanks for joining, everyone. [0:49:54.1] NL: All right. Thank you. [0:49:54.6] CC: Thank you. I'm really glad to be here. [0:49:56.7] DC: Hope you enjoyed the episode, and I look forward to a bunch more. [END OF EPISODE] [0:50:00.3] ANNOUNCER: Thank you for listening to The Podlets Cloud Native Podcast. Find us on Twitter at https://twitter.com/ThePodlets and on the web at https://thepodlets.io, where you'll find transcripts and show notes. We'll be back next week. Stay tuned by subscribing. [END] See omnystudio.com/listener for privacy information.
Talking DevOps with Microsoft today. Special guest Michael Levan, a DevOps professional and Cloud Engineer. Twitter: @TheNJDevOpsGuy Blog: https://www.thelifeofanengineer.org ARM What is ARM? ARM is a configuration management tool that has a JSON syntax. It allows you, in a programmatic way, with the use of functions, to write your software-defined infrastructure. These functions are anything from concatenating strings to creating random values. You define your infrastructure by calling certain APIs for your resources. These API calls cover anything from VMs to storage accounts to function apps. ARM is a configuration management tool much like Ansible and Chef. AKS What is AKS? AKS is an Azure-hosted Kubernetes service. This service allows you to tie your Kubernetes microservice infrastructure into Azure. Azure hosts the master node for you (where the Kubernetes API is) and allows you to manage your workers. Azure What is Azure? Azure is a cloud-based platform to host your infrastructure versus using standard on-prem. Azure DevOps What is Azure DevOps? Azure DevOps is an entire toolset of DevOps tooling. Azure DevOps can also be used to deploy things to ESXi and AWS. Despite the name, it's not just for Azure. Azure DevOps comprises: a ticketing system, a wiki, CI, CD, test plans, and Azure Repos (like GitHub). PowerShell What is PowerShell? PowerShell is a programming/scripting language that is used to automate your deployments. Anything from building Docker settings to deploying VMs to building full-fledged automation solutions for your entire infrastructure. .NET Framework What is the .NET Framework? The .NET Framework is what is under the hood of PowerShell. Because of this, you're able to use assemblies (DLLs), namespaces, classes, and methods of the .NET Framework to incorporate into your automation-based PowerShell tooling. Git What is Git? Git is a distributed version control system to track your source code (commits, pushes, pulls, history, etc.) across your Dev, UAT, and Production code bases. GitHub/Azure Repos What is GitHub/Azure Repos? This tooling is where you store your source code and use Git to interact with it (for DVC). VSCode What is VSCode? VSCode is an IDE/script editor to write your code. VSCode also has extensions that allow you to tie into services from Azure to PowerShell to Python to YAML, etc. https://github.com/AdminTurnedDevOps/WebAppTesting/blob/master/New-Smoketest.ps1 https://www.thelifeofanengineer.org/2019/08/devops-tooling-in-microsoft-realm.html
Brian Grant joined the Borg team in 2009, and went on to co-found both Omega and Kubernetes. He is co-Technical Lead of Google Kubernetes Engine, co-Chair of Kubernetes SIG Architecture, a Kubernetes API approver, a Kubernetes Steering Committee member, and a CNCF Technical Oversight Committee member, where he’s sponsored 11 CNCF projects. Your hosts talk to him about all those things. Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com mail: kubernetespodcast@google.com twitter: @kubernetespod Chatter of the week Sunset from Mauao (Mount Maunganui) Russian Doll on Netflix Edge of Tomorrow sequel back on News of the week Rancher introduces k3s Didn’t they launch it 5 months ago? k3s.io VMware launches VMware Essential PKS Istio Operator from BanzaiCloud CVE-2019-1002100 containerd graduates at the CNCF Scytale announces $5m funding and Scytale Enterprise SPIFFE and SPIRE Automate operations on your cluster with OperatorHub.io OperatorHub website RightScale State of the Cloud 2019 Links from the interview Borg, Omega and Kubernetes Borg paper Omega paper Issue 831: implement Image volumes and container volumes in Kubernetes Chubby key-value store paper IP per Pod LMCTFY CNCF TOC Updated 2018 mission for the CNCF SIG and Working Group List Devstats PR 1325: create kubectl Brian Grant on Twitter PR 607
Kubernetes is one of the main open source systems used for cloud computing. Janet Kuo, Software Engineer at Google, explained various Kubernetes patterns. We talked about stateless, stateful, daemon, and batch workloads. Janet also explained the declarative nature of the Kubernetes API and gave examples of its application.
This week on The New Stack Analysts podcast, we take a closer look at the appeal of using virtual machines in Kubernetes environments. The discussion was sparked by a popular blog post penned last month by Pivotal Principal Technologist Paul Czarkowski. The problem with basic Docker-styled containers is that they do not offer sufficient security in multitenant environments, where multiple deployments intermingle on the same set of Kubernetes-controlled servers. So we spoke with Czarkowski to learn more of his thinking. Linux containers all rely on a kernel shared with the host, and isolation is provided by the kernel through namespaces. The Kubernetes API, however, is not secured per tenant, and most K8s components are not aware of tenants. This is forcing service providers to provision Kubernetes workloads for different clients as separate clusters, not taking full advantage of the savings that Kubernetes could provide by pooling workloads on the same cluster, Czarkowski argued.
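The kernel namespaces Czarkowski refers to are the primitive containers are built from, and they are easy to poke at directly. Below is a minimal Go sketch (Linux only, and it needs root or CAP_SYS_ADMIN) that starts a shell in fresh UTS and PID namespaces. It demonstrates the isolation boundary; it is not a container runtime.

```go
//go:build linux

package main

import (
	"os"
	"os/exec"
	"syscall"
)

// Starts a shell in new UTS and PID namespaces, the same kernel feature
// container runtimes build their isolation from. Needs root privileges.
func main() {
	cmd := exec.Command("/bin/sh")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUTS | syscall.CLONE_NEWPID,
	}
	if err := cmd.Run(); err != nil {
		panic(err)
	}
}
```

Inside that shell, changing the hostname or listing processes shows the isolation; everything else still shares the host kernel, which is exactly the multitenancy concern discussed in the episode.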
In this episode we covered the Kubernetes API vulnerability, Microsoft Edge's decision to switch to Chromium, Facebook's privacy violations and its closing of its APIs to Vine, the new features coming with .NET Core 3.0, and a brief recap of 2018. Speakers: Deniz İrgin, Deniz Özgen, Fırat Özbolat, Mert Susur and Uğur Atar.
Adam and Craig end the year by talking to Jordan Liggitt, the member of the Kubernetes Product Security Team who fixed the recent critical security vulnerability in the Kubernetes API server. We also take a look at the news from KubeCon. This is our last episode for 2018. Thank you for your support this year, and we’ll be back on the 8th of January! Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com mail: kubernetespodcast@google.com twitter: @kubernetespod News of the week etcd donated to the CNCF Chubby paper Raft paper Blog post on the relationship between Kubernetes and etcd by Gyuho Lee and Joe Betz Istio: Geekwire: Has Istio become the new cloud-native darling? Google launches Istio on GKE VMware NSX Service Mesh Aspen Mesh open beta In other service mesh news: A10 Secure Service Mesh Knative: Knative: bringing serverless to Kubernetes everywhere SAP: Extensibility on cloud-native stack Red Hat to deliver hybrid serverless workloads to the enterprise Pivotal launches Function Service GitLab and TriggerMesh announce GitLab Serverless Oracle Cloud Native Framework Microsoft: Osiris Azure Monitor for Containers is GA Phippy Goes To The Zoo Phippy, Captain Kube and friends now in the CNCF Digital Ocean Kubernetes now open to everyone Linode Kubernetes CLI Terraform scripts VMware closes its acquisition of Heptio For $550M Dell will go public again Quickfire Kubernetes security news NeuVector announced containerd and CRI-O runtime support in their container firewall Aqua’s Container Security Platform is now certified to cover the Kubernetes CIS benchmarks Lacework announced their configuration scanning platform covers Kubernetes Sysdig released Sysdig Secure 2.2, which adds Kubernetes audit events, and the ability to block deployments using Kubernetes admission controllers Twistlock released 18.11, which “introduces security visualization for Kubernetes, and compliance and security configuration checks for Istio, including new alerting integrations with PagerDuty, and cloud services Grafana Loki Thanos: Prometheus at scale Maestro – A declarative, no-code approach to Kubernetes Day 2 Operators rbacsync PlanetScale announces funding TechCrunch article Links from the interview Jordan’s suggested KubeCon talks to watch: Kelsey Hightower’s keynote, “Kubernetes and the path to serverless” Julia Evans’ keynote, “High Reliability Infrastructure Migrations” OpenShift before Kubernetes in 2014 Kubernetes Product Security Team CVE-2018-1002105: proxy request handling in kube-apiserver can leave vulnerable TCP connections Listing in the National Vulnerability Database Originally filed as a bug against Rancher Rancher blog post How to report a vulnerability Proof of concept (third party) How it was fixed Distributor’s list Client certificate vulnerability in Kubernetes in 2016 Answering questions on Stack Overflow Jordan Liggitt on Twitter, GitHub, Slack or Stack Overflow
Complex cloud applications need decent visualizations to help infrastructure engineers understand what's going on. For cloud visibility, Kentik absorbs AWS & GCP flow logs, with Azure support coming. Kubernetes & Istio are also data providers to Kentik. Using the Kubernetes API, Kentik correlates pod IPs to pod names and cluster namespaces. With that information, Kentik can visualize pod to pod and service to service traffic flows within a Kubernetes cluster. The post BiB 066: Why Cloud Visibility Matters appeared first on Packet Pushers.
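The correlation described here, pod IP to pod name and cluster namespace, takes only a few Kubernetes API calls. A minimal Go sketch using client-go, assuming a kubeconfig at the default location and permission to list pods cluster-wide:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Builds a map of pod IP -> namespace/name, the kind of correlation a
// flow-visibility tool can derive from the Kubernetes API.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Empty namespace argument means "list pods in all namespaces".
	pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	byIP := map[string]string{}
	for _, p := range pods.Items {
		if p.Status.PodIP != "" {
			byIP[p.Status.PodIP] = p.Namespace + "/" + p.Name
		}
	}
	fmt.Println(byIP)
}
```

A production tool would use a watch or an informer rather than a one-shot list, so the mapping stays current as pods churn.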
VP of Infrastructure at Google Cloud Eric Brewer talks to Melanie and Mark all about open source at Google Cloud, distributed systems, hybrid cloud, and more! Eric Brewer Eric Brewer is the main inventor of a wireless networking scheme called WiLDNet, which promises to bring low-cost connectivity to rural areas of the developing world. He is a tenured professor of Computer Science at UC Berkeley. In 1996, Brewer co-founded Inktomi Corporation (bought by Yahoo! in 2003) and became a paper billionaire during the dot-com bubble. Working with Bill Clinton, he helped to create USA.gov, which launched in 2000. He is known for formulating the CAP Theorem about distributed network applications in the late 1990s. Starting in May 2011 he has been on a sabbatical at Google as VP of Infrastructure. Credits: Wikipedia Cool things of the week Google Cloud Next site Google Cloud Next London site Google Cloud Next Tokyo site Cloud SQL for PostgreSQL now generally available and ready for your production workloads blog Calling C functions from BigQuery with Web Assembly blog BigQuery beyond SQL and JS: Running C and Rust code at scale blog Kubernetes best practices: How and why to build small container images blog youtube Interview Nine faculty elected to American Academy of Arts and Sciences blog USA.gov site Eric Brewer research at google Kubernetes site 2014 Dockercon Keynote youtube 2017 Google Cloud Next Keynote youtube Istio site Extend the Kubernetes API with CustomResourceDefinitions docs Mentors Butler Lampson Barbara Liskov David Patterson Question of the week If I want to visualise the network traffic between pods/services within my Kubernetes cluster, is there an easy way to do this? Weavescope features installation Where can you find us next? Mark can be found streaming Agones development on Twitch, and will be presenting on Agones at Cloud Next. Melanie will be presenting at the internet2 Global Summit, May 9th in San Diego, and will also be talking at the Understand Risk Forum on May 17th, in Mexico City.
Brendan Burns joins Donovan Brown to discuss the use of Kubernetes with Azure Container Instances. The Azure Container Instances Connector for Kubernetes allows Kubernetes clusters to deploy Azure Container Instances, which enables on-demand and nearly instantaneous container compute, orchestrated by Kubernetes, without having VM infrastructure to manage and while still leveraging the portable Kubernetes API. See also: Azure Container Instances with Sean McKenna on Azure Friday For more information, see: Azure Container Instances (Overview) Azure Container Instances (Docs) Azure Container Instances (Pricing) Fast and Easy Containers: Azure Container Instances (Azure Blog) Azure Container Instances Connector for Kubernetes (GitHub) Create a Free Account (Azure) Follow @SHanselman Follow @DonovanBrown Follow @AzureFriday Follow @BrendanBurns
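The connector works by registering a virtual node that Kubernetes can schedule onto. As a rough sketch of what targeting such a node looks like from the API side, the client-go snippet below creates a pod pinned to a virtual-kubelet-style node. The node selector and toleration key used here are assumptions for illustration; check the connector's documentation for the labels and taints it actually registers.

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "aci-demo", Namespace: "default"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{Name: "web", Image: "nginx"}},
			// Illustrative values: real deployments should use the labels
			// and taint key the connector's virtual node actually carries.
			NodeSelector: map[string]string{"type": "virtual-kubelet"},
			Tolerations: []corev1.Toleration{{
				Key:      "virtual-kubelet.io/provider",
				Operator: corev1.TolerationOpExists,
			}},
		},
	}
	created, err := client.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("created pod:", created.Name)
}
```

If no matching virtual node exists, the pod simply stays unscheduled; nothing in the pod spec itself is ACI-specific, which is what keeps the Kubernetes API portable.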
Tim Hockin, one of the engineers that started the Kubernetes project, joins Francesc and Mark to talk about all of the cool stuff coming up with Kubernetes 1.7. About Tim Hockin Tim was one of the first engineers on Kubernetes and GKE, where he has been involved in things like networking, storage, node management, API, plugins, and more. Before Kubernetes, he worked on Google's internal systems, Borg and Omega, mostly on the node management side, and on Google's machine management, hardware bringup, and kernels. He has been contributing to open-source projects since 1995, when he first learned C. Cool things of the week Cloud Shell's code editor now in beta announcement How App Engine helped power Super Mario Run blog post New hands-on labs for scientific data processing on Google Cloud Platform blog post Interview kubernetes.io is an open-source system for automating deployment, scaling, and management of containerized applications. Kubernetes 1.7: Security Hardening, Stateful Application Updates and Extensibility blog post Kubernetes 1.7 release notes Kubernetes StatefulSets docs Kubernetes API Aggregation GitHub issue Extend the Kubernetes API with CustomResourceDefinitions docs Question of the week When should I use a pod and when a container? Tim Hockin's slides are here. Where can you find us next? Francesc just released a justforfunc episode on Go Testing. He'll soon be taking some well-deserved holidays! Mark will be speaking at Pax Dev and then attending Pax West right after.
In the twenty-second episode of this podcast, your hosts Francesc and Mark interview Rae Wang, a Product Manager at Google, about IAM on the Google Cloud Platform. About Rae Rae is a product manager at Google and looks after IAM (Identity and Access Management) on GCP. She has been at Google for 3 years and is based in the Seattle office. Before Google she worked in other software companies for over a decade. Cool thing of the week Bonus interview with Brendan Burns, lead engineer on Kubernetes: Google's open source cluster manager for containers. Latte vs. Kubernetes setup - which is faster? YouTube Kubernetes Config Maps docs Adding custom resources to the Kubernetes API server docs Kubernetes Cluster Federation (a.k.a. “Ubernetes”) docs Scaling neural network image classification using Kubernetes with TensorFlow Serving docs Episode #16, a Product Manager at Google that works, among other projects, on IAM GCPPodcast Interviews Google Cloud Identity & Access Management docs GCPNext - Identity and Access Management on Google Cloud Platform YouTube Google Cloud Platform Auth Guide guide Connecting to Other Google Cloud Platform Services guide Granting Access with IAM Roles guide OAuth 2.0 Service Accounts docs Question of the week Google Cloud CDN docs Google Container Engine docs