Web and Mobile App Development (Language Agnostic, and Based on Real-life experience!)
In this conversation, Krish Palaniappan discusses the intricacies of deploying an API gateway on AWS, focusing on the management of API usage, reporting, and the challenges faced with certificate management. He elaborates on the deployment strategies across different environments, the debugging process for certificate issues, and the importance of understanding endpoint types and SSL certificates. The conversation also highlights the resolution of certificate chain issues and the necessary code adjustments to ensure smooth operation.

He then discusses the intricacies of optimizing AWS Lambda layers, the transition from AWS SDK version 2 to version 3, and the importance of efficient deployment strategies. He emphasizes the need for local development and testing using Express to enhance productivity and streamline the onboarding process for customers, including API key management and usage plans.

Snowpal Products:
- Backends as Services on AWS Marketplace
- Mobile Apps on App Store and Play Store
- Web App
- Education Platform for Learners and Course Creators
Episode Notes

Introduction to Cargo Lambda
- Interacts with the AWS Lambda ecosystem from the terminal
- Enables native running, building, and deployment of Lambda functions
- No need for containers or VMs

Installation Options
- Homebrew (recommended for macOS and Linux)
- Scoop for Windows
- Docker and Nix as alternatives
- Binary release or building from source

Getting Started
- Use cargo lambda new to create a project
- Directory structure includes package management, default code, compiler, and linter
- cargo lambda watch for immediate code writing
- cargo lambda invoke for testing with JSON payloads

Web Framework Support
- Ability to expose microservices with HTTP interfaces

Deployment Process
- cargo lambda build --release for building (including ARM64 support)
- cargo lambda deploy for straightforward deployment

Additional Features
- Verbose mode and tracing options available
- Integration with GitHub Actions and AWS CDK

Advantages of Cargo Lambda
- Leverages the robust Rust ecosystem
- Modern package management with Cargo
- Potentially easier than scripting languages for Lambda development

Key Takeaways
- Cargo Lambda offers a superior method for interacting with AWS Lambda compared to scripting languages.
- The tool provides a streamlined workflow for creating, testing, and deploying Lambda functions.
- It leverages the Rust ecosystem, offering modern package management and development tools.
- Cargo Lambda supports both function-based and web framework approaches for Lambda development.
- The ease of use and integration with AWS services make it an attractive option for Lambda developers.
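The workflow described in the notes condenses to a short terminal session. The commands below follow the episode's outline; the project name and payload are illustrative, and the install step assumes the Homebrew tap documented by the Cargo Lambda project:

```shell
# Install Cargo Lambda (macOS/Linux; Scoop, Docker, and Nix also work)
brew tap cargo-lambda/cargo-lambda
brew install cargo-lambda

# Scaffold a new Lambda project (name is illustrative)
cargo lambda new my-function
cd my-function

# Start the local watcher for immediate code writing
cargo lambda watch

# In another terminal: invoke the function with a JSON payload
cargo lambda invoke my-function --data-ascii '{"command": "hello"}'

# Build a release binary (add --arm64 for ARM64 targets)
cargo lambda build --release

# Deploy to AWS using your configured credentials
cargo lambda deploy
```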
In this episode of the AWS podcast, Seb discusses the latest AWS news and updates from the past two weeks. He highlights upcoming events, including the Next Generation Developer Day, and presents new features in AWS services such as CodeBuild, Corretto, and Amplify. The episode also covers the introduction of the AWS SDK for Swift and the performance improvements of the new EC2 Graviton 4 instances. Finally, Seb shares updates on the generative AI models available in Amazon Bedrock.
AWS Morning Brief for the week of September 23, with Corey Quinn.

Links:
- AWS Transfer Family increases throughput and file sizes supported by SFTP connectors
- AWS WAF Bot Control Managed Rule expands bot detection capabilities
- AWS named as a Leader in the 2024 Gartner Magic Quadrant for Desktop as a Service (DaaS)
- Announcing General Availability of the AWS SDK for Swift
- Reinventing the Amazon Q Developer agent for software development
- Support for AWS DeepComposer ending soon
- Unlock AWS Cost and Usage insights with generative AI powered by Amazon Bedrock
- AWS Welcomes the OpenSearch Software Foundation
- The Rise of Chatbots: Revolutionizing Customer Engagement
Andreas and Michael Wittig were pretty jazzed about writing unit tests using mocks for the AWS SDK v3 in JavaScript. They broke down Amazon's new GuardDuty malware protection for S3 and how it compares to their own product bucketAV. The duo also covered testing Terraform modules and using aws-nuke to clean up leftover resources from failed tests. They gave their two cents on some recent AWS service announcements too - CloudWatch, Fargate, CloudFormation and more!
In this episode, I spoke with Brian LeRoux, co-founder of begin.com and creator of the Architect framework. Brian is also an AWS Serverless Hero and is currently working on enhance.dev, an HTML-first full-stack web framework.

In a wide-ranging conversation, we discussed:
- the Architect framework
- Lambdalith vs. single-purpose functions
- Building a faster AWS SDK (aws-lite)
- Web components
- Functionless
- WASM
- Infra-from-code frameworks such as Ampt

Links from the episode:
- AWS Lite SDK
- Architect framework
- Begin
- Enhance framework
- The LocalStack episode
- The LLRT episode
- Ampt by Jeremy Daly
- My serverless testing course

Opening theme song: Cheery Monday by Kevin MacLeod
Link: https://incompetech.filmmusic.io/song/3495-cheery-monday
License: http://creativecommons.org/licenses/by/4.0
What happens when the world of application modernization converges toward cloud transformation, and your language/framework is not natively supported by the cloud vendor? Taking the case of Delphi, we discuss this with Marco Cantù, its Product Manager at Embarcadero Technologies.

Links mentioned in the episode:
- Delphi: https://www.embarcadero.com/products/delphi
- Delphi Community Edition: https://www.embarcadero.com/products/delphi/starter
- Marco Cantù's site: https://www.marcocantu.com/
- AWS SDK for Delphi: https://www.appercept.com/appercept-aws-sdk-for-delphi
- SDK usage examples: https://github.com/appercept/aws-sdk-delphi-samples

Kudos
Emanuele Garofalo for the episode's post-production

Contacts
Cloud Champions Telegram channel: https://t.me/CloudChampions
AWS Morning Brief for the week of August 21, 2023 with Corey Quinn.

Links:
- Corey is performing a live Q&A next month; submit your questions here!
- Amazon Polly launches new Gulf Arabic male NTTS voice
- AWS HealthOmics supports cross-account sharing of omics analytics stores
- New – Amazon EC2 M7a General Purpose Instances Powered by 4th Gen AMD EPYC Processors
- Amazon OpenSearch Serverless expands support for larger workloads and collections
- Reduce Lambda cold start times: migrate to AWS SDK for JavaScript v3
- Architecting for Resilience in the cloud for critical railway systems
- How Amazon Shopping uses Amazon Rekognition Content Moderation to review harmful images in product reviews
- Zero-shot text classification with Amazon SageMaker JumpStart
- Build a multi-account access notification system with Amazon EventBridge
- Getting Started with CloudWatch agent and collectd
- Cost considerations and common options for AWS Network Firewall log management
- Addressing gender inequity in the technology industry
Join Brooke and Dave as they dive into an enthralling conversation with Noah Gift, Duke EIR for Data Science and AI, exploring the powerful synergy between Rust and the dynamic world of AI/ML. Journey through Noah's illustrious career, including his remarkable stint working on blockbuster movies like Avatar, and his transition into the fascinating sphere of Machine Learning. This episode uncovers the strengths of the Rust programming language, the evolving landscape of MLOps, the ways Rust enhances developers' work, its integration with Lambda, the substantial cost savings Rust can usher in, and the thrilling developments on the horizon for developers. If you want to stay ahead of the curve in technology, this episode is a must-listen! Noah on LinkedIn: https://www.linkedin.com/in/noahgift/ Noah on Coursera: https://www.coursera.org/instructor/noahgift Brooke's Twitter: twitter.com/brooke_jamieson Brooke on LinkedIn: www.linkedin.com/in/brookejamieson/ Brooke's TikTok: www.tiktok.com/@brookebytes Brooke's Instagram: www.instagram.com/brooke.bytes/ Dave on Twitter: twitter.com/thedavedev NewTek Video Toaster - https://en.wikipedia.org/wiki/Video_Toaster Podcast Episode 083 - Decoding the Future of Safe and Efficient Programming with Rust and Tim McNamara https://open.spotify.com/episode/5AsvddooY9a04gKviOC7cB [BLOG] Rust Runtime for AWS Lambda: https://aws.amazon.com/blogs/opensource/rust-runtime-for-aws-lambda/ [DOCS] Building Lambda functions with Rust: https://docs.aws.amazon.com/lambda/latest/dg/lambda-rust.html [DOCS] Getting started with the AWS SDK for Rust: https://docs.aws.amazon.com/sdk-for-rust/latest/dg/getting-started.html [DOCS] Onnx runtime for Rust: https://docs.rs/onnxruntime/latest/onnxruntime/ [DOCS] Polars – DataFrame library for Rust: https://docs.rs/polars/latest/polars/ [GIT] Noahs Rust MLOps Template: https://github.com/nogibjj/rust-mlops-template [PORTAL] Start with Rust here: www.rust-lang.org/ [PORTAL] Rayon - Data-parallelism library that 
makes it easy to convert sequential computations into parallel: https://docs.rs/rayon/latest/rayon/ [PORTAL] Amazon CodeWhisperer - Build applications faster and more securely with your AI coding companion: https://aws.amazon.com/codewhisperer/ Subscribe: Spotify: https://open.spotify.com/show/7rQjgnBvuyr18K03tnEHBI Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-developers-podcast/id1574162669 Stitcher: https://www.stitcher.com/show/1065378 Pandora: https://www.pandora.com/podcast/aws-developers-podcast/PC:1001065378 TuneIn: https://tunein.com/podcasts/Technology-Podcasts/AWS-Developers-Podcast-p1461814/ Amazon Music: https://music.amazon.com/podcasts/f8bf7630-2521-4b40-be90-c46a9222c159/aws-developers-podcast Google Podcasts: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zb3VuZGNsb3VkLmNvbS91c2Vycy9zb3VuZGNsb3VkOnVzZXJzOjk5NDM2MzU0OS9zb3VuZHMucnNz RSS Feed: https://feeds.soundcloud.com/users/soundcloud:users:994363549/sounds.rss
AB Periasamy, Co-Founder and CEO of MinIO, joins Corey on Screaming in the Cloud to discuss what it means to be truly open source and the current and future state of multi-cloud. AB explains how MinIO was born from the idea that the world was going to produce a massive amount of data, and what it's been like to see that come true and continue to be the future outlook. AB and Corey explore why some companies are hesitant to move to cloud, and AB describes why he feels the move is inevitable regardless of cost. AB also reveals how he has helped create a truly free open-source software, and how his partnership with Amazon has been beneficial. About ABAB Periasamy is the co-founder and CEO of MinIO, an open source provider of high performance, object storage software. In addition to this role, AB is an active investor and advisor to a wide range of technology companies, from H2O.ai and Manetu where he serves on the board to advisor or investor roles with Humio, Isovalent, Starburst, Yugabyte, Tetrate, Postman, Storj, Procurify, and Helpshift. Successful exits include Gitter.im (Gitlab), Treasure Data (ARM) and Fastor (SMART).AB co-founded Gluster in 2005 to commoditize scalable storage systems. As CTO, he was the primary architect and strategist for the development of the Gluster file system, a pioneer in software defined storage. After the company was acquired by Red Hat in 2011, AB joined Red Hat's Office of the CTO. Prior to Gluster, AB was CTO of California Digital Corporation, where his work led to scaling of the commodity cluster computing to supercomputing class performance. His work there resulted in the development of Lawrence Livermore Laboratory's “Thunder” code, which, at the time was the second fastest in the world. 
AB holds a Computer Science Engineering degree from Annamalai University, Tamil Nadu, India. AB is one of the leading proponents and thinkers on the subject of open source software - articulating the difference between the philosophy and business model. An active contributor to a number of open source projects, he is a board member of India's Free Software Foundation.Links Referenced: MinIO: https://min.io/ Twitter: https://twitter.com/abperiasamy LinkedIn: https://www.linkedin.com/in/abperiasamy/ Email: mailto:ab@min.io Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. When it costs more money and time to observe your environment than it does to build it, there's a problem. With Chronosphere, you can shape and transform observability data based on need, context and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at chronosphere.io/corey-quinn. That's chronosphere.io/corey-quinn. And my thanks to them for sponsoring my ridiculous nonsense. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and I have taken a somewhat strong stance over the years on the relative merits of multi-cloud, and when it makes sense and when it doesn't. And it's time for me to start modifying some of those. To have that conversation and several others as well, with me today on this promoted guest episode is AB Periasamy, CEO and co-founder of MinIO. AB, it's great to have you back.AB: Yes, it's wonderful to be here again, Corey.Corey: So, one thing that I want to start with is defining terms. 
Because when we talk about multi-cloud, there are—to my mind at least—smart ways to do it and ways that are frankly ignorant. The thing that I've never quite seen is, it's greenfield, day one. Time to build something. Let's make sure we can build and deploy it to every cloud provider we might ever want to use.And that is usually not the right path. Whereas different workloads in different providers, that starts to make a lot more sense. When you do mergers and acquisitions, as big companies tend to do in lieu of doing anything interesting, it seems like they find it oh, we're suddenly in multiple cloud providers, should we move this acquisition to a new cloud? No. No, you should not.One of the challenges, of course, is that there's a lot of differentiation between the baseline offerings that cloud providers have. MinIO is interesting in that it starts and stops with an object store that is mostly S3 API compatible. Have I nailed the basic premise of what it is you folks do?AB: Yeah, it's basically an object store. Amazon S3 versus us, it's actually—that's the comparable, right? Amazon S3 is a hosted cloud storage as a service, but underneath the underlying technology is called object-store. MinIO is a software and it's also open-source and it's the software that you can deploy on the cloud, deploy on the edge, deploy anywhere, and both Amazon S3 and MinIO are exactly S3 API compatible. It's a drop-in replacement. You can write applications on MinIO and take it to AWS S3, and do the reverse. Amazon made S3 API a standard inside AWS, we made S3 API standard across the whole cloud, all the cloud edge, everywhere, rest of the world.Corey: I want to clarify two points because otherwise I know I'm going to get nibbled to death by ducks on the internet. 
When you say open-source, it is actually open-source; you're AGPL, not source available, or, “We've decided now we're going to change our model for licensing because oh, some people are using this without paying us money,” as so many companies seem to fall into that trap. You are actually open-source and no one reasonable is going to be able to disagree with that definition.The other pedantic part of it is when something says that it's S3 compatible on an API basis, like, the question is always does that include the weird bugs that we wish it wouldn't have, or some of the more esoteric stuff that seems to be a constant source of innovation? To be clear, I don't think that you need to be particularly compatible with those very corner and vertex cases. For me, it's always been the basic CRUD operations: can you store an object? Can you give it back to me? Can you delete the thing? And maybe an update, although generally object stores tend to be atomic. How far do you go down that path of being, I guess, a faithful implementation of what the S3 API does, and at which point you decide that something is just, honestly, lunacy and you feel no need to wind up supporting that?AB: Yeah, the unfortunate part of it is we have to be very, very deep. It only takes one API to break. And it's not even, like, one API we did not implement; one API under a particular circumstance, right? Like even if you see, like, AWS SDK is, right, Java SDK, different versions of Java SDK will interpret the same API differently. And AWS S3 is an API, it's not a standard.And Amazon has published the REST specifications, API specs, but they are more like religious text. You can interpret it in many ways. Amazon's own SDK has interpreted, like, this in several ways, right? The only way to get it right is, like, you have to have a massive ecosystem around your application. 
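For readers who want the "basic CRUD operations" Corey enumerates made concrete, here is a minimal in-memory sketch of that contract: put, get, delete, with put acting as an atomic overwrite. This is an illustration of the API shape only; the class and its behavior are hypothetical, not MinIO or S3 code:

```python
# Hypothetical sketch of the core S3 object contract: put, get, delete.
class TinyObjectStore:
    def __init__(self):
        self._buckets = {}  # bucket name -> {object key -> bytes}

    def put_object(self, bucket, key, body: bytes):
        # A single dict assignment models S3's atomic overwrite:
        # readers see the old body or the new one, never a mix.
        self._buckets.setdefault(bucket, {})[key] = body

    def get_object(self, bucket, key) -> bytes:
        try:
            return self._buckets[bucket][key]
        except KeyError:
            # S3 reports a missing object with the NoSuchKey error code.
            raise KeyError("NoSuchKey")

    def delete_object(self, bucket, key):
        # As in S3, deleting a missing key is not an error (idempotent delete).
        self._buckets.get(bucket, {}).pop(key, None)

store = TinyObjectStore()
store.put_object("demo", "hello.txt", b"hi")
assert store.get_object("demo", "hello.txt") == b"hi"
store.delete_object("demo", "hello.txt")
```

The point of the sketch is how small the essential surface is; as AB explains, the hard part is matching every interpretation of that surface across SDKs.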
And if one thing breaks—today, if I commit a code and it introduced a regression, I will immediately hear from a whole bunch of community what I broke.There's no certification process here. There is no industry consortium to control the standard, but then there is an accepted standard. Like, if the application works, they need works. And one way to get it right is, like, Amazon SDKs, all of those language SDKs, to be cleaner, simpler, but applications can even use MinIO SDK to talk to Amazon and Amazon SDK to talk to MinIO. Now, there is a clear, cooperative model.And I actually have tremendous respect for Amazon engineers. They have only been kind and meaningful, like, reasonable partnership. Like, if our community reports a bug that Amazon rolled out a new update in one of the region and the S3 API broke, they will actually go fix it. They will never argue, “Why are you using MinIO SDK?” Their engineers, they do everything by reason. That's the reason why they gained credibility.Corey: I think, on some level, that we can trust that the API is not going to meaningfully shift, just because so much has been built on top of it over the last 15, almost 16 years now that even slight changes require massive coordination. I remember there was a little bit of a kerfuffle when they announced that they were going to be disabling the BitTorrent endpoint in S3 and it was no longer going to be supported in new regions, and eventually they were turning it off. There were still people pushing back on that. I'm still annoyed by some of the documentation around the API that says that it may not return a legitimate error code when it errors with certain XML interpretations. It's… it's kind of become very much its own thing.AB: [unintelligible 00:06:22] a problem, like, we have seen, like, even stupid errors similar to that, right? 
Like, HTTP headers are supposed to be case insensitive, but then some language SDKs will send us a certain type of casing and they expect the case to be—the response to be the same way. And that's not HTTP standard. If we have to accept that bug and respond in the same way, then we are asking a whole bunch of the community to go fix that application. And Amazon's problems are our problems too. We have to carry that baggage.But some places where we actually take a hard stance is, like, Amazon introduced that initially, the bucket policies, like access control lists, then finally came IAM. Then we actually, for us, like, the best way to teach the community is make best practices the standard. The only way to do it. We have been, like, educating them that we actually implemented ACLs, but we removed it. So, the customers will no longer use it. At the scale at which we are growing, if I keep it, then I can never force them to remove it.So, we have been pedantic about, like, how, like, certain things that if it's good advice, force them to do it. That approach has paid off, but the problem is still quite real. Amazon also admits that the S3 API is no longer simple, but at least it's not like POSIX, right? POSIX is a rich set of APIs, but doesn't do useful things that we need to do. So, Amazon's APIs are built on top of simple primitive foundations that got the storage architecture correct, and then doing sophisticated functionalities on top of the simple primitives, these atomic RESTful APIs, you can finally do it right and you can take it to great lengths and still not break the storage system.So, I'm not so concerned. I think it's time for both of us to slow down and then make sure that the ease of operation and adoption is the goal, rather than trying to create an API Bible.Corey: Well, one differentiation that you have that frankly I wish S3 would wind up implementing is this idea of bucket quotas. 
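The header-casing bug class AB describes above is easy to picture: HTTP field names are case-insensitive per the HTTP specification, so a compatible server must answer lookups for Content-Type and content-type identically, however an SDK cased them. A hypothetical sketch of case-insensitive header storage:

```python
# Illustrative only: index header names lower-cased so lookups are
# case-insensitive, as the HTTP spec requires for field names.
class Headers:
    def __init__(self, raw):
        # Keep the sender's original casing alongside the value, but
        # index by the lower-cased name. Duplicate names keep the last
        # value, which is enough for this sketch.
        self._items = {name.lower(): (name, value) for name, value in raw}

    def get(self, name, default=None):
        entry = self._items.get(name.lower())
        return entry[1] if entry else default

h = Headers([("Content-Type", "application/json"), ("x-amz-request-id", "abc123")])
assert h.get("content-type") == "application/json"
assert h.get("X-AMZ-Request-ID") == "abc123"
```

The bug AB describes is a client that skips this normalization and matches response headers byte-for-byte, which a compatible server then has to humor.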
I would give a lot in certain circumstances to be able to say that this S3 bucket should be able to hold five gigabytes of storage and no more. Like, you could fix a lot of free tier problems, for example, by doing something like that. But there's also the problem that you'll see in data centers where, okay, we've now filled up whatever storage system we're using. We need to either expand it at significant cost, and it's going to take a while, or it's time to go and maybe delete some of the stuff we don't necessarily need to keep in perpetuity.There is no moment of reckoning in traditional S3 in that sense because, oh, you can just always add one more gigabyte at 2.3 or however many cents it happens to be, and you wind up with an unbounded growth problem that you're never really forced to wrestle with. Because it's infinite storage. They can add drives faster than you can fill them in most cases. So, it just feels like there's an economic story, if nothing else, just from a governance-and-control standpoint: make sure this doesn't run away from me, and alert me before we get into the multi-petabyte style of storage for my Hello World WordPress website.AB: Mm-hm. Yeah, so I always thought that Amazon did not do this—it's not just Amazon, the cloud players, right—they did not do this because they want—it's good for their business; they want all the customers' data, like unrestricted growth of data. Certainly it is beneficial for their business, but there is an operational challenge. When you set quota—this is why we grudgingly introduced this feature. We did not have quotas and we didn't want to because the Amazon S3 API doesn't talk about quota, but the enterprise community wanted this so badly.And eventually we [unintelligible 00:09:54] it and we gave in. But there is one issue to be aware of, right? 
The problem with quota is that you as an object storage administrator, you set a quota, let's say this bucket, this application, I don't see more than 20TB; I'm going to set 100TB quota. And then you forget it. And then you think in six months, they will reach 20TB. The reality is, in six months they reach 100TB.And then when nobody expected—everybody has forgotten that there was a quota set a certain place—suddenly applications start failing. And when it fails, it doesn't—even though the S3 API responds back saying that insufficient space, but then the application doesn't really pass that error all the way up. When applications fail, they fail in unpredictable ways. By the time the application developer realizes that it's actually the object storage that ran out of space, there's lost time and it's a downtime. So, as long as they have proper observability—because, I mean, observability should also alert you that you are going to run out of space soon. If you have those systems in place, then go for quota. If not, I would agree with the S3 API standard that is not about cost. It's about operational, unexpected accidents.Corey: Yeah, on some level, we wound up having to deal with the exact same problem with disk volumes, where my default for most things was, at 70%, I want to start getting pings on it and at 90%, I want to be woken up for it. So, for small volumes, you wind up with a runaway log or whatnot, you have a chance to catch it and whatnot, and for the giant multi-petabyte things, okay, well, why would you alert at 70% on that? Well, because procurement takes a while when we're talking about buying that much disk for that much money. It was a roughly good baseline for these things. The problem, of course, is when you have none of that, and well it got full so oops-a-doozy.On some level, I wonder if there's a story around soft quotas that just scream at you, but let you keep adding to it. 
But that turns into implementation details, and you can build something like that on top of any existing object store if you don't need the hard limit aspect.AB: Actually, that is the right way to do it. That's what I would recommend customers to do. Even though there is hard quota, I will tell them, don't use it, but use soft quota. And the soft quota—instead of even soft quota, you monitor them. On the cloud, at least you have some kind of restriction that the more you use, the more you pay; eventually the month-end bills, it shows up.On MinIO, when it's deployed on these large data centers, that it's unrestricted access, quickly you can use a lot of space, no one knows what data to delete, and no one will tell you what data to delete. The way to do this is there has to be some kind of accountability. The way to do it is—actually [unintelligible 00:12:27] have some chargeback mechanism based on the bucket growth. And the business units have to pay for it, right? IT doesn't run for free, right? IT has to have a budget and it has to be sponsored by the applications team.And you measure, instead of setting a hard limit, you actually charge them: based on the usage of your bucket, you're going to pay for it. And this is an observability problem. And you can call it soft quotas, but it has to then trigger an alert in observability. It's an observability problem. But it actually is interesting to hear it as soft quotas, which makes a lot of sense.Corey: It's one of those problems that I think people only figure out after they've experienced it once. And then they look like wizards from the future who, “Oh, yeah, you're going to run into a quota storage problem.” Yeah, we all find that out because the first time we smack into something and live to regret it. Now, we can talk a lot about the nuances and implementation and low level detail of this stuff, but let's zoom out of it. What are you folks up to these days? 
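Corey's 70%/90% disk thresholds and AB's soft quota both boil down to the same mechanism: compare usage against capacity and alert before anything blocks. A small sketch, with illustrative threshold values (not from any product):

```python
def usage_alert(used_bytes, capacity_bytes, warn=0.70, page=0.90):
    """Return an alert level for a bucket or volume: None, 'warn', or 'page'.

    This is the soft-quota idea: nothing is blocked, but someone is
    told before the storage actually fills up.
    """
    ratio = used_bytes / capacity_bytes
    if ratio >= page:
        return "page"   # wake someone up
    if ratio >= warn:
        return "warn"   # ping during business hours
    return None         # plenty of headroom

TB = 1024 ** 4
assert usage_alert(50 * TB, 100 * TB) is None
assert usage_alert(75 * TB, 100 * TB) == "warn"
assert usage_alert(95 * TB, 100 * TB) == "page"
```

In practice the thresholds would be tuned per volume size, exactly as Corey notes: a 70% warning buys procurement time on a multi-petabyte system that it doesn't need to buy on a small one.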
What is the bigger picture that you're seeing of object storage and the ecosystem?AB: Yeah. So, when we started, right, our idea was that the world is going to produce an incredible amount of data. In ten years from now, we are going to drown in data. We've been saying that today and it will be true. Every year, you say ten years from now and it will still be valid, right?That was the reason for us to play this game. And we saw that every one of these cloud players was incompatible with each other. It's like early Unix days, right? Like a bunch of operating systems, everything was incompatible and applications were beginning to adopt this new standard, but they were stuck. And then the cloud storage players, whatever they had, like, GCS can only run inside Google Cloud, S3 can only run inside AWS, and the cloud players' game was bring all the world's data into the cloud.And that actually requires an enormous amount of bandwidth. And moving data into the cloud at that scale, if you look at the amount of data the world is producing—if the data is produced inside the cloud, it's a different game, but the data is produced everywhere else. MinIO's idea was that instead of introducing yet another API standard, Amazon got the architecture right and that's the right way to build large-scale infrastructure. If we stick to Amazon S3 API instead of introducing another standard, [unintelligible 00:14:40] API, and then go after the world's data. When we started in 2014 November—it's really 2015 we started—it was laughable. People thought that there won't be a need for MinIO because the whole world would basically go to AWS S3 and they would be the world's data store. Amazon is capable of doing that; the race is not over, right?Corey: And it still couldn't be done now. The thing is that they would need to fundamentally rethink their, frankly, usurious data egress charges. 
The problem is not that it's expensive to store data in AWS; it's that it's expensive to store data and then move it anywhere else for analysis or use on something else. So, there are entire classes of workload that people should not consider the big three cloud providers as the place where that data should live because you're never getting it back.AB: Spot on, right? Even if network is free, right, Amazon makes, like, okay, zero egress-ingress charge, the data we're talking about, like, most of MinIO deployments, they start at petabytes. Like, one to ten petabytes; a few are, like, 100 terabytes. For even if network is free, try moving a ten-petabyte infrastructure into the cloud. How are you going to move it?Even with FedEx and UPS giving you a lot of bandwidth in their trucks, it is not possible, right? I think the data will continue to be produced everywhere else. So, our bet was there we will be [unintelligible 00:15:56]—instead of you moving the data, you can run MinIO where there is data, and then the whole world will look like AWS's S3 compatible object store. We took a very different path. But now, when I tell the same story that we started with on day one, it is no longer laughable, right?People believe that yes, MinIO is there because our market footprint is now larger than Amazon S3. And as it goes to production, customers are now realizing it's basically growing inside a shadow IT and eventually businesses realize the bulk of their business-critical data is sitting on MinIO and that's how it's surfacing up. So now, what we are seeing, this year particularly, all of these customers are hugely concerned about cost optimization. And as part of the journey, there is also multi-cloud and hybrid-cloud initiatives. They want to make sure that their application can run on any cloud, or the same software can run on their colos like Equinix, or, like, a bunch of, like, Digital Realty, anywhere.And MinIO's software, this is what we set out to do. 
MinIO can run anywhere inside the cloud, all the way to the edge, even on Raspberry Pi. Whatever we started with has now become reality; the timing is perfect for us.Corey: One of the challenges I've always had with the idea of building an application to run anywhere is you can make explicit technology choices around that, and object store is a great example because most places you go now will or can have an object store available for your use. But there seem to be implementation details that get lost. For example, even load balancers wind up being implemented in different ways with different scaling times and whatnot in various environments. And past a certain point, it's okay, we're just going to have to run it ourselves on top of HAProxy or Nginx, or something like it, running in containers themselves; you're reinventing the wheel. Where is that boundary between, we're going to build this in a way that we can run anywhere, and the reality that I keep running into, which is we tried to do that but we implicitly, without realizing it, built in a lot of assumptions that everything would look just like this environment that we started off in?AB: The good part is that if you look at the S3 API, every request has the site name, the endpoint, bucket name, the path, and the object name. Every request is completely self-contained. It's literally an HTTP call away. And this means that whether your application is running on Android, iOS, inside a browser, a JavaScript engine, anywhere across the world, it doesn't really care whether the bucket is served from EU or us-east or us-west. It doesn't matter at all, so the API actually allows you to build a globally unified data infrastructure, some buckets here, some buckets there.That's actually not the problem. The problem comes when you have multiple clouds. 
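AB's observation above that every S3 request is self-contained shows up directly in the URL: endpoint, bucket, and object key all travel in the request itself. A small sketch of splitting a path-style URL into those parts (the host is MinIO's public play server; the function itself is illustrative, not SDK code):

```python
from urllib.parse import urlparse

def parse_s3_url(url):
    """Split a path-style S3 URL into (endpoint, bucket, key).

    Path-style means https://endpoint/bucket/key; virtual-hosted style
    (bucket.endpoint) also exists but is omitted here for brevity.
    """
    parts = urlparse(url)
    # Everything after the first path segment is the object key.
    bucket, _, key = parts.path.lstrip("/").partition("/")
    return parts.netloc, bucket, key

endpoint, bucket, key = parse_s3_url("https://play.min.io/mybucket/photos/cat.png")
assert endpoint == "play.min.io"
assert bucket == "mybucket"
assert key == "photos/cat.png"
```

Because all three coordinates ride in the request, a client can address buckets on different endpoints in the same program, which is exactly what makes the "run MinIO where the data is" model addressable from anywhere.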
Different teams (part of it is M&A, and even if you don't do M&A, no two data engineers will agree on the same software stack) end up with different cloud providers, and some are still running on old legacy environments. When you combine them, the problem is, let's take just the cloud, right? How do I even apply an access control policy? How do I establish unified identity? Because I want to know that this application is the only one allowed to access this bucket. Can I have that same policy on Google Cloud or Azure, even though they are different teams? If that employee, that project, or that admin leaves the job, how do I make sure that it's all still protected? You want unified identity, you want unified access control policies. Where are the encryption keys stored? And then the load balancer itself: the load balancer is not the problem. But unless you adopt the S3 API as your standard, the definition of what a bucket is differs between Microsoft, Google, and Amazon.

Corey: Yeah, the PUTs and retrieving of actual data is one thing, but then you have the control plane layer of the object store, and how do you rationalize that? What are the naming conventions? How do you address it? I even ran into something similar somewhat recently when I was doing an experiment with one of the Amazon Snowball Edge devices to move some data into S3 on a lark. The thing shows up and presents itself on the local network as an S3 endpoint, but none of their tooling can accept a different endpoint built into the configuration files; you have to explicitly use it as an environment variable or as a parameter on every invocation of something that talks to it, which is incredibly annoying. I would give a lot just to be able to say, oh, when you're talking in this profile, that's always going to be your S3 endpoint. Go. But no, of course not. 
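The "unified access control policy" AB is after is, in S3 terms, a bucket policy: a JSON document saying which principal may do what to which bucket. A minimal sketch follows; the account ID, role, and bucket names are invented for illustration, and the pain point in the conversation is precisely that this JSON dialect is S3's own, so the same intent has to be re-expressed per cloud unless everything speaks the S3 API:

```python
import json

# A least-privilege S3 bucket policy: exactly one application role may
# read and write objects in one bucket. Account ID, role name, and
# bucket name are hypothetical.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AppOnlyAccess",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/analytics-app"},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::analytics-lake/*",
        }
    ],
}

print(json.dumps(policy, indent=2))
```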
Because that would make it easier to use something that wasn't them, so why would they ever be incentivized to bake that in?

AB: Yeah. Snowball is an important element to move data, right? That's the UPS and FedEx way of moving data. But what I find customers doing is they actually use the tools that we built for MinIO, because the Snowball appliance also looks like an S3 API-compatible object store. And in fact, I've been told that when you want to ship multiple Snowball appliances, they actually put MinIO on top to make them look like one unit, because MinIO can erasure-code objects across multiple Snowball appliances. And the MC tool, unlike the AWS CLI, which is really meant for developers making low-level calls, gives you familiar [scoring 00:21:08] tools, like ls, cp, and rsync-like tools, and it's easy to move, copy, and migrate data. Actually, that's how people deal with it.

Corey: Oh, God. I hadn't even considered the problem of having a fleet of Snowball Edges here that you're trying to do a mass data migration on, which is basically how you move petabyte-scale data: a whole bunch of parallelism. But having to figure that out on a case-by-case basis would be nightmarish. That's right, there is no good way to wind up doing that natively.

AB: Yeah. In fact, Western Digital and a few other players, too; Western Digital created a Snowball-like appliance and put MinIO on it. And they are actually working with some system integrators to help customers move lots of data. Snowball-like functionality is important, and more and more customers need it.

Corey: This episode is sponsored in part by Honeycomb. I'm not going to dance around the problem. Your. Engineers. Are. Burned. Out. They're tired from pagers waking them up at 2 am for something that could have waited until after their morning coffee. Ring Ring, Who's There? It's Nagios, the original call of duty! 
They're fed up with relying on two or three different “monitoring tools” that still require them to manually trudge through logs to decipher what might be wrong. Simply put, there's a better way. Observability tools like Honeycomb (and very little else because they do admittedly set the bar) show you the patterns and outliers of how users experience your code in complex and unpredictable environments so you can spend less time firefighting and more time innovating. It's great for your business, great for your engineers, and, most importantly, great for your customers. Try FREE today at honeycomb.io/screaminginthecloud. That's honeycomb.io/screaminginthecloud.Corey: Increasingly, it felt like, back in the on-prem days, that you'd have a file server somewhere that was either a SAN or it was going to be a NAS. The question was only whether it presented it to various things as a volume or as a file share. And then in cloud, the default storage mechanism, unquestionably, was object store. And now we're starting to see it come back again. So, it started to increasingly feel, in a lot of ways, like Cloud is no longer so much a place that is somewhere else, but instead much more of an operating model for how you wind up addressing things.I'm wondering when the generation of prosumer networking equipment, for example, is going to say, “Oh, and send these logs over to what object store?” Because right now, it's still write a file and SFTP it somewhere else, at least the good ones; some of the crap ones still want old unencrypted FTP, which is neither here nor there. But I feel like it's coming back around again. Like, when do even home users wind up instead of where do you save this file to having the cloud abstraction, which hopefully, you'll never have to deal with an S3-style endpoint, but that can underpin an awful lot of things. It feels like it's coming back and that's cloud is the de facto way of thinking about things. Is that what you're seeing? 
Does that align with your belief on this?

AB: I fundamentally believe that in the long run, applications will go SaaS. If you remember the days when you used to install QuickBooks and ACT! in your data center, and you used to run your own Exchange servers, those days are gone. I think these applications will become SaaS. But the infrastructure building blocks for that SaaS, whether in the cloud or in their own colo, I think in the long run it will be multi-cloud and colo all combined, and all of it will look alike.

But what I find from the customer's journey is that the Old World and the New World are incompatible. When they shifted from bare metal to virtualization, they didn't have to rewrite their applications. But this time, it's a tectonic shift: every single application has to be rewritten. If you retrofit your application into the cloud, bad idea, right? It's going to cost you more, and I would rather not do it. Even though cloud players are trying to make file and block, like, file system services [unintelligible 00:24:01] and stuff, available, they price them ten times more than object, and it's just to [integrate 00:24:07] some legacy applications; it's still a bad idea to just move legacy applications there.

But what I'm finding is that if you still run your infrastructure with an enterprise IT mindset, you're out of luck. It's going to be super expensive, and you're going to be left behind. Modern infrastructure, because of the scale, has to be treated as code. You have to run infrastructure with software engineers, and this cultural shift has to happen. And that's why, in the long run, every cloud will look like AWS; we always said that, and it's now becoming true. Kubernetes and MinIO are basically leveling the ground everywhere, giving you ECS- and S3-like infrastructure inside AWS or outside AWS, everywhere. 
But what I find the challenging part is the cultural mindset. If they still have the old cultural mindset and if they want to adopt cloud, it's not going to work.You have to change the DNA, the culture, the mindset, everything. The best way to do it is go to the cloud-first. Adopt it, modernize your application, learn how to run and manage infrastructure, then ask economics question, the unit economics. Then you will find the answers yourself.Corey: On some level, that is the path forward. I feel like there's just a very long tail of systems that have been working and have been meeting the business objective. And well, we should go and refactor this because, I don't know, a couple of folks on a podcast said we should isn't the most compelling business case for doing a lot of it. It feels like these things sort of sit there until there is more upside than just cost-cutting to changing the way these things are built and run. That's the reason that people have been talking about getting off of mainframe since the '90s in some companies, and the mainframe is very much still there. It is so ingrained in the way that they do business, they have to rethink a lot of the architectural things that have sprung up around it.I'm not trying to shame anyone for the [laugh] state that their environment is in. I've never yet met a company that was super proud of its internal infrastructure. Everyone's always apologizing because it's a fire. But they think someone else has figured this out somewhere and it all runs perfectly. I don't think it exists.AB: What I am finding is that if you are running it the enterprise IT style, you are the one telling the application developers, here you go, you have this many VMs and then you have, like, a VMware license and, like, Jboss, like WebLogic, and like a SQL Server license, now you go build your application, you won't be able to do it. 
Because application developers talk about Kafka and Redis and Kubernetes; they don't speak the same language. And that's when these developers go to the cloud, finish their application, and take it live, going from zero lines of code to production before enterprise IT can even procure and provision infrastructure for them. The change that has to happen is figuring out how to give developers what they want, and now that reverse journey is also starting. In the long run, everything will look alike, but what I'm finding is that teams running traditional enterprise IT infrastructure are ashamed of talking about it.

But then you go to the cloud, and at scale, there are parts you want to move back, and now you really know why you want to move them: for economic reasons. The data-intensive workloads in particular become very expensive. For that part, they go to a colo but leave the applications in the cloud. So, the multi-cloud model, I think, is inevitable. If you are looking at yourself as a hyperscaler, if your data is growing, and if your business focus is data-centric, parts of the data, data analytics, and ML workloads will actually move out, if you're looking at unit economics. If all you are focused on is productivity, stick to the cloud and you're still better off.
No, I'm a big fan of doing things that are sensible, and cloud is not the right answer for every workload under the sun. Conversely, when someone says, "Oh, I'm building a new e-commerce store," or whatnot, "and I've decided cloud is not for me," it's, "Ehh, you sure about that?" That sounds like you are smack-dab in the middle of the cloud use case. But all these things wind up acting as constraints and strategic objectives. And technology and single-vendor answers are rarely going to be the panacea their sales teams say they will be.

AB: Yeah. And I find that organizations with SREs, DevOps, and software engineers running the infrastructure are actually ready to go multi-cloud or go to a colo, because they know exactly what they're doing; they have the containers, Kubernetes, and microservices expertise. If you are still on a traditional SAN, NAS, and VM architecture, go to the cloud and rewrite your application.

Corey: I think there's a misunderstanding in the ecosystem around what cloud repatriation actually looks like. Everyone claims it doesn't exist because there are basically no companies out there worth mentioning saying, "Yep, we've decided the cloud is terrible, we're taking everything out, and we are going to data centers. The end." In practice, it's individual workloads that do not make sense in the cloud. Sometimes just the back-of-the-envelope analysis means it's not going to work out, other times it's during proof of concept, and other times, as things hit a certain point of scale, pulling an individual workload back makes an awful lot of sense. But everything else is probably going to stay in the cloud, and these companies don't want to antagonize the cloud providers by talking about it in public. But that model is very real.

AB: Absolutely. 
Actually, what we are finding with the application side, like, parts of their overall ecosystem, right, within the company, they run on the cloud, but the data side, some of the examples, like, these are in the range of 100 to 500 petabytes. The 500-petabyte customer actually started at 500 petabytes and their plan is to go at exascale. And they are actually doing repatriation because for them, their customers, it's consumer-facing and it's extremely price sensitive, but when you're a consumer-facing, every dollar you spend counts. And if you don't do it at scale, it matters a lot, right? It will kill the business.Particularly last two years, the cost part became an important element in their infrastructure, they knew exactly what they want. They are thinking of themselves as hyperscalers. They get commodity—the same hardware, right, just a server with a bunch of [unintelligible 00:30:35] and network and put it on colo or even lease these boxes, they know what their demand is. Even at ten petabytes, the economics starts impacting. If you're processing it, the data side, we have several customers now moving to colo from cloud and this is the range we are talking about.They don't talk about it publicly because sometimes, like, you don't want to be anti-cloud, but I think for them, they're also not anti-cloud. They don't want to leave the cloud. The completely leaving the cloud, it's a different story. That's not the case. Applications stay there. Data lakes, data infrastructure, object store, particularly if it goes to a colo.Now, your applications from all the clouds can access this centralized—centralized, meaning that one object store you run on colo and the colos themselves have worldwide data centers. So, you can keep the data infrastructure in a colo, but applications can run on any cloud, some of them, surprisingly, that they have global customer base. And not all of them are cloud. 
Sometimes, if you ask what type of edge devices or edge data centers they are running, they'll say it's a mix of everything. What really matters is not the infrastructure. Infrastructure, in the end, is CPU, network, and drives; it's a commodity. It's really the software stack: you want to make sure it's containerized and easy to deploy and roll out updates to. You have to learn the Facebook/Google style of running a SaaS business. That change is coming.

Corey: It's a matter of time and it's a matter of inevitability. Now, nothing ever stays the same. Everything always inherently changes in the full sweep of things, but I'm pretty happy with where I see the industry going these days. I want to start seeing a little bit less centralization around one or two big companies, but I am confident that we're starting to see an awareness of doing these things for the right reason more broadly permeating.

AB: Right. Competition is always great for customers; they get to benefit from it. So, decentralization is a path to commoditizing the infrastructure. The bigger picture for me, what I'm particularly happy about, is that for a long time we carried industry baggage in the infrastructure space. No one wants to change, no one wants to rewrite applications, so as part of the equation we carried the POSIX baggage, like SAN and NAS. You can't even do [unintelligible 00:32:48] as a Service or NFS as a Service; it's too much baggage. All of that is getting thrown out. The cloud players helped customers start with a clean slate. To me, that's the biggest advantage. And now that we have a clean slate, we can go on a whole new evolution of the stack, keeping it simpler, and everyone can benefit from this change.

Corey: Before we wind up calling this an episode, I do have one last question for you. 
As I mentioned at the start, you're very much open-source, as in legitimate open-source, which means that anyone who wants to can grab an implementation and start running it. How do you, I guess, make peace with the fact that the majority of your user base is not paying you? And how do you get people to decide, "You know what? We like the cut of his jib. Let's give him some money."

AB: Mm-hm. Yeah, if I look at it that way, I have both the [unintelligible 00:33:38] on the open-source side as well as the business, but I don't see them as conflicting. If I run it as a charity, taking donations ("if you love the product, here is the donation box"), that doesn't work at all, right? Then I shouldn't take investor money and I shouldn't have a team, because I have an obligation to pay their bills, too. But I actually find open-source to be incredibly beneficial. For me, it's about delivering value to the customer. If you pay me $5, I ought to make you feel $50 worth of value. Compared to the same software you would buy from a proprietary vendor, if I'm a customer and the software is equal in functionality, I would actually prefer the open-source one and pay even more for it.

But why are customers really paying me now, and what's our view on open-source? I'm actually a free software guy. Free software and open-source are not exactly equal, right? We are purists in the open-source community and we have strong views on what open-source means. That's why we call it free software, and free here means freedom. Free does not mean gratis, free of cost; it's about freedom, and I deeply care about it. For me, it's a philosophy and a way of life. That's why I don't believe in open core and similar models that hold things back; giving crippleware is not open-source, right? Giving you some freedom but not all of it breaks the spirit. 
So, MinIO is a hundred percent open-source, but it's open-source for the open-source community. We did not take some community-developed code and then add commercial support on top. We built the product, we believed in open-source, we still believe, and we will always believe. Because of that, we open-sourced our work, and it's open-source for the open-source community. As you build applications on it, the AGPL license applies to the derivative works; they have to be compatible with the AGPL because we are the creator. If you cannot open-source your application and derivative works, you can buy a commercial license from us. We are the creator, so we can give you a dual license. That's how the business model works.

That way, the open-source community completely benefits, and it's about software freedom. There are customers for whom open-source is a good thing, and they want to pay because it's open-source. There are other customers who pay because they can't open-source their application and derivative works. It's a happy medium; that way, I actually find open-source to be incredibly beneficial.

Open-source gave us trust, more than adoption rate. It's not just free to download and use. More than that, the customers that matter, the community that matters, can see the code and everything we did; it's not "because I said so," with marketing and sales telling you what to believe. You download the product, experience it, and fall in love with it, and when it becomes an important part of your business, that's when customers engage with us, because license compatibility and data loss or data breach, all of that becomes important. I don't see open-source as conflicting with the business. It's actually incredibly helpful, and customers see that value in the end.

Corey: I really want to thank you for being so generous with your time. 
If people want to learn more, where should they go?AB: I was on Twitter and now I think I'm spending more time on, maybe, LinkedIn. I think if they—they can send me a request and then we can chat. And I'm always, like, spending time with other entrepreneurs, architects, and engineers, sharing what I learned, what I know, and learning from them. There is also a [community open channel 00:37:04]. And just send me a mail at ab@min.io and I'm always interested in talking to our user base.Corey: And we will, of course, put links to that in the [show notes 00:37:12]. Thank you so much for your time. I appreciate it.AB: It's wonderful to be here.Corey: AB Periasamy, CEO and co-founder of MinIO. I'm Cloud Economist Corey Quinn and this has been a promoted guest episode of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice that presumably will also include an angry, loud comment that we can access from anywhere because of shared APIs.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
EKS on Snow Devices On this episode of The Cloud Pod, the team highlights the new Graviton3-based images for users of AWS, new ways provided by Google to pay for its cloud services, the new partnership between Azure and the FinOps Foundation, as well as Oracle's new cloud banking and the automation of CCOE. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning, and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud, and Azure. This week's highlights
AWS has announced 83 new features since January 27. I've picked out 7 that could be of interest to you, the builders. We talk about visualizing your VPCs in the console and about a change to NAT Gateways. For developers, we cover SAM, EC2 Mac, and the AWS SDK for Java. We also discuss a reference implementation for your application deployment pipelines; it's substantial stuff, you'll see. I wrap up by talking about space, the new frontier for edge computing, and truly edge at that, since it lives in satellites.
Are you curious about using Rust to write AWS Lambda functions? In this episode of AWS BItes, we will be discussing the pros and cons of using Rust for serverless applications. With Rust, you'll be able to take advantage of its fast performance and memory efficiency. Plus, its programming model makes it easy to write safe and correct code. However, Rust is not a native runtime for Lambda, but rather a library that implements a custom runtime built and maintained by AWS. This custom runtime is built on top of the Tokio async runtime and even has a built-in middleware engine, which allows for easy hook-in of reusable logic and building your own middleware. But what if you're new to Rust? Don't worry, we'll also be walking you through the steps on how to write your first Lambda in Rust. From cargo-lambda to the serverless framework plugin for Rust, we'll be sharing different alternatives for building and deploying your Rust-based Lambda functions. So join us on this journey as we explore the exciting world of Rust and Lambda.
If you're not using infrastructure as code, you're not really using the cloud; that's a fact. With the AWS SDK, you integrate your application with AWS infrastructure and automate your deployment, provisioning, and operations processes. Reduce operating costs, increase your development team's productivity, and improve your application's security and performance. These are just some of the advantages of using the AWS SDK. Sign up for the pre-launch of the AWS course: https://www.uminventorqualquer.com.br/curso-aws/ Wesley Milan's channel: https://bit.ly/3LqiYwg Instagram: https://bit.ly/3tfzAj0 LinkedIn: https://www.linkedin.com/in/wesleymilan/ Podcast: https://bit.ly/3qa5JH1
RE:INVENT NOTICE Jonathan, Ryan, and Justin will be live-streaming the major keynotes starting Monday night, followed by Adam's keynote on Tuesday and Swami's keynote on Wednesday, and will wrap up our re:Invent coverage with Werner's keynote on Thursday. Tune into our live stream here on the site or via Twitch/Twitter, etc. On The Cloud Pod this week, a new AWS region is open in Spain, and the NBA and Microsoft team up to transform fan experiences with cloud application modernization. Thank you to our sponsor, Foghorn Consulting, which provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you're having trouble hiring? Foghorn can be burning down your DevOps and cloud backlogs as soon as next week. General News [0:04]
About KevinKevin Miller is currently the global General Manager for Amazon Simple Storage Service (S3), an object storage service that offers industry-leading scalability, data availability, security, and performance. Prior to this role, Kevin has had multiple leadership roles within AWS, including as the General Manager for Amazon S3 Glacier, Director of Engineering for AWS Virtual Private Cloud, and engineering leader for AWS Virtual Private Network and AWS Direct Connect. Kevin was also Technical Advisor to the Senior Vice President for AWS Utility Computing. Kevin is a graduate of Carnegie Mellon University with a Bachelor of Science in Computer Science.Links Referenced: snark.cloud/shirt: https://snark.cloud/shirt aws.amazon.com/s3: https://aws.amazon.com/s3 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Datadog. Datadog is a SaaS monitoring and security platform that enables full-stack observability for modern infrastructure and applications at every scale. Datadog enables teams to see everything: dashboarding, alerting, application performance monitoring, infrastructure monitoring, UX monitoring, security monitoring, dog logos, and log management, in one tightly integrated platform. With 600-plus out-of-the-box integrations with technologies including all major cloud providers, databases, and web servers, Datadog allows you to aggregate all your data into one platform for seamless correlation, allowing teams to troubleshoot and collaborate together in one place, preventing downtime and enhancing performance and reliability. 
Get started with a free 14-day trial by visiting datadoghq.com/screaminginthecloud, and get a free t-shirt after installing the agent.Corey: Managing shards. Maintenance windows. Overprovisioning. ElastiCache bills. I know, I know. It's a spooky season and you're already shaking. It's time for caching to be simpler. Momento Serverless Cache lets you forget the backend to focus on good code and great user experiences. With true autoscaling and a pay-per-use pricing model, it makes caching easy. No matter your cloud provider, get going for free at gomomento.co/screaming. That's GO M-O-M-E-N-T-O dot co slash screaming.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Right now, as I record this, we have just kicked off our annual charity t-shirt fundraiser. This year's shirt showcases S3 as the eighth wonder of the world. And here to either defend or argue the point—we're not quite sure yet—is Kevin Miller, AWS's vice president and general manager for Amazon S3. Kevin, thank you for agreeing to suffer the slings and arrows that are no doubt going to be interpreted, misinterpreted, et cetera, for the next half hour or so.Kevin: Oh, Corey, thanks for having me. And happy to do that, and really flattered for you to be thinking about S3 in this way. So more than happy to chat with you.Corey: It's absolutely one of those services that is foundational to the cloud. It was the first AWS service that was put into general availability, although the beta folks are going to argue back and forth about no, no, that was SQS instead. I feel like now that Mai-Lan handles both SQS and S3 as part of her portfolio, she is now the final arbiter of that. I'm sure that's an argument for a future day. But it's impossible to imagine cloud without S3.Kevin: I definitely think that's true. It's hard to imagine cloud, actually, with many of our foundational services, including SQS, of course, but we are—yes, we were the first generally available service with S3. 
And pretty happy with our anniversary being Pi Day, 3/14.Corey: I'm also curious, your own personal trajectory has been not necessarily what folks would expect. You were the general manager of Amazon Glacier, and now you're the general manager and vice president of S3. So, I've got to ask, because there are conflicting reports on this depending upon what angle you look at, are Glacier and S3 the same thing?Kevin: Yes, I was the general manager for S3 Glacier prior to coming over to S3 proper, and the answer is no, they are not the same thing. We certainly have a number of technologies where we're able to use those technologies both on S3 and Glacier, but there are certainly a number of things that are very distinct about Glacier and give us that ability to hit the ultra-low price points that we do for Glacier Deep Archive being as low as $1 per terabyte-month. And so, that definitely—there's a lot of actual ingenuity up and down the stack, from hardware to software, everywhere in between, to really achieve that with Glacier. But then there's other spots where S3 and Glacier have very similar needs, and then, of course, today many customers use Glacier through S3 as a storage class in S3, and so that's a great way to do that. So, there's definitely a lot of shared code, but certainly, when you get into it, there's [unintelligible 00:04:59] to both of them.Corey: I ran a number of obnoxiously detailed financial analyses, and they all came away with, unless you have a very specific very nuanced understanding of your data lifecycle and/or it is less than 30 or 60 days depending upon a variety of different things, the default S3 storage class you should be using for virtually anything is Intelligent Tiering. That is my purely economic analysis of it. Do you agree with that? Disagree with that? 
And again, I understand that all of these storage classes are like your children, and I am not inviting you to tell me which one of them is your favorite, but I'm absolutely prepared to do that.

Kevin: Well, we love Intelligent Tiering because it is very simple; customers are able to automatically save money using Intelligent Tiering for data that's not being frequently accessed. And actually, since we launched it a few years ago, we've already saved customers more than $250 million using Intelligent Tiering. So, I would say today, it is our default recommendation in almost every case. I think that the cases where we would recommend another storage class as the primary storage class tend to be specific to the use case, and particularly use cases where customers really have a good understanding of the access patterns. And we see that with some customers: for a certain dataset, they know that it's going to be heavily accessed for a fixed period of time, or this data is actually for archival, it'll never be accessed, or very rarely if ever accessed, just maybe in an emergency.

And in those kinds of use cases, I think customers are probably best to choose one of the specific storage classes where they're paying the lower cost from day one. But again, I would say for the vast majority of cases that we see, the data access patterns are unpredictable and customers like the flexibility of being able to very quickly retrieve the data if they decide they need to use it. But in many cases, they'll save a lot of money as the data is not being accessed, and so, Intelligent Tiering is a great choice for those cases.

Corey: I would take it a step further and say that even when customers believe that they are going to be doing a deeper analysis and they have a better understanding of their data flow patterns than Intelligent Tiering would, in practice, I see that they rarely do anything about it.
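The per-dataset tuning Kevin describes, where a customer who knows their access pattern picks a specific storage class up front, is typically expressed as a lifecycle configuration. A minimal sketch, assuming hypothetical bucket and prefix names; applying it requires boto3 and real AWS credentials, so the API calls are shown only as comments:

```python
# Transition objects under "logs/" to Standard-IA after 30 days and to
# Glacier after 90 -- the hand-tuned alternative to Intelligent Tiering.
# All names here are made-up examples, not from the conversation.
lifecycle_config = {
    "Rules": [
        {
            "ID": "tier-down-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
        }
    ]
}

# Applying it needs boto3 and credentials:
# import boto3
# s3 = boto3.client("s3")
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-example-bucket",
#     LifecycleConfiguration=lifecycle_config,
# )

# The Intelligent Tiering alternative is a single storage-class choice at upload:
# s3.put_object(Bucket="my-example-bucket", Key="logs/app.log",
#               Body=b"...", StorageClass="INTELLIGENT_TIERING")
```

The point of the exchange above is that maintaining rules like this per dataset is exactly the work Intelligent Tiering spares you.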
It's one of those things where they're like, “Oh, yeah, we're going to set up our own lifecycle policies real soon now,” whereas, just switch it over to Intelligent Tiering and never think about it again. People's time is worth so much more than the infrastructure they're working on in almost every case. It doesn't seem to make a whole lot of sense unless you have a very intentional, very urgent reason to go and do that stuff by hand in most cases.

Kevin: Yeah, that's right. I think I agree with you, Corey. And certainly, that is the recommendation we lead with for customers.

Corey: In previous years, our charity t-shirt has focused on other areas of AWS, and one of them was based upon a joke that I've been telling for a while now, which is that the best database in the world is Route 53 and storing TXT records inside of it. I don't know if I ever mentioned this to you or not, but the first iteration of that joke centered around S3. The challenge that I had with it is that S3 Select is absolutely a thing where you can query S3 with SQL, which I don't see people doing anymore because Athena is the easier, more, shall we say, well-articulated version of all of that. And no, no, that joke doesn't work because it's actually true. You can use S3 as a database. Does that statement fill you with dread? Regret? Am I misunderstanding something? Or are you effectively running a giant subversive database?

Kevin: Well, I think that certainly when most customers think about a database, they think about a collection of technology that's applied for given problems, and so I wouldn't count S3 as providing the whole range of functionality that would really make up a database. But I think that certainly a lot of the primitives, and S3 Select as a great example of a primitive, are available in S3. And we're looking at adding, you know, additional primitives going forward to make it possible to, you know, to build a database around S3.
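For a concrete sense of the S3 Select primitive mentioned above, here is a hedged sketch of "querying S3 with SQL." The helper only builds the request arguments, since actually running `select_object_content` requires boto3, credentials, and a real CSV object; the bucket, key, and query are made-up examples:

```python
def build_select_request(bucket, key, sql):
    """Build the arguments for an S3 Select call: run a SQL expression
    server-side against a single CSV object, returning only matching rows."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": sql,
        # Treat the first CSV line as a header so columns are addressable by name.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        "OutputSerialization": {"CSV": {}},
    }

# Hypothetical bucket/key; executing this needs boto3 and AWS credentials:
req = build_select_request(
    "my-example-bucket", "users.csv",
    "SELECT s.name FROM s3object s WHERE s.age > '30'",
)
# import boto3
# s3 = boto3.client("s3")
# for event in s3.select_object_content(**req)["Payload"]:
#     if "Records" in event:
#         print(event["Records"]["Payload"].decode())
```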
And as you see, other AWS services have done that in many ways. For example, obviously with Amazon Redshift having a lot of capability now to just directly access and use data in S3, and make that super seamless so that you can then run data warehousing type queries on top of S3 and on top of your other datasets.

So, I certainly think it's a great building block. And one other thing I would actually just say that you may not know, Corey, is that one of the things over the last couple of years we've been doing a lot more with S3 is actually working to directly contribute improvements to open-source connector software that uses S3, to make available automatically some of the performance improvements that can be achieved using both the AWS SDK and also things like S3 Select. So, we started with a few of those things with Select; you're going to see more of that coming, most likely. And some of that, again, the idea there is you may not even necessarily know you're using Select, but when we can identify that it will improve performance, we're looking to be able to contribute those kinds of improvements directly—or we are contributing those directly—to those open-source packages. So, one thing I would definitely recommend customers and developers do is keep that software up-to-date, because although it might seem like those are sort of one-and-done kind of software integrations, there's actually almost continuous improvement now going on around things like that capability, and then others we come out with.

Corey: What surprised me is just how broadly S3 has been adopted by a wide variety of different client software packages out there. Back when I was running production environments in anger, I distinctly remember in one Ubuntu environment, we wound up installing a specific package that was designed to teach apt how to retrieve packages and its updates from S3, which was awesome.
I don't see that anymore, just because it seems that it is so easy to do it now, just with the native features that S3 offers, and because an awful lot of software under the hood has learned to directly recognize S3 as its own thing and can react accordingly.

Kevin: And just do the right thing. Exactly. No, we certainly see a lot of that. So that's, you know—I mean, obviously making that simple for end customers to use and achieve what they're trying to do, that's the whole goal.

Corey: It's always odd to me when I'm talking to one of my clients who is looking to understand and optimize their AWS bill to see outliers in either direction when it comes to S3 itself. When they're driving large S3 bills, as in a majority of their spend, it's, okay, that is very interesting. Let's dive into that. But almost more interesting to me is when it is effectively not being used at all. When, oh, we're doing everything with EBS volumes or EFS.

And again, those are fine services. I don't have any particular problem with them anymore, but the problem I have is that the cloud long ago took what amounts to an economic vote. There's a tax savings for storing data in an object store the way that you—and by extension, most of your competitors—wind up pricing this, versus the idea of a volume basis where you have to pre-provision things, and you don't get any form of durability that extends beyond the availability zone boundary. It just becomes an awful lot of, “Well, you could do it this way. But it gets really expensive really quickly.”

It just feels wild to me that there is that level of variance between S3 on just a raw storage basis, economically, as well as then just the, frankly, ridiculous levels of durability and availability that you offer on top of that. How did you get there? Was the service just mispriced at the beginning? Like, oh, we dropped a zero and probably should have put that in there somewhere.

Kevin: Well, no, I wouldn't call it mispriced.
I think that S3 came about when we spent a lot of time looking at the architecture for storage systems, knowing that we wanted a system that would provide the durability that comes with having three completely independent data centers, and the elasticity and capability where, you know, customers don't have to provision the amount of storage they want; they can simply put data in and the system keeps growing. And they can also delete data and stop paying for that storage when they're not using it. And so, just all of that investment and sort of looking at that architecture holistically led us down the path to where we are with S3.

And we've definitely talked about this. In fact, in Peter's keynote at re:Invent last year, we talked a little bit about how the system is designed under the hood, and one of the things you realize is that S3 gets a lot of the benefits that we do just from the overall scale. The fact that—I think the stat is that at this point more than 10,000 customers have data that's stored on more than a million hard drives in S3. And that's how you get the scale and the capability: through massive parallelization. Whereas customers that are, you know, I would say building more traditional architectures, those are inherently typically much more siloed architectures with a relatively small scale overall, and it ends up with a lot of resources provisioned at small scale, in sort of small chunks with each resource, so that you never get to that scale where you can start to take advantage of the whole being more than the sum of the parts.

And so, I think that's what the recognition was when we started out building S3. And then, of course, we offer that as an API on top of that, where customers can consume whatever they want.
That is, I think, where S3, at the scale it operates, is able to do certain things, including on the economics, that are very difficult or even impossible to do at a much smaller scale.

Corey: One of the more egregious clown-shoe statements that I hear from time to time has been when people will come to me and say, “We've built a competitor to S3.” And my response is always one of those, “Oh, this should be good.” Because when people say that, they generally tend to be focusing on one or maybe two dimensions where S3 doesn't work for a particular use case as well as it could. “Okay, what was your story around why this should be compared to S3?” “Well, it's an object store. It has full S3 API compatibility.” “Does it really? Because I have to say, there are times where I'm not entirely convinced that S3 itself has full compatibility with the way that its API has been documented.”

And there's an awful lot of magic that goes into this too. “Okay, great. You're running an S3 competitor. Great. How many buildings does it live in?” Like, “Well, we have a problem with the s at the end of that word.” It's, “Okay, great. If it fits on my desk, it is not a viable S3 competitor. If it fits in a single zip code, it is probably not a viable S3 competitor.” Now, can it be an object store? Absolutely. Does it provide a new interface to some existing data someone might have? Sure, why not. But I think that “oh, it's S3 compatible,” is something that gets tossed around far too lightly by folks who don't really understand what it is that drives S3 and makes it special.

Kevin: Yeah, I mean, I would say certainly, there are a number of other implementations of the S3 API, and frankly we're flattered that customers, and our competitors and others, recognize the simplicity of the API and go about implementing it.
But to your point, I think that there's a lot more to it; it's not just about the API, it's really about everything surrounding S3: as you mentioned, the fact that the data in S3 is stored in three independent availability zones, all of which are separated by kilometers from each other, and the resilience, the automatic failover, and the ability to withstand an unlikely impact to one of those facilities, as well as the scalability, and, you know, the fact that we put a lot of time and effort into making sure that the service continues scaling with our customers' needs. And so, I think there's a lot more that goes into what S3 is. And oftentimes, a straight-up comparison is purely based on just the APIs, and generally a small set of APIs, leaving out those intangibles—or not intangibles, but all of the ‘-ilities,' right: the elasticity and the durability and so forth that I just talked about. In addition to all that, you know, certainly what we're seeing from customers is that as they get into the petabyte, tens of petabytes, hundreds of petabytes scale, their need for the services that we provide to manage that storage, whether it's lifecycle and replication, or things like our batch operations to help update and maintain all the storage, those become really essential to customers wrapping their arms around it, as well as visibility: things like Storage Lens to understand, what storage do I have? Who's using it? How is it being used?

And those are all things that we provide to help customers manage at scale. And certainly, you know, oftentimes when I see claims around S3 compatibility, a lot of those advanced features are nowhere to be seen.

Corey: I also want to call out that a few years ago, Mai-Lan got on stage and talked about how, to my recollection, you folks have effectively rebuilt S3 under the hood into, I think it was, 235 distinct microservices at the time. There will not be a quiz on numbers later, I'm assuming.
But what was wild to me about that is, having done that for services that are orders of magnitude less complex, it absolutely is like changing the engine on a car without ever slowing down on the highway. Customers didn't know that any of this was happening until she got on stage and announced it. That is wild to me. I would have said before this happened that there was no way that would have been possible, except it clearly was. I have to ask, how did you do that in the broad sense?

Kevin: Well, it's true. A lot of the underlying infrastructure that's been part of S3, both hardware and software, has changed; if someone from S3 in 2006 came and looked at the system today, they would probably be very disoriented in terms of understanding what was there, because so much of it has changed. To answer your question, the long and short of it is a lot of testing. In fact, a lot of novel testing most recently, particularly with the use of formal logic and what we call automated reasoning. It's also something we've talked a fair bit about at re:Invent.

And that is essentially where you prove the correctness of certain algorithms. And we've used that to spot some very interesting, the one-in-a-trillion type cases that at S3 scale happen regularly, that you have to be ready for; you have to know how the system reacts, even in all those cases. I mean, I think one of our engineers did some calculations that, you know, the number of potential states for S3, sort of, exceeds the number of atoms in the universe or something crazy like that. But yet, using methods like automated reasoning, we can test that state space, we can understand what the system will do, and have a lot of confidence as we begin to swap, you know, pieces of the system.

And of course, nothing at S3 scale happens instantly.
It's all, you know—I would say that for a typical engineering effort within S3, there's a certain amount of effort, obviously, in making the change, in preparing and writing the new software and testing it, but there's almost an equal amount of time that goes into, okay, what is the process for migrating from System A to System B, and that happens over a timescale of months, if not years, in some cases. And so, there's just a lot of diligence that goes into not just the new systems, but also the process of, you know, literally, how do I swap that engine on the system. So, you know, it's a lot of really hardworking engineers that spend a lot of time working through these details every day.

Corey: I still view S3 through the lens of: it is one of the easiest ways in the world to wind up building a static web server, because you basically stuff the website files into a bucket and then you check a box. So, it feels on some level, though, that that is about as accurate as saying that S3 is a database. It can be used or misused or pressed into service in a whole bunch of different use cases. What have you seen from customers that has, I guess, taught you something you didn't expect to learn about your own service?

Kevin: Oh, I'd say we have those [laugh] meetings pretty regularly, when customers build their workloads and have unique patterns to them, whether it's the type of data they're retrieving or the access pattern on the data. You know, for example, some customers will make heavy use of our ability to do [ranged gets 00:22:47] on files and [unintelligible 00:22:48] objects. And that's a pretty powerful capability, but it can be one that's very much dependent on the type of file, right? Certain files have structure, as far as, you know, a header or footer, and that data is being accessed in a certain order. Oftentimes, those may also be multi-part objects, and so making use of the multi-part features to upload different chunks of a file in parallel.
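The ranged-GET pattern Kevin describes, reading header or footer structure or pulling chunks of a large object in parallel, boils down to computing byte ranges. A small sketch; the bucket and key in the commented call are hypothetical:

```python
def byte_ranges(object_size, chunk_size):
    """Split an object of `object_size` bytes into HTTP Range header values
    of at most `chunk_size` bytes each, for parallel ranged GETs."""
    ranges = []
    start = 0
    while start < object_size:
        # HTTP byte ranges are inclusive on both ends.
        end = min(start + chunk_size, object_size) - 1
        ranges.append(f"bytes={start}-{end}")
        start = end + 1
    return ranges

print(byte_ranges(10, 4))  # → ['bytes=0-3', 'bytes=4-7', 'bytes=8-9']

# Each range then becomes one GetObject call, issued in parallel
# (needs boto3 and credentials; names are made up):
# part = s3.get_object(Bucket="my-example-bucket", Key="big.bin",
#                      Range=byte_ranges(size, 8 * 1024 * 1024)[0])
```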
And, you know, also certainly when customers get into things like our batch operations capability, where they can literally write a Lambda function and do what they want, you know, we've seen some pretty interesting use cases where customers are running large-scale operations across, you know, billions, sometimes tens of billions of objects, and it can be pretty interesting what they're able to do with them.

So, for something that is, in some sense, as simple and basic as a GET and PUT API, just all the capability around it ends up being pretty interesting as far as how customers apply it and the different workloads they run on it.

Corey: So, if you squint hard enough, what I'm hearing you tell me is that I can view all of this as, “Oh, yeah. S3 is also compute.” And it feels like that's a fast track to getting a question wrong on one of the certification exams. But I have to ask, from your point of view, is S3 storage? And whether it's yes or no, what gets you excited about the space that it's in?

Kevin: Yeah, well, I would say S3 is not compute, but we have some great compute services that are very well integrated with S3, which excites me, as well as things like S3 Object Lambda, where we actually handle that integration with Lambda. So, you're writing Lambda functions, and we're executing them on the GET path. And so, that's a pretty exciting feature for me. But, you know, to sort of take a step back, what excites me is that I think customers around the world, in every industry, are really starting to recognize the value of data, and data at large scale.
You know, I think that actually many customers in the world have terabytes or more of data that sort of flows through their fingers every day that they don't even realize.

And so, as customers realize what data they have, and they can capture it and then start to analyze it and ultimately make better business decisions that really help drive their top line or help them reduce costs, whether it's in manufacturing or, you know, other things that they're doing. That's what really excites me: seeing those customers take the raw capability and then apply it to transform not just how their business works, but even how they think about the business. Because in many cases, transformation is not just a technical transformation; it's a people and cultural transformation inside these organizations. And that's pretty cool to see as it unfolds.

Corey: One of the more interesting things that I've seen customers misunderstand, on some level, has been a number of S3 releases that focus around, “Oh, this is for your data lake.” And I've asked customers about that. “So, what's your data lake strategy?” “Well, we don't have one of those.” “You have, like, eight petabytes and climbing in S3? What do you call that?” It's like, “Oh, yeah, that's just a bunch of buckets we dump things into. Some are logs, some are our assets, and the rest.” It's—

Kevin: Right.

Corey: Yeah, it feels like no one thinks of themselves as having anything remotely resembling a structured place for all of the data that accumulates at a company.

Kevin: Mm-hm.

Corey: There is an evolution of people learning that, oh yeah, this is in fact what it is that we're doing, and this thing that they're talking about does apply to us. But it almost feels like a customer communication challenge, just because, I don't know about you, but with my legacy AWS account, I have dozens of buckets in there that I don't remember what the heck they're for.
Fortunately, you folks don't charge by the bucket, so I can smile, nod, and remain blissfully ignorant, but it does make me wonder from time to time.

Kevin: Yeah, no, I think that what you hear there is actually pretty consistent with what the reality is for a lot of customers, which is that in distributed organizations, that's bound to happen: you have different teams that are working to solve problems, and they are collecting data to analyze, they're creating result datasets and they're storing those datasets. And then, of course, priorities can shift, and, you know, there's not necessarily the day-to-day management around data that we might think would be expected, as if [we 00:26:56] sort of drew an architecture on a whiteboard. And so, I think that's the reality we are in, and will be in, largely forever.

I mean, I think that at a smaller scale, that's been happening for years. So, I think that, one, there's a lot of capability in just being in the cloud. At the very least, you can now start to wrap your arms around it, right? Where it used to be that it wasn't even possible to understand what all that data was, because there was no way to centrally inventory it well, in AWS with S3, with inventory reports, you can get a list of all your storage. And we are going to continue to add capability to help customers get their arms around what they have, first off, and understand how it's being used; that's where things like Storage Lens really play a big role in understanding exactly what data is being accessed and not.
We're definitely listening to customers carefully around this, and I think when you think about the broader data management story, that's a place where we're spending a lot of time right now thinking about how we help customers get their arms around it, make sure that they know what the categorization of certain data is: do I have some PII lurking here that I need to be very mindful of?

And then, how do I get to a world where I'm—you know, I won't say that it's ever going to look like the perfect whiteboard picture you might draw on the wall. I don't think that's really ever achievable, but I think certainly getting to a point where customers have a real solid understanding of what data they have and that the right controls are in place around all that data. Yeah, I think that's directionally where I see us heading.

Corey: As you look at how far the service has come, it feels like, on some level, there were some, I guess, I don't want to say missteps, but things that you learned as you went along. Like, back when the service was in beta, for example, there was no per-request charge. To my understanding, that was changed in part because people were trying to use it as a file system, and wow, that suddenly caused a tremendous amount of load on some of the underlying systems. You originally launched with a BitTorrent endpoint as an option so that people could download through peer-to-peer approaches for large datasets, and it turned out that wasn't really the way the internet evolved, either. And I'm curious, if you had to somehow build this from scratch, are there any significant changes you would make in how the service was presented to customers and in how people talked about it in the early days?
Effectively given a mulligan, what would you do differently?

Kevin: Well, I don't know, Corey. I mean, just given where it's grown to in macro terms, you know, I definitely would be worried that taking a mulligan [laugh] would change sort of the overarching trajectory. Certainly, I think there are a few features here and there where, for whatever reason, it was exciting at the time and really spoke to what customers at the time were thinking, but over time, you know, those needs quickly moved to something a little bit different. And, you know, like you said, things like the BitTorrent support is one where, at some level, it seems like a great technical architecture for the internet, but certainly not something that we've seen dominate in the way things are done. Instead, you know, we largely have a world where there are a lot of caching layers, but it still ends up being largely client-server kinds of connections. So, I certainly wouldn't do a mulligan on any of the major functionality, and I think, you know, there are a few things in the details where, obviously, we've learned what really works in the end. I think we learned that we wanted bucket names to really strictly conform to DNS naming rules, so that was a change that was made at some point. We would tweak things like that, but no major changes, certainly.

Corey: One subject of some debate while we were designing this year's charity t-shirt—which, incidentally, if you're listening to this, you can pick up for yourself at snark.cloud/shirt—was: is S3 itself dependent upon S3? Because we know that every other service out there is, but it is interesting to come up with an idea of, “Oh, yeah.
We're going to launch a whole new isolated region of S3 without S3 to lean on.” That feels like it's an almost impossible bootstrapping problem.

Kevin: Well, S3 is not dependent on S3 to come up. There's certainly a critical dependency tree that we look at and track, and we make sure that we have an acyclic graph as we look at dependencies.

Corey: That is such a sophisticated way to say what I learned the hard way when I was significantly younger and working in production environments: don't put the DNS servers needed to boot the hypervisor into VMs that require a working hypervisor. It's one of those oh, yeah, in hindsight, that makes perfect sense, but you learn it right after that knowledge really would have been useful.

Kevin: Yeah, absolutely. And one of the terms we use for that is static stability; that's one of the techniques that can really help with isolating a dependency. We actually have an article about that in the Amazon Builders' Library, and there are a bunch of really good articles in there from very experienced operations-focused engineers in AWS. So, static stability is one of those key techniques, but there are other techniques—I mean, just pure minimization of dependencies is one. And so, we were very, very thoughtful about that, particularly for that core layer.

I mean, you know, when you talk about S3 with 200-plus microservices, or 235-plus microservices, I would say not all of those services are critical for every single request. Certainly, a small subset of those are required for every request, and then other services actually help manage and scale that inner core of services. And so, we look at dependencies on a service-by-service basis to really make sure that inner core is as minimized as possible.
And then the outer layers can start to take some dependencies once you have that basic functionality up.

Corey: I really want to thank you for being as generous with your time as you have been. If people want to learn more about you and about S3 itself, where should they go—after buying a t-shirt, of course?

Kevin: Well, certainly buy the t-shirt first. I love the t-shirts and the charity that you work with to do that. Obviously, for S3, it's aws.amazon.com/s3. And you can actually learn more about me; I have some YouTube videos, so you can search for me on YouTube and kind of get a sense of me.

Corey: We will put links to that into the show notes, of course. Thank you so much for being so generous with your time. I appreciate it.

Kevin: Absolutely. Yeah. Glad to spend some time. Thanks for the questions, Corey.

Corey: Kevin Miller, vice president and general manager for Amazon S3. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, ignorant comment talking about how your S3-compatible service is going to blow everyone's socks off when it fails.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
[00:00.000 --> 00:04.560] All right, so I'm here with 52 weeks of AWS[00:04.560 --> 00:07.920] and still continuing to do developer certification.[00:07.920 --> 00:11.280] I'm gonna go ahead and share my screen here.[00:13.720 --> 00:18.720] All right, so we are on Lambda, one of my favorite topics.[00:19.200 --> 00:20.800] Let's get right into it[00:20.800 --> 00:24.040] and talk about how to develop event-driven solutions[00:24.040 --> 00:25.560] with AWS Lambda.[00:26.640 --> 00:29.440] With Serverless Computing, one of the things[00:29.440 --> 00:32.920] that it is going to do is it's gonna change[00:32.920 --> 00:36.000] the way you think about building software[00:36.000 --> 00:39.000] and in a traditional deployment environment,[00:39.000 --> 00:42.040] you would configure an instance, you would update an OS,[00:42.040 --> 00:45.520] you'd install applications, build and deploy them,[00:45.520 --> 00:47.000] load balance.[00:47.000 --> 00:51.400] So this is non-cloud native computing and Serverless,[00:51.400 --> 00:54.040] you really only need to focus on building[00:54.040 --> 00:56.360] and deploying applications and then monitoring[00:56.360 --> 00:58.240] and maintaining the applications.[00:58.240 --> 01:00.680] And so with really what Serverless does[01:00.680 --> 01:05.680] is it allows you to focus on the code for the application[01:06.320 --> 01:08.000] and you don't have to manage the operating system,[01:08.000 --> 01:12.160] the servers or scale it and really is a huge advantage[01:12.160 --> 01:14.920] because you don't have to pay for the infrastructure[01:14.920 --> 01:15.920] when the code isn't running.[01:15.920 --> 01:18.040] And that's really a key takeaway.[01:19.080 --> 01:22.760] If you take a look at the AWS Serverless platform,[01:22.760 --> 01:24.840] there's a bunch of fully managed services[01:24.840 --> 01:26.800] that are tightly integrated with Lambda.[01:26.800 --> 01:28.880] And so this is another huge advantage of 
Lambda,[01:28.880 --> 01:31.000] isn't necessarily that it's the fastest[01:31.000 --> 01:33.640] or it has the most powerful execution,[01:33.640 --> 01:35.680] it's the tight integration with the rest[01:35.680 --> 01:39.320] of the AWS platform and developer tools[01:39.320 --> 01:43.400] like AWS Serverless application model or AWS SAM[01:43.400 --> 01:45.440] would help you simplify the deployment[01:45.440 --> 01:47.520] of Serverless applications.[01:47.520 --> 01:51.960] And some of the services include Amazon S3,[01:51.960 --> 01:56.960] Amazon SNS, Amazon SQS and AWS SDKs.[01:58.600 --> 02:03.280] So in terms of Lambda, AWS Lambda is a compute service[02:03.280 --> 02:05.680] for Serverless and it lets you run code[02:05.680 --> 02:08.360] without provisioning or managing servers.[02:08.360 --> 02:11.640] It allows you to trigger your code in response to events[02:11.640 --> 02:14.840] that you would configure like, for example,[02:14.840 --> 02:19.200] dropping something into a S3 bucket like that's an image,[02:19.200 --> 02:22.200] Nevel Lambda that transcribes it to a different format.[02:23.080 --> 02:27.200] It also allows you to scale automatically based on demand[02:27.200 --> 02:29.880] and it will also incorporate built-in monitoring[02:29.880 --> 02:32.880] and logging with AWS CloudWatch.[02:34.640 --> 02:37.200] So if you look at AWS Lambda,[02:37.200 --> 02:39.040] some of the things that it does[02:39.040 --> 02:42.600] is it enables you to bring in your own code.[02:42.600 --> 02:45.280] So the code you write for Lambda isn't written[02:45.280 --> 02:49.560] in a new language, you can write things[02:49.560 --> 02:52.600] in tons of different languages for AWS Lambda,[02:52.600 --> 02:57.600] Node, Java, Python, C-sharp, Go, Ruby.[02:57.880 --> 02:59.440] There's also custom run time.[02:59.440 --> 03:03.880] So you could do Rust or Swift or something like that.[03:03.880 --> 03:06.080] And it also integrates very deeply[03:06.080 --> 
03:11.200] with other AWS services and you can invoke[03:11.200 --> 03:13.360] third-party applications as well.[03:13.360 --> 03:18.080] It also has a very flexible resource and concurrency model.[03:18.080 --> 03:20.600] And so Lambda would scale in response to events.[03:20.600 --> 03:22.880] So you would just need to configure memory settings[03:22.880 --> 03:24.960] and AWS would handle the other details[03:24.960 --> 03:28.720] like the CPU, the network, the IO throughput.[03:28.720 --> 03:31.400] Also, you can use[03:31.400 --> 03:35.000] the AWS Identity and Access Management service, or IAM,[03:35.000 --> 03:38.560] to grant access to whatever other resources you would need.[03:38.560 --> 03:41.200] And this is one of the ways that you would control[03:41.200 --> 03:44.720] the security of Lambda: you have real guardrails[03:44.720 --> 03:47.000] around it because you would just tell Lambda,[03:47.000 --> 03:50.080] you have a role that is whatever it is you need Lambda to do,[03:50.080 --> 03:52.200] talk to SQS or talk to S3,[03:52.200 --> 03:55.240] and it would specifically only do that role.[03:55.240 --> 04:00.240] And the other thing about Lambda is that it has built-in[04:00.560 --> 04:02.360] availability and fault tolerance.[04:02.360 --> 04:04.440] So again, it's a fully managed service,[04:04.440 --> 04:07.520] it has high availability and you don't have to do anything[04:07.520 --> 04:08.920] at all to use that.[04:08.920 --> 04:11.600] And one of the biggest things about Lambda[04:11.600 --> 04:15.000] is that you only pay for what you use.[04:15.000 --> 04:18.120] And so when the Lambda service is idle,[04:18.120 --> 04:19.480] you don't have to actually pay for that[04:19.480 --> 04:21.440] versus if it's something else,[04:21.440 --> 04:25.240] like even in the case of a Kubernetes-based system,[04:25.240 --> 04:28.920] still there's a host machine that's running Kubernetes[04:28.920 --> 04:31.640] and you have to actually pay for 
that.[04:31.640 --> 04:34.520] So one of the ways that you can think about Lambda[04:34.520 --> 04:38.040] is that there's a bunch of different use cases for it.[04:38.040 --> 04:40.560] So let's start off with different use cases,[04:40.560 --> 04:42.920] web apps, I think would be one of the better ones[04:42.920 --> 04:43.880] to think about.[04:43.880 --> 04:46.680] So you can combine AWS Lambda with other services[04:46.680 --> 04:49.000] and you can build powerful web apps[04:49.000 --> 04:51.520] that automatically scale up and down.[04:51.520 --> 04:54.000] And there's no administrative effort at all.[04:54.000 --> 04:55.160] There's no backups necessary,[04:55.160 --> 04:58.320] no multi-data center redundancy, it's done for you.[04:58.320 --> 05:01.400] Backends, so you can build serverless backends[05:01.400 --> 05:05.680] that let you handle web, mobile, IoT,[05:05.680 --> 05:07.760] third-party applications.[05:07.760 --> 05:10.600] You can also build those backends with Lambda,[05:10.600 --> 05:15.400] with API Gateway, and you can build applications with them.[05:15.400 --> 05:17.200] In terms of data processing,[05:17.200 --> 05:19.840] you can also use Lambda to run code[05:19.840 --> 05:22.560] in response to a trigger, change in data,[05:22.560 --> 05:24.440] shift in system state,[05:24.440 --> 05:27.360] and really all of AWS for the most part[05:27.360 --> 05:29.280] is able to be orchestrated with Lambda.[05:29.280 --> 05:31.800] So it's really like a glue type service[05:31.800 --> 05:32.840] that you're able to use.[05:32.840 --> 05:36.600] Now chatbots, that's another great use case for it.[05:36.600 --> 05:40.760] Amazon Lex is a service for building conversational chatbots[05:42.120 --> 05:43.560] and you could use it with Lambda.[05:43.560 --> 05:48.560] The AWS Lambda service is also able to be used[05:50.080 --> 05:52.840] with voice and IT automation.[05:52.840 --> 05:55.760] These are all great use cases for Lambda.[05:55.760 --> 
05:57.680] In fact, I would say it's kind of like[05:57.680 --> 06:01.160] the go-to automation tool for AWS.[06:01.160 --> 06:04.160] So let's talk about how Lambda works next.[06:04.160 --> 06:06.080] So the way Lambda works is that[06:06.080 --> 06:09.080] there's a function and there's an event source,[06:09.080 --> 06:10.920] and these are the core components.[06:10.920 --> 06:14.200] The event source is the entity that publishes events[06:14.200 --> 06:19.000] to AWS Lambda, and the Lambda function is the code[06:19.000 --> 06:21.960] that you're gonna use to process the event.[06:21.960 --> 06:25.400] And AWS Lambda would run that Lambda function[06:25.400 --> 06:29.600] on your behalf, and a few things to consider[06:29.600 --> 06:33.840] is that it really is just a little bit of code,[06:33.840 --> 06:35.160] and you can configure the triggers[06:35.160 --> 06:39.720] to invoke a function in response to resource lifecycle events,[06:39.720 --> 06:43.680] like for example, responding to incoming HTTP requests,[06:43.680 --> 06:47.080] consuming events from a queue, like in the case of SQS[06:47.080 --> 06:48.320] or running it on a schedule.[06:48.320 --> 06:49.760] So running it on a schedule is actually[06:49.760 --> 06:51.480] a really good data engineering task, right?[06:51.480 --> 06:54.160] Like you could run it periodically to scrape a website.[06:55.120 --> 06:58.080] So as a developer, when you create Lambda functions[06:58.080 --> 07:01.400] that are managed by the AWS Lambda service,[07:01.400 --> 07:03.680] you can define the permissions for the function[07:03.680 --> 07:06.560] and basically specify what are the events[07:06.560 --> 07:08.520] that would actually trigger it.[07:08.520 --> 07:11.000] You can also create a deployment package[07:11.000 --> 07:12.920] that includes application code[07:12.920 --> 07:17.000] and any dependency or library necessary to run the code,[07:17.000 --> 07:19.200] and you can also configure things like the 
memory,[07:19.200 --> 07:23.200] you can configure the timeout, also configure the concurrency,[07:23.200 --> 07:25.160] and then when your function is invoked,[07:25.160 --> 07:27.640] Lambda will provide a runtime environment[07:27.640 --> 07:30.080] based on the runtime and configuration options[07:30.080 --> 07:31.080] that you selected.[07:31.080 --> 07:36.080] So let's talk about models for invoking Lambda functions.[07:36.360 --> 07:41.360] In the case of an event source that invokes a Lambda function,[07:41.440 --> 07:43.640] it does so by either a push or a pull model,[07:43.640 --> 07:45.920] in the case of a push, it would be an event source[07:45.920 --> 07:48.440] directly invoking the Lambda function[07:48.440 --> 07:49.840] when the event occurs.[07:50.720 --> 07:53.040] In the case of a pull model,[07:53.040 --> 07:56.960] this would be putting the information into a stream or a queue,[07:56.960 --> 07:59.400] and then Lambda would poll that stream or queue,[07:59.400 --> 08:02.800] and then invoke the function when it detects an event.[08:04.080 --> 08:06.480] So a few different examples would be[08:06.480 --> 08:11.280] that some services can actually invoke the function directly.[08:11.280 --> 08:13.680] So for a synchronous invocation,[08:13.680 --> 08:15.480] the other service would wait for the response[08:15.480 --> 08:16.320] from the function.[08:16.320 --> 08:20.680] So a good example would be in the case of Amazon API Gateway,[08:20.680 --> 08:24.800] which would be the REST-based service in front.[08:24.800 --> 08:28.320] In this case, when a client makes a request to your API,[08:28.320 --> 08:31.200] that client would get a response immediately.[08:31.200 --> 08:32.320] And then with this model,[08:32.320 --> 08:34.880] there's no built-in retry in Lambda.[08:34.880 --> 08:38.040] Examples of this would be Elastic Load Balancing,[08:38.040 --> 08:42.800] Amazon Cognito, Amazon Lex, Amazon Alexa,[08:42.800 --> 08:46.360] Amazon API Gateway, AWS 
CloudFormation,[08:46.360 --> 08:48.880] and Amazon CloudFront,[08:48.880 --> 08:53.040] and also Amazon Kinesis Data Firehose.[08:53.040 --> 08:56.760] For asynchronous invocation, AWS Lambda queues[08:56.760 --> 09:00.320] the event before it passes it to your function.[09:00.320 --> 09:02.760] The other service gets a success response[09:02.760 --> 09:04.920] as soon as the event is queued,[09:04.920 --> 09:06.560] and if an error occurs,[09:06.560 --> 09:09.760] Lambda will automatically retry the invocation twice.[09:10.760 --> 09:14.520] A good example of this would be S3, SNS,[09:14.520 --> 09:17.720] SES, the Simple Email Service,[09:17.720 --> 09:21.120] AWS CloudFormation, Amazon CloudWatch Logs,[09:21.120 --> 09:25.400] CloudWatch Events, AWS CodeCommit, and AWS Config.[09:25.400 --> 09:28.280] But in both cases, you can invoke a Lambda function[09:28.280 --> 09:30.000] using the invoke operation,[09:30.000 --> 09:32.720] and you can specify the invocation type[09:32.720 --> 09:35.440] as either synchronous or asynchronous.[09:35.440 --> 09:38.760] And when you use the AWS service as a trigger,[09:38.760 --> 09:42.280] the invocation type is predetermined for each service,[09:42.280 --> 09:44.920] and so you have no control over the invocation type[09:44.920 --> 09:48.920] that these event sources use when they invoke your Lambda.[09:50.800 --> 09:52.120] In the polling model,[09:52.120 --> 09:55.720] the event sources will put information into a stream or a queue,[09:55.720 --> 09:59.360] and AWS Lambda will poll the stream or the queue.[09:59.360 --> 10:01.000] If it finds a record,[10:01.000 --> 10:03.280] it will deliver the payload and invoke the function.[10:03.280 --> 10:04.920] And in this model, Lambda itself[10:04.920 --> 10:07.920] is basically polling data from a stream or a queue[10:07.920 --> 10:10.280] for processing by the Lambda function.[10:10.280 --> 10:12.640] Some examples would be a stream-based event service[10:12.640 --> 
10:17.640] would be Amazon DynamoDB or Amazon Kinesis Data Streams,[10:17.800 --> 10:20.920] and these stream records are organized into shards.[10:20.920 --> 10:24.640] So Lambda would actually poll the stream for the record[10:24.640 --> 10:27.120] and then attempt to invoke the function.[10:27.120 --> 10:28.800] If there's a failure,[10:28.800 --> 10:31.480] AWS Lambda won't read any of the new shards[10:31.480 --> 10:34.840] until the failed batch of records expires or is processed[10:34.840 --> 10:36.160] successfully.[10:36.160 --> 10:39.840] In the non-streaming event source, which would be SQS,[10:39.840 --> 10:42.400] Amazon would poll the queue for records.[10:42.400 --> 10:44.600] If it fails or times out,[10:44.600 --> 10:46.640] then the message would be returned to the queue,[10:46.640 --> 10:49.320] and then Lambda will keep retrying the failed message[10:49.320 --> 10:51.800] until it's processed successfully.[10:51.800 --> 10:53.600] If the message expires,[10:53.600 --> 10:56.440] which is something you can do with SQS,[10:56.440 --> 10:58.240] then it'll just be discarded.[10:58.240 --> 11:00.400] And you can create a mapping between an event source[11:00.400 --> 11:02.960] and a Lambda function right inside of the console.[11:02.960 --> 11:05.520] And this is how typically you would set that up manually[11:05.520 --> 11:07.600] without using infrastructure as code.[11:08.560 --> 11:10.200] All right, let's talk about permissions.[11:10.200 --> 11:13.080] This is definitely an easy place to get tripped up[11:13.080 --> 11:15.760] when you're first using AWS Lambda.[11:15.760 --> 11:17.840] There's two types of permissions.[11:17.840 --> 11:20.120] The first is the event source needs permission[11:20.120 --> 11:22.320] to trigger the Lambda function.[11:22.320 --> 11:24.480] This would be the invocation permission.[11:24.480 --> 11:26.440] And the next one would be the Lambda function[11:26.440 --> 11:29.600] needs permissions to interact with other 
services,[11:29.600 --> 11:31.280] and this would be the run permissions.[11:31.280 --> 11:34.520] And these are both handled via the IAM service[11:34.520 --> 11:38.120] or the AWS identity and access management service.[11:38.120 --> 11:43.120] So the IAM resource policy would tell the Lambda service[11:43.600 --> 11:46.640] which push event sources have permission[11:46.640 --> 11:48.560] to invoke the Lambda function.[11:48.560 --> 11:51.120] And these resource policies would make it easy[11:51.120 --> 11:55.280] to grant access to a Lambda function across AWS accounts.[11:55.280 --> 11:58.400] So a good example would be if you have an S3 bucket[11:58.400 --> 12:01.400] in your account and you need to invoke a function[12:01.400 --> 12:03.880] in another account, you could create a resource policy[12:03.880 --> 12:07.120] that allows those to interact with each other.[12:07.120 --> 12:09.200] And the resource policy for a Lambda function[12:09.200 --> 12:11.200] is called a function policy.[12:11.200 --> 12:14.160] And when you add a trigger to your Lambda function[12:14.160 --> 12:16.760] from the console, the function policy[12:16.760 --> 12:18.680] will be generated automatically[12:18.680 --> 12:20.040] and it allows the event source[12:20.040 --> 12:22.820] to take the lambda:InvokeFunction action.[12:24.400 --> 12:27.320] So a good example would be an Amazon S3 permission[12:27.320 --> 12:32.120] to invoke the Lambda function called my first function.[12:32.120 --> 12:34.720] And basically it would be an effect allow.[12:34.720 --> 12:36.880] And then under Principal, you would have Service[12:36.880 --> 12:41.880] s3.amazonaws.com, the action would be lambda colon[12:41.880 --> 12:45.400] invoke function and then the resource would be the name[12:45.400 --> 12:49.120] or the ARN of actually the Lambda.[12:49.120 --> 12:53.080] And then the condition would be actually the ARN of the bucket.[12:54.400 --> 12:56.720] And really that's it in a 
nutshell.[12:57.560 --> 13:01.480] The Lambda execution role grants your Lambda function[13:01.480 --> 13:05.040] permission to access AWS services and resources.[13:05.040 --> 13:08.000] And you select or create the execution role[13:08.000 --> 13:10.000] when you create a Lambda function.[13:10.000 --> 13:12.320] The IAM policy would define the actions[13:12.320 --> 13:14.440] the Lambda function is allowed to take[13:14.440 --> 13:16.720] and the trust policy allows the Lambda service[13:16.720 --> 13:20.040] to assume an execution role.[13:20.040 --> 13:23.800] To grant permissions to AWS Lambda to assume a role,[13:23.800 --> 13:27.460] you have to have the permission for the iam:PassRole action.[13:28.320 --> 13:31.000] A couple of different examples of a relevant policy[13:31.000 --> 13:34.560] for an execution role: the example[13:34.560 --> 13:37.760] IAM policy, you know,[13:37.760 --> 13:39.840] basically that we talked about earlier,[13:39.840 --> 13:43.000] would allow you to interact with S3.[13:43.000 --> 13:45.360] Another example would be to make it interact[13:45.360 --> 13:49.240] with CloudWatch logs and to create a log group[13:49.240 --> 13:51.640] and stream those logs.[13:51.640 --> 13:54.800] The trust policy would give the Lambda service permissions[13:54.800 --> 13:57.600] to assume a role and invoke a Lambda function[13:57.600 --> 13:58.520] on your behalf.[13:59.560 --> 14:02.600] Now let's talk about the overview of authoring[14:02.600 --> 14:06.120] and configuring Lambda functions.[14:06.120 --> 14:10.440] So really to start with, to create a Lambda function,[14:10.440 --> 14:14.840] you first need to create a Lambda function deployment package,[14:14.840 --> 14:19.800] which is a zip or jar file that consists of your code[14:19.800 --> 14:23.160] and any dependencies. With Lambda,[14:23.160 --> 14:25.400] you can use the programming language[14:25.400 --> 14:27.280] and integrated development environment[14:27.280 --> 14:29.800] that 
you're most familiar with.[14:29.800 --> 14:33.360] And you can actually bring the code you've already written.[14:33.360 --> 14:35.960] And Lambda does support lots of different languages[14:35.960 --> 14:39.520] like Node.js, Python, Ruby, Java, Go,[14:39.520 --> 14:41.160] and .NET runtimes.[14:41.160 --> 14:44.120] And you can also implement a custom runtime[14:44.120 --> 14:45.960] if you wanna use a different language as well,[14:45.960 --> 14:48.480] which is actually pretty cool.[14:48.480 --> 14:50.960] And if you wanna create a Lambda function,[14:50.960 --> 14:52.800] you would specify the handler,[14:52.800 --> 14:55.760] the Lambda function handler is the entry point.[14:55.760 --> 14:57.600] And a few different aspects of it[14:57.600 --> 14:59.400] that are important to pay attention to,[14:59.400 --> 15:00.720] the event object,[15:00.720 --> 15:03.480] this would provide information about the event[15:03.480 --> 15:05.520] that triggered the Lambda function.[15:05.520 --> 15:08.280] And this could be like a predefined object[15:08.280 --> 15:09.760] that an AWS service generates.[15:09.760 --> 15:11.520] So you'll see this, like for example,[15:11.520 --> 15:13.440] in the console of AWS,[15:13.440 --> 15:16.360] you can actually ask for these objects[15:16.360 --> 15:19.200] and it'll give you really the JSON structure[15:19.200 --> 15:20.680] so you can test things out.[15:21.880 --> 15:23.900] And the contents of an event object[15:23.900 --> 15:26.800] include everything you would need to actually invoke it.[15:26.800 --> 15:29.640] The context object is generated by AWS[15:29.640 --> 15:32.360] and this is really runtime information.[15:32.360 --> 15:35.320] And so if you needed to get some kind of runtime information[15:35.320 --> 15:36.160] about your code,[15:36.160 --> 15:40.400] let's say environmental variables or AWS request ID[15:40.400 --> 15:44.280] or a log stream or remaining time in millis,[15:45.320 --> 15:47.200] like for 
example, that one would return[15:47.200 --> 15:48.840] the number of milliseconds that remain[15:48.840 --> 15:50.600] before your function times out,[15:50.600 --> 15:53.300] you can get all that inside the context object.[15:54.520 --> 15:57.560] So what about an example that runs in Python?[15:57.560 --> 15:59.280] Pretty straightforward actually.[15:59.280 --> 16:01.400] All you need is a handler;[16:01.400 --> 16:03.280] the handler would be[16:03.280 --> 16:05.000] a Python function,[16:05.000 --> 16:07.080] it would take an event and a context,[16:07.080 --> 16:10.960] you pass them in and then you return some kind of message.[16:10.960 --> 16:13.960] A few different best practices to remember[16:13.960 --> 16:17.240] about AWS Lambda would be to separate[16:17.240 --> 16:20.320] the core business logic from the handler method[16:20.320 --> 16:22.320] and this would make your code more portable,[16:22.320 --> 16:24.280] enable you to target unit tests[16:25.240 --> 16:27.120] without having to worry about the configuration.[16:27.120 --> 16:30.400] So this is always a really good idea just in general.[16:30.400 --> 16:32.680] Make sure you have modular functions.[16:32.680 --> 16:34.320] So you have a single purpose function,[16:34.320 --> 16:37.160] you don't have like a kitchen sink function,[16:37.160 --> 16:40.000] you treat functions as stateless as well.[16:40.000 --> 16:42.800] So you would treat a function that basically[16:42.800 --> 16:46.040] just does one thing and then when it's done,[16:46.040 --> 16:48.320] there is no state that's actually kept anywhere[16:49.320 --> 16:51.120] and also only include what you need.[16:51.120 --> 16:55.840] So you don't want to have huge sized Lambda functions[16:55.840 --> 16:58.560] and one of the ways that you can avoid this[16:58.560 --> 17:02.360] is by reducing the time it takes a Lambda to unpack[17:02.360 --> 17:04.000] the deployment packages[17:04.000 
--> 17:06.600] and you can also minimize the complexity[17:06.600 --> 17:08.640] of your dependencies as well.[17:08.640 --> 17:13.600] And you can also reuse the temporary runtime environment[17:13.600 --> 17:16.080] to improve the performance of a function as well.[17:16.080 --> 17:17.680] And so the temporary runtime environment[17:17.680 --> 17:22.280] initializes any external dependencies of the Lambda code[17:22.280 --> 17:25.760] and you can make sure that any externalized configuration[17:25.760 --> 17:27.920] or dependency that your code retrieves are stored[17:27.920 --> 17:30.640] and referenced locally after the initial run.[17:30.640 --> 17:33.800] So this would be limiting re-initializing variables[17:33.800 --> 17:35.960] and objects on every invocation,[17:35.960 --> 17:38.200] keeping alive and reusing connections,[17:38.200 --> 17:40.680] like HTTP or database connections,[17:40.680 --> 17:43.160] that were established during the previous invocation.[17:43.160 --> 17:45.880] So a really good example of this would be a socket connection.[17:45.880 --> 17:48.040] If you make a socket connection[17:48.040 --> 17:51.640] and this socket connection took two seconds to spawn,[17:51.640 --> 17:54.000] you don't want every time you call Lambda[17:54.000 --> 17:55.480] for it to wait two seconds,[17:55.480 --> 17:58.160] you want to reuse that socket connection.[17:58.160 --> 18:00.600] A few good examples of best practices[18:00.600 --> 18:02.840] would be including logging statements.[18:02.840 --> 18:05.480] This is kind of a big one[18:05.480 --> 18:08.120] in the case of any cloud computing operation,[18:08.120 --> 18:10.960] especially when it's distributed, if you don't log it,[18:10.960 --> 18:13.280] there's no way you can figure out what's going on.[18:13.280 --> 18:16.560] So you must add logging statements that have context[18:16.560 --> 18:19.720] so you know which particular Lambda instance[18:19.720 --> 18:21.600] it's actually occurring in.[18:21.600 
--> 18:23.440] Also include results.[18:23.440 --> 18:25.560] So make sure that you know it's happening[18:25.560 --> 18:29.000] when the Lambda ran, use environmental variables as well.[18:29.000 --> 18:31.320] So you can figure out things like what the bucket was[18:31.320 --> 18:32.880] that it was writing to.[18:32.880 --> 18:35.520] And then also don't do recursive code.[18:35.520 --> 18:37.360] That's really a no-no.[18:37.360 --> 18:40.200] You want to write very simple functions with Lambda.[18:41.320 --> 18:44.440] A few different ways to write Lambda actually would be[18:44.440 --> 18:46.280] that you can do the console editor,[18:46.280 --> 18:47.440] which I use all the time.[18:47.440 --> 18:49.320] I like to actually just play around with it.[18:49.320 --> 18:51.640] Now the downside is that[18:51.640 --> 18:53.800] if you do need to use custom libraries,[18:53.800 --> 18:56.600] you're not gonna be able to do it other than using,[18:56.600 --> 18:58.440] let's say the AWS SDK.[18:58.440 --> 19:01.600] But for just simple things, it's a great use case.[19:01.600 --> 19:06.080] Another one is you can just upload it to the AWS console.[19:06.080 --> 19:09.040] And so you can create a deployment package in an IDE.[19:09.040 --> 19:12.120] Like for example, Visual Studio for .NET,[19:12.120 --> 19:13.280] you can actually just right click[19:13.280 --> 19:16.320] and deploy it directly into Lambda.[19:16.320 --> 19:20.920] Another one is you can upload the entire package into S3[19:20.920 --> 19:22.200] and put it into a bucket.[19:22.200 --> 19:26.280] And then Lambda will just grab it out of that S3 bucket.[19:26.280 --> 19:29.760] A few different things to remember about Lambda.[19:29.760 --> 19:32.520] The memory and the timeout are configurations[19:32.520 --> 19:35.840] that determine how the Lambda function performs.[19:35.840 --> 19:38.440] And these will affect the billing.[19:38.440 --> 19:40.200] Now, one of the great things about 
Lambda[19:40.200 --> 19:43.640] is it's just amazingly inexpensive to run.[19:43.640 --> 19:45.560] And the reason is that you're charged[19:45.560 --> 19:48.200] based on the number of requests for a function.[19:48.200 --> 19:50.560] A few different things to remember would be the memory.[19:50.560 --> 19:53.560] Like so if you specify more memory,[19:53.560 --> 19:57.120] it's going to increase the cost. Timeout:[19:57.120 --> 19:59.960] you can also control the duration of the function[19:59.960 --> 20:01.720] by having the right kind of timeout.[20:01.720 --> 20:03.960] But if you make the timeout too long,[20:03.960 --> 20:05.880] it could cost you more money.[20:05.880 --> 20:08.520] So really the best practices would be test the performance[20:08.520 --> 20:12.880] of Lambda and make sure you have the optimum memory size.[20:12.880 --> 20:15.160] Also load test it to make sure[20:15.160 --> 20:17.440] that you understand how the timeouts work.[20:17.440 --> 20:18.280] Just in general,[20:18.280 --> 20:21.640] anything with cloud computing, you should load test it.[20:21.640 --> 20:24.200] Now let's talk about an important topic[20:24.200 --> 20:25.280] that's the final topic here,[20:25.280 --> 20:29.080] which is how to deploy Lambda functions.[20:29.080 --> 20:32.200] So versions are immutable copies of the code[20:32.200 --> 20:34.200] and the configuration of your Lambda function.[20:34.200 --> 20:35.880] And the versioning will allow you to publish[20:35.880 --> 20:39.360] one or more versions of your Lambda function.[20:39.360 --> 20:40.400] And as a result,[20:40.400 --> 20:43.360] you can work with different variations of your Lambda function[20:44.560 --> 20:45.840] in your development workflow,[20:45.840 --> 20:48.680] like development, beta, production, et cetera.[20:48.680 --> 20:50.320] And when you create a Lambda function,[20:50.320 --> 20:52.960] there's only one version, the latest version,[20:52.960 --> 20:54.080] dollar sign, 
latest.[20:54.080 --> 20:57.240] And you can refer to this function using the ARN[20:57.240 --> 20:59.240] or Amazon resource name.[20:59.240 --> 21:00.640] And when you publish a new version,[21:00.640 --> 21:02.920] AWS Lambda will make a snapshot[21:02.920 --> 21:05.320] of the latest version to create a new version.[21:06.800 --> 21:09.600] You can also create an alias for Lambda function.[21:09.600 --> 21:12.280] And conceptually, an alias is just like a pointer[21:12.280 --> 21:13.800] to a specific function.[21:13.800 --> 21:17.040] And you can use that alias in the ARN[21:17.040 --> 21:18.680] to reference the Lambda function version[21:18.680 --> 21:21.280] that's currently associated with the alias.[21:21.280 --> 21:23.400] What's nice about the alias is you can roll back[21:23.400 --> 21:25.840] and forth between different versions,[21:25.840 --> 21:29.760] which is pretty nice because in the case of deploying[21:29.760 --> 21:32.920] a new version, if there's a huge problem with it,[21:32.920 --> 21:34.080] you just toggle it right back.[21:34.080 --> 21:36.400] And there's really not a big issue[21:36.400 --> 21:39.400] in terms of rolling back your code.[21:39.400 --> 21:44.400] Now, let's take a look at an example where AWS S3,[21:45.160 --> 21:46.720] or Amazon S3 is the event source[21:46.720 --> 21:48.560] that invokes your Lambda function.[21:48.560 --> 21:50.720] Every time a new object is created,[21:50.720 --> 21:52.880] when Amazon S3 is the event source,[21:52.880 --> 21:55.800] you can store the information for the event source mapping[21:55.800 --> 21:59.040] in the configuration for the bucket notifications.[21:59.040 --> 22:01.000] And then in that configuration,[22:01.000 --> 22:04.800] you could identify the Lambda function ARN[22:04.800 --> 22:07.160] that Amazon S3 can invoke.[22:07.160 --> 22:08.520] But in some cases,[22:08.520 --> 22:11.680] you're gonna have to update the notification configuration.[22:11.680 --> 22:14.720] So 
Amazon S3 will invoke the correct version each time[22:14.720 --> 22:17.840] you publish a new version of your Lambda function.[22:17.840 --> 22:21.800] So basically, instead of specifying the function ARN,[22:21.800 --> 22:23.880] you can specify an alias ARN[22:23.880 --> 22:26.320] in the notification configuration.[22:26.320 --> 22:29.160] And as you promote a new version of the Lambda function[22:29.160 --> 22:32.200] into production, you only need to update the prod alias[22:32.200 --> 22:34.520] to point to the latest stable version.[22:34.520 --> 22:36.320] And you also don't need to update[22:36.320 --> 22:39.120] the notification configuration in Amazon S3.[22:40.480 --> 22:43.080] And when you build serverless applications,[22:43.080 --> 22:46.600] it's common to have code that's shared across Lambda functions,[22:46.600 --> 22:49.400] it could be custom code, it could be a standard library,[22:49.400 --> 22:50.560] et cetera.[22:50.560 --> 22:53.320] And before, and this was really a big limitation,[22:53.320 --> 22:55.920] was you had to have all the code deployed together.[22:55.920 --> 22:58.960] But now, one of the really cool things you can do[22:58.960 --> 23:00.880] is you can configure a Lambda function[23:00.880 --> 23:03.600] to include additional code as a layer.[23:03.600 --> 23:05.520] So a layer is basically a zip archive[23:05.520 --> 23:08.640] that contains a library, maybe a custom runtime.[23:08.640 --> 23:11.720] Maybe it's gonna include some kind of really cool[23:11.720 --> 23:13.040] pre-trained model.[23:13.040 --> 23:14.680] And then with layers you can use[23:14.680 --> 23:15.800] the libraries in your function[23:15.800 --> 23:18.960] without needing to include them in your deployment package.[23:18.960 --> 23:22.400] And it's a best practice to have smaller deployment packages[23:22.400 --> 23:25.240] and share common dependencies with the layers.[23:26.120 --> 23:28.520] Also layers will help you keep your deployment 
package[23:28.520 --> 23:29.360] really small.[23:29.360 --> 23:32.680] So for Node.js, Python, Ruby functions,[23:32.680 --> 23:36.000] you can develop your function code in the console[23:36.000 --> 23:39.000] as long as you keep the package under three megabytes.[23:39.000 --> 23:42.320] And then a function can use up to five layers at a time,[23:42.320 --> 23:44.160] which is pretty incredible actually,[23:44.160 --> 23:46.040] which means that you could have, you know,[23:46.040 --> 23:49.240] basically up to 250 megabytes total.[23:49.240 --> 23:53.920] So for many languages, this is plenty of space.[23:53.920 --> 23:56.620] Also Amazon has published a public layer[23:56.620 --> 23:58.800] that includes really popular libraries[23:58.800 --> 24:00.800] like NumPy and SciPy,[24:00.800 --> 24:04.840] which does dramatically help data processing[24:04.840 --> 24:05.680] and machine learning.[24:05.680 --> 24:07.680] Now, if I had to predict the future[24:07.680 --> 24:11.840] and I wanted to predict a massive announcement,[24:11.840 --> 24:14.840] I would say that what AWS could do[24:14.840 --> 24:18.600] is they could have a GPU enabled layer at some point[24:18.600 --> 24:20.160] that would include pre-trained models.[24:20.160 --> 24:22.120] And if they did something like that,[24:22.120 --> 24:24.320] that could really open up the doors[24:24.320 --> 24:27.000] for the pre-trained model revolution.[24:27.000 --> 24:30.160] And I would bet that that's possible.[24:30.160 --> 24:32.200] All right, well, in a nutshell,[24:32.200 --> 24:34.680] AWS Lambda is one of my favorite services.[24:34.680 --> 24:38.440] And I think it's worth everybody's time[24:38.440 --> 24:42.360] that's interested in AWS to play around with AWS Lambda.[24:42.360 --> 24:47.200] All right, next week, I'm going to cover API Gateway.[24:47.200 --> 25:13.840] All right, see you next week.
If you enjoyed this video, here are additional resources to look at:
Coursera + Duke Specialization: Building Cloud Computing Solutions at Scale Specialization: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Python, Bash, and SQL Essentials for Data Engineering Specialization: https://www.coursera.org/specializations/python-bash-sql-data-engineering-duke
AWS Certified Solutions Architect - Professional (SAP-C01) Cert Prep: 1 Design for Organizational Complexity: https://www.linkedin.com/learning/aws-certified-solutions-architect-professional-sap-c01-cert-prep-1-design-for-organizational-complexity/design-for-organizational-complexity?autoplay=true
Essentials of MLOps with Azure and Databricks: https://www.linkedin.com/learning/essentials-of-mlops-with-azure-1-introduction/essentials-of-mlops-with-azure
O'Reilly Book: Implementing MLOps in the Enterprise
O'Reilly Book: Practical MLOps: https://www.amazon.com/Practical-MLOps-Operationalizing-Machine-Learning/dp/1098103017
O'Reilly Book: Python for DevOps: https://www.amazon.com/gp/product/B082P97LDW/
O'Reilly Book: Developing on AWS with C#: A Comprehensive Guide on Using C# to Build Solutions on the AWS Platform: https://www.amazon.com/Developing-AWS-Comprehensive-Solutions-Platform/dp/1492095877
Pragmatic AI: An Introduction to Cloud-based Machine Learning: https://www.amazon.com/gp/product/B07FB8F8QP/
Pragmatic AI Labs Book: Python Command-Line Tools: https://www.amazon.com/gp/product/B0855FSFYZ
Pragmatic AI Labs Book: Cloud Computing for Data Analysis: https://www.amazon.com/gp/product/B0992BN7W8
Pragmatic AI Book: Minimal Python: https://www.amazon.com/gp/product/B0855NSRR7
Pragmatic AI Book: Testing in Python: https://www.amazon.com/gp/product/B0855NSRR7
Subscribe to Pragmatic AI Labs YouTube Channel: https://www.youtube.com/channel/UCNDfiL0D1LUeKWAkRE1xO5Q
Subscribe to 52 Weeks of AWS Podcast: https://52-weeks-of-cloud.simplecast.com
View content on noahgift.com: https://noahgift.com/
View content on Pragmatic AI Labs Website: https://paiml.com/

Noah: Hey, three, two, one, there we go, we're live. All right, so welcome, Simon, to Enterprise MLOps
interviews. The goal of these interviews is to get people exposed to real professionals who are doing work in MLOps. It's such a cutting-edge field that I think a lot of people are very curious about it. What is it? How do you do it? Very honored to have Simon here. Do you want to introduce yourself and maybe talk a little bit about your background?

Simon: Sure. Yeah, thanks again for inviting me. My name is Simon Stebelena. I am originally from Austria, but currently working in Amsterdam, in the Netherlands, at Transaction Monitoring Netherlands (TMNL), where I am the lead MLOps engineer. What are we doing at TMNL? We are a data processing company, owned by the five large banks of the Netherlands. Our purpose is what the name says: we run anti-money-laundering models on pseudonymized transactions of businesses that we get from these five banks, to detect unusual patterns on that transaction graph that might indicate money laundering. That's, in a nutshell, what we do. As you can imagine, we are really focused on building models, and obviously MLOps is a big component there, because that is really the core of
what you do, and you want to do it efficiently and effectively as well. In my role as lead MLOps engineer, I'm on the one hand the lead engineer of the actual MLOps platform team. This is a centralized team that builds out a lot of the infrastructure that's needed to do modeling effectively and efficiently. But I am also the craft lead for the machine learning engineering craft. These are, in our case, the machine learning engineers: the people working within the model development teams, in cross-functional teams, actually building these models. That's what I'm currently doing. During the evenings and weekends, I'm also a lecturer at the University of Applied Sciences in Vienna, where I teach data mining and data warehousing to master's students.

Before TMNL, I was at bold.com, which is the largest e-commerce retailer in the Netherlands. I always tend to call it the Amazon of the Netherlands, or of the Benelux actually. It is still the biggest e-commerce retailer in the Netherlands, even bigger there than Amazon. I was an expert machine learning engineer there, doing somewhat comparable work, though a bit more focused on the actual modeling part; now it's really more on the infrastructure end. And well, before that,
I spent some time in consulting, leading a data science team. That's actually where I come from; I originally come from the data science end. There I started drifting towards MLOps, because we started building out a deployment and serving platform that would, as a consulting company, make it easier for us to deploy models for our clients, to serve those models, and also to monitor them. That then made me drift further and further down the engineering lane, all the way to MLOps.

Noah: Great, that's a great background. I'm curious about the data science to MLOps journey; I think that would be a great discussion to dig into a little bit. My background is originally more on the software engineering side. When I was in the Bay Area, I was an individual contributor, then ran companies at one point, and ran multiple teams. Then, as the data science field exploded, I hired multiple data science teams and worked with them. What was interesting is that I found the original approach of data science, from my perspective, was lacking, in that there weren't really deliverables. When you look at a software engineering team, it's very clear there are deliverables. Like, you have a mobile app
and it has to get better each week, right? Whereas otherwise, what are you doing? So I would love to hear your story about how you went from doing more pure data science to, it sounds like, MLOps.

Simon: Yeah, actually. Back then in consulting, at least in Austria, data science and everything around it was still in its infancy; this was 2016 and so on. It was still really, really new to many organizations there (the US might be some years ahead). But back then it was still relatively fresh. What we very often struggled with in consulting was that problems could be solved on the modeling end, but easy deployment and keeping those models in production at the client side was always more of a challenge. So naturally I started thinking and focusing more on the bigger problem that I saw, which was not so much building the models. It was really more: how can we streamline things? How can we keep things operating? How can we make the move from a prototype, from a PoC, to a productionized model easier? And how can we keep it there and maintain it there? Personally, I saw that this problem was coming
up, and that really fascinated me, so I started jumping on that exciting problem. That's how it went for me. Back then we also recognized it as a potential product in our case, so we started building out that deployment, serving, and monitoring platform. For me, that's when I naturally fell into that rabbit hole, and I never wanted to get out of it again.

Noah: So, the system that you built initially: what was your stack? What were some of the things you were using?

Simon: Yeah, so essentially, when we talk about the stack, the full backend was written in Java. From a user perspective, our goal was to build a drag-and-drop platform for models. So basically the contract was: you package your model as an MLflow model, and then you drag and drop it into a web UI. It's going to be wrapped in containers, it's going to be deployed, and there will be a monitoring layer in front of it. Based on whatever dataset you trained it on, it would automatically calculate different metrics, different distributional metrics around the variables you are using.
And so we were layering this approach, so that eventually, for every incoming request, you would have a nice dashboard and could monitor all that stuff. Stack-wise it was MLflow, specifically MLflow Models, a lot; Java in the backend; and a lot of Python, especially a PySpark component as well. It's been quite a while, but quite some part was written in Scala, because one component of this platform was a bit of an AutoML approach (though that died over time), and that was based on PySpark and vanilla Spark written in Scala, so we could facilitate the AutoML part. Later on we added the easy deployment and serving part. So it was a lot of custom-built stuff. Back then there wasn't that much MLOps tooling out there yet, so you needed to build a lot of it custom. It was largely custom built.

Noah: Yeah, the MLflow concept is interesting, because they provide this package structure so that at least you have some idea of what is going to be sent into the model, and there's a format for the model. And I think that part of MLflow seems to be a pretty good
idea: you're creating a standard where, in the case of scikit-learn or something, you don't necessarily want to just throw a pickled model somewhere and say, okay, let's go.

Simon: Yeah, that was also our thinking back then. We thought a lot about what could become the standard for how you package models. Back then, MLflow was one of the few tools that already existed, and of course there was Databricks behind it. So we made a bet on it and said, all right, let's follow that packaging standard and make it the contract for how you, as a data scientist, would need to package your model up and submit it to the platform.

Noah: Yeah, it's interesting, because this reminds me of one of the issues happening right now with cloud computing. In the cloud, AWS has dominated for a long time; I think they have around 40% market share globally. Azure is now gaining, with some pretty good traction, and GCP has been down for a bit, maybe in the 10% range or something like that. But what's interesting is that it seems like, in the case of all the cloud providers, they haven't
necessarily been leading the way on things like packaging models, right? They have their own proprietary systems, which have been developed and continue to be developed: Vertex AI in the case of Google, SageMaker in the case of Amazon. But take SageMaker, for example: there isn't really an industry-wide standard of model packaging that SageMaker uses; they have their own proprietary stuff that builds in, and Vertex AI has its own proprietary stuff. So I think it is interesting to see what's going to happen, because your original hypothesis was: let's pick something that looks like it has traction and isn't necessarily tied directly to a cloud provider, because Databricks can work on anything. It seems like that in particular is one of the stickier problems right now with MLOps: who's the leader? Who's developing the right kind of standard for tooling? And maybe that leads into you talking a little bit about what you're doing currently. Do you have any thoughts about the current tooling and what you're doing at your current company?

Simon: Absolutely.
So at my current organization, Transaction Monitoring Netherlands, we are fully on AWS; we're really almost cloud-native AWS. That also means everything we do on the modeling side revolves around SageMaker. For us as the MLOps team specifically, we are building the platform around SageMaker capabilities. On that end, at least company-internal, we have a contract for how you must deploy models. There is only one way, what we call the golden path: the streamlined, highly automated path that is supported by the platform. This is the only way you can actually deploy models, and in our case that is a SageMaker pipeline object. At our company we're doing large-scale batch processing; we're not doing anything real-time at present, we are doing post-transaction monitoring. That means you need to submit DAGs, essentially, right? This is what we use for training, and this is what we also deploy eventually. And this is our internal contract: you need to provision, in your model repository, one place where there must be a function with a specific name, and that function must return a SageMaker pipeline object. That is our internal contract.

Noah: Yeah, that's interesting.
I mean, I know many people who are using SageMaker in production, and it does seem like where it has some advantages is that AWS generally does a pretty good job at building solutions. If you just look at the history of their services, the odds are pretty high that they'll keep getting better and keep improving things. And it seems like what I'm hearing from people, and maybe from your organization as well, is that the SDK for SageMaker is really the win, versus some of the UX tools they have and the interfaces of Canvas and Studio. Is that what's happening?

Simon: Yeah, so, right: what we try to do is always think about our users. Who are our users? What capabilities and skills do they have? What freedom should they have, and what abilities should they have to develop models? In our case, we don't really have use cases for tools like Canvas, because our users are fairly mature teams that know how to do, on the one hand, the data science stuff, of course, but also the engineering stuff. So in our case, things like Canvas do not really play much of a role, because due to the high abstraction layer of graphical user interfaces and drag-and-drop
tooling, you are limited in what you can do, or at least in what you can do easily. So in our case, it really is the strength and flexibility that the SageMaker SDK gives you, and in general the SDK around most AWS services. But it also comes with challenges, of course. You give a lot of freedom, but you're also creating certain asks, certain requirements, for your model development teams. That is why we've been working on abstracting further away from the SDK. Our objective is that you should not be forced to interact with the raw SDK when you use SageMaker anymore; instead you have a thin layer of abstraction on top of what you are doing. That's something we are moving towards more and more. Because, yeah, it gives you flexibility, but flexibility comes at a cost, often at the cost of speed, specifically for the 90% default stuff that you want to do.

Noah: One of the complaints I have about SageMaker is that it only uses virtual machines, and that does seem like a strange strategy in some sense. For example, I guess if you're doing batch only, it doesn't matter as much, and I actually think that's a good strategy, to get your batch-based
predictions very, very strong; in that case the virtual machines are a bit less of a complaint. But in the case of SageMaker endpoints, you have to spin up these really expensive virtual machines and let them run 24/7 to do online prediction. Is that something your organization evaluated and decided not to use? What are your thoughts on that?

Simon: Yeah, in our case, doing real-time or near-real-time inference is currently not really relevant, for a simple reason. When you think a bit more about the money-laundering, or anti-money-laundering, space: every individual bank must do anti-money-laundering, and they have armies of people doing that. But on the other hand, the time it actually takes from one of their AML systems detecting something unusual, through a review process, until it eventually hits the governmental institution that takes care of the cases that have been at least twice validated as indeed looking very unusual: this takes a while, it can take quite some time. Which is also why it doesn't really matter whether you ship your prediction within a second or whether it takes you a week or two weeks.
It doesn't really matter. Hence, for us, the problem of thinking about real-time inference has not been there so far. But yeah, indeed, for other use cases, and for private projects as well, we've been considering SageMaker endpoints for a while. But it's exactly what you said: the fact that you need to have a very beefy machine running all the time, specifically when you have heavy GPU loads, and you're paying for that machine running 24/7 even though you have quite a fluctuating load. Yeah, that definitely becomes quite a consideration in what you go for.

Noah: Yeah, I've actually been talking to AWS about that, because one of the issues I have is that the AWS platform really pushes serverless, and then my question for AWS is: so why aren't you using it? If you're pushing serverless for everything, why is SageMaker not serverless? Maybe they're going to do that, I don't know; I don't have any inside information, but it is interesting to hear you had similar concerns. I know there are two questions here from the audience. One person asked what you do for data versioning, and a second asked how you do event-based MLOps. So maybe, following up?

Simon: Yeah, what do we do for data versioning? On the
one hand, we're running a data lakehouse. The data we get from the financial institutions, from the banks, runs through a massive data pipeline, also on AWS (we're using Glue and Step Functions for that), and eventually it ends up, modeled to some extent, sanitized, and quality-checked, in our data lakehouse. There we're using Hudi on top of S3, and this is also what we use for versioning, which gives us time travel and all these things. Our model pipelines plug in there and spit out predictions, alerts, what we call alerts eventually. That is something we version based on unique IDs. With processing IDs we track pretty much everything: every line of code that touched the data is related to a specific row. So for every single row in our predictions and in our alerts, we can track back exactly which pipeline ran on it, which jobs were in that pipeline, which code exactly was running in each job, and which intermediate results were produced. We're basically adding lineage information to everything we output along that line, so we can track everything back using a few tools we've built.

Noah: So the tool you mentioned, I'm not familiar with it.
What is it called again? Hudi?

Simon: Hudi, yes.

Noah: Oh, what is it? Maybe you can describe it.

Simon: Yeah, Hudi is essentially quite similar to other tools such as the Databricks one, how is it called?

Noah: Delta Lake, maybe?

Simon: Yes, exactly. It's basically equivalent to Delta Lake. Back then, when we looked into what we were going to use, Delta Lake was not open-sourced yet (Databricks open-sourced it a while ago), so we went for Hudi. It is essentially a layer on top of, in our case, S3 that allows you to more easily keep track of the actions you are performing on your data. So it's very similar to Delta Lake, just an open-sourced solution that existed earlier.

Noah: Yeah, I didn't know anything about that, so now I do. Thanks for letting me know; I'll have to look into it. The other interesting stack-related question, I guess, is: I think there are two areas that are interesting and emerging. Oh, actually, there are multiple; maybe I'll just bring them all up, and we'll do them one by one.
So these are some emerging areas that I'm seeing. One is the concept of event-driven architecture versus, maybe, a static architecture. Obviously you're using Step Functions, so you're a fan of event-driven architecture. Maybe we'll start with that one: what are your thoughts on going more event-driven in your organization?

Simon: Yeah. In our case, essentially everything works event-driven, right? Since we're on AWS, we're using EventBridge (or CloudWatch Events, as it used to be called). This is how we trigger pretty much everything in our stack: how we trigger our data pipelines when data comes in; how we trigger different Lambdas that parse certain information from logs and store it in different databases; and it's also how, at some point in the past, we triggered new deployments when new models were approved in the model registry. So basically everything we've been doing is fully event-driven.

Noah: Yeah. So I think this is a key thing you bring up here. I've talked to many people who don't use AWS, who are experts at alternative technologies. And one of the things I've heard some people say is, oh, well,
AWS isn't as fast as X or Y, like Lambda isn't as fast as X or Y or,[23:13.000 --> 23:17.000] you know, Kubernetes or, but, but the point you bring up is exactly the[23:17.000 --> 23:24.000] way I think about AWS is that the true advantage of the AWS platform is the,[23:24.000 --> 23:29.000] is the tight integration with the services and you can design[23:29.000 --> 23:31.000] event-driven workflows.[23:31.000 --> 23:33.000] Would you say that's, that's... Absolutely.[23:33.000 --> 23:34.000] Yeah.[23:34.000 --> 23:35.000] Yeah.[23:35.000 --> 23:39.000] I think designing event-driven workflows on AWS is incredibly easy to do.[23:39.000 --> 23:40.000] Yeah.[23:40.000 --> 23:43.000] And it also comes incredibly natural and that's extremely powerful.[23:43.000 --> 23:44.000] Right.[23:44.000 --> 23:49.000] And simply by, by having an easy way to trigger Lambdas event-driven,[23:49.000 --> 23:52.000] you can pretty much, right, pretty much do everything and glue[23:52.000 --> 23:54.000] everything together that you want.[23:54.000 --> 23:56.000] I think that gives you tremendous flexibility.[23:56.000 --> 23:57.000] Yeah.[23:57.000 --> 24:00.000] So, so I think there's two things that come to mind now.[24:00.000 --> 24:07.000] One is that, that if you are developing an MLOps platform, you[24:07.000 --> 24:09.000] can't ignore Lambda.[24:09.000 --> 24:12.000] So I, because I've had some people tell me, oh, well, we can do this and[24:12.000 --> 24:13.000] this and this better.[24:13.000 --> 24:17.000] It's like, yeah, but if you're going to be on AWS, you have to understand[24:17.000 --> 24:18.000] why people use Lambda.[24:18.000 --> 24:19.000] It isn't speed.[24:19.000 --> 24:24.000] It's, it's the ease of, ease of developing very rich solutions.[24:24.000 --> 24:25.000] Right.[24:25.000 --> 24:26.000] Absolutely.[24:26.000 --> 24:28.000] And then the glue between, between what you are building eventually.[24:28.000 --> 24:33.000] And you can even almost, your, the 
thoughts in your mind turn into Lambda.[24:33.000 --> 24:36.000] You know, like you can be thinking and building code so quickly.[24:36.000 --> 24:37.000] Absolutely.[24:37.000 --> 24:41.000] Everything turns into which event do I need to listen to, and then I trigger[24:41.000 --> 24:43.000] a Lambda, and that Lambda does this and that.[24:43.000 --> 24:44.000] Yeah.[24:44.000 --> 24:48.000] And the other part about Lambda that's pretty, pretty awesome is that it[24:48.000 --> 24:52.000] hooks into services that have infinite scale.[24:52.000 --> 24:56.000] Like, so SQS, like you can't break SQS.[24:56.000 --> 24:59.000] Like there's nothing you can do to ever take SQS down.[24:59.000 --> 25:02.000] It handles unlimited requests in and unlimited requests out.[25:02.000 --> 25:04.000] How many systems are like that?[25:04.000 --> 25:05.000] Yeah.[25:05.000 --> 25:06.000] Yeah, absolutely.[25:06.000 --> 25:07.000] Yeah.[25:07.000 --> 25:12.000] So then this kind of a follow-up would be that, that maybe data scientists[25:12.000 --> 25:17.000] should learn Lambda and Step Functions in order to, to get to[25:17.000 --> 25:18.000] MLOps.[25:18.000 --> 25:21.000] I think that's a yes.[25:21.000 --> 25:25.000] If you want to, if you want to put a foot into MLOps and you are on AWS,[25:25.000 --> 25:31.000] then I think there is no way around learning these fundamentals.[25:31.000 --> 25:32.000] Right.[25:32.000 --> 25:35.000] There's no way around learning things like what is a Lambda?[25:35.000 --> 25:39.000] How do I, how do I create a Lambda via Terraform or whatever tool you're[25:39.000 --> 25:40.000] using there?[25:40.000 --> 25:42.000] And how do I hook it up to an event?[25:42.000 --> 25:47.000] And how do I, how do I use the AWS SDK to interact with different[25:47.000 --> 25:48.000] services?[25:48.000 --> 25:49.000] So, right.[25:49.000 --> 25:53.000] I think if you want to take a step into MLOps, coming more from[25:53.000 --> 25:57.000] the data 
science side, it's extremely important to familiarize yourself[25:57.000 --> 26:01.000] with how do you, at least the fundamentals, how do you architect[26:01.000 --> 26:03.000] basic solutions on AWS?[26:03.000 --> 26:05.000] How do you glue services together?[26:05.000 --> 26:07.000] How do you make them speak to each other?[26:07.000 --> 26:09.000] So yeah, I think that's quite fundamental.[26:09.000 --> 26:14.000] Ideally, ideally, I think that's what the platform should take away from you[26:14.000 --> 26:16.000] as a, as a pure data scientist.[26:16.000 --> 26:19.000] You don't, should not necessarily have to deal with that stuff.[26:19.000 --> 26:23.000] But if you're interested in, if you want to make that move more towards MLOps,[26:23.000 --> 26:27.000] I think learning about infrastructure and specifically, in the context of AWS,[26:27.000 --> 26:31.000] about the services and how to use them is really fundamental.[26:31.000 --> 26:32.000] Yeah, it's good.[26:32.000 --> 26:33.000] Because this is automation eventually.[26:33.000 --> 26:37.000] And if you want to automate, if you want to automate your complex processes,[26:37.000 --> 26:39.000] then you need to learn that stuff.[26:39.000 --> 26:41.000] How else are you going to do it?[26:41.000 --> 26:42.000] Yeah, I agree.[26:42.000 --> 26:46.000] I mean, that's really what, what, what Lambda and Step Functions are: they're[26:46.000 --> 26:47.000] automation tools.[26:47.000 --> 26:49.000] So that's probably the better way to describe it.[26:49.000 --> 26:52.000] That's a very good point you bring up.[26:52.000 --> 26:57.000] Another technology that I think is an emerging technology is the[26:57.000 --> 26:58.000] managed file system.[26:58.000 --> 27:05.000] And the reason why I think it's interesting is that, so 20-plus years[27:05.000 --> 27:11.000] ago, I was using file systems in the university setting when I was at[27:11.000 --> 27:14.000] Caltech and then also in the film, film industry.[27:14.000 --> 
27:22.000] So film has been using managed file servers with parallel processing[27:22.000 --> 27:24.000] farms for a long time.[27:24.000 --> 27:27.000] I don't know how many people know this, but in the film industry,[27:27.000 --> 27:32.000] the, the, the architecture, even from like 2000, was there's a very[27:32.000 --> 27:38.000] expensive file server and then there's, let's say, 40,000 machines or 40,000[27:38.000 --> 27:39.000] cores.[27:39.000 --> 27:40.000] And that's, that's it.[27:40.000 --> 27:41.000] That's the architecture.[27:41.000 --> 27:46.000] And now what's interesting is I see with data science and machine learning[27:46.000 --> 27:52.000] operations that like that, that could potentially happen in the future is[27:52.000 --> 27:57.000] actually a managed NFS mount point with maybe Kubernetes or something like[27:57.000 --> 27:58.000] that.[27:58.000 --> 28:01.000] Do you see any of that on the horizon?[28:01.000 --> 28:04.000] Oh, that's a good question.[28:04.000 --> 28:08.000] I think for our, for our, what we're currently doing, that's probably a[28:08.000 --> 28:10.000] bit further away.[28:10.000 --> 28:15.000] But in principle, I could very well imagine that in our use case, not,[28:15.000 --> 28:17.000] not quite.[28:17.000 --> 28:20.000] But in principle, definitely.[28:20.000 --> 28:26.000] And then maybe a third, a third emerging thing I'm seeing is what's going[28:26.000 --> 28:29.000] on with OpenAI and Hugging Face.[28:29.000 --> 28:34.000] And that has the potential, maybe, to change the game a little bit,[28:34.000 --> 28:38.000] especially with Hugging Face, I think, although both of them, I mean,[28:38.000 --> 28:43.000] there is that, you know, in the case of pre-trained models, here's a[28:43.000 --> 28:48.000] perfect example is that an organization may have, you know, maybe they're[28:48.000 --> 28:53.000] using AWS even for this, they're transcribing videos and they're going[28:53.000 --> 28:56.000] to do something with 
them, maybe they're going to detect, I don't know,[28:56.000 --> 29:02.000] like, you know, if you recorded customers in your, I'm just brainstorming,[29:02.000 --> 29:05.000] I'm not saying your company did this, but I'm just creating a hypothetical[29:05.000 --> 29:09.000] situation that they recorded, you know, customers talking and then they,[29:09.000 --> 29:12.000] they transcribe it to text and then run some kind of a, you know,[29:12.000 --> 29:15.000] criminal detection feature or something like that.[29:15.000 --> 29:19.000] Like they could build their own models or they could download the thing[29:19.000 --> 29:23.000] that was released two days ago or a day ago from OpenAI that transcribes[29:23.000 --> 29:29.000] things, you know, and then, and then feed that transcribed text into[29:29.000 --> 29:34.000] Hugging Face, some other model that summarizes it, and then you could[29:34.000 --> 29:38.000] feed that into a system. So what is, what is your, what are your[29:38.000 --> 29:42.000] thoughts around some of these pre-trained models, and are you, in[29:42.000 --> 29:48.000] terms of your stack, thinking of trying to look into doing fine-tuning?[29:48.000 --> 29:53.000] Yeah, so I think pre-trained models and especially the way that Hugging Face,[29:53.000 --> 29:57.000] I think, really revolutionized the space in terms of really kind of[29:57.000 --> 30:02.000] platformizing the entire business around, or the entire market around,[30:02.000 --> 30:07.000] pre-trained models. 
I think that is really quite incredible and I think[30:07.000 --> 30:10.000] really, for the ecosystem, a change in how to do things.[30:10.000 --> 30:16.000] And I believe that looking at the, the costs of training large models[30:16.000 --> 30:19.000] and looking at the fact that many organizations are not able to do it,[30:19.000 --> 30:23.000] because of massive costs or because of lack of data,[30:23.000 --> 30:29.000] I think this is a, this makes it very clear how important[30:29.000 --> 30:33.000] such platforms are, how important sharing of pre-trained models actually is.[30:33.000 --> 30:37.000] I believe we are only at the, quite at the beginning actually of that.[30:37.000 --> 30:42.000] And I think we're going to see that nowadays you see it mostly when it[30:42.000 --> 30:47.000] comes to fairly generalized data formats: images, potentially videos, text,[30:47.000 --> 30:52.000] speech, these things. But I believe that we're going to see more marketplace[30:52.000 --> 30:57.000] approaches when it comes to pre-trained models in a lot more industries[30:57.000 --> 31:01.000] and in a lot more, in a lot more use cases where data is to some degree[31:01.000 --> 31:05.000] standardized. Also when you think about, when you think about banking,[31:05.000 --> 31:10.000] for example, right? When you think about transactions, to some extent,[31:10.000 --> 31:14.000] transaction, transaction data always looks the same, kind of, at least at[31:14.000 --> 31:17.000] every bank. Of course you might need to do some mapping here and there,[31:17.000 --> 31:22.000] but also there is a lot of power in it. Because simply, also, thinking[31:22.000 --> 31:28.000] about sharing data is always a difficult thing, especially in Europe.[31:28.000 --> 31:32.000] Sharing data between organizations is incredibly difficult legally.[31:32.000 --> 31:36.000] It's difficult. 
Sharing models is a different thing, right?[31:36.000 --> 31:40.000] Basically, similar to the concept of federated learning. Sharing models[31:40.000 --> 31:44.000] is significantly easier legally than actually sharing data.[31:44.000 --> 31:48.000] And then applying these models, fine-tuning them and so on.[31:48.000 --> 31:52.000] Yeah, I mean, I could just imagine. I really don't know much about[31:52.000 --> 31:56.000] banking transactions, but I would imagine there could be several[31:56.000 --> 32:01.000] kinds of transactions that are very normal. And then there's some[32:01.000 --> 32:06.000] transactions, like if every single second[32:06.000 --> 32:11.000] you're transferring a lot of money. And it happens just[32:11.000 --> 32:14.000] very quickly. It's like, wait, why are you doing this? Why are you transferring money[32:14.000 --> 32:20.000] constantly? What's going on? Or a huge sum of money only[32:20.000 --> 32:24.000] involves three different points in the network. Over and over again,[32:24.000 --> 32:29.000] just these three points are constantly... And so once you've developed[32:29.000 --> 32:33.000] a model that does anomaly detection, then[32:33.000 --> 32:37.000] yeah, why would you need to develop another one? I mean, somebody already did it.[32:37.000 --> 32:41.000] Exactly. Yes, absolutely, absolutely. And that's[32:41.000 --> 32:45.000] definitely... That's encoded knowledge, encoded information in terms of the model,[32:45.000 --> 32:49.000] which is not personally... well, which abstracts away from[32:49.000 --> 32:53.000] personally identifiable data. And that's really the power. That is something[32:53.000 --> 32:57.000] that, yeah, as I've said before, you can share significantly easier and you can[32:57.000 --> 33:03.000] apply to your use cases. 
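The pattern described here, the same few counterparties transacting at an unusual rate, can be sketched as a toy rule-based detector. This is purely illustrative: the function name, data, window, and threshold are all made up for this sketch, and real transaction-monitoring systems use learned models rather than fixed rules like this.

```python
def flag_rapid_transfers(transactions, window_s=60, threshold=10):
    """Flag (sender, receiver) pairs that transact more than `threshold`
    times within any sliding `window_s`-second window.

    `transactions` is an iterable of (timestamp_s, sender, receiver, amount).
    """
    by_pair = {}
    for ts, src, dst, _amt in transactions:
        by_pair.setdefault((src, dst), []).append(ts)

    flagged = set()
    for pair, stamps in by_pair.items():
        stamps.sort()
        lo = 0
        for hi in range(len(stamps)):
            # Shrink the window from the left until it spans <= window_s seconds.
            while stamps[hi] - stamps[lo] > window_s:
                lo += 1
            if hi - lo + 1 > threshold:
                flagged.add(pair)
                break
    return flagged

# Toy data: "A" hammers "B" with 15 transfers in 28 seconds,
# while "C" pays "D" three times, an hour apart.
txns = [(i * 2, "A", "B", 100.0) for i in range(15)]
txns += [(i * 3600, "C", "D", 50.0) for i in range(3)]
print(flag_rapid_transfers(txns))  # only the ("A", "B") pair is flagged
```

The point of the conversation holds even for this naive version: once someone has encoded "constant transfers between the same few points" into a model or rule set, that artifact can be shared and reused without sharing the underlying transaction data.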
Kind of related to this, in[33:03.000 --> 33:09.000] terms of upcoming technologies, is, I think, dealing more with graphs.[33:09.000 --> 33:13.000] And so is that something, stack-wise, that your[33:13.000 --> 33:19.000] company's investigated or is starting to do? Yeah, so when you think about[33:19.000 --> 33:23.000] transactions, bank transactions, right? And bank customers.[33:23.000 --> 33:27.000] So in our case, again, it's a... We only have pseudonymized[33:27.000 --> 33:31.000] transaction data, so actually we cannot see anything, right? We cannot see names, we cannot see[33:31.000 --> 33:35.000] IBANs or whatever. We really can't see much. But[33:35.000 --> 33:39.000] you can look at transactions moving between[33:39.000 --> 33:43.000] different entities, between different accounts. You can look at that[33:43.000 --> 33:47.000] as a network, as a graph. And that's also what we very frequently do.[33:47.000 --> 33:51.000] You have your nodes in your network, these are your accounts[33:51.000 --> 33:55.000] or your persons, even. And the actual edges between them,[33:55.000 --> 33:59.000] that's what your transactions are. So you have this[33:59.000 --> 34:03.000] massive graph, actually, that also we as TMNL, as Transaction Monitoring Netherlands,[34:03.000 --> 34:07.000] are sitting on. We're actually sitting on a massive transaction graph.[34:07.000 --> 34:11.000] So yeah, absolutely. For us, doing analysis on top of[34:11.000 --> 34:15.000] that graph, building models on top of that graph, is a quite important[34:15.000 --> 34:19.000] thing. And like, I taught a class[34:19.000 --> 34:23.000] a few years ago at Berkeley where we had to[34:23.000 --> 34:27.000] cover graph databases a little bit. And I[34:27.000 --> 34:31.000] really didn't know that much about graph databases, although I did use one actually[34:31.000 --> 34:35.000] at one company I was at. 
But one of the things I learned in teaching that[34:35.000 --> 34:39.000] class was about the descriptive statistics[34:39.000 --> 34:43.000] of a graph network. And it[34:43.000 --> 34:47.000] is actually pretty interesting, because I think most of the time everyone talks about[34:47.000 --> 34:51.000] median and max, min, and standard deviation and everything.[34:51.000 --> 34:55.000] But then with a graph, there's things like centrality,[34:55.000 --> 34:59.000] and I forget all the terms off the top of my head, but you can see[34:59.000 --> 35:03.000] if there's a node in the network that's[35:03.000 --> 35:07.000] everybody's interacting with. Absolutely. You can identify communities[35:07.000 --> 35:11.000] of people moving around a lot of money all the time. For example,[35:11.000 --> 35:15.000] you can compute different metric features, eventually,[35:15.000 --> 35:19.000] doing computations on your graph and then plugging in some model.[35:19.000 --> 35:23.000] Often it's feature engineering. You're computing the centrality scores[35:23.000 --> 35:27.000] across your graph for your different entities. And then[35:27.000 --> 35:31.000] you're building your features, actually. And then you're plugging in some[35:31.000 --> 35:35.000] model in the end. If you do classic machine learning, so to say.[35:35.000 --> 35:39.000] If you do graph deep learning, of course, that's a bit different.[35:39.000 --> 35:43.000] So basically, for people that are analyzing[35:43.000 --> 35:47.000] essentially networks of people, then[35:47.000 --> 35:51.000] basically with a graph database, step one is[35:51.000 --> 35:55.000] generate the features, which could be centrality.[35:55.000 --> 35:59.000] There's a score, and then you then go and train[35:59.000 --> 36:03.000] the model based on that descriptive statistic.[36:03.000 --> 36:07.000] Exactly. 
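To make "centrality score as an engineered feature" concrete, here is a minimal, dependency-free sketch. The account names and edge list are invented for illustration; at the scale discussed in this conversation you would compute this with Spark's graph tooling rather than plain Python, but the idea is the same: one number per node that a downstream model can consume.

```python
def degree_centrality(edges):
    """Degree centrality for a simple undirected graph given as (u, v)
    pairs: degree(node) / (N - 1), the standard normalization."""
    nodes, degree = set(), {}
    for u, v in edges:
        nodes.update((u, v))
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n = len(nodes)
    return {node: degree.get(node, 0) / (n - 1) for node in nodes}

# Toy account graph: "hub" transacts with every other account.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
features = degree_centrality(edges)
print(features["hub"])  # 1.0 -- "hub" touches every other node
# Each node's score becomes one engineered feature for the model that follows.
```

A node everyone interacts with, the case mentioned above, simply gets the highest score, which is exactly the kind of signal you would then train a classifier on.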
So one way how you could think about it is,[36:07.000 --> 36:11.000] whether we need a graph database or not, that always depends on your specific use case[36:11.000 --> 36:15.000] and which database. We're actually also running[36:15.000 --> 36:19.000] that using Spark. You have GraphFrames, you have[36:19.000 --> 36:23.000] GraphX, actually. So really, stuff in Spark built for[36:23.000 --> 36:27.000] doing analysis on graphs.[36:27.000 --> 36:31.000] And then what you usually do is exactly what you said. You are trying[36:31.000 --> 36:35.000] to build features based on that graph,[36:35.000 --> 36:39.000] based on the attributes of the nodes and the attributes on the edges and so on.[36:39.000 --> 36:43.000] And so I guess in terms of graph databases right[36:43.000 --> 36:47.000] now, it sounds like maybe the three[36:47.000 --> 36:51.000] main players are: there's Neo4j, which[36:51.000 --> 36:55.000] has been around for a long time. There's, I guess, Spark,[36:55.000 --> 36:59.000] and then there's also, I forgot what the one is called for AWS,[36:59.000 --> 37:03.000] is it? Neptune, that's Neptune.[37:03.000 --> 37:07.000] Have you played with all three of those, and did you[37:07.000 --> 37:11.000] like Neptune? Neptune was something we... Spark, of course, we're actually currently[37:11.000 --> 37:15.000] using for exactly that. Also because it allows us[37:15.000 --> 37:19.000] to keep our stack fairly homogeneous. 
We did[37:19.000 --> 37:23.000] also a PoC in Neptune a while ago already,[37:23.000 --> 37:27.000] and, well, with Neptune you definitely have essentially two ways[37:27.000 --> 37:31.000] to query Neptune: either using Gremlin or SPARQL.[37:31.000 --> 37:35.000] So that means the people, your data scientists,[37:35.000 --> 37:39.000] need to get familiar with that, which then is already a bit of a hurdle,[37:39.000 --> 37:43.000] because usually data scientists are not familiar with either.[37:43.000 --> 37:47.000] But also what we found with Neptune[37:47.000 --> 37:51.000] is that it's not necessarily built[37:51.000 --> 37:55.000] as an analytics graph database. It's not necessarily made for[37:55.000 --> 37:59.000] that. And then it's sometimes, at least[37:59.000 --> 38:03.000] for us, it has become quite complicated to handle different performance considerations[38:03.000 --> 38:07.000] when you actually do fairly complex queries across that graph.[38:07.000 --> 38:11.000] Yeah, so you're bringing up a point which[38:11.000 --> 38:15.000] happens a lot in my experience with[38:15.000 --> 38:19.000] technology, is that sometimes[38:19.000 --> 38:23.000] the purity of the solution becomes the problem,[38:23.000 --> 38:27.000] where even though Spark isn't necessarily[38:27.000 --> 38:31.000] designed to be a graph database system, the fact is[38:31.000 --> 38:35.000] people in your company are already using it. 
So[38:35.000 --> 38:39.000] if you just turn on that feature, now you can use it and it's not like[38:39.000 --> 38:43.000] this huge technical undertaking and retraining effort.[38:43.000 --> 38:47.000] So even if it's not as good, if it works, then that's probably[38:47.000 --> 38:51.000] the solution your company will use. Versus, I agree with you, a lot of times[38:51.000 --> 38:55.000] even if a solution... Neo4j is a pretty good example of[38:55.000 --> 38:59.000] it's an interesting product, but[38:59.000 --> 39:03.000] you already have all these other products, like, do you really want to introduce yet[39:03.000 --> 39:07.000] another product into your stack? Yeah, because eventually[39:07.000 --> 39:11.000] it all comes with an overhead, of course, introducing it. That is one thing:[39:11.000 --> 39:15.000] it requires someone to maintain it, even if it's a[39:15.000 --> 39:19.000] managed service. Somebody needs to actually own it and look after it,[39:19.000 --> 39:23.000] and then, as you said, you need to retrain people to also use it effectively.[39:23.000 --> 39:27.000] So it comes at significant cost, and that is really[39:27.000 --> 39:31.000] something that I believe should be quite critically[39:31.000 --> 39:35.000] assessed. What is really the gain you have? How far can you go with[39:35.000 --> 39:39.000] your current tooling, and then eventually make[39:39.000 --> 39:43.000] that decision. 
At least personally, I'm really[39:43.000 --> 39:47.000] not a fan of thinking tooling-first,[39:47.000 --> 39:51.000] but personally I really believe in looking at your organization, looking at the people,[39:51.000 --> 39:55.000] what skills are there, looking at how effectively[39:55.000 --> 39:59.000] these people are actually performing certain activities and processes,[39:59.000 --> 40:03.000] and then carefully thinking about what really makes sense.[40:03.000 --> 40:07.000] Because it's one thing to introduce it, but people need to[40:07.000 --> 40:11.000] adopt and use the tooling, and eventually it should really speed them up and improve[40:11.000 --> 40:15.000] how they develop. Yeah, I think[40:15.000 --> 40:19.000] that's great advice, and it's hard to understand how good of advice it is,[40:19.000 --> 40:23.000] because it takes experience getting burned[40:23.000 --> 40:27.000] introducing new technology. I've[40:27.000 --> 40:31.000] had experiences before where[40:31.000 --> 40:35.000] one of the mistakes I've made was putting too many different technologies in an organization,[40:35.000 --> 40:39.000] and the problem is, once you get enough complexity,[40:39.000 --> 40:43.000] it can really explode. And then[40:43.000 --> 40:47.000] this is the part that really gets scary is that,[40:47.000 --> 40:51.000] let's take Spark for example. How hard is it to hire somebody that knows Spark? Pretty easy.[40:51.000 --> 40:55.000] How hard is it going to be to hire somebody that knows[40:55.000 --> 40:59.000] Spark, and then hire another person that knows the Gremlin query[40:59.000 --> 41:03.000] language for Neptune, then hire another person that knows Kubernetes,[41:03.000 --> 41:07.000] then hire another... After a while, if you have so many different kinds of tools,[41:07.000 --> 41:11.000] you have to hire so many different kinds of people that all[41:11.000 --> 41:15.000] productivity goes to a stop. 
So it's the hiring as well.[41:15.000 --> 41:19.000] Absolutely, I mean, it's virtually impossible[41:19.000 --> 41:23.000] to find someone who is really well versed with Gremlin, for example.[41:23.000 --> 41:27.000] It's incredibly hard, and I think tech hiring is hard[41:27.000 --> 41:31.000] by itself already,[41:31.000 --> 41:35.000] so you really need to think about what can I hire for, as well:[41:35.000 --> 41:39.000] what expertise can I realistically build up?[41:39.000 --> 41:43.000] So that's why I think AWS,[41:43.000 --> 41:47.000] even with some of the limitations of the ML platform,[41:47.000 --> 41:51.000] the advantage of using AWS is that[41:51.000 --> 41:55.000] you have a huge audience of people to hire from. And then the same thing with[41:55.000 --> 41:59.000] Spark: there's a lot of things I don't like about Spark, but a lot of people[41:59.000 --> 42:03.000] use Spark, and so if you use AWS and you use Spark,[42:03.000 --> 42:07.000] let's say those two, which you are, then you're going to have a much easier time[42:07.000 --> 42:11.000] hiring people, you're going to have a much easier time training people,[42:11.000 --> 42:15.000] there's tons of documentation about it. So I think[42:15.000 --> 42:19.000] you're very wise that you're thinking that way, but a lot of people don't think about that.[42:19.000 --> 42:23.000] They're like, oh, I've got to use the latest, greatest stuff and this and this and this,[42:23.000 --> 42:27.000] and then their company starts to get into trouble because they can't hire[42:27.000 --> 42:31.000] people, they can't maintain systems, and then productivity starts[42:31.000 --> 42:35.000] to decrease. Also, something[42:35.000 --> 42:39.000] not to ignore is the cognitive load you put on a team[42:39.000 --> 42:43.000] that needs to manage a broad range of very different[42:43.000 --> 42:47.000] tools or services. 
It also puts incredible[42:47.000 --> 42:51.000] cognitive load on that team, and you suddenly also need an incredible breadth[42:51.000 --> 42:55.000] of expertise in that team, and that means you're also going[42:55.000 --> 42:59.000] to create single points of failure if you don't really[42:59.000 --> 43:03.000] scale up your team.[43:03.000 --> 43:07.000] It's something to really consider. I think when you go for[43:07.000 --> 43:11.000] new tooling, you should really look at it from a holistic perspective,[43:11.000 --> 43:15.000] not only at whether this is the latest and greatest.[43:15.000 --> 43:19.000] In terms of Europe versus[43:19.000 --> 43:23.000] the US, have you spent much time in the US at all?[43:23.000 --> 43:27.000] Not at all, actually. Flying to the US Monday, but no, not at all.[43:27.000 --> 43:31.000] That also would be kind of an interesting[43:31.000 --> 43:35.000] comparison, in that the culture of the United States[43:35.000 --> 43:39.000] is really this culture of,[43:39.000 --> 43:43.000] I would say, more like survival of the fittest, where you work[43:43.000 --> 43:47.000] seven days a week and you're constantly... like, you don't go on vacation[43:47.000 --> 43:51.000] and you're proud of it. And I think it's not[43:51.000 --> 43:55.000] a good culture. I'm not saying that's a good thing, I think it's a bad[43:55.000 --> 43:59.000] thing, and that a lot of times the critique people have[43:59.000 --> 44:03.000] about Europe is like, oh, people take vacation all the time and all this.[44:03.000 --> 44:07.000] And as someone who has spent time in both, I would say[44:07.000 --> 44:11.000] yes, that's a better approach. 
A better approach is that people[44:11.000 --> 44:15.000] should feel relaxed, because[44:15.000 --> 44:19.000] especially the kind of work you do in MLOps[44:19.000 --> 44:23.000] is that you need people to feel comfortable and happy.[44:23.000 --> 44:27.000] And more the question[44:27.000 --> 44:31.000] I was getting at is that[44:31.000 --> 44:35.000] I wonder if there is a more productive culture[44:35.000 --> 44:39.000] for MLOps in Europe[44:39.000 --> 44:43.000] versus the US in terms of maintaining[44:43.000 --> 44:47.000] systems and building software, where the US,[44:47.000 --> 44:51.000] what it's really been good at, I guess, is kind of coming up with new[44:51.000 --> 44:55.000] ideas, and there's lots of new services that get generated, but[44:55.000 --> 44:59.000] the quality and longevity[44:59.000 --> 45:03.000] is not necessarily the same. Where I could see,[45:03.000 --> 45:07.000] in the stuff we just talked about, which is, if you're trying to build a team[45:07.000 --> 45:11.000] where there's low turnover,[45:11.000 --> 45:15.000] you have very high quality output,[45:15.000 --> 45:19.000] it seems like that maybe organizations[45:19.000 --> 45:23.000] could learn from the European approach to building[45:23.000 --> 45:27.000] and maintaining systems for MLOps.[45:27.000 --> 45:31.000] I think there's definitely some truth in it, especially when you look at the median[45:31.000 --> 45:35.000] tenure of a tech person in an organization.[45:35.000 --> 45:39.000] I think that is actually still significantly lower in the US.[45:39.000 --> 45:43.000] I'm not sure, I think in the Bay Area somewhere around one year or two months or something like that,[45:43.000 --> 45:47.000] compared to Europe, I believe,[45:47.000 --> 45:51.000] still fairly low. 
Here of course in tech people also like to switch companies more often,[45:51.000 --> 45:55.000] but I would say average is still more around[45:55.000 --> 45:59.000] two years, something around that, staying with the same company,[45:59.000 --> 46:03.000] also in tech, which I think is a bit longer[46:03.000 --> 46:07.000] than you would typically have it in the US.[46:07.000 --> 46:11.000] I think from my perspective, where I've also built up most of the[46:11.000 --> 46:15.000] current team, I think it's[46:15.000 --> 46:19.000] super important to hire good people,[46:19.000 --> 46:23.000] and people that fit to the team, fit to the company culture-wise,[46:23.000 --> 46:27.000] but also give them...[46:27.000 --> 46:31.000] let them not be in a sprint all the time.[46:31.000 --> 46:35.000] It's about having a sustainable way of working, in my opinion,[46:35.000 --> 46:39.000] and that sustainable way means you should definitely take your vacation.[46:39.000 --> 46:43.000] And I think usually in Europe we have quite generous,[46:43.000 --> 46:47.000] even by law, vacation. I mean, in the Netherlands by law you get 20 days a year,[46:47.000 --> 46:51.000] but most companies give you 25, many IT companies[46:51.000 --> 46:55.000] 30 per year, so that's quite nice.[46:55.000 --> 46:59.000] And people do take that. So culture-wise, it's really, everyone[46:59.000 --> 47:03.000] likes to take vacations, whether that's C-level or whether that's an engineer on a team.[47:03.000 --> 47:07.000] And in many companies that's also really encouraged,[47:07.000 --> 47:11.000] to have a healthy work-life balance.[47:11.000 --> 47:15.000] And of course it's not only about vacations, but also growth opportunities,[47:15.000 --> 47:19.000] letting people explore, develop themselves,[47:19.000 --> 47:23.000] and not always pushing on max performance.[47:23.000 --> 47:27.000] So really, at least, I always see it like a partnership:[47:27.000 --> 47:31.000] the organization wants to get something from an[47:31.000 --> 
47:35.000] employee but the employee should also be encouraged and developed[47:35.000 --> 47:39.000] in that organization a
About Allen
Allen is a cloud architect at Tyler Technologies. He helps modernize government software by creating secure, highly scalable, and fault-tolerant serverless applications. Allen publishes content regularly about serverless concepts and design on his blog - Ready, Set Cloud!
Links Referenced:
Ready, Set, Cloud blog: https://readysetcloud.io
Tyler Technologies: https://www.tylertech.com/
Twitter: https://twitter.com/allenheltondev
LinkedIn: https://www.linkedin.com/in/allenheltondev/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at AWS AppConfig. Engineers love to solve, and occasionally create, problems. But not when it's an on-call fire-drill at 4 in the morning. Software problems should drive innovation and collaboration, NOT stress, and sleeplessness, and threats of violence. That's why so many developers are realizing the value of AWS AppConfig Feature Flags. Feature Flags let developers push code to production, but hide that feature from customers so that the developers can release their feature when it's ready. This practice allows for safe, fast, and convenient software development. You can seamlessly incorporate AppConfig Feature Flags into your AWS or cloud environment and ship your features with excitement, not trepidation and fear. To get started, go to snark.cloud/appconfig. That's snark.cloud/appconfig.
Corey: I come bearing ill tidings. Developers are responsible for more than ever these days. Not just the code that they write, but also the containers and the cloud infrastructure that their apps run on. 
Because serverless means it's still somebody's problem. And a big part of that responsibility is app security from code to cloud. And that's where our friend Snyk comes in. Snyk is a frictionless security platform that meets developers where they are - Finding and fixing vulnerabilities right from the CLI, IDEs, Repos, and Pipelines. Snyk integrates seamlessly with AWS offerings like CodePipeline, EKS, ECR, and more! As well as things you're actually likely to be using. Deploy on AWS, secure with Snyk. Learn more at Snyk.co/scream. That's S-N-Y-K.co/scream.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while I wind up stumbling into corners of the internet that I previously had not traveled. Somewhat recently, I wound up having that delightful experience again by discovering readysetcloud.io, which has a whole series of, I guess some people might call it thought leadership, I'm going to call it instead how I view it, which is just amazing opinion pieces on the context of serverless, mixed with APIs, mixed with some prognostications about the future.Allen Helton by day is a cloud architect at Tyler Technologies, but that's not how I encountered you. First off, Allen, thank you for joining me.Allen: Thank you, Corey. Happy to be here.Corey: I was originally pointed towards your work by folks in the AWS Community Builder program, in which we both participate from time to time, and it's one of those, "Oh, wow, this is amazing. I really wish I'd discovered some of this sooner." And every time I look through your back catalog, and I click on a new post, I see things that are either "I really agree with this" or "I can't stand this opinion, I want to fight about it," but more often than not, it's one of those recurring moments that I love: "Damn, I wish I had written something like this." So first, you're absolutely killing it on the content front.Allen: Thank you, Corey, I appreciate that. 
The content that I make is really about the stuff that I'm doing at work. It's stuff that I'm passionate about, stuff that I'd spend a decent amount of time on, and really the most important thing about it for me, is it's stuff that I'm learning and forming opinions on and want to share with others.Corey: I have to say, when I saw that you were—oh, you're Tyler Technologies, which sounds for all the world like, oh, it's a relatively small consultancy run by some guy presumably named Tyler, and you know, it's a petite team of maybe 20, 30 people on the outside. Yeah, then I realized, wait a minute, that's not entirely true. For example, for starters, you're publicly traded. And okay, that does change things a little bit. First off, who are you people? Secondly, what do you do? And third, why have I never heard of you folks, until now?Allen: Tyler is the largest company that focuses completely on the public sector. We have divisions and products for pretty much everything that you can imagine that's in the public sector. We have software for schools, software for tax and appraisal, we have software for police officers, for courts, everything you can think of that runs the government can be, and a lot of times is, run on Tyler software. We've been around for decades building our expertise in the domain, and the reason you probably haven't heard about us is because you might not have ever been in trouble with the law before. If you [laugh] if you have been—Corey: No, no, I learned very early on in the course of my life—which will come as a surprise to absolutely no one who spent more than 30 seconds with me—that I have remarkably little filter and if ten kids were the ones doing something wrong, I'm the one that gets caught. So, I spent a lot of time in the principal's office, so this taught me to keep my nose clean. I'm one of those squeaky-clean types, just because I was always terrified of getting punished because I knew I would get caught. 
I'm not saying this is the right way to go through life necessarily, but it did have the side benefit of, no, I don't really engage with law enforcement going throughout the course of my life.Allen: That's good. That's good. But one exposure that a lot of people get to Tyler is if you look at the bottom of your next traffic ticket, it'll probably say Tyler Technologies on the bottom there.Corey: Oh, so you're really popular in certain circles, I'd imagine?Allen: Super popular. Yes, yes. And of course, you get all the benefits of writing that code that says ‘if defendant equals Allen Helton then return.'Corey: I like that. You get to have the exception cases built in that no one's ever going to wind up looking into.Allen: That's right. Yes.Corey: The idea of what you're doing makes an awful lot of sense. There's a tremendous need for a wide variety of technical assistance in the public sector. What surprises me, although I guess it probably shouldn't, is how much of your content is aimed at serverless technologies and API design, which to my way of thinking, isn't really something that public sector has done a lot with. Clearly I'm wrong.Allen: Historically, you're not wrong. There's an old saying that government tends to run about ten years behind on technology. Not just technology, but all over the board and runs about ten years behind. And until recently, that's really been true. There was a case last year, a situation last year where one of the state governments—I don't remember which one it was—but they were having a crisis because they couldn't find any COBOL developers to come in and maintain their software that runs the state.And it's COBOL; you're not going to find a whole lot of people that have that skill. A lot of those people are retiring out. And what's happening is that we're getting new people sitting in positions of power and government that want innovation. 
They know about the cloud and they want to be able to integrate with systems quickly and easily, have little to no onboarding time. You know, there are people in power that have grown up with technology and understand that, well, with everything else, I can be up and running in five or ten minutes. I cannot do this with the software I'm consuming now.Corey: My opinion on it is admittedly conflicted because on the one hand, yeah, I don't think that governments should be running on COBOL software that runs on mainframes that haven't been supported in 25 years. Conversely, I also don't necessarily want them being run like a seed series startup, where, "Well, I wrote this code last night, and it's awesome, so off I go to production with it." Because I can decide not to do business anymore with Twitter for Pets, and I could go on to something else, like PetFlicks, or whatever it is I choose to use. I can't easily opt out of my government. The decisions that they make stick and that is going to have a meaningful impact on my life and everyone else's life who is subject to their jurisdiction. So, I guess I don't really know where I believe the proper, I guess, pace of technological adoption should be for governments. Curious to get your thoughts on this.Allen: Well, you certainly don't want anything that's bleeding edge. That's one of the things that we kind of draw fine lines around. Because when we're dealing with government software, we're dealing with, usually, critically sensitive information. It's not medical records, but it's your criminal record, and it's things like your social security number, it's things that you can't have leaking out under any circumstances. So, the things that we're building on are things that have proven out to be secure and have best practices around security, uptime, reliability, and, in a lot of cases, maintainability as well. 
You know, if there are issues, then let's try to get those turned around as quickly as we can because we don't want to have any sort of downtime from the software side versus the software vendor side.Corey: I want to pivot a little bit to some of the content you've put out because an awful lot of it seems to be, I think I'll call it variations on a theme. For example, I just read some recent titles, and to illustrate my point, "Going API First: Your First 30 Days," "Solutions Architect Tips: How to Design Applications for Growth," "3 Things to Know Before Building A Multi-Tenant Serverless App." And the common thread that I see running through all of these things is that these are things that you tend to have extraordinarily strong and vocal opinions about only after dismissing all of them the first time and slapping something together, and then sort of being forced to live with the consequences of the choices that you've made—in some cases, without realizing you were making them at the time. Are you one of those folks that has the wisdom to see what's coming down the road, or did you do what the rest of us do and basically learn all this stuff by getting it hilariously wrong and having to careen into rebound situations as a result?Allen: [laugh]. I love that question. I would like to say now, I feel like I have the vision to see something like that coming. Historically, no, not at all. Let me talk a little bit about how I got to where I am because that will shed a lot of context on that question.A few years ago, I was put into a position at Tyler that said, "Hey, go figure out this cloud thing." Let's figure out what we need to do to move into the cloud safely, securely, quickly, all that rigmarole. And so, I did. I got to hand-select a team of engineers from people that I worked with at Tyler over the past few years, and we were basically given free rein to learn. 
We were an R&D team, a hundred percent R&D, for about a year's worth of time, where we were learning about cloud concepts and theory and building little proofs of concept.CI/CD, serverless, APIs, multi-tenancy, a whole bunch of different stuff. NoSQL was another one of the things that we had to learn. And after that year of R&D, we were told, "Okay, now go do something with that. Go build this application." And we did, building on our cursory theory knowledge. And we got pretty close to go-live, and then the business says, "What do you do in this scenario? What do you do in that scenario? What do you do here?"Corey: "I update my resume and go work somewhere else. Where's the hard part here?"Allen: [laugh].Corey: Turns out, that's not a convincing answer.Allen: Right. So, we moved quickly. And then I wouldn't say we backpedaled, but we hardened for a long time prior to the go-live, with the lessons that we've learned with the eyes of Tyler, the mature enterprise company, saying, "These are the things that you have to make sure that you take into consideration in an actual production application." One of the things that I always pushed—I was a manager for a few years of all these cloud teams—I always pushed: do it; do it right; do it better. Right?It's kind of like crawl, walk, run. And if you follow my writing from the beginning, just looking at the titles and reading them, kind of like what you were doing, Corey, you'll see that very much. You'll see how I talk about CI/CD, how I talk about authorization, how I talk about multi-tenancy. And I kind of go in waves where maybe a year passes and you see my content revisit some of the topics that I've done in the past. And they're like, "No, no, no, don't do what I said before. 
It's not right."Corey: The problem when I'm writing all of these things that I do is that, for example, my entire newsletter publication pipeline is built on a giant morass of Lambda functions and API Gateways. It's microservices-driven—kind of—and each microservice is built, almost always, with a different framework. Lately, all the new stuff is CDK. I started off with the Serverless Framework. There are a few other things here and there.And it's like going back in time, architecturally, as I have to make updates to these things from time to time. And the problem with having done all that myself is that I already know the answer to, "What fool designed this?" It's, well, you're basically watching me learn what I was doing, bit by bit. I'm starting to believe that the right answer on some level, is to build an inherent shelf-life into some of these things. Great, in five years, you're going to come back and re-architect it now that you know how this stuff actually works rather than patching together 15 blog posts by different authors, not all of whom are talking about the same thing and hoping for the best.Allen: Yep. That's one of the things that I really like about serverless, I view that as a giant pro of doing serverless is that when we revisit with the lessons learned, we don't have to refactor everything at once like if it was just a big, you know, MVC controller out there in the sky. We can refactor one Lambda function at a time if now we're using a new version of the AWS SDK, or we've learned about a new best practice that needs to go in place. 
It's a, "While you're in there, tidy up, please," kind of deal.Corey: I know that the DynamoDB fanatics will absolutely murder me over this one, but one of the reasons that I have multiple Dynamo tables that contain, effectively, variations on the exact same data, is because I want to have the dependency between the two different microservices be the API, not, "Oh, and under the hood, it's expecting this exact same data structure all the time." But it just felt like that was the wrong direction to go in. That is the justification I use for myself why I run multiple DynamoDB tables that [laugh] have the same content. Where do you fall on the idea of data store separation?Allen: I'm a big single table design person myself, I really like the idea of being able to store everything in the same table and being able to create queries that can return me multiple different types of entity with one lookup. Now, that being said, one of the issues that we ran into, or one of the ambiguous areas when we were getting started with serverless was, what does single table design mean when you're talking about microservices? We were wondering: does single table mean one DynamoDB table for an entire application that's composed of 15 microservices? Or is it one table per microservice? And that's ultimately what we ended up going with: a table per microservice. Even if multiple microservices are pushed into the same AWS account, we're still building that logical construct of a microservice and one table that houses similar entities in the same domain.Corey: So, something I wish that every service team at AWS would do as a part of their design is draw the architecture of an application that you're planning to build. Great, now assume that every single resource on that architecture diagram lives in its own distinct AWS account because somewhere in some customer, there's going to be an account boundary at every interconnection point along the way. 
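As a rough sketch of what single table design buys you—multiple entity types coming back from one lookup—here is a minimal, self-contained illustration. The entity names and the partition/sort-key scheme are made up for this example (in real code, the items would live in DynamoDB and the query would go through something like boto3's `Table.query`):

```python
# Hypothetical single-table layout for one "orders" microservice.
# One partition holds a customer profile AND all of that customer's
# orders, so a single key-condition query returns both entity types.

def key(entity_type, entity_id):
    """Build a composite key like CUSTOMER#123 or ORDER#a1."""
    return f"{entity_type}#{entity_id}"

# Items as they would sit in one DynamoDB table (PK = partition key,
# SK = sort key). Stored here in a plain list to keep the sketch runnable.
table = [
    {"PK": key("CUSTOMER", "123"), "SK": "PROFILE",          "name": "Pat"},
    {"PK": key("CUSTOMER", "123"), "SK": key("ORDER", "a1"), "total": 40},
    {"PK": key("CUSTOMER", "123"), "SK": key("ORDER", "a2"), "total": 15},
    {"PK": key("CUSTOMER", "999"), "SK": "PROFILE",          "name": "Sam"},
]

def query(pk):
    """Simulate Query(KeyConditionExpression: PK = :pk)."""
    return [item for item in table if item["PK"] == pk]

# One lookup returns the customer profile *and* all of their orders.
items = query(key("CUSTOMER", "123"))
```

With a table per microservice, this key scheme stays private to the service; other services see only its API, never the PK/SK layout.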
And so, many services don't do that where it's, "Oh, that thing and the other thing has to be in the same account." So, people have to write their own integration shims, and it makes doing the right thing of putting different services into distinct bounded AWS accounts for security or compliance reasons way harder than I feel like it needs to be.Allen: [laugh]. Totally agree with you on that one. That's one of the things that I feel like I'm still learning about is the account-level isolation. I'm still kind of early on, personally, with my opinions on how we're structuring things right now, but I'm very much of the opinion that deploying multiple things into the same account is going to make it too easy to do something that you shouldn't. And I just try not to inherently trust people, in the sense that, "Oh, this is easy. I'm just going to cross that boundary real quick."Corey: For me, it's also come down to security risk exposure. Like my lasttweetinaws.com Twitter shitposting thread client lives in a distinct AWS account that is separate from the AWS account that has all of our client billing data that lives within it. The idea being that if you find a way to compromise my public-facing Twitter client, great, the blast radius should be constrained to, "Yay, now you can, I don't know, spin up some cryptocurrency mining in my AWS account and I get to look like a fool when I beg AWS for forgiveness."But that should be the end of it. It shouldn't be a security incident because I should not have the credit card numbers living right next to the funny internet web thing. That sort of flies in the face of the original guidance that AWS gave at launch. And right around 2008-era, best practices were one customer, one AWS account. And then by 2012, they had changed their perspective, but once you've made a decision to build multiple services in a single account, unwinding and unpacking that becomes an incredibly burdensome thing. 
It's about the equivalent of doing a cloud migration, in some ways.Allen: We went through that. We started off building one application with the intent that it was going to be a siloed application, a one-off, essentially. And about a year into it, it's one of those moments of, "Oh, no. What we're building is not actually a one-off. It's a piece to a much larger puzzle."And we had a whole bunch of—unfortunately—tightly coupled things that were in there that we're assuming that resources were going to be in the same AWS account. So, we ended up—how long—I think we took probably two months, which in the grand scheme of things isn't that long, but two months, kind of unwinding the pieces and decoupling what was possible at the time into multiple AWS accounts, kind of, segmented by domain, essentially. But that's hard. As AWS puts it, you know, it's those one-way door decisions. I think this one was a two-way door, but it locked and you could kind of jimmy the lock on the way back out.Corey: And you could buzz someone from the lobby to let you back in. Yeah, the biggest problem is not necessarily the one-way door decisions. It's the one-way door decisions that you don't realize you're passing through at the time that you do them. Which, of course, brings us to a topic near and dear to your heart—and I only recently started having opinions on this myself—and that is the proper design of APIs, which I'm sure will incense absolutely no one who's listening to this. Like, my opinions on APIs start with, well, probably REST is the right answer in this day and age. I had people, like, "Well, I don't know, GraphQL is pretty awesome." Like, "Oh, I'm thinking SOAP," and people look at me like I'm a monster from the Black Lagoon of centuries past in XML-land. 
So, my particular brand of strangeness aside, what do you see people doing in the world of API design that is, I guess, the most common or easiest mistake to make that you really wish they would stop making?Allen: If I could boil it down to one word, fundamentalism. Let me unpack that for you.Corey: Oh, please, absolutely want to get a definition on that one.Allen: [laugh]. I approach API design from a developer experience point of view: how easy is it for both internal and external integrators to consume and satisfy the business processes that they want to accomplish? And a lot of times, REST guidelines, you know, are all about entities: drill into the appropriate entities and name your endpoints with nouns, not verbs. I'm actually very much on board with that one.But something that you could easily do, let's say you have a business process that given a fundamentally correct RESTful API design takes ten API calls to satisfy. You could, in theory, boil that down to maybe three well-designed endpoints that aren't, quote-unquote, "RESTful," that make that developer experience significantly easier. And if you were a fundamentalist, that option is not even on the table, but thinking about it pragmatically from a developer experience point of view, that might be the better call. So, that's one of the things that, I know feels like a hot take. Every time I say it, I get a little bit of flack for it, but don't be a fundamentalist when it comes to your API designs. 
Do something that makes it easier while staying in the guidelines to do what you want.Corey: For me, the problem that I've kept smacking into with API design, and it honestly—let me be very clear on this—my first real exposure to API design, rather than being an API consumer—which of course, I complain about constantly, especially in the context of AWS's inconsistent APIs between services—was when I'm building something out, and I'm reading the documentation for API Gateway, and oh, this is how you wind up having this stage linked to this thing, and here's the endpoint. And okay, great, so I would just populate—build out a structure or a schema that has the positional parameters I want to use as variables in my function. And that's awesome. And then I realized, "Oh, I might want to call this a different way. Aw, crap." And sometimes it's easy; you just add a different endpoint. Other times, I have to significantly rethink things. And I can't shake the feeling that this is an entire discipline that exists that I just haven't had a whole lot of exposure to previously.Allen: Yeah, I believe that. One of the things that you could tie a metaphor to for what I'm saying and kind of what you're saying, is AWS SAM, the Serverless Application Model, all it does is basically macro-expand CloudFormation resources. It's just a transform from a template into CloudFormation. CDK does the same thing. But what the developers of SAM have done is they've recognized these business processes that people do regularly, and they've made these incredibly easy ways to satisfy those business processes and tie them all together, right?If I want to have a Lambda function that sits behind an endpoint, an API endpoint, I just have to add four or five lines of YAML or JSON that says, "This is the event trigger, here's the route, here's the API." And then it goes and does four, five, six different things. Now, there's some engineers that don't like that because sometimes that feels like magic. 
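For illustration, the SAM event mapping being described looks roughly like this. The function name, handler, and route here are made up for the sketch, not taken from the episode:

```yaml
# Hypothetical SAM template fragment: a few lines of YAML wire a Lambda
# function to an API Gateway route. SAM's transform expands this into the
# underlying CloudFormation resources (function, API, permissions, etc.).
Resources:
  GetOrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler
      Runtime: python3.12
      Events:
        GetOrder:                  # the event trigger
          Type: Api
          Properties:
            Path: /orders/{id}     # the route
            Method: get            # the HTTP method on the API
```

That one `Events` entry is the "four or five lines" in question: the declared trigger, route, and API, which SAM turns into several CloudFormation resources on deploy.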
Sometimes a little bit magic is okay.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig secures your cloud from source to run. They believe, as do I, that DevOps and security are inextricably linked. If you wanna learn more about how they view this, check out their blog, it's definitely worth the read. To learn more about how they are absolutely getting it right from where I sit, visit Sysdig.com and tell them that I sent you. That's S Y S D I G.com. And my thanks to them for their continued support of this ridiculous nonsense.Corey: I feel like one of the benefits I've had with the vast majority of APIs that I've built is that because this is all relatively small-scale stuff for what amounts to basically shitposting for the sake of entertainment, I'm really the only consumer of an awful lot of these things. So, I get frustrated when I have to backtrack and make changes and teach other microservices to talk to this thing that has now changed. And it's frustrating, but I have the capacity to do that. It's just work for a period of time. I feel like that equation completely shifts when you have published this and it is now out in the world, and it's not just users, but in many cases paying customers where you can't really make those changes without significant notice, and every time you do you're creating work for those customers, so you have to be a lot more judicious about it.Allen: Oh, yeah. There is a whole lot of governance and practice that goes into production-level APIs that people integrate with. You know, they say once you push something out the door into production that you're going to support it forever. I don't disagree with that. That seems like something that a lot of people don't understand.And that's one of the reasons why I push API-first development so hard in all the content that I write is because you need to be intentional about what you're letting out the door. 
You need to go in and work, not just with the developers, but your product people and your analysts to say, what does this absolutely need to do, and what does it need to do in the future? And you take those things, and you work with analysts who want specifics, you work with the engineers to actually build it out. And you're very intentional about what goes out the door that first time because once it goes out with a mistake, you're either going to version it immediately or you're going to make some people very unhappy when you make a breaking change to something that they immediately started consuming.Corey: It absolutely feels like that's one of those things that AWS gets astonishingly right. I mean, I had the privilege of interviewing, at the time, Jeff Barr and then Ariel Kelman, who was their head of marketing, to basically debunk a bunch of old myths. And one thing that they started talking about extensively was the idea that an API is fundamentally a promise to your customers. And when you make a promise, you'd better damn well intend on keeping it. It's why API deprecations from AWS are effectively unique whenever something happens.It's the, this is a singular moment in time when they turn off a service or degrade old functionality in favor of new. They can add to it, they can launch a V2 of something and then start to wean people off by calling the old one classic or whatnot, but if I built something on AWS in 2008 and I wound up sleeping until today, and go and try and do the exact same thing and deploy it now, it will almost certainly work exactly as it did back then. Sure, reliability is going to be a lot better and there's a crap ton of features and whatnot that I'm not taking advantage of, but that fundamental ability to do that is awesome. Conversely, it feels like Google Cloud likes to change around a lot of their API stories almost constantly. 
And it's unplanned work that frustrates the heck out of me when I'm trying to build something stable and lasting on top of it.Allen: I think it goes to show the maturity of these companies as API companies versus just vendors. It's one of the things that I think AWS does [laugh]—Corey: You see a similar dichotomy with Microsoft and Apple. Microsoft's new versions of Windows generally still have functionalities in them to support stuff that was written in the '90s for a few use cases, whereas Apple's like, "Oh, your computer's more than 18 months old? Have you tried throwing it away and buying a new one? And oh, it's a new version of Mac OS, so yeah, maybe the last one would get security updates for a year and then get with the times." And I can't shake the feeling that the correct answer is in some way, both of those, depending upon who your customer is and what it is you're trying to achieve.If Microsoft adopted the Apple approach, their customers would mutiny, and rightfully so; the expectation has been set for decades that that isn't what happens. Conversely, if Apple decided now we're going to support this version of Mac OS in perpetuity, I think a lot of their application developers wouldn't quite know what to make of that.Allen: Yeah. I think it also comes from a standpoint of you better make it worth their while if you're going to move their cheese. I'm not a Mac user myself, but from what I hear from Mac users—and this could be rose-colored glasses—their stuff works phenomenally well. You know, when a new thing comes out—Corey: Until it doesn't, absolutely. It's—whenever I say things like that on this show, I get letters. And it's, "Oh, yeah, really? They'll come up with something that is a colossal pain in the ass on Mac." Like, yeah, "Try building a system-wide mute key."It's, yeah, that's just a hotkey away on Windows, but here in Mac land, it's, "But it makes such beautiful sounds. 
Why would you want them to be quiet?" And it's, yeah, it becomes this back-and-forth dichotomy there. And you can even extend it to iPhones as well, and the Android ecosystem, where it's, oh, you're going to support the last couple of versions of iOS.Well, as a developer, I don't want to do that. And Apple's position is, "Okay, great." Almost half of the mobile users on the planet will be upgrading because they're in the ecosystem. Do you want to be able to sell things to those people or not? And they're at a point of scale where they get to dictate those terms.On some level, there are benefits to it; on others, it is intensely frustrating. I don't know what the right answer is on the level of permanence on that level of platform. I only have slightly better ideas around the position of APIs. I will say that when AWS deprecates something, they reach out individually to affected customers, on some level, and invariably, when they say, "This is going to be deprecated as of August 31," or whenever it is, yeah, it is going to slip at least twice in almost every case, just because they're not going to turn off a service that is revenue-bearing or critical-load-bearing for customers without massive amounts of notice and outreach, and in some cases according to rumor, having engineers reach out to help restructure things so it's not as big of a burden on customers. That's a level of customer focus that I don't think most other companies are capable of matching.Allen: I think that comes with the size and the history of Amazon. And one of the things that they're doing right now, we've used Amazon Cloud Cams for years, in my house. We use them as baby monitors. And they—Corey: Yeah, I saw this. I did something very similar with Nest. They didn't have the Cloud Cam at the right time that I was looking at it. And they just announced that they're going to be deprecating them. They're withdrawing them from sale. They're not going to support them anymore. 
Which, oh, at Amazon it's—"we're not offering this anymore." But you tell the story; what are they offering existing customers?Allen: Yeah, so slightly upset about it because I like my Cloud Cams and I don't want to have to take them off the wall or wherever they are to replace them with something else. But what they're doing is, you know, they gave me—or they gave all the customers about eight months' head start. I think they're going to be taking them offline around Thanksgiving this year, just mid-November. And what they said is as compensation for you, we're going to send you a Blink Cam—a Blink Mini—for every Cloud Cam that you have in use, and then we are going to gift you a year's subscription to the Pro plan for Blink.Corey: That's very reasonable for things that were bought years ago. Meanwhile, I feel like—not to be unkind or uncharitable here—but I use Nest Cams. And that's a Google product. I half expected if they ever get deprecated, I'll find out because Google just turns it off in the middle of the night—Allen: [laugh].Corey: —and I wake up and have to read a blog post somewhere that they put an update on Nest Cams, the same way they killed Google Reader once upon a time. That's slightly unfair, but the fact that the joke even lands does say a lot about Google's reputation in this space.Allen: For sure.Corey: One last topic I want to talk with you about before we call it a show is that at the time of this recording, you recently had a blog post titled, "What does the Future Hold for Serverless?" Summarize that for me. Where do you see this serverless movement—if you'll forgive the term—going?Allen: So, I'm going to start at the end. I'm going to work back a little bit on what needs to happen for us to get there. I have a feeling that in the future—I'm going to be vague about how far in the future this is—that we'll finally have a satisfied promise of all you're going to write in the future is business logic. And what does that mean? 
I think what can end up happening, given the right focus, the right companies, the right feedback, at the right time, is we can write code as developers and have that get pushed up into the cloud.And a phrase that I know Jeremy Daly likes to say is 'infrastructure from code,' where it provisions resources in the cloud for you based on your use case. I've developed an application and it gets pushed up in the cloud at the time of deployment, with optimized resource allocation. Over time, what will happen—with my future vision—is when you get production traffic going through, maybe it's spiky, maybe it's consistently at a scale that outperforms the resources that it originally provisioned. We can have monitoring tools that analyze that and pick that out, find the anomalies, find the standard patterns, and adjust the infrastructure that it deployed for you automatically: based on your production traffic, it optimizes what it created for you. Which is something that you can't do on an initial deployment right now. You can put what looks best on paper, but once you actually get traffic through your application, you realize that, you know, what was on paper might not be correct.Corey: You ever notice that whiteboard diagrams never show the reality, and they're always aspirational, and they miss certain parts? And I used to think that this was a symptom of working at small, scrappy companies because you know what, those big tech companies, everything they build is amazing and awesome. I know it because I've seen their conference talks. But I've been a consultant long enough now, and for a number of those companies, to realize that nope, everyone's infrastructure is basically a trash fire at any given point in time. And it works almost in spite of itself, rather than because of it.There is no golden path where everything is shiny, new, and beautiful. And that, honestly, I got to say, it was really [laugh] depressing when I first discovered it. 
Like, oh, God, even these really smart people who are so intelligent they have to have extra brain packs bolted to their chests don't have the magic answer to all of this. The rest of us are just screwed, then. But we find ways to make it work.Allen: Yep. There's a quote, I wish I remembered who said it, but it was a military quote where, “No battle plan survives impact with the enemy—first contact with the enemy.” It's kind of that way with infrastructure diagrams. We can draw it out however we want and then you turn it on in production. It's like, “Oh, no. That's not right.”Corey: I want to mix the metaphors there and say, yeah, no architecture survives your first fight with a customer. Like, “Great, I don't think that's quite what they're trying to say.” It's like, “What, you don't attack your customers? Pfft, what's your customer service line look like?” Yeah, it's… I think you're onto something.I think that inherently everything beyond the V1 design of almost anything is an emergent property where this is what we learned about it by running it and putting traffic through it and finding these problems, and here's how it wound up evolving to account for that.Allen: I agree. I don't have anything to add on that.Corey: [laugh]. Fair enough. I really want to thank you for taking so much time out of your day to talk about how you view these things. If people want to learn more, where is the best place to find you?Allen: Twitter is probably the best place to find me: @AllenHeltonDev. I have that username on all the major social platforms, so if you want to find me on LinkedIn, same thing: AllenHeltonDev. My blog is always open as well, if you have any feedback you'd like to give there: readysetcloud.io.Corey: And we will, of course, put links to that in the show notes. Thanks again for spending so much time talking to me. I really appreciate it.Allen: Yeah, this was fun. This was a lot of fun. I love talking shop.Corey: It shows. 
And it's nice to talk about things I don't spend enough time thinking about. Allen Helton, cloud architect at Tyler Technologies. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that I will reject because it was not written in valid XML.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Rafal: Rafal is a Serverless Engineer at Stedi by day, and the founder of Dynobase, a modern DynamoDB UI client, by night. When he is not coding or answering support tickets, he loves climbing and tasting whiskey (not simultaneously).Links Referenced: Company Website: https://dynobase.dev Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up screening all of their talent on English ability, as well as, you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday with your team.
It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: The company 0x4447 builds products to increase standardization and security in AWS organizations. They do this with automated pipelines that use well-structured projects to create secure, easy-to-maintain and fault-tolerant solutions, one of which is their VPN product built on top of the popular OpenVPN project which has no license restrictions; you are only limited by the network card in the instance. To learn more visit: snark.cloud/deployandgoCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's not too often that I wind up building an episode here out of a desktop application. I've done it once or twice, and I'm sure that the folks at Microsoft Excel are continually hoping for an invite to talk about things. But we're going in a bit of a different direction today. Rafal Wilinski is a serverless engineer at Stedi and, in what is apparently a job requirement at Stedi, he also has a side project that manifests itself as a desktop app. Rafal, thank you for joining me today. I appreciate it.Rafal: Yeah. Hi, everyone. Thanks for having me, Corey.Corey: I first heard about you when you launched Dynobase, which is awesome. It sounds evocative of dinosaurs unless you read it, then it's D-Y-N-O, and it's, “Ah, this sounds a lot like DynamoDB. Let me see what it is.” And sure enough, it was.
As much as I love misusing things as databases, DynamoDB is actually a database that is decent and good at what it does.And please correct me if I get any of this wrong, but Dynobase is effectively an Electron app that you install, at least on a Mac, in my case; I don't generally use other desktops, that's other people's problems. And it provides a user-friendly interface to DynamoDB that is not actively hostile to the customer.Rafal: Yeah, exactly. That was the goal. That's how I envisioned it, and I hope I executed correctly.Corey: It was almost prescient in some ways because they recently redid the DynamoDB console in AWS to actively make it worse, to wind up working with individual items, to modify things. It feels like they are validating your market for you by, “Oh, we really like Dynobase. How do we drive more traffic to it? We're going to make this thing worse.” But back then when you first created this, the console was its previous version. What was it that inspired you to say, “You know what I'm going to build? A desktop application for a cloud service.” Because on the surface, it seems relatively close to psychotic, but it's brilliant.Rafal: [laugh]. Yeah, sure. So, a few years ago, I was freelancing on AWS. I was jumping between clients and my side projects. That also involved jumping between regions, and AWS doesn't have a good out-of-the-box solution for switching your accounts and switching your regions, so when you wanted to work on your client table in Australia and simultaneously on my side project in Europe, there was no other solution than to have two browser windows open or even two browsers open.And it was super frustrating. So, I was like, hey, “DynamoDB has an SDK.
Electron is this thing that allows you to make a desktop application using HTML and JS and some CSS, so maybe I can do something with it.” And I was so naive to think that it's going to be a trivial task because it's going to be—come on, it's like, a couple of SDK calls, displaying some lists and tables, and that's pretty much it, right?Corey: Right. I use Retool as my system to build my newsletter every week, and that is the front-end I use to interact with DynamoDB. And it's great. It has a table component that just—I run a query that, believe it or not, is a query, not a scan—I know, imagine that, I did something slightly right this one time—and it populates things for the current issue into it, and then I basically built a CRUD API around it and have components that let me update, delete, remove, the usual stuff. And it's great, it works for my purposes, and it's fine.And that's what I use most of the time until I, you know, hit an edge case or a corner case—because it turns out, surprise everyone, I'm bad at programming—and I need to go in and tweak the table myself manually. And that's where Dynobase, at least for my use case, really comes into its own.Rafal: Good to hear. Good to hear. Yeah, that was exactly same case why I built it because yeah, I was also, a few years ago, I started working on some project which was really crazy. It was before AppSync times. We wanted to have GraphQL serverless API using single table design and testing principles [unintelligible 00:04:38] there.So, we've been verifying many things by just looking at the contents of the table, and sometimes fixing them manually. So, that was also the thing that motivated me to make the editing experience a little bit better.Corey: One thing I appreciate about the application is that it does things right. I mean, there's no real other way to frame that. 
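The thin CRUD wrapper Corey describes building around his newsletter table can be pictured with a toy, in-memory stand-in. This is only a sketch: the key schema (issue number as partition key, link as sort key) is invented for illustration, and a real version would call boto3's `get_item`/`put_item`/`delete_item` against DynamoDB rather than a dict.

```python
# A toy in-memory stand-in for a thin CRUD layer in front of a DynamoDB
# table. The (issue, link) key schema is hypothetical, chosen to mirror the
# newsletter use case described in the episode.

class NewsletterStore:
    def __init__(self):
        self._items = {}  # (partition key, sort key) -> item

    def put(self, issue: int, link: str, **attrs):
        self._items[(issue, link)] = {"issue": issue, "link": link, **attrs}

    def query(self, issue: int):
        """Like a DynamoDB Query: reads only one partition key's items,
        unlike a Scan, which walks the entire table."""
        return [v for (pk, _), v in self._items.items() if pk == issue]

    def delete(self, issue: int, link: str):
        self._items.pop((issue, link), None)

store = NewsletterStore()
store.put(280, "https://example.com/a", title="First story")
store.put(280, "https://example.com/b", title="Second story")
store.put(281, "https://example.com/c", title="Next week")
print(len(store.query(280)))  # 2
```

The point of the sketch is the Query-versus-Scan shape: fetching one issue touches only that issue's partition, which is why a real query is so much cheaper than scanning the whole table.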
When I fire up the application myself and I go to the account that I've been using it with—because in this case, there's really only one account that I have that contains the data that I spend my time working with—and I get access to it on my machine via Granted, because it's a federated SSO login. And it says, “Ah, this is an SSO account. Click here to open the browser tab and do the thing.”I didn't have to configure Dynobase. It is automatically reading my AWS config file in my user directory. It does a lot of things right. There's no duplication of work, from my perspective. It doesn't freak out because it doesn't know how SSO works. It doesn't run into these obnoxious edge case problems that so many early generation desktop interfaces for AWS things seem to.Rafal: Wow, it seems like it works for you even better than for me. [laugh].Corey: Oh, well again, how I get into accounts has always been a little weird. I've ranted before about Granted, which is something that Common Fate puts out. It is a binary utility that winds up logging into different federated SSO accounts, opens them in Firefox containers so you could have, you know, two accounts open, side-by-side. It has some nice affordances like that. But it still uses the standard AWS profile syntax which Dynobase does as well.There are a bunch of different ways I've logged into things, and I've never experienced friction [unintelligible 00:06:23] using Dynobase for this. To be clear, you haven't paid me a dime. In fact, just the opposite. I wind up paying my monthly Dynobase subscription with a smile on my face. It is worth every penny, just because on those rare moments when I have to work with something odd in DynamoDB, it's great having the tool.I want to be very clear here. I don't recall what the current cost on this is, but I know for a fact it is more than I spend every month on DynamoDB itself, which is fine.
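Circling back to the config-file behavior for a second: discovering profiles from the standard AWS config file is mostly ini parsing. A minimal sketch follows, with invented profile names and SSO values; real tools also merge `~/.aws/credentials` and handle many more keys.

```python
import configparser

# Hypothetical contents of ~/.aws/config; the profile names and SSO values
# below are made up for illustration, not taken from any real account.
SAMPLE_CONFIG = """
[profile dev]
region = eu-west-1

[profile prod]
sso_start_url = https://example.awsapps.com/start
sso_account_id = 123456789012
sso_role_name = AdministratorAccess
region = us-east-1
"""

def list_profiles(config_text: str) -> dict:
    """Return each profile name with a flag for whether it uses SSO."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    profiles = {}
    for section in parser.sections():
        if section.startswith("profile "):
            name = section[len("profile "):]
            # An sso_start_url key marks a federated SSO profile.
            profiles[name] = "sso_start_url" in parser[section]
    return profiles

print(list_profiles(SAMPLE_CONFIG))  # {'dev': False, 'prod': True}
```

This is roughly why a tool that honors the standard file "just works" across login schemes: the profile list and the SSO-versus-static distinction are both sitting in one well-known ini file.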
You pay for utility, not for the actual raw cost of the underlying resources on it. Some people tend to have issues with that and I think it's the wrong direction to go in.Rafal: Yeah, exactly. So, my logic was that it's a productivity improvement. And a lot of programmers are simply obsessed with productivity, right? We tend to write those obnoxious nasty Bash and Python scripts to automate boring tasks in our day jobs. So, if you can eliminate this chore of logging to different AWS accounts and trying to find them, and even if it takes, like, five or ten seconds, if I can shave that five or ten seconds every time you try to do something, that over time accumulates into a big number and it's a huge time investment. So, even if you save, like, I don't know, maybe one hour a month or one hour a quarter, I think it's still a fair price.Corey: Your pricing is very interesting, and the reason I say that is you do not have a free tier as such, you have a free seven-day trial, which is great. That is the way to do it. You can sign up with no credit card, grab the thing, and it's awesome. Dynobase.dev for folks who are wondering.And you have a solo yearly plan, which is what I'm on, which is $9 a month. Which means that you end up, I think, charging me $108 a year billed annually. You have a solo lifetime option for 200 bucks—and I'm going to fight with you about that one in a second; we're going to come back to it—then you have a team plan that is for I think for ten licenses at 79 bucks a month, and for 20 licenses it's 150 bucks a month. Great. And then you have an enterprise option for 250 a month, the end. Billed annually. And I have problems with that, too.So, I like arguing with pricing, I [unintelligible 00:08:43] about pricing with people just because I find that is one of those underappreciated aspects of things. Let's start with my own decisions on this, if I may. 
The reason that I go for the solo yearly plan instead of a lifetime subscription of I buy this and I get to use it forever in perpetuity. I like the tool but, like, the AWS service that underlies it, it's going to have to evolve in the fullness of time. It is going to have to continue to support new DynamoDB functionality, like the fact that they have infrequent access storage classes now, for tables, as an example. I'm sure they're coming up with other things as well, like, I don't know, maybe a sane query syntax someday. That might be nice if they ever built one of those.Some people don't like the idea of a subscription software. I do just because I like the fact that it is a continual source of revenue. It's not the, “Well, five years ago, you paid me that one-off thing and now you expect feature enhancements for the rest of time.” How do you think about that?Rafal: So, there are a couple of things here. First thing is that the lifetime support, it doesn't mean that I will be always implementing to my death all the features that are going to appear in DynamoDB. Maybe there is going to be a some feature and I'm not going to implement it. For instance, it's not possible to create the global tables via Dynobase right now, and it won't be possible because we think that majority of people dealing with cloud are using infrastructure as a code, and creating tables via Dynobase is not a super useful feature. And we also believe that it's not going to break even without support. [laugh]. I know it sounds bad; it sounds like I'm not going to support it at some point, but don't worry, there are no plans to discontinue support [crosstalk 00:10:28]—Corey: We all get hit by buses from time to time, let's be clear.Rafal: [laugh].Corey: And I want to also point out as well that this is a graphical tool that is a front-end for an underlying AWS service. 
It is extremely convenient, there is tremendous value in it, but it is not critical path as if suddenly I cannot use Dynobase, my production app is down. It doesn't work that way, in the sense—Rafal: Yes.Corey: Of a SaaS product. It is a desktop application. And huge fan of that as well. So, please continue.Rafal: Yeah, exactly—Corey: I just want to make sure that I'm not misleading people into thinking it's something it's not here. It's, “Oh, that sounds dangerous if that's critical pa”—yeah, it's not designed to be. I imagine, at least. If so it seems like a very strange use case.Rafal: Yeah. Also, you have to keep in mind that AWS isn't basically introducing breaking changes, especially in a service that is so popular as DynamoDB. I cannot imagine them, like, announcing, like, “Hey, in a month, we are going to deprecate this API, so you'd better start, you know, using this new API because this one is going to be removed.” I think that's not going to happen because of the millions of clients using DynamoDB actively. So, I think that makes Dynobase safe. It's built on a rock-solid foundation that is going to change only additively. No features are going to be just being removed.Corey: I think that there's a direction in a number of at least consumer offerings where people are upset at the idea of software subscriptions, the idea of why should I pay in perpetuity for a thing? And I want to call out my own bias here. For something like this, where you're charging $9 a month, I do not care about the price, truly I don't. I am a price inflexible customer. It could go and probably as high as 50 bucks a month and I would neither notice nor care.That is probably not the common case customer, and it's certainly not over in consumer-land. I understand that I am significantly in a privileged position when it comes to being able to acquire the tools that I need. 
It turns out compared to the AWS bill I have to deal with, I don't have to worry about the small stuff, comparatively. Not everyone is in that position, so I am very sympathetic to that. Which is why I want to deviate here a little bit because somewhat recently, Dynobase showed up on the AWS Marketplace.And I can go into the Marketplace now and get a yearly subscription for a single seat for $129. It is slightly more than buying it directly through your website, but there are some advantages for many folks in getting it on the Marketplace. AWS is an approved vendor, for example, so there's no procurement dance. It counts toward your committed spend on contracts if someone is trying to wind up hitting certain levels of spend on their EDP. It provides a centralized place to manage things, as far as those licenses go when people are purchasing it. What was it that made you decide to put this on the Marketplace?Rafal: So, this decision was pretty straightforward. It's just, you know, yet another distribution channel for us. So, imagine you're a software engineer that works for a really, really big company and it's super hard to approve some kind of expense using traditional credit card. You basically cannot go to my site and check out with a company credit card because of the processes, or maybe it takes two years. But maybe it's super easy to click this subscribe on your AWS account. So yeah, we thought that, hey, maybe it's going to unlock some engineers working at those big corporations, and maybe this is the way that they are going to start using Dynobase.Corey: Are you seeing significant adoption yet? Or is it more or less a—it's something that's still too early to say? 
And beyond that, are you finding that people are discovering the product via the AWS Marketplace, or is it strictly just a means of purchasing it?Rafal: So, when it comes to discovering, I think we don't have any data about it yet, which is supported by the fact that we also have zero subscriptions from the Marketplace yet. But it's also our fault because we haven't actually actively promoted the fact, apart from me sending just a tweet on Twitter, which is in [crosstalk 00:14:51]—Corey: Which did not include a link to it as well, which means that Google was our friend for this because let's face it, AWS Marketplace search is bad.Rafal: Well, maybe. I didn't know. [laugh]. I was just, you know, super relieved to see—Corey: No, I—you don't need to agree with that statement. I'm stating it as a fact. I am not a fan of Marketplace search. It irks me because for whatever reason whenever I'm in there looking for something, it does not show me the things I'm looking for, it shows me the biggest partners first that AWS has and it seems like the incentives are misaligned. I'm sure someone is going to come on the show to yell at me. I'm waiting for your call.Rafal: [laugh].Corey: Do you find that if someone is going to purchase it, do you have a preference that they go directly, that they go through the Marketplace? Is there any direction for you that makes more sense than another?Rafal: So, ideally, we would like all the customers to continue purchasing the software the classical way, using the subscriptions from our website, because it's just one flow, one system, it's simpler, it's cleaner, but we wanted to give that option and to have more adoption. We'll see if that's going to work.Corey: I was going to say there were two issues I had with the pricing. That was one of them.
The other is at the high end, the enterprise pricing being $250 a month for unlimited licenses, that doesn't feel like it is the right direction, and the reason I say that is a 50-person company would wind up being able to spend 250 bucks a month to get this for their entire team, and that's great and they're happy. So could AWS or Coca-Cola, and at that very high level, it becomes something where you are signing up for a significant amount of support work, in theory, or a bunch of other directions.I've always found that from where I stand, especially dealing with those very large companies with very specific SLA requirements and the rest, the pricing for enterprise that I always look for as the right answer, to my mind, is ‘click here to contact us.' Because procurement departments, for example, we want this, this, this, this, and this around data guarantees and indemnities and all the rest. And well, yeah, that's going to be expensive. And well, yeah. We're a procurement company at a Fortune 50. We don't sign contracts that don't have two commas in them.So, it feels like there's a dialing it in with some custom optionality that feels like it is signaling to the quote-unquote, ‘sophisticated buyer,' as patio11 likes to say on Twitter from time to time, that might be the right direction.Rafal: That's really good feedback. I haven't thought about it this way, but you really opened my eyes on this issue.Corey: I'm glad it was helpful. The reason I think about it this way is that more and more I'm realizing that pricing is one of the most key parts of marketing and messaging around something, and that is not really well understood, even by larger companies with significant staff and full marketing teams.
I still see that the pricing often feels like an afterthought, but personally, when I'm trying to figure out is this tool for me, the first thing I do is—I don't even read the marketing copy of the landing page; I look for the pricing tab and click because if the only price is ‘call for details,' I know, A, it's going to be expensive, and B, it's going to be a pain in the neck to get to use it because it's two in the morning; I'm trying to get something done. I want to use it right now. If I had to have a conversation with your sales team first, that's not going to be cheap and it's not going to be something that solves my problem this week. And that is the other end of it. I yell at people on both sides on that one.Rafal: Okay.Corey: Again, none of this stuff is intuitive; all of this stuff is complicated, and the way that I tend to see the world is, granted, a little bit different than the way that most folks who are kicking around databases and whatnots tend to view the world. Do you have plans in the future to extend Dynobase beyond strictly DynamoDB, looking to explore other fine database options like Redis, or MongoDB, or my personal favorite Route 53 TXT records?Rafal: [laugh]. Yeah. So, we had plans. Oh, we had really big plans. We felt that we are going to create a second JetBrains company. We started analyzing the market when it comes to MongoDB, when it comes to Cassandra, when it comes to Redis. And our first pick was Cassandra because it seemed, like, to have a really, really similar table structure.I mean, it's also no secret it also has a primary index, global secondary indexes, and things like that. But as always, reality surprises us over the amount of detail that we cannot see from the very top. And it isn't as simple as just installing the AWS SDK and installing a Cassandra connector—or Cassandra SDK—and just rolling with that. It requires a really big and significant investment.
And we decided to focus just on one thing and nail this one thing and do this properly.It's like, if you go into the cloud, you can try to build a service that is agnostic, but then it's not using the best features of the cloud. And you can move your containers, for instance, across the clouds and say, “Hey, I'm cloud-agnostic,” but at the same time, you're missing out on all the best features. And this is the same way we thought about Dynobase. Hey, we can provide an agnostic core, but then the agnostic application isn't going to be as good and as sophisticated as something tailored specifically for the needs of this database and the users using this exact database.Corey: This episode is sponsored in part by our friends at EnterpriseDB. EnterpriseDB has been powering enterprise applications with PostgreSQL for 15 years. And now EnterpriseDB has you covered wherever you deploy PostgreSQL on premises, private cloud, and they just announced a fully managed service on AWS and Azure called BigAnimal, all one word.Don't leave managing your database to your cloud vendor because they're too busy launching another half dozen managed databases to focus on any one of them that they didn't build themselves. Instead, work with the experts over at EnterpriseDB. They can save you time and money, they can even help you migrate legacy applications, including Oracle, to the cloud.To learn more, try BigAnimal for free. Go to biganimal.com/snark, and tell them Corey sent you.Corey: Some of the things that you do just make so much sense that I get actively annoyed that there aren't better ways to do it and other places for other things. For example, when I fire up a table in a particular region within Dynobase, first it does a scan, which, okay, that's not terrible. But on some big tables, that can get really expensive. But you cap it automatically to a thousand items. And okay, great.Then it tells me, how long did it take?
In this case because, you know, I am using on-demand and the rest and it's a little bit of a pokey table, that scan took about a second-and-a-half. Okay. You scanned a thousand items. Well, there's a lot more than a thousand items in this table. Ah, you limited it, so you didn't wind up taking all that time.It also says that it took 51-and-a-half RCUs—or Read Capacity Units—because you know, why use normal numbers when you're AWS and doing pricing dimensions on this stuff.Rafal: [laugh].Corey: And to be clear, I forget the exact numbers for reads, but it's something like a million RCUs cost me a dollar or something like that. It is trivial; it does not matter, but because it is consumption-based pricing, I always live in a little bit of a concern that, okay, if I screw up and just, like, scan the entire 10-megabyte table every time I want to make an operation here, and I make a lot of operations in the course of a week, that's going to start showing up in the bill in some really unfortunate ways. This sort of tells me, on an ongoing basis, what it is that I'm going to wind up encountering.And these things are all configurable, too. The initial scan limit that you have configured is a thousand. I can set that to any number I want if I think that's too many or too few. You have a bunch of pagination options around it. And you also help people build out intelligent queries, [unintelligible 00:22:11] can export that to code. It's not just about the graphical interface clickety and done—because I do love my ClickOps but there are limits to it—it helps formulate what kind of queries I want to build and then wind up implementing in code. And that is no small thing.Rafal: Yeah, exactly. This is how we also envision that. The language syntax in DynamoDB is really… hard.Corey: Awful. The term is awful.Rafal: [laugh]. Yeah, especially for people—Corey: I know, people are going to be mad at me, but they're wrong.
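To make the capacity-unit arithmetic above concrete, here is a rough sketch. The item sizes and the per-million price are assumptions for illustration only; check the current DynamoDB pricing page before relying on either number.

```python
import math

def scan_read_units(item_sizes_bytes, eventually_consistent=True):
    """Estimate RCUs consumed by a scan. A scan is billed on the total data
    it reads, rounded up to 4 KB chunks; an eventually consistent read costs
    half a unit per chunk. The sizes passed in are illustrative guesses."""
    total = sum(item_sizes_bytes)
    chunks = math.ceil(total / 4096)
    return chunks * (0.5 if eventually_consistent else 1.0)

def on_demand_cost(read_units, price_per_million=0.25):
    """price_per_million is an assumed figure, not a quoted AWS price."""
    return read_units / 1_000_000 * price_per_million

# A thousand items of ~420 bytes each works out to 51.5 RCUs, matching the
# figure reported in the episode.
units = scan_read_units([420] * 1000)
print(units)  # 51.5
```

Run repeatedly against a 10-megabyte table, a full scan would burn about 1,280 RCUs per pass under these assumptions, which is exactly the slow bill creep Corey is describing; capping the scan at a thousand items bounds it.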
It is not intuitive, and it took a fair bit of wrapping my head around. And more than once, what I found myself doing is basically just writing a thin CRUD API in Lambda in front of it just so I can query it in a way that I think about it as opposed to—now I'm not even talking about changing the query modeling; I just want better syntax. That's all it is.Rafal: Yeah. You also touch on modeling; that's also a very important thing, especially—or maybe even scan or query. Suppose I'm an engineer with ten years of experience. I come to DynamoDB, I jump straight into the action without reading any of the documentation—at least that's my way of working—and I have no idea what's the difference between a scan and a query. So, in Dynobase, when I'm going to enter all those filtering parameters into the UI and I'm going to hit scan, Dynobase is automatically going to figure out for you what's the best way to query—or to scan if query is not possible—and also give you the code that actually was behind that operation so you can just, like, copy and paste that straight to your code or service or API and have exactly the same result.So yeah, we want to abstract away some of the weird things about DynamoDB. Like, you know, scan versus query, expression attribute names, expression attribute values, filtering conditions, all sorts of that stuff. Also the DynamoDB JSON, that's also, like, a bizarre thing. This JSON-type thing you should get out of the box; we also take care of that. So, yeah. Yeah, that's also our mission to make DynamoDB as approachable as possible. Because it's a great database, but to truly embrace it and to truly use it, it's hard.Corey: I want to be clear, just for folks who are not seeing some of the benefits of it the way that I've described it thus far. Yes, on some level, it basically just provides an attractive, usable interface to wind up looking at items in a DynamoDB table. You can also use it to wind up refining queries to look at very specific things.
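The "DynamoDB JSON" Rafal calls bizarre wraps every attribute value in a type tag, e.g. `{"S": "text"}` or `{"N": "42"}`. A minimal unmarshaller for the common tags is sketched below; real SDKs (boto3's `TypeDeserializer`, for instance) cover the full set, including sets and binary types.

```python
# Convert DynamoDB's typed wire format into plain Python values.
# Handles only the common type tags; a real deserializer covers more.

def unmarshal(attr):
    tag, value = next(iter(attr.items()))
    if tag == "S":                      # string
        return value
    if tag == "N":                      # numbers arrive as strings
        return float(value) if "." in value else int(value)
    if tag == "BOOL":
        return value
    if tag == "L":                      # list: recurse into each element
        return [unmarshal(v) for v in value]
    if tag == "M":                      # map: recurse into each value
        return {k: unmarshal(v) for k, v in value.items()}
    raise ValueError(f"unhandled DynamoDB type tag: {tag}")

# Example item in wire format; the attribute names are invented.
item = {"pk": {"S": "user#1"}, "age": {"N": "41"},
        "tags": {"L": [{"S": "a"}, {"S": "b"}]}}
print({k: unmarshal(v) for k, v in item.items()})
# {'pk': 'user#1', 'age': 41, 'tags': ['a', 'b']}
```

Hiding this translation is exactly the kind of "out of the box" ergonomics being discussed: the caller thinks in plain values, not in type tags.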
You can export either a selection or an entire table either to a local file—or to S3, which is convenient—but it goes beyond that because once you have the query dialed in and you're seeing the things you want to see, there's a generate code button that spits it out—for Python, for JavaScript, for Golang.And there are a few more; the AWS CLI is coming soon, according to the drop-down itself. Java; ooh, you do like pain. And for Golang, for example, it effectively exports the thing you have done by clicking around as code, which is, for some godforsaken reason, anathema to most AWS services. “Oh, you clicked around the console to do a thing. Good job. Now, throw it all away and figure out how to do it in code.” As opposed to, “Here's how to do what you just did programmatically.” My God, the console could be the best IDE in the world, except that they don't do it for some reason.Rafal: Yeah, yeah.Corey: And I love the fact that Dynobase does.Rafal: Thank you.Corey: I'm a big fan of this. You can also import data from a variety of formats, and export data as well. And one of the more obnoxious—you talk about weird problems I have with DynamoDB that I wish to fix: I would love to move this table to a table in a different AWS account. Great, to do that, I effectively have to pause the service that is in front of this because I need to stop all writes—great—export the table, take the table to the new account, import the table, repoint the code to talk to that thing, and then get started again. Now, there are ways to do it without that, and they all suck because you have to either write a shim for it or you have to wind up doing a stream that winds up feeding from one to the other.And in many cases, well okay, I want to take the table here, I do a knife-edge cutover so that new writes go to the new thing, and then I just want to backfill this old table data into it. How do I do that?
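The knife-edge backfill being asked for here (new writes already landing in the destination table, old items copied over behind them) can be sketched with dicts standing in for the two tables. The batch size of 25 mirrors DynamoDB's `BatchWriteItem` limit; everything else, table contents included, is invented for illustration.

```python
# Copy every item from a source "table" into a destination "table" in
# batches. The dicts stand in for two boto3 Table objects in different
# accounts; iterating the source stands in for a paginated Scan, and
# destination.update() stands in for BatchWriteItem.

def backfill(source: dict, destination: dict, batch_size: int = 25):
    batch = []
    copied = 0
    for key, item in source.items():
        batch.append((key, item))
        if len(batch) == batch_size:
            destination.update(batch)
            copied += len(batch)
            batch.clear()
    if batch:                       # flush the final partial batch
        destination.update(batch)
        copied += len(batch)
    return copied

old_table = {f"item-{i}": {"n": i} for i in range(60)}
new_table = {"item-60": {"n": 60}}  # a write that arrived post-cutover
print(backfill(old_table, new_table))  # 60
print(len(new_table))                  # 61
```

Because last-writer-wins on a key, backfilling after the cutover is safe only if post-cutover writes never share a key with stale source items; a real implementation would also need retries for unprocessed batch items.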
The official answer is not what you would expect it to be—the DynamoDB console's "import this data." Instead, it's, "Oh, use AWS Glue to write an ETL function to do all of this." And it's… what? How is that the way to do these things?

There are import and export buttons in Dynobase that solve this problem beautifully without having to do all of that. It really is such a different approach to thinking about this, and I am stunned that this had to be done as a third party. It feels like you were using the native tooling and the native console the same way the rest of us do, grousing about it the same way the rest of us do, and then set out to fix it like none of us do. What was it that finally made you say, "You know, I think there's a better way, and I'm going to prove it"? What pushed you over the edge?

Rafal: Oh, I think I was spending, just, hours in the console, and I didn't have a really sophisticated suite of tests, which forced me [unintelligible 00:27:43] time to look at the data a lot and import data a lot and edit it a lot. And it was just too much. I don't know, at some point I realized, like, hey, there's got to be a better way. I browsed for solutions on the internet; I realized that there was nothing on the market, so I asked a couple of my friends, saying, "Hey, do you also have this problem? Is this also a problem for you? Do you see the same challenges?"

And basically every engineer I talked to said, "Yeah. I mean, this really sucks. You should do something about it." And that was the moment I realized that I was really onto something, and that this is a pain I'm not alone in. And so… yeah, that gave me a lot of motivation. So, there was a lot of frustration, but there was also a lot of motivation pushing me to create the first product of my life.

Corey: It's your first product, but it does follow an interesting pattern that seems to be emerging. Cloudash—Tomasz and Maciej—wound up doing that as well.
They're also working at Stedi, and they have their side project, which is an Electron-based desktop application that winds up interfacing with AWS services. And it's—what are your job requirements over at Stedi, exactly?

People could be forgiven for seeing these things and not knowing what the hell EDI is—which, guilty—and figure, "Ah, it's just a very fancy term for a DevRel company, because they're doing serverless DevRel as a company." It increasingly feels an awful lot like that. So, what's going on over there, where that culture just seems to be an emergent property?

Rafal: So, I feel like Stedi just attracts a lot of people that like challenges, and people that have a really strong sense of ownership and like to just create things. And this is also how it feels inside. There are plenty of individuals that basically have tons of energy and motivation to solve so many problems, not only in Stedi but, as you can see, also outside of Stedi—Cloudash is a result, the mapping tool from Zack Charles is also a result, and Michael Barr created a scheduling service. So, yeah, I think the principles that we have at Stedi basically attract top-notch builders.

Corey: It certainly seems so. I'm going to have to do a little more digging and see what some of those projects are, because they're new to me. I really want to thank you for taking so much time to speak with me about what you're building. If people want to learn more or kick the tires on Dynobase—which I heartily recommend—where should they go?

Rafal: Go to dynobase.dev, and there's a big download button that you cannot miss. You download the software, you start it. No email, no credit card required. You just run it. It scans your credentials, profiles, SSOs, whatever, and you can play with it. And that's pretty much it.

Corey: Excellent. And we will put a link to that in the [show notes 00:30:48]. Thank you so much for your time. I really appreciate it.

Rafal: Yeah.
Thanks for having me.

Corey: Rafal Wilinski, serverless engineer at Stedi and creator of Dynobase. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice—or a thumbs-up and the like and subscribe buttons on the YouTubes, if that's where you're watching it—whereas if you've hated this podcast, same thing—five-star review, hit the buttons and such—but also leave an angry, bitter comment that you're not going to be able to find once you write it, because no one knows how to put it into DynamoDB by hand.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Remember: you can always follow the show on Twitter @dotnetcoreshow, and the show's host on Twitter @podcasterJay, or visit our Contact page. Welcome to season 4 of the award-winning .NET Core Podcast! Check that link for proof. Hello everyone, and welcome to The .NET Core Podcast: the podcast where we reach into the core of the .NET technology stack and, with the help of the .NET community, present you with the information that you need in order to grok the many moving parts of one of the biggest cross-platform, multi-application frameworks on the planet. I am your host, Jamie "GaProgMan" Taylor. In this episode, I talked with Josh Hurley and Norm Johanson about the AWS Microservice Extractor for .NET, and a whole heap of .NET things that AWS are doing—things like the .NET deployment tool, which allows you to deploy a .NET application to AWS in as few as two mouse clicks, even if you don't know the names of the AWS services yet. We also talked about the fact that AWS was the first cloud services provider to offer .NET hosting, and the fact that the AWS SDK for .NET was one of the first public NuGet packages. The full show notes, including links to some of the things we discussed and a full transcription of this episode, can be found at https://dotnetcore.show/episode-98-aws-microservices-extractor-for-dotnet-with-josh-hurley-and-norm-johanson/ Useful links from the episode: .NET on AWS Twitter Josh on Twitter Norm on Twitter Microservice Extractor for .NET Service home page User Guide Blogs Workshop Feedback and to report issues .NET deployment tool (in preview) AWS SDK for .NET .NET on AWS High level libraries on GitHub AWS Toolkit for Visual Studio Remember to rate and review the show on Apple Podcasts, Podchaser, or wherever you find your podcasts; this will help the show's audience grow. Or you can just share the show with a friend. And don't forget to reach out via our Contact page.
We're very interested in your opinions of the show, so please do get in touch. You can support the show by making a monthly donation on the show's Patreon page at: https://www.patreon.com/TheDotNetCorePodcast
There are lots of options for programming languages on AWS these days but one of the most popular ones remains JavaScript. In this episode of AWS Bites we discuss what it's like to develop with JavaScript, Node.js and TypeScript on AWS and what's new in this field. We explore why you would choose JavaScript and what are the trade-offs that come with this choice. We present some of the main features of the all-new AWS SDK v3 for JavaScript. We discuss runtime support and tooling for AWS Lambda and finally some interesting developments in the JavaScript ecosystem for the cloud and AWS. - Our previous episode on What language to use for lambda: https://www.youtube.com/watch?v=S0tpReRa6m4 - AI as a Service by Eoin Shanaghy and Peter Elger (book): https://www.manning.com/books/ai-as-a-service - Node.js Design Patterns by Mario Casciaro and Luciano Mammino (book): https://www.nodejsdesignpatterns.com/ - AWS SDK for JavaScript v3 high level concepts (including command based model): https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/index.html#high-level-concepts - AWS SDK for JavaScript v3 paginators using Async Iterators: https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/index.html#paginators - Mocking support for the AWS SDK for JavaScript v3: https://aws.amazon.com/blogs/developer/mocking-modular-aws-sdk-for-javascript-v3-in-unit-tests/ - Various interesting benchmarks on different Lambda runtimes: https://github.com/theam/aws-lambda-benchmark - https://filia-aleks.medium.com/benchmarking-all-aws-lambda-runtimes-in-2021-cold-start-part-1-e4146fe89385 - https://www.simform.com/blog/aws-lambda-performance/ - Support for ESM modules in AWS Lambda (Node.js 14): https://aws.amazon.com/about-aws/whats-new/2022/01/aws-lambda-es-modules-top-level-await-node-js-14/ - The Middy Framework (middleware pattern for AWS Lambda): https://middy.js.org/ - Lambda Power Tools library for TypeScript: https://awslabs.github.io/aws-lambda-powertools-typescript/ - Yan Cui's article 
on performance improvements with bundling: https://lumigo.io/blog/3-major-ways-to-improve-aws-lambda-performance/ - ZX project (scripting with JavaScript) by Google: https://github.com/google/zx Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter: - https://twitter.com/eoins - https://twitter.com/loige
PHP Internals News: Episode 97: Redacting Parameters London, UK Thursday, January 27th 2022, 09:09 GMT In this episode of "PHP Internals News" I chat with Tim Düsterhus (GitHub) about the "Redacting Parameters in Back Traces" RFC. The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news

Transcript

Derick Rethans 0:00 Before we start with this episode, I want to apologize for the bad audio quality. Instead of using my nice mic, I managed to use the one built into my computer. I hope you'll still enjoy the episode.

Derick Rethans 0:30 Hi, I'm Derick. Welcome to PHP Internals News, a podcast dedicated to explaining the latest developments in the PHP language. This is episode 97. Today I'm talking with Tim Düsterhus about the Redacting Parameters in Back Traces RFC that he's proposing. Tim, would you please introduce yourself?

Tim Düsterhus 0:50 Hi, Derick, thank you for inviting me. I am Tim Düsterhus, and I'm a developer at WoltLab. We are building a web application suite for you to build online communities.

Derick Rethans 0:59 Thanks for coming on this morning. What is the problem that you're trying to solve with this RFC?

Tim Düsterhus 1:05 If everything is going well, we don't need this RFC. But errors can and will happen, and our application might encounter some exceptional situation—maybe a request to an external service fails. And so the application throws an error; this exception will bubble up the stack and either be caught, or go into a global exception handler. And then basically, in both cases, the exception will be logged into the error log. If it can be handled, we want to make the admins aware of the issue so they can maybe fix their networking. If it is unable to be handled because of a programming error, we need to log it as well to fix the bug.
In our case, we have the exception in the error log. And what happens next? In our case, we have many, many layperson administrators that run a community for their hobby; they're not really programmers, and have no technical expertise. And we also have a strong customers-help-customers environment. What do those customers do? They grab their error log and post it in our forums, in public. Now, in our forum, we have the error log with the full stack trace, including all sensitive values—maybe user passwords, if the authentication service failed, or something else. That should not really happen. In our case, it's layperson administrators, but I'm also seeing that experienced developers can make this mistake. I am triaging issues for an open source software written in C, and I've sometimes seen system administrators posting their full core dump, including their TLS certificates, and they don't really realize what they have just done. So that's really an issue that affects laypersons and professional administrators the same.

In our case, our application attempts to strip that sensitive information from the backtrace. We have a custom exception handler that scans the full stack trace and tries to match up class names and method names—e.g. the PDO constructor—to scrub the database password. And recently, we have extended this stripping to also strip anything from parameters that are called password, secret, or something like that. That mostly works well. But in any case, this exception handler will miss sensitive information, because it needs to basically guess which parameters are sensitive values and which aren't. And our exception handler also grew very complex, because to match up those parameters it needs to use reflection. And any failures within the exception handler cannot really be recovered from; if the exception handler fails, you're out of luck.

Derick Rethans 3:51 Quite a few things to think of to make sure that you're not sharing any secrets.
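The name-based scrubbing Tim describes—walk the trace and redact anything called password or secret—can be sketched in a few lines. This is an illustrative Python analogue of that idea, not WoltLab's PHP handler; the name list and redaction marker are invented for the example.

```python
import sys

# Parameter names treated as sensitive; purely illustrative.
SENSITIVE_NAMES = {"password", "secret", "token"}

def scrub_frames(tb):
    """Walk a traceback, returning (function_name, locals) pairs with
    sensitive-looking local variables replaced by a marker."""
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        redacted = {
            name: "**redacted**" if name.lower() in SENSITIVE_NAMES else value
            for name, value in frame.f_locals.items()
        }
        frames.append((frame.f_code.co_name, redacted))
        tb = tb.tb_next
    return frames

def login(user, password):
    # Pretend the authentication service is down.
    raise RuntimeError("auth service unreachable")

try:
    login("derick", "hunter2")
except RuntimeError:
    report = scrub_frames(sys.exc_info()[2])
```

The weakness Tim points out is visible here too: any sensitive value whose parameter name isn't on the list leaks, which is exactly what a standardized declaration-site marker like the proposed SensitiveParameter attribute avoids.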
And I've certainly almost done this myself. We now know what the problem is; how is this RFC proposing to fix it?

Tim Düsterhus 4:03 Primarily, we want to propose a standardized way for applications or libraries to indicate which parameters hold sensitive values. Our custom exception handler uses reflection, as we said before, and it only matches up the parameters' names; but we also have this attribute I am proposing, SensitiveParameter, within our application itself. Any parameters whose names don't make their sensitivity obvious can be annotated with this attribute. But this only works within our software, not with any third-party libraries we are using, e.g. for encryption or whatever there is. Primarily, we want to propose a standardized way—an attribute that is in PHP core—so anyone can use it and everyone knows what this attribute means. Secondarily, the RFC is proposing a default implementation, to keep the exception handler simple. As I said before, we are using reflection. This is very complex; it does not work with the require_once or include_once family, because those are not functions, and we need to handle this case so as not to attempt to reflect on those non-functions when redacting any parameters. This is complex, and we want to simplify that.

Derick Rethans 5:20 From what I understand, this is then a way to make sure that there's a standardized method for marking arguments as being sensitive. And because this is now standardized, only one solution to the problem has to be found, right?

Tim Düsterhus 5:34 Basically—so not every library is using their own attributes, possibly, or matching parameter names that are not like password or secret. It can be documented: hey, if you are taking sensitive parameters, you should add this attribute, and then those exception handlers will be aware that this parameter is sensitive and can strip it—or, in the case of the RFC, PHP itself will already strip those parameters from the stack trace.
Derick Rethans 6:04 You're suggesting that PHP's standard way of showing stack traces also takes care of the sensitive parameters here?

Tim Düsterhus 6:11 Yes, exactly.

Derick Rethans 6:13 Which internal PHP functions are likely to get this attribute?

Tim Düsterhus 6:16 Basically anything with a parameter called password or secret, as I said before. Examples include PDO's constructor—the database password will be in there, and possibly also the user name or host name, which might be considered sensitive, but the password is the most important thing I have on my list—ldap_bind, which possibly includes user passwords; the password_hash function; possibly various OpenSSL functions. One will need to look, and this list can be extended in the future as well, if someone realizes we missed anything.

Derick Rethans 6:55 Now, I know sometimes there's a problem where an application connects to the wrong server with PDO. And as you say, the host name is also in this PDO constructor. Would it not then make debugging that specific case harder, because the hostname would also be redacted from the stack traces?

Tim Düsterhus 7:14 The attribute I am proposing is a parameter attribute: each parameter can be sensitive or non-sensitive. We would need to decide whether we consider the hostname sensitive or not. It usually is not, so I would not put the attribute on the host name, or on the DSN string in the first parameter. The password definitely is sensitive, and the username possibly is a grey area. By default, I probably would not put the attribute there, but this is something that needs to be discussed in the greater community, possibly.

Derick Rethans 7:47 I saw in the RFC that when you request a stack trace in PHP with get back trace, or whatever the name of that function is, the sensitive parameters are replaced by an object of the class SensitiveParameter. Why did you pick that instead of just a string saying something like "redacted"?
Tim Düsterhus 8:06 We cannot force users to put the attribute only on parameters that take strings. If we use a "redacted" string, we might violate the type hint. If a function takes some key pair class, or an option of a key pair class—this usually is a sensitive parameter—we cannot simply put a string there. We can, but then we would violate the typing. And as we violate the typing in at least some of the cases, we can also violate it in all of the cases, and then make it very clear that this parameter was redacted, and not a real value that just looks like the string "redacted". Exception handlers would be able to use an instanceof SensitiveParameter check to possibly make it more user-friendly when they render the stack trace. When you're using a GUI to handle your exceptions, such as Sentry, it can show some placeholder instead of pretending there's a real string in there.

Derick Rethans 8:07 And of course, the string "redacted" can already exist as an argument value anyway, right?

Tim Düsterhus 9:12 Yeah.

Derick Rethans 9:13 Where would the attribute be checked?

Tim Düsterhus 9:16 My proposal would extend PHP to check this attribute within the function that generates the stack trace, because, as I said, I want to keep my exception handler simple, so it won't need to use reflection to check this attribute. PHP itself will check this attribute when the stack trace is generated, so no exception handler can fail to check this attribute.

Derick Rethans 9:39 Would it be possible for code that checks for SensitiveParameter to see what the original value was? I can imagine that in some cases, an exception handler—as part of a debugging toolbar, whatever—does want to show this extra information, although it's going to be hidden by default.

Tim Düsterhus 9:58 Not with the current version of my RFC, but I can imagine that this sensitive parameter replacement value gets an attribute where the original value can be stored.
Care would need to be taken so exception handlers don't simply serialize that value and ship it to a third-party service, basically negating the benefit. But a future extension, or maybe the further discussion of my RFC, can extend this replacement value so you can use sensitive parameter, arrow, original value, or whatever.

Derick Rethans 10:34 In PHP, attributes are basically markers on parameters or arguments, but they don't necessarily have to have an object implementation. Is your RFC also including the SensitiveParameter class that PHP core implements?

Tim Düsterhus 10:51 Yes. In my current RFC and my current proof-of-concept implementation, I'm just reusing that attribute class as the replacement value within the stack trace, so we can kill two birds with one stone by doing that. By including a proper class, any IDE will be able to see that class and know where that attribute can be applied, because attributes have a property where they say where they can be applied—in this case, parameters only. And by putting it on a method by accident, you will possibly get an error, or the IDE can warn you that you're not doing this correctly.

Derick Rethans 11:32 You might be aware that I work on Xdebug, a debugger for PHP. And in many cases, some of the users have previously said that Xdebug should, for example, follow the __debugInfo() magic method on objects to show redacted information. Now, do you think that when people debug PHP with a debugger such as Xdebug, they should see the contents of the arguments that are marked with SensitiveParameter, or should the debugger's stack traces show the real value?

Tim Düsterhus 12:07 In the case of debugging, you're usually not in production. So within your debugging environment or development environment, you shouldn't really have any sensitive values such as passwords, or credit card numbers, or whatever there is. In that case, debuggability and ease of development should be more important.
Xdebug, or any other debugger, should see through those sensitive attributes and show the real value, possibly with an indicator that this value would usually be sensitive. But you shouldn't need to work around PHP hiding something from you, because you really want or need to see what happens there.

Derick Rethans 12:48 Now, Xdebug also overrides PHP's standard exception handler, and then creates a stack trace of its own. Do you think that should redact the SensitiveParameter arguments?

Tim Düsterhus 13:00 I'm not really sure if people run this in production. If this is something people usually do, then of course Xdebug should make sure to redact those values, possibly with a special ini flag or something, if that's only wanted in development. In my case, I only use Xdebug in development, and production servers don't have it; you don't really connect to your production server with your IDE and then step through the code. That does not happen. So we don't need Xdebug in production.

Derick Rethans 13:32 I know some people do run Xdebug in production, but I also don't think those are the people that care about leaking sensitive parameters. I think the RFC talks about a few existing features that PHP already has for redacting some values. What are these, and how are they not sufficient?

Tim Düsterhus 13:49 There are two php.ini values you can set. One of those is "do not collect parameters in stack traces"—I don't have the exact name. But basically, all functions will just show an empty parameter list within the stack trace. That makes debugging very hard: especially with PHP and its non-strict typing, it can happen that you pass some completely invalid value to a function, even in production, after testing and such, and you really want to know about this value. Not collecting the parameters makes the stack traces much, much less useful.
So the targeted redaction I'm proposing hides the sensitive values, but the non-sensitive values will still be visible. And the other one is that the length of collected strings within the stack trace can be configured. By default, I think it's 15, but 15 characters already include user passwords such as "password, exclamation mark" or "12345". And credit card numbers will be three-quarters exposed by then, and the last four digits are shown in clear text on many pages anyway. So that doesn't really help with those types of user credentials. Of course, your database password might be 40 characters, completely random, but that's not really the value you want, or need, to protect, because the database server will not be exposed to the internet, in many cases.

Derick Rethans 15:33 What has the feedback been so far to this RFC?

Tim Düsterhus 15:36 Both positive and "we don't need that, nobody does that". It's a bit mixed. I've got some very good feedback. There's a Twitter account that tweets any new RFCs, and the users on Twitter—the actual users, not the PHP internals list—seem to be very happy with my proposal. On the list, many said "just don't log those values", or they don't really see the benefit yet, I think. I'm not really sure how the feedback is, really.

Derick Rethans 16:07 That's always a tricky thing, isn't it? Because the people that think, "Oh, this is all right", often don't bother responding, because they don't have anything to add or criticize.

Tim Düsterhus 16:17 Exactly. People that are happy won't write any reviews or whatever; just the people that complain are complaining.

Derick Rethans 16:24 Yeah, it's either the people that are complaining or the people that are really happy about something. Are you expecting there to be any backward compatibility breaks?

Tim Düsterhus 16:34 Yeah, obviously, once the attribute class name is taken by default by PHP, userland code cannot use that any more.
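Tim's point about the truncation default is easy to demonstrate: capping collected string arguments at 15 characters hides long random secrets but leaves typical user credentials intact. A quick Python illustration (the function name is invented; the real control is the php.ini setting whose exact name Tim doesn't recall in the episode):

```python
def truncate_collected(value, limit=15):
    # Mimics collecting only the first `limit` characters of each
    # string argument when a stack trace is recorded.
    return value[:limit]

# A typical short password fits entirely within the limit: nothing is hidden.
leaked = truncate_collected("password!")

# A 16-digit card number loses only its final digit.
card = truncate_collected("4111111111111111")
```

A 40-character random database password, by contrast, would lose most of its characters, which is why truncation feels protective while doing little for the values users actually type in.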
But I don't think that anyone is using a SensitiveParameter class in the global namespace. I used GitHub search, and SensitiveParameter in PHP code only appears in some strings, in the AWS SDK or something like that. The replacement value will break any type signature, so if the exception handler checks the original parameter types for whatever reason, that will, or might, break; but I don't really think that's likely either. I don't expect any major backwards compatibility breaks.

Derick Rethans 17:17 That's good to hear, and also good to hear that you have done some research into this. Do you have any extra selling points to convince people?

Tim Düsterhus 17:26 My initial selling point was PDO's constructor—or not really a selling point, but an example, because it's very obvious and it's in PHP core. I later expanded that with the credit card numbers and user passwords, and attempted to make it clearer that those sensitive values are not just values from your personal computing environment, but also things users input into your application. And that stack traces will be sent to third parties, e.g. Sentry, which might even be run as a software-as-a-service solution—and then you're deep in GDPR territory. You don't want that.

Derick Rethans 18:03 No, absolutely not. Tim, thank you for taking the time this morning to talk to me about your RFC.

Tim Düsterhus 18:10 Thank you for having me.

Derick Rethans 18:15 Thank you for listening to this installment of PHP Internals News, a podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening. I'll see you next time.
Show Notes RFC: Redacting parameters in back traces PHP RFC Bot on Twitter Credits Music: Chipper Doodle v2 — Kevin MacLeod (incompetech.com) — Creative Commons: By Attribution 3.0
On The Cloud Pod this week, the team finds out whose re:Invent 2021 crystal ball was most accurate. Also Graviton3 is announced, and Adam Selipsky gives his first re:Invent keynote. A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located. This week's highlights
About Ant: Ant co-founded A Cloud Guru, ServerlessConf, JeffConf, and ServerlessDays, and is now running Senzo/Homeschool, in between other things. He needs to work on his decision making.

Links: A Cloud Guru: https://acloudguru.com homeschool.dev: https://homeschool.dev aws.training: https://aws.training learn.microsoft.com: https://learn.microsoft.com Twitter: https://twitter.com/iamstan

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked against an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use them. It's an awesome approach. I've used something similar for years. Check them out. But wait, there's more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It's awesome. If you don't do something like this, you're likely to find out that you've gotten breached the hard way.
Take a look at this. It's one of those few things that I look at and say, "Wow, that is an amazing idea. I love it." That's canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I'm a big fan of this. More from them in the coming weeks.

Corey: This episode is sponsored in part by Cribl Logstream. Cribl Logstream is an observability pipeline that lets you collect, reduce, transform, and route machine data from anywhere, to anywhere. Simple, right? As a nice bonus, it not only helps you improve visibility into what the hell is going on, but also helps you save money almost by accident—kind of like not putting a whole bunch of vowels and other letters that would be easier to spell in a company name. To learn more, visit: cribl.io

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Every once in a while I talk to someone about, "Oh, yeah, remember that time that you appeared on Screaming in the Cloud?" And it turns out that they didn't; it was something of a fever dream. Today is one of those guests that I'm, frankly, astonished I haven't had on before: Ant Stanley. Ant, thank you so much for indulging me, and somehow forgiving me for not having you on previously.

Ant: Hey, Corey, thanks for that. Yeah, I'm not too sure why I haven't been on previously. You can explain that to me over a beer one day.

Corey: Absolutely, and I'm sure I'll be the one that buys it, because that is just inexcusable. So, who are you? What do you do? I know that you're a Serverless Hero at AWS, which is probably the most self-aggrandizing thing you can call someone, because who in the world in their right mind is going to introduce themselves that way? That's what you have me for. I'll introduce you that way. So, you're an AWS Serverless Hero. What does that mean?

Ant: So, the Serverless Hero—effectively I've been recognized for my contribution to the serverless community; what that contribution is, is potentially dubious.
But yeah, I was one of the original co-founders of A Cloud Guru. We were a serverless-first company, way back when. So, from 2015 to 2016, I was with A Cloud Guru with Ryan and Sam, the two other co-founders.

I left in 2016, after we'd run ServerlessConf. So, I led and ran the first ServerlessConf. And then, for various reasons, I decided, hey, the pressure was too much, I needed a break—and for a few other reasons I decided to leave A Cloud Guru. A very amicable split with my former co-founders. And then, yeah, I kind of took a break, took some time off, de-stressed, got the serverless user group in London up and running, and ran a small conference in London called JeffConf, which was a take on a blog that Paul Johnston—one of the folks who ran JeffConf with me—wrote a while ago, saying we could have called it serverless, and we might as well have called it Jeff. Could have called it anything; might as well have called it Jeff. So, we had this joke about JeffConf. Not a reference to Mr. Bezos.

Corey: No, no. Though they do have an awful lot of Jeffs working over there. But that's neither here nor there. 'The Land of the Infinite Jeffs' as it were.

Ant: Yeah, exactly. There are more Jeffs than women in the exec team, if I remember correctly.

Corey: I think it's now a Dave problem instead.

Ant: Yeah, it's a Dave problem. Yeah. [laugh]. It's not a problem either way. Yeah. So, JeffConf morphed into ServerlessDays, which is a group of community events around the world. So, I think AWS said, "Hey, this guy likes running serverless events for some silly reason. Let's make him a Serverless Hero."

Corey: And here we are. Which is interesting, because there are a few directions you can take this in.
One of them, most recently, we were having a conversation, and you were opining on your thoughts of the current state of serverless, which can succinctly be distilled down to 'serverless sucks,' which is not something you'd expect to hear from a Serverless Hero—and I hope you can hear the initial caps when I say 'Serverless Hero'—or the founder of a serverless conference. So, what's the deal with that? Why does it suck?Ant: So, the whole serverless movement started to gather momentum in 2015. The early adopters were all extremely experienced technologists, folks like Ben Kehoe, the chief robotics scientist at iRobot—he's incredibly smart—and folks of that caliber. And those were the kinds of people who spoke at the first serverless conference, spoke at all the first serverless events. And, you know, you'd kind of expect that with a new technology where there's not a lot of body of knowledge, you'd expect these high-level, really advanced folks being the ones putting themselves out there, being the early adopters. The problem is we're in 2021 and that's still the profile of the people who are adopting serverless, you know? It's still not this mass adoption.And part of the reason for me is because of the complexity around it. The user experience for most serverless tools is not great. It's not easy to adopt. The patterns aren't standardized and well known—even though there are a million websites out there saying that there are serverless patterns—and the concepts aren't well explained. I think there's still a fair amount of education that needs to happen.I think folks have focused far too much on the technical aspects of serverless, and what is serverless and not serverless, or how you deploy something, or how you monitor something, observability, instead of going back to basics and first principles of what is this thing? Why should you do it? How do you do it? And how do we make that easy? 
There's no real focus on user experience and adoption for inexperienced folks.The adoption curve, the learning curve for serverless, no matter what platform you use, if you want to do anything that's beyond a side project it's really difficult because there's no easy path. And I know there's going to be folks that are going to complain about it, but the Serverless Stack just got a million dollars to solve this problem.Corey: I love the Serverless Stack. They had a great way of building things out.Ant: Yeah.Corey: I cribbed a fair bit from what they built when I was building out my own serverless project of the newsletter production pipeline system. And that's awesome. And I built that, and I run it mostly as a technology testbed. But my website, lastweekinaws.com? I pay WP Engine to host it on WordPress and the reason behind that is not that I can't figure out the serverless pieces of it, it's because when I want to hire someone to do something that's a bit off the beaten path on WordPress, I don't have to spend $400 an hour for a consultant to do it because there's more than 20 people in the world who understand how all this stuff fits together and integrates well. There's something to be said for going in the direction the rest of the market is when there's not a lot of reason to differentiate yourself. Yeah, could I save thousands of dollars a year in infrastructure costs if I'd gone with serverless? Of course, but people's time is worth more than that. It's expensive to have people work on these things.And even on the serverless stuff that I've built, if it's been more than six months since I've touched a component, someone else may have written it; I have to rediscover what the hell I was thinking and what the constraints are, or what constraints I thought existed in the platform. And every time I deal with Lambda or API Gateway, I come away with a spiraling sense of complexity tied to all of it. 
And the vision of serverless I believe in, truly, but the execution has lagged from all providers.Ant: Yeah. I agree with that completely. The execution is just not there. I look at the situation—so Datadog had their report, "The State of Serverless Report" that came out about a month or two ago; I think it's the second year they've done it, now, might be the third. And in the report, one of the sections, they talked about tooling.And they said, "What are the most adopted tools?" And they had the Serverless Framework in there, they had SAM in there, they had CloudFormation, I think they had Terraform in there. But basically, Serverless Framework had 70% of the respondents. 70% of folks using Datadog and using serverless tools were using Serverless Framework. But SAM, AWS's preferred solution, was like 12%.It was really tiny and this is the thing that every single AWS demo example uses, that the serverless developer advocates push heavily. And it's the official solution, but the Serverless Application Model is just not being adopted and there are reasons for that: it's the way they approach the market, because it's highly opinionated, and they don't really listen to end-users that much. And then there's CDK out there. So, that's the other AWS organizational complexity as well: you've got another team within AWS, another product team, who've developed this different way—CDK—of doing things.Corey: This is all AWS's fault, by the way. For the longest time, I've been complaining about Lambda edge functions because they are not at all transparent; you have to wait for a CloudFront deployment for it to update every time, only to figure out that in my case, I forgot a comma because I've never heard of a linter. And it becomes this awful thing. 
Only recently did I find out they only run at regional edge caches, not in all of the CloudFront PoPs, so I said, "The hell with it," ripped it out of everything I was using it with, and wound up implementing it in bog-standard Lambda because it was easier. But then rather than fixing that, they've created their—what was it—their CloudFront Workers. Or is it—is it CloudFront Workers, or is it CloudFront Functions?Ant: No, CloudFront Functions.Corey: I don't even remember it because rather than fixing the thing, you just released a different thing that addresses these problems in very different ways that aren't directly compatible. And it's oh, great, awesome. Terrific. As a customer, I want absolutely not this. It's one of these where, honestly, I've left in many cases with the resigned position of, if you're not going to take this seriously, why am I?Ant: Yeah, exactly. And it's bizarre. So, the CloudFront Functions thing, it's based on Nginx's [little 00:08:39] JavaScript engine. So, it's the Nginx team supporting it—the engine—which has a really small number of users; it's tiny, there's no foundation behind it. So, you've got these massive companies reliant on some tiny organization to support the runtime of one of their businesses, one of their services.And they expect people to adopt it. And on top of that, that engine's primary supported language is JavaScript ES5 or ES2015, the 2015 edition of JavaScript, so it's a six-year-old version of JavaScript. You cannot use modern JavaScript with it, which also means you can't use any other tools in the JavaScript ecosystem for it. So basically, anything you write for that is going to be vanilla, you're going to write it yourself, there's no tooling, no community to really leverage to use that thing. Again, like, why have you even done that? Why have you now gone off and taken an engine no one uses—they will say someone uses it, but basically no one uses—
No one really uses. And then decided to run that. Why not look at WebAssembly—it's crazy—which has a foundation behind it and they're doing great things, and other providers are using WebAssembly on the edge. I just don't understand the thought process—well, I say I don't understand, but I do understand the thought processes behind Amazon. Every single GM in Amazon is effectively incentivized to release stuff, and build stuff, and to get stuff out the door. That's how they make money. You hear the stories—Corey: Oh, it's been clear for years. They only recently stopped—in their keynotes every year—talking about the number of feature releases that they've had over the past 12 months. And I think they finally had it clued into them by someone snarky on Twitter—ahem—that the only people that feel good about that are people internal to AWS because customers see that and get horrified of, "I haven't kept up with most of those things. How many of those are important? How many of them are nonsense?"And I'm sure somewhere you have released a serverless service that will solve my business problem perfectly so I don't have to build it myself out of Lambda functions, and string, and popsicle sticks, but I'll never hear about it because you're too busy talking about nonsense. And that problem still exists and it's writ large. There's a philosophy around not breaking existing workloads—which I get; that's a hard problem to solve for—but their solution is, rather than fixing existing services, they'll launch a new one that doesn't have those constraints and takes a different approach to it. And it's horrible.Ant: Yeah, exactly. If you compare Amazon to Apple, Apple releases a net-new product once a year, once every two years.Corey: You're talking about new generations of products, that comes out on an annualized basis, but when you're talking about actual new product, not that frequently. 
The last one—Ant: Yeah.Corey: —I can really think of is probably going to be AirPods, at least of any significance.Ant: AirTags is the new one.Corey: Oh, AirTags. AirTags is recent, which is a neat—but it's an accessory to the rest of those things. It is—Ant: And then there's AirPods. But yeah, it's once—because they—everything works. If you're in that Apple ecosystem, everything works. And everything's back-ported and supported. My four-year-old phone still works, and I had a five-year-old MacBook before this current one that still worked, you know, not a problem.And those two philosophies—and the Amazon folk are heavily incentivized to release products and to grow the usage of those products. And they're all incentivized within their bubbles. So, that's why you get competing products. That's why Proton exists when CodeBuild and CodePipeline, and all of those things exist, and you have all these competing products. I'm waiting for the container team to fully recreate AWS on top of containers. They're not far away.Corey: They're already in the process of recreating AWS on top of Lightsail. It's more or less the, "Oh, we're making this the simpler version." Which is great. You know who likes simplicity? Freaking everyone.So, it's the vision of a cloud we could have had but didn't. "Oh, you want a virtual machine. Spin up a Lightsail instance; you're going to get a fixed amount of compute, disk, RAM, and CPU that you can adjust, and it's going to cost you a flat fee per month until you exceed some fairly high limits." Why can't everything be like that, on some level? 
Because in many cases, I don't care about shaving things off exactly to the penny.I want to spin up a fleet of 20 virtual machines, and if they cost me 20 bucks a pop per month, I can forecast that, I can budget for that, I can do a lot and I don't actually care in any business context about the money there, but dialing it in and having the variable charges and the rest, and, "Oh, you went through a managed NAT gateway. That's going to double your bandwidth price and it's going to be expensive. Surprise, you should have looked more closely at it," is sort of the lesson of the original AWS services. At some level, they've deviated away from anything resembling simplicity and increasingly we're seeing a world where in order to do something effectively with cloud, you have to spend 12 weeks going to cloud school first.Ant: Oh, yeah. Completely. See, that's one of the major barriers with serverless. You can't use serverless for any of the major cloud providers until you understand that cloud provider. So yeah, do your 12 weeks of cloud school. And there's more than enough providers.Corey: Whoa, whoa, whoa. Before you spin up a function that runs code, you have to understand the identity and security model, and how the network works, and a bunch of other ancillary nonsense that isn't directly tied to business value.Ant: And all these fun things. How are you going to test this, and how are you going to do all that?Corey: How do you write the entry point? Where is it going to enter? What is it expecting? What objects are getting passed in, if any? What format is it going to take?I've spent days, previously, trying to figure out the exact invocation for working with a JSON object in Python, what that's going to show up as, and how specifically to refer to it. And once you've done that a couple of times, great, fine, it's easy. Copy and paste it from the last time you did it. 
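[For reference, the handler shape Corey is describing looks roughly like this in Python. This is a minimal sketch, not from the episode; the event-parsing branch assumes an API Gateway proxy-style payload, which is one common case, and the `name` field is purely illustrative:]

```python
import json

def handler(event, context):
    # With an API Gateway proxy integration, the JSON payload arrives as a
    # *string* under event["body"]; on a direct invoke, event is already the
    # parsed dict. This is exactly the ambiguity Corey is complaining about.
    if isinstance(event.get("body"), str):
        payload = json.loads(event["body"])
    else:
        payload = event

    name = payload.get("name", "world")

    # Proxy integrations expect this response envelope; a direct invoke just
    # returns whatever the function returns.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}"}),
    }
```

[Invoked locally, the event is just whatever dict you pass in, e.g. `handler({"name": "Ant"}, None)`, which is also why the payload shape is so easy to get wrong the first time.]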
But figuring it out from first principles, particularly in a time when there aren't a lot of good public demonstrations of this—especially early days—it's hard to do.Ant: Yeah. And they just love complexity. Have you looked at the second edition—so the third version of the AWS SDK for JavaScript?Corey: I don't touch JavaScript with my hands most days, just because I'm bad at it and I don't understand the asynchronous model and computers are really not my thing, mostly.Ant: So, unfortunately for my sins, I do use JavaScript a lot. So, version two of the SDK is effectively the single most popular Cloud SDK of any language, anything out there; 20 million downloads a week. It's crazy. It's huge—version two. And JavaScript's a very fast-evolving language, though.Basically, it's a bit like the English language in that it adopts things from other languages through osmosis, and co-opts various other features of other languages. So, JavaScript has—if there's a feature you love in your language, it's going to end up in JavaScript at some point. So, it becomes a very broad Swiss Army knife that can do almost anything. And there's always better ways to do things. So, the problem is, the version two was written in old JavaScript, at the 2015, 2016 kind of level. So, from 2015, 2016 to, you know, 2020, 2021, JavaScript has changed. So, they said, "Oh, we're going to rewrite this." Which, good; you should do. But they absolutely broke all compatibility with version two. So, there is no path from version two to version three without rewriting what you've got.So, if you want to take anything you've written—not even serverless—anything in JavaScript you've written and you want to upgrade it to get some of the new features of JavaScript in the SDK, you have to rewrite your code to do that. And in some instances, if you're using hexagonal architecture and you're doing all the right things, that's a really small thing to do. 
But most people aren't doing that.Corey: But let's face it, a lot of things grow organically.Ant: Yeah.Corey: And again, I can sit here and tell you how to build things appropriately and then I look at my own environment and… yeah, pay no attention to that burning dumpster fire behind the camera. And it's awful. You want to make sure that you're doing things the right way, but it's hard to do, and taking on additional toil because the provider decided now is the time to focus on this is a problem.Ant: But it's completely not a user-centric way of thinking. You know, they've got all their 14—is it 16 principles now? They added two principles, didn't they?Corey: They added two to get up to 16; one less than the number of ways to run containers in AWS.Ant: Yeah. They could barely contain themselves. [laugh]. It's just not customer-centric. They've moved themselves away from that customer-centric view of the world because the reality is, they are centered on the goals of the team, the goals of the GM, and the goals of that particular product.That famous drawing of all the different organizational charts, they've got the Facebook chart, and the Google chart, and the Amazon chart has all these little circles, everyone pointing guns at each other. And the more Amazon grows, the more you feel like that's reality. And it's hurting users, it's massively hurting users. And we feel the pain every day, absolutely every day, which is not great. And it's going to hurt Amazon in the long run, but short-term, they're not going to see that pain quarterly, they're not going to see that pain, probably within 12 months.But they will see the pain in the long run. And if they want to fix it, they probably should have started fixing it two years ago. But it's going to take years to fix because that's a massive cultural shift to say, "Okay, how do we get back to being more customer-focused? 
How do we stop those organizational targets and goals from getting in the way of delivering value to the customer?"Corey: It's a good question. The hard part is getting customers to understand enough of what you put out there to be able to disambiguate what you've built, and what parts to trust, what parts not to trust, what parts are going to be hard, et cetera, et cetera, et cetera, et cetera. The concern that I've got across the board here is, how do you learn? How do you get started with this? And the way that I came into this was I started off, in the early days of AWS, there were a dozen services, and okay, I could sort of stumble my way through it.And the UI was rough, but it got better with time. So, the answer for a lot of folks these days is training, which makes sense. In the beginning, we learned through things like podcasts. Like there was a company called Jupiter Broadcasting which did a bunch of Linux-oriented podcasts, and we learned how this stuff works. And then they were acquired by Linux Academy which really focused on training.And then A Cloud Guru acquired Linux Academy. And then Pluralsight acquired A Cloud Guru and is now in the process of itself being acquired by Vista Equity Partners. There's always a bigger fish eating something somewhere. It feels like a tremendous, tremendous consolidation in the training market. Given that you were one of the founders of A Cloud Guru, where do you stand on that?Ant: So, in terms of that actual transaction, I don't know the details because I'm a long time out of A Cloud Guru, but I've stayed within the whole training sphere, and so effectively, the bigger fish scenario, it's making the market smaller in terms of the providers that are there. You really don't have many providers doing cloud-specific training anymore. On one level you don't, but then on another level, you've got lots of independent folks doing tons of stuff. So, you've got this explosion at the bottom end. 
If you go to Udemy—which is where A Cloud Guru started, on Udemy—you will see tons of folks offering courses at ten bucks a pop.And then there's what I'm doing now on homeschool.dev; there's serverless-focused training on there. But that's really focused on a really small niche. So, there's this explosion at the bottom end of lots of small people doing lots of things, and then you've got this consolidation at the top end, all the big providers buying each other, which leaves a massive gap in the middle.And on top of that, you've got AWS themselves, and all the other cloud providers, offering a lot of their own free training, whether it's on their own platforms—there's aws.training now, and Microsoft have similar as well; I think learn.microsoft.com is theirs. And you've got all these different providers doing their own training, so there's lots out there.There's actually probably more training for lower costs than ever before. The problem is, it's like the complexity of too many services, it's the 17 container problem. Which training do you use? Because the actual cost of the training is your time. It's not the cost of the course. Your time is always going to be more expensive.Corey: Yeah, the course is never going to be anywhere comparable to the time you spend on it. And I've never understood, frankly, why these large companies charge money for training on their own platform and also charge money for certifications because I don't care what you're going to pay for those things, once you know a platform well enough to hit a certification, you're going to use the thing you know, in most cases; it's a great bottom-up adoption story.Ant: Yeah, completely. That was actually one of Amazon's first early problems with their training. Why A Cloud Guru even exists, and Linux Academy, and Cloud Academy all actually came into being, is because Amazon hired a bunch of folks from VMware to set up their training program. 
And VMware's training, back in the day, was a profit center. So, you'd have a one-and-a-half thousand, two thousand dollar training course you'd go on for three to five days, and then you'd have a couple hundred dollars to do the certification. It was a profit center because VMware didn't really have that much competition. Xen and Microsoft's Hyper-V were so late to the market, they basically owned the market at the time. So—Corey: Oh, yeah. They still do in some corners.Ant: Yeah. They still massively do in some places; they still exist. And so Amazon hired a bunch of ex-VMware folk, and they said, "We're just going to do what we did at VMware and do it at Amazon," not realizing Amazon didn't own the market at the time, was still growing, and they tried to make it a profit center, which basically left a huge gap for folks who just did something at a reasonable price, which was basically everyone else. [laugh].This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services: infrastructure, networking, databases, observability, management, and security.And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers, needed to support the application that you want to build.With Always Free you can do things like run small-scale applications, or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. 
Visit https://snark.cloud/oci-free that's https://snark.cloud/oci-free.Corey: The challenge I found with a few of these courses as well, is that they teach to the certification, and the certifications are, in some ways, crap when it comes to things you actually need to know to intelligently use a platform. So, many of them distill down not to the things you need to know, but to the things that are easy to test in a multiple-choice format. So, it devolves inherently into trivia such as, "Which is the right syntax for this thing?" Or, "Which one of these CloudFormation stanzas or functions isn't real?" Things like that where it's, no one in the real world needs to know any of those things.I don't know anyone these days—sensible—who can write CloudFormation from scratch without pulling up some reference somewhere because most people don't have that stuff in their head. And if you do, I'd suggest forgetting it so you can use that space to remember something that's more valuable. It doesn't make sense for how people interact with these things. But I do see the value as well in large companies trying to upskill thousands and thousands of people. You have 5000 people that are trying to come up to speed because you're migrating into cloud. How do you judge people's progress? Well, certifications are an easy answer.Ant: Yeah, massively. Probably the most successful blog post I ever wrote—I don't think it's up anymore, but it was when I was at A Cloud Guru—was, what's the value of a certification? And ultimately, it came down to, it's a way for companies that are hiring to filter people easily. That's it. That's really it. If you've got to hire ten people and you get 1000 CVs or resumes for those ten roles, first thing you do is you filter by who's certified for that role. And then you go through anything else. Does the certification mean you can actually do the job? Not really. 
There are hundreds of people who are not cer—thousands, millions of people who are not certified to do jobs that they do. But when you're getting hired and there's lots of people applying for the same role, it's literally the first thing they will filter on. And it's—so you want to get certified because otherwise it's hard to get through that filter. That's what the certification does: it's how you get through that first filter of whatever talent tracking system they're using. That's it. And how to get into the dev lounge at re:Invent.Corey: Oh yeah, that's my reason for getting a certification, originally. And again, for folks who learn effectively that way, I have no problem with people getting certifications. If you're trying to advance in your career, especially early stage, and you need a piece of paper that says you know what you're talking about, a certification is a decent approach. In time, with seniority, that gets replaced by a different piece of paper: it's called your resume or your CV. But that is a longer-term, more senior-focused approach. I don't begrudge people getting certifications and I don't think that they're foolish for doing it.But in time, it feels like the market for training is simultaneously contracting into only a few players left, and also, I'm curious as to whether or not the large companies out there are increasing their spend with the training providers or not. On the community side, the direct-to-consumer approach, that is exploding, but at the same time, you're then also dealing—forgive me, listeners—with the general public and there is nothing worse than a customer, from a customer service perspective, who was only paying a little money to you. I used to work at a web hosting company where the $3,000-a-month customers were great to work with. The $2,999-a-month customers were hell on earth who expected that they were entitled to 80 hours a month of systems engineering time. And you see something similar in the training space. 
It's always the small individual customers who are spending personal money instead of corporate money that are more difficult to serve. You've been in the space for a while. What do you see around that?Ant: Yeah, I definitely see that. So, the smaller customers, there's a correlation between the amount of money you spend and the amount of hand-holding that someone needs. The more money someone spends, the less hand-holding they need, generally. But the other side of it, with training businesses—particularly subscription-based businesses—it's the same model as most gyms. You pay for it and you never use it.And it's not just subscription; like, Udemy is a perfect example of that, you know, people who have hundreds of Udemy courses they've never done, but they spent ten bucks on each. So, there's a lot of that at the lower end, which is why people offer courses at that level. So, there's people who actually do the course who are going to give you a lot of headaches, but then you're going to have a bunch of folk who never do the course and you're just taking their money. Which is also not great, either, but those folks don't feel bad because they only spent 10, 20 bucks on it. It's like, oh, it's their fault for not doing it, and you've made the money.So, that's kind of how a lot of the training works. So, the other problem with training as well is the quality is so variable at the bottom end. It's so, so variable. You really struggle to find—there's a lot of people just copying, like, you see instances where folks upload videos to Udemy that are literally—they've downloaded someone's video, resized it, cut out a logo or something like that, and re-uploaded it, and it's taken a few weeks for them to get caught. But they made money in the meantime.That's how blatant it does get to some level, but there are levels where people will copy someone else's content and just basically make it their own—own slides, own words, that kind of thing; that happens a lot. 
At the low end, it's a bit all over the place, but you still have quality at the low end as well, among these cheaper, smaller courses. And how do you find that quality, as well? That's the other side of it. And also people will just trade on their name.That's the other problem you see. Someone has a name for doing X whatever, and they'll go out and bring out a course on whatever that is. Doesn't mean they're a good teacher; it means they're good at building a brand.Corey: Oh, teaching is very much its own skill set.Ant: Oh, yeah.Corey: I learned to speak publicly by being a corporate trainer for Puppet and it teaches you an awful lot. But I had the benefit, in that case, of a team of people who spent their entire careers building curricula, so it wasn't just me throwing together some slides; I would teach a well-structured curriculum that was built by someone who knew exactly what they were doing. And yeah, I needed to understand failure modes, and how to get things to work when they weren't working properly, and how to explain it in different ways for folks who learn in different ways—and that is the skill of teaching right there—but curriculum development is usually not the same thing. And when you're bootstrapping—"I'm going to build my own training course"—you have to do all of those things, and more. And it lends itself to, in many cases, what can come across as relatively low-quality offerings.Ant: Yeah, completely. And it's hard. But one thing you will often see is sometimes you'll see a course that's really high production quality, but actually, the content isn't great because folks have focused on making it look good. That's another common, common problem I see. If you're going to do training out there, just get referrals, get references, find people who've done it.Don't believe the references you see on a website; there's a good chance they might be fake or exaggerated. 
Put something out on Twitter, put out something on Reddit, whatever communities—and Slack or Discord, whatever groups you're in, ask questions. And folks will recommend. In the world of Google where you can search for anything, [laugh], the only way to really find out if something is any good is to find out if someone else has done it first and get their opinion on it.Corey: That's really the right answer. And frankly, I think that is sort of the network effect that makes a lot of software work for folks. Because you don't want to wind up being the first person on your provider trying to do a certain thing. The right answer is making sure that you are basically the 8,000th person to try and do this thing so you can just Google it and there's a bunch of results and you can borrow code on GitHub—which is what we call 'thought leadership' because plagiarism just doesn't work the same way—and effectively realizing this has been solved before. If you find a brand new cloud that has no customers, you are trailblazing every time you do anything with the platform. And that's personally never where I wanted to spend my innovation points.Ant: We did that at A Cloud Guru. I think when we were—in 2015 we had problems with Lambda, and you'd go to Stack Overflow, and there was no Lambda tag on Stack Overflow, no serverless tag on Stack Overflow, but you asked a question and Tim Wagner would probably be the one answering. And he was the former head of product on Lambda. But it was painful, and in general you don't want to do it. Like [sigh] whenever AWS comes out with a new product, I've done it a few times, I'll go, "I think I might want to use this thing."AWS Proton is a really good example. It's like, "Hey, this looks awesome. It looks better than CodeBuild and CodePipeline," from the headlines, or what I thought it would be. I basically went while the keynote was on, I logged in to our console, had a look at it, and realized it was awful. 
And then I started tweeting about it as well and then got a lot of feedback [laugh] on my tweets on that. And in general, my attitude to whatever the new shiny thing is: if I'm going to try it, it needs to work perfectly and it needs to live up to its billing on day one. Otherwise, I'm not going to touch it. And in general with AWS products now, you announce something, I'm not going to look at it for a year.

Corey: And it's to their benefit that you don't look at it for a year because the answer is going to be, ah, if you're going to see that it's terrible, that's going to form your opinion, and you won't go back later when it's actually decent and reevaluate your opinion, because no one ever does. We're all busy.

Ant: Yeah, exactly.

Corey: And there's nothing wrong with doing that, but it is obnoxious they're not doing themselves any favors here.

Ant: Yeah, completely. And I think that's actually a failure of marketing and communication more than anything else. I don't blame the product teams too much there. Don't bill something as a finished, glossy product when it's not. Pitch it at where it is. Say, "Hey, we are building"—like, I don't think on the re:Invent stage they should announce anything that's not GA or anything that does not live up to the billing, the hype they're going to give it. And they're getting more and more guilty of that over the last few re:Invents, announcing products that do not live up to the hype they promote them at and that are not GA. Literally, they should just have a straight-up rule: they can announce products, but don't put it on the keynote stage if it's not GA. That's it.

Corey: The whole re:Invent release cadence is a whole separate series of arguments.

Ant: [laugh]. Yeah, yeah.

Corey: There are very few substantial releases throughout the year and then they drop a whole bunch of them at re:Invent, and it doesn't matter what you're talking about, whose problem it solves, how great it is, it gets drowned out in the flood.
The only thing more foolish that I see than that is companies that are not AWS releasing things during re:Invent that are not on the re:Invent keynote stage, which in turn means that no one pays attention. The only thing you should be releasing is news about your data breach.

Ant: [laugh]. Yeah. That's exactly it.

Corey: What do I want to bury? Whenever Adam Selipsky gets on stage and starts talking, great, then it's time to push the button on the "We regret to inform you" dance.

Ant: Yeah, exactly. Microsoft will announce yet another print spooler bug malware.

Corey: Ugh, don't get me started on that. Thank you so much for taking the time to speak with me today. If people want to hear more about your thoughts and how you view these nonsenses, and of course to send angry emails because they are serverless fans, where can they find you?

Ant: Twitter is probably the easiest place to find me, @iamstan—

Corey: It is a place for outrage. Yes. Your Twitter user account is?

Ant: [laugh], my Twitter user account's all over the place. It's probably about 20% serverless. So, yeah, @iamstan. Tweet me; I will probably respond to you… unless you're rude, then I probably won't. If you're rude about something else, I probably will. But if you're rude about me, I won't. And I expect a few DMs from Amazon after this. I'm waiting for you, [unintelligible 00:32:02], as I always do. So yeah, that's probably the easiest place to get hold of me. I check my email once a month. And I'm actually not joking about that; I really do check my email once a month.

Corey: Yeah, if people really need me then they'll find me. Thank you so much for taking the time to speak with me. I appreciate it.

Ant: Yes, Corey. Thank you.

Corey: Ant Stanley, AWS Serverless Hero, and oh so much more. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment defending serverless's good name just as soon as you string together the 85 components necessary to submit that comment.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Once again Arjen, Jean-Manuel, and Guy discuss the latest and greatest announcements from AWS in this roundup of the news of May. Also once again, this was recorded 2 months before it went up, but luckily it's all still relevant. Even the comments about being in lockdown. News Finally in Sydney
In this episode we cover what infrastructure as code is, the most common tools for it on AWS, its benefits, and best practices. We also explain how you can start applying infrastructure as code in your projects, or as a new cloud user. This is episode #2.08 of the Podcast de Charlas Técnicas de AWS. 00:00 - Introduction 06:00 - What is infrastructure as code, and why use it? 18:20 - AWS CloudFormation 21:25 - AWS CDK 25:25 - AWS SAM 27:11 - AWS SDK and AWS CLI 28:45 - Terraform 33:05 - Benefits of infrastructure as code 45:35 - Best practices 54:54 - How to get started? 01:04:30 - Starting out in the cloud with infrastructure as code
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS". Good morning, this is Kato from Serverworks. Today we pick up the updates released on 12/15. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE PICKUP Announcing AWS CloudShell / Amazon Managed Service for Grafana announced in preview / Amazon Managed Service for Prometheus announced in preview / Announcing the first AWS Wavelength in Tokyo / AWS Lambda can now retry only the failed records when processing stream data / AWS SDK for JavaScript v3 is now generally available / AWS Single Sign-On now supports group synchronization with Microsoft Active Directory ■ re:Invent is underway: register on the official page and join now! Serverworks' re:Invent coverage can be found here ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
In another slightly delayed episode Arjen, JM, and Guy talk about all the many things that were announced in October. But before that, they will first discuss exactly how badly Lex understands "a fair shake of the sauce bottle". Talk to us in our Slack or on Twitter! The News Finally in Sydney Amazon Connect supports Amazon Lex bots using the Australian English dialect Amazon EC2 G4dn Bare Metal Instances with NVIDIA T4 Tensor Core GPUs, now available in 15 additional regions AWS IoT SiteWise is now available in Asia Pacific (Singapore) and Asia Pacific (Sydney) AWS regions Amazon Relational Database Service (RDS) Snapshot Export to S3 available in additional regions Serverless Introducing AWS Lambda Extensions – In preview | AWS Compute Blog Announcing Amazon CloudWatch Lambda Insights (preview) New – Use AWS PrivateLink to Access AWS Lambda Over Private AWS Network | AWS News Blog Amazon EventBridge announces support for Dead Letter Queues AWS Step Functions now supports Amazon Athena service integration Amazon API Gateway now supports disabling the default REST API endpoint Containers Amazon EKS now supports Kubernetes version 1.18 Amazon EKS now supports the Los Angeles AWS Local Zones Amazon EKS now supports configurable Kubernetes service IP address range Amazon ECS extensions for AWS Cloud Development Kit now available as a Developer Preview AWS Elastic Beanstalk Adds Support for Running Multi-Container Applications on AL2 based Docker Platform Fluent Bit supports Amazon S3 as a destination to route container logs AWS App Mesh supports cross account sharing of ACM Private Certificate Authority Introducing the AWS Load Balancer Controller AWS Copilot CLI launches v0.5 to let users deploy scheduled jobs and more EC2 & VPC AWS Nitro Enclaves – Isolated EC2 Environments to Process Confidential Data | AWS News Blog Announcing SSL/TLS certificates for Amazon EC2 instances with AWS Certificate Manager (ACM) for Nitro Enclaves New – Application Load Balancer Support 
for End-to-End HTTP/2 and gRPC | AWS News Blog AWS Compute Optimizer enhances EC2 instance type recommendations with Amazon EBS metrics AWS Cloud Map simplifies service discovery with optional parameters AWS Global Accelerator launches port overrides AWS IoT SiteWise launches support for VPC private links AWS Site-to-Site VPN now supports health notifications Dev & Ops AWS CloudFormation now supports increased limits on five service quotas AWS CloudFormation Guard – an open-source CLI for infrastructure compliance – is now generally available AWS CloudFormation Drift Detection now supports CloudFormation Registry resource types Amazon CloudWatch Synthetics now supports prebuilt canary monitoring dashboard Amazon CloudWatch Synthetics launches Recorder to generate user flow scripts for canaries AWS and Grafana Labs launch AWS X-Ray data source plugin Now author AWS Systems Manager Automation runbooks using Visual Studio Code AWS Systems Manager now supports free-text search of runbooks AWS Systems Manager now allows filtering automation executions by applications or environments Now use AWS Systems Manager to view vulnerability identifiers for missing patches on your Linux instances Port forwarding sessions created using Session Manager now support multiple simultaneous connections Now customize your Session Manager shell environment with configurable shell profiles AWS End of Support Migration Program for Windows Server now available as a self-serve solution for customers EC2 Image Builder now supports AMI distribution across AWS accounts Announcing general availability of waiters in the AWS SDK for Java 2.x Porting Assistant for .NET is now open source Amazon Corretto 8u272, 11.0.9, 15.0.1 quarterly updates are now available Security AWS Config adds 15 new sample conformance pack templates and introduces simplified setup experience for conformance packs AWS IAM Access Analyzer now supports archive rules for existing findings AWS AppSync adds support for AWS WAF 
AWS Shield now provides global and per-account event summaries to all AWS customers Amazon CloudWatch Logs now supports two subscription filters per log group Amazon S3 Object Ownership is available to enable bucket owners to automatically assume ownership of objects uploaded to their buckets Protect Your AWS Compute Optimizer Recommendation Data with customer master keys (CMKs) Stored in AWS Key Management Service Manage access to AWS centrally for Ping Identity users with AWS Single Sign-On Amazon Elasticsearch Service adds native SAML Authentication for Kibana Amazon Inspector has expanded operating system support for Red Hat Enterprise Linux (RHEL) 8, Ubuntu 20.04 LTS, Debian 10, and Windows Server 2019 Data Storage & Processing New – Amazon RDS on Graviton2 Processors | AWS News Blog Amazon ElastiCache now supports M6g and R6g Graviton2-based instances Easily restore an Amazon RDS for MySQL database from your MySQL 8.0 backup Amazon RDS for PostgreSQL supports concurrent major version upgrades of read replicas Amazon Aurora enables dynamic resizing for database storage space AWS Lake Formation now supports Active Directory and SAML providers for Amazon Athena AWS Lake Formation now supports cross account database sharing Now generally available – design and visualize Amazon Keyspaces data models more easily by using NoSQL Workbench You now can manage access to Amazon Keyspaces by using temporary security credentials for the Python, Go, and Node.js Cassandra drivers Amazon ElastiCache on Outposts is now available Amazon EMR now supports placing your EMR master nodes in distinct racks to reduce risk of simultaneous failure Amazon EMR integration with AWS Lake Formation is now generally available Amazon EMR now provides up to 35% lower cost and up to 15% improved performance for Spark workloads on Graviton2-based instances AWS Glue Streaming ETL jobs support schema detection and evolution AWS Glue supports reading from self-managed Apache Kafka AWS Glue crawlers 
now support Amazon DocumentDB (with MongoDB compatibility) and MongoDB collections Amazon Kinesis Data Analytics now supports Force Stop and a new Autoscaling status Kinesis Client Library now enables multi-stream processing Announcing cross-database queries for Amazon Redshift (preview) Amazon Redshift announces support for Lambda UDFs and enables tokenization New Amazon Neptune engine release now enforces a minimum version of TLS 1.2 and SSL client connections AWS Database Migration Service now supports Amazon DocumentDB (with MongoDB compatibility) as a source AI & ML Amazon SageMaker Autopilot now Creates Machine Learning Models up to 40% Faster with up to 200% Higher Accuracy Now launch Amazon SageMaker Studio in your Amazon Virtual Private Cloud (VPC) Amazon SageMaker Price Reductions – Up to 18% for ml.P3 and ml.P2 instances Amazon SageMaker Studio Notebooks now support custom images Amazon Rekognition adds support for six new content moderation categories Amazon Rekognition now detects Personal Protective Equipment (PPE) such as face covers, head covers, and hand covers on persons in images Amazon Transcribe announces support for AWS PrivateLink for Batch APIs Amazon Kendra now supports custom data sources Amazon Kendra adds Confluence Server connector Amazon Textract announces improvements to reduce average API processing times by up to 20% Other Cool Stuff AWS DeepRacer announces new Community Races updates Amazon WorkSpaces introduces sharing images across accounts AWS Batch now supports Custom Logging Configurations, Swap Space, and Shared Memory Amazon Connect supports Amazon Lex bots using the British English dialect Amazon Connect chat now provides automation and personalization capabilities with whisper flows CloudWatch Application Insights offers new, improved user interface CloudWatch Application Insights adds EBS volume and API Gateway metrics Announcing AWS Budgets price reduction Announcing AWS Budgets Actions Resource Access Manager Support is 
now available on AWS Outposts Announcing Amazon CloudFront Origin Shield Announcing AWS Distro for OpenTelemetry in Preview Introducing Amazon SNS FIFO – First-In-First-Out Pub/Sub Messaging | AWS News Blog Amazon SNS now supports selecting the origination number when sending SMS messages Amazon SES now offers list and subscription management capabilities Nano candidates Amazon WorkDocs now supports Dark Mode on iOS Amazon Corretto 8u272, 11.0.9, 15.0.1 quarterly updates are now available AWS OpsWorks for Configuration Management now supports new version of Chef Automate Sponsors Gold Sponsor Innablr Silver Sponsors AC3 CMD Solutions DoIT International
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS!" Good morning, this is Kato from Serverworks. Today we introduce the 13 updates released on 10/1. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE LINEUP AWS CloudFormation Guard is now generally available / AWS CloudFormation drift detection now supports CloudFormation registry resource types / Amazon ECS extensions for the AWS CDK announced in developer preview / Amazon SageMaker Autopilot model creation performance greatly improved / Amazon Rekognition Custom Labels can now detect multiple objects and scenes with a single inference API / AWS AppSync now supports AWS WAF / Amazon WorkSpaces announces image sharing across AWS accounts / Amazon EC2 introduces vCPU-based limits for Spot Instances / AWS DeepRacer announces new Community Races updates / AWS Marketplace offers geo-fencing to control the countries in which products are available / AWS Systems Manager Automation runbooks can now be authored with VS Code / Waiters are now available in the AWS SDK for Java 2.x / AWS Certification online exams are now available to more people ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS!" Good morning, this is Kato from Serverworks. Today we introduce the 12 updates released on 9/30. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE LINEUP Amazon Timestream is now generally available / Amazon SageMaker Processing supports a built-in Spark container for big data processing / Amazon Pinpoint announces event-triggered journeys / AWS CodePipeline supports git clone for source actions / AWS CodePipeline now supports GitHub Enterprise Server / AWS Client VPN supports client-to-client connectivity / Amazon QLDB announces index improvements / The AWS CRT HTTP client is released in preview for the AWS SDK for Java 2.x / Amazon MSK can now automatically expand cluster storage / New solution announced: centralized management of AWS WAF and VPC security groups / Reserved outbound bandwidth is now available in AWS Elemental MediaConnect / Amazon S3 on Outposts is now generally available ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS!" Good morning, this is Kato from Serverworks. Today we introduce the 8 updates released on 9/9. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE LINEUP Amazon CloudWatch can now monitor Prometheus metrics in container environments / EKS Fargate Profiles can now be created and managed with AWS CloudFormation / Amazon EKS supports assigning EC2 security groups to pods / API Gateway HTTP APIs support Lambda and IAM authorization / AWS Service Catalog supports searching by product name / Patches can now be run in just two clicks using Systems Manager / Amazon MSK releases version 2.4.1.1 / AWS SDK for .NET v3.5 released ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS!" Good morning, this is Kato from Serverworks. Today we introduce the 7 updates released on 8/24. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE LINEUP Amazon EKS supports EC2 Instance Metadata Service v2 / The Amazon EC2 Instance Metadata Service supports additional fields for automation and usability / The agent for Dynatrace ONE, a third-party product, is added to AWS Systems Manager Distributor / AWS Transit Gateway supports VPC prefix lists / AWS SDK for .NET v3.5 is now generally available / AWS Transfer Family now allows email addresses as usernames / AWS Database Migration Service supports MongoDB 4.0 as a source ■ For an explanation of IMDSv2, see the blog post "InstanceMetaDataV2を分かりやすく解説してみる" ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
Not real bugs, but software bugs with the AWS SDK. And other topics on humanity.
Catch up on the latest news "while you do something else"! The radio-style broadcast "Mainichi AWS!" Good morning, this is Kato from Serverworks. Today we introduce the 11 updates released on 6/30. Share your thoughts on Twitter with the hashtag #サバワ! ■ UPDATE LINEUP Amazon RDS Proxy is now generally available / The AWS CodeDeploy agent now supports automatic installation and scheduled updates / Amazon CloudWatch supports resource utilization metrics for AWS CodeBuild / Amazon EFS increases the minimum throughput of file systems / Amazon QuickSight now supports Athena data sources protected by Lake Formation / Amazon Connect can now run a flow after an agent's call disconnects / AWS SDK for C++ version 1.8 is now generally available / Amazon QuickSight adds histograms and cross-region APIs / Amazon DocumentDB supports t3.medium / The Amazon Chime SDK supports audio and video calls from mobile browsers / Amazon Lex is now available in the Tokyo region / AWS Systems Manager Patch Manager supports new versions of Linux platforms ■ Serverworks SNS Twitter / Facebook ■ Serverworks blog Serverworks engineer blog
THE NEWS FROM REDMOND Working with GitHub Issues in Visual Studio Code Visual Studio 2019 Preview Release Notes Visual Studio 2019 version 16.5 Release Notes Releasing Today! Visual Studio 2019 v16.6 & v16.7 Preview 1 Windows Package Manager Preview Introducing .NET Multi-platform App UI Blazor WebAssembly 3.2.0 now available Windows Terminal 1.0 Introducing WinUI 3 Preview 1 The Windows Subsystem for Linux BUILD 2020 Summary .NET Conf 2020 Welcome to C# 9.0 F# 5 and F# tools update Live Share, now with chat and audio support! Announcing .NET 5 Preview 4 and our journey to one .NET ASP.NET Core updates in .NET 5 Preview 4 Windows Forms Designer for .NET Core Released AROUND THE WORLD Rider 2020.1.3 and ReSharper Ultimate 2020.1.3 Bugfixes Are Here! Rider 2020.1.2 and ReSharper Ultimate 2020.1.2 Bugfixes Are Available! TeamCity 2020.1 RC is out Announcing end of support for .NET Standard 1.3 in AWS SDK for .NET Why model binding to JObject from a request doesn’t work anymore in ASP.NET Core 3.1 and what’s the alternative? Announcing Uno Platform 3.0 – Support for WinUI 3.0 Preview 1 Announcing Uno Platform 2.4 – macOS support and Windows Calculator on macOS PROJECTS OF THE WEEK CSLA CSLA .NET is a software development framework that helps you build a reusable, maintainable object-oriented business layer for your app. This framework reduces the cost of building and maintaining applications. Also, be sure and check out the Project of the Week archives! SHOUT-OUTS / PLUGS .NET Bytes on Twitter Matt Groves is: Tweeting on Twitter Live Streaming on Twitch Calvin Allen is: Tweeting on Twitter Live Streaming on Twitch
THE NEWS FROM REDMOND Announcing Experimental Mobile Blazor Bindings February update .NET Interactive is here! .NET Notebooks Preview 2 .NET Framework February 2020 Security and Quality Rollup Making our Unity Analyzers Open-Source Introducing Scalar: Git at scale for everyone Windows Terminal Preview v0.9 Release AndroidX NuGet Packages are Stable! VS Code January 2020 (version 1.42) Accessibility Improvements in Visual Studio 2019 for Mac Using .NET for Apache Spark to Analyze Log Data Decompilation of C# code made easy with Visual Studio February 2020 release of Azure Data Studio is now available GitHub Enterprise is now free through Microsoft for Startups AROUND THE WORLD Rider 2019.3.2 is Available! ReSharper Ultimate 2019.3.2 is Out! AWS SDK for .NET v3.5 Preview JetBrains .NET Day Online 2020 – Call for Speakers Announcing PostSharp 6.5 RC Rider 2020.1 Roadmap PROJECTS OF THE WEEK NetLearner - Shahed Chowdhuri NetLearner is an ASP .NET Core web app to allow any user to consolidate multiple learning resources all under one umbrella. The codebase itself is a way for new/existing .NET developers to learn ASP .NET Core, while a deployed instance of NetLearner can be used as a curated link-sharing web application. Also, be sure and check out the Project of the Week archives! SHOUT-OUTS / PLUGS .NET Bytes on Twitter Matt Groves is: Tweeting on Twitter Live Streaming on Twitch Calvin Allen is: Tweeting on Twitter Live Streaming on Twitch
Good day, dear listeners. We present a new episode of the RWpod podcast. In this episode: Ruby: Rails 5.2.3 has been released, Rails 6 shows routes in expanded format, Announcing Amazon Transcribe streaming transcription support in the AWS SDK for Ruby, and Bye Bye (Ruby Powered) Sass; How we Built a Highly Performant App with Ruby on Rails and Phoenix, Why I stuck with Windows for 6 years while developing Discourse, and The status of Ruby memory trimming & how you can help with testing. Web: Announcing TypeScript 3.4, Cube.js, the Open Source Dashboard Framework: Ultimate Guide, and How I ruined my JavaScript code and still won the Coding Challenge; PreVue - all in One Prototyping Tool For Vue Developers, Eslint-plugin-unicorn - various awesome ESLint rules, and JavaScript Chord Charts.
Now it’s possible to write Lambda functions as idiomatic Ruby code, and run them on AWS. Joining Brittany is Alex Wood, the software engineer working on the AWS SDK for Ruby and author of the AWS Lambda Ruby runtime.
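Writing a Lambda function as idiomatic Ruby, as discussed in the episode, boils down to defining a method that receives an event hash and a context object. A minimal sketch follows; the handler name, event shape, and response format here are illustrative assumptions (Lambda invokes whatever `filename.method_name` you configure), not code from the episode:

```ruby
# Minimal sketch of a Ruby Lambda handler.
# In Lambda, this would be configured as e.g. "function.handler";
# the event arrives as a Hash and the return value is serialized back.
require 'json'

def handler(event:, context:)
  name = event['name'] || 'world'
  # An API Gateway-style response: status code plus a JSON body.
  { statusCode: 200, body: JSON.generate(message: "Hello, #{name}!") }
end

# Outside Lambda, the handler is an ordinary method you can call
# locally for testing (context can be nil here):
response = handler(event: { 'name' => 'Brittany' }, context: nil)
```

Because the handler is just a method, unit-testing it locally needs no Lambda runtime at all.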
The AWS SDK for Java 2.0 includes a number of new features and performance improvements. Using real code examples, we'll build a serverless application that makes use of the SDK's new HTTP/2-based event-streaming APIs and deploy it using AWS Java tooling introduced in 2018. You'll learn what's new in 2.0 and the benefits of upgrading, as well as how to take advantage of new tooling in AWS's already rich Java ecosystem.
In this session we introduce AWS Cloud Map, a new service that lets you build the map of your cloud. It allows you to define friendly names for any resource such as S3 buckets, DynamoDB tables, SQS queues, or custom cloud services built on EC2, ECS, EKS, or Lambda. Your applications can then discover resource location, credentials, and metadata by friendly name using the AWS SDK and authenticated API queries. You can further filter resources discovered by custom attributes, such as deployment stage or version. AWS Cloud Map is a highly available service with rapid configuration change propagation.
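The core idea above, looking resources up by friendly name and then filtering on custom attributes such as deployment stage or version, can be sketched in plain Ruby. This is a local simulation of the lookup logic only, with invented names and endpoints; the real calls go through the Cloud Map API (DiscoverInstances) via the AWS SDK:

```ruby
# Local sketch of Cloud Map-style service discovery: instances are
# registered under a friendly name with custom attributes, and a
# lookup filters on those attributes. All names here are illustrative.
REGISTRY = {
  'payments-db' => [
    { endpoint: 'payments-a.example.internal',
      attributes: { 'stage' => 'prod', 'version' => '1.4' } },
    { endpoint: 'payments-b.example.internal',
      attributes: { 'stage' => 'staging', 'version' => '1.5' } }
  ]
}.freeze

# Return instances for a friendly name whose attributes match all filters.
def discover(name, filters = {})
  (REGISTRY[name] || []).select do |inst|
    filters.all? { |key, value| inst[:attributes][key] == value }
  end
end

prod_instances = discover('payments-db', 'stage' => 'prod')
```

The same shape applies with the real service: you pass a namespace, service name, and optional query parameters, and get back matching instances with their metadata.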
Do you like to develop in Ruby? There's a raft of awesome features in the AWS SDK for Ruby that you might not be aware of! Simon is joined by Alex Wood to discuss all the goodies customers have access to. Shownotes: Getting Started Workshop: https://aws.amazon.com/blogs/developer/railsconf-2015-recap/ Alex on Twitter: @alexwwood AWS SDK for Ruby on GitHub: https://github.com/aws/aws-sdk-ruby AWS SDK Ruby Record: https://github.com/aws/aws-sdk-ruby-record AWS SDK Ruby Rails: https://github.com/aws/aw...
The AWS SDK for Java (version 1.x) has been connecting JVM-based applications to AWS services since 2010. However, the JVM ecosystem has changed a lot in the last seven years. Based on a lot of customer feedback, we recently launched a developer preview of version 2.0 of the AWS SDK for Java, which has been completely rewritten from the core HTTP layer to the service clients. In this session, we'll get under the covers of the codebase to see how we've been able to get over 100,000 TPS from a single client instance during initial testing. We'll also go over some of the many new features and highlight some of the major differences from 1.x, including pluggable HTTP, non-blocking I/O, enhanced pagination, immutability, and more.
In this session, we first look at common approaches to refactoring legacy .NET applications into microservices and AWS serverless architectures. We also look at modern approaches to .NET-based architectures on AWS. We then elaborate on running .NET Core microservices in Docker containers natively on Linux in AWS, examining the use of the AWS SDK and the .NET Core platform. We also look at the use of various AWS services such as Amazon SNS, Amazon SQS, Amazon Kinesis, and Amazon DynamoDB, which provide the backbone of the platform. For example, Experian Consumer Services runs a large ecommerce platform that is now cloud-based on AWS. We look at how they went from a monolithic platform to microservices, primarily in .NET Core. Amid a heavy push to move to Java and open source, we look at the development process, which started in the beta days of .NET Core, and how the direction Microsoft was taking allowed them to use existing C# skills while pushing themselves to innovate on AWS. The large, single team of Windows-based developers was broken down into several small teams to allow for rapid development in an all-Linux environment.
Hello! This week we are publishing last week's news. Please excuse the delay, and thank you for your understanding. [ Listen ] Multiple vulnerabilities in RubyGems RubyGems blog post GraphQL in Rails News in Ruby on Rails Webpacker 3.0 AWS SDK 3.0 for Ruby GitHub Repo Rails and Third Parties GCP for Private RubyGems Zen Rails Base App Facade Design Pattern in Rails Validate TCKN RubyGems Tolga Gezginiş Torrent searching RubyGems CLI Murat Baştaş
RR 314 DynamoDB on Rails with Chandan Jhunjhunwal Today's Ruby Rogues podcast features DynamoDB on Rails with Chandan Jhunjhunwal. DynamoDB is a NoSQL database that helps your team solve infrastructure-management issues like setup, costing, and maintenance. Take some time to listen and learn more about DynamoDB! [00:02:18] – Introduction to Chandan Jhunjhunwal Chandan Jhunjhunwal is the owner of Faodail Technology, which is currently helping many startups with their web and mobile applications. He started at IBM, designing and building scalable mobile and web applications. He mainly worked on C++ and DB2, and later on worked primarily on Ruby on Rails. Questions for Chandan [00:04:05] – Introduction to DynamoDB on Rails I would say that the majority of developers work in PostgreSQL, MySQL, or another relational database. On the other hand, Ruby on Rails is picked up by many startups and founders for actually implementing their ideas and bringing them to scalable products. I would say that more than 80% of developers are mostly working on RDBMS databases. For the remaining 20%, their applications need to capture large amounts of data, so they go with NoSQL. In NoSQL, there are plenty of options, like MongoDB, Cassandra, or DynamoDB. When using AWS, there's no provided MongoDB. With Cassandra, a lot of infrastructure setup and cost is required, and you'll have to have a team maintaining it on a day-to-day basis. So DynamoDB takes all that pain away from your team, and you no longer have to focus on managing the infrastructure. [00:07:35] – Is it a good idea to start with a regular SQL database and then switch to a NoSQL database, or is it better to start with a NoSQL database from day one? It depends on a couple of factors. Many applications start with an RDBMS because they just want to get something out, and probably switch to something like NoSQL later. First, you have to watch the incoming data and its volume.
Second is familiarity, because most developers are more familiar with RDBMS and SQL queries. For example, say you have a feed application or a messaging application, where you know that there will be a lot of chat happening and you expect to take on a huge number of users. You could accommodate that in an RDBMS, but I would probably not recommend it. [00:09:30] – Can I use DynamoDB as a caching mechanism or cache store? I would not say a replacement, exactly. In those segments where I could see a lot of activity happening, I plugged in DynamoDB. The remaining part of the application was handled by the RDBMS. In many applications, what I've seen is that they have used a combination of the two. [00:13:05] – How do you decide if you actually want to use DynamoDB for all the data in your system? You pick it from day one when you know the amount of incoming data will keep increasing. It also depends on whether the development team you have is familiar with DynamoDB or any other NoSQL database. [00:14:50] – Is DynamoDB a document store, or does it have columns? You can say key-value pairs or document stores. The terminologies are just different, as is the way you design the database. In DynamoDB, you have something like a hash key and a range key. [00:22:10] – Why don't we store images in the database? I would say that there are better places to store them, which are faster and cheaper. There are better storage options, like a CDN or S3. Another good reason is that if you want to fetch a properly sized image based on the user's device screen, doing the resizing and all of that work inside the database could be cumbersome. You'd keep adding different columns to store those different sizes of images. [00:24:40] – Is there a potentially good reason for a NoSQL database as your default go-to data store? If you have some data which is completely unstructured, and you try to store it in an RDBMS, it will be a pain.
If we talk about the kind of media which gets generated in our day-to-day life, if you try to model it in a relational database, it will be pretty painful, and eventually there will be a time when you don't know how to create correlations. [00:28:30] – Horizontally scalable versus vertically scalable In vertical scaling, when someone posts, we keep adding to the same table. As we add data to the table, the database size increases (the number of rows increases). But in horizontal scaling, we keep different boxes connected via Hadoop or Elastic MapReduce, which process the added data. [00:30:20] – What does it take to hook up a DynamoDB instance to a Rails app? We can integrate DynamoDB by using the SDK provided by AWS. I outlined the steps in the blog: how to create different kinds of tables, how to create the indexes, how to set the throughput, etc. We configure the AWS SDK, add the required credentials, and then we can create different kinds of tables. [00:33:00] – In terms of scaling, what is the limit for something like PostgreSQL or MySQL, versus DynamoDB? There's no scalability limit in DynamoDB, or in other NoSQL solutions. Picks David Kimura: CorgUI Jason Swett: Database Design for Mere Mortals Charles Max Wood: VMWare Workstation, GoCD, Ruby Rogues Parley, Ruby Dev Summit Chandan Jhunjhunwal: Twitter @ChandanJ chandan@faodailtechnology.com
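The hash-key/range-key design discussed above can be sketched as the table definition you would hand to the Ruby SDK's create_table call. This is a minimal sketch for a hypothetical chat-style messages table; the table name, attribute names, and throughput numbers are illustrative, not from the episode:

```ruby
# Sketch of a DynamoDB table definition for a chat workload:
# a hash (partition) key groups items, and a range (sort) key
# orders items within each partition. Names are illustrative.
MESSAGES_TABLE = {
  table_name: 'messages',
  key_schema: [
    { attribute_name: 'conversation_id', key_type: 'HASH' },  # partition key
    { attribute_name: 'sent_at',         key_type: 'RANGE' }  # sort key
  ],
  attribute_definitions: [
    { attribute_name: 'conversation_id', attribute_type: 'S' },
    { attribute_name: 'sent_at',         attribute_type: 'N' }
  ],
  provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 }
}.freeze

# With a real client (aws-sdk-dynamodb gem and credentials configured),
# this hash would be passed as:
#   Aws::DynamoDB::Client.new.create_table(MESSAGES_TABLE)
```

Modeling the key schema up front matters because, unlike an RDBMS, DynamoDB queries are shaped entirely by the hash and range keys you choose here.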
RR 314 DynamoDB on Rails with Chandan Jhunjhunwal Today's Ruby Rogues podcast features DynamoDB on Rails with Chandan Jhunjhunwal. DynamoDB is a NoSQL database that helps your team solve managing infrastructure issues like setup, costing and maintenance. Take some time to listen and know more about DynamoDB! [00:02:18] – Introduction to Chandan Jhunjhunwal Chanchan Jhunjhunwal is an owner of Faodail Technology, which is currently helping many startups for their web and mobile applications. They started from IBM, designing and building scalable mobile and web applications. He mainly worked on C++ and DB2 and later on, worked primarily on Ruby on Rails. Questions for Chandan [00:04:05] – Introduction to DynamoDB on Rails I would say that majority of developers work in PostgreSQL, MySQL or other relational database. On the other hand, Ruby on Rails is picked up by many startup or founder for actually implementing their ideas and bringing them to scalable products. I would say that more than 80% of developers are mostly working on RDBMS databases. For the remaining 20%, their applications need to capture large amounts of data so they go with NoSQL. In NoSQL, there are plenty of options like MongoDB, Cassandra, or DynamoDB. When using AWS, there’s no provided MongoDB. With Cassandra, it requires a lot of infrastructure setup and costing, and you’ll have to have a team which is kind of maintaining it on a day to day basis. So DynamoDB takes all those pain out of your team and you no longer have to focus on managing the infrastructure. [00:07:35] – Is it a good idea to start with a regular SQL database and then, switch to NoSQL database or is it better to start with NoSQL database from day one? It depends on a couple of factors. For many of the applications, they start with RDBMS because they just want to get some access, and probably switch to something like NoSQL. First, you have to watch the incoming data and their capacity. 
Second is familiarity, because most developers are more familiar with RDBMS and SQL queries. For example, say you have a feed application or a messaging application, where you know there will be a lot of chat happening and you expect to attract a huge number of users. You can accommodate that in RDBMS, but I would probably not recommend it.

[00:09:30] – Can I use DynamoDB as a caching mechanism or cache store?
I would not say a replacement, exactly. On those segments where I could see a lot of activity happening, I plugged in DynamoDB. The remaining part of the application was handled by RDBMS. In many applications, what I've seen is that they use a combination of the two.

[00:13:05] – How do you decide if you actually want to use DynamoDB for all the data in your system?
The case where we pick it from day one is when we know the amount of incoming data will keep increasing. It also depends on whether the development team you have is familiar with DynamoDB or other NoSQL databases.

[00:14:50] – Is DynamoDB a document store, or does it have columns?
You can say key-value pairs or document stores. The terminologies are just different, as is the way you design the database. In DynamoDB, you have something like a hash key and a range key.

[00:22:10] – Why don't we store images in the database?
I would say there are better places to store them, which are faster and cheaper, such as a CDN or S3. Another good reason is that if you want to fetch a properly sized image based on the user's device screen, resizing and all of that inside the database could be cumbersome. You'll keep adding different columns to store those different sizes of images.

[00:24:40] – Is there a potentially good reason for a NoSQL database as your default go-to data store?
If you have some data which is completely unstructured, trying to store it in RDBMS will be a pain.
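The hash key and range key mentioned above are the two parts of a DynamoDB table's key schema: the hash (partition) key determines where an item is stored, and the optional range (sort) key orders items within that partition. A minimal sketch of how such a table definition is shaped for the aws-sdk-dynamodb Ruby gem (the "Messages" table and its attribute names are hypothetical, chosen to match the chat-feed example above):

```ruby
# Hypothetical table definition with a hash (partition) key and a
# range (sort) key, shaped as Aws::DynamoDB::Client#create_table expects.
table_definition = {
  table_name: "Messages",
  attribute_definitions: [
    { attribute_name: "conversation_id", attribute_type: "S" }, # string
    { attribute_name: "sent_at",         attribute_type: "N" }  # number
  ],
  key_schema: [
    { attribute_name: "conversation_id", key_type: "HASH" },  # hash key
    { attribute_name: "sent_at",         key_type: "RANGE" }  # range key
  ],
  provisioned_throughput: { read_capacity_units: 5, write_capacity_units: 5 }
}

# With the gem installed and AWS credentials configured, this would be:
#   Aws::DynamoDB::Client.new(region: "us-east-1").create_table(table_definition)
puts table_definition[:key_schema].map { |k| k[:key_type] }.join(",")
```

All items in one conversation share a partition and are sorted by timestamp, which is what makes range queries over a chat feed cheap.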
If we talk about the kind of media generated in our day-to-day lives, trying to model it in a relational database will be pretty painful, and eventually there will come a time when you don't know how to create the correlations.

[00:28:30] – Horizontally scalable versus vertically scalable
With vertical scaling, when someone posts, we keep adding rows to the same table. As we add data, the database size increases (the number of rows grows). With horizontal scaling, we keep separate boxes connected via something like Hadoop or Elastic MapReduce, which processes the added data.

[00:30:20] – What does it take to hook up a DynamoDB instance to a Rails app?
We can integrate DynamoDB using the SDK provided by AWS. I've outlined the steps in the blog: how to create different kinds of tables, how to create indexes, how to set the throughput, and so on. We configure the AWS SDK, add the required credentials, and then we can create different kinds of tables.

[00:33:00] – In terms of scaling, what is the limit for something like PostgreSQL or MySQL, versus DynamoDB?
There's no scalability limit in DynamoDB, or in other NoSQL solutions.

Picks
David Kimura: CorgUI
Jason Swett: Database Design for Mere Mortals
Charles Max Wood: VMWare Workstation, GoCD, Ruby Rogues Parley, Ruby Dev Summit
Chandan Jhunjhunwal: Twitter @ChandanJ, chandan@faodailtechnology.com
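The Rails integration steps described in the episode (configure the AWS SDK, add credentials, then read and write data) can be sketched as request shapes for the aws-sdk-dynamodb gem. This is an illustrative sketch only: the "Messages" table, the conversation IDs, and the credential handling are hypothetical, and the actual client calls are shown in comments since they require live AWS credentials:

```ruby
# Hypothetical write and read requests, shaped as the aws-sdk-dynamodb
# gem's put_item and query methods expect.
put_request = {
  table_name: "Messages",
  item: {
    "conversation_id" => "conv-42",        # hash key
    "sent_at"         => Time.now.to_i,    # range key
    "body"            => "hello"
  }
}

# Query all items in one partition, i.e. one conversation's messages.
query_request = {
  table_name: "Messages",
  key_condition_expression: "conversation_id = :c",
  expression_attribute_values: { ":c" => "conv-42" }
}

# With the gem and credentials in place (e.g. via ENV or Rails credentials):
#   client = Aws::DynamoDB::Client.new(region: ENV["AWS_REGION"])
#   client.put_item(put_request)
#   client.query(query_request)
puts query_request[:key_condition_expression]
```

Note that `query` only needs the hash key (plus an optional range-key condition), which is why choosing the key schema up front matters so much more in DynamoDB than in an RDBMS.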
The PHP Bard himself, Jeremy Lindblom, joins us for much discussion of fun things, like how PHP is used inside Amazon, the upcoming Pacific Northwest PHP Conference (PNWPHP), and what it's like to be a bard in the age of automation. Also, Ed bought a new TV and wrote some music once.

Do these things! Check out our sponsors: Roave and WonderNetwork. Follow us on Twitter here. Rate us on iTunes here.

Listen
Download now (MP3, 46.8MB, 1:05:09)

Links and Notes
Kickstarter for PNWPHP
Jeremy on Twitter
The PHP Bard on Twitter
AWS SDK for PHP
Amber monitors
Ed's music project Dead Agent
Dead Agent – Retina EP